Fixed typo: schedular/scheduler.

2019-05-07 05:11:36 -07:00

9 KiB

Raw Blame History

layout	page_title	sidebar_current	description
guides	Affinity	guides-advanced-scheduling	The following guide walks the user through using the affinity stanza in Nomad.

Expressing Job Placement Preferences with Affinities

The affinity stanza allows operators to express placement preferences for their jobs on particular types of nodes. Note that there is a key difference between the constraint stanza and the affinity stanza. The constraint stanza strictly filters where jobs are run based on attributes and client metadata. If no nodes are found to match, the placement does not succeed. The affinity stanza acts like a "soft constraint." Nomad will attempt to match the desired affinity, but placement will succeed even if no nodes match the desired criteria. This is done in conjunction with scoring based on the Nomad scheduler's bin packing algorithm which you can read more about here.

Reference Material

The affinity stanza documentation
Scheduling with Nomad

Estimated Time to Complete

20 minutes

Challenge

Your application can run in datacenters dc1 and dc2, but you have a strong preference to run it in dc2. Configure your job to tell the scheduler your preference while still allowing it to place your workload in dc1 if the desired resources aren't available.

Solution

Specify an affinity with the proper weight so that the Nomad scheduler can find the best nodes on which to place your job. The affinity weight will be included when scoring nodes for placement along with other factors like the bin packing algorithm.

Prerequisites

To perform the tasks described in this guide, you need to have a Nomad environment with Consul installed. You can use this repo to easily provision a sandbox environment. This guide will assume a cluster with one server node and three client nodes.

-> Please Note: This guide is for demo purposes and is only using a single server node. In a production cluster, 3 or 5 server nodes are recommended.

Steps

Step 1: Place One of the Client Nodes in a Different Datacenter

We are going express our job placement preference based on the datacenter our nodes are located in. Choose one of your client nodes and edit /etc/nomad.d/nomad.hcl to change its location to dc2. A snippet of an example configuration file is show below with the required change is shown below.

data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"
datacenter = "dc2"

# Enable the client
client {
  enabled = true
...

After making the change on your chosen client node, restart the Nomad service

$ sudo systemctl restart nomad

If everything worked correctly, you should be able to run the nomad node status command and see that one of your nodes is now in datacenter dc2.

$ nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
3592943e  dc1  ip-172-31-27-159  <none>  false  eligible     ready
3dea0188  dc1  ip-172-31-16-175  <none>  false  eligible     ready
6b6e9518  dc2  ip-172-31-27-25   <none>  false  eligible     ready

Step 2: Create a Job with the `affinity` Stanza

Create a file with the name redis.nomad and place the following content in it:

job "redis" {
 datacenters = ["dc1", "dc2"]
 type = "service"

 affinity {
   attribute = "${node.datacenter}"
   value = "dc2"
   weight = 100
 }

 group "cache1" {
   count = 4

   task "redis" {
     driver = "docker"

     config {
       image = "redis:latest"
       port_map {
         db = 6379
       }
     }

     resources {
       network {
         port "db" {}
       }
     }

     service {
       name = "redis-cache"
       port = "db"
       check {
         name     = "alive"
         type     = "tcp"
         interval = "10s"
         timeout  = "2s"
       }
     }
   }
 }
}

Note that we used the affinity stanza and specified dc2 as the value for the attribute ${node.datacenter}. We used the value 100 for the weight which will cause the Nomad scheduler to rank nodes in datacenter dc2 with a higher score. Keep in mind that weights can range from -100 to 100, inclusive. Negative weights serve as anti-affinities which cause Nomad to avoid placing allocations on nodes that match the criteria.

Step 3: Register the Job `redis.nomad`

Run the Nomad job with the following command:

$ nomad run redis.nomad 
==> Monitoring evaluation "11388ef2"
    Evaluation triggered by job "redis"
    Allocation "0dfcf0ba" created: node "6b6e9518", group "cache1"
    Allocation "89a9aae9" created: node "3592943e", group "cache1"
    Allocation "9a00f742" created: node "6b6e9518", group "cache1"
    Allocation "fc0f21bc" created: node "3dea0188", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "11388ef2" finished with status "complete"

Note that two of the allocations in this example have been placed on node 6b6e9518. This is the node we configured to be in datacenter dc2. The Nomad scheduler selected this node because of the affinity we specified. All of the allocations have not been placed on this node because the Nomad scheduler considers other factors in the scoring such as bin packing. This helps avoid placing too many instances of the same job on a node and prevents reduced capacity during a node level failure. We will take a detailed look at the scoring in the next few steps.

Step 4: Check the Status of the `redis` Job

At this point, we are going to check the status of our job and verify where our allocations have been placed. Run the following command:

$ nomad status redis

You should see 4 instances of your job running in the Summary section of the output as show below:

...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache1      0       0         4        0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0dfcf0ba  6b6e9518  cache1      0        run      running  1h44m ago  1h44m ago
89a9aae9  3592943e  cache1      0        run      running  1h44m ago  1h44m ago
9a00f742  6b6e9518  cache1      0        run      running  1h44m ago  1h44m ago
fc0f21bc  3dea0188  cache1      0        run      running  1h44m ago  1h44m ago

You can cross-check this output with the results of the nomad node status command to verify that the majority of your workload has been placed on the node in dc2 (in our case, that node is 6b6e9518).

Step 5: Obtain Detailed Scoring Information on Job Placement

The Nomad scheduler will not always place all of your workload on nodes you have specified in the affinity stanza even if the resources are available. This is because affinity scoring is combined with other metrics as well before making a scheduling decision. In this step, we will take a look at some of those other factors.

Using the output from the previous step, find an allocation that has been placed on a node in dc2 and use the nomad alloc status command with the verbose option to obtain detailed scoring information on it. In this example, we will use the allocation ID 0dfcf0ba (your allocation IDs will be different).

$ nomad alloc status -verbose 0dfcf0ba

The resulting output will show the Placement Metrics section at the bottom.

...
Placement Metrics
Node                                  binpack  job-anti-affinity  node-reschedule-penalty  node-affinity  final score
6b6e9518-d2a4-82c8-af3b-6805c8cdc29c  0.33     0                  0                        1              0.665
3dea0188-ae06-ad98-64dd-a761ab2b1bf3  0.33     0                  0                        0              0.33
3592943e-67e4-461f-d888-d5842372a4d4  0.33     0                  0                        0              0.33

Note that the results from the binpack, job-anti-affinity, node-reschedule-penalty, and node-affinity columns are combined to produce the numbers listed in the final score column for each node. The Nomad scheduler uses the final score for each node in deciding where to make placements.

Next Steps

Experiment with the weight provided in the affinity stanza (the value can be from -100 through 100) and observe how the final score given to each node changes (use the nomad alloc status command as shown in the previous step).

9 KiB Raw Blame History