layout | page_title | sidebar_current | description |
---|---|---|---|
guides | Preemption (Service and Batch Jobs) | guides-operating-a-job-preemption-service-batch | The following guide walks the user through enabling and using preemption on service and batch jobs in Nomad Enterprise (0.9.3 and above). |
Preemption for Service and Batch Jobs
~> Enterprise Only! This functionality only exists in Nomad Enterprise. This is not present in the open source version of Nomad.
Prior to Nomad 0.9, job priority was used only to process scheduling requests in priority order. Preemption, implemented in Nomad 0.9, allows Nomad to evict running allocations in order to place allocations of a higher priority. Allocations of a job that are blocked temporarily go into "pending" status until the cluster has additional capacity to run them. This is useful when operators need to run relatively higher priority tasks sooner, even under resource contention across the cluster.
While Nomad 0.9 introduced preemption for system jobs, Nomad 0.9.3 Enterprise additionally allows preemption for service and batch jobs. This functionality can easily be enabled by sending a payload with the appropriate options specified to the scheduler configuration API endpoint.
Reference Material
Estimated Time to Complete
20 minutes
Prerequisites
To perform the tasks described in this guide, you need to have a Nomad environment with Consul installed. You can use this repo to easily provision a sandbox environment. This guide will assume a cluster with one server node and three client nodes. To simulate resource contention, the nodes in this environment will each have 1 GB RAM (on AWS, you can choose the t2.micro instance type). Remember that service and batch job preemption requires Nomad 0.9.3 Enterprise.
-> Please Note: This guide is for demo purposes and is only using a single server node. In a production cluster, 3 or 5 server nodes are recommended.
Steps
Step 1: Create a Job with Low Priority
Start by creating a job with a relatively low priority to run in your Nomad cluster.
One of the allocations from this job will be preempted in a subsequent
deployment when there is resource contention in the cluster. Copy the
following job into a file and name it webserver.nomad:
job "webserver" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 40

  group "webserver" {
    count = 3

    task "apache" {
      driver = "docker"

      config {
        image = "httpd:latest"

        port_map {
          http = 80
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }

        memory = 600
      }

      service {
        name = "apache-webserver"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
Note that the count is 3 and that each allocation is specifying 600 MB of memory. Remember that each node only has 1 GB of RAM.
Step 2: Run the Low Priority Job
Register webserver.nomad:
$ nomad run webserver.nomad
==> Monitoring evaluation "1596bfc8"
Evaluation triggered by job "webserver"
Allocation "725d3b49" created: node "16653ac1", group "webserver"
Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "1596bfc8" finished with status "complete"
You should be able to check the status of the webserver job at this point and see that an allocation has been placed on each client node in the cluster:
$ nomad status webserver
ID = webserver
Name = webserver
Submit Date = 2019-06-19T04:20:32Z
Type = service
Priority = 40
...
Allocations
ID Node ID Task Group Version Desired Status Created Modified
725d3b49 16653ac1 webserver 0 run running 1m18s ago 59s ago
e2f9cb3d f765c6e8 webserver 0 run running 1m18s ago 1m2s ago
e9d8df1b b0700ec0 webserver 0 run running 1m18s ago 59s ago
Step 3: Create a Job with High Priority
Create another job with a priority greater than the job you just deployed. Copy the following into a file named redis.nomad:
job "redis" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 80

  group "cache1" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }

        memory = 700
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
Note that this job has a priority of 80 (greater than the priority of the job from Step 1) and requires 700 MB of memory. This allocation will create a resource contention in the cluster since each node only has 1 GB of memory with a 600 MB allocation already placed on it.
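The memory arithmetic behind this contention can be sketched in a few lines of shell. This is a simplified illustration, not how the Nomad scheduler is implemented: the helper name fits_on_node is hypothetical, only memory is considered, and 1000 MB is an assumed nominal figure for a "1 GB" node (the memory Nomad actually detects on a client is usually somewhat lower).

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): does a new allocation fit next to
# what is already placed on a node? All values are in MB and only memory is
# checked; the real scheduler also weighs CPU, network, and other dimensions.
fits_on_node() {
  node_mb=$1   # assumed usable memory on the node
  used_mb=$2   # memory already reserved by placed allocations
  ask_mb=$3    # memory requested by the new allocation
  if [ $((used_mb + ask_mb)) -le "$node_mb" ]; then
    echo "fits"
  else
    echo "exhausted"
  fi
}

# webserver (600 MB) is already placed; redis asks for 700 MB.
fits_on_node 1000 600 700

# The same node once the webserver allocation has been evicted.
fits_on_node 1000 0 700
```

The first call prints "exhausted" and the second prints "fits", which matches the guide's scenario: redis cannot be placed alongside a webserver allocation, but can be placed on a node whose webserver allocation has been evicted.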
Step 4: Try to Run redis.nomad
Remember that preemption for service and batch jobs is disabled by
default. This means that the redis job will be queued due
to resource contention in the cluster. You can verify the resource contention before actually registering your job by running the plan command:
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
+ Task: "redis" (forces create)
Scheduler dry-run:
- WARNING: Failed to place all allocations.
Task Group "cache1" (failed to place 1 allocation):
* Resources exhausted on 3 nodes
* Dimension "memory" exhausted on 3 nodes
Run the job to see that the allocation will be queued:
$ nomad run redis.nomad
==> Monitoring evaluation "1e54e283"
Evaluation triggered by job "redis"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
Task Group "cache1" (failed to place 1 allocation):
* Resources exhausted on 3 nodes
* Dimension "memory" exhausted on 3 nodes
Evaluation "1512251a" waiting for additional capacity to place remainder
You can also verify that the allocation has been queued by checking the status of the job:
$ nomad status redis
ID = redis
Name = redis
Submit Date = 2019-06-19T03:33:17Z
Type = service
Priority = 80
...
Placement Failure
Task Group "cache1":
* Resources exhausted on 3 nodes
* Dimension "memory" exhausted on 3 nodes
Allocations
No allocations placed
You may remove this job now. In the next steps, we will enable service job preemption and re-deploy:
$ nomad stop -purge redis
==> Monitoring evaluation "153db6c0"
Evaluation triggered by job "redis"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "153db6c0" finished with status "complete"
Step 5: Enable Service Job Preemption
Verify the scheduler configuration with the following command:
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
"SchedulerConfig": {
"PreemptionConfig": {
"SystemSchedulerEnabled": true,
"BatchSchedulerEnabled": false,
"ServiceSchedulerEnabled": false
},
"CreateIndex": 5,
"ModifyIndex": 506
},
"Index": 506,
"LastContact": 0,
"KnownLeader": true
}
Note that BatchSchedulerEnabled and ServiceSchedulerEnabled are both set to false
by default.
Since we are preempting service jobs in this guide, we need to set
ServiceSchedulerEnabled to true. We will do this by directly interacting
with the API.
Create the following JSON payload and place it in a file named scheduler.json:
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}
Note that ServiceSchedulerEnabled has been set to true.
Run the following command to update the scheduler configuration:
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
You should now be able to check the scheduler configuration again and see that preemption has been enabled for service jobs (output below is abbreviated):
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
"SchedulerConfig": {
"PreemptionConfig": {
"SystemSchedulerEnabled": true,
"BatchSchedulerEnabled": false,
"ServiceSchedulerEnabled": true
},
...
}
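If you would rather check the flag from a script than read the JSON by eye, something like the following may work. It is a minimal sketch: the function name service_preemption_enabled is hypothetical, the sample input is copied from the abbreviated output in this guide, and a plain grep is used only to keep the example free of dependencies.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): reads the scheduler configuration
# JSON on stdin and exits 0 only if ServiceSchedulerEnabled is set to true.
service_preemption_enabled() {
  grep -Eq '"ServiceSchedulerEnabled"[[:space:]]*:[[:space:]]*true'
}

# Sample input taken from the abbreviated response shown in this guide.
sample='{"SchedulerConfig":{"PreemptionConfig":{"SystemSchedulerEnabled":true,"BatchSchedulerEnabled":false,"ServiceSchedulerEnabled":true}}}'

if printf '%s\n' "$sample" | service_preemption_enabled; then
  echo "service preemption enabled"
else
  echo "service preemption disabled"
fi
```

Against a running cluster, the same function could be fed from the command used in this step, for example by piping curl -s localhost:4646/v1/operator/scheduler/configuration into service_preemption_enabled. Since jq is already used in this guide, jq -e with the path .SchedulerConfig.PreemptionConfig.ServiceSchedulerEnabled is likely the sturdier choice in practice.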
Step 6: Try Running redis.nomad Again
Now that you have enabled preemption on service jobs, deploying your redis job
should evict one of the lower priority webserver allocations and place it into
a queue. You can run nomad plan to see a preview of what will happen:
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
+ Task: "redis" (forces create)
Scheduler dry-run:
- All tasks successfully allocated.
Preemptions:
Alloc ID Job ID Task Group
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3 webserver webserver
...
Note that Nomad is indicating one of the webserver allocations will be
evicted.
Now run the redis job:
$ nomad run redis.nomad
==> Monitoring evaluation "7ada9d9f"
Evaluation triggered by job "redis"
Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "7ada9d9f" finished with status "complete"
You can check the status of the webserver job and verify one of the allocations has been evicted:
$ nomad status webserver
ID = webserver
Name = webserver
Submit Date = 2019-06-19T04:20:32Z
Type = service
Priority = 40
...
Summary
Task Group Queued Starting Running Failed Complete Lost
webserver 1 0 2 0 1 0
Placement Failure
Task Group "webserver":
* Resources exhausted on 3 nodes
* Dimension "memory" exhausted on 3 nodes
Allocations
ID Node ID Task Group Version Desired Status Created Modified
725d3b49 16653ac1 webserver 0 evict complete 4m10s ago 33s ago
e2f9cb3d f765c6e8 webserver 0 run running 4m10s ago 3m54s ago
e9d8df1b b0700ec0 webserver 0 run running 4m10s ago 3m51s ago
Step 7: Stop the Redis Job
Stop the redis job and verify that the evicted/queued webserver allocation
starts running again:
$ nomad stop redis
==> Monitoring evaluation "670922e9"
Evaluation triggered by job "redis"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "670922e9" finished with status "complete"
You should now be able to see from the webserver status that the third allocation that was previously preempted is running again:
$ nomad status webserver
ID = webserver
Name = webserver
Submit Date = 2019-06-19T04:20:32Z
Type = service
Priority = 40
Datacenters = dc1
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
webserver 0 0 3 0 1 0
Allocations
ID Node ID Task Group Version Desired Status Created Modified
f623eb81 16653ac1 webserver 0 run running 13s ago 7s ago
725d3b49 16653ac1 webserver 0 evict complete 6m44s ago 3m7s ago
e2f9cb3d f765c6e8 webserver 0 run running 6m44s ago 6m28s ago
e9d8df1b b0700ec0 webserver 0 run running 6m44s ago 6m25s ago
Next Steps
The process you learned in this guide can also be applied to batch jobs. Read more about preemption in Nomad Enterprise here.
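For batch jobs, the change to Step 5 would presumably be limited to the payload. The sketch below writes a variant with BatchSchedulerEnabled set to true; the file name scheduler-batch.json is hypothetical, and keeping ServiceSchedulerEnabled at true is an assumption that you want to retain the setting from this guide.

```shell
#!/bin/sh
# Write a scheduler configuration payload that enables batch job preemption.
# scheduler-batch.json is a hypothetical file name chosen for this example.
# ServiceSchedulerEnabled stays true on the assumption that the service
# setting from Step 5 should be kept; set it to false if it should not.
cat > scheduler-batch.json <<'EOF'
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": true,
    "ServiceSchedulerEnabled": true
  }
}
EOF

# Then submit it the same way as in Step 5 (left commented out here because
# it needs a running Nomad Enterprise cluster):
# curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler-batch.json
```

After posting it, the configuration check from Step 5 should report BatchSchedulerEnabled as true, and a higher priority job with type = "batch" can then be tested against a lower priority one in the same way as the redis and webserver jobs.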