Service and Batch Job Preemption Guide (#5853)
* fix navigation issue for spread guide
* skeleton for preemption guide
* background info, challenge, and pre-reqs
* steps
* rewording of intro
* re-wording
* adding more detail to intro
* clarify use of preemption in intro

---
layout: "guides"
page_title: "Preemption (Service and Batch Jobs)"
sidebar_current: "guides-operating-a-job-preemption-service-batch"
description: |-
  The following guide walks the user through enabling and using preemption on
  service and batch jobs in Nomad Enterprise (0.9.3 and above).
---

# Preemption for Service and Batch Jobs

~> **Enterprise Only!** This functionality exists only in Nomad Enterprise. It
is not present in the open source version of Nomad.

Prior to Nomad 0.9, job [priority][priority] in Nomad was used only to process
scheduling requests in priority order. Preemption, introduced in Nomad 0.9,
allows Nomad to evict running allocations in order to place allocations of a
higher priority. Allocations of a job that are temporarily blocked go into
"pending" status until the cluster has additional capacity to run them. This is
useful when operators need relatively higher priority tasks to run sooner, even
under resource contention across the cluster.

While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
[Enterprise][enterprise] additionally allows preemption for
[service][service-job] and [batch][batch-job] jobs. This functionality can be
enabled by sending a [payload][payload-preemption-config] with the appropriate
options to the [scheduler configuration][update-scheduler] API endpoint.

## Reference Material

- [Preemption][preemption]
- [Nomad Enterprise Preemption][enterprise-preemption]

## Estimated Time to Complete

20 minutes

## Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul installed. You can use this
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
to easily provision a sandbox environment. This guide assumes a cluster with
one server node and three client nodes. To simulate resource contention, the
nodes in this environment each have 1 GB RAM (for AWS, you can choose the
[t2.micro][t2-micro] instance type). Remember that service and batch job
preemption requires Nomad 0.9.3 [Enterprise][enterprise].

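Since this guide depends on an Enterprise feature, it can help to confirm what
your nodes are running before you start. A quick sanity check (assuming the
Nomad CLI is on your path; Enterprise builds typically report an `+ent` suffix
in the version string):

```shell
# Confirm the Nomad version and edition on a node
$ nomad version
```
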
-> **Please Note:** This guide is for demo purposes and only uses a single
server node. In a production cluster, 3 or 5 server nodes are recommended.

## Steps

### Step 1: Create a Job with Low Priority

Start by deploying a job with a relatively low priority to your Nomad cluster.
One of the allocations from this job will be preempted in a subsequent
deployment when there is resource contention in the cluster. Copy the following
job into a file and name it `webserver.nomad`.

```hcl
job "webserver" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 40

  group "webserver" {
    count = 3

    task "apache" {
      driver = "docker"

      config {
        image = "httpd:latest"

        port_map {
          http = 80
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }

        memory = 600
      }

      service {
        name = "apache-webserver"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that the [count][count] is 3 and that each allocation specifies 600 MB of
[memory][memory]. Remember that each node has only 1 GB of RAM.

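If you would like to confirm how much memory your client nodes report before
continuing, you can inspect them with the CLI. A minimal sketch (the node ID
below is illustrative; use an ID from your own cluster):

```shell
# List client nodes and their IDs
$ nomad node status

# Show detailed information, including resources, for one node
$ nomad node status 16653ac1
```
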
### Step 2: Run the Low Priority Job

Register `webserver.nomad`:

```shell
$ nomad run webserver.nomad
==> Monitoring evaluation "1596bfc8"
    Evaluation triggered by job "webserver"
    Allocation "725d3b49" created: node "16653ac1", group "webserver"
    Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
    Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1596bfc8" finished with status "complete"
```

You should be able to check the status of the `webserver` job at this point and
see that an allocation has been placed on each client node in the cluster:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
725d3b49  16653ac1  webserver   0        run      running  1m18s ago  59s ago
e2f9cb3d  f765c6e8  webserver   0        run      running  1m18s ago  1m2s ago
e9d8df1b  b0700ec0  webserver   0        run      running  1m18s ago  59s ago
```

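To see the resources reserved by one of these allocations, you can inspect it
by ID. A minimal sketch (the allocation ID comes from the sample output above;
substitute one of your own):

```shell
# Show detailed status, including reserved memory, for a single allocation
$ nomad alloc status 725d3b49
```
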
### Step 3: Create a Job with High Priority

Create another job with a [priority][priority] greater than that of the job you
just deployed. Copy the following into a file named `redis.nomad`:

```hcl
job "redis" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 80

  group "cache1" {
    count = 1

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"

        port_map {
          db = 6379
        }
      }

      resources {
        network {
          port "db" {}
        }

        memory = 700
      }

      service {
        name = "redis-cache"
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note that this job has a priority of 80 (greater than the priority of the job
from [Step 1][step-1]) and requires 700 MB of memory. This will create resource
contention in the cluster: each node has only 1 GB of memory, a 600 MB
allocation is already placed on each node, and 600 MB + 700 MB exceeds the 1 GB
available on any single node.

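Optionally, you can check the job file for syntax problems before planning it.
A quick sanity check (on older Nomad versions the equivalent command is
`nomad validate`):

```shell
# Validate the job file without submitting it
$ nomad job validate redis.nomad
```
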
### Step 4: Try to Run `redis.nomad`

Remember that preemption for service and batch jobs is [disabled by
default][preemption-config]. This means that the `redis` job will be queued due
to resource contention in the cluster. You can verify the resource contention
before actually registering your job by running the [`plan`][plan] command:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "cache1" (failed to place 1 allocation):
    * Resources exhausted on 3 nodes
    * Dimension "memory" exhausted on 3 nodes
```

Run the job to see that the allocation will be queued:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "1e54e283"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
    Task Group "cache1" (failed to place 1 allocation):
      * Resources exhausted on 3 nodes
      * Dimension "memory" exhausted on 3 nodes
    Evaluation "1512251a" waiting for additional capacity to place remainder
```

You may also verify that the allocation has been queued by checking the status
of the job:

```shell
$ nomad status redis
ID            = redis
Name          = redis
Submit Date   = 2019-06-19T03:33:17Z
Type          = service
Priority      = 80
...

Placement Failure
Task Group "cache1":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
No allocations placed
```

You may remove this job now. In the next steps, we will enable service job
preemption and re-deploy it:

```shell
$ nomad stop -purge redis
==> Monitoring evaluation "153db6c0"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "153db6c0" finished with status "complete"
```

### Step 5: Enable Service Job Preemption

Verify the [scheduler configuration][scheduler-configuration] with the following
command:

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": false
    },
    "CreateIndex": 5,
    "ModifyIndex": 506
  },
  "Index": 506,
  "LastContact": 0,
  "KnownLeader": true
}
```

Note that [BatchSchedulerEnabled][batch-enabled] and
[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
Since we are preempting service jobs in this guide, we need to set
`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
with the [API][update-scheduler].

Create the following JSON payload and place it in a file named `scheduler.json`:

```json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": false,
    "ServiceSchedulerEnabled": true
  }
}
```

Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.

Run the following command to update the scheduler configuration:

```shell
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
```

You should now be able to check the scheduler configuration again and see that
preemption has been enabled for service jobs (output below is abbreviated):

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
{
  "SchedulerConfig": {
    "PreemptionConfig": {
      "SystemSchedulerEnabled": true,
      "BatchSchedulerEnabled": false,
      "ServiceSchedulerEnabled": true
    },
    ...
}
```

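If you only care about the preemption settings, you can filter the response
down to that object with `jq`. A small convenience, using the field path shown
in the output above:

```shell
$ curl -s localhost:4646/v1/operator/scheduler/configuration | \
    jq '.SchedulerConfig.PreemptionConfig'
```
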
### Step 6: Try Running `redis.nomad` Again

Now that you have enabled preemption on service jobs, deploying your `redis`
job should evict one of the lower priority `webserver` allocations; the evicted
allocation will be queued until capacity is available. You can run `nomad plan`
to see a preview of what will happen:

```shell
$ nomad plan redis.nomad
+ Job: "redis"
+ Task Group: "cache1" (1 create)
  + Task: "redis" (forces create)

Scheduler dry-run:
- All tasks successfully allocated.

Preemptions:

Alloc ID                              Job ID     Task Group
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3  webserver  webserver
...
```

Note that Nomad is indicating that one of the `webserver` allocations will be
evicted.

Now run the `redis` job:

```shell
$ nomad run redis.nomad
==> Monitoring evaluation "7ada9d9f"
    Evaluation triggered by job "redis"
    Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7ada9d9f" finished with status "complete"
```

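If you like, confirm that the high priority job was placed before checking on
the preempted job:

```shell
# The redis job should now show a running allocation
$ nomad status redis
```
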
You can check the status of the `webserver` job and verify that one of its
allocations has been evicted:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
...

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   1       0         2        0       1         0

Placement Failure
Task Group "webserver":
  * Resources exhausted on 3 nodes
  * Dimension "memory" exhausted on 3 nodes

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
725d3b49  16653ac1  webserver   0        evict    complete  4m10s ago  33s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   4m10s ago  3m54s ago
e9d8df1b  b0700ec0  webserver   0        run      running   4m10s ago  3m51s ago
```

### Step 7: Stop the Redis Job

Stop the `redis` job and verify that the evicted/queued `webserver` allocation
starts running again:

```shell
$ nomad stop redis
==> Monitoring evaluation "670922e9"
    Evaluation triggered by job "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "670922e9" finished with status "complete"
```

You should now be able to see from the `webserver` status that the previously
preempted third allocation is running again:

```shell
$ nomad status webserver
ID            = webserver
Name          = webserver
Submit Date   = 2019-06-19T04:20:32Z
Type          = service
Priority      = 40
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
webserver   0       0         3        0       1         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
f623eb81  16653ac1  webserver   0        run      running   13s ago    7s ago
725d3b49  16653ac1  webserver   0        evict    complete  6m44s ago  3m7s ago
e2f9cb3d  f765c6e8  webserver   0        run      running   6m44s ago  6m28s ago
e9d8df1b  b0700ec0  webserver   0        run      running   6m44s ago  6m25s ago
```

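When you are finished experimenting, you can optionally clean up the demo by
removing the remaining job:

```shell
# Stop and purge the low priority demo job
$ nomad stop -purge webserver
```
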
## Next Steps

The process you learned in this guide can also be applied to
[batch][batch-enabled] jobs. Read more about preemption in Nomad Enterprise
[here][enterprise-preemption].

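As a sketch, enabling batch job preemption uses the same endpoint and payload
shape shown in Step 5, with `BatchSchedulerEnabled` flipped to `true`:

```shell
$ cat <<'EOF' > scheduler.json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": true,
    "ServiceSchedulerEnabled": true
  }
}
EOF

$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
```
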
[batch-enabled]: /api/operator.html#batchschedulerenabled-1
[batch-job]: /docs/schedulers.html#batch
[count]: /docs/job-specification/group.html#count
[enterprise]: /docs/enterprise/index.html
[enterprise-preemption]: /docs/enterprise/preemption/index.html
[memory]: /docs/job-specification/resources.html#memory
[payload-preemption-config]: /api/operator.html#sample-payload-1
[plan]: /docs/commands/job/plan.html
[preemption]: /docs/internals/scheduling/preemption.html
[preemption-config]: /api/operator.html#preemptionconfig-1
[priority]: /docs/job-specification/job.html#priority
[scheduler-configuration]: /api/operator.html#read-scheduler-configuration
[service-enabled]: /api/operator.html#serviceschedulerenabled-1
[service-job]: /docs/schedulers.html#service
[step-1]: #step-1-create-a-job-with-low-priority
[system-job]: /docs/schedulers.html#system
[t2-micro]: https://aws.amazon.com/ec2/instance-types/
[update-scheduler]: /api/operator.html#update-scheduler-configuration

@@ -1,7 +1,7 @@
 ---
 layout: "guides"
 page_title: "Spread"
-sidebar_current: "guides-advanced-scheduling"
+sidebar_current: "guides-operating-a-job-spread"
 description: |-
   The following guide walks the user through using the spread stanza in Nomad.
 ---

@@ -119,10 +119,14 @@
         <a href="/guides/operating-a-job/advanced-scheduling/affinity.html">Placement Preferences with Affinities</a>
       </li>

-      <li<%= sidebar_current("guides-spread") %>>
+      <li<%= sidebar_current("guides-operating-a-job-spread") %>>
         <a href="/guides/operating-a-job/advanced-scheduling/spread.html">Fault Tolerance with Spread</a>
       </li>

+      <li<%= sidebar_current("guides-operating-a-job-preemption-service-batch") %>>
+        <a href="/guides/operating-a-job/advanced-scheduling/preemption-service-batch.html">Preemption (Service and Batch Jobs)</a>
+      </li>
+
       <li<%= sidebar_current("guides-operating-a-job-external-lxc") %>>
         <a href="/guides/operating-a-job/external/lxc.html">Running LXC Applications</a>
       </li>