Update operating a job to tell a cohesive story

parent 2fbd4a0cba
commit 19cfd137a2
@@ -1,15 +0,0 @@
---
layout: "docs"
page_title: "Operating a Job"
sidebar_current: "docs-jobops"
description: |-
  Learn how to operate a Nomad Job.
---

# Operating a Job

Once a job has been submitted to Nomad, users must be able to inspect the state
of tasks, understand resource usage, and access task logs. Further, for services,
performing zero-downtime updates is critical. This section provides some best
practices and guidance for operating jobs under Nomad. Please navigate to the
appropriate sub-sections for more information.

@@ -1,174 +0,0 @@
---
layout: "docs"
page_title: "Operating a Job: Inspecting State"
sidebar_current: "docs-jobops-inspection"
description: |-
  Learn how to inspect a Nomad Job.
---

# Inspecting state

Once a job is submitted, the next step is to ensure it is running. This section
will assume we have submitted a job with the name _example_.

To get a high-level overview of our job we can use the [`nomad status`
command](/docs/commands/status.html). This command will display the list of
running allocations, as well as any recent placement failures. The example below
shows that the job has some allocations placed but did not have enough resources
to place all of the desired allocations. We run with `-evals` to see that there
is an outstanding evaluation for the job:

```
$ nomad status -evals example
ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Evaluations
ID        Priority  Triggered By  Status    Placement Failures
5744eb15  50        job-register  blocked   N/A - In Progress
8e38e6cf  50        job-register  complete  true

Placement Failure
Task Group "cache":
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
12681940  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
395c5882  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
4d7c6f84  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
843b07b8  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
a8bc6d3e  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
b0beb907  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
da21c1fd  8e38e6cf  4beef22f  cache       run      running  08/08/16 21:03:19 CDT
```

In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available. We can use
the [`eval-status` command](/docs/commands/eval-status.html) to examine any
evaluation in more detail. For the most part this should never be necessary, but
it can be useful to see why all of a job's allocations were not placed. For
example, if we run it on the _example_ job, which had a placement failure
according to the above output, we see:

```
$ nomad eval-status 8e38e6cf
ID                 = 8e38e6cf
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = example
Priority           = 50
Placement Failures = true

Failed Placements
Task Group "cache" (failed to place 3 allocations):
  * Resources exhausted on 1 nodes
  * Dimension "cpu exhausted" exhausted on 1 nodes

Evaluation "5744eb15" waiting for additional capacity to place remainder
```

More interesting, though, is the [`alloc-status`
command](/docs/commands/alloc-status.html). This command gives us the most
recent events that occurred for a task, its resource usage, port allocations and
more:

```
$ nomad alloc-status 12
ID            = 12681940
Eval ID       = 8e38e6cf
Name          = example.cache[1]
Node ID       = 4beef22f
Job ID        = example
Client Status = running
Created At    = 06/28/16 15:37:44 UTC

Task "redis" is "running"
Task Resources
CPU    Memory           Disk     IOPS  Addresses
2/500  6.3 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:57161

Recent Events:
Time                   Type        Description
06/28/16 15:46:42 UTC  Started     Task started by client
06/28/16 15:46:10 UTC  Restarting  Task restarting in 30.863215327s
06/28/16 15:46:10 UTC  Terminated  Exit Code: 137, Exit Message: "Docker container exited with non-zero exit code: 137"
06/28/16 15:37:46 UTC  Started     Task started by client
06/28/16 15:37:44 UTC  Received    Task received by client
```

In the above example we force-killed the Docker container so that we could see
in the event history that Nomad detected the failure and restarted the
allocation.
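How long Nomad waits before each restart, and how many restarts it attempts, is governed by the task group's restart policy. The sketch below shows a `restart` stanza; the values are illustrative assumptions, not the policy that produced the events above.

```hcl
group "cache" {
  # Illustrative values: allow up to 2 restarts within a 1 minute interval,
  # waiting 15 seconds before each restart.
  restart {
    attempts = 2
    interval = "1m"
    delay    = "15s"

    # Once attempts are exhausted, "delay" postpones further restarts until
    # the next interval; "fail" stops restarting the task instead.
    mode = "delay"
  }

  task "redis" {
    # ...
  }
}
```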

The `alloc-status` command is a good starting point for debugging an
application that did not start. In this example task we are trying to start a
redis image using `redis:2.8`, but the user has accidentally put a comma instead
of a period, typing `redis:2,8`.
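For reference, a minimal sketch of the task's `config` stanza with the intended tag is shown below; only the relevant fields are included, and the rest of the task is assumed unchanged.

```hcl
task "redis" {
  driver = "docker"

  config {
    # "redis:2,8" (with a comma) is rejected as an invalid tag format;
    # the intended image tag uses a period.
    image = "redis:2.8"
  }
}
```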

When the job is run, it produces an allocation that fails. The `alloc-status`
command gives us the reason why:

```
$ nomad alloc-status c0f1
ID            = c0f1b34c
Eval ID       = 4df393cb
Name          = example.cache[0]
Node ID       = 13063955
Job ID        = example
Client Status = failed
Created At    = 06/28/16 15:50:22 UTC

Task "redis" is "dead"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:23285

Recent Events:
Time                   Type            Description
06/28/16 15:50:22 UTC  Not Restarting  Error was unrecoverable
06/28/16 15:50:22 UTC  Driver Failure  failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
06/28/16 15:50:22 UTC  Received        Task received by client
```

Not all failures are this easy to debug. If the `alloc-status` command shows
many restarts occurring, as in the example below, it is a good hint that the error
is occurring at the application level during startup. These failures can be
debugged by looking at logs, which is covered in the [Nomad Job Logging
documentation](/docs/jobops/logs.html).

```
$ nomad alloc-status e6b6
ID            = e6b625a1
Eval ID       = 68b742e8
Name          = example.cache[0]
Node ID       = 83ef596c
Job ID        = example
Client Status = pending
Created At    = 06/28/16 15:55:48

Task "redis" is "pending"
Task Resources
CPU  Memory   Disk     IOPS  Addresses
500  256 MiB  300 MiB  0     db: 127.0.0.1:30153

Recent Events:
Time                   Type        Description
06/28/16 15:56:16 UTC  Restarting  Task restarting in 5.178426031s
06/28/16 15:56:16 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:56:16 UTC  Started     Task started by client
06/28/16 15:56:00 UTC  Restarting  Task restarting in 5.00123931s
06/28/16 15:56:00 UTC  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
06/28/16 15:55:59 UTC  Started     Task started by client
06/28/16 15:55:48 UTC  Received    Task received by client
```

@@ -1,96 +0,0 @@
---
layout: "docs"
page_title: "Operating a Job: Accessing Logs"
sidebar_current: "docs-jobops-logs"
description: |-
  Learn how to operate a Nomad Job.
---

# Accessing Logs

Accessing application logs is critical when debugging issues or performance
problems, or even for verifying that the application is starting correctly. To make
this as simple as possible, Nomad provides [log
rotation](/docs/jobspec/index.html#log_rotation) in the jobspec, as well as a [CLI
command](/docs/commands/logs.html) and an [API](/docs/http/client-fs.html#logs)
for accessing application logs and data files.

To see this in action we can just run the example job created by `nomad
init`:

```
$ nomad init
Example job file written to example.nomad
```

This job will start a redis instance in a Docker container. We can run it now:

```
$ nomad run example.nomad
==> Monitoring evaluation "7a3b78c0"
    Evaluation triggered by job "example"
    Allocation "c3c58508" created: node "b5320e2d", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "7a3b78c0" finished with status "complete"
```

We can grab the allocation ID from above and use the [`nomad logs`
command](/docs/commands/logs.html) to access the application's logs. The `logs`
command supports both displaying the logs and following them, blocking
for more output.

Thus to access the `stdout` we can issue the below command:

```
$ nomad logs c3c58508 redis
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.1 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 28 Jun 19:49:30.504 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 28 Jun 19:49:30.505 # Server started, Redis version 3.2.1
1:M 28 Jun 19:49:30.505 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 28 Jun 19:49:30.505 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 28 Jun 19:49:30.505 * The server is now ready to accept connections on port 6379
```

To display the `stderr` for the task we would run the following:

```
$ nomad logs -stderr c3c58508 redis
```

While this works well for quickly accessing logs, we recommend running a
log shipper for long-term storage of logs. In many cases this will not be needed
and the above will suffice, but Nomad can also accommodate use cases in which
log retention is needed.

Since we place application logs inside the `alloc/` directory, all tasks within
the same task group have access to each other's logs. Thus we can have a task
group as follows:

```
group "my-group" {
  task "log-producer" {...}
  task "log-shipper" {...}
}
```

In the above example, the `log-producer` task is the application that should be
run and will be producing the logs we would like to ship, and the `log-shipper`
reads these logs from the `alloc/logs/` directory and ships them to long-term
storage such as S3.

@@ -1,9 +1,11 @@
---
layout: "docs"
page_title: "Operating a Job: Resource Utilization"
sidebar_current: "docs-jobops-resource-utilization"
page_title: "Resource Utilization - Operating a Job"
sidebar_current: "docs-operating-a-job-resource-utilization"
description: |-
  Learn how to see resource utilization of a Nomad Job.
  Nomad supports reporting detailed job statistics and resource utilization
  metrics for most task drivers. This section describes the ways to inspect a
  job's resource consumption and utilization.
---

# Determining Resource Utilization

@@ -16,7 +18,7 @@ command](/docs/commands/alloc-status.html) by specifying the `-stats` flag.
In the example below we are running `redis` and can see its resource
utilization:

```
```text
$ nomad alloc-status c3e0
ID            = c3e0e3e0
Eval ID       = 617e5e39

@@ -49,7 +51,7 @@ Here we can see that we are near the limit of our configured CPU but we have
plenty of memory headroom. We can use this information to alter our job's
resources to better reflect what it actually needs:

```
```hcl
resources {
  cpu    = 2000
  memory = 100

@@ -1,16 +0,0 @@
---
layout: "docs"
page_title: "Operating a Job: Service Discovery"
sidebar_current: "docs-jobops-service-discovery"
description: |-
  Learn how to use service discovery with Nomad Jobs.
---

# Using Service Discovery

Service discovery is key for applications in a dynamic environment to discover
each other. As such, Nomad has built-in support for registering services and
health checks with [Consul](http://consul.io).

For more details on using service discovery with your application, see
the [Service Discovery documentation](/docs/jobspec/servicediscovery.html).
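As a rough illustration, a task registers a service and its health check with Consul through a `service` stanza. The sketch below assumes a Docker `redis` task whose container port 6379 is mapped to a dynamically allocated port labeled `db`; the service name, tags, and check timings are placeholders.

```hcl
task "redis" {
  driver = "docker"

  config {
    image = "redis:3.2"

    # Map the dynamic "db" port to the port redis listens on in the container.
    port_map {
      db = 6379
    }
  }

  # Register this task in Consul under an illustrative service name.
  service {
    name = "redis-cache"
    tags = ["global", "cache"]
    port = "db"

    # A TCP health check against the "db" port.
    check {
      name     = "alive"
      type     = "tcp"
      interval = "10s"
      timeout  = "2s"
    }
  }

  resources {
    network {
      mbits = 10
      port "db" {}
    }
  }
}
```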

@@ -0,0 +1,93 @@
---
layout: "docs"
page_title: "Submitting Jobs - Operating a Job"
sidebar_current: "docs-operating-a-job-submitting"
description: |-
  The job file is the unit of work in Nomad. Upon authoring, the job file is
  submitted to the server for evaluation and scheduling. This section discusses
  some techniques for submitting jobs.
---

# Submitting Jobs

In Nomad, the description of the job and all its requirements are maintained in
a single file called the "job file". This job file resides locally on disk and
it is highly recommended that you check job files into source control.

The general flow for submitting a job in Nomad is:

1. Author a job file according to the job specification
1. Plan and review changes with a Nomad server
1. Submit the job file to a Nomad server
1. (Optional) Review job status and logs

Here is a very basic example to get you started.

## Author a Job File
Authoring a job file is very easy. For more detailed information, please see the
[job specification](/docs/jobspec/index.html). Here is a sample job file which
runs a small Docker container web server.

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo"
        args  = ["-text", "hello world"]
      }

      resources {
        memory = 32
      }
    }
  }
}
```

This job file exists on your local workstation in plain text. When you are
satisfied with this job file, you will plan and review the scheduler decision.
It is generally a best practice to commit job files to source control,
especially if you are working in a team.

## Planning the Job
Once the job file is authored, we need to plan out the changes. The `nomad plan`
command may be used to perform a dry-run of the scheduler and inform us of
which scheduling decisions would take place.

```shell
$ nomad plan docs.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Note that no action has been taken. This is a complete dry-run and no
allocations have taken place.

## Submitting the Job
Assuming the output of the plan looks acceptable, we can ask Nomad to execute
this job. This is done via the `nomad run` command. We can optionally supply
the modify index provided to us by the plan command to ensure no changes to this
job have taken place between our plan and now.

```shell
$ nomad run -check-index=123 docs.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Now that the job is scheduled, it may or may not be running. We need to inspect
the allocation status and logs to make sure the job started correctly. The next
section on [inspecting state](/docs/operating-a-job/inspecting-state.html) details ways to
examine this job.

@@ -1,103 +0,0 @@
---
layout: "docs"
page_title: "Operating a Job: Task Configuration"
sidebar_current: "docs-jobops-task-config"
description: |-
  Learn how to ship task configuration in a Nomad Job.
---

# Task Configurations

Most tasks need to be parameterized in some way. The simplest is via
command-line arguments, but oftentimes tasks consume complex configurations via
config files. Here we explore how to configure Nomad jobs to support many
common configuration use cases.

## Command-line Arguments

The simplest type of configuration to support is tasks which take their
configuration via command-line arguments that will not change.

Nomad has many [drivers](/docs/drivers/index.html) and most support passing
arguments to their tasks via the `args` parameter. To configure these, simply
provide the appropriate arguments. Below is an example using the [`docker`
driver](/docs/drivers/docker.html) to launch `memcached(8)` and set its thread count
to 4, increase log verbosity, as well as assign the correct port and address
bindings using interpolation:

```
task "memcached" {
  driver = "docker"

  config {
    image = "memcached:1.4.27"
    args = [
      # Set thread count
      "-t", "4",

      # Enable the highest verbosity logging mode
      "-vvv",

      # Use interpolations to limit memory usage and bind
      # to the proper address
      "-m", "${NOMAD_MEMORY_LIMIT}",
      "-p", "${NOMAD_PORT_db}",
      "-l", "${NOMAD_ADDR_db}"
    ]

    network_mode = "host"
  }

  resources {
    cpu    = 500 # 500 MHz
    memory = 256 # 256MB
    network {
      mbits = 10
      port "db" {
      }
    }
  }
}
```

In the above example, we see how easy it is to pass configuration options using
the `args` section, and even see how
[interpolation](/docs/jobspec/interpreted.html) allows us to pass arguments
based on the dynamic port and address Nomad chose for this task.

## Config Files

Oftentimes applications accept their configurations using configuration files,
or have so many arguments to be set that it would be unwieldy to pass them via
arguments. Nomad supports downloading
[`artifacts`](/docs/jobspec/index.html#artifact_doc) prior to launching tasks.
This allows shipping of configuration files and other assets that the task
needs to run properly.

An example can be seen below, where we download two artifacts, one being the
binary to run and the other being its configuration:

```
task "example" {
  driver = "exec"

  config {
    command = "my-app"
    args    = ["-config", "local/config.cfg"]
  }

  # Download the binary to run
  artifact {
    source = "http://example.com/example/my-app"
  }

  # Download the config file
  artifact {
    source = "http://example.com/example/config.cfg"
  }
}
```

Here we can see a basic example of downloading static configuration files. By
default, an `artifact` is downloaded to the task's `local/` directory but is
[configurable](/docs/jobspec/index.html#artifact_doc).
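For example, a sketch that places the configuration file under `local/etc/` rather than directly in `local/` might look like the following. It assumes `destination` names a directory relative to the task directory into which the file is downloaded under its original name; `local/etc` is an illustrative choice.

```hcl
task "example" {
  driver = "exec"

  config {
    command = "my-app"
    args    = ["-config", "local/etc/config.cfg"]
  }

  # Download the binary to run (placed in local/ by default)
  artifact {
    source = "http://example.com/example/my-app"
  }

  # Download the config file into local/etc/ instead of local/
  artifact {
    source      = "http://example.com/example/config.cfg"
    destination = "local/etc"
  }
}
```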

@@ -1,7 +1,7 @@
---
layout: "docs"
page_title: "Operating a Job: Updating Jobs"
sidebar_current: "docs-jobops-updating"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
  Learn how to safely update Nomad Jobs.
---

@@ -21,14 +21,16 @@ support for rolling updates. When a job specifies a rolling update, with the
below syntax, Nomad will only update `max_parallel` number of task groups at a
time and will wait `stagger` duration before updating the next set.

```
job "rolling" {
...
```hcl
job "example" {
  # ...

  update {
    stagger      = "30s"
    max_parallel = 1
  }
...

  # ...
}
```

@@ -37,7 +39,7 @@ jobs to ensure the scheduler will do as we expect. In this example, we have 3
web server instances whose version we want to update. After the job file
was modified we can run `plan`:

```
```text
$ nomad plan my-web.nomad
+/- Job: "my-web"
  +/- Task Group: "web" (3 create/destroy update)

@@ -83,9 +85,9 @@ traffic.
In this case we would consider version 1 to be the live set and we want to
transition to version 2. We can model this workflow with the below job:

```
```hcl
job "my-api" {
...
  # ...

  group "api-green" {
    count = 10

@@ -127,9 +129,9 @@ operators be confident that deployments will not cause down time. If the deploy
is successful and we fully transition from v1 to v2 the job file will look like
this:

```
```hcl
job "my-api" {
...
  # ...

  group "api-green" {
    count = 0

@@ -0,0 +1,105 @@
---
layout: "docs"
page_title: "Accessing Logs - Operating a Job"
sidebar_current: "docs-operating-a-job-accessing-logs"
description: |-
  Nomad provides a top-level mechanism for viewing application logs and data
  files via the command line interface. This section discusses the nomad logs
  command and API interface.
---

# Accessing Logs

Viewing application logs is critical for debugging issues, examining performance
problems, or even just verifying the application started correctly. To make this
as simple as possible, Nomad provides:

- Job specification for [log rotation](/docs/jobspec/index.html#log_rotation)
- CLI command for [log viewing](/docs/commands/logs.html)
- API for programmatic [log access](/docs/http/client-fs.html#logs)
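Log rotation is configured per task with a `logs` stanza. A minimal sketch with illustrative limits, for a task named `server`:

```hcl
task "server" {
  # ...

  # Retain at most 10 rotated log files, each capped at about 15 MB.
  # stdout and stderr are expected to be rotated separately.
  logs {
    max_files     = 10
    max_file_size = 15
  }
}
```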

This section will utilize the job named "docs" from the [previous
sections](/docs/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.

As a reminder, here is the output of the run command from the previous example:

```text
$ nomad run docs.nomad
==> Monitoring evaluation "42d788a3"
    Evaluation triggered by job "docs"
    Allocation "04d9627d" created: node "a1f934c9", group "example"
    Allocation "e7b8d4f5" created: node "012ea79b", group "example"
    Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "42d788a3" finished with status "complete"
```

The provided allocation ID (which is also available via the `nomad status`
command) is required to access the application's logs. To access the logs of our
application, we issue the following command:

```shell
$ nomad logs 04d9627d
```

The output will look something like this:

```text
<timestamp> 10.1.1.196:5678 10.1.1.196:33407 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 21.809µs
<timestamp> 10.1.1.196:5678 10.1.1.196:33408 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 20.241µs
<timestamp> 10.1.1.196:5678 10.1.1.196:33409 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 13.629µs
```

If the allocation runs a single task, this returns that task's logs. If more
than one task is defined in the task group, the name of the task is a required
argument:

```shell
$ nomad logs 04d9627d server
```

The logs command supports both displaying the logs and following them,
blocking for more output, similar to `tail -f`. To follow the logs, use the
`-f` flag:

```shell
$ nomad logs -f 04d9627d
```

This will stream logs to our console.

By default, only the logs on stdout are displayed. To show the log output from
stderr, use the `-stderr` flag:

```shell
$ nomad logs -stderr 04d9627d
```

## Log Shipper Pattern

While the logs command works well for quickly accessing application logs, it
generally does not scale to large systems or systems that produce a lot of log
output, especially for the long-term storage of logs. Nomad retains only a
bounded amount of log output per task (the number and size of rotated log files
are configurable), so chatty applications should use a better log retention
strategy.

Since applications log to the `alloc/` directory, all tasks within the same task
group have access to each other's logs. Thus it is possible to have a task group
as follows:

```hcl
group "my-group" {
  task "server" {
    # ...
  }

  task "log-shipper" {
    # ...
  }
}
```

In the above example, the `server` task is the application that should be run
and will be producing the logs. The `log-shipper` reads those logs from the
`alloc/logs/` directory and sends them to a longer-term storage solution such as
Amazon S3 or an internal log aggregation system.
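A slightly more concrete sketch of this pattern follows. The `example/log-shipper` image and its `-watch` flag are hypothetical placeholders for whatever shipper you run; the sketch also assumes the Docker driver mounts the shared allocation directory at `/alloc` inside each container.

```hcl
group "my-group" {
  task "server" {
    driver = "docker"

    config {
      image = "hashicorp/http-echo"
      args  = ["-text", "hello world"]
    }
  }

  task "log-shipper" {
    driver = "docker"

    config {
      # Hypothetical image and flag: substitute your own log shipper.
      image = "example/log-shipper:latest"

      # Rotated log files for the server task are expected in this
      # directory, named along the lines of server.stdout.0.
      args = ["-watch", "/alloc/logs"]
    }
  }
}
```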

@@ -0,0 +1,136 @@
---
layout: "docs"
page_title: "Configuring Tasks - Operating a Job"
sidebar_current: "docs-operating-a-job-configuring-tasks"
description: |-
  Most applications require some kind of configuration. Whether this
  configuration is provided via the command line or via a configuration file,
  Nomad has built-in functionality for configuration. This section details two
  common patterns for configuring tasks.
---

# Configuring Tasks

Most applications require some kind of configuration. The simplest way is via
command-line arguments, but oftentimes tasks consume complex configurations via
config files. This section explores how to configure Nomad jobs to support many
common configuration use cases.

## Command-line Arguments

Many tasks accept configuration via command-line arguments that do not change
over time.

For example, consider the [http-echo](https://github.com/hashicorp/http-echo)
server, which is a small Go binary that renders the provided text as a webpage.
The binary accepts two parameters:

* `-listen` - the address:port to listen on
* `-text` - the text to render as the HTML page

Outside of Nomad, the server is started like this:

```shell
$ http-echo -listen=":5678" -text="hello world"
```

The Nomad equivalent job file might look something like this:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "exec"

      config {
        command = "/bin/http-echo"
        args = [
          "-listen", ":5678",
          "-text", "hello world",
        ]
      }

      resources {
        network {
          mbits = 10
          port "http" {
            static = 5678
          }
        }
      }
    }
  }
}
```

~> **This assumes** the <tt>http-echo</tt> binary is already installed and available in the system path. Nomad can also optionally fetch the binary using the <tt>artifact</tt> resource.
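For example, a sketch of the same task fetching the binary with an `artifact` stanza instead of relying on a pre-installed copy. The download URL is a placeholder, and the sketch assumes the file lands in the task's `local/` directory (the default) and is executable once downloaded.

```hcl
task "server" {
  driver = "exec"

  # Placeholder URL: point this at wherever the http-echo binary is hosted.
  artifact {
    source = "https://example.com/http-echo"
  }

  config {
    # Artifacts are downloaded into local/ by default.
    command = "local/http-echo"
    args = [
      "-listen", ":5678",
      "-text", "hello world",
    ]
  }

  resources {
    network {
      mbits = 10
      port "http" {
        static = 5678
      }
    }
  }
}
```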

Nomad has many [drivers](/docs/drivers/index.html), and most support passing
arguments to their tasks via the `args` parameter. This parameter also
supports [Nomad interpolation](/docs/jobspec/interpreted.html). For example, if
you wanted Nomad to dynamically allocate a high port to bind the service on
instead of relying on a static port for the previous job:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "exec"

      config {
        command = "/bin/http-echo"
        args = [
          "-listen", ":${NOMAD_PORT_http}",
          "-text", "hello world",
        ]
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }
      }
    }
  }
}
```

## Configuration Files

Not all applications accept their configuration via command-line flags.
Sometimes applications accept their configurations using files instead. Nomad
supports downloading [artifacts](/docs/jobspec/index.html#artifact_doc) prior to
launching tasks. This allows shipping of configuration files and other assets
that the task needs to run properly.

Here is an example job which pulls down a configuration file as an artifact:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "exec"

      artifact {
        source      = "http://example.com/config.hcl"
        destination = "local/config.hcl"
      }

      config {
        command = "my-app"
        args = [
          "-config", "local/config.hcl",
        ]
      }
    }
  }
}
```

For more information on the artifact resource, please see the [artifact documentation](/docs/jobspec/index.html#artifact_doc).
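As one example of the artifact stanza's options, a downloaded file can be verified against a checksum before the task starts. In this sketch the digest is a placeholder to be replaced with the real SHA-256 of the file; the rest of the task is assumed to match the example above.

```hcl
artifact {
  source = "http://example.com/config.hcl"

  # Placeholder digest: the download is rejected if the file does not match.
  options {
    checksum = "sha256:<hex-digest>"
  }
}
```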
|
|
@ -0,0 +1,32 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Operating a Job"
|
||||
sidebar_current: "docs-operating-a-job"
|
||||
description: |-
|
||||
Learn how to operate a Nomad Job.
|
||||
---
|
||||
|
||||
# Operating a Job
|
||||
|
||||
The general flow for operating a job in Nomad is:
|
||||
|
||||
1. Author the job file according to the [job specification](/docs/jobspec/index.html)
|
||||
1. Plan and review the changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
When updating a job, there are a number of built-in update strategies which may
|
||||
be defined in the job file. The general flow for updating an existing job in
|
||||
Nomad is:
|
||||
|
||||
1. Modify the existing job file with the desired changes
|
||||
1. Plan and review the changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
Because the job file defines the update strategy (blue-green, rolling updates,
etc.), the workflow remains the same regardless of whether this is an initial
deployment or an update to a long-running job.
|
||||
|
||||
This section provides some best practices and guidance for operating jobs under
Nomad. Please navigate to the appropriate sub-sections for more information.
|
|
@ -0,0 +1,215 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Inspecting State - Operating a Job"
|
||||
sidebar_current: "docs-operating-a-job-inspecting-state"
|
||||
description: |-
|
||||
Nomad exposes a number of tools and techniques for inspecting a running job.
|
||||
This is helpful in ensuring the job started successfully. Additionally, it
|
||||
can inform us of any errors that occurred while starting the job.
|
||||
---
|
||||
|
||||
# Inspecting State
|
||||
|
||||
A successful job submission is not an indication of a successfully-running job.
|
||||
This is the nature of a highly-optimistic scheduler. A successful job submission
|
||||
means the server was able to issue the proper scheduling commands. It does not
|
||||
indicate the job is actually running. To verify the job is running, we need to
|
||||
inspect its state.
|
||||
|
||||
This section will utilize the job named "docs" from the [previous
sections](/docs/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.
|
||||
|
||||
## Job Status
|
||||
|
||||
After a job is submitted, you can query the status of that job using the status
|
||||
command:
|
||||
|
||||
```shell
|
||||
$ nomad status
|
||||
```
|
||||
|
||||
Here is some sample output:
|
||||
|
||||
```text
|
||||
ID Type Priority Status
|
||||
docs service 50 running
|
||||
```
|
||||
|
||||
At a high level, we can see that our job is currently running, but what does
"running" actually mean? By supplying the name of a job to the status command,
we can ask Nomad for more detailed job information:
|
||||
|
||||
```shell
|
||||
$ nomad status docs
|
||||
```
|
||||
|
||||
Here is some sample output:
|
||||
|
||||
```text
|
||||
ID = docs
|
||||
Name = docs
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
example 0 0 3 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Eval ID Node ID Task Group Desired Status Created At
|
||||
04d9627d 42d788a3 a1f934c9 example run running <timestamp>
|
||||
e7b8d4f5 42d788a3 012ea79b example run running <timestamp>
|
||||
5cbf23a1 42d788a3 1e1aa1e0 example run running <timestamp>
|
||||
```
|
||||
|
||||
Here we can see that there are three instances of this task running, each with
|
||||
its own allocation. For more information on the `status` command, please see the
|
||||
[CLI documentation for <tt>status</tt>](/docs/commands/status.html).
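If you want to check this from a script, the allocation table can be tallied with a little `awk`. The sketch below assumes the column layout shown in the sample output above (the sixth column is the allocation's client status); `alloc_status_counts` is a hypothetical helper name, and the output format may differ between Nomad versions.

```shell
# Tally allocations by client status from `nomad status <job>` output.
# Assumes the Allocations table layout shown above; column 6 is the status.
alloc_status_counts() {
  awk '
    /^Allocations/          { in_allocs = 1; next }  # start of the table
    in_allocs && $1 == "ID" { next }                 # skip the header row
    in_allocs && NF == 0    { in_allocs = 0; next }  # blank line ends the table
    in_allocs               { counts[$6]++ }         # count each status
    END { for (s in counts) print s, counts[s] }
  ' | sort
}

# Usage: nomad status docs | alloc_status_counts
```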
|
||||
|
||||
## Evaluation Status
|
||||
|
||||
You can think of an evaluation as a submission to the scheduler. The example
below shows status output for a job where some allocations were placed
successfully, but there were not enough resources to place all of the desired
allocations.

If we issue the status command with the `-evals` flag, we can see there is an
outstanding evaluation for this hypothetical job:
|
||||
|
||||
```text
|
||||
$ nomad status -evals docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
|
||||
Evaluations
|
||||
ID Priority Triggered By Status Placement Failures
|
||||
5744eb15 50 job-register blocked N/A - In Progress
|
||||
8e38e6cf 50 job-register complete true
|
||||
|
||||
Placement Failure
|
||||
Task Group "example":
|
||||
* Resources exhausted on 1 nodes
|
||||
* Dimension "cpu exhausted" exhausted on 1 nodes
|
||||
|
||||
Allocations
|
||||
ID Eval ID Node ID Task Group Desired Status Created At
|
||||
12681940 8e38e6cf 4beef22f example run running <timestamp>
|
||||
395c5882 8e38e6cf 4beef22f example run running <timestamp>
|
||||
4d7c6f84 8e38e6cf 4beef22f example run running <timestamp>
|
||||
843b07b8 8e38e6cf 4beef22f example run running <timestamp>
|
||||
a8bc6d3e 8e38e6cf 4beef22f example run running <timestamp>
|
||||
b0beb907 8e38e6cf 4beef22f example run running <timestamp>
|
||||
da21c1fd 8e38e6cf 4beef22f example run running <timestamp>
|
||||
```
|
||||
|
||||
In the above example we see that the job has a "blocked" evaluation that is in
progress. When Nomad cannot place all the desired allocations, it creates a
blocked evaluation that waits for more resources to become available.
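A similar sketch can pull the IDs of blocked evaluations out of the `-evals` output, for example to pass them to `eval-status`. It assumes the Evaluations table layout shown above, with the status in the fourth column; `blocked_evals` is a hypothetical helper name.

```shell
# Print the IDs of blocked evaluations from `nomad status -evals <job>` output.
# Assumes the Evaluations table layout shown above; column 4 is the status.
blocked_evals() {
  awk '
    /^Evaluations/              { in_evals = 1; next }  # start of the table
    in_evals && $1 == "ID"      { next }                # skip the header row
    in_evals && NF == 0         { in_evals = 0; next }  # blank line ends the table
    in_evals && $4 == "blocked" { print $1 }            # print blocked eval IDs
  '
}

# Usage: nomad status -evals docs | blocked_evals
```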
|
||||
|
||||
The `eval-status` command enables us to examine any evaluation in more detail.
This is rarely necessary, but it can be useful for finding out why not all of a
job's allocations were placed. For example, if we run it on evaluation
"8e38e6cf" of the docs job, which reported a placement failure in the above
output, we might see:
|
||||
|
||||
```text
|
||||
$ nomad eval-status 8e38e6cf
|
||||
ID = 8e38e6cf
|
||||
Status = complete
|
||||
Status Description = complete
|
||||
Type = service
|
||||
TriggeredBy = job-register
|
||||
Job ID = docs
|
||||
Priority = 50
|
||||
Placement Failures = true
|
||||
|
||||
Failed Placements
|
||||
Task Group "example" (failed to place 3 allocations):
|
||||
* Resources exhausted on 1 nodes
|
||||
* Dimension "cpu exhausted" exhausted on 1 nodes
|
||||
|
||||
Evaluation "5744eb15" waiting for additional capacity to place remainder
|
||||
```
|
||||
|
||||
For more information on the `eval-status` command, please see the [CLI documentation for <tt>eval-status</tt>](/docs/commands/eval-status.html).
|
||||
|
||||
## Allocation Status
|
||||
|
||||
You can think of an allocation as an instruction to schedule. Just like an
|
||||
application or service, an allocation has logs and state. The `alloc-status`
|
||||
command gives us the most recent events that occurred for a task, its resource
|
||||
usage, port allocations and more:
|
||||
|
||||
```text
|
||||
$ nomad alloc-status 04d9627d
|
||||
ID = 04d9627d
|
||||
Eval ID = 42d788a3
|
||||
Name = docs.example[2]
|
||||
Node ID = a1f934c9
|
||||
Job ID = docs
|
||||
Client Status = running
|
||||
|
||||
Task "server" is "running"
|
||||
Task Resources
|
||||
CPU Memory Disk IOPS Addresses
|
||||
0/100 MHz 728 KiB/10 MiB 300 MiB 0 http: 10.1.1.196:5678
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
10/09/16 00:36:06 UTC Started Task started by client
|
||||
10/09/16 00:36:05 UTC Received Task received by client
|
||||
```
|
||||
|
||||
The `alloc-status` command is a good starting point for debugging an
application that did not start. Hypothetically assume a user meant to start a
Docker container from the image "redis:2.8", but accidentally put a comma
instead of a period, typing "redis:2,8".
|
||||
|
||||
When the job is executed, it produces a failed allocation. The `alloc-status`
|
||||
command will give us the reason why:
|
||||
|
||||
```text
|
||||
$ nomad alloc-status 04d9627d
|
||||
# ...
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
06/28/16 15:50:22 UTC Not Restarting Error was unrecoverable
|
||||
06/28/16 15:50:22 UTC Driver Failure failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
|
||||
06/28/16 15:50:22 UTC Received Task received by client
|
||||
```
|
||||
|
||||
Unfortunately, not all failures are this easy to debug. If the `alloc-status`
command shows many restarts, there is likely an application-level issue during
startup. For example:
|
||||
|
||||
```text
|
||||
$ nomad alloc-status 04d9627d
|
||||
# ...
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
06/28/16 15:56:16 UTC Restarting Task restarting in 5.178426031s
|
||||
06/28/16 15:56:16 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
|
||||
06/28/16 15:56:16 UTC Started Task started by client
|
||||
06/28/16 15:56:00 UTC Restarting Task restarting in 5.00123931s
|
||||
06/28/16 15:56:00 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
|
||||
06/28/16 15:55:59 UTC Started Task started by client
|
||||
06/28/16 15:55:48 UTC Received Task received by client
|
||||
```
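Crash loops like this can also be detected from a script by counting the "Restarting" events. This minimal sketch assumes the Recent Events layout shown above (date, time and time zone, then the event type), so "Not Restarting" rows are not counted; `restart_count` is a hypothetical helper, and it only sees the events that `alloc-status` still lists.

```shell
# Count "Restarting" events in the Recent Events table of `nomad alloc-status`.
# Assumes each event row starts with date, time and time zone, so the event
# type begins in column 4. "Not Restarting" rows have "Not" there and are skipped.
restart_count() {
  awk '
    /^Recent Events:/               { in_events = 1; next }
    in_events && $4 == "Restarting" { n++ }
    END { print n + 0 }
  '
}

# Usage: nomad alloc-status 04d9627d | restart_count
```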
|
||||
|
||||
To debug these failures, we will need to use the `logs` command, which is
discussed in the [accessing logs](/docs/operating-a-job/accessing-logs.html)
section of this documentation.
|
||||
|
||||
For more information on the `alloc-status` command, please see the [CLI
|
||||
documentation for <tt>alloc-status</tt>](/docs/commands/alloc-status.html).
|
|
@ -0,0 +1,97 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Resource Utilization - Operating a Job"
|
||||
sidebar_current: "docs-operating-a-job-resource-utilization"
|
||||
description: |-
|
||||
Nomad supports reporting detailed job statistics and resource utilization
|
||||
metrics for most task drivers. This section describes the ways to inspect a
|
||||
job's resource consumption and utilization.
|
||||
---
|
||||
|
||||
# Resource Utilization
|
||||
|
||||
Understanding the resource utilization of an application is important, and Nomad
|
||||
supports reporting detailed statistics in many of its drivers. The main
|
||||
interface for seeing resource utilization is the `alloc-status` command with the
|
||||
`-stats` flag.
|
||||
|
||||
This section will utilize the job named "docs" from the [previous
sections](/docs/operating-a-job/submitting-jobs.html), but these operations
and commands largely apply to all jobs in Nomad.
|
||||
|
||||
As a reminder, here is the output of the run command from the previous example:
|
||||
|
||||
```text
|
||||
$ nomad run docs.nomad
|
||||
==> Monitoring evaluation "42d788a3"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "04d9627d" created: node "a1f934c9", group "example"
|
||||
Allocation "e7b8d4f5" created: node "012ea79b", group "example"
|
||||
Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "42d788a3" finished with status "complete"
|
||||
```
|
||||
|
||||
To see the detailed usage statistics, we can issue the command:
|
||||
|
||||
```shell
|
||||
$ nomad alloc-status -stats 04d9627d
|
||||
```
|
||||
|
||||
And here is some sample output:
|
||||
|
||||
```text
|
||||
$ nomad alloc-status -stats 04d9627d
|
||||
ID = 04d9627d
|
||||
Eval ID = 42d788a3
|
||||
Name = docs.example[2]
|
||||
Node ID = a1f934c9
|
||||
Job ID = docs
|
||||
Client Status = running
|
||||
|
||||
Task "server" is "running"
|
||||
Task Resources
|
||||
CPU Memory Disk IOPS Addresses
|
||||
75/100 MHz 784 KiB/10 MiB 300 MiB 0 http: 10.1.1.196:5678
|
||||
|
||||
Memory Stats
|
||||
Cache Max Usage RSS Swap
|
||||
56 KiB 1.3 MiB 784 KiB 0 B
|
||||
|
||||
CPU Stats
|
||||
Percent Throttled Periods Throttled Time
|
||||
0.00% 0 0
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
<timestamp> Started Task started by client
|
||||
<timestamp> Received Task received by client
|
||||
```
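To put a number on how close the task is to its CPU limit, the `used/limit MHz` value from the Task Resources table can be converted to a percentage. This is a minimal sketch using POSIX parameter expansion; `cpu_percent` is a hypothetical helper, and it assumes the value is passed exactly as printed (for example `75/100 MHz`).

```shell
# Convert a "used/limit MHz" value from the Task Resources table to a
# whole-number percentage (integer division, so the result rounds down).
cpu_percent() {
  used=${1%%/*}      # text before the slash, e.g. "75"
  rest=${1#*/}       # text after the slash, e.g. "100 MHz"
  limit=${rest%% *}  # drop the unit, e.g. "100"
  echo $(( used * 100 / limit ))
}

# Usage: cpu_percent "75/100 MHz"   # prints 75
```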
|
||||
|
||||
Here we can see that we are near the limit of our configured CPU, but we have
plenty of memory headroom. We can use this information to alter our job's
resources to better reflect what it actually needs:
|
||||
|
||||
```hcl
resources {
  cpu    = 200 # MHz
  memory = 10  # MB
}
```
|
||||
|
||||
Adjusting resources is very important for a variety of reasons:
|
||||
|
||||
* Ensuring your application does not get OOM killed if it hits its memory limit.
* Ensuring the application performs well by giving it enough CPU allowance.
* Optimizing cluster density by reserving what you need and not over-allocating.
|
||||
|
||||
While single point-in-time resource usage measurements are useful, it is often
more useful to graph resource usage over time to better understand and estimate
resource usage. Nomad supports outputting resource data to statsite and statsd,
and this is the recommended way of monitoring resources. For more information
about outputting telemetry, see the [telemetry
documentation](/docs/agent/telemetry.html).
|
||||
|
||||
For more advanced use cases, the resource usage data is also accessible via the
|
||||
client's HTTP API. See the documentation of the Client's [allocation HTTP
|
||||
API](/docs/http/client-allocation-stats.html).
|
|
@ -0,0 +1,181 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Submitting Jobs - Operating a Job"
|
||||
sidebar_current: "docs-operating-a-job-submitting-jobs"
|
||||
description: |-
|
||||
The job file is the unit of work in Nomad. Upon authoring, the job file is
|
||||
submitted to the server for evaluation and scheduling. This section discusses
|
||||
some techniques for submitting jobs.
|
||||
---
|
||||
|
||||
# Submitting Jobs
|
||||
|
||||
In Nomad, the description of the job and all its requirements are maintained in
|
||||
a single file called the "job file". This job file resides locally on disk and
|
||||
it is highly recommended that you check job files into source control.
|
||||
|
||||
The general flow for submitting a job in Nomad is:
|
||||
|
||||
1. Author a job file according to the job specification
|
||||
1. Plan and review changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
Here is a very basic example to get you started.
|
||||
|
||||
## Author a Job File
|
||||
Authoring a job file is very easy. For more detailed information, please see the
[job specification](/docs/jobspec/index.html). Here is a sample job file which
runs a small Docker container web server to get us started.
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "hashicorp/http-echo"
|
||||
args = [
|
||||
"-listen", ":5678",
|
||||
"-text", "hello world",
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http" {
|
||||
static = 5678
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This job file exists on your local workstation in plain text. When you are
|
||||
satisfied with this job file, you will plan and review the scheduler decision.
|
||||
It is generally a best practice to commit job files to source control,
|
||||
especially if you are working in a team.
|
||||
|
||||
## Planning the Job
|
||||
|
||||
Once the job file is authored, we need to plan out the changes. The `nomad plan`
command invokes a dry-run of the scheduler and informs us of which scheduling
decisions would take place.
|
||||
|
||||
```shell
|
||||
$ nomad plan docs.nomad
|
||||
```
|
||||
|
||||
The resulting output will look like:
|
||||
|
||||
```text
|
||||
+ Job: "docs"
|
||||
+ Task Group: "example" (1 create)
|
||||
+ Task: "server" (forces create)
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 0
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad run -check-index 0 docs.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Note that no action was taken. This job is not running. This is a complete
|
||||
dry-run and no allocations have taken place.
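When scripting around `plan`, its exit code is also informative. The mapping below reflects the exit codes described in the CLI documentation for `plan` (0: no allocations created or destroyed, 1: allocations created or destroyed, 255: error); double-check them against the documentation for your Nomad version. `describe_plan_exit` is a hypothetical helper.

```shell
# Map a `nomad plan` exit code to a short description.
describe_plan_exit() {
  case "$1" in
    0)   echo "no allocations created or destroyed" ;;
    1)   echo "allocations created or destroyed" ;;
    255) echo "error determining plan results" ;;
    *)   echo "unexpected exit code: $1" ;;
  esac
}

# Usage (needs a reachable Nomad server):
#   nomad plan docs.nomad; describe_plan_exit $?
```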
|
||||
|
||||
## Submitting the Job
|
||||
|
||||
Assuming the output of the plan looks acceptable, we can ask Nomad to execute
|
||||
this job. This is done via the `nomad run` command. We can optionally supply
|
||||
the modify index provided to us by the plan command to ensure no changes to this
|
||||
job have taken place between our plan and now.
|
||||
|
||||
```shell
|
||||
$ nomad run docs.nomad
|
||||
```
|
||||
|
||||
The resulting output will look like:
|
||||
|
||||
```text
|
||||
==> Monitoring evaluation "0d159869"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "5cbf23a1" created: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "0d159869" finished with status "complete"
|
||||
```
|
||||
|
||||
Now that the job is scheduled, it may or may not be running. We need to inspect
|
||||
the allocation status and logs to make sure the job started correctly. The next
|
||||
section on [inspecting state](/docs/operating-a-job/inspecting-state.html)
|
||||
details ways to examine this job's state.
|
||||
|
||||
## Updating the Job
|
||||
|
||||
When making updates to the job, it is best to always run the plan command and
|
||||
then the run command. For example:
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@ job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
+ count = 3
|
||||
+
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
```
|
||||
|
||||
After we save these changes to disk, run the plan command:
|
||||
|
||||
```text
|
||||
$ nomad plan docs.nomad
|
||||
+/- Job: "docs"
|
||||
+/- Task Group: "example" (2 create, 1 in-place update)
|
||||
+/- Count: "1" => "3" (forces create)
|
||||
Task: "server"
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 131
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad run -check-index 131 docs.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Then issue the run command, assuming the plan output looks okay. Note that we
are including the `-check-index` flag. This ensures that no remote changes have
been made to the job between our plan and run phases.
|
||||
|
||||
```text
|
||||
$ nomad run -check-index 131 docs.nomad
|
||||
==> Monitoring evaluation "42d788a3"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "04d9627d" created: node "a1f934c9", group "example"
|
||||
Allocation "e7b8d4f5" created: node "012ea79b", group "example"
|
||||
Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "42d788a3" finished with status "complete"
|
||||
```
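The plan-then-run workflow with `-check-index` can be wrapped in a small script. This is a sketch, not an official tool: `plan_and_run` is a hypothetical function, it assumes `nomad plan` prints a `Job Modify Index:` line as shown above and exits with 0 or 1 on success, and it skips the manual review step, so it suits automation where the plan has already been reviewed. The `NOMAD` variable lets you substitute a different binary or a stub for testing.

```shell
# Sketch: plan a job, read "Job Modify Index" from the plan output, then run
# the job with -check-index so the run is rejected if the job changed meanwhile.
# NOMAD defaults to the nomad binary but can point at a wrapper or a test stub.
NOMAD=${NOMAD:-nomad}

plan_and_run() {
  jobfile=$1
  # nomad plan exits 1 when allocations would change, so capture the status
  # explicitly instead of treating every non-zero exit as a failure.
  plan_out=$($NOMAD plan "$jobfile") && plan_rc=0 || plan_rc=$?
  if [ "$plan_rc" -ne 0 ] && [ "$plan_rc" -ne 1 ]; then
    echo "nomad plan failed with exit code $plan_rc" >&2
    return "$plan_rc"
  fi
  printf '%s\n' "$plan_out"   # keep the diff visible in logs

  index=$(printf '%s\n' "$plan_out" | awk -F': *' '/^Job Modify Index:/ { print $2 }')
  if [ -z "$index" ]; then
    echo "no 'Job Modify Index' line found in the plan output" >&2
    return 1
  fi
  $NOMAD run -check-index "$index" "$jobfile"
}

# Usage: plan_and_run docs.nomad
```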
|
||||
|
||||
For more details on advanced job updating strategies such as canary builds and
blue-green deployments, please see the documentation on [job update
strategies](/docs/operating-a-job/update-strategies.html).
|
|
@ -0,0 +1,175 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Update Strategies - Operating a Job"
|
||||
sidebar_current: "docs-operating-a-job-updating"
|
||||
description: |-
|
||||
Learn how to safely update Nomad jobs.
|
||||
---
|
||||
|
||||
# Update Strategies
|
||||
|
||||
When operating a service, updating the version of the job will be a common task.
Under a cluster scheduler, the same best practices for reliably deploying new
versions apply: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.
|
||||
|
||||
## Rolling Updates
|
||||
|
||||
In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
syntax below, Nomad will only update `max_parallel` number of task groups at a
time and will wait `stagger` duration before updating the next set.
|
||||
|
||||
```hcl
|
||||
job "example" {
|
||||
# ...
|
||||
|
||||
update {
|
||||
stagger = "30s"
|
||||
max_parallel = 1
|
||||
}
|
||||
|
||||
# ...
|
||||
}
|
||||
```
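As a rough guide, the number of update batches is `count / max_parallel` rounded up, with a `stagger` wait between consecutive batches. The arithmetic sketch below assumes exactly that and ignores the time tasks take to start; `rolling_update_estimate` is a hypothetical helper.

```shell
# Rough rolling-update estimate: number of batches and the minimum total time
# spent waiting on stagger, assuming max_parallel task groups are updated per
# batch and stagger seconds pass between consecutive batches.
# Task startup time is ignored.
rolling_update_estimate() {
  count=$1 max_parallel=$2 stagger=$3
  batches=$(( (count + max_parallel - 1) / max_parallel ))  # ceil(count / max_parallel)
  min_wait=$(( (batches - 1) * stagger ))                   # waits only between batches
  echo "batches=$batches min_wait=${min_wait}s"
}

# Usage: rolling_update_estimate 3 1 30   # prints: batches=3 min_wait=60s
```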
|
||||
|
||||
We can use the `nomad plan` command while updating jobs to ensure the scheduler
will do what we expect. In this example, we have 3 web server instances whose
version we want to update. After modifying the job file, we can run `plan`:
|
||||
|
||||
```text
|
||||
$ nomad plan my-web.nomad
|
||||
+/- Job: "my-web"
|
||||
+/- Task Group: "web" (3 create/destroy update)
|
||||
+/- Task: "web" (forces create/destroy update)
|
||||
+/- Config {
|
||||
+/- image: "nginx:1.10" => "nginx:1.11"
|
||||
port_map[0][http]: "80"
|
||||
}
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
- Rolling update, next evaluation will be in 10s.
|
||||
|
||||
Job Modify Index: 7
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad run -check-index 7 my-web.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, and that it will do so as a rolling update with a stagger of
`10s`. For more details on the update block, see the [Jobspec
documentation](/docs/jobspec/index.html#update).
|
||||
|
||||
## Blue-green and Canaries
|
||||
|
||||
Blue-green deploys go by several names (Red/Black, A/B, Blue/Green), but the
concept is the same. The idea is to have two sets of applications with only one
of them being live at a given time, except while transitioning from one set to
another. Here, "live" means the set of applications receiving traffic.
|
||||
|
||||
So imagine we have an API server that has 10 instances deployed to production
|
||||
at version 1 and we want to upgrade to version 2. Hopefully the new version has
|
||||
been tested in a QA environment and is now ready to start accepting production
|
||||
traffic.
|
||||
|
||||
In this case we would consider version 1 to be the live set and we want to
|
||||
transition to version 2. We can model this workflow with the below job:
|
||||
|
||||
```hcl
|
||||
job "my-api" {
|
||||
# ...
|
||||
|
||||
group "api-green" {
|
||||
count = 10
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:v1"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "api-blue" {
|
||||
count = 0
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:v2"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Here we can see the live group is "api-green" since it has a non-zero count. To
transition to v2, we increase the count of "api-blue" and decrease the count of
"api-green". We can now see how the canary process is a special case of
blue-green. If we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there will still be the original 10 instances, but we will be testing only one
instance of the new version, essentially canarying it.
|
||||
|
||||
If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0 and
restore the count of the original group to 10. This fine-grained control lets
job operators be confident that deployments will not cause downtime. If the
deploy is successful and we fully transition from v1 to v2, the job file will
look like this:
|
||||
|
||||
```hcl
|
||||
job "my-api" {
|
||||
# ...
|
||||
|
||||
group "api-green" {
|
||||
count = 0
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:v1"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "api-blue" {
|
||||
count = 10
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:v2"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Now "api-blue" is the live group, and when we are ready to update the API to v3,
we would modify "api-green" and repeat this process. The rate at which the group
counts are incremented and decremented is entirely up to the user. It is usually
good practice to start by transitioning one instance at a time until a certain
confidence threshold is met, based on application-specific logs and metrics.
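A one-instance-at-a-time transition can be planned ahead as a list of count pairs. The sketch below only prints the schedule; applying each step still means editing the two `count` values in the job file and running `plan` and `run`. `blue_green_schedule` is a hypothetical helper, and the group names are the ones from the example above.

```shell
# Print a one-instance-at-a-time blue-green schedule for moving `total`
# instances from "api-green" (live) to "api-blue" (new). Step 1 is a canary.
blue_green_schedule() {
  total=$1
  step=0
  while [ "$step" -le "$total" ]; do
    echo "step $step: api-green=$(( total - step )) api-blue=$step"
    step=$(( step + 1 ))
  done
}

# Usage: blue_green_schedule 10
```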
|
||||
|
||||
## Handling Drain Signals
|
||||
|
||||
On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and conduct any other necessary cleanup. Certain applications take
longer to drain than others, so Nomad lets the job file specify how long to wait
between signaling the application to exit and forcefully killing it. This is
configurable via the `kill_timeout` parameter. More details can be seen in the
[Jobspec documentation](/docs/jobspec/index.html#kill_timeout).
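The application side of this can be sketched as a POSIX shell entrypoint that records the shutdown signal, stops its serving loop, and drains before exiting. This is an illustration under assumptions: which signal arrives first can depend on the driver (for example, Docker containers are typically stopped with SIGTERM), so the sketch traps both SIGINT and SIGTERM, and `run_server` is a hypothetical name standing in for your real serving loop.

```shell
# Sketch of a task entrypoint that drains on a shutdown signal.
# Assumption: the first signal is SIGINT or SIGTERM, so both are trapped.
run_server() {
  draining=0
  # Record the signal instead of dying immediately, so we can drain first.
  trap 'draining=1' INT TERM
  echo "serving"
  while [ "$draining" -eq 0 ]; do
    sleep 1 &
    # wait returns early (status > 128) when a trapped signal arrives.
    wait $! || :
  done
  kill $! 2>/dev/null || :   # stop the leftover sleep, if it is still running
  echo "draining connections"
  # Finish in-flight work here; it must complete within kill_timeout.
  echo "drained"
}

# Usage (as the task's entrypoint): run_server
```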
|
|
@ -21,6 +21,6 @@ We recommend reading the following as next steps.
|
|||
* [Creating a Cluster](/docs/cluster/bootstrapping.html) - Additional details on
|
||||
creating a production worthy Nomad Cluster.
|
||||
|
||||
* [Operating a Job](/docs/jobops/index.html) - Additional details on how to
|
||||
* [Operating a Job](/docs/operating-a-job/index.html) - Additional details on how to
|
||||
run a job in production.
|
||||
|
||||
|
|
|
@ -24,43 +24,6 @@
|
|||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-jobops") %>>
|
||||
<a href="/docs/jobops/index.html">Operating a Job</a>
|
||||
<ul class="nav">
|
||||
<li<%= sidebar_current("docs-jobops-task-config") %>>
|
||||
<a href="/docs/jobops/taskconfig.html">Task Configuration</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-jobops-inspection") %>>
|
||||
<a href="/docs/jobops/inspecting.html">Inspecting State</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-jobops-resource-utilization") %>>
|
||||
<a href="/docs/jobops/resources.html">Resource Utilization</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-jobops-service-discovery") %>>
|
||||
<a href="/docs/jobops/servicediscovery.html">Service Discovery</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-jobops-logs") %>>
|
||||
<a href="/docs/jobops/logs.html">Accessing Logs</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-jobops-updating") %>>
|
||||
<a href="/docs/jobops/updating.html">Updating Jobs</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-upgrade") %>>
|
||||
<a href="/docs/upgrade/index.html">Upgrading</a>
|
||||
<ul class="nav">
|
||||
<li<%= sidebar_current("docs-upgrade-upgrading") %>>
|
||||
<a href="/docs/upgrade/index.html">Upgrading Nomad</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-upgrade-specific") %>>
|
||||
<a href="/docs/upgrade/upgrade-specific.html">Specific Version Details</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
|
||||
|
||||
<li<%= sidebar_current("docs-jobspec") %>>
|
||||
<a href="/docs/jobspec/index.html">Job Specification</a>
|
||||
<ul class="nav">
|
||||
|
@ -88,6 +51,42 @@
|
|||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-operating-a-job") %>>
|
||||
<a href="/docs/operating-a-job/index.html">Operating a Job</a>
|
||||
<ul class="nav">
|
||||
<li<%= sidebar_current("docs-operating-a-job-configuring-tasks") %>>
|
||||
<a href="/docs/operating-a-job/configuring-tasks.html">Configuring Tasks</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-operating-a-job-submitting-jobs") %>>
|
||||
<a href="/docs/operating-a-job/submitting-jobs.html">Submitting Jobs</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-operating-a-job-inspecting-state") %>>
|
||||
<a href="/docs/operating-a-job/inspecting-state.html">Inspecting State</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-operating-a-job-accessing-logs") %>>
|
||||
<a href="/docs/operating-a-job/accessing-logs.html">Accessing Logs</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-operating-a-job-resource-utilization") %>>
|
||||
<a href="/docs/operating-a-job/resource-utilization.html">Resource Utilization</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-operating-a-job-updating") %>>
|
||||
<a href="/docs/operating-a-job/update-strategies.html">Update Strategies</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-upgrade") %>>
|
||||
<a href="/docs/upgrade/index.html">Upgrading</a>
|
||||
<ul class="nav">
|
||||
<li<%= sidebar_current("docs-upgrade-upgrading") %>>
|
||||
<a href="/docs/upgrade/index.html">Upgrading Nomad</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-upgrade-specific") %>>
|
||||
<a href="/docs/upgrade/upgrade-specific.html">Specific Version Details</a>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-drivers") %>>
|
||||
<a href="/docs/drivers/index.html">Task Drivers</a>
|
||||
<ul class="nav">
|
||||
|
|