Remove old refs
This commit is contained in: parent 5dc729a6fc, commit 1d783513bd

@@ -1,75 +0,0 @@
---
layout: "docs"
page_title: "Resource Utilization - Operating a Job"
sidebar_current: "docs-operating-a-job-resource-utilization"
description: |-
  Nomad supports reporting detailed job statistics and resource utilization
  metrics for most task drivers. This section describes the ways to inspect a
  job's resource consumption and utilization.
---

# Determining Resource Utilization

Understanding the resource utilization of your application is important for many
reasons, and Nomad supports reporting detailed statistics in many of its drivers.
The main interface for seeing resource utilization is the [`alloc-status`
command](/docs/commands/alloc-status.html) with the `-stats` flag.

In the example below, we are running `redis` and can see its resource utilization:

```text
$ nomad alloc-status -stats c3e0
ID            = c3e0e3e0
Eval ID       = 617e5e39
Name          = example.cache[0]
Node ID       = 39acd6e0
Job ID        = example
Client Status = running
Created At    = 06/28/16 16:42:42 UTC

Task "redis" is "running"
Task Resources
CPU       Memory          Disk     IOPS  Addresses
957/1000  30 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:34907

Memory Stats
Cache   Max Usage  RSS     Swap
32 KiB  79 MiB     30 MiB  0 B

CPU Stats
Percent  Throttled Periods  Throttled Time
73.66%   0                  0

Recent Events:
Time                   Type      Description
06/28/16 16:43:50 UTC  Started   Task started by client
06/28/16 16:42:42 UTC  Received  Task received by client
```

Here we can see that we are near the limit of our configured CPU but we have
plenty of memory headroom. We can use this information to alter our job's
resources to better reflect its actual needs:

```hcl
resources {
  cpu    = 2000
  memory = 100
}
```

Adjusting resources is very important for a variety of reasons:

* Ensuring your application does not get OOM killed if it hits its memory limit.
* Ensuring the application performs well by ensuring it has some CPU allowance.
* Optimizing cluster density by reserving what you need and not over-allocating.

While single point-in-time resource usage measurements are useful, it is often
more useful to graph resource usage over time to better understand and estimate
resource usage. Nomad supports outputting resource data to statsite and statsd,
which is the recommended way of monitoring resources. For more information about
outputting telemetry, see the [Telemetry documentation](/docs/agent/telemetry.html).
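
As a rough illustration, the agent's `telemetry` block can be pointed at a local
statsd daemon. This is a minimal sketch, assuming a statsd daemon is listening on
127.0.0.1:8125; adjust the address and options for your environment.

```hcl
# Nomad agent configuration sketch -- assumes a statsd daemon on 127.0.0.1:8125.
telemetry {
  statsd_address             = "127.0.0.1:8125"
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```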

For more advanced use cases, the resource usage data may also be accessed via
the client's HTTP API. See the documentation of the client's
[Allocation HTTP API](/docs/http/client-allocation-stats.html).
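
For example, the stats for the allocation above could be fetched directly as JSON.
This is a sketch, assuming the Nomad client's HTTP API is reachable on 127.0.0.1:4646:

```shell
# Raw resource usage for allocation c3e0e3e0 (address is an assumption).
$ curl http://127.0.0.1:4646/v1/client/allocation/c3e0e3e0/stats
```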

@@ -1,93 +0,0 @@
---
layout: "docs"
page_title: "Submitting Jobs - Operating a Job"
sidebar_current: "docs-operating-a-job-submitting"
description: |-
  The job file is the unit of work in Nomad. Upon authoring, the job file is
  submitted to the server for evaluation and scheduling. This section discusses
  some techniques for submitting jobs.
---

# Submitting Jobs

In Nomad, the description of the job and all its requirements are maintained in
a single file called the "job file". This job file resides locally on disk and
it is highly recommended that you check job files into source control.

The general flow for submitting a job in Nomad is:

1. Author a job file according to the job specification
1. Plan and review changes with a Nomad server
1. Submit the job file to a Nomad server
1. (Optional) Review job status and logs

Here is a very basic example to get you started.

## Author a Job File

Authoring a job file is very easy. For more detailed information, please see the
[job specification](/docs/jobspec/index.html). Here is a sample job file which
runs a small Docker web server:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo"
        args  = ["-text", "hello world"]
      }

      resources {
        memory = 32
      }
    }
  }
}
```

This job file exists on your local workstation in plain text. When you are
satisfied with this job file, you will plan and review the scheduler decision.
It is generally a best practice to commit job files to source control,
especially if you are working in a team.

## Planning the Job

Once the job file is authored, we need to plan out the changes. The `nomad plan`
command may be used to perform a dry-run of the scheduler and inform us of
which scheduling decisions would take place.

```shell
$ nomad plan example.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Note that no action has been taken. This is a complete dry-run and no
allocations have taken place.

## Submitting the Job

Assuming the output of the plan looks acceptable, we can ask Nomad to execute
this job. This is done via the `nomad run` command. We can optionally supply
the modify index provided to us by the plan command to ensure no changes to this
job have taken place between our plan and now.

```shell
$ nomad run -check-index=123 example.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Now that the job is scheduled, it may or may not be running. We need to inspect
the allocation status and logs to make sure the job started correctly. The next
section on [inspecting state](/docs/operating-a-job/inspecting-state.html)
details ways to examine this job.
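
For a quick first check from the CLI, you can ask for the job's status; a minimal
sketch, using the "docs" job from the example above:

```shell
# Summarize the job's task groups, allocations, and their client status.
$ nomad status docs
```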

@@ -1,176 +0,0 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
  Learn how to safely update Nomad jobs.
---

# Updating a Job

When operating a service, updating the version of the job will be a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.

## Rolling Updates

In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
below syntax, Nomad will update only `max_parallel` task groups at a time and
will wait `stagger` duration before updating the next set.

```hcl
job "example" {
  # ...

  update {
    stagger      = "30s"
    max_parallel = 1
  }

  # ...
}
```

We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will do as we expect. In this example, we have 3
web server instances whose version we want to update. After modifying the job
file, we can run `plan`:

```text
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
+/- Task: "web" (forces create/destroy update)
+/- Config {
+/- image: "nginx:1.10" => "nginx:1.11"
    port_map[0][http]: "80"
    }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 my-web.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `10s`.
For more details on the update block, see
the [Jobspec documentation](/docs/jobspec/index.html#update).

## Blue-green and Canaries

Blue-green deploys go by several names (red/black, A/B, blue/green), but the
concept is the same: run two sets of the application with only one of them live
at a given time, except while transitioning from one set to the other. The
"live" set is the one receiving traffic.

So imagine we have an API server that has 10 instances deployed to production
at version 1 and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.

In this case we would consider version 1 to be the live set and we want to
transition to version 2. We can model this workflow with the below job:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Here we can see the live group is "api-green" since it has a non-zero count. To
transition to v2, we raise the count of "api-blue" and lower the count of
"api-green". We can now see how the canary process is a special case of
blue-green. If we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there will still be the original 10 instances, but we will be testing only one
instance of the new version, essentially canarying it.

If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group's count to 10. This fine-grained control lets job
operators be confident that deployments will not cause downtime. If the deploy
is successful and we fully transition from v1 to v2, the job file will look like
this:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Now "api-blue" is the live group, and when we are ready to update the API to v3,
we would modify "api-green" and repeat this process. The rate at which the group
counts are incremented and decremented is entirely up to the user. It is usually
good practice to start by transitioning one instance at a time until a certain
confidence threshold is met, based on application-specific logs and metrics.
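
Each step of the transition is just another plan-and-run cycle against the edited
job file. A sketch, assuming the job is saved as my-api.nomad and the counts have
been shifted to 9 and 1:

```shell
# Preview the change, then submit it guarded by the modify index from the plan.
$ nomad plan my-api.nomad
$ nomad run -check-index <index-from-plan> my-api.nomad
```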

## Handling Drain Signals

On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and conduct any other cleanup that is necessary. Certain
applications take longer to drain than others, and as such Nomad lets the job
file specify how long to wait between signaling the application to exit and
forcefully killing it. This is configurable via the `kill_timeout`. More details
can be seen in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).
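
For example, a task that needs extra time to drain might raise this value in its
task stanza. A minimal sketch; the 45 second value is purely illustrative:

```hcl
task "api-server" {
  driver = "docker"

  # Wait up to 45s between signaling the task and force-killing it
  # (illustrative value; size it to how long your app needs to drain).
  kill_timeout = "45s"

  config {
    image = "api-server:v2"
  }
}
```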