Remove old refs
parent 5dc729a6fc
commit 1d783513bd

@@ -1,75 +0,0 @@
---
layout: "docs"
page_title: "Resource Utilization - Operating a Job"
sidebar_current: "docs-operating-a-job-resource-utilization"
description: |-
  Nomad supports reporting detailed job statistics and resource utilization
  metrics for most task drivers. This section describes the ways to inspect a
  job's resource consumption and utilization.
---

# Determining Resource Utilization

Understanding the resource utilization of your application is important for many
reasons, and Nomad supports reporting detailed statistics in many of its drivers.
The main interface for inspecting resource utilization is the [`alloc-status`
command](/docs/commands/alloc-status.html) with the `-stats` flag.

In the example below we are running `redis` and can see its resource
utilization:

```text
$ nomad alloc-status -stats c3e0
ID            = c3e0e3e0
Eval ID       = 617e5e39
Name          = example.cache[0]
Node ID       = 39acd6e0
Job ID        = example
Client Status = running
Created At    = 06/28/16 16:42:42 UTC

Task "redis" is "running"
Task Resources
CPU       Memory          Disk     IOPS  Addresses
957/1000  30 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:34907

Memory Stats
Cache   Max Usage  RSS     Swap
32 KiB  79 MiB     30 MiB  0 B

CPU Stats
Percent  Throttled Periods  Throttled Time
73.66%   0                  0

Recent Events:
Time                   Type      Description
06/28/16 16:43:50 UTC  Started   Task started by client
06/28/16 16:42:42 UTC  Received  Task received by client
```

Here we can see that we are near the limit of our configured CPU but we have
plenty of memory headroom. We can use this information to alter our job's
resources to better reflect its actual needs:

```hcl
resources {
  cpu    = 2000
  memory = 100
}
```

Adjusting resources is very important for a variety of reasons:

* Ensuring your application does not get OOM killed if it hits its memory limit.
* Ensuring the application performs well by giving it a sufficient CPU allowance.
* Optimizing cluster density by reserving what you need and not over-allocating.

While single point-in-time resource usage measurements are useful, it is often
more useful to graph resource usage over time to better understand and estimate
resource usage. Nomad supports outputting resource data to statsite and statsd,
which is the recommended way of monitoring resources. For more information about
outputting telemetry, see the [Telemetry documentation](/docs/agent/telemetry.html).
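
As a minimal sketch, enabling this on the Nomad agent might look like the
following (the statsd address and the two publish flags are example values,
not requirements):

```hcl
# Agent configuration sketch: forward metrics to a local statsd daemon.
telemetry {
  statsd_address             = "127.0.0.1:8125" # example address
  publish_allocation_metrics = true
  publish_node_metrics       = true
}
```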

For more advanced use cases, the resource usage data may also be accessed via
the client's HTTP API. See the documentation of the client's
[Allocation HTTP API](/docs/http/client-allocation-stats.html).
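
For example, a quick sketch of fetching these statistics, assuming the client's
HTTP API is reachable on the default port and substituting a real allocation ID:

```shell
# Query per-allocation resource usage from a local Nomad client.
$ curl http://127.0.0.1:4646/v1/client/allocation/<alloc-id>/stats
```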

@@ -1,93 +0,0 @@
---
layout: "docs"
page_title: "Submitting Jobs - Operating a Job"
sidebar_current: "docs-operating-a-job-submitting"
description: |-
  The job file is the unit of work in Nomad. Upon authoring, the job file is
  submitted to the server for evaluation and scheduling. This section discusses
  some techniques for submitting jobs.
---

# Submitting Jobs

In Nomad, the description of the job and all its requirements are maintained in
a single file called the "job file". This job file resides locally on disk, and
it is highly recommended that you check job files into source control.

The general flow for submitting a job in Nomad is:

1. Author a job file according to the job specification
1. Plan and review changes with a Nomad server
1. Submit the job file to a Nomad server
1. (Optional) Review job status and logs (see the command sketch below)
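
In terms of commands, the flow looks roughly like this sketch (it assumes the
job file below is saved as `example.nomad`; the check-index value is
illustrative and comes from the plan output):

```shell
$ nomad plan example.nomad                   # dry-run the scheduler
$ nomad run -check-index=123 example.nomad   # submit with version verification
$ nomad status docs                          # review job status
```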

Here is a very basic example to get you started.

## Author a Job File

Authoring a job file is very easy. For more detailed information, please see the
[job specification](/docs/jobspec/index.html). Here is a sample job file which
runs a small Docker container web server.

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "example" {
    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo"
        args  = ["-text", "hello world"]
      }

      resources {
        memory = 32
      }
    }
  }
}
```

This job file exists on your local workstation in plain text. When you are
satisfied with this job file, you will plan and review the scheduler decision.
It is generally a best practice to commit job files to source control,
especially if you are working in a team.

## Planning the Job

Once the job file is authored, we need to plan out the changes. The `nomad plan`
command may be used to perform a dry-run of the scheduler and inform us of
which scheduling decisions would take place.

```shell
$ nomad plan example.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Note that no action has been taken. This is a complete dry-run and no
allocations have taken place.

## Submitting the Job

Assuming the output of the plan looks acceptable, we can ask Nomad to execute
this job. This is done via the `nomad run` command. We can optionally supply
the modify index provided to us by the plan command to ensure no changes to this
job have taken place between our plan and now.

```shell
$ nomad run -check-index=123 example.nomad
```

The resulting output will look like:

```text
TODO: Output
```

Now that the job is scheduled, it may or may not be running. We need to inspect
the allocation status and logs to make sure the job started correctly. The next
section on [inspecting state](/docs/operating-a-job/inspecting-state.html)
details ways to examine this job.
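
As a quick sketch, the inspection commands look like this (the allocation ID is
a placeholder; `nomad logs` assumes a Nomad version that ships the command):

```shell
$ nomad status docs               # job summary and allocation list
$ nomad alloc-status <alloc-id>   # detailed allocation state and events
$ nomad logs <alloc-id>           # task stdout logs
```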

@@ -1,176 +0,0 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
  Learn how to safely update Nomad jobs.
---

# Updating a Job

When operating a service, updating the version of the job will be a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section will explore how to do each of
these safely with Nomad.

## Rolling Updates

In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
below syntax, Nomad will only update `max_parallel` task groups at a
time and will wait `stagger` duration before updating the next set.

```hcl
job "example" {
  # ...

  update {
    stagger      = "30s"
    max_parallel = 1
  }

  # ...
}
```

We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will do as we expect. In this example, we have 3
web server instances whose version we want to update. After the job file is
modified, we can run `plan`:

```text
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
      +/- image:             "nginx:1.10" => "nginx:1.11"
          port_map[0][http]: "80"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 my-web.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so with a rolling update and a stagger of `10s`.
For more details on the update block, see
the [Jobspec documentation](/docs/jobspec/index.html#update).

## Blue-green and Canaries

Blue-green deploys go by several names (Red/Black, A/B, Blue/Green), but the
concept is the same. The idea is to have two sets of applications with only one
of them being live at a given time, except while transitioning from one set to
the other. The "live" set is the set of applications receiving traffic.

So imagine we have an API server that has 10 instances deployed to production
at version 1 and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.

In this case we would consider version 1 to be the live set and we want to
transition to version 2. We can model this workflow with the below job:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Here we can see the live group is "api-green" since it has a non-zero count. To
transition to v2, we raise the count of "api-blue" and lower the count of
"api-green". We can now see how the canary process is a special case of
blue-green. If we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there will still be 10 instances in total, but only one of them will be running
the new version, essentially canarying it.
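
As an illustrative sketch, the canary step changes only the two `count` values
while the tasks stay as defined above:

```hcl
# Canary: keep 10 instances total, with exactly one running v2.
group "api-green" {
  count = 9
  # ... task "api-server" unchanged (image = "api-server:v1")
}

group "api-blue" {
  count = 1
  # ... task "api-server" unchanged (image = "api-server:v2")
}
```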

If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all that we have to do is drop the count of the new group to
0 and restore the original group back to 10. This fine control lets job
operators be confident that deployments will not cause downtime. If the deploy
is successful and we fully transition from v1 to v2, the job file will look like
this:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```
Now "api-blue" is the live group and when we are ready to update the api to v3,
|
||||
we would modify "api-green" and repeat this process. The rate at which the count
|
||||
of groups are incremented and decremented is totally up to the user. It is
|
||||
usually good practice to start by transition one at a time until a certain
|
||||
confidence threshold is met based on application specific logs and metrics.
|
||||
|
||||

## Handling Drain Signals

On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and conduct any other cleanup that is necessary. Certain
applications take longer to drain than others, and as such Nomad lets the job
file specify how long to wait between signaling the application to exit and
forcefully killing it. This is configurable via `kill_timeout`. More details
can be seen in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).
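
As a sketch, a task that needs extra time to drain might set the timeout like
this (the `45s` value is only an example):

```hcl
task "api-server" {
  driver = "docker"

  # Wait up to 45 seconds between the exit signal and a forceful kill,
  # giving the server time to drain connections (example value).
  kill_timeout = "45s"

  # ...
}
```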