Separate job update strategies into different pages

Seth Vargo 2016-10-11 20:15:30 -04:00
parent db4e676d73
commit 693b072596
No known key found for this signature in database
GPG Key ID: 905A90C2949E8787
8 changed files with 323 additions and 184 deletions

View File

@@ -65,7 +65,7 @@ body.layout-intro{
$child-default-state: -4px;
//first teir li
> li {
li {
margin: 0 0 0 10px;
> a {
@@ -102,7 +102,7 @@ body.layout-intro{
}
}
.nav {
> .nav {
display: block;
}
}
@@ -113,7 +113,6 @@ body.layout-intro{
display: none;
padding-top: 10px;
padding-bottom: 10px;
margin-bottom: 15px;
> li{
margin-left: 10px;
@@ -207,6 +206,11 @@ body.layout-intro{
&:hover{
text-decoration: underline;
}
code {
background: inherit;
color: $green-dark;
}
}
img{
@@ -231,7 +235,6 @@ body.layout-intro{
}
}
@media (max-width: 992px) {
.bs-docs-section{
@@ -248,9 +251,6 @@ body.layout-intro{
}
}
@media (max-width: 480px) {
.bs-docs-section{
img{

View File

@@ -178,4 +178,4 @@ nomad run -check-index 131 docs.nomad
For more details on advanced job updating strategies such as canary builds and
blue-green deployments, please see the documentation on [job update
strategies](/docs/operating-a-job/update-strategies.html).
strategies](/docs/operating-a-job/update-strategies/index.html).

View File

@@ -1,175 +0,0 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
Learn how to safely update Nomad jobs.
---
# Update Strategies
When operating a service, updating the version of the job is a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions: rolling updates, blue-green deploys, and canaries, which are a
special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.
## Rolling Updates
In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
below syntax, Nomad will only update `max_parallel` task groups at a
time and will wait the `stagger` duration before updating the next set.
```hcl
job "example" {
# ...
update {
stagger = "30s"
max_parallel = 1
}
# ...
}
```
We can use the `nomad plan` command while updating jobs to ensure the scheduler
will do what we expect. In this example, we have 3 web server instances whose
version we want to update. After modifying the job file, we can run `plan`:
```text
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
+/- Task: "web" (forces create/destroy update)
+/- Config {
+/- image: "nginx:1.10" => "nginx:1.11"
port_map[0][http]: "80"
}
Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 30s.
Job Modify Index: 7
To submit the job with version verification run:
nomad run -check-index 7 my-web.nomad
When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```
Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `30s`.
For more details on the update block, see
the [Jobspec documentation](/docs/jobspec/index.html#update).
## Blue-green and Canaries
Blue-green deploys go by several names (Red/Black, A/B, Blue/Green), but the
concept is the same. The idea is to have two sets of applications with only one
of them live at a given time, except while transitioning from one set to
another. "Live" here means the set of applications that is receiving traffic.
So imagine we have an API server that has 10 instances deployed to production
at version 1 and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is now ready to start accepting production
traffic.
In this case we would consider version 1 to be the live set and we want to
transition to version 2. We can model this workflow with the below job:
```hcl
job "my-api" {
# ...
group "api-green" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Here we can see the live group is "api-green" since it has a non-zero count. To
transition to v2, we up the count of "api-blue" and down the count of
"api-green". We can now see how the canary process is a special case of
blue-green. If we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there will still be the original 10 instances but we will be testing only one
instance of the new version, essentially canarying it.
If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all that we have to do is drop the count of the new group to
0 and restore the original version back to 10. This fine control lets job
operators be confident that deployments will not cause down time. If the deploy
is successful and we fully transition from v1 to v2 the job file will look like
this:
```hcl
job "my-api" {
# ...
group "api-green" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:v1"
}
}
}
group "api-blue" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:v2"
}
}
}
}
```
Now "api-blue" is the live group and when we are ready to update the api to v3,
we would modify "api-green" and repeat this process. The rate at which the count
of groups are incremented and decremented is totally up to the user. It is
usually good practice to start by transition one at a time until a certain
confidence threshold is met based on application specific logs and metrics.
## Handling Drain Signals
On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and conduct any other cleanup that is necessary. Certain
applications take longer to drain than others and as such Nomad lets the job
file specify how long to wait in-between signaling the application to exit and
forcefully killing it. This is configurable via the `kill_timeout`. More details
can be seen in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).

View File

@@ -0,0 +1,152 @@
---
layout: "docs"
page_title: "Blue/Green & Canary Deployments - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-blue-green-deployments"
description: |-
Nomad supports blue/green and canary deployments through the declarative job
file syntax. By specifying multiple task groups, Nomad allows for easy
configuration and rollout of blue/green and canary deployments.
---
# Blue/Green & Canary Deployments
Sometimes [rolling
upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html) do not
offer the required flexibility for updating an application in production. Often
organizations prefer to put a "canary" build into production or utilize a
technique known as a "blue/green" deployment to ensure a safe application
rollout to production while minimizing downtime.
Blue/Green deployments have several other names including Red/Black or A/B, but
the concept is generally the same. In a blue/green deployment, there are two
application versions. Only one application version is active at a time, except
during the transition phase from one version to the next. The term "active"
tends to mean "receiving traffic" or "in service".
Imagine a hypothetical API server which has ten instances deployed to production
at version 1.3, and we want to safely upgrade to version 1.4. After the new
version has been approved for production, we may want to do a small rollout. In
the event of failure, we can quickly roll back to 1.3.
To start, version 1.3 is considered the active set and version 1.4 is the
desired set. Here is a sample job file which models the transition from version
1.3 to version 1.4 using a blue/green deployment.
```hcl
job "docs" {
datacenters = ["dc1"]
group "api-green" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:1.3"
}
}
}
group "api-blue" {
count = 0
task "api-server" {
driver = "docker"
config {
image = "api-server:1.4"
}
}
}
}
```
It is clear that the active group is "api-green" since it has a non-zero count.
To transition to v1.4 (api-blue), we increase the count of api-blue to match
that of api-green.
```diff
@@ -2,6 +2,8 @@ job "docs" {
group "api-blue" {
- count = 0
+ count = 10
task "api-server" {
driver = "docker"
```
Next we plan and run these changes:
```shell
$ nomad plan docs.nomad
```
Assuming the plan output looks okay, we are ready to run these changes.
```shell
$ nomad run docs.nomad
```
Our deployment is not yet finished. We are currently running at double capacity,
so approximately half of our traffic is going to the blue group and half to the
green group. At this point we would typically inspect our monitoring and
reporting systems. If we are experiencing errors, we reduce the count of
"api-blue" back to 0. If we are running successfully, we change the count of
"api-green" to 0.
```diff
@@ -2,6 +2,8 @@ job "docs" {
group "api-green" {
- count = 10
+ count = 0
task "api-server" {
driver = "docker"
```
The next time we want to do a deployment, the "green" group becomes our
transition group, since the "blue" group is currently active.
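As a sketch of that next cycle (the 1.5 version below is purely hypothetical), we would stage the new image in the idle "api-green" group while leaving its count at zero until we are ready to scale it up:
```hcl
job "docs" {
  datacenters = ["dc1"]

  # Idle set, staged with the (hypothetical) next version.
  group "api-green" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.5"
      }
    }
  }

  # Currently active set, still running 1.4.
  group "api-blue" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.4"
      }
    }
  }
}
```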
## Canary Deployments
A canary deployment is a special type of blue/green deployment in which a small
subset of instances running the new version continues to run in production for
an extended period of time. Sometimes this is done for logging/analytics or as
an extended blue/green deployment. Whatever the reason, Nomad supports canary
deployments. Using the same strategy as defined above, simply keep the count of
the group running the new version low, for example:
```hcl
job "docs" {
datacenters = ["dc1"]
group "api" {
count = 10
task "api-server" {
driver = "docker"
config {
image = "api-server:1.3"
}
}
}
group "api-canary" {
count = 1
task "api-server" {
driver = "docker"
config {
image = "api-server:1.4"
}
}
}
}
```
Here you can see there is exactly one canary instance of our application (v1.4)
alongside ten instances of the current version (v1.3). Typically canary
instances are also tagged appropriately in the
[service discovery](/docs/jobspec/servicediscovery.html) layer so that regular
traffic is not routed to them unintentionally.
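As a sketch of what that tagging could look like (the "api" service name and "canary" tag here are assumptions, and the routing layer must know to filter on the tag), the canary task might register itself like this:
```hcl
group "api-canary" {
  count = 1

  task "api-server" {
    driver = "docker"

    config {
      image = "api-server:1.4"
    }

    service {
      # Hypothetical service name; the extra tag lets the routing layer
      # include or exclude canary instances explicitly. Port and health
      # checks are omitted for brevity.
      name = "api"
      tags = ["canary"]
    }
  }
}
```
A router that filters on the tag can then keep regular traffic on the untagged instances while the canary is addressed directly for testing.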

View File

@@ -0,0 +1,37 @@
---
layout: "docs"
page_title: "Handling Signals - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-handling-signals"
description: |-
Well-behaved applications expose a way to perform cleanup prior to exiting.
Nomad can optionally send a configurable signal to applications before
killing them, allowing them to drain connections or gracefully terminate.
---
# Handling Signals
On operating systems that support signals, Nomad will send the application a
configurable signal before killing it. This gives the application time to
gracefully drain connections and conduct other cleanup before shutting down.
Certain applications take longer to drain than others, and thus Nomad allows
specifying the amount of time to wait for the application to exit before
force-killing it.
Before Nomad terminates an application, it will send the `SIGINT` signal to the
process. Processes running under Nomad should respond to this signal to
gracefully drain connections. After a configurable timeout, the application will
be force-terminated.
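For example, an HTTP service might stop accepting new connections when it
receives `SIGINT`, finish its in-flight requests, and only then exit. The
`kill_timeout` for such a task should be set slightly above its worst-case drain
time, as in the following job: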
```hcl
job "docs" {
group "example" {
task "server" {
# ...
kill_timeout = "45s"
}
}
}
```
For more detail on the `kill_timeout` option, please see the [job specification
documentation](/docs/jobspec/index.html#kill_timeout).

View File

@@ -0,0 +1,24 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
This section describes common patterns for updating already-running jobs
including rolling upgrades, blue/green deployments, and canary builds. Nomad
provides built-in support for this functionality.
---
# Update Strategies
Most applications are long-lived and require updates over time. Whether you are
deploying a new version of your web application or upgrading to a new version of
redis, Nomad has built-in support for rolling updates. When a job specifies a
rolling update, Nomad applies a configurable strategy to minimize or eliminate
downtime, stagger deployments, and more. This section and its subsections
explore how to do so safely with Nomad.
Please see one of the guides below or use the navigation on the left:
1. [Rolling Upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html)
1. [Blue/Green & Canary Deployments](/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html)
1. [Handling Signals](/docs/operating-a-job/update-strategies/handling-signals.html)

View File

@@ -0,0 +1,90 @@
---
layout: "docs"
page_title: "Rolling Upgrades - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
description: |-
In order to update a service while reducing downtime, Nomad provides a
built-in mechanism for rolling upgrades. Rolling upgrades allow a subset
of applications to be updated at a time, with a waiting period between
batches to reduce downtime.
---
# Rolling Upgrades
In order to update a service while reducing downtime, Nomad provides a built-in
mechanism for rolling upgrades. Jobs specify their "update strategy" using the
`update` block in the job specification as shown here:
```hcl
job "docs" {
update {
stagger = "30s"
max_parallel = 3
}
group "example" {
task "server" {
# ...
}
}
}
```
In this example, Nomad will only update 3 task groups at a time (`max_parallel =
3`) and will wait 30 seconds (`stagger = "30s"`) before moving on to the next
set of task groups.
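For example, if the "example" group above ran with `count = 9`, Nomad would
upgrade the allocations in three batches of three, pausing 30 seconds between
batches, so the stagger alone adds roughly a minute to the rollout on top of the
time the tasks take to restart.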
## Planning Changes
Suppose we change the job file to upgrade the version of the Docker container,
keeping the same rolling update strategy from above.
```diff
@@ -2,6 +2,8 @@ job "docs" {
group "example" {
task "server" {
driver = "docker"
config {
- image = "nginx:1.10"
+ image = "nginx:1.11"
```
The [`nomad plan` command](/docs/commands/plan.html) allows
us to visualize the series of steps the scheduler would perform. We can analyze
this output to confirm it is correct:
```shell
$ nomad plan docs.nomad
```
Here is some sample output:
```text
+/- Job: "docs"
+/- Task Group: "example" (3 create/destroy update)
+/- Task: "server" (forces create/destroy update)
+/- Config {
+/- image: "nginx:1.10" => "nginx:1.11"
}
Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 30s.
Job Modify Index: 7
To submit the job with version verification run:
nomad run -check-index 7 docs.nomad
When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```
Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `30s`.
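If the plan looks correct, we can submit the job using the modify index reported
above, so that the update is only applied if nobody else has changed the job in
the meantime:
```shell
$ nomad run -check-index 7 docs.nomad
```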
For more details on the `update` block, see the [job specification
documentation](/docs/jobspec/index.html#update).

View File

@@ -70,7 +70,18 @@
<a href="/docs/operating-a-job/resource-utilization.html">Resource Utilization</a>
</li>
<li<%= sidebar_current("docs-operating-a-job-updating") %>>
<a href="/docs/operating-a-job/update-strategies.html">Update Strategies</a>
<a href="/docs/operating-a-job/update-strategies/index.html">Update Strategies</a>
<ul class="nav">
<li<%= sidebar_current("docs-operating-a-job-updating-rolling-upgrades") %>>
<a href="/docs/operating-a-job/update-strategies/rolling-upgrades.html">Rolling Upgrades</a>
</li>
<li<%= sidebar_current("docs-operating-a-job-updating-blue-green-deployments") %>>
<a href="/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html">Blue/Green &amp; Canary</a>
</li>
<li<%= sidebar_current("docs-operating-a-job-updating-handling-signals") %>>
<a href="/docs/operating-a-job/update-strategies/handling-signals.html">Handling Signals</a>
</li>
</ul>
</li>
</ul>
</li>