From 693b0725965cff96867401491c6b0e205540d585 Mon Sep 17 00:00:00 2001
From: Seth Vargo
Date: Tue, 11 Oct 2016 20:15:30 -0400
Subject: [PATCH] Separate job update strategies into different pages

---
 website/source/assets/stylesheets/_docs.scss  |  14 +-
 .../operating-a-job/submitting-jobs.html.md   |   2 +-
 .../operating-a-job/update-strategies.html.md | 175 ------------------
 .../blue-green-and-canary-deployments.html.md | 152 +++++++++++++++
 .../handling-signals.html.md                  |  37 ++++
 .../update-strategies/index.html.md           |  24 +++
 .../rolling-upgrades.html.md                  |  90 +++++++++
 website/source/layouts/docs.erb               |  13 +-
 8 files changed, 323 insertions(+), 184 deletions(-)
 delete mode 100644 website/source/docs/operating-a-job/update-strategies.html.md
 create mode 100644 website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
 create mode 100644 website/source/docs/operating-a-job/update-strategies/handling-signals.html.md
 create mode 100644 website/source/docs/operating-a-job/update-strategies/index.html.md
 create mode 100644 website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md

diff --git a/website/source/assets/stylesheets/_docs.scss b/website/source/assets/stylesheets/_docs.scss
index 3e7e3c73d..1d266f27e 100755
--- a/website/source/assets/stylesheets/_docs.scss
+++ b/website/source/assets/stylesheets/_docs.scss
@@ -65,7 +65,7 @@ body.layout-intro{
       $child-default-state: -4px;
 
       //first teir li
-      > li {
+      li {
         margin: 0 0 0 10px;
 
         > a {
@@ -102,7 +102,7 @@ body.layout-intro{
         }
       }
 
-      .nav {
+      > .nav {
        display: block;
       }
     }
@@ -113,7 +113,6 @@ body.layout-intro{
       display: none;
       padding-top: 10px;
       padding-bottom: 10px;
-      margin-bottom: 15px;
 
       > li{
         margin-left: 10px;
@@ -207,6 +206,11 @@ body.layout-intro{
       &:hover{
         text-decoration: underline;
       }
+
+      code {
+        background: inherit;
+        color: $green-dark;
+      }
     }
 
     img{
@@ -231,7 +235,6 @@ body.layout-intro{
     }
   }
 
-
 @media (max-width: 992px) {
 
   .bs-docs-section{
@@ -248,9 +251,6 @@ body.layout-intro{
     }
   }
 
-
-
-
 @media (max-width: 480px) {
   .bs-docs-section{
     img{
diff --git a/website/source/docs/operating-a-job/submitting-jobs.html.md b/website/source/docs/operating-a-job/submitting-jobs.html.md
index a13abd148..c97c6ca7c 100644
--- a/website/source/docs/operating-a-job/submitting-jobs.html.md
+++ b/website/source/docs/operating-a-job/submitting-jobs.html.md
@@ -178,4 +178,4 @@ nomad run -check-index 131 docs.nomad
 
 For more details on advanced job updating strategies such as canary builds and
 build-green deployments, please see the documentation on [job update
-strategies](/docs/operating-a-job/update-strategies.html).
+strategies](/docs/operating-a-job/update-strategies/index.html).
diff --git a/website/source/docs/operating-a-job/update-strategies.html.md b/website/source/docs/operating-a-job/update-strategies.html.md
deleted file mode 100644
index 0c7930569..000000000
--- a/website/source/docs/operating-a-job/update-strategies.html.md
+++ /dev/null
@@ -1,175 +0,0 @@
----
-layout: "docs"
-page_title: "Update Strategies - Operating a Job"
-sidebar_current: "docs-operating-a-job-updating"
-description: |-
-  Learn how to do safely update Nomad Jobs.
----
-
-# Update Strategies
-
-When operating a service, updating the version of the job will be a common task.
-Under a cluster scheduler the same best practices apply for reliably deploying
-new versions including: rolling updates, blue-green deploys and canaries which
-are special cased blue-green deploys. This section will explore how to do each
-of these safely with Nomad.
-
-## Rolling Updates
-
-In order to update a service without introducing down-time, Nomad has build in
-support for rolling updates. When a job specifies a rolling update, with the
-below syntax, Nomad will only update `max-parallel` number of task groups at a
-time and will wait `stagger` duration before updating the next set.
-
-```hcl
-job "example" {
-  # ...
-
-  update {
-    stagger      = "30s"
-    max_parallel = 1
-  }
-
-  # ...
-}
-```
-
-We can use the "nomad plan" command while updating jobs to ensure the scheduler
-will do as we expect. In this example, we have 3 web server instances that we
-want to update their version. After the job file was modified we can run `plan`:
-
-```text
-$ nomad plan my-web.nomad
-+/- Job: "my-web"
-+/- Task Group: "web" (3 create/destroy update)
-  +/- Task: "web" (forces create/destroy update)
-    +/- Config {
-      +/- image:             "nginx:1.10" => "nginx:1.11"
-          port_map[0][http]: "80"
-        }
-
-Scheduler dry-run:
-- All tasks successfully allocated.
-- Rolling update, next evaluation will be in 10s.
-
-Job Modify Index: 7
-To submit the job with version verification run:
-
-nomad run -check-index 7 my-web.nomad
-
-When running the job with the check-index flag, the job will only be run if the
-server side version matches the job modify index returned. If the index has
-changed, another user has modified the job and the plan's results are
-potentially invalid.
-```
-
-Here we can see that Nomad will destroy the 3 existing tasks and create 3
-replacements but it will occur with a rolling update with a stagger of `10s`.
-For more details on the update block, see
-the [Jobspec documentation](/docs/jobspec/index.html#update).
-
-## Blue-green and Canaries
-
-Blue-green deploys have several names, Red/Black, A/B, Blue/Green, but the
-concept is the same. The idea is to have two sets of applications with only one
-of them being live at a given time, except while transitioning from one set to
-another. What the term "live" means is that the live set of applications are
-the set receiving traffic.
-
-So imagine we have an API server that has 10 instances deployed to production
-at version 1 and we want to upgrade to version 2. Hopefully the new version has
-been tested in a QA environment and is now ready to start accepting production
-traffic.
-
-In this case we would consider version 1 to be the live set and we want to
-transition to version 2. We can model this workflow with the below job:
-
-```hcl
-job "my-api" {
-  # ...
-
-  group "api-green" {
-    count = 10
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:v1"
-      }
-    }
-  }
-
-  group "api-blue" {
-    count = 0
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:v2"
-      }
-    }
-  }
-}
-```
-
-Here we can see the live group is "api-green" since it has a non-zero count. To
-transition to v2, we up the count of "api-blue" and down the count of
-"api-green". We can now see how the canary process is a special case of
-blue-green. If we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
-there will still be the original 10 instances but we will be testing only one
-instance of the new version, essentially canarying it.
-
-If at any time we notice that the new version is behaving incorrectly and we
-want to roll back, all that we have to do is drop the count of the new group to
-0 and restore the original version back to 10. This fine control lets job
-operators be confident that deployments will not cause down time. If the deploy
-is successful and we fully transition from v1 to v2 the job file will look like
-this:
-
-```hcl
-job "my-api" {
-  # ...
-
-  group "api-green" {
-    count = 0
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:v1"
-      }
-    }
-  }
-
-  group "api-blue" {
-    count = 10
-
-    task "api-server" {
-      driver = "docker"
-
-      config {
-        image = "api-server:v2"
-      }
-    }
-  }
-}
-```
-
-Now "api-blue" is the live group and when we are ready to update the api to v3,
-we would modify "api-green" and repeat this process. The rate at which the count
-of groups are incremented and decremented is totally up to the user. It is
-usually good practice to start by transition one at a time until a certain
-confidence threshold is met based on application specific logs and metrics.
-
-## Handling Drain Signals
-
-On operating systems that support signals, Nomad will signal the application
-before killing it. This gives the application time to gracefully drain
-connections and conduct any other cleanup that is necessary. Certain
-applications take longer to drain than others and as such Nomad lets the job
-file specify how long to wait in-between signaling the application to exit and
-forcefully killing it. This is configurable via the `kill_timeout`. More details
-can be seen in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).
diff --git a/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md b/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
new file mode 100644
index 000000000..a8deb442d
--- /dev/null
+++ b/website/source/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html.md
@@ -0,0 +1,152 @@
+---
+layout: "docs"
+page_title: "Blue/Green & Canary Deployments - Operating a Job"
+sidebar_current: "docs-operating-a-job-updating-blue-green-deployments"
+description: |-
+  Nomad supports blue/green and canary deployments through the declarative job
+  file syntax. By specifying multiple task groups, Nomad allows for easy
+  configuration and rollout of blue/green and canary deployments.
+---
+
+# Blue/Green & Canary Deployments
+
+Sometimes [rolling
+upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html) do not
+offer the required flexibility for updating an application in production. Often
+organizations prefer to put a "canary" build into production or utilize a
+technique known as a "blue/green" deployment to ensure a safe application
+rollout to production while minimizing downtime.
+
+Blue/Green deployments have several other names including Red/Black or A/B, but
+the concept is generally the same. In a blue/green deployment, there are two
+application versions. Only one application version is active at a time, except
+during the transition phase from one version to the next. The term "active"
+tends to mean "receiving traffic" or "in service".
+
+Imagine a hypothetical API server which has ten instances deployed to production
+at version 1.3, and we want to safely upgrade to version 1.4. After the new
+version has been approved for production, we may want to do a small rollout. In
+the event of failure, we can quickly roll back to 1.3.
+
+To start, version 1.3 is considered the active set and version 1.4 is the
+desired set. Here is a sample job file which models the transition from version
+1.3 to version 1.4 using a blue/green deployment.
+
+```hcl
+job "docs" {
+  datacenters = ["dc1"]
+
+  group "api-green" {
+    count = 10
+
+    task "api-server" {
+      driver = "docker"
+
+      config {
+        image = "api-server:1.3"
+      }
+    }
+  }
+
+  group "api-blue" {
+    count = 0
+
+    task "api-server" {
+      driver = "docker"
+
+      config {
+        image = "api-server:1.4"
+      }
+    }
+  }
+}
+```
+
+It is clear that the active group is "api-green" since it has a non-zero count.
+To transition to v1.4 (api-blue), we increase the count of api-blue to match
+that of api-green.
+
+```diff
+@@ -2,6 +2,8 @@ job "docs" {
+   group "api-blue" {
+-    count = 0
++    count = 10
+
+     task "api-server" {
+       driver = "docker"
+```
+
+Next we plan and run these changes:
+
+```shell
+$ nomad plan docs.nomad
+```
+
+Assuming the plan output looks okay, we are ready to run these changes.
+
+```shell
+$ nomad run docs.nomad
+```
+
+Our deployment is not yet finished. We are currently running at double capacity,
+so approximately half of our traffic is going to the blue and half is going to
+green. Usually we inspect our monitoring and reporting system. If we are
+experiencing errors, we reduce the count of "api-blue" back to 0. If we are
+running successfully, we change the count of "api-green" to 0.
+
+```diff
+@@ -2,6 +2,8 @@ job "docs" {
+   group "api-green" {
+-    count = 10
++    count = 0
+
+     task "api-server" {
+       driver = "docker"
+```
+
+The next time we want to do a deployment, the "green" group becomes our
+transition group, since the "blue" group is currently active.
+
+## Canary Deployments
+
+A canary deployment is a special type of blue/green deployment in which a subset
+of nodes continues to run in production for an extended period of time.
+Sometimes this is done for logging/analytics or as an extended blue/green
+deployment. Whatever the reason, Nomad supports canary deployments. Using the
+same strategy as defined above, simply keep the "blue" at a lower number, for
+example:
+
+```hcl
+job "docs" {
+  datacenters = ["dc1"]
+
+  group "api" {
+    count = 10
+
+    task "api-server" {
+      driver = "docker"
+
+      config {
+        image = "api-server:1.3"
+      }
+    }
+  }
+
+  group "api-canary" {
+    count = 1
+
+    task "api-server" {
+      driver = "docker"
+
+      config {
+        image = "api-server:1.4"
+      }
+    }
+  }
+}
+```
+
+Here you can see there is exactly one canary version of our application (v1.4)
+and ten regular versions. Typically canary versions are also tagged
+appropriately in the [service discovery](/docs/jobspec/servicediscovery.html)
+layer to prevent unnecessary routing.
diff --git a/website/source/docs/operating-a-job/update-strategies/handling-signals.html.md b/website/source/docs/operating-a-job/update-strategies/handling-signals.html.md
new file mode 100644
index 000000000..eadcd743c
--- /dev/null
+++ b/website/source/docs/operating-a-job/update-strategies/handling-signals.html.md
@@ -0,0 +1,37 @@
+---
+layout: "docs"
+page_title: "Handling Signals - Operating a Job"
+sidebar_current: "docs-operating-a-job-updating-handling-signals"
+description: |-
+  Well-behaved applications expose a way to perform cleanup prior to exiting.
+  Nomad can optionally send a configurable signal to applications before
+  killing them, allowing them to drain connections or gracefully terminate.
+---
+
+# Handling Signals
+
+On operating systems that support signals, Nomad will send the application a
+configurable signal before killing it. This gives the application time to
+gracefully drain connections and conduct other cleanup before shutting down.
+Certain applications take longer to drain than others, and thus Nomad allows
+specifying the amount of time to wait for the application to exit before
+force-killing it.
+
+Before Nomad terminates an application, it will send the `SIGINT` signal to the
+process. Processes running under Nomad should respond to this signal to
+gracefully drain connections. After a configurable timeout, the application will
+be force-terminated.
+
+```hcl
+job "docs" {
+  group "example" {
+    task "server" {
+      # ...
+      kill_timeout = "45s"
+    }
+  }
+}
+```
+
+For more detail on the `kill_timeout` option, please see the [job specification
+documentation](/docs/jobspec/index.html#kill_timeout).
diff --git a/website/source/docs/operating-a-job/update-strategies/index.html.md b/website/source/docs/operating-a-job/update-strategies/index.html.md
new file mode 100644
index 000000000..b86f8193a
--- /dev/null
+++ b/website/source/docs/operating-a-job/update-strategies/index.html.md
@@ -0,0 +1,24 @@
+---
+layout: "docs"
+page_title: "Update Strategies - Operating a Job"
+sidebar_current: "docs-operating-a-job-updating"
+description: |-
+  This section describes common patterns for updating already-running jobs
+  including rolling upgrades, blue/green deployments, and canary builds. Nomad
+  provides built-in support for this functionality.
+---
+
+# Update Strategies
+
+Most applications are long-lived and require updates over time. Whether you are
+deploying a new version of your web application or upgrading to a new version of
+Redis, Nomad has built-in support for rolling updates. When a job specifies a
+rolling update, Nomad can apply several configurable strategies to minimize or
+eliminate downtime, stagger deployments, and more. This section and subsections
+will explore how to do so safely with Nomad.
+
+Please see one of the guides below or use the navigation on the left:
+
+1. [Rolling Upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html)
+1. [Blue/Green & Canary Deployments](/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html)
+1. [Handling Signals](/docs/operating-a-job/update-strategies/handling-signals.html)
diff --git a/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md b/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md
new file mode 100644
index 000000000..afcdaa583
--- /dev/null
+++ b/website/source/docs/operating-a-job/update-strategies/rolling-upgrades.html.md
@@ -0,0 +1,90 @@
+---
+layout: "docs"
+page_title: "Rolling Upgrades - Operating a Job"
+sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
+description: |-
+  In order to update a service while reducing downtime, Nomad provides a
+  built-in mechanism for rolling upgrades. Rolling upgrades allow for a subset
+  of applications to be updated at a time, with a waiting period between to
+  reduce downtime.
+---
+
+# Rolling Upgrades
+
+In order to update a service while reducing downtime, Nomad provides a built-in
+mechanism for rolling upgrades. Jobs specify their "update strategy" using the
+`update` block in the job specification as shown here:
+
+```hcl
+job "docs" {
+  update {
+    stagger      = "30s"
+    max_parallel = 3
+  }
+
+  group "example" {
+    task "server" {
+      # ...
+    }
+  }
+}
+```
+
+In this example, Nomad will only update 3 task groups at a time (`max_parallel =
+3`) and will wait 30 seconds (`stagger = "30s"`) before moving on to the next
+set of task groups.
+
+## Planning Changes
+
+Suppose we make a change to the job file to upgrade the version of the Docker
+container that is configured with the same rolling update strategy from above.
+
+```diff
+@@ -2,6 +2,8 @@ job "docs" {
+   group "example" {
+     task "server" {
+       driver = "docker"
+
+       config {
+-        image = "nginx:1.10"
++        image = "nginx:1.11"
+```
+
+The [`nomad plan` command](/docs/commands/plan.html) allows
+us to visualize the series of steps the scheduler would perform. We can analyze
+this output to confirm it is correct:
+
+```shell
+$ nomad plan docs.nomad
+```
+
+Here is some sample output:
+
+```text
++/- Job: "docs"
++/- Task Group: "example" (3 create/destroy update)
+  +/- Task: "server" (forces create/destroy update)
+    +/- Config {
+      +/- image: "nginx:1.10" => "nginx:1.11"
+    }
+
+Scheduler dry-run:
+- All tasks successfully allocated.
+- Rolling update, next evaluation will be in 30s.
+
+Job Modify Index: 7
+To submit the job with version verification run:
+
+nomad run -check-index 7 docs.nomad
+
+When running the job with the check-index flag, the job will only be run if the
+server side version matches the job modify index returned. If the index has
+changed, another user has modified the job and the plan's results are
+potentially invalid.
+```
+
+Here we can see that Nomad will destroy the 3 existing tasks and create 3
+replacements, but it will do so with a rolling update using a stagger of `30s`.
+
+For more details on the `update` block, see the [job specification
+documentation](/docs/jobspec/index.html#update).
diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index 040f6af7e..372870647 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -70,7 +70,18 @@ Resource Utilization
 >
-  Update Strategies
+  Update Strategies
+