Separate job update strategies into different pages

Seth Vargo 2016-10-11 20:15:30 -04:00
parent db4e676d73
commit 693b072596
GPG Key ID: 905A90C2949E8787
8 changed files with 323 additions and 184 deletions


@@ -65,7 +65,7 @@ body.layout-intro{
      $child-default-state: -4px;
      //first teir li
-     > li {
+     li {
        margin: 0 0 0 10px;
        > a {
@@ -102,7 +102,7 @@ body.layout-intro{
          }
        }
-       .nav {
+       > .nav {
          display: block;
        }
      }
@@ -113,7 +113,6 @@ body.layout-intro{
      display: none;
      padding-top: 10px;
      padding-bottom: 10px;
-     margin-bottom: 15px;
      > li{
        margin-left: 10px;
@@ -207,6 +206,11 @@ body.layout-intro{
        &:hover{
          text-decoration: underline;
        }
+       code {
+         background: inherit;
+         color: $green-dark;
+       }
      }
      img{
@@ -231,7 +235,6 @@ body.layout-intro{
    }
  }
  @media (max-width: 992px) {
    .bs-docs-section{
@@ -248,9 +251,6 @@ body.layout-intro{
    }
  }
  @media (max-width: 480px) {
    .bs-docs-section{
      img{


@@ -178,4 +178,4 @@ nomad run -check-index 131 docs.nomad
 For more details on advanced job updating strategies such as canary builds and
 blue-green deployments, please see the documentation on [job update
-strategies](/docs/operating-a-job/update-strategies.html).
+strategies](/docs/operating-a-job/update-strategies/index.html).


@@ -1,175 +0,0 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
  Learn how to safely update Nomad jobs.
---

# Update Strategies

When operating a service, updating the version of the job is a common task.
Under a cluster scheduler, the same best practices apply for reliably deploying
new versions, including rolling updates, blue-green deploys, and canaries,
which are special-cased blue-green deploys. This section explores how to do
each of these safely with Nomad.

## Rolling Updates

In order to update a service without introducing downtime, Nomad has built-in
support for rolling updates. When a job specifies a rolling update with the
syntax below, Nomad will update only `max_parallel` task groups at a time and
will wait the `stagger` duration before updating the next set.

```hcl
job "example" {
  # ...

  update {
    stagger      = "30s"
    max_parallel = 1
  }

  # ...
}
```

We can use the `nomad plan` command while updating jobs to ensure the scheduler
will do what we expect. In this example, we have 3 web server instances whose
version we want to update. After modifying the job file, we can run `plan`:

```text
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
      +/- image:             "nginx:1.10" => "nginx:1.11"
          port_map[0][http]: "80"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 my-web.nomad

When running the job with the check-index flag, the job will only be run if the
server-side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so using a rolling update with a stagger of `10s`.

For more details on the `update` block, see the
[Jobspec documentation](/docs/jobspec/index.html#update).

## Blue-green and Canaries

Blue-green deploys go by several names (Red/Black, A/B, Blue/Green), but the
concept is the same: run two sets of the application with only one of them
live at a given time, except while transitioning from one set to the other.
The "live" set is the one receiving traffic.

So imagine we have an API server that has 10 instances deployed to production
at version 1, and we want to upgrade to version 2. Hopefully the new version
has been tested in a QA environment and is now ready to start accepting
production traffic.

In this case we would consider version 1 to be the live set, and we want to
transition to version 2. We can model this workflow with the job below:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```

Here we can see the live group is "api-green" since it has a non-zero count. To
transition to v2, we raise the count of "api-blue" and lower the count of
"api-green". We can now see how the canary process is a special case of
blue-green: if we set "api-blue" to `count = 1` and "api-green" to `count = 9`,
there will still be 10 instances in total, but only one instance will run the
new version, essentially canarying it.

If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group back to 10. This fine-grained control lets job
operators be confident that deployments will not cause downtime. If the deploy
is successful and we fully transition from v1 to v2, the job file will look
like this:

```hcl
job "my-api" {
  # ...

  group "api-green" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v1"
      }
    }
  }

  group "api-blue" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:v2"
      }
    }
  }
}
```
Now "api-blue" is the live group and when we are ready to update the api to v3,
we would modify "api-green" and repeat this process. The rate at which the count
of groups are incremented and decremented is totally up to the user. It is
usually good practice to start by transition one at a time until a certain
confidence threshold is met based on application specific logs and metrics.
## Handling Drain Signals
On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and conduct any other cleanup that is necessary. Certain
applications take longer to drain than others and as such Nomad lets the job
file specify how long to wait in-between signaling the application to exit and
forcefully killing it. This is configurable via the `kill_timeout`. More details
can be seen in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).


@@ -0,0 +1,152 @@
---
layout: "docs"
page_title: "Blue/Green & Canary Deployments - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-blue-green-deployments"
description: |-
  Nomad supports blue/green and canary deployments through the declarative job
  file syntax. By specifying multiple task groups, Nomad allows for easy
  configuration and rollout of blue/green and canary deployments.
---

# Blue/Green & Canary Deployments

Sometimes [rolling
upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html) do not
offer the required flexibility for updating an application in production. Often
organizations prefer to put a "canary" build into production or utilize a
technique known as a "blue/green" deployment to ensure a safe application
rollout to production while minimizing downtime.

Blue/green deployments have several other names, including Red/Black and A/B,
but the concept is generally the same: there are two application versions, and
only one version is active at a time, except during the transition phase from
one version to the next. The term "active" tends to mean "receiving traffic"
or "in service".

Imagine a hypothetical API server which has ten instances deployed to
production at version 1.3, and we want to safely upgrade to version 1.4. After
the new version has been approved for production, we may want to start with a
small rollout; in the event of failure, we can quickly roll back to 1.3.

To start, version 1.3 is considered the active set and version 1.4 is the
desired set. Here is a sample job file which models the transition from version
1.3 to version 1.4 using a blue/green deployment:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "api-green" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }

  group "api-blue" {
    count = 0

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.4"
      }
    }
  }
}
```

It is clear that the active group is "api-green" since it has a non-zero count.
To transition to v1.4 (api-blue), we increase the count of api-blue to match
that of api-green:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api-blue" {
-    count = 0
+    count = 10
     task "api-server" {
       driver = "docker"
```

Next we plan and run these changes:

```shell
$ nomad plan docs.nomad
```

Assuming the plan output looks okay, we are ready to run these changes:

```shell
$ nomad run docs.nomad
```

Our deployment is not yet finished. We are currently running at double
capacity, so roughly half of our traffic is going to blue and half to green.
Usually we inspect our monitoring and reporting systems at this point. If we
are experiencing errors, we reduce the count of "api-blue" back to 0. If we are
running successfully, we change the count of "api-green" to 0:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api-green" {
-    count = 10
+    count = 0
     task "api-server" {
       driver = "docker"
```
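
Had we instead seen errors, we would have reduced the count of "api-blue" back
to 0; a minimal sketch of that rollback diff (the hunk header is illustrative):

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api-blue" {
-    count = 10
+    count = 0
     task "api-server" {
       driver = "docker"
```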

The next time we want to do a deployment, the "green" group becomes our
transition group, since the "blue" group is currently active.

## Canary Deployments

A canary deployment is a special type of blue/green deployment in which a small
subset of instances runs the new version in production, sometimes for an
extended period of time. This may be done for logging/analytics purposes or as
an extended blue/green deployment. Whatever the reason, Nomad supports canary
deployments. Using the same strategy as defined above, simply keep the "blue"
group at a lower count, for example:

```hcl
job "docs" {
  datacenters = ["dc1"]

  group "api" {
    count = 10

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }

  group "api-canary" {
    count = 1

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.4"
      }
    }
  }
}
```

Here you can see there is exactly one canary instance of our application (v1.4)
alongside ten instances of the regular version (v1.3). Typically canary
instances are also tagged appropriately in the [service
discovery](/docs/jobspec/servicediscovery.html) layer to prevent unwanted
routing to the canary.
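
As a minimal sketch of such tagging (the service name and "http" port label are
illustrative, not part of the job file above), the canary task might register
its service like this:

```hcl
task "api-server" {
  driver = "docker"

  config {
    image = "api-server:1.4"
  }

  service {
    # The "canary" tag lets the routing layer identify and, if desired,
    # exclude this instance from the regular "api" service pool.
    name = "api"
    tags = ["canary"]
    port = "http"
  }
}
```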


@@ -0,0 +1,37 @@
---
layout: "docs"
page_title: "Handling Signals - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-handling-signals"
description: |-
  Well-behaved applications expose a way to perform cleanup prior to exiting.
  Nomad can optionally send a configurable signal to applications before
  killing them, allowing them to drain connections or gracefully terminate.
---

# Handling Signals

On operating systems that support signals, Nomad will send the application a
configurable signal before killing it. This gives the application time to
gracefully drain connections and conduct other cleanup before shutting down.
Certain applications take longer to drain than others, and thus Nomad allows
specifying the amount of time to wait for the application to exit before
force-killing it.

Before Nomad terminates an application, it will send the `SIGINT` signal to the
process. Processes running under Nomad should respond to this signal to
gracefully drain connections. After a configurable timeout, the application
will be force-terminated.

```hcl
job "docs" {
  group "example" {
    task "server" {
      # ...
      kill_timeout = "45s"
    }
  }
}
```
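
On the application side, even a simple shell entrypoint can honor this drain
window; here is a minimal sketch (the drain steps and timings are illustrative):

```shell
#!/bin/sh
# Trap SIGINT: drain in-flight work, then exit cleanly before kill_timeout
# expires and Nomad force-terminates the process.
trap 'echo "draining connections..."; sleep 5; exit 0' INT

# Stand-in for the real server's main loop.
while true; do
  sleep 1
done
```
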
For more detail on the `kill_timeout` option, please see the [job specification
documentation](/docs/jobspec/index.html#kill_timeout).


@@ -0,0 +1,24 @@
---
layout: "docs"
page_title: "Update Strategies - Operating a Job"
sidebar_current: "docs-operating-a-job-updating"
description: |-
  This section describes common patterns for updating already-running jobs,
  including rolling upgrades, blue/green deployments, and canary builds. Nomad
  provides built-in support for this functionality.
---

# Update Strategies

Most applications are long-lived and require updates over time. Whether you are
deploying a new version of your web application or upgrading to a new version
of Redis, Nomad has built-in support for rolling updates. When a job specifies
a rolling update, Nomad uses configurable strategies to minimize or eliminate
downtime, stagger deployments, and more. This section and its subsections
explore how to do so safely with Nomad.

Please see one of the guides below or use the navigation on the left:

1. [Rolling Upgrades](/docs/operating-a-job/update-strategies/rolling-upgrades.html)
1. [Blue/Green & Canary Deployments](/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html)
1. [Handling Signals](/docs/operating-a-job/update-strategies/handling-signals.html)
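
For a quick taste, here is the `update` stanza as configured in the rolling
upgrades guide; it updates at most three task groups at a time, waiting 30
seconds between batches:

```hcl
job "docs" {
  update {
    stagger      = "30s"
    max_parallel = 3
  }

  # ...
}
```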


@@ -0,0 +1,90 @@
---
layout: "docs"
page_title: "Rolling Upgrades - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
description: |-
  In order to update a service while reducing downtime, Nomad provides a
  built-in mechanism for rolling upgrades. Rolling upgrades allow a subset of
  applications to be updated at a time, with a waiting period in between to
  reduce downtime.
---

# Rolling Upgrades

In order to update a service while reducing downtime, Nomad provides a built-in
mechanism for rolling upgrades. Jobs specify their update strategy using the
`update` block in the job specification, as shown here:

```hcl
job "docs" {
  update {
    stagger      = "30s"
    max_parallel = 3
  }

  group "example" {
    task "server" {
      # ...
    }
  }
}
```

In this example, Nomad will update only 3 task groups at a time
(`max_parallel = 3`) and will wait 30 seconds (`stagger = "30s"`) before moving
on to the next set. With nine instances of a task group, for example, the
upgrade would proceed in three batches of three, pausing 30 seconds between
batches.

## Planning Changes

Suppose we make a change to the job file to upgrade the version of a Docker
container, keeping the same rolling update strategy from above:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "example" {
     task "server" {
       driver = "docker"
       config {
-        image = "nginx:1.10"
+        image = "nginx:1.11"
```

The [`nomad plan` command](/docs/commands/plan.html) allows us to visualize the
series of steps the scheduler would perform. We can analyze this output to
confirm it is correct:

```shell
$ nomad plan docs.nomad
```

Here is some sample output:

```text
+/- Job: "docs"
+/- Task Group: "example" (3 create/destroy update)
  +/- Task: "server" (forces create/destroy update)
    +/- Config {
      +/- image: "nginx:1.10" => "nginx:1.11"
        }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 30s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 docs.nomad

When running the job with the check-index flag, the job will only be run if the
server-side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so using a rolling update with a stagger of `30s`.

For more details on the `update` block, see the [job specification
documentation](/docs/jobspec/index.html#update).
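
Following the instructions in the plan output, we would then submit the job
with version verification:

```shell
$ nomad run -check-index 7 docs.nomad
```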


@@ -70,7 +70,18 @@
         <a href="/docs/operating-a-job/resource-utilization.html">Resource Utilization</a>
       </li>
       <li<%= sidebar_current("docs-operating-a-job-updating") %>>
-        <a href="/docs/operating-a-job/update-strategies.html">Update Strategies</a>
+        <a href="/docs/operating-a-job/update-strategies/index.html">Update Strategies</a>
+        <ul class="nav">
+          <li<%= sidebar_current("docs-operating-a-job-updating-rolling-upgrades") %>>
+            <a href="/docs/operating-a-job/update-strategies/rolling-upgrades.html">Rolling Upgrades</a>
+          </li>
+          <li<%= sidebar_current("docs-operating-a-job-updating-blue-green-deployments") %>>
+            <a href="/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html">Blue/Green &amp; Canary</a>
+          </li>
+          <li<%= sidebar_current("docs-operating-a-job-updating-handling-signals") %>>
+            <a href="/docs/operating-a-job/update-strategies/handling-signals.html">Handling Signals</a>
+          </li>
+        </ul>
       </li>
     </ul>
   </li>