--- layout: "guides" page_title: "Blue/Green & Canary Deployments - Operating a Job" sidebar_current: "guides-operating-a-job-updating-blue-green-deployments" description: |- Nomad has built-in support for doing blue/green and canary deployments to more safely update existing applications and services. --- # Blue/Green & Canary Deployments Sometimes [rolling upgrades](/guides/operating-a-job/update-strategies/rolling-upgrades.html) do not offer the required flexibility for updating an application in production. Often organizations prefer to put a "canary" build into production or utilize a technique known as a "blue/green" deployment to ensure a safe application rollout to production while minimizing downtime. ## Blue/Green Deployments Blue/Green deployments have several other names including Red/Black or A/B, but the concept is generally the same. In a blue/green deployment, there are two application versions. Only one application version is active at a time, except during the transition phase from one version to the next. The term "active" tends to mean "receiving traffic" or "in service". Imagine a hypothetical API server which has five instances deployed to production at version 1.3, and we want to safely upgrade to version 1.4. We want to create five new instances at version 1.4 and in the case that they are operating correctly we want to promote them and take down the five versions running 1.3. In the event of failure, we can quickly rollback to 1.3. To start, we examine our job which is running in production: ```hcl job "docs" { # ... group "api" { count = 5 update { max_parallel = 1 canary = 5 min_healthy_time = "30s" healthy_deadline = "10m" auto_revert = true } task "api-server" { driver = "docker" config { image = "api-server:1.3" } } } } ``` We see that it has an `update` stanza that has the `canary` equal to the desired count. This is what allows us to easily model blue/green deployments. When we change the job to run the "api-server:1.4" image, Nomad will create 5 new allocations without touching the original "api-server:1.3" allocations. Below we can see how this works by changing the image to run the new version: ```diff @@ -2,6 +2,8 @@ job "docs" { group "api" { task "api-server" { config { - image = "api-server:1.3" + image = "api-server:1.4" ``` Next we plan and run these changes: ```text $ nomad job plan docs.nomad +/- Job: "docs" +/- Task Group: "api" (5 canary, 5 ignore) +/- Task: "api-server" (forces create/destroy update) +/- Config { +/- image: "api-server:1.3" => "api-server:1.4" } Scheduler dry-run: - All tasks successfully allocated. Job Modify Index: 7 To submit the job with version verification run: nomad job run -check-index 7 example.nomad When running the job with the check-index flag, the job will only be run if the server side version matches the job modify index returned. If the index has changed, another user has modified the job and the plan's results are potentially invalid. $ nomad job run docs.nomad # ... ``` We can see from the plan output that Nomad is going to create 5 canaries that are running the "api-server:1.4" image and ignore all the allocations running the older image. Now if we examine the status of the job we can see that both the blue ("api-server:1.3") and green ("api-server:1.4") set are running. ```text $ nomad status docs ID = docs Name = docs Submit Date = 07/26/17 19:57:47 UTC Type = service Priority = 50 Datacenters = dc1 Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost api 0 0 10 0 0 0 Latest Deployment ID = 32a080c1 Status = running Description = Deployment is running but requires promotion Deployed Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy api true false 5 5 5 5 0 Allocations ID Node ID Task Group Version Desired Status Created At 6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC 7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC 36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC 410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC 85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC 3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC 4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC 2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC 35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC ``` Now that we have the new set in production, we can route traffic to it and validate the new job version is working properly. Based on whether the new version is functioning properly or improperly we will either want to promote or fail the deployment. ### Promoting the Deployment After deploying the new image along side the old version we have determined it is functioning properly and we want to transition fully to the new version. Doing so is as simple as promoting the deployment: ```text $ nomad deployment promote 32a080c1 ==> Monitoring evaluation "61ac2be5" Evaluation triggered by job "docs" Evaluation within deployment: "32a080c1" Evaluation status changed: "pending" -> "complete" ==> Evaluation "61ac2be5" finished with status "complete" ``` If we look at the job's status we see that after promotion, Nomad stopped the older allocations and is only running the new one. This now completes our blue/green deployment. ```text $ nomad status docs ID = docs Name = docs Submit Date = 07/26/17 19:57:47 UTC Type = service Priority = 50 Datacenters = dc1 Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost api 0 0 5 0 5 0 Latest Deployment ID = 32a080c1 Status = successful Description = Deployment completed successfully Deployed Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy api true true 5 5 5 5 0 Allocations ID Node ID Task Group Version Desired Status Created At 6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC 7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC 36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC 410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC 85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC 3ac3fe05 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC 4bd51979 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC 2998387b 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC 35b813ee 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC b53b4289 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC ``` ### Failing the Deployment After deploying the new image alongside the old version we have determined it is not functioning properly and we want to roll back to the old version. Doing so is as simple as failing the deployment: ```text $ nomad deployment fail 32a080c1 Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0. ==> Monitoring evaluation "6840f512" Evaluation triggered by job "example" Evaluation within deployment: "32a080c1" Allocation "0ccb732f" modified: node "36e7a123", group "cache" Allocation "64d4f282" modified: node "36e7a123", group "cache" Allocation "664e33c7" modified: node "36e7a123", group "cache" Allocation "a4cb6a4b" modified: node "36e7a123", group "cache" Allocation "fdd73bdd" modified: node "36e7a123", group "cache" Evaluation status changed: "pending" -> "complete" ==> Evaluation "6840f512" finished with status "complete" ``` If we now look at the job's status we can see that after failing the deployment, Nomad stopped the new allocations and is only running the old ones and reverted the working copy of the job back to the original specification running "api-server:1.3". ```text $ nomad status docs ID = docs Name = docs Submit Date = 07/26/17 19:57:47 UTC Type = service Priority = 50 Datacenters = dc1 Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost api 0 0 5 0 5 0 Latest Deployment ID = 6f3f84b3 Status = successful Description = Deployment completed successfully Deployed Task Group Auto Revert Desired Placed Healthy Unhealthy cache true 5 5 5 0 Allocations ID Node ID Task Group Version Desired Status Created At 27dc2a42 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC 5b7d34bb 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC 983b487d 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC d1cbf45a 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC d6b46def 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC 0ccb732f 36e7a123 api 2 run running 07/26/17 20:06:29 UTC 64d4f282 36e7a123 api 2 run running 07/26/17 20:06:29 UTC 664e33c7 36e7a123 api 2 run running 07/26/17 20:06:29 UTC a4cb6a4b 36e7a123 api 2 run running 07/26/17 20:06:29 UTC fdd73bdd 36e7a123 api 2 run running 07/26/17 20:06:29 UTC $ nomad job deployments docs ID Job ID Job Version Status Description 6f3f84b3 example 2 successful Deployment completed successfully 32a080c1 example 1 failed Deployment marked as failed - rolling back to job version 0 c4c16494 example 0 successful Deployment completed successfully ``` ## Canary Deployments Canary updates are a useful way to test a new version of a job before beginning a rolling upgrade. The `update` stanza supports setting the number of canaries the job operator would like Nomad to create when the job changes via the `canary` parameter. When the job specification is updated, Nomad creates the canaries without stopping any allocations from the previous job. This pattern allows operators to achieve higher confidence in the new job version because they can route traffic, examine logs, etc, to determine the new application is performing properly. ```hcl job "docs" { # ... group "api" { count = 5 update { max_parallel = 1 canary = 1 min_healthy_time = "30s" healthy_deadline = "10m" auto_revert = true } task "api-server" { driver = "docker" config { image = "api-server:1.3" } } } } ``` In the example above, the `update` stanza tells Nomad to create a single canary when the job specification is changed. Below we can see how this works by changing the image to run the new version: ```diff @@ -2,6 +2,8 @@ job "docs" { group "api" { task "api-server" { config { - image = "api-server:1.3" + image = "api-server:1.4" ``` Next we plan and run these changes: ```text $ nomad job plan docs.nomad +/- Job: "docs" +/- Task Group: "api" (1 canary, 5 ignore) +/- Task: "api-server" (forces create/destroy update) +/- Config { +/- image: "api-server:1.3" => "api-server:1.4" } Scheduler dry-run: - All tasks successfully allocated. Job Modify Index: 7 To submit the job with version verification run: nomad job run -check-index 7 example.nomad When running the job with the check-index flag, the job will only be run if the server side version matches the job modify index returned. If the index has changed, another user has modified the job and the plan's results are potentially invalid. $ nomad job run docs.nomad # ... ``` We can see from the plan output that Nomad is going to create 1 canary that will run the "api-server:1.4" image and ignore all the allocations running the older image. If we inspect the status we see that the canary is running along side the older version of the job: ```text $ nomad status docs ID = docs Name = docs Submit Date = 07/26/17 19:57:47 UTC Type = service Priority = 50 Datacenters = dc1 Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost api 0 0 6 0 0 0 Latest Deployment ID = 32a080c1 Status = running Description = Deployment is running but requires promotion Deployed Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy api true false 5 1 1 1 0 Allocations ID Node ID Task Group Version Desired Status Created At 85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC 3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC 4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC 2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC 35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC ``` Now if we promote the canary, this will trigger a rolling update to replace the remaining allocations running the older image. The rolling update will happen at a rate of `max_parallel`, so in this case one allocation at a time: ```text $ nomad deployment promote 37033151 ==> Monitoring evaluation "37033151" Evaluation triggered by job "docs" Evaluation within deployment: "ed28f6c2" Allocation "f5057465" created: node "f6646949", group "cache" Allocation "f5057465" status changed: "pending" -> "running" Evaluation status changed: "pending" -> "complete" ==> Evaluation "37033151" finished with status "complete" $ nomad status docs ID = docs Name = docs Submit Date = 07/26/17 20:28:59 UTC Type = service Priority = 50 Datacenters = dc1 Status = running Periodic = false Parameterized = false Summary Task Group Queued Starting Running Failed Complete Lost api 0 0 5 0 2 0 Latest Deployment ID = ed28f6c2 Status = running Description = Deployment is running Deployed Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy api true true 5 1 2 1 0 Allocations ID Node ID Task Group Version Desired Status Created At f5057465 f6646949 api 1 run running 07/26/17 20:29:23 UTC b1c88d20 f6646949 api 1 run running 07/26/17 20:28:59 UTC 1140bacf f6646949 api 0 run running 07/26/17 20:28:37 UTC 1958a34a f6646949 api 0 run running 07/26/17 20:28:37 UTC 4bda385a f6646949 api 0 run running 07/26/17 20:28:37 UTC 62d96f06 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC f58abbb2 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC ``` Alternatively, if the canary was not performing properly, we could abandon the change using the `nomad deployment fail` command, similar to the blue/green example.