---
layout: docs
page_title: Upgrade Guides
sidebar_title: Specific Version Details
description: |-
  Specific versions of Nomad may have additional information about the upgrade
  process beyond the standard flow.
---

# Upgrade Guides

The [upgrading page](/docs/upgrade) covers the details of doing a standard
upgrade. However, specific versions of Nomad may have more details provided for
their upgrades as a result of new features or changed behavior. This page is
used to document those details separately from the standard upgrade flow.

## Nomad 1.0.3, 0.12.10

Nomad versions 1.0.3 and 0.12.10 change the behavior of the `exec` and `java` drivers so that
tasks are isolated in their own PID and IPC namespaces. As a result, the
process launched by these drivers will be PID 1 in the namespace. This has
[significant impact](https://man7.org/linux/man-pages/man7/pid_namespaces.7.html)
on the treatment of a process by the Linux kernel. Furthermore, tasks in the
same allocation will no longer be able to coordinate using signals, SystemV IPC
objects, or POSIX message queues. Operators should weigh the potential impact of an
upgrade on their applications against the security consequences inherent in using
the host namespaces.

This is the sole change for Nomad 1.0.3, intended to provide better process
isolation by default. An upcoming version of Nomad will include options for
configuring this behavior.

This change is limited to the `exec` and `java` driver plugins. It does not affect
the Nomad server. It only affects Nomad clients running on Linux that use the
`exec` or `java` drivers, or third-party driver plugins that rely on the shared
Nomad executor library.

Upgrading a Nomad client to 1.0.3 or 0.12.10 will not restart existing tasks.
As such, processes from existing `exec`/`java` tasks will need to be manually restarted
(using `alloc stop` or another mechanism) in order to be fully isolated.

## Nomad 1.0.2

#### Dynamic secrets trigger template changes on client restart

Nomad 1.0.2 changed the behavior of template `change_mode` triggers when a
client node restarts. In Nomad 1.0.1 and earlier, the first rendering of a
template after a client restart would not trigger the `change_mode`. For
dynamic secrets such as the Vault PKI secrets engine, this resulted in the
secret being updated without the task being restarted or signalled. When the
secret's lease expired at some later time, the task workload might fail
because of the stale secret. For example, a web server's SSL certificate would
be expired and browsers would be unable to connect.

In Nomad 1.0.2, when a client node is restarted, any task with Vault secrets
that are generated or have expired will have its `change_mode` triggered. If
`change_mode = "restart"`, this will result in the task being restarted, to
avoid the task failing unexpectedly at some point in the future. This change
only impacts tasks using dynamic Vault secrets engines such as [PKI][pki], or
when secrets are rotated. Secrets that don't change in Vault will not trigger
a `change_mode` on client restart.

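
As an illustration of the kind of task this affects, the following is a minimal sketch of a job fragment that renders a Vault PKI certificate with `change_mode = "restart"`. The task name, image, Vault policy, PKI mount path, role, and common name are all placeholders assumed for the example, not values taken from this guide.

```hcl
# Hypothetical task fragment: every name below is a placeholder.
task "web" {
  driver = "docker"

  config {
    image = "example/web:1.0"
  }

  vault {
    policies = ["pki-web"] # assumed policy name
  }

  template {
    # Requests a short-lived certificate from an assumed PKI mount and role.
    data = <<EOH
{{ with secret "pki/issue/web" "common_name=web.example.internal" "ttl=24h" }}
{{ .Data.certificate }}
{{ .Data.private_key }}
{{ end }}
EOH

    destination = "secrets/web.pem"

    # With Nomad 1.0.2, a certificate issued during the first render after a
    # client restart triggers this change_mode, so the task is restarted.
    change_mode = "restart"
  }
}
```

Tasks for which a restart on client restart is too disruptive may prefer `change_mode = "signal"` together with a `change_signal` that the workload handles.
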
## Nomad 1.0.1

#### Envoy worker threads

Nomad v1.0.0 changed the default behavior around the number of worker threads
created by Envoy when used as a sidecar for Consul Connect. In Nomad
v1.0.1, the same default setting of [`--concurrency=1`][envoy_concurrency] is set for Envoy when used
as a Connect gateway. As before, the [`meta.connect.proxy_concurrency`][proxy_concurrency]
property can be set in client configuration to override the default value.

## Nomad 1.0.0

### HCL2 for Job specification

Nomad v1.0.0 adopts HCL2 for parsing the job spec. HCL2 extends HCL with more
expression and reuse support, but enforces a stricter schema for HCL blocks
(a.k.a. stanzas). See [HCL](/docs/job-specification/hcl2) for more details.

### Signal used when stopping Docker tasks

When stopping tasks running with the Docker task driver, Nomad documents that a
`SIGTERM` will be issued (unless configured with `kill_signal`). However, recent
versions of Nomad would issue `SIGINT` instead. Starting with Nomad v1.0.0,
`SIGTERM` is once again sent by default when stopping Docker tasks.

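
For workloads that came to depend on receiving `SIGINT`, a minimal sketch of pinning the signal explicitly with the task-level `kill_signal` parameter is shown here. The task name and image are placeholders assumed for the example.

```hcl
# Hypothetical task fragment: name and image are placeholders.
task "web" {
  driver = "docker"

  config {
    image = "example/web:1.0"
  }

  # Keep sending SIGINT on stop, matching what pre-1.0.0 versions did in
  # practice, instead of the restored SIGTERM default.
  kill_signal = "SIGINT"
}
```
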
### Deprecated metrics have been removed

Nomad v0.7.0 added support for tagged metrics and deprecated untagged metrics.
There was support for configuring backwards-compatible metrics. This support has
been removed with v1.0.0, and all metrics will be emitted with tags.

### Null characters in region, datacenter, job name/ID, task group name, and task names

Starting with Nomad v1.0.0, jobs will fail validation if any of the following
contain a null character: the job ID or name, the task group name, or the task
name. Any jobs with a null character in one of these fields should be modified
before upgrading to v1.0.0. Similarly, client and server config validation will
prohibit either the region or the datacenter from containing null characters.

### EC2 CPU characteristics may be different

Starting with Nomad v1.0.0, the AWS fingerprinter uses data derived from the
official AWS EC2 API to determine default CPU performance characteristics,
including core count and core speed. This data should be accurate for each
instance type per region. Previously, Nomad used a hand-made lookup table that
was not region aware and may have contained inaccurate or incomplete data. As
part of this change, the AWS fingerprinter no longer sets the `cpu.modelname`
attribute.

As before, `cpu_total_compute` can be used to override the discovered CPU
resources available to the Nomad client.

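
If the fingerprinted values do not match expectations after the upgrade, the override can be sketched as follows. The figure is an assumed example (4 cores at 2500 MHz), not a recommendation for any particular instance type.

```hcl
# Nomad client agent configuration (sketch).
client {
  enabled = true

  # Total compute in MHz. Assumed example: 4 cores x 2500 MHz.
  cpu_total_compute = 10000
}
```
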
### Inclusive language

Starting with Nomad v1.0.0, the terms `blacklist` and `whitelist` have been
deprecated in client configuration and driver configuration. The existing
configuration values are still permitted but will be removed in a future version
of Nomad. The specific configuration values replaced are:

- Client `driver.blacklist` is replaced with `driver.denylist`.

- Client `driver.whitelist` is replaced with `driver.allowlist`.

- Client `env.blacklist` is replaced with `env.denylist`.

- Client `fingerprint.blacklist` is replaced with `fingerprint.denylist`.

- Client `fingerprint.whitelist` is replaced with `fingerprint.allowlist`.

- Client `user.blacklist` is replaced with `user.denylist`.

- Client `template.function_blacklist` is replaced with
  `template.function_denylist`.

- Docker driver `docker.caps.whitelist` is replaced with
  `docker.caps.allowlist`.

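
The renames above might look like this in a client configuration file. This is a sketch that assumes the dotted keys are set through the client `options` block and the template setting through the client `template` block; the values themselves are illustrative only.

```hcl
# Nomad client agent configuration (sketch with illustrative values).
client {
  enabled = true

  options {
    # Previously "driver.whitelist".
    "driver.allowlist" = "docker,exec"

    # Previously "user.blacklist".
    "user.denylist" = "root,Administrator"
  }

  template {
    # Previously function_blacklist.
    function_denylist = ["plugin"]
  }
}
```
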
### Consul Connect

Nomad 1.0's Consul Connect integration works best with Consul 1.9 or later. The
ideal upgrade path is:

1. Create a new Nomad client image with Nomad 1.0 and Consul 1.9 or later.
2. Add new hosts based on the image.
3. [Drain][drain-cli] and shut down old Nomad client nodes.

While in-place upgrades and older versions of Consul are supported by Nomad 1.0,
Envoy proxies will drop and stop accepting connections while the Nomad agent is
restarting. Nomad 1.0 with Consul 1.9 does not have this limitation.

#### Envoy proxy versions

Nomad v1.0.0 changes the behavior around the selection of Envoy version used for
Connect sidecar proxies. Previously, Nomad always defaulted to Envoy v1.11.2 if
neither the `meta.connect.sidecar_image` parameter nor the `sidecar_task` stanza
was explicitly configured. Likewise, the same version of Envoy would be used for
Connect ingress gateways if `meta.connect.gateway_image` was unset. Starting
with Nomad v1.0.0, each Nomad Client will query Consul for a list of supported
Envoy versions. Nomad will make use of the latest version of Envoy supported by
the Consul agent when launching Envoy as a Connect sidecar proxy. If the version
of the Consul agent is older than v1.7.8, v1.8.4, or v1.9.0, Nomad will fall back
to the v1.11.2 version of Envoy. As before, if the `meta.connect.sidecar_image`,
`meta.connect.gateway_image`, or `sidecar_task` stanza are set, those settings
take precedence.

When upgrading Nomad Clients from a previous version to v1.0.0 and above, it is
recommended to also upgrade the Consul agents to v1.7.8, v1.8.4, v1.9.0, or
newer. Upgrading Nomad and Consul to versions that support the new behavior
while also doing a full [node drain][] at the time of the upgrade for each node
will ensure Connect workloads are properly rescheduled onto nodes in such a way
that the Nomad Clients, Consul agents, and Envoy sidecar tasks maintain
compatibility with one another.

#### Envoy worker threads

Nomad v1.0.0 changes the default behavior around the number of worker threads
created by the Envoy sidecar proxy when using Consul Connect. Previously, the
Envoy [`--concurrency`][envoy_concurrency] argument was left unset, which caused
Envoy to spawn as many worker threads as logical cores available on the CPU. The
`--concurrency` value now defaults to `1` and can be configured by setting the
[`meta.connect.proxy_concurrency`][proxy_concurrency] property in client
configuration.

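
A minimal sketch of raising the worker thread count on a client follows. It assumes the property is written as a quoted key inside the client `meta` block, and the value `2` is only an example.

```hcl
# Nomad client agent configuration (sketch).
client {
  enabled = true

  meta {
    # Assumed example value: allow Envoy proxies launched on this client to
    # run two worker threads instead of the default of one.
    "connect.proxy_concurrency" = "2"
  }
}
```
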
## Nomad 0.12.8

### Docker volume mounts

Nomad 0.12.8 includes security fixes for the handling of Docker volume mounts:

- The `docker.volumes.enabled` flag now defaults to `false` as documented.

- Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
  could mount arbitrary locations from the client host. The
  `docker.volumes.enabled` configuration will now disable Docker mounts with
  type "volume" when set to `false` (the default).

This change impacts Docker jobs that use `mounts` with type "volume", as shown
below. This job will fail when placed unless `docker.volumes.enabled = true`.

```hcl
mounts = [
  {
    type = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o = "ro,bind"
            type = "ext4"
          }
        ]
      }
    }
  }
]
```

## Nomad 0.12.6

### Artifact and Template Paths

Nomad 0.12.6 includes security fixes for privilege escalation vulnerabilities
in handling of job `template` and `artifact` stanzas:

- The `template.source` and `template.destination` fields are now protected by
  the file sandbox introduced in 0.9.6. These paths are now restricted to fall
  inside the task directory by default. An operator can opt out of this
  protection with the [`template.disable_file_sandbox`][] field in the client
  configuration.

- The paths for `template.source`, `template.destination`, and
  `artifact.destination` are validated on job submission to ensure the paths do
  not escape the file sandbox. It was possible to use interpolation to bypass
  this validation. The client now interpolates the paths before checking if they
  are in the file sandbox.

~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.12.6, the
`template.destination` and `artifact.destination` paths do not support
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
v0.12.9. To work around the bug, use a relative path.

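
The relative-path workaround can be sketched as follows. The task name, command, artifact URL, and file names are hypothetical; the point is only that both destinations are written relative to the task directory rather than through an interpolated absolute path.

```hcl
# Hypothetical task fragment: URL, command, and file names are placeholders.
task "app" {
  driver = "exec"

  config {
    command = "local/app/run"
  }

  artifact {
    source = "https://example.com/app.tar.gz"

    # Relative path, instead of "${NOMAD_TASK_DIR}/app".
    destination = "local/app"
  }

  template {
    source = "local/app/config.tpl"

    # Relative path, instead of "${NOMAD_SECRETS_DIR}/config.env".
    destination = "secrets/config.env"
  }
}
```
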
## Nomad 0.12.0

### `mbits` and Task Network Resource deprecation

Starting in Nomad 0.12.0 the `mbits` field of the network resource block has
been deprecated and is no longer considered when making scheduling decisions.
This is in part because we felt that `mbits` didn't accurately account for
network bandwidth as a resource.

Additionally, the use of the `network` block inside of a task's `resource` block
is also deprecated. Users are advised to move their `network` block to the
`group` block. Recent networking features have only been added to group-based
network configuration. If any use case or feature which was available with task
network resource is not fulfilled with group network configuration, please open
an issue detailing the missing capability.

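
A minimal sketch of the group-level form is shown here, with the `network` block declared on the group and no `mbits` field. The group, task, and port names, the image, and the resource figures are placeholders assumed for the example.

```hcl
# Hypothetical group fragment: names, image, and sizes are placeholders.
group "web" {
  # Declared on the group rather than inside the task's resources block.
  network {
    port "http" {}
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0"
      ports = ["http"]
    }

    resources {
      cpu    = 500
      memory = 256
    }
  }
}
```
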
### Enterprise Licensing

Enterprise binaries for Nomad are now publicly available via
[releases.hashicorp.com](https://releases.hashicorp.com/nomad/). By default all
enterprise features are enabled for 6 hours. During that time enterprise users
should apply their license with the [`nomad license put ...`](/docs/commands/license/put) command.

Once the 6 hour demonstration period expires, Nomad will shut down. If restarted,
Nomad will shut down in a very short amount of time unless a valid license is
applied.

~> **Warning:** Due to a [bug][gh-8457] in Nomad v0.12.0, existing clusters
that are upgraded will **not** have 6 hours to apply a license. The minimal
grace period should be sufficient to apply a valid license, but enterprise
users are encouraged to delay upgrading until Nomad v0.12.1 is released and
fixes the issue.

### Docker access host filesystem

Nomad 0.12.0 disables Docker tasks' access to the host filesystem by default.
Prior to Nomad 0.12, Docker tasks could mount and then manipulate any host file,
which posed a security risk.

Operators now must explicitly allow tasks to access the host filesystem. [Host
Volumes](/docs/configuration/client#host_volume-stanza) provide fine-grained
access to individual paths.

To restore pre-0.12.0 behavior, you can enable [Docker
`volume`](/docs/drivers/docker#enabled-1) to allow binding host paths, by adding
the following to the Nomad client config file:

```hcl
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
```

### QEMU images

Nomad 0.12.0 restricts the paths from which QEMU tasks can load an image. A QEMU
task may still download an image to the allocation directory and load it from
there, but images outside the allocation directories must be explicitly allowed
by operators in the client agent configuration file.

For example, you may allow loading QEMU images from `/mnt/qemu-images` by
adding the following to the agent configuration file:

```hcl
plugin "qemu" {
  config {
    image_paths = ["/mnt/qemu-images"]
  }
}
```

## Nomad 0.11.7

### Docker volume mounts

Nomad 0.11.7 includes a security fix for the handling of Docker volume
mounts. Docker driver mounts of type "volume" (but not "bind") were not
sandboxed and could mount arbitrary locations from the client host. The
`docker.volumes.enabled` configuration will now disable Docker mounts with
type "volume" when set to `false`.

This change impacts Docker jobs that use `mounts` with type "volume", as
shown below. This job will fail when placed unless `docker.volumes.enabled = true`.

```hcl
mounts = [
  {
    type = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o = "ro,bind"
            type = "ext4"
          }
        ]
      }
    }
  }
]
```

## Nomad 0.11.5

### Artifact and Template Paths

Nomad 0.11.5 includes backported security fixes for privilege escalation
vulnerabilities in handling of job `template` and `artifact` stanzas:

- The `template.source` and `template.destination` fields are now protected by
  the file sandbox introduced in 0.9.6. These paths are now restricted to fall
  inside the task directory by default. An operator can opt out of this
  protection with the
  [`template.disable_file_sandbox`](/docs/configuration/client#template-parameters)
  field in the client configuration.

- The paths for `template.source`, `template.destination`, and
  `artifact.destination` are validated on job submission to ensure the paths
  do not escape the file sandbox. It was possible to use interpolation to
  bypass this validation. The client now interpolates the paths before
  checking if they are in the file sandbox.

~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.11.5, the
`template.destination` and `artifact.destination` paths do not support
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
v0.11.6. To work around the bug, use a relative path.

## Nomad 0.11.3

Nomad 0.11.3 fixes a critical bug causing the Nomad agent to become
unresponsive. The issue is due to a [Go 1.14.1 runtime
bug](https://github.com/golang/go/issues/38023) and affects Nomad 0.11.1 and
0.11.2.

## Nomad 0.11.2

### Scheduler Scoring Changes

Prior to Nomad 0.11.2 the scheduler algorithm used a [node's reserved
resources][reserved] incorrectly during scoring. The result of this bug was that
scoring was biased in favor of nodes with reserved resources vs nodes without
reserved resources.

Placements will be more correct but slightly different in v0.11.2 vs earlier
versions of Nomad. Operators do _not_ need to take any actions as the impact of
the bug fix will only minimally affect scoring.

Feasibility (whether a node is capable of running a job at all) is _not_
affected.

### Periodic Jobs and Daylight Saving Time

Nomad 0.11.2 fixed a long-standing bug affecting periodic jobs that are
scheduled to run during Daylight Saving Time transitions.

Nomad 0.11.2 provides a more defined behavior: Nomad evaluates the cron
expression with respect to the specified time zone during the transition. A
2:30am nightly job with `America/New_York` time zone will not run on the day
daylight saving time starts; similarly, a 1:30am nightly job will run twice on
the day daylight saving time ends. See the [Daylight Saving Time][dst]
documentation for details.

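
The 2:30am case described above corresponds to a `periodic` stanza like the following sketch, which would sit inside a batch job. The `prohibit_overlap` setting is an addition assumed for the example and is not required by the behavior described.

```hcl
# Periodic stanza of a batch job (sketch).
periodic {
  # 02:30 every day, evaluated in the time zone below. On the day daylight
  # saving time starts, 02:30 local time does not exist, so this launch is
  # skipped.
  cron      = "30 2 * * *"
  time_zone = "America/New_York"

  # Assumed extra: avoid starting a new run while the previous one is active.
  prohibit_overlap = true
}
```

Jobs that must run exactly once per day may be easier to reason about when scheduled outside the transition hours or in a time zone without daylight saving time, such as UTC.
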
## Nomad 0.11.0

### client.template: `vault_grace` deprecation

Nomad 0.11.0 updates
[consul-template](https://github.com/hashicorp/consul-template) to v0.24.1. This
library deprecates the [`vault_grace`][vault_grace] option for templating
included in Nomad. The feature has been ignored since Vault 0.5 and as long as
you are running a more recent version of Vault, you can safely remove
`vault_grace` from your Nomad jobs.

### Rkt Task Driver Removed

The `rkt` task driver has been deprecated and removed from Nomad. While the code
is available in an external repository,
<https://github.com/hashicorp/nomad-driver-rkt>, it will not be maintained as
`rkt` is [no longer being developed upstream](https://github.com/rkt/rkt). We
encourage all `rkt` users to find a new task driver as soon as possible.

## Nomad 0.10.8

### Docker volume mounts

Nomad 0.10.8 includes a security fix for the handling of Docker volume mounts.
Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
could mount arbitrary locations from the client host. The
`docker.volumes.enabled` configuration will now disable Docker mounts with type
"volume" when set to `false`.

This change impacts Docker jobs that use `mounts` with type "volume", as shown
below. This job will fail when placed unless `docker.volumes.enabled = true`.

```hcl
mounts = [
  {
    type = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o = "ro,bind"
            type = "ext4"
          }
        ]
      }
    }
  }
]
```

## Nomad 0.10.6

### Artifact and Template Paths

Nomad 0.10.6 includes backported security fixes for privilege escalation
vulnerabilities in handling of job `template` and `artifact` stanzas:

- The `template.source` and `template.destination` fields are now protected by
  the file sandbox introduced in 0.9.6. These paths are now restricted to fall
  inside the task directory by default. An operator can opt out of this
  protection with the
  [`template.disable_file_sandbox`](/docs/configuration/client#template-parameters)
  field in the client configuration.

- The paths for `template.source`, `template.destination`, and
  `artifact.destination` are validated on job submission to ensure the paths
  do not escape the file sandbox. It was possible to use interpolation to
  bypass this validation. The client now interpolates the paths before
  checking if they are in the file sandbox.

~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.10.6, the
`template.destination` and `artifact.destination` paths do not support
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
v0.10.7. To work around the bug, use a relative path.

## Nomad 0.10.4

### Same-Node Scheduling Penalty Removed

Nomad 0.10.4 includes a fix to the scheduler that removes the same-node penalty
for allocations that have not previously failed. In earlier versions of Nomad,
the node where an allocation was running was penalized from receiving updated
versions of that allocation, resulting in a higher chance of the allocation
being placed on a new node. This was changed so that the penalty only applies to
nodes where the previous allocation has failed or been rescheduled, to reduce
the risk of correlated failures on a host. Scheduling weighs a number of
factors, but this change should reduce movement of allocations that are being
updated from a healthy state. You can view the placement metrics for an
allocation with `nomad alloc status -verbose`.

### Additional Environment Variable Filtering

Nomad will by default prevent certain environment variables set in the client
process from being passed along into launched tasks. The `CONSUL_HTTP_TOKEN`
environment variable has been added to the default list. More information can
be found in the `env.blacklist` [configuration](/docs/configuration/client#env-blacklist).

## Nomad 0.10.3

### mTLS Certificate Validation

Nomad 0.10.3 includes a fix for a privilege escalation vulnerability in
validating TLS certificates for RPC with mTLS. Nomad RPC endpoints validated
that TLS client certificates had not expired and were signed by the same CA as
the Nomad node, but did not correctly check the certificate's name for the role
and region as described in the [Securing Nomad with TLS][tls-guide] guide. This
allows trusted operators with a client certificate signed by the CA to send RPC
calls as a Nomad client or server node, bypassing access control and accessing
any secrets available to a client.

Nomad clusters configured for mTLS following the [Securing Nomad with
TLS][tls-guide] guide or the [Vault PKI Secrets Engine
Integration][tls-vault-guide] guide should already have certificates that will
pass validation. Before upgrading to Nomad 0.10.3, operators using mTLS with
`verify_server_hostname = true` should confirm that the common name or SAN of
all Nomad client node certs is `client.<region>.nomad`, and that the common name
or SAN of all Nomad server node certs is `server.<region>.nomad`.

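
For reference, the agent `tls` stanza that this check applies to looks roughly like the sketch below. The file paths are placeholders assumed for the example; the certificate referenced by `cert_file` is the one whose common name or SAN should read `server.<region>.nomad` on servers and `client.<region>.nomad` on clients.

```hcl
# Nomad agent configuration (sketch with placeholder paths).
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  # With this enabled, the role and region in the certificate name are
  # checked, so certificates must follow the naming described above.
  verify_server_hostname = true
  verify_https_client    = true
}
```
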
### Connection Limits Added

Nomad 0.10.3 introduces the [limits][] agent configuration parameters for
mitigating denial of service attacks from users who are not authenticated via
mTLS. The default limits stanza is:

```hcl
limits {
  https_handshake_timeout = "5s"
  http_max_conns_per_client = 100
  rpc_handshake_timeout = "5s"
  rpc_max_conns_per_client = 100
}
```

If your Nomad agent's endpoints are protected from unauthenticated users via
other mechanisms, these limits may be safely disabled by setting them to `0`.

However, the defaults were chosen to be safe for a wide variety of Nomad
deployments and may protect against accidental abuses of the Nomad API that
could cause unintended resource usage.

## Nomad 0.10.2

### Preemption Panic Fixed

Nomad 0.9.7 and 0.10.2 fix a [server-crashing bug][gh-6787] present in scheduler
preemption since 0.9.0. Users unable to immediately upgrade Nomad can [disable
preemption][preemption-api] to avoid the panic.

### Dangling Docker Container Cleanup

Nomad 0.10.2 addresses an issue occurring in heavily loaded clients, where
containers are started without being properly managed by Nomad. Nomad 0.10.2
introduced a reaper that detects and kills such containers.

Operators may opt to run the reaper in dry-run mode or disable it through
client configuration.

For more information, see [Docker Dangling containers][dangling-containers].

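
As a sketch of what that client configuration can look like, the following runs the reaper in dry-run mode so that it only logs the containers it would otherwise kill. The period and grace values are written out as assumed examples; consult the linked driver documentation for the authoritative defaults.

```hcl
# Nomad client agent configuration (sketch).
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Set to false to turn the reaper off entirely.
        enabled = true

        # Log dangling containers without killing them.
        dry_run = true

        # Assumed example values for scan interval and grace period.
        period         = "5m"
        creation_grace = "5m"
      }
    }
  }
}
```
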
## Nomad 0.10.0

### Deployments

Nomad 0.10 enables rolling deployments for service jobs by default and adds a
default update stanza when a service job is created or updated. This does not
affect jobs with an update stanza.

In pre-0.10 releases, when updating a service job without an update stanza, all
existing allocations are stopped while new allocations start up, and this may
cause a service degradation or an outage. You can regain this behavior and
disable deployments by setting `max_parallel` to 0.

For more information, see [`update` stanza][update].

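
A minimal sketch of opting out follows, placed at the job or group level of a service job.

```hcl
# Restores the pre-0.10 behavior: no rolling deployment is created and all
# existing allocations are replaced at once when the job is updated.
update {
  max_parallel = 0
}
```

Jobs that should keep rolling deployments but with explicit settings can instead declare their own `update` stanza, since jobs that already have one are not affected by the new default.
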
## Nomad 0.9.5

### Template Rendering

Nomad 0.9.5 includes security fixes for privilege escalation vulnerabilities in
handling of job `template` stanzas:

- The client host's environment variables are now cleaned before rendering the
  template. If a template includes the `env` function, the job should include an
  [`env`](/docs/job-specification/env) stanza to allow access to the variable in
  the template.

- The `plugin` function is no longer permitted by default and will raise an
  error if used in a template. An operator can opt in to permitting this
  function with the new
  [`template.function_blacklist`](/docs/configuration/client#template-parameters)
  field in the client configuration.

- The `file` function has been changed to restrict paths to fall inside the task
  directory by default. Paths that used the `NOMAD_TASK_DIR` environment
  variable to prefix file paths should work unchanged. Relative paths or
  symlinks that point outside the task directory will raise an error. An
  operator can opt out of this protection with the new
  [`template.disable_file_sandbox`](/docs/configuration/client#template-parameters)
  field in the client configuration.

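
The first item above can be sketched as follows: the variable read by the template's `env` function is declared in the task's own `env` stanza rather than inherited from the client host. The task name, command, variable name, and file path are hypothetical.

```hcl
# Hypothetical task fragment: names and values are placeholders.
task "app" {
  driver = "exec"

  config {
    command = "/usr/local/bin/app"
  }

  # Declares the variable so the template below can read it.
  env {
    APP_REGION = "us-east-1"
  }

  template {
    data        = "region={{ env \"APP_REGION\" }}\n"
    destination = "local/app.conf"
  }
}
```
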
## Nomad 0.9.0

### Preemption

Nomad 0.9 adds preemption support for system jobs. If a system job is submitted
that has a higher priority than other running jobs on the node, and the node
does not have capacity remaining, Nomad may preempt those lower priority
allocations to place the system job. See [preemption][preemption] for more
details.

### Task Driver Plugins

All task drivers have become [plugins][plugins] in Nomad 0.9.0. There are two
user-visible differences between 0.8 and 0.9 drivers:

- [LXC][lxc] is now community supported and distributed independently.

- Task driver [`config`][task-config] stanzas are no longer validated by
  the [`nomad job validate`][validate] command. This is a regression that will
  be fixed in a future release.

There is a new method for client driver configuration options, but existing
`client.options` settings are supported in 0.9. See [plugin
configuration][plugin-stanza] for details.

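
To illustrate the two configuration methods, the sketch below enables the `raw_exec` driver first through the older `client.options` form and then through a `plugin` stanza. The choice of `raw_exec` is only an example; a real configuration would use one form or the other.

```hcl
# Pre-0.9 style, still supported in 0.9.
client {
  options {
    "driver.raw_exec.enable" = "1"
  }
}

# 0.9 plugin style for the same setting.
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```
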
#### LXC

LXC is now an external plugin and must be installed separately. See [the LXC
driver's documentation][lxc] for details.

### Structured Logging

Nomad 0.9.0 switches to structured logging. Any log processing on the pre-0.9
log output will need to be updated to match the structured output.

Structured log lines have the format:

```
# <Timestamp> [<Level>] <Component>: <Message>: <KeyN>=<ValueN> ...

2019-01-29T05:52:09.221Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
```

Values containing whitespace will be quoted:

```
... starting plugin: task=redis args="[/opt/gopath/bin/nomad logmon]"
```

### HCL2 Transition

Nomad 0.9.0 begins a transition to [HCL2][hcl2], the next version of the
HashiCorp configuration language. While Nomad has begun integrating HCL2, users
will need to continue to use HCL1 in Nomad 0.9.0 as the transition is
incomplete.

If you interpolate variables in your [`task.config`][task-config] containing
consecutive dots in their name, you will need to change your job specification
to use the `env` map. See the following example:

```hcl
env {
  # Note the multiple consecutive dots
  image...version = "3.2"

  # Valid in both v0.8 and v0.9
  image.version = "3.2"
}

# v0.8 task config stanza:
task {
  driver = "docker"
  config {
    image = "redis:${image...version}"
  }
}

# v0.9 task config stanza:
task {
  driver = "docker"
  config {
    image = "redis:${env["image...version"]}"
  }
}
```

This only affects users who interpolate unusual variables with multiple
consecutive dots in their task `config` stanza. All other interpolation is
unchanged.

Since HCL2 uses dotted object notation for interpolation, users should transition
away from variable names with multiple consecutive dots.

### Downgrading clients

Due to the large refactor of the Nomad client in 0.9, downgrading to a previous
version of the client after upgrading it to Nomad 0.9 is not supported. To
downgrade safely, users should erase the Nomad client's data directory.

### `port_map` Environment Variable Changes

Before Nomad 0.9.0 ports mapped via a task driver's `port_map` stanza could be
interpolated via the `NOMAD_PORT_<label>` environment variables.

However, in Nomad 0.9.0 no parameters in a driver's `config` stanza, including
its `port_map`, are available for interpolation. This means
`{{ env NOMAD_PORT_<label> }}` in a `template` stanza or
`HTTP_PORT = "${NOMAD_PORT_http}"` in an `env` stanza will now interpolate the
_host_ ports, not the container's.

Nomad 0.10 introduced Task Group Networking which natively supports port mapping
without relying on task driver specific `port_map` fields. The
[`to`](/docs/job-specification/network#to) field on group network port stanzas
will be interpolated properly. Please see the
[`network`](/docs/job-specification/network/) stanza documentation for details.

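
A minimal sketch of the group-level alternative available from Nomad 0.10 is shown here. The group name, port label, network mode, and port number are assumptions made for the example.

```hcl
# Hypothetical group fragment: names and port number are placeholders.
group "web" {
  network {
    mode = "bridge"

    # A dynamic host port is mapped to port 8080 inside the group's network
    # namespace, replacing a driver-specific port_map entry.
    port "http" {
      to = 8080
    }
  }
}
```
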
## Nomad 0.8.0
|
|
|
|
### Raft Protocol Version Compatibility
|
|
|
|
When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
|
|
to set the [`raft_protocol`](/docs/configuration/server#raft_protocol) option in
|
|
their `server` stanza to 1 in order to maintain backwards compatibility with the
|
|
old servers during the upgrade. After the servers have been migrated to version
|
|
0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted to match
|
|
the default.
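
A minimal sketch of the `server` stanza while older servers are still present
in the cluster; `enabled = true` is shown only for context, and the rest of the
agent configuration is assumed to be unchanged.

```hcl
server {
  enabled = true

  # Pin the Raft protocol while pre-0.8.0 servers are still in the cluster.
  # Raise this to 2 once every server runs 0.8.0, then restart the servers.
  raft_protocol = 1
}
```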
|
|
|
|
The Raft protocol must be stepped up in this way; only adjacent version numbers
|
|
are compatible (for example, version 1 cannot talk to version 3). Here is a
|
|
table of the Raft Protocol versions supported by each Nomad version:
|
|
|
|
<table>
  <thead>
    <tr>
      <th>Version</th>
      <th>Supported Raft Protocols</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0.6 and earlier</td>
      <td>0</td>
    </tr>
    <tr>
      <td>0.7</td>
      <td>1</td>
    </tr>
    <tr>
      <td>0.8 and later</td>
      <td>1, 2, 3</td>
    </tr>
  </tbody>
</table>
|
|
|
|
In order to enable all
|
|
[Autopilot](https://learn.hashicorp.com/tutorials/nomad/autopilot) features, all
|
|
servers in a Nomad cluster must be running with Raft protocol version 3 or
|
|
later.
|
|
|
|
#### Upgrading to Raft Protocol 3
|
|
|
|
This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and
higher. Raft protocol version 3 requires Nomad running 0.8.0 or newer on all
servers in order to work. See [Raft Protocol Version Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility)
for more details. Also, the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual Recovery Using peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.
|
|
|
|
Please note that the Raft protocol is different from Nomad's internal protocol
|
|
as shown in commands like `nomad server members`. To see the version of the Raft
|
|
protocol in use on each server, use the `nomad operator raft list-peers`
|
|
command.
|
|
|
|
The easiest way to upgrade servers is to have each server leave the cluster,
|
|
upgrade its `raft_protocol` version in the `server` stanza, and then add it
|
|
back. Make sure the new server joins successfully and that the cluster is stable
|
|
before rolling the upgrade forward to the next server. It's also possible to
|
|
stand up a new set of servers, and then slowly stand down each of the older
|
|
servers in a similar fashion.
|
|
|
|
When using Raft protocol version 3, servers are identified by their `node-id`
instead of their IP address when Nomad makes changes to its internal Raft quorum
configuration. This means that once a cluster has been upgraded with servers all
running Raft protocol version 3, it will no longer allow servers running any
older Raft protocol versions to be added. If running a single Nomad server,
restarting it in-place will result in that server not being able to elect itself
as a leader. To avoid this, either set the Raft protocol back to 2, or use
[Manual Recovery Using peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
to map the server to its node ID in the Raft quorum configuration.
|
|
|
|
### Node Draining Improvements
|
|
|
|
Node draining via the [`node drain`][drain-cli] command or the
[drain API][drain-api] has been substantially changed in Nomad 0.8. In Nomad
0.7.1 and earlier, draining a node would immediately stop all allocations on the
node being drained. Nomad 0.8 now supports a [`migrate`][migrate] stanza in job
specifications to control how many allocations may be migrated at once. Existing
jobs use the default `migrate` settings.
|
|
|
|
The `drain` command now blocks until the drain completes. To get the Nomad 0.7.1
and earlier drain behavior, use the command
`nomad node drain -enable -force -detach <node-id>`.

See the [`migrate` stanza documentation][migrate] and
[Decommissioning Nodes guide](https://learn.hashicorp.com/tutorials/nomad/node-drain) for details.
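
For orientation, a `migrate` stanza might look like the sketch below. The
values shown are believed to match the documented defaults, but treat them as
an assumption and check the `migrate` stanza documentation for your version.

```hcl
migrate {
  # Number of allocations that may be migrated at the same time.
  max_parallel = 1

  # How allocation health is determined before the next migration proceeds.
  health_check     = "checks"
  min_healthy_time = "10s"
  healthy_deadline = "5m"
}
```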
|
|
|
|
### Periods in Environment Variable Names No Longer Escaped
|
|
|
|
_Applications which expect periods in environment variable names to be replaced
|
|
with underscores must be updated._
|
|
|
|
In Nomad 0.7, periods (`.`) in environment variable names were replaced with an
underscore in both the [`env`](/docs/job-specification/env) and
[`template`](/docs/job-specification/template) stanzas.

In Nomad 0.8, periods are _not_ replaced and will be included in environment
variables verbatim.
|
|
|
|
For example the following stanza:
|
|
|
|
```text
env {
  registry.consul.addr = "${NOMAD_IP_http}:8500"
}
```
|
|
|
|
In Nomad 0.7, this would be exposed to the task as
`registry_consul_addr=127.0.0.1:8500`. In Nomad 0.8, it appears exactly
as specified: `registry.consul.addr=127.0.0.1:8500`.
|
|
|
|
### Client APIs Unavailable on Older Nodes
|
|
|
|
Because Nomad 0.8 uses a new RPC mechanism to route node-specific APIs like
[`nomad alloc fs`](/docs/commands/alloc/fs) through servers to the node,
0.8 CLIs cannot use these commands against clients older than 0.8.
|
|
|
|
To access these commands on older clients either continue to use a pre-0.8
|
|
version of the CLI, or upgrade all clients to 0.8.
|
|
|
|
### CLI Command Changes
|
|
|
|
Nomad 0.8 has changed the organization of CLI commands to be based on
|
|
subcommands. An example of this change is the change from `nomad alloc-status`
|
|
to `nomad alloc status`. All commands have been made to be backwards compatible,
|
|
but operators should update any usage of the old style commands to the new style
|
|
as the old style will be deprecated in future versions of Nomad.
|
|
|
|
### RPC Advertise Address
|
|
|
|
The behavior of the [advertised RPC address](/docs/configuration#rpc-1) has
changed so that it is only used to advertise the RPC address of servers to
client nodes. Server-to-server communication is done using the advertised Serf
address. Existing clusters should not be affected, but the advertised RPC
address may need to be updated to allow clients to connect over a NAT.
|
|
|
|
## Nomad 0.6.0
|
|
|
|
### Default `advertise` address changes
|
|
|
|
Prior to Nomad 0.6, when no `advertise` address was specified and Nomad's
`bind_addr` was loopback or `0.0.0.0`, Nomad attempted to resolve the local
hostname to use as an advertise address.
|
|
|
|
Many hosts cannot properly resolve their hostname, so Nomad 0.6 defaults
|
|
`advertise` to the first private IP on the host (e.g. `10.1.2.3`).
|
|
|
|
If you manually configure `advertise` addresses no changes are necessary.
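
A manually configured `advertise` stanza could look like the following sketch.
The address reuses the `10.1.2.3` example above and is purely illustrative.

```hcl
advertise {
  # Illustrative private address; substitute the address other agents
  # should use to reach this one.
  http = "10.1.2.3"
  rpc  = "10.1.2.3"
  serf = "10.1.2.3"
}
```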
|
|
|
|
### Nomad Clients
|
|
|
|
The change to the default advertised IP also affects clients that do not specify
which `network_interface` to use. If you have several routable IPs, configure
the client's [network interface](/docs/configuration/client#network_interface)
so that tasks bind to the correct address.
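
A minimal sketch of pinning the interface in the client configuration; the
interface name `eth0` is an assumption and will differ between hosts.

```hcl
client {
  enabled = true

  # Hypothetical interface name; pick the interface whose address tasks
  # should bind to.
  network_interface = "eth0"
}
```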
|
|
|
|
## Nomad 0.5.5
|
|
|
|
### Docker `load` changes
|
|
|
|
Nomad 0.5.5 has a backward incompatible change in the `docker` driver's
configuration. Prior to 0.5.5 the `load` configuration option accepted a list
of images to load; in 0.5.5 it has been changed to a single string. No
functionality was changed. Even if more than one item was specified prior to
0.5.5, only the first item was used.
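
The change amounts to replacing the list with a single string, roughly as
sketched below. The image name and the `redis.tar` file name are illustrative
assumptions.

```hcl
config {
  image = "redis"

  # Before 0.5.5 this was written as a list: load = ["redis.tar"]
  # From 0.5.5 onward it is a single string.
  load = "redis.tar"
}
```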
|
|
|
|
To do a zero-downtime deploy with jobs that use the `load` option:
|
|
|
|
- Upgrade servers to version 0.5.5 or later.
|
|
|
|
- Deploy new client nodes on the same version as the servers.
|
|
|
|
- Resubmit jobs with the `load` option fixed and a constraint to only run on
|
|
version 0.5.5 or later:
|
|
|
|
  ```hcl
  constraint {
    attribute = "${attr.nomad.version}"
    operator  = "version"
    value     = ">= 0.5.5"
  }
  ```
|
|
|
|
- Drain and shut down old client nodes.
|
|
|
|
### Validation changes
|
|
|
|
Due to internal job serialization and validation changes you may run into
|
|
issues using 0.5.5 command line tools such as `nomad run` and `nomad validate`
|
|
with 0.5.4 or earlier agents.
|
|
|
|
It is recommended you upgrade agents before or alongside your command line
|
|
tools.
|
|
|
|
## Nomad 0.4.0
|
|
|
|
Nomad 0.4.0 has backward incompatible changes in the logic for Consul
|
|
deregistration. When a Task which was started by Nomad v0.3.x is uncleanly shut
|
|
down, the Nomad 0.4 Client will no longer clean up any stale services. If an
|
|
in-place upgrade of the Nomad client to 0.4 prevents the Task from gracefully
|
|
shutting down and deregistering its Consul-registered services, the Nomad Client
|
|
will not clean up the remaining Consul services registered with the 0.3
|
|
Executor.
|
|
|
|
We recommend draining a node before upgrading to 0.4.0 and then re-enabling the
|
|
node once the upgrade is complete.
|
|
|
|
## Nomad 0.3.1
|
|
|
|
Nomad 0.3.1 removes artifact downloading from driver configurations and makes it
a first-class element of the task. As such, jobs will have to be rewritten in
the proper format and resubmitted to Nomad. Nomad clients will properly
re-attach to existing tasks, but job definitions must be updated before they can
be dispatched to clients running 0.3.1.
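
As an illustration of the new layout, the sketch below declares the download in
an `artifact` block on the task rather than inside the driver `config`. The
task name, URL, and command are hypothetical, and the exact options available
in 0.3.1 should be checked against that version's documentation.

```hcl
task "example" {
  driver = "exec"

  # The download is now declared on the task itself.
  artifact {
    source = "https://example.com/binary.tar.gz" # hypothetical URL
  }

  config {
    command = "binary" # hypothetical; depends on where the artifact lands
  }
}
```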
|
|
|
|
## Nomad 0.3.0
|
|
|
|
Nomad 0.3.0 has made several substantial changes to job files, including a new
`log` block and variable interpretation syntax (`${var}`), a modified `restart`
policy syntax, and minimum resources for tasks as well as validation. These
changes require a slight change to the default upgrade flow.
|
|
|
|
After upgrading the version of the servers, all previously submitted jobs must
|
|
be resubmitted with the updated job syntax using a Nomad 0.3.0 binary.
|
|
|
|
- All instances of `$var` must be converted to the new syntax of `${var}`
|
|
|
|
- All tasks must provide their required resources for CPU, memory and disk as
|
|
well as required network usage if ports are required by the task.
|
|
|
|
- Restart policies must be updated to indicate whether the task should restart
  on failure or fail, using `mode = "delay"` or `mode = "fail"` respectively.
|
|
|
|
- Service names that include periods will fail validation. To fix, remove any
|
|
periods from the service name before running the job.
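
The restart and resource requirements in the list above might be written as in
the following sketch. All values and the `http` port label are illustrative
assumptions, and the field names reflect the job syntax of that era as best
understood here.

```hcl
restart {
  attempts = 2
  interval = "1m"
  delay    = "15s"
  mode     = "delay" # or "fail" to stop restarting once attempts are used up
}

resources {
  cpu    = 500 # MHz
  memory = 256 # MB
  disk   = 300 # MB

  # Network usage is only required when the task needs ports.
  network {
    mbits = 10
    port "http" {}
  }
}
```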
|
|
|
|
After updating the servers and job files, Nomad clients can be upgraded by first
draining the node so no tasks are running on it. This can be verified by running
`nomad node status <node-id>` and confirming there are no tasks in the `running`
state. Once that is done, the client can be killed, the `data_dir` deleted, and
Nomad 0.3.0 launched.
|
|
|
|
[dangling-containers]: /docs/drivers/docker#dangling-containers
|
|
[drain-api]: /api-docs/nodes#drain-node
|
|
[drain-cli]: /docs/commands/node/drain
|
|
[dst]: /docs/job-specification/periodic#daylight-saving-time
|
|
[envoy_concurrency]: https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-concurrency
|
|
[gh-6787]: https://github.com/hashicorp/nomad/issues/6787
|
|
[gh-8457]: https://github.com/hashicorp/nomad/issues/8457
|
|
[gh-9148]: https://github.com/hashicorp/nomad/issues/9148
|
|
[hcl2]: https://github.com/hashicorp/hcl2
|
|
[limits]: /docs/configuration#limits
|
|
[lxc]: /docs/drivers/external/lxc
|
|
[migrate]: /docs/job-specification/migrate
|
|
[plugin-stanza]: /docs/configuration/plugin
|
|
[plugins]: /docs/drivers/external
|
|
[preemption-api]: /api-docs/operator#update-scheduler-configuration
|
|
[preemption]: /docs/internals/scheduling/preemption
|
|
[proxy_concurrency]: /docs/job-specification/sidecar_task#proxy_concurrency
|
|
[reserved]: /docs/configuration/client#reserved-parameters
|
|
[task-config]: /docs/job-specification/task#config
|
|
[tls-guide]: https://learn.hashicorp.com/tutorials/nomad/security-enable-tls
|
|
[tls-vault-guide]: https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad
|
|
[update]: /docs/job-specification/update
|
|
[validate]: /docs/commands/job/validate
|
|
[vault_grace]: /docs/job-specification/template
|
|
[node drain]: https://www.nomadproject.io/docs/upgrade#5-upgrade-clients
|
|
[`template.disable_file_sandbox`]: /docs/configuration/client#template-parameters
|
|
[pki]: https://www.vaultproject.io/docs/secrets/pki
|