---
layout: docs
page_title: Upgrade Guides
description: |-
  Specific versions of Nomad may have additional information about the upgrade
  process beyond the standard flow.
---
|
|
|
|
# Upgrade Guides
|
|
|
|
The [upgrading page](/nomad/docs/upgrade) covers the details of doing a standard
|
|
upgrade. However, specific versions of Nomad may have more details provided for
|
|
their upgrades as a result of new features or changed behavior. This page is
|
|
used to document those details separately from the standard upgrade flow.
|
|
|
|
## Nomad 1.6.0
|
|
|
|
#### Enterprise License Validation with BuildDate
|
|
|
|
Nomad Enterprise 1.6.0 now compares license `ExpirationTime` with the Nomad binary's `BuildDate`,
|
|
rather than comparing the sometimes more lenient license `TerminationTime` with `time.Now()`.
|
|
See the [licensing FAQ](/nomad/docs/v1.6.x/enterprise/license/faq) for more info,
|
|
but most relevant here is that you should run the new
|
|
[`nomad license inspect`](/nomad/docs/commands/license/inspect) command
|
|
before trying to upgrade your Enterprise servers to v1.6.0 or higher.
|
|
|
|
#### Job Evaluate API Endpoint Requires `submit-job` Instead of `read-job`
|
|
|
|
Nomad 1.6.0 updated the ACL capability requirement for the job evaluate
|
|
endpoint from `read-job` to `submit-job` to better reflect that this operation
|
|
writes state to Nomad. This endpoint is used by the `nomad job eval` CLI
|
|
command and so the ACL requirements changed for the command as well. Users that
|
|
called this endpoint or used this command using tokens with just the `read-job`
|
|
capability or the `read` policy must update their tokens to use the
|
|
`submit-job` capability or the `write` policy.
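
As a rough illustration, a namespace rule that grants the capability now needed by `nomad job eval` might look like the sketch below. The `default` namespace and the choice to also keep `read-job` are assumptions for the example, not requirements stated in this guide.

```hcl
# Hypothetical policy for tokens that call the job evaluate endpoint.
# The namespace name "default" is only an example.
namespace "default" {
  # read-job alone is no longer sufficient for job evaluate in 1.6.0;
  # submit-job is the capability the endpoint now checks.
  capabilities = ["read-job", "submit-job"]
}
```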
|
|
|
|
#### Exec Driver Requires New Capability for mlock
|
|
|
|
Nomad 1.6.0 updated the `exec` task driver to maintain the max memory locked
|
|
limit set by the host system. In earlier versions of Nomad this limit was
|
|
*unset* unintentionally.
|
|
|
|
In practice this means that `exec` tasks such as Vault which use the `mlock`
|
|
system call will now need to explicitly add the `ipc_lock` capability.
|
|
|
|
First [allow the `ipc_lock` capability in the Client
|
|
configuration][allow_caps_exec]:
|
|
|
|
```hcl
plugin "exec" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid",
      "kill", "mknod", "net_bind_service", "setfcap", "setgid", "setpcap",
      "setuid", "sys_chroot", "ipc_lock"]
  }
}
```
|
|
|
|
Then [add the `ipc_lock` capability to the exec task][cap_add_exec] that uses
|
|
`mlock`:
|
|
|
|
```hcl
task "vault" {
  driver = "exec"

  config {
    cap_add = ["ipc_lock"]

    # ... other task configuration
  }
}

# ... rest of jobspec
```
|
|
|
|
These additions are backward compatible with Nomad v1.5, so Clients and Jobs
|
|
should be updated prior to upgrading to Nomad v1.6.
|
|
|
|
See [#17780](https://github.com/hashicorp/nomad/issues/17780) for details.
|
|
|
|
#### Namespace ACL policies require a label
|
|
|
|
Nomad 1.6.0 does not allow ACL policies for namespaces without a label. Prior
|
|
to this version, ACL policies for namespaces were allowed to be defined
|
|
without a label, and the documented behavior in this case was that the policy
|
|
would be applied to the `default` namespace.
|
|
|
|
A bug in this logic caused the policy to be incorrectly applied to a different
|
|
namespace. For example, the policy below would be applied to a namespace called
|
|
`policy` instead of `default`.
|
|
|
|
```hcl
namespace {
  policy = "read"
}
```
|
|
|
|
To avoid further confusion and potential security incidents, this functionality
|
|
was removed and now all namespace policies are required to have a label.
|
|
|
|
Tokens currently attached to an invalid policy will stop working after the
|
|
upgrade, so you should fix invalid policies to have an explicit namespace label
|
|
before upgrading Nomad.
|
|
|
|
After the policies are fixed, the existing tokens with those policies will
|
|
continue to work and do not need to be regenerated.
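
For comparison, a corrected version of the policy above could look like the following sketch. It assumes the policy was meant to target the `default` namespace; substitute whichever namespace the policy was actually intended for.

```hcl
# The label ("default" here, as an assumed target) is now mandatory.
namespace "default" {
  policy = "read"
}
```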
|
|
|
|
#### Command `nomad tls cert create` flag `-cluster-region` deprecated
|
|
|
|
Nomad 1.6.0 deprecates the `-cluster-region` flag of the `nomad tls cert create`
command in favour of the standard `-region` flag. The `-cluster-region` flag
will be removed in Nomad 1.7.0.
|
|
|
|
#### 32-bit Intel Builds Deprecated
|
|
|
|
Starting with Nomad 1.6.0, HashiCorp will no longer release 32-bit Intel builds
|
|
of Nomad and Nomad Enterprise (the builds named `windows_386` and
|
|
`linux_386`). Bug fixes will continue to be backported to the 1.5.x and 1.4.x
|
|
versions so long as those major versions are still supported.
|
|
|
|
The 32-bit ARM build (`linux_arm` for the armhf architecture) is deprecated and
|
|
may be removed in a future major version of Nomad. The 32-bit ARM build is not
|
|
tested and may include bugs around platform-specific integer sizes. Using 64-bit
|
|
builds for small form-factor hosts such as the RaspberryPi is strongly
|
|
recommended.
|
|
|
|
## Nomad 1.5.7, 1.4.11
|
|
|
|
#### Namespace ACL policies require a label
|
|
|
|
Nomad 1.5.7 and 1.4.11 do not allow ACL policies for namespaces without a
|
|
label. Prior to these versions, ACL policies for namespaces were allowed to be
|
|
defined without a label, and the documented behavior in this case was that the
|
|
policy would be applied to the `default` namespace.
|
|
|
|
A bug in this logic caused the policy to be incorrectly applied to a different
|
|
namespace. For example, the policy below would be applied to a namespace called
|
|
`policy` instead of `default`.
|
|
|
|
```hcl
namespace {
  policy = "read"
}
```
|
|
|
|
To avoid further confusion and potential security incidents, this functionality
|
|
was removed and now all namespace policies are required to have a label.
|
|
|
|
Tokens currently attached to an invalid policy will stop working after the
|
|
upgrade, so you should fix invalid policies to have an explicit namespace label
|
|
before upgrading Nomad.
|
|
|
|
After the policies are fixed, the existing tokens with those policies will
|
|
continue to work and do not need to be regenerated.
|
|
|
|
## Nomad 1.5.5
|
|
|
|
Nomad 1.5.5 fixed a bug where allocations that are rescheduled for jobs
|
|
registered before the upgrade would no longer collect allocation logs. The
|
|
`logs.enabled` field introduced in 1.5.4 is now deprecated and has been replaced
|
|
by a `logs.disabled` field that defaults to false. The `logs.enabled` field value
|
|
will be ignored in 1.5.5 and will be removed in Nomad 1.6.0.
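
A minimal sketch of the replacement field is shown below. The task name and driver are placeholders, and the example disables log collection only to show where the field lives; leaving the field out keeps the default of `false`.

```hcl
# Hypothetical task fragment; "example" and the driver are placeholders.
task "example" {
  driver = "exec"

  logs {
    # Replaces the deprecated logs.enabled field. Defaults to false,
    # so log collection stays on unless this is set to true.
    disabled = true
  }

  # ... rest of task configuration
}
```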
|
|
|
|
## Nomad 1.5.4
|
|
|
|
Nomad 1.5.4 included a bug where allocations that are rescheduled for jobs
|
|
registered before the upgrade would no longer collect allocation logs. The
|
|
client will emit debug-level logs like the following:
|
|
|
|
```
client.alloc_runner.task_runner.task_hook: log collection is disabled by task
```
|
|
|
|
You should avoid this version of Nomad and instead install the latest version of
|
|
Nomad 1.5. If you have already upgraded to Nomad 1.5.4, upgrading to Nomad 1.5.5
|
|
will restore logging collection when clients are restarted as part of the
|
|
upgrade process.
|
|
|
|
## Nomad 1.5.1
|
|
|
|
#### Artifact Download Regression Fix
|
|
|
|
Nomad 1.5.1 reverts a behavior of 1.5.0 where artifact downloads were executed
|
|
as the `nobody` user on compatible Linux systems. This was done optimistically
|
|
as defense against compromised artifact endpoints attempting to exploit the
|
|
Nomad Client or tools it uses to perform downloads such as git or mercurial.
|
|
Unfortunately, running the child process as any user other than root is not
compatible with the advice given in Nomad's [security hardening guide][hard_guide],
which calls for a specific directory tree structure that makes such an operation impossible.
|
|
|
|
Other changes to artifact downloading remain - they are executed as a child
|
|
process of the Nomad agent, and on modern Linux systems make use of the Kernel
|
|
landlock feature to restrict filesystem access from that process.
|
|
|
|
## Nomad 1.5.0
|
|
|
|
#### Pause Container Reconciliation Regression
|
|
|
|
Nomad 1.5.0 introduced a regression to the way the Docker driver reconciles
|
|
dangling containers. This meant pause containers would be erroneously removed,
|
|
even though the allocation was still running. This did not affect the running
allocation, but caused it to fail if it needed to restart. An immediate
|
|
workaround is to disable
|
|
[dangling container reconciliation][dangling_container_reconciliation].
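
The workaround could be applied with a client plugin configuration along these lines. This is a sketch that assumes the Docker driver's `gc.dangling_containers` block is the setting the link above refers to; check it against the driver documentation for your version.

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Turn off reconciliation so pause containers are not removed
        # while their allocation is still running.
        enabled = false
      }
    }
  }
}
```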
|
|
|
|
#### Artifact Download Sandboxing
|
|
|
|
Nomad 1.5.0 changes the way [artifacts] are downloaded when specifying an `artifact`
|
|
in a task configuration. Previously the Nomad Client would download artifacts
|
|
in-process. External commands used to facilitate the download (e.g. `git`, `hg`)
|
|
would be run as `root`, and the resulting payload would be owned as `root` in the
|
|
allocation's task directory.
|
|
|
|
In an effort to improve the resilience and security model of the Nomad Client,
|
|
in 1.5.0 artifact downloads occur in a sub-process. Where possible, that
|
|
sub-process is run as the `nobody` user, and on modern Linux systems will
|
|
be isolated from the filesystem via the kernel's [landlock] capability.
|
|
|
|
Operators are encouraged to ensure jobs making use of artifacts continue to work
|
|
as expected. In particular, git-ssh users will need to make sure the system-wide
|
|
`/etc/ssh/ssh_known_hosts` file is populated with any necessary remote hosts.
|
|
Previously, Nomad's documentation suggested configuring
|
|
`/root/.ssh/known_hosts` which would apply only to the `root` user.
|
|
|
|
The artifact downloader no longer inherits all environment variables available
|
|
to the Nomad Client. The downloader sub-process environment is set as follows on
|
|
Linux / macOS:
|
|
|
|
```
PATH=/usr/local/bin:/usr/bin:/bin
TMPDIR=<path to task dir>/tmp
```
|
|
|
|
and as follows on Windows:
|
|
|
|
```
TMP=<path to task dir>\tmp
TEMP=<path to task dir>\tmp
PATH=<inherit $PATH>
HOMEPATH=<inherit $HOMEPATH>
HOMEDRIVE=<inherit $HOMEDRIVE>
USERPROFILE=<inherit $USERPROFILE>
```
|
|
|
|
Configuration of the artifact downloader should happen through the [`options`][artifact_params]
|
|
and [`headers`][artifact_params] fields of the `artifact` block. For backwards
|
|
compatibility, the sandbox can be configured to inherit specified environment variables
|
|
from the Nomad client by setting [`set_environment_variables`][artifact_env].
|
|
|
|
The use of filesystem isolation can be disabled in Client configuration by
|
|
setting [`disable_filesystem_isolation`][artifact_fs_isolation].
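
As a sketch of the two client settings mentioned above, assuming they live in the client `artifact` block and that the proxy variables are what your downloads need (both the variable names and the values are illustrative):

```hcl
client {
  artifact {
    # Comma-separated list of variables the sandbox inherits from the
    # Nomad client's environment. These names are only an example.
    set_environment_variables = "HTTP_PROXY,HTTPS_PROXY,NO_PROXY"

    # Leave filesystem isolation on unless it causes problems;
    # setting this to true disables it.
    disable_filesystem_isolation = false
  }
}
```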
|
|
|
|
#### Artifact Decompression Limits
|
|
|
|
Nomad 1.5.0 now sets default limits around artifact decompression. A single artifact
|
|
payload is now limited to 100GB and 4096 files when decompressed. An artifact that
|
|
exceeds these limits during decompression will cause the artifact downloader to
|
|
fail. These limits can be adjusted or disabled in the client artifact configuration
|
|
by setting [`decompression_size_limit`][decompression_size_limit] and
|
|
[`decompression_file_count_limit`][decompression_file_count_limit].
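
For example, a client that regularly unpacks larger archives might raise both limits. The values below are illustrative only, and the block assumes these options sit in the client `artifact` configuration as the links above indicate.

```hcl
client {
  artifact {
    # Illustrative values; the 1.5.0 defaults are 100GB and 4096 files.
    decompression_size_limit       = "200GB"
    decompression_file_count_limit = 8192
  }
}
```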
|
|
|
|
#### Datacenter Wildcards
|
|
|
|
In Nomad 1.5.0, the
|
|
[`datacenters`](/nomad/docs/job-specification/job#datacenters) field for a job
|
|
accepts wildcards for multi-character matching. For example, `datacenters =
|
|
["dc*"]` will match all datacenters that start with `"dc"`. The default value
|
|
for `datacenters` is now `["*"]`, so the field can be omitted.
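
A short sketch of a job using a wildcard follows. The job name is a placeholder, and the datacenter names `dc1` and `dc2` are assumed purely for illustration.

```hcl
# Hypothetical job; would match agents in datacenters such as "dc1" or "dc2".
job "example" {
  datacenters = ["dc*"]

  # ... rest of jobspec
}
```

Omitting the `datacenters` field entirely has the same effect as `datacenters = ["*"]`.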
|
|
|
|
The `*` character is no longer a legal character in the
|
|
[`datacenter`](/nomad/docs/configuration#datacenter) field for an agent
|
|
configuration. Before upgrading to Nomad 1.5.0, you should first ensure that
|
|
you've updated any jobs that currently have a `*` in their datacenter name and
|
|
then ensure that no agents have this character in their `datacenter` field name.
|
|
|
|
#### Server `rejoin_after_leave` (default: `false`) now enforced
|
|
|
|
All Nomad versions prior to v1.5.0 have incorrectly ignored the Server
|
|
[`rejoin_after_leave`] configuration option. This bug has been fixed in Nomad
|
|
version v1.5.0.
|
|
|
|
Prior to v1.5.0, the behavior of Nomad `rejoin_after_leave` was always `true`,
|
|
regardless of Nomad server configuration, while the documentation incorrectly
|
|
indicated a default of `false`.
|
|
|
|
Cluster operators should be aware that explicit `leave` events (such as `nomad
|
|
server force-leave`) will now result in behavior which matches this
|
|
configuration, and should review whether they were inadvertently relying on the
|
|
buggy behavior.
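
Operators who decide they do want the pre-1.5.0 behavior can opt in explicitly. A minimal sketch, assuming an otherwise ordinary server block:

```hcl
server {
  enabled = true

  # Restores the behavior that older versions applied unconditionally.
  rejoin_after_leave = true
}
```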
|
|
|
|
#### Changes to eval broker metrics
|
|
|
|
The metric `nomad.nomad.broker.total_blocked` has been renamed to
`nomad.nomad.broker.total_pending`. The metric reports internal state of the
leader's broker, and the old name was easily confused with the unrelated evaluation
status `"blocked"` in the Nomad API.
|
|
|
|
#### Deprecated gossip keyring commands removed
|
|
|
|
The commands `nomad operator keyring`, `nomad keyring`, `nomad operator keygen`,
|
|
and `nomad keygen` used to manage the gossip keyring were marked as deprecated
|
|
in Nomad 1.4.0. In Nomad 1.5.0, these commands have been removed. Use the `nomad
|
|
operator gossip keyring` commands to manage the gossip keyring.
|
|
|
|
#### Garbage collection of evaluations and allocations for batch job
|
|
|
|
Versions prior to 1.5.0 only delete evaluations and allocations of batch jobs
that are explicitly stopped, which can lead to unbounded memory growth of Nomad
when the batch job is executed multiple times.
|
|
|
|
Nomad 1.5.0 introduces a new server configuration
|
|
[`batch_eval_gc_threshold`](/nomad/docs/configuration/server#batch_eval_gc_threshold)
|
|
to control how allocations and evaluations for batch jobs are collected.
|
|
|
|
The default threshold is `24h`. If you need to access completed allocations for
|
|
batch jobs that are older than 24h you must increase this value when upgrading
|
|
Nomad.
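
As an illustration, a cluster that needs to inspect batch allocations for up to three days could raise the threshold as follows. The `72h` value is an arbitrary example.

```hcl
server {
  enabled = true

  # Default is 24h; completed batch evaluations and allocations older
  # than this become eligible for garbage collection.
  batch_eval_gc_threshold = "72h"
}
```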
|
|
|
|
## Nomad 1.4.5, 1.3.10
|
|
|
|
#### Pause Container Reconciliation Regression
|
|
|
|
Nomad 1.4.5 and 1.3.10 introduced a regression to the way the Docker driver
|
|
reconciles dangling containers. This meant pause containers would be erroneously
|
|
removed, even though the allocation was still running. This did not affect the
running allocation, but caused it to fail if it needed to restart. An immediate
|
|
workaround is to disable
|
|
[dangling container reconciliation][dangling_container_reconciliation].
|
|
|
|
## Nomad 1.4.4, 1.3.9
|
|
|
|
#### Garbage collection of evaluations and allocations for batch job
|
|
|
|
Versions prior to 1.4.4 and 1.3.9 only delete evaluations and allocations of
batch jobs that are explicitly stopped, which can lead to unbounded memory
growth of Nomad when the batch job is executed multiple times.
|
|
|
|
Nomad 1.4.4 and 1.3.9 introduce a new server configuration
|
|
[`batch_eval_gc_threshold`](/nomad/docs/configuration/server#batch_eval_gc_threshold)
|
|
to control how allocations and evaluations for batch jobs are collected.
|
|
|
|
The default threshold is `24h`. If you need to access completed allocations for
|
|
batch jobs that are older than 24h you must increase this value when upgrading
|
|
Nomad.
|
|
|
|
## Nomad 1.4.0
|
|
|
|
#### Possible Panic During Upgrades
|
|
|
|
Nomad 1.4.0 initializes a keyring on the leader if one has not been previously
|
|
created, which writes a new raft entry. Users have reported that the keyring
|
|
initialization can cause a panic on older servers during upgrades. Following the
|
|
documented [upgrade process][] closely will reduce the risk of this panic. But
|
|
if a server with version 1.4.0 becomes leader while servers with versions before
|
|
1.4.0 are still in the cluster, the older servers will panic.
|
|
|
|
The most likely scenario for this is if the leader is still on a version before
|
|
1.4.0 and is netsplit from the rest of the cluster or the server is restarted
|
|
without upgrading, and one of the 1.4.0 servers becomes the leader.
|
|
|
|
You can recover from the panic by immediately upgrading the old servers. This
|
|
bug was fixed in Nomad 1.4.1.
|
|
|
|
#### Raft Protocol Version 2 Unsupported
|
|
|
|
Raft protocol version 2 was deprecated in Nomad v1.3.0, and is being removed
|
|
in Nomad v1.4.0. In Nomad 1.3.0, the default raft protocol version was updated
|
|
to version 3, and in Nomad 1.4.0 Nomad requires the use of raft protocol version
|
|
3. If [`raft_protocol`] version is explicitly set, it must now be set to `3`.
|
|
For more information see the [Upgrading to Raft Protocol 3] guide.
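
If the version is pinned in the server configuration, the only value accepted from 1.4.0 onward would look like this sketch:

```hcl
server {
  enabled = true

  # Any explicitly configured value other than 3 is rejected in 1.4.0.
  raft_protocol = 3
}
```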
|
|
|
|
#### Audit logs filtering logic changed
|
|
|
|
Audit Log filtering in previous versions of Nomad handled `stages` and
|
|
`operations` filters as `OR` filters. If _either_ condition was met, the logs
|
|
would be filtered. As of 1.4.0, `stages` and `operations` are treated as `AND`
filters. Logs will only be filtered if all filter conditions match.
|
|
|
|
#### Prevent Overlapping New Allocations with Stopping Allocations
|
|
|
|
Prior to Nomad 1.4.0 the scheduler would consider the resources used by
|
|
allocations that are in the process of stopping to be free for new allocations
|
|
to use. This could cause newer allocations to crash when they try to use TCP
|
|
ports or memory used by an allocation in the process of stopping. The new and
|
|
stopping [allocations would "overlap" improperly.][alloc_overlap]
|
|
|
|
[Nomad 1.4.0 fixes this behavior][gh_10446] so that an allocation's resources
|
|
are only considered free for reuse once the client node the allocation was
|
|
running on reports it has stopped. Technically speaking: only once the
|
|
`Allocation.ClientStatus` has reached a terminal state (`complete`, `failed`,
|
|
or `lost`).
|
|
|
|
Despite this being a bug fix, it is considered a significant enough change in
|
|
behavior to reserve for a major Nomad release and *not* be backported. Please
|
|
report any negative side effects encountered as [new
|
|
issues.][gh_issue]
|
|
|
|
#### `nomad eval status -json` Without Evaluation ID Removed
|
|
|
|
Using `nomad eval status -json` without providing an evaluation ID was
|
|
deprecated in Nomad 1.2.4 with the intent to remove in Nomad 1.4.0. This option
|
|
has been removed. You can use `nomad eval list` to get a list of evaluations and
|
|
can use `nomad eval list -json` to get that list in JSON format. The `nomad eval
|
|
status <eval ID>` command will format a specific evaluation in JSON format if
|
|
the `-json` flag is provided.
|
|
|
|
#### Removing Vault/Consul from Clients
|
|
|
|
Nomad clients no longer have their Consul and Vault fingerprints cleared when
|
|
connectivity is lost with Consul and Vault. To intentionally remove Consul and
|
|
Vault from a client node, you will need to restart the Nomad client agent.
|
|
|
|
#### Numeric Operand Comparisons in Constraints
|
|
|
|
Prior to Nomad 1.4.0 the `<, <=, >, >=` operators in a constraint would always
|
|
compare the operands lexically. This behavior has been changed so that the comparison
|
|
is done numerically if both operands are integers or floats.
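
The difference shows up in constraints like the sketch below, which uses the `attr.cpu.numcores` node attribute as an example. On a node reporting `10` cores, a lexical comparison of `"10"` against `"4"` fails because the character `1` sorts before `4`, while the numeric comparison introduced in 1.4.0 succeeds.

```hcl
# Illustrative constraint: require at least 4 CPU cores.
constraint {
  attribute = "${attr.cpu.numcores}"
  operator  = ">="
  value     = "4"
}
```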
|
|
|
|
## Nomad 1.3.3
|
|
|
|
Environments that don't support the use of [`uid`][template_uid] and
|
|
[`gid`][template_gid] in `template` blocks, such as Windows clients, may
|
|
experience task failures with the following message after upgrading to Nomad
|
|
1.3.3:
|
|
|
|
```
|
|
Template failed: error rendering "(dynamic)" => "...": failed looking up user: managing file ownership is not supported on Windows
|
|
```
|
|
|
|
It is recommended to avoid this version of Nomad in such environments.
|
|
|
|
## Nomad 1.3.2, 1.2.9, 1.1.15
|
|
|
|
#### Client `max_kill_timeout` now enforced
|
|
|
|
Nomad versions since v0.9 have incorrectly ignored the Client [`max_kill_timeout`][max_kill_timeout]
|
|
configuration option. This bug has been fixed in Nomad versions v1.3.2,
|
|
v1.2.9, and v1.1.15. Job submitters should be aware that a Task's [`kill_timeout`][kill_timeout]
|
|
will be reduced to the Client's `max_kill_timeout` if the value exceeds the maximum.
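
Operators whose jobs rely on long kill timeouts may want to raise the client-side cap before upgrading. A minimal sketch, with `60s` chosen only as an example value:

```hcl
client {
  enabled = true

  # Upper bound applied to every task's kill_timeout on this client.
  # A task asking for more than this is reduced to this value.
  max_kill_timeout = "60s"
}
```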
|
|
|
|
## Nomad 1.3.1, 1.2.8, 1.1.14
|
|
|
|
#### Default `artifact` limits
|
|
|
|
Nomad 1.3.1, 1.2.8, and 1.1.14 introduced mechanisms to limit the size of
|
|
`artifact` downloads and how long these operations can take. The limits are
|
|
defined in the new [`artifact` client configuration][client_artifact] and have
|
|
predefined default values.
|
|
|
|
While the defaults set are fairly large, it is recommended to double-check them
|
|
prior to upgrading your Nomad clients to make sure they fit your needs.
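
The sketch below shows what overriding some of these limits might look like. The option names are given as understood from the client `artifact` documentation, and every value is illustrative rather than a statement of the defaults; verify both against the linked reference before relying on them.

```hcl
client {
  artifact {
    # Illustrative values only, not the shipped defaults.
    http_read_timeout = "1h"
    http_max_size     = "200GB"
    git_timeout       = "1h"
    s3_timeout        = "1h"
  }
}
```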
|
|
|
|
## Nomad 1.3.0
|
|
|
|
#### Raft Protocol Version 2 Deprecation
|
|
|
|
Raft protocol version 2 will be removed from Nomad in the next major
|
|
release of Nomad, 1.4.0.
|
|
|
|
In Nomad 1.3.0, the default raft protocol version has been updated to
|
|
3. If the [`raft_protocol`] version is not explicitly set, upgrading a
|
|
server will automatically upgrade that server's raft protocol. See the
|
|
[Upgrading to Raft Protocol 3] guide.
|
|
|
|
#### Client State Store
|
|
|
|
The client state store will be automatically migrated to a new schema
|
|
version when upgrading a client.
|
|
|
|
Downgrading to a previous version of the client after upgrading it to
|
|
Nomad 1.3 is not supported. To downgrade safely, users should drain
|
|
all tasks from the Nomad client and erase its data directory.
|
|
|
|
#### CSI Plugins
|
|
|
|
The client filesystem layout for CSI plugins has been updated to
|
|
correctly handle the lifecycle of multiple allocations serving the
|
|
same plugin. Running plugin tasks will not be updated after upgrading
|
|
the client, but it is recommended to redeploy CSI plugin jobs after
|
|
upgrading the cluster.
|
|
|
|
The directory for plugin control sockets will be mounted from a new
|
|
per-allocation directory in the client data dir. This will still be
|
|
bind-mounted to `csi_plugin.mount_config` as in versions of Nomad
|
|
prior to 1.3.0.
|
|
|
|
The volume staging directory for new CSI plugin tasks will now be
|
|
mounted to the task's `NOMAD_TASK_DIR` instead of the
|
|
`csi_plugin.mount_config`.
|
|
|
|
#### Raft leadership transfer on error
|
|
|
|
Starting with Nomad 1.3.0, when a Nomad server is elected the Raft leader but
|
|
fails to complete the process to start acting as the Nomad leader it will
|
|
attempt to gracefully transfer its Raft leadership status to another eligible
|
|
server in the cluster. This operation is only supported when using Raft
|
|
Protocol Version 3.
|
|
|
|
#### Server Raft Database
|
|
|
|
The server raft database in `raft.db` will be automatically migrated to a new
|
|
underlying implementation provided by `go.etcd.io/bbolt`. Downgrading to a previous
|
|
version of the server after upgrading it to Nomad 1.3 is not supported. Like with
|
|
any Nomad upgrade it is recommended to take a snapshot of your database prior to
|
|
upgrading in case a downgrade becomes necessary.
|
|
|
|
The new database implementation enables a new server configuration option for
|
|
controlling the underlying freelist-sync behavior. Clusters experiencing extreme
|
|
disk IO on servers may want to consider disabling freelist-sync to reduce load.
|
|
The tradeoff is longer server startup times, as the database must be completely
|
|
scanned to re-build the freelist from scratch.
|
|
|
|
```hcl
server {
  raft_boltdb {
    no_freelist_sync = true
  }
}
```
|
|
|
|
#### Changes to the `nomad server members` command
|
|
|
|
The standard output of the `nomad server members` command replaces the previous
|
|
`Protocol` column that indicated the Serf protocol version with a new column
|
|
named `Raft Version` which outputs the Raft protocol version defined in each
|
|
server.
|
|
|
|
The `-detailed` flag is now called `-verbose` and outputs the standard values
|
|
in addition to extra information. The previous name is still supported but may
|
|
be removed in future releases.
|
|
|
|
The previous `Protocol` value can be viewed using the `-verbose` flag.
|
|
|
|
#### Changes to `client.template.function_denylist` configuration
|
|
|
|
consul-template v0.28 added a new function
|
|
[`writeToFile`](https://github.com/hashicorp/consul-template/blob/v0.28.0/docs/templating-language.md#writeToFile)
|
|
which can write to arbitrary files on the host.
|
|
|
|
Nomad 1.3.0 disables this function by default in its
|
|
[`function_denylist`](/nomad/docs/configuration/client#function_denylist).
|
|
|
|
However *if you have overridden the default `template.function_denylist` in
|
|
your client configuration, you must add `writeToFile` to your denylist.*
|
|
Failing to do so allows templates to write to arbitrary paths on the host.
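
For clients that override the list, a sketch of a configuration that keeps `writeToFile` blocked might look like this. It assumes `plugin` is the only other function you want denied; keep whatever additional entries your existing override contains.

```hcl
client {
  template {
    # An explicit list replaces the default entirely, so writeToFile
    # must be listed here to stay disabled.
    function_denylist = ["plugin", "writeToFile"]
  }
}
```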
|
|
|
|
#### Changes to Envoy metrics labels
|
|
|
|
When using Envoy as a sidecar proxy for Connect enabled services, Nomad will now
|
|
automatically inject the unique allocation ID into Envoy's stats tags configuration.
|
|
Users who wish to set the tag values themselves may do so using the [`proxy.config`](/nomad/docs/job-specification/proxy#config)
|
|
block.
|
|
|
|
```hcl
connect {
  sidecar_service {
    proxy {
      config {
        envoy_stats_tags = ["nomad.alloc_id=<allocID>"]
      }
    }
  }
}
```
|
|
|
|
#### Changes to Consul Connect Service Identity Tokens
|
|
|
|
Starting with Nomad 1.3.0, Consul Service Identity Tokens created automatically
|
|
by Nomad on behalf of Connect services will now be created as [`Local`] tokens. These
|
|
tokens will no longer be replicated globally. To facilitate cross-Consul datacenter
|
|
requests of Connect services registered by Nomad, Consul agents will need to be
|
|
configured with [default anonymous][anon_token] ACL tokens with ACL policies of
|
|
sufficient permissions to read service and node metadata pertaining to those
|
|
requests. This mechanism is described in Consul [#7414][consul_acl].
|
|
A typical Consul agent anonymous token may contain an ACL policy such as:
|
|
|
|
```hcl
service_prefix "" { policy = "read" }
node_prefix    "" { policy = "read" }
```
|
|
|
|
The minimum version of Consul supported by Nomad's Connect integration is now Consul v1.8.0.
|
|
|
|
#### Changes to task groups that utilise Consul services and checks
|
|
|
|
Starting with Nomad 1.3.0, services and checks that utilise Consul will have an
|
|
automatic constraint placed upon the task group. This ensures they are placed
|
|
on a client with a Consul agent running that meets a minimum version
|
|
requirement. The minimum version of Consul supported by Nomad's service and
|
|
check blocks is now Consul v1.7.0.
|
|
|
|
#### Linux Control Groups Version 2
|
|
|
|
Starting with Nomad 1.3.0, Linux systems configured to use [cgroups v2][cgroups2]
are now supported. A Nomad client will only activate its v2 control groups manager
if the system is configured with the cgroups2 controller mounted at `/sys/fs/cgroup`.

* Systems that do not support cgroups v2 are not affected.
* Systems configured in hybrid mode typically mount the cgroups2
  controller at `/sys/fs/cgroup/unified`, so Nomad will continue to
  use cgroups v1 for these hosts.
* Systems configured with only cgroups v2 now correctly support setting cpu [cores].
|
|
|
|
Nomad will preserve the existing cgroup for tasks when a client is
|
|
upgraded, so there will be no disruption to tasks. A new client
|
|
attribute `unique.cgroup.version` indicates which version of control
|
|
groups Nomad is using.
|
|
|
|
When cgroups v2 are in use, Nomad uses `nomad.slice` as the [default parent][cgroup_parent] for cgroups
|
|
created on behalf of tasks. The cgroup created for a task is named in the form `<allocID>.<task>.scope`.
|
|
These cgroups are created by Nomad before a task starts. External task drivers that support
|
|
containerization should be updated to make use of the new cgroup locations.
|
|
|
|
The new cgroup file system layout will look like the following:
|
|
|
|
```shell-session
➜ tree -d /sys/fs/cgroup/nomad.slice
/sys/fs/cgroup/nomad.slice
├── 8b8da4cf-8ebf-b578-0bcf-77190749abf3.redis.scope
└── a8c8e495-83c8-311b-4657-e6e3127e98bc.example.scope
```
|
|
#### Support for pre-0.9 Tasks Removed
|
|
|
|
Running tasks that were created on clusters from Nomad version 0.9 or
|
|
earlier will fail to restore after upgrading a cluster to Nomad
|
|
1.3.0. To safely upgrade without unplanned interruptions, force these
|
|
tasks to be rescheduled by `nomad alloc stop` before upgrading. Note
|
|
this only applies to tasks that have been running continuously from
|
|
before 0.9 without rescheduling. Jobs that were created before 0.9 but
|
|
have had tasks replaced over time after 0.9 will operate normally
|
|
during the upgrade.
|
|
|
|
## Nomad 1.2.6, 1.1.12, and 1.0.18
|
|
|
|
#### ACL requirement for the job parse endpoint
|
|
|
|
Nomad 1.2.6, 1.1.12, and 1.0.18 require ACL authentication for the
|
|
[job parse][api_jobs_parse] API endpoint. The `parse-job` capability has been
|
|
created to allow access to this endpoint. The `submit-job`, `read`, and `write`
|
|
policies include this capability.
|
|
|
|
|
|
The capability must be enabled for the namespace used in the API request.
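
A token that only needs to parse jobs could be given a narrowly scoped rule such as the sketch below. The `default` namespace is an assumption; use the namespace named in the API request.

```hcl
# Hypothetical policy granting only the new capability.
namespace "default" {
  capabilities = ["parse-job"]
}
```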
|
|
|
|
## Nomad 1.2.4
|
|
|
|
#### `nomad eval status -json` deprecated
|
|
|
|
Nomad 1.2.4 includes a new `nomad eval list` command that has the
|
|
option to display the results in JSON format with the `-json`
|
|
flag. This replaces the existing `nomad eval status -json` option. In
|
|
Nomad 1.4.0, `nomad eval status -json` will be changed to display only
|
|
the selected evaluation in JSON format.
|
|
|
|
## Nomad 1.2.2
|
|
|
|
#### Panic on node class filtering for system and sysbatch jobs fixed
|
|
|
|
Nomad 1.2.2 fixes a [server crashing bug][gh-11563] present in the scheduler
|
|
node class filtering since 1.2.0. Users should upgrade to Nomad 1.2.2 to avoid
|
|
this problem.
|
|
|
|
## Nomad 1.2.0
|
|
|
|
#### Nvidia device plugin
|
|
|
|
The Nvidia device is now an external plugin and must be installed separately.
|
|
Refer to [the Nvidia device plugin's documentation][nvidia] for details.
|
|
|
|
#### ACL requirements for accessing the job details page in the Nomad UI
|
|
|
|
Nomad 1.2.0 introduced a new UI component to display the status of `system` and
|
|
`sysbatch` jobs in each client where they are running. This feature makes an
|
|
API call to an endpoint that requires `node:read` ACL permission. Tokens used
|
|
to access the Nomad UI will need to be updated to include this permission in
|
|
order to access a job details page.
|
|
|
|
This was an unintended change fixed in Nomad 1.2.4.
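
For clusters that stay on an affected version, a policy along these lines could be attached to UI tokens. This is a sketch under assumptions: the `default` namespace and the `read` namespace policy are illustrative, and only the `node` rule relates to the permission described above.

```hcl
# Illustrative namespace access for viewing jobs (namespace name is an assumption).
namespace "default" {
  policy = "read"
}

# Grants node:read, which the job details page needs on affected versions.
node {
  policy = "read"
}
```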
|
|
|
|
#### HCLv2 Job Specification Parsing
|
|
|
|
In previous versions of Nomad, when rendering a job specification using override
|
|
variables, a warning would be returned if a variable within an override file
|
|
was declared that was not found within the job specification. This behaviour
|
|
differed from passing variables via the `-var` flag, which would always cause an
|
|
error in the same situation.
|
|
|
|
Nomad 1.2.0 makes this behaviour consistent: by default, an error is always
returned when an override variable is specified that is not a known variable
within the job specification. Users who prefer to only be warned in this
situation can pass the `-hcl-strict=false` flag.
|
|
|
|
## Nomad 1.0.11 and 1.1.5 Enterprise
|
|
|
|
#### Audit log file names
|
|
|
|
Audit log file naming now matches the standard log file naming introduced in
|
|
1.0.10 and 1.1.4. The audit log currently being written will no longer have a
|
|
timestamp appended.
|
|
|
|
## Nomad 1.0.10 and 1.1.4
|
|
|
|
#### Log file names
|
|
|
|
The [`log_file`] configuration option was not being fully respected, as the
|
|
generated filename would include a timestamp. After upgrade, the active log
|
|
file will always be the value defined in `log_file`, with timestamped files
|
|
being created during log rotation.
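
A minimal agent configuration sketch illustrating the behavior after upgrade is shown below. The path and rotation values are assumptions chosen for illustration.

```hcl
# The active log is always written to exactly this path (path is an assumption).
log_file = "/var/log/nomad/nomad.log"

# Rotated files receive a timestamp in their name; the values are illustrative.
log_rotate_duration  = "24h"
log_rotate_max_files = 7
```

Log shippers that previously matched a timestamped pattern for the active file may need to be pointed at the fixed `log_file` path instead.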
|
|
|
|
## Nomad 1.0.9 and 1.1.3
|
|
|
|
#### Namespace in Job Run and Plan APIs
|
|
|
|
The Job Run and Plan APIs now respect the `?namespace=...` query parameter over
|
|
the namespace specified in the job itself. This matches the precedence of
|
|
region and [fixes a bug where the `-namespace` flag was not respected for the
|
|
`nomad run` and `nomad apply` commands.][gh-10875]
|
|
|
|
For users of [`api.Client`][go-client] who want their job namespace respected,
|
|
you must ensure the `Config.Namespace` field is unset.
|
|
|
|
#### Docker Driver
|
|
|
|
**1.1.3 only**
|
|
|
|
Starting in Nomad 1.1.2, task groups with `network.mode = "bridge"` generated a
|
|
hosts file in Docker containers. This generated hosts file was bind-mounted
|
|
from the task directory to `/etc/hosts` within the task. In Nomad 1.1.3 the
|
|
source for the bind mount was moved to the allocation directory so that it is
|
|
shared between all tasks in an allocation.
|
|
|
|
Please note that this change may prevent [`extra_hosts`] values from being
|
|
properly set in each task when there are multiple tasks within the same group.
|
|
When using `extra_hosts` with Consul Connect in `bridge` network mode, you
|
|
should set the hosts values in the [`sidecar_task.config`] block instead.
|
|
|
|
## Nomad 1.1.0
|
|
|
|
#### Enterprise licenses
|
|
|
|
Nomad Enterprise licenses are no longer stored in raft or synced between
|
|
servers. Nomad Enterprise servers will not start without a license. There is
|
|
no longer a six hour evaluation period when running Nomad Enterprise. Before
|
|
upgrading, you must provide each server with a license on disk or in its
|
|
environment (see the [Enterprise licensing] documentation for details).
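
As a sketch of the on-disk option, each server's configuration could reference a license file as shown here. The file path is an assumption; the Enterprise licensing documentation remains the authoritative source for the supported options.

```hcl
server {
  enabled = true

  # Path to the license file on this server (hypothetical location).
  license_path = "/etc/nomad.d/license.hclic"
}
```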
|
|
|
|
The `nomad license put` command has been removed.
|
|
|
|
The `nomad license get` command is no longer forwarded to the Nomad leader,
|
|
and will return the license from the specific server being contacted.
|
|
|
|
Click [here](https://www.hashicorp.com/products/nomad/trial) to get a trial license for Nomad Enterprise.
|
|
|
|
#### Agent Metrics API
|
|
|
|
The Nomad agent metrics API now respects the
|
|
[`prometheus_metrics`](/nomad/docs/configuration/telemetry#prometheus_metrics)
|
|
configuration value. If this value is set to `false`, which is the default value,
|
|
calling `/v1/metrics?format=prometheus` will now result in a response error.
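
If a Prometheus scraper relies on this endpoint, the agent configuration would need something like the following sketch before upgrading:

```hcl
telemetry {
  # Must be true for /v1/metrics?format=prometheus to return metrics.
  prometheus_metrics = true
}
```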
|
|
|
|
#### CSI volumes
|
|
|
|
The volume specification for CSI volumes has been updated to support volume
|
|
creation. The `access_mode` and `attachment_mode` fields have been moved to a
|
|
`capability` block that can be repeated. Existing registered volumes will be
|
|
automatically modified the next time that a volume claim is updated. Volume
|
|
specification files for new volumes should be updated to the format described
|
|
in the [`volume create`] and [`volume register`] commands.
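
For illustration, a volume specification in the updated format might look like the sketch below. The IDs, plugin name, and external ID are hypothetical placeholders.

```hcl
# Hypothetical volume specification for `nomad volume register`.
id          = "ebs_prod_db1"
name        = "database"
type        = "csi"
plugin_id   = "ebs-prod"
external_id = "vol-0123456789abcdef0"

# access_mode and attachment_mode now live in a repeatable capability block.
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

capability {
  access_mode     = "single-node-reader-only"
  attachment_mode = "file-system"
}
```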
|
|
|
|
The [`volume`] block has an `access_mode` and `attachment_mode` field that are
|
|
required for CSI volumes. Jobs that use CSI volumes should be updated with
|
|
these fields.
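
A job that claims a CSI volume could be updated roughly as follows. The group, volume label, and `source` value are assumptions for the sake of the example.

```hcl
group "db" {
  volume "data" {
    type   = "csi"
    source = "ebs_prod_db1" # ID of a registered volume (hypothetical)

    # Required for CSI volumes; should match a capability of the volume.
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }
}
```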
|
|
|
|
#### Connect native tasks
|
|
|
|
Connect native tasks running in host networking mode will now have `CONSUL_HTTP_ADDR`
set automatically. Previously, this was only the case for bridge networking. If an operator
has already explicitly set `CONSUL_HTTP_ADDR`, it will not be overridden.
|
|
|
|
#### Linux capabilities in exec/java
|
|
|
|
Following the security [remediation][no_net_raw] in Nomad versions 0.12.12, 1.0.5,
|
|
and 1.1.0-rc1, the `exec` and `java` task drivers will additionally no longer enable
|
|
the following Linux capabilities by default.
|
|
|
|
```
AUDIT_CONTROL AUDIT_READ BLOCK_SUSPEND DAC_READ_SEARCH IPC_LOCK IPC_OWNER LEASE
LINUX_IMMUTABLE MAC_ADMIN MAC_OVERRIDE NET_ADMIN NET_BROADCAST NET_RAW SYS_ADMIN
SYS_BOOT SYSLOG SYS_MODULE SYS_NICE SYS_PACCT SYS_PTRACE SYS_RAWIO SYS_RESOURCE
SYS_TIME SYS_TTY_CONFIG WAKE_ALARM
```
|
|
|
|
The capabilities now enabled by default are modeled after Docker default
|
|
[`linux capabilities`] (excluding `NET_RAW`).
|
|
|
|
```
AUDIT_WRITE CHOWN DAC_OVERRIDE FOWNER FSETID KILL MKNOD NET_BIND_SERVICE
SETFCAP SETGID SETPCAP SETUID SYS_CHROOT
```
|
|
|
|
A new `allow_caps` plugin configuration parameter for [`exec`][allow_caps_exec]
|
|
and [`java`][allow_caps_java] task drivers can be used to restrict the set of
|
|
capabilities allowed for use by tasks.
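
As a sketch, an operator who wants to narrow the allowed set on a client might configure the `exec` driver like this. The particular capabilities listed are an arbitrary illustration, not a recommendation.

```hcl
plugin "exec" {
  config {
    # Tasks on this client may only use capabilities from this list
    # (the selection here is an assumption for illustration).
    allow_caps = ["chown", "kill", "net_bind_service", "setgid", "setuid"]
  }
}
```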
|
|
|
|
Tasks using the `exec` or `java` task drivers can add or remove desired linux
|
|
capabilities using the [`cap_add`][cap_add_exec] and [`cap_drop`][cap_drop_exec]
|
|
task configuration options.
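
A task-level sketch of these options follows. The task name and command are hypothetical, and any capability in `cap_add` must also be permitted by the client's `allow_caps` setting.

```hcl
task "diagnostics" {
  driver = "exec"

  config {
    command = "/usr/local/bin/probe" # hypothetical binary

    # Add a capability that is no longer enabled by default.
    cap_add = ["net_raw"]

    # Drop a default capability the task does not need.
    cap_drop = ["mknod"]
  }
}
```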
|
|
|
|
#### iptables
|
|
|
|
Nomad now appends its iptables rules to the `NOMAD-ADMIN` chain instead of
|
|
inserting them as the first rule. This allows better control for user-defined
|
|
iptables rules but users who append rules currently should verify that their
|
|
rules are being appended in the correct order.
|
|
|
|
## Nomad 1.1.0-rc1, 1.0.5, 0.12.12
|
|
|
|
Nomad versions 1.1.0-rc1, 1.0.5 and 0.12.12 change the behavior of the `docker`, `exec`,
|
|
and `java` task drivers so that the [`CAP_NET_RAW`] linux capability is disabled
|
|
by default. This is one of the [`linux capabilities`] that Docker itself enables
|
|
by default, as this capability enables the generation of ICMP packets - used by
|
|
the common `ping` utility for performing network diagnostics. When used by groups in
|
|
`bridge` networking mode, the `CAP_NET_RAW` capability also exposes tasks to ARP spoofing,
|
|
enabling DoS and MITM attacks against other tasks running in `bridge` networking
|
|
on the same host. Operators should weigh potential impact of an upgrade on their
|
|
applications against the security consequences inherent in `CAP_NET_RAW`. Typical
|
|
applications using `tcp` or `udp` based networking should not be affected.
|
|
|
|
This is the sole change for Nomad 1.0.5 and 0.12.12, intended to provide better
|
|
task network isolation by default.
|
|
|
|
Users of the `docker` driver can restore the previous behavior by configuring the
|
|
[`allow_caps`] driver configuration option to explicitly enable the `CAP_NET_RAW`
|
|
capability.
|
|
|
|
```hcl
plugin "docker" {
  config {
    allow_caps = [
      "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD",
      "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
      "SYS_CHROOT", "KILL", "AUDIT_WRITE", "NET_RAW",
    ]
  }
}
```
|
|
|
|
An upcoming version of Nomad will include similar configuration options for the
|
|
`exec` and `java` task drivers.
|
|
|
|
This change is limited to `docker`, `exec`, and `java` driver plugins. It does
|
|
not affect the Nomad server. This only affects Nomad clients running Linux, with
|
|
tasks using `bridge` networking and one of these task drivers, or third-party
|
|
plugins which relied on the shared Nomad executor library.
|
|
|
|
Upgrading a Nomad client to 1.0.5 or 0.12.12 will not restart existing tasks. As
|
|
such, processes from existing `docker`, `exec`, or `java` tasks will need to be
|
|
manually restarted (using `alloc stop` or another mechanism) in order to be
|
|
fully isolated.
|
|
|
|
## Nomad 1.0.3, 0.12.10
|
|
|
|
Nomad versions 1.0.3 and 0.12.10 change the behavior of the `exec` and `java` drivers so that
|
|
tasks are isolated in their own PID and IPC namespaces. As a result, the
|
|
process launched by these drivers will be PID 1 in the namespace. This has
|
|
[significant impact](https://man7.org/linux/man-pages/man7/pid_namespaces.7.html)
|
|
on the treatment of a process by the Linux kernel. Furthermore, tasks in the
|
|
same allocation will no longer be able to coordinate using signals, SystemV IPC
|
|
objects, or POSIX message queues. Operators should weigh potential impact of an
|
|
upgrade on their applications against the security consequences inherent in using
|
|
the host namespaces.
|
|
|
|
This is the sole change for Nomad 1.0.3, intended to provide better process
|
|
isolation by default. An upcoming version of Nomad will include options for
|
|
configuring this behavior.
|
|
|
|
This change is limited to the `exec` and `java` driver plugins. It does not affect
|
|
the Nomad server. This only affects Nomad clients running on Linux, using the
|
|
`exec` or `java` drivers or third-party driver plugins which relied on the shared
|
|
Nomad executor library.
|
|
|
|
Upgrading a Nomad client to 1.0.3 or 0.12.10 will not restart existing tasks.
|
|
As such, processes from existing `exec`/`java` tasks will need to be manually restarted
|
|
(using `alloc stop` or another mechanism) in order to be fully isolated.
|
|
|
|
## Nomad 1.0.2
|
|
|
|
#### Dynamic secrets trigger template changes on client restart
|
|
|
|
Nomad 1.0.2 changed the behavior of template `change_mode` triggers when a
|
|
client node restarts. In Nomad 1.0.1 and earlier, the first rendering of a
|
|
template after a client restart would not trigger the `change_mode`. For
|
|
dynamic secrets such as the Vault PKI secrets engine, this resulted in the
|
|
secret being updated but not restarting or signalling the task. When the
|
|
secret's lease expired at some later time, the task workload might fail
|
|
because of the stale secret. For example, a web server's SSL certificate would
|
|
be expired and browsers would be unable to connect.
|
|
|
|
In Nomad 1.0.2, when a client node is restarted any task with Vault secrets
|
|
that are generated or have expired will have its `change_mode` triggered. If
|
|
`change_mode = "restart"` this will result in the task being restarted, to
|
|
avoid the task failing unexpectedly at some point in the future. This change
|
|
only impacts tasks using dynamic Vault secrets engines such as [PKI][pki], or
|
|
when secrets are rotated. Secrets that don't change in Vault will not trigger
|
|
a `change_mode` on client restart.
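
To make the affected case concrete, here is a sketch of a task that renders a certificate from the Vault PKI secrets engine. The task name, Vault policy, PKI mount path, role, and common name are all assumptions.

```hcl
task "web" {
  driver = "docker"

  vault {
    policies = ["pki-web"] # hypothetical Vault policy
  }

  template {
    # Each render issues a new certificate, so this is a dynamic secret.
    data = <<EOT
{{ with secret "pki/issue/web" "common_name=web.example.internal" }}
{{ .Data.certificate }}
{{ end }}
EOT

    destination = "secrets/cert.pem"

    # On 1.0.2 and later, a client restart that re-renders this template
    # triggers the change_mode, restarting the task.
    change_mode = "restart"
  }
}
```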
|
|
|
|
## Nomad 1.0.1
|
|
|
|
#### Envoy worker threads
|
|
|
|
Nomad v1.0.0 changed the default behavior around the number of worker threads
|
|
created by Envoy when used as a sidecar for Consul Connect. In Nomad
|
|
v1.0.1, the same default setting of [`--concurrency=1`][envoy_concurrency] is set for Envoy when used
|
|
as a Connect gateway. As before, the [`meta.connect.proxy_concurrency`][proxy_concurrency]
|
|
property can be set in client configuration to override the default value.
|
|
|
|
## Nomad 1.0.0
|
|
|
|
### HCL2 for Job specification
|
|
|
|
Nomad v1.0.0 adopts HCL2 for parsing the job spec. HCL2 extends HCL with more
|
|
expression and reuse support, but applies a stricter schema to HCL
blocks. Check [HCL](/nomad/docs/job-specification/hcl2) for more details.
|
|
|
|
### Signal used when stopping Docker tasks
|
|
|
|
When stopping tasks running with the Docker task driver, Nomad documents that a
|
|
`SIGTERM` will be issued (unless configured with `kill_signal`). However, recent
|
|
versions of Nomad would issue `SIGINT` instead. Starting again with Nomad v1.0.0,
|
|
`SIGTERM` will be sent by default when stopping Docker tasks.
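
Workloads that came to depend on receiving `SIGINT` could keep that behavior explicitly with `kill_signal`, as in this sketch (the task name and image are hypothetical):

```hcl
task "web" {
  driver = "docker"

  # Overrides the default SIGTERM sent when the task is stopped.
  kill_signal = "SIGINT"

  config {
    image = "example/web:1.0" # hypothetical image
  }
}
```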
|
|
|
|
### Deprecated metrics have been removed
|
|
|
|
Nomad v0.7.0 added support for tagged metrics and deprecated untagged metrics.
|
|
There was support for configuring backwards-compatible metrics. This support has
|
|
been removed with v1.0.0, and all metrics will be emitted with tags.
|
|
|
|
### Null characters in region, datacenter, job name/ID, task group name, and task names
|
|
|
|
Starting with Nomad v1.0.0, jobs will fail validation if any of the following
|
|
contain a null character: the job ID or name, the task group name, or the task
name. Any job that contains a null character in one of these fields should be
modified before upgrading to v1.0.0. Similarly, client and server config
validation will prohibit either the
|
|
region or the datacenter from containing null characters.
|
|
|
|
### EC2 CPU characteristics may be different
|
|
|
|
Starting with Nomad v1.0.0, the AWS fingerprinter uses data derived from the
|
|
official AWS EC2 API to determine default CPU performance characteristics,
|
|
including core count and core speed. This data should be accurate for each
|
|
instance type per region. Previously, Nomad used a hand-made lookup table that
|
|
was not region aware and may have contained inaccurate or incomplete data. As
|
|
part of this change, the AWS fingerprinter no longer sets the `cpu.modelname`
|
|
attribute.
|
|
|
|
As before, `cpu_total_compute` can be used to override the discovered CPU
|
|
resources available to the Nomad client.
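
For example, a client whose fingerprinted CPU resources changed after the upgrade could pin the value as sketched here. The number is an arbitrary illustration, expressed in MHz.

```hcl
client {
  enabled = true

  # Total CPU, in MHz, that Nomad should assume for this node
  # (illustrative value, not derived from any instance type).
  cpu_total_compute = 8000
}
```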
|
|
|
|
### Inclusive language
|
|
|
|
Starting with Nomad v1.0.0, the terms `blacklist` and `whitelist` have been
|
|
deprecated from client configuration and driver configuration. The existing
|
|
configuration values are permitted but will be removed in a future version of
|
|
Nomad. The specific configuration values replaced are:
|
|
|
|
- Client `driver.blacklist` is replaced with `driver.denylist`.
|
|
|
|
- Client `driver.whitelist` is replaced with `driver.allowlist`.
|
|
|
|
- Client `env.blacklist` is replaced with `env.denylist`.
|
|
|
|
- Client `fingerprint.blacklist` is replaced with `fingerprint.denylist`.
|
|
|
|
- Client `fingerprint.whitelist` is replaced with `fingerprint.allowlist`.
|
|
|
|
- Client `user.blacklist` is replaced with `user.denylist`.
|
|
|
|
- Client `template.function_blacklist` is replaced with
|
|
`template.function_denylist`.
|
|
|
|
- Docker driver `docker.caps.whitelist` is replaced with
|
|
`docker.caps.allowlist`.
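
A client configuration using the new names might look like the following sketch. The driver, fingerprinter, and user values are assumptions chosen only to show the key names.

```hcl
client {
  options = {
    "driver.allowlist"     = "docker,exec" # replaces driver.whitelist
    "fingerprint.denylist" = "env_aws"     # replaces fingerprint.blacklist
    "user.denylist"        = "root"        # replaces user.blacklist
  }

  template {
    # Replaces template.function_blacklist.
    function_denylist = ["plugin"]
  }
}
```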
|
|
|
|
### Consul Connect
|
|
|
|
Nomad 1.0's Consul Connect integration works best with Consul 1.9 or later. The
|
|
ideal upgrade path is:
|
|
|
|
1. Create a new Nomad client image with Nomad 1.0 and Consul 1.9 or later.
|
|
2. Add new hosts based on the image.
|
|
3. [Drain][drain-cli] and shut down old Nomad client nodes.
|
|
|
|
While in-place upgrades and older versions of Consul are supported by Nomad 1.0,
Envoy proxies will drop and stop accepting connections while the Nomad agent is
restarting. Nomad 1.0 with Consul 1.9 does not have this limitation.
|
|
|
|
#### Envoy proxy versions
|
|
|
|
Nomad v1.0.0 changes the behavior around the selection of Envoy version used for
|
|
Connect sidecar proxies. Previously, Nomad always defaulted to Envoy v1.11.2 if
|
|
neither the `meta.connect.sidecar_image` parameter nor the `sidecar_task` block was
|
|
explicitly configured. Likewise the same version of Envoy would be used for
|
|
Connect ingress gateways if `meta.connect.gateway_image` was unset. Starting
|
|
with Nomad v1.0.0, each Nomad Client will query Consul for a list of supported
|
|
Envoy versions. Nomad will make use of the latest version of Envoy supported by
|
|
the Consul agent when launching Envoy as a Connect sidecar proxy. If the version
|
|
of the Consul agent is older than v1.7.8, v1.8.4, or v1.9.0, Nomad will fall back
|
|
to the v1.11.2 version of Envoy. As before, if the `meta.connect.sidecar_image`,
|
|
`meta.connect.gateway_image`, or `sidecar_task` block are set, those settings
|
|
take precedence.
|
|
|
|
When upgrading Nomad Clients from a previous version to v1.0.0 and above, it is
|
|
recommended to also upgrade the Consul agents to v1.7.8, v1.8.4, or v1.9.0 or
|
|
newer. Upgrading Nomad and Consul to versions that support the new behavior
|
|
while also doing a full [node drain][] at the time of the upgrade for each node
|
|
will ensure Connect workloads are properly rescheduled onto nodes in such a way
|
|
that the Nomad Clients, Consul agents, and Envoy sidecar tasks maintain
|
|
compatibility with one another.
|
|
|
|
#### Envoy worker threads
|
|
|
|
Nomad v1.0.0 changes the default behavior around the number of worker threads
|
|
created by the Envoy sidecar proxy when using Consul Connect. Previously, the
|
|
Envoy [`--concurrency`][envoy_concurrency] argument was left unset, which caused
|
|
Envoy to spawn as many worker threads as logical cores available on the CPU. The
|
|
`--concurrency` value now defaults to `1` and can be configured by setting the
|
|
[`meta.connect.proxy_concurrency`][proxy_concurrency] property in client
|
|
configuration.
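
As a sketch, a client that should give each Envoy sidecar more than one worker thread could set the property like this. The value `2` is an arbitrary example.

```hcl
client {
  meta {
    # Passed to Envoy as --concurrency; defaults to "1" when unset.
    "connect.proxy_concurrency" = "2"
  }
}
```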
|
|
|
|
## Nomad 0.12.8
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.12.8 includes security fixes for the handling of Docker volume mounts:
|
|
|
|
- The `docker.volumes.enabled` flag now defaults to `false` as documented.
|
|
|
|
- Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
|
|
could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with
|
|
type "volume" when set to `false` (the default).
|
|
|
|
This change impacts jobs that use Docker `mounts` with type "volume", as shown
|
|
below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.12.6
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.12.6 includes security fixes for privilege escalation vulnerabilities
|
|
in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt-out of this
|
|
protection with the [`template.disable_file_sandbox`][] field in the client
|
|
configuration.
|
|
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths do
|
|
not escape the file sandbox. It was possible to use interpolation to bypass
|
|
this validation. The client now interpolates the paths before checking if they
|
|
are in the file sandbox.
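
Operators who need the previous template behavior can opt out in the client configuration, roughly as sketched below. Doing so removes the protection described above, so it is best treated as a temporary measure.

```hcl
client {
  template {
    # Allows template paths outside the task directory (disables the sandbox).
    disable_file_sandbox = true
  }
}
```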
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.12.6, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.12.9. To work around the bug, use a relative path.
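
A sketch of the relative-path workaround is shown here. The file names and artifact URL are hypothetical; relative paths are resolved inside the task's sandbox.

```hcl
template {
  source = "local/app.conf.tpl"

  # Relative path instead of "${NOMAD_SECRETS_DIR}/app.conf".
  destination = "secrets/app.conf"
}

artifact {
  source = "https://example.com/app.tar.gz" # hypothetical URL

  # Relative path instead of "${NOMAD_TASK_DIR}/app".
  destination = "local/app"
}
```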
|
|
|
|
## Nomad 0.12.0
|
|
|
|
### `mbits` and Task Network Resource deprecation
|
|
|
|
Starting in Nomad 0.12.0 the `mbits` field of the network resource block has
|
|
been deprecated and is no longer considered when making scheduling decisions.
|
|
This is in part because we felt that `mbits` didn't accurately account for network
|
|
bandwidth as a resource.
|
|
|
|
Additionally, the use of the `network` block inside a task's `resources` block
|
|
is also deprecated. Users are advised to move their `network` block to the
|
|
`group` block. Recent networking features have only been added to group based
|
|
network configuration. If any use case or feature which was available with task
|
|
network resource is not fulfilled with group network configuration, please open
|
|
an issue detailing the missing capability.
|
|
|
|
Additionally, the `docker` driver's `port_map` configuration is deprecated in
|
|
lieu of the `ports` field.
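
The migration described above can be sketched as follows: the `network` block moves to the group, and the Docker task references port labels through `ports`. The group, task, port label, and image names are assumptions.

```hcl
group "web" {
  # Group-level network block replaces the task-level resources.network block.
  network {
    port "http" {
      to = 8080 # port inside the container
    }
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image

      # Replaces the deprecated port_map configuration.
      ports = ["http"]
    }
  }
}
```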
|
|
|
|
### Enterprise Licensing
|
|
|
|
Enterprise binaries for Nomad are now publicly available via
|
|
[releases.hashicorp.com](https://releases.hashicorp.com/nomad/). By default all
|
|
enterprise features are enabled for 6 hours. During that time enterprise users
|
|
should apply their license with the [`nomad license put ...`](/nomad/docs/v1.0.x/commands/license/put) command.
|
|
|
|
Once the 6 hour demonstration period expires, Nomad will shut down. If restarted,
Nomad will shut down again after a very short amount of time unless a valid
license is applied.
|
|
|
|
~> **Warning:** Due to a [bug][gh-8457] in Nomad v0.12.0, existing clusters
|
|
that are upgraded will **not** have 6 hours to apply a license. The minimal
|
|
grace period should be sufficient to apply a valid license, but enterprise
|
|
users are encouraged to delay upgrading until Nomad v0.12.1 is released and
|
|
fixes the issue.
|
|
|
|
### Docker access host filesystem
|
|
|
|
Nomad 0.12.0 disables Docker tasks' access to the host filesystem by default.
Prior to Nomad 0.12, Docker tasks could mount and then manipulate any host file,
which posed a security risk.
|
|
|
|
Operators must now explicitly allow tasks to access the host filesystem. [Host
Volumes](/nomad/docs/configuration/client#host_volume-block) provide fine-grained
access to individual paths.
|
|
|
|
To restore pre-0.12.0 behavior, you can enable [Docker
|
|
`volume`](/nomad/docs/drivers/docker#enabled-1) to allow binding host paths, by adding
|
|
the following to the Nomad client config file:
|
|
|
|
```hcl
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
```
|
|
|
|
### QEMU images
|
|
|
|
Nomad 0.12.0 restricts the paths that QEMU tasks can load an image from. A QEMU
task may still download an image to the allocation directory and load it from
there, but images outside the allocation directories must be explicitly allowed
by operators in the client agent configuration file.
|
|
|
|
For example, you may allow loading QEMU images from `/mnt/qemu-images` by
|
|
adding the following to the agent configuration file:
|
|
|
|
```hcl
plugin "qemu" {
  config {
    image_paths = ["/mnt/qemu-images"]
  }
}
```
|
|
|
|
## Nomad 0.11.7
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.11.7 includes a security fix for the handling of Docker volume
|
|
mounts. Docker driver mounts of type "volume" (but not "bind") were not
|
|
sandboxed and could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with
|
|
type "volume" when set to `false`.
|
|
|
|
This change impacts jobs that use Docker `mounts` with type "volume", as
|
|
shown below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.11.5
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.11.5 includes backported security fixes for privilege escalation
|
|
vulnerabilities in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt-out of this
|
|
protection with the
|
|
[`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths
|
|
do not escape the file sandbox. It was possible to use interpolation to
|
|
bypass this validation. The client now interpolates the paths before
|
|
checking if they are in the file sandbox.
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.11.5, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.11.6. To work around the bug, use a relative path.
|
|
|
|
## Nomad 0.11.3
|
|
|
|
Nomad 0.11.3 fixes a critical bug causing the nomad agent to become
|
|
unresponsive. The issue is due to a [Go 1.14.1 runtime
|
|
bug](https://github.com/golang/go/issues/38023) and affects Nomad 0.11.1 and
|
|
0.11.2.
|
|
|
|
## Nomad 0.11.2
|
|
|
|
### Scheduler Scoring Changes
|
|
|
|
Prior to Nomad 0.11.2 the scheduler algorithm used a [node's reserved
resources][reserved] incorrectly during scoring. As a result of this bug,
scoring was biased in favor of nodes with reserved resources over nodes without
reserved resources.
|
|
|
|
Placements will be more correct but slightly different in v0.11.2 vs earlier
|
|
versions of Nomad. Operators do _not_ need to take any actions as the impact of
|
|
the bug fix will only minimally affect scoring.
|
|
|
|
Feasibility (whether a node is capable of running a job at all) is _not_
|
|
affected.
|
|
|
|
### Periodic Jobs and Daylight Saving Time
|
|
|
|
Nomad 0.11.2 fixed a long outstanding bug affecting periodic jobs that are
|
|
scheduled to run during Daylight Saving Time transitions.
|
|
|
|
Nomad 0.11.2 provides a more defined behavior: Nomad evaluates the cron
|
|
expression with respect to the specified time zone during the transition. A 2:30am
|
|
nightly job with `America/New_York` time zone will not run on the day daylight
|
|
saving time starts; similarly, a 1:30am nightly job will run twice on the day
|
|
daylight saving time ends. See the [Daylight Saving Time][dst] documentation
|
|
for details.
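
The 2:30am case above corresponds to a `periodic` block like this sketch. The `prohibit_overlap` setting is an illustrative addition, not something the change requires.

```hcl
periodic {
  # Nightly at 02:30, evaluated in the configured time zone.
  cron      = "30 2 * * *"
  time_zone = "America/New_York"

  # Illustrative: avoids overlapping runs if a launch is still in progress.
  prohibit_overlap = true
}
```

Jobs that must run exactly once per day regardless of transitions may be better scheduled outside the 1:00am to 3:00am window or in UTC.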
|
|
|
|
## Nomad 0.11.0
|
|
|
|
### client.template: `vault_grace` deprecation
|
|
|
|
Nomad 0.11.0 updates
|
|
[consul-template](https://github.com/hashicorp/consul-template) to v0.24.1. This
|
|
library deprecates the [`vault_grace`][vault_grace] option for templating
|
|
included in Nomad. The feature has been ignored since Vault 0.5 and as long as
|
|
you are running a more recent version of Vault, you can safely remove
|
|
`vault_grace` from your Nomad jobs.
|
|
|
|
### Rkt Task Driver Removed
|
|
|
|
The `rkt` task driver has been deprecated and removed from Nomad. While the code
|
|
is available in an external repository,
|
|
<https://github.com/hashicorp/nomad-driver-rkt>, it will not be maintained as
|
|
`rkt` is [no longer being developed upstream](https://github.com/rkt/rkt). We
|
|
encourage all `rkt` users to find a new task driver as soon as possible.
|
|
|
|
## Nomad 0.10.8
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.10.8 includes a security fix for the handling of Docker volume mounts.
|
|
Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
|
|
could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with type
|
|
"volume" when set to `false`.
|
|
|
|
This change impacts jobs that use Docker `mounts` with type "volume", as shown
|
|
below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.10.6
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.10.6 includes backported security fixes for privilege escalation
|
|
vulnerabilities in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt-out of this
|
|
protection with the
|
|
[`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
|
|
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths
|
|
do not escape the file sandbox. It was possible to use interpolation to
|
|
bypass this validation. The client now interpolates the paths before
|
|
checking if they are in the file sandbox.
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.10.6, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.10.7. To work around the bug, use a relative path.
|
|
|
|
## Nomad 0.10.4
|
|
|
|
### Same-Node Scheduling Penalty Removed
|
|
|
|
Nomad 0.10.4 includes a fix to the scheduler that removes the same-node penalty
|
|
for allocations that have not previously failed. In earlier versions of Nomad,
|
|
the node where an allocation was running was penalized from receiving updated
|
|
versions of that allocation, resulting in a higher chance of the allocation
|
|
being placed on a new node. This was changed so that the penalty only applies to
|
|
nodes where the previous allocation has failed or been rescheduled, to reduce
|
|
the risk of correlated failures on a host. Scheduling weighs a number of
|
|
factors, but this change should reduce movement of allocations that are being
|
|
updated from a healthy state. You can view the placement metrics for an
|
|
allocation with `nomad alloc status -verbose`.
|
|
|
|
### Additional Environment Variable Filtering
|
|
|
|
Nomad will by default prevent certain environment variables set in the client
|
|
process from being passed along into launched tasks. The `CONSUL_HTTP_TOKEN`
|
|
environment variable has been added to the default list. More information can
|
|
be found in the `env.blacklist` [configuration](/nomad/docs/configuration/client#env-blacklist).
|
|
|
|
## Nomad 0.10.3

### mTLS Certificate Validation

Nomad 0.10.3 includes a fix for a privilege escalation vulnerability in
validating TLS certificates for RPC with mTLS. Nomad RPC endpoints validated
that TLS client certificates had not expired and were signed by the same CA as
the Nomad node, but did not correctly check the certificate's name for the role
and region as described in the [Securing Nomad with TLS][tls-guide] guide. This
allowed trusted operators with a client certificate signed by the CA to send RPC
calls as a Nomad client or server node, bypassing access control and accessing
any secrets available to a client.

Nomad clusters configured for mTLS following the [Securing Nomad with
TLS][tls-guide] guide or the [Vault PKI Secrets Engine
Integration][tls-vault-guide] guide should already have certificates that will
pass validation. Before upgrading to Nomad 0.10.3, operators using mTLS with
`verify_server_hostname = true` should confirm that the common name or SAN of
all Nomad client node certs is `client.<region>.nomad`, and that the common name
or SAN of all Nomad server node certs is `server.<region>.nomad`.

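For orientation, a client agent's `tls` block with hostname verification
enabled might look like the sketch below. The file paths are placeholders, not
values from this guide; the point is that the certificate referenced by
`cert_file` needs to carry the role and region name described above.

```hcl
# Sketch of a client agent's tls block; all file paths are hypothetical.
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/client.pem" # CN or SAN: client.<region>.nomad
  key_file  = "/etc/nomad.d/tls/client-key.pem"

  # With this enabled, certificate names are checked for role and region.
  verify_server_hostname = true
}
```

A server agent would use the same block with a certificate named
`server.<region>.nomad`.
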
### Connection Limits Added

Nomad 0.10.3 introduces the [limits][] agent configuration parameters for
mitigating denial of service attacks from users who are not authenticated via
mTLS. The default limits block is:

```hcl
limits {
  https_handshake_timeout   = "5s"
  http_max_conns_per_client = 100
  rpc_handshake_timeout     = "5s"
  rpc_max_conns_per_client  = 100
}
```

If your Nomad agent's endpoints are protected from unauthenticated users via
other mechanisms, these limits may be safely disabled by setting them to `0`.

However, the defaults were chosen to be safe for a wide variety of Nomad
deployments and may protect against accidental abuses of the Nomad API that
could cause unintended resource usage.

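A minimal sketch of a fully disabled `limits` block follows. Writing the
timeouts as the duration string `"0s"` is an assumption based on the string
type used by the defaults; check the [limits][] reference for the exact
accepted form.

```hcl
# Sketch: disable connection limits on an agent that is already protected
# from unauthenticated traffic by other means.
limits {
  https_handshake_timeout   = "0s" # assumed duration-string form of 0
  http_max_conns_per_client = 0
  rpc_handshake_timeout     = "0s" # assumed duration-string form of 0
  rpc_max_conns_per_client  = 0
}
```
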
## Nomad 0.10.2

### Preemption Panic Fixed

Nomad 0.9.7 and 0.10.2 fix a [server-crashing bug][gh-6787] present in scheduler
preemption since 0.9.0. Users unable to immediately upgrade Nomad can [disable
preemption][preemption-api] to avoid the panic.

### Dangling Docker Container Cleanup

Nomad 0.10.2 addresses an issue occurring in heavily loaded clients, where
containers are started without being properly managed by Nomad. Nomad 0.10.2
introduces a reaper that detects and kills such containers.

Operators may opt to run the reaper in dry-run mode or disable it through the
client configuration.

For more information, see [Docker Dangling containers][dangling-containers].

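As an illustration, the reaper could be switched to dry-run mode with a Docker
plugin block along these lines. The field names follow the Docker driver
documentation linked above; treat the values as a sketch and confirm them for
your Nomad version.

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        enabled = true # set to false to turn the reaper off entirely
        dry_run = true # only log the containers that would be killed
      }
    }
  }
}
```
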
## Nomad 0.10.0

### Deployments

Nomad 0.10 enables rolling deployments for service jobs by default and adds a
default update block when a service job is created or updated. This does not
affect jobs with an update block.

In pre-0.10 releases, when updating a service job without an update block, all
existing allocations are stopped while new allocations start up, and this may
cause a service degradation or an outage. You can regain this behavior and
disable deployments by setting `max_parallel` to 0.

For more information, see [`update` block][update].

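A minimal sketch of a job that opts out of deployments is shown below. The job
name is a placeholder and the rest of the job is omitted.

```hcl
job "example" {
  # datacenters, groups, and tasks omitted for brevity

  update {
    # 0 disables deployments and restores the pre-0.10 update behavior.
    max_parallel = 0
  }
}
```
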
## Nomad 0.9.5

### Template Rendering

Nomad 0.9.5 includes security fixes for privilege escalation vulnerabilities in
handling of job `template` blocks:

- The client host's environment variables are now cleaned before rendering the
  template. If a template includes the `env` function, the job should include an
  [`env`](/nomad/docs/job-specification/env) block to allow access to the variable in
  the template.

- The `plugin` function is no longer permitted by default and will raise an
  error if used in a template. Operators can opt in to permitting this function
  with the new
  [`template.function_blacklist`](/nomad/docs/configuration/client#template-parameters)
  field in the client configuration.

- The `file` function has been changed to restrict paths to fall inside the task
  directory by default. Paths that used the `NOMAD_TASK_DIR` environment
  variable to prefix file paths should work unchanged. Relative paths or
  symlinks that point outside the task directory will raise an error. An
  operator can opt out of this protection with the new
  [`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
  field in the client configuration.

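The two client-side opt-outs described above might be expressed as in the
sketch below. Both settings weaken the protections added in this release, so
this is shown only to illustrate where the fields live; the chosen values are
assumptions, not recommendations.

```hcl
client {
  template {
    # An empty list permits every template function, including `plugin`.
    function_blacklist = []

    # true lets the `file` function read paths outside the task directory.
    disable_file_sandbox = true
  }
}
```
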
## Nomad 0.9.0

### Preemption

Nomad 0.9 adds preemption support for system jobs. If a system job is submitted
that has a higher priority than other running jobs on the node, and the node
does not have capacity remaining, Nomad may preempt those lower priority
allocations to place the system job. See [preemption][preemption] for more
details.

### Task Driver Plugins

All task drivers have become [plugins][plugins] in Nomad 0.9.0. There are two
user-visible differences between 0.8 and 0.9 drivers:

- [LXC][lxc] is now community supported and distributed independently.

- Task driver [`config`][task-config] blocks are no longer validated by
  the [`nomad job validate`][validate] command. This is a regression that will
  be fixed in a future release.

There is a new method for client driver configuration options, but existing
`client.options` settings are supported in 0.9. See [plugin
configuration][plugin-block] for details.

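To make the difference concrete, the sketch below enables the `raw_exec`
driver in both styles. `raw_exec` is chosen here only as an example; other
drivers have their own option names, documented on each driver's page.

```hcl
# Pre-0.9 style, still supported in 0.9:
client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}

# 0.9 plugin style:
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```
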
#### LXC

LXC is now an external plugin and must be installed separately. See [the LXC
driver's documentation][lxc] for details.

### Structured Logging

Nomad 0.9.0 switches to structured logging. Any log processing on the pre-0.9
log output will need to be updated to match the structured output.

Structured log lines have the format:

```
# <Timestamp> [<Level>] <Component>: <Message>: <KeyN>=<ValueN> ...

2019-01-29T05:52:09.221Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
```

Values containing whitespace will be quoted:

```
... starting plugin: task=redis args="[/opt/gopath/bin/nomad logmon]"
```

### HCL2 Transition

Nomad 0.9.0 begins a transition to [HCL2][hcl2], the next version of the
HashiCorp configuration language. While Nomad has begun integrating HCL2, users
will need to continue to use HCL1 in Nomad 0.9.0 as the transition is
incomplete.

If you interpolate variables in your [`task.config`][task-config] containing
consecutive dots in their name, you will need to change your job specification
to use the `env` map. See the following example:

```hcl
env {
  # Note the multiple consecutive dots
  image...version = "3.2"

  # Valid in both v0.8 and v0.9
  image.version = "3.2"
}

# v0.8 task config block:
task {
  driver = "docker"
  config {
    image = "redis:${image...version}"
  }
}

# v0.9 task config block:
task {
  driver = "docker"
  config {
    image = "redis:${env["image...version"]}"
  }
}
```

This only affects users who interpolate unusual variables with multiple
consecutive dots in their task `config` block. All other interpolation is
unchanged.

Since HCL2 uses dotted object notation for interpolation, users should
transition away from variable names with multiple consecutive dots.

### Downgrading clients

Due to the large refactor of the Nomad client in 0.9, downgrading to a previous
version of the client after upgrading it to Nomad 0.9 is not supported. To
downgrade safely, users should erase the Nomad client's data directory.

### `port_map` Environment Variable Changes

Before Nomad 0.9.0, ports mapped via a task driver's `port_map` block could be
interpolated via the `NOMAD_PORT_<label>` environment variables.

However, in Nomad 0.9.0 no parameters in a driver's `config` block, including
its `port_map`, are available for interpolation. This means
`{{ env "NOMAD_PORT_<label>" }}` in a `template` block or
`HTTP_PORT = "${NOMAD_PORT_http}"` in an `env` block will now interpolate the
_host_ ports, not the container's.

Nomad 0.10 introduced Task Group Networking which natively supports port mapping
without relying on task driver specific `port_map` fields. The
[`to`](/nomad/docs/job-specification/network#to) field on group network port blocks
will be interpolated properly. Please see the
[`network`](/nomad/docs/job-specification/network/) block documentation for details.

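A rough sketch of the group-level alternative, as it might look on Nomad 0.10
or later, is shown below. The group, task, and image names and the port number
are placeholders, and `mode = "bridge"` is an assumption about how the group
network is configured.

```hcl
group "web" {
  network {
    mode = "bridge" # assumed network mode for this sketch

    port "http" {
      to = 8080 # port inside the allocation's network namespace
    }
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image
    }

    env {
      HTTP_PORT = "${NOMAD_PORT_http}"
    }
  }
}
```
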
## Nomad 0.8.0

### Raft Protocol Version Compatibility

When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
to set the [`raft_protocol`] option in
their `server` block to 1 in order to maintain backwards compatibility with the
old servers during the upgrade. After the servers have been migrated to version
0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted to match
the default.

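For example, a server block pinned for the first phase of such an upgrade
might look like the following sketch. The `bootstrap_expect` value is a
placeholder for your own cluster size.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3 # hypothetical cluster size

  # Keep at 1 while pre-0.8.0 servers remain in the cluster. Raise to 2 and
  # restart the servers once all of them run 0.8.0.
  raft_protocol = 1
}
```
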
The Raft protocol must be stepped up in this way; only adjacent version numbers
are compatible (for example, version 1 cannot talk to version 3). Here is a
table of the Raft Protocol versions supported by each Nomad version:

<table>
  <thead>
    <tr>
      <th>Version</th>
      <th>Supported Raft Protocols</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0.6 and earlier</td>
      <td>0</td>
    </tr>
    <tr>
      <td>0.7</td>
      <td>1</td>
    </tr>
    <tr>
      <td>0.8 and later</td>
      <td>1, 2, 3</td>
    </tr>
  </tbody>
</table>

In order to enable all
[Autopilot](/nomad/tutorials/manage-clusters/autopilot) features, all
servers in a Nomad cluster must be running with Raft protocol version 3 or
later.

### Node Draining Improvements

Node draining via the [`node drain`][drain-cli] command or the [drain
API][drain-api] has been substantially changed in Nomad 0.8. In Nomad 0.7.1 and
earlier, draining a node would immediately stop all allocations on the node
being drained. Nomad 0.8 now supports a [`migrate`][migrate] block in job
specifications to control how many allocations may be migrated at once. Existing
jobs use the default `migrate` settings.

The `drain` command now blocks until the drain completes. To get the Nomad 0.7.1
and earlier drain behavior, use the command
`nomad node drain -enable -force -detach <node-id>`.

See the [`migrate` block documentation][migrate] and [Decommissioning Nodes
guide](/nomad/tutorials/manage-clusters/node-drain) for details.

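As a sketch, a `migrate` block that moves one allocation at a time could look
like this. The values shown are believed to match the documented defaults, but
treat them as illustrative and confirm them in the `migrate` block
documentation.

```hcl
group "web" { # hypothetical group name
  migrate {
    max_parallel     = 1        # allocations migrated at once during a drain
    health_check     = "checks" # wait for health checks to pass
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```
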
### Periods in Environment Variable Names No Longer Escaped

_Applications which expect periods in environment variable names to be replaced
with underscores must be updated._

In Nomad 0.7, periods (`.`) in environment variable names were replaced with an
underscore in both the [`env`](/nomad/docs/job-specification/env) and
[`template`](/nomad/docs/job-specification/template) blocks.

In Nomad 0.8, periods are _not_ replaced and will be included in environment
variables verbatim.

For example, consider the following block:

```text
env {
  registry.consul.addr = "${NOMAD_IP_http}:8500"
}
```

In Nomad 0.7, this would be exposed to the task as
`registry_consul_addr=127.0.0.1:8500`. In Nomad 0.8 it will now appear exactly
as specified: `registry.consul.addr=127.0.0.1:8500`.

### Client APIs Unavailable on Older Nodes

Because Nomad 0.8 uses a new RPC mechanism to route node-specific APIs like
[`nomad alloc fs`](/nomad/docs/commands/alloc/fs) through servers to the node,
0.8 CLIs cannot use these commands against clients older than 0.8.

To access these commands on older clients, either continue to use a pre-0.8
version of the CLI, or upgrade all clients to 0.8.

### CLI Command Changes

Nomad 0.8 has changed the organization of CLI commands to be based on
subcommands. An example of this change is the change from `nomad alloc-status`
to `nomad alloc status`. All commands have been made to be backwards compatible,
but operators should update any usage of the old style commands to the new style
as the old style will be deprecated in future versions of Nomad.

### RPC Advertise Address

The behavior of the [advertised RPC address](/nomad/docs/configuration#rpc-1) has
changed to be only used to advertise the RPC address of servers to client nodes.
Server to server communication is done using the advertised Serf address.
Existing clusters should not be affected, but the advertised RPC address may
need to be updated to allow clients to connect over a NAT.

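A server behind a NAT might advertise its externally reachable RPC address as
in this sketch. The IP address comes from a documentation-only range and
stands in for your own public address.

```hcl
advertise {
  # 203.0.113.10 is a placeholder for the address clients use to reach
  # this server through the NAT.
  rpc = "203.0.113.10:4647"
}
```
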
## Nomad 0.6.0

### Default `advertise` address changes

When no `advertise` address was specified and Nomad's `bind_addr` was loopback
or `0.0.0.0`, Nomad attempted to resolve the local hostname to use as an
advertise address.

Many hosts cannot properly resolve their hostname, so Nomad 0.6 defaults
`advertise` to the first private IP on the host (e.g. `10.1.2.3`).

If you manually configure `advertise` addresses, no changes are necessary.

### Nomad Clients

The change to the default advertised IP also affects clients that do not specify
which `network_interface` to use. If you have several routable IPs, it is advised
to configure the client's [network
interface](/nomad/docs/configuration/client#network_interface) such that tasks bind to
the correct address.

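For instance, a client could be pinned to a specific interface as sketched
below. The interface name `eth1` is a placeholder; use the interface that
carries the address tasks should bind to.

```hcl
client {
  enabled = true

  # Hypothetical interface name; pick the one with the desired routable IP.
  network_interface = "eth1"
}
```
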
## Nomad 0.5.5

### Docker `load` changes

Nomad 0.5.5 has a backward incompatible change in the `docker` driver's
configuration. Prior to 0.5.5 the `load` configuration option accepted a list
of images to load; in 0.5.5 it has been changed to a single string. No
functionality was changed. Even if more than one item was specified prior to
0.5.5, only the first item was used.

To do a zero-downtime deploy with jobs that use the `load` option:

- Upgrade servers to version 0.5.5 or later.

- Deploy new client nodes on the same version as the servers.

- Resubmit jobs with the `load` option fixed and a constraint to only run on
  version 0.5.5 or later:

  ```hcl
  constraint {
    attribute = "${attr.nomad.version}"
    operator  = "version"
    value     = ">= 0.5.5"
  }
  ```

- Drain and shut down old client nodes.

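The `load` change itself can be sketched as follows. The task name, image, and
archive file name are placeholders; only the shape of the `load` value reflects
the change described above.

```hcl
task "cache" { # hypothetical task
  driver = "docker"

  config {
    image = "redis:3.2"

    # Before 0.5.5: load = ["redis.tar"]
    load = "redis.tar"
  }
}
```
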
### Validation changes

Due to internal job serialization and validation changes, you may run into
issues using 0.5.5 command line tools such as `nomad run` and `nomad validate`
with 0.5.4 or earlier agents.

It is recommended that you upgrade agents before or alongside your command line
tools.

## Nomad 0.4.0

Nomad 0.4.0 has backward incompatible changes in the logic for Consul
deregistration. When a Task which was started by Nomad v0.3.x is uncleanly shut
down, the Nomad 0.4 Client will no longer clean up any stale services. If an
in-place upgrade of the Nomad client to 0.4 prevents the Task from gracefully
shutting down and deregistering its Consul-registered services, the Nomad Client
will not clean up the remaining Consul services registered with the 0.3
Executor.

We recommend draining a node before upgrading to 0.4.0 and then re-enabling the
node once the upgrade is complete.

## Nomad 0.3.1

Nomad 0.3.1 removes artifact downloading from driver configurations and makes it
a first-class element of the task. As such, jobs will have to be rewritten in
the proper format and resubmitted to Nomad. Nomad clients will properly
re-attach to existing tasks but job definitions must be updated before they can
be dispatched to clients running 0.3.1.

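In the new format, the download is declared in an `artifact` block on the task
rather than inside the driver `config`, roughly as sketched here. The task
name, URL, and command are placeholders, and how the downloaded file is
referenced from `command` is an assumption to verify against the artifact
documentation.

```hcl
task "example" { # hypothetical task
  driver = "exec"

  artifact {
    source = "https://example.com/example-binary.tar.gz" # hypothetical URL
  }

  config {
    command = "example-binary" # assumed path to the downloaded binary
  }
}
```
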
## Nomad 0.3.0

Nomad 0.3.0 has made several substantial changes to job files, including a new
`log` block and variable interpretation syntax (`${var}`), a modified `restart`
policy syntax, and minimum resources for tasks as well as validation. These
changes require a slight change to the default upgrade flow.

After upgrading the version of the servers, all previously submitted jobs must
be resubmitted with the updated job syntax using a Nomad 0.3.0 binary.

- All instances of `$var` must be converted to the new syntax of `${var}`

- All tasks must provide their required resources for CPU, memory and disk as
  well as required network usage if ports are required by the task.

- Restart policies must be updated to indicate whether it is desired for the
  task to restart on failure or to fail using `mode = "delay"` or `mode = "fail"` respectively.

- Service names that include periods will fail validation. To fix, remove any
  periods from the service name before running the job.

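The points above can be pictured with a job fragment like the one below. It is
only a sketch: the group, task, and binary names and all resource values are
placeholders, and the exact set of fields accepted by 0.3.0 should be checked
against that release's job specification.

```hcl
group "web" { # hypothetical group
  restart {
    attempts = 3
    interval = "10m"
    delay    = "25s"
    mode     = "delay" # or "fail" to stop restarting once attempts run out
  }

  task "server" {
    driver = "exec"

    config {
      command = "/bin/http-server" # hypothetical binary
    }

    env {
      # New ${var} syntax; the old $var form is no longer accepted.
      PORT = "${NOMAD_PORT_http}"
    }

    resources {
      cpu    = 500 # illustrative values
      memory = 256
      disk   = 300

      network {
        mbits = 10
        port "http" {}
      }
    }
  }
}
```
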
After updating the Servers and job files, Nomad Clients can be upgraded by first
draining the node so no tasks are running on it. This can be verified by running
`nomad node status <node-id>` and verifying there are no tasks in the `running`
state. Once that is done, the client can be killed, the `data_dir` should be
deleted, and then Nomad 0.3.0 can be launched.

[api_jobs_parse]: /nomad/api-docs/jobs#parse-job
[artifacts]: /nomad/docs/job-specification/artifact
[artifact_params]: /nomad/docs/job-specification/artifact#artifact-parameters
[cgroups2]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
[cgroup_parent]: /nomad/docs/configuration/client#cgroup_parent
[client_artifact]: /nomad/docs/configuration/client#artifact-parameters
[cores]: /nomad/docs/job-specification/resources#cores
[dangling-containers]: /nomad/docs/drivers/docker#dangling-containers
[drain-api]: /nomad/api-docs/nodes#drain-node
[drain-cli]: /nomad/docs/commands/node/drain
[dst]: /nomad/docs/job-specification/periodic#daylight-saving-time
[envoy_concurrency]: https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-concurrency
[gh-6787]: https://github.com/hashicorp/nomad/issues/6787
[gh-8457]: https://github.com/hashicorp/nomad/issues/8457
[gh-9148]: https://github.com/hashicorp/nomad/issues/9148
[gh-10875]: https://github.com/hashicorp/nomad/pull/10875
[gh-11563]: https://github.com/hashicorp/nomad/issues/11563
[go-client]: https://pkg.go.dev/github.com/hashicorp/nomad/api#Client
[hcl2]: https://github.com/hashicorp/hcl2
[limits]: /nomad/docs/configuration#limits
[lxc]: /nomad/plugins/drivers/community/lxc
[migrate]: /nomad/docs/job-specification/migrate
[nvidia]: /nomad/plugins/devices/nvidia
[plugin-block]: /nomad/docs/configuration/plugin
[plugins]: /nomad/plugins/drivers/community
[preemption-api]: /nomad/api-docs/operator#update-scheduler-configuration
[preemption]: /nomad/docs/concepts/scheduling/preemption
[proxy_concurrency]: /nomad/docs/job-specification/sidecar_task#proxy_concurrency
[`sidecar_task.config`]: /nomad/docs/job-specification/sidecar_task#config
[`raft_protocol`]: /nomad/docs/configuration/server#raft_protocol
[`raft protocol`]: /nomad/docs/configuration/server#raft_protocol
[`rejoin_after_leave`]: /nomad/docs/configuration/server#rejoin_after_leave
[reserved]: /nomad/docs/configuration/client#reserved-parameters
[task-config]: /nomad/docs/job-specification/task#config
[tls-guide]: /nomad/tutorials/transport-security/security-enable-tls
[tls-vault-guide]: /nomad/tutorials/integrate-vault/vault-pki-nomad
[update]: /nomad/docs/job-specification/update
[validate]: /nomad/docs/commands/job/validate
[vault_grace]: /nomad/docs/job-specification/template
[node drain]: /nomad/docs/upgrade#5-upgrade-clients
[`template.disable_file_sandbox`]: /nomad/docs/configuration/client#template-parameters
[template_gid]: /nomad/docs/job-specification/template#gid
[template_uid]: /nomad/docs/job-specification/template#uid
[pki]: /vault/docs/secrets/pki
[`volume create`]: /nomad/docs/commands/volume/create
[`volume register`]: /nomad/docs/commands/volume/register
[`volume`]: /nomad/docs/job-specification/volume
[enterprise licensing]: /nomad/docs/enterprise/license
[`cap_net_raw`]: https://security.stackexchange.com/a/128988
[`linux capabilities`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
[`allow_caps`]: /nomad/docs/drivers/docker#allow_caps
[`extra_hosts`]: /nomad/docs/drivers/docker#extra_hosts
[no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12
[allow_caps_exec]: /nomad/docs/drivers/exec#allow_caps
[allow_caps_java]: /nomad/docs/drivers/java#allow_caps
[cap_add_exec]: /nomad/docs/drivers/exec#cap_add
[cap_drop_exec]: /nomad/docs/drivers/exec#cap_drop
[`log_file`]: /nomad/docs/configuration#log_file
[Upgrading to Raft Protocol 3]: /nomad/docs/upgrade#upgrading-to-raft-protocol-3
[`Local`]: /consul/docs/security/acl/acl-tokens#token-attributes
[anon_token]: /consul/docs/security/acl/acl-tokens#special-purpose-tokens
[consul_acl]: https://github.com/hashicorp/consul/issues/7414
[kill_timeout]: /nomad/docs/job-specification/task#kill_timeout
[max_kill_timeout]: /nomad/docs/configuration/client#max_kill_timeout
[alloc_overlap]: https://github.com/hashicorp/nomad/issues/10440
[gh_10446]: https://github.com/hashicorp/nomad/pull/10446#issuecomment-1224833906
[gh_issue]: https://github.com/hashicorp/nomad/issues/new/choose
[upgrade process]: /nomad/docs/upgrade#upgrade-process
[landlock]: https://docs.kernel.org/userspace-api/landlock.html
[artifact_fs_isolation]: /nomad/docs/configuration/client#disable_filesystem_isolation
[decompression_file_count_limit]: /nomad/docs/configuration/client#decompression_file_count_limit
[decompression_size_limit]: /nomad/docs/configuration/client#decompression_size_limit
[artifact_env]: /nomad/docs/configuration/client#set_environment_variables
[dangling_container_reconciliation]: /nomad/docs/drivers/docker#enabled
[hard_guide]: /nomad/docs/install/production/requirements#hardening-nomad