---
layout: docs
page_title: Upgrade Guides
description: |-
  Specific versions of Nomad may have additional information about the upgrade
  process beyond the standard flow.
---

# Upgrade Guides

The [upgrading page](/nomad/docs/upgrade) covers the details of doing a standard
upgrade. However, specific versions of Nomad may have more details provided for
their upgrades as a result of new features or changed behavior. This page is
used to document those details separately from the standard upgrade flow.

## Nomad 1.6.0

#### Job Evaluate API Endpoint Requires `submit-job` Instead of `read-job`

Nomad 1.6.0 updated the ACL capability requirement for the job evaluate
endpoint from `read-job` to `submit-job` to better reflect that this operation
writes state to Nomad. This endpoint is used by the `nomad job eval` CLI
command, so the ACL requirements changed for the command as well. Users who
called this endpoint or ran this command with tokens that have only the
`read-job` capability or the `read` policy must update their tokens to use the
`submit-job` capability or the `write` policy.

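As a rough sketch, an ACL policy granting the capability now required by the
job evaluate endpoint might look like the following. The `default` namespace is
an assumption for illustration; substitute the namespaces your tokens need.

```hcl
# Assumes the jobs being evaluated live in the "default" namespace.
namespace "default" {
  # "submit-job" is now required to call the job evaluate endpoint
  # or to run `nomad job eval`; "read-job" alone is no longer enough.
  capabilities = ["read-job", "submit-job"]
}
```
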
## Nomad 1.5.1

#### Artifact Download Regression Fix

Nomad 1.5.1 reverts a behavior of 1.5.0 where artifact downloads were executed
as the `nobody` user on compatible Linux systems. This was done optimistically
as a defense against compromised artifact endpoints attempting to exploit the
Nomad Client or the tools it uses to perform downloads, such as git or mercurial.
Unfortunately, running the child process as any user other than root is not
compatible with the advice given in Nomad's [security hardening guide][hard_guide],
which calls for a specific directory tree structure that makes such an operation impossible.

Other changes to artifact downloading remain: downloads are executed as a child
process of the Nomad agent, and on modern Linux systems make use of the kernel's
landlock feature to restrict filesystem access from that process.

## Nomad 1.5.0

#### Pause Container Reconciliation Regression

Nomad 1.5.0 introduced a regression in the way the Docker driver reconciles
dangling containers. Pause containers would be erroneously removed even though
the allocation was still running. This does not affect the running allocation,
but does cause it to fail if it needs to restart. An immediate
workaround is to disable
[dangling container reconciliation][dangling_container_reconciliation].

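A minimal sketch of that workaround in the client's Docker plugin
configuration, assuming the `gc.dangling_containers` block described in the
Docker driver documentation:

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Stop the driver from reconciling (and removing) containers
        # it considers dangling, including pause containers.
        enabled = false
      }
    }
  }
}
```
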
#### Artifact Download Sandboxing

Nomad 1.5.0 changes the way [artifacts] are downloaded when specifying an `artifact`
in a task configuration. Previously the Nomad Client would download artifacts
in-process. External commands used to facilitate the download (e.g. `git`, `hg`)
would be run as `root`, and the resulting payload would be owned by `root` in the
allocation's task directory.

In an effort to improve the resilience and security model of the Nomad Client,
in 1.5.0 artifact downloads occur in a sub-process. Where possible, that
sub-process is run as the `nobody` user, and on modern Linux systems will
be isolated from the filesystem via the kernel's [landlock] capability.

Operators are encouraged to ensure jobs making use of artifacts continue to work
as expected. In particular, git-ssh users will need to make sure the system-wide
`/etc/ssh/ssh_known_hosts` file is populated with any necessary remote hosts.
Previously, Nomad's documentation suggested configuring
`/root/.ssh/known_hosts`, which would apply only to the `root` user.

The artifact downloader no longer inherits all environment variables available
to the Nomad Client. The downloader sub-process environment is set as follows on
Linux / macOS:

```
PATH=/usr/local/bin:/usr/bin:/bin
TMPDIR=<path to task dir>/tmp
```

and as follows on Windows:

```
TMP=<path to task dir>\tmp
TEMP=<path to task dir>\tmp
PATH=<inherit $PATH>
HOMEPATH=<inherit $HOMEPATH>
HOMEDRIVE=<inherit $HOMEDRIVE>
USERPROFILE=<inherit $USERPROFILE>
```

Configuration of the artifact downloader should happen through the [`options`][artifact_params]
and [`headers`][artifact_params] fields of the `artifact` block. For backwards
compatibility, the sandbox can be configured to inherit specified environment variables
from the Nomad client by setting [`set_environment_variables`][artifact_env].

The use of filesystem isolation can be disabled in Client configuration by
setting [`disable_filesystem_isolation`][artifact_fs_isolation].

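The two client options mentioned above might be set as in the sketch below. It
assumes `set_environment_variables` takes a comma-separated string, and the
variable names are purely illustrative.

```hcl
client {
  enabled = true

  artifact {
    # Variables the download sandbox inherits from the Nomad client's
    # environment. These names are examples, not a recommendation.
    set_environment_variables = "HTTP_PROXY,HTTPS_PROXY,NO_PROXY"

    # Uncomment to turn off filesystem isolation for downloads.
    # disable_filesystem_isolation = true
  }
}
```
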
#### Artifact Decompression Limits

Nomad 1.5.0 sets default limits on artifact decompression. A single artifact
payload is now limited to 100GB and 4096 files when decompressed. An artifact that
exceeds these limits during decompression will cause the artifact downloader to
fail. These limits can be adjusted or disabled in the client artifact configuration
by setting [`decompression_size_limit`][decompression_size_limit] and
[`decompression_file_count_limit`][decompression_file_count_limit].

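For reference, a sketch of a client configuration that writes the 1.5.0
defaults out explicitly, so they can be raised or lowered as needed:

```hcl
client {
  artifact {
    # Maximum total size of a single decompressed artifact payload.
    decompression_size_limit = "100GB"

    # Maximum number of files in a single decompressed artifact payload.
    decompression_file_count_limit = 4096
  }
}
```
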
#### Datacenter Wildcards

In Nomad 1.5.0, the
[`datacenters`](/nomad/docs/job-specification/job#datacenters) field for a job
accepts wildcards for multi-character matching. For example,
`datacenters = ["dc*"]` will match all datacenters that start with `"dc"`. The default value
for `datacenters` is now `["*"]`, so the field can be omitted.

The `*` character is no longer a legal character in the
[`datacenter`](/nomad/docs/configuration#datacenter) field for an agent
configuration. Before upgrading to Nomad 1.5.0, you should first ensure that
you've updated any jobs that currently have a `*` in their datacenter name and
then ensure that no agents have this character in their `datacenter` field name.

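A short sketch of a job using a wildcard follows. The job, group, task, and
image names are hypothetical placeholders.

```hcl
# Hypothetical job used only to illustrate the wildcard syntax.
job "example" {
  # Matches any datacenter whose name starts with "dc", such as "dc1".
  # Omitting the field entirely is equivalent to ["*"].
  datacenters = ["dc*"]

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```
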
#### Server `rejoin_after_leave` (default: `false`) now enforced

All Nomad versions prior to v1.5.0 have incorrectly ignored the Server
[`rejoin_after_leave`] configuration option. This bug has been fixed in Nomad
version v1.5.0.

Prior to v1.5.0, Nomad always behaved as if `rejoin_after_leave` were `true`,
regardless of the Nomad server configuration, while the documentation incorrectly
indicated a default of `false`.

Cluster operators should be aware that explicit `leave` events (such as
`nomad server force-leave`) will now result in behavior which matches this
configuration, and should review whether they were inadvertently relying on the
buggy behavior.

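Operators who relied on the old behavior can opt back in explicitly. A minimal
sketch of the server configuration:

```hcl
server {
  enabled = true

  # Restores the pre-1.5.0 effective behavior: the server attempts to
  # rejoin the cluster on start even after an explicit leave.
  rejoin_after_leave = true
}
```
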
#### Changes to eval broker metrics

The metric `nomad.nomad.broker.total_blocked` has been changed to
`nomad.nomad.broker.total_pending`. This state refers to the internal state of the
leader's broker, and the previous name was easily confused with the unrelated
evaluation status `"blocked"` in the Nomad API.

#### Deprecated gossip keyring commands removed

The commands `nomad operator keyring`, `nomad keyring`, `nomad operator keygen`,
and `nomad keygen` used to manage the gossip keyring were marked as deprecated
in Nomad 1.4.0. In Nomad 1.5.0, these commands have been removed. Use the
`nomad operator gossip keyring` commands to manage the gossip keyring.

#### Garbage collection of evaluations and allocations for batch job

Versions prior to 1.5.0 only delete evaluations and allocations of batch jobs
that are explicitly stopped, which can lead to unbounded memory growth of Nomad
when the batch job is executed multiple times.

Nomad 1.5.0 introduces a new server configuration
[`batch_eval_gc_threshold`](/nomad/docs/configuration/server#batch_eval_gc_threshold)
to control how allocations and evaluations for batch jobs are collected.

The default threshold is `24h`. If you need to access completed allocations for
batch jobs that are older than 24h you must increase this value when upgrading
Nomad.

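For example, a server configuration that keeps completed batch allocations and
evaluations for three days could be sketched as follows. The `72h` value is
only an illustration.

```hcl
server {
  enabled = true

  # Keep batch job evaluations and allocations longer than the 24h default.
  batch_eval_gc_threshold = "72h"
}
```
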
## Nomad 1.4.5, 1.3.10

#### Pause Container Reconciliation Regression

Nomad 1.4.5 and 1.3.10 introduced a regression in the way the Docker driver
reconciles dangling containers. Pause containers would be erroneously
removed even though the allocation was still running. This does not affect the
running allocation, but does cause it to fail if it needs to restart. An immediate
workaround is to disable
[dangling container reconciliation][dangling_container_reconciliation].

## Nomad 1.4.4, 1.3.9

#### Garbage collection of evaluations and allocations for batch job

Versions prior to 1.4.4 and 1.3.9 only delete evaluations and allocations of
batch jobs that are explicitly stopped, which can lead to unbounded memory
growth of Nomad when the batch job is executed multiple times.

Nomad 1.4.4 and 1.3.9 introduce a new server configuration
[`batch_eval_gc_threshold`](/nomad/docs/configuration/server#batch_eval_gc_threshold)
to control how allocations and evaluations for batch jobs are collected.

The default threshold is `24h`. If you need to access completed allocations for
batch jobs that are older than 24h you must increase this value when upgrading
Nomad.

## Nomad 1.4.0

#### Possible Panic During Upgrades

Nomad 1.4.0 initializes a keyring on the leader if one has not been previously
created, which writes a new raft entry. Users have reported that the keyring
initialization can cause a panic on older servers during upgrades. Following the
documented [upgrade process][] closely will reduce the risk of this panic. However,
if a server with version 1.4.0 becomes leader while servers with versions before
1.4.0 are still in the cluster, the older servers will panic.

The most likely scenario for this is if the leader is still on a version before
1.4.0 and is netsplit from the rest of the cluster or the server is restarted
without upgrading, and one of the 1.4.0 servers becomes the leader.

You can recover from the panic by immediately upgrading the old servers. This
bug was fixed in Nomad 1.4.1.

#### Raft Protocol Version 2 Unsupported

Raft protocol version 2 was deprecated in Nomad v1.3.0 and is removed
in Nomad v1.4.0. In Nomad 1.3.0, the default raft protocol version was updated
to version 3, and Nomad 1.4.0 requires the use of raft protocol version 3.
If the [`raft_protocol`] version is explicitly set, it must now be set to `3`.
For more information see the [Upgrading to Raft Protocol 3] guide.

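If your server configuration pins the protocol version, the only accepted
value is now `3`. A minimal sketch:

```hcl
server {
  enabled = true

  # Any explicitly set value other than 3 is no longer supported.
  # Removing the setting entirely also works, since 3 is the default.
  raft_protocol = 3
}
```
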
#### Audit logs filtering logic changed

Audit Log filtering in previous versions of Nomad handled `stages` and
`operations` filters as `OR` filters. If _either_ condition was met, the logs
would be filtered. As of 1.4.0, `stages` and `operations` are treated as `AND`
filters. Logs will only be filtered if all filter conditions match.

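A sketch of a filter that sets both fields, assuming the `filter` block syntax
from the audit configuration reference. The filter name and endpoint are
illustrative.

```hcl
audit {
  enabled = true

  # Hypothetical filter name.
  filter "metrics-reads" {
    type      = "HTTPEvent"
    endpoints = ["/v1/metrics"]

    # As of 1.4.0 an event is filtered only when it matches BOTH the
    # stage and the operation below. Before 1.4.0, matching either one
    # was enough to filter it.
    stages     = ["OperationReceived"]
    operations = ["GET"]
  }
}
```
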
#### Prevent Overlapping New Allocations with Stopping Allocations

Prior to Nomad 1.4.0 the scheduler would consider the resources used by
allocations that are in the process of stopping to be free for new allocations
to use. This could cause newer allocations to crash when they try to use TCP
ports or memory used by an allocation in the process of stopping. The new and
stopping [allocations would "overlap" improperly.][alloc_overlap]

[Nomad 1.4.0 fixes this behavior][gh_10446] so that an allocation's resources
are only considered free for reuse once the client node the allocation was
running on reports it has stopped. Technically speaking: only once the
`Allocation.ClientStatus` has reached a terminal state (`complete`, `failed`,
or `lost`).

Despite this being a bug fix, it is considered a significant enough change in
behavior to reserve for a major Nomad release and *not* be backported. Please
report any negative side effects encountered as [new
issues.][gh_issue]

#### `nomad eval status -json` Without Evaluation ID Removed

Using `nomad eval status -json` without providing an evaluation ID was
deprecated in Nomad 1.2.4 with the intent to remove in Nomad 1.4.0. This option
has been removed. You can use `nomad eval list` to get a list of evaluations and
can use `nomad eval list -json` to get that list in JSON format. The
`nomad eval status <eval ID>` command will format a specific evaluation in JSON format if
the `-json` flag is provided.

#### Removing Vault/Consul from Clients

Nomad clients no longer have their Consul and Vault fingerprints cleared when
connectivity is lost with Consul and Vault. To intentionally remove Consul and
Vault from a client node, you will need to restart the Nomad client agent.

#### Numeric Operand Comparisons in Constraints

Prior to Nomad 1.4.0 the `<`, `<=`, `>`, and `>=` operators in a constraint would always
compare the operands lexically. This behavior has been changed so that the comparison
is done numerically if both operands are integers or floats.

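To illustrate the difference, consider the sketch below. It assumes a client
that fingerprints 10 CPU cores; the attribute and threshold are chosen only
for illustration.

```hcl
constraint {
  attribute = "${attr.cpu.numcores}"
  operator  = ">="
  value     = "8"

  # On a node reporting "10" cores:
  #   lexical comparison (before 1.4.0): "10" sorts before "8", so the
  #   constraint is not satisfied.
  #   numeric comparison (1.4.0 and later): 10 >= 8, so it is satisfied.
}
```
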
## Nomad 1.3.3

Environments that don't support the use of [`uid`][template_uid] and
[`gid`][template_gid] in `template` blocks, such as Windows clients, may
experience task failures with the following message after upgrading to Nomad
1.3.3:

```
Template failed: error rendering "(dynamic)" => "...": failed looking up user: managing file ownership is not supported on Windows
```

It is recommended to avoid this version of Nomad in such environments.

## Nomad 1.3.2, 1.2.9, 1.1.15

#### Client `max_kill_timeout` now enforced

Nomad versions since v0.9 have incorrectly ignored the Client [`max_kill_timeout`][max_kill_timeout]
configuration option. This bug has been fixed in Nomad versions v1.3.2,
v1.2.9, and v1.1.15. Job submitters should be aware that a Task's [`kill_timeout`][kill_timeout]
will be reduced to the Client's `max_kill_timeout` if the value exceeds the maximum.

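As a sketch of how the cap applies, consider a client configured as below. The
values are illustrative.

```hcl
client {
  enabled = true

  # Upper bound applied to every task's kill_timeout on this client.
  # A task that requests kill_timeout = "5m" is now reduced to 30s here.
  max_kill_timeout = "30s"
}
```
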
## Nomad 1.3.1, 1.2.8, 1.1.14

#### Default `artifact` limits

Nomad 1.3.1, 1.2.8, and 1.1.14 introduced mechanisms to limit the size of
`artifact` downloads and how long these operations can take. The limits are
defined in the new [`artifact` client configuration][client_artifact] and have
predefined default values.

While the defaults set are fairly large, it is recommended to double-check them
prior to upgrading your Nomad clients to make sure they fit your needs.

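A sketch of overriding a few of these limits in the client `artifact` block
follows. The option names are taken from the client configuration reference as
best understood here and the values are illustrative; confirm both against the
reference for your version.

```hcl
client {
  artifact {
    # Maximum time an HTTP download may spend reading the response.
    http_read_timeout = "30m"

    # Maximum size of an HTTP download.
    http_max_size = "100GB"

    # Maximum time a git clone may take.
    git_timeout = "30m"
  }
}
```
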
## Nomad 1.3.0

#### Raft Protocol Version 2 Deprecation

Raft protocol version 2 will be removed from Nomad in the next major
release of Nomad, 1.4.0.

In Nomad 1.3.0, the default raft protocol version has been updated to 3.
If the [`raft_protocol`] version is not explicitly set, upgrading a
server will automatically upgrade that server's raft protocol. See the
[Upgrading to Raft Protocol 3] guide.

#### Client State Store

The client state store will be automatically migrated to a new schema
version when upgrading a client.

Downgrading to a previous version of the client after upgrading it to
Nomad 1.3 is not supported. To downgrade safely, users should drain
all tasks from the Nomad client and erase its data directory.

#### CSI Plugins

The client filesystem layout for CSI plugins has been updated to
correctly handle the lifecycle of multiple allocations serving the
same plugin. Running plugin tasks will not be updated after upgrading
the client, but it is recommended to redeploy CSI plugin jobs after
upgrading the cluster.

The directory for plugin control sockets will be mounted from a new
per-allocation directory in the client data dir. This will still be
bind-mounted to `csi_plugin.mount_config` as in versions of Nomad
prior to 1.3.0.

The volume staging directory for new CSI plugin tasks will now be
mounted to the task's `NOMAD_TASK_DIR` instead of the
`csi_plugin.mount_config`.

#### Raft leadership transfer on error

Starting with Nomad 1.3.0, when a Nomad server is elected the Raft leader but
fails to complete the process to start acting as the Nomad leader, it will
attempt to gracefully transfer its Raft leadership status to another eligible
server in the cluster. This operation is only supported when using Raft
Protocol Version 3.

#### Server Raft Database

The server raft database in `raft.db` will be automatically migrated to a new
underlying implementation provided by `go.etcd.io/bbolt`. Downgrading to a previous
version of the server after upgrading it to Nomad 1.3 is not supported. As with
any Nomad upgrade, it is recommended to take a snapshot of your database prior to
upgrading in case a downgrade becomes necessary.

The new database implementation enables a new server configuration option for
controlling the underlying freelist-sync behavior. Clusters experiencing extreme
disk IO on servers may want to consider disabling freelist-sync to reduce load.
The tradeoff is longer server startup times, as the database must be completely
scanned to re-build the freelist from scratch.

```hcl
server {
  raft_boltdb {
    no_freelist_sync = true
  }
}
```

#### Changes to the `nomad server members` command

The standard output of the `nomad server members` command replaces the previous
`Protocol` column that indicated the Serf protocol version with a new column
named `Raft Version` which outputs the Raft protocol version defined in each
server.

The `-detailed` flag is now called `-verbose` and outputs the standard values
in addition to extra information. The previous name is still supported but may
be removed in future releases.

The previous `Protocol` value can be viewed using the `-verbose` flag.

#### Changes to `client.template.function_denylist` configuration

consul-template v0.28 added a new function
[`writeToFile`](https://github.com/hashicorp/consul-template/blob/v0.28.0/docs/templating-language.md#writeToFile)
which can write to arbitrary files on the host.

Nomad 1.3.0 disables this function by default in its
[`function_denylist`](/nomad/docs/configuration/client#function_denylist).

However, *if you have overridden the default `template.function_denylist` in
your client configuration, you must add `writeToFile` to your denylist.*
Failing to do so allows templates to write to arbitrary paths on the host.

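A sketch of an overridden denylist that includes the new function. The
`plugin` entry stands in for whatever your existing custom list already
contains.

```hcl
client {
  template {
    # A custom list replaces the default entirely, so writeToFile
    # must be listed explicitly to stay disabled.
    function_denylist = ["plugin", "writeToFile"]
  }
}
```
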
#### Changes to Envoy metrics labels

When using Envoy as a sidecar proxy for Connect enabled services, Nomad will now
automatically inject the unique allocation ID into Envoy's stats tags configuration.
Users who wish to set the tag values themselves may do so using the [`proxy.config`](/nomad/docs/job-specification/proxy#config)
block.

```hcl
connect {
  sidecar_service {
    proxy {
      config {
        envoy_stats_tags = ["nomad.alloc_id=<allocID>"]
      }
    }
  }
}
```

#### Changes to Consul Connect Service Identity Tokens

Starting with Nomad 1.3.0, Consul Service Identity Tokens created automatically
by Nomad on behalf of Connect services will now be created as [`Local`] tokens. These
tokens will no longer be replicated globally. To facilitate cross-Consul datacenter
requests of Connect services registered by Nomad, Consul agents will need to be
configured with [default anonymous][anon_token] ACL tokens with ACL policies of
sufficient permissions to read service and node metadata pertaining to those
requests. This mechanism is described in Consul [#7414][consul_acl].
A typical Consul agent anonymous token may contain an ACL policy such as:

```hcl
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }
```

The minimum version of Consul supported by Nomad's Connect integration is now Consul v1.8.0.

#### Changes to task groups that utilise Consul services and checks

Starting with Nomad 1.3.0, services and checks that utilise Consul will have an
automatic constraint placed upon the task group. This ensures they are placed
on a client with a Consul agent running that meets a minimum version
requirement. The minimum version of Consul supported by Nomad's service and
check blocks is now Consul v1.7.0.

#### Linux Control Groups Version 2

Starting with Nomad 1.3.0, Linux systems configured to use [cgroups v2][cgroups2]
are now supported. A Nomad client will only activate its v2 control groups manager
if the system is configured with the cgroups2 controller mounted at `/sys/fs/cgroup`.

* Systems that do not support cgroups v2 are not affected.
* Systems configured in hybrid mode typically mount the cgroups2
  controller at `/sys/fs/cgroup/unified`, so Nomad will continue to
  use cgroups v1 for these hosts.
* Systems configured with only cgroups v2 now correctly support setting cpu [cores].

Nomad will preserve the existing cgroup for tasks when a client is
upgraded, so there will be no disruption to tasks. A new client
attribute `unique.cgroup.version` indicates which version of control
groups Nomad is using.

When cgroups v2 are in use, Nomad uses `nomad.slice` as the [default parent][cgroup_parent] for cgroups
created on behalf of tasks. The cgroup created for a task is named in the form `<allocID>.<task>.scope`.
These cgroups are created by Nomad before a task starts. External task drivers that support
containerization should be updated to make use of the new cgroup locations.

The new cgroup file system layout will look like the following:

```shell-session
➜ tree -d /sys/fs/cgroup/nomad.slice
/sys/fs/cgroup/nomad.slice
├── 8b8da4cf-8ebf-b578-0bcf-77190749abf3.redis.scope
└── a8c8e495-83c8-311b-4657-e6e3127e98bc.example.scope
```

#### Support for pre-0.9 Tasks Removed

Running tasks that were created on clusters from Nomad version 0.9 or
earlier will fail to restore after upgrading a cluster to Nomad
1.3.0. To safely upgrade without unplanned interruptions, force these
tasks to be rescheduled by `nomad alloc stop` before upgrading. Note
this only applies to tasks that have been running continuously from
before 0.9 without rescheduling. Jobs that were created before 0.9 but
have had tasks replaced over time after 0.9 will operate normally
during the upgrade.

## Nomad 1.2.6, 1.1.12, and 1.0.18

#### ACL requirement for the job parse endpoint

Nomad 1.2.6, 1.1.12, and 1.0.18 require ACL authentication for the
[job parse][api_jobs_parse] API endpoint. The `parse-job` capability has been
created to allow access to this endpoint. The `submit-job`, `read`, and `write`
policies include this capability.

The capability must be enabled for the namespace used in the API request.

## Nomad 1.2.4

#### `nomad eval status -json` deprecated

Nomad 1.2.4 includes a new `nomad eval list` command that has the
option to display the results in JSON format with the `-json`
flag. This replaces the existing `nomad eval status -json` option. In
Nomad 1.4.0, `nomad eval status -json` will be changed to display only
the selected evaluation in JSON format.

## Nomad 1.2.2

#### Panic on node class filtering for system and sysbatch jobs fixed

Nomad 1.2.2 fixes a [server crashing bug][gh-11563] present in the scheduler
node class filtering since 1.2.0. Users should upgrade to Nomad 1.2.2 to avoid
this problem.

## Nomad 1.2.0

#### Nvidia device plugin

The Nvidia device is now an external plugin and must be installed separately.
Refer to [the Nvidia device plugin's documentation][nvidia] for details.

#### ACL requirements for accessing the job details page in the Nomad UI

Nomad 1.2.0 introduced a new UI component to display the status of `system` and
`sysbatch` jobs in each client where they are running. This feature makes an
API call to an endpoint that requires `node:read` ACL permission. Tokens used
to access the Nomad UI will need to be updated to include this permission in
order to access a job details page.

This was an unintended change fixed in Nomad 1.2.4.

#### HCLv2 Job Specification Parsing

In previous versions of Nomad, when rendering a job specification using override
variables, a warning would be returned if a variable within an override file
was declared that was not found within the job specification. This behaviour
differed from passing variables via the `-var` flag, which would always cause an
error in the same situation.

Nomad 1.2.0 made this behaviour consistent: by default, an error is always
returned when an override variable is specified that is not a known variable
within the job specification. Users who wish to only be warned when this
situation arises can specify the `-hcl-strict=false` flag.

## Nomad 1.0.11 and 1.1.5 Enterprise

#### Audit log file names

Audit log file naming now matches the standard log file naming introduced in
1.0.10 and 1.1.4. The audit log currently being written will no longer have a
timestamp appended.

## Nomad 1.0.10 and 1.1.4

#### Log file names

The [`log_file`] configuration option was not being fully respected, as the
generated filename would include a timestamp. After upgrade, the active log
file will always be the value defined in `log_file`, with timestamped files
being created during log rotation.

## Nomad 1.0.9 and 1.1.3

#### Namespace in Job Run and Plan APIs

The Job Run and Plan APIs now respect the `?namespace=...` query parameter over
the namespace specified in the job itself. This matches the precedence of
region and [fixes a bug where the `-namespace` flag was not respected for the
`nomad run` and `nomad plan` commands.][gh-10875]

Users of [`api.Client`][go-client] who want their job namespace respected
must ensure the `Config.Namespace` field is unset.

#### Docker Driver

**1.1.3 only**

Starting in Nomad 1.1.2, task groups with `network.mode = "bridge"` generated a
hosts file in Docker containers. This generated hosts file was bind-mounted
from the task directory to `/etc/hosts` within the task. In Nomad 1.1.3 the
source for the bind mount was moved to the allocation directory so that it is
shared between all tasks in an allocation.

Please note that this change may prevent [`extra_hosts`] values from being
properly set in each task when there are multiple tasks within the same group.
When using `extra_hosts` with Consul Connect in `bridge` network mode, you
should set the hosts values in the [`sidecar_task.config`] block instead.

## Nomad 1.1.0

#### Enterprise licenses

Nomad Enterprise licenses are no longer stored in raft or synced between
servers. Nomad Enterprise servers will not start without a license. There is
no longer a six hour evaluation period when running Nomad Enterprise. Before
upgrading, you must provide each server with a license on disk or in its
environment (see the [Enterprise licensing] documentation for details).

The `nomad license put` command has been removed.

The `nomad license get` command is no longer forwarded to the Nomad leader,
and will return the license from the specific server being contacted.

Click [here](https://www.hashicorp.com/products/nomad/trial) to get a trial license for Nomad Enterprise.

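A sketch of pointing a server at a license file on disk, assuming the
`license_path` server option covered in the Enterprise licensing
documentation. The file path is a hypothetical example.

```hcl
server {
  enabled = true

  # Hypothetical location; each server needs its own readable copy.
  license_path = "/etc/nomad.d/license.hclic"
}
```
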
#### Agent Metrics API

The Nomad agent metrics API now respects the
[`prometheus_metrics`](/nomad/docs/configuration/telemetry#prometheus_metrics)
configuration value. If this value is set to `false`, which is the default value,
calling `/v1/metrics?format=prometheus` will now result in a response error.

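Agents that are scraped in Prometheus format therefore need the option enabled
explicitly. A minimal sketch:

```hcl
telemetry {
  # Required for /v1/metrics?format=prometheus to return metrics
  # instead of an error.
  prometheus_metrics = true
}
```
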
#### CSI volumes

The volume specification for CSI volumes has been updated to support volume
creation. The `access_mode` and `attachment_mode` fields have been moved to a
`capability` block that can be repeated. Existing registered volumes will be
automatically modified the next time that a volume claim is updated. Volume
specification files for new volumes should be updated to the format described
in the [`volume create`] and [`volume register`] commands.

The [`volume`] block has an `access_mode` and `attachment_mode` field that are
required for CSI volumes. Jobs that use CSI volumes should be updated with
these fields.

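A sketch of a volume specification in the updated format follows. The IDs,
name, and plugin are hypothetical placeholders, and the access and attachment
modes are just one possible combination.

```hcl
# Hypothetical volume specification for `nomad volume register`.
id          = "ebs_prod_db1"
name        = "database"
type        = "csi"
plugin_id   = "ebs-prod"
external_id = "vol-23452345"

# access_mode and attachment_mode now live in a repeatable
# capability block instead of at the top level.
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

A job claiming this volume would set matching `access_mode` and
`attachment_mode` values in its `volume` block.
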
#### Connect native tasks

Connect native tasks running in host networking mode will now have `CONSUL_HTTP_ADDR`
set automatically. Previously, this was only the case for bridge networking. If an operator
has already explicitly set `CONSUL_HTTP_ADDR`, it will not be overridden.

#### Linux capabilities in exec/java

Following the security [remediation][no_net_raw] in Nomad versions 0.12.12, 1.0.5,
and 1.1.0-rc1, the `exec` and `java` task drivers will additionally no longer enable
the following linux capabilities by default:

```
AUDIT_CONTROL AUDIT_READ BLOCK_SUSPEND DAC_READ_SEARCH IPC_LOCK IPC_OWNER LEASE
LINUX_IMMUTABLE MAC_ADMIN MAC_OVERRIDE NET_ADMIN NET_BROADCAST NET_RAW SYS_ADMIN
SYS_BOOT SYSLOG SYS_MODULE SYS_NICE SYS_PACCT SYS_PTRACE SYS_RAWIO SYS_RESOURCE
SYS_TIME SYS_TTY_CONFIG WAKE_ALARM
```

The capabilities now enabled by default are modeled after Docker default
[`linux capabilities`] (excluding `NET_RAW`):

```
AUDIT_WRITE CHOWN DAC_OVERRIDE FOWNER FSETID KILL MKNOD NET_BIND_SERVICE
SETFCAP SETGID SETPCAP SETUID SYS_CHROOT
```

A new `allow_caps` plugin configuration parameter for [`exec`][allow_caps_exec]
and [`java`][allow_caps_java] task drivers can be used to restrict the set of
capabilities allowed for use by tasks.

Tasks using the `exec` or `java` task drivers can add or remove desired linux
capabilities using the [`cap_add`][cap_add_exec] and [`cap_drop`][cap_drop_exec]
task configuration options.

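As an illustration, a task that needs to send ICMP packets could request
`NET_RAW` again, as sketched below. The task name and command are hypothetical,
and the sketch assumes the client's `allow_caps` setting for the `exec` driver
permits `net_raw`.

```hcl
# Hypothetical task; only the cap_add / cap_drop lines matter here.
task "ping" {
  driver = "exec"

  config {
    command = "/bin/ping"
    args    = ["-c", "3", "127.0.0.1"]

    # Re-enable a capability that is no longer on by default.
    cap_add = ["net_raw"]

    # Drop a default capability this task does not need.
    cap_drop = ["mknod"]
  }
}
```
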
#### iptables

Nomad now appends its iptables rules to the `NOMAD-ADMIN` chain instead of
inserting them as the first rule. This allows better control for user-defined
iptables rules, but users who currently append rules should verify that their
rules are being appended in the correct order.

## Nomad 1.1.0-rc1, 1.0.5, 0.12.12
|
|
|
|
Nomad versions 1.1.0-rc1, 1.0.5 and 0.12.12 change the behavior of the `docker`, `exec`,
|
|
and `java` task drivers so that the [`CAP_NET_RAW`] linux capability is disabled
|
|
by default. This is one of the [`linux capabilities`] that Docker itself enables
|
|
by default, as this capability enables the generation of ICMP packets - used by
|
|
the common `ping` utility for performing network diagnostics. When used by groups in
|
|
`bridge` networking mode, the `CAP_NET_RAW` capability also exposes tasks to ARP spoofing,
|
|
enabling DoS and MITM attacks against other tasks running in `bridge` networking
|
|
on the same host. Operators should weigh potential impact of an upgrade on their
|
|
applications against the security consequences inherent in `CAP_NET_RAW`. Typical
|
|
applications using `tcp` or `udp` based networking should not be affected.
|
|
|
|
This is the sole change for Nomad 1.0.5 and 0.12.12, intended to provide better
|
|
task network isolation by default.
|
|
|
|
Users of the `docker` driver can restore the previous behavior by configuring the
|
|
[`allow_caps`] driver configuration option to explicitly enable the `CAP_NET_RAW`
|
|
capability.
|
|
|
|
```hcl
plugin "docker" {
  config {
    allow_caps = [
      "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD",
      "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
      "SYS_CHROOT", "KILL", "AUDIT_WRITE", "NET_RAW",
    ]
  }
}
```
|
|
|
|
An upcoming version of Nomad will include similar configuration options for the
|
|
`exec` and `java` task drivers.
|
|
|
|
This change is limited to `docker`, `exec`, and `java` driver plugins. It does
|
|
not affect the Nomad server. This only affects Nomad clients running Linux, with
|
|
tasks using `bridge` networking and one of these task drivers, or third-party
|
|
plugins which relied on the shared Nomad executor library.
|
|
|
|
Upgrading a Nomad client to 1.0.5 or 0.12.12 will not restart existing tasks. As
|
|
such, processes from existing `docker`, `exec`, or `java` tasks will need to be
|
|
manually restarted (using `alloc stop` or another mechanism) in order to be
|
|
fully isolated.
|
|
|
|
## Nomad 1.0.3, 0.12.10
|
|
|
|
Nomad versions 1.0.3 and 0.12.10 change the behavior of the `exec` and `java` drivers so that
|
|
tasks are isolated in their own PID and IPC namespaces. As a result, the
|
|
process launched by these drivers will be PID 1 in the namespace. This has
|
|
[significant impact](https://man7.org/linux/man-pages/man7/pid_namespaces.7.html)
|
|
on the treatment of a process by the Linux kernel. Furthermore, tasks in the
|
|
same allocation will no longer be able to coordinate using signals, SystemV IPC
|
|
objects, or POSIX message queues. Operators should weigh potential impact of an
|
|
upgrade on their applications against the security consequences inherent in using
|
|
the host namespaces.
|
|
|
|
This is the sole change for Nomad 1.0.3, intended to provide better process
|
|
isolation by default. An upcoming version of Nomad will include options for
|
|
configuring this behavior.
|
|
|
|
This change is limited to the `exec` and `java` driver plugins. It does not affect
|
|
the Nomad server. This only affects Nomad clients running on Linux, using the
|
|
`exec` or `java` drivers or third-party driver plugins which relied on the shared
|
|
Nomad executor library.
|
|
|
|
Upgrading a Nomad client to 1.0.3 or 0.12.10 will not restart existing tasks.
|
|
As such, processes from existing `exec`/`java` tasks will need to be manually restarted
|
|
(using `alloc stop` or another mechanism) in order to be fully isolated.
|
|
|
|
## Nomad 1.0.2
|
|
|
|
#### Dynamic secrets trigger template changes on client restart
|
|
|
|
Nomad 1.0.2 changed the behavior of template `change_mode` triggers when a
|
|
client node restarts. In Nomad 1.0.1 and earlier, the first rendering of a
|
|
template after a client restart would not trigger the `change_mode`. For
|
|
dynamic secrets such as the Vault PKI secrets engine, this resulted in the
|
|
secret being updated but not restarting or signalling the task. When the
|
|
secret's lease expired at some later time, the task workload might fail
|
|
because of the stale secret. For example, a web server's SSL certificate would
|
|
be expired and browsers would be unable to connect.
|
|
|
|
In Nomad 1.0.2, when a client node is restarted any task with Vault secrets
|
|
that are generated or have expired will have its `change_mode` triggered. If
|
|
`change_mode = "restart"` this will result in the task being restarted, to
|
|
avoid the task failing unexpectedly at some point in the future. This change
|
|
only impacts tasks using dynamic Vault secrets engines such as [PKI][pki], or
|
|
when secrets are rotated. Secrets that don't change in Vault will not trigger
|
|
a `change_mode` on client restart.
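
For illustration, a task affected by this change might carry a `template` block along the following lines. This is a sketch only: the Vault mount path `pki`, the role name `example`, and the common name are hypothetical, and the task is assumed to also have a `vault` block granting access to that path.

```hcl
template {
  # Issues a certificate from a (hypothetical) Vault PKI role. Each render
  # produces a new dynamic secret.
  data = <<EOT
{{ with secret "pki/issue/example" "common_name=web.example.internal" }}
{{ .Data.certificate }}
{{ end }}
EOT

  destination = "secrets/cert.pem"

  # As of Nomad 1.0.2 this is also triggered when the secret is regenerated
  # after a client restart.
  change_mode = "restart"
}
```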
|
|
|
|
## Nomad 1.0.1
|
|
|
|
#### Envoy worker threads
|
|
|
|
Nomad v1.0.0 changed the default behavior around the number of worker threads
|
|
created by Envoy when used as a sidecar for Consul Connect. In Nomad
|
|
v1.0.1, the same default setting of [`--concurrency=1`][envoy_concurrency] is set for Envoy when used
|
|
as a Connect gateway. As before, the [`meta.connect.proxy_concurrency`][proxy_concurrency]
|
|
property can be set in client configuration to override the default value.
|
|
|
|
## Nomad 1.0.0
|
|
|
|
### HCL2 for Job specification
|
|
|
|
Nomad v1.0.0 adopts HCL2 for parsing the job spec. HCL2 extends HCL with more
|
|
expression and reuse support, but adds a stricter schema for HCL blocks. Check
[HCL](/nomad/docs/job-specification/hcl2) for more details.
|
|
|
|
### Signal used when stopping Docker tasks
|
|
|
|
When stopping tasks running with the Docker task driver, Nomad documents that a
|
|
`SIGTERM` will be issued (unless configured with `kill_signal`). However, recent
|
|
versions of Nomad would issue `SIGINT` instead. Starting again with Nomad v1.0.0
|
|
`SIGTERM` will be sent by default when stopping Docker tasks.
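
Workloads that came to depend on receiving `SIGINT` can request it explicitly with `kill_signal`. A minimal sketch, with a hypothetical task name and image:

```hcl
task "worker" {
  driver = "docker"

  # Keep the signal that some earlier Nomad versions sent in practice.
  kill_signal = "SIGINT"

  config {
    image = "example/worker:1.0" # hypothetical image
  }
}
```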
|
|
|
|
### Deprecated metrics have been removed
|
|
|
|
Nomad v0.7.0 added support for tagged metrics and deprecated untagged metrics.
|
|
There was support for configuring backwards-compatible metrics. This support has
|
|
been removed with v1.0.0, and all metrics will be emitted with tags.
|
|
|
|
### Null characters in region, datacenter, job name/ID, task group name, and task names
|
|
|
|
Starting with Nomad v1.0.0, jobs will fail validation if any of the following
|
|
contain a null character: the job ID or name, the task group name, or the task
name. Any job that contains one should be modified before upgrading to
|
|
v1.0.0. Similarly, client and server config validation will prohibit either the
|
|
region or the datacenter from containing null characters.
|
|
|
|
### EC2 CPU characteristics may be different
|
|
|
|
Starting with Nomad v1.0.0, the AWS fingerprinter uses data derived from the
|
|
official AWS EC2 API to determine default CPU performance characteristics,
|
|
including core count and core speed. This data should be accurate for each
|
|
instance type per region. Previously, Nomad used a hand-made lookup table that
|
|
was not region aware and may have contained inaccurate or incomplete data. As
|
|
part of this change, the AWS fingerprinter no longer sets the `cpu.modelname`
|
|
attribute.
|
|
|
|
As before, `cpu_total_compute` can be used to override the discovered CPU
|
|
resources available to the Nomad client.
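
A minimal sketch of such an override in the client agent configuration; the value is purely illustrative and should reflect the MHz actually available on the instance:

```hcl
client {
  enabled = true

  # Total compute in MHz advertised by this client, overriding whatever the
  # fingerprinter detected. 8000 is an example value, not a recommendation.
  cpu_total_compute = 8000
}
```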
|
|
|
|
### Inclusive language
|
|
|
|
Starting with Nomad v1.0.0, the terms `blacklist` and `whitelist` have been
|
|
deprecated from client configuration and driver configuration. The existing
|
|
configuration values are permitted but will be removed in a future version of
|
|
Nomad. The specific configuration values replaced are:
|
|
|
|
- Client `driver.blacklist` is replaced with `driver.denylist`.
|
|
|
|
- Client `driver.whitelist` is replaced with `driver.allowlist`.
|
|
|
|
- Client `env.blacklist` is replaced with `env.denylist`.
|
|
|
|
- Client `fingerprint.blacklist` is replaced with `fingerprint.denylist`.
|
|
|
|
- Client `fingerprint.whitelist` is replaced with `fingerprint.allowlist`.
|
|
|
|
- Client `user.blacklist` is replaced with `user.denylist`.
|
|
|
|
- Client `template.function_blacklist` is replaced with
|
|
`template.function_denylist`.
|
|
|
|
- Docker driver `docker.caps.whitelist` is replaced with
|
|
`docker.caps.allowlist`.
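
As an illustration of the rename, a client configuration that previously used the deprecated keys could be updated roughly as follows. The driver and user names are example values, and placing these keys under `client.options` is an assumption about how the existing configuration was written.

```hcl
client {
  options = {
    # Previously "driver.whitelist"
    "driver.allowlist" = "docker,exec"

    # Previously "user.blacklist"
    "user.denylist" = "root"
  }
}
```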
|
|
|
|
### Consul Connect
|
|
|
|
Nomad 1.0's Consul Connect integration works best with Consul 1.9 or later. The
|
|
ideal upgrade path is:
|
|
|
|
1. Create a new Nomad client image with Nomad 1.0 and Consul 1.9 or later.
|
|
2. Add new hosts based on the image.
|
|
3. [Drain][drain-cli] and shut down old Nomad client nodes.
|
|
|
|
While in-place upgrades and older versions of Consul are supported by Nomad 1.0,
Envoy proxies will drop and stop accepting connections while the Nomad agent is
restarting. Nomad 1.0 with Consul 1.9 does not have this limitation.
|
|
|
|
#### Envoy proxy versions
|
|
|
|
Nomad v1.0.0 changes the behavior around the selection of Envoy version used for
|
|
Connect sidecar proxies. Previously, Nomad always defaulted to Envoy v1.11.2 if
|
|
neither the `meta.connect.sidecar_image` parameter nor `sidecar_task` block were
|
|
explicitly configured. Likewise the same version of Envoy would be used for
|
|
Connect ingress gateways if `meta.connect.gateway_image` was unset. Starting
|
|
with Nomad v1.0.0, each Nomad Client will query Consul for a list of supported
|
|
Envoy versions. Nomad will make use of the latest version of Envoy supported by
|
|
the Consul agent when launching Envoy as a Connect sidecar proxy. If the version
|
|
of the Consul agent is older than v1.7.8, v1.8.4, or v1.9.0, Nomad will fall back
|
|
to the v1.11.2 version of Envoy. As before, if the `meta.connect.sidecar_image`,
|
|
`meta.connect.gateway_image`, or `sidecar_task` block are set, those settings
|
|
take precedence.
|
|
|
|
When upgrading Nomad Clients from a previous version to v1.0.0 and above, it is
|
|
recommended to also upgrade the Consul agents to v1.7.8, v1.8.4, or v1.9.0 or
|
|
newer. Upgrading Nomad and Consul to versions that support the new behavior
|
|
while also doing a full [node drain][] at the time of the upgrade for each node
|
|
will ensure Connect workloads are properly rescheduled onto nodes in such a way
|
|
that the Nomad Clients, Consul agents, and Envoy sidecar tasks maintain
|
|
compatibility with one another.
|
|
|
|
#### Envoy worker threads
|
|
|
|
Nomad v1.0.0 changes the default behavior around the number of worker threads
|
|
created by the Envoy sidecar proxy when using Consul Connect. Previously, the
|
|
Envoy [`--concurrency`][envoy_concurrency] argument was left unset, which caused
|
|
Envoy to spawn as many worker threads as logical cores available on the CPU. The
|
|
`--concurrency` value now defaults to `1` and can be configured by setting the
|
|
[`meta.connect.proxy_concurrency`][proxy_concurrency] property in client
|
|
configuration.
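
A sketch of overriding the default in client configuration is shown below. The quoted key form and the value `"2"` are assumptions for illustration; consult the linked client configuration reference for the exact syntax supported by your version.

```hcl
client {
  meta {
    # Number of Envoy worker threads for Connect proxies launched by this
    # client. The default set by Nomad is "1".
    "connect.proxy_concurrency" = "2"
  }
}
```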
|
|
|
|
## Nomad 0.12.8
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.12.8 includes security fixes for the handling of Docker volume mounts:
|
|
|
|
- The `docker.volumes.enabled` flag now defaults to `false` as documented.
|
|
|
|
- Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
|
|
could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with
|
|
type "volume" when set to `false` (the default).
|
|
|
|
This change impacts Docker jobs that use `mounts` entries with type "volume", as
shown below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.12.6
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.12.6 includes security fixes for privilege escalation vulnerabilities
|
|
in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt out of this
|
|
protection with the [`template.disable_file_sandbox`][] field in the client
|
|
configuration.
|
|
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths do
|
|
not escape the file sandbox. It was possible to use interpolation to bypass
|
|
this validation. The client now interpolates the paths before checking if they
|
|
are in the file sandbox.
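
For operators who need the previous behavior, opting out of the template file sandbox looks roughly like this in the client agent configuration. Disabling the sandbox removes the protection described above, so treat this as a sketch rather than a recommendation.

```hcl
client {
  template {
    # Allows template source and destination paths outside the task directory.
    disable_file_sandbox = true
  }
}
```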
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.12.6, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.12.9. To work around the bug, use a relative path.
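
A sketch of the relative-path workaround for a `template` block; the file names are hypothetical:

```hcl
template {
  source = "local/app.conf.tpl"

  # Relative to the task directory. On affected versions, avoid
  # "${NOMAD_TASK_DIR}/app.conf" or any other absolute path here.
  destination = "local/app.conf"
}
```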
|
|
|
|
## Nomad 0.12.0
|
|
|
|
### `mbits` and Task Network Resource deprecation
|
|
|
|
Starting in Nomad 0.12.0 the `mbits` field of the network resource block has
|
|
been deprecated and is no longer considered when making scheduling decisions.
|
|
This is in part because we felt that `mbits` didn't accurately account for network
|
|
bandwidth as a resource.
|
|
|
|
Additionally the use of the `network` block inside of a task's `resource` block
|
|
is also deprecated. Users are advised to move their `network` block to the
|
|
`group` block. Recent networking features have only been added to group based
|
|
network configuration. If any use case or feature which was available with task
|
|
network resource is not fulfilled with group network configuration, please open
|
|
an issue detailing the missing capability.
|
|
|
|
Additionally, the `docker` driver's `port_map` configuration is deprecated in
favor of the `ports` field.
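
A minimal sketch of the group-level form, with the port mapping declared on the group `network` block and referenced from the Docker task through `ports`. The group, task, port label, and image names are hypothetical.

```hcl
group "web" {
  network {
    # A dynamic host port mapped to port 8080 inside the container.
    port "http" {
      to = 8080
    }
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image
      ports = ["http"]          # replaces the deprecated port_map
    }
  }
}
```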
|
|
|
|
### Enterprise Licensing
|
|
|
|
Enterprise binaries for Nomad are now publicly available via
|
|
[releases.hashicorp.com](https://releases.hashicorp.com/nomad/). By default all
|
|
enterprise features are enabled for 6 hours. During that time enterprise users
|
|
should apply their license with the [`nomad license put ...`](/nomad/docs/v1.0.x/commands/license/put) command.
|
|
|
|
Once the 6 hour demonstration period expires, Nomad will shut down. If restarted,
Nomad will shut down again after a very short time unless a valid license is
applied.
|
|
|
|
~> **Warning:** Due to a [bug][gh-8457] in Nomad v0.12.0, existing clusters
|
|
that are upgraded will **not** have 6 hours to apply a license. The minimal
|
|
grace period should be sufficient to apply a valid license, but enterprise
|
|
users are encouraged to delay upgrading until Nomad v0.12.1 is released and
|
|
fixes the issue.
|
|
|
|
### Docker access host filesystem
|
|
|
|
Nomad 0.12.0 disables Docker tasks' access to the host filesystem by default.
Prior to Nomad 0.12, Docker tasks could mount and then manipulate any host file,
which posed a security risk.
|
|
|
|
Operators must now explicitly allow tasks to access the host filesystem. [Host
Volumes](/nomad/docs/configuration/client#host_volume-block) provide fine-grained
access to individual paths.
|
|
|
|
To restore pre-0.12.0 behavior, you can enable [Docker
|
|
`volume`](/nomad/docs/drivers/docker#enabled-1) to allow binding host paths, by adding
|
|
the following to the Nomad client config file:
|
|
|
|
```hcl
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
```
|
|
|
|
### QEMU images
|
|
|
|
Nomad 0.12.0 restricts the paths the QEMU tasks can load an image from. A QEMU
|
|
task may download an image to the allocation directory to load. But images
|
|
outside the allocation directories must be explicitly allowed by operators in
|
|
the client agent configuration file.
|
|
|
|
For example, you may allow loading QEMU images from `/mnt/qemu-images` by
|
|
adding the following to the agent configuration file:
|
|
|
|
```hcl
plugin "qemu" {
  config {
    image_paths = ["/mnt/qemu-images"]
  }
}
```
|
|
|
|
## Nomad 0.11.7
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.11.7 includes a security fix for the handling of Docker volume
|
|
mounts. Docker driver mounts of type "volume" (but not "bind") were not
|
|
sandboxed and could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with
|
|
type "volume" when set to `false`.
|
|
|
|
This change impacts Docker jobs that use `mounts` entries with type "volume", as
shown below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.11.5
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.11.5 includes backported security fixes for privilege escalation
|
|
vulnerabilities in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt out of this
|
|
protection with the
|
|
[`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths
|
|
do not escape the file sandbox. It was possible to use interpolation to
|
|
bypass this validation. The client now interpolates the paths before
|
|
checking if they are in the file sandbox.
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.11.5, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.11.6. To work around the bug, use a relative path.
|
|
|
|
## Nomad 0.11.3
|
|
|
|
Nomad 0.11.3 fixes a critical bug causing the nomad agent to become
|
|
unresponsive. The issue is due to a [Go 1.14.1 runtime
|
|
bug](https://github.com/golang/go/issues/38023) and affects Nomad 0.11.1 and
|
|
0.11.2.
|
|
|
|
## Nomad 0.11.2
|
|
|
|
### Scheduler Scoring Changes
|
|
|
|
Prior to Nomad 0.11.2 the scheduler algorithm used a [node's reserved
|
|
resources][reserved]
|
|
incorrectly during scoring. The result of this bug was that scoring biased in
|
|
favor of nodes with reserved resources vs nodes without reserved resources.
|
|
|
|
Placements will be more correct but slightly different in v0.11.2 vs earlier
|
|
versions of Nomad. Operators do _not_ need to take any actions as the impact of
|
|
the bug fix will only minimally affect scoring.
|
|
|
|
Feasibility (whether a node is capable of running a job at all) is _not_
|
|
affected.
|
|
|
|
### Periodic Jobs and Daylight Saving Time
|
|
|
|
Nomad 0.11.2 fixed a long outstanding bug affecting periodic jobs that are
|
|
scheduled to run during Daylight Saving Time transitions.
|
|
|
|
Nomad 0.11.2 provides a more defined behavior: Nomad evaluates the cron
|
|
expression with respect to the specified time zone during transition. A 2:30am
|
|
nightly job with `America/New_York` time zone will not run on the day daylight
|
|
saving time starts; similarly, a 1:30am nightly job will run twice on the day
|
|
daylight saving time ends. See the [Daylight Saving Time][dst] documentation
|
|
for details.
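
For reference, the 2:30am example above corresponds to a `periodic` block along these lines (a sketch; the rest of the job is omitted):

```hcl
periodic {
  # 02:30 every day, evaluated in the given time zone. This time does not
  # exist on the day daylight saving time starts, so the job is skipped then.
  cron      = "30 2 * * *"
  time_zone = "America/New_York"
}
```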
|
|
|
|
## Nomad 0.11.0
|
|
|
|
### client.template: `vault_grace` deprecation
|
|
|
|
Nomad 0.11.0 updates
|
|
[consul-template](https://github.com/hashicorp/consul-template) to v0.24.1. This
|
|
library deprecates the [`vault_grace`][vault_grace] option for templating
|
|
included in Nomad. The feature has been ignored since Vault 0.5 and as long as
|
|
you are running a more recent version of Vault, you can safely remove
|
|
`vault_grace` from your Nomad jobs.
|
|
|
|
### Rkt Task Driver Removed
|
|
|
|
The `rkt` task driver has been deprecated and removed from Nomad. While the code
|
|
is available in an external repository,
|
|
<https://github.com/hashicorp/nomad-driver-rkt>, it will not be maintained as
|
|
`rkt` is [no longer being developed upstream](https://github.com/rkt/rkt). We
|
|
encourage all `rkt` users to find a new task driver as soon as possible.
|
|
|
|
## Nomad 0.10.8
|
|
|
|
### Docker volume mounts
|
|
|
|
Nomad 0.10.8 includes a security fix for the handling of Docker volume mounts.
|
|
Docker driver mounts of type "volume" (but not "bind") were not sandboxed and
|
|
could mount arbitrary locations from the client host. The
|
|
`docker.volumes.enabled` configuration will now disable Docker mounts with type
|
|
"volume" when set to `false`.
|
|
|
|
This change impacts Docker jobs that use `mounts` entries with type "volume", as
shown below. This job will fail when placed unless `docker.volumes.enabled = true`.
|
|
|
|
```hcl
mounts = [
  {
    type   = "volume"
    target = "/path/in/container"
    source = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]
```
|
|
|
|
## Nomad 0.10.6
|
|
|
|
### Artifact and Template Paths
|
|
|
|
Nomad 0.10.6 includes backported security fixes for privilege escalation
|
|
vulnerabilities in handling of job `template` and `artifact` blocks:
|
|
|
|
- The `template.source` and `template.destination` fields are now protected by
|
|
the file sandbox introduced in 0.9.6. These paths are now restricted to fall
|
|
inside the task directory by default. An operator can opt out of this
|
|
protection with the
|
|
[`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
|
|
|
|
- The paths for `template.source`, `template.destination`, and
|
|
`artifact.destination` are validated on job submission to ensure the paths
|
|
do not escape the file sandbox. It was possible to use interpolation to
|
|
bypass this validation. The client now interpolates the paths before
|
|
checking if they are in the file sandbox.
|
|
|
|
~> **Warning:** Due to a [bug][gh-9148] in Nomad v0.10.6, the
|
|
`template.destination` and `artifact.destination` paths do not support
|
|
absolute paths, including the interpolated `NOMAD_SECRETS_DIR`,
|
|
`NOMAD_TASK_DIR`, and `NOMAD_ALLOC_DIR` variables. This bug is fixed in
|
|
v0.10.7. To work around the bug, use a relative path.
|
|
|
|
## Nomad 0.10.4
|
|
|
|
### Same-Node Scheduling Penalty Removed
|
|
|
|
Nomad 0.10.4 includes a fix to the scheduler that removes the same-node penalty
|
|
for allocations that have not previously failed. In earlier versions of Nomad,
|
|
the node where an allocation was running was penalized from receiving updated
|
|
versions of that allocation, resulting in a higher chance of the allocation
|
|
being placed on a new node. This was changed so that the penalty only applies to
|
|
nodes where the previous allocation has failed or been rescheduled, to reduce
|
|
the risk of correlated failures on a host. Scheduling weighs a number of
|
|
factors, but this change should reduce movement of allocations that are being
|
|
updated from a healthy state. You can view the placement metrics for an
|
|
allocation with `nomad alloc status -verbose`.
|
|
|
|
### Additional Environment Variable Filtering
|
|
|
|
Nomad will by default prevent certain environment variables set in the client
|
|
process from being passed along into launched tasks. The `CONSUL_HTTP_TOKEN`
|
|
environment variable has been added to the default list. More information can
|
|
be found in the `env.blacklist` [configuration](/nomad/docs/configuration/client#env-blacklist).
|
|
|
|
## Nomad 0.10.3
|
|
|
|
### mTLS Certificate Validation
|
|
|
|
Nomad 0.10.3 includes a fix for a privilege escalation vulnerability in
|
|
validating TLS certificates for RPC with mTLS. Nomad RPC endpoints validated
|
|
that TLS client certificates had not expired and were signed by the same CA as
|
|
the Nomad node, but did not correctly check the certificate's name for the role
|
|
and region as described in the [Securing Nomad with TLS][tls-guide] guide. This
|
|
allows trusted operators with a client certificate signed by the CA to send RPC
|
|
calls as a Nomad client or server node, bypassing access control and accessing
|
|
any secrets available to a client.
|
|
|
|
Nomad clusters configured for mTLS following the [Securing Nomad with
|
|
TLS][tls-guide] guide or the [Vault PKI Secrets Engine
|
|
Integration][tls-vault-guide] guide should already have certificates that will
|
|
pass validation. Before upgrading to Nomad 0.10.3, operators using mTLS with
|
|
`verify_server_hostname = true` should confirm that the common name or SAN of
|
|
all Nomad client node certs is `client.<region>.nomad`, and that the common name
|
|
or SAN of all Nomad server node certs is `server.<region>.nomad`.
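
For context, an agent `tls` block with hostname verification enabled typically looks something like the sketch below. The file paths are hypothetical. The certificate referenced by `cert_file` is the one whose common name or SAN must match `server.<region>.nomad` on servers or `client.<region>.nomad` on clients.

```hcl
tls {
  http = true
  rpc  = true

  # Hypothetical paths; substitute the locations used in your deployment.
  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```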
|
|
|
|
### Connection Limits Added
|
|
|
|
Nomad 0.10.3 introduces the [limits][] agent configuration parameters for
|
|
mitigating denial of service attacks from users who are not authenticated via
|
|
mTLS. The default limits block is:
|
|
|
|
```hcl
limits {
  https_handshake_timeout   = "5s"
  http_max_conns_per_client = 100
  rpc_handshake_timeout     = "5s"
  rpc_max_conns_per_client  = 100
}
```
|
|
|
|
If your Nomad agent's endpoints are protected from unauthenticated users via
|
|
other mechanisms these limits may be safely disabled by setting them to `0`.
|
|
|
|
However the defaults were chosen to be safe for a wide variety of Nomad
|
|
deployments and may protect against accidental abuses of the Nomad API that
|
|
could cause unintended resource usage.
|
|
|
|
## Nomad 0.10.2
|
|
|
|
### Preemption Panic Fixed
|
|
|
|
Nomad 0.9.7 and 0.10.2 fix a [server crashing bug][gh-6787] present in scheduler
|
|
preemption since 0.9.0. Users unable to immediately upgrade Nomad can [disable
|
|
preemption][preemption-api] to avoid the panic.
|
|
|
|
### Dangling Docker Container Cleanup
|
|
|
|
Nomad 0.10.2 addresses an issue occurring in heavily loaded clients, where
|
|
containers are started without being properly managed by Nomad. Nomad 0.10.2
|
|
introduced a reaper that detects and kills such containers.
|
|
|
|
Operators may opt to run the reaper in dry-run mode or disable it through
client configuration.
|
|
|
|
For more information, see [Docker Dangling containers][dangling-containers].
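
A sketch of the relevant Docker plugin settings, here leaving the reaper enabled but in dry-run mode so that it only reports containers it would otherwise kill. The `period` and `creation_grace` values are illustrative.

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        enabled        = true
        dry_run        = true # log only; do not kill containers
        period         = "5m"
        creation_grace = "5m"
      }
    }
  }
}
```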
|
|
|
|
## Nomad 0.10.0
|
|
|
|
### Deployments
|
|
|
|
Nomad 0.10 enables rolling deployments for service jobs by default and adds a
|
|
default update block when a service job is created or updated. This does not
|
|
affect jobs with an update block.
|
|
|
|
In pre-0.10 releases, when updating a service job without an update block, all
|
|
existing allocations are stopped while new allocations start up, and this may
|
|
cause a service degradation or an outage. You can regain this behavior and
|
|
disable deployments by setting `max_parallel` to 0.
|
|
|
|
For more information, see [`update` block][update].
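
A minimal sketch of a job that opts out of deployments and keeps the pre-0.10 update behavior:

```hcl
update {
  # 0 disables rolling updates and deployments for this job, so all existing
  # allocations are replaced at once as in pre-0.10 releases.
  max_parallel = 0
}
```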
|
|
|
|
## Nomad 0.9.5
|
|
|
|
### Template Rendering
|
|
|
|
Nomad 0.9.5 includes security fixes for privilege escalation vulnerabilities in
|
|
handling of job `template` blocks:
|
|
|
|
- The client host's environment variables are now cleaned before rendering the
|
|
template. If a template includes the `env` function, the job should include an
|
|
[`env`](/nomad/docs/job-specification/env) block to allow access to the variable in
|
|
the template.
|
|
|
|
- The `plugin` function is no longer permitted by default and will raise an
|
|
error if used in a template. Operators can opt in to permitting this function
|
|
with the new
|
|
[`template.function_blacklist`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
|
|
|
|
- The `file` function has been changed to restrict paths to fall inside the task
|
|
directory by default. Paths that used the `NOMAD_TASK_DIR` environment
|
|
variable to prefix file paths should work unchanged. Relative paths or
|
|
symlinks that point outside the task directory will raise an error. An
|
|
operator can opt-out of this protection with the new
|
|
[`template.disable_file_sandbox`](/nomad/docs/configuration/client#template-parameters)
|
|
field in the client configuration.
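
To illustrate the first point in the list above, a task whose template reads a variable with the `env` function can declare that variable in its own `env` block. The task, variable, and file names below are hypothetical, and the task's driver `config` block is omitted for brevity.

```hcl
task "app" {
  driver = "exec"

  # Declared on the task, so the template no longer depends on the client
  # host's environment.
  env {
    APP_MODE = "production"
  }

  template {
    data = <<EOT
mode = {{ env "APP_MODE" }}
EOT

    destination = "local/app.conf"
  }
}
```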
|
|
|
|
## Nomad 0.9.0
|
|
|
|
### Preemption
|
|
|
|
Nomad 0.9 adds preemption support for system jobs. If a system job is submitted
|
|
that has a higher priority than other running jobs on the node, and the node
|
|
does not have capacity remaining, Nomad may preempt those lower priority
|
|
allocations to place the system job. See [preemption][preemption] for more
|
|
details.
|
|
|
|
### Task Driver Plugins
|
|
|
|
All task drivers have become [plugins][plugins] in Nomad 0.9.0. There are two
|
|
user visible differences between 0.8 and 0.9 drivers:
|
|
|
|
- [LXC][lxc] is now community supported and distributed independently.
|
|
|
|
- Task driver [`config`][task-config] blocks are no longer validated by
|
|
the [`nomad job validate`][validate] command. This is a regression that will
|
|
be fixed in a future release.
|
|
|
|
There is a new method for client driver configuration options, but existing
|
|
`client.options` settings are supported in 0.9. See [plugin
|
|
configuration][plugin-block] for details.
|
|
|
|
#### LXC
|
|
|
|
LXC is now an external plugin and must be installed separately. See [the LXC
|
|
driver's documentation][lxc] for details.
|
|
|
|
### Structured Logging
|
|
|
|
Nomad 0.9.0 switches to structured logging. Any log processing on the pre-0.9
|
|
log output will need to be updated to match the structured output.
|
|
|
|
Structured log lines have the format:
|
|
|
|
```
# <Timestamp> [<Level>] <Component>: <Message>: <KeyN>=<ValueN> ...

2019-01-29T05:52:09.221Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
```
|
|
|
|
Values containing whitespace will be quoted:
|
|
|
|
```
... starting plugin: task=redis args="[/opt/gopath/bin/nomad logmon]"
```
|
|
|
|
### HCL2 Transition
|
|
|
|
Nomad 0.9.0 begins a transition to [HCL2][hcl2], the next version of the
|
|
HashiCorp configuration language. While Nomad has begun integrating HCL2, users
|
|
will need to continue to use HCL1 in Nomad 0.9.0 as the transition is
|
|
incomplete.
|
|
|
|
If you interpolate variables in your [`task.config`][task-config] containing
|
|
consecutive dots in their name, you will need to change your job specification
|
|
to use the `env` map. See the following example:
|
|
|
|
```hcl
env {
  # Note the multiple consecutive dots
  image...version = "3.2"

  # Valid in both v0.8 and v0.9
  image.version = "3.2"
}

# v0.8 task config block:
task {
  driver = "docker"
  config {
    image = "redis:${image...version}"
  }
}

# v0.9 task config block:
task {
  driver = "docker"
  config {
    image = "redis:${env["image...version"]}"
  }
}
```
|
|
|
|
This only affects users who interpolate unusual variables with multiple
|
|
consecutive dots in their task `config` block. All other interpolation is
|
|
unchanged.
|
|
|
|
Since HCL2 uses dotted object notation for interpolation users should transition
|
|
away from variable names with multiple consecutive dots.
|
|
|
|
### Downgrading clients

Due to the large refactor of the Nomad client in 0.9, downgrading to a previous
version of the client after upgrading it to Nomad 0.9 is not supported. To
downgrade safely, users should erase the Nomad client's data directory.

### `port_map` Environment Variable Changes

Before Nomad 0.9.0 ports mapped via a task driver's `port_map` block could be
interpolated via the `NOMAD_PORT_<label>` environment variables.

However, in Nomad 0.9.0 no parameters in a driver's `config` block, including
its `port_map`, are available for interpolation. This means
`{{ env NOMAD_PORT_<label> }}` in a `template` block or
`HTTP_PORT = "${NOMAD_PORT_http}"` in an `env` block will now interpolate the
_host_ ports, not the container's.

Nomad 0.10 introduced Task Group Networking which natively supports port mapping
without relying on task driver specific `port_map` fields. The
[`to`](/nomad/docs/job-specification/network#to) field on group network port blocks
will be interpolated properly. Please see the
[`network`](/nomad/docs/job-specification/network/) block documentation for details.

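
As a rough sketch of the group-level alternative described above, the fragment
below maps a dynamically assigned host port to a fixed port inside the
allocation's network namespace. The group name `web`, the port label `http`,
and the port number `8080` are illustrative assumptions, not values taken from
this guide.

```hcl
# Hypothetical group; group-level networking requires Nomad 0.10 or later.
group "web" {
  network {
    mode = "bridge"

    # "http" is an illustrative label. The host port is assigned
    # dynamically and mapped to port 8080 inside the allocation's
    # network namespace.
    port "http" {
      to = 8080
    }
  }
}
```
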
## Nomad 0.8.0

### Raft Protocol Version Compatibility

When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
to set the [`raft_protocol`] option in
their `server` block to 1 in order to maintain backwards compatibility with the
old servers during the upgrade. After the servers have been migrated to version
0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted to match
the default.

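
The staged change described above might look like the following agent
configuration fragment. This is a minimal sketch: only the `raft_protocol`
values come from this guide, and the rest of the `server` block is assumed.

```hcl
# While older servers are still part of the cluster, pin the protocol
# version so that old and new servers remain compatible.
server {
  enabled       = true
  raft_protocol = 1
}

# Once every server runs 0.8.0, change the value above to match the
# default and restart the servers:
#   raft_protocol = 2
```
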
The Raft protocol must be stepped up in this way; only adjacent version numbers
are compatible (for example, version 1 cannot talk to version 3). Here is a
table of the Raft Protocol versions supported by each Nomad version:

<table>
  <thead>
    <tr>
      <th>Version</th>
      <th>Supported Raft Protocols</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0.6 and earlier</td>
      <td>0</td>
    </tr>
    <tr>
      <td>0.7</td>
      <td>1</td>
    </tr>
    <tr>
      <td>0.8 and later</td>
      <td>1, 2, 3</td>
    </tr>
  </tbody>
</table>

In order to enable all
[Autopilot](/nomad/tutorials/manage-clusters/autopilot) features, all
servers in a Nomad cluster must be running with Raft protocol version 3 or
later.

### Node Draining Improvements

Node draining via the [`node drain`][drain-cli] command or the [drain
API][drain-api] has been substantially changed in Nomad 0.8. In Nomad 0.7.1 and
earlier, draining a node would immediately stop all allocations on the node
being drained. Nomad 0.8 now supports a [`migrate`][migrate] block in job
specifications to control how many allocations may be migrated at once.
Existing jobs that do not specify a `migrate` block use the default.

The `drain` command now blocks until the drain completes. To get the Nomad 0.7.1
and earlier drain behavior use the command: `nomad node drain -enable -force -detach <node-id>`

See the [`migrate` block documentation][migrate] and [Decommissioning Nodes
guide](/nomad/tutorials/manage-clusters/node-drain) for details.

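
For illustration, a group-level `migrate` block might look like the sketch
below. The group name `cache` is hypothetical, and the values shown are what
the `migrate` documentation lists as defaults to the best of our knowledge, so
check that page before relying on them.

```hcl
# Hypothetical group; the migrate block limits how many allocations
# are moved off a draining node at the same time.
group "cache" {
  migrate {
    # Migrate one allocation at a time.
    max_parallel = 1

    # Wait for the replacement's health checks to pass before
    # continuing with the next allocation.
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```
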
### Periods in Environment Variable Names No Longer Escaped

_Applications which expect periods in environment variable names to be replaced
with underscores must be updated._

In Nomad 0.7 periods (`.`) in environment variable names were replaced with an
underscore in both the [`env`](/nomad/docs/job-specification/env) and
[`template`](/nomad/docs/job-specification/template) blocks.

In Nomad 0.8 periods are _not_ replaced and will be included in environment
variables verbatim.

For example the following block:

```text
env {
  registry.consul.addr = "${NOMAD_IP_http}:8500"
}
```

In Nomad 0.7 this would be exposed to the task as
`registry_consul_addr=127.0.0.1:8500`. In Nomad 0.8 it will now appear exactly
as specified: `registry.consul.addr=127.0.0.1:8500`.

### Client APIs Unavailable on Older Nodes

Because Nomad 0.8 uses a new RPC mechanism to route node-specific APIs like
[`nomad alloc fs`](/nomad/docs/commands/alloc/fs) through servers to the node,
0.8 CLIs cannot use these commands against clients older than 0.8.

To access these commands on older clients either continue to use a pre-0.8
version of the CLI, or upgrade all clients to 0.8.

### CLI Command Changes

Nomad 0.8 has changed the organization of CLI commands to be based on
subcommands. An example of this change is the change from `nomad alloc-status`
to `nomad alloc status`. All commands have been made to be backwards compatible,
but operators should update any usage of the old style commands to the new style
as the old style will be deprecated in future versions of Nomad.

### RPC Advertise Address

The behavior of the [advertised RPC address](/nomad/docs/configuration#rpc-1) has
changed to be only used to advertise the RPC address of servers to client nodes.
Server to server communication is done using the advertised Serf address.
Existing clusters should not be affected, but the advertised RPC address may
need to be updated to allow clients to connect over a NAT.

## Nomad 0.6.0

### Default `advertise` address changes

When no `advertise` address was specified and Nomad's `bind_addr` was loopback
or `0.0.0.0`, Nomad attempted to resolve the local hostname to use as an
advertise address.

Many hosts cannot properly resolve their hostname, so Nomad 0.6 defaults
`advertise` to the first private IP on the host (e.g. `10.1.2.3`).

If you manually configure `advertise` addresses no changes are necessary.

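
For reference, manually configured `advertise` addresses take roughly the
following shape in the agent configuration. The IP address reuses the example
above and is a placeholder for the address other agents should use to reach
this one.

```hcl
# Explicit advertise addresses; with these set, the change to the
# default does not apply to this agent.
advertise {
  http = "10.1.2.3"
  rpc  = "10.1.2.3"
  serf = "10.1.2.3"
}
```
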
#### Nomad Clients

The change to the default advertised IP also affects clients that do not specify
which `network_interface` to use. If you have several routable IPs, it is advised
to configure the client's [network
interface](/nomad/docs/configuration/client#network_interface) such that tasks bind to
the correct address.

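
A minimal sketch of pinning the client to one interface is shown below. The
interface name `eth0` is an assumption; substitute the interface that carries
the address tasks should bind to.

```hcl
client {
  enabled = true

  # Hypothetical interface name; tasks bind to an address on this
  # interface instead of the automatically selected one.
  network_interface = "eth0"
}
```
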
## Nomad 0.5.5

### Docker `load` changes

Nomad 0.5.5 has a backward incompatible change in the `docker` driver's
configuration. Prior to 0.5.5 the `load` configuration option accepted a list
of images to load; in 0.5.5 it has been changed to a single string. No
functionality was changed. Even if more than one item was specified prior to
0.5.5 only the first item was used.

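
The change might look like the sketch below. The task name, image name, and
archive name `redis.tar` are hypothetical, and the fragment assumes the archive
has already been made available to the task.

```hcl
task "redis" {
  driver = "docker"

  config {
    image = "redis"

    # 0.5.4 and earlier accepted a list, of which only the first item
    # was used:
    #   load = ["redis.tar"]
    #
    # 0.5.5 and later expect a single string:
    load = "redis.tar"
  }
}
```
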
To do a zero-downtime deploy with jobs that use the `load` option:

- Upgrade servers to version 0.5.5 or later.

- Deploy new client nodes on the same version as the servers.

- Resubmit jobs with the `load` option fixed and a constraint to only run on
  version 0.5.5 or later:

  ```hcl
  constraint {
    attribute = "${attr.nomad.version}"
    operator  = "version"
    value     = ">= 0.5.5"
  }
  ```

- Drain and shut down old client nodes.

### Validation changes

Due to internal job serialization and validation changes you may run into
issues using 0.5.5 command line tools such as `nomad run` and `nomad validate`
with 0.5.4 or earlier agents.

It is recommended you upgrade agents before or alongside your command line
tools.

## Nomad 0.4.0

Nomad 0.4.0 has backward incompatible changes in the logic for Consul
deregistration. When a Task which was started by Nomad v0.3.x is uncleanly shut
down, the Nomad 0.4 Client will no longer clean up any stale services. If an
in-place upgrade of the Nomad client to 0.4 prevents the Task from gracefully
shutting down and deregistering its Consul-registered services, the Nomad Client
will not clean up the remaining Consul services registered with the 0.3
Executor.

We recommend draining a node before upgrading to 0.4.0 and then re-enabling the
node once the upgrade is complete.

## Nomad 0.3.1

Nomad 0.3.1 removes artifact downloading from driver configurations and makes
it a first-class element of the task. As such, jobs will have to be rewritten in
the proper format and resubmitted to Nomad. Nomad clients will properly
re-attach to existing tasks but job definitions must be updated before they can
be dispatched to clients running 0.3.1.

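
As a hedged illustration of the task-level form, the fragment below declares
the download in an `artifact` block next to the driver `config` block rather
than inside it. The task name, URL, and command path are hypothetical, and
other required task fields are omitted for brevity.

```hcl
task "server" {
  driver = "exec"

  # Declared on the task itself, not inside the driver's config block.
  artifact {
    source = "https://example.com/binaries/server"
  }

  config {
    # Assumes the artifact is downloaded into the task's local/ directory.
    command = "local/server"
  }
}
```
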
## Nomad 0.3.0

Nomad 0.3.0 has made several substantial changes to job files, including a new
`log` block, a new variable interpolation syntax (`${var}`), a modified `restart`
policy syntax, and minimum resource requirements for tasks along with
validation. These changes require a slight change to the default upgrade flow.

After upgrading the version of the servers, all previously submitted jobs must
be resubmitted with the updated job syntax using a Nomad 0.3.0 binary.

- All instances of `$var` must be converted to the new syntax of `${var}`.

- All tasks must provide their required resources for CPU, memory and disk as
  well as required network usage if ports are required by the task.

- Restart policies must be updated to indicate whether it is desired for the
  task to restart on failure or to fail using `mode = "delay"` or `mode = "fail"` respectively.

- Service names that include periods will fail validation. To fix, remove any
  periods from the service name before running the job.

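
The points above can be sketched in a single 0.3-era job fragment. All names
and numeric values here are illustrative assumptions, and later Nomad versions
have since changed parts of this syntax, so treat it as a sketch rather than a
template.

```hcl
group "web" {
  # New interpolation syntax; previously written as "$attr.kernel.name".
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  # The restart policy must state what happens once attempts are
  # exhausted within the interval.
  restart {
    attempts = 2
    interval = "1m"
    delay    = "15s"
    mode     = "delay" # or "fail"
  }

  task "server" {
    driver = "exec"

    config {
      command = "server"
    }

    # CPU, memory, and disk are required, plus network usage when the
    # task needs ports.
    resources {
      cpu    = 500
      memory = 256
      disk   = 300

      network {
        mbits = 10
        port "http" {}
      }
    }
  }
}
```
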
After updating the Servers and job files, Nomad Clients can be upgraded by first
draining the node so no tasks are running on it. This can be verified by running
`nomad node status <node-id>` and confirming that there are no tasks in the
`running` state. Once that is done the client can be killed, the `data_dir`
should be deleted, and then Nomad 0.3.0 can be launched.

[api_jobs_parse]: /nomad/api-docs/jobs#parse-job
[artifacts]: /nomad/docs/job-specification/artifact
[artifact_params]: /nomad/docs/job-specification/artifact#artifact-parameters
[cgroups2]: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
[cgroup_parent]: /nomad/docs/configuration/client#cgroup_parent
[client_artifact]: /nomad/docs/configuration/client#artifact-parameters
[cores]: /nomad/docs/job-specification/resources#cores
[dangling-containers]: /nomad/docs/drivers/docker#dangling-containers
[drain-api]: /nomad/api-docs/nodes#drain-node
[drain-cli]: /nomad/docs/commands/node/drain
[dst]: /nomad/docs/job-specification/periodic#daylight-saving-time
[envoy_concurrency]: https://www.envoyproxy.io/docs/envoy/latest/operations/cli#cmdoption-concurrency
[gh-6787]: https://github.com/hashicorp/nomad/issues/6787
[gh-8457]: https://github.com/hashicorp/nomad/issues/8457
[gh-9148]: https://github.com/hashicorp/nomad/issues/9148
[gh-10875]: https://github.com/hashicorp/nomad/pull/10875
[gh-11563]: https://github.com/hashicorp/nomad/issues/11563
[go-client]: https://pkg.go.dev/github.com/hashicorp/nomad/api#Client
[hcl2]: https://github.com/hashicorp/hcl2
[limits]: /nomad/docs/configuration#limits
[lxc]: /nomad/plugins/drivers/community/lxc
[migrate]: /nomad/docs/job-specification/migrate
[nvidia]: /nomad/plugins/devices/nvidia
[plugin-block]: /nomad/docs/configuration/plugin
[plugins]: /nomad/plugins/drivers/community
[preemption-api]: /nomad/api-docs/operator#update-scheduler-configuration
[preemption]: /nomad/docs/concepts/scheduling/preemption
[proxy_concurrency]: /nomad/docs/job-specification/sidecar_task#proxy_concurrency
[`sidecar_task.config`]: /nomad/docs/job-specification/sidecar_task#config
[`raft_protocol`]: /nomad/docs/configuration/server#raft_protocol
[`raft protocol`]: /nomad/docs/configuration/server#raft_protocol
[`rejoin_after_leave`]: /nomad/docs/configuration/server#rejoin_after_leave
[reserved]: /nomad/docs/configuration/client#reserved-parameters
[task-config]: /nomad/docs/job-specification/task#config
[tls-guide]: /nomad/tutorials/transport-security/security-enable-tls
[tls-vault-guide]: /nomad/tutorials/integrate-vault/vault-pki-nomad
[update]: /nomad/docs/job-specification/update
[validate]: /nomad/docs/commands/job/validate
[vault_grace]: /nomad/docs/job-specification/template
[node drain]: /nomad/docs/upgrade#5-upgrade-clients
[`template.disable_file_sandbox`]: /nomad/docs/configuration/client#template-parameters
[template_gid]: /nomad/docs/job-specification/template#gid
[template_uid]: /nomad/docs/job-specification/template#uid
[pki]: /vault/docs/secrets/pki
[`volume create`]: /nomad/docs/commands/volume/create
[`volume register`]: /nomad/docs/commands/volume/register
[`volume`]: /nomad/docs/job-specification/volume
[enterprise licensing]: /nomad/docs/enterprise/license
[`cap_net_raw`]: https://security.stackexchange.com/a/128988
[`linux capabilities`]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
[`allow_caps`]: /nomad/docs/drivers/docker#allow_caps
[`extra_hosts`]: /nomad/docs/drivers/docker#extra_hosts
[no_net_raw]: /nomad/docs/upgrade/upgrade-specific#nomad-1-1-0-rc1-1-0-5-0-12-12
[allow_caps_exec]: /nomad/docs/drivers/exec#allow_caps
[allow_caps_java]: /nomad/docs/drivers/java#allow_caps
[cap_add_exec]: /nomad/docs/drivers/exec#cap_add
[cap_drop_exec]: /nomad/docs/drivers/exec#cap_drop
[`log_file`]: /nomad/docs/configuration#log_file
[Upgrading to Raft Protocol 3]: /nomad/docs/upgrade#upgrading-to-raft-protocol-3
[`Local`]: /consul/docs/security/acl/acl-tokens#token-attributes
[anon_token]: /consul/docs/security/acl/acl-tokens#special-purpose-tokens
[consul_acl]: https://github.com/hashicorp/consul/issues/7414
[kill_timeout]: /nomad/docs/job-specification/task#kill_timeout
[max_kill_timeout]: /nomad/docs/configuration/client#max_kill_timeout
[alloc_overlap]: https://github.com/hashicorp/nomad/issues/10440
[gh_10446]: https://github.com/hashicorp/nomad/pull/10446#issuecomment-1224833906
[gh_issue]: https://github.com/hashicorp/nomad/issues/new/choose
[upgrade process]: /nomad/docs/upgrade#upgrade-process
[landlock]: https://docs.kernel.org/userspace-api/landlock.html
[artifact_fs_isolation]: /nomad/docs/configuration/client#disable_filesystem_isolation
[decompression_file_count_limit]: /nomad/docs/configuration/client#decompression_file_count_limit
[decompression_size_limit]: /nomad/docs/configuration/client#decompression_size_limit
[artifact_env]: /nomad/docs/configuration/client#set_environment_variables
[dangling_container_reconciliation]: /nomad/docs/drivers/docker#enabled
[hard_guide]: /nomad/docs/install/production/requirements#hardening-nomad