This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using
`image` filesystem isolation. As a result, more powerful tokens can be used
in a job definition, allowing it to use template stanzas to issue all kinds of
secrets (database secrets, Vault tokens with very specific policies, etc.),
without sharing that issuing power with the task itself.
This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.
If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.
This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.
This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.
Fixes#17076
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.
This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.
---
Fixes: #11175
Co-authored-by: Thomas Weber <towe75@googlemail.com>
Implement the new `nomad job restart` command that allows operators to
restart allocations tasks or reschedule then entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>
Add `identity` jobspec block to expose workload identity tokens to tasks.
---------
Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.
In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
This PR adds some NSD check status output to the CLI.
1. The 'nomad alloc status' command produces nsd check summary output (if present)
2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)
UID/GID 0 is usually reserved for the root user/group. While Nomad
clients are expected to run as root it may not always be the case.
Setting these values as -1 if not defined will fallback to the pervious
behaviour of not attempting to set file ownership and use whatever
UID/GID the Nomad agent is running as. It will also keep backwards
compatibility, which is specially important for platforms where this
feature is not supported, like Windows.
* Allow specification of CSI staging and publishing directory path
* Add website documentation for stage_publish_dir
* Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir
* Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading itself as another Vault
entity.
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.
Make Vault job validation its own function so it's easier to expand it.
Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.
Set `ChangeMode` on `Vault.Canonicalize`.
Add some missing tests.
Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.
An entity alias assigns an indentity to a token, allowing better control
and management of Vault clients since all tokens with the same indentity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.
Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.
- `wait`
It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.
- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry`
- `wait`
Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.
- `wait_bounds`
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Add a new driver capability: RemoteTasks.
When a task is run by a driver with RemoteTasks set, its TaskHandle will
be propagated to the server in its allocation's TaskState. If the task
is replaced due to a down node or draining, its TaskHandle will be
propagated to its replacement allocation.
This allows tasks to be scheduled in remote systems whose lifecycles are
disconnected from the Nomad node's lifecycle.
See https://github.com/hashicorp/nomad-driver-ecs for an example ECS
remote task driver.
Registration of Nomad volumes previously allowed for a single volume
capability (access mode + attachment mode pair). The recent `volume create`
command requires that we pass a list of requested capabilities, but the
existing workflow for claiming volumes and attaching them on the client
assumed that the volume's single capability was correct and unchanging.
Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to
set the initial claim value, and add backwards compatibility logic to handle
the existing volumes that already have claims without these fields.
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.
Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
Add `Canonicalize` methods to the connect components of a service
definition in the `api` package. Without these, we have been relying
on good input for the connect stanza.
Fixes#7993
* jobspec, api: add stop_after_client_disconnect
* nomad/state/state_store: error message typo
* structs: alloc methods to support stop_after_client_disconnect
1. a global AllocStates to track status changes with timestamps. We
need this to track the time at which the alloc became lost
originally.
2. ShouldClientStop() and WaitClientStop() to actually do the math
* scheduler/reconcile_util: delayByStopAfterClientDisconnect
* scheduler/reconcile: use delayByStopAfterClientDisconnect
* scheduler/util: updateNonTerminalAllocsToLost comments
This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now
* scheduler/testing: AssertEvalStatus just use require
* scheduler/generic_sched: don't create a blocked eval if delayed
* scheduler/generic_sched_test: several scheduling cases
When `nomad job inspect` encodes the response, if the decoded JSON
from the API doesn't exactly match the API struct, the field value
will be omitted even if it has a value. We only want the JSON struct
tag to `omitempty`.
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max
modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host.
Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context.
closes#7007
* nomad/structs/volumes: add MountOptions to volume request
* jobspec/test-fixtures/basic.hcl: add mount_options to volume block
* jobspec/parse_test: add expected MountOptions
* api/tasks: add mount_options
* jobspec/parse_group: use hcl decode not mapstructure, mount_options
* client/allocrunner/csi_hook: pass MountOptions through
client/allocrunner/csi_hook: add a VolumeMountOptions
client/allocrunner/csi_hook: drop Options
client/allocrunner/csi_hook: use the structs options
* client/pluginmanager/csimanager/interface: UsageOptions.MountOptions
* client/pluginmanager/csimanager/volume: pass MountOptions in capabilities
* plugins/csi/plugin: remove todo 7007 comment
* nomad/structs/csi: MountOptions
* api/csi: add options to the api for parsing, match structs
* plugins/csi/plugin: move VolumeMountOptions to structs
* api/csi: use specific type for mount_options
* client/allocrunner/csi_hook: merge MountOptions here
* rename CSIOptions to CSIMountOptions
* client/allocrunner/csi_hook
* client/pluginmanager/csimanager/volume
* nomad/structs/csi
* plugins/csi/fake/client: add PrevVolumeCapability
* plugins/csi/plugin
* client/pluginmanager/csimanager/volume_test: remove debugging
* client/pluginmanager/csimanager/volume: fix odd merging logic
* api: rename CSIOptions -> CSIMountOptions
* nomad/csi_endpoint: remove a 7007 comment
* command/alloc_status: show mount options in the volume list
* nomad/structs/csi: include MountOptions in the volume stub
* api/csi: add MountOptions to stub
* command/volume_status_csi: clean up csiVolMountOption, add it
* command/alloc_status: csiVolMountOption lives in volume_csi_status
* command/node_status: display mount flags
* nomad/structs/volumes: npe
* plugins/csi/plugin: npe in ToCSIRepresentation
* jobspec/parse_test: expand volume parse test cases
* command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions
* command/volume_status_csi: copy paste error
* jobspec/test-fixtures/basic: hclfmt
* command/volume_status_csi: clean up csiVolMountOption
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:
* A `csi_plugin` stanza as part of a Nomad task configuration, to
allow a task to expose that it is a plugin.
* A new task runner hook: `csi_plugin_supervisor`. This hook does two
things. When the `csi_plugin` stanza is detected, it will
automatically configure the plugin task to receive bidirectional
mounts to the CSI intermediary directory. At runtime, it will then
perform an initial heartbeat of the plugin and handle submitting it to
the new `dynamicplugins.Registry` for further use by the client, and
then run a lightweight heartbeat loop that will emit task events
when health changes.
* The `dynamicplugins.Registry` for handling plugins that run
as Nomad tasks, in contrast to the existing catalog that requires
`go-plugin` type plugins and to know the plugin configuration in
advance.
* The `csimanager` which fingerprints CSI plugins, in a similar way to
`drivermanager` and `devicemanager`. It currently only fingerprints
the NodeID from the plugin, and assumes that all plugins are
monolithic.
Missing features
* We do not use the live updates of the `dynamicplugin` registry in
the `csimanager` yet.
* We do not deregister the plugins from the client when they shutdown
yet, they just become indefinitely marked as unhealthy. This is
deliberate until we figure out how we should manage deploying new
versions of plugins/transitioning them.