When a task is restored after a client restart, the template runner will
create a new lease for any dynamic secret (ex. Consul or PKI secrets
engines). But because this lease is being created in the prestart hook, we
don't trigger the `change_mode`.
This changeset uses the the existence of the task handle to detect a
previously running task that's been restored, so that we can trigger the
template `change_mode` if the template is changed, as it will be only with
dynamic secrets.
Make backward compatibility notes about Task Driver config options. Namely, call out the use of blocks with non-identifier attributes (like in docker systctl and storage_options) or nesting block syntax within an attribute assignment. Neither of these are valid HCL2. The solution is relatively simple: We can add = and quote the non-identifier attribute names.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Applying the default --concurrency for gateways was missed before.
Set the default Envoy concurrency to 1 for connect gateways. The
same override value meta.connect.proxy_concurrency applies.
* docs: Remove 1.0 beta warning about HCL2.0 docs
* docs: multiregion is not beta anymore either
* docs: scaling isn't beta
* docs: neither is events api
Mainly note that block labels need to be string literals, and that decimals without a leading significant digits aren't acceptable anymore (e.g. .9 are required to be 0.9).
Dynamic blocks can be used here, but feels too much of a hack, or a hammer to highlight it here, specially given the error reporting and debugging isn't so straightforward. I'd advocate internally for relaxing the restriction and allowing expressions in block labels instead.
Related to https://github.com/hashicorp/nomad/issues/9522
During testing we discovered old versions of Nomad and Consul seemed to
prevent Envoy from accepting new connections while the Nomad agent was
being upgraded.
* use full name for events
use evaluation and allocation instead of short name
* update api event stream package and shortnames
* update docs
* make sync; fix typo
* backwards compat not from 1.0.0-beta event stream api changes
* use api types instead of string
* rm backwards compat note that only changed between prereleases
* remove backwards incompat that only existed in prereleases
* systemd should be downcased
* containerd should be downcased
* spellchecking, adjust list item spacing
* QEMU should be upcased
* spelling, it's->its
* Fewer exclamation points; drive-by list spacing
* Update website/pages/docs/internals/security.mdx
* Namespace is not ent only now.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Expand `volume` and `volume_mount` sections to describe how to use HCL2
dynamic blocks and interpolation to have finer-grained control over how
allocations get volumes.
Previously, every Envoy Connect sidecar would spawn as many worker
threads as logical CPU cores. That is Envoy's default behavior when
`--concurrency` is not explicitly set. Nomad now sets the concurrency
flag to 1, which is sensible for the default cpu = 250 Mhz resources
allocated for sidecar proxies. The concurrency value can be configured
in Client configuration by setting `meta.connect.proxy_concurrency`.
Closes#9341
If Docker auth helpers are used but aith fails or the image isn't found, we
hard fail the task. Users may set `auth_soft_fail` to fallback to the public
Docker Hub on a per-job basis. But users that mix public and private images
have to set `auth_soft_fail=true` for every job using a public image if Docker
auth helpers are used.
* Remove Managed Sinks from Nomad
Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.
* update comment
* changelog
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.
Closes#8964
Add more detailed HCL2 docs, mostly lifted from Packer with tweaks for Nomad.
The function docs are basically verbatim taken from Packer with basic string substitutions. I commented out some for_each details as the examples are mostly driven towards Packer resources. I'll iterate on those with better Nomad examples.
The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting
in allocation claims not being populated in the web UI. These fields mirror
the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs
for read and write would be ideal, but this can't be done without either
bloating the API response with repeated full `Allocation` data, or causing a
panic in previous versions of the CLI.
The `nomad/structs` fields are persisted with nil values and are populated
during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs`
and `WriteAllocs` fields with a map of allocation IDs, but with null
values. The web UI will then create its `ReadAllocations` and
`WriteAllocations` fields by mapping from those IDs to the values in
`Allocations`, instead of flattening the map into a list.
Only `change_mode = "restart"` will result in the template environment
variables being updated in the task. Clarify the behavior of the unsupported
options.
* `nomad operator keyring` was missing the general options section
* `nomad operator metrics` was missing a page in the docs entirely
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Create a new "Operating Nomad" section of the docs where we can put reference
material for operators that doesn't quite fit in either configuration file /
command line documentation or a step-by-step Learn Guide. Pre-populate this
with the existing telemetry docs and some links out to the Learn Guide
sections.
* vault secrets named with `-` characters cannot be read by `consul-template`
due to limitations in golang's template rendering engine.
* environment variables are not modified in running tasks if
`change_mode.noop` is set.
The `nomad alloc logs` command does not remove terminal escape sequences for
color from the log outputs of a task. Clarify that the standard `-no-color`
flag, which does apply to Nomad's error responses from `nomad alloc logs`,
does not apply to the log output.
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
Update the default value for `client.bridge_network_subnet` in docs
to match the new value from 99742f2665. Was `172.26.66.0/23`, is
now `172.26.64.0/20`.
Fixes#9316
The default behavior for `docker.volumes.enabled` is intended to be `false`,
but the HCL schema defaults to `true` if the value is unset. Set the default
literal value to `true`.
Additionally, Docker driver mounts of type "volume" (but not "bind") are not
being properly sandboxed with that setting. Disable Docker mounts with type
"volume" entirely whenever the `docker.volumes.enabled` flag is set to
false. Note this is unrelated to the `volume_mount` feature, which is
constrained to preconfigured host volumes or whatever is mounted by a CSI
plugin.
This changeset includes updates to unit tests that should have been failing
under the documented behavior but were not.
I believe there’s a typo where “workloads” was changed to “jobs” but the original word wasn’t removed. Or maybe it’s the other way around. But currently there is an orphaned one-word sentence.
We recently added documentation disambiguating the terminology of the
allocation/task working directories. This changeset adds an internals document
that describes in more detail exactly what does into the allocation working
directory, how this interacts with the filesystem isolation provided by task
drivers, and how this interacts with features like `artifact` and `template`.
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
This is a first draft of HCLv2 docs - I added the details under hcl2 doc with some minimal info highlighting the hcl2 introductions.
As a longer term strategy, we'll want to mimic the Packer HCL docs structure that details all the blocks and allowed expressions/functions in greater details. However, given that the exact functions and templating syntax is still somewhat influx, I opt to push that to another time.
Dockerhub is going to rate limit unauthenticated pulls.
Use our HashiCorp internal mirror for builds run through CircleCI.
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
The `template.allow_host_source` configuration was not operable, leading to
the recent security patch in 0.12.6. We forgot to update this piece of the
documentation referring to the correct configuration value.
Beforehand tasks and field replacements did not have access to the
unique ID of their job or its parent. This adds this information as
new environment variables.
* remove event durability
temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs
* drop events table schema and state store methods
* fix neweventbuffer invocations
The terms task directory and allocation directory are used throughout the
documentation but these directories are not the same as the `NOMAD_TASK_DIR`
and `NOMAD_ALLOC_DIR` locations. This is confusing when trying to use the
`template` and `artifact` stanzas, especially when trying to use a destination
outside the Nomad-mounted directories for Docker and similar drivers.
This changeset introduces "allocation working directory" to mean the location
on disk where the various directories and artifacts are staged, and "task
working directory" for the task. Clarify how specific task drivers interact
with the task working directory.
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
The terms task directory and allocation directory are used throughout the
documentation but these directories are not the same as the `NOMAD_TASK_DIR`
and `NOMAD_ALLOC_DIR` locations. This is confusing when trying to use the
`template` and `artifact` stanzas, especially when trying to use a destination
outside the Nomad-mounted directories for Docker and similar drivers.
This changeset introduces "allocation working directory" to mean the location
on disk where the various directories and artifacts are staged, and "task
working directory" for the task. Clarify how specific task drivers interact
with the task working directory.
* consul: advertise cni and multi host interface addresses
* structs: add service/check address_mode validation
* ar/groupservices: fetch networkstatus at hook runtime
* ar/groupservice: nil check network status getter before calling
* consul: comment network status can be nil
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.
This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.
Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.
Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.
`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.
Addresses #8585#7665
When we try to prefix match the `nomad volume detach` node ID argument, the
node may have been already GC'd. The volume unpublish workflow gracefully
handles this case so that we can free the claim. So make a best effort to find
a node ID among the volume's claimed allocations, or otherwise just use the
node ID we've been given by the user as-is.
CSI plugins with the same plugin ID and type (controller, node, monolith) will
collide on a host, both in the communication socket and in the dynamic plugin
registry. Until this can be fixed, leave notice to operators in the
documentation.
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).
This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).
This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.
Running the tool requires AWS_* environment variables set.
$ # in nomad/tools/cpuinfo
$ go run .
Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.
Fixes#7830
The CSI specification for `ValidateVolumeCapability` says that we shall
"reconcile successful capability-validation responses by comparing the
validated capabilities with those that it had originally requested" but leaves
the details of that reconcilation unspecified. This API is not implemented in
Kubernetes, so controller plugins don't have a real-world implementation to
verify their behavior against.
We have found that CSI plugins in the wild may return "successful" but
incomplete `VolumeCapability` responses, so we can't require that all
capabilities we expect have been validated, only that the ones that have been
validated match. This appears to violate the CSI specification but until
that's been resolved in upstream we have to loosen our validation
requirements. The tradeoff is that we're more likely to have runtime errors
during `NodeStageVolume` instead of at the time of volume registration.
The CSI specification allows only the `file-system` attachment mode to have
mount options. The `block-device` mode is left "intentionally empty, for now"
in the protocol. We should be validating against this problem, but our
documentation also had it backwards.
Also adds missing mount_options on group volume.
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.
Closes#8932 which was fixed in #8933
`nomad volume detach volume-id 00000000-0000-0000-0000-000000000000` produces an API call containing the UUID as part of the query string. This is the only way the API accepts the request correctly - if you pass it in the payload you get `detach requires node ID`
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.
This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset
Fixes#8952
The 'docker.config.infra_image' would default to an amd64 container.
It is possible to reference the correct image for a platform using
the `runtime.GOARCH` variable, eliminating the need to explicitly set
the `infra_image` on non-amd64 platforms.
Also upgrade to Google's pause container version 3.1 from 3.0, which
includes some enhancements around process management.
Fixes#8926
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
* docker: support group allocated ports
* docker: add new ports driver config to specify which group ports are mapped
* docker: update port mapping docs
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.
The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical
Nomad doesn't do much besides pass the fields through to Consul.
Fixes#6913
* update vault integration docs
docs/integrations/vault-integration was a copy of the learn guide. Remove that and move /docs/vault-integration to this location instead
fix link
fix link
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Update website/pages/docs/integrations/vault-integration.mdx
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
* revert accidental deletion
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
In order to prevent staleness, changed driver links to point to releases page rather than a specific version.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Postrun hooks for allocation runners don't currently block the registration of
terminal health with the servers, which is what allows system jobs to be
drained. So draining nodes with jobs that claim CSI volumes requires the
`-ignore-system` job to ensure that the postrun hook for service jobs gets a
chance to execute.
The Nomad binary size has been detailed differently in places
and is subject to changing almost daily. We should therefore
remove this to avoid confusion and misrepresentation.
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
Not sure if this was meant to imply adding more schedulers to Nomad is
easy, or that we plan on adding pluggable schedulers. Either way,
neither of those statements is really true unless you really stretch the
definitions of "easy" and "plan".
So remove this sentence as I can't imagine it does anything other than
confuse people.
Before docker, the only default was `SIGINT` for `kill_signal`. The
docker driver however defaults to `SIGTERM`, and we should document
as such.
Fixes#7140
This changes fixes a syntax error in the autoscaling apm plugin
docs as well as updates the scaling stanza doc. The stazna wording
implied its use was only for external autoscalers, whereas it also
is used by the UI.
Before, the service definition for a Connect Native service would always
require setting the `service.task` parameter. Now, that parameter is
automatically inferred when there is only one task in the task group.
Fixes#8274