Update the ingress gateway documentation to remove the note stating
that a port must be specified for values in the `hosts` field when
the ingress gateway is listening on a non-standard HTTP port.
Specifying a port was required in Consul 1.8.0, but that requirement
was removed in 1.8.1 with hashicorp/consul#8190 which made Consul
include the port number when constructing the Envoy configuration.
Related Consul docs PR: hashicorp/consul#10827
Tweaks to the commands in Consul Connect page.
For multi-command scripts, having the leading `$` is a bit annoying, as it makes copying the text harder. Also, the `copy` button would only copy the first command and ignore the rest.
Also, the `echo 1 > ...` commands are required to run as root, unlike the rest! I made them use `| sudo tee` pattern to ease copy & paste as well.
Lastly, update the CNI plugin links to 1.0.0. It's fresh off the oven - just got released less than an hour ago: https://github.com/containernetworking/plugins/releases/tag/v1.0.0 .
Using `bridge` networking requires that you have CNI plugins installed
on the client, but this isn't in the jobspec `network` docs which are
the first place someone will look when trying to configure task
networking.
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.
Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.
As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).
Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.
Closes#2527
Otherwise the spinner would just end, which felt a bit awkward.
I wanted to see a "✓" to know that everything was ok, and a "!" (maybe something else?) if something went wrong.
This PR fixes a bug where the underlying Envoy process of a Connect gateway
would consume a full core of CPU if there is more than one sidecar or gateway
in a group. The utilization was being caused by Consul injecting an envoy_ready_listener
on 127.0.0.1:8443, of which only one of the Envoys would be able to bind to.
The others would spin in a hot loop trying to bind the listener.
As a workaround, we now specify -address during the Envoy bootstrap config
step, which is how Consul maps this ready listener. Because there is already
the envoy_admin_listener, and we need to continue supporting running gateways
in host networking mode, and in those case we want to use the same port
value coming from the service.port field, we now bind the admin listener to
the 127.0.0.2 loop-back interface, and the ready listener takes 127.0.0.1.
This shouldn't make a difference in the 99.999% use case where envoy is
being run in its official docker container. Advanced users can reference
${NOMAD_ENVOY_ADMIN_ADDR_<service>} (as they 'ought to) if needed,
as well as the new variable ${NOMAD_ENVOY_READY_ADDR_<service>} for the
envoy_ready_listener.
Alloc exec only works when task is passed as a flag and not an arg.
Alloc logs currently accepts either, but alloc signal and restart only
accept task as an arg. This adds -task as a flag to the other alloc
commands to make the cli UX consistent. If task is passed as a flag and
an arg, it ignores the arg.
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.
The `docker` driver's `port_map` field was deprecated in 0.12 and this is
documented in the task driver's docs, but we never explicitly flagged it for
backwards compatibility.
This PR makes it so that Nomad will automatically set the CONSUL_TLS_SERVER_NAME
environment variable for Connect native tasks running in bridge networking mode
where Consul has TLS enabled. Because of the use of a unix domain socket for
communicating with Consul when in bridge networking mode, the server name is
a file name instead of something compatible with the mTLS certificate Consul
will authenticate against. "localhost" is by default a compatible name, so Nomad
will set the environment variable to that.
Fixes#10804
Current efs driver does not support telling it if its a `node` or a `controller`, and it will not print any error it will just ignore all other parameters then:(
So this will result in endpoint being `/tmp/csi.sock` and not `/csi/csi.sock` which will in turn break nomad/csi integration.
Also I changed the latest image tag to v1.3.2 to make sure anybody copy pasting this example is sure that it will work.
Tested on nomad 1.1.2
When `network.mode = "bridge"`, we create a pause container in Docker with no
networking so that we have a process to hold the network namespace we create
in Nomad. The default `/etc/hosts` file of that pause container is then used
for all the Docker tasks that share that network namespace. Some applications
rely on this file being populated.
This changeset generates a `/etc/hosts` file and bind-mounts it to the
container when Nomad owns the network, so that the container's hostname has an
IP in the file as expected. The hosts file will include the entries added by
the Docker driver's `extra_hosts` field.
In this changeset, only the Docker task driver will take advantage of this
option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts`
file and this can't be changed without breaking backwards compatibility. But
the fields are available in the task driver protobuf for community task
drivers to use if they'd like.
System and batch jobs don't create deployments, which means nomad tries
to monitor a non-existent deployment when it runs a job and outputs an
error message. This adds a check to make sure a deployment exists before
monitoring. Also fixes some formatting.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Old description of `{plan,worker}.wait_for_index` described the metric
in terms of waiting for a snapshot which has two problems:
1. "Snapshot" is an overloaded term in Nomad and operators can't be
expected to know which use we're referring to here.
2. The most important thing about the metric is what we're waiting *on*
before taking a snapshot: the raft index of the object to be
processed (plan or eval).
The new description tries to cram all of that context into the tiny
space provided.
See #5791 for details about the `wait_for_index` mechanism in general.
This PR fixes the API to _not_ set the default mesh gateway mode. Before,
the mode would be set to "none" in Canonicalize, which is incorrect. We
should pass through the empty string so that folks can make use of Consul
service-defaults Config entries to configure the default mode.
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.
Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway
The following group level service block can be used to establish
a Connect mesh gateway.
service {
connect {
gateway {
mesh {
// no configuration
}
}
}
}
Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.
service {
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "<service>"
local_bind_port = <port>
datacenter = "<datacenter>"
mesh_gateway {
mode = "<mode>"
}
}
}
}
}
}
Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.
client {
host_network "public" {
interface = "eth1"
}
}
Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.
network {
mode = "bridge"
port "mesh_wan" {
host_network = "public"
}
}
Use this port label for the service.port of the mesh gateway, e.g.
service {
name = "mesh-gateway"
port = "mesh_wan"
connect {
gateway {
mesh {}
}
}
}
Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.
Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.
Closes#9446
Adds clarification to `nomad volume create` commands around how the `volume`
block in the jobspec overrides this behavior. Adds missing section to `nomad
volume register` and to example volume spec for both commands.
Move the words being defined in the /docs/internal/architecture page to be
small headers so that they can be linked to with anchors from Learn guides and
other documentation location.
Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.
The `capacity` block was removed during implementation in lieu of the
`capacity_max` and `capacity_min` fields, but it wasn't removed from the
example in the documentation.
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.
A future version of nomad will enable similar configurability for the
exec and java task drivers.
* website: bump to docs-page prerelease with hidden page support
* website: remove temp check for hidden pages, covered by docs-page
* website: bump to stable docs-page, w next-mdx-remote bump
Follow up to memory oversubscription - expose an env-var to indicate when memory oversubscription is enabled and what the limit is.
This will be helpful for setting hints to app for memory management.
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
Add templating to `network-interface` option.
This PR also adds a fast-fail to in the case where an invalid interface is set or produced by the template
* add tests and check for valid interface
* Add documentation
* Incorporate suggestions from code review
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy
searching objects in Nomad. The fuzzy search endpoint routes requests
to the Nomad Server leader, which implements the Search.FuzzySearch RPC
method.
Requests to the fuzzy search API are based on the api.FuzzySearchRequest
object, e.g.
{
"Text": "ed",
"Context": "all"
}
Responses from the fuzzy search API are based on the api.FuzzySearchResponse
object, e.g.
{
"Index": 27,
"KnownLeader": true,
"LastContact": 0,
"Matches": {
"tasks": [
{
"ID": "redis",
"Scope": [
"default",
"example",
"cache"
]
}
],
"evals": [],
"deployment": [],
"volumes": [],
"scaling_policy": [],
"images": [
{
"ID": "redis:3.2",
"Scope": [
"default",
"example",
"cache",
"redis"
]
}
]
},
"Truncations": {
"volumes": false,
"scaling_policy": false,
"evals": false,
"deployment": false
}
}
The API is tunable using the new server.search stanza, e.g.
server {
search {
fuzzy_enabled = true
limit_query = 200
limit_results = 1000
min_term_length = 5
}
}
These values can be increased or decreased, so as to provide more
search results or to reduce load on the Nomad Server. The fuzzy search
API can be disabled entirely by setting `fuzzy_enabled` to `false`.
Add new `access_mode`/`attachment_mode` fields. Make it more clear which set
of fields belong to create vs register. Update the example spec that's
generated by `volume init`.
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.
Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
Create a convenience command for generating example CSI volume specifications,
similar to the existing `nomad job init` or `nomad quota init` commands.
We've only ever had 1 API version, and we've broken backward
compatibility extremely rarely. Nothing changed about this with the
release of 1.0, let's just remove these sentences and save everybody
some reading.
The terminology here is a bit tricky. Technically Kuberbetes deprecated
their Docker *runtime* support but can still run Docker images. Sadly in
a lot of people's minds "Docker" and "containers" are nearly synonymous.
I think "Linux containers" is a more accurate characterization of
Kubernetes focus than "Docker" at this point.
Fixes#10120
This is maybe a more realistic job that someone would actually submit
via the API; it more closely aligns with the example job from `nomad
job init`.
It's also been updated to use group network and service definitions.
* Impl: base structure
* Impl: ComparisonItem/component bg color
* Style comparison items
* CSS: Typography/spacing pass
* Theme detailCta
* Callouts: content pass
* Impl: Logo grid section
* Add proper logos to logoGrid
* Hack: override button colors w/Nomad
* Link enhancements
* Clean up comments
* Revert link wrapping changes
* Remove paragraph after Why Nomad
* Add available links
* GitHub logo to GitLab
* Move callouts to just under Why Nomad? for A ver.
* Test comment to see if Vercel picks up this commit
In a deployment with two groups (ex. A and B), if group A's canary becomes
healthy before group B's, the deadline for the overall deployment will be set
to that of group A. When the deployment is promoted, if group A is done it
will not contribute to the next deadline cutoff. Group B's old deadline will
be used instead, which will be in the past and immediately trigger a
deployment progress failure. Reset the progress deadline when the job is
promotion to avoid this bug, and to better conform with implicit user
expectations around how the progress deadline should interact with promotions.
HCL2 locals don't have type constraints, and the map syntax is interpreted as
an object. So users may need to explictly convert to map for some functions.
The `convert` function documentation was missing the required type constructor
parameter for collections.
Block labels are not expressions so they can't be interpolated without hacks
like `dynamic` blocks. Clarify this so that we don't confuse users who
shouldn't need to dig into the subtle nuances between expressions and blocks.
Using the environment variable stopped working here a while back,
should be using the port label. Also upgrade to uuid-api:v5 which
supports linux/arm64.
This PR enables jobs configured with a custom sidecar_task to make
use of the `service.expose` feature for creating checks on services
in the service mesh. Before we would check that sidecar_task had not
been set (indicating that something other than envoy may be in use,
which would not support envoy's expose feature). However Consul has
not added support for anything other than envoy and probably never
will, so having the restriction in place seems like an unnecessary
hindrance. If Consul ever does support something other than Envoy,
they will likely find a way to provide the expose feature anyway.
Fixes#9854
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.
Closes#9970
Allow for readiness type checks by configuring nomad to ignore warnings
or errors reported by a service check. This allows the deployment to
progress and while Consul handles introducing the sercive into a
resource pool once the check passes.
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.
Closes#9969
- aws secret key is named incorrectly in the target docs.
It needs to match what is in the nomad-autoscaler repo
(see link below), otherwise the autoscaler will default to AWS sdk
behavior, which could end up using an IAM instance profile
or other environment variables instead of what is passed into the
autoscaler config file.
Ref: e60fb5268d/plugins/builtin/target/aws-asg/plugin/plugin.go (L27)
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.
https://www.consul.io/docs/connect/gateways/terminating-gateway
These gateways are declared as part of a task group level service
definition within the connect stanza.
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
terminating {
// terminating-gateway configuration entry
}
}
}
}
Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.
When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.
Gateways require Consul 1.8.0+, checked by a node constraint.
Closes#9445
I want to strike a balance here:
- On the one hand there are use cases (raw_exec or Docker only) where
running Nomad clients as an unprivileged user is *preferable.*
- On the other hand running Nomad clients as root is our main and best
tested environment. So I want to leave that a strong recommendation.
There's no reason to give the `client.options` to `plugins` migration
top billing on the client configuration page. Remove and and clarify the
more appropriately placed note down below.
The client allocation GC API returns a misleading error message when the
allocation exists but is not yet eligible for GC. Make this clear in the error
response.
Note in the docs that the allocation will still show on the server responses.
Add the ability to configure the Task used for Connect gateways,
similar to how sidecar Task can be configured.
The implementation here simply re-uses the sidecar_task stanza,
and now gets applied whether connect.sidecar_service or
connect.gateway is the thing being defined. In retrospect,
connect.sidecar_task could have been more generically named
like connect.task to make it a little more re-usable.
Closes#9474
When a task is restored after a client restart, the template runner will
create a new lease for any dynamic secret (ex. Consul or PKI secrets
engines). But because this lease is being created in the prestart hook, we
don't trigger the `change_mode`.
This changeset uses the the existence of the task handle to detect a
previously running task that's been restored, so that we can trigger the
template `change_mode` if the template is changed, as it will be only with
dynamic secrets.
Make backward compatibility notes about Task Driver config options. Namely, call out the use of blocks with non-identifier attributes (like in docker systctl and storage_options) or nesting block syntax within an attribute assignment. Neither of these are valid HCL2. The solution is relatively simple: We can add = and quote the non-identifier attribute names.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Applying the default --concurrency for gateways was missed before.
Set the default Envoy concurrency to 1 for connect gateways. The
same override value meta.connect.proxy_concurrency applies.
* docs: Remove 1.0 beta warning about HCL2.0 docs
* docs: multiregion is not beta anymore either
* docs: scaling isn't beta
* docs: neither is events api
Mainly note that block labels need to be string literals, and that decimals without a leading significant digits aren't acceptable anymore (e.g. .9 are required to be 0.9).
Dynamic blocks can be used here, but feels too much of a hack, or a hammer to highlight it here, specially given the error reporting and debugging isn't so straightforward. I'd advocate internally for relaxing the restriction and allowing expressions in block labels instead.
Related to https://github.com/hashicorp/nomad/issues/9522
During testing we discovered old versions of Nomad and Consul seemed to
prevent Envoy from accepting new connections while the Nomad agent was
being upgraded.
* use full name for events
use evaluation and allocation instead of short name
* update api event stream package and shortnames
* update docs
* make sync; fix typo
* backwards compat not from 1.0.0-beta event stream api changes
* use api types instead of string
* rm backwards compat note that only changed between prereleases
* remove backwards incompat that only existed in prereleases
* systemd should be downcased
* containerd should be downcased
* spellchecking, adjust list item spacing
* QEMU should be upcased
* spelling, it's->its
* Fewer exclamation points; drive-by list spacing
* Update website/pages/docs/internals/security.mdx
* Namespace is not ent only now.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Expand `volume` and `volume_mount` sections to describe how to use HCL2
dynamic blocks and interpolation to have finer-grained control over how
allocations get volumes.
Previously, every Envoy Connect sidecar would spawn as many worker
threads as logical CPU cores. That is Envoy's default behavior when
`--concurrency` is not explicitly set. Nomad now sets the concurrency
flag to 1, which is sensible for the default cpu = 250 Mhz resources
allocated for sidecar proxies. The concurrency value can be configured
in Client configuration by setting `meta.connect.proxy_concurrency`.
Closes#9341
If Docker auth helpers are used but aith fails or the image isn't found, we
hard fail the task. Users may set `auth_soft_fail` to fallback to the public
Docker Hub on a per-job basis. But users that mix public and private images
have to set `auth_soft_fail=true` for every job using a public image if Docker
auth helpers are used.
* Remove Managed Sinks from Nomad
Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.
* update comment
* changelog
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.
Closes#8964
Add more detailed HCL2 docs, mostly lifted from Packer with tweaks for Nomad.
The function docs are basically verbatim taken from Packer with basic string substitutions. I commented out some for_each details as the examples are mostly driven towards Packer resources. I'll iterate on those with better Nomad examples.
The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting
in allocation claims not being populated in the web UI. These fields mirror
the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs
for read and write would be ideal, but this can't be done without either
bloating the API response with repeated full `Allocation` data, or causing a
panic in previous versions of the CLI.
The `nomad/structs` fields are persisted with nil values and are populated
during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs`
and `WriteAllocs` fields with a map of allocation IDs, but with null
values. The web UI will then create its `ReadAllocations` and
`WriteAllocations` fields by mapping from those IDs to the values in
`Allocations`, instead of flattening the map into a list.
Only `change_mode = "restart"` will result in the template environment
variables being updated in the task. Clarify the behavior of the unsupported
options.
* `nomad operator keyring` was missing the general options section
* `nomad operator metrics` was missing a page in the docs entirely
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Create a new "Operating Nomad" section of the docs where we can put reference
material for operators that doesn't quite fit in either configuration file /
command line documentation or a step-by-step Learn Guide. Pre-populate this
with the existing telemetry docs and some links out to the Learn Guide
sections.
* vault secrets named with `-` characters cannot be read by `consul-template`
due to limitations in golang's template rendering engine.
* environment variables are not modified in running tasks if
`change_mode.noop` is set.
The `nomad alloc logs` command does not remove terminal escape sequences for
color from the log outputs of a task. Clarify that the standard `-no-color`
flag, which does apply to Nomad's error responses from `nomad alloc logs`,
does not apply to the log output.
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
Update the default value for `client.bridge_network_subnet` in docs
to match the new value from 99742f2665. Was `172.26.66.0/23`, is
now `172.26.64.0/20`.
Fixes#9316
The default behavior for `docker.volumes.enabled` is intended to be `false`,
but the HCL schema defaults to `true` if the value is unset. Set the default
literal value to `true`.
Additionally, Docker driver mounts of type "volume" (but not "bind") are not
being properly sandboxed with that setting. Disable Docker mounts with type
"volume" entirely whenever the `docker.volumes.enabled` flag is set to
false. Note this is unrelated to the `volume_mount` feature, which is
constrained to preconfigured host volumes or whatever is mounted by a CSI
plugin.
This changeset includes updates to unit tests that should have been failing
under the documented behavior but were not.
I believe there’s a typo where “workloads” was changed to “jobs” but the original word wasn’t removed. Or maybe it’s the other way around. But currently there is an orphaned one-word sentence.
We recently added documentation disambiguating the terminology of the
allocation/task working directories. This changeset adds an internals document
that describes in more detail exactly what does into the allocation working
directory, how this interacts with the filesystem isolation provided by task
drivers, and how this interacts with features like `artifact` and `template`.
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
This is a first draft of HCLv2 docs - I added the details under hcl2 doc with some minimal info highlighting the hcl2 introductions.
As a longer term strategy, we'll want to mimic the Packer HCL docs structure that details all the blocks and allowed expressions/functions in greater details. However, given that the exact functions and templating syntax is still somewhat influx, I opt to push that to another time.
Dockerhub is going to rate limit unauthenticated pulls.
Use our HashiCorp internal mirror for builds run through CircleCI.
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
The `template.allow_host_source` configuration was not operable, leading to
the recent security patch in 0.12.6. We forgot to update this piece of the
documentation referring to the correct configuration value.