When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.
In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.
- `wait`
It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.
- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry`
- `wait`
Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.
- `wait_bounds`
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.
Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway
The following group level service block can be used to establish
a Connect mesh gateway.
service {
connect {
gateway {
mesh {
// no configuration
}
}
}
}
Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.
service {
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "<service>"
local_bind_port = <port>
datacenter = "<datacenter>"
mesh_gateway {
mode = "<mode>"
}
}
}
}
}
}
Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.
client {
host_network "public" {
interface = "eth1"
}
}
Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.
network {
mode = "bridge"
port "mesh_wan" {
host_network = "public"
}
}
Use this port label for the service.port of the mesh gateway, e.g.
service {
name = "mesh-gateway"
port = "mesh_wan"
connect {
gateway {
mesh {}
}
}
}
Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.
Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.
Closes#9446
This PR adds the common OSS changes for adding support for Consul Namespaces,
which is going to be a Nomad Enterprise feature. There is no new functionality
provided by this changeset and hopefully no new bugs.
The details of host volume and CSI volume requests do not show up in `nomad
plan` outputs, although the updates are detected by the scheduler and result
in an update as expected.
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.
https://www.consul.io/docs/connect/gateways/terminating-gateway
These gateways are declared as part of a task group level service
definition within the connect stanza.
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
terminating {
// terminating-gateway configuration entry
}
}
}
}
Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.
When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.
Gateways require Consul 1.8.0+, checked by a node constraint.
Closes#9445
Multiregion deployments use the `NomadTokenID` to allow the deploymentwatcher
to send RPCs between regions with the original submitter's ACL token. This ID
should be filtered from diffs so that it doesn't cause a difference for
purposes of job plans.
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
The multiregion plan diffs swap the old and new versions for each region when
they're edited (rather than added/removed). The `multiregionRegionDiff`
function call incorrectly reversed its arguments for existing regions.
* jobspec, api: add stop_after_client_disconnect
* nomad/state/state_store: error message typo
* structs: alloc methods to support stop_after_client_disconnect
1. a global AllocStates to track status changes with timestamps. We
need this to track the time at which the alloc became lost
originally.
2. ShouldClientStop() and WaitClientStop() to actually do the math
* scheduler/reconcile_util: delayByStopAfterClientDisconnect
* scheduler/reconcile: use delayByStopAfterClientDisconnect
* scheduler/util: updateNonTerminalAllocsToLost comments
This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now
* scheduler/testing: AssertEvalStatus just use require
* scheduler/generic_sched: don't create a blocked eval if delayed
* scheduler/generic_sched_test: several scheduling cases
Adds a check for differences in `job.Diff` so that task group networks
and services, including new Consul connect stanzas, show up in the job
plan outputs.