This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.
https://www.consul.io/docs/connect/gateways/terminating-gateway
These gateways are declared as part of a task group level service
definition within the connect stanza.
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
terminating {
// terminating-gateway configuration entry
}
}
}
}
Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.
When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.
Gateways require Consul 1.8.0+, checked by a node constraint.
Closes#9445
If the connect.proxy stanza is left unset, the connection timeout
value is not set but is assumed to be, and may cause a non-fatal NPE
on job submission.
In a few places Nomad was using flag implementations directly
from Consul, lending to Nomad's need to import consul. Replace
those uses with helpers already in Nomad, and copy over the bare
minimum needed to make the autopilot flags behave as they have.
This is essentially a port of Consul's similar fix
Changes are:
go get -u github.com/hashicorp/go-connlimit
go mod vendor
Use new HTTP429 handler
20d1ea7d2d
* use full name for events
use evaluation and allocation instead of short name
* update api event stream package and shortnames
* update docs
* make sync; fix typo
* backwards compat not from 1.0.0-beta event stream api changes
* use api types instead of string
* rm backwards compat note that only changed between prereleases
* remove backwards incompat that only existed in prereleases
* Remove Managed Sinks from Nomad
Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.
* update comment
* changelog
Before, upstreams could only be defined using the default datacenter.
Now, the `datacenter` field can be set in a connect upstream definition,
informing consul of the desire for an instance of the upstream service
in the specified datacenter. The field is optional and continues to
default to the local datacenter.
Closes#8964
The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting
in allocation claims not being populated in the web UI. These fields mirror
the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs
for read and write would be ideal, but this can't be done without either
bloating the API response with repeated full `Allocation` data, or causing a
panic in previous versions of the CLI.
The `nomad/structs` fields are persisted with nil values and are populated
during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs`
and `WriteAllocs` fields with a map of allocation IDs, but with null
values. The web UI will then create its `ReadAllocations` and
`WriteAllocations` fields by mapping from those IDs to the values in
`Allocations`, instead of flattening the map into a list.
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
* Get concrete types out of dynamic payload
wip
pull out value setting to func
* Add TestEventSTream_SetPayloadValue
Add more assertions
use alias type in unmarshalJSON to handle payload rawmessage
shorten unmarshal and remove anonymous wrap struct
* use map structure and helper functions to return concrete types
* ensure times are properly handled
* update test name
* put all decode logic in a single function
Co-authored-by: Kris Hicks <khicks@hashicorp.com>
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
* add goroutine text profiles to nomad operator debug
* add server-id=all to nomad operator debug
* fix bug from changing metrics from string to []byte
* Add function to return MetricsSummary struct, metrics gotemplate support
* fix bug resolving 'server-id=all' when no servers are available
* add url to operator_debug tests
* removed test section which is used for future operator_debug.go changes
* separate metrics from operator, use only structs from go-metrics
* ensure parent directories are created as needed
* add suggested comments for text debug pprof
* move check down to where it is used
* add WaitForFiles helper function to wait for multiple files to exist
* compact metrics check
Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
* fix github's silly apply suggestion
Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
properly wire up durable event count
move newline responsibility
moves newline creation from NDJson to the http handler, json stream only encodes and sends now
ignore snapshot restore if broker is disabled
enable dev mode to access event steam without acl
use mapping instead of switch
use pointers for config sizes, remove unused ttl, simplify closed conn logic
Fixes#9017
The ?resources=true query parameter includes resources in the object
stub listings. Specifically:
- For `/v1/nodes?resources=true` both the `NodeResources` and
`ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
included.
The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.
Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.
This does *not* change the defaults.
Allocation requests should target servers, which then can forward the
request to the appropriate clients.
Contacting clients directly is fragile and prune to failures: e.g.
clients maybe firewalled and not accessible from the API client, or have
some internal certificates not trusted by the API client.
FWIW, in contexts where we anticipate lots of traffic (e.g. logs, or
exec), the api package attempts contacting the client directly but then
fallsback to using the server. This approach seems excessive in these
simple GET/PUT requests.
Fixes#8894
Copy Consul API's format: QueryOptions.WithContext(context) will now return
a new QueryOption whose HTTP requests will be canceled with the context
provided (and similar for WriteOptions)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.
The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical
Nomad doesn't do much besides pass the fields through to Consul.
Fixes#6913
Upgrade our consul/api import to the equivelent of consul@v1.8.1 which includes
a bug fix necessary for #6913. If consul would publish a proper api/ submodule tag
we could reference that.
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
The CSI `volume deregister -force` flag was using the documented parameter
`force` everywhere except in the API, where it was incorrectly passing `purge`
as the query parameter.
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs
resolves#8355
* command/agent/host: collect host data, multi platform
* nomad/structs/structs: new HostDataRequest/Response
* client/agent_endpoint: add RPC endpoint
* command/agent/agent_endpoint: add Host
* api/agent: add the Host endpoint
* nomad/client_agent_endpoint: add Agent Host with forwarding
* nomad/client_agent_endpoint: use findClientConn
This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.
* command/debug: call agent hosts
* command/agent/host: eliminate calling external programs
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.
If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.
The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.
There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.
If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.
Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.
Fixes#6083
In multiregion deployments when ACLs are enabled, the deploymentwatcher needs
an appropriately scoped ACL token with the same `submit-job` rights as the
user who submitted it. The token will already be replicated, so store the
accessor ID so that it can be retrieved by the leader.
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
This PR switches the Nomad repository from using govendor to Go modules
for managing dependencies. Aspects of the Nomad workflow remain pretty
much the same. The usual Makefile targets should continue to work as
they always did. The API submodule simply defers to the parent Nomad
version on the repository, keeping the semantics of API versioning that
currently exists.