This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.
The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical
Nomad doesn't do much besides pass the fields through to Consul.
Fixes#6913
Upgrade our consul/api import to the equivelent of consul@v1.8.1 which includes
a bug fix necessary for #6913. If consul would publish a proper api/ submodule tag
we could reference that.
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
The CSI `volume deregister -force` flag was using the documented parameter
`force` everywhere except in the API, where it was incorrectly passing `purge`
as the query parameter.
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs
resolves#8355
* command/agent/host: collect host data, multi platform
* nomad/structs/structs: new HostDataRequest/Response
* client/agent_endpoint: add RPC endpoint
* command/agent/agent_endpoint: add Host
* api/agent: add the Host endpoint
* nomad/client_agent_endpoint: add Agent Host with forwarding
* nomad/client_agent_endpoint: use findClientConn
This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.
* command/debug: call agent hosts
* command/agent/host: eliminate calling external programs
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.
If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.
The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.
There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.
If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.
Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.
Fixes#6083
In multiregion deployments when ACLs are enabled, the deploymentwatcher needs
an appropriately scoped ACL token with the same `submit-job` rights as the
user who submitted it. The token will already be replicated, so store the
accessor ID so that it can be retrieved by the leader.
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
This PR switches the Nomad repository from using govendor to Go modules
for managing dependencies. Aspects of the Nomad workflow remain pretty
much the same. The usual Makefile targets should continue to work as
they always did. The API submodule simply defers to the parent Nomad
version on the repository, keeping the semantics of API versioning that
currently exists.
Fixes#7854
Nomad requires a version of go-getter that is currently in PR (https://github.com/hashicorp/go-getter/pull/256)
We also require some recent bug fix to go-getter around the handling of URL redirects.
Update our vendor'd copy of go-getter to the newly rebased umask changes so that we can incorporate
the latest changes for go-getter.
* changes necessary to support oss licesning shims
revert nomad fmt changes
update test to work with enterprise changes
update tests to work with new ent enforcements
make check
update cas test to use scheduler algorithm
back out preemption changes
add comments
* remove unused method
Also, since github.com/docker/docker is the canonical package names and
is transparently forwarded to github.com/moby/moby, I removed the
moby/moby references in origin.
Pick up a panic fix of f1bd0923b8
Some CirleCI builds fail somewhat mysteriously, e.g. https://circleci.com/gh/hashicorp/nomad/52676 , due to this bug. The test json file emit the following:
```
{"Time":"2020-03-28T12:03:08.602544522Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"panic: send on closed channel\n"}
{"Time":"2020-03-28T12:03:08.602576075Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"\n"}
{"Time":"2020-03-28T12:03:08.602584429Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"goroutine 403 [running]:\n"}
{"Time":"2020-03-28T12:03:08.602590561Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc000a78000, 0xc0009cf160)\n"}
{"Time":"2020-03-28T12:03:08.602598464Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"\t/home/circleci/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x47\n"}
{"Time":"2020-03-28T12:03:08.602604952Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually\n"}
```
This change adds Darwin and FreeBSD C code of gopsutil library, that is
needed for these platforms. `shirou/gopsutil` uses some C code that
isn't in a go package, so don't get vendored automatically.
hashicorp/hcl library added some better validation for error and illegal
characters. The diff is primarily improved error reporting. The
parser.go change includes a case where illegal characters were silently
dropped, but now get reported as invalid characters.
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:
* `{https,rpc}_handshake_timeout`
* `{http,rpc}_max_conns_per_client`
The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.
The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.
All limits are configurable and may be disabled by setting them to `0`.
This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
Update github.com/aws/aws-sdk-go and github.com/hashicorp/go-discover to
pick up support for EC2 Metadata Instance Service v2 changes.
Follow up to https://github.com/hashicorp/go-discover/pull/128 .