The description for the `nomad.nomad.blocked_evals.total_blocked` metric states that it may include evals blocked due to quota limits being reached, while the `total_quota_limit` description implies those evals are counted exclusively by that metric. I personally interpret `total_blocked` as encompassing blocked evals for any reason, as written in the docs, though someone will have to verify the validity of that statement and possibly rectify the other metric's description.
Fixed a typo: `limtis` vs `limits`.
* consul: correctly understand missing consul checks as unhealthy
This PR fixes a bug where Nomad assumed any registered checks would exist
in the service registration coming back from Consul. In some cases,
Consul may be slow in processing the check registration, and the response
object will not contain checks. Nomad would then scan the empty response
looking for checks with a failing health status, find none, and
mark the task/alloc as healthy.
In reality, we must always use Nomad's view of what checks should exist as
the source of truth, and compare that with the response Consul gives us,
making sure they match, before scanning the Consul response for failing
check statuses.
Fixes #15536
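A minimal sketch of that comparison, assuming illustrative names and the standard Consul API client (the actual implementation differs):

```go
package health

import capi "github.com/hashicorp/consul/api"

// checksPass reports whether every check Nomad registered is present in
// Consul's response and passing. expected holds the check IDs Nomad
// believes it registered; observed holds the checks Consul returned.
func checksPass(expected map[string]struct{}, observed map[string]*capi.AgentCheck) bool {
	for id := range expected {
		check, ok := observed[id]
		if !ok {
			// A registered check missing from Consul's response is
			// treated as unhealthy rather than silently skipped.
			return false
		}
		if check.Status != capi.HealthPassing {
			return false
		}
	}
	return true
}
```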
* consul: minor CR refactor using maps not sets
* consul: observe transition from healthy to unhealthy checks
* consul: spell healthy correctly
To see why I think this is a good change, let's look at why I am making it.
My disk was full, which meant GC was happening aggressively, so by the
time I called the logging endpoint from the SDK, the logs were GC'd.
The error I was getting before was:
```
invalid character 'i' in literal false (expecting 'l')
```
Now the error I get is:
```
failed to decode log endpoint response as JSON: "failed to list entries: open /tmp/nomad.data.4219353875/alloc/f11fee50-2b66-a7a2-d3ec-8442cb3d557a/alloc/logs: no such file or directory"
```
Still not super descriptive, but much more debuggable.
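Roughly, the improvement amounts to reading the body first and surfacing it verbatim when JSON decoding fails; a hedged sketch, where `StreamFrame` stands in for the SDK's actual frame type:

```go
package sdk

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// StreamFrame stands in for the SDK's actual log frame type.
type StreamFrame struct {
	Data []byte
}

func decodeLogFrame(resp *http.Response) (*StreamFrame, error) {
	buf, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("failed to read log endpoint response: %w", err)
	}
	var frame StreamFrame
	if err := json.Unmarshal(buf, &frame); err != nil {
		// The body may be a plain-text error (such as the GC'd-logs
		// message above) rather than JSON, so include it in the error
		// instead of returning the bare decoder failure.
		return nil, fmt.Errorf("failed to decode log endpoint response as JSON: %q", string(buf))
	}
	return &frame, nil
}
```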
The OIDC provider cache is used by the RPC handler because the OIDC
implementation keeps long-lived processes running. These processes
include connections to the remote OIDC provider.
The Callback server is used by the CLI and starts when the login
command is triggered. This callback server includes success HTML
which is displayed when the user successfully logs into the remote
OIDC provider.
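A minimal sketch of that callback-server shape, with hypothetical names and paths (not Nomad's actual CLI code):

```go
package oidc

import (
	"net"
	"net/http"
)

// startCallbackServer starts a local listener that receives the
// provider redirect and shows the success page to the user.
func startCallbackServer(addr, successHTML string, codeCh chan<- string) (*http.Server, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/oidc/callback", func(w http.ResponseWriter, r *http.Request) {
		// The provider redirects the browser here with the authorization code.
		codeCh <- r.URL.Query().Get("code")
		w.Header().Set("Content-Type", "text/html")
		_, _ = w.Write([]byte(successHTML)) // the success page the user sees
	})
	srv := &http.Server{Handler: mux}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go srv.Serve(ln)
	return srv, nil
}
```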
This adds two new OIDC RPC endpoints. These RPCs
handle generating the OIDC provider URL and then completing the
login by exchanging the provider token for an internal Nomad
token.
The RPC endpoints both do double forwarding. The initial forward
ensures we are talking to the regional leader; the second
then takes into account whether the auth method generates local or
global tokens. If it creates global tokens, we must then forward
on to the federated regional leader.
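A hedged sketch of just that routing decision; the names, and the detail that global tokens are minted in the authoritative region, are assumptions for illustration:

```go
package oidc

// AuthMethod is a stand-in for Nomad's ACL auth method and its token
// locality setting.
type AuthMethod struct {
	Name          string
	TokenLocality string // "local" or "global"
}

// loginRegion picks the region that must service the second forward:
// global tokens are assumed to be minted only by the authoritative
// region's leader, while local tokens stay in the current region.
func loginRegion(m AuthMethod, current, authoritative string) string {
	if m.TokenLocality == "global" {
		return authoritative
	}
	return current
}
```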
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. The value is opaque to Nomad:
the data is passed directly to Consul without interpretation.
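For illustration, and assuming the field landed on the `api` package's `ConsulUpstream` type as `Config`, an upstream with opaque settings might look like this (the `connect_timeout_ms` key is a Consul proxy setting, not something Nomad interprets):

```go
package main

import "github.com/hashicorp/nomad/api"

// exampleUpstream builds an upstream whose Config block Nomad hands to
// Consul without validating.
func exampleUpstream() api.ConsulUpstream {
	return api.ConsulUpstream{
		DestinationName: "count-api",
		LocalBindPort:   8080,
		Config: map[string]interface{}{
			"connect_timeout_ms": 5000,
		},
	}
}
```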
* connect: fix non-"tcp" ingress gateway validation
changes apply to http, http2, and grpc:
* if "hosts" is excluded, consul will use its default domain
e.g. <service-name>.ingress.dc1.consul
* can't set hosts with "*" service name (see the validation sketch below)
* test http2 and grpc too
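A hedged sketch of the wildcard rule above, with stand-in type and function names:

```go
package connect

import "fmt"

// IngressService is a stand-in for the jobspec's ingress service block.
type IngressService struct {
	Name  string
	Hosts []string
}

// validateIngressService mirrors the rules above for non-"tcp" listener
// protocols: hosts may be omitted (Consul then falls back to its
// default domain), but a wildcard service must not set hosts at all.
func validateIngressService(protocol string, svc IngressService) error {
	if protocol == "tcp" {
		return nil // tcp listeners are validated separately
	}
	if svc.Name == "*" && len(svc.Hosts) > 0 {
		return fmt.Errorf("service %q cannot set hosts", svc.Name)
	}
	return nil
}
```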
* [no ci] first pass at plumbing grpc_ca_file
* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+
This PR adds client config to Nomad for specifying consul.grpc_ca_file.
These changes, combined with https://github.com/hashicorp/consul/pull/15913, should
finally enable Nomad users to upgrade to Consul 1.14+ and use TLS gRPC connections.
* consul: add changelog entry for grpc_ca_file
* docs: mention grpc_tls changes due to Consul 1.14
Devices are fingerprinted as groups of similar devices, which prevented
specifying a specific device by its ID in constraint and affinity rules.
This commit introduces the `${device.ids}` attribute, which returns a
comma-separated list of the IDs that are part of the device group. Users can
then use the set operators to write rules.
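For example, using the `api` package's constraint helper, a group could be pinned to one device in the fingerprinted set (the device ID here is made up):

```go
package main

import "github.com/hashicorp/nomad/api"

// gpuConstraint matches nodes whose device group contains one specific
// device, using a set operator against the comma-separated list that
// ${device.ids} expands to.
func gpuConstraint() *api.Constraint {
	return api.NewConstraint("${device.ids}", "set_contains", "GPU-8e4d7f3a")
}
```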
* vault: configure user agent on Nomad vault clients
This PR attempts to set the User-Agent header on each Vault API client
created by Nomad. We still need to figure out a way to set the User-Agent on the
Vault client created internally by consul-template.
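Setting the header on a Vault API client itself is straightforward; a sketch using the `vault/api` client, with a placeholder agent string:

```go
package main

import vaultapi "github.com/hashicorp/vault/api"

// newVaultClient builds a Vault client that identifies Nomad in Vault's
// request logs; the exact User-Agent string is a placeholder.
func newVaultClient() (*vaultapi.Client, error) {
	client, err := vaultapi.NewClient(vaultapi.DefaultConfig())
	if err != nil {
		return nil, err
	}
	client.AddHeader("User-Agent", "Nomad/x.y.z")
	return client, nil
}
```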
* vault: fixup find-and-replace gone awry
This changeset covers a sidebar discussion that @schmichael and I had around the
design for pre-forwarding auth. This includes some changes extracted out of
#15513 to make it easier to review both and leave a clean history.
* Remove fast path for NodeID. Previously-connected clients will have a NodeID
set on the context, and because this accounts for a large portion of the RPCs sent, we
fast-pathed it at the top of the `Authenticate` method. But the context is
shared for all yamux streams over the same yamux session (and TCP
connection). This lets an authenticated HTTP request to a client use the
NodeID for authentication, which is a privilege escalation. Remove the fast
path and annotate it so that we don't break it again.
* Add context to decisions around AuthenticatedIdentity. The `Authenticate`
method taken on its own looks like it wants to return an `acl.ACL` that folds
over all the various identity types (creating an ephemeral ACL on the fly if
necessary). But keeping these fields independent allows RPC handlers to
differentiate between internal and external origins, so we most likely want to
avoid this. Leave some docstrings as a warning as to why this is built the way
it is.
* Mutate the request rather than returning. When reviewing #15513 we decided
that forcing the request handler to call `SetIdentity` was repetitive and
error-prone. Instead, the `Authenticate` method mutates the request by setting
its `AuthenticatedIdentity`.
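The resulting shape, sketched with assumed names (Nomad's actual interfaces differ):

```go
package auth

// AuthenticatedIdentity carries whichever identity authenticated the
// request (ACL token, client node, server certificate, and so on).
type AuthenticatedIdentity struct{}

// RequestWithIdentity is the interface RPC arguments satisfy so that
// Authenticate can attach the identity itself rather than relying on
// every handler to remember to do it.
type RequestWithIdentity interface {
	GetAuthToken() string
	SetIdentity(*AuthenticatedIdentity)
}

// Authenticate resolves the caller's identity and mutates the request.
func Authenticate(args RequestWithIdentity) error {
	identity, err := resolveIdentity(args.GetAuthToken())
	if err != nil {
		return err
	}
	args.SetIdentity(identity)
	return nil
}

func resolveIdentity(token string) (*AuthenticatedIdentity, error) {
	// Look up the token, client certificate, or node and build the
	// identity; details elided in this sketch.
	return &AuthenticatedIdentity{}, nil
}
```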