To see why I think this is a good change lets look at why I am making it
My disk was full, which means GC was happening agressively. So by the
time I called the logging endpoint from the SDK, the logs were GC'd
The error I was getting before was:
```
invalid character 'i' in literal false (expecting 'l')
```
Now the error I get is:
```
failed to decode log endpoint response as JSON: "failed to list entries: open /tmp/nomad.data.4219353875/alloc/f11fee50-2b66-a7a2-d3ec-8442cb3d557a/alloc/logs: no such file or directory"
```
Still not super descriptive but much more debugable
The OIDC provider cache is used by the RPC handler as the OIDC
implementation keeps long lived processes running. These process
include connections to the remote OIDC provider.
The Callback server is used by the CLI and starts when the login
command is triggered. This callback server includes success HTML
which is displayed when the user successfully logs into the remote
OIDC provider.
This adds new OIDC endpoints on the RPC endpoint. These two RPCs
handle generating the OIDC provider URL and then completing the
login by exchanging the provider token with an internal Nomad
token.
The RPC endpoints both do double forwarding. The initial forward
is to ensure we are talking to the regional leader; the second
then takes into account whether the auth method generates local or
global tokens. If it creates global tokens, we must then forward
onto the federated regional leader.
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. This is an opaque config value to Nomad -
the data is passed directly to Consul and is unknown to Nomad.
* connect: fix non-"tcp" ingress gateway validation
changes apply to http, http2, and grpc:
* if "hosts" is excluded, consul will use its default domain
e.g. <service-name>.ingress.dc1.consul
* can't set hosts with "*" service name
* test http2 and grpc too
* [no ci] first pass at plumbing grpc_ca_file
* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+
This PR adds client config to Nomad for specifying consul.grpc_ca_file
These changes combined with https://github.com/hashicorp/consul/pull/15913 should
finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections.
* consul: add cl entgry for grpc_ca_file
* docs: mention grpc_tls changes due to Consul 1.14
Devices are fingerprinted as groups of similar devices. This prevented
specifying specific device by their ID in constraint and affinity rules.
This commit introduces the `${device.ids}` attribute that returns a
comma separated list of IDs that are part of the device group. Users can
then use the set operators to write rules.
* vault: configure user agent on Nomad vault clients
This PR attempts to set the User-Agent header on each Vault API client
created by Nomad. Still need to figure a way to set User-Agent on the
Vault client created internally by consul-template.
* vault: fixup find-and-replace gone awry
This changeset covers a sidebar discussion that @schmichael and I had around the
design for pre-forwarding auth. This includes some changes extracted out of
#15513 to make it easier to review both and leave a clean history.
* Remove fast path for NodeID. Previously-connected clients will have a NodeID
set on the context, and because this is a large portion of the RPCs sent we
fast-pathed it at the top of the `Authenticate` method. But the context is
shared for all yamux streams over the same yamux session (and TCP
connection). This lets an authenticated HTTP request to a client use the
NodeID for authentication, which is a privilege escalation. Remove the fast
path and annotate it so that we don't break it again.
* Add context to decisions around AuthenticatedIdentity. The `Authenticate`
method taken on its own looks like it wants to return an `acl.ACL` that folds
over all the various identity types (creating an ephemeral ACL on the fly if
neccessary). But keeping these fields idependent allows RPC handlers to
differentiate between internal and external origins so we most likely want to
avoid this. Leave some docstrings as a warning as to why this is built the way
it is.
* Mutate the request rather than returning. When reviewing #15513 we decided
that forcing the request handler to call `SetIdentity` was repetitive and
error prone. Instead, the `Authenticate` method mutates the request by setting
its `AuthenticatedIdentity`.
This PR modifies the configuration of the networking pause contaier to include
the "unless-stopped" restart policy. The pause container should always be
restored into a running state until Nomad itself issues a stop command for the
container.
This is not a _perfect_ fix for #12216 but it should cover the 99% use case -
where a pause container gets accidently stopped / killed for some reason. There
is still a possibility where the pause container and main task container are
stopped and started in the order where the bad behavior persists, but this is
fundamentally unavoidable due to how docker itself abstracts and manages the
underlying network namespace referenced by the containers.
Closes#12216
This PR fixes the artifact sandbox (new in Nomad 1.5) to allow downloading
artifacts into the shared 'alloc' directory made available to each task in
a common allocation. Previously we assumed the 'alloc' dir would be mounted
under the 'task' dir, but this is only the case in fs isolation: chroot; in
other modes the alloc dir is elsewhere.
The e2e suite is not in good shape right now; let's disable the tests that modify
agent / node state until we can get things working again. Also the one DC test
that was enabled still doesn't work anyway.
Running tests `on: push` prevents GitHub from showing the workflow approval
button, which prevents tests from being run on community-contributed (or even
just non-Nomad HashiCorp folks) PRs. Running `on: pull_request` automatically
picks up opened, reopened, and synchronize hooks (where "synchronize" means a
push to HEAD on the PR's branch, so that'll pick up rebases and updates).
But we also want to run tests on `main` and the various `release` backport
branches, so retain a `on: push` for those.
The command line flag parsing and the HTTP header parsing for CSI secrets
incorrectly split at more than one '=' rune, making it impossible to use secrets
that included that rune.
In #15605 we fixed the bug where the presense of "stale" query parameter
was mean to imply stale, even if the value of the parameter was "false"
or malformed. In parsing, we missed the case where the slice of values
would be nil which lead to a failing test case that was missed because
CI didn't run against the original PR.