When a Nomad service starts it tries to establish a connection with
servers, but it also runs alloc runners to manage whatever allocations
it needs to run.
The alloc runner will invoke several hooks to perform actions, with some
of them requiring access to the Nomad servers, such as Native Service
Discovery Registration.
If the alloc runner starts before a connection is established the alloc
runner will fail, causing the allocation to be shutdown. This is
particularly problematic for disconnected allocations that are
reconnecting, as they may fail as soon as the client reconnects.
This commit changes the RPC request logic to retry it, using the
existing retry mechanism, if there are no servers available.
Also add some debug log lines for this test, because it doesn't make sense
for the allocation to be complete yet a task in the allocation to be not
started yet, which is what the test failures are implying.
Allocations created before 1.4.0 will not have a workload identity token. When
the client running these allocs is upgraded to 1.4.x, the identity hook will run
and replace the node secret ID token used previously with an empty string. This
causes service discovery queries to fail.
Fallback to the node's secret ID when the allocation doesn't have a signed
identity. Note that pre-1.4.0 allocations won't have templates that read
Variables, so there's no threat that this new node ID secret will be able to
read data that the allocation shouldn't have access to.
* Adds meta to job list stub and displays a pack logo on the jobs index
* Changelog
* Modifying struct for optional meta param
* Explicitly ask for meta anytime I look up a job from index or job page
* Test case for the endpoint
* adding meta field to API struct and ommitting from response if empty
* passthru method added to api/jobs.list
* Meta param listed in docs for jobs list
* Update api/jobs.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
* Job spec upload by click or drag
* pseudo-restrict formats
* Changelog
* Tweak to job spec upload to be above editor layer
* Within the job-editor again tho
* Beginning testcase cleanup
* Test progression
* refact: update codemirror fillin logic
Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>
If a GC claim is written and then volume is deleted before the `volumewatcher`
enters its run loop, we panic on the nil-pointer access. Simply doing a
nil-check at the top of the loop reveals a race condition around shutting down
the loop just as a new update is coming in.
Have the parent `volumeswatcher` send an initial update on the channel before
returning, so that we're still holding the lock. Update the watcher's `Stop`
method to set the running state, which lets us avoid having a second context and
makes stopping synchronous. This reduces the cases we have to handle in the run
loop.
Updated the tests now that we'll safely return from the goroutine and stop the
runner in a larger set of cases. Ran the tests with the `-race` detection flag
and fixed up any problems found here as well.
In order to limit how much the rekey job can monopolize a scheduler worker, we
limit how long it can run to 1min before stopping work and emitting a new
eval. But this exactly matches the default nack timeout, so it'll fail the eval
rather than getting a chance to emit a new one.
Set the timeout for the rekey eval to half the configured nack timeout.
When replication of a single key fails, the replication loop breaks early and
therefore keys that fall later in the sorting order will never get
replicated. This is particularly a problem for clusters impacted by the bug that
caused #14981 and that were later upgraded; the keys that were never replicated
can now never be replicated, and so we need to handle them safely.
Included in the replication fix:
* Refactor the replication loop so that each key replicated in a function call
that returns an error, to make the workflow more clear and reduce nesting. Log
the error and continue.
* Improve stability of keyring replication tests. We no longer block leadership
on initializing the keyring, so there's a race condition in the keyring tests
where we can test for the existence of the root key before the keyring has
been initialize. Change this to an "eventually" test.
But these fixes aren't enough to fix#14981 because they'll end up seeing an
error once a second complaining about the missing key, so we also need to fix
keyring GC so the keys can be removed from the state store. Now we'll store the
key ID used to sign a workload identity in the Allocation, and we'll index the
Allocation table on that so we can track whether any live Allocation was signed
with a particular key ID.
The `Eval.Delete` endpoint has a helper that takes a list of jobs and allocs and
determines whether the eval associated with those is safe to delete (based on
their state). Filtering improvements to the `Eval.Delete` endpoint are going to
need this check to run in the state store itself for consistency.
Refactor to push this check down into the state store to keep the eventual diff
for that work reasonable.
While working on filtering improvements to the `Eval.Delete` endpoint I noticed
that this test was going to need to expand significantly and needed some
refactoring to make that work nicely. In order to reduce the size of the
eventual diff, I've pulled this refactoring out into its own changeset.
The List RPC correctly authorized against the prefix argument. But when
filtering results underneath the prefix, it only checked authorization for
standard ACL tokens and not Workload Identity. This results in WI tokens being
able to read List results (metadata only: variable paths and timestamps) for
variables under the `nomad/` prefix that belong to other jobs in the same
namespace.
Fixes the filtering and split the `handleMixedAuthEndpoint` function into
separate authentication and authorization steps so that we don't need to
re-verify the claim token on each filtered object.
Also includes:
* update semgrep rule for mixed auth endpoints
* variables: List returns empty set when all results are filtered
This change ensures that a token's expiry is checked before every
event is sent to the caller. Previously, a token could still be
used to listen for events after it had expired, as long as the
subscription was made while it was unexpired. This would last until
the token was garbage collected from state.
The check occurs within the RPC as there is currently no state
update when a token expires.
* [no ci] use json for grouping packages for testing
* [no ci] able to get packages in group
* [no ci] able to run groups of tests
* [no ci] more
* [no ci] try disable circle unit tests
* ci: use actions/checkout@v3
* ci: rename to quick
* ci: need make dev in mods cache step
* ci: make compile step depend on checks step
* ci: bump consul and vault versions
* ci: need make dev for group tests
* ci: update ci unit testing docs
* docs: spell plumbing correctly
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The existing docs on required capabilities are a little sparse and have been the
subject of a lots of questions. Expand on this information and provide a pointer
to the ongoing design discussion around rootless Nomad.
* client: ensure minimal cgroup controllers enabled
This PR fixes a bug where Nomad could not operate properly on operating
systems that set the root cgroup.subtree_control to a set of controllers that
do not include the minimal set of controllers needed by Nomad.
Nomad needs these controllers enabled to operate:
- cpuset
- cpu
- io
- memory
- pids
Now, Nomad will ensure these controllers are enabled during Client initialization,
adding them to cgroup.subtree_control as necessary. This should be particularly
helpful on the RHEL/CentOS/Fedora family of system. Ubuntu systems should be
unaffected as they enable all controllers by default.
Fixes: https://github.com/hashicorp/nomad/issues/14494
* docs: cleanup doc string
* client: cleanup controller writes, enhance log messages
The configuration knobs for root keyring garbage collection are present in the
consumer and present in the user-facing config, but we missed the spot where we
copy from one to the other. Fix this so that users can set their own thresholds.
The root key is automatically rotated every ~30d, but the function that does
both rotation and key GC was wired up such that `nomad system gc` caused an
unexpected key rotation. Split this into two functions so that `nomad system gc`
cleans up old keys without forcing a rotation, which will be done periodially
or by the `nomad operator root keyring rotate` command.
* keyring: don't unblock early if rate limit burst exceeded
The rate limiter returns an error and unblocks early if its burst limit is
exceeded (unless the burst limit is Inf). Ensure we're not unblocking early,
otherwise we'll only slow down the cases where we're already pausing to make
external RPC requests.
* keyring: set MinQueryIndex on stale queries
When keyring replication makes a stale query to non-leader peers to find a key
the leader doesn't have, we need to make sure the peer we're querying has had a
chance to catch up to the most current index for that key. Otherwise it's
possible for newly-added servers to query another newly-added server and get a
non-error nil response for that key ID.
Ensure that we're setting the correct reply index in the blocking query.
Note that the "not found" case does not return an error, just an empty key. So
as a belt-and-suspenders, update the handling of empty responses so that we
don't break the loop early if we hit a server that doesn't have the key.
* test for adding new servers to keyring
* leader: initialize keyring after we have consistent reads
Wait until we're sure the FSM is current before we try to initialize the
keyring.
Also, if a key is rotated immediately following a leader election, plans that
are in-flight may get signed before the new leader has the key. Allow for a
short timeout-and-retry to avoid rejecting plans
Originally this test relied on Job 1 blocking Job 2 until Job 1 had a
terminal *ClientStatus.* Job 2 ensured it would get blocked using 2
mechanisms:
1. A constraint requiring it is placed on the same node as Job 1.
2. Job 2 would require all unreserved CPU on the node to ensure it would
be blocked until Job 1's resources were free.
That 2nd assertion breaks if *any previous job is still running on the
target node!* That seems very likely to happen in the flaky world of our
e2e tests. In fact there may be some jobs we intentionally want running
throughout; in hindsight it was never safe to assume my test would be
the only thing scheduled when it ran.
*Ports to the rescue!* Reserving a static port means that both Job 2
will now block on Job 1 being terminal. It will only conflict with other
tests if those tests use that port *on every node.* I ensured no
existing tests were using the port I chose.
Other changes:
- Gave job a bit more breathing room resource-wise.
- Tightened timings a bit since previous failure ran into the `go test`
time limit.
- Cleaned up the DumpEvals output. It's quite nice and handy now!