Commit Graph

4246 Commits

Author SHA1 Message Date
Seth Hoenig 825c5cc65e
artifact: add client toggle to disable filesystem isolation (#15503)
This PR adds the client config option for turning off filesystem isolation,
applicable on Linux systems where filesystem isolation is possible and
enabled by default.

```hcl
client{
  artifact {
    disable_filesystem_isolation = <bool:false>
  }
}
```

Closes #15496
2022-12-08 12:29:23 -06:00
Piotr Kazmierczak 1cb45630f0
acl: canonicalize ACL Auth Method object (#15492) 2022-12-08 14:05:46 +01:00
Piotr Kazmierczak 10f80f7d9a
bugfix: make sure streaming endpoints are only registered once (#15484)
Streaming RPCs should only be registered once, not on every RPC call, because they set keys in StreamingRpcRegistry.registry map. This PR fixes it by checking whether endpoints are already registered before calling .register() method. Fixes #15474

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-12-07 17:01:45 +01:00
Tim Gross e0fddee386
Pre forwarding authentication (#15417)
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) require that we authenticate a RPC request before forwarding. Add a
new top-level `Authenticate` method to the server and have it return an
`AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of
this identity for performing authorization.

This changeset includes:
* The main implementation of `Authenticate`
* Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint
  returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At
  some point we might want to give this an equivalent HTTP endpoint but I didn't
  want to add that to our public API until some of the other Workload Identity
  work is solidified, especially if we don't need it yet.
* A full coverage test of the `Authenticate` method. This sets up two server
  nodes with mTLS and ACLs, some tokens, and some allocations with workload
  identities.
* Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and
  see how authorization happens after forwarding.
* A new semgrep rule for `Authenticate`, which we'll need to update once we're
  ready to wire up more RPC endpoints with authorization steps.
2022-12-06 14:44:03 -05:00
Piotr Kazmierczak d3aac1fe08
acl: sso auth method snapshot restore test (#15482) 2022-12-06 15:15:29 +01:00
Piotr Kazmierczak 777173e8da
acl: added type to ACL Auth Method stub (#15480) 2022-12-06 14:47:05 +01:00
Tim Gross dfed1ba5bc
remove most static RPC handlers (#15451)
Nomad server components that aren't in the `nomad` package like the deployment
watcher and volume watcher need to make RPC calls but can't import the Server
struct to do so because it creates a circular reference. These components have a
"shim" object that gets populated to pass a "static" handler that has no RPC
context.

Most RPC handlers are never used in this way, but during server setup we were
constructing a set of static handlers for most RPC endpoints anyways. This is
slightly wasteful but also confusing to developers who end up being encouraged
to just copy what was being done for previous RPCs.

This changeset includes the following refactorings:
* Remove the static handlers field on the server
* Instead construct just the specific static handlers we need to pass into the
  deployment watcher and volume watcher.
* Remove the unnecessary static handler from heartbeater
* Update various tests to avoid needing the static endpoints and have them use a
  endpoint constructed on the spot.

Follow-up work will examine whether we can remove the RPCs from deployment
watcher and volume watcher entirely, falling back to raft applies like node
drainer does currently.
2022-12-02 10:12:05 -05:00
Tim Gross 33f32d526e
fix enterprise endpoint registration (#15446)
In #15430 we refactored the RPC endpoint configuration to make adding the RPC
context easier. But when implementing the change on the Enterprise side, I
discovered that the registration of enterprise endpoints was being done
incorrectly -- this doesn't show up on OSS because the registration is always a
no-op here.
2022-12-01 10:44:33 -05:00
Tim Gross f61f801e77
provide `RPCContext` to all RPC handlers (#15430)
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) requires that we thread the `RPCContext` through all RPC
handlers so that we can access the underlying connection. This changeset adds
the context to everywhere we intend to initially support it and intentionally
excludes streaming RPCs and client RPCs.

To improve the ergonomics of adding the context everywhere its needed and to
clarify the requirements of dynamic vs static handlers, I've also done a good
bit of refactoring here:

* canonicalized the RPC handler fields so they're as close to identical as
  possible without introducing unused fields (i.e. I didn't add loggers if the
  handler doesn't use them already).
* canonicalized the imports in the handler files.
* added a `NewExampleEndpoint` function for each handler that ensures we're
  constructing the handlers with the required arguments.
* reordered the registration in server.go to match the order of the files (to
  make it easier to see if we've missed one), and added a bunch of commentary
  there as to what the difference between static and dynamic handlers is.
2022-12-01 10:05:15 -05:00
Piotr Kazmierczak 0eccd3286c
acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410)
Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens.

This PR fixes it.

Relates to #13120
2022-11-29 07:36:36 +01:00
James Rasell 726d419da1
acl: replicate auth-methods from federated cluster leaders. (#15366) 2022-11-28 09:20:24 +01:00
Luiz Aoqui 8f91be26ab
scheduler: create placements for non-register MRD (#15325)
* scheduler: create placements for non-register MRD

For multiregion jobs, the scheduler does not create placements on
registration because the deployment must wait for the other regions.
Once of these regions will then trigger the deployment to run.

Currently, this is done in the scheduler by considering any eval for a
multiregion job as "paused" since it's expected that another region will
eventually unpause it.

This becomes a problem where evals not triggered by a job registration
happen, such as on a node update. These types of regional changes do not
have other regions waiting to progress the deployment, and so they were
never resulting in placements.

The fix is to create a deployment at job registration time. This
additional piece of state allows the scheduler to differentiate between
a multiregion change, where there are other regions engaged in the
deployment so no placements are required, from a regional change, where
the scheduler does need to create placements.

This deployment starts in the new "initializing" status to signal to the
scheduler that it needs to compute the initial deployment state. The
multiregion deployment will wait until this deployment state is
persisted and its starts is set to "pending". Without this state
transition it's possible to hit a race condition where the plan applier
and the deployment watcher may step of each other and overwrite their
changes.

* changelog: add entry for #15325
2022-11-25 12:45:34 -05:00
James Rasell 32dfa431f3
sso: add ACL auth-method HTTP API CRUD endpoints (#15338)
* core: remove custom auth-method TTLS and use ACL token TTLS.

* agent: add ACL auth-method HTTP endpoints for CRUD actions.

* api: add ACL auth-method client.
2022-11-23 09:38:02 +01:00
Piotr Kazmierczak bb66b5e770
acl: sso auth method RPC endpoints (#15221)
This PR implements RPC endpoints for SSO auth methods.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:15:39 +01:00
Piotr Kazmierczak d02241cad5
acl: sso auth method event stream (#15280)
This PR implements SSO auth method support in the event stream.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:06:05 +01:00
Tim Gross 05a46e6648
make eval cancelation really async with `Eval.Ack` (#15298)
Ensure we never block in the `Eval.Ack`
2022-11-18 08:38:17 -05:00
Tim Gross b74a868aae
make eval cancelation async with `Eval.Ack` (#15294)
In #14621 we added an eval canelation reaper goroutine with a channel that
allowed us to wake it up. But we forgot to actually send on this channel from
`Eval.Ack` and are still committing the cancelations synchronously. Fix this by
sending on the buffered channel to wake up the reaper instead.
2022-11-17 16:40:41 -05:00
Tim Gross d0f9e887f7
autopilot: include only servers from the same region (#15290)
When we migrated to the updated autopilot library in Nomad 1.4.0, the interface
for finding servers changed. Previously autopilot would get the serf members and
call `IsServer` on each of them, leaving it up to the implementor to filter out
clients (and in Nomad's case, other regions). But in the "new" autopilot
library, the equivalent interface is `KnownServers` for which we did not filter
by region. This causes spurious attempts for the cross-region stats fetching,
which results in TLS errors and a lot of log noise.

Filter the member set by region to fix the regression.
2022-11-17 12:09:36 -05:00
Tim Gross 510eb435dc
remove deprecated `AllocUpdateRequestType` raft entry (#15285)
After Deployments were added in Nomad 0.6.0, the `AllocUpdateRequestType` raft
log entry was no longer in use. Mark this as deprecated, remove the associated
dead code, and remove references to the metrics it emits from the docs. We'll
leave the entry itself just in case we encounter old raft logs that we need to
be able to safely load.
2022-11-17 12:08:04 -05:00
Tim Gross dd3a07302e
keyring: update handle to state inside replication loop (#15227)
* keyring: update handle to state inside replication loop

When keyring replication starts, we take a handle to the state store. But
whenever a snapshot is restored, this handle is invalidated and no longer points
to a state store that is receiving new keys. This leaks a bunch of memory too!

In addition to operator-initiated restores, when fresh servers are added to
existing clusters with large-enough state, the keyring replication can get
started quickly enough that it's running before the snapshot from the existing
clusters have been restored.

Fix this by updating the handle to the state store on each pass.
2022-11-17 08:40:12 -05:00
Tim Gross 6415fb4284
eval broker: shed all but one blocked eval per job after ack (#14621)
When an evaluation is acknowledged by a scheduler, the resulting plan is
guaranteed to cover up to the `waitIndex` set by the worker based on the most
recent evaluation for that job in the state store. At that point, we no longer
need to retain blocked evaluations in the broker that are older than that index.

Move all but the highest priority / highest `ModifyIndex` blocked eval into a
canceled set. When the `Eval.Ack` RPC returns from the eval broker it will
signal a reap of a batch of cancelable evals to write to raft. This paces the
cancelations limited by how frequently the schedulers are acknowledging evals;
this should reduce the risk of cancelations from overwhelming raft relative to
scheduler progress. In order to avoid straggling batches when the cluster is
quiet, we also include a periodic sweep through the cancelable list.
2022-11-16 16:10:11 -05:00
Tim Gross e8c83c3ecc
test: ensure leader is still valid in reelection test (#15267)
The `TestLeader_Reelection` test waits for a leader to be elected and then makes
some other assertions. But it implcitly assumes that there's no failure of
leadership before shutting down the leader, which can lead to a panic in the
tests. Assert there's still a leader before the shutdown.
2022-11-16 11:04:02 -05:00
Tim Gross 37134a4a37
eval delete: move batching of deletes into RPC handler and state (#15117)
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.

To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:

* A naive solution here would be to just send the filter as the raft request and
  let the FSM apply delete the whole set in a single operation. Benchmarking with
  1M evals on a 3 node cluster demonstrated this can block the FSM apply for
  several minutes, which puts the cluster at risk if there's a leadership
  failover (the barrier write can't be made while this apply is in-flight).

* A less naive but still bad solution would be to have the RPC handler filter
  and paginate, and then hand a list of IDs to the existing raft log
  entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
  took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
2022-11-14 14:08:13 -05:00
Piotr Kazmierczak 4851f9e68a
acl: sso auth method schema and store functions (#15191)
This PR implements ACLAuthMethod type, acl_auth_methods table schema and crud state store methods. It also updates nomadSnapshot.Persist and nomadSnapshot.Restore methods in order for them to work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-10 19:42:41 +01:00
Drew Gonzales aac9404ee5
server: add git revision to serf tags (#9159) 2022-11-07 10:34:33 -05:00
Tim Gross 9e1c0b46d8
API for `Eval.Count` (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
Luiz Aoqui e4c8b59919
Update alloc after reconnect and enforece client heartbeat order (#15068)
* scheduler: allow updates after alloc reconnects

When an allocation reconnects to a cluster the scheduler needs to run
special logic to handle the reconnection, check if a replacement was
create and stop one of them.

If the allocation kept running while the node was disconnected, it will
be reconnected with `ClientStatus: running` and the node will have
`Status: ready`. This combination is the same as the normal steady state
of allocation, where everything is running as expected.

In order to differentiate between the two states (an allocation that is
reconnecting and one that is just running) the scheduler needs an extra
piece of state.

The current implementation uses the presence of a
`TaskClientReconnected` task event to detect when the allocation has
reconnected and thus must go through the reconnection process. But this
event remains even after the allocation is reconnected, causing all
future evals to consider the allocation as still reconnecting.

This commit changes the reconnect logic to use an `AllocState` to
register when the allocation was reconnected. This provides the
following benefits:

  - Only a limited number of task states are kept, and they are used for
    many other events. It's possible that, upon reconnecting, several
    actions are triggered that could cause the `TaskClientReconnected`
    event to be dropped.
  - Task events are set by clients and so their timestamps are subject
    to time skew from servers. This prevents using time to determine if
    an allocation reconnected after a disconnect event.
  - Disconnect events are already stored as `AllocState` and so storing
    reconnects there as well makes it the only source of information
    required.

With the new logic, the reconnection logic is only triggered if the
last `AllocState` is a disconnect event, meaning that the allocation has
not been reconnected yet. After the reconnection is handled, the new
`ClientStatus` is store in `AllocState` allowing future evals to skip
the reconnection logic.

* scheduler: prevent spurious placement on reconnect

When a client reconnects it makes two independent RPC calls:

  - `Node.UpdateStatus` to heartbeat and set its status as `ready`.
  - `Node.UpdateAlloc` to update the status of its allocations.

These two calls can happen in any order, and in case the allocations are
updated before a heartbeat it causes the state to be the same as a node
being disconnected: the node status will still be `disconnected` while
the allocation `ClientStatus` is set to `running`.

The current implementation did not handle this order of events properly,
and the scheduler would create an unnecessary placement since it
considered the allocation was being disconnected. This extra allocation
would then be quickly stopped by the heartbeat eval.

This commit adds a new code path to handle this order of events. If the
node is `disconnected` and the allocation `ClientStatus` is `running`
the scheduler will check if the allocation is actually reconnecting
using its `AllocState` events.

* rpc: only allow alloc updates from `ready` nodes

Clients interact with servers using three main RPC methods:

  - `Node.GetAllocs` reads allocation data from the server and writes it
    to the client.
  - `Node.UpdateAlloc` reads allocation from from the client and writes
    them to the server.
  - `Node.UpdateStatus` writes the client status to the server and is
    used as the heartbeat mechanism.

These three methods are called periodically by the clients and are done
so independently from each other, meaning that there can't be any
assumptions in their ordering.

This can generate scenarios that are hard to reason about and to code
for. For example, when a client misses too many heartbeats it will be
considered `down` or `disconnected` and the allocations it was running
are set to `lost` or `unknown`.

When connectivity is restored the to rest of the cluster, the natural
mental model is to think that the client will heartbeat first and then
update its allocations status into the servers.

But since there's no inherit order in these calls the reverse is just as
possible: the client updates the alloc status and then heartbeats. This
results in a state where allocs are, for example, `running` while the
client is still `disconnected`.

This commit adds a new verification to the `Node.UpdateAlloc` method to
reject updates from nodes that are not `ready`, forcing clients to
heartbeat first. Since this check is done server-side there is no need
to coordinate operations client-side: they can continue sending these
requests independently and alloc update will succeed after the heartbeat
is done.

* chagelog: add entry for #15068

* code review

* client: skip terminal allocations on reconnect

When the client reconnects with the server it synchronizes the state of
its allocations by sending data using the `Node.UpdateAlloc` RPC and
fetching data using the `Node.GetClientAllocs` RPC.

If the data fetch happens before the data write, `unknown` allocations
will still be in this state and would trigger the
`allocRunner.Reconnect` flow.

But when the server `DesiredStatus` for the allocation is `stop` the
client should not reconnect the allocation.

* apply more code review changes

* scheduler: persist changes to reconnected allocs

Reconnected allocs have a new AllocState entry that must be persisted by
the plan applier.

* rpc: read node ID from allocs in UpdateAlloc

The AllocUpdateRequest struct is used in three disjoint use cases:

1. Stripped allocs from clients Node.UpdateAlloc RPC using the Allocs,
   and WriteRequest fields
2. Raft log message using the Allocs, Evals, and WriteRequest fields
3. Plan updates using the AllocsStopped, AllocsUpdated, and Job fields

Adding a new field that would only be used in one these cases (1) made
things more confusing and error prone. While in theory an
AllocUpdateRequest could send allocations from different nodes, in
practice this never actually happens since only clients call this method
with their own allocations.

* scheduler: remove logic to handle exceptional case

This condition could only be hit if, somehow, the allocation status was
set to "running" while the client was "unknown". This was addressed by
enforcing an order in "Node.UpdateStatus" and "Node.UpdateAlloc" RPC
calls, so this scenario is not expected to happen.

Adding unnecessary code to the scheduler makes it harder to read and
reason about it.

* more code review

* remove another unused test
2022-11-04 16:25:11 -04:00
Charlie Voiselle 79c4478f5b
template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Charlie Voiselle 83e43e01c1
Add missing timer reset (#15134) 2022-11-03 18:57:57 -04:00
Phil Renaud ffb4c63af7
[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and ommitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
Tim Gross 4d7a4171cd
volumewatcher: prevent panic on nil volume (#15101)
If a GC claim is written and then volume is deleted before the `volumewatcher`
enters its run loop, we panic on the nil-pointer access. Simply doing a
nil-check at the top of the loop reveals a race condition around shutting down
the loop just as a new update is coming in.

Have the parent `volumeswatcher` send an initial update on the channel before
returning, so that we're still holding the lock. Update the watcher's `Stop`
method to set the running state, which lets us avoid having a second context and
makes stopping synchronous. This reduces the cases we have to handle in the run
loop.

Updated the tests now that we'll safely return from the goroutine and stop the
runner in a larger set of cases. Ran the tests with the `-race` detection flag
and fixed up any problems found here as well.
2022-11-01 16:53:10 -04:00
Tim Gross 38542f256e
variables: limit rekey eval to half the nack timeout (#15102)
In order to limit how much the rekey job can monopolize a scheduler worker, we
limit how long it can run to 1min before stopping work and emitting a new
eval. But this exactly matches the default nack timeout, so it'll fail the eval
rather than getting a chance to emit a new one.

Set the timeout for the rekey eval to half the configured nack timeout.
2022-11-01 16:50:50 -04:00
Tim Gross 903b5baaa4
keyring: safely handle missing keys and restore GC (#15092)
When replication of a single key fails, the replication loop breaks early and
therefore keys that fall later in the sorting order will never get
replicated. This is particularly a problem for clusters impacted by the bug that
caused #14981 and that were later upgraded; the keys that were never replicated
can now never be replicated, and so we need to handle them safely.

Included in the replication fix:
* Refactor the replication loop so that each key replicated in a function call
  that returns an error, to make the workflow more clear and reduce nesting. Log
  the error and continue.
* Improve stability of keyring replication tests. We no longer block leadership
  on initializing the keyring, so there's a race condition in the keyring tests
  where we can test for the existence of the root key before the keyring has
  been initialize. Change this to an "eventually" test.

But these fixes aren't enough to fix #14981 because they'll end up seeing an
error once a second complaining about the missing key, so we also need to fix
keyring GC so the keys can be removed from the state store. Now we'll store the
key ID used to sign a workload identity in the Allocation, and we'll index the
Allocation table on that so we can track whether any live Allocation was signed
with a particular key ID.
2022-11-01 15:00:50 -04:00
Tim Gross dab9388c75
refactor eval delete safety check (#15070)
The `Eval.Delete` endpoint has a helper that takes a list of jobs and allocs and
determines whether the eval associated with those is safe to delete (based on
their state). Filtering improvements to the `Eval.Delete` endpoint are going to
need this check to run in the state store itself for consistency.

Refactor to push this check down into the state store to keep the eventual diff
for that work reasonable.
2022-10-28 09:10:33 -04:00
Tim Gross 9c37a234e7
test: refactor EvalEndpoint_Delete (#15065)
While working on filtering improvements to the `Eval.Delete` endpoint I noticed
that this test was going to need to expand significantly and needed some
refactoring to make that work nicely. In order to reduce the size of the
eventual diff, I've pulled this refactoring out into its own changeset.
2022-10-27 15:29:22 -04:00
hc-github-team-nomad-core 38b1c8a22a Prepare for next release 2022-10-27 13:08:05 -04:00
hc-github-team-nomad-core fbef8881cd Generate files for 1.4.2 release 2022-10-27 13:08:05 -04:00
Tim Gross 9d906d4632 variables: fix filter on List RPC
The List RPC correctly authorized against the prefix argument. But when
filtering results underneath the prefix, it only checked authorization for
standard ACL tokens and not Workload Identity. This results in WI tokens being
able to read List results (metadata only: variable paths and timestamps) for
variables under the `nomad/` prefix that belong to other jobs in the same
namespace.

Fixes the filtering and split the `handleMixedAuthEndpoint` function into
separate authentication and authorization steps so that we don't need to
re-verify the claim token on each filtered object.

Also includes:
* update semgrep rule for mixed auth endpoints
* variables: List returns empty set when all results are filtered
2022-10-27 13:08:05 -04:00
James Rasell da5069bded event stream: ensure token expiry is correctly checked for subs.
This change ensures that a token's expiry is checked before every
event is sent to the caller. Previously, a token could still be
used to listen for events after it had expired, as long as the
subscription was made while it was unexpired. This would last until
the token was garbage collected from state.

The check occurs within the RPC as there is currently no state
update when a token expires.
2022-10-27 13:08:05 -04:00
Tim Gross aca95c0bc6
keyring: remove root key GC (#15034) 2022-10-25 17:06:18 -04:00
Tim Gross c45d9a9ea8
keyring: refactor to hold locks for less time (#15026)
Follow-up from https://github.com/hashicorp/nomad/pull/14987/files#r1003611644

We don't need to hold the lock when querying the state store, so move the
read-lock to the interior of the `activeKeySet` function.
2022-10-24 16:23:44 -04:00
Tim Gross b9922631bd
keyring: fix missing GC config, don't rotate on manual GC (#15009)
The configuration knobs for root keyring garbage collection are present in the
consumer and present in the user-facing config, but we missed the spot where we
copy from one to the other. Fix this so that users can set their own thresholds.

The root key is automatically rotated every ~30d, but the function that does
both rotation and key GC was wired up such that `nomad system gc` caused an
unexpected key rotation. Split this into two functions so that `nomad system gc`
cleans up old keys without forcing a rotation, which will be done periodially
or by the `nomad operator root keyring rotate` command.
2022-10-24 08:43:42 -04:00
Tim Gross 3a811ac5e7
keyring: fixes for keyring replication on cluster join (#14987)
* keyring: don't unblock early if rate limit burst exceeded

The rate limiter returns an error and unblocks early if its burst limit is
exceeded (unless the burst limit is Inf). Ensure we're not unblocking early,
otherwise we'll only slow down the cases where we're already pausing to make
external RPC requests.

* keyring: set MinQueryIndex on stale queries

When keyring replication makes a stale query to non-leader peers to find a key
the leader doesn't have, we need to make sure the peer we're querying has had a
chance to catch up to the most current index for that key. Otherwise it's
possible for newly-added servers to query another newly-added server and get a
non-error nil response for that key ID.

Ensure that we're setting the correct reply index in the blocking query.

Note that the "not found" case does not return an error, just an empty key. So
as a belt-and-suspenders, update the handling of empty responses so that we
don't break the loop early if we hit a server that doesn't have the key.

* test for adding new servers to keyring

* leader: initialize keyring after we have consistent reads

Wait until we're sure the FSM is current before we try to initialize the
keyring.

Also, if a key is rotated immediately following a leader election, plans that
are in-flight may get signed before the new leader has the key. Allow for a
short timeout-and-retry to avoid rejecting plans
2022-10-21 12:33:16 -04:00
James Rasell 206fb04dc1
acl: allow tokens to read policies linked via roles to the token. (#14982)
ACL tokens are granted permissions either by direct policy links
or via ACL role links. Callers should therefore be able to read
policies directly assigned to the caller token or indirectly by
ACL role links.
2022-10-21 09:05:17 +02:00
James Rasell 215b4e7e36
acl: add ACL roles to event stream topic and resolve policies. (#14923)
This changes adds ACL role creation and deletion to the event
stream. It is exposed as a single topic with two types; the filter
is primarily the role ID but also includes the role name.

While conducting this work it was also discovered that the events
stream has its own ACL resolution logic. This did not account for
ACL tokens which included role links, or tokens with expiry times.
ACL role links are now resolved to their policies and tokens are
checked for expiry correctly.
2022-10-20 09:43:35 +02:00
James Rasell 8e25048f3d
acl: gate ACL role write and delete RPC usage on v1.4.0 or greater. (#14908) 2022-10-18 16:46:11 +02:00
James Rasell 9923f9e6f3
nnsd: gate registration write & delete RPC use on v1.3.0 or greater. (#14924) 2022-10-18 15:30:28 +02:00
Tim Gross 3c78980b78
make version checks specific to region (1.4.x) (#14912)
* One-time tokens are not replicated between regions, so we don't want to enforce
  that the version check across all of serf, just members in the same region.
* Scheduler: Disconnected clients handling is specific to a single region, so we
  don't want to enforce that the version check across all of serf, just members in
  the same region.
* Variables: enforce version check in Apply RPC
* Cleans up a bunch of legacy checks.

This changeset is specific to 1.4.x and the changes for previous versions of
Nomad will be manually backported in a separate PR.
2022-10-17 16:23:51 -04:00
Tim Gross c721ce618e
keyring: filter by region before checking version (#14901)
In #14821 we fixed a panic that can happen if a leadership election happens in
the middle of an upgrade. That fix checks that all servers are at the minimum
version before initializing the keyring (which blocks evaluation processing
during trhe upgrade). But the check we implemented is over the serf membership,
which includes servers in any federated regions, which don't necessarily have
the same upgrade cycle.

Filter the version check by the leader's region.

Also bump up log levels of major keyring operations
2022-10-17 13:21:16 -04:00
Seth Hoenig 1593963cd1
servicedisco: implicit constraint for nomad v1.4 when using nsd checks (#14868)
This PR adds a jobspec mutator to constrain jobs making use of checks
in the nomad service provider to nomad clients of at least v1.4.0.

Before, in a mixed client version cluster it was possible to submit
an NSD job making use of checks and for that job to land on an older,
incompatible client node.

Closes #14862
2022-10-11 08:21:42 -05:00
Seth Hoenig 69ced2a2bd
services: remove assertion on 'task' field being set (#14864)
This PR removes the assertion around when the 'task' field of
a check may be set. Starting in Nomad 1.4 we automatically set
the task field on all checks in support of the NSD checks feature.

This is causing validation problems elsewhere, e.g. when a group
service using the Consul provider sets 'task' it will fail
validation that worked previously.

The assertion of leaving 'task' unset was only about making sure
job submitters weren't expecting some behavior, but in practice
is causing bugs now that we need the task field for more than it
was originally added for.

We can simply update the docs, noting when the task field set by
job submitters actually has value.
2022-10-10 13:02:33 -05:00
Seth Hoenig 5e38a0e82c
cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
Hemanth Krishna e516fc266f
enhancement: UpdateTask when Task is waiting for ShutdownDelay (#14775)
Signed-off-by: Hemanth Krishna <hkpdev008@gmail.com>
2022-10-06 16:33:28 -04:00
Giovani Avelar a625de2062
Allow specification of a custom job name/prefix for parameterized jobs (#14631) 2022-10-06 16:21:40 -04:00
Tim Gross 80ec5e1346
fix panic from keyring raft entries being written during upgrade (#14821)
During an upgrade to Nomad 1.4.0, if a server running 1.4.0 becomes the leader
before one of the 1.3.x servers, the old server will crash because the keyring
is initialized and writes a raft entry.

Wait until all members are on a version that supports the keyring before
initializing it.
2022-10-06 12:47:02 -04:00
Michael Schurter 0df5c7d5ae
test: fix flaky test (#14713)
Need to wait for Stop evals to be processed before you can expect
subsequent RPCs to see the alloc's DesiredStatus=stop.
2022-09-27 10:36:16 -07:00
Tim Gross 87681fca68
CSI: ensure initial unpublish state is checkpointed (#14675)
A test flake revealed a bug in the CSI unpublish workflow, where an unpublish
that comes from a client that's successfully done the node-unpublish step will
not have the claim checkpointed if the controller-unpublish step fails. This
will result in a delay in releasing the volume claim until the next GC.

This changeset also ensures we're using a new snapshot after each write to raft,
and fixes two timing issues in test where either the volume watcher can
unpublish before the unpublish RPC is sent or we don't wait long enough in
resource-restricted environements like GHA.
2022-09-27 08:43:45 -04:00
Seth Hoenig 87ec5fdee5
deps: update set and test (#14680)
This PR updates go-set and shoenig/test, which introduced some breaking
API changes.
2022-09-26 08:28:03 -05:00
Seth Hoenig ae5b800085
cleanup: rearrange mocks package (#14660)
This PR splits up the nomad/mock package into more files. Specific features
that have a lot of mocks get their own file (e.g. acl, variables, csi, connect, etc.).
Above that, functions that return jobs/allocs/nodes are in the job/alloc/node file. And
lastly other mocks/helpers are in mock.go
2022-09-22 13:49:58 -05:00
Derek Strickland 6874997f91
scheduler: Fix bug where the would treat multiregion jobs as paused for job types that don't use deployments (#14659)
* scheduler: Fix bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-09-22 14:31:27 -04:00
Florian Apolloner f66d61e17f
consul: Removed unused ConsulUsage.Kinds. (#11303) 2022-09-22 10:07:14 -05:00
Jorge Marey 584ddfe859
Add Namespace, Job and Group to envoy stats (#14311) 2022-09-22 10:38:21 -04:00
Seth Hoenig 2088ca3345
cleanup more helper updates (#14638)
* cleanup: refactor MapStringStringSliceValueSet to be cleaner

* cleanup: replace SliceStringToSet with actual set

* cleanup: replace SliceStringSubset with real set

* cleanup: replace SliceStringContains with slices.Contains

* cleanup: remove unused function SliceStringHasPrefix

* cleanup: fixup StringHasPrefixInSlice doc string

* cleanup: refactor SliceSetDisjoint to use real set

* cleanup: replace CompareSliceSetString with SliceSetEq

* cleanup: replace CompareMapStringString with maps.Equal

* cleanup: replace CopyMapStringString with CopyMap

* cleanup: replace CopyMapStringInterface with CopyMap

* cleanup: fixup more CopyMapStringString and CopyMapStringInt

* cleanup: replace CopySliceString with slices.Clone

* cleanup: remove unused CopySliceInt

* cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice

* cleanup: replace CopyMap with maps.Clone

* cleanup: run go mod tidy
2022-09-21 14:53:25 -05:00
Michael Schurter bd4b4b8f66
Data race fixes in tests and a new semgrep rule (#14594)
* test: don't use loop vars in goroutines

fixes a data race in the test

* test: copy objects in statestore before mutating

fixes data race in test

* test: @lgfa29's segmgrep rule for loops/goroutines

Found 2 places where we were improperly using loop variables inside
goroutines.
2022-09-15 10:35:08 -07:00
Tim Gross 89dfdef95d
variables: handler should catch errors before conflicts (#14591) 2022-09-14 13:14:17 -04:00
Mahmood Ali a9d5e4c510
scheduler: stopped-yet-running allocs are still running (#10446)
* scheduler: stopped-yet-running allocs are still running

* scheduler: test new stopped-but-running logic

* test: assert nonoverlapping alloc behavior

Also add a simpler Wait test helper to improve line numbers and save few
lines of code.

* docs: tried my best to describe #10446

it's not concise... feedback welcome

* scheduler: fix test that allowed overlapping allocs

* devices: only free devices when ClientStatus is terminal

* test: output nicer failure message if err==nil

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-09-13 12:52:47 -07:00
Seth Hoenig bf4dd30919
Merge pull request #14553 from hashicorp/f-nsd-check-watcher
servicedisco: implement check_restart support for nomad service checks
2022-09-13 09:55:51 -05:00
Charlie Voiselle 6ab59d2aa6
var: Correct 0-index CAS Deletes (#14555)
* Add missing 0 case for VarDeleteCAS, more comments
* Add tests for VarDeleteCAS
2022-09-13 10:12:08 -04:00
Seth Hoenig 9a943107c7 servicedisco: implement check_restart for nomad service checks
This PR implements support for check_restart for checks registered
in the Nomad service provider.

Unlike Consul, Nomad service checks never report a "warning" status,
and so the check_restart.ignore_warnings configuration is not valid
for Nomad service checks.
2022-09-13 08:59:23 -05:00
Seth Hoenig b960925939
Merge pull request #14546 from hashicorp/f-refactor-check-watcher
client: refactor check watcher to be reusable
2022-09-13 07:32:32 -05:00
Tim Gross 03312f3227
variables: restrict allowed paths for variables (#14547)
Restrict variable paths to RFC3986 URL-safe characters that don't conflict with
the use of characters "@" and "." in `template` blocks. This prevents users from
writing variables that will require tricky templating syntax or that they simply
won't be able to use.

Also restrict the length so that a user can't make queries in the state store
unusually expensive (as they are O(k) on the key length).
2022-09-12 16:37:33 -04:00
Seth Hoenig feff36f3f7 client: refactor check watcher to be reusable
This PR refactors agent/consul/check_watcher into client/serviceregistration,
and abstracts away the Consul-specific check lookups.

In doing so we should be able to reuse the existing check watcher logic for
also watching NSD checks in a followup PR.

A chunk of consul/unit_test.go is removed - we'll cover that in e2e tests
in a follow PR if needed. In the long run I'd like to remove this whole file.
2022-09-12 10:13:31 -05:00
Derek Strickland 5ca934015b
job_endpoint: check spec for all regions (#14519)
* job_endpoint: check spec for all regions
2022-09-12 09:24:26 -04:00
Charlie Voiselle b55112714f
Vars: CLI commands for `var get`, `var put`, `var purge` (#14400)
* Includes updates to `var init`
2022-09-09 17:55:20 -04:00
Charlie Voiselle e58998e218
Add client scheduling eligibility to heartbeat (#14483) 2022-09-08 14:31:36 -04:00
Tim Gross 3fc7482ecd
CSI: failed allocation should not block its own controller unpublish (#14484)
A Nomad user reported problems with CSI volumes associated with failed
allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the
same node claim the volume. The check has a bug where the allocation belonging
to the claim being freed was included in the check incorrectly. During a normal
allocation stop for job stop or a new version of the job, the allocation is
terminal. But allocations that fail are not yet marked terminal at the point in
time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means that the
controller will not be able to detach the volume from the allocation's host and
the replacement claim will fail until a GC is run. This changeset fixes the
conditional so that the claim's own allocation is not included, and makes the
logic easier to read. Include a test case covering this path.

Also includes two minor extra bugfixes:

* Entities we get from the state store should always be copied before
altering. Ensure that we copy the volume in the top-level unpublish workflow
before handing off to the steps.

* The list stub object for volumes in `nomad/structs` did not match the stub
object in `api`. The `api` package also did not include the current
readers/writers fields that are expected by the UI. True up the two objects and
add the previously undocumented fields to the docs.
2022-09-08 13:30:05 -04:00
James Rasell 3fa8b0b270
client: fix RPC forwarding when querying checks for alloc. (#14498)
When querying the checks for an allocation, the request must be
forwarded to the agent that is running the allocation. If the
initial request is made to a server agent, the request can be made
directly to the client agent running the allocation. If the
request is made to a client agent not running the alloc, the
request needs to be forwarded to a server and then the correct
client.
2022-09-08 16:55:23 +02:00
Tim Gross 2a961af44c
test: fix concurrent map access in `TestStatsFetcher` (#14496)
The map of in-flight RPCs gets cleared by a goroutine in the test without first
locking it to make sure that it's not being accessed concurrently by the stats
fetcher itself. This can cause a panic in tests.
2022-09-08 10:41:15 -04:00
Tim Gross 5c57a84e99
autopilot: deflake tests (#14475)
Includes:

* Remove leader upgrade raft version test, as older versions of raft are now
  incompatible with our autopilot library.

* Remove attempt to assert initial non-voter status on the `PromoteNonVoter`
  test, as this happens too quickly to reliably detect.

* Unskip some previously-skipped tests which we should make stable.

* Remove the `consul/sdk` retry helper for these tests; this uses panic recovery
  in a kind of a clever/gross way to reduce LoC but it seems to introduce some
  timing issues in the process.

* Add more test step logging and reduce logging noise from the scheduler
  goroutines to make it easier to debug failing tests.

* Be more consistent about using the `waitForStableLeadership` helper so that we
  can assert the cluster is fully stable and not just that we've added peers.
2022-09-07 09:35:01 -04:00
James Rasell 962b1f78e8
core: clarify ACL token expiry GC messages to show global param. (#14466) 2022-09-06 15:42:45 +02:00
Kellen Fox 5086368a1e
Add a log line to help track node eligibility (#14125)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2022-09-06 14:03:33 +02:00
Yan 6e927fa125
warn destructive update only when count > 1 (#13103) 2022-09-02 15:30:06 -04:00
Tim Gross 7921f044e5
migrate autopilot implementation to raft-autopilot (#14441)
Nomad's original autopilot was importing from a private package in Consul. It
has been moved out to a shared library. Switch Nomad to use this library so that
we can eliminate the import of Consul, which is necessary to build Nomad ENT
with the current version of the Consul SDK. This also will let us pick up
autopilot improvements shared with Consul more easily.
2022-09-01 14:27:10 -04:00
Luiz Aoqui 19de803503
cli: ignore VaultToken when generating job diff (#14424) 2022-09-01 10:01:53 -04:00
James Rasell 4b9bcf94da
chore: remove use of "err" a log line context key for errors. (#14433)
Log lines which include an error should use the full term "error"
as the context key. This provides consistency across the codebase
and avoids a Go style which operators might not be aware of.
2022-09-01 15:06:10 +02:00
Luiz Aoqui dc6525336b
ci: fix TestNomad_BootstrapExpect_NonVoter test (#14407)
PR #12130 refactored the test to use the `wantPeers` helper, but this
function only returns the number of voting peers, which in this test
should be equal to 2.

I think the tests were passing back them because of a bug in Raft
(https://github.com/hashicorp/raft/pull/483) where a non-voting server
was able to transition to candidate state.

One possible evidence of this is that a successful test run would have
the following log line:

```
raft@v1.3.5/raft.go:1058: nomad.raft: updating configuration: command=AddVoter server-id=127.0.0.1:9101 server-addr=127.0.0.1:9101 servers="[{Suffrage:Voter ID:127.0.0.1:9107 Address:127.0.0.1:9107} {Suffrage:Voter ID:127.0.0.1:9105 Address:127.0.0.1:9105} {Suffrage:Voter ID:127.0.0.1:9103 Address:127.0.0.1:9103} {Suffrage:Voter ID:127.0.0.1:9101 Address:127.0.0.1:9101}]"
```

This commit reverts the test logic to check for peer count, regardless
of voting status.
2022-08-30 16:32:54 -04:00
Tim Gross 5784fb8c58
search: enforce correct ACL for search over variables (#14397) 2022-08-30 13:27:31 -04:00
Tim Gross c9d678a91a
keyring: wrap root key in key encryption key (#14388)
Update the on-disk format for the root key so that it's wrapped with a unique
per-key/per-server key encryption key. This is a bit of security theatre for the
current implementation, but it uses `go-kms-wrapping` as the interface for
wrapping the key. This provides a shim for future support of external KMS such
as cloud provider APIs or Vault transit encryption.

* Removes the JSON serialization extension we had on the `RootKey` struct; this
  struct is now only used for key replication and not for disk serialization, so
  we don't need this helper.

* Creates a helper for generating cryptographically random slices of bytes that
  properly accounts for short reads from the source.

* No observable functional changes outside of the on-disk format, so there are
  no test updates.
2022-08-30 10:59:25 -04:00
Seth Hoenig 52de2dc09d
Merge pull request #14290 from hashicorp/cleanup-more-helper-cleanup
cleanup: tidy up helper package some more
2022-08-30 08:19:48 -05:00
James Rasell 755b4745ed
Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-30 08:59:13 +01:00
Seth Hoenig 3e1e2001b9
Merge pull request #14143 from hashicorp/cleanup-slice-sets-3
cleanup: more cleanup of slices that are really sets
2022-08-29 13:52:59 -05:00
Tim Gross 7d1eb2efd5
keyring: split structs to its own file (#14378) 2022-08-29 14:18:35 -04:00
Seth Hoenig 9d0e274f27 cleanup: cleanup more slice-set comparisons 2022-08-29 12:04:21 -05:00
Tim Gross 62a968f443
Merge pull request #14351 from hashicorp/variables-rename
Variables rename
2022-08-29 11:36:50 -04:00
Piotr Kazmierczak 5f353503e5
bugfix: fixed template validation panic in case of incorrect ChangeScript configuration (#14374)
Fixes #14367
2022-08-29 17:11:15 +02:00
Tim Gross 1dc053b917 rename SecureVariables to Variables throughout 2022-08-26 16:06:24 -04:00
Tim Gross dcfd31296b file rename 2022-08-26 16:06:24 -04:00
Seth Hoenig b87689d2d1
Merge pull request #14318 from hashicorp/cleanup-create-pointer-compare
cleanup: create pointer.Compare helper function
2022-08-26 09:15:41 -05:00
Seth Hoenig 6b2655ad86 cleanup: create pointer.Compare helper function
This PR creates a pointer.Compare helper for comparing equality of
two pointers. Strictly only works with primitive types we know are
safe to derefence and compare using '=='.
2022-08-26 08:55:59 -05:00
James Rasell 601588df6b
Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-25 12:14:29 +01:00