Commit graph

3929 commits

Author SHA1 Message Date
Derek Strickland bd719bc7b8 reconciler: 2 phase reconnects and tests (#12333)
* structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by.

* node_endpoint: Emit new eval for reconnecting unknown allocs.

* filterByTainted: handle 2 phase commit filtering rules.

* reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects.

* allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.
2022-04-05 17:13:10 -04:00
Derek Strickland bb376320a2 comments: update some stale comments referencing deprecated config name (#12271)
* comments: update some stale comments referencing deprecated config name
2022-04-05 17:12:23 -04:00
Derek Strickland fd04a24ac7 disconnected clients: ensure servers meet minimum required version (#12202)
* planner: expose ServerMeetsMinimumVersion via Planner interface
* filterByTainted: add flag indicating disconnect support
* allocReconciler: accept and pass disconnect support flag
* tests: update dependent tests
2022-04-05 17:12:23 -04:00
Derek Strickland 8e9f8be511 MaxClientDisconnect Jobspec checklist (#12177)
* api: Add struct, conversion function, and tests
* TaskGroup: Add field, validation, and tests
* diff: Add diff handler and test
* docs: Update docs
2022-04-05 17:12:23 -04:00
Derek Strickland d7f44448e1 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Derek Strickland 3cbd76ea9d disconnected clients: Add reconnect task event (#12133)
* Add TaskClientReconnectedEvent constant
* Add allocRunner.Reconnect function to manage task state manually
* Removes server-side push
2022-04-05 17:12:23 -04:00
DerekStrickland da9bc350c8 evaluateNodePlan: validate plans for disconnected nodes 2022-04-05 17:12:22 -04:00
DerekStrickland 73fdf5a919 NodeStatusDisconnected: support state transitions for new node status 2022-04-05 17:12:18 -04:00
Derek Strickland b128769e19 reconciler: support disconnected clients (#12058)
* Add merge helper for string maps
* structs: add statuses, MaxClientDisconnect, and helper funcs
* taintedNodes: Include disconnected nodes
* upsertAllocsImpl: don't use existing ClientStatus when upserting unknown
* allocSet: update filterByTainted and add delayByMaxClientDisconnect
* allocReconciler: support disconnecting and reconnecting allocs
* GenericScheduler: upsert unknown and queue reconnecting

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-05 17:10:37 -04:00
Luiz Aoqui ab7eb5de6e
Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an indentity to a token, allowing better control
and management of Vault clients since all tokens with the same indentity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
James Rasell e2b730d7c9
Merge pull request #12454 from hashicorp/f-rename-service-event-stream
events: add service API logic and rename topic to service from serviceregistration
2022-04-05 16:19:14 +02:00
Grant Griffiths 18a0a2c9a4
CSI: Add secrets flag support for delete volume (#11245) 2022-04-05 08:59:11 -04:00
James Rasell cc7b448d63
events: fixup service events and rename topic to service. 2022-04-05 08:25:22 +01:00
Michael Schurter a1f1294dc4
Merge pull request #12442 from hashicorp/f-sd-add-mixed-auth-read-endpoints
service-disco: add mixed auth to list and read RPC endpoints.
2022-04-04 12:19:29 -07:00
Tim Gross 759310d13a
CSI: volume watcher shutdown fixes (#12439)
The volume watcher design was based on deploymentwatcher and drainer,
but has an important difference: we don't want to maintain a goroutine
for the lifetime of the volume. So we stop the volumewatcher goroutine
for a volume when that volume has no more claims to free. But the
shutdown races with updates on the parent goroutine, and it's possible
to drop updates. Fortunately these updates are picked up on the next
core GC job, but we're most likely to hit this race when we're
replacing an allocation and that's the time we least want to wait.

Wait until the volume has "settled" before stopping this goroutine so
that the race between shutdown and the parent goroutine sending on
`<-updateCh` is pushed to after the window we most care about quick
freeing of claims.

* Fixes a resource leak when volumewatchers are no longer needed. The
  volume is nil and can't ever be started again, so the volume's
  `watcher` should be removed from the top-level `Watcher`.

* De-flakes the GC job test: the test throws an error because the
  claimed node doesn't exist and is unreachable. This flaked instead of
  failed because we didn't correctly wait for the first pass through the
  volumewatcher.

  Make the GC job wait for the volumewatcher to reach the quiescent
  timeout window state before running the GC eval under test, so that
  we're sure the GC job's work isn't being picked up by processing one
  of the earlier claims. Update the claims used so that we're sure the
  GC pass won't hit a node unpublish error.

* Adds trace logging to unpublish operations
2022-04-04 10:46:45 -04:00
James Rasell ea4d7366fc
service-disco: add mixed auth to list and read RPC endpoints.
In the same manner as the delete RPC, the list and read service
registration endpoints can be called either by external operators
or Nomad nodes. The latter occurs when a template is being
rendered which includes Nomad API template funcs. In this case,
the auth token is looked up as the node secret ID for auth.
2022-04-04 13:45:43 +01:00
Seth Hoenig 9670adb6c6 cleanup: purge github.com/pkg/errors 2022-04-01 19:24:02 -05:00
Tim Gross de191e8068
Test lint touchup (#12434)
* lint: require should not be aliased in core_sched_test
* lint: require should not be aliased in volumes_watcher_test
* testing: don't alias state package in core_sched_test
2022-04-01 15:17:58 -04:00
Tim Gross 03c1904112
csi: allow namespace field to be passed in volume spec (#12400)
Use the volume spec's `namespace` field to override the value of the
`-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.
2022-03-29 14:46:39 -04:00
Tim Gross a6652bffad
CSI: reorder controller volume detachment (#12387)
In #12112 and #12113 we solved for the problem of races in releasing
volume claims, but there was a case that we missed. During a node
drain with a controller attach/detach, we can hit a race where we call
controller publish before the unpublish has completed. This is
discouraged in the spec but plugins are supposed to handle it
safely. But if the storage provider's API is slow enough and the
plugin doesn't handle the case safely, the volume can get "locked"
into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC
calls so that Nomad is responsible for the canonical information about
whether a volume is currently claimed.

This has a couple side-effects that also had to get fixed here:

* Changing the order means that the volume will have a past claim
  without a valid external node ID because it came from the client, and
  this uncovered a separate bug where we didn't assert the external node
  ID was valid before returning it. Fallthrough to getting the ID from
  the plugins in the state store in this case. We avoided this
  originally because of concerns around plugins getting lost during node
  drain but now that we've fixed that we may want to revisit it in
  future work.
* We should make sure we're handling `FailedPrecondition` cases from
  the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they were assuming we fail
  in a particular order that we're no longer doing.
2022-03-29 09:44:00 -04:00
James Rasell 9449e1c3e2
Merge branch 'main' into f-1.3-boogie-nights 2022-03-25 16:40:32 +01:00
Luiz Aoqui e7382a6b45
tests: fix rpc limit tests (#12364) 2022-03-24 15:31:47 -04:00
Michael Schurter 654d458960
core: add deprecated mvn tag to serf (#12327)
Revert a small part of #11600 after @lgfa29 discovered it would break
compatibility with Nomad <= v1.2!

Nomad <= v1.2 expects the `vsn` tag to exist in Serf. It has always been
`1`. It has no functional purpose. However it causes a parsing error if
it is not set:

https://github.com/hashicorp/nomad/blob/v1.2.6/nomad/util.go#L103-L108

This means Nomad servers at version v1.2 or older will not allow servers
without this tag to join.

The `mvn` minor version tag is also checked, but soft fails. I'm not
setting that because I want as much of this cruft gone as possible.
2022-03-24 14:44:21 -04:00
Luiz Aoqui 64b558c14c
core: store and check for Raft version changes (#12362)
Downgrading the Raft version protocol is not a supported operation.
Checking for a downgrade is hard since this information is not stored in
any persistent place. When a server re-joins a cluster with a prior Raft
version, the Serf tag is updated so Nomad can't tell that the version
changed.

Mixed version clusters must be supported to allow for zero-downtime
rolling upgrades. During this it's expected that the cluster will have
mixed Raft versions. Enforcing consistency strong version consistency
would disrupt this flow.

The approach taken here is to store the Raft version on disk. When the
server starts the `raft_protocol` value is written to the file
`data_dir/raft/version`. If that file already exists, its content is
checked against the current `raft_protocol` value to detect downgrades
and prevent the server from starting.

Any other types of errors are ignore to prevent disruptions that are
outside the control of operators. The only option in cases of an invalid
or corrupt file would be to delete it, making this check useless. So
just overwrite its content with the new version and provide guidance on
how to check that their cluster is an expected state.
2022-03-24 14:42:00 -04:00
Tim Gross ff1bed38cd
csi: add -secret and -parameter flag to volume snapshot create (#12360)
Pass-through the `-secret` and `-parameter` flags to allow setting
parameters for the snapshot and overriding the secrets we've stored on
the CSI volume in the state store.
2022-03-24 10:29:50 -04:00
Seth Hoenig a6c905616d core: write peers.json file with correct permissions 2022-03-24 08:26:31 -05:00
James Rasell 96d8512c85
test: move remaining tests to use ci.Parallel. 2022-03-24 08:45:13 +01:00
Tim Gross 1743648901
CSI: fix timestamp from volume snapshot responses (#12352)
Listing snapshots was incorrectly returning nanoseconds instead of
seconds, and formatting of timestamps both list and create snapshot
was treating the timestamp as though it were nanoseconds instead of
seconds. This resulted in create timestamps always being displayed as
zero values.

Fix the unit conversion error in the command line and the incorrect
extraction in the CSI plugin client code. Beef up the unit tests to
make sure this code is actually exercised.
2022-03-23 10:39:28 -04:00
Tim Gross b7075f04fd
CSI: enforce single access mode at validation time (#12337)
A volume that has single-use access mode is feasibility checked during
scheduling to ensure that only a single reader or writer claim
exists. However, because feasibility checking is done one alloc at a
time before the plan is written, a job that's misconfigured to have
count > 1 that mounts one of these volumes will pass feasibility
checking.

Enforce the check at validation time instead to prevent us from even
trying to evaluation a job that's misconfigured this way.
2022-03-23 09:21:26 -04:00
James Rasell bb8514fc75
core: remove node service registrations when node is down.
When a node fails its heart beating a number of actions are taken
to ensure state is cleaned. Service registrations a loosely tied
to nodes, therefore we should remove these from state when a node
is considered terminally down.
2022-03-23 09:42:46 +01:00
James Rasell a646333263
Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Tim Gross 33558cb51e
csi: fix handling of garbage collected node in node unpublish (#12350)
When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
2022-03-22 15:40:24 -04:00
Luiz Aoqui f8973d364e
core: use the new Raft API when removing peers (#12340)
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.

Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.

This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
2022-03-22 15:07:31 -04:00
Luiz Aoqui b5a42cd55d
set raft v3 as the default config (#12341) 2022-03-22 15:06:25 -04:00
Tim Gross 60cfeacd76
drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Tim Gross 2a2ebd0537
CSI: presentation improvements (#12325)
* Fix plugin capability sorting.
  The `sort.StringSlice` method in the stdlib doesn't actually sort, but
  instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad volume status :id`, just like `nomad job status :id`.
  This is implemented in the HTTP response and not in the state store
  because the state store maintains two separate lists of allocs that
  are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
2022-03-22 09:48:38 -04:00
James Rasell 68cd3d89fe
core: fixup node drain update message spelling. 2022-03-21 13:37:08 +01:00
James Rasell 042bf0fa57
client: hookup service wrapper for use within client hooks. 2022-03-21 10:29:57 +01:00
Seth Hoenig 4d86f5d94d ci: limit gotestsum to circle ci
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255

This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.

The reason for these is we'll want to disable logging in GHA,
where spamming the disk with logs really drags performance.
2022-03-18 09:15:01 -05:00
Luiz Aoqui 15089f055f
api: add related evals to eval details (#12305)
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
2022-03-17 13:56:14 -04:00
Luiz Aoqui 8db12c2a17
server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Luiz Aoqui 83d834d84c
tests: move state store namespace tests from ENT (#12308) 2022-03-16 11:56:11 -04:00
Seth Hoenig aca50349f4
Merge pull request #12299 from hashicorp/ci-parallel
ci: trade test parallelization for unconstrained gomaxprocs
2022-03-16 08:55:39 -05:00
Seth Hoenig 2631659551 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Luiz Aoqui 8cf599c7fc
fix alloc list test (#12297)
The alloc list test with pagination was creating allocs before the
target namespace existed. This works in OSS but fails in ENT because
quotas are checked before the alloc can be created, so the namespace
must exist beforehand.
2022-03-15 10:41:07 -04:00
James Rasell dc1378d6eb
job: add native service discovery job constraint mutator. 2022-03-14 12:42:12 +01:00
James Rasell 783d7fdc31
jobspec: add service block provider parameter and validation. 2022-03-14 09:21:20 +01:00
Luiz Aoqui 2876739a51
api: apply consistent behaviour of the reverse query parameter (#12244) 2022-03-11 19:44:52 -05:00
Tim Gross 066a820209
job summary query in Job.List RPC should use job's namespace (#12249)
The `Job.List` RPC attaches a `JobSummary` to each job stub. We're
using the request namespace and not the job namespace for that query,
which results in a nil `JobSummary` whenever we pass the wildcard
namespace. This is incorrect and causes panics in downstream consumers
like the CLI, which assume the `JobSummary` is non-nil as an unstate
invariant.
2022-03-09 10:47:19 -05:00
Luiz Aoqui 550c5ab6ec
fix TestCSIVolumeEndpoint_List_PaginationFiltering test (#12245) 2022-03-09 09:40:40 -05:00