Commit graph

20 commits

Tim Gross 51f512a3e6
csi: reap unused volume claims at leadership transitions (#11776)
When `volumewatcher.Watcher` starts on the leader, it starts a watch
on every volume and triggers a reap of unused claims on any change to
that volume. But if a reap is in flight during a leadership
transition, it will fail, and the event that triggered it will be
dropped. Perform one reap of unused claims at the start of the
watcher so that leadership transitions don't drop this event.
2022-01-05 11:40:20 -05:00
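The shape of that fix can be illustrated with a minimal Go sketch (the names below are stand-ins, not Nomad's actual types): run one reap over all known volumes when the watcher starts, then fall back to change-driven reaps.

```go
package sketch

import "context"

// Volume and reapUnusedClaims are stand-ins for Nomad's CSI volume state
// and claim-reaping logic; all names here are illustrative.
type Volume struct{ ID string }

func reapUnusedClaims(ctx context.Context, vol *Volume) error {
	// ... release any claims with no live allocation ...
	return nil
}

// watch performs one reap over every volume before blocking on the change
// channel, so an event dropped during a leadership transition is still
// handled by the new leader.
func watch(ctx context.Context, volumes []*Volume, changes <-chan *Volume) {
	for _, vol := range volumes {
		_ = reapUnusedClaims(ctx, vol) // initial pass at watcher start
	}
	for {
		select {
		case <-ctx.Done():
			return
		case vol := <-changes:
			_ = reapUnusedClaims(ctx, vol) // change-driven reap
		}
	}
}
```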
James Rasell 0cccf7c2b8
volumewatcher: fix test data race. 2021-06-14 12:11:35 +02:00
Tim Gross 60874ebe25
csi: Postrun hook should not change mode (#9323)
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be released.
2020-11-11 13:06:30 -05:00
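A rough sketch of that idea (the constant and type names are made up, not Nomad's): keep the RW/RO access mode untouched, mark the claim for release via a separate state, and give the GC path its own sentinel mode.

```go
package sketch

// ClaimMode and ClaimState values here are illustrative only.
type ClaimMode int

const (
	ClaimRead  ClaimMode = iota // RO claim
	ClaimWrite                  // RW claim
	ClaimGC                     // sentinel: the GC path doesn't know the real mode
)

type ClaimState int

const (
	ClaimStateTaken ClaimState = iota
	ClaimStateReadyToFree // marked for release; original mode preserved
)

type claim struct {
	Mode  ClaimMode
	State ClaimState
}

// markForRelease flips the state but leaves the mode alone, so node
// unpublish still knows whether the claim was RO or RW.
func markForRelease(c *claim) {
	c.State = ClaimStateReadyToFree
}
```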
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
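The pattern described in this cleanup looks roughly like the following sketch (simplified types, not Nomad's FSM or state store): a message-type value is threaded through the upsert signatures so the event stream can tag each write, with a default that produces no event.

```go
package sketch

// MessageType tags state-store writes so the event stream can tell, say, a
// node registration apart from a drain update. Names are illustrative.
type MessageType uint8

const (
	MsgTypeIgnore MessageType = iota // default: don't emit an event
	MsgTypeNodeRegister
	MsgTypeNodeDrainUpdate
)

type Node struct{ ID string }

type StateStore struct{}

// UpsertNode now takes the message type in its signature, so the caller
// decides which event (if any) the write produces.
func (s *StateStore) UpsertNode(msgType MessageType, index uint64, node *Node) error {
	// ... apply the write and emit an event tagged with msgType ...
	return nil
}
```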
Tim Gross 9b4917ae5f csi: volumewatcher only needs one pass to collect past claims
If a volume GC and a `nomad volume detach` command land concurrently, we can
end up with multiple claims without an allocation, which results in extra
no-op work when finding claims to collect as past claims.
2020-10-09 11:03:51 -04:00
Tim Gross 69f4f171e5
CSI: fix missing ACL tokens for leader-driven RPCs (#8607)
When ACLs are enabled, the volumewatcher and GC job in the leader can't make
CSI RPCs unless the leader ACL token is passed through.
2020-08-07 15:37:27 -04:00
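The shape of that fix can be sketched as follows (field and helper names are assumptions for illustration, not Nomad's exact request types): leader-driven code has no user request to inherit a token from, so it stamps the leader ACL token onto the RPC arguments itself.

```go
package sketch

// WriteRequest and UnpublishRequest here are illustrative stand-ins for
// Nomad's RPC argument structs.
type WriteRequest struct {
	AuthToken string
}

type UnpublishRequest struct {
	VolumeID string
	WriteRequest
}

type server struct {
	leaderACLToken string
}

// newUnpublishRequest builds a leader-driven RPC request and passes the
// leader ACL token through so the call succeeds when ACLs are enabled.
func (s *server) newUnpublishRequest(volID string) *UnpublishRequest {
	req := &UnpublishRequest{VolumeID: volID}
	req.AuthToken = s.leaderACLToken
	return req
}
```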
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
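Conceptually the hook looks something like this sketch (the interface and field names are simplified assumptions; only the `CSIVolume.Unpublish` RPC name comes from the commit): when the allocation stops, Postrun asks the servers to release this alloc's claim, and the servers forward node or controller unpublish RPCs as needed.

```go
package sketch

import "fmt"

// rpcCaller is a stand-in for the client's interface to the servers.
type rpcCaller interface {
	RPC(method string, args, reply interface{}) error
}

type csiHook struct {
	rpc      rpcCaller
	volumeID string
	allocID  string
}

// Postrun runs after the allocation's tasks have stopped and releases the
// claim server-side rather than leaving it to async GC.
func (h *csiHook) Postrun() error {
	args := map[string]string{"VolumeID": h.volumeID, "AllocID": h.allocID}
	var reply struct{}
	if err := h.rpc.RPC("CSIVolume.Unpublish", args, &reply); err != nil {
		return fmt.Errorf("unpublishing volume %s: %w", h.volumeID, err)
	}
	return nil
}
```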
Tim Gross 314458ebdb
csi: update volumewatcher to use unpublish RPC (#8579)
This changeset updates `nomad/volumewatcher` to take advantage of the
`CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and
associated tests. The raft batching code can be safely dropped: the
characteristic times of the CSI RPCs are on the order of seconds or even
minutes, so batching raft RPCs added complexity without any real-world
performance wins.

Includes refactor w/ test cleanup and dead code elimination in volumewatcher
2020-08-06 14:31:18 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Tim Gross 81ae581da6
test: remove flaky test from volumewatcher (#8189)
The volumewatcher restores itself on notification, but detecting this is racy
because it may reap any claim (or find there are no claims to reap) and shut
down before we can test whether it's running. This appears to have become
flaky with a newer version of Go. The other cases in this test sufficiently
exercise the start/stop behavior of the volumewatcher, so remove the flaky
section.
2020-06-17 15:41:51 -04:00
Mahmood Ali 19141f8103 {volume|deployment}watcher: check for nil batcher 2020-05-26 14:54:27 -04:00
Mahmood Ali 1c79c3b93d refactor: context is first parameter
By convention, go functions take `context.Context` as the first
argument.
2020-05-26 10:18:09 -04:00
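For illustration, the convention mentioned here is simply:

```go
package sketch

import "context"

// By convention, context.Context is the first parameter and is named ctx.
func reapVolume(ctx context.Context, volumeID string) error {
	select {
	case <-ctx.Done():
		return ctx.Err() // caller cancelled or timed out
	default:
	}
	// ... do the work ...
	return nil
}
```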
Mahmood Ali 1eff8b0ed8 volumewatcher: no batcher when disabling
When disabling volumewatcher (at the end of a test), avoid starting a
new update batcher with its new goroutine.
2020-05-26 10:18:09 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
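A rough sketch of how such secrets flow (the types are illustrative, not the CSI spec's Go bindings): the volume stores an opaque string map at registration time, and it is copied onto the publish/unpublish requests sent to the plugin.

```go
package sketch

// Secrets is an opaque map; Nomad doesn't interpret the keys or values,
// it just hands them to the plugin. Type names here are illustrative.
type Secrets map[string]string

type Volume struct {
	ID      string
	Secrets Secrets // configured at volume registration
}

type controllerUnpublishRequest struct {
	VolumeID string
	Secrets  Secrets
}

// newUnpublishRequest copies the volume's secrets onto the plugin RPC.
func newUnpublishRequest(v *Volume) *controllerUnpublishRequest {
	return &controllerUnpublishRequest{
		VolumeID: v.ID,
		Secrets:  v.Secrets,
	}
}
```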
Tim Gross 1ec41b6770
volumewatcher: stop watcher goroutines when there's no work (#7909)
The watcher goroutines will be automatically started if a volume has
updates, but when idle we shouldn't keep a goroutine running and
taking up memory.
2020-05-11 09:32:05 -04:00
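The idle-shutdown pattern is roughly the following (structure and names are a simplified sketch, not the real implementation): the per-volume goroutine exits when it finds no pending work, and the next notification restarts it.

```go
package sketch

import "sync"

// volumeWatcher runs one goroutine per volume, but only while there is
// work to do. Names and structure are illustrative.
type volumeWatcher struct {
	mu      sync.Mutex
	running bool
	updates chan struct{}
}

func newVolumeWatcher() *volumeWatcher {
	return &volumeWatcher{updates: make(chan struct{}, 1)}
}

// Notify wakes the watcher, restarting its goroutine if it exited when idle.
func (w *volumeWatcher) Notify() {
	w.mu.Lock()
	defer w.mu.Unlock()
	select {
	case w.updates <- struct{}{}: // coalesce pending notifications
	default:
	}
	if !w.running {
		w.running = true
		go w.run()
	}
}

func (w *volumeWatcher) run() {
	for {
		w.mu.Lock()
		select {
		case <-w.updates:
			w.mu.Unlock()
			w.reapUnusedClaims()
		default:
			// No pending work: stop the goroutine instead of idling.
			w.running = false
			w.mu.Unlock()
			return
		}
	}
}

func (w *volumeWatcher) reapUnusedClaims() { /* ... */ }
```

Checking and clearing the `running` flag under the same lock that buffers notifications is what keeps a wake-up from being lost between "no work left" and "goroutine exited".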
Tim Gross 8373e917fc
volumewatcher: set maximum batch size for raft update (#7907)
The `volumewatcher` has a 250ms batch window, so claim updates will not
typically be large enough to risk exceeding the maximum raft message
size. But large jobs might have enough volume claims that this becomes a
danger. Set a maximum batch size of 100 messages per batch (roughly 33 KB)
as a very conservative safety/robustness guard.

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-05-08 16:53:57 -04:00
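A simplified picture of that guard (the window and cap come from the commit; the structure and names are assumed): accumulate claim updates for the batch window, but flush early once the batch reaches the size cap.

```go
package sketch

import "time"

const (
	batchWindow  = 250 * time.Millisecond
	maxBatchSize = 100 // keeps each raft message roughly under ~33 KB
)

type claimUpdate struct{ VolumeID string }

// batcher coalesces claim updates into raft writes, flushing on either the
// timer or the size cap, whichever comes first.
func batcher(in <-chan claimUpdate, flush func([]claimUpdate)) {
	var batch []claimUpdate
	timer := time.NewTimer(batchWindow)
	defer timer.Stop()
	for {
		select {
		case u, ok := <-in:
			if !ok {
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			batch = append(batch, u)
			if len(batch) >= maxBatchSize {
				flush(batch) // size cap: stay under the raft message limit
				batch = nil
			}
		case <-timer.C:
			if len(batch) > 0 {
				flush(batch)
				batch = nil
			}
			timer.Reset(batchWindow)
		}
	}
}
```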
Tim Gross 42f9d517d8
CSI volumewatcher testability improvements (#7889)
* volumewatcher: remove redundant log fields

The constructor for `volumeWatcher` already sets a `logger.With` that
includes the volume ID and namespace fields. Remove them from the
various trace logs.

* volumewatcher: advance state for controller already released

One way of bypassing client RPCs in testing is to set a claim status
to controller-detached, but this results in an incorrect claim state
when we checkpoint.
2020-05-07 15:57:24 -04:00
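The logging cleanup is the usual hclog pattern: set the common fields once in the constructor so individual trace lines don't repeat them. A minimal illustration, with field names assumed:

```go
package sketch

import hclog "github.com/hashicorp/go-hclog"

type volumeWatcher struct {
	logger hclog.Logger
}

// newVolumeWatcher attaches the volume ID and namespace once, so later
// trace calls don't need to repeat them as fields.
func newVolumeWatcher(parent hclog.Logger, volID, namespace string) *volumeWatcher {
	return &volumeWatcher{
		logger: parent.With("volume_id", volID, "namespace", namespace),
	}
}

func (w *volumeWatcher) reap() {
	w.logger.Trace("processing claims") // volume_id and namespace come for free
}
```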
Tim Gross 1c6dcab56b
volumewatcher: remove spurious nil-check (#7858)
The nil-check here is left over from an earlier approach that didn't
get merged. It doesn't do anything for us now: we can't ever pass it
`nil`, and if we leave it in, the `getVolume` call it guards will panic
anyway.
2020-05-04 12:28:32 -04:00
Tim Gross 52e805a6a6
csi: ensure Read/WriteAllocs aren't released early (#7841)
We should only remove the `ReadAllocs`/`WriteAllocs` values for a
volume after the claim has entered the "ready to free"
state. The volume will eventually be released as expected, but
querying the volume API will show the volume as released before the
controller unpublish has finished, and this can cause a race with
starting new jobs.

Test updates are to cover cases where we're dropping claims but not
running through the whole reaping process.
2020-04-30 17:11:31 -04:00
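The invariant can be sketched like this (state and field names are simplified, not Nomad's structs): only clear the per-alloc claim maps once the claim has actually reached the ready-to-free state, so API readers don't see the volume as free while controller unpublish is still in flight.

```go
package sketch

type claimState int

const (
	claimStateTaken claimState = iota
	claimStateUnpublishing
	claimStateReadyToFree
)

type volume struct {
	ReadAllocs  map[string]struct{}
	WriteAllocs map[string]struct{}
}

// releaseClaim only drops the alloc from Read/WriteAllocs once the claim is
// ready to free; earlier states keep the volume visibly claimed so new jobs
// can't race the in-flight controller unpublish.
func releaseClaim(v *volume, allocID string, state claimState) {
	if state != claimStateReadyToFree {
		return
	}
	delete(v.ReadAllocs, allocID)
	delete(v.WriteAllocs, allocID)
}
```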
Tim Gross a7a64443e1
csi: move volume claim release into volumewatcher (#7794)
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This avoids tying up scheduling workers: volume claim workloads are sent
immediately into their own loop, rather than blocking the scheduling
workers in the core GC job with tasks like talking to CSI controllers.

The volume watcher is enabled on leader step-up and disabled on leader
step-down.

The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
2020-04-30 09:13:00 -04:00
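The overall loop can be pictured roughly as follows (the helper names are assumed, not the real state-store API): block until the CSI volumes table's index moves past what we've seen, then re-evaluate claims for the volumes that changed; the GC job's empty claim RPC exists purely to bump that index and unblock the query.

```go
package sketch

import "context"

type volume struct{ ID string }

// stateStore is a stand-in for the server's state store; BlockingQuery
// abstracts "wait until the CSI volumes table index exceeds minIndex and
// return the changed volumes".
type stateStore interface {
	BlockingQuery(ctx context.Context, table string, minIndex uint64) (uint64, []*volume, error)
}

// watchVolumes is the leader-side loop: each time the table index bumps
// (including via the GC job's empty claim RPC), reassess which claims can
// be released.
func watchVolumes(ctx context.Context, store stateStore, reap func(*volume)) error {
	var index uint64
	for ctx.Err() == nil {
		newIndex, vols, err := store.BlockingQuery(ctx, "csi_volumes", index)
		if err != nil {
			return err
		}
		for _, vol := range vols {
			reap(vol)
		}
		index = newIndex
	}
	return ctx.Err()
}
```

Cancelling the context on leader step-down and calling this on step-up matches the enable/disable behavior described in the commit.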