open-nomad

Author	SHA1	Message	Date
Tim Gross	759310d13a	CSI: volume watcher shutdown fixes (#12439) The volume watcher design was based on deploymentwatcher and drainer, but has an important difference: we don't want to maintain a goroutine for the lifetime of the volume. So we stop the volumewatcher goroutine for a volume when that volume has no more claims to free. But the shutdown races with updates on the parent goroutine, and it's possible to drop updates. Fortunately these updates are picked up on the next core GC job, but we're most likely to hit this race when we're replacing an allocation and that's the time we least want to wait. Wait until the volume has "settled" before stopping this goroutine so that the race between shutdown and the parent goroutine sending on `<-updateCh` is pushed to after the window in which we most care about quick freeing of claims. * Fixes a resource leak when volumewatchers are no longer needed. The volume is nil and can't ever be started again, so the volume's `watcher` should be removed from the top-level `Watcher`. * De-flakes the GC job test: the test throws an error because the claimed node doesn't exist and is unreachable. This flaked instead of failed because we didn't correctly wait for the first pass through the volumewatcher. Make the GC job wait for the volumewatcher to reach the quiescent timeout window state before running the GC eval under test, so that we're sure the GC job's work isn't being picked up by processing one of the earlier claims. Update the claims used so that we're sure the GC pass won't hit a node unpublish error. * Adds trace logging to unpublish operations	2022-04-04 10:46:45 -04:00
Tim Gross	b20a6c9ffb	CSI: move terminal alloc handling into denormalization (#11931) * The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims. * Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.	2022-01-27 10:39:08 -05:00
Tim Gross	51f512a3e6	csi: reap unused volume claims at leadership transitions (#11776) When `volumewatcher.Watcher` starts on the leader, it starts a watch on every volume and triggers a reap of unused claims on any change to that volume. But if a reaping is in-flight during leadership transitions, it will fail and the event that triggered the reap will be dropped. Perform one reap of unused claims at the start of the watcher so that leadership transitions don't drop this event.	2022-01-05 11:40:20 -05:00
Tim Gross	9b4917ae5f	csi: volumewatcher only needs one pass to collect past claims If a volume GC and a `nomad volume detach` command land concurrently, we can end up with multiple claims without an allocation, which results in extra no-op work when finding claims to collect as past claims.	2020-10-09 11:03:51 -04:00
Tim Gross	69f4f171e5	CSI: fix missing ACL tokens for leader-driven RPCs (#8607) The volumewatcher and GC job in the leader can't make CSI RPCs when ACLs are enabled without the leader ACL token being passed through.	2020-08-07 15:37:27 -04:00
Tim Gross	2854298089	csi: release claims via csi_hook postrun unpublish RPC (#8580) Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This may forward client RPCs to the node plugins or to the controller plugins, depending on whether other allocations on this node have claims on this volume. By making clients responsible for running the `CSIVolume.Unpublish` RPC (and making the RPC available to a `nomad volume detach` command), the volumewatcher becomes only used by the core GC job and we no longer need async volume GC from job deregister and node update.	2020-08-06 14:51:46 -04:00
Tim Gross	314458ebdb	csi: update volumewatcher to use unpublish RPC (#8579) This changeset updates `nomad/volumewatcher` to take advantage of the `CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and associated tests. The raft batching code can be safely dropped, as the characteristic times of the CSI RPCs are on the order of seconds or even minutes, so batching up raft RPCs added complexity without any real world performance wins. Includes refactor with test cleanup and dead code elimination in volumewatcher	2020-08-06 14:31:18 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Tim Gross	1ec41b6770	volumewatcher: stop watcher goroutines when there's no work (#7909) The watcher goroutines will be automatically started if a volume has updates, but when idle we shouldn't keep a goroutine running and taking up memory.	2020-05-11 09:32:05 -04:00
Tim Gross	42f9d517d8	CSI volumewatcher testability improvements (#7889) * volumewatcher: remove redundant log fields The constructor for `volumeWatcher` already sets a `logger.With` that includes the volume ID and namespace fields. Remove them from the various trace logs. * volumewatcher: advance state for controller already released One way of bypassing client RPCs in testing is to set a claim status to controller-detached, but this results in an incorrect claim state when we checkpoint.	2020-05-07 15:57:24 -04:00
Tim Gross	1c6dcab56b	volumewatcher: remove spurious nil-check (#7858) The nil-check here is left over from an earlier approach that didn't get merged. It doesn't do anything for us now as we can't ever pass it `nil` and, if we leave it in, the `getVolume` call it guards will panic anyway.	2020-05-04 12:28:32 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00

12 commits