The `volumewatcher` has a 250ms batch window so claim updates will not
typically be large enough to risk exceeding the maximum raft message
size. But large jobs might have enough volume claims that this could
be a danger. Set a maximum batch size of 100 messages per
batch (roughly 33K), as a very conservative safety/robustness guard.
Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.
This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers
The volume watcher is enabled on leader step-up and disabled on leader
step-down.
The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.