open-nomad

Commit Graph

Author	SHA1	Message	Date
hc-github-team-nomad-core	1a309b43fd	Backport of vault: fix legacy token workflow for poststop tasks into release/1.6.x #19272 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-12-01 13:44:57 -05:00
James Rasell	3730b66d8c	test: use correct parallel test setup func (#18326 ) (#18330 )	2023-08-25 14:48:06 +01:00
Luiz Aoqui	bc17cffaef	node pool: node pool upsert on multiregion node register (#17503 ) When registering a node with a new node pool in a non-authoritative region we can't create the node pool because this new pool will not be replicated to other regions. This commit modifies the node registration logic to only allow automatic node pool creation in the authoritative region. In non-authoritative regions, the client is registered, but the node pool is not created. The client is kept in the `initialing` status until its node pool is created in the authoritative region and replicated to the client's region.	2023-06-13 11:28:28 -04:00
Luiz Aoqui	6039c18ab6	node pools: register a node in a node pool (#17405 )	2023-06-02 17:50:50 -04:00
Charlie Voiselle	86e04a4c6c	[core] nil check and error handling for client status in heartbeat responses (#17316 ) Add a nil check to constructNodeServerInfoResponse to manage an apparent race between deregister and client heartbeats. Fixes #17310	2023-05-25 16:04:54 -04:00
Luiz Aoqui	7b5a8f1fb0	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.	2023-05-01 17:18:34 -04:00
Michael Schurter	d3b0bbc088	deps: update go-bexpr from 0.1.11 to 0.1.12 (#16991 ) Pulls in https://github.com/hashicorp/go-bexpr/pull/38 Fixes #16758	2023-04-27 09:01:42 -07:00
Ian Fijolek	619f49afcf	hashicorp/go-msgpack v2 (#16810 ) * Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0 Fixes #16808 * Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack * deps: use go-msgpack v2.0.0 go-msgpack v2.1.0 includes some code changes that we will need to investigate furthere to assess its impact on Nomad, so keeping this dependency on v2.0.0 for now since it's no-op. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-17 17:02:05 -04:00
Seth Hoenig	ba728f8f97	api: enable support for setting original job source (#16763 ) * api: enable support for setting original source alongside job This PR adds support for setting job source material along with the registration of a job. This includes a new HTTP endpoint and a new RPC endpoint for making queries for the original source of a job. The HTTP endpoint is /v1/job/<id>/submission?version=<version> and the RPC method is Job.GetJobSubmission. The job source (if submitted, and doing so is always optional), is stored in the job_submission memdb table, separately from the actual job. This way we do not incur overhead of reading the large string field throughout normal job operations. The server config now includes job_max_source_size for configuring the maximum size the job source may be, before the server simply drops the source material. This should help prevent Bad Things from happening when huge jobs are submitted. If the value is set to 0, all job source material will be dropped. * api: avoid writing var content to disk for parsing * api: move submission validation into RPC layer * api: return an error if updating a job submission without namespace or job id * api: be exact about the job index we associate a submission with (modify) * api: reword api docs scheduling * api: prune all but the last 6 job submissions * api: protect against nil job submission in job validation * api: set max job source size in test server * api: fixups from pr	2023-04-11 08:45:08 -05:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Luiz Aoqui	3479e2231f	core: enforce strict steps for clients reconnect (#15808 ) When a Nomad client that is running an allocation with `max_client_disconnect` set misses a heartbeat the Nomad server will update its status to `disconnected`. Upon reconnecting, the client will make three main RPC calls: - `Node.UpdateStatus` is used to set the client status to `ready`. - `Node.UpdateAlloc` is used to update the client-side information about allocations, such as their `ClientStatus`, task states etc. - `Node.Register` is used to upsert the entire node information, including its status. These calls are made concurrently and are also running in parallel with the scheduler. Depending on the order they run the scheduler may end up with incomplete data when reconciling allocations. For example, a client disconnects and its replacement allocation cannot be placed anywhere else, so there's a pending eval waiting for resources. When this client comes back the order of events may be: 1. Client calls `Node.UpdateStatus` and is now `ready`. 2. Scheduler reconciles allocations and places the replacement alloc to the client. The client is now assigned two allocations: the original alloc that is still `unknown` and the replacement that is `pending`. 3. Client calls `Node.UpdateAlloc` and updates the original alloc to `running`. 4. Scheduler notices too many allocs and stops the replacement. This creates unnecessary placements or, in a different order of events, may leave the job without any allocations running until the whole state is updated and reconciled. To avoid problems like this clients must update _all_ of its relevant information before they can be considered `ready` and available for scheduling. To achieve this goal the RPC endpoints mentioned above have been modified to enforce strict steps for nodes reconnecting: - `Node.Register` does not set the client status anymore. - `Node.UpdateStatus` sets the reconnecting client to the `initializing` status until it successfully calls `Node.UpdateAlloc`. These changes are done server-side to avoid the need of additional coordination between clients and servers. Clients are kept oblivious of these changes and will keep making these calls as they normally would. The verification of whether allocations have been updates is done by storing and comparing the Raft index of the last time the client missed a heartbeat and the last time it updated its allocations.	2023-01-25 15:53:59 -05:00
Tim Gross	dfed1ba5bc	remove most static RPC handlers (#15451 ) Nomad server components that aren't in the `nomad` package like the deployment watcher and volume watcher need to make RPC calls but can't import the Server struct to do so because it creates a circular reference. These components have a "shim" object that gets populated to pass a "static" handler that has no RPC context. Most RPC handlers are never used in this way, but during server setup we were constructing a set of static handlers for most RPC endpoints anyways. This is slightly wasteful but also confusing to developers who end up being encouraged to just copy what was being done for previous RPCs. This changeset includes the following refactorings: * Remove the static handlers field on the server * Instead construct just the specific static handlers we need to pass into the deployment watcher and volume watcher. * Remove the unnecessary static handler from heartbeater * Update various tests to avoid needing the static endpoints and have them use a endpoint constructed on the spot. Follow-up work will examine whether we can remove the RPCs from deployment watcher and volume watcher entirely, falling back to raft applies like node drainer does currently.	2022-12-02 10:12:05 -05:00
Luiz Aoqui	e4c8b59919	Update alloc after reconnect and enforece client heartbeat order (#15068 ) * scheduler: allow updates after alloc reconnects When an allocation reconnects to a cluster the scheduler needs to run special logic to handle the reconnection, check if a replacement was create and stop one of them. If the allocation kept running while the node was disconnected, it will be reconnected with `ClientStatus: running` and the node will have `Status: ready`. This combination is the same as the normal steady state of allocation, where everything is running as expected. In order to differentiate between the two states (an allocation that is reconnecting and one that is just running) the scheduler needs an extra piece of state. The current implementation uses the presence of a `TaskClientReconnected` task event to detect when the allocation has reconnected and thus must go through the reconnection process. But this event remains even after the allocation is reconnected, causing all future evals to consider the allocation as still reconnecting. This commit changes the reconnect logic to use an `AllocState` to register when the allocation was reconnected. This provides the following benefits: - Only a limited number of task states are kept, and they are used for many other events. It's possible that, upon reconnecting, several actions are triggered that could cause the `TaskClientReconnected` event to be dropped. - Task events are set by clients and so their timestamps are subject to time skew from servers. This prevents using time to determine if an allocation reconnected after a disconnect event. - Disconnect events are already stored as `AllocState` and so storing reconnects there as well makes it the only source of information required. With the new logic, the reconnection logic is only triggered if the last `AllocState` is a disconnect event, meaning that the allocation has not been reconnected yet. After the reconnection is handled, the new `ClientStatus` is store in `AllocState` allowing future evals to skip the reconnection logic. * scheduler: prevent spurious placement on reconnect When a client reconnects it makes two independent RPC calls: - `Node.UpdateStatus` to heartbeat and set its status as `ready`. - `Node.UpdateAlloc` to update the status of its allocations. These two calls can happen in any order, and in case the allocations are updated before a heartbeat it causes the state to be the same as a node being disconnected: the node status will still be `disconnected` while the allocation `ClientStatus` is set to `running`. The current implementation did not handle this order of events properly, and the scheduler would create an unnecessary placement since it considered the allocation was being disconnected. This extra allocation would then be quickly stopped by the heartbeat eval. This commit adds a new code path to handle this order of events. If the node is `disconnected` and the allocation `ClientStatus` is `running` the scheduler will check if the allocation is actually reconnecting using its `AllocState` events. * rpc: only allow alloc updates from `ready` nodes Clients interact with servers using three main RPC methods: - `Node.GetAllocs` reads allocation data from the server and writes it to the client. - `Node.UpdateAlloc` reads allocation from from the client and writes them to the server. - `Node.UpdateStatus` writes the client status to the server and is used as the heartbeat mechanism. These three methods are called periodically by the clients and are done so independently from each other, meaning that there can't be any assumptions in their ordering. This can generate scenarios that are hard to reason about and to code for. For example, when a client misses too many heartbeats it will be considered `down` or `disconnected` and the allocations it was running are set to `lost` or `unknown`. When connectivity is restored the to rest of the cluster, the natural mental model is to think that the client will heartbeat first and then update its allocations status into the servers. But since there's no inherit order in these calls the reverse is just as possible: the client updates the alloc status and then heartbeats. This results in a state where allocs are, for example, `running` while the client is still `disconnected`. This commit adds a new verification to the `Node.UpdateAlloc` method to reject updates from nodes that are not `ready`, forcing clients to heartbeat first. Since this check is done server-side there is no need to coordinate operations client-side: they can continue sending these requests independently and alloc update will succeed after the heartbeat is done. * chagelog: add entry for #15068 * code review * client: skip terminal allocations on reconnect When the client reconnects with the server it synchronizes the state of its allocations by sending data using the `Node.UpdateAlloc` RPC and fetching data using the `Node.GetClientAllocs` RPC. If the data fetch happens before the data write, `unknown` allocations will still be in this state and would trigger the `allocRunner.Reconnect` flow. But when the server `DesiredStatus` for the allocation is `stop` the client should not reconnect the allocation. * apply more code review changes * scheduler: persist changes to reconnected allocs Reconnected allocs have a new AllocState entry that must be persisted by the plan applier. * rpc: read node ID from allocs in UpdateAlloc The AllocUpdateRequest struct is used in three disjoint use cases: 1. Stripped allocs from clients Node.UpdateAlloc RPC using the Allocs, and WriteRequest fields 2. Raft log message using the Allocs, Evals, and WriteRequest fields 3. Plan updates using the AllocsStopped, AllocsUpdated, and Job fields Adding a new field that would only be used in one these cases (1) made things more confusing and error prone. While in theory an AllocUpdateRequest could send allocations from different nodes, in practice this never actually happens since only clients call this method with their own allocations. * scheduler: remove logic to handle exceptional case This condition could only be hit if, somehow, the allocation status was set to "running" while the client was "unknown". This was addressed by enforcing an order in "Node.UpdateStatus" and "Node.UpdateAlloc" RPC calls, so this scenario is not expected to happen. Adding unnecessary code to the scheduler makes it harder to read and reason about it. * more code review * remove another unused test	2022-11-04 16:25:11 -04:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Michael Schurter	f21272065d	core: emit node evals only for sys jobs in dc (#12955 ) Whenever a node joins the cluster, either for the first time or after being `down`, we emit a evaluation for every system job to ensure all applicable system jobs are running on the node. This patch adds an optimization to skip creating evaluations for system jobs not in the current node's DC. While the scheduler performs the same feasability check, skipping the creation of the evaluation altogether saves disk, network, and memory.	2022-07-06 14:35:18 -07:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	716b8e658b	api: Add support for filtering and pagination to the node list endpoint (#12727 )	2022-04-21 17:04:33 +02:00
Derek Strickland	d1d6009e2c	disconnected clients: Support operator manual interventions (#12436 ) * allocrunner: Remove Shutdown call in Reconnect * Node.UpdateAlloc: Stop orphaned allocs. * reconciler: Stop failed reconnects. * Apply feedback from code review. Handle rebase conflict. * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-06 09:33:32 -04:00
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
James Rasell	bb8514fc75	core: remove node service registrations when node is down. When a node fails its heart beating a number of actions are taken to ensure state is cleaned. Service registrations a loosely tied to nodes, therefore we should remove these from state when a node is considered terminally down.	2022-03-23 09:42:46 +01:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	2876739a51	api: apply consistent behaviour of the reverse query parameter (#12244 )	2022-03-11 19:44:52 -05:00
Michael Schurter	20bd8acf43	do not initialize copy's slice if nil in original	2021-12-23 16:40:35 -08:00
Mahmood Ali	84a3522133	Consider all system jobs for a new node (#11054 ) When a node becomes ready, create an eval for all system jobs across namespaces. The previous code uses `job.ID` to deduplicate evals, but that ignores the job namespace. Thus if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running in the new node. Thus, Nomad may skip running some system jobs in that node.	2021-08-18 09:50:37 -04:00
Mahmood Ali	ea003188fa	system: re-evaluate node on feasibility changes (#11007 ) Fix a bug where system jobs may fail to be placed on a node that initially was not eligible for system job placement. This changes causes the reschedule to re-evaluate the node if any attribute used in feasibility checks changes. Fixes https://github.com/hashicorp/nomad/issues/8448	2021-08-10 17:17:44 -04:00
Chris Baker	263ddd567c	Node Drain Metadata (#10250 )	2021-05-07 13:58:40 -04:00
Chris Baker	770c9cecb5	restored Node.Sanitize() for RPC endpoints multiple other updates from code review	2021-03-26 17:03:15 +00:00
Chris Baker	dd291e69f4	removed deprecated fields from Drain structs and API node drain: use msgtype on txn so that events are emitted wip: encoding extension to add Node.Drain field back to API responses new approach for hiding Node.SecretID in the API, using `json` tag documented this approach in the contributing guide refactored the JSON handlers with extensions modified event stream encoding to use the go-msgpack encoders with the extensions	2021-03-21 15:30:11 +00:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateay task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Michael Schurter	c2dd9bc996	core: open source namespaces	2020-10-22 15:26:32 -07:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Michael Schurter	dd09fa1a4a	Merge pull request #9055 from hashicorp/f-9017-resources api: add field filters to /v1/{allocations,nodes}	2020-10-14 14:49:39 -07:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event steam without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	9d48818eb8	writetxn can return error, add alloc and job generic events. Add events table for durability	2020-10-14 12:44:39 -04:00
Drew Bailey	400455d302	Events/eval alloc events (#9012 ) * generic eval update event first pass at alloc client update events * api/event client	2020-10-14 12:44:37 -04:00
Tim Gross	2854298089	csi: release claims via csi_hook postrun unpublish RPC (#8580 ) Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This may forward client RPCs to the node plugins or to the controller plugins, depending on whether other allocations on this node have claims on this volume. By making clients responsible for running the `CSIVolume.Unpublish` RPC (and making the RPC available to a `nomad volume detach` command), the volumewatcher becomes only used by the core GC job and we no longer need async volume GC from job deregister and node update.	2020-08-06 14:51:46 -04:00
Mahmood Ali	6dfe0f5d3b	tests: use t.Fatalf when it's clearer	2020-05-27 10:09:56 -04:00
Mahmood Ali	5d80d2a511	tests: eval may be processed quickly	2020-05-26 18:53:24 -04:00
Tim Gross	72430a4e62	csi: don't pass volume claim releases thru GC eval (#8021 ) Following the new volumewatcher in #7794 and performance improvements to it that landed afterwards, there's no particular reason we should be threading claim releases through the GC eval rather than writing an empty `CSIVolumeClaimRequest` with the mode set to `CSIVolumeClaimRelease`, just as the GC evaluation would do. Also, by batching up these raft messages, we can reduce the amount of raft writes by 1 and cross-server RPCs by 1 per volume we release claims on.	2020-05-20 15:22:51 -04:00
Seth Hoenig	f6c8db8a8a	consul/connect: use task kind to get service name Fixes #8000 When requesting a Service Identity token from Consul, use the TaskKind of the Task to get at the service name associated with the task. In the past using the TaskName worked because it was generated as a sidecar task with a name that included the service. In the Native context, we need to get at the service name in a more correct way, i.e. using the TaskKind which is defined to include the service name.	2020-05-18 13:46:00 -06:00
Mahmood Ali	64de395df0	tests: ease debugging TestClientEndpoint_CreateNodeEvals TestClientEndpoint_CreateNodeEvals flakes a bit but its output is very confusing, as `structs.Evaluations` overrides GoString. Here, we emit the entire struct of the evaluation, and hopefully we'll figure out the problem the next time it happens	2020-05-09 16:04:32 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794 ) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Tim Gross	083b35d651	csi: checkpoint volume claim garbage collection (#7782 ) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Tim Gross	027277a0d9	csi: make volume GC in job deregister safely async The `Job.Deregister` call will block on the client CSI controller RPCs while the alloc still exists on the Nomad client node. So we need to make the volume claim reaping async from the `Job.Deregister`. This allows `nomad job stop` to return immediately. In order to make this work, this changeset changes the volume GC so that the GC jobs are on a by-volume basis rather than a by-job basis; we won't have to query the (possibly deleted) job at the time of volume GC. We smuggle the volume ID and whether it's a purge into the GC eval ID the same way we smuggled the job ID previously.	2020-04-06 10:15:55 -04:00
Lang Martin	24449e23af	csi: volume validate namespace (#7587 ) * nomad/state/state_store: enforce that the volume namespace exists * nomad/csi_endpoint_test: a couple of broken namespaces now * nomad/csi_endpoint_test: one more test * nomad/node_endpoint_test: use structs.DefaultNamespace * nomad/state/state_store_test: use DefaultNamespace	2020-04-02 10:13:41 -04:00
Michael Schurter	464dae514c	test: assert HostVolumes included in ListNodes	2020-03-30 17:34:44 -07:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358 ) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125 ) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via `Node.UpdateAlloc RPC`. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00

1 2 3 4

189 Commits