open-nomad/client
Tim Gross a6652bffad
CSI: reorder controller volume detachment (#12387)
In #12112 and #12113 we solved the problem of races in releasing
volume claims, but there was a case that we missed. During a node
drain with a controller attach/detach, we can hit a race where we call
controller publish before the unpublish has completed. The spec
discourages this, but plugins are supposed to handle it safely.
However, if the storage provider's API is slow enough and the
plugin doesn't handle the case safely, the volume can get "locked"
into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC
calls so that Nomad is responsible for the canonical information about
whether a volume is currently claimed.

This has a couple of side effects that also had to be fixed here:

* Changing the order means that the volume will have a past claim
  without a valid external node ID because it came from the client, and
  this uncovered a separate bug where we didn't assert the external node
ID was valid before returning it. In this case, fall through to getting
the ID from the plugins in the state store. We avoided this
  originally because of concerns around plugins getting lost during node
  drain but now that we've fixed that we may want to revisit it in
  future work.
* We should make sure we're handling `FailedPrecondition` cases from
  the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they assumed failures happen
  in a particular order that no longer holds.
2022-03-29 09:44:00 -04:00
allocdir ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
allochealth Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
allocrunner CSI: reorder controller volume detachment (#12387) 2022-03-29 09:44:00 -04:00
allocwatcher ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
config Merge branch 'main' into f-1.3-boogie-nights 2022-03-25 16:40:32 +01:00
consul Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
devicemanager ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
dynamicplugins ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
fingerprint Merge pull request #12368 from hashicorp/f-1.3-boogie-nights 2022-03-25 18:04:47 +01:00
interfaces replace 'a alloc' with 'an alloc' where appropriate (#11792) 2022-01-10 11:59:46 -05:00
lib client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
logmon ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
pluginmanager ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
servers ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
serviceregistration client: modify service wrapper to accommodate restore behaviour. 2022-03-21 09:49:39 +01:00
state Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
stats ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
structs ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
taskenv ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
testutil client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
vaultclient ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
acl.go Audit config, seams for enterprise audit features 2020-03-23 13:47:42 -04:00
acl_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
agent_endpoint.go json handles were moved to a new package in #10202 2021-04-02 13:31:10 +00:00
agent_endpoint_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
alloc_endpoint.go client: fix multiple imports (#10537) 2021-05-13 14:30:31 -04:00
alloc_endpoint_test.go client: enable support for cgroups v2 2022-03-23 11:35:27 -05:00
alloc_watcher_e2e_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
client.go Merge branch 'main' into f-1.3-boogie-nights 2022-03-25 16:40:32 +01:00
client_stats_endpoint.go Server side impl + touch ups 2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
client_test.go client: enable support for cgroups v2 2022-03-23 11:35:27 -05:00
csi_endpoint.go CSI: allow updates to volumes on re-registration (#12167) 2022-03-07 11:06:59 -05:00
csi_endpoint_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
driver_manager_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
enterprise_client_oss.go gofmt all the files 2021-10-01 10:14:28 -04:00
fingerprint_manager.go chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
fingerprint_manager_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
fs_endpoint.go Fix log streaming missing frames (#11721) 2022-01-04 14:07:16 -05:00
fs_endpoint_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
gc.go chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
gc_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
heartbeatstop.go Delayed evaluations for stop_after_client_disconnect can cause unwanted extra followup evaluations around job garbage collection (#8099) 2020-06-03 09:48:38 -04:00
heartbeatstop_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
node_updater.go client: use NewNodeEvent builder for consistency (#7559) 2020-03-31 10:02:16 -04:00
rpc.go core: remove all traces of unused protocol version 2022-02-18 16:12:36 -08:00
rpc_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
testing.go client: refactor common service registration objects from Consul. 2022-03-15 09:38:30 +01:00
util.go Revert "client: defensive against getting stale alloc updates" 2020-06-19 15:39:44 -04:00