open-nomad/client
Tim Gross 9ed75e1f72
client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
  state. The allocrunner may queue up updates even if we gate sending them, so
  we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or duplicate
  update.
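
The first factor, gating on first contact, can be illustrated with a minimal sketch. This is not the code from this changeset: `gate`, `syncOnce`, and the `send` callback are hypothetical names, and the real `allocSync` loop is more involved. The sketch only shows the idea of holding queued updates until the client has heard from a server at least once.

```go
package main

import (
	"fmt"
	"sync"
)

// gate is a hypothetical one-shot latch that opens the first time the
// client has successfully contacted the servers.
type gate struct {
	once sync.Once
	ch   chan struct{}
}

func newGate() *gate { return &gate{ch: make(chan struct{})} }

// open marks first contact; calling it more than once is safe.
func (g *gate) open() { g.once.Do(func() { close(g.ch) }) }

// isOpen reports whether first contact has happened, without blocking.
func (g *gate) isOpen() bool {
	select {
	case <-g.ch:
		return true
	default:
		return false
	}
}

// syncOnce sends the pending updates only when the gate is open. It returns
// the updates that are still queued after the call.
func syncOnce(g *gate, pending []string, send func([]string)) []string {
	if !g.isOpen() {
		return pending // keep everything queued until first contact
	}
	if len(pending) > 0 {
		send(pending)
	}
	return nil
}

func main() {
	g := newGate()
	sent := 0
	send := func(batch []string) { sent += len(batch) }

	queued := syncOnce(g, []string{"alloc-1"}, send)
	fmt.Println(len(queued), sent) // prints "1 0": nothing sent before first contact

	g.open()
	queued = syncOnce(g, queued, send)
	fmt.Println(len(queued), sent) // prints "0 1": the queued update is sent
}
```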

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.
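
As a rough illustration of the de-duplication described above, the sketch below keeps the last acknowledged state per allocation, skips updates that are equal to it, and records a batch as acknowledged only after a successful send. All names here (`allocState`, `tracker`, `filter`, `acknowledge`) are hypothetical and simplified; the actual changeset tracks this state in the allocrunner and persists it to the client state DB, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"reflect"
)

// allocState is a hypothetical, simplified stand-in for the client-side
// state reported to the servers for one allocation.
type allocState struct {
	ClientStatus string
	TaskStates   map[string]string
}

// tracker remembers the last state the servers acknowledged, keyed by
// allocation ID.
type tracker struct {
	acked map[string]allocState
}

func newTracker() *tracker {
	return &tracker{acked: map[string]allocState{}}
}

// filter returns only the pending updates that differ from the last
// acknowledged state, so duplicates never enter the batch.
func (t *tracker) filter(pending map[string]allocState) map[string]allocState {
	batch := map[string]allocState{}
	for id, st := range pending {
		if last, ok := t.acked[id]; ok && reflect.DeepEqual(last, st) {
			continue // the servers have already seen this exact state
		}
		batch[id] = st
	}
	return batch
}

// acknowledge records a batch as seen by the servers. Call it only after
// the update RPC has returned successfully.
func (t *tracker) acknowledge(batch map[string]allocState) {
	for id, st := range batch {
		t.acked[id] = st
	}
}

func main() {
	tr := newTracker()
	pending := map[string]allocState{"alloc-1": {ClientStatus: "running"}}

	first := tr.filter(pending) // nothing acknowledged yet, so it is sent
	tr.acknowledge(first)
	second := tr.filter(pending) // identical state, so it is dropped

	fmt.Println(len(first), len(second)) // prints "1 0"
}
```

Using a deep equality check keeps the sketch short; a hand-written equality method on the state type would avoid reflection and make it explicit which fields count as a meaningful change.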

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
2023-05-11 09:05:24 -04:00
allocdir users: eliminate nobody user memoization (#16904) 2023-04-17 12:30:30 -05:00
allochealth [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
allocrunner client: de-duplicate alloc updates and gate during restore (#17074) 2023-05-11 09:05:24 -04:00
allocwatcher [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
config full task cleanup when alloc prerun hook fails (#17104) 2023-05-08 13:17:10 -05:00
consul [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
devicemanager [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
dynamicplugins Merge pull request #16836 from hashicorp/compliance/add-headers 2023-04-10 16:32:03 -07:00
fingerprint cni: fix plugin fingerprinting versions (#16776) 2023-04-20 18:44:39 -07:00
interfaces [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
lib [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
logmon [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
pluginmanager Merge pull request #16836 from hashicorp/compliance/add-headers 2023-04-10 16:32:03 -07:00
servers [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
serviceregistration [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
state client: de-duplicate alloc updates and gate during restore (#17074) 2023-05-11 09:05:24 -04:00
stats [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
structs Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
taskenv fix host port handling for ipv6 (#16723) 2023-04-20 19:53:20 -07:00
testutil Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
vaultclient [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
acl.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
acl_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
agent_endpoint.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
agent_endpoint_test.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
alloc_endpoint.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
alloc_endpoint_test.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
alloc_watcher_e2e_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
client.go client: de-duplicate alloc updates and gate during restore (#17074) 2023-05-11 09:05:24 -04:00
client_stats_endpoint.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
client_stats_endpoint_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
client_test.go client: de-duplicate alloc updates and gate during restore (#17074) 2023-05-11 09:05:24 -04:00
csi_endpoint.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
csi_endpoint_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
drain.go client: allow `drain_on_shutdown` configuration (#16827) 2023-04-14 15:35:32 -04:00
drain_test.go client: allow `drain_on_shutdown` configuration (#16827) 2023-04-14 15:35:32 -04:00
driver_manager_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
enterprise_client_oss.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
fingerprint_manager.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
fingerprint_manager_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
fs_endpoint.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
fs_endpoint_test.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
gc.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
gc_test.go api: enable support for setting original job source (#16763) 2023-04-11 08:45:08 -05:00
heartbeatstop.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
heartbeatstop_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
meta_endpoint.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
meta_endpoint_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
node_updater.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
rpc.go Revert "hashicorp/go-msgpack v2 (#16810)" (#17047) 2023-05-01 17:18:34 -04:00
rpc_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
testing.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
util.go client: de-duplicate alloc updates and gate during restore (#17074) 2023-05-11 09:05:24 -04:00