open-nomad/nomad
Mahmood Ali 33dfe98770
deployment watcher: Reuse allocsCh if allocIndex remains the same (#10756)
Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

## Background

When operators submit a new or updated service job, Nomad create a new deployment by default. The deployment object controls how fast to place the allocations through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and health checks configurations.

The `scheduler` and `deploymentwatcher` package collaborate to achieve deployment logic: The scheduler only places the canaries and `max_parallel` allocations for a new deployment; the `deploymentwatcher` monitors for alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and places the next `max_parallel` round of allocations.

The `deploymentwatcher` package makes blocking queries against the state store, to fetch all deployments and the relevant allocs for each running deployments. If `deploymentwatcher` fails or is hindered from fetching the state, the deployments fail to make progress.

`Deploymentwatcher` logic only runs on the leader.

## Why unnecessary deployment watchers can halt cluster progress
Previously, `getAllocsCh` is called on every for loop iteration in `deploymentWatcher.watch()` function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, *all* `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes:

1. it spikes the CPU/Memory usage resulting potentially leading to OOM or very slow processing
2. it activates the [query rate limiter](abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L896-L898)), so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and free up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remaining under the rate limiter.  Well, until a spike of deployments are created triggering the condition again.

### Relevant Code References

Path for deployment monitoring:
* [`Watcher.watchDeployments`](abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L164-L192)) loops waiting for deployment updates.
* On every deployment update, [`w.getDeploys`](abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L194-L229)) returns all deployments in the system
* `watchDeployments` calls `w.add(d)` on every active deployment
* which in turns, [updates existing watcher if one is found](abaa9c5c5b/nomad/deploymentwatcher/deployments_watcher.go (L251-L255)).
* The deployment watcher [updates local local deployment field and trigger `deploymentUpdateCh` channel]( abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L136-L147))
* The [deployment watcher `deploymentUpdateCh` selector is activated](abaa9c5c5b/nomad/deploymentwatcher/deployment_watcher.go (L455-L489)). Most of the time the selector clause is a no-op, because the flow was triggered due to another deployment update
* The `watch` for-loop iterates again and in the previous code we create yet another goroutine and blocking call that risks being rate limited.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-06-14 16:01:01 -04:00
..
deploymentwatcher deployment watcher: Reuse allocsCh if allocIndex remains the same (#10756) 2021-06-14 16:01:01 -04:00
drainer Node Drain Metadata (#10250) 2021-05-07 13:58:40 -04:00
mock consul/connect: add support for connect mesh gateways 2021-06-04 08:24:49 -05:00
state tests: use standard library testing.TB 2021-06-09 16:18:45 -07:00
stream events: fix slow client connection to empty event stream (#10637) 2021-05-21 13:17:07 -04:00
structs remove generated files 2021-06-10 08:04:25 -04:00
volumewatcher volumewatcher: fix test data race. 2021-06-14 12:11:35 +02:00
acl.go Audit config, seams for enterprise audit features 2020-03-23 13:47:42 -04:00
acl_endpoint.go one-time token: never return expired tokens 2021-03-10 08:17:56 -05:00
acl_endpoint_test.go one-time token: never return expired tokens 2021-03-10 08:17:56 -05:00
acl_test.go Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447) 2020-12-01 11:11:34 -05:00
alloc_endpoint.go updated Allocation.List to properly handle ACL checking for namespace=* 2020-11-05 17:26:33 +00:00
alloc_endpoint_test.go documenting test for #9268 2020-11-05 16:19:55 +00:00
autopilot.go implement MinQuorum 2020-02-16 16:04:59 -06:00
autopilot_test.go tests: deflake TestMonitor_Monitor_RemoteServer and cross-region tests 2021-06-10 21:27:55 -04:00
blocked_evals.go Add metrics for blocked eval resources (#10454) 2021-04-29 15:03:45 -04:00
blocked_evals_stats.go Add metrics for blocked eval resources (#10454) 2021-04-29 15:03:45 -04:00
blocked_evals_stats_test.go Add metrics for blocked eval resources (#10454) 2021-04-29 15:03:45 -04:00
blocked_evals_system.go blocked_evals system evals indexed by job and node 2019-07-18 10:32:12 -04:00
blocked_evals_test.go Add metrics for blocked eval resources (#10454) 2021-04-29 15:03:45 -04:00
client_agent_endpoint.go json handles were moved to a new package in #10202 2021-04-02 13:31:10 +00:00
client_agent_endpoint_test.go tests: deflake TestAgentProfile_RemoteClient 2021-06-10 22:00:15 -04:00
client_alloc_endpoint.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
client_alloc_endpoint_test.go tests: remove duplicate import statements. 2021-06-11 09:39:22 +02:00
client_csi_endpoint.go CSI: volume snapshot 2021-04-01 11:16:52 -04:00
client_csi_endpoint_test.go CSI: volume snapshot 2021-04-01 11:16:52 -04:00
client_fs_endpoint.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
client_fs_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
client_rpc.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
client_rpc_test.go Simplify Bootstrap logic in tests 2020-03-02 13:47:43 -05:00
client_stats_endpoint.go
client_stats_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
config.go deployment query rate limit (#10706) 2021-06-04 12:38:46 -07:00
consul.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
consul_oss_test.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
consul_policy.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
consul_policy_oss_test.go consul: move consul acl tests into ent files 2021-06-09 08:38:42 -05:00
consul_policy_test.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
consul_test.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
core_sched.go RPC endpoints to support 'nomad ui -login' 2021-03-10 08:17:56 -05:00
core_sched_test.go CSI: capability block is required for volume registration 2021-04-08 13:02:24 -04:00
csi_endpoint.go csi: accept list of caps during validation in volume register 2021-06-04 07:57:26 -04:00
csi_endpoint_test.go CSI: capability block is required for volume registration 2021-04-08 13:02:24 -04:00
deployment_endpoint.go api: add field filters to /v1/{allocations,nodes} 2020-10-14 10:35:22 -07:00
deployment_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
deployment_watcher_shims.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
drainer_int_test.go Migrate all allocs when draining a node (#10411) 2021-04-21 12:11:14 -04:00
drainer_shims.go
endpoints_oss.go include pro tag in serveral oss.go files 2020-02-10 15:56:14 -05:00
eval_broker.go
eval_broker_test.go nomad: TestEvalBroker_Dequeue_Empty_Timeout() proper goroutine error handling (#6657) 2019-11-08 14:35:06 -05:00
eval_endpoint.go Fix some errcheck errors (#9811) 2021-01-14 12:46:35 -08:00
eval_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
event_endpoint.go Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447) 2020-12-01 11:11:34 -05:00
event_endpoint_test.go events: fix event endpoint tests to ignore heartbeats. 2021-05-24 10:28:19 +02:00
fsm.go Node Drain Metadata (#10250) 2021-05-07 13:58:40 -04:00
fsm_not_ent.go
fsm_registry_oss.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
fsm_test.go Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
heartbeat.go
heartbeat_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
job_endpoint.go Allow configuring memory oversubscription (#10466) 2021-04-29 22:09:56 -04:00
job_endpoint_hook_connect.go consul/connect: remove unnecessary connect constraint on clients 2021-06-14 08:01:45 -05:00
job_endpoint_hook_connect_test.go consul/connect: remove unnecessary connect constraint on clients 2021-06-14 08:01:45 -05:00
job_endpoint_hook_expose_check.go connect: use deterministic injected dynamic exposed port 2021-04-30 15:18:22 -06:00
job_endpoint_hook_expose_check_test.go connect: use deterministic injected dynamic exposed port 2021-04-30 15:18:22 -06:00
job_endpoint_hooks.go Allow configuring memory oversubscription (#10466) 2021-04-29 22:09:56 -04:00
job_endpoint_oss.go Allow job Version to start at non-zero value (#9071) 2020-10-12 13:59:48 -04:00
job_endpoint_oss_test.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
job_endpoint_test.go consul: correctly check consul acl token namespace when using consul oss 2021-06-08 13:55:57 -05:00
leader.go leader: call eval log formatting lazily 2021-06-02 09:59:55 -04:00
leader_oss.go include pro tag in serveral oss.go files 2020-02-10 15:56:14 -05:00
leader_test.go deflake TestNomad_BootstrapExpect and other leader tests 2021-06-10 22:04:10 -04:00
merge.go
namespace_endpoint.go Fix some errcheck errors (#9811) 2021-01-14 12:46:35 -08:00
namespace_endpoint_test.go core: open source namespaces 2020-10-22 15:26:32 -07:00
node_endpoint.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
node_endpoint_test.go Node Drain Metadata (#10250) 2021-05-07 13:58:40 -04:00
operator_endpoint.go minor tweaks from Ent 2020-07-20 09:25:09 -04:00
operator_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
periodic.go periodic: always reset periodic children status 2021-03-25 11:27:09 -04:00
periodic_endpoint.go dispatch-job capability to dispatch periodic jobs 2020-10-27 16:33:01 -04:00
periodic_endpoint_test.go dispatch-job capability to dispatch periodic jobs 2020-10-27 16:33:01 -04:00
periodic_test.go periodic: always reset periodic children status 2021-03-25 11:27:09 -04:00
plan_apply.go plan applier: add trace-level log of plan 2021-06-02 10:25:23 -04:00
plan_apply_not_ent.go
plan_apply_pool.go
plan_apply_pool_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
plan_apply_test.go reworked Node.Canonicalize() to enforce invariants, fixed a broken test 2021-03-26 18:58:38 +00:00
plan_endpoint.go
plan_endpoint_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
plan_normalization_test.go api: implement fuzzy search API 2021-04-16 16:36:07 -06:00
plan_queue.go
plan_queue_test.go nomad: fix test goroutine (#6593) 2019-10-31 08:23:32 -04:00
raft_rpc.go
regions_endpoint.go
regions_endpoint_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
rpc.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
rpc_test.go use os.ErrDeadlineExceeded in tests 2020-12-07 10:40:28 -05:00
scaling_endpoint.go fix #9227: use both job and type query on scaling policy list endpoint 2020-11-10 23:26:35 +00:00
scaling_endpoint_test.go simple test to ensure that scaling endpoint methods support IsRead for 2021-01-05 13:42:18 +00:00
search_endpoint.go api: include ent fuzzy struct types in oss 2021-04-20 11:19:38 -06:00
search_endpoint_oss.go api: implement fuzzy search API 2021-04-16 16:36:07 -06:00
search_endpoint_test.go api: fuzzy search results include job name with id in scope 2021-04-16 17:03:36 -06:00
serf.go tweak bootstrap testing 2021-01-04 09:00:40 -05:00
serf_test.go deflake TestNomad_BootstrapExpect and other leader tests 2021-06-10 22:04:10 -04:00
server.go deployment query rate limit (#10706) 2021-06-04 12:38:46 -07:00
server_setup_oss.go licensing: remove raft storage and sync 2021-04-28 10:28:23 -04:00
server_test.go Simplify Bootstrap logic in tests 2020-03-02 13:47:43 -05:00
stats_fetcher.go
stats_fetcher_test.go Simplify Bootstrap logic in tests 2020-03-02 13:47:43 -05:00
status_endpoint.go
status_endpoint_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
system_endpoint.go
system_endpoint_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
testing.go tests: deflake TestMonitor_Monitor_RemoteServer and cross-region tests 2021-06-10 21:27:55 -04:00
testing_oss.go licensing: remove raft storage and sync 2021-04-28 10:28:23 -04:00
timetable.go vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:45:21 -04:00
timetable_test.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
util.go csi: client RPCs should return wrapped errors for checking (#8605) 2020-08-07 11:01:36 -04:00
util_test.go remove unused dropButLastChannel 2020-02-13 18:56:53 -05:00
vault.go Fix some errcheck errors (#9811) 2021-01-14 12:46:35 -08:00
vault_test.go Merge pull request #8524 from hashicorp/b-vault-health-checks 2020-08-11 16:01:07 -04:00
vault_testing.go on leadership establishment, revoke Vault tokens in background 2020-05-21 07:38:27 -04:00
worker.go add create and modify timestamps to evaluations (#5881) 2019-08-07 09:50:35 -07:00
worker_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00