open-nomad/nomad/structs
Luiz Aoqui b656981cf0
Track plan rejection history and automatically mark clients as ineligible (#13421)
Plan rejections occur when the scheduler worker and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.

As the final plan reaches the leader plan applier, it may no longer be
valid because a concurrent scheduling operation has taken the intended
resources. In these situations the plan applier will notify the worker
that the plan was rejected and that it should refresh its state before
trying again.

In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if it is always
rejected.

While the root cause is still unknown, this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.

To avoid marking several nodes as ineligible at once, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
2022-07-12 18:40:20 -04:00
..
config Merge pull request #13109 from hashicorp/merge-release-1.3.1-branch 2022-05-25 10:45:09 -04:00
alloc.go client: fixed a problem calculating a service namespace. (#13493) 2022-06-28 09:47:28 +02:00
alloc_test.go client: fixed a problem calculating a service namespace. (#13493) 2022-06-28 09:47:28 +02:00
batch_future.go
batch_future_test.go
bitmap.go
bitmap_test.go
connect.go
connect_test.go
consul.go
consul_oss.go
consul_oss_test.go
consul_test.go
csi.go CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340) 2022-06-14 10:04:16 -04:00
csi_test.go
devices.go
devices_test.go
diff.go adding support for customized ingress tls (#13184) 2022-06-02 18:43:58 -04:00
diff_test.go adding support for customized ingress tls (#13184) 2022-06-02 18:43:58 -04:00
encoding.go
errors.go api: enable selecting subset of services using rendezvous hashing 2022-06-25 10:37:37 -05:00
errors_test.go
eval.go core: allow deleting of evaluations (#13492) 2022-07-06 16:30:11 +02:00
event.go
extensions.go remove end-user algorithm selection (#13190) 2022-07-11 13:34:04 -04:00
funcs.go core: merge reserved_ports into host_networks (#13651) 2022-07-12 14:40:25 -07:00
funcs_test.go vault: revert support for entity aliases (#12723) 2022-04-22 10:46:34 -04:00
generate.sh workload identity (#13223) 2022-07-11 13:34:05 -04:00
handlers.go
job.go job_hooks: add implicit constraint when using Consul for services. (#12602) 2022-04-20 14:09:13 +02:00
job_test.go job_hooks: add implicit constraint when using Consul for services. (#12602) 2022-04-20 14:09:13 +02:00
network.go core: merge reserved_ports into host_networks (#13651) 2022-07-12 14:40:25 -07:00
network_test.go core: merge reserved_ports into host_networks (#13651) 2022-07-12 14:40:25 -07:00
node.go
node_class.go
node_class_test.go
node_test.go
operator.go core: allow pausing and un-pausing of leader broker routine (#13045) 2022-07-06 16:13:48 +02:00
search.go Implement HTTP search API for Variables (#13257) 2022-07-11 13:34:05 -04:00
secure_variables.go SV: fixes for namespace handling (#13705) 2022-07-12 11:15:57 -04:00
secure_variables_test.go SV: CAS: Implement Check and Set for Delete and Upsert (#13429) 2022-07-11 13:34:06 -04:00
service_identities.go
service_registration.go api: enable selecting subset of services using rendezvous hashing 2022-06-25 10:37:37 -05:00
service_registration_test.go api: enable selecting subset of services using rendezvous hashing 2022-06-25 10:37:37 -05:00
services.go adding support for customized ingress tls (#13184) 2022-06-02 18:43:58 -04:00
services_test.go docs: add docs and tests for tagged_addresses 2022-05-31 13:02:48 -05:00
streaming_rpc.go
structs.go Track plan rejection history and automatically mark clients as ineligible (#13421) 2022-07-12 18:40:20 -04:00
structs_codegen.go
structs_oss.go
structs_periodic_test.go
structs_test.go client: enforce max_kill_timeout client configuration 2022-07-06 15:29:38 -05:00
testing.go
uuid.go core: allow deleting of evaluations (#13492) 2022-07-06 16:30:11 +02:00
vault.go vault: revert support for entity aliases (#12723) 2022-04-22 10:46:34 -04:00
volume_test.go
volumes.go