open-nomad/nomad/structs
Mahmood Ali fbfe4ab1bd Atomic eval insertion with job (de-)registration
This fixes a bug where jobs may get "stuck" unprocessed; the bug
disproportionately affects periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but the eval application fails, the job may remain
unprocessed. Operators submitting a job update may detect such a failure
through a 500 error code and can retry; failures of periodic jobs are
more likely to go unnoticed, and no further periodic invocations will be
processed until an operator forces an evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

The same change is also applied to job deregistration to make it atomic.
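A minimal sketch of the idea, assuming heavily simplified, hypothetical stand-ins for Nomad's job, evaluation, request, and state store types (the names `ApplyJobRegister` and `StateStore` here are illustrative, not Nomad's actual API). The job and the eval travel together in one request, which represents a single raft log entry, and the apply step validates first and then writes both, so either both become visible or neither does. Deregistration would follow the same pattern.

```go
package main

import (
	"errors"
	"sync"
)

// Job and Evaluation are hypothetical, simplified stand-ins for Nomad's types.
type Job struct {
	ID          string
	ModifyIndex uint64
}

type Evaluation struct {
	ID             string
	JobID          string
	CreateIndex    uint64
	JobModifyIndex uint64
}

// JobRegisterRequest bundles the job with the eval that processes it, so both
// are carried by a single raft log entry. Eval may be nil for legacy entries.
type JobRegisterRequest struct {
	Job  *Job
	Eval *Evaluation
}

// StateStore is a toy in-memory store; a real FSM would use a transactional
// store rather than plain maps.
type StateStore struct {
	mu    sync.Mutex
	jobs  map[string]*Job
	evals map[string]*Evaluation
}

func NewStateStore() *StateStore {
	return &StateStore{jobs: map[string]*Job{}, evals: map[string]*Evaluation{}}
}

// ApplyJobRegister applies one log entry at the given raft index. All
// validation happens before any write, and both writes happen under one
// lock, so the job and its eval are persisted together or not at all.
func (s *StateStore) ApplyJobRegister(index uint64, req *JobRegisterRequest) error {
	if req.Job == nil {
		return errors.New("missing job")
	}
	if req.Eval != nil && req.Eval.JobID != req.Job.ID {
		return errors.New("eval does not belong to job")
	}

	s.mu.Lock()
	defer s.mu.Unlock()

	job := *req.Job
	job.ModifyIndex = index
	s.jobs[job.ID] = &job

	if req.Eval != nil {
		eval := *req.Eval
		// Both indexes point at the same log entry as the job registration.
		eval.CreateIndex = index
		eval.JobModifyIndex = index
		s.evals[eval.ID] = &eval
	}
	return nil
}
```

Because there is no window between the two writes, a leadership transition can no longer leave a registered job without its eval.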

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster that processes old log entries written in legacy or mixed-cluster mode.

To handle these constraints, the leader continues to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots:
the Eval in job (de-)registration and the eval update entries.
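The leader-side rule can be sketched as a version gate. This is a minimal sketch under stated assumptions: the `Version` type, the `minAtomicEvalVersion` threshold, and the `LogEntry` kinds are hypothetical placeholders, not Nomad's real version check or release number. The leader always embeds the eval in the registration entry and, while any server is older than the threshold, also emits the legacy standalone eval update entry.

```go
package main

// Version is a hypothetical, simplified server version.
type Version struct{ Major, Minor, Patch int }

// AtLeast reports whether v >= o.
func (v Version) AtLeast(o Version) bool {
	if v.Major != o.Major {
		return v.Major > o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor > o.Minor
	}
	return v.Patch >= o.Patch
}

// minAtomicEvalVersion is a hypothetical placeholder for the first release
// whose FSM understands evals embedded in job registration entries.
var minAtomicEvalVersion = Version{0, 12, 1}

// LogEntry is a hypothetical summary of a raft entry the leader emits.
type LogEntry struct {
	Kind   string // "job-register" or "eval-update"
	JobID  string
	EvalID string
}

// planRegistrationEntries returns the entries the leader would append for a
// job registration given the versions of all servers in the cluster.
func planRegistrationEntries(servers []Version, jobID, evalID string) []LogEntry {
	// The eval is always embedded, so upgraded servers apply it atomically.
	entries := []LogEntry{{Kind: "job-register", JobID: jobID, EvalID: evalID}}
	for _, v := range servers {
		if !v.AtLeast(minAtomicEvalVersion) {
			// An old server ignores the embedded eval, so keep emitting the
			// legacy standalone eval update entry until everyone upgrades.
			entries = append(entries, LogEntry{Kind: "eval-update", JobID: jobID, EvalID: evalID})
			break
		}
	}
	return entries
}
```

In a mixed cluster, upgraded servers therefore see the same eval twice, which is why they must tolerate the duplicate.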

When an updated server sees mixed-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.
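One way an updated server could ignore the second copy, sketched under an explicit assumption: here a duplicate is an incoming eval whose ID and status match an already-stored eval. That rule and the `EvalStore`/`ApplyEvalUpdate` names are hypothetical illustrations, not Nomad's actual FSM logic. Genuine status changes are still applied, and a duplicate leaves `CreateIndex` pointing at the job registration entry.

```go
package main

// Evaluation is a hypothetical, simplified stand-in for Nomad's eval type.
type Evaluation struct {
	ID          string
	JobID       string
	Status      string
	CreateIndex uint64
	ModifyIndex uint64
}

// EvalStore is a toy in-memory eval table.
type EvalStore struct {
	evals map[string]*Evaluation
}

func NewEvalStore() *EvalStore {
	return &EvalStore{evals: map[string]*Evaluation{}}
}

// ApplyEvalUpdate applies an eval at the given raft index and reports whether
// it changed the store. An eval with the same ID and status as a stored one
// (for example, the legacy eval update entry repeating an eval that already
// arrived embedded in the job registration) is ignored as a duplicate.
func (s *EvalStore) ApplyEvalUpdate(index uint64, eval Evaluation) bool {
	if existing, ok := s.evals[eval.ID]; ok {
		if existing.Status == eval.Status {
			return false // second instance of the same eval: ignore it
		}
		eval.CreateIndex = existing.CreateIndex // keep the original creation index
	} else {
		eval.CreateIndex = index
	}
	eval.ModifyIndex = index
	s.evals[eval.ID] = &eval
	return true
}
```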

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value. The leader and updated
servers will report the job registration index, while old servers will
report the index of the eval update log entry. This discrepancy doesn't
seem to be material; it's the eval.JobModifyIndex that matters.
2020-07-14 11:59:29 -04:00
config
batch_future.go
batch_future_test.go
bitmap.go
bitmap_test.go
csi.go csi: add -force flag to volume deregister (#8295) 2020-07-01 12:17:51 -04:00
csi_test.go
devices.go
devices_test.go
diff.go fix swapped old/new multiregion plan diffs (#8378) 2020-07-08 10:10:50 -04:00
diff_test.go fix swapped old/new multiregion plan diffs (#8378) 2020-07-08 10:10:50 -04:00
errors.go
errors_test.go
funcs.go
funcs_test.go ar: support opting into binding host ports to default network IP (#8321) 2020-07-06 18:51:46 -04:00
generate.sh
network.go ar: support opting into binding host ports to default network IP (#8321) 2020-07-06 18:51:46 -04:00
network_test.go
node.go
node_class.go
node_class_test.go
node_test.go
operator.go
service_identities.go
services.go consul/connect: infer task name in service if possible 2020-07-08 13:31:44 -05:00
services_test.go
streaming_rpc.go
structs.go Atomic eval insertion with job (de-)registration 2020-07-14 11:59:29 -04:00
structs_codegen.go
structs_periodic_test.go
structs_test.go MRD: all regions should start pending (#8433) 2020-07-14 10:57:37 -04:00
testing.go
volumes.go