Commit Graph

62 Commits

Author SHA1 Message Date
Michael Schurter ace5faf948
core: backoff considerably when worker is behind raft (#15523)
Upon dequeuing an evaluation workers snapshot their state store at the
eval's wait index or later. This ensures we process an eval at a point
in time after it was created or updated. Processing an eval on an old
snapshot could cause any number of problems such as:

1. Since job registration atomically updates an eval and job in a single
   raft entry, scheduling against indexes before that may not have the
   eval's job or may have an older version.
2. The older the scheduler's snapshot, the higher the likelihood
   something has changed in the cluster state which will cause the plan
   applier to reject the scheduler's plan. This could waste work or
   even cause eval's to be failed needlessly.

However, the workers run in parallel with a new server pulling the
cluster state from a peer. During this time, which may be many minutes
long, the state store is likely far behind the minimum index required
to process evaluations.

This PR addresses this by adding an additional long backoff period after
an eval is nacked. If the scheduler's indexes catches up within the
additional backoff, it will unblock early to dequeue the next eval.

When the server shuts down we'll get a `context.Canceled` error from the state
store method. We need to bubble this error up so that other callers can detect
it. Handle this case separately when waiting after dequeue so that we can warn
on shutdown instead of throwing an ambiguous error message with just the text
"canceled."

While there may be more precise ways to block scheduling until the
server catches up, this approach adds little risk and covers additional
cases where a server may be temporarily behind due to a spike in load or
a saturated network.

For testing, we make the `raftSyncLimit` into a parameter on the worker's `run` method 
so that we can run backoff tests without waiting 30+ seconds. We haven't followed thru
and made all the worker globals into worker parameters, because there isn't much
use outside of testing, but we can consider that in the future.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-01-24 08:56:35 -05:00
Tim Gross 3c78980b78
make version checks specific to region (1.4.x) (#14912)
* One-time tokens are not replicated between regions, so we don't want to enforce
  that the version check across all of serf, just members in the same region.
* Scheduler: Disconnected clients handling is specific to a single region, so we
  don't want to enforce that the version check across all of serf, just members in
  the same region.
* Variables: enforce version check in Apply RPC
* Cleans up a bunch of legacy checks.

This changeset is specific to 1.4.x and the changes for previous versions of
Nomad will be manually backported in a separate PR.
2022-10-17 16:23:51 -04:00
Derek Strickland fd04a24ac7 disconnected clients: ensure servers meet minimum required version (#12202)
* planner: expose ServerMeetsMinimumVersion via Planner interface
* filterByTainted: add flag indicating disconnect support
* allocReconciler: accept and pass disconnect support flag
* tests: update dependent tests
2022-04-05 17:12:23 -04:00
Luiz Aoqui b1753d0568
scheduler: detect and log unexpected scheduling collisions (#11793) 2022-01-14 20:09:14 -05:00
Charlie Voiselle 98a240cd99
Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
Charlie Voiselle 176de1bfe6
Refactor sendAck(3) into sendAck(2),sendNack(2),sendAcknowledgement(3) (#11506) 2021-11-17 10:49:55 -05:00
Jasmine Dahilig 8d980edd2e
add create and modify timestamps to evaluations (#5881) 2019-08-07 09:50:35 -07:00
Lang Martin 8f7a20839e worker comment system -> core 2019-07-18 10:32:13 -04:00
Michael Schurter e4bc943a68 nomad: SnapshotAfter -> SnapshotMinIndex
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).

SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
2019-06-24 12:16:46 -07:00
Michael Schurter e10fea1d7a nomad: include snapshot index when submitting plans
Plan application should use a state snapshot at or after the Raft index
at which the plan was created otherwise it risks being rejected based on
stale data.

This commit adds a Plan.SnapshotIndex which is set by workers when
submitting plan. SnapshotIndex is set to the Raft index of the snapshot
the worker used to generate the plan.

Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex.
While RefreshIndex informs workers their StateStore is behind the
leader's, SnapshotIndex is a way to prevent the leader from using a
StateStore behind the worker's.

Plan.SnapshotIndex should be considered the *lower bound* index for
consistently handling plan application.

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans *after* the
first plan after a leader election.

The Raft barrier on leader election ensures the leader's statestore has
caught up to the log index at which it was elected. This guarantees its
StateStore is at an index > lastPlanIndex.
2019-06-24 12:16:46 -07:00
Michael Schurter 0e39927782 nomad: emit more detailed error
Avoid returning context.DeadlineExceeded as it lacks helpful information
and is often ignored or handled specially by callers.
2019-05-17 14:37:42 -07:00
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar 6d8bb3a7bd Duplicate blocked evals cancelling improved
The old logic for cancelling duplicate blocked evaluations by job id had
the issue where the newer evaluation could have additional node classes
that it is (in)eligible for that we would not capture. This could make
it such that cluster state could change such that the job would make
progress but no evaluation was unblocked.
2018-11-07 10:08:23 -08:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommited if the terminal allocations resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 405dab2253 integration test and basic fixes 2018-03-21 16:51:44 -07:00
Josh Soref b56ebdf350 spelling: invoke 2018-03-11 18:00:32 +00:00
Josh Soref 3232f6f27e spelling: abandoned 2018-03-11 17:34:16 +00:00
Alex Dadgar 6911bd7676 Worker waits til max ModifyIndex across EvalsByJob
This PR fixes a scheduling race condition in which the plan results from
one invocation of the scheduler were not being considered by the next
since the Worker was not waiting for the correct index.

Fixes https://github.com/hashicorp/nomad/issues/3198
2017-09-14 14:28:43 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Alex Dadgar b69b357c7f Nomad builds 2017-02-07 20:31:23 -08:00
Alex Dadgar bbe6e3d0b6 Larger delay on mismatch 2016-10-27 11:41:43 -07:00
Alex Dadgar a1d08c2aba Add scheduler version enforcement 2016-10-26 14:52:48 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Diptanu Choudhury dabb83063b Review comments 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 7bafb7c675 Updating the job summary while mutating jobs and allocation objects 2016-07-25 17:26:38 -07:00
Alex Dadgar 3a8a27bcff refresh index eval id in log 2016-06-22 13:48:41 -07:00
Alex Dadgar 8ceb7ead20 Do not use snapshot 2016-06-22 09:33:15 -07:00
Alex Dadgar 25decca3ca Worker waitForIndex uses StateStore index, not Raft Applied Index 2016-06-22 09:04:22 -07:00
Alex Dadgar bfdd5846e1 Track unblock indexes and check evals on block to see if they missed an update while in the scheduler 2016-05-24 20:10:56 -07:00
Alex Dadgar 15936822a4 Worker annotates evals with their snapshot index 2016-05-24 20:10:56 -07:00
Alex Dadgar 18d9e89065 Reuse the same evaluation and reblock it until there is no more work to do 2016-05-24 20:10:56 -07:00
Alex Dadgar 88ddfbed31 Revert "Debug messages around the plan and plan response"
This reverts commit 7646657e6b8a892210779eaf5708341b94b29b24.
2016-02-22 22:24:52 -08:00
Alex Dadgar fa8e2d31ee Revert "err logs in worker and scheduler"
This reverts commit 7befc586521b70eb84013bff367310e4cfa45c27.
2016-02-22 22:23:57 -08:00
Alex Dadgar c2242552a1 Debug messages around the plan and plan response 2016-02-22 20:36:11 -08:00
Alex Dadgar f48eabe753 err logs in worker and scheduler 2016-02-22 14:47:59 -08:00
Alex Dadgar 18d2d9c091 Killing a driver handle is retried with an exponential backoff 2016-02-16 21:00:49 -08:00
Armon Dadgar df16cea2a4 nomad: worker supports create eval 2015-09-07 14:23:48 -07:00
Armon Dadgar c7773feced nomad: improve error messages at start for dev mode 2015-09-06 20:18:47 -07:00
Armon Dadgar 7e644b7cc9 nomad: use fast and slow exponential backoff in worker 2015-08-23 17:39:49 -07:00
Armon Dadgar 8c2bc337e6 nomad: adding ability to pause a worker 2015-08-23 10:52:31 -07:00
Armon Dadgar cae67b7f60 nomad: expose UpdateEval as a planner 2015-08-15 14:25:00 -07:00
Armon Dadgar 8dfcb99e7f nomad: rename SystemScheduler to CoreScheduler 2015-08-15 12:38:58 -07:00
Armon Dadgar f7007bfeb5 nomad: avoid split-brain in plan processing due to leader transition or eval retry 2015-08-12 15:44:36 -07:00