Commit graph

41 commits

Author SHA1 Message Date
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross 3c78980b78
make version checks specific to region (1.4.x) (#14912)
* One-time tokens are not replicated between regions, so we don't want to enforce
  that the version check across all of serf, just members in the same region.
* Scheduler: Disconnected clients handling is specific to a single region, so we
  don't want to enforce that the version check across all of serf, just members in
  the same region.
* Variables: enforce version check in Apply RPC
* Cleans up a bunch of legacy checks.

This changeset is specific to 1.4.x and the changes for previous versions of
Nomad will be manually backported in a separate PR.
2022-10-17 16:23:51 -04:00
Derek Strickland fd04a24ac7 disconnected clients: ensure servers meet minimum required version (#12202)
* planner: expose ServerMeetsMinimumVersion via Planner interface
* filterByTainted: add flag indicating disconnect support
* allocReconciler: accept and pass disconnect support flag
* tests: update dependent tests
2022-04-05 17:12:23 -04:00
Tim Gross d9d4da1e9f
scheduler: seed random shuffle nodes with eval ID (#12008)
Processing an evaluation is nearly a pure function over the state
snapshot, but we randomly shuffle the nodes. This means that
developers can't take a given state snapshot and pass an evaluation
through it and be guaranteed the same plan results.

But the evaluation ID is already random, so if we use this as the seed
for shuffling the nodes we can greatly reduce the sources of
non-determinism. Unfortunately golang map iteration uses a global
source of randomness and not a goroutine-local one, but arguably
if the scheduler behavior is impacted by this, that's a bug in the
iteration.
2022-02-08 12:16:33 -05:00
Luiz Aoqui b1753d0568
scheduler: detect and log unexpected scheduling collisions (#11793) 2022-01-14 20:09:14 -05:00
Seth Hoenig 3371214431 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
Tim Gross 9b2b580d1a
CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158)
Callers of `CSIVolumeByID` are generally assuming they should receive a single
volume. This potentially results in feasibility checking being performed
against the wrong volume if a volume's ID is a prefix substring of other
volume (for example: "test" and "testing").

Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix
matching in the command line client. Add the required elements for prefix
matching to the commands and API.
2021-03-18 14:32:40 -04:00
Mahmood Ali 8a342926b7 Respect alloc job version for lost/failed allocs
This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.

Now, when generating a plan for lost/failed allocations, the scheduler
first checks if the current deployment is in Canary stage, and if so, it
ensures that any lost/failed allocations is replaced one with the latest
promoted version instead.
2020-08-19 09:52:48 -04:00
Lang Martin e03c328792
csi: use node MaxVolumes during scheduling (#7565)
* nomad/state/state_store: CSIVolumesByNodeID ignores namespace

* scheduler/scheduler: add CSIVolumesByNodeID to the state interface

* scheduler/feasible: check node MaxVolumes

* nomad/csi_endpoint: no namespace inn CSIVolumesByNodeID anymore

* nomad/state/state_store: avoid DenormalizeAllocationSlice

* nomad/state/iterator: clean up SliceIterator Next

* scheduler/feasible_test: block with MaxVolumes

* nomad/state/state_store_test: fix args to CSIVolumesByNodeID
2020-03-31 17:16:47 -04:00
Lang Martin d994990ef0
csi: the scheduler allows a job with a volume write claim to be updated (#7438)
* nomad/structs/csi: split CanWrite into health, in use

* scheduler/scheduler: expose AllocByID in the state interface

* nomad/state/state_store_test

* scheduler/stack: SetJobID on the matcher

* scheduler/feasible: when a volume writer is in use, check if it's us

* scheduler/feasible: remove SetJob

* nomad/state/state_store: denormalize allocs before Claim

* nomad/structs/csi: return errors on claim, with context

* nomad/csi_endpoint_test: new alloc doesn't look like an update

* nomad/state/state_store_test: change test reference to CanWrite
2020-03-23 21:21:04 -04:00
Lang Martin 3621df1dbf csi: volume ids are only unique per namespace (#7358)
* nomad/state/schema: use the namespace compound index

* scheduler/scheduler: CSIVolumeByID interface signature namespace

* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace

* scheduler/feasible: pass the captured namespace to CSIVolumeByID

* nomad/state/state_store: use namespace in csi_volume index

* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim

* nomad/core_sched: pass the namespace in volumeClaimReap

* nomad/node_endpoint_test: namespaces in Claim testing

* nomad/csi_endpoint: pass RequestNamespace to state.*

* nomad/csi_endpoint_test: appropriately failed test

* command/alloc_status_test: appropriately failed test

* node_endpoint_test: avoid notTheNamespace for the job

* scheduler/feasible_test: call SetJob to capture the namespace

* nomad/csi_endpoint: ACL check the req namespace, query by namespace

* nomad/state/state_store: remove deregister namespace check

* nomad/state/state_store: remove unused CSIVolumes

* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace

* nomad/csi_endpoint: ACL check

* nomad/state/state_store_test: remove call to state.CSIVolumes

* nomad/core_sched_test: job namespace match so claim gc works
2020-03-23 13:59:25 -04:00
Lang Martin a0a6766740 CSI: Scheduler knows about CSI constraints and availability (#6995)
* structs: piggyback csi volumes on host volumes for job specs

* state_store: CSIVolumeByID always includes plugins, matches usecase

* scheduler/feasible: csi volume checker

* scheduler/stack: add csi volumes

* contributing: update rpc checklist

* scheduler: add volumes to State interface

* scheduler/feasible: introduce new checker collection tgAvailable

* scheduler/stack: taskGroupCSIVolumes checker is transient

* state_store CSIVolumeDenormalizePlugins comment clarity

* structs: remote TODO comment in TaskGroup Validate

* scheduler/feasible: CSIVolumeChecker hasPlugins improve comment

* scheduler/feasible_test: set t.Parallel

* Update nomad/state/state_store.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* Update scheduler/feasible.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* structs: lift ControllerRequired to each volume

* state_store: store plug.ControllerRequired, use it for volume health

* feasible: csi match fast path remove stale host volume copied logic

* scheduler/feasible: improve comments

Co-authored-by: Danielle <dani@builds.terrible.systems>
2020-03-23 13:58:29 -04:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Preetha Appan 25a047267f
Use scheduler config from state store to enable/disable preemption 2018-10-30 11:06:32 -05:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Josh Soref ca4ceb0e5c spelling: commits 2018-03-11 17:47:45 +00:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Alex Dadgar b3f4db0930 cancel deployments 2017-07-07 12:01:17 -07:00
Alex Dadgar b69b357c7f Nomad builds 2017-02-07 20:31:23 -08:00
Diptanu Choudhury 5191b4d33a Making the status command return the allocs of currently registered job 2016-11-24 16:31:30 +01:00
Alex Dadgar a1d08c2aba Add scheduler version enforcement 2016-10-26 14:52:48 -07:00
Alex Dadgar 18d9e89065 Reuse the same evaluation and reblock it until there is no more work to do 2016-05-24 20:10:56 -07:00
Armon Dadgar 35741fcedd scheduler: Use AllocsByNodeTerminal to avoid filtering 2016-02-20 11:29:15 -08:00
Alex Dadgar 25c2c83d6f Revert "dont hard code scheduler type name"
This reverts commit fb0e0dfc2be596ad435a00a2e131a2000146b1f1.
2015-10-23 16:32:45 -07:00
Alex Dadgar 5b0a9b2469 dont hard code scheduler type name 2015-10-23 16:31:45 -07:00
Alex Dadgar 494244ed06 System scheduler and system stack 2015-10-14 18:39:44 -07:00
Armon Dadgar e7c0bb5e1e scheduler: Adding CreateEval to Planner 2015-09-07 14:26:29 -07:00
Armon Dadgar 8bedd3769c nomad: unifying the state store API 2015-09-06 20:56:38 -07:00
Armon Dadgar cae67b7f60 nomad: expose UpdateEval as a planner 2015-08-15 14:25:00 -07:00
Armon Dadgar 3f16a21658 nomad: remove NodesByDatacenterStatus 2015-08-15 13:11:42 -07:00
Armon Dadgar ddfe28e694 scheduler: ServiceScheduler is now GenericScheduler with service and batch modes 2015-08-13 22:28:37 -07:00
Armon Dadgar a077605114 nomad: adding NodesByDatacenterStatus 2015-08-13 13:17:03 -07:00
Armon Dadgar df21ab3d10 scheduler: working on bin pack 2015-08-13 11:54:59 -07:00
Armon Dadgar fb0c230f3b scheduler: working on job updates 2015-08-11 16:41:48 -07:00
Armon Dadgar dbb631d0b2 scheduler: check node status before evicting 2015-08-11 14:04:04 -07:00
Armon Dadgar 28f9234135 scheduler: first pass at job deregister 2015-08-06 17:46:14 -07:00
Armon Dadgar 09a8c15d7e scheduler: adding service scheduler definition 2015-08-06 17:25:14 -07:00
Armon Dadgar d7e9927661 nomad: integrating worker and scheduler 2015-07-28 16:15:32 -07:00
Armon Dadgar 90e8bc9b91 scheduler: initial interface 2015-07-28 16:03:15 -07:00