open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	8ff5ea1bee	CSI: no early return when feasibility check fails on eligible nodes (#13274) As a performance optimization in the scheduler, feasibility checks that apply to an entire class are only checked once for all nodes of that class. Other feasibility checks are "available" checks because they rely on more ephemeral characteristics and don't contribute to the hash for the node class. This currently includes only CSI. We have a separate fast path for "available" checks when the node has already been marked eligible on the basis of class. This fast path has a bug where it returns early rather than continuing the loop. This causes the entire task group to be rejected. Fix the bug by not returning early in the fast path and instead jump to the top of the loop like all the other code paths in this method. Includes a new test exercising topology at whole-scheduler level and a fix for an existing test that should've caught this previously.	2022-06-07 13:31:10 -04:00
Tim Gross	2dafe46fe3	CSI: allow updates to volumes on re-registration (#12167) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129)	2022-03-01 10:15:46 -05:00
Tim Gross	57a546489f	CSI: minor refactoring (#12105) * rename method checking that free write claims are available * use package-level variables for claim errors * semgrep fix for testify	2022-02-23 11:13:51 -05:00
Tim Gross	d9d4da1e9f	scheduler: seed random shuffle nodes with eval ID (#12008) Processing an evaluation is nearly a pure function over the state snapshot, but we randomly shuffle the nodes. This means that developers can't take a given state snapshot and pass an evaluation through it and be guaranteed the same plan results. But the evaluation ID is already random, so if we use this as the seed for shuffling the nodes we can greatly reduce the sources of non-determinism. Unfortunately golang map iteration uses a global source of randomness and not a goroutine-local one, but arguably if the scheduler behavior is impacted by this, that's a bug in the iteration.	2022-02-08 12:16:33 -05:00
Tim Gross	a2433e35fb	CSI: resolve invalid claim states (#11890) * csi: resolve invalid claim states on read It's currently possible for CSI volumes to be claimed by allocations that no longer exist. This changeset asserts a reasonable state at the state store level by registering these nil allocations as "past claims" on any read. This will cause any pass through the periodic GC or volumewatcher to trigger the unpublishing workflow for those claims. * csi: make feasibility check errors more understandable When the feasibility checker finds we have no free write claims, it checks to see if any of those claims are for the job we're currently scheduling (so that earlier versions of a job can't block claims for new versions) and reports a conflict if the volume can't be scheduled so that the user can fix their claims. But when the checker hits a claim that has a GCd allocation, the state is recoverable by the server once claim reaping completes and no user intervention is required; the blocked eval should complete. Differentiate the scheduler error produced by these two conditions.	2022-01-27 09:30:03 -05:00
Andrii Chubatiuk	712bd5f5a6	add support for host network interpolation	2021-04-13 09:53:05 -04:00
Tim Gross	fa25e048b2	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Tim Gross	9b2b580d1a	CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158) Callers of `CSIVolumeByID` are generally assuming they should receive a single volume. This potentially results in feasibility checking being performed against the wrong volume if a volume's ID is a prefix substring of another volume's ID (for example: "test" and "testing"). Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix matching in the command line client. Add the required elements for prefix matching to the commands and API.	2021-03-18 14:32:40 -04:00
Tim Gross	0e3264aa4f	scheduler/csi: fix early return when multiple volumes are requested When multiple CSI volumes are requested, the feasibility check could return early for read/write volumes with free claims, even if a later volume in the request was not feasible for any other reason (including not existing at all). This can result in random failure to fail feasibility checking, depending on how the map of volumes was being ordered at runtime. Remove the early return from the feasibility check. Add a test to verify that missing volumes in the map will cause a failure; this test will not catch a regression every test run because of the random map ordering, but any failure will be caught over the course of several CI runs.	2021-03-10 15:18:36 -05:00
Kris Hicks	0a3a748053	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
Seth Hoenig	6b89527505	scheduler: enable upgrade path for bridge network fingerprint This PR enables users of Nomad < 0.12 to upgrade to Nomad 0.12 and beyond. Nomad 0.12 introduced a network fingerprinter for bridge networks, which is a constraint checked if a bridge network is being used. If users upgrade servers first as is recommended, suddenly no clients running older versions of Nomad will satisfy the bridge network resource constraint. Instead, this change only enforces the constraint if the Nomad client version is also >= 0.12. Closes #8423	2020-11-13 14:17:01 -06:00
Nick Ethier	416efd83ee	scheduler: do network feasibility checking for system jobs (#8256)	2020-06-24 16:01:00 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Nick Ethier	4a44deaa5c	CNI Implementation (#7518)	2020-06-18 11:05:29 -07:00
Tim Gross	161f9aedc3	scheduler: prevent a reported NPE for CSI (#7633)	2020-04-06 09:42:27 -04:00
Lang Martin	e03c328792	csi: use node MaxVolumes during scheduling (#7565) * nomad/state/state_store: CSIVolumesByNodeID ignores namespace * scheduler/scheduler: add CSIVolumesByNodeID to the state interface * scheduler/feasible: check node MaxVolumes * nomad/csi_endpoint: no namespace in CSIVolumesByNodeID anymore * nomad/state/state_store: avoid DenormalizeAllocationSlice * nomad/state/iterator: clean up SliceIterator Next * scheduler/feasible_test: block with MaxVolumes * nomad/state/state_store_test: fix args to CSIVolumesByNodeID	2020-03-31 17:16:47 -04:00
Lang Martin	d994990ef0	csi: the scheduler allows a job with a volume write claim to be updated (#7438) * nomad/structs/csi: split CanWrite into health, in use * scheduler/scheduler: expose AllocByID in the state interface * nomad/state/state_store_test * scheduler/stack: SetJobID on the matcher * scheduler/feasible: when a volume writer is in use, check if it's us * scheduler/feasible: remove SetJob * nomad/state/state_store: denormalize allocs before Claim * nomad/structs/csi: return errors on claim, with context * nomad/csi_endpoint_test: new alloc doesn't look like an update * nomad/state/state_store_test: change test reference to CanWrite	2020-03-23 21:21:04 -04:00
Tim Gross	d1f43a5fea	csi: improve error messages from scheduler (#7426)	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Danielle Lancashire	e227f31584	sched/feasible: Return more detailed CSI Failure messages	2020-03-23 13:58:30 -04:00
Danielle Lancashire	a2e01c4369	sched/feasible: Validate CSIVolumes correctly Previously we were looking up plugins based on the Alias Name for a CSI Volume within the context of its task group. Here we first look up a volume based on its identifier and then validate the existence of the plugin based on its `PluginID`.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	e56c677221	sched/feasible: CSI - Filter applicable volumes This commit filters the job's volumes when setting them on the feasibility checker. This ensures that the rest of the checker does not have to worry about non-CSI volumes.	2020-03-23 13:58:30 -04:00
Lang Martin	a0a6766740	CSI: Scheduler knows about CSI constraints and availability (#6995) * structs: piggyback csi volumes on host volumes for job specs * state_store: CSIVolumeByID always includes plugins, matches usecase * scheduler/feasible: csi volume checker * scheduler/stack: add csi volumes * contributing: update rpc checklist * scheduler: add volumes to State interface * scheduler/feasible: introduce new checker collection tgAvailable * scheduler/stack: taskGroupCSIVolumes checker is transient * state_store CSIVolumeDenormalizePlugins comment clarity * structs: remove TODO comment in TaskGroup Validate * scheduler/feasible: CSIVolumeChecker hasPlugins improve comment * scheduler/feasible_test: set t.Parallel * Update nomad/state/state_store.go Co-Authored-By: Danielle <dani@hashicorp.com> * Update scheduler/feasible.go Co-Authored-By: Danielle <dani@hashicorp.com> * structs: lift ControllerRequired to each volume * state_store: store plug.ControllerRequired, use it for volume health * feasible: csi match fast path remove stale host volume copied logic * scheduler/feasible: improve comments Co-authored-by: Danielle <dani@builds.terrible.systems>	2020-03-23 13:58:29 -04:00
Michael Schurter	796758b8a5	core: add semver constraint The existing version constraint uses logic optimized for package managers, not schedulers, when checking prereleases: - 1.3.0-beta1 will not satisfy ">= 0.6.1" - 1.7.0-rc1 will not satisfy ">= 1.6.0-beta1" This is due to package managers wishing to favor final releases over prereleases. In a scheduler versions more often represent the earliest release all required features/APIs are available in a system. Whether the constraint or the version being evaluated are prereleases has no impact on ordering. This commit adds a new constraint - `semver` - which will use Semver v2.0 ordering when evaluating constraints. Given the above examples: - 1.3.0-beta1 satisfies ">= 0.6.1" using `semver` - 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver` Since existing jobspecs may rely on the old behavior, a new constraint was added and the implicit Consul Connect and Vault constraints were updated to use it.	2019-11-19 08:40:19 -08:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design was chosen because we were uncertain about the future of storage plugins and wanted to allow maximum flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from an end user's perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always need a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always need a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Mahmood Ali	3a1cb51539	schedulers: check all drivers on node When checking driver feasibility for an alloc with multiple drivers, we must check that all drivers are detected and healthy. Nomad 0.9 and 0.8 have a bug where we may check a single driver only, and which driver is checked depends on map traversal order, which is unspecified in the Go spec.	2019-08-29 09:03:31 -04:00
Mahmood Ali	3da10b5cb3	scheduler: tests for multiple drivers in TG	2019-08-29 09:03:31 -04:00
Danielle Lancashire	3a5e48ad18	scheduler: Implicit constraint on readonly hostvol When a Client declares a volume is ReadOnly, we should only schedule it for requests for ReadOnly volumes. This change means that if a host exposes a readonly volume, we then validate that the group level requests for the volume are all read only for that host.	2019-08-21 20:57:05 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle	fc53283489	Update scheduler/feasible.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-08-12 15:39:08 +02:00
Danielle Lancashire	073836ec67	scheduler: Add a feasibility checker for Host Vols	2019-08-12 15:39:08 +02:00
Alex Dadgar	5198ff05c3	convert driver to device for device constraint/attributes	2019-01-23 10:58:45 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Danielle Tomlinson	9c72dafc95	scheduler: Add is_set/is_not_set constraints This adds constraints for asserting that a given attribute or value exists, or does not exist. This acts as a companion to =, or != operators, e.g: ```hcl constraint { attribute = "${attrs.type}" operator = "!=" value = "database" } constraint { attribute = "${attrs.type}" operator = "is_set" } ```	2018-11-15 11:00:32 -08:00
Danielle Tomlinson	e5c641daa9	scheduler: Allow comparisons of nil values This commit allows the ConstraintChecker to test values that do not exist. This is useful when wanting to _exclude_ given nodes from executing a job. For example, if you wanted to give canary nodes an attribute and not run critical services on them, you may specify something like the below, without having to tag all other nodes with the inverse. ```hcl constraint { attribute = "${node.attr.canary}" operator = "!=" value = "1" } ``` This also requires all constraint checkers to allow for nil target values, as they will no longer be short circuited by resolving a target.	2018-11-13 13:36:51 -08:00
Alex Dadgar	a7ca737fb6	review comments	2018-11-07 11:31:52 -08:00
Alex Dadgar	6fa893c801	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Preetha Appan	53c3f8151b	fix linting	2018-10-16 18:29:49 -05:00
Alex Dadgar	f5a76d8411	review comments	2018-10-15 15:31:13 -07:00
Alex Dadgar	7ecd65109a	Check constraints on devices	2018-10-14 13:35:47 -07:00
Alex Dadgar	5284554fcc	rework device checker	2018-10-13 16:47:53 -07:00
Alex Dadgar	9b5aaac410	Device feasibility checker	2018-10-13 12:27:49 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Preetha Appan	038ed52877	Fix after rename to ConstraintSetContainsAny	2018-09-04 16:10:11 -05:00
Preetha Appan	dccb693221	test for setcontainsany, and treat set_contains same as set_contains_all	2018-09-04 16:10:11 -05:00
Preetha Appan	5eacd6ada4	Implement affinity support in generic scheduler	2018-09-04 16:10:11 -05:00
Chelsea Holland Komlo	329605b7cc	fix up scheduling test	2018-03-21 15:54:03 -04:00
Chelsea Holland Komlo	60f12d206f	improve comments; update watchDriver	2018-03-21 15:15:26 -04:00


96 Commits
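
Commits 8ff5ea1bee and 0e3264aa4f in the table above both fix an early return inside a feasibility loop. The sketch below illustrates that class of bug under simplified assumptions: the `Node` type, its fields, and the `nextBuggy` / `nextFixed` functions are hypothetical and do not mirror the real scheduler's types or method names.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a scheduler node.
type Node struct {
	ID        string
	Eligible  bool // class-level checks already passed
	HasVolume bool // result of an "available" check, such as CSI
}

// nextBuggy gives up on the whole search as soon as one eligible node
// fails the available check, which rejects nodes it never looked at.
func nextBuggy(nodes []Node) *Node {
	for i := range nodes {
		n := &nodes[i]
		if !n.Eligible {
			continue
		}
		if !n.HasVolume {
			return nil // bug: early return instead of trying the next node
		}
		return n
	}
	return nil
}

// nextFixed skips the failing node and keeps iterating.
func nextFixed(nodes []Node) *Node {
	for i := range nodes {
		n := &nodes[i]
		if !n.Eligible || !n.HasVolume {
			continue
		}
		return n
	}
	return nil
}

func main() {
	nodes := []Node{
		{ID: "n1", Eligible: true, HasVolume: false},
		{ID: "n2", Eligible: true, HasVolume: true},
	}
	fmt.Println(nextBuggy(nodes) == nil) // true: n2 was never considered
	fmt.Println(nextFixed(nodes).ID)     // n2
}
```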