Commit Graph

696 Commits

Author SHA1 Message Date
Lang Martin 7b675f89ac csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049)
* state_store: csi volumes/plugins store the index in the txn

* nomad: csi_endpoint_test require index checks need uint64()

* nomad: other tests using int 0 not uint64(0)

* structs: pass index into New, but not other struct methods

* state_store: csi plugin indexes, use new struct interface

* nomad: csi_endpoint_test check index/query meta (on explicit 0)

* structs: NewCSIVolume takes an index arg now

* scheduler/test: NewCSIVolume takes an index arg now
2020-03-23 13:58:29 -04:00
Lang Martin a0a6766740 CSI: Scheduler knows about CSI constraints and availability (#6995)
* structs: piggyback csi volumes on host volumes for job specs

* state_store: CSIVolumeByID always includes plugins, matches usecase

* scheduler/feasible: csi volume checker

* scheduler/stack: add csi volumes

* contributing: update rpc checklist

* scheduler: add volumes to State interface

* scheduler/feasible: introduce new checker collection tgAvailable

* scheduler/stack: taskGroupCSIVolumes checker is transient

* state_store CSIVolumeDenormalizePlugins comment clarity

* structs: remote TODO comment in TaskGroup Validate

* scheduler/feasible: CSIVolumeChecker hasPlugins improve comment

* scheduler/feasible_test: set t.Parallel

* Update nomad/state/state_store.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* Update scheduler/feasible.go

Co-Authored-By: Danielle <dani@hashicorp.com>

* structs: lift ControllerRequired to each volume

* state_store: store plug.ControllerRequired, use it for volume health

* feasible: csi match fast path remove stale host volume copied logic

* scheduler/feasible: improve comments

Co-authored-by: Danielle <dani@builds.terrible.systems>
2020-03-23 13:58:29 -04:00
Jasmine Dahilig 81d051d7e8 fix bug in lifecycle scheduler test mocks 2020-03-21 17:52:51 -04:00
Jasmine Dahilig 0cc9212a54 add test cases for scheduler alloc placement with lifecycle resources 2020-03-21 17:52:47 -04:00
Jasmine Dahilig 3e4e8f2b02 add allocfit test for lifecycles 2020-03-21 17:52:46 -04:00
Mahmood Ali b880607bad update scheduler to account for hooks 2020-03-21 17:52:45 -04:00
Mahmood Ali 9568553d7e Detect network mode change
Mark job as updated if network mode changed.
2020-03-21 16:51:10 -04:00
Drew Bailey 6bd6c6638c
include pro tag in serveral oss.go files 2020-02-10 15:56:14 -05:00
Drew Bailey 9a65556211
add state store test to ensure PlacedCanaries is updated 2020-02-03 13:58:01 -05:00
Drew Bailey f51a3d1f37
nomad state store must be modified through raft, rm local state change 2020-02-03 13:57:34 -05:00
Drew Bailey 1c046a74d8
comment for filtering reason 2020-02-03 09:02:09 -05:00
Drew Bailey e71f132455
add test for node eligibility 2020-02-03 09:02:09 -05:00
Drew Bailey 6b492630dd
make diffSystemAllocsForNode aware of eligibility
diffSystemAllocs -> diffSystemAllocsForNode, this function is only used
for diffing system allocations, but lacked awareness of eligible
nodes and the node ID that the allocation was going to be placed.

This change now ignores a change if its existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs
2020-02-03 09:02:08 -05:00
Drew Bailey e613a258da
ignore computed diffs if node is ineligible
test flakey, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Drew Bailey 63ddda71e1
Return FailedTGAlloc metric instead of no node err
If an existing system allocation is running and the node its running on
is marked as ineligible, subsequent plan/applys return an RPC error
instead of a more helpful plan result.

This change logs the error, and appends a failedTGAlloc for the
placement.
2020-01-22 10:07:15 -05:00
Drew Bailey ef175c0b31
Update Evicted allocations to lost when lost
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays as desired evict, status running forever, or until the node comes
back online.

This commit updates updateNonTerminalAllocsToLost to check for a
destired status of Evict as well as Stop when updating allocations on
tainted nodes.

switch to table test for lost node cases
2020-01-07 13:34:18 -05:00
Preetha Appan afff27b69b More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
Preetha Appan 3458b41290 Use debug logging for scheduler internals
We currently log an error if preemption is unable to find a suitable set of
allocations to preempt. This commit changes that to debug level since not finding
preemptable allocations is not an error condition.
2019-12-12 12:05:29 -06:00
Michael Schurter 7655e0cee4
Merge pull request #6792 from hashicorp/b-propose-panic
scheduler: fix panic when preempting and evicting allocs
2019-12-03 10:40:19 -08:00
Tim Gross c50057bf1f
scheduler: fix job update placement on prev node penalized (#6781)
Fixes #5856

When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node if the allocation had been rescheduled or failed. But we're
currently always penalizing the node, which leads to unnecessary
migrations on job update.

This commit leaves in place the existing behavior where if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative but the right behavior especially on
larger clusters where a group of hosts might be having correlated
trouble (like an AZ failure).

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-12-03 06:14:49 -08:00
Michael Schurter 0374069f82 scheduler: update tests with modern error helper 2019-12-02 20:25:52 -08:00
Michael Schurter 19a2ee71d3 scheduler: fix panic when preempting and evicting
Fixes #6787

In ProposedAllocs the proposed alloc slice was being copied while its
contents were not. Since RemoveAllocs nils elements of the proposed
alloc slice and is called twice, it could panic on the second call when
erroneously accessing a nil'd alloc.

The fix is to not copy the proposed alloc slice and pass the slice
returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining
the trimmed length.
2019-12-02 20:22:22 -08:00
Michael Schurter 6f64e52d61
Merge pull request #6699 from hashicorp/f-semver-constraints
Add new "semver" constraint
2019-11-19 12:18:43 -08:00
Drew Bailey 876618b5d2
Removes checking constraints for inplace update 2019-11-19 13:34:41 -05:00
Michael Schurter 796758b8a5 core: add semver constraint
The existing version constraint uses logic optimized for package
managers, not schedulers, when checking prereleases:

- 1.3.0-beta1 will *not* satisfy ">= 0.6.1"
- 1.7.0-rc1 will *not* satisfy ">= 1.6.0-beta1"

This is due to package managers wishing to favor final releases over
prereleases.

In a scheduler versions more often represent the earliest release all
required features/APIs are available in a system. Whether the constraint
or the version being evaluated are prereleases has no impact on
ordering.

This commit adds a new constraint - `semver` - which will use Semver
v2.0 ordering when evaluating constraints. Given the above examples:

- 1.3.0-beta1 satisfies ">= 0.6.1" using `semver`
- 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver`

Since existing jobspecs may rely on the old behavior, a new constraint
was added and the implicit Consul Connect and Vault constraints were
updated to use it.
2019-11-19 08:40:19 -08:00
Drew Bailey e44a66d7fc
DOCS: Spread stanza does not exist on task
Fixes documentation inaccuracy for spread stanza placement. Spreads can
only exist on the top level job struct or within a group.

comment about nil assumption
2019-11-19 08:26:36 -05:00
Drew Bailey 07e3164bf9
Check for changes to affinity and constraints
Adds checks for affinity and constraint changes when determining if we
should update inplace.

refactor to check all levels at once

check for spread changes when checking inplace update
2019-11-19 08:26:34 -05:00
Chris Baker e0105f817a changed all tests to require from t.Fatalf 2019-11-07 22:39:47 +00:00
Chris Baker 95ae01a9f4 the scheduler checks whether task changes require a restart, this needed
to be updated to consider devices
2019-11-07 17:51:15 +00:00
Michael Schurter c6bbe85f42 core: fix panic when AllocatedResources is nil
Fix for #6540
2019-10-28 14:38:21 -07:00
Danielle Lancashire 78b61de45f
config: Hoist volume.config.source into volume
Currently, using a Volume in a job uses the following configuration:

```
volume "alias-name" {
  type = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```

This commit migrates to the following:

```
volume "alias-name" {
  type = "volume-type"
  source = "host_volume_name"
  read_only = true
}
```

The original design was based due to being uncertain about the future of storage
plugins, and to allow maxium flexibility.

However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from and end users perspective
- It complicates the ability to do validation

As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:

- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
                                   to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
                                  to the plugin name, and a `config` block will
                                  allow you to pass meta to the plugin. Or will
                                  point to a pre-configured ephemeral config.
*If implemented

The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
2019-09-13 04:37:59 +02:00
Preetha Appan 9accf60805
update comment 2019-09-05 18:43:30 -05:00
Preetha Appan d21c708c4a
Fix inplace updates bug with group level networks
During inplace updates, we should be using network information
from the previous allocation being updated.
2019-09-05 18:37:24 -05:00
Jasmine Dahilig 4edebe389a
add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Mahmood Ali 3a1cb51539 schedulers: check all drivers on node
When checking driver feasability for an alloc with multiple drivers, we
must check that all drivers are detected and healthy.

Nomad 0.9 and 0.8 have a bug where we may check a single driver only,
but which driver is dependent on map traversal order, which is
unspecified in golang spec.
2019-08-29 09:03:31 -04:00
Mahmood Ali 3da10b5cb3 scheduler: tests for multiple drivers in TG 2019-08-29 09:03:31 -04:00
Danielle Lancashire 3a5e48ad18
scheduler: Implicit constraint on readonly hostvol
When a Client declares a volume is ReadOnly, we should only schedule it
for requests for ReadOnly volumes. This change means that if a host
exposes a readonly volume, we then validate that the group level
requests for the volume are all read only for that host.
2019-08-21 20:57:05 +02:00
Danielle Lancashire e132a30899
structs: Unify Volume and VolumeRequest 2019-08-12 15:39:08 +02:00
Danielle fc53283489
Update scheduler/feasible.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-08-12 15:39:08 +02:00
Danielle Lancashire 073836ec67
scheduler: Add a feasability checker for Host Vols 2019-08-12 15:39:08 +02:00
Preetha Appan e6a496bac0
Code review feedback 2019-07-31 01:04:08 -04:00
Preetha Appan 99eca85206
Scheduler changes to support network at task group level
Also includes unit tests for binpacker and preemption.
The tests verify that network resources specified at the
task group level are properly accounted for
2019-07-31 01:04:08 -04:00
Nick Ethier 7c9520b404
scheduler: fix disk constraints 2019-07-31 01:04:08 -04:00
Nick Ethier 09a4cfd8d7
fix failing tests 2019-07-31 01:04:07 -04:00
Nick Ethier af66a35924
networking: Add new bridge networking mode implementation 2019-07-31 01:04:06 -04:00
Nick Ethier 15989bba8e
ar: cleanup lint errors 2019-07-31 01:03:18 -04:00
Nick Ethier 66c514a388
Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Lang Martin 8157a7b6f8 system_sched submits failed evals as blocked 2019-07-18 10:32:12 -04:00
Preetha Appan 3484f18984
Fix more tests 2019-06-26 16:30:53 -05:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00
Mahmood Ali 8d4f914be9
Merge pull request #5790 from hashicorp/b-reschedule-desired-state
Mark rescheduled allocs as stopped.
2019-06-13 17:28:59 -04:00
Mahmood Ali 5e6327b6a1 Test behavior no reschedule for service/batch jobs 2019-06-13 16:41:19 -04:00
Mahmood Ali faf643a375 Don't stop rescheduleLater allocations
When an alloc is due to be rescheduleLater, it goes through the
reconciler twice: once to be ignored with a follow up evals, and once
again when processing the follow up eval where they appear as
rescheduleNow.

Here, we ignore them in the first run and mark them as stopped in second
iteration; rather than stop them twice.
2019-06-13 09:44:41 -04:00
Mahmood Ali 5dc404ecab Only preempt for network when there is a network
When examining preemption for networks, only consider allocs that have
networks.

Fixes https://github.com/hashicorp/nomad/issues/5793
2019-06-07 18:55:55 -04:00
Mahmood Ali 98575f5788 test: add tests for network devices and preemption 2019-06-07 18:55:02 -04:00
Mahmood Ali fd8fb8c22b Stop allocs to be rescheduled
Currently, when an alloc fails and is rescheduled, the alloc desired
state remains as "run" and the nomad client may not free the resources.

Here, we ensure that an alloc is marked as stopped when it's
rescheduled.

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client

```

After:

```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
2019-06-06 17:27:12 -04:00
Mahmood Ali 3eda42d027 tests: Migrated allocs aren't lost
Fix `TestServiceSched_NodeDown` for checking that the migrated allocs
are actually marked to be stopped.

The boolean logic in test made it skip actually checking client status
as long as desired status was stop.

Here, we mark some jobs for migration while leaving others as running,
and we check that lost flag is only set for non-migrated allocs.
2019-06-06 16:05:07 -04:00
Lang Martin 34230577df describe a pending deployment with auto_promote accurately 2019-05-22 12:32:08 -04:00
Lang Martin d462639cc9 sched reconcile copy AutoPromote to DeploymentState 2019-05-22 12:32:08 -04:00
Preetha Appan 374eee421f
Fix comment and assert score in test case 2019-05-15 12:35:57 -05:00
Nick Ethier f0b9f8e37a
fix missing brace 2019-05-15 13:02:04 -04:00
Nick Ethier 0d851b5d11
scheduler: add check to prohibit returning inf during spread boost calculation 2019-05-15 13:00:24 -04:00
Lang Martin 29ea112586 system_sched & test cleanup comments 2019-05-01 12:25:26 -04:00
Lang Martin c490dacf76 system_sched_test extend the test to check ineligible nodes 2019-05-01 12:25:26 -04:00
Lang Martin c43bcbd35e system_sched when a node is filtered, don't mark failure 2019-05-01 12:25:26 -04:00
Lang Martin aecec5df1b system_sched_test create partially constrained job 2019-05-01 12:25:26 -04:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Preetha Appan bcb5c8c70d
remove stray new line 2019-04-12 10:32:48 -05:00
Preetha Appan 8ddc076c1d
Refactor scheduler package to enable preemption for batch/service jobs 2019-04-10 20:24:01 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Preetha Appan da1ce9bcea
Fix bug where scoring metadata would be overridden during an inplace upgrade. 2019-03-12 23:36:46 -05:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Nick Ethier 24cbf42798
scheduler: fix NPE when deployment is nil, but placement is a canary 2019-01-28 20:22:59 -06:00
Alex Dadgar 5198ff05c3 convert driver to device for device constraint/attributes 2019-01-23 10:58:45 -08:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Preetha Appan 3b054d6135
Remove unnecessary usage of alloc.Resource 2019-01-10 16:36:47 -06:00
Mahmood Ali 0dfa93a3c1 appease linter 2019-01-08 10:58:49 -05:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Preetha f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Preetha Appan 977a4a540d
Early continue after meeting needed count
Also adds another optimization that filters out un-needed allocations
as a final filtering step
2018-12-11 10:12:18 -06:00
Preetha Appan f60c52c8ba
Score combinations of allocs from multiple devices for preemption 2018-12-07 18:35:47 -06:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Preetha Appan 63681fac0c
use structured logging everywhere consistently 2018-12-03 08:31:41 -06:00
Preetha Appan 766820def3
addresses some code clarity review comments 2018-11-27 11:02:06 -06:00
Mahmood Ali 96ffe044e7
Simplify map count update logic
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 10:03:11 -06:00
Mahmood Ali 57b94c2d50
code review suggestion
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 09:59:57 -06:00
Preetha Appan 86f416a984
Fix formatting 2018-11-16 20:45:52 -06:00
Preetha Appan 8efe6171e4
Fix preemption logic bug, need to group allocations by device first.
This ensures that the set of allocations chosen for preemption all share
the same device where ID is <vendor/type/device>
2018-11-16 20:32:10 -06:00
Danielle Tomlinson 9c72dafc95 scheduler: Add is_set/is_not_set constraints
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to =, or !=
operators, e.g:

```hcl
constraint {
        attribute = "${attrs.type}"
        operator  = "!="
        value     = "database"
}

constraint {
        attribute = "${attrs.type}"
        operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Preetha Appan 998968f57a
fix linting 2018-11-15 12:27:32 -06:00
Preetha Appan e5de50fba8
Initial implementation of device preemption 2018-11-15 11:09:26 -06:00
Danielle Tomlinson e5c641daa9 scheduler: Allow comparisons of nil values
This commit allows the ConstraintChecker to test values that do not exist.
This is useful when wanting to _exclude_ given nodes from executing a
job, for example, if you wanted to give canary nodes an attribute, and
not run critical services on them, you may specify something like the
below, but not want to tag all other nodes with the inverse.

```hcl
constraint {
  attribute = "${node.attr.canary}
  operator = "!="
  value = "1"
}
```

This also requires all constraint checkers to allow for nil target
values, as they will no longer be short circuited by resolving a target.
2018-11-13 13:36:51 -08:00
Alex Dadgar 08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Preetha Appan de890b9d5c
blank line 2018-11-12 15:50:14 -06:00