Commit Graph

710 Commits

Author SHA1 Message Date
Michael Schurter dd09fa1a4a
Merge pull request #9055 from hashicorp/f-9017-resources
api: add field filters to /v1/{allocations,nodes}
2020-10-14 14:49:39 -07:00
Michael Schurter 8ccbd92cb6 api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 9d48818eb8
writetxn can return error, add alloc and job generic events. Add events
table for durability
2020-10-14 12:44:39 -04:00
Drew Bailey 400455d302
Events/eval alloc events (#9012)
* generic eval update event

first pass at alloc client update events

* api/event client
2020-10-14 12:44:37 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Tim Gross 3ceb5b36b1
csi: allow more than 1 writer claim for multi-writer mode (#9040)
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode
were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether
the volume had writer claims available for scheduling.

Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims`
is exercised for multi-writer in feasibility test.
2020-10-07 10:43:23 -04:00
Seth Hoenig f44a4f68ee consul/connect: trigger update as necessary on connect changes
This PR fixes a long standing bug where submitting jobs with changes
to connect services would not trigger updates as expected. Previously,
service blocks were not considered as sources of destructive updates
since they could be synced with consul non-destructively. With Connect,
task group services that have changes to their connect block or to
the service port should be destructive, since the network plumbing of
the alloc is going to need updating.

Fixes #8596 #7991

Non-destructive half in #7192
2020-10-05 14:53:00 -05:00
Neil Mock f749de8543
Fix multi-interface networking in the system scheduler (#8822) 2020-09-22 12:54:34 -04:00
Mahmood Ali 6a0dd8bc87
Merge pull request #8867 from hashicorp/b-canary-substitution
scheduler: Revert requireCanary logic
2020-09-15 12:58:55 -05:00
Mahmood Ali 339617a836 Only ignore rescheduled allocations if they got stopped 2020-09-14 21:11:52 -04:00
Mahmood Ali 98de2d2278 add a test when .NextAllocation is set but alloc is still running 2020-09-14 17:12:53 -04:00
Mahmood Ali fd54cfce6e Revert the `requireCanary` check introduced in https://github.com/hashicorp/nomad/pull/8691/files#diff-1801138ac4d10f2064ba6f2e434ac9b4L430-R431 .
The change was intended to fix a case where a canary alloc may fail to
be rescheduled if all the other allocs fail as well (e.g. if all allocs
happen to be placed on a node that died).  However, it introduced some
unintended side-effects.

Reverting the change for now and will investigate further.
2020-09-10 14:59:02 -04:00
Mahmood Ali c6e1d22697 test for rescheduling non-canaries 2020-09-10 14:59:02 -04:00
Mahmood Ali 8837c9a45d Handle migration of non-deployment jobs
This handles the case where a job when from no-deployment to deployment
with canaries.

Consider a case where a `max_parallel=0` job is submitted as version 0,
then an update is submitted with `max_parallel=1, canary=1` as verion 1.
In this case, we will have 1 canary alloc, and all remaining allocs will
be version 0.  Until the deployment is promoted, we ought to replace the
canaries with version 0 job (which isn't associated with a deployment).
2020-08-26 10:36:34 -04:00
Mahmood Ali 2438b90334 Update scheduler/reconcile.go
Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-08-25 17:37:19 -04:00
Mahmood Ali 38b61b97d8 simplify canary check
`(alloc.DeploymentStatus == nil || !alloc.DeploymentStatus.IsCanary())`
and `!alloc.DeploymentStatus.IsCanary()` are equivalent.
2020-08-25 17:37:19 -04:00
Mahmood Ali e4bb88dfcf tweak stack job manipulation
To address review comments
2020-08-25 17:37:19 -04:00
Mahmood Ali def768728e Have Plan.AppendAlloc accept the job 2020-08-25 17:22:09 -04:00
Mahmood Ali 8a342926b7 Respect alloc job version for lost/failed allocs
This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.

Now, when generating a plan for lost/failed allocations, the scheduler
first checks if the current deployment is in Canary stage, and if so, it
ensures that any lost/failed allocations is replaced one with the latest
promoted version instead.
2020-08-19 09:52:48 -04:00
Lars Lehtonen fb7b2282b1
scheduler: label loops with nested switch statements for effective break (#8528) 2020-07-24 08:50:41 -04:00
Tim Gross 1ca2c4ec2c scheduler: DesiredCanaries can be set on every pass safely
The reconcile loop sets `DeploymentState.DesiredCanaries` only on the first
pass through the loop and if the job is not paused/pending. In MRD,
deployments will make one pass though the loop while "pending", and were not
ever getting `DesiredCanaries` set. We can't set it in the initial
`DeploymentState` constructor because the first pass through setting up
canaries expects it's not there yet. However, this value is static for a given
version of a job because it's coming from the update stanza, so it's safe to
re-assign the value on subsequent passes.
2020-07-20 11:25:53 -04:00
Tim Gross d3341a2019 refactor: make it clear where we're accessing dstate
The field name `Deployment.TaskGroups` contains a map of `DeploymentState`,
which makes it a little harder to follow state updates when combined with
inconsistent naming conventions, particularly when we also have the state
store or actual `TaskGroup`s in scope. This changeset changes all uses to
`dstate` so as not to be confused with actual TaskGroups.
2020-07-20 11:25:53 -04:00
Tim Gross fe5f5e35aa
mrd: reconcile should treat pending deployments as paused (#8446)
If a job update includes a task group that has no changes, those allocations
have their version bumped in-place. The ends up triggering an eval from
`deploymentwatcher` when it verifies their health. Although this eval is a
no-op, we were only treating pending deployments the same as paused when
the deployment was a new MRD. This means that any eval after the initial one
will kick off the deployment, and that caused pending deployments to "jump
the queue" and run ahead of schedule, breaking MRD invariants and resulting in
a state with all regions blocked.

This behavior can be replicated even in the case of job updates with no
in-place updates by patching `deploymentwatcher` to inject a spurious no-op
eval. This changeset fixes the behavior by treating pending deployments the
same as paused in all cases in the reconciler.
2020-07-16 13:00:08 -04:00
Tim Gross bd457343de
MRD: all regions should start pending (#8433)
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.

`IsMultiregionStarter` has no more callers in OSS, so remove it here.
2020-07-14 10:57:37 -04:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Tim Gross 31185325c9
reconcile should not overwrite unblocking state (#8349)
Pre-0.12.0 beta, a deployment was considered "complete" if it was
successful. But with MRD we have "blocked" and "unblocking" states as well. We
did not consider the case where a concurrent alloc health status update
triggers a `Compute` call on a deployment that's moved from "blocked" to
"unblocking" (it's a small window), which caused an extra pass thru the
`nextRegion` logic in `deploymentwatcher` and triggered an error when later
transitioning to "successful".

This changeset makes sure we don't overwrite that status.
2020-07-06 11:31:33 -04:00
Nick Ethier 89118016fc
command: correctly show host IP in ports output /w multi-host networks (#8289) 2020-06-25 15:16:01 -04:00
Nick Ethier 416efd83ee
scheduler: do network feasibility checking for system jobs (#8256) 2020-06-24 16:01:00 -04:00
Mahmood Ali 1c1fb5da0a this is OSS 2020-06-22 10:28:45 -04:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Tim Gross d3ecb87984
multiregion: initial deploymentPaused must match start condition (#8215)
In #8209 we fixed the `max_parallel` stanza for multiregion by introducing the
`IsMultiregionStarter` check, but didn't apply it to the earlier place its
required. The result is that deployments start but don't place allocations.
2020-06-19 13:42:38 -04:00
Tim Gross b654e1b8a4
multiregion: all regions start in running if no max_parallel (#8209)
If `max_parallel` is not set, all regions should begin in a `running` state
rather than a `pending` state. Otherwise the first region is set to `running`
and then all the remaining regions once it enters `blocked. That behavior is
technically correct in that we have at most `max_parallel` regions running,
but definitely not what a user expects.
2020-06-19 11:17:09 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Nick Ethier 1e4ea699ad fix test failures from rebase 2020-06-18 11:05:32 -07:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Tim Gross c14a75bfab multiregion: use pending instead of paused
The `paused` state is used as an operator safety mechanism, so that they can
debug a deployment or halt one that's causing a wider failure. By using the
`paused` state as the first state of a multiregion deployment, we risked
resuming an intentionally operator-paused deployment because of activity in a
peer region.

This changeset replaces the use of the `paused` state with a `pending` state,
and provides a `Deployment.Run` internal RPC to replace the use of the
`Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.
2020-06-17 11:06:14 -04:00
Tim Gross fd50b12ee2 multiregion: integrate with deploymentwatcher
* `nextRegion` should take status parameter
* thread Deployment/Job RPCs thru `nextRegion`
* add `nextRegion` calls to `deploymentwatcher`
* use a better description for paused for peer
2020-06-17 11:06:00 -04:00
Tim Gross 5c4d0a73f4 start all but first region deployment in paused state 2020-06-17 11:05:34 -04:00
Tim Gross 473a0f1d44 multiregion: unblock and cancel RPCs 2020-06-17 11:02:26 -04:00
Lang Martin 069840bef8
scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138)
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect

* scheduler/reconcile: thread follupEvalIDs through to results.stop

* scheduler/reconcile: comment typo

* nomad/_test: correct arguments for plan.AppendStoppedAlloc

* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
2020-06-09 17:13:53 -04:00
Lang Martin ac7c39d3d3
Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099)
* client/heartbeatstop: reversed time condition for startup grace

* scheduler/generic_sched: use `delayInstead` to avoid a loop

Without protecting the loop that creates followUpEvals, a delayed eval
is allowed to create an immediate subsequent delayed eval. For both
`stop_after_client_disconnect` and the `reschedule` block, a delayed
eval should always produce some immediate result (running or blocked)
and then only after the outcome of that eval produce a second delayed
eval.

* scheduler/reconcile: lostLater are different than delayedReschedules

Just slightly. `lostLater` allocs should be used to create batched
evaluations, but `handleDelayedReschedules` assumes that the
allocations are in the untainted set. When it creates the in-place
updates to those allocations at the end, it causes the allocation to
be treated as running over in the planner, which causes the initial
`stop_after_client_disconnect` evaluation to be retried by the worker.
2020-06-03 09:48:38 -04:00
Mahmood Ali 21c948f3d3 keep promotion score constants next to use 2020-05-27 15:13:19 -04:00
Mahmood Ali d9792777d9 Open source Preemption code
Nomad 0.12 OSS is to include preemption feature.

This commit moves the private code for managing preemption to OSS
repository.
2020-05-27 15:02:01 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali 759eade78b missed fixing one invocation 2020-05-01 13:38:46 -04:00
Mahmood Ali b9e3cde865 tests and some clean up 2020-05-01 13:13:30 -04:00
Charlie Voiselle d8e5e02398 Wiring algorithm to scheduler calls 2020-05-01 13:13:29 -04:00
Michael Schurter c901d0e7dd
Merge branch 'master' into b-reserved-scoring 2020-04-30 14:48:14 -07:00