Commit graph

3457 commits

Author SHA1 Message Date
Tim Gross 8a354f828f
store ACL Accessor ID from Job.Register with Job (#8204)
In multiregion deployments when ACLs are enabled, the deploymentwatcher needs
an appropriately scoped ACL token with the same `submit-job` rights as the
user who submitted it. The token will already be replicated, so store the
accessor ID so that it can be retrieved by the leader.
2020-06-19 07:53:29 -04:00
Mahmood Ali 38a01c050e
Merge pull request #8192 from hashicorp/f-status-allnamespaces-2
CLI Allow querying all namespaces for jobs and allocations - Try 2
2020-06-18 20:16:52 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali c0aa06d9c7 rpc: allow querying allocs across namespaces
This implements the backend handling for querying across namespaces for
allocation list endpoints.
2020-06-17 16:31:06 -04:00
Mahmood Ali e784fe331a use '*' to indicate all namespaces
This reverts the introduction of AllNamespaces parameter that was merged
earlier but never got released.
2020-06-17 16:27:43 -04:00
Tim Gross 81ae581da6
test: remove flaky test from volumewatcher (#8189)
The volumewatcher restores itself on notification, but detecting this is racy
because it may reap any claim (or find there are no claims to reap) and
shutdown before we can test whether it's running. This appears to have become
flaky with a new version of golang. The other cases in this test case
sufficiently exercise the start/stop behavior of the volumewatcher, so remove
the flaky section.
2020-06-17 15:41:51 -04:00
Chris Baker fe9d654640
Merge pull request #8187 from hashicorp/f-8143-block-scaling-during-deployment
modify Job.Scale RPC to return an error if there is an active deployment
2020-06-17 14:38:55 -05:00
Chris Baker cd903218f7 added changelog entry and satisfied make check 2020-06-17 17:43:45 +00:00
Chris Baker ab2b15d8cb modify Job.Scale RPC to return an error if there is an active deployment
resolves #8143
2020-06-17 17:03:35 +00:00
Tim Gross 6b1cb61888 remove test for ent-only behavior 2020-06-17 11:27:29 -04:00
Tim Gross c14a75bfab multiregion: use pending instead of paused
The `paused` state is used as an operator safety mechanism, so that they can
debug a deployment or halt one that's causing a wider failure. By using the
`paused` state as the first state of a multiregion deployment, we risked
resuming an intentionally operator-paused deployment because of activity in a
peer region.

This changeset replaces the use of the `paused` state with a `pending` state,
and provides a `Deployment.Run` internal RPC to replace the use of the
`Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.
2020-06-17 11:06:14 -04:00
Tim Gross fd50b12ee2 multiregion: integrate with deploymentwatcher
* `nextRegion` should take status parameter
* thread Deployment/Job RPCs thru `nextRegion`
* add `nextRegion` calls to `deploymentwatcher`
* use a better description for paused for peer
2020-06-17 11:06:00 -04:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross 5c4d0a73f4 start all but first region deployment in paused state 2020-06-17 11:05:34 -04:00
Tim Gross 48e9f75c1e multiregion: deploymentwatcher hooks
This changeset establishes hooks in deploymentwatcher for multiregion
deployments (for the enterprise version of Nomad).
2020-06-17 11:05:18 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Drew Bailey 9263fcb0d3 Multiregion deploy status and job status CLI 2020-06-17 11:03:34 -04:00
Tim Gross 473a0f1d44 multiregion: unblock and cancel RPCs 2020-06-17 11:02:26 -04:00
Tim Gross ede3a4f1c4 multiregion: request structs 2020-06-17 11:00:34 -04:00
Tim Gross 6851024925 Multiregion structs
Initial struct definitions, jobspec parsing, validation, and conversion
between Nomad structs and API structs for multi-region deployments.
2020-06-17 11:00:14 -04:00
Chris Baker 9fc66bc1aa support in API client and Job.Register RPC for PreserveCounts 2020-06-16 18:45:28 +00:00
Chris Baker 1e3563e08c wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register 2020-06-16 18:45:17 +00:00
Chris Baker 7ed06cced0 core: update Job.Scale to save the previous job count in the ScalingEvent 2020-06-15 19:49:22 +00:00
Chris Baker aeb3ed449e wip: added .PreviousCount to api.ScalingEvent and structs.ScalingEvent, with developmental tests 2020-06-15 19:40:21 +00:00
Mahmood Ali c17ffb2d35
Merge pull request #8131 from hashicorp/f-snapshot-restore
Implement snapshot restore
2020-06-15 08:32:34 -04:00
Mahmood Ali 9bfc3e28d9
Apply suggestions from code review
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-06-15 08:32:16 -04:00
Lang Martin 069840bef8
scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138)
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect

* scheduler/reconcile: thread follupEvalIDs through to results.stop

* scheduler/reconcile: comment typo

* nomad/_test: correct arguments for plan.AppendStoppedAlloc

* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
2020-06-09 17:13:53 -04:00
Mahmood Ali 63e048e972 clarify ccomments, esp related to leadership code 2020-06-09 12:01:31 -04:00
Mahmood Ali b543460e0a loosen raft timeout 2020-06-07 16:38:11 -04:00
Mahmood Ali 69bb42acf8 tests: prefix agent logs to identify agent sources 2020-06-07 16:38:11 -04:00
Mahmood Ali 47a163b63f reassert leadership 2020-06-07 15:47:06 -04:00
Mahmood Ali 9eb13ae144 basic snapshot restore 2020-06-07 15:46:23 -04:00
Mahmood Ali bf7a3583e5
Merge pull request #8089 from hashicorp/b-leader-worker-count
leadership: pause and unpause workers consistently
2020-06-04 12:01:01 -04:00
Mahmood Ali cd8e1b4d62
stop periodic dispatch at end of tests (#8111) 2020-06-04 09:15:00 -04:00
Lang Martin ac7c39d3d3
Delayed evaluations for stop_after_client_disconnect can cause unwanted extra followup evaluations around job garbage collection (#8099)
* client/heartbeatstop: reversed time condition for startup grace

* scheduler/generic_sched: use `delayInstead` to avoid a loop

Without protecting the loop that creates followUpEvals, a delayed eval
is allowed to create an immediate subsequent delayed eval. For both
`stop_after_client_disconnect` and the `reschedule` block, a delayed
eval should always produce some immediate result (running or blocked)
and then only after the outcome of that eval produce a second delayed
eval.

* scheduler/reconcile: lostLater are different than delayedReschedules

Just slightly. `lostLater` allocs should be used to create batched
evaluations, but `handleDelayedReschedules` assumes that the
allocations are in the untainted set. When it creates the in-place
updates to those allocations at the end, it causes the allocation to
be treated as running over in the planner, which causes the initial
`stop_after_client_disconnect` evaluation to be retried by the worker.
2020-06-03 09:48:38 -04:00
Mahmood Ali 70fbcb99c2 leadership: pause and unpause workers consistently
This fixes a bug where leadership establishment pauses 3/4 of workers
but stepping down unpause only 1/2!
2020-06-01 10:57:53 -04:00
Mahmood Ali 891fb3f8a9 test for paused workers upon leadership revocation 2020-06-01 10:48:42 -04:00
Mahmood Ali de44d9641b
Merge pull request #8047 from hashicorp/f-snapshot-save
API for atomic snapshot backups
2020-06-01 07:55:16 -04:00
Mahmood Ali e37a3312d5 If leadership fails, consider it handled
The callers for `forward` and old implementation expect failures to be
accompanied with a true value!  This fixes the issue and have tests
passing!
2020-05-31 22:06:17 -04:00
Mahmood Ali 30ab9c84e5 more review feedback 2020-05-31 21:39:09 -04:00
Mahmood Ali a73cd01a00
Merge pull request #8001 from hashicorp/f-jobs-list-across-nses
endpoint to expose all jobs across all namespaces
2020-05-31 21:28:03 -04:00
Mahmood Ali 082c085068
Merge pull request #8036 from hashicorp/f-background-vault-revoke-on-restore
Speed up leadership establishment
2020-05-31 21:27:16 -04:00
Mahmood Ali 1af32e65bc clarify rpc consistency readiness comment 2020-05-31 21:26:41 -04:00
Mahmood Ali 0819ea60ea
Apply suggestions from code review
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-05-31 21:04:39 -04:00
Mahmood Ali 37c6160b96 Handle nil/empty cluster metadata
Handle case where a snapshot is made before cluster metadata is created.

This fixes a bug where a server may have empty cluster metadata if it
created and installed a Raft snapshot before a new cluster metadata ID is
generated.

This case is very unlikely to arise.  Most likely reason is when
upgrading from an old version slowly where servers may use snapshots
before all servers upgrade.  This happened for a user with a log line
like:

```
2020-05-21T15:21:56.996Z [ERROR] nomad.fsm: ClusterSetMetadata failed: error=""set cluster metadata failed: refusing to set new cluster id, previous: , new: <<redacted>
```
2020-05-29 13:34:21 -04:00
Drew Bailey 23d24c7a7f
removes pro tags (#8014) 2020-05-28 15:40:17 -04:00
Mahmood Ali 475b3b77ad
Merge pull request #8060 from hashicorp/tests-deflake-20200526
Deflake some tests - 2020-05-27 edition
2020-05-27 15:24:31 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 61e4f5aaf9 tests: use GreaterOrEqual and apply change to other tests 2020-05-27 11:22:48 -04:00
Mahmood Ali 6dfe0f5d3b tests: use t.Fatalf when it's clearer 2020-05-27 10:09:56 -04:00
Mahmood Ali ec1fcedb93 tests: node drain events may be duplicated 2020-05-27 08:59:06 -04:00
Mahmood Ali c3c2a85314 tests: wait until clients are in the state store 2020-05-26 18:53:24 -04:00
Mahmood Ali 5d80d2a511 tests: eval may be processed quickly 2020-05-26 18:53:24 -04:00
Mahmood Ali 19141f8103 {volume|deployment}watcher: check for nil batcher 2020-05-26 14:54:27 -04:00
Mahmood Ali 81ac098a22 deploymentwatcher: no batcher when disabling
When disabling deploymentwatcher (at the end of a test), avoid starting a
new update batcher with its new goroutine.
2020-05-26 14:44:47 -04:00
Mahmood Ali ccc89f940a terminate leader goroutines on shutdown
Ensure that nomad steps down (and terminate leader goroutines) on
shutdown, when the server is the leader.

Without this change, `monitorLeadership` may handle `shutdownCh` event
and exit early before handling the raft `leaderCh` event and end up
leaking leadership goroutines.
2020-05-26 10:18:10 -04:00
Mahmood Ali e671913e56 fix a trace logline 2020-05-26 10:18:09 -04:00
Mahmood Ali 1c79c3b93d refactor: context is first parameter
By convention, go functions take `context.Context` as the first
argument.
2020-05-26 10:18:09 -04:00
Mahmood Ali 1eff8b0ed8 volumewatcher: no batcher when disabling
When disabling volumewatcher (at the end of a test), avoid starting a
new update batcher with its new goroutine.
2020-05-26 10:18:09 -04:00
Mahmood Ali b895cef622 always set purgeFunc
purgeFunc cannot be nil, so ensure it's set to a no-op function in
tests.
2020-05-21 21:05:53 -04:00
Mahmood Ali 2108681c1d Endpoint for snapshotting server state 2020-05-21 20:04:38 -04:00
Mahmood Ali fbe140b26c vault: ensure ttl expired tokens are purge
If a token is scheduled for revocation expires before we revoke it,
ensure that it is marked as purged in raft and is only removed from
local vault state if the purge operation succeeds.

Prior to this change, we may remove the accessor from local state but
not purge it from Raft.  This causes unnecessary and churn in the next
leadership elections (and until 0.11.2 result in indefinite retries).
2020-05-21 19:54:50 -04:00
Mahmood Ali aa8e79e55b Reorder leadership handling
Start serving RPC immediately after leader components are enabled, and
move clean up to the bottom as they don't block leadership
responsibilities.
2020-05-21 08:30:31 -04:00
Mahmood Ali 1cf1114627 apply the same change to consul revocation 2020-05-21 08:30:31 -04:00
Mahmood Ali 1399d02f45 rate limit revokeDaemon 2020-05-21 08:30:31 -04:00
Mahmood Ali 6e749d12a0 on leadership establishment, revoke Vault tokens in background
Establishing leadership should be very fast and never make external API
calls.

This fixes a situation where there is a long backlog of Vault tokens to
be revoked on when leadership is gained.  In such case, revoking the
tokens will significantly slow down leadership establishment and slow
down processing.  Worse, the revocation call does not honor leadership
`stopCh` signals, so it will not stop when the leader loses leadership.
2020-05-21 07:38:27 -04:00
Tim Gross 72430a4e62
csi: don't pass volume claim releases thru GC eval (#8021)
Following the new volumewatcher in #7794 and performance improvements
to it that landed afterwards, there's no particular reason we should
be threading claim releases through the GC eval rather than writing an
empty `CSIVolumeClaimRequest` with the mode set to
`CSIVolumeClaimRelease`, just as the GC evaluation would do.

Also, by batching up these raft messages, we can reduce the amount of
raft writes by 1 and cross-server RPCs by 1 per volume we release
claims on.
2020-05-20 15:22:51 -04:00
Tim Gross 3902709c0a
csi: check for empty arguments on CSI endpoint (#8027)
Some of the CSI RPC endpoints were missing validation that the ID or
the Volume definition was present. This could result in nonsense
`CSIVolume` structs being written to raft during registration. This
changeset corrects that bug and adds validation checks to present
nicer error messages to operators in some other cases.
2020-05-20 10:22:24 -04:00
Charlie Voiselle 70303c906c
Simplify comments
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-05-19 15:05:24 -04:00
Charlie Voiselle 6976a7699e
Set Updated to true for all non-CAS requests 2020-05-19 12:59:39 -04:00
Mahmood Ali 406fce90c3 list all jobs on namespaces the token can access 2020-05-19 09:51:41 -04:00
Seth Hoenig f6c8db8a8a consul/connect: use task kind to get service name
Fixes #8000

When requesting a Service Identity token from Consul, use the TaskKind
of the Task to get at the service name associated with the task. In
the past using the TaskName worked because it was generated as a sidecar
task with a name that included the service. In the Native context, we
need to get at the service name in a more correct way, i.e. using the
TaskKind which is defined to include the service name.
2020-05-18 13:46:00 -06:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00
Tim Gross 2082cf738a
csi: support for VolumeContext and VolumeParameters (#7957)
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.

This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` struct that we turn
into a CSI RPC is accumulating arguments, so moving it into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
2020-05-15 08:16:01 -04:00
Mahmood Ali b385a5d063
Merge pull request #7959 from hashicorp/b-deleted-vault-accessors
vault: ensure that token revocation is idempotent
2020-05-14 12:39:06 -04:00
Mahmood Ali 077342c528 vault: ensure that token revocation is idempotent
This ensures that token revocation is idempotent and can handle when
tokens are revoked out of band.

Idempotency is important to handle some transient failures and retries.
Consider when a single token of a batch fails to be revoked, nomad would
retry revoking the entire batch; tokens already revoked should be
gracefully handled, otherwise, nomad may retry revoking the same
tokens forever.
2020-05-14 11:30:32 -04:00
Mahmood Ali 6ac166e1aa vault: failing test for repeated revocation 2020-05-14 11:30:29 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali 3b4116e0db
Merge pull request #7894 from hashicorp/b-cronexpr-dst-fix
Fix Daylight saving transition handling
2020-05-12 16:36:11 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 938e916d9c When serializing msgpack, only consider codec tag
When serializing structs with msgpack, only consider type tags of
`codec`.

Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting
`codec` tag if it's available, but falls to using `json` if `codec`
isn't present.

This behavior is surprising in cases where we want to serialize json
differently from msgpack, e.g. serializing `ConsulExposeConfig`.
2020-05-11 14:14:10 -04:00
Mahmood Ali b4fa8e9588 codec: we use hashicorp/go-msgpack exclusively
No need to maintain two msgpack handles!
2020-05-11 14:05:29 -04:00
Tim Gross 6554e9ee37
csi: log fallthrough on invalid node IDs for client RPC (#7918)
When a CSI client RPC is given a specific node for a controller but
the lookup fails (because the node is gone or is an older version), we
fallthrough to select a node from all those available. This adds
logging to this case to aid in diagnostics.
2020-05-11 12:26:10 -04:00
Tim Gross 1ec41b6770
volumewatcher: stop watcher goroutines when there's no work (#7909)
The watcher goroutines will be automatically started if a volume has
updates, but when idle we shouldn't keep a goroutine running and
taking up memory.
2020-05-11 09:32:05 -04:00
Mahmood Ali 061a439f2c
Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup
Scheduler Algorithm Defaults handling and docs
2020-05-11 09:30:58 -04:00
Mahmood Ali 0384543d05
Merge pull request #7913 from hashicorp/deflake-TestTaskTemplateManager_BlockedEvents
Deflake TestTaskTemplateManager_BlockedEvents test
2020-05-11 09:30:44 -04:00
Mahmood Ali dff0fcf2f3
Merge pull request #7914 from hashicorp/b-csi-fix-slice-initialization
Fix slice initialization
2020-05-11 09:27:01 -04:00
Tim Gross 3aa761b151
Periodic GC for volume claims (#7881)
This changeset implements a periodic garbage collection of CSI volumes
with missing allocations. This can happen in a scenario where a node
update fails partially and the allocation updates are written to raft
but the evaluations to GC the volumes are dropped. This feature will
cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1
get any stray claims cleaned up.
2020-05-11 08:20:50 -04:00
James Rasell aaf2fe033e
Merge pull request #7903 from hashicorp/b-gh-7902
api: validate scale count value is not negative.
2020-05-11 09:17:01 +02:00
Mahmood Ali 9fac6ea5d9 Fix slice initialization 2020-05-09 21:35:42 -04:00
Mahmood Ali 64de395df0 tests: ease debugging TestClientEndpoint_CreateNodeEvals
TestClientEndpoint_CreateNodeEvals flakes a bit but its output is very
confusing, as `structs.Evaluations` overrides GoString.

Here, we emit the entire struct of the evaluation, and hopefully we'll
figure out the problem the next time it happens
2020-05-09 16:04:32 -04:00
Mahmood Ali ff5c3e81b0 avoid logging after a test completes 2020-05-09 14:40:00 -04:00
Mahmood Ali 2c963885b0 handle upgrade path and defaults
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.

The scheduler already treats `""` value as binpack.  This PR merely
ensures that the operator API returns the effective value.
2020-05-09 12:34:08 -04:00
Tim Gross 8373e917fc
volumewatcher: set maximum batch size for raft update (#7907)
The `volumewatcher` has a 250ms batch window so claim updates will not
typically be large enough to risk exceeding the maximum raft message
size. But large jobs might have enough volume claims that this could
be a danger. Set a maximum batch size of 100 messages per
batch (roughly 33K), as a very conservative safety/robustness guard.

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-05-08 16:53:57 -04:00
James Rasell 55a2ad3854
api: validate scale count value is not negative.
An operator could submit a scale request including a negative count
value. This negative value caused the Nomad server to panic. The
fix adds validation to the submitted count, returning an error to
the caller if it is negative.
2020-05-08 16:51:40 +02:00
Mahmood Ali 57435950d7 Update current DST and some code style issues 2020-05-07 19:27:05 -04:00
Mahmood Ali c8fb132956 Update cronexpr to point to hashicorp/cronexpr 2020-05-07 17:50:45 -04:00
Mahmood Ali 507c0b8f64 tests for periodic job scheduling and DST 2020-05-07 17:36:59 -04:00
Tim Gross 42f9d517d8
CSI volumewatcher testability improvments (#7889)
* volumewatcher: remove redundant log fields

The constructor for `volumeWatcher` already sets a `logger.With` that
includes the volume ID and namespace fields. Remove them from the
various trace logs.

* volumewatcher: advance state for controller already released

One way of bypassing client RPCs in testing is to set a claim status
to controller-detached, but this results in an incorrect claim state
when we checkpoint.
2020-05-07 15:57:24 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00
Tim Gross 00c9bd7ff0
reorder volume claim batch request raft message (#7871)
For backwards compatibility during upgrades, new raft message types
need to come at the end of the enum.
2020-05-06 08:57:51 -04:00
Tim Gross ce86a594a6
csi: fix plugin counts on node update (#7844)
In this changeset:

* If a Nomad client node is running both a controller and a node
  plugin (which is a common case), then if only the controller or the
  node is removed, the plugin was not being updated with the correct
  counts.
* The existing test for plugin cleanup didn't go back to the state
  store, which normally is ok but is complicated in this case by
  denormalization which changes the behavior. This commit makes the
  test more comprehensive.
* Set "controller required" when plugin has `PUBLISH_READONLY`. All
  known controllers that support `PUBLISH_READONLY` also support
  `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this.
* Only create plugins when the allocs for those plugins are
  healthy. If we allow a plugin to be created for the first time when
  the alloc is not healthy, then we'll recreate deleted plugins when
  the job's allocs all get marked terminal.
* Terminal plugin alloc updates should cleanup the plugin. The client
  fingerprint can't tell if the plugin is unhealthy intentionally (for
  the case of updates or job stop). Allocations that are
  server-terminal should delete themselves from the plugin and trigger
  a plugin self-GC, the same as an unused node.
2020-05-05 15:39:57 -04:00
Tim Gross 22e3815e8c
docstring improvements and typo fixes (#7862) 2020-05-05 10:30:50 -04:00
Tim Gross 1c6dcab56b
volumewatcher: remove spurious nil-check (#7858)
The nil-check here is left-over from an earlier approach that didn't
get merged. It doesn't do anything for us now as we can't ever pass it
`nil` and if we leave it in the `getVolume` call it guards will panic
anyways.
2020-05-04 12:28:32 -04:00
Mahmood Ali 78ae7b885a
Merge pull request #7810 from hashicorp/spread-configuration
spread scheduling algorithm
2020-05-01 13:15:19 -04:00
Mahmood Ali 3da74068dd changelog and fix typo 2020-05-01 13:14:20 -04:00
Mahmood Ali b9e3cde865 tests and some clean up 2020-05-01 13:13:30 -04:00
Charlie Voiselle d8e5e02398 Wiring algorithm to scheduler calls 2020-05-01 13:13:29 -04:00
Charlie Voiselle 663fb677cf Add SchedulerAlgorithm to SchedulerConfig 2020-05-01 13:13:29 -04:00
Lang Martin 28bac139cb client/heartbeatstop: destroy allocs when disconnected from servers
- track lastHeartbeat, the client local time of the last successful
  heartbeat round trip
- track allocations with `stop_after_client_disconnect` configured
- trigger allocation destroy (which handles cleanup)
- restore heartbeat/killable allocs tracking when allocs are recovered from disk
- on client restart, stop those allocs after a grace period if the
  servers are still partioned
2020-05-01 12:35:49 -04:00
Michael Schurter c901d0e7dd
Merge branch 'master' into b-reserved-scoring 2020-04-30 14:48:14 -07:00
Tim Gross 52e805a6a6
csi: ensure Read/WriteAllocs aren't released early (#7841)
We should only remove the `ReadAllocs`/`WriteAllocs` values for a
volume after the claim has entered the "ready to free"
state. The volume will eventually be released as expected. But
querying the volume API will show the volume is released before the
controller unpublish has finished and this can cause a race with
starting new jobs.

Test updates are to cover cases where we're dropping claims but not
running through the whole reaping process.
2020-04-30 17:11:31 -04:00
Tim Gross a7a64443e1
csi: move volume claim release into volumewatcher (#7794)
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers

The volume watcher is enabled on leader step-up and disabled on leader
step-down.

The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
2020-04-30 09:13:00 -04:00
Tim Gross e34f099d20
csi: read-repair CSI volume claims (#7824)
The `CSIVolumeClaim` fields were added after 0.11.1, so claims made
before that may be missing the value. Repair this when we read the
volume out of the state store.

The `NodeID` field was added after 0.11.0, so we need to ensure it's
been populated during upgrades from 0.11.0.
2020-04-29 11:57:19 -04:00
Mahmood Ali 18f16cfb12
Merge pull request #7818 from greut/codegen
structs: give codecgen import
2020-04-28 12:16:41 -04:00
Chris Baker 315bcf1060
Merge pull request #7816 from hashicorp/b-7789-job-scaling-status-issues
fix issues in Job.ScaleStatus
2020-04-28 06:33:42 -05:00
Yoan Blanc 5ca31f23e5
structs: give codecgen import
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-04-28 08:23:20 +02:00
Nick Ethier 4b810b697a
nomad: build dynamic port for exposed checks if not specified (#7800) 2020-04-28 00:07:41 -04:00
Chris Baker 73f1390316 modified Job.ScaleStatus to ignore deployments and look directly at the
allocations, ignoring canaries
2020-04-27 21:45:39 +00:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Chris Baker 09d980be2b modify state store so that autoscaling policies are deleted from their
table as job is stopped (and recreated when job is started)
2020-04-21 23:01:26 +00:00
Tim Gross bd74b593d0
csi: nil-check allocs for VolumeDenormalize and claim methods (#7760) 2020-04-21 08:32:24 -04:00
Michael Dwan ba70c54340
fix panic while deleting CSI plugins for missing job (#7758) 2020-04-20 17:13:33 -04:00
Seth Hoenig 40e0f8a346
Merge pull request #7690 from hashicorp/b-inspect-proxy-output
two fixes for inspect on connect proxy
2020-04-20 10:17:54 -06:00
Anthony Scalisi 9664c6b270
fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Jorge Marey 30b877c63a Fix get all vault token policies 2020-04-16 16:38:24 +02:00
Michael Schurter 4c5a0cae35 core: fix node reservation scoring
The BinPackIter accounted for node reservations twice when scoring nodes
which could bias scores toward nodes with reservations.

Pseudo-code for previous algorithm:
```
	proposed  = reservedResources + sum(allocsResources)
	available = nodeResources - reservedResources
	score     = 1 - (proposed / available)
```

The node's reserved resources are added to the total resources used by
allocations, and then the node's reserved resources are later
substracted from the node's overall resources.

The new algorithm is:
```
	proposed  = sum(allocResources)
	available = nodeResources - reservedResources
	score     = 1 - (proposed / available)
```

The node's reserved resources are no longer added to the total resources
used by allocations.

My guess as to how this bug happened is that the resource utilization
variable (`util`) is calculated and returned by the `AllocsFit` function
which needs to take reserved resources into account as a basic
feasibility check.

To avoid re-calculating alloc resource usage (because there may be a
large number of allocs), we reused `util` in the `ScoreFit` function.
`ScoreFit` properly accounts for reserved resources by subtracting them
from the node's overall resources. However since `util` _also_ took
reserved resources into account the score would be incorrect.

Prior to the fix the added test output:
```
Node: reserved     Score: 1.0000
Node: reserved2    Score: 1.0000
Node: no-reserved  Score: 0.9741
```

The scores being 1.0 for *both* nodes with reserved resources is a good
hint something is wrong as they should receive different scores. Upon
further inspection the double accounting of reserved resources caused
their scores to be >1.0 and clamped.

After the fix the added test outputs:
```
Node: no-reserved  Score: 0.9741
Node: reserved     Score: 0.9480
Node: reserved2    Score: 0.8717
```
2020-04-15 15:13:30 -07:00
Seth Hoenig d5ad580d5c structs: fix compatibility between api and nomad/structs proxy definitions
The field names within the structs representing the Connect proxy
definition were not the same (nomad/structs/ vs api/), causing the
values to be lost in translation for the 'nomad job inspect' command.

Since the field names already shipped in v0.11.0 we cannot simply
fix the names. Instead, use the json struct tag on the structs/ structs
to remap the name to match the publicly expose api/ package on json
encoding.

This means existing jobs from v0.11.0 will continue to work, and
the JSON API for job submission will remain backwards compatible.
2020-04-13 15:59:45 -06:00
Tim Gross 4e9bd1e1d1
refactor: consolidate private methods for CSI RPC (#7702)
Follow-up for a method missed in the refactor for #7688. The
`volAndPluginLookup` method is only ever called from the server's `CSI`
RPC and never the `ClientCSI` RPC, so move it into that scope.
2020-04-13 10:46:43 -04:00
Tim Gross f37e986b1b
refactor: make nodeForControllerPlugin private to ClientCSI (#7688)
The current design of `ClientCSI` RPC requires that callers in the
server know about the free-standing `nodeForControllerPlugin`
function. This makes it difficult to send `ClientCSI` RPC messages
from subpackages of `nomad` and adds a bunch of boilerplate to every
server-side caller of a controller RPC.

This changeset makes it so that the `ClientCSI` RPCs will populate and
validate the controller's client node ID if it hasn't been passed by
the caller, centralizing the logic of picking and validating
controller targets into the `nomad.ClientCSI` struct.
2020-04-10 16:47:21 -04:00
Seth Hoenig 20802da8fd connect: correctly deal with nil sidecar_service task stanza
Before, if the sidecar_service stanza of a connect enabled service
was missing, the job submission would cause a panic in the nomad
agent. Since the panic was happening in the API handler the agent
itself continued running, but this change will the condition more
gracefully.

By fixing the `Copy` method, the API handler now returns the proper
error.

$ nomad job run foo.nomad
Error submitting job: Unexpected response code: 500 (1 error occurred:
	* Task group api validation failed: 2 errors occurred:
	* Missing tasks for task group
	* Task group service validation failed: 1 error occurred:
	* Service[0] count-api validation failed: 1 error occurred:
	* Consul Connect must be native or use a sidecar service
2020-04-09 20:28:17 -06:00
Drew Bailey 4ab7c03641
Merge pull request #7618 from hashicorp/b-shutdown-delay-updates
Fixes bug that prevented group shutdown_delay updates
2020-04-06 13:05:20 -04:00
Drew Bailey 0d4bb6bf92
guard against nil maps 2020-04-06 12:25:50 -04:00
Drew Bailey 3b8afce9e6
test added and removed 2020-04-06 11:53:46 -04:00
Drew Bailey 9874e7b21d
Group shutdown delay fixes
Group shutdown delay updates were not properly handled in Update hook.
This commit also ensures that plan output is displayed.
2020-04-06 11:29:12 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross 027277a0d9 csi: make volume GC in job deregister safely async
The `Job.Deregister` call will block on the client CSI controller RPCs
while the alloc still exists on the Nomad client node. So we need to
make the volume claim reaping async from the `Job.Deregister`. This
allows `nomad job stop` to return immediately. In order to make this
work, this changeset changes the volume GC so that the GC jobs are on a
by-volume basis rather than a by-job basis; we won't have to query
the (possibly deleted) job at the time of volume GC. We smuggle the
volume ID and whether it's a purge into the GC eval ID the same way we
smuggled the job ID previously.
2020-04-06 10:15:55 -04:00
Tim Gross 5a3b45864d csi: fix unpublish workflow ID mismatches
The CSI plugins uses the external volume ID for all operations, but
the Client CSI RPCs uses the Nomad volume ID (human-friendly) for the
mount paths. Pass the External ID as an arg in the RPC call so that
the unpublish workflows have it without calling back to the server to
find the external ID.

The controller CSI plugins need the CSI node ID (or in other words,
the storage provider's view of node ID like the EC2 instance ID), not
the Nomad node ID, to determine how to detach the external volume.
2020-04-06 10:15:55 -04:00
Lang Martin 1750426d04
csi: run volume claim GC on job stop -purge (#7615)
* nomad/state/state_store: error message copy/paste error

* nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse

* nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister

* nomad/core_sched: make volumeClaimReap available without a CoreSched

* nomad/job_endpoint: Deregister return early if the job is missing

* nomad/job_endpoint_test: job Deregistion is idempotent

* nomad/core_sched: conditionally ignore alloc status in volumeClaimReap

* nomad/job_endpoint: volumeClaimReap all allocations, even running

* nomad/core_sched_test: extra argument to collectClaimsToGCImpl

* nomad/job_endpoint: job deregistration is not idempotent
2020-04-03 17:37:26 -04:00
Mahmood Ali 816a93ed4a tests: deflake TestAutopilot_RollingUpdate
I hypothesize that the flakiness in rolling update is due to shutting
down s3 server before s4 is properly added as a voter.

The chain of the flakiness is as follows:

1. Bootstrap with s1, s2, s3
2. Add s4
3. Wait for servers to register with 3 voting peers
   * But we already have 3 voters (s1, s2, and s3)
   * s4 is added as a non-voter in Raft v3 and must wait until autopilot promots it
4. Test proceeds without s4 being a voter
5. s3 shutdown
6. cluster changes stall due to leader election and too many pending configuration
changes (e.g. removing s3 from raft, promoting s4).

Here, I have the test wait until s4 is marked as a voter before shutting
down s3, so we don't have too many configuration changes at once.

In https://circleci.com/gh/hashicorp/nomad/57092, I noticed the
following events:

```
TestAutopilot_RollingUpdate: autopilot_test.go:204: adding server s4
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  nomad/serf.go:60: nomad: adding server: server="nomad-137.global (Addr: 127.0.0.1:9177) (DC: dc1)"
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  raft/raft.go:1018: nomad.raft: updating configuration: command=AddNonvoter server-id=c54b5bf4-1159-34f6-032d-56aefeb08425 server-addr=127.0.0.1:9177 servers="[{Suffrage:Voter ID:df01ba65-d1b2-17a9-f792-a4459b3a7c09 Address:127.0.0.1:9171} {Suffrage:Voter ID:c3337778-811e-2675-87f5-006309888387 Address:127.0.0.1:9173} {Suffrage:Voter ID:186d5e15-c473-e2b3-b5a4-3259a84e10ef Address:127.0.0.1:9169} {Suffrage:Nonvoter ID:c54b5bf4-1159-34f6-032d-56aefeb08425 Address:127.0.0.1:9177}]"

    TestAutopilot_RollingUpdate: autopilot_test.go:218: shutting down server s3
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.797Z [INFO]  raft/replication.go:456: nomad.raft: aborting pipeline replication: peer="{Nonvoter c54b5bf4-1159-34f6-032d-56aefeb08425 127.0.0.1:9177}"
    TestAutopilot_RollingUpdate: autopilot_test.go:235: waiting for s4 to stabalize and be promoted
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.975Z [ERROR] raft/raft.go:1656: nomad.raft: failed to make requestVote RPC: target="{Voter c3337778-811e-2675-87f5-006309888387 127.0.0.1:9173}" error="dial tcp 127.0.0.1:9173: connect: connection refused"
    TestAutopilot_RollingUpdate: retry.go:121: autopilot_test.go:241: don't want "c3337778-811e-2675-87f5-006309888387"
        autopilot_test.go:241: didn't find map[c54b5bf4-1159-34f6-032d-56aefeb08425:true] in []raft.ServerID{"df01ba65-d1b2-17a9-f792-a4459b3a7c09", "186d5e15-c473-e2b3-b5a4-3259a84e10ef"}
```

Note how s3, c3337778, is present in the peers list in the final
failure, but s4, c54b5bf4, is added as a Nonvoter and isn't present in
the final peers list.
2020-04-03 17:15:41 -04:00
Mahmood Ali 5587dc58c0 Use lowercase for hcl keys
This is not a change in behavior, hcl key matching is case insensitive
as desmonstrated in `command.agent/TestConfig_Parse`
2020-04-03 07:56:00 -04:00
Tim Gross f6b3d38eb8
CSI: move node unmount to server-driven RPCs (#7596)
If a volume-claiming alloc stops and the CSI Node plugin that serves
that alloc's volumes is missing, there's no way for the allocrunner
hook to send the `NodeUnpublish` and `NodeUnstage` RPCs.

This changeset addresses this issue with a redesign of the client-side
for CSI. Rather than unmounting in the alloc runner hook, the alloc
runner hook will simply exit. When the server gets the
`Node.UpdateAlloc` for the terminal allocation that had a volume claim,
it creates a volume claim GC job. This job will made client RPCs to a
new node plugin RPC endpoint, and only once that succeeds, move on to
making the client RPCs to the controller plugin. If the node plugin is
unavailable, the GC job will fail and be requeued.
2020-04-02 16:04:56 -04:00
Nick Ethier 3557099f4c
Merge pull request #7594 from hashicorp/f-connect-lifecycle
connect: set task lifecycle config for injected sidecar task
2020-04-02 12:51:01 -04:00
Lang Martin 24449e23af
csi: volume validate namespace (#7587)
* nomad/state/state_store: enforce that the volume namespace exists

* nomad/csi_endpoint_test: a couple of broken namespaces now

* nomad/csi_endpoint_test: one more test

* nomad/node_endpoint_test: use structs.DefaultNamespace

* nomad/state/state_store_test: use DefaultNamespace
2020-04-02 10:13:41 -04:00
Nick Ethier 90b5d2b13f lint: gofmt 2020-04-01 21:23:47 -04:00
Nick Ethier 92f8bfc729 connect: set task lifecycle config for injected sidecar task
fixes #7593
2020-04-01 21:19:41 -04:00
Chris Baker c3ab837d9e job_endpoint: fixed bad test 2020-04-01 18:11:58 +00:00
Chris Baker 285728f3fa Merge branch 'f-7422-scaling-events' of github.com:hashicorp/nomad into f-7422-scaling-events 2020-04-01 17:28:50 +00:00
Chris Baker 8ec252e627 added indices to the job scaling events, so we could properly do
blocking queries on the job scaling status
2020-04-01 17:28:19 +00:00