Commit graph

1605 commits

Author SHA1 Message Date
Nick Ethier 89118016fc
command: correctly show host IP in ports output /w multi-host networks (#8289) 2020-06-25 15:16:01 -04:00
Tim Gross a449009e9f
multiregion validation fixes (#8265)
Multi-region jobs need to bypass validating counts otherwise we get spurious
warnings in Job.Plan.
2020-06-24 12:18:51 -04:00
Seth Hoenig 3872b493e5
Merge pull request #8011 from hashicorp/f-cnative-host
consul/connect: implement initial support for connect native
2020-06-24 10:33:12 -05:00
Seth Hoenig 011c6b027f connect/native: doc and comment tweaks from PR 2020-06-24 10:13:22 -05:00
Michael Schurter 7869ebc587 docs: add comments to structs.Port struct 2020-06-23 11:38:01 -07:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Nick Ethier a87e91e971
test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Nick Ethier f0ac1f027a
lint: spelling 2020-06-19 11:29:41 -04:00
Tim Gross b654e1b8a4
multiregion: all regions start in running if no max_parallel (#8209)
If `max_parallel` is not set, all regions should begin in a `running` state
rather than a `pending` state. Otherwise the first region is set to `running`
and then all the remaining regions once it enters `blocked. That behavior is
technically correct in that we have at most `max_parallel` regions running,
but definitely not what a user expects.
2020-06-19 11:17:09 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Tim Gross 8a354f828f
store ACL Accessor ID from Job.Register with Job (#8204)
In multiregion deployments when ACLs are enabled, the deploymentwatcher needs
an appropriately scoped ACL token with the same `submit-job` rights as the
user who submitted it. The token will already be replicated, so store the
accessor ID so that it can be retrieved by the leader.
2020-06-19 07:53:29 -04:00
Mahmood Ali 38a01c050e
Merge pull request #8192 from hashicorp/f-status-allnamespaces-2
CLI Allow querying all namespaces for jobs and allocations - Try 2
2020-06-18 20:16:52 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali e784fe331a use '*' to indicate all namespaces
This reverts the introduction of AllNamespaces parameter that was merged
earlier but never got released.
2020-06-17 16:27:43 -04:00
Tim Gross c14a75bfab multiregion: use pending instead of paused
The `paused` state is used as an operator safety mechanism, so that they can
debug a deployment or halt one that's causing a wider failure. By using the
`paused` state as the first state of a multiregion deployment, we risked
resuming an intentionally operator-paused deployment because of activity in a
peer region.

This changeset replaces the use of the `paused` state with a `pending` state,
and provides a `Deployment.Run` internal RPC to replace the use of the
`Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.
2020-06-17 11:06:14 -04:00
Tim Gross fd50b12ee2 multiregion: integrate with deploymentwatcher
* `nextRegion` should take status parameter
* thread Deployment/Job RPCs thru `nextRegion`
* add `nextRegion` calls to `deploymentwatcher`
* use a better description for paused for peer
2020-06-17 11:06:00 -04:00
Tim Gross 7b12445f29 multiregion: change AutoRevert to OnFailure 2020-06-17 11:05:45 -04:00
Tim Gross 5c4d0a73f4 start all but first region deployment in paused state 2020-06-17 11:05:34 -04:00
Tim Gross 48e9f75c1e multiregion: deploymentwatcher hooks
This changeset establishes hooks in deploymentwatcher for multiregion
deployments (for the enterprise version of Nomad).
2020-06-17 11:05:18 -04:00
Tim Gross b09b7a2475 Multiregion job registration
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
2020-06-17 11:04:58 -04:00
Drew Bailey 9263fcb0d3 Multiregion deploy status and job status CLI 2020-06-17 11:03:34 -04:00
Tim Gross 473a0f1d44 multiregion: unblock and cancel RPCs 2020-06-17 11:02:26 -04:00
Tim Gross ede3a4f1c4 multiregion: request structs 2020-06-17 11:00:34 -04:00
Tim Gross 6851024925 Multiregion structs
Initial struct definitions, jobspec parsing, validation, and conversion
between Nomad structs and API structs for multi-region deployments.
2020-06-17 11:00:14 -04:00
Chris Baker 1e3563e08c wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register 2020-06-16 18:45:17 +00:00
Chris Baker aeb3ed449e wip: added .PreviousCount to api.ScalingEvent and structs.ScalingEvent, with developmental tests 2020-06-15 19:40:21 +00:00
Mahmood Ali c17ffb2d35
Merge pull request #8131 from hashicorp/f-snapshot-restore
Implement snapshot restore
2020-06-15 08:32:34 -04:00
Lang Martin 069840bef8
scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138)
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect

* scheduler/reconcile: thread follupEvalIDs through to results.stop

* scheduler/reconcile: comment typo

* nomad/_test: correct arguments for plan.AppendStoppedAlloc

* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
2020-06-09 17:13:53 -04:00
Mahmood Ali 9eb13ae144 basic snapshot restore 2020-06-07 15:46:23 -04:00
Mahmood Ali de44d9641b
Merge pull request #8047 from hashicorp/f-snapshot-save
API for atomic snapshot backups
2020-06-01 07:55:16 -04:00
Mahmood Ali 30ab9c84e5 more review feedback 2020-05-31 21:39:09 -04:00
Mahmood Ali a73cd01a00
Merge pull request #8001 from hashicorp/f-jobs-list-across-nses
endpoint to expose all jobs across all namespaces
2020-05-31 21:28:03 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 2108681c1d Endpoint for snapshotting server state 2020-05-21 20:04:38 -04:00
Seth Hoenig f6c8db8a8a consul/connect: use task kind to get service name
Fixes #8000

When requesting a Service Identity token from Consul, use the TaskKind
of the Task to get at the service name associated with the task. In
the past using the TaskName worked because it was generated as a sidecar
task with a name that included the service. In the Native context, we
need to get at the service name in a more correct way, i.e. using the
TaskKind which is defined to include the service name.
2020-05-18 13:46:00 -06:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00
Tim Gross 2082cf738a
csi: support for VolumeContext and VolumeParameters (#7957)
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.

This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` struct that we turn
into a CSI RPC is accumulating arguments, so moving it into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
2020-05-15 08:16:01 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali 3b4116e0db
Merge pull request #7894 from hashicorp/b-cronexpr-dst-fix
Fix Daylight saving transition handling
2020-05-12 16:36:11 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 938e916d9c When serializing msgpack, only consider codec tag
When serializing structs with msgpack, only consider type tags of
`codec`.

Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting
`codec` tag if it's available, but falls to using `json` if `codec`
isn't present.

This behavior is surprising in cases where we want to serialize json
differently from msgpack, e.g. serializing `ConsulExposeConfig`.
2020-05-11 14:14:10 -04:00
Mahmood Ali b4fa8e9588 codec: we use hashicorp/go-msgpack exclusively
No need to maintain two msgpack handles!
2020-05-11 14:05:29 -04:00
Mahmood Ali 2c963885b0 handle upgrade path and defaults
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.

The scheduler already treats `""` value as binpack.  This PR merely
ensures that the operator API returns the effective value.
2020-05-09 12:34:08 -04:00
Mahmood Ali 57435950d7 Update current DST and some code style issues 2020-05-07 19:27:05 -04:00
Mahmood Ali c8fb132956 Update cronexpr to point to hashicorp/cronexpr 2020-05-07 17:50:45 -04:00
Mahmood Ali 507c0b8f64 tests for periodic job scheduling and DST 2020-05-07 17:36:59 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00