open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Nick Ethier	3cd5f46613	Update UI to use new allocated ports fields (#8631 ) * nomad: canonicalize alloc shared resources to populate ports * ui: network ports * ui: remove unused task network references and update tests with new shared ports model * ui: lint * ui: revert auto formatting * ui: remove unused page objects * structs: remove unrelated test from bad conflict resolution * ui: formatting	2020-08-20 11:07:13 -04:00
Mahmood Ali	8a342926b7	Respect alloc job version for lost/failed allocs This change fixes a bug where lost/failed allocations are replaced by allocations with the latest versions, even if the version hasn't been promoted yet. Now, when generating a plan for lost/failed allocations, the scheduler first checks if the current deployment is in Canary stage, and if so, it ensures that any lost/failed allocations is replaced one with the latest promoted version instead.	2020-08-19 09:52:48 -04:00
Tim Gross	38ec70eb8d	multiregion: validation should always return error for OSS (#8687 )	2020-08-18 15:35:38 -04:00
Drew Bailey	bd421b6197	Merge pull request #8453 from hashicorp/oss-multi-vault-ns oss compoments for multi-vault namespaces	2020-07-27 08:45:22 -04:00
Drew Bailey	b296558b8e	oss compoments for multi-vault namespaces adds in oss components to support enterprise multi-vault namespace feature upgrade specific doc on vault multi-namespaces vault docs update test to reflect new error	2020-07-24 10:14:59 -04:00
James Rasell	da91e1d0fc	api: add namespace to scaling status GET response object.	2020-07-24 11:19:25 +02:00
Lang Martin	a3bfd8c209	structs: Job.Validate only allows stop_after_client_disconnected on batch and service jobs (#8444 ) * nomad/structs/structs: add to Job.Validate * Update nomad/structs/structs.go Co-authored-by: Mahmood Ali <mahmood@hashicorp.com> * nomad/structs/structs: match error strings to the config file * nomad/structs/structs_test: clarify the test a bit * nomad/structs/structs_test: typo in the test error comparison Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2020-07-20 10:27:25 -04:00
Mahmood Ali	fbfe4ab1bd	Atomic eval insertion with job (de-)registration This fixes a bug where jobs may get "stuck" unprocessed that dispropotionately affect periodic jobs around leadership transitions. When registering a job, the job registration and the eval to process it get applied to raft as two separate transactions; if the job registration succeeds but eval application fails, the job may remain unprocessed. Operators may detect such failure, when submitting a job update and get a 500 error code, and they could retry; periodic jobs failures are more likely to go unnoticed, and no further periodic invocations will be processed until an operator force evaluation. This fixes the issue by ensuring that the job registration and eval application get persisted and processed atomically in the same raft log entry. Also, applies the same change to ensure atomicity in job deregistration. Backward Compatibility We must maintain compatibility in two scenarios: mixed clusters where a leader can handle atomic updates but followers cannot, and a recent cluster processes old log entries from legacy or mixed cluster mode. To handle this constraints: ensure that the leader continue to emit the Evaluation log entry until all servers have upgraded; also, when processing raft logs, the servers honor evaluations found in both spots, the Eval in job (de-)registration and the eval update entries. When an updated server sees mix-mode behavior where an eval is inserted into the raft log twice, it ignores the second instance. I made one compromise in consistency in the mixed-mode scenario: servers may disagree on the eval.CreateIndex value: the leader and updated servers will report the job registration index while old servers will report the index of the eval update log entry. This discripency doesn't seem to be material - it's the eval.JobModifyIndex that matters.	2020-07-14 11:59:29 -04:00
Tim Gross	bd457343de	MRD: all regions should start pending (#8433 ) Deployments should wait until kicked off by `Job.Register` so that we can assert that all regions have a scheduled deployment before starting any region. This changeset includes the OSS fixes to support the ENT work. `IsMultiregionStarter` has no more callers in OSS, so remove it here.	2020-07-14 10:57:37 -04:00
Tim Gross	0ce3c1e942	multiregion: allow empty region DCs (#8426 ) It's supposed to be possible for a region not to have `datacenters` set so that it can use the job's `datacenters` field. This requires that operators use the same DC name across multiple regions, but that's the default client configuration.	2020-07-13 13:34:19 -04:00
Jasmine Dahilig	9e27231953	add poststart hook to task hook coordinator & structs	2020-07-08 11:01:35 -07:00
Chris Baker	a77e012220	better testing of scaling parsing, fixed some broken tests by api changes	2020-07-04 19:32:37 +00:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325 ) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Mahmood Ali	7f460d2706	allocrunner: terminate sidecars in the end This fixes a bug where a batch allocation fails to complete if it has sidecars. If the only remaining running tasks in an allocations are sidecars - we must kill them and mark the allocation as complete.	2020-06-29 15:12:15 -04:00
Mahmood Ali	6605ebd314	Merge pull request #8223 from hashicorp/f-multi-network-validate-ports core: validate port numbers are < 65535	2020-06-26 08:31:01 -04:00
Nick Ethier	89118016fc	command: correctly show host IP in ports output /w multi-host networks (#8289 )	2020-06-25 15:16:01 -04:00
Tim Gross	a449009e9f	multiregion validation fixes (#8265 ) Multi-region jobs need to bypass validating counts otherwise we get spurious warnings in Job.Plan.	2020-06-24 12:18:51 -04:00
Seth Hoenig	3872b493e5	Merge pull request #8011 from hashicorp/f-cnative-host consul/connect: implement initial support for connect native	2020-06-24 10:33:12 -05:00
Michael Schurter	7869ebc587	docs: add comments to structs.Port struct	2020-06-23 11:38:01 -07:00
Michael Schurter	13ed710a04	core: validate port numbers are <= 65535 The scheduler returns a very strange error if it detects a port number out of range. If these would somehow make it to the client they would overflow when converted to an int32 and could cause conflicts.	2020-06-23 11:31:49 -07:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically and injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Nick Ethier	a87e91e971	test: fix up testing around host networks	2020-06-19 13:53:31 -04:00
Nick Ethier	f0ac1f027a	lint: spelling	2020-06-19 11:29:41 -04:00
Tim Gross	b654e1b8a4	multiregion: all regions start in running if no max_parallel (#8209 ) If `max_parallel` is not set, all regions should begin in a `running` state rather than a `pending` state. Otherwise the first region is set to `running` and then all the remaining regions once it enters `blocked. That behavior is technically correct in that we have at most `max_parallel` regions running, but definitely not what a user expects.	2020-06-19 11:17:09 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Tim Gross	8a354f828f	store ACL Accessor ID from Job.Register with Job (#8204 ) In multiregion deployments when ACLs are enabled, the deploymentwatcher needs an appropriately scoped ACL token with the same `submit-job` rights as the user who submitted it. The token will already be replicated, so store the accessor ID so that it can be retrieved by the leader.	2020-06-19 07:53:29 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	4a44deaa5c	CNI Implementation (#7518 )	2020-06-18 11:05:29 -07:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	c14a75bfab	multiregion: use pending instead of paused The `paused` state is used as an operator safety mechanism, so that they can debug a deployment or halt one that's causing a wider failure. By using the `paused` state as the first state of a multiregion deployment, we risked resuming an intentionally operator-paused deployment because of activity in a peer region. This changeset replaces the use of the `paused` state with a `pending` state, and provides a `Deployment.Run` internal RPC to replace the use of the `Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.	2020-06-17 11:06:14 -04:00
Tim Gross	fd50b12ee2	multiregion: integrate with deploymentwatcher * `nextRegion` should take status parameter * thread Deployment/Job RPCs thru `nextRegion` * add `nextRegion` calls to `deploymentwatcher` * use a better description for paused for peer	2020-06-17 11:06:00 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	5c4d0a73f4	start all but first region deployment in paused state	2020-06-17 11:05:34 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	473a0f1d44	multiregion: unblock and cancel RPCs	2020-06-17 11:02:26 -04:00
Tim Gross	ede3a4f1c4	multiregion: request structs	2020-06-17 11:00:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	1e3563e08c	wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register	2020-06-16 18:45:17 +00:00
Chris Baker	aeb3ed449e	wip: added .PreviousCount to api.ScalingEvent and structs.ScalingEvent, with developmental tests	2020-06-15 19:40:21 +00:00
Lang Martin	069840bef8	scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105 ) (#8138 ) * scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect * scheduler/reconcile: thread follupEvalIDs through to results.stop * scheduler/reconcile: comment typo * nomad/_test: correct arguments for plan.AppendStoppedAlloc * scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost\|Reschedules)	2020-06-09 17:13:53 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054 ) * changes necessary to support oss licesning shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Seth Hoenig	f6c8db8a8a	consul/connect: use task kind to get service name Fixes #8000 When requesting a Service Identity token from Consul, use the TaskKind of the Task to get at the service name associated with the task. In the past using the TaskName worked because it was generated as a sidecar task with a name that included the service. In the Native context, we need to get at the service name in a more correct way, i.e. using the TaskKind which is defined to include the service name.	2020-05-18 13:46:00 -06:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Mahmood Ali	3b4116e0db	Merge pull request #7894 from hashicorp/b-cronexpr-dst-fix Fix Daylight saving transition handling	2020-05-12 16:36:11 -04:00
Mahmood Ali	938e916d9c	When serializing msgpack, only consider codec tag When serializing structs with msgpack, only consider type tags of `codec`. Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting `codec` tag if it's available, but falls to using `json` if `codec` isn't present. This behavior is surprising in cases where we want to serialize json differently from msgpack, e.g. serializing `ConsulExposeConfig`.	2020-05-11 14:14:10 -04:00
Mahmood Ali	b4fa8e9588	codec: we use hashicorp/go-msgpack exclusively No need to maintain two msgpack handles!	2020-05-11 14:05:29 -04:00
Mahmood Ali	57435950d7	Update current DST and some code style issues	2020-05-07 19:27:05 -04:00
Mahmood Ali	c8fb132956	Update cronexpr to point to hashicorp/cronexpr	2020-05-07 17:50:45 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Tim Gross	00c9bd7ff0	reorder volume claim batch request raft message (#7871 ) For backwards compatibility during upgrades, new raft message types need to come at the end of the enum.	2020-05-06 08:57:51 -04:00
Lang Martin	28bac139cb	client/heartbeatstop: destroy allocs when disconnected from servers - track lastHeartbeat, the client local time of the last successful heartbeat round trip - track allocations with `stop_after_client_disconnect` configured - trigger allocation destroy (which handles cleanup) - restore heartbeat/killable allocs tracking when allocs are recovered from disk - on client restart, stop those allocs after a grace period if the servers are still partioned	2020-05-01 12:35:49 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794 ) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985 )	2020-04-20 09:28:19 -04:00
Lang Martin	1750426d04	csi: run volume claim GC on `job stop -purge` (#7615 ) * nomad/state/state_store: error message copy/paste error * nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse * nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister * nomad/core_sched: make volumeClaimReap available without a CoreSched * nomad/job_endpoint: Deregister return early if the job is missing * nomad/job_endpoint_test: job Deregistion is idempotent * nomad/core_sched: conditionally ignore alloc status in volumeClaimReap * nomad/job_endpoint: volumeClaimReap all allocations, even running * nomad/core_sched_test: extra argument to collectClaimsToGCImpl * nomad/job_endpoint: job deregistration is not idempotent	2020-04-03 17:37:26 -04:00
Chris Baker	8ec252e627	added indices to the job scaling events, so we could properly do blocking queries on the job scaling status	2020-04-01 17:28:19 +00:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Lang Martin	8d4f39fba1	csi: add node events to report progress mounting and unmounting volumes (#7547 ) * nomad/structs/structs: new NodeEventSubsystemCSI * client/client: pass triggerNodeEvent in the CSIConfig * client/pluginmanager/csimanager/instance: add eventer to instanceManager * client/pluginmanager/csimanager/manager: pass triggerNodeEvent * client/pluginmanager/csimanager/volume: node event on [un]mount * nomad/structs/structs: use storage, not CSI * client/pluginmanager/csimanager/volume: use storage, not CSI * client/pluginmanager/csimanager/volume_test: eventer * client/pluginmanager/csimanager/volume: event on error * client/pluginmanager/csimanager/volume_test: check event on error * command/node_status: remove an extra space in event detail format * client/pluginmanager/csimanager/volume: use snake_case for details * client/pluginmanager/csimanager/volume_test: snake_case details	2020-03-31 17:13:52 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Michael Lange	4707a625d6	Add HostVolumes to the NodeListStub	2020-03-30 17:33:43 -07:00
Mahmood Ali	ceed57b48f	per-task restart policy	2020-03-24 17:00:41 -04:00
Chris Baker	f6ec5f9624	made count optional during job scaling actions added ACL protection in Job.Scale in Job.Scale, only perform a Job.Register if the Count was non-nil	2020-03-24 14:39:05 +00:00
Chris Baker	233db5258a	changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults. - tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent) - Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and	2020-03-24 13:57:17 +00:00
Chris Baker	f9876a487e	finished Job.ScaleStatus RPC, need to work on http endpoint	2020-03-24 13:57:16 +00:00
Chris Baker	925b59e1d2	wip: scaling status return, almost done	2020-03-24 13:57:15 +00:00
Chris Baker	42270d862c	wip: some tests still failing updating job scaling endpoints to match RFC, cleaning up the API object as well	2020-03-24 13:57:14 +00:00
Chris Baker	abc7a52f56	finished refactoring state store, schema, etc	2020-03-24 13:57:14 +00:00
Chris Baker	3d54f1feba	wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request	2020-03-24 13:57:12 +00:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Chris Baker	8453e667c2	wip: working on job group scaling endpoint	2020-03-24 13:55:20 +00:00
Chris Baker	6665d0bfb0	wip: added policy get endpoint, added UUID to policy	2020-03-24 13:55:20 +00:00
Chris Baker	9c2560ceeb	wip: upsert/delete scaling policies on job upsert/delete	2020-03-24 13:55:18 +00:00
Chris Baker	65d92f1fbf	WIP: adding ScalingPolicy to api/structs and state store	2020-03-24 13:55:18 +00:00
Lang Martin	887e1f28c9	csi: CLI for volume status, registration/deregistration and plugin status (#7193 ) * command/csi: csi, csi_plugin, csi_volume * helper/funcs: move ExtraKeys from parse_config to UnusedKeys * command/agent/config_parse: use helper.UnusedKeys * api/csi: annotate CSIVolumes with hcl fields * command/csi_plugin: add Synopsis * command/csi_volume_register: use hcl.Decode style parsing * command/csi_volume_list * command/csi_volume_status: list format, cleanup * command/csi_plugin_list * command/csi_plugin_status * command/csi_volume_deregister * command/csi_volume: add Synopsis * api/contexts/contexts: add csi search contexts to the constants * command/commands: register csi commands * api/csi: fix struct tag for linter * command/csi_plugin_list: unused struct vars * command/csi_plugin_status: unused struct vars * command/csi_volume_list: unused struct vars * api/csi: add allocs to CSIPlugin * command/csi_plugin_status: format the allocs * api/allocations: copy Allocation.Stub in from structs * nomad/client_rpc: add some error context with Errorf * api/csi: collapse read & write alloc maps to a stub list * command/csi_volume_status: cleanup allocation display * command/csi_volume_list: use Schedulable instead of Healthy * command/csi_volume_status: use Schedulable instead of Healthy * command/csi_volume_list: sprintf string * command/csi: delete csi.go, csi_plugin.go * command/plugin: refactor csi components to sub-command plugin status * command/plugin: remove csi * command/plugin_status: remove csi * command/volume: remove csi * command/volume_status: split out csi specific * helper/funcs: add RemoveEqualFold * command/agent/config_parse: use helper.RemoveEqualFold * api/csi: do ,unusedKeys right * command/volume: refactor csi components to `nomad volume` * command/volume_register: split out csi specific * command/commands: use the new top level commands * command/volume_deregister: hardwired type csi for now * command/volume_status: csiFormatVolumes rescued from volume_list * command/plugin_status: avoid a panic on no args * command/volume_status: avoid a panic on no args * command/plugin_status: predictVolumeType * command/volume_status: predictVolumeType * nomad/csi_endpoint_test: move CreateTestPlugin to testing * command/plugin_status_test: use CreateTestCSIPlugin * nomad/structs/structs: add CSIPlugins and CSIVolumes search consts * nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix * nomad/search_endpoint: add CSIPlugins and CSIVolumes * command/plugin_status: move the header to the csi specific * command/volume_status: move the header to the csi specific * nomad/state/state_store: CSIPluginByID prefix * command/status: rename the search context to just Plugins/Volumes * command/plugin,volume_status: test return ids now * command/status: rename the search context to just Plugins/Volumes * command/plugin_status: support -json and -t * command/volume_status: support -json and -t * command/plugin_status_csi: comments * command/_status: clean up text api/csi: fix stale comments * command/volume: make deregister sound less fearsome * command/plugin_status: set the id length * command/plugin_status_csi: more compact plugin health * command/volume: better error message, comment	2020-03-23 13:58:30 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125 ) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via `Node.UpdateAlloc RPC`. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00
Lang Martin	a0a6766740	CSI: Scheduler knows about CSI constraints and availability (#6995 ) * structs: piggyback csi volumes on host volumes for job specs * state_store: CSIVolumeByID always includes plugins, matches usecase * scheduler/feasible: csi volume checker * scheduler/stack: add csi volumes * contributing: update rpc checklist * scheduler: add volumes to State interface * scheduler/feasible: introduce new checker collection tgAvailable * scheduler/stack: taskGroupCSIVolumes checker is transient * state_store CSIVolumeDenormalizePlugins comment clarity * structs: remote TODO comment in TaskGroup Validate * scheduler/feasible: CSIVolumeChecker hasPlugins improve comment * scheduler/feasible_test: set t.Parallel * Update nomad/state/state_store.go Co-Authored-By: Danielle <dani@hashicorp.com> * Update scheduler/feasible.go Co-Authored-By: Danielle <dani@hashicorp.com> * structs: lift ControllerRequired to each volume * state_store: store plug.ControllerRequired, use it for volume health * feasible: csi match fast path remove stale host volume copied logic * scheduler/feasible: improve comments Co-authored-by: Danielle <dani@builds.terrible.systems>	2020-03-23 13:58:29 -04:00
Tim Gross	8673ea5cba	csi: add empty CSI volume publication GC to scheduled core jobs (#7014 ) This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to the leader's loop for scheduling core job evals. Right now this is an empty method body without even a config file stanza. Later changesets will implement the logic of volume publication GC.	2020-03-23 13:58:29 -04:00
Lang Martin	637ce9dfad	structs: new CSIVolume, request types	2020-03-23 13:58:29 -04:00
Danielle Lancashire	426c26d7c0	CSI Plugin Registration (#6555 ) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shutdown yet, they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Jasmine Dahilig	73a64e4397	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	1485b342e2	remove deadline code for now	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	b7f08c9d13	add appropriate lifecycle deadline default of 120s	2020-03-21 17:52:48 -04:00
Mahmood Ali	b880607bad	update scheduler to account for hooks	2020-03-21 17:52:45 -04:00
Jasmine Dahilig	f6e58d6dad	add canonicalize in the right place	2020-03-21 17:52:41 -04:00
Jasmine Dahilig	4498c8c24f	add canonicalization	2020-03-21 17:52:39 -04:00
Jasmine Dahilig	67262d841b	add validation tests and more validation	2020-03-21 17:52:39 -04:00
Mahmood Ali	214d128bd9	it's running now	2020-03-21 17:52:37 -04:00
Jasmine Dahilig	fc13fa9739	change TaskLifecycle RunLevel to Hook and add Deadline time duration	2020-03-21 17:52:37 -04:00
Mahmood Ali	4ebeac721a	update structs with lifecycle	2020-03-21 17:52:36 -04:00
Michael Schurter	2dcc85bed1	jobspec: fixup vault_grace deprecation Followup to #7170 - Moved canonicalization of VaultGrace back into `api/` package. - Fixed tests. - Made docs styling consistent.	2020-03-10 14:58:49 -07:00
Fredrik Hoem Grelland	edb3bd0f3f	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170 )	2020-02-23 16:24:53 +01:00
Seth Hoenig	587a5d4a8d	nomad: make TaskGroup.UsesConnect helper a public helper	2020-01-31 19:05:11 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	93cf770edb	client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks.	2020-01-31 19:03:38 -06:00
Seth Hoenig	2b66ce93bb	nomad: ensure a unique ClusterID exists when leader (gh-6702) Enable any Server to lookup the unique ClusterID. If one has not been generated, and this node is the leader, generate a UUID and attempt to apply it through raft. The value is not yet used anywhere in this changeset, but is a prerequisite for gh-6701.	2020-01-31 19:03:26 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00
Mahmood Ali	9611324654	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	f36cc54efd	actually always canonicalize alloc.Job alloc.Job may be stale as well and need to migrate it. It does cost extra cycles but should be negligible.	2020-01-15 09:02:48 -05:00
Mahmood Ali	b1b714691c	address review comments	2020-01-15 08:57:05 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	4ced73875b	leave acl checking to rpc endpoints fix test expectation test wrapNonJSON	2020-01-09 15:15:08 -05:00
Drew Bailey	46121fe3fd	move shared structs out of client and into nomad	2020-01-09 15:15:05 -05:00
Mahmood Ali	d740d347ce	Migrate old alloc structs on read This commit ensures that Alloc.AllocatedResources is properly populated when read from persistence stores (namely Raft and client state store). The alloc struct may have been written previously by an arbitrary old version that may only populate Alloc.TaskResources.	2020-01-09 08:46:50 -05:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	ae145c9a37	allow only positive shutdown delay more explicit test case, remove select statement	2019-12-16 11:38:30 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Michael Schurter	95fd2643d7	connect: canonicalize before adding sidecar Fixes #6853 Canonicalize jobs first before adding any sidecars. This fixes a bug where sidecar tasks were added without interpolated names and broke validation. Sidecar tasks must be canonicalized independently. Also adds a group network to the mock connect job because it wasn't a valid connect job before!	2019-12-12 20:55:56 -08:00
Buck Doyle	5fcc00d0f9	Add gofmt changes	2019-11-20 12:47:01 -06:00
Buck Doyle	dc9c0d5ead	Add explanatory comment	2019-11-20 11:45:44 -06:00
Buck Doyle	db77a24ed3	Merge branch 'master' into f-policy-json	2019-11-20 11:20:07 -06:00
Michael Schurter	796758b8a5	core: add semver constraint The existing version constraint uses logic optimized for package managers, not schedulers, when checking prereleases: - 1.3.0-beta1 will not satisfy ">= 0.6.1" - 1.7.0-rc1 will not satisfy ">= 1.6.0-beta1" This is due to package managers wishing to favor final releases over prereleases. In a scheduler versions more often represent the earliest release all required features/APIs are available in a system. Whether the constraint or the version being evaluated are prereleases has no impact on ordering. This commit adds a new constraint - `semver` - which will use Semver v2.0 ordering when evaluating constraints. Given the above examples: - 1.3.0-beta1 satisfies ">= 0.6.1" using `semver` - 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver` Since existing jobspecs may rely on the old behavior, a new constraint was added and the implicit Consul Connect and Vault constraints were updated to use it.	2019-11-19 08:40:19 -08:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Luiz Aoqui	5bd7cdd5c3	api: add `StartedAt` in `Node.DrainStrategy`	2019-11-13 17:54:40 -05:00
Nick Ethier	e947aaed4f	nomad: fix bug that didn't allow for multiple connect services in same tg	2019-11-08 04:33:39 -05:00
Danielle	fee482ae6c	Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation volumes: Add support for mount propagation	2019-10-14 14:29:40 +02:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers an don the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a priviliged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a millicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and performs the allocation lookup independently from namespaces. Here, we check that the requested can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect starting to enforce this will be very disruptive and inappropriate for a nomad point release. As such, we maintain current behavior that doesn't require passing the proper namespace in request. A future major release may start enforcing checking declared namespace.	2019-10-08 12:59:22 -04:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design was based due to being uncertain about the future of storage plugins, and to allow maxium flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from and end users perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always needs a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always needs a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Nick Ethier	6a90a9f505	structs: canonicalize tg Services and Networks (#6257 )	2019-09-04 08:55:47 -04:00
Buck Doyle	21ec6a237c	Merge branch 'master' into f-policy-json # Conflicts: # CHANGELOG.md	2019-09-03 09:56:25 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Buck Doyle	8b06712d21	Merge branch 'master' into f-policy-json	2019-08-29 11:11:21 -05:00
Mahmood Ali	3791a70aa9	Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508 Update ugorji/go to latest	2019-08-23 18:29:49 -04:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Tim Gross	a0e923f46c	add optional task field to group service checks	2019-08-20 09:35:31 -04:00
Nick Ethier	24f5a4c276	sidecar_task override in connect admission controller (#6140 ) * structs: use seperate SidecarTask struct for sidecar_task stanza and add merge * nomad: merge SidecarTask into proxy task during connect Mutate hook	2019-08-20 01:22:46 -04:00
Nick Ethier	965f00b2fc	Builtin Admission Controller Framework (#6116 ) * nomad: add admission controller framework * nomad: add admission controller framework and Consul Connect hooks * run admission controllers before checking permissions * client: add default node meta for connect configurables * nomad: remove validateJob func since it has been moved to admission controller * nomad: use new TaskKind type * client: use consts for connect sidecar image and log level * Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> * nomad: add job register test with connect sidecar * Update nomad/job_endpoint_hooks.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-15 11:22:37 -04:00
Preetha Appan	72e45dd01e	More code review feedback	2019-08-12 17:41:40 -05:00
Preetha	76c8a11b31	Apply suggestions from code review Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-12 17:03:30 -05:00
Preetha Appan	219dc05541	Fix type for kind	2019-08-12 14:39:50 -05:00
Preetha Appan	35506c516d	Improve validation logic and add table driven tests	2019-08-12 14:39:50 -05:00
Preetha Appan	d324a9864e	Add validation for kind field if it is a consul connect proxy	2019-08-12 14:39:50 -05:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle Lancashire	6d7b417e54	structs: Add declarations of basic structs for volume support	2019-08-12 15:39:08 +02:00
Preetha Appan	a393ea79e8	Add field "kind" to task for use in connect tasks	2019-08-07 18:43:36 -05:00
Jasmine Dahilig	8d980edd2e	add create and modify timestamps to evaluations (#5881 )	2019-08-07 09:50:35 -07:00
Michael Schurter	d2862b33e6	Merge pull request #6045 from hashicorp/f-connect-groupservice consul: add Connect structs	2019-08-06 15:43:38 -07:00
Michael Schurter	17fd82d6ad	consul: add Connect structs Refactor all Consul structs into {api,structs}/services.go because api/tasks.go didn't make sense anymore and structs/structs.go is gigantic.	2019-08-06 08:15:07 -07:00
Preetha Appan	8b298621ef	Add more comments to clarify job.Stable field	2019-08-05 15:00:53 -05:00
Preetha Appan	e6a496bac0	Code review feedback	2019-07-31 01:04:08 -04:00
Preetha Appan	99eca85206	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Michael Schurter	4501fe3c4d	structs: deepcopy shared alloc resources Also DRY up Networks code by using Networks.Copy	2019-07-31 01:04:06 -04:00
Michael Schurter	fb487358fb	connect: add group.service stanza support	2019-07-31 01:04:05 -04:00
Nick Ethier	a03f6a95a2	structs: refactor network validation to seperate fn	2019-07-31 01:03:16 -04:00
Danielle	1e7571eb85	fix structs comment Co-Authored-By: nickethier <ncethier@gmail.com>	2019-07-31 01:03:16 -04:00
Nick Ethier	aa7c08679e	structs: Add validations for task group networks	2019-07-31 01:03:16 -04:00
Nick Ethier	6c160df689	fix tests from introducing new struct fields	2019-07-31 01:03:16 -04:00
Nick Ethier	8650429e38	Add network stanza to group Adds a network stanza and additional options to the task group level in prep for allowing shared networking between tasks of an alloc.	2019-07-31 01:03:12 -04:00
Buck Doyle	77f5a38c8f	Add parsed rules to policy response	2019-07-25 10:43:57 -05:00
Jasmine Dahilig	2157f6ddf1	add formatting for hcl parsing error messages (#5972 )	2019-07-19 10:04:39 -07:00
Michael Schurter	81b4b6f19b	Merge pull request #5791 from hashicorp/b-plan-snapshotindex nomad: include snapshot index when submitting plans	2019-07-17 09:25:00 -07:00
Lang Martin	c13c97c6c2	structs drop deprecation warning, revert unnecessary comment change	2019-07-10 13:56:20 -04:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	91e139dcb5	structs NodeDeregisterBatchRequestType must go at the end	2019-07-10 13:56:20 -04:00
Lang Martin	683ab8d1d2	structs add NodeDeregisterBatchRequest	2019-07-10 13:56:19 -04:00
Lang Martin	3fb82e83a5	structs add back NodeDeregisterRequest.NodeID, compatibility	2019-07-10 13:56:19 -04:00
Lang Martin	77cf037bff	struct NodeDeregisterRequest has a batch of NodeIDs	2019-07-10 13:56:19 -04:00
Michael Schurter	e10fea1d7a	nomad: include snapshot index when submitting plans Plan application should use a state snapshot at or after the Raft index at which the plan was created otherwise it risks being rejected based on stale data. This commit adds a Plan.SnapshotIndex which is set by workers when submitting plan. SnapshotIndex is set to the Raft index of the snapshot the worker used to generate the plan. Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex. While RefreshIndex informs workers their StateStore is behind the leader's, SnapshotIndex is a way to prevent the leader from using a StateStore behind the worker's. Plan.SnapshotIndex should be considered the lower bound index for consistently handling plan application. Plans must also be committed serially, so Plan N+1 should use a state snapshot containing Plan N. This is guaranteed for plans after the first plan after a leader election. The Raft barrier on leader election ensures the leader's statestore has caught up to the log index at which it was elected. This guarantees its StateStore is at an index > lastPlanIndex.	2019-06-24 12:16:46 -07:00
Mahmood Ali	87173111de	Merge pull request #5746 from hashicorp/b-no-updating-inmem-node set node.StatusUpdatedAt in raft	2019-06-05 19:05:21 -04:00
Lang Martin	d46613ff44	structs check TaskGroup.Update for nil	2019-05-22 12:34:57 -04:00
Lang Martin	10a3fd61b0	comment replace COMPAT 0.7.0 for job.Update with more current info	2019-05-22 12:34:57 -04:00
Lang Martin	67ebcc47dd	structs comment todo DeploymentStatus & DeploymentStatusDescription	2019-05-22 12:34:57 -04:00
Lang Martin	21bf9fdf90	structs job warnings for taskgroup with mixed auto_promote settings	2019-05-22 12:34:57 -04:00
Lang Martin	d27d6f8ede	structs validate requires Canary for AutoPromote	2019-05-22 12:32:08 -04:00
Lang Martin	f23f9fd99e	describe a pending deployment without auto_promote more explicitly	2019-05-22 12:32:08 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	b5fd735960	add update AutoPromote bool	2019-05-22 12:32:08 -04:00
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Preetha	2dcd4291f8	Merge pull request #5702 from hashicorp/f-filter-by-create-index Filter deployments by create index	2019-05-15 21:50:41 -05:00
Michael Schurter	d7e5ace1ed	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Jasmine Dahilig	30d346ca15	Merge pull request #5665 from hashicorp/b-empty-datacenters add non-empty string validation for datacenters	2019-05-13 10:23:26 -07:00
Mahmood Ali	cf1f3625b4	Update ugorji/go to latest Our testing so far indicates that ugorji/go/codec maintains backward compatiblity with the version we are using now, for purposes of Nomad serialization. Using latest ugorji/go allows us to get back to using upstream library, get get the optimizations benefits in RPC paths (including code generation optimizations). ugorji/go introduced two significant changes: * time binary format in `debb8e2d2e`. Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior * ugorji/go started honoring `json` tag as well: v1.1.4 is the latest but has a bug in handling RawString that's fixed in `d09a80c1e0` .	2019-05-09 19:35:58 -04:00
Mahmood Ali	9d3f13e9b3	remove Index field from EmitNodeEventsResponse `Index` is already included as part of `WriteMeta` embedding. This is a backward compatible change: Clients never read the field; and Server refernces to `EmitNodeEventsResponse.Index` would be using the value in `WriteMeta`, which is consistent with other response structs.	2019-05-08 08:42:26 -04:00
Jasmine Dahilig	016495c368	add non-empty string validation for datacenters	2019-05-03 06:48:02 -07:00
Lang Martin	371014b781	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle Lancashire	3409e0be89	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Lang Martin	7de6e28ddc	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	977d33970b	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	7b99488afa	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	eba4e29440	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.	2019-04-19 15:23:48 -04:00
Preetha Appan	22109d1e20	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	07ff740408	fingerprint Constraints and Affinities have Equals, as set	2019-04-11 09:56:22 -04:00
Lang Martin	8f07698c03	structs missed applying a style change from the review	2019-04-11 09:56:22 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	1878bf694e	client Networks Equals is set equality	2019-04-11 09:56:22 -04:00
Lang Martin	e1c91afd19	struct cleanup indentation in RequestedDevice Equals	2019-04-11 09:56:22 -04:00
Lang Martin	0c90efebdc	struct Equals checks for identity before value checking	2019-04-11 09:56:22 -04:00
Lang Martin	1a594b53f6	refactor struts.RequestedDevice to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	ec1ccdeda0	refactor structs.Resource.Networks to have its own Equals NodeResource.Networks uses the same function	2019-04-11 09:56:21 -04:00
Lang Martin	06008465c4	refactor structs.Resource.Devices to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	36f3022246	add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources	2019-04-11 09:56:21 -04:00
Lang Martin	d4567e9909	add structs.Resources Equals	2019-04-11 09:56:21 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	0ba1600545	server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555 ]	2019-04-10 10:34:10 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Cloess #1180	2019-04-10 10:34:10 -05:00
Charlie Voiselle	604c49beb8	Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up Set NextEval when making `failed-follow-up` evals	2019-02-22 14:14:41 -08:00
Charlie Voiselle	006afdca9b	Added comments * caller should created eval id * prev/next eval used in failed-follow-up	2019-02-22 10:22:52 -08:00
Michael Schurter	6580ed668e	client: don't redownload completed artifacts on retries Track the download status of each artifact independently so that if only one of many artifacts fails to download, completed artifacts aren't downloaded again.	2019-02-20 08:45:12 -08:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Preetha	ec92bf673c	Merge pull request #5223 from hashicorp/f-jobs-list-datacenters Add Datacenters to the JobListStub struct	2019-01-24 08:13:30 -06:00
Preetha Appan	38422642cb	Use DesiredState to determine whether to stop sending task events	2019-01-22 16:43:32 -06:00
Michael Lange	ce7bc4f56f	Add Datacenters to the JobsListStub struct So it can be used for filtering the full list of jobs	2019-01-22 11:16:35 -08:00
Mahmood Ali	7bdd43f3e0	api: avoid codegen for syncing Given that the values will rarely change, specially considering that any changes would be backward incompatible change. As such, it's simpler to keep syncing manually in the rare occasion and avoid the syncing code overhead.	2019-01-18 18:52:31 -05:00
Mahmood Ali	253532ec00	api: avoid import nomad/structs pkg nomad/structs is an internal package and imports many libraries (e.g. raft, codec) that are not relevant to api clients, and may cause unnecessary dependency pain (e.g. `github.com/ugorji/go/codec` version is very old now). Here, we add a code generator that imports the relevant constants from `nomad/structs`. I considered using this approach for other structs, but didn't find a quick viable way to reduce duplication. `nomad/structs` use values as struct fields (e.g. `string`), while `api` uses value pointer (e.g. `*string`) instead. Also, sometimes, `api` structs contain deprecated fields or additional documentation, so simple copy-paste doesn't work. For these reasons, I opt to keep the status quo.	2019-01-18 14:51:19 -05:00
Nick Ethier	597b7b751d	tr: add retry /w backoff to stats_hook failure	2019-01-12 12:18:24 -05:00
Alex Dadgar	79cfe26021	vet	2019-01-07 14:49:41 -08:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Danielle Tomlinson	3647b701a6	taskrunner: Emit task events when a hook fails	2018-12-13 18:20:18 +01:00
Alex Dadgar	c918a96490	Warn if IOPS is being used	2018-12-06 16:17:09 -08:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Nick Ethier	29591a7c2e	task_runner: emit event on task exit with exit result details	2018-11-19 22:59:17 -05:00
Danielle Tomlinson	8bf17fe22d	Merge pull request #4875 from hashicorp/f-constraints scheduler: Make != constraints more flexible	2018-11-15 11:04:21 -08:00
Danielle Tomlinson	9c72dafc95	scheduler: Add is_set/is_not_set constraints This adds constraints for asserting that a given attribute or value exists, or does not exist. This acts as a companion to =, or != operators, e.g: ```hcl constraint { attribute = "${attrs.type}" operator = "!=" value = "database" } constraint { attribute = "${attrs.type}" operator = "is_set" } ```	2018-11-15 11:00:32 -08:00
Mahmood Ali	046f098bac	Track Node Device attributes and serve them in API	2018-11-14 14:42:29 -05:00
Alex Dadgar	08dc2ea702	Merge pull request #4867 from hashicorp/b-deployment-progress-deadline Blocked evaluation fixes	2018-11-13 10:29:03 -08:00
Preetha Appan	5f0a9d2cfd	Show preemption output in plan CLI	2018-11-08 09:48:43 -06:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	2d2248e209	Add devices to allocated resources	2018-11-07 10:32:03 -08:00
Alex Dadgar	b1c5d52817	Track jobs by namespace	2018-11-07 10:22:08 -08:00
Preetha Appan	6fdc84cce3	add comment	2018-11-02 18:11:36 -05:00
Preetha Appan	a6b714b81c	update preemption tests to use new node resource structs also includes a fix to remove unnecessary subtraction of network mbits	2018-11-02 17:59:53 -05:00
Preetha	b2b52b1ada	Merge pull request #4794 from hashicorp/f-preemption-systemjobs Preemption for system jobs	2018-11-02 16:28:06 -05:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	fd60e66f86	Plumb alloc resource cache in a few more places. also removed now unused method	2018-11-01 16:44:43 -05:00
Mahmood Ali	9da19c6450	address review comments	2018-10-30 13:58:52 -04:00
Mahmood Ali	4937095389	Allow artifacts checksum interpolation Fixes https://github.com/hashicorp/nomad/issues/4814	2018-10-30 13:24:30 -04:00
Preetha Appan	f1c3eb2792	Introduce interface with multiple implementations for resource distance	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readablity fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	8807c25b11	Modify preemption code to use new style of resource structs	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Preetha Appan	d11064d6ba	structs and API changes to plan and alloc structs needed for preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	9257387a69	Add number of evictions to DesiredUpdates struct to use in CLI/API	2018-10-30 11:06:32 -05:00

... 3 4 5 6 7 ...

1378 Commits