open-nomad

Commit Graph

Author	SHA1	Message	Date
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event steam without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	9a2b1bd63a	api comments	2020-10-14 12:44:38 -04:00
Drew Bailey	400455d302	Events/eval alloc events (#9012 ) * generic eval update event first pass at alloc client update events * api/event client	2020-10-14 12:44:37 -04:00
Dave May	561637c063	Merge pull request #9034 from hashicorp/dmay-debug-metrics Add metrics command / output to debug bundle	2020-10-06 11:47:09 -04:00
davemay99	67b4161411	added comment to operator metrics function	2020-10-06 11:22:10 -04:00
davemay99	18aa30c00f	metrics return bytes instead of string for more flexibility	2020-10-06 10:49:15 -04:00
davemay99	603cc1776c	Add metrics command / output to debug bundle	2020-10-05 22:30:01 -04:00
Chris Baker	5c68f53c24	updated api tests wrt backwards compat on null chars in IDs	2020-10-05 18:01:50 +00:00
Michael Schurter	765473e8b0	jobspec: lower min cpu resources from 10->1 Since CPU resources are usually a soft limit it is desirable to allow setting it as low as possible to allow tasks to run only in "idle" time. Setting it to 0 is still not allowed to avoid potential unintentional side effects with allowing a zero value. While there may not be any side effects this commit attempts to minimize risk by avoiding the issue. This does not change the defaults.	2020-09-30 12:15:13 -07:00
Luiz Aoqui	88d4eecfd0	add scaling policy type	2020-09-29 17:57:46 -04:00
Mahmood Ali	f5700611c0	api: target servers for allocation requests (#8897 ) Allocation requests should target servers, which then can forward the request to the appropriate clients. Contacting clients directly is fragile and prune to failures: e.g. clients maybe firewalled and not accessible from the API client, or have some internal certificates not trusted by the API client. FWIW, in contexts where we anticipate lots of traffic (e.g. logs, or exec), the api package attempts contacting the client directly but then fallsback to using the server. This approach seems excessive in these simple GET/PUT requests. Fixes #8894	2020-09-16 09:34:17 -04:00
Benjamin Buzbee	4c91f20c44	Add API support for cancelation contexts passed via QueryOptions and WriteOptions (#8836 ) Copy Consul API's format: QueryOptions.WithContext(context) will now return a new QueryOption whose HTTP requests will be canceled with the context provided (and similar for WriteOptions)	2020-09-09 16:22:07 -04:00
Jasmine Dahilig	71a694f39c	Merge pull request #8390 from hashicorp/lifecycle-poststart-hook task lifecycle poststart hook	2020-08-31 13:53:24 -07:00
Tim Gross	b77fe023b5	MRD: move 'job stop -global' handling into RPC (#8776 ) The initial implementation of global job stop for MRD looped over all the regions in the CLI for expedience. This changeset includes the OSS parts of moving this into the RPC layer so that API consumers don't have to implement this logic themselves.	2020-08-28 14:28:13 -04:00
Lang Martin	7d483f93c0	csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699 ) * nomad/structs/csi: add explicit job support * nomad/state/state_store: capture job updates directly * api/nodes: CSIInfo needs the AllocID * command/agent/csi_endpoint: AllocID was missing Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-08-27 17:20:00 -04:00
Tim Gross	4eba6be8b1	tests: allow API test server to respect NOMAD_TEST_LOG_LEVEL (#8760 )	2020-08-27 14:36:17 -04:00
Seth Hoenig	c4fa644315	consul/connect: remove envoy dns option from gateway proxy config	2020-08-24 09:11:55 -05:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
James Rasell	b6a884a7cb	Merge pull request #8636 from hashicorp/f-gh-8142 api: add node purge SDK function.	2020-08-17 09:45:54 +02:00
James Rasell	edce647da1	api: add node purge SDK function.	2020-08-14 08:40:03 +01:00
Lang Martin	c82b2a2454	CSI: volume and plugin allocations in the API (#8590 ) * command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints	2020-08-11 12:24:41 -04:00
Tim Gross	443fdaa86b	csi: nomad volume detach command (#8584 ) The soundness guarantees of the CSI specification leave a little to be desired in our ability to provide a 100% reliable automated solution for managing volumes. This changeset provides a new command to bridge this gap by providing the operator the ability to intervene. The command doesn't take an allocation ID so that the operator doesn't have to keep track of alloc IDs that may have been GC'd. Handle this case in the unpublish RPC by sending the client RPC for all the terminal/nil allocs on the selected node.	2020-08-11 10:18:54 -04:00
Seth Hoenig	fd4804bf26	consul: able to set pass/fail thresholds on consul service checks This change adds the ability to set the fields `success_before_passing` and `failures_before_critical` on Consul service check definitions. This is a feature added to Consul v1.7.0 and later. https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical Nomad doesn't do much besides pass the fields through to Consul. Fixes #6913	2020-08-10 14:08:09 -05:00
Drew Bailey	bd421b6197	Merge pull request #8453 from hashicorp/oss-multi-vault-ns oss compoments for multi-vault namespaces	2020-07-27 08:45:22 -04:00
Drew Bailey	b296558b8e	oss compoments for multi-vault namespaces adds in oss components to support enterprise multi-vault namespace feature upgrade specific doc on vault multi-namespaces vault docs update test to reflect new error	2020-07-24 10:14:59 -04:00
James Rasell	da91e1d0fc	api: add namespace to scaling status GET response object.	2020-07-24 11:19:25 +02:00
Mahmood Ali	ec9a12e54e	Fix pro tags	2020-07-17 11:02:00 -04:00
Jasmine Dahilig	9e27231953	add poststart hook to task hook coordinator & structs	2020-07-08 11:01:35 -07:00
Tim Gross	b6d9e546aa	fix volume deregister -force params in API (#8380 ) The CSI `volume deregister -force` flag was using the documented parameter `force` everywhere except in the API, where it was incorrectly passing `purge` as the query parameter.	2020-07-08 10:08:22 -04:00
Chris Baker	bdbb066ec8	fixed api tests for changes	2020-07-04 19:23:58 +00:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325 ) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295 ) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Nick Ethier	89118016fc	command: correctly show host IP in ports output /w multi-host networks (#8289 )	2020-06-25 15:16:01 -04:00
Seth Hoenig	6c5ab7f45e	consul/connect: split connect native flag and task in service	2020-06-23 10:22:22 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically and injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Nick Ethier	a87e91e971	test: fix up testing around host networks	2020-06-19 13:53:31 -04:00
Drew Bailey	c2d7b61939	allow raw body instead of JSON encoded string (#8211 )	2020-06-19 10:57:09 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Tim Gross	8a354f828f	store ACL Accessor ID from Job.Register with Job (#8204 ) In multiregion deployments when ACLs are enabled, the deploymentwatcher needs an appropriately scoped ACL token with the same `submit-job` rights as the user who submitted it. The token will already be replicated, so store the accessor ID so that it can be retrieved by the leader.	2020-06-19 07:53:29 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	7a33a75449	cli: jobs allow querying jobs in all namespaces	2020-06-17 16:31:01 -04:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Tim Gross	b93efc16d5	multiregion CLI: nomad deployment unblock	2020-06-17 11:03:44 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	9fc66bc1aa	support in API client and Job.Register RPC for PreserveCounts	2020-06-16 18:45:28 +00:00
Chris Baker	377f881fbd	removed api.RegisterJobRequest in favor of api.JobRegisterRequest modified `job inspect` and `job run -output` to use anonymous struct to keep previous behavior	2020-06-16 18:45:17 +00:00
Chris Baker	b22b2bf968	wip: developmental test to preserve existing task group counts during job update	2020-06-16 18:45:17 +00:00
Chris Baker	aeb3ed449e	wip: added .PreviousCount to api.ScalingEvent and structs.ScalingEvent, with developmental tests	2020-06-15 19:40:21 +00:00
Mahmood Ali	63e048e972	clarify ccomments, esp related to leadership code	2020-06-09 12:01:31 -04:00
Mahmood Ali	5cf04b5762	api: add snapshot restore	2020-06-07 15:47:07 -04:00
Mahmood Ali	3bb2e4a447	Merge pull request #8083 from hashicorp/test-deflake-20200531 More Test deflaking - 2020-05-31 edition	2020-06-01 09:28:45 -04:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	b102ee164b	tests: terminate agent gracefully Ensure that api test agent is terminated gracefully. This is desired for two purposes: First, to ensure that the logs are fully flished out. If the agent is killed mid log line and go test doesn't emit a new line before `--- PASS:` indicator, the test may be marked as failed, even if it passed. Sample failure is https://circleci.com/gh/hashicorp/nomad/72360 . Second, ensure that the agent terminates any auxiliary processes (e.g. logmon, tasks).	2020-05-31 10:35:37 -04:00
Drew Bailey	5948c4f497	Revert "disable license cli commands"	2020-05-26 12:39:39 -04:00
Mahmood Ali	b548f603dc	Add api/ package function to save snapshot	2020-05-21 20:04:38 -04:00
Seth Hoenig	f136afc04f	api: canonicalize connect components Add `Canonicalize` methods to the connect components of a service definition in the `api` package. Without these, we have been relying on good input for the connect stanza. Fixes #7993	2020-05-19 11:47:22 -06:00
Mahmood Ali	63b87a9a89	update api/ JobListStub	2020-05-19 09:58:19 -04:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
James Rasell	01236e842b	api: tidy Go module to remove unused modules.	2020-05-18 09:56:23 +02:00
Tim Gross	2082cf738a	csi: support for VolumeContext and VolumeParameters (#7957 ) The MVP for CSI in the 0.11.0 release of Nomad did not include support for opaque volume parameters or volume context. This changeset adds support for both. This also moves args for ControllerValidateCapabilities into a struct. The CSI plugin `ControllerValidateCapabilities` struct that we turn into a CSI RPC is accumulating arguments, so moving it into a request struct will reduce the churn of this internal API, make the plugin code more readable, and make this method consistent with the other plugin methods in that package.	2020-05-15 08:16:01 -04:00
Chris Baker	11d8fb4b16	the api.ScalingEvent struct was missing the .Count field	2020-05-13 20:44:53 +00:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Mahmood Ali	3b4116e0db	Merge pull request #7894 from hashicorp/b-cronexpr-dst-fix Fix Daylight saving transition handling	2020-05-12 16:36:11 -04:00
Mahmood Ali	326793939e	vendor: use tagged cronexpr, v1.1.0 Also, update to the version with modification notice	2020-05-12 16:20:00 -04:00
Chris Baker	fce0e81d8c	Merge pull request #7928 from hashicorp/license-drop-tags remove tags from api struct	2020-05-12 08:40:02 -05:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Drew Bailey	64b58775ab	remove tags from api struct	2020-05-11 16:38:35 -04:00
Drew Bailey	466e8d5043	disable license cli commands	2020-05-11 13:49:29 -04:00
Mahmood Ali	57435950d7	Update current DST and some code style issues	2020-05-07 19:27:05 -04:00
Mahmood Ali	c8fb132956	Update cronexpr to point to hashicorp/cronexpr	2020-05-07 17:50:45 -04:00
Michael Lange	1a9631dbfa	Add ControllersExpected to the PluginListStub	2020-05-07 10:01:52 -07:00
Drew Bailey	48c451709e	update license command output to reflect api changes	2020-05-05 10:28:58 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Drew Bailey	acacecc67b	add license reset command to commands help text formatting remove reset no signed option	2020-04-30 14:46:20 -04:00
Drew Bailey	74abe6ef48	license cli commands cli changes, formatting	2020-04-30 14:46:17 -04:00
Yoan Blanc	417c2995c9	api: fix some documentation typos Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-04-27 10:25:29 +02:00
Yoan Blanc	790df29996	api: testify v1.5.1 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-04-11 13:55:10 +02:00
Tim Gross	966286fee5	fix encoding/decoding tags for api.Task (#7620 ) When `nomad job inspect` encodes the response, if the decoded JSON from the API doesn't exactly match the API struct, the field value will be omitted even if it has a value. We only want the JSON struct tag to `omitempty`.	2020-04-03 16:45:49 -04:00
Chris Baker	8ec252e627	added indices to the job scaling events, so we could properly do blocking queries on the job scaling status	2020-04-01 17:28:19 +00:00
Chris Baker	b2ab42afbb	scaling api: more testing around the scaling events api	2020-04-01 16:39:23 +00:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	14c7cebdea	connect: enable automatic expose paths for individual group service checks Part of #6120 Building on the support for enabling connect proxy paths in #7323, this change adds the ability to configure the 'service.check.expose' flag on group-level service check definitions for services that are connect-enabled. This is a slight deviation from the "magic" that Consul provides. With Consul, the 'expose' flag exists on the connect.proxy stanza, which will then auto-generate expose paths for every HTTP and gRPC service check associated with that connect-enabled service. A first attempt at providing similar magic for Nomad's Consul Connect integration followed that pattern exactly, as seen in #7396. However, on reviewing the PR we realized having the `expose` flag on the proxy stanza inseperably ties together the automatic path generation with every HTTP/gRPC defined on the service. This makes sense in Consul's context, because a service definition is reasonably associated with a single "task". With Nomad's group level service definitions however, there is a reasonable expectation that a service definition is more abstractly representative of multiple services within the task group. In this case, one would want to define checks of that service which concretely make HTTP or gRPC requests to different underlying tasks. Such a model is not possible with the course `proxy.expose` flag. Instead, we now have the flag made available within the check definitions themselves. By making the expose feature resolute to each check, it is possible to have some HTTP/gRPC checks which make use of the envoy exposed paths, as well as some HTTP/gRPC checks which make use of some orthongonal port-mapping to do checks on some other task (or even some other bound port of the same task) within the task group. Given this example, group "server-group" { network { mode = "bridge" port "forchecks" { to = -1 } } service { name = "myserver" port = 2000 connect { sidecar_service { } } check { name = "mycheck-myserver" type = "http" port = "forchecks" interval = "3s" timeout = "2s" method = "GET" path = "/classic/responder/health" expose = true } } } Nomad will automatically inject (via job endpoint mutator) the extrapolated expose path configuration, i.e. expose { path { path = "/classic/responder/health" protocol = "http" local_path_port = 2000 listener_port = "forchecks" } } Documentation is coming in #7440 (needs updating, doing next) Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6 which will make the examples in the documentation actually runnable. Will add some e2e tests based on the above when it becomes available.	2020-03-31 17:15:50 -06:00
Seth Hoenig	41244c5857	jobspec: parse multi expose.path instead of explicit slice	2020-03-31 17:15:27 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Tim Gross	f849d2cb5e	api: prevent panic if volume has nil allocs (#7486 )	2020-03-25 09:45:51 -04:00
Mahmood Ali	ceed57b48f	per-task restart policy	2020-03-24 17:00:41 -04:00
Chris Baker	5979d6a81e	more testing for ScalingPolicy, mainly around parsing and canonicalization for Min/Max	2020-03-24 19:43:50 +00:00
Chris Baker	aa5beafe64	Job.Scale should not result in job update or eval create if args.Count == nil plus tests	2020-03-24 17:36:06 +00:00
Chris Baker	bc13bfb433	bad conversion between api.ScalingPolicy and structs.ScalingPolicy meant that we were throwing away .Min if provided	2020-03-24 14:39:06 +00:00
Chris Baker	ab4d174319	added new int64ToPtr method to api/util to avoid pulling in other packages	2020-03-24 14:39:05 +00:00
Chris Baker	f6ec5f9624	made count optional during job scaling actions added ACL protection in Job.Scale in Job.Scale, only perform a Job.Register if the Count was non-nil	2020-03-24 14:39:05 +00:00
Chris Baker	233db5258a	changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults. - tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent) - Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and	2020-03-24 13:57:17 +00:00
Chris Baker	5373d503c7	scaling api: put api.* objects in agreement with structs.* objects	2020-03-24 13:57:16 +00:00
Chris Baker	00092a6c29	fixed http endpoints for job.register and job.scalestatus	2020-03-24 13:57:16 +00:00
Chris Baker	925b59e1d2	wip: scaling status return, almost done	2020-03-24 13:57:15 +00:00
Chris Baker	42270d862c	wip: some tests still failing updating job scaling endpoints to match RFC, cleaning up the API object as well	2020-03-24 13:57:14 +00:00
Chris Baker	abc7a52f56	finished refactoring state store, schema, etc	2020-03-24 13:57:14 +00:00
Luiz Aoqui	47d35489d6	wip: use testify in job scaling tests	2020-03-24 13:57:13 +00:00
Luiz Aoqui	d4b6e4b258	wip: add tests for job scale method	2020-03-24 13:57:12 +00:00
Luiz Aoqui	c74c01a643	wip: add scaling policies methods to the client	2020-03-24 13:57:12 +00:00
Chris Baker	3d54f1feba	wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request	2020-03-24 13:57:12 +00:00
Chris Baker	84953a1ed7	wip: remove PolicyOverride from scaling request	2020-03-24 13:57:11 +00:00
Chris Baker	024d203267	wip: added tests for client methods around group scaling	2020-03-24 13:57:11 +00:00
Chris Baker	1c5c2eb71b	wip: add GET endpoint for job group scaling target	2020-03-24 13:57:10 +00:00
Luiz Aoqui	6699d5e536	wip: add job scale endpoint in client	2020-03-24 13:57:10 +00:00
Chris Baker	8453e667c2	wip: working on job group scaling endpoint	2020-03-24 13:55:20 +00:00
Chris Baker	65d92f1fbf	WIP: adding ScalingPolicy to api/structs and state store	2020-03-24 13:55:18 +00:00
Lang Martin	e100444740	csi: add mount_options to volumes and volume requests (#7398 ) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Lang Martin	99841222ed	csi: change the API paths to match CLI command layout (#7325 ) * command/agent/csi_endpoint: support type filter in volumes & plugins * command/agent/http: use /v1/volume/csi & /v1/plugin/csi * api/csi: use /v1/volume/csi & /v1/plugin/csi * api/nodes: use /v1/volume/csi & /v1/plugin/csi * api/nodes: not /volumes/csi, just /volumes * command/agent/csi_endpoint: fix ot parameter parsing	2020-03-23 13:58:30 -04:00
Lang Martin	80619137ab	csi: volumes listed in `nomad node status` (#7318 ) * api/allocations: GetTaskGroup finds the taskgroup struct * command/node_status: display CSI volume names * nomad/state/state_store: new CSIVolumesByNodeID * nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator * nomad/csi_endpoint: deal with a slice of volumes * nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator * nomad/structs/csi: CSIVolumeListRequest takes a NodeID * nomad/csi_endpoint: use the return iterator * command/agent/csi_endpoint: parse query params for CSIVolumes.List * api/nodes: new CSIVolumes to list volumes by node * command/node_status: use the new list endpoint to print volumes * nomad/state/state_store: error messages consider the operator * command/node_status: include the Provider	2020-03-23 13:58:30 -04:00
Tim Gross	de4ad6ca38	csi: add Provider field to CSI CLIs and APIs (#7285 ) Derive a provider name and version for plugins (and the volumes that use them) from the CSI identity API `GetPluginInfo`. Expose the vendor name as `Provider` in the API and CLI commands.	2020-03-23 13:58:30 -04:00
Lang Martin	887e1f28c9	csi: CLI for volume status, registration/deregistration and plugin status (#7193 ) * command/csi: csi, csi_plugin, csi_volume * helper/funcs: move ExtraKeys from parse_config to UnusedKeys * command/agent/config_parse: use helper.UnusedKeys * api/csi: annotate CSIVolumes with hcl fields * command/csi_plugin: add Synopsis * command/csi_volume_register: use hcl.Decode style parsing * command/csi_volume_list * command/csi_volume_status: list format, cleanup * command/csi_plugin_list * command/csi_plugin_status * command/csi_volume_deregister * command/csi_volume: add Synopsis * api/contexts/contexts: add csi search contexts to the constants * command/commands: register csi commands * api/csi: fix struct tag for linter * command/csi_plugin_list: unused struct vars * command/csi_plugin_status: unused struct vars * command/csi_volume_list: unused struct vars * api/csi: add allocs to CSIPlugin * command/csi_plugin_status: format the allocs * api/allocations: copy Allocation.Stub in from structs * nomad/client_rpc: add some error context with Errorf * api/csi: collapse read & write alloc maps to a stub list * command/csi_volume_status: cleanup allocation display * command/csi_volume_list: use Schedulable instead of Healthy * command/csi_volume_status: use Schedulable instead of Healthy * command/csi_volume_list: sprintf string * command/csi: delete csi.go, csi_plugin.go * command/plugin: refactor csi components to sub-command plugin status * command/plugin: remove csi * command/plugin_status: remove csi * command/volume: remove csi * command/volume_status: split out csi specific * helper/funcs: add RemoveEqualFold * command/agent/config_parse: use helper.RemoveEqualFold * api/csi: do ,unusedKeys right * command/volume: refactor csi components to `nomad volume` * command/volume_register: split out csi specific * command/commands: use the new top level commands * command/volume_deregister: hardwired type csi for now * command/volume_status: csiFormatVolumes rescued from volume_list * command/plugin_status: avoid a panic on no args * command/volume_status: avoid a panic on no args * command/plugin_status: predictVolumeType * command/volume_status: predictVolumeType * nomad/csi_endpoint_test: move CreateTestPlugin to testing * command/plugin_status_test: use CreateTestCSIPlugin * nomad/structs/structs: add CSIPlugins and CSIVolumes search consts * nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix * nomad/search_endpoint: add CSIPlugins and CSIVolumes * command/plugin_status: move the header to the csi specific * command/volume_status: move the header to the csi specific * nomad/state/state_store: CSIPluginByID prefix * command/status: rename the search context to just Plugins/Volumes * command/plugin,volume_status: test return ids now * command/status: rename the search context to just Plugins/Volumes * command/plugin_status: support -json and -t * command/volume_status: support -json and -t * command/plugin_status_csi: comments * command/_status: clean up text api/csi: fix stale comments * command/volume: make deregister sound less fearsome * command/plugin_status: set the id length * command/plugin_status_csi: more compact plugin health * command/volume: better error message, comment	2020-03-23 13:58:30 -04:00
Lang Martin	369b0e54b9	csi: volumes use `Schedulable` rather than `Healthy` (#7250 ) * structs: add ControllerRequired, volume.Name, no plug.Type * structs: Healthy -> Schedulable * state_store: Healthy -> Schedulable * api: add ControllerRequired to api data types * api: copy csi structs changes * nomad/structs/csi: include name and external id * api/csi: include Name and ExternalID * nomad/structs/csi: comments for the 3 ids	2020-03-23 13:58:30 -04:00
Lang Martin	a4784ef258	csi add allocation context to fingerprinting results (#7133 ) * structs: CSIInfo include AllocID, CSIPlugins no Jobs * state_store: eliminate plugin Jobs, delete an empty plugin * nomad/structs/csi: detect empty plugins correctly * client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID * client/pluginmanager/csimanager/instance: allocID * client/pluginmanager/csimanager/fingerprint: set AllocID * client/node_updater: split controller and node plugins * api/csi: remove Jobs The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly. * nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint * client/dynamicplugins: lift AllocID into the struct from Options * api/csi_test: remove Jobs test * nomad/structs/csi: CSIPlugins has an array of allocs * nomad/state/state_store: implement CSIPluginDenormalize * nomad/state/state_store: CSIPluginDenormalize npe on missing alloc * nomad/csi_endpoint_test: defer deleteNodes for clarity * api/csi_test: disable this test awaiting mocks: https://github.com/hashicorp/nomad/issues/7123	2020-03-23 13:58:30 -04:00
Danielle Lancashire	cd5b4923d0	api: Register CSIPlugin before registering a Volume	2020-03-23 13:58:30 -04:00
Lang Martin	88316208a0	csi: server-side plugin state tracking and api (#6966 ) * structs: CSIPlugin indexes jobs acting as plugins and node updates * schema: csi_plugins table for CSIPlugin * nomad: csi_endpoint use vol.Denormalize, plugin requests * nomad: csi_volume_endpoint: rename to csi_endpoint * agent: add CSI plugin endpoints * state_store_test: use generated ids to avoid t.Parallel conflicts * contributing: add note about registering new RPC structs * command: agent http register plugin lists * api: CSI plugin queries, ControllerHealthy -> ControllersHealthy * state_store: copy on write for volumes and plugins * structs: copy on write for volumes and plugins * state_store: CSIVolumeByID returns an unhealthy volume, denormalize * nomad: csi_endpoint use CSIVolumeDenormalizePlugins * structs: remove struct errors for missing objects * nomad: csi_endpoint return nil for missing objects, not errors * api: return meta from Register to avoid EOF error * state_store: CSIVolumeDenormalize keep allocs in their own maps * state_store: CSIVolumeDeregister error on missing volume * state_store: CSIVolumeRegister set indexes * nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests	2020-03-23 13:58:29 -04:00
Lang Martin	6106a388e6	api: csi	2020-03-23 13:58:29 -04:00
Danielle Lancashire	78b7784f2b	api: Include CSI metadata on nodes	2020-03-23 13:58:29 -04:00
Danielle Lancashire	426c26d7c0	CSI Plugin Registration (#6555 ) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shutdown yet, they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Jasmine Dahilig	73a64e4397	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	1485b342e2	remove deadline code for now	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	b69e8e3a42	remove api package dependency on structs package	2020-03-21 17:52:55 -04:00
Jasmine Dahilig	7064deaafb	put lifecycle nil and empty checks in api Canonicalize	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	39b5eb245c	remove api dependency on structs package, copy lifecycle defaults to api package	2020-03-21 17:52:49 -04:00
Jasmine Dahilig	f6e58d6dad	add canonicalize in the right place	2020-03-21 17:52:41 -04:00
Jasmine Dahilig	fc13fa9739	change TaskLifecycle RunLevel to Hook and add Deadline time duration	2020-03-21 17:52:37 -04:00
Mahmood Ali	3b5786ddb3	add lifecycle to api and parser	2020-03-21 17:52:36 -04:00
James Rasell	8a5acf7fd5	Merge pull request #5970 from jrasell/bug-gh-5506 Fix returned EOF error when calling Nodes GC/GcAlloc API	2020-03-12 10:04:17 +01:00
Michael Schurter	2dcc85bed1	jobspec: fixup vault_grace deprecation Followup to #7170 - Moved canonicalization of VaultGrace back into `api/` package. - Fixed tests. - Made docs styling consistent.	2020-03-10 14:58:49 -07:00
Michael Schurter	b72b3e765c	Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade Update consul-template to v0.24.1 and remove deprecated vault grace	2020-03-10 14:15:47 -07:00
Michael Schurter	452b6f004f	Merge pull request #7231 from hashicorp/b-alloc-dev-panic api: fix panic when displaying devices w/o stat	2020-03-09 07:34:59 -07:00
Mahmood Ali	37e0598344	api: alloc exec recovers from bad client connection If alloc exec fails to connect to the nomad client associated with the alloc, fail over to using a server. The code attempted to special case `net.Error` for failover to rule out other permanent non-networking errors, by reusing a pattern in the logging handling. But this pattern does not apply here. `net/http.Http` wraps all errors as `*url.Error` that is net.Error. The websocket doesn't, and instead returns the raw error. If the raw error isn't a `net.Error`, like in the case of TLS handshake errors, the api package would fail immediately rather than failover.	2020-03-04 17:43:00 -05:00
Michael Schurter	ac3db90497	api: fix panic when displaying devices w/o stat "<none>" mathces `node status -verbose` output	2020-02-26 21:24:31 -05:00
Fredrik Hoem Grelland	edb3bd0f3f	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170 )	2020-02-23 16:24:53 +01:00
James Rasell	6463532577	Fix panic when canonicalizing a jobspec with incorrect job type. When canonicalizing the ReschedulePolicy a panic was possible if the passed job type was not valid. This change protects against this possibility, in a verbose way to ensure the code path is clear.	2020-02-21 09:14:36 +01:00
James Rasell	e7eb49fe84	api: check response content length before decoding. The API decodeBody function will now check the content length before attempting to decode. If the length is zero, and the out interface is nil then it is safe to assume the API call is not returning any data to the user. This allows us to better handle passing nil to API calls in a single place.	2020-02-20 10:07:44 +01:00
Mahmood Ali	f492ab6d9e	implement MinQuorum	2020-02-16 16:04:59 -06:00
Seth Hoenig	0e44094d1a	client: enable configuring enable_tag_override for services Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition. Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable. Because Nomad itself acts as a state machine around the the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why. Consider the basic case where there is no Nomad, and your service is provided to consul as a boring JSON file. The ultimate source of truth for the definition of that service is the file, and is stored in the agent. Later, Consul performs "anti-entropy" which synchronizes the Catalog (stored only the leaders). Then with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file, or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed & config reloaded OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth. In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-sync's with Consul will now no longer try to restore the externally modified tag values (as long as enable_tag_override=true). Fixes #2057	2020-02-10 08:00:55 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00

1 2 3 4 5 ...

963 Commits