open-nomad

Author	SHA1	Message	Date
Mahmood Ali	0f5539c382	exec: http: close websocket connection gracefully In this loop, we ought to close the websocket connection gracefully when the StreamErrWrapper reaches EOF. Previously, it's possible that that we drop the last few events or skip sending the websocket closure. If `handler(handlerPipe)` returns and `cancel` is called, before the loop here completes processing streaming events, the loop exits prematurely without propagating the last few events. Instead here, the loop continues until we hit `httpPipe` EOF (through `decoder.Decode`), to ensure we process the events to completion.	2021-05-24 13:37:23 -04:00
Tim Gross	084a46e0e5	agent: surface websocket errors in logs The websocket interface used for `alloc exec` has to silently drop client send errors because otherwise those errors would interleave with the streamed output. But we may be able to surface errors that cause terminated websockets a little better in the HTTP server logs.	2021-05-24 09:46:45 -04:00
Nomad Release bot	5be44af07d	Generate files for 1.1.0-rc1 release	2021-05-12 22:43:48 +00:00
Chris Baker	263ddd567c	Node Drain Metadata (#10250 )	2021-05-07 13:58:40 -04:00
Mahmood Ali	102763c979	Support disabling TCP checks for connect sidecar services	2021-05-07 12:10:26 -04:00
Mahmood Ali	4b95f6ef42	api: actually set MemoryOversubscriptionEnabled (#10493 )	2021-05-02 22:53:53 -04:00
Seth Hoenig	d54a606819	Merge pull request #10439 from hashicorp/pick-ent-acls-changes e2e: add e2e tests for consul namespaces on ent with acls	2021-04-28 08:30:08 -06:00
Tim Gross	79f81d617e	licensing: remove raft storage and sync This changeset is the OSS portion of the work to remove the raft storage and sync for Nomad Enterprise.	2021-04-28 10:28:23 -04:00
Seth Hoenig	09cd01a5f3	e2e: add e2e tests for consul namespaces on ent with acls This PR adds e2e tests for Consul Namespaces for Nomad Enterprise with Consul ACLs enabled. Needed to add support for Consul ACL tokens with `namespace` and `namespace_prefix` blocks, which Nomad parses and validates before tossing the token. These bits will need to be picked back to OSS.	2021-04-27 14:45:54 -06:00
Mahmood Ali	ed4aad458c	api: Ignore User provided ParentID (#10424 ) ParentID is an internal field that Nomad sets for dispatched or parameterized jobs. Job submitters should not be able to set it directly, as that messes up children tracking. Fixes #10422 . It specifically stops the scheduler from honoring the ParentID. The reason failure and why the scheduler didn't schedule that job once it was created is very interesting and requires follow up with a more technical issue.	2021-04-23 16:22:17 -04:00
Charlie Voiselle	ef8ca60693	Enable go-sockaddr templating for `network-interface` (#10404 ) Add templating to `network-interface` option. This PR also adds a fast-fail to in the case where an invalid interface is set or produced by the template * add tests and check for valid interface * Add documentation * Incorporate suggestions from code review Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-04-20 13:55:10 -04:00
Seth Hoenig	4e6dbaaec1	Merge pull request #10184 from hashicorp/f-fuzzy-search api: implement fuzzy search API	2021-04-20 09:06:40 -06:00
Seth Hoenig	509490e5d2	e2e: consul namespace tests from nomad ent (cherry-picked from ent without _ent things) This is part 2/4 of e2e tests for Consul Namespaces. Took a first pass at what the parameterized tests can look like, but only on the ENT side for this PR. Will continue to refactor in the next PRs. Also fixes 2 bugs: - Config Entries registered by Nomad Server on job registration were not getting Namespace set - Group level script checks were not getting Namespace set Those changes will need to be copied back to Nomad OSS. Nomad OSS + no ACLs (previously, needs refactor) Nomad ENT + no ACLs (this) Nomad OSS + ACLs (todo) Nomad ENT + ALCs (todo)	2021-04-19 15:35:31 -06:00
Seth Hoenig	1ee8d5ffc5	api: implement fuzzy search API This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy searching objects in Nomad. The fuzzy search endpoint routes requests to the Nomad Server leader, which implements the Search.FuzzySearch RPC method. Requests to the fuzzy search API are based on the api.FuzzySearchRequest object, e.g. { "Text": "ed", "Context": "all" } Responses from the fuzzy search API are based on the api.FuzzySearchResponse object, e.g. { "Index": 27, "KnownLeader": true, "LastContact": 0, "Matches": { "tasks": [ { "ID": "redis", "Scope": [ "default", "example", "cache" ] } ], "evals": [], "deployment": [], "volumes": [], "scaling_policy": [], "images": [ { "ID": "redis:3.2", "Scope": [ "default", "example", "cache", "redis" ] } ] }, "Truncations": { "volumes": false, "scaling_policy": false, "evals": false, "deployment": false } } The API is tunable using the new server.search stanza, e.g. server { search { fuzzy_enabled = true limit_query = 200 limit_results = 1000 min_term_length = 5 } } These values can be increased or decreased, so as to provide more search results or to reduce load on the Nomad Server. The fuzzy search API can be disabled entirely by setting `fuzzy_enabled` to `false`.	2021-04-16 16:36:07 -06:00
Nick Ethier	339c671e29	agent: add test for reserved core config mapping	2021-04-13 13:28:15 -04:00
Nick Ethier	edc0da9040	client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms	2021-04-13 13:28:15 -04:00
Nick Ethier	bed4e92b61	fingerprint: implement client fingerprinting of reservable cores on Linux systems this is derived from the configure cpuset cgroup parent (defaults to /nomad) for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores	2021-04-13 13:28:15 -04:00
Mahmood Ali	6bd2600cd0	Merge pull request #10370 from alrs/command-agent-errs command/agent: fix dropped test errors	2021-04-13 11:40:12 -04:00
Nick Spain	653d84ef68	Add a 'body' field to the check stanza Consul allows specifying the HTTP body to send in a health check. Nomad uses Consul for health checking so this just plumbs the value through to where the Consul API is called. There is no validation that `body` is not used with an incompatible check method like GET.	2021-04-13 09:15:35 -04:00
Lars Lehtonen	d2e7f31906	command/agent: fix dropped test errors	2021-04-13 01:51:24 -07:00
Tim Gross	0892d34ff9	CSI: capability block is required for volume registration	2021-04-08 13:02:24 -04:00
Tim Gross	7d16e49a14	CSI: fix wrong output struct for snapshot list endpoint	2021-04-07 12:00:33 -04:00
Tim Gross	e4f34a96e3	CSI: deletes with API don't have request body Our API client `delete` method doesn't include a request body, but accepts an interface for the response. We were accidentally putting the request body into the response, which doesn't get picked up in unit tests because we're not reading the (always empty) response body anyways.	2021-04-07 12:00:33 -04:00
Tim Gross	34a7b9da5c	CSI: fix wrong RPC name on ListSnapshots	2021-04-07 12:00:33 -04:00
Tim Gross	8af5bd1ad4	CSI: fix decoding error on snapshot create Consumers of the CSI HTTP API are expecting a response object and not a slice of snapshots. Fix the return value.	2021-04-07 12:00:33 -04:00
Tim Gross	69363705a8	CSI: fix HTTP routing for external volume list The HTTP router did not correctly route `/v1/volumes/external` without being explicitly added to the top-level router. Break this out into its own request handler.	2021-04-07 12:00:22 -04:00
Tim Gross	276633673d	CSI: use AccessMode/AttachmentMode from CSIVolumeClaim Registration of Nomad volumes previously allowed for a single volume capability (access mode + attachment mode pair). The recent `volume create` command requires that we pass a list of requested capabilities, but the existing workflow for claiming volumes and attaching them on the client assumed that the volume's single capability was correct and unchanging. Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to set the initial claim value, and add backwards compatibility logic to handle the existing volumes that already have claims without these fields.	2021-04-07 11:24:09 -04:00
Seth Hoenig	fe8fce00d9	consul: minor CR cleanup	2021-04-05 10:10:16 -06:00
Seth Hoenig	f17ba33f61	consul: plubming for specifying consul namespace in job/group This PR adds the common OSS changes for adding support for Consul Namespaces, which is going to be a Nomad Enterprise feature. There is no new functionality provided by this changeset and hopefully no new bugs.	2021-04-05 10:03:19 -06:00
Yoan Blanc	ac0d5d8bd3	chore: bump golangci-lint from v1.24 to v1.39 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2021-04-03 09:50:23 +02:00
Chris Baker	b11a092d2d	added missing import from command/agent	2021-04-02 13:53:28 +00:00
Chris Baker	21bc48ca29	json handles were moved to a new package in #10202 this was unecessary after refactoring, so this moves them back to their original location in package structs	2021-04-02 13:31:10 +00:00
Chris Baker	436d46bd19	Merge branch 'main' into f-node-drain-api	2021-04-01 15:22:57 -05:00
Tim Gross	466b620fa4	CSI: volume snapshot	2021-04-01 11:16:52 -04:00
Tim Gross	71b9daffb9	CSI: fix misleading HTTP test The HTTP test to create CSI volumes depends on having a controller plugin to talk to, but the test was using a node-only plugin, which allows it to silently ignore the missing controller.	2021-03-31 16:37:09 -04:00
Tim Gross	aec5337862	CSI: HTTP handlers for create/delete/list	2021-03-31 16:37:09 -04:00
Tim Gross	43622680fa	test infrastructure for mock client RPCs (#10193 ) This commit includes a new test client that allows overriding the RPC protocols. Only the RPCs that are passed in are registered, which lets you implement a mock RPC in the server tests. This commit includes an example of this for the ClientCSI RPC server.	2021-03-31 16:37:09 -04:00
Mahmood Ali	18b581656d	oversubscription: adds CLI and API support This commit updates the API to pass the MemoryMaxMB field, and the CLI to show the max set for the task. Also, start parsing the MemoryMaxMB in hcl2, as it's set by tags. A sample CLI output; note the additional `Max: ` for "task": ``` $ nomad alloc status 96fbeb0b ID = 96fbeb0b-a0b3-aa95-62bf-b8a39492fd5c [...] Task "cgroup-fetcher" is "running" Task Resources CPU Memory Disk Addresses 0/500 MHz 32 MiB/20 MiB 300 MiB Task Events: [...] Task "task" is "running" Task Resources CPU Memory Disk Addresses 0/500 MHz 176 KiB/20 MiB 300 MiB Max: 30 MiB Task Events: [...] ```	2021-03-30 16:55:58 -04:00
Nick Ethier	daecfa61e6	Merge pull request #10203 from hashicorp/f-cpu-cores Reserved Cores [1/4]: Structs and scheduler implementation	2021-03-29 14:05:54 -04:00
Tim Gross	b0d2eed932	redirect from HTTP root to UI should include query params The OTT feature relies on having a query parameter for a one-time token which gets handled by the UI. We need to make sure that query param is preserved when redirecting from the root URL to the `/ui/` URI.	2021-03-26 14:54:41 -04:00
Chris Baker	770c9cecb5	restored Node.Sanitize() for RPC endpoints multiple other updates from code review	2021-03-26 17:03:15 +00:00
Chris Baker	ff0b9a4d3e	added benchmark test for JSON encoding extensions	2021-03-23 20:23:06 +00:00
Chris Baker	cb540ed691	added tests that the API doesn't leak Node.SecretID added more documentation on JSON encoding to the contributing guide	2021-03-23 18:09:20 +00:00
Drew Bailey	74836b95b2	configuration and oss components for licensing (#10216 ) * configuration and oss components for licensing * vendor sync	2021-03-23 09:08:14 -04:00
Chris Baker	a186badf35	moved JSON handlers and extension code around a bit for proper order of initialization	2021-03-22 14:12:42 +00:00
Chris Baker	dd291e69f4	removed deprecated fields from Drain structs and API node drain: use msgtype on txn so that events are emitted wip: encoding extension to add Node.Drain field back to API responses new approach for hiding Node.SecretID in the API, using `json` tag documented this approach in the contributing guide refactored the JSON handlers with extensions modified event stream encoding to use the go-msgpack encoders with the extensions	2021-03-21 15:30:11 +00:00
Nick Ethier	26b200e8bd	api: add new 'cores' field to task resources	2021-03-18 23:13:30 -04:00
Tim Gross	fa25e048b2	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Tim Gross	9b2b580d1a	CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158 ) Callers of `CSIVolumeByID` are generally assuming they should receive a single volume. This potentially results in feasibility checking being performed against the wrong volume if a volume's ID is a prefix substring of other volume (for example: "test" and "testing"). Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix matching in the command line client. Add the required elements for prefix matching to the commands and API.	2021-03-18 14:32:40 -04:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
James Rasell	e9d81ace7b	Merge pull request #10140 from hashicorp/b-gh-10070 agent: return req error if prometheus metrics are disabled.	2021-03-10 17:11:39 +01:00
Tim Gross	75878f978e	HTTP API support for 'nomad ui -login' Endpoints for requesting and exchanging one-time tokens via the HTTP API. Includes documentation updates.	2021-03-10 08:17:56 -05:00
James Rasell	782350bd19	agent: return req error if prometheus metrics are disabled. If the user has disabled Prometheus metrics and a request is sent to the metrics endpoint requesting Prometheus formatted metrics, then the request should fail.	2021-03-09 15:28:58 +01:00
Andre Ilhicas	f45fc6c899	consul/connect: enable setting local_bind_address in upstream	2021-02-26 11:37:31 +00:00
Drew Bailey	86d9e1ff90	Merge pull request #9955 from hashicorp/on-update-services Service and Check on_update configuration option (readiness checks)	2021-02-24 10:11:05 -05:00
Seth Hoenig	d557d6bf94	consul/connect: Fix bug where connect sidecar services would be unnecessarily re-registered This PR fixes a bug where sidecar services would be re-registered into Consul every ~30 seconds, caused by the parent service having its tags field set and the sidecar_service tags unset. Nomad would directly compare the tags between its copy of the sidecar service definition and the tags of the sidecar service reported by Consul. This does not work, because Consul will under-the-hood set the sidecar service tags to inherit the parent service tags if the sidecar service tags are unset. The comparison then done by Nomad would not match, if the parent sidecar tags are set. Fixes #10025	2021-02-22 12:02:58 -06:00
AndrewChubatiuk	157f8de903	pr comments	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cb84d8ef11	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	4f49614760	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	c4bece2b09	fixed typo	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cb527c5d83	fixed tests	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	abb7fd1e36	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	3e11860b6d	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	cd152643fb	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	af7a16d774	tests fix	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	3d0aa2ef56	allocate sidecar task port on host_network interface	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	47c690479c	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	46c639c5d6	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	77f9b84a30	fixed tests	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	99201412da	removed proxy suffix	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	844ac16900	fixed variable initialization	2021-02-13 02:42:13 +02:00
AndrewChubatiuk	78465bbd23	customized default sidecar checks	2021-02-13 02:42:13 +02:00
Buck Doyle	c22d1114d8	Add handling for license requests in OSS (#9963 ) This changes the license-fetching endpoint to respond with 204 in OSS instead of 501. It closes #9827.	2021-02-08 12:53:06 -06:00
Drew Bailey	82f971f289	OnUpdate configuration for services and checks Allow for readiness type checks by configuring nomad to ignore warnings or errors reported by a service check. This allows the deployment to progress and while Consul handles introducing the sercive into a resource pool once the check passes.	2021-02-08 08:32:40 -05:00
Chris Baker	ce68ee164b	Version 1.0.3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJgEuOKAAoJEFGFLYc0j/xMxF8H/3TTU6Tu+Xm0YvcsDaYDphZ/ X7KQBV0aFiuL5VkTw4PzKEsgryIy9/sqEPyxxyKRowAmos9qhiusjNAIfqdP4TF8 tdZmTedkfWir9uPD+hyv/LXpwbQ2T8kTwS3xHTYvaOmaCxZr710FEn+imnMk1AUn Xs5itkd/CYGr0nBLm+I5GutWSDPmL7Uw8J5Z30fFyoaxoCPAbCWQQNk793SCRUc5 f/uo18V2tFInmQ+3sAdnM4gPewyStK/a5VvzWavL9fVDtYK83wlqWSchTXY5jpVz zNEzt/rYhbBzakPQQKb5zieblh2iGI8aHWpD5w4WduqO2Sg6B/5lAeNZIlW0UJg= =2g3c -----END PGP SIGNATURE----- Merge tag 'v1.0.3' into post-release-1.0.3 Version 1.0.3	2021-01-29 19:30:08 +00:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateay task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Drew Bailey	630babb886	prevent double job status update (#9768 ) * Prevent Job Statuses from being calculated twice https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval insertion iwth job (de-)registration. This change removes a now obsolete guard which checked if the index was equal to the job.CreateIndex, which would empty the status. Now that the job regisration eval insetion is atomic with the registration this check is no longer necessary to set the job statuses correctly. * test to ensure only single job event for job register * periodic e2e * separate job update summary step * fix updatejobstability to use copy instead of modified reference of job * update envoygatewaybindaddresses copy to prevent job diff on null vs empty * set ConsulGatewayBindAddress to empty map instead of nil fix nil assertions for empty map rm unnecessary guard	2021-01-22 09:18:17 -05:00
Dennis Schön	3eaf1432aa	validate connect block allowed only within group.service	2021-01-20 14:34:23 -05:00
Dennis Schön	c376545f49	don't prefix json logging	2021-01-20 09:09:31 -05:00
Nomad Release bot	f36d983863	Generate files for 1.0.2 release	2021-01-13 16:52:51 +00:00
Nick Ethier	6705f845f2	Merge pull request #9739 from hashicorp/b-alloc-netmode-ports Use port's to value when building service address under 'alloc' addr_mode	2021-01-07 09:16:27 -05:00
Nick Ethier	bb060bd46b	command/agent/consul: remove duplicated tests	2021-01-06 14:11:31 -05:00
Kris Hicks	868ba0cea5	consul: Refactor parts of UpdateWorkload (#9737 ) This removes modification of ops in methods that UpdateWorkload calls, keeping them local to UpdateWorkload. It also includes some rewrites of checkRegs for clarity.	2021-01-06 11:11:28 -08:00
Nick Ethier	ab01e19df3	command/agent/consul: use port's to value when building service address under 'alloc' addr_mode	2021-01-06 13:52:48 -05:00
Seth Hoenig	6c9366986b	consul/connect: avoid NPE from unset connect gateway proxy Submitting a job with an ingress gateway in host networking mode with an absent gateway.proxy block would cause the Nomad client to panic on NPE. The consul registration bits would assume the proxy stanza was not nil, but it could be if the user does not supply any manually configured envoy proxy settings. Check the proxy field is not nil before using it. Fixes #9669	2021-01-05 09:27:01 -06:00
Seth Hoenig	40d36fc0ec	agent: revert use of http connlimit https://github.com/hashicorp/nomad/pull/9608 introduced the use of the built-in HTTP 429 response handler provided by go-connlimit. There is concern though around plausible DOS attacks that need to be addressed, so this PR reverts that functionality. It keeps a fix in the tests around the use of an HTTPS enabled client for when the server is listening on HTTPS. Previously, the tests would fail deterministically with io.EOF because that's how the TLS server terminates invalid connections. Now, the result is much less deterministic. The state of the client connection and the server socket depends on when the connection is closed and how far along the handshake was.	2020-12-14 14:40:14 -06:00
Seth Hoenig	0091325721	command: give flag-helpers a better name	2020-12-14 10:07:27 -06:00
Seth Hoenig	d3f1c3adcf	Merge pull request #9608 from hashicorp/f-go-connlimit Use go-connlimit to ratelimit with 429 responses	2020-12-10 11:05:07 -06:00
Seth Hoenig	a28cd45988	client: fix plumbing of testing object into helper	2020-12-10 11:04:38 -06:00
Kris Hicks	0cf9cae656	Apply some suggested fixes from staticcheck (#9598 )	2020-12-10 07:29:18 -08:00
Seth Hoenig	2cc5787f97	client: fix https test cases in client rate limits	2020-12-10 09:20:28 -06:00
Ben Buzbee	a8e4aa76c6	Use new go-connlimit with HTTP 429 response This is essentially a port of Consul's similar fix Changes are: go get -u github.com/hashicorp/go-connlimit go mod vendor Use new HTTP429 handler `20d1ea7d2d`	2020-12-09 17:57:16 -06:00
Kris Hicks	0a3a748053	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Kris Hicks	93155ba3da	Add gocritic to golangci-lint config (#9556 )	2020-12-08 12:47:04 -08:00
Dennis Schön	038f2cce57	return 405 on non-GET requests to /v1/event/stream (fixes #9526 ) (#9564 )	2020-12-08 13:09:20 -05:00
Dave May	be0a14d70b	fix AgentHostRequest panic found in GH-9546 (#9554 ) * debug: refactor nodeclass test * debug: add case to track down SIGSEGV on client to server Agent.Host RPC * verify server to avoid panic on AgentHostRequest RPC call, fixes GH-9546 * simplify Agent.Host RPC lookup logic	2020-12-07 17:34:40 -05:00
Dennis Schön	a9c97d9257	use os.ErrDeadlineExceeded in tests	2020-12-07 10:40:28 -05:00
Drew Bailey	9adca240f8	Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447 ) * upsertaclpolicies * delete acl policies msgtype * upsert acl policies msgtype * delete acl tokens msgtype * acl bootstrap msgtype wip unsubscribe on token delete test that subscriptions are closed after an ACL token has been deleted Start writing policyupdated test * update test to use before/after policy * add SubscribeWithACLCheck to run acl checks on subscribe * update rpc endpoint to use broker acl check * Add and use subscriptions.closeSubscriptionFunc This fixes the issue of not being able to defer unlocking the mutex on the event broker in the for loop. handle acl policy updates * rpc endpoint test for terminating acl change * add comments Co-authored-by: Kris Hicks <khicks@hashicorp.com>	2020-12-01 11:11:34 -05:00
Drew Bailey	a0b7f05a7b	Remove Managed Sinks from Nomad (#9470 ) * Remove Managed Sinks from Nomad Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta period it was determined that this was not a scalable approach to support community and third party sinks. * update comment * changelog	2020-11-30 14:00:31 -05:00
Seth Hoenig	e81e9223ef	consul/connect: enable setting datacenter in connect upstream Before, upstreams could only be defined using the default datacenter. Now, the `datacenter` field can be set in a connect upstream definition, informing consul of the desire for an instance of the upstream service in the specified datacenter. The field is optional and continues to default to the local datacenter. Closes #8964	2020-11-30 10:38:30 -06:00
Tim Gross	4e79ddea45	csi/api: populate ReadAllocs/WriteAllocs fields (#9377 ) The API is missing values for `ReadAllocs` and `WriteAllocs` fields, resulting in allocation claims not being populated in the web UI. These fields mirror the fields in `nomad/structs.CSIVolume`. Returning a separate list of stubs for read and write would be ideal, but this can't be done without either bloating the API response with repeated full `Allocation` data, or causing a panic in previous versions of the CLI. The `nomad/structs` fields are persisted with nil values and are populated during RPC, so we'll do the same in the HTTP API and populate the `ReadAllocs` and `WriteAllocs` fields with a map of allocation IDs, but with null values. The web UI will then create its `ReadAllocations` and `WriteAllocations` fields by mapping from those IDs to the values in `Allocations`, instead of flattening the map into a list.	2020-11-25 16:44:06 -05:00
Seth Hoenig	3c17dc2ecc	api: safely access legacy MBits field	2020-11-23 10:36:10 -06:00
Nick Ethier	9471892df4	mock: add default host network	2020-11-23 10:11:00 -06:00
Chris Baker	7c6071e6c4	updated alloc_endpoint to mutate a copy of the returned allocation, instead of the instance in the state store	2020-11-15 17:52:50 +00:00
Chris Baker	b244d5e949	documenting test for #9367	2020-11-15 17:47:50 +00:00
Seth Hoenig	bb8a5816a0	jobspec: add support for headers in artifact stanza This PR adds the ability to set HTTP headers when downloading an artifact from an `http` or `https` resource. The implementation in `go-getter` is such that a new `HTTPGetter` must be created for each artifact that sets headers (as opposed to conveniently setting headers per-request). This PR maintains the memoization of the default Getter objects, creating new ones only for artifacts where headers are set. Closes #9306	2020-11-13 12:03:54 -06:00
Seth Hoenig	b19bc6be2b	consul: prevent re-registration churn by correctly comparing sidecar tags Previously, connect sidecars would be re-registered with consul every cycle of Nomad's reconciliation loop around Consul service registrations. This is because part of the comparison used `reflect.DeepEqual` on []string objects, which returns false when one object is `[]string{}` and the other is `[]string{}(nil)`. Unforunately, this was always the case, and every Connect sidecar service would be re-registered on every iteration, which happens every 30 seconds.	2020-11-11 18:01:17 -06:00
Nick Ethier	5e1634eda1	structs: canonicalize allocatedtaskresources to populate shared ports (#9309 )	2020-11-11 16:21:47 -05:00
Tim Gross	60874ebe25	csi: Postrun hook should not change mode (#9323 ) The unpublish workflow requires that we know the mode (RW vs RO) if we want to unpublish the node. Update the hook and the Unpublish RPC so that we mark the claim for release in a new state but leave the mode alone. This fixes a bug where RO claims were failing node unpublish. The core job GC doesn't know the mode, but we don't need it for that workflow, so add a mode specifically for GC; the volumewatcher uses this as a sentinel to check whether claims (with their specific RW vs RO modes) need to be claimed.	2020-11-11 13:06:30 -05:00
Mahmood Ali	69849a42a5	Merge pull request #9298 from hashicorp/f-hcl2-localsvars HCL2: Variables and Locals	2020-11-09 16:44:37 -05:00
Luiz Aoqui	ea81ac5d3d	Merge pull request #9296 from hashicorp/b-remove-namespace-from-scale-request Remove Namespace field from JobScaleRequest	2020-11-09 15:13:33 -05:00
Mahmood Ali	1ae3e8a8eb	Start using the new jobspec2 API	2020-11-09 15:01:31 -05:00
Luiz Aoqui	c536286c7a	remove Namespace field from JobScaleRequest	2020-11-09 13:02:05 -05:00
Nick Ethier	04f5c4ee5f	ar/groupservice: remove drivernetwork (#9233 ) * ar/groupservice: remove drivernetwork * consul: allow host address_mode to accept raw port numbers * consul: fix logic for blank address	2020-11-05 15:00:22 -05:00
Kris Hicks	1da9e7fc67	Add event sink API and CLI commands (#9226 ) Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-11-02 09:57:35 -08:00
Chris Baker	719077a26d	added new policy capabilities for recommendations API state store: call-out to generic update of job recommendations from job update method recommendations API work, and http endpoint errors for OSS support for scaling polices in task block of job spec add query filters for ScalingPolicy list endpoint command: nomad scaling policy list: added -job and -type	2020-10-28 14:32:16 +00:00
Drew Bailey	86080e25a9	Send events to EventSinks (#9171 ) * Process to send events to configured sinks This PR adds a SinkManager to a server which is responsible for managing managed sinks. Managed sinks subscribe to the event broker and send events to a sink writer (webhook). When changes to the eventstore are made the sinkmanager and managed sink are responsible for reloading or starting a new managed sink. * periodically check in sink progress to raft Save progress on the last successfully sent index to raft. This allows a managed sink to resume close to where it left off in the event of a lost server or leadership change dereference eventsink so we can accurately use the watchch When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger	2020-10-26 17:27:54 -04:00
Drew Bailey	1ae39a9ed9	event sink crud operation api (#9155 ) * network sink rpc/api plumbing state store methods and restore upsert sink test get sink delete sink event sink list and tests go generate new msg types validate sink on upsert * go generate	2020-10-23 14:23:00 -04:00
Michael Schurter	c2dd9bc996	core: open source namespaces	2020-10-22 15:26:32 -07:00
Mahmood Ali	059e87c862	Merge pull request #9142 from hashicorp/f-hclv2-2.3 Support HCLv2 for Nomad jobs	2020-10-22 12:26:28 -05:00
Drew Bailey	f3dcefe5a9	remove event durability (#9147 ) * remove event durability temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs * drop events table schema and state store methods * fix neweventbuffer invocations	2020-10-22 12:21:03 -04:00
Mahmood Ali	f52bda4c30	api: update /render api to parse hclv2	2020-10-21 15:46:57 -04:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Drew Bailey	fba0d6dc6a	event buffer size and durable count must be non negative	2020-10-15 16:34:33 -04:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Michael Schurter	ea55c497b7	Merge pull request #9094 from hashicorp/f-1.0 s/0.13/1.0/g	2020-10-15 08:53:33 -07:00
James Rasell	42a6e7140f	Merge pull request #9083 from hashicorp/b-fix-enterprise-config-merge agent: fix enterprise config overlay merging.	2020-10-15 08:40:49 +02:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Michael Schurter	dd09fa1a4a	Merge pull request #9055 from hashicorp/f-9017-resources api: add field filters to /v1/{allocations,nodes}	2020-10-14 14:49:39 -07:00
Michael Schurter	6890cffd7a	unify boolean parameter parsing	2020-10-14 12:23:25 -07:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event steam without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	684807bddb	namespace filtering	2020-10-14 12:44:43 -04:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	d793529d61	event durability count and cfg	2020-10-14 12:44:40 -04:00
Drew Bailey	b4c135358d	use Events to wrap index and events, store in events table	2020-10-14 12:44:39 -04:00
Drew Bailey	559517455a	wire up enable_event_publisher	2020-10-14 12:44:38 -04:00
Drew Bailey	4793bb4e01	Events/deployment events (#9004 ) * Node Drain events and Node Events (#8980) Deployment status updates handle deployment status updates (paused, failed, resume) deployment alloc health generate events from apply plan result txn err check, slim down deployment event one ndjson line per index * consolidate down to node event + type * fix UpdateDeploymentAllocHealth test invocations * fix test	2020-10-14 12:44:37 -04:00
Drew Bailey	a4a2975edf	Event Stream API/RPC (#8947 ) This Commit adds an /v1/events/stream endpoint to stream events from. The stream framer has been updated to include a SendFull method which does not fragment the data between multiple frames. This essentially treats the stream framer as a envelope to adhere to the stream framer interface in the UI. If the `encode` query parameter is omitted events will be streamed as newline delimted JSON.	2020-10-14 12:44:36 -04:00
James Rasell	e0734bed77	agent: fix enterprise config overlay merging.	2020-10-14 09:35:16 +02:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fallback to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Seth Hoenig	6cffbecb3a	Merge pull request #9033 from pierreca/verify-remove-checks Do not double-remove checks removed by Consul	2020-10-06 10:16:13 -05:00
James Rasell	ffe6533ad1	Merge pull request #9027 from hashicorp/f-gh-9026 cli: move tests to use NewMockUi func.	2020-10-06 08:28:18 +02:00
Pierre Cauchois	1efe05f516	Do not double-remove checks removed by Consul When deregistering a service, consul also deregisters the associated checks. The current state keeps track of all services and all checks separately and deregisters them in sequence, which leads, whether during syncs or shutdowns, to check deregistrations happening twice and failing the second time (generating errors in logs) This fix includes: - a fix to the sync logic that just pulls the checks after the services have been synced - a fix to the shutdown mechanism that gets an updated list of checks after deregistering the services, so that we get a cleaner check deregistration process.	2020-10-06 00:30:29 +00:00
Chris Baker	7f701fddd0	updated docs and validation to further prohibit null chars in region, datacenter, and job name	2020-10-05 18:01:50 +00:00
James Rasell	2ed78b8a7e	cli: move tests to use NewMockUi func.	2020-10-05 16:07:41 +02:00
Kent 'picat' Gruber	5e1c716835	Merge pull request #8998 from hashicorp/keygen-32-bytes Use 32-byte key for gossip encryption to enable AES-256	2020-10-02 17:17:55 -04:00
Kent 'picat' Gruber	b03f79700c	Fix panic in test due to the agent's logger not being initialized yet So a null logger is used to avoid the problem.	2020-10-02 11:10:27 -04:00
Fredrik Hoem Grelland	953d4de8dd	update consul-template to v0.25.1 (#8988 )	2020-10-01 14:08:49 -04:00
Kent 'picat' Gruber	90e85f9add	Fix other usages of initKeyring func to use logger as third argument	2020-10-01 11:13:06 -04:00
Kent 'picat' Gruber	b98bb99dfe	Log AES-128 and AES-192 key sizes during keyring initialization	2020-10-01 11:12:14 -04:00
Michael Schurter	765473e8b0	jobspec: lower min cpu resources from 10->1 Since CPU resources are usually a soft limit it is desirable to allow setting it as low as possible to allow tasks to run only in "idle" time. Setting it to 0 is still not allowed to avoid potential unintentional side effects with allowing a zero value. While there may not be any side effects this commit attempts to minimize risk by avoiding the issue. This does not change the defaults.	2020-09-30 12:15:13 -07:00
Michael Schurter	1544341f09	Merge pull request #8862 from hashicorp/release-0.12.4 Prepare for 0.13 development cycle	2020-09-10 09:14:44 -07:00
Mahmood Ali	d4f385d6e1	Upgrade to golang 1.15 (#8858 ) Upgrade to golang 1.15 Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 . This commit lifts the fix in https://github.com/creack/pty/pull/97 .	2020-09-09 15:59:29 -04:00
Nomad Release bot	3b8a2f22dc	Generate files for 0.12.4-rc1 release	2020-09-03 02:59:23 +00:00
Tim Gross	b77fe023b5	MRD: move 'job stop -global' handling into RPC (#8776 ) The initial implementation of global job stop for MRD looped over all the regions in the CLI for expedience. This changeset includes the OSS parts of moving this into the RPC layer so that API consumers don't have to implement this logic themselves.	2020-08-28 14:28:13 -04:00
Lang Martin	7d483f93c0	csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699 ) * nomad/structs/csi: add explicit job support * nomad/state/state_store: capture job updates directly * api/nodes: CSIInfo needs the AllocID * command/agent/csi_endpoint: AllocID was missing Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-08-27 17:20:00 -04:00
Seth Hoenig	9f1f2a5673	Merge branch 'master' into f-cc-ingress	2020-08-26 15:31:05 -05:00
Seth Hoenig	dfe179abc5	consul/connect: fixup some comments and context timeout	2020-08-26 13:17:16 -05:00
Tim Gross	f9b6c8153c	csi: fix panic in serializing nil allocs in volume API (#8735 ) - fix panic in serializing nil allocs in volume API - prevent potential panic in serializing plugin allocs	2020-08-25 10:13:05 -04:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Seth Hoenig	c4fa644315	consul/connect: remove envoy dns option from gateway proxy config	2020-08-24 09:11:55 -05:00
Yoan Blanc	327d17e0dc	fixup! vendor: consul/api, consul/sdk v1.6.0 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-08-24 08:59:03 +02:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Nick Ethier	3cd5f46613	Update UI to use new allocated ports fields (#8631 ) * nomad: canonicalize alloc shared resources to populate ports * ui: network ports * ui: remove unused task network references and update tests with new shared ports model * ui: lint * ui: revert auto formatting * ui: remove unused page objects * structs: remove unrelated test from bad conflict resolution * ui: formatting	2020-08-20 11:07:13 -04:00
Tim Gross	22e77bb03c	mrd: remove redundant validation in HTTP endpoint (#8685 ) The `regionForJob` function in the HTTP job endpoint overrides the region for multiregion jobs to `global`, which is used as a sentinel value in the server's job endpoint to avoid re-registration loops. This changeset removes an extraneous check that results in errors in the web UI and makes round-tripping through the HTTP API cumbersome for all consumers.	2020-08-18 16:48:09 -04:00
Lang Martin	6d8165c410	command/agent/csi_endpoint: explicit allocations (#8669 )	2020-08-13 15:48:08 -04:00
Tim Gross	7dca72acbe	csi: fix panic from assignment to nil map in plugin API (#8666 )	2020-08-13 11:36:41 -04:00
Tim Gross	3faa138732	fix panic converting structs to API in CSI endpoint (#8659 )	2020-08-12 15:59:10 -04:00
Nomad Release bot	1ea9d4eb22	Generate files for 0.12.2 release	2020-08-12 00:50:49 +00:00
Lang Martin	07ea822c6a	nomad debug renamed to nomad operator debug (#8602 ) * renamed: command/debug.go -> command/operator_debug.go * website: rename debug -> operator debug * website/pages/api-docs/agent: name in api docs	2020-08-11 15:39:44 -04:00
Lang Martin	c82b2a2454	CSI: volume and plugin allocations in the API (#8590 ) * command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints	2020-08-11 12:24:41 -04:00
Tim Gross	443fdaa86b	csi: nomad volume detach command (#8584 ) The soundness guarantees of the CSI specification leave a little to be desired in our ability to provide a 100% reliable automated solution for managing volumes. This changeset provides a new command to bridge this gap by providing the operator the ability to intervene. The command doesn't take an allocation ID so that the operator doesn't have to keep track of alloc IDs that may have been GC'd. Handle this case in the unpublish RPC by sending the client RPC for all the terminal/nil allocs on the selected node.	2020-08-11 10:18:54 -04:00
Seth Hoenig	fd4804bf26	consul: able to set pass/fail thresholds on consul service checks This change adds the ability to set the fields `success_before_passing` and `failures_before_critical` on Consul service check definitions. This is a feature added to Consul v1.7.0 and later. https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical Nomad doesn't do much besides pass the fields through to Consul. Fixes #6913	2020-08-10 14:08:09 -05:00
Drew Bailey	b296558b8e	oss compoments for multi-vault namespaces adds in oss components to support enterprise multi-vault namespace feature upgrade specific doc on vault multi-namespaces vault docs update test to reflect new error	2020-07-24 10:14:59 -04:00
James Rasell	95db43eaf0	Merge pull request #8491 from hashicorp/b-gh-8481 api: task groups in system jobs do not support scaling stanzas.	2020-07-24 14:20:26 +02:00
Nomad Release bot	f2f50bf48e	Generate files for 0.12.1 release	2020-07-23 13:17:59 +00:00
Lars Lehtonen	e26ea30b7e	command/agent: fix dropped test error (#8504 )	2020-07-22 15:06:35 -04:00
James Rasell	2da8bd8f58	agent: task groups in system jobs do not support scaling stanzas.	2020-07-22 11:10:59 +02:00
Mahmood Ali	72ac33e4e7	Refactor setupLoggers	2020-07-17 11:05:57 -04:00
Mahmood Ali	ad2d484974	Set AgentShutdown	2020-07-17 11:04:57 -04:00
Chris Baker	f8478b6f82	Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0	2020-07-08 21:16:31 +00:00
Nick Ethier	119ece09a0	docs: add CNI and host_network docs (#8391 ) Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>	2020-07-08 15:45:04 -04:00
Nomad Release bot	549e766eab	Generate files for 0.12.0-rc1 release	2020-07-07 03:17:05 +00:00
Nick Ethier	e0fb634309	ar: support opting into binding host ports to default network IP (#8321 ) * ar: support opting into binding host ports to default network IP * fix config plumbing * plumb node address into network resource * struct: only handle network resource upgrade path once	2020-07-06 18:51:46 -04:00
Tim Gross	18250f71fd	fix region flag vs job region handling in plan/submit (#8347 )	2020-07-06 15:46:09 -04:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Mahmood Ali	329969b97e	tests: make testagent shutdown idempotent Avoid double freeing ports if an agent.Shutdown() is called multiple times.	2020-07-03 09:16:01 -04:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325 ) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295 ) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Tim Gross	e52f76ed53	update compiled static assets	2020-06-24 16:37:13 -04:00
Tim Gross	a449009e9f	multiregion validation fixes (#8265 ) Multi-region jobs need to bypass validating counts otherwise we get spurious warnings in Job.Plan.	2020-06-24 12:18:51 -04:00
Seth Hoenig	e79b79034d	connect/native: fixup command/agent/consul/connect test cases	2020-06-24 09:05:56 -05:00
Seth Hoenig	6c5ab7f45e	consul/connect: split connect native flag and task in service	2020-06-23 10:22:22 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically and injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Mahmood Ali	963b1251ff	Merge pull request #8082 from hashicorp/f-raft-multipler Implement raft multipler flag	2020-06-19 10:04:59 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Tim Gross	161bcd9479	use constants from http package	2020-06-17 11:04:02 -04:00
Tim Gross	b93efc16d5	multiregion CLI: nomad deployment unblock	2020-06-17 11:03:44 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	1e3563e08c	wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register	2020-06-16 18:45:17 +00:00
Mahmood Ali	69bb42acf8	tests: prefix agent logs to identify agent sources	2020-06-07 16:38:11 -04:00
Mahmood Ali	9eb13ae144	basic snapshot restore	2020-06-07 15:46:23 -04:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	0e8fafd739	implement raft multiplier	2020-05-31 12:24:27 -04:00
Drew Bailey	23d24c7a7f	removes pro tags (#8014 )	2020-05-28 15:40:17 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054 ) * changes necessary to support oss licesning shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Mahmood Ali	2108681c1d	Endpoint for snapshotting server state	2020-05-21 20:04:38 -04:00
James Rasell	ae0fb98c6b	api: return custom error if API attempts to decode empty body.	2020-05-19 15:46:31 +02:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Nomad Release bot	189a378549	Generate files for 0.11.2 release	2020-05-14 20:49:42 +00:00
Mahmood Ali	9366181be6	always check `default_scheduler_config` config Also, avoid early return on validation to avoid masking some validation bugs in dev setup.	2020-05-14 14:16:12 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Mahmood Ali	061a439f2c	Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup Scheduler Algorithm Defaults handling and docs	2020-05-11 09:30:58 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881 ) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Mahmood Ali	b8fb32f5d2	http: adjust log level for request failure Failed requests due to API client errors are to be marked as DEBUG. The Error log level should be reserved to signal problems with the cluster and are actionable for nomad system operators. Logs due to misbehaving API clients don't represent a system level problem and seem spurius to nomad maintainers at best. These log messages can also be attack vectors for deniel of service attacks by filling servers disk space with spurious log messages.	2020-04-22 16:19:59 -04:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as rest of nomad logs. Also, supports emitting them as json logs, when json formatting is set. The http server logs are emitted as Trace level, as they are typically repsent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). Though, Panic logs represent server errors and are relayed as Error level.	2020-04-20 10:33:40 -04:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shutdown, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hung is if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shutdown last. I believe this doesn't require extra care in agent shutting down logic while operators may be able to submit write http requests. We already need to cope with operators submiting these http requests to another agent or by servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Mahmood Ali	14d6fec05a	tests: deflake some SetServer related tests Some tests assert on numbers on numbers of servers, e.g. TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and test modes, the agent starts with servers having duplicate entries for advertised and normalized RPC values, then settles with one unique value after Raft/Serf re-sets servers with one single unique value. This leads to flakiness, as the test will fail if assertion runs before Serf update takes effect. Here, we update the inital dev handling so it only adds a unique value if the advertised and normalized values are the same. Sample log lines illustrating the problem: ``` === CONT TestHTTP_AgentSetServers TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]" TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1 TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[] ... TestHTTP_AgentSetServers: agent_endpoint_test.go:759: Error Trace: agent_endpoint_test.go:759 http_test.go:1089 agent_endpoint_test.go:705 Error: "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2 Test: TestHTTP_AgentSetServers ```	2020-04-07 09:27:48 -04:00
Mahmood Ali	ed4c4d13a4	fixup! backend: support WS authentication handshake in alloc/exec	2020-04-03 14:20:31 -04:00
Mahmood Ali	e63e096136	backend: support WS authentication handshake in alloc/exec The javascript Websocket API doesn't support setting custom headers (e.g. `X-Nomad-Token`). This change adds support for having an authentication handshake message: clients can set `ws_handshake` URL query parameter to true and send a single handshake message with auth token first before any other mssage. This is a backward compatible change: it does not affect nomad CLI path, as it doesn't set `ws_handshake` parameter.	2020-04-03 11:18:54 -04:00
Mahmood Ali	990cfb6fef	agent config parsing tests for scheduler config	2020-04-03 07:54:32 -04:00
Chris Baker	277d29c6e7	Merge pull request #7572 from hashicorp/f-7422-scaling-events finalizing scaling API work	2020-04-01 13:49:22 -05:00
Seth Hoenig	9aa9721143	connect: fix bug where absent connect.proxy stanza needs default config In some refactoring, a bug was introduced where if the connect.proxy stanza in a submitted job was nil, the default proxy configuration would not be initialized with default values, effectively breaking Connect. connect { sidecar_service {} # should work } In contrast, by setting an empty proxy stanza, the config values would be inserted correctly. connect { sidecar_service { proxy {} # workaround } } This commit restores the original behavior, where having a proxy stanza present is not required. The unit test for this case has also been corrected.	2020-04-01 11:19:32 -06:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	14c7cebdea	connect: enable automatic expose paths for individual group service checks Part of #6120 Building on the support for enabling connect proxy paths in #7323, this change adds the ability to configure the 'service.check.expose' flag on group-level service check definitions for services that are connect-enabled. This is a slight deviation from the "magic" that Consul provides. With Consul, the 'expose' flag exists on the connect.proxy stanza, which will then auto-generate expose paths for every HTTP and gRPC service check associated with that connect-enabled service. A first attempt at providing similar magic for Nomad's Consul Connect integration followed that pattern exactly, as seen in #7396. However, on reviewing the PR we realized having the `expose` flag on the proxy stanza inseperably ties together the automatic path generation with every HTTP/gRPC defined on the service. This makes sense in Consul's context, because a service definition is reasonably associated with a single "task". With Nomad's group level service definitions however, there is a reasonable expectation that a service definition is more abstractly representative of multiple services within the task group. In this case, one would want to define checks of that service which concretely make HTTP or gRPC requests to different underlying tasks. Such a model is not possible with the course `proxy.expose` flag. Instead, we now have the flag made available within the check definitions themselves. By making the expose feature resolute to each check, it is possible to have some HTTP/gRPC checks which make use of the envoy exposed paths, as well as some HTTP/gRPC checks which make use of some orthongonal port-mapping to do checks on some other task (or even some other bound port of the same task) within the task group. Given this example, group "server-group" { network { mode = "bridge" port "forchecks" { to = -1 } } service { name = "myserver" port = 2000 connect { sidecar_service { } } check { name = "mycheck-myserver" type = "http" port = "forchecks" interval = "3s" timeout = "2s" method = "GET" path = "/classic/responder/health" expose = true } } } Nomad will automatically inject (via job endpoint mutator) the extrapolated expose path configuration, i.e. expose { path { path = "/classic/responder/health" protocol = "http" local_path_port = 2000 listener_port = "forchecks" } } Documentation is coming in #7440 (needs updating, doing next) Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6 which will make the examples in the documentation actually runnable. Will add some e2e tests based on the above when it becomes available.	2020-03-31 17:15:50 -06:00
Seth Hoenig	41244c5857	jobspec: parse multi expose.path instead of explicit slice	2020-03-31 17:15:27 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Seth Hoenig	1ce4eb17fa	client: use consistent name for struct receiver parameter This helps reduce the number of squiggly lines in Goland.	2020-03-31 17:15:27 -06:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Seth Hoenig	b3664c628c	Merge pull request #7524 from hashicorp/docs-consul-acl-minimums consul: annotate Consul interfaces with ACLs	2020-03-30 13:27:27 -06:00
Mahmood Ali	7df337e4c4	Merge pull request #7534 from hashicorp/b-windows-dev-network windows: support -dev mode	2020-03-30 14:35:28 -04:00
Seth Hoenig	0a812ab689	consul: annotate Consul interfaces with ACLs	2020-03-30 10:17:28 -06:00
Drew Bailey	a98dc8c768	update audit examples to an endpoint that is audited	2020-03-30 10:03:11 -04:00
Mahmood Ali	dedf1cd3d7	tests: remove TestHTTP_NodeDrain_Compat Nomad 0.11 servers no longer support having pre-0.8 clients.	2020-03-30 07:06:52 -04:00
Mahmood Ali	8b2b3f99d3	tests: deflake TestHTTP_NodeDrain A node may be recognized as not running any allocs and have its drain flag reset before the test queries it.	2020-03-30 07:06:52 -04:00
Mahmood Ali	b0cc23ae63	tests: deflake TestConsul_PeriodicSync	2020-03-30 07:06:47 -04:00
Mahmood Ali	ec6afa5795	windows: support -dev mode Support running `nomad agent -dev` in Windows, by setting proper network interface. Prior to this change, `nomad` uses `lo` interface but Windows uses "Loopback Pseudo-Interface 1" to refer to loopback device interface: https://github.com/golang/go/blob/go1.14.1/src/net/net_windows_test.go#L304-L318 .	2020-03-28 12:01:51 -04:00
Drew Bailey	a66b4be0f3	remove auditing for /ui/	2020-03-27 10:12:42 -04:00
Drew Bailey	de687edb2e	wrap http.Handlers better comments	2020-03-27 09:35:10 -04:00
Drew Bailey	b96a4da6fc	sync changes made to oss files from ent	2020-03-25 10:57:44 -04:00
Drew Bailey	218bfff6dd	add in change missed from ent	2020-03-25 10:53:38 -04:00
Drew Bailey	97cc19276d	add auditor	2020-03-25 10:48:23 -04:00
Drew Bailey	7329a88758	allow all build contexts to use noOpAuditor	2020-03-25 10:38:40 -04:00
Mahmood Ali	1c1186b344	Merge pull request #7487 from hashicorp/b-xss-oss agent: prevent XSS by controlling Content-Type	2020-03-25 09:56:11 -04:00
Michael Schurter	29622013fa	remove double negative from comment Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2020-03-25 09:45:43 -04:00
Michael Schurter	1a27b8a07d	test: assert monitor endpoint sets proper headers	2020-03-25 09:45:43 -04:00
Michael Schurter	d6d44a8214	test: assert fs endpoints are xss safe	2020-03-25 09:45:43 -04:00
Michael Schurter	5ff458e840	agent: prevent XSS by controlling Content-Type	2020-03-25 09:45:43 -04:00
Mahmood Ali	c7cf60c837	tests: test agent to use a noop auditor	2020-03-25 08:45:44 -04:00
Mahmood Ali	ceed57b48f	per-task restart policy	2020-03-24 17:00:41 -04:00
Lang Martin	8bd0405f33	csi: return an empty result list from plugins & volumes without `type`, not an error (#7471 )	2020-03-24 14:28:28 -04:00
Chris Baker	bc13bfb433	bad conversion between api.ScalingPolicy and structs.ScalingPolicy meant that we were throwing away .Min if provided	2020-03-24 14:39:06 +00:00
Chris Baker	f6ec5f9624	made count optional during job scaling actions added ACL protection in Job.Scale in Job.Scale, only perform a Job.Register if the Count was non-nil	2020-03-24 14:39:05 +00:00
Chris Baker	233db5258a	changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults. - tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent) - Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and	2020-03-24 13:57:17 +00:00
Chris Baker	00092a6c29	fixed http endpoints for job.register and job.scalestatus	2020-03-24 13:57:16 +00:00
Chris Baker	925b59e1d2	wip: scaling status return, almost done	2020-03-24 13:57:15 +00:00
Chris Baker	42270d862c	wip: some tests still failing updating job scaling endpoints to match RFC, cleaning up the API object as well	2020-03-24 13:57:14 +00:00
Chris Baker	abc7a52f56	finished refactoring state store, schema, etc	2020-03-24 13:57:14 +00:00
Chris Baker	3d54f1feba	wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request	2020-03-24 13:57:12 +00:00
Chris Baker	024d203267	wip: added tests for client methods around group scaling	2020-03-24 13:57:11 +00:00
Chris Baker	1c5c2eb71b	wip: add GET endpoint for job group scaling target	2020-03-24 13:57:10 +00:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Chris Baker	8453e667c2	wip: working on job group scaling endpoint	2020-03-24 13:55:20 +00:00
Chris Baker	6665d0bfb0	wip: added policy get endpoint, added UUID to policy	2020-03-24 13:55:20 +00:00
Chris Baker	9c2560ceeb	wip: upsert/delete scaling policies on job upsert/delete	2020-03-24 13:55:18 +00:00
Chris Baker	65d92f1fbf	WIP: adding ScalingPolicy to api/structs and state store	2020-03-24 13:55:18 +00:00
Drew Bailey	10f3b6899b	rename struct field to auditor	2020-03-23 20:09:01 -04:00
Drew Bailey	cf5fcf3748	make auditor interface more explicit	2020-03-23 19:32:58 -04:00
Drew Bailey	d0d32d8f06	fix compilation with correct func	2020-03-23 14:32:11 -04:00
Tim Gross	076fbbf08f	Merge pull request #7012 from hashicorp/f-csi-volumes Container Storage Interface Support	2020-03-23 14:19:46 -04:00
Lang Martin	e100444740	csi: add mount_options to volumes and volume requests (#7398 ) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Lang Martin	99841222ed	csi: change the API paths to match CLI command layout (#7325 ) * command/agent/csi_endpoint: support type filter in volumes & plugins * command/agent/http: use /v1/volume/csi & /v1/plugin/csi * api/csi: use /v1/volume/csi & /v1/plugin/csi * api/nodes: use /v1/volume/csi & /v1/plugin/csi * api/nodes: not /volumes/csi, just /volumes * command/agent/csi_endpoint: fix ot parameter parsing	2020-03-23 13:58:30 -04:00
Lang Martin	80619137ab	csi: volumes listed in `nomad node status` (#7318 ) * api/allocations: GetTaskGroup finds the taskgroup struct * command/node_status: display CSI volume names * nomad/state/state_store: new CSIVolumesByNodeID * nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator * nomad/csi_endpoint: deal with a slice of volumes * nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator * nomad/structs/csi: CSIVolumeListRequest takes a NodeID * nomad/csi_endpoint: use the return iterator * command/agent/csi_endpoint: parse query params for CSIVolumes.List * api/nodes: new CSIVolumes to list volumes by node * command/node_status: use the new list endpoint to print volumes * nomad/state/state_store: error messages consider the operator * command/node_status: include the Provider	2020-03-23 13:58:30 -04:00
Lang Martin	887e1f28c9	csi: CLI for volume status, registration/deregistration and plugin status (#7193 ) * command/csi: csi, csi_plugin, csi_volume * helper/funcs: move ExtraKeys from parse_config to UnusedKeys * command/agent/config_parse: use helper.UnusedKeys * api/csi: annotate CSIVolumes with hcl fields * command/csi_plugin: add Synopsis * command/csi_volume_register: use hcl.Decode style parsing * command/csi_volume_list * command/csi_volume_status: list format, cleanup * command/csi_plugin_list * command/csi_plugin_status * command/csi_volume_deregister * command/csi_volume: add Synopsis * api/contexts/contexts: add csi search contexts to the constants * command/commands: register csi commands * api/csi: fix struct tag for linter * command/csi_plugin_list: unused struct vars * command/csi_plugin_status: unused struct vars * command/csi_volume_list: unused struct vars * api/csi: add allocs to CSIPlugin * command/csi_plugin_status: format the allocs * api/allocations: copy Allocation.Stub in from structs * nomad/client_rpc: add some error context with Errorf * api/csi: collapse read & write alloc maps to a stub list * command/csi_volume_status: cleanup allocation display * command/csi_volume_list: use Schedulable instead of Healthy * command/csi_volume_status: use Schedulable instead of Healthy * command/csi_volume_list: sprintf string * command/csi: delete csi.go, csi_plugin.go * command/plugin: refactor csi components to sub-command plugin status * command/plugin: remove csi * command/plugin_status: remove csi * command/volume: remove csi * command/volume_status: split out csi specific * helper/funcs: add RemoveEqualFold * command/agent/config_parse: use helper.RemoveEqualFold * api/csi: do ,unusedKeys right * command/volume: refactor csi components to `nomad volume` * command/volume_register: split out csi specific * command/commands: use the new top level commands * command/volume_deregister: hardwired type csi for now * command/volume_status: csiFormatVolumes rescued from volume_list * command/plugin_status: avoid a panic on no args * command/volume_status: avoid a panic on no args * command/plugin_status: predictVolumeType * command/volume_status: predictVolumeType * nomad/csi_endpoint_test: move CreateTestPlugin to testing * command/plugin_status_test: use CreateTestCSIPlugin * nomad/structs/structs: add CSIPlugins and CSIVolumes search consts * nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix * nomad/search_endpoint: add CSIPlugins and CSIVolumes * command/plugin_status: move the header to the csi specific * command/volume_status: move the header to the csi specific * nomad/state/state_store: CSIPluginByID prefix * command/status: rename the search context to just Plugins/Volumes * command/plugin,volume_status: test return ids now * command/status: rename the search context to just Plugins/Volumes * command/plugin_status: support -json and -t * command/volume_status: support -json and -t * command/plugin_status_csi: comments * command/_status: clean up text api/csi: fix stale comments * command/volume: make deregister sound less fearsome * command/plugin_status: set the id length * command/plugin_status_csi: more compact plugin health * command/volume: better error message, comment	2020-03-23 13:58:30 -04:00
Danielle Lancashire	15c6c05ccf	api: Parse CSI Volumes Previously when deserializing volumes we skipped over volumes that were not of type `host`. This commit ensures that we parse both host and csi volumes correctly.	2020-03-23 13:58:30 -04:00
Lang Martin	88316208a0	csi: server-side plugin state tracking and api (#6966 ) * structs: CSIPlugin indexes jobs acting as plugins and node updates * schema: csi_plugins table for CSIPlugin * nomad: csi_endpoint use vol.Denormalize, plugin requests * nomad: csi_volume_endpoint: rename to csi_endpoint * agent: add CSI plugin endpoints * state_store_test: use generated ids to avoid t.Parallel conflicts * contributing: add note about registering new RPC structs * command: agent http register plugin lists * api: CSI plugin queries, ControllerHealthy -> ControllersHealthy * state_store: copy on write for volumes and plugins * structs: copy on write for volumes and plugins * state_store: CSIVolumeByID returns an unhealthy volume, denormalize * nomad: csi_endpoint use CSIVolumeDenormalizePlugins * structs: remove struct errors for missing objects * nomad: csi_endpoint return nil for missing objects, not errors * api: return meta from Register to avoid EOF error * state_store: CSIVolumeDenormalize keep allocs in their own maps * state_store: CSIVolumeDeregister error on missing volume * state_store: CSIVolumeRegister set indexes * nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests	2020-03-23 13:58:29 -04:00
Lang Martin	2f646fa5e9	agent: csi endpoint	2020-03-23 13:58:29 -04:00
Danielle Lancashire	426c26d7c0	CSI Plugin Registration (#6555 ) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shutdown yet, they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Drew Bailey	b09abef332	Audit config, seams for enterprise audit features allow oss to parse sink duration clean up audit sink parsing ent eventer config reload fix typo SetEnabled to eventer interface client acl test rm dead code fix failing test	2020-03-23 13:47:42 -04:00
Jasmine Dahilig	73a64e4397	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	1485b342e2	remove deadline code for now	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	7b3f3497ed	mock task hook coordinator in consul integration test	2020-03-21 17:52:55 -04:00
Jasmine Dahilig	fc13fa9739	change TaskLifecycle RunLevel to Hook and add Deadline time duration	2020-03-21 17:52:37 -04:00
Mahmood Ali	4ebeac721a	update structs with lifecycle	2020-03-21 17:52:36 -04:00
James Rasell	e3d14cc634	cli: fix indentation issue with -dev-connect agent help output.	2020-03-18 12:25:20 +01:00
Michael Schurter	b72b3e765c	Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade Update consul-template to v0.24.1 and remove deprecated vault grace	2020-03-10 14:15:47 -07:00
Mahmood Ali	19f25f588f	Merge pull request #7252 from hashicorp/b-test-cluster-forming Simplify Bootstrap logic in tests	2020-03-03 16:56:08 -05:00
Mahmood Ali	acbfeb5815	Simplify Bootstrap logic in tests This change updates tests to honor `BootstrapExpect` exclusively when forming test clusters and removes test only knobs, e.g. `config.DevDisableBootstrap`. Background: Test cluster creation is fragile. Test servers don't follow the BootstapExpected route like production clusters. Instead they start as single node clusters and then get rejoin and may risk causing brain split or other test flakiness. The test framework expose few knobs to control those (e.g. `config.DevDisableBootstrap` and `config.Bootstrap`) that control whether a server should bootstrap the cluster. These flags are confusing and it's unclear when to use: their usage in multi-node cluster isn't properly documented. Furthermore, they have some bad side-effects as they don't control Raft library: If `config.DevDisableBootstrap` is true, the test server may not immediately attempt to bootstrap a cluster, but after an election timeout (~50ms), Raft may force a leadership election and win it (with only one vote) and cause a split brain. The knobs are also confusing as Bootstrap is an overloaded term. In BootstrapExpect, we refer to bootstrapping the cluster only after N servers are connected. But in tests and the knobs above, it refers to whether the server is a single node cluster and shouldn't wait for any other server. Changes: This commit makes two changes: First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or `DevMode` flags. This change is relatively trivial. Introduce a `Bootstrapped` flag to track if the cluster is bootstrapped. This allows us to keep `BootstrapExpected` immutable. Previously, the flag was a config value but it gets set to 0 after cluster bootstrap completes.	2020-03-02 13:47:43 -05:00
Mahmood Ali	386f20099b	Honor CNI and bridge related fields Nomad agent may silently ignore cni_path and bridge setting, when it merges configs from multiple files (or against default/dev config). This PR ensures that the values are merged properly.	2020-02-28 14:23:13 -05:00
Mahmood Ali	437d03779c	tests: add tests for parsing cni fields	2020-02-28 14:18:45 -05:00
Fredrik Hoem Grelland	edb3bd0f3f	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170 )	2020-02-23 16:24:53 +01:00
Seth Hoenig	0f99cdd0d9	Merge pull request #7192 from hashicorp/b-connect-stanza-ignore consul/connect: in-place update sidecar service registrations on changes	2020-02-21 09:24:53 -06:00
Seth Hoenig	54b5173eca	consul/connect: in-place update sidecar service registrations on changes Fix a bug where consul service definitions would not be updated if changes were made to the service in the Nomad job. Currently this only fixes the bug for cases where the fix is a matter of updating consul agent's service registration. There is related bug where destructive changes are required (see #6877) which will be fixed in another PR. The enable_tag_override configuration setting for the parent service is applied to the sidecar service. Fixes #6459	2020-02-19 13:07:04 -06:00
Mahmood Ali	f4d8e1296f	Merge pull request #7171 from hashicorp/update-autopilot-20200214 Update consul vendor and add MinQuorum flag	2020-02-19 10:45:20 -06:00
Drew Bailey	3c0719274c	inlude pro in http_oss.go	2020-02-18 10:29:28 -05:00
Mahmood Ali	98ad59b1de	update rest of consul packages	2020-02-16 16:25:04 -06:00
Mahmood Ali	f492ab6d9e	implement MinQuorum	2020-02-16 16:04:59 -06:00
Mahmood Ali	fd51982018	tests: Avoid StartAsLeader raft config flag It's being deprecated	2020-02-13 18:56:53 -05:00
Seth Hoenig	543354aabe	Merge pull request #7106 from hashicorp/f-ctag-override client: enable configuring enable_tag_override for services	2020-02-13 12:34:48 -06:00
Seth Hoenig	0e44094d1a	client: enable configuring enable_tag_override for services Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition. Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable. Because Nomad itself acts as a state machine around the the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why. Consider the basic case where there is no Nomad, and your service is provided to consul as a boring JSON file. The ultimate source of truth for the definition of that service is the file, and is stored in the agent. Later, Consul performs "anti-entropy" which synchronizes the Catalog (stored only the leaders). Then with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file, or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed & config reloaded OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth. In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-sync's with Consul will now no longer try to restore the externally modified tag values (as long as enable_tag_override=true). Fixes #2057	2020-02-10 08:00:55 -06:00
Michael Schurter	65d38d9255	test: fix flaky TestHTTP_FreshClientAllocMetrics	2020-02-07 15:50:53 -08:00
Michael Schurter	9d3093fa31	test: fix missing agent shutdowns	2020-02-07 15:50:53 -08:00
Michael Schurter	d96ceee8c5	testagent: fix case where agent would retry forever	2020-02-07 15:50:53 -08:00
Michael Schurter	e903501e65	test: improve error messages when failing	2020-02-07 15:50:53 -08:00
Michael Schurter	63032917fc	test: allow goroutine to exit even if test blocks	2020-02-07 15:50:53 -08:00
Michael Schurter	9905dec6a3	test: workaround limits race	2020-02-07 15:50:53 -08:00
Michael Schurter	19a1932bbb	test: wait longer than timeout The 1s timeout raced with the 1s deadline it was trying to detect.	2020-02-07 15:50:53 -08:00
Michael Schurter	fd81208db7	test: fix flaky health test Test set Agent.client=nil which prevented the client from being shutdown. This leaked goroutines and could cause panics due to the leaked client goroutines logging after their parent test had finished. Removed ACLs from the server test because I couldn't get it to work with the test agent, and it tested very little.	2020-02-07 15:50:53 -08:00
Michael Schurter	2896f78f77	client: fix race accessing Node.status * Call Node.Canonicalize once when Node is created. * Lock when accessing fields mutated by node update goroutine	2020-02-07 15:50:47 -08:00
Drew Bailey	d830998572	agent Profile req nil check s.agent.Server() clean up logic and tests	2020-02-03 13:20:05 -05:00
Drew Bailey	c4f45f9bde	Fix panic when monitoring a local client node Fixes a panic when accessing a.agent.Server() when agent is a client instead. This pr removes a redundant ACL check since ACLs are validated at the RPC layer. It also nil checks the agent server and uses Client() when appropriate.	2020-02-03 13:20:04 -05:00
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	076cb4754e	agent: re-enable the server in dev mode	2020-01-31 19:04:19 -06:00
Seth Hoenig	8219c78667	nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses.	2020-01-31 19:04:14 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to first byte being read so RPC connections with TLS enabled have `rpc_handshake_time * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Mahmood Ali	90cae566e5	Merge pull request #6935 from hashicorp/b-default-preemption-flag scheduler: allow configuring default preemption for system scheduler	2020-01-28 15:11:06 -05:00
Mahmood Ali	af17b4afc7	Support customizing full scheduler config	2020-01-28 14:51:42 -05:00
Nick Ethier	5636203d4e	consul: fix var name from rebase	2020-01-27 14:00:19 -05:00
Nick Ethier	0ae99b3c9c	consul: fix var name from rebase	2020-01-27 12:55:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Mahmood Ali	1ab682f622	scheduler: allow configuring default preemption for system scheduler Some operators want a greater control over when preemption is enabled, especially during an upgrade to limit potential side-effects.	2020-01-13 08:30:49 -05:00
Drew Bailey	f97d2e96c1	refactor api profile methods comment why we ignore errors parsing params	2020-01-09 15:15:12 -05:00
Drew Bailey	b702dede49	adds qc param, address pr feedback	2020-01-09 15:15:11 -05:00
Drew Bailey	085659f6ff	condense table test	2020-01-09 15:15:10 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	1b8af920f3	address pr feedback	2020-01-09 15:15:09 -05:00

... 5 6 7 8 9 ...

2117 commits