open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	93cf770edb	client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks.	2020-01-31 19:03:38 -06:00
Seth Hoenig	2b66ce93bb	nomad: ensure a unique ClusterID exists when leader (gh-6702) Enable any Server to lookup the unique ClusterID. If one has not been generated, and this node is the leader, generate a UUID and attempt to apply it through raft. The value is not yet used anywhere in this changeset, but is a prerequisite for gh-6701.	2020-01-31 19:03:26 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00
Michael Schurter	dd7712795d	Merge branch 'master' into b-tls-validation	2020-01-30 11:05:15 -08:00
Mahmood Ali	a9f551542d	Merge pull request #160 from hashicorp/b-mtls-hostname server: validate role and region for RPC w/ mTLS	2020-01-30 12:59:17 -06:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to first byte being read so RPC connections with TLS enabled have `rpc_handshake_time * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Drew Bailey	da4af9bef3	fix tests, update changelog	2020-01-29 13:55:39 -05:00
Drew Bailey	a61bf32314	Allow nomad monitor command to lookup server UUID Allows addressing servers with nomad monitor using the servers name or ID. Also unifies logic for addressing servers for client_agent_endpoint commands and makes addressing logic region aware. rpc getServer test	2020-01-29 13:55:29 -05:00
Mahmood Ali	9611324654	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	90cae566e5	Merge pull request #6935 from hashicorp/b-default-preemption-flag scheduler: allow configuring default preemption for system scheduler	2020-01-28 15:11:06 -05:00
Mahmood Ali	af17b4afc7	Support customizing full scheduler config	2020-01-28 14:51:42 -05:00
Mahmood Ali	f7a51a14c6	Merge pull request #6977 from hashicorp/b-leadership-flapping-2 Handle Nomad leadership flapping (attempt 2)	2020-01-28 11:40:41 -05:00
Mahmood Ali	687d2b7054	tests: defer closing shutdownCh	2020-01-28 09:53:48 -05:00
Mahmood Ali	ded4233c27	tweak leadership flapping log messages	2020-01-28 09:49:36 -05:00
Mahmood Ali	79823ae07d	handle channel close signal Always deliver last value then send close signal.	2020-01-28 09:44:34 -05:00
Mahmood Ali	d202924a93	include test and address review comments	2020-01-28 09:06:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Mahmood Ali	e436d2701a	Handle Nomad leadership flapping Fixes a deadlock in leadership handling if leadership flapped. Raft propagates leadership transition to Nomad through a NotifyCh channel. Raft blocks when writing to this channel, so channel must be buffered or aggressively consumed[1]. Otherwise, Raft blocks indefinitely in `raft.runLeader` until the channel is consumed[1] and does not move on to executing follower related logic (in `raft.runFollower`). While Raft `runLeader` defer function blocks, raft cannot process any other raft operations. For example, `run{Leader\|Follower}` methods consume `raft.applyCh`, and while runLeader defer is blocked, all raft log applications or config lookup will block indefinitely. Sadly, `leaderLoop` and `establishLeader` makes few Raft calls! `establishLeader` attempts to auto-create autopilot/scheduler config [3]; and `leaderLoop` attempts to check raft configuration [4]. All of these calls occur without a timeout. Thus, if leadership flapped quickly while `leaderLoop/establishLeadership` is invoked and hit any of these Raft calls, Raft handler _deadlock_ forever. Depending on how many times it flapped and where exactly we get stuck, I suspect it's possible to get in the following case: * Agent metrics/stats http and RPC calls hang as they check raft.Configurations * raft.State remains in Leader state, and server attempts to handle RPC calls (e.g. node/alloc updates) and these hang as well As we create goroutines per RPC call, the number of goroutines grow over time and may trigger a out of memory errors in addition to missed updates. [1] `d90d6d6bda/config.go (L190-L193)` [2] `d90d6d6bda/raft.go (L425-L436)` [3] `2a89e47746/nomad/leader.go (L198-L202)` [4] `2a89e47746/nomad/leader.go (L877)`	2020-01-22 13:08:34 -05:00
Mahmood Ali	129c884105	extract leader step function	2020-01-22 10:55:48 -05:00
Mahmood Ali	f36cc54efd	actually always canonicalize alloc.Job alloc.Job may be stale as well and need to migrate it. It does cost extra cycles but should be negligible.	2020-01-15 09:02:48 -05:00
Mahmood Ali	b1b714691c	address review comments	2020-01-15 08:57:05 -05:00
Mahmood Ali	1ab682f622	scheduler: allow configuring default preemption for system scheduler Some operators want a greater control over when preemption is enabled, especially during an upgrade to limit potential side-effects.	2020-01-13 08:30:49 -05:00
Drew Bailey	ff4bfb8809	Merge pull request #6841 from hashicorp/f-agent-pprof-acl Remote agent pprof endpoints	2020-01-10 14:52:39 -05:00
Mahmood Ali	bfa33cf471	canonicalize allocs from plan results too	2020-01-10 10:41:12 -05:00
Nick Ethier	1f28633954	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Drew Bailey	b702dede49	adds qc param, address pr feedback	2020-01-09 15:15:11 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	1b8af920f3	address pr feedback	2020-01-09 15:15:09 -05:00
Drew Bailey	4ced73875b	leave acl checking to rpc endpoints fix test expectation test wrapNonJSON	2020-01-09 15:15:08 -05:00
Drew Bailey	279512c7f8	provide helpful error, cleanup logic	2020-01-09 15:15:08 -05:00
Drew Bailey	7bbba613a5	prevent doubly wrapping with rpc error	2020-01-09 15:15:07 -05:00
Drew Bailey	fd42020ad6	RPC server EnableDebug option Passes in agent enable_debug config to nomad server and client configs. This allows for rpc endpoints to have more granular control if they should be enabled or not in combination with ACLs. enable debug on client test	2020-01-09 15:15:07 -05:00
Drew Bailey	31c0aca10a	rename forward func, add comment for why we forward	2020-01-09 15:15:06 -05:00
Drew Bailey	9a80938fb1	region forwarding; prevent recursive forwards for impossible requests prevent region forwarding loop, backfill tests fix failing test	2020-01-09 15:15:06 -05:00
Drew Bailey	46121fe3fd	move shared structs out of client and into nomad	2020-01-09 15:15:05 -05:00
Drew Bailey	aec81a0b99	api agent endpoints helper func to return serverPart based off of serverID	2020-01-09 15:15:05 -05:00
Drew Bailey	3672414888	test pprof headers and profile methods tidy up, add comments clean up seconds param assignment	2020-01-09 15:15:04 -05:00
Drew Bailey	fc37448683	warn when enabled debug is on when registering m -> a receiver name return codederrors, fix query	2020-01-09 15:15:04 -05:00
Drew Bailey	50288461c9	Server request forwarding for Agent.Profile Return rpc errors for profile requests, set up remote forwarding to target leader or server id for profile requests. server forwarding, endpoint tests	2020-01-09 15:15:03 -05:00
Mahmood Ali	d740d347ce	Migrate old alloc structs on read This commit ensures that Alloc.AllocatedResources is properly populated when read from persistence stores (namely Raft and client state store). The alloc struct may have been written previously by an arbitrary old version that may only populate Alloc.TaskResources.	2020-01-09 08:46:50 -05:00
James Rasell	df2dc48790	Fix error parsing config when setting consul.timeout. (#6907 ) When parsing a config file which had the consul.timeout param set, Nomad was reporting an error causing startup to fail. This seems to be caused by the HCL decoder interpreting the timeout type as an int rather than a string. This is caused by the struct TimeoutHCL param having a hcl key of timeout alongside a Timeout struct param of type time.Duration (int). Ensuring the decoder ignores the Timeout struct param ensure the decoder runs correctly.	2020-01-07 13:40:55 -05:00
Nick Ethier	677e9cdc16	connect: configure envoy such that multiple sidecars can run in the same alloc	2020-01-06 11:26:27 -05:00
Michael Schurter	92cdc9de01	nomad/state: remove dead upgrade path code It is uncalled so there hsould be no runtime changes.	2019-12-20 11:10:22 -08:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	ae145c9a37	allow only positive shutdown delay more explicit test case, remove select statement	2019-12-16 11:38:30 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Seth Hoenig	270233e23d	tests: remove trace statements from nodeDrainWatcher.watch Avoid logging in the `watch` function as much as possible, since it is not waited on during a server shutdown. When the logger logs after a test passes, it may or may not cause the testing framework to panic. More info in: https://github.com/golang/go/issues/29388#issuecomment-453648436	2019-12-16 07:08:11 -06:00
Michael Schurter	95fd2643d7	connect: canonicalize before adding sidecar Fixes #6853 Canonicalize jobs first before adding any sidecars. This fixes a bug where sidecar tasks were added without interpolated names and broke validation. Sidecar tasks must be canonicalized independently. Also adds a group network to the mock connect job because it wasn't a valid connect job before!	2019-12-12 20:55:56 -08:00
Seth Hoenig	d45dec1ca8	tests: parallelize state store tests It has been decided we're going to live in a many core world. Let's take advantage of that and parallelize these state store tests which all run in memory and are largely CPU bound. An unscientific benchmark demonstrating the improvement: [mp state (master)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 5.162s [mp state (f-parallelize-state-store-tests)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 1.527s	2019-12-11 09:36:37 -06:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Danielle Lancashire	d2075ebae9	spellcheck: Fix spelling of retrieve	2019-12-05 18:59:47 -06:00
Mahmood Ali	b3e557cae3	address feedback review apply `s/requestAuthToken/requestACLToken/g`	2019-11-26 08:39:04 -05:00
Mahmood Ali	02e20c720b	acl_endpoint: permission denied for unauthenticated requests If ACL Request is unauthenticated, we should honor the anonymous token. This PR makes few changes: * `GetPolicy` endpoints may return policy if anonymous policy allows it, or return permission denied otherwise. * `ListPolicies` returns an empty policy list, or one with anonymous policy if one exists. Without this PR, the we return an incomprehensible error. Before: ``` $ curl http://localhost:4646/v1/acl/policy/doesntexist; echo acl token lookup failed: index error: UUID must be 36 characters $ curl http://localhost:4646/v1/acl/policies; echo acl token lookup failed: index error: UUID must be 36 characters ``` After: ``` $ curl http://localhost:4646/v1/acl/policy/doesntexist; echo Permission denied $ curl http://localhost:4646/v1/acl/policies; echo [] ```	2019-11-22 08:43:09 -05:00
Michael Schurter	4b6762511d	Merge pull request #6021 from hashicorp/f-anonymous-policy-access api: Update policy endpoint to permit anonymous access	2019-11-20 15:33:45 -08:00
Buck Doyle	5fcc00d0f9	Add gofmt changes	2019-11-20 12:47:01 -06:00
Buck Doyle	dc9c0d5ead	Add explanatory comment	2019-11-20 11:45:44 -06:00
Buck Doyle	bea9837510	Remove extraneous else block	2019-11-20 11:37:45 -06:00
Buck Doyle	d6a3e571bd	Remove extraneous whitespace	2019-11-20 11:37:01 -06:00
Buck Doyle	db77a24ed3	Merge branch 'master' into f-policy-json	2019-11-20 11:20:07 -06:00
Mahmood Ali	7fb4c35831	comments and casing	2019-11-19 16:03:55 -05:00
Mahmood Ali	97d0fd009d	404 if token isn't found	2019-11-19 15:52:53 -05:00
Mahmood Ali	6f8bb5e90b	api: acl bootstrap errors aren't 500 Noticed that ACL endpoints return 500 status code for user errors. This is confusing and can lead to false monitoring alerts. Here, I introduce a concept of RPCCoded errors to be returned by RPC that signal a code in addition to error message. Codes for now match HTTP codes to ease reasoning. ``` $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9)) $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9)) ```	2019-11-19 15:51:57 -05:00
Michael Schurter	796758b8a5	core: add semver constraint The existing version constraint uses logic optimized for package managers, not schedulers, when checking prereleases: - 1.3.0-beta1 will not satisfy ">= 0.6.1" - 1.7.0-rc1 will not satisfy ">= 1.6.0-beta1" This is due to package managers wishing to favor final releases over prereleases. In a scheduler versions more often represent the earliest release all required features/APIs are available in a system. Whether the constraint or the version being evaluated are prereleases has no impact on ordering. This commit adds a new constraint - `semver` - which will use Semver v2.0 ordering when evaluating constraints. Given the above examples: - 1.3.0-beta1 satisfies ">= 0.6.1" using `semver` - 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver` Since existing jobspecs may rely on the old behavior, a new constraint was added and the implicit Consul Connect and Vault constraints were updated to use it.	2019-11-19 08:40:19 -08:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Luiz Aoqui	e499c5bddc	Merge pull request #6698 from hashicorp/f-add-drain-start-time api: add `StartedAt` in `Node.DrainStrategy`	2019-11-15 15:38:38 -05:00
Luiz Aoqui	e862b61daa	api: use the same initial time for all drain properties	2019-11-14 16:06:09 -05:00
Drew Bailey	9b63828658	serverID to target remote leader or server handle the case where we request a server-id which is this current server update docs, error on node and server id params more accurate names for tests use shared no leader err, formatting rm bad comment remove redundant variable	2019-11-14 10:07:35 -05:00
Drew Bailey	b644e1f47d	add server-id to monitor specific server	2019-11-14 09:53:41 -05:00
Drew Bailey	2185c1a89e	Allows monitor to target leader server Allows user to pass in node-id=leader to forward monitor request to remote a remote leader.	2019-11-14 09:53:40 -05:00
Luiz Aoqui	5bd7cdd5c3	api: add `StartedAt` in `Node.DrainStrategy`	2019-11-13 17:54:40 -05:00
Lars Lehtonen	22a3c21dd0	nomad: fix dropped test error	2019-11-13 12:49:41 -08:00
Michael Schurter	08afb7d605	vault: allow overriding implicit vault constraint There's a bug in version parsing that breaks this constraint when using a prerelease enterprise version of Vault (eg 1.3.0-beta1+ent). While this does not fix the underlying bug it does provide a workaround for future issues related to the implicit constraint. Like the implicit Connect constraint: all implicit constraints should be overridable to allow users to workaround bugs or other factors should the need arise.	2019-11-12 12:26:36 -08:00
Mahmood Ali	c4c37cb42e	vault: check token_explicit_max_ttl as well Vault 1.2.0 deprecated `explicit_max_ttl` in favor of `token_explicit_max_ttl`.	2019-11-12 08:47:23 -05:00
Lars Lehtonen	adbab29228	nomad: TestEvalBroker_Dequeue_Empty_Timeout() proper goroutine error handling (#6657 )	2019-11-08 14:35:06 -05:00
Drew Bailey	7420446458	Merge pull request #6639 from hashicorp/return-after-forward return after request has been forwarded	2019-11-08 09:48:35 -05:00
Lars Lehtonen	39b68e0b88	TestEvalBroker_Dequeue_Blocked() proper goroutine error handling (#6651 ) TestEvalBroker_Dequeue_Blocked() improve test readability	2019-11-08 08:52:23 -05:00
Nick Ethier	e947aaed4f	nomad: fix bug that didn't allow for multiple connect services in same tg	2019-11-08 04:33:39 -05:00
Lars Lehtonen	6deae70e35	TestEvalBroker_PauseResumeNackTimeout() proper goroutine error handling (#6649 ) TestEvalBroker_PauseResumeNackTimeout() improve test readability	2019-11-07 16:04:59 -05:00
Lars Lehtonen	2638cbb31d	nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() proper goroutine error handling (#6636 ) nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() improve test readability	2019-11-07 10:39:29 -05:00
Drew Bailey	a5e2e1805f	return after request has been forwarded	2019-11-07 08:33:53 -05:00
Lars Lehtonen	e64f98837c	nomad: fix dropped error in TestJobEndpoint_Deregister_ACL (#6602 )	2019-11-06 16:40:45 -05:00
Drew Bailey	f4a7e3dc75	coordinate closing of doneCh, use interface to simplify callers comments	2019-11-05 11:44:26 -05:00
Drew Bailey	fe542680dc	log-json -> json fix typo command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> Update command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> address feedback, lock to prevent send on closed channel fix lock/unlock for dropped messages	2019-11-05 09:51:59 -05:00
Drew Bailey	298b8358a9	move forwarded monitor request into helper	2019-11-05 09:51:56 -05:00
Drew Bailey	8726b685de	address feedback	2019-11-05 09:51:56 -05:00
Drew Bailey	0e759c401c	moving endpoints over to frames	2019-11-05 09:51:54 -05:00
Drew Bailey	17d876d5ef	rename function, initialize log level better underscores instead of dashes for query params	2019-11-05 09:51:53 -05:00
Drew Bailey	8178beecf0	address feedback, use agent_endpoint instead of monitor	2019-11-05 09:51:53 -05:00
Drew Bailey	db65b1f4a5	agent:read acl policy for monitor	2019-11-05 09:51:52 -05:00
Drew Bailey	2533617888	rpc acl tests for both monitor endpoints	2019-11-05 09:51:51 -05:00
Drew Bailey	3c33747e1f	client monitor endpoint tests	2019-11-05 09:51:50 -05:00
Drew Bailey	4bc68855d0	use intercepting loggers for rpchandlers	2019-11-05 09:51:50 -05:00
Drew Bailey	3b9c33a5f0	new hclog with standardlogger intercept	2019-11-05 09:51:49 -05:00
Drew Bailey	a45ae1cd58	enable json formatting, use queryoptions	2019-11-05 09:51:49 -05:00
Drew Bailey	786989dbe3	New monitor pkg for shared monitor functionality Adds new package that can be used by client and server RPC endpoints to facilitate monitoring based off of a logger clean up old code small comment about write rm old comment about minsize rename to Monitor Removes connection logic from monitor command Keep connection logic in endpoints, use a channel to send results from monitoring use new multisink logger and interfaces small test for dropped messages update go-hclogger and update sink/intercept logger interfaces	2019-11-05 09:51:49 -05:00
Lars Lehtonen	0a4542fadc	nomad: fix test goroutine (#6593 )	2019-10-31 08:23:32 -04:00
Seth Hoenig	98592113a3	Merge pull request #6582 from hashicorp/b-vault-createToken-log-msg nomad: fix vault.CreateToken log message printing wrong error	2019-10-29 17:35:05 -05:00
Mahmood Ali	7f2e4dc5d8	Merge pull request #6574 from hashicorp/b-gh-6570-vault-role-validation vault: honor new `token_period` in vault token role	2019-10-29 10:18:59 -04:00
Seth Hoenig	838c6e3329	nomad: fix vault.CreateToken log message printing wrong error Fixes typo in word "failed". Fixes bug where incorrect error is printed. The old code would only ever print a nil error, instead of the validationErr which is being created.	2019-10-28 23:05:32 -05:00
Mahmood Ali	c5d8d66787	Fix admissionValidators `admissionValidators` doesn't aggregate errors correctly, as it aggregates errors in `errs` reference yet it always returns the nil `err`. Here, we avoid shadowing `err`, and move variable declarations to where they are used.	2019-10-28 10:52:53 -04:00
Mahmood Ali	abb930249a	consul connect: do basic validation before mutating job `groupConnectHook` assumes that Networks is a non-empty slice, but TG hasn't been validated yet and validation may depend on mutation results. As such, we do basic check here before dereferencing network slice elements.	2019-10-28 10:49:02 -04:00
Mahmood Ali	bb45a7a776	add tests for consul connect validation	2019-10-28 10:41:51 -04:00
Mahmood Ali	4c64658397	vault: Support new role field `token_role` Vault 1.2.0 deprecated `period` field in favor of `token_period` in auth role: > * Token store roles use new, common token fields for the values > that overlap with other auth backends. `period`, `explicit_max_ttl`, and > `bound_cidrs` will continue to work, with priority being given to the > `token_` prefixed versions of those parameters. They will also be returned > when doing a read on the role if they were used to provide values initially; > however, in Vault 1.4 if `period` or `explicit_max_ttl` is zero they will no > longer be returned. (`explicit_max_ttl` was already not returned if empty.) https://github.com/hashicorp/vault/blob/master/CHANGELOG.md#120-july-30th-2019	2019-10-28 09:33:26 -04:00
Seth Hoenig	8b03477f46	Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags connect: enable setting tags on consul connect sidecar service in job…	2019-10-17 15:14:09 -05:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Mahmood Ali	4e4a9b252c	Merge pull request #6290 from hashicorp/r-generated-code-refactor dev: avoid codecgen code in downstream projects	2019-10-15 08:22:31 -04:00
Danielle	fee482ae6c	Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation volumes: Add support for mount propagation	2019-10-14 14:29:40 +02:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers an don the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a priviliged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a millicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and performs the allocation lookup independently from namespaces. Here, we check that the requested can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect starting to enforce this will be very disruptive and inappropriate for a nomad point release. As such, we maintain current behavior that doesn't require passing the proper namespace in request. A future major release may start enforcing checking declared namespace.	2019-10-08 12:59:22 -04:00
Mahmood Ali	674a457865	use RequestNamespace(), the canonical way to get namespace	2019-09-27 07:40:58 -04:00
Mahmood Ali	e29ee4c400	nomad: defensive check for namespaces in job registration call In a job registration request, ensure that the request namespace "header" and job namespace field match. This should be the case already in prod, as http handlers ensures that the values match [1]. This mitigates bugs that exploit bugs where we may check a value but act on another, resulting into bypassing ACL system. [1] https://github.com/hashicorp/nomad/blob/v0.9.5/command/agent/job_endpoint.go#L415-L418	2019-09-26 17:02:47 -04:00
Lang Martin	fb41dd86ba	default raft protocol v2	2019-09-24 14:37:55 -04:00
Lang Martin	31d7f116dd	nomad/server comments	2019-09-24 14:36:18 -04:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design was based due to being uncertain about the future of storage plugins, and to allow maxium flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from and end users perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always needs a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always needs a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Mahmood Ali	4b8280e51d	remove generated code	2019-09-06 19:24:15 +00:00
Nomad Release bot	dc7d728a82	Generate files for 0.10.0-beta1 release	2019-09-06 18:47:09 +00:00
Mahmood Ali	01f42053e4	dev: avoid codecgen code in downstream projects This is an attempt to ease dependency management for external driver plugins, by avoiding requiring them to compile ugorji/go generated files. Plugin developers reported some pain with the brittleness of ugorji/go dependency in particular, specially when using go mod, the default go mod manager in golang 1.13. Context -------- Nomad uses msgpack to persist and serialize internal structs, using ugorji/go library. As an optimization, we use ugorji/go code generation to speedup process and aovid the relection-based slow path. We commit these generated files in repository when we cut and tag the release to ease reproducability and debugging old releases. Thus, downstream projects that depend on release tag, indirectly depends on ugorji/go generated code. Sadly, the generated code is brittle and specific to the version of ugorji/go being used. When go mod picks another version of ugorji/go then nomad (go mod by default uses release according to semver), downstream projects face compilation errors. Interestingly, downstream projects don't commonly serialize nomad internal structs. Drivers and device plugins use grpc instead of msgpack for the most part. In the few cases where they use msgpag (e.g. decoding task config), they do without codegen path as they run on driver specific structs not the nomad internal structs. Also, the ugorji/go serialization through reflection is generally backward compatible (mod some ugorji/go regression bugs that get introduced every now and then :( ). Proposal --------- The proposal here is to keep committing ugorji/go codec generated files for releases but to use a go tag for them. All nomad development through the makefile, including releasing, CI and dev flow, has the tag enabled. Downstream plugin projects, by default, will skip these files and life proceed as normal for them. The downside is that nomad developers who use generated code but avoid using make must start passing additional go tag argument. Though this is not a blessed configuration.	2019-09-06 09:22:00 -04:00
Mahmood Ali	6d73ca0cfb	Merge pull request #6250 from hashicorp/f-raft-protocol-v3 Update default raft protocol to version 3	2019-09-04 09:34:41 -04:00
Mahmood Ali	c94a5ef1f8	tests: give up on TestAutopilot_CleanupStaleRaftServer for now	2019-09-04 09:10:53 -04:00
Nick Ethier	6a90a9f505	structs: canonicalize tg Services and Networks (#6257 )	2019-09-04 08:55:47 -04:00
Mahmood Ali	6cefd8f97e	tests: attempt to fix TestAutopilot_CleanupStaleRaftServer Also add a utility function for waiting for stable leadership	2019-09-04 08:49:33 -04:00
Mahmood Ali	035a7a94d9	tests: update time sensitive tests Fix tests whose messages seem timing dependent.	2019-09-04 08:45:25 -04:00
Mahmood Ali	0beb757b6f	tests: disable server auto join by default Tests typically call join cluster directly rather than rely on consul discovery. Worse, consul discovery seems to cause additional leadership transitions when a server is shutdown in tests than tests expect.	2019-09-04 07:54:54 -04:00
Mahmood Ali	3e2ab6e2a3	address review feedback	2019-09-03 21:44:39 -04:00
Mahmood Ali	0a6d73020c	use current nomad version in testing	2019-09-03 21:42:41 -04:00
Mahmood Ali	9bd56587cd	Fix raft tests Wait until leadership stabalizes and all non-voters get promoted before killing leader	2019-09-03 14:53:29 -04:00
Michael Schurter	5957030d18	connect: add unix socket to proxy grpc for envoy (#6232 ) * connect: add unix socket to proxy grpc for envoy Fixes #6124 Implement a L4 proxy from a unix socket inside a network namespace to Consul's gRPC endpoint on the host. This allows Envoy to connect to Consul's xDS configuration API. * connect: pointer receiver on structs with mutexes * connect: warn on all proxy errors	2019-09-03 08:43:38 -07:00
Buck Doyle	21ec6a237c	Merge branch 'master' into f-policy-json # Conflicts: # CHANGELOG.md	2019-09-03 09:56:25 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Buck Doyle	ab96785fc9	Change test to use valid HCL for rules	2019-08-29 16:09:02 -05:00
Buck Doyle	4a159f5dcf	Change parsing error to set rules to nil	2019-08-29 15:50:34 -05:00
Buck Doyle	5495a7e689	Add standard error-handling for parse failure	2019-08-29 11:12:02 -05:00
Buck Doyle	8b06712d21	Merge branch 'master' into f-policy-json	2019-08-29 11:11:21 -05:00
Mahmood Ali	3da10b5cb3	scheduler: tests for multiple drivers in TG	2019-08-29 09:03:31 -04:00
Mahmood Ali	a67f5f0565	update tests to run with v2	2019-08-28 16:42:08 -04:00
Mahmood Ali	6eabf53b91	Default raft protocol to version 3	2019-08-28 15:56:59 -04:00
Michael Schurter	f5792635ca	Merge pull request #6218 from hashicorp/f-consul-defaults consul: use Consul's defaults and env vars	2019-08-28 11:54:44 -07:00
Nick Ethier	9e96971a75	cli: display group ports and address in alloc status command output (#6189 ) * cli: display group ports and address in alloc status command output * add assertions for port.To = -1 case and convert assertions to testify	2019-08-27 23:59:36 -04:00
Nick Ethier	cbb27e74bc	Add environment variables for connect upstreams (#6171 ) * taskenv: add connect upstream env vars + test * set taskenv upstreams instead of appending * Update client/taskenv/env.go Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-08-27 23:41:38 -04:00
Michael Schurter	3b0e1d8ef7	consul: use Consul's defaults and env vars Use Consul's API package defaults and env vars as Nomad's defaults.	2019-08-27 14:56:52 -07:00
Mahmood Ali	3791a70aa9	Merge pull request #5676 from hashicorp/f-b-upgrade-ugorji-dep-20190508 Update ugorji/go to latest	2019-08-23 18:29:49 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	95b8048553	Merge pull request #6121 from hashicorp/f-connect-bootstrap connect: task hook for bootstrapping envoy sidecar	2019-08-22 10:58:31 -07:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Danielle Lancashire	2e5f28029f	remove hidden field from host volumes We're not shipping support for "hidden" volumes in 0.10 any more, I'll convert this to an issue+mini RFC for future enhancement.	2019-08-22 08:48:05 +02:00
Danielle	0428284aee	Merge pull request #6180 from hashicorp/dani/readonly-acl Fine grained ACLs for Host Volumes	2019-08-21 22:22:14 +02:00
Danielle Lancashire	91bb67f713	acls: Break mount acl into mount-rw and mount-ro	2019-08-21 21:17:30 +02:00
Nick Ethier	c8556daf37	structs: validate no tcp checks for connect services (#6169 )	2019-08-21 12:42:53 -04:00

1 2 3 4 5 ...

3135 Commits