open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	a4e89d72a8	secure vars: filter by path in List RPCs (#14036) The List RPCs only checked the ACL for the Prefix argument of the request. Add an ACL filter to the paginator for the List RPC. Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a "deny" capability so that operators can deny specific paths or prefixes below an allowed path.	2022-08-15 11:38:20 -04:00
Tim Gross	4005759d28	move secure variable conflict resolution to state store (#13922) Move conflict resolution implementation into the state store with a new Apply RPC. This also makes the RPC for secure variables much more similar to Consul's KV, which will help us support soft deletes in a post-1.4.0 version of Nomad. Reimplement quotas in the state store functions. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-08-15 11:19:53 -04:00
Seth Hoenig	3aaaedf52e	cli: forward request for job validation to nomad leader This PR changes the behavior of 'nomad job validate' to forward the request to the nomad leader, rather than responding from any server. This is because we need the leader when validating Vault tokens, since the leader is the only server with an active vault client.	2022-08-10 14:34:04 -05:00
Seth Hoenig	0b52c27a15	Merge pull request #14045 from Abirdcfly/main fix minor unreachable code caused by t.Fatal	2022-08-08 11:47:02 -05:00
Abirdcfly	d66943d4f7	fix minor unreachable code caused by t.Fatal Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2022-08-08 23:50:11 +08:00
Seth Hoenig	2b6bda49b9	core: automatically plumb task name into task-level services and checks	2022-08-05 12:42:41 -05:00
Seth Hoenig	f6f26fb72c	nsd: add support for setting request body in http checks This PR adds support for setting check.body in checks of services making use of Nomad's service provider.	2022-08-04 14:40:23 -05:00
Charles Z	7a8ec90fbe	allow unhealthy canaries without blocking autopromote (#14001)	2022-08-04 11:53:50 -04:00
Seth Hoenig	dcda57e729	nsd: add support for setting headers on nomad service http checks This PR enables setting of the headers block on services registered into Nomad's service provider. Works just like the existing support in Consul checks.	2022-08-03 10:06:44 -05:00
Seth Hoenig	067aa00a6a	Merge pull request #13953 from hashicorp/f-nsd-check-methods nsd: add support for specifying check.method in nomad service checks	2022-08-03 08:28:38 -05:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to do the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
Seth Hoenig	a4d4a76994	nsd: add support for specifying check.method in nomad service checks Unblock 'check.method' in service validation. Add tests around making sure this value gets plumbed through.	2022-08-01 16:13:48 -05:00
Tim Gross	04677d205e	block deleting namespace if it contains a secure variable (#13888) When we delete a namespace, we check to ensure that there are no non-terminal jobs or CSI volumes, which also covers evals, allocs, etc. Secure variables are also namespaced, so extend this check to them as well.	2022-07-22 10:06:35 -04:00
Seth Hoenig	5aaa31a6dc	Merge pull request #13882 from hashicorp/cleanup-onupdate-consts cleanup: use constants for on_update values	2022-07-22 08:58:36 -05:00
Tim Gross	c7a11a86c6	block deleting namespaces if the namespace contains a volume (#13880) When we delete a namespace, we check to ensure that there are no non-terminal jobs, which effectively covers evals, allocs, etc. CSI volumes are also namespaced, so extend this check to cover CSI volumes.	2022-07-21 16:13:52 -04:00
Seth Hoenig	d8fe1d10ba	cleanup: use constants for on_update values	2022-07-21 13:09:47 -05:00
Seth Hoenig	c61e779b48	Merge pull request #13715 from hashicorp/dev-nsd-checks client: add support for checks in nomad services	2022-07-21 10:22:57 -05:00
Seth Hoenig	606e3ebdd4	client: updates from pr feedback	2022-07-21 09:54:27 -05:00
Seth Hoenig	8e6eeaa37e	Merge pull request #13869 from hashicorp/b-uniq-services-2 servicedisco: ensure service uniqueness in job validation	2022-07-21 08:24:24 -05:00
Tim Gross	d11da1df5c	workload identity: use parent ID for dispatch/periodic jobs (#13748) Workload identities grant implicit access to policies, and operators will not want to craft separate policies for each invocation of a periodic or dispatch job. Use the parent job's ID as the JobID claim.	2022-07-21 09:05:54 -04:00
Tim Gross	9c43c28575	search: use secure vars ACL policy for secure vars context (#13788) The search RPC used a placeholder policy for searching within the secure variables context. Now that we have ACL policies built for secure variables, we can use them for search. Requires a new loose policy for checking if a token has any secure variables access within a namespace, so that we can filter on specific paths in the iterator.	2022-07-21 08:39:36 -04:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
Tim Gross	428e23043c	secure vars: limit maximum size of variable data (#13743) To discourage accidentally DoS'ing the cluster with secure variables data, we're providing a very low limit to the maximum size of a given secure variable. This currently matches the limit for dispatch payloads. In future versions, we may increase this limit or make it configurable, once we have better metrics from real-world operators.	2022-07-20 14:46:43 -04:00
Seth Hoenig	e5978a9cbf	jobspec: ensure service uniqueness in job validation	2022-07-20 12:38:08 -05:00
Seth Hoenig	d83aae253f	cleanup: track task names and providers using set	2022-07-20 11:48:36 -05:00
Seth Hoenig	bd2935ee54	cleanup: tweaks from cr feedback	2022-07-20 10:42:35 -05:00
Seth Hoenig	93cfeb177b	cleanup: example refactoring out map[string]struct{} using set.Set This PR is a little demo of using github.com/hashicorp/go-set to replace the use of map[T]struct{} as a make-shift set.	2022-07-19 22:50:49 -05:00
Tim Gross	ea38582b40	secure vars: rename automatically accessible vars path for jobs (#13848) Tasks are automatically granted access to variables on a path that matches their workload identity, with a well-known prefix. Change the prefix to `nomad/jobs` to allow for future prefixes like `nomad/volumes` or `nomad/plugins`. Reserve the prefix by emitting errors during validation.	2022-07-19 16:17:34 -04:00
Tim Gross	cfa2cb140e	fsm: one-time token expiration should be deterministic (#13737) When applying a raft log to expire ACL tokens, we need to use a timestamp provided by the leader so that the result is deterministic across servers. Use leader's timestamp from RPC call	2022-07-18 14:19:29 -04:00
Seth Hoenig	c23da281a1	metrics: even classless blocked evals get metrics This PR fixes a bug where blocked evaluations with no class set would not have metrics exported at the dc:class scope. Fixes #13759	2022-07-15 14:12:44 -05:00
Tim Gross	05cd91155d	keyring: fix flake in replication-after-election test (#13749) The test for simulating a key rotation across leader elections was flaky because we weren't waiting for a leader election and were checking the server configs rather than raft for which server was currently the leader. Fixing the flake revealed a bug in the test that we weren't ensuring the new leader was running its own replication, so it wouldn't pick up the key material from the previous follower.	2022-07-15 11:09:09 -04:00
Tim Gross	aa15e0fe7e	secure vars: updates should reduce quota tracking if smaller (#13742) When secure variables are updated, we were adding the update to the existing quota tracking without first checking whether it was an update to an existing variable. In that case we need to add/subtract only the difference between the new and existing quota usage.	2022-07-15 11:08:53 -04:00
Tim Gross	0cf8a580c7	search: refactor OSS/ENT split for ACL checks (#13760) The split between OSS/ENT in ACL checks for the Search RPC has a lot of repeated code that results in merge conflicts. Move most of the logic into the shared code so that we can call out to thin functions for ENT checks.	2022-07-14 11:31:08 -04:00
Tim Gross	cc9fb1c876	keyring: upserting key metadata in FSM must be deterministic (#13733)	2022-07-14 08:38:14 -04:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown, this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Seth Hoenig	297d386bdc	client: add support for checks in nomad services This PR adds support for specifying checks in services registered to the built-in nomad service provider. Currently only HTTP and TCP checks are supported, though more types could be added later.	2022-07-12 17:09:50 -05:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Charlie Voiselle	6be7a41351	SV: CLI: var list command (#13707) * SV CLI: var list * Fix wildcard prefix filtering Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-12 12:49:39 -04:00
Tim Gross	8054b3c9e6	secure vars: fix enterprise test by upserting the namespace (#13719) In OSS we can upsert an allocation without worrying about whether that alloc is in a namespace that actually exists, but in ENT that upsert will add to the namespace's quotas. Ensure we're doing so in this secure variables RPC test to fix the test breaking in the ENT repo.	2022-07-12 12:05:52 -04:00
Charlie Voiselle	f4784e8d69	SV: fixes for namespace handling (#13705) * ACL check namespace value in SecureVariable * Error on wildcard namespace	2022-07-12 11:15:57 -04:00
Phil Renaud	e9219a1ae0	Allow wildcard for Evaluations API (#13530) * Failing test and TODO for wildcard * Alias the namespace query parameter for Evals * eval: fix list when using ACLs and `*` namespace Apply the same verification process as in job, allocs and scaling policy list endpoints to handle the eval list when using an ACL token with limited namespace support but querying using the `*` wildcard namespace. changelog: add entry for #13530 * ui: set namespace when querying eval Evals have a unique UUID as ID, but when querying them the Nomad API still expects a namespace query param, otherwise it assumes `default`. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-11 16:42:17 -04:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	555ac432cd	SV: CAS: Implement Check and Set for Delete and Upsert (#13429) * SV: CAS * Implement Check and Set for Delete and Upsert * Reading the conflict from the state store * Update endpoint for new error text * Updated HTTP api tests * Conflicts to the HTTP api * SV: structs: Update SV time to UnixNanos * update mock to UnixNano; refactor * SV: encrypter: quote KeyID in error * SV: mock: add mock for namespace w/ SV	2022-07-11 13:34:06 -04:00
Tim Gross	8a50d2c3e8	implement quota tracking for secure variables (#13453) We need to track per-namespace storage usage for secure variables even in Nomad OSS so that a cluster can be seamlessly upgraded from OSS to ENT without having to re-calculate quota usage. Provide a hook in the upsert RPC for enforcement of quotas in ENT. This will be a no-op in Nomad OSS.	2022-07-11 13:34:06 -04:00
Tim Gross	eaf430bfd5	secure variable server configuration (#13307) Add fields for configuring root key garbage collection and automatic rotation. Fix the keystore path so that we write to a tempdir when in dev mode.	2022-07-11 13:34:06 -04:00
Tim Gross	6300427228	core job for key rotation (#13309) Extend the GC job to support periodic key rotation. Update the GC process to safely support signed workload identity. We can't GC any key used to sign a workload identity. Finding which key was used to sign every allocation will be expensive, but there are not that many keys. This lets us take a conservative approach: find the oldest live allocation and ensure that we don't GC any key older than that key.	2022-07-11 13:34:06 -04:00
Tim Gross	350fe3495c	fix blocking query for `Keyring.List` RPC (#13384) The blocking query for `Keyring.List` appended the keys for each pass through the blocking query to the response. This results in multiple copies of keys in the response. Overwrite the `reply.Keys` field on each pass through the blocking query to ensure we only get the expected page of responses.	2022-07-11 13:34:05 -04:00
Tim Gross	83dc3ec758	secure variables ACL policies (#13294) Adds a new policy block inside namespaces to control access to secure variables on the basis of path, with support for globbing. Splits out VerifyClaim from ResolveClaim. The ServiceRegistration RPC only needs to be able to verify that a claim is valid for some allocation in the store; it doesn't care about implicit policies or capabilities. Split this out to its own method on the server so that the SecureVariables RPC can reuse it as a separate step from resolving policies (see next commit). Support implicit policies based on workload identity	2022-07-11 13:34:05 -04:00

4061 Commits
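
Commit a4e89d72a8 (#14036) moves the List RPC's ACL check from the request's Prefix argument to every item returned through the paginator. It also adds a "deny" capability for paths below an allowed prefix. The Go sketch below shows that per-item filtering under two assumptions. First, it uses a simplified rule model that supports only exact paths or trailing-`*` prefixes. Second, any matching `deny` overrides all grants. The names `pathRule`, `allowed`, and `filterList` are hypothetical, as is the use of a `list` capability for filtering. Nomad's actual glob matching and precedence rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// pathRule is a hypothetical, simplified policy rule: a variable path pattern
// (exact, or a prefix ending in "*") and the capabilities it grants.
type pathRule struct {
	pattern      string
	capabilities []string
}

// matches supports only exact paths and trailing-"*" prefixes.
func matches(pattern, path string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == path
}

// allowed reports whether capability is granted on path. Assumption: a
// matching "deny" wins over every grant, so a deny below an allowed prefix
// removes access to that sub-path.
func allowed(rules []pathRule, path, capability string) bool {
	granted := false
	for _, r := range rules {
		if !matches(r.pattern, path) {
			continue
		}
		for _, c := range r.capabilities {
			if c == "deny" {
				return false
			}
			if c == capability {
				granted = true
			}
		}
	}
	return granted
}

// filterList checks every returned path, not just the request prefix, and
// keeps only those the caller may list.
func filterList(rules []pathRule, paths []string) []string {
	visible := make([]string, 0, len(paths))
	for _, p := range paths {
		if allowed(rules, p, "list") {
			visible = append(visible, p)
		}
	}
	return visible
}

func main() {
	rules := []pathRule{
		{pattern: "nomad/jobs/*", capabilities: []string{"read", "list"}},
		{pattern: "nomad/jobs/secret*", capabilities: []string{"deny"}},
	}
	paths := []string{"nomad/jobs/web", "nomad/jobs/secret-db", "other/config"}
	fmt.Println(filterList(rules, paths)) // prints [nomad/jobs/web]
}
```

A prefix-only check would accept a request for `nomad/jobs/` and return `nomad/jobs/secret-db`. The per-item filter drops that path.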