open-nomad

Author	SHA1	Message	Date
Seth Hoenig	52de2dc09d	Merge pull request #14290 from hashicorp/cleanup-more-helper-cleanup cleanup: tidy up helper package some more	2022-08-30 08:19:48 -05:00
James Rasell	755b4745ed	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-30 08:59:13 +01:00
Seth Hoenig	3e1e2001b9	Merge pull request #14143 from hashicorp/cleanup-slice-sets-3 cleanup: more cleanup of slices that are really sets	2022-08-29 13:52:59 -05:00
Tim Gross	7d1eb2efd5	keyring: split structs to its own file (#14378 )	2022-08-29 14:18:35 -04:00
Seth Hoenig	9d0e274f27	cleanup: cleanup more slice-set comparisons	2022-08-29 12:04:21 -05:00
Tim Gross	62a968f443	Merge pull request #14351 from hashicorp/variables-rename Variables rename	2022-08-29 11:36:50 -04:00
Piotr Kazmierczak	5f353503e5	bugfix: fixed template validation panic in case of incorrect ChangeScript configuration (#14374 ) Fixes #14367	2022-08-29 17:11:15 +02:00
Tim Gross	1dc053b917	rename SecureVariables to Variables throughout	2022-08-26 16:06:24 -04:00
Tim Gross	dcfd31296b	file rename	2022-08-26 16:06:24 -04:00
Seth Hoenig	b87689d2d1	Merge pull request #14318 from hashicorp/cleanup-create-pointer-compare cleanup: create pointer.Compare helper function	2022-08-26 09:15:41 -05:00
Seth Hoenig	6b2655ad86	cleanup: create pointer.Compare helper function This PR creates a pointer.Compare helper for comparing equality of two pointers. Strictly only works with primitive types we know are safe to derefence and compare using '=='.	2022-08-26 08:55:59 -05:00
James Rasell	601588df6b	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-25 12:14:29 +01:00
James Rasell	7a0798663d	acl: fix a bug where roles could be duplicated by name. An ACL roles name must be unique, however, a bug meant multiple roles of the same same could be created. This fixes that problem with checks in the RPC handler and state store.	2022-08-25 09:20:43 +01:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpar, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. * taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be clear cleanly using `clearDriverHandle` otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loops starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting an taskRunner into two methods: - `Restart` will retain the current behaviour and only will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Tim Gross	c732b215f0	vault: detect namespace change in config reload (#14298 ) The `namespace` field was not included in the equality check between old and new Vault configurations, which meant that a Vault config change that only changed the namespace would not be detected as a change and the clients would not be reloaded. Also, the comparison for boolean fields such as `enabled` and `allow_unauthenticated` was on the pointer and not the value of that pointer, which results in spurious reloads in real config reload that is easily missed in typical test scenarios. Includes a minor refactor of the order of fields for `Copy` and `Merge` to match the struct fields in hopes it makes it harder to make this mistake in the future, as well as additional test coverage.	2022-08-24 17:03:29 -04:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Seth Hoenig	bff6c88683	cleanup: remove more copies of min/max from helper	2022-08-24 09:56:15 -05:00
James Rasell	9782d6d7ff	acl: allow tokens to lookup linked roles. (#14227 ) When listing or reading an ACL role, roles linked to the ACL token used for authentication can be returned to the caller.	2022-08-24 13:51:51 +02:00
Michael Schurter	0114bcfe5b	core: move LicenseConfig to shared file (#14247 ) This moves LicenseConfig and its Copy method to a shared file so that it can be shared with enterprise code.	2022-08-23 13:44:10 -07:00
Luiz Aoqui	7a8cacc9ec	allocrunner: refactor task coordinator (#14009 ) The current implementation for the task coordinator unblocks tasks by performing destructive operations over its internal state (like closing channels and deleting maps from keys). This presents a problem in situations where we would like to revert the state of a task, such as when restarting an allocation with tasks that have already exited. With this new implementation the task coordinator behaves more like a finite state machine where task may be blocked/unblocked multiple times by performing a state transition. This initial part of the work only refactors the task coordinator and is functionally equivalent to the previous implementation. Future work will build upon this to provide bug fixes and enhancements.	2022-08-22 18:38:49 -04:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Charlie Voiselle	29e63a6cb2	Make var get a blocking query as expected (#14205 )	2022-08-22 16:37:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
James Rasell	2736cf0dfa	acl: make listing RPC and HTTP API a stub return object. (#14211 ) Making the ACL Role listing return object a stub future-proofs the endpoint. In the event the role object grows, we are not bound by having to return all fields within the list endpoint or change the signature of the endpoint to reduce the list return size.	2022-08-22 17:20:23 +02:00
James Rasell	802d005ef5	acl: add replication to ACL Roles from authoritative region. (#14176 ) ACL Roles along with policies and global token will be replicated from the authoritative region to all federated regions. This involves a new replication loop running on the federated leader. Policies and roles may be replicated at different times, meaning the policies and role references may not be present within the local state upon replication upsert. In order to bypass the RPC and state check, a new RPC request parameter has been added. This is used by the replication process; all other callers will trigger the ACL role policy validation check. There is a new ACL RPC endpoint to allow the reading of a set of ACL Roles which is required by the replication process and matches ACL Policies and Tokens. A bug within the ACL Role listing RPC has also been fixed which returned incorrect data during blocking queries where a deletion had occurred.	2022-08-22 08:54:07 +02:00
Michael Schurter	26637ab55d	core: fix race mutating jobs in scaling api (#14192 ) Since the state store returns a pointer to the shared job structs in memdb we must always copy it before mutating it and applying the new version via raft. Otherwise if the rpc fails before the mutated job is committed to raft (either due to validation, bug, crash, or other exit condition), the leader server will have an updated copy of the job that other servers will not have.	2022-08-19 15:46:54 -07:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Derek Strickland	8dba52cee2	sentinel: add support for Nomad ACL Token and Namespace (#14171 ) * sentinel: add ability to reference Nomad ACL Token and Namespace in Sentinel policies	2022-08-18 16:33:00 -04:00
Michael Schurter	1d1526cd24	events: fix race in acl event handling (#14188 )	2022-08-18 11:39:14 -07:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
James Rasell	9e3f1581fb	core: add ACL role functionality to ACL tokens. ACL tokens can now utilize ACL roles in order to provide API authorization. Each ACL token can be created and linked to an array of policies as well as an array of ACL role links. The link can be provided via the role name or ID, but internally, is always resolved to the ID as this is immutable whereas the name can be changed by operators. When resolving an ACL token, the policies linked from an ACL role are unpacked and combined with the policy array to form the complete auth set for the token. The ACL token creation endpoint handles deduplicating ACL role links as well as ensuring they exist within state. When reading a token, Nomad will also ensure the ACL role link is current. This handles ACL roles being deleted from under a token from a UX standpoint.	2022-08-17 14:45:01 +01:00
Seth Hoenig	bb58b94e15	Merge pull request #14141 from hashicorp/cleanup-slice-sets-2 cleanup: helper func for comparing slices of .Equals types	2022-08-16 15:12:39 -05:00
Michael Schurter	cdf5a74998	core: fix data races in blocked eval chan handling (#14142 ) Similar to the deployment watcher fix in #14121 - the server code loves these mutable structs so we need to guard access to the struct fields with locks. Capturing ch := b.capacityChangeCh is sufficient to satisfy the data race detector, but I noticed it was also possible to leak goroutines: Since the watchCapacity loop is in charge of receiving from capacityChangeCh and exits when stopCh is closed, senders to capacityChangeCh also must exit when stopCh is closed. Otherwise they may block forever if capacityChangeCh is full because it will never be received on again. I did not find evidence of this occurring in my meager smattering of prod goroutine dumps I have laying around, but this isn't surprising as the chan has a buffer of 8096! I would imagine that is sufficient to handle "late" sends and then just get GC'd away when the last reference to the old chan is dropped. This is just additional safety/correctness.	2022-08-16 12:33:53 -07:00
Seth Hoenig	47f5452825	cleanup: helper funcs for comparing slices of references	2022-08-16 13:47:47 -05:00
Michael Schurter	0c3cfb073a	nomad: fix race in Bootstrapped access (#14120 ) * nomad: fix race in Bootstrapped access	2022-08-16 11:22:33 -07:00
Michael Schurter	285979e96c	deployments: fix data races (#14121 ) * deployments: fix data races Both priority and state related fields may be mutated concurrently and need to be accessed with the lock acquired.	2022-08-16 10:50:40 -07:00
Seth Hoenig	7728cf5a9a	Merge pull request #14132 from hashicorp/build-update-go1.19 build: update to go1.19	2022-08-16 11:20:27 -05:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forecefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Seth Hoenig	56b0b456dc	Merge pull request #14102 from hashicorp/cleanup-mesh-gateway-value cleanup: consul mesh gateway type need not be pointer	2022-08-16 10:07:16 -05:00
Tim Gross	a4e89d72a8	secure vars: filter by path in List RPCs (#14036 ) The List RPCs only checked the ACL for the Prefix argument of the request. Add an ACL filter to the paginator for the List RPC. Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a "deny" capability so that operators can deny specific paths or prefixes below an allowed path.	2022-08-15 11:38:20 -04:00
Tim Gross	4005759d28	move secure variable conflict resolution to state store (#13922 ) Move conflict resolution implementation into the state store with a new Apply RPC. This also makes the RPC for secure variables much more similar to Consul's KV, which will help us support soft deletes in a post-1.4.0 version of Nomad. Reimplement quotas in the state store functions. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-08-15 11:19:53 -04:00
Seth Hoenig	f9355c29fb	cleanup: consul mesh gateway type need not be pointer This PR changes the use of structs.ConsulMeshGateway to value types instead of via pointers. This will help in a follow up PR where we cleanup a lot of custom comparison code with helper functions instead.	2022-08-13 11:26:58 -05:00
James Rasell	581a5bb6ad	rpc: add ACL Role RPC endpoint for CRUD actions. New ACL Role RPC endpoints have been created to allow the creation, update, read, and deletion of ACL roles. All endpoints require a management token; in the future readers will also be allowed to view roles associated to their ACL token. The create endpoint in particular is responsible for deduplicating ACL policy links and ensuring named policies are found within state. This is done within the RPC handler so we perform a single loop through the links for slight efficiency.	2022-08-11 08:43:50 +01:00
Seth Hoenig	3aaaedf52e	cli: forward request for job validation to nomad leader This PR changes the behavior of 'nomad job validate' to forward the request to the nomad leader, rather than responding from any server. This is because we need the leader when validating Vault tokens, since the leader is the only server with an active vault client.	2022-08-10 14:34:04 -05:00
James Rasell	e660c9a908	core: add ACL role state schema and functionality. (#13955 ) This commit includes the new state schema for ACL roles along with state interaction functions for CRUD actions. The change also includes snapshot persist and restore functionality and the addition of FSM messages for Raft updates which will come via RPC endpoints.	2022-08-09 09:33:41 +02:00
Seth Hoenig	0b52c27a15	Merge pull request #14045 from Abirdcfly/main fix minor unreachable code caused by t.Fatal	2022-08-08 11:47:02 -05:00
Abirdcfly	d66943d4f7	fix minor unreachable code caused by t.Fatal Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2022-08-08 23:50:11 +08:00
Seth Hoenig	2b6bda49b9	core: automatically plumb task name into task-level services and checks	2022-08-05 12:42:41 -05:00
Seth Hoenig	f6f26fb72c	nsd: add support for setting request body in http checks This PR adds support for settings check.body in checks of services making use of Nomad's service provider.	2022-08-04 14:40:23 -05:00
Charles Z	7a8ec90fbe	allow unhealthy canaries without blocking autopromote (#14001 )	2022-08-04 11:53:50 -04:00

1 2 3 4 5 ...

4108 commits