open-nomad

Commit Graph

Author	SHA1	Message	Date
James Rasell	41284382dd	api: update keyring comment to reflect correct feature name. (#14558 )	2022-09-13 10:05:03 -04:00
James Rasell	1f877bac1c	acl: fix encoding expiration time in ACL token list API. (#14542 )	2022-09-12 15:50:35 +02:00
Charlie Voiselle	b55112714f	Vars: CLI commands for `var get`, `var put`, `var purge` (#14400 ) * Includes updates to `var init`	2022-09-09 17:55:20 -04:00
Tim Gross	9259a373cd	remove root keyring install API (#14514 ) * keyring rotate API should require put/post method * remove keyring install API	2022-09-09 08:50:35 -04:00
Tim Gross	3fc7482ecd	CSI: failed allocation should not block its own controller unpublish (#14484 ) A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC. The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. The check has a bug where the allocation belonging to the claim being freed was included in the check incorrectly. During a normal allocation stop for job stop or a new version of the job, the allocation is terminal. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server. For CSI plugins that support controller attach/detach, this means that the controller will not be able to detach the volume from the allocation's host and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, and makes the logic easier to read. Include a test case covering this path. Also includes two minor extra bugfixes: * Entities we get from the state store should always be copied before altering. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps. * The list stub object for volumes in `nomad/structs` did not match the stub object in `api`. The `api` package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.	2022-09-08 13:30:05 -04:00
Tim Gross	cc9b480996	testing: setting env var incompatible with parallel tests (#14405 ) Neither the `os.Setenv` nor the `t.Setenv` helper is safe to use in parallel tests because environment variables are process-global. The stdlib panics if you try to do this. Remove the `ci.Parallel()` call from all tests where we're setting environment variables.	2022-08-30 14:49:03 -04:00
James Rasell	755b4745ed	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-30 08:59:13 +01:00
Tim Gross	1dc053b917	rename SecureVariables to Variables throughout	2022-08-26 16:06:24 -04:00
Tim Gross	dcfd31296b	file rename	2022-08-26 16:06:24 -04:00
Charlie Voiselle	ad737d008b	SV API: return upserted variable to caller (#14325 ) * Return created variable to caller in HTTP and Go APIs * Update tests for returned values	2022-08-25 17:38:15 -04:00
James Rasell	601588df6b	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-25 12:14:29 +01:00
James Rasell	7a0798663d	acl: fix a bug where roles could be duplicated by name. An ACL role's name must be unique; however, a bug meant multiple roles with the same name could be created. This fixes that problem with checks in the RPC handler and state store.	2022-08-25 09:20:43 +01:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. 
* taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle`; otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Seth Hoenig	60e22d2b13	Merge pull request #14221 from hashicorp/build-require-go1.19 build: go.mod should require go1.19	2022-08-23 07:53:13 -05:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has been shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
James Rasell	2736cf0dfa	acl: make listing RPC and HTTP API a stub return object. (#14211 ) Making the ACL Role listing return object a stub future-proofs the endpoint. In the event the role object grows, we are not bound by having to return all fields within the list endpoint or change the signature of the endpoint to reduce the list return size.	2022-08-22 17:20:23 +02:00
Seth Hoenig	9bce3a2e36	build: go.mod should require go1.19 Since we started using atomic.Pointer, we should specify the go1.19 requirement in our go.mod files.	2022-08-21 20:41:49 -05:00
Seth Hoenig	88a1353149	cli: display nomad service check status output in CLI commands This PR adds some NSD check status output to the CLI. 1. The 'nomad alloc status' command produces nsd check summary output (if present) 2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)	2022-08-19 09:18:29 -05:00
dependabot[bot]	05d943ed51	build(deps): bump github.com/shoenig/test from 0.3.0 to 0.3.1 in /api (#14194 ) Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.3.0 to 0.3.1. - [Release notes](https://github.com/shoenig/test/releases) - [Commits](https://github.com/shoenig/test/compare/v0.3.0...v0.3.1) --- updated-dependencies: - dependency-name: github.com/shoenig/test dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-19 11:32:59 +02:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
James Rasell	f5d8cb2d90	api: add ACL role linking to ACL tokens.	2022-08-17 14:45:32 +01:00
Seth Hoenig	0a6497ee1f	api: trim space of error response output	2022-08-16 15:00:38 -05:00
Tim Gross	c38c052ef3	api: document warnings for setting `api.ClientConnTimeout` (#14122 ) HTTP API consumers that have network line-of-sight to client nodes can connect directly for a small number of APIs. But in environments where the consumer doesn't have line-of-sight, there's a long pause waiting for the `api.ClientConnTimeout` to expire. Warn about this in the API docs so that authors can avoid the extra timeout.	2022-08-15 16:06:02 -04:00
James Rasell	8724974683	api: add ACL Role API implementation for CRUD actions.	2022-08-11 08:44:37 +01:00
James Rasell	2c540b03c5	api: use errors.New not fmt.Errorf when error doesn't have format. (#14027 ) * api: use errors.New not fmt.Errorf when error doesn't have format. * semgrep: add rule to catch fmt.Errorf use without formatting.	2022-08-05 17:05:47 +02:00
Luiz Aoqui	e3d78c343c	template: set default UID/GID to -1 (#13998 ) UID/GID 0 is usually reserved for the root user/group. While Nomad clients are expected to run as root it may not always be the case. Setting these values as -1 if not defined will fall back to the previous behaviour of not attempting to set file ownership and use whatever UID/GID the Nomad agent is running as. It will also keep backwards compatibility, which is especially important for platforms where this feature is not supported, like Windows.	2022-08-04 11:26:08 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
James Rasell	bb5b510c9d	cli: do not import structs, use API package only. (#13938 )	2022-08-02 16:33:08 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to do the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
James Rasell	663aa92b7a	Merge branch 'main' into f-gh-13120-sso-umbrella	2022-08-02 08:30:03 +01:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849 ) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
James Rasell	f6d12a3c00	acl: enable configuration and visualisation of token expiration for users (#13846 ) * api: add ACL token expiry params to HTTP API * cli: allow setting and displaying ACL token expiry	2022-07-20 10:06:23 +02:00
dependabot[bot]	d3d1199b81	chore(deps): bump github.com/mitchellh/mapstructure from 1.4.3 to 1.5.0 in /api (#12725 ) * chore(deps): bump github.com/mitchellh/mapstructure in /api Bumps [github.com/mitchellh/mapstructure](https://github.com/mitchellh/mapstructure) from 1.4.3 to 1.5.0. - [Release notes](https://github.com/mitchellh/mapstructure/releases) - [Changelog](https://github.com/mitchellh/mapstructure/blob/master/CHANGELOG.md) - [Commits](https://github.com/mitchellh/mapstructure/compare/v1.4.3...v1.5.0) --- updated-dependencies: - dependency-name: github.com/mitchellh/mapstructure dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Also bump mapstructure in main go.mod Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-13 11:57:16 -07:00
dependabot[bot]	4b7253b33f	build(deps): bump github.com/gorilla/websocket from 1.4.2 to 1.5.0 in /api (#12075 ) * build(deps): bump github.com/gorilla/websocket in /api Bumps [github.com/gorilla/websocket](https://github.com/gorilla/websocket) from 1.4.2 to 1.5.0. - [Release notes](https://github.com/gorilla/websocket/releases) - [Commits](https://github.com/gorilla/websocket/compare/v1.4.2...v1.5.0) --- updated-dependencies: - dependency-name: github.com/gorilla/websocket dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * deps: also bump websocket dep in main binary Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-12 16:49:31 -07:00
dependabot[bot]	20d67890bf	build(deps): bump github.com/docker/go-units from 0.3.3 to 0.4.0 in /api (#11519 ) * build(deps): bump github.com/docker/go-units from 0.3.3 to 0.4.0 in /api Bumps [github.com/docker/go-units](https://github.com/docker/go-units) from 0.3.3 to 0.4.0. - [Release notes](https://github.com/docker/go-units/releases) - [Commits](https://github.com/docker/go-units/compare/v0.3.3...v0.4.0) --- updated-dependencies: - dependency-name: github.com/docker/go-units dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Tidy go.sum Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-12 12:54:56 -07:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	a8145ba7b8	SV: GO API: Secure Variables Go API (#13502 )	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Tim Gross	81b0c4fd36	keyring command line (#13169 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-07-11 13:34:04 -04:00
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	f2ee585830	bootstrap keyring (#13124 ) When a server becomes leader, it will check if there are any keys in the state store, and create one if there is not. The key metadata will be replicated via raft to all followers, who will then get the key material via key replication (not implemented in this changeset).	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	c6929a6c1e	keyring HTTP API (#13077 )	2022-07-11 13:34:04 -04:00
Charlie Voiselle	07418877c6	Fix Testutil for delve debugging API tests (#13589 )	2022-07-06 10:47:48 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
dependabot[bot]	4278846eeb	build(deps): bump github.com/stretchr/testify in /api Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.7.5 to 1.8.0. - [Release notes](https://github.com/stretchr/testify/releases) - [Commits](https://github.com/stretchr/testify/compare/v1.7.5...v1.8.0) --- updated-dependencies: - dependency-name: github.com/stretchr/testify dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-01 09:50:54 -05:00


1095 Commits