open-nomad

Author	SHA1	Message	Date
Charlie Voiselle	ad737d008b	SV API: return upserted variable to caller (#14325 ) * Return created variable to caller in HTTP and Go APIs * Update tests for returned values	2022-08-25 17:38:15 -04:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own.
* taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle`; otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Seth Hoenig	60e22d2b13	Merge pull request #14221 from hashicorp/build-require-go1.19 build: go.mod should require go1.19	2022-08-23 07:53:13 -05:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has been shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
Seth Hoenig	9bce3a2e36	build: go.mod should require go1.19 Since we started using atomic.Pointer, we should specify the go1.19 requirement in our go.mod files.	2022-08-21 20:41:49 -05:00
Seth Hoenig	88a1353149	cli: display nomad service check status output in CLI commands This PR adds some NSD check status output to the CLI. 1. The 'nomad alloc status' command produces nsd check summary output (if present) 2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)	2022-08-19 09:18:29 -05:00
dependabot[bot]	05d943ed51	build(deps): bump github.com/shoenig/test from 0.3.0 to 0.3.1 in /api (#14194 ) Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.3.0 to 0.3.1. - [Release notes](https://github.com/shoenig/test/releases) - [Commits](https://github.com/shoenig/test/compare/v0.3.0...v0.3.1) --- updated-dependencies: - dependency-name: github.com/shoenig/test dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-19 11:32:59 +02:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Seth Hoenig	0a6497ee1f	api: trim space of error response output	2022-08-16 15:00:38 -05:00
Tim Gross	c38c052ef3	api: document warnings for setting `api.ClientConnTimeout` (#14122 ) HTTP API consumers that have network line-of-sight to client nodes can connect directly for a small number of APIs. But in environments where the consumer doesn't have line-of-sight, there's a long pause waiting for the `api.ClientConnTimeout` to expire. Warn about this in the API docs so that authors can avoid the extra timeout.	2022-08-15 16:06:02 -04:00
James Rasell	2c540b03c5	api: use errors.New not fmt.Errorf when error doesn't have format. (#14027 ) * api: use errors.New not fmt.Errorf when error doesn't have format. * semgrep: add rule to catch fmt.Errorf use without formatting.	2022-08-05 17:05:47 +02:00
Luiz Aoqui	e3d78c343c	template: set default UID/GID to -1 (#13998 ) UID/GID 0 is usually reserved for the root user/group. While Nomad clients are expected to run as root it may not always be the case. Setting these values to -1 if not defined will fall back to the previous behaviour of not attempting to set file ownership and use whatever UID/GID the Nomad agent is running as. It will also keep backwards compatibility, which is especially important for platforms where this feature is not supported, like Windows.	2022-08-04 11:26:08 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
James Rasell	bb5b510c9d	cli: do not import structs, use API package only. (#13938 )	2022-08-02 16:33:08 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to do the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849 ) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
dependabot[bot]	d3d1199b81	chore(deps): bump github.com/mitchellh/mapstructure from 1.4.3 to 1.5.0 in /api (#12725 ) * chore(deps): bump github.com/mitchellh/mapstructure in /api Bumps [github.com/mitchellh/mapstructure](https://github.com/mitchellh/mapstructure) from 1.4.3 to 1.5.0. - [Release notes](https://github.com/mitchellh/mapstructure/releases) - [Changelog](https://github.com/mitchellh/mapstructure/blob/master/CHANGELOG.md) - [Commits](https://github.com/mitchellh/mapstructure/compare/v1.4.3...v1.5.0) --- updated-dependencies: - dependency-name: github.com/mitchellh/mapstructure dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Also bump mapstructure in main go.mod Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-13 11:57:16 -07:00
dependabot[bot]	4b7253b33f	build(deps): bump github.com/gorilla/websocket from 1.4.2 to 1.5.0 in /api (#12075 ) * build(deps): bump github.com/gorilla/websocket in /api Bumps [github.com/gorilla/websocket](https://github.com/gorilla/websocket) from 1.4.2 to 1.5.0. - [Release notes](https://github.com/gorilla/websocket/releases) - [Commits](https://github.com/gorilla/websocket/compare/v1.4.2...v1.5.0) --- updated-dependencies: - dependency-name: github.com/gorilla/websocket dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * deps: also bump websocket dep in main binary Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-12 16:49:31 -07:00
dependabot[bot]	20d67890bf	build(deps): bump github.com/docker/go-units from 0.3.3 to 0.4.0 in /api (#11519 ) * build(deps): bump github.com/docker/go-units from 0.3.3 to 0.4.0 in /api Bumps [github.com/docker/go-units](https://github.com/docker/go-units) from 0.3.3 to 0.4.0. - [Release notes](https://github.com/docker/go-units/releases) - [Commits](https://github.com/docker/go-units/compare/v0.3.3...v0.4.0) --- updated-dependencies: - dependency-name: github.com/docker/go-units dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Tidy go.sum Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-12 12:54:56 -07:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	a8145ba7b8	SV: GO API: Secure Variables Go API (#13502 )	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Tim Gross	81b0c4fd36	keyring command line (#13169 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-07-11 13:34:04 -04:00
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	f2ee585830	bootstrap keyring (#13124 ) When a server becomes leader, it will check if there are any keys in the state store, and create one if there is not. The key metadata will be replicated via raft to all followers, who will then get the key material via key replication (not implemented in this changeset).	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	c6929a6c1e	keyring HTTP API (#13077 )	2022-07-11 13:34:04 -04:00
Charlie Voiselle	07418877c6	Fix Testutil for delve debugging API tests (#13589 )	2022-07-06 10:47:48 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
dependabot[bot]	4278846eeb	build(deps): bump github.com/stretchr/testify in /api Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.7.5 to 1.8.0. - [Release notes](https://github.com/stretchr/testify/releases) - [Commits](https://github.com/stretchr/testify/compare/v1.7.5...v1.8.0) --- updated-dependencies: - dependency-name: github.com/stretchr/testify dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-01 09:50:54 -05:00
Seth Hoenig	adf41cc7ef	Merge pull request #13485 from hashicorp/f-api-env-cleanup api: use testing.T.Setenv to set env vars in tests	2022-06-27 12:16:09 -05:00
Seth Hoenig	ead76333fd	api: use testing.T.Setenv to set env vars in tests	2022-06-25 09:20:52 -05:00
Seth Hoenig	a5eab7c0f2	tests: update testify dependency	2022-06-24 15:39:21 +00:00
James Rasell	f1f7c5040b	api: added sysbatch job type constant to match other schedulers. (#13359 )	2022-06-16 11:53:04 +02:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Lance Haig	4bf27d743d	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	7d15157635	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Seth Hoenig	54efec5dfe	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Jorge Marey	f966614602	Allow setting tagged addresses on services	2022-05-31 10:06:55 -05:00
Seth Hoenig	4631045d83	connect: enable setting connect upstream destination namespace	2022-05-26 09:39:36 -05:00
Tim Gross	b72ff42ada	api: include Consul token in job revert API (#13065 )	2022-05-19 11:30:29 -04:00
Seth Hoenig	65f7abf2f4	cli: update default redis and use nomad service discovery Closes #12927 Closes #12958 This PR updates the version of redis used in our examples from 3.2 to 7. The old version is no longer supported, and we should be setting a good example by using a supported version. The long-form example job is now fixed so that the service stanza uses nomad as the service discovery provider, and so now the job runs without a requirement of having Consul running and configured.	2022-05-17 10:24:19 -05:00
Michael Schurter	6449ba8d41	api: add ParseHCLOpts helper method (#12777 ) The existing ParseHCL func didn't allow setting HCLv1=true.	2022-04-25 11:51:52 -07:00
Luiz Aoqui	a8cc633156	vault: revert support for entity aliases (#12723 ) After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading as another Vault entity.	2022-04-22 10:46:34 -04:00
Seth Hoenig	3fcac242c6	services: enable setting arbitrary address value in service registrations This PR introduces the `address` field in the `service` block so that Nomad or Consul services can be registered with a custom `.Address.` to advertise. The address can be an IP address or domain name. If the `address` field is set, the `service.address_mode` must be set in `auto` mode.	2022-04-22 09:14:29 -05:00
Shishir	f5121d261e	Add os to NodeListStub struct. (#12497 ) * Add os to NodeListStub struct. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add test: os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-04-15 17:22:45 -07:00
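
Note on dbffdca92e and e3d78c343c above: both commits deal with telling "uid/gid not set" apart from an explicit 0, since 0 is the zero value for int. The following is a minimal sketch of the pointer approach, not Nomad's actual code; the `Template` struct, `ptrOf`, and `describe` are hypothetical names made up for illustration.

```go
package main

import "fmt"

// Template is a hypothetical, simplified stand-in for a template block.
// It is NOT Nomad's real struct; it only illustrates the idea.
type Template struct {
	UID *int // nil means "not set"; a pointer to 0 means "explicitly 0"
	GID *int
}

// ptrOf returns a pointer to v. Hypothetical generic helper (needs Go 1.18+).
func ptrOf[T any](v T) *T { return &v }

// describe reports how an ownership field would be interpreted.
func describe(id *int) string {
	if id == nil {
		return "unset: leave file ownership alone"
	}
	return fmt.Sprintf("explicit: chown to %d", *id)
}

func main() {
	unset := Template{}                            // fields default to nil
	root := Template{UID: ptrOf(0), GID: ptrOf(0)} // explicit 0

	fmt.Println(describe(unset.UID))
	fmt.Println(describe(root.UID))
}
```

With a plain int field both templates would hold 0 and be indistinguishable; the nil pointer carries the extra "not set" state.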


1079 commits