open-nomad

Commit Graph

Author	SHA1	Message	Date
Charlie Voiselle	0b9ba1ae92	chore: Convert assets from bindatafs to go embeds (#16066 ) * Convert assets from bindatafs to go embeds * Add command/asset to "uninteresting" list for missing test check * Remove generate-examples target * Update paths in tests	2023-02-10 12:02:29 -05:00
James Rasell	0d37892024	cli: fix use of the sanitized method type for the login command. (#16105 ) When an auth method was not supplied and the OIDC type was given in lower case, the CLI was not matching the default method due to casing and responded with a confusing user message. This change fixes the above problem, along with making use of the sanitized type easier.	2023-02-09 15:23:54 +01:00
hc-github-team-nomad-core	6dedb795cf	Generate files for 1.5.0-beta.1 release	2023-02-08 08:54:36 +00:00
Michael Schurter	35d65c7c7e	Dynamic Node Metadata (#15844 ) Fixes #14617 Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted. Includes new UI for editing metadata as well. --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>	2023-02-07 14:42:25 -08:00
Charlie Voiselle	31a289891d	Add sprig for command templates (#9053 ) Adds the sprig functions to the template funcmap prepended with `sprig_` to match the behavior in consul-template	2023-02-07 14:07:20 -05:00
James Rasell	8cc212167b	agent: fix agent HTTP server audit event implementation access. (#16076 )	2023-02-07 17:20:11 +01:00
Dao Thanh Tung	54dc2f629a	doc: specify that the default output is in JSON format for `nomad quota inspect` command (#15984 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-02-07 16:34:05 +01:00
Seth Hoenig	590ae08752	main: remove deprecated uses of rand.Seed (#16074 ) * main: remove deprecated uses of rand.Seed go1.20 deprecates rand.Seed, and seeds the rand package automatically. Remove cases where we seed the random package, and cleanup the one case where we intentionally create a known random source. * cl: update cl * mod: update go mod	2023-02-07 09:19:38 -06:00
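Note on 590ae08752: a minimal sketch of the two cases the message describes, relying on automatic seeding of the global source versus intentionally creating a known random source. The helper name `newKnownSource` and the seed value are illustrative assumptions, not code from the commit.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newKnownSource returns a generator with a fixed seed, for the cases that
// intentionally need a reproducible sequence (hypothetical helper name).
func newKnownSource(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	// No rand.Seed call here: as of go1.20 the global source is seeded
	// automatically, so this value differs from run to run.
	fmt.Println("auto-seeded value:", rand.Intn(100))

	// Two generators built from the same seed yield the same sequence.
	a, b := newKnownSource(42), newKnownSource(42)
	fmt.Println("reproducible:", a.Intn(100) == b.Intn(100))
}
```

Keeping the known source local (a `*rand.Rand` value) rather than reseeding the global generator means the rest of the program is unaffected by the fixed seed.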
Tim Gross	8a7d6b0cde	cli: remove deprecated `keyring` and `keygen` commands (#16068 ) These commands were marked as deprecated in 1.4.0 with intent to remove in 1.5.0. Remove them and clean up the docs.	2023-02-07 09:49:52 -05:00
Dao Thanh Tung	ae720fe28d	Add `-json` and `-t` flag for `nomad acl token create` command (#16055 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-02-07 12:05:41 +01:00
Michael Schurter	0a496c845e	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Seth Hoenig	911700ffea	build: update to go1.20 (#16029 ) * build: update to go1.20 * build: use stringy go1.20 in circle yaml * tests: handle new x509 certificate error structure in go1.20 * cl: add cl entry	2023-02-03 08:14:53 -06:00
Charlie Voiselle	cc6f4719f1	Add option to expose workload token to task (#15755 ) Add `identity` jobspec block to expose workload identity tokens to tasks. --------- Co-authored-by: Anders <mail@anars.dk> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-02-02 10:59:14 -08:00
Daniel Bennett	dc9c8d4e47	Change `job init` default to example`.nomad.hcl` and recommend in docs (#15997 ) recommend .nomad.hcl for job files instead of .nomad (without .hcl) * nomad job init -> example.nomad.hcl * update docs	2023-02-02 11:47:47 -06:00
Tim Gross	971a286ea3	cli: Fix a panic in `deployment status` when scheduling is slow (#16011 ) If a deployment fails, the `deployment status` command can get a nil deployment when it checks for a rollback deployment if there isn't one (or at least not one at the time of the query). Fix the panic.	2023-02-02 12:34:44 -05:00
Phil Renaud	3db9f11c37	[feat] Nomad Job Templates (#15746 ) * Extend variables under the nomad path prefix to allow for job-templates (#15570) * Extend variables under the nomad path prefix to allow for job-templates * Add job-templates to error message hinting * RadioCard component for Job Templates (#15582) * chore: add * test: component API * ui: component template * refact: remove bc naming collission * styles: remove SASS var causing conflicts * Disallow specific variable at nomad/job-templates (#15681) * Disallows variables at exactly nomad/job-templates * idiomatic refactor * Expanding nomad job init to accept a template flag (#15571) * Adding a string flag for templates on job init * data-down actions-up version of a custom template editor within variable * Dont force grid on job template editor * list-templates flag started * Correctly slice from end of path name * Pre-review cleanup * Variable form acceptance test for job template editing * Some review cleanup * List Job templates test * Example from template test * Using must.assertions instead of require etc * ui: add choose template button (#15596) * ui: add new routes * chore: update file directory * ui: add choose template button * test: button and page navigation * refact: update var name * ui: use `Button` component from `HDS` (#15607) * ui: integrate buttons * refact: remove helper * ui: remove icons on non-tertiary buttons * refact: update normalize method for key/value pairs (#15612) * `revert`: `onCancel` for `JobDefinition` The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button. 
* style: increase button size * style: keep lime green * ui: select template (#15613) * ui: deprecate unused component * ui: deprecate tests * ui: jobs.run.templates.index * ui: update logic to handle templates * refact: revert key/value changes * style: padding for cards + buttons * temp: fixtures for mirage testing * Revert "refact: revert key/value changes" This reverts commit 124e95d12140be38fc921f7e15243034092c4063. * ui: guard template for unsaved job * ui: handle reading template variable * Revert "refact: update normalize method for key/value pairs (#15612)" This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74. * revert: remove test fixtures * revert: prettier problems * refact: test doesnt need filter expression * styling: button sizes and responsive cards * refact: remove route guarding * ui: update variable adapter * refact: remove model editing behavior * refact: model should query variables to populate editor * ui: clear qp on exit * refact: cleanup deprecated API * refact: query all namespaces * refact: deprecate action * ui: rely on collection * refact: patch deprecate transition API * refact: patch test to expect namespace qp * styling: padding, conditionals * ui: flashMessage on 404 * test: update for o(n+1) query * ui: create new job template (#15744) * refact: remove unused code * refact: add type safety * test: select template flow * test: add data-test attrs * chore: remove dead code * test: create new job flow * ui: add create button * ui: create job template * refact: no need for wildcard * refact: record instead of delete * styling: spacing * ui: add error handling and form validation to job create template (#15767) * ui: handle server side errors * ui: show error to prevent duplicate * refact: conditional namespace * ui: save as template flow (#15787) * bug: patches failing tests associated with `pretender` (#15812) * refact: update assertion * refact: test set-up * ui: job templates manager view (#15815) * ui: manager list view * 
test: edit flow * refact: deprecate column-helper * ui: template edit and delete flow (#15823) * ui: manager list view * refact: update title * refact: update permissions * ui: template edit page * bug: typo * refact: update toast messages * bug: clear selections on exit (#15827) * bug: clear controllers on exit * test: mirage config changes (#15828) * refact: deprecate column-helper * style: update z-index for HDS * Revert "style: update z-index for HDS" This reverts commit d3d87ceab6d083f7164941587448607838944fc1. * refact: update delete button * refact: edit redirect * refact: patch reactivity issues * styling: fixed width * refact: override defaults * styling: edit text causing overflow * styling: add inline text Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * bug: edit `text` to `template` Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * test: delete flow job templates (#15896) * refact: edit names * bug: set correct ref to store * chore: trim whitespace: * test: delete flow * bug: reactively update view (#15904) * Initialized default jobs (#15856) * Initialized default jobs * More jobs scaffolded * Better commenting on a couple example job specs * Adapter doing the work * fall back to epic config * Label format helper and custom serialization logic * Test updates to account for a never-empty state * Test suite uses settled and maintain RecordArray in adapter return * Updates to hello-world and variables example jobspecs * Parameterized job gets optional payload output * Formatting changes for param and service discovery job templates * Multi-group service discovery job * Basic test for default templates (#15965) * Basic test for default templates * Percy snapshot for manage page * Some late-breaking design changes * Some copy edits to the header paragraphs for job templates (#15967) * Added some init options for job templates (#15994) * Async method for populating default job templates 
from the variable adapter --------- Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>	2023-02-02 10:37:40 -05:00
Charlie Voiselle	4caac1a92f	client: Add option to enable hairpinMode on Nomad bridge (#15961 ) * Add `bridge_network_hairpin_mode` client config setting * Add node attribute: `nomad.bridge.hairpin_mode` * Changed format string to use `%q` to escape user provided data * Add test to validate template JSON for developer safety Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-02-02 10:12:15 -05:00
jmwilkinson	37834dffda	Allow wildcard datacenters to be specified in job file (#11170 ) Also allows for default value of `datacenters = ["*"]`	2023-02-02 09:57:45 -05:00
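Note on 37834dffda: a rough sketch of how a wildcard datacenter list such as `datacenters = ["*"]` could be matched against a node's datacenter. It uses the standard library's `path.Match` as a stand-in glob matcher; the matcher Nomad actually uses is not stated in the message, and `datacenterMatches` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// datacenterMatches reports whether dc matches any of the job's datacenter
// patterns. path.Match is an assumption made for this sketch only.
func datacenterMatches(patterns []string, dc string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, dc); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(datacenterMatches([]string{"*"}, "dc1"))        // wildcard default
	fmt.Println(datacenterMatches([]string{"us-*"}, "us-east")) // prefix pattern
	fmt.Println(datacenterMatches([]string{"us-*"}, "eu-west")) // no match
}
```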
James Rasell	9e8325d63c	acl: fix a bug in token creation when parsing expiration TTLs. (#15999 ) The ACL token decoding was not correctly handling time duration syntax such as "1h" which forced people to use the nanosecond representation via the HTTP API. The change adds an unmarshal function which allows this syntax to be used, along with other styles correctly.	2023-02-01 17:43:41 +01:00
James Rasell	67acfd9f6b	acl: return 400 not 404 code when creating an invalid policy. (#16000 )	2023-02-01 17:40:15 +01:00
stswidwinski	16eefbbf4d	GC: ensure no leakage of evaluations for batch jobs. (#15097 ) Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory. Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.	2023-01-31 13:32:14 -05:00
Jorge Marey	d1c9aad762	Rename fields on proxyConfig (#15541 ) * Change api Fields for expose and paths * Add changelog entry * changelog: add deprecation notes about connect fields * api: minor style tweaks --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-01-30 09:31:16 -06:00
Piotr Kazmierczak	14b53df3b6	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
James Rasell	6accfb1f43	cli: separate auth method config output for easier reading. (#15892 )	2023-01-30 11:44:26 +01:00
Seth Hoenig	074b76e3bf	consul: check for acceptable service identity on consul tokens (#15928 ) When registering a job with a service and 'consul.allow_unauthenticated=false', we scan the given Consul token for an acceptable policy or role with an acceptable policy, but did not scan for an acceptable service identity (which is backed by an acceptable virtual policy). This PR updates our consul token validation to also accept a matching service identity when registering a service into Consul. Fixes #15902	2023-01-27 18:15:51 -06:00
Tim Gross	881a4cfaff	metrics: Add remaining server RPC rate metrics (#15901 )	2023-01-27 08:29:53 -05:00
Piotr Kazmierczak	f4d6efe69f	acl: make auth method default across all types (#15869 )	2023-01-26 14:17:11 +01:00
James Rasell	5d33891910	sso: allow binding rules to create management ACL tokens. (#15860 ) * sso: allow binding rules to create management ACL tokens. * docs: update binding rule docs to detail management type addition.	2023-01-26 09:57:44 +01:00
Tim Gross	055434cca9	add metric for count of RPC requests (#15515 ) Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through them all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place. Includes telemetry configuration to omit identity labels.	2023-01-24 11:54:20 -05:00
Tim Gross	2030d62920	implement pre-forwarding auth on select RPCs (#15513 ) In #15417 we added a new `Authenticate` method to the server that returns an `AuthenticatedIdentity` struct. This changeset implements this method for a small number of RPC endpoints that together represent all the various ways in which RPCs are sent, so that we can validate that we're happy with this approach.	2023-01-24 10:52:07 -05:00
Karl Johann Schubert	b773a1b77f	client: add disk_total_mb and disk_free_mb config options (#15852 )	2023-01-24 09:14:22 -05:00
Charlie Voiselle	5ea1d8a970	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00
James Rasell	f8f1d45e8a	cli: use localhost for default login callback address. (#15820 )	2023-01-19 16:46:17 +01:00
James Rasell	fad9b40e53	Merge branch 'main' into sso/gh-13120-oidc-login	2023-01-18 10:05:31 +00:00
Phil Renaud	98c5259f3e	[sso] OIDC Updates for the UI (#15804 ) * Updated UI to handle OIDC method changes * Remove redundant store unload call	2023-01-17 17:01:47 -05:00
Dao Thanh Tung	e2ae6d62e1	fix bug in nomad fmt -check does not return error code (#15797 )	2023-01-17 09:15:34 -05:00
James Rasell	d09138a7c5	cli: add login command to allow OIDC provider SSO login.	2023-01-13 13:16:09 +00:00
James Rasell	b3a6cfecc4	api: add OIDC HTTP API endpoints and SDK.	2023-01-13 13:15:58 +00:00
Seth Hoenig	fe7795ce16	consul/connect: support for proxy upstreams opaque config (#15761 ) This PR adds support for configuring `proxy.upstreams[].config` for Consul Connect upstreams. This is an opaque config value to Nomad - the data is passed directly to Consul and is unknown to Nomad.	2023-01-12 08:20:54 -06:00
Anthony Davis	1c32471805	Fix rejoin_after_leave behavior (#15552 )	2023-01-11 16:39:24 -05:00
Dao Thanh Tung	09b25d71b8	cli: Add a nomad operator client state command (#15469 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-11 10:03:31 -05:00
Dao Thanh Tung	ca2f509e82	agent: Make agent syslog log level inherit from Nomad agent log (#15625 )	2023-01-04 09:38:06 -05:00
Tim Gross	8859e1bff1	csi: Fix parsing of '=' in secrets at command line and HTTP (#15670 ) The command line flag parsing and the HTTP header parsing for CSI secrets incorrectly split at more than one '=' rune, making it impossible to use secrets that included that rune.	2023-01-03 16:28:38 -05:00
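Note on 8859e1bff1: a minimal sketch of splitting a `key=value` secret at the first '=' only, so that values containing '=' survive intact. `parseSecret` is a hypothetical helper name, not the function changed in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSecret splits "key=value" at the first '=' only; everything after
// it, including further '=' runes, belongs to the value.
func parseSecret(s string) (string, string, error) {
	parts := strings.SplitN(s, "=", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("secret %q must be in the form key=value", s)
	}
	return parts[0], parts[1], nil
}

func main() {
	key, value, err := parseSecret("password=abc==")
	fmt.Println(key, value, err)
}
```

Limiting the split to two parts is the important detail: an unbounded split would break a base64-padded value into several pieces.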
Seth Hoenig	7214e21402	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Seth Hoenig	9eb2433871	command: fixup parsing of stale query parameter (#15631 ) In #15605 we fixed the bug where the presence of the "stale" query parameter was meant to imply stale, even if the value of the parameter was "false" or malformed. In parsing, we missed the case where the slice of values would be nil, which led to a failing test case that was missed because CI didn't run against the original PR.	2023-01-03 08:21:20 -06:00
Seth Hoenig	266ca25a81	cleanup: remove usage of consul/sdk/testutil/retry (#15609 ) This PR removes usages of `consul/sdk/testutil/retry`, as part of the ongoing effort to remove use of any non-API module from Consul. There is one remaining usage in the helper/freeport package, but that will get removed as part of #15589	2023-01-02 08:06:20 -06:00
Dao Thanh Tung	53cd1b4871	fix: `stale` querystring parameter value as boolean (#15605 ) * Add changes to make stale querystring param boolean Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Make error message more consistent Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review + Adding CHANGELOG file Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review to use github.com/shoenig/test package Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Change must.Nil() to must.NoError() Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor fix on the import order Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Fix existing code format too Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor changes addressing code review feedbacks Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * swap must.EqOp() order of param provided Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-01 13:04:14 -06:00
Seth Hoenig	92dfa41286	command: fixup tests concerning multi job stop (#15606 ) * command: fixup job multi-stop test This PR refactors the StopCommand test that runs 10 jobs and then passes them all to one invocation of 'job stop'. * test: swap use of assert for must * test: cleanup job files we create * command: fixup job stop failure tests Now that JobStop works on concurrent jobs, the error messages are different. * cleanup: use multiple post scripts	2022-12-21 16:21:48 -06:00
Seth Hoenig	83f9fc9db4	tests: do not return error from testagent shutdown (#15595 )	2022-12-21 08:23:58 -06:00
Danish Prakash	dc81568f93	command/job_stop: accept multiple jobs, stop concurrently (#12582 ) * command/job_stop: accept multiple jobs, stop concurrently Signed-off-by: danishprakash <grafitykoncept@gmail.com> * command/job_stop_test: add test for multiple job stops Signed-off-by: danishprakash <grafitykoncept@gmail.com> * improve output, add changelog and docs Signed-off-by: danishprakash <grafitykoncept@gmail.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-12-16 15:46:58 -08:00
James Rasell	b0730ebb02	cli: add ACL binding rule commands for CRUD actions. (#15554 )	2022-12-15 16:57:44 +01:00
James Rasell	95c9ffa505	ACL: add ACL binding rule RPC and HTTP API handlers. (#15529 ) This change adds the RPC ACL binding rule handlers. These handlers are responsible for the creation, updating, reading, and deletion of binding rules. The write handlers are feature gated so that they can only be used when all federated servers are running the required version. The HTTP API handlers and API SDK have also been added where required. This allows the endpoints to be called from the API by users and clients.	2022-12-15 09:18:55 +01:00
Piotr Kazmierczak	d62a869caa	acl: numerous small bugfixes for acl auth methods CLI (#15539 ) This PR contains a number of small bugfixes discovered during #15538 work.	2022-12-14 13:25:40 +01:00
Piotr Kazmierczak	db98e26375	bugfix: acl sso auth methods test failures (#15512 ) This PR fixes unit test failures introduced in f4e89e2	2022-12-09 18:47:32 +01:00
Piotr Kazmierczak	777173e8da	acl: added type to ACL Auth Method stub (#15480 )	2022-12-06 14:47:05 +01:00
Piotr Kazmierczak	9c3f04b488	bugfix: corrected indentation for ACL auth method create CLI command (#15481 )	2022-12-06 14:45:24 +01:00
Seth Hoenig	119f7b1cd1	consul: fixup expected consul tagged_addresses when using ipv6 (#15411 ) This PR is a continuation of #14917, where we missed the ipv6 cases. Consul auto-inserts tagged_addresses for keys - lan_ipv4 - wan_ipv4 - lan_ipv6 - wan_ipv6 even though the service registration coming from Nomad does not contain such elements. When doing the differential between services Nomad expects to be registered vs. the services actually registered into Consul, we must first purge these automatically inserted tagged_addresses if they do not exist in the Nomad view of the Consul service.	2022-12-01 07:38:30 -06:00
Piotr Kazmierczak	0eccd3286c	acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410 ) Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens. This PR fixes it. Relates to #13120	2022-11-29 07:36:36 +01:00
Piotr Kazmierczak	db9316c4d3	acl: sso auth methods cli commands (#15322 ) This PR implements CLI commands to interact with SSO auth methods. This PR is part of the SSO work captured under ☂️ ticket #13120.	2022-11-28 10:51:45 +01:00
Piotr Kazmierczak	9c85315bd2	bugfix: typos in acl role commands (#15382 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2022-11-25 10:28:33 +01:00
Luiz Aoqui	4208cfcfbd	cli: improve errors for multiregion deployments (#15326 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2022-11-23 16:40:13 -05:00
Jack	62f7de7ed5	cli: `wait` flag for use with `deployment status -monitor` (#15262 )	2022-11-23 16:36:13 -05:00
James Rasell	32dfa431f3	sso: add ACL auth-method HTTP API CRUD endpoints (#15338 ) * core: remove custom auth-method TTLS and use ACL token TTLS. * agent: add ACL auth-method HTTP endpoints for CRUD actions. * api: add ACL auth-method client.	2022-11-23 09:38:02 +01:00
Lance Haig	0263e7af34	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
hc-github-team-nomad-core	031d75e158	Generate files for 1.4.3 release	2022-11-22 12:56:29 -05:00
Seth Hoenig	bf4b5f9a8d	consul: add trace logging around service registrations (#15311 ) This PR adds trace logging around the differential done between a Nomad service registration and its corresponding Consul service registration, in an effort to shed light on why a service registration request is being made.	2022-11-21 08:03:56 -06:00
James Rasell	a7350853ae	api: ensure ACL role upsert decode error returns a 400 status code. (#15253 )	2022-11-18 17:47:43 +01:00
James Rasell	3225cf77b6	api: ensure all request body decode error return a 400 status code. (#15252 )	2022-11-18 17:04:33 +01:00
James Rasell	2e19e9639e	agent: ensure all HTTP Server methods are pointer receivers. (#15250 )	2022-11-15 16:31:44 +01:00
Tim Gross	37134a4a37	eval delete: move batching of deletes into RPC handler and state (#15117 ) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3 node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes. 
Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
Derek Strickland	80b6f27efd	api: remove `mapstructure` tags from `Port` struct (#12916 ) This PR solves a defect in the deserialization of api.Port structs when returning structs from the EventStream. Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0. Upon further inspection, a few things became apparent. The struct already has hcl tags that support the indirection during job submission. Serialization/deserialization with both the json and hcl packages produce the desired result. The use of the mapstructure tags provided no value as the Port struct contains only fields with primitive types. This PR: - Removes the mapstructure tags from the api.Port structs - Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances. Closes #11044 Co-authored-by: DerekStrickland <dstrickland@hashicorp.com> Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2022-11-08 11:26:28 +01:00
Drew Gonzales	aac9404ee5	server: add git revision to serf tags (#9159 )	2022-11-07 10:34:33 -05:00
Tim Gross	9e1c0b46d8	API for `Eval.Count` (#15147 ) Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is designed to support interactive use in the `nomad eval delete` command to get a count of evals expected to be deleted before doing so. The state store operations to do this sort of thing are somewhat expensive, but it's cheaper than serializing a big list of evals to JSON. Note that although it seems like this could be done as an extra parameter and response field on `Eval.List`, having it as its own endpoint avoids having to change the response body shape and lets us avoid handling the legacy filter params supported by `Eval.List`.	2022-11-07 08:53:19 -05:00
Charlie Voiselle	79c4478f5b	template: error on missing key (#15141 ) * Support error_on_missing_value for templates * Update docs for template stanza	2022-11-04 13:23:01 -04:00
Phil Renaud	ffb4c63af7	[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833 ) * Adds meta to job list stub and displays a pack logo on the jobs index * Changelog * Modifying struct for optional meta param * Explicitly ask for meta anytime I look up a job from index or job page * Test case for the endpoint * adding meta field to API struct and omitting from response if empty * passthru method added to api/jobs.list * Meta param listed in docs for jobs list * Update api/jobs.go Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-11-02 16:58:24 -04:00
hc-github-team-nomad-core	fbef8881cd	Generate files for 1.4.2 release	2022-10-27 13:08:05 -04:00
Tim Gross	b9922631bd	keyring: fix missing GC config, don't rotate on manual GC (#15009 ) The configuration knobs for root keyring garbage collection are present in the consumer and present in the user-facing config, but we missed the spot where we copy from one to the other. Fix this so that users can set their own thresholds. The root key is automatically rotated every ~30d, but the function that does both rotation and key GC was wired up such that `nomad system gc` caused an unexpected key rotation. Split this into two functions so that `nomad system gc` cleans up old keys without forcing a rotation, which will be done periodically or by the `nomad operator root keyring rotate` command.	2022-10-24 08:43:42 -04:00
Luiz Aoqui	593e48e826	cli: prevent panic on `operator debug` (#14992 ) If the API returns an error during debug bundle collection the CLI was expanding the wrong error object, resulting in a panic since `err` is `nil`.	2022-10-20 15:53:58 -04:00
Luiz Aoqui	0fddb4d7e8	Post 1.4.1 release (#14988 ) * Generate files for 1.4.1 release * Prepare for next release Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-10-20 13:09:41 -04:00
Seth Hoenig	756b71b7d2	deps: bump shoenig for str func bugfixes (#14974 ) And fix the one place we use them.	2022-10-20 08:11:43 -05:00
James Rasell	d7b311ce55	acl: correctly resolve ACL roles within client cache. (#14922 ) The client ACL cache was not accounting for tokens which included ACL role links. This change modifies the behaviour to resolve role links to policies. It will also now store ACL roles within the cache for quick lookup. The cache TTL is configurable in the same manner as policies or tokens. Another small fix is included that takes into account the ACL token expiry time. This was not included, which meant tokens with expiry could be used past the expiry time, until they were GC'd.	2022-10-20 09:37:32 +02:00
Seth Hoenig	57375566d4	consul: register checks along with service on initial registration (#14944 ) * consul: register checks along with service on initial registration This PR updates Nomad's Consul service client to include checks in an initial service registration, so that the checks associated with the service are registered "atomically" with the service. Before, we would only register the checks after the service registration, which causes problems where the service is deemed healthy, even if one or more checks are unhealthy - especially problematic in the case where SuccessBeforePassing is configured. Fixes #3935 * cr: followup to fix cause of extra consul logging * cr: fix another bug * cr: fixup changelog	2022-10-19 12:40:56 -05:00
Seth Hoenig	f1b902beac	consul: do not re-register already registered services (#14917 ) This PR updates Nomad's Consul service client to do map comparisons using maps.Equal instead of reflect.DeepEqual. The bug fix is in how DeepEqual treats nil slices different from empty slices, when actually they should be treated the same.	2022-10-18 08:10:59 -05:00
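The nil-versus-empty distinction described in the commit above can be reproduced with a minimal sketch. It assumes the standard library `maps` and `slices` packages (Go 1.21+); the commit itself may have used a different `maps` package, and `sameMeta` is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"maps"
	"reflect"
	"slices"
)

// sameMeta is a hypothetical helper: it compares two maps by content,
// so a nil map and an empty map are considered equal.
func sameMeta(a, b map[string]string) bool {
	return maps.Equal(a, b)
}

func main() {
	var nilMeta map[string]string
	emptyMeta := map[string]string{}

	// reflect.DeepEqual distinguishes nil from empty containers.
	fmt.Println(reflect.DeepEqual(nilMeta, emptyMeta)) // false
	// maps.Equal only looks at the key/value pairs.
	fmt.Println(sameMeta(nilMeta, emptyMeta)) // true

	var nilTags []string
	emptyTags := []string{}
	fmt.Println(reflect.DeepEqual(nilTags, emptyTags)) // false
	fmt.Println(slices.Equal(nilTags, emptyTags))      // true
}
```

A comparison that reports a difference between nil and empty would make an unchanged service look modified, which is the kind of spurious re-registration the commit describes.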
Seth Hoenig	306b4dd38e	cleanup: remove another string-set helper function (#14902 )	2022-10-17 14:14:52 -05:00
Michael Schurter	45ce8c13cf	client: remove unused LogOutput and LogLevel (#14867 ) * client: remove unused LogOutput * client: remove unused config.LogLevel	2022-10-11 09:24:40 -07:00
Seth Hoenig	5e38a0e82c	cleanup: rename Equals to Equal for consistency (#14759 )	2022-10-10 09:28:46 -05:00
Damian Czaja	95f969c4bf	cli: add `nomad fmt` (#14779 )	2022-10-06 17:00:29 -04:00
Gabriel Villalonga Simon	b974c32ba6	Check that JobPlanResponse Diff Type is None before checking for changes on getExitCode (#14492 )	2022-10-06 16:23:22 -04:00
Giovani Avelar	a625de2062	Allow specification of a custom job name/prefix for parameterized jobs (#14631 )	2022-10-06 16:21:40 -04:00
hc-github-team-nomad-core	4fdcd197c0	Generate files for 1.4.0 release	2022-10-06 09:16:00 -07:00
Tim Gross	341dc84a77	variables: use correct URL in ref to docs (#14792 )	2022-10-04 11:30:49 -04:00
Seth Hoenig	c68ed3b4c8	client: protect user lookups with global lock (#14742 ) * client: protect user lookups with global lock This PR updates Nomad client to always do user lookups while holding a global process lock. This is to prevent concurrency unsafe implementations of NSS, but still enabling NSS lookups of users (i.e. cannot use osusergo). * cl: add cl	2022-09-29 09:30:13 -05:00
hc-github-team-nomad-core	2fe5a962f3	Generate files for 1.4.0-rc.1 release	2022-09-27 17:33:32 -04:00
Tim Gross	a661399b41	cli: fix doc strings for `var get` command (#14697 )	2022-09-26 15:05:22 -04:00
Luiz Aoqui	f7c6534a79	cli: set content length on `operator api` requests (#14634 ) http.NewRequestWithContext will only set the right value for Content-Length if the input is bytes.Buffer, bytes.Reader, or *strings.Reader [0]. Since os.Stdin is an os.File, POST requests made with the `nomad operator api` command would always have Content-Length set to -1, which is interpreted as an unknown length by web servers. [0]: https://pkg.go.dev/net/http#NewRequestWithContext	2022-09-26 14:21:40 -04:00
Tim Gross	c29c4bd66c	cli: remove deprecated `eval status -json` list behavior (#14651 ) In Nomad 1.2.6 we shipped `eval list`, which accepts a `-json` flag, and deprecated the usage of `eval status` without an evaluation ID with an upgrade note that it would be removed in Nomad 1.4.0. This changeset completes that work.	2022-09-22 10:56:32 -04:00
Jorge Marey	584ddfe859	Add Namespace, Job and Group to envoy stats (#14311 )	2022-09-22 10:38:21 -04:00
Luiz Aoqui	e0ba6400a7	cli: print success message on var put (#14620 )	2022-09-22 10:18:01 -04:00
Tim Gross	d327a68696	operator debug: write NDJSON for large collections (#14610 ) The `operator debug` command writes JSON files from API responses as a single line containing an array of JSON objects. But some of these files can be extremely large (GB's) for large production clusters, which makes it difficult to parse them using typical line-oriented Unix command line tools that can stream their inputs without consuming a lot of memory. For collections that are typically large, instead emit newline-delimited JSON. This changeset includes some first-pass refactoring of this command. It breaks up monolithic methods that validate a path, create a file, serialize objects, and write them to disk into smaller functions, some of which can now be standalone to take advantage of generics.	2022-09-22 10:02:00 -04:00
James Rasell	a25028c412	cli: fix a bug in operator API when setting HTTPS via address. (#14635 ) Operators may have a setup whereby the TLS config comes from a source other than setting Nomad specific env vars. In this case, we should attempt to identify the scheme using the config setting as a fallback.	2022-09-22 15:43:58 +02:00
Seth Hoenig	2088ca3345	cleanup more helper updates (#14638 ) * cleanup: refactor MapStringStringSliceValueSet to be cleaner * cleanup: replace SliceStringToSet with actual set * cleanup: replace SliceStringSubset with real set * cleanup: replace SliceStringContains with slices.Contains * cleanup: remove unused function SliceStringHasPrefix * cleanup: fixup StringHasPrefixInSlice doc string * cleanup: refactor SliceSetDisjoint to use real set * cleanup: replace CompareSliceSetString with SliceSetEq * cleanup: replace CompareMapStringString with maps.Equal * cleanup: replace CopyMapStringString with CopyMap * cleanup: replace CopyMapStringInterface with CopyMap * cleanup: fixup more CopyMapStringString and CopyMapStringInt * cleanup: replace CopySliceString with slices.Clone * cleanup: remove unused CopySliceInt * cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice * cleanup: replace CopyMap with maps.Clone * cleanup: run go mod tidy	2022-09-21 14:53:25 -05:00
Luiz Aoqui	c3c8ae584f	api: provide more detail on ACL bootstrap request error (#14629 )	2022-09-20 21:20:04 -04:00
Derek Strickland	24af28dc30	Merge pull request #14602 from hashicorp/release/1.4.0-beta.1 Release/1.4.0 beta.1	2022-09-15 13:57:40 -04:00
Tim Gross	81516db4b2	variables: fix ENT-only test failure in command tests (#14599 ) The `TestVarGetCommand` test uses the wrong namespace in the autocomplete test. The namespace only gets validated against if we have quota enforcement (or more typically by ACL checks), so the test only fails in the ENT repo test runs.	2022-09-15 10:37:57 -04:00
hc-github-team-nomad-core	a3a718e167	Generate files for 1.4.0-beta.1 release	2022-09-14 19:32:18 +00:00
hc-github-team-nomad-core	b91437bb68	Generate files for 1.4.0-beta.1 release	2022-09-14 18:59:59 +00:00
Seth Hoenig	5187f92c5e	cleanup: create interface for check watcher and mock it in nsd tests (#14577 ) * cleanup: create interface for check watcher and mock it in nsd tests * cleanup: add comments for check watcher interface	2022-09-14 08:25:20 -05:00
Seth Hoenig	bf4dd30919	Merge pull request #14553 from hashicorp/f-nsd-check-watcher servicedisco: implement check_restart support for nomad service checks	2022-09-13 09:55:51 -05:00
Tim Gross	357e7f4521	docs: include path in ACL requirements for variables (#14561 ) Also add links to the ACL policy reference and variables concepts docs near the top of the page.	2022-09-13 10:21:29 -04:00
Seth Hoenig	9a943107c7	servicedisco: implement check_restart for nomad service checks This PR implements support for check_restart for checks registered in the Nomad service provider. Unlike Consul, Nomad service checks never report a "warning" status, and so the check_restart.ignore_warnings configuration is not valid for Nomad service checks.	2022-09-13 08:59:23 -05:00
Seth Hoenig	b960925939	Merge pull request #14546 from hashicorp/f-refactor-check-watcher client: refactor check watcher to be reusable	2022-09-13 07:32:32 -05:00
Tim Gross	cd7aba96fc	variables: change spec file extension to match rename (#14552 ) Also fixes a typo in the `var put` help text.	2022-09-12 16:26:18 -04:00
Charlie Voiselle	4c9554f87c	Update flags to align with other var commands. (#14550 )	2022-09-12 15:26:12 -04:00
Seth Hoenig	feff36f3f7	client: refactor check watcher to be reusable This PR refactors agent/consul/check_watcher into client/serviceregistration, and abstracts away the Consul-specific check lookups. In doing so we should be able to reuse the existing check watcher logic for also watching NSD checks in a followup PR. A chunk of consul/unit_test.go is removed - we'll cover that in e2e tests in a follow PR if needed. In the long run I'd like to remove this whole file.	2022-09-12 10:13:31 -05:00
Charlie Voiselle	b55112714f	Vars: CLI commands for `var get`, `var put`, `var purge` (#14400 ) * Includes updates to `var init`	2022-09-09 17:55:20 -04:00
Seth Hoenig	31234d6a62	cleanup: consolidate interfaces for workload restarting This PR combines two of the same interface definitions around workload restarting	2022-09-09 08:59:04 -05:00
Tim Gross	9259a373cd	remove root keyring install API (#14514 ) * keyring rotate API should require put/post method * remove keyring install API	2022-09-09 08:50:35 -04:00
James Rasell	3fa8b0b270	client: fix RPC forwarding when querying checks for alloc. (#14498 ) When querying the checks for an allocation, the request must be forwarded to the agent that is running the allocation. If the initial request is made to a server agent, the request can be made directly to the client agent running the allocation. If the request is made to a client agent not running the alloc, the request needs to be forwarded to a server and then the correct client.	2022-09-08 16:55:23 +02:00
Tim Gross	6ff59e71a5	cli: remove network from `quota status` output (#14468 ) Network quotas were removed in Nomad 1.0.4. Remove the fields no longer in use from the `quota status` output.	2022-09-06 09:37:16 -04:00
Tim Gross	7921f044e5	migrate autopilot implementation to raft-autopilot (#14441 ) Nomad's original autopilot was importing from a private package in Consul. It has been moved out to a shared library. Switch Nomad to use this library so that we can eliminate the import of Consul, which is necessary to build Nomad ENT with the current version of the Consul SDK. This also will let us pick up autopilot improvements shared with Consul more easily.	2022-09-01 14:27:10 -04:00
Luiz Aoqui	94d7dddccd	cli: set -hcl2-strict to false if -hcl1 is defined (#14426 ) These options are mutually exclusive but, since `-hcl2-strict` defaults to `true` users had to explicitly set it to `false` when using `-hcl1`. Also return `255` when job plan fails validation as this is the expected code in this situation.	2022-09-01 10:42:08 -04:00
Derek Strickland	35e91ff376	Merge release 1.3.5 files (#14425 ) * Merge release 1.3.5 files * Generate files for 1.3.5 release * Prepare for next release Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-31 18:31:56 -04:00
Charlie Voiselle	5c0e34dd33	Vars: Update CT dependency to support variables. (#14399 ) * Update Consul Template dep to support Nomad vars * Remove `Peering` config for Consul Testservers Upgrading to the 1.14 Consul SDK introduces an additional default configuration, `Peering`, that is not compatible with versions of Consul before v1.13.0. Because Nomad tests against Consul v1.11.1, this configuration has to be nil'ed out before passing it to the Consul binary.	2022-08-30 15:26:01 -04:00
Tim Gross	cc9b480996	testing: setting env var incompatible with parallel tests (#14405 ) Neither the `os.Setenv` nor `t.Setenv` helper are safe to use in parallel tests because environment variables are process-global. The stdlib panics if you try to do this. Remove the `ci.Parallel()` call from all tests where we're setting environment variables.	2022-08-30 14:49:03 -04:00
Tim Gross	5784fb8c58	search: enforce correct ACL for search over variables (#14397 )	2022-08-30 13:27:31 -04:00
Seth Hoenig	52de2dc09d	Merge pull request #14290 from hashicorp/cleanup-more-helper-cleanup cleanup: tidy up helper package some more	2022-08-30 08:19:48 -05:00
James Rasell	755b4745ed	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-30 08:59:13 +01:00
Tim Gross	62a968f443	Merge pull request #14351 from hashicorp/variables-rename Variables rename	2022-08-29 11:36:50 -04:00
Michael Schurter	dbffe22465	consul: allow stale namespace results (#12953 ) Nomad reconciles services it expects to be registered in Consul with what is actually registered in the local Consul agent. This is necessary to prevent leaking service registrations if Nomad crashes at certain points (or if there are bugs). When Consul has namespaces enabled, we must iterate over each available namespace to be sure no services were leaked into non-default namespaces. Since this reconciliation happens often, there's no need to require results from the Consul leader server. In large clusters this creates far more load than the "freshness" of the response is worth. Therefore this patch switches the request to AllowStale=true	2022-08-26 16:05:12 -07:00
Tim Gross	1dc053b917	rename SecureVariables to Variables throughout	2022-08-26 16:06:24 -04:00
Tim Gross	dcfd31296b	file rename	2022-08-26 16:06:24 -04:00
Vladimir Sokolov	b646810401	cli: force periodic job if its id equals search prefix	2022-08-26 10:54:37 -04:00
Luiz Aoqui	ad84b22a72	Post 1.3.4 release (#14329 ) * Generate files for 1.3.4 release * Prepare for next release * Update CHANGELOG.md Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-26 10:09:13 -04:00
Charlie Voiselle	ad737d008b	SV API: return upserted variable to caller (#14325 ) * Return created variable to caller in HTTP and Go APIs * Update tests for returned values	2022-08-25 17:38:15 -04:00
Seth Hoenig	fd9744b9eb	Merge pull request #14301 from hashicorp/b-fix-check-status-test-racey testing: fix flakey check status test	2022-08-25 08:30:46 -05:00
James Rasell	601588df6b	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-25 12:14:29 +01:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own.
* taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle` otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Tim Gross	c732b215f0	vault: detect namespace change in config reload (#14298 ) The `namespace` field was not included in the equality check between old and new Vault configurations, which meant that a Vault config change that only changed the namespace would not be detected as a change and the clients would not be reloaded. Also, the comparison for boolean fields such as `enabled` and `allow_unauthenticated` was on the pointer and not the value of that pointer, which results in spurious reloads in real config reload that is easily missed in typical test scenarios. Includes a minor refactor of the order of fields for `Copy` and `Merge` to match the struct fields in hopes it makes it harder to make this mistake in the future, as well as additional test coverage.	2022-08-24 17:03:29 -04:00
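The pointer-versus-value pitfall mentioned in the commit above can be illustrated with a short sketch. `vaultConfig`, `boolPtrEqual`, and `equal` are hypothetical names chosen for this example and are not taken from the Nomad source.

```go
package main

import "fmt"

// vaultConfig is a hypothetical, trimmed-down config with an optional bool.
type vaultConfig struct {
	Enabled   *bool
	Namespace string
}

// boolPtrEqual compares the values behind two optional bools.
// Two nil pointers are equal; nil and non-nil are not.
func boolPtrEqual(a, b *bool) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// equal compares configs by value, including the namespace.
func (c vaultConfig) equal(o vaultConfig) bool {
	return c.Namespace == o.Namespace && boolPtrEqual(c.Enabled, o.Enabled)
}

func main() {
	t1, t2 := true, true
	a := vaultConfig{Enabled: &t1, Namespace: "ns1"}
	b := vaultConfig{Enabled: &t2, Namespace: "ns1"}

	// Comparing the pointers reports a difference even though both are true.
	fmt.Println(a.Enabled == b.Enabled) // false
	// Comparing the pointed-to values does not.
	fmt.Println(a.equal(b)) // true

	b.Namespace = "ns2"
	fmt.Println(a.equal(b)) // false
}
```

Each config load typically allocates fresh pointers, so a pointer comparison would report a change on every reload.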
Seth Hoenig	ff59b90d41	testing: fix flakey check status test This PR fixes a flakey test where we did not wait on the check status to actually become failing (go too fast and you just get a pending check). Instead add a helper for waiting on any check in the alloc to become the state we are looking for.	2022-08-24 15:11:41 -05:00
Seth Hoenig	062c817450	cleanup: move fs helpers into escapingfs	2022-08-24 14:45:34 -05:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
James Rasell	2ccc48c167	cli: use policy flag for role creation and update.	2022-08-24 15:15:02 +01:00
James Rasell	7401677e4e	cli: output none when a token has no expiration.	2022-08-24 15:14:49 +01:00
James Rasell	9782d6d7ff	acl: allow tokens to lookup linked roles. (#14227 ) When listing or reading an ACL role, roles linked to the ACL token used for authentication can be returned to the caller.	2022-08-24 13:51:51 +02:00
Luiz Aoqui	7ee3de3ea5	fix minor issues found during ENT merge (#14250 )	2022-08-23 17:22:18 -04:00
Luiz Aoqui	7a8cacc9ec	allocrunner: refactor task coordinator (#14009 ) The current implementation for the task coordinator unblocks tasks by performing destructive operations over its internal state (like closing channels and deleting keys from maps). This presents a problem in situations where we would like to revert the state of a task, such as when restarting an allocation with tasks that have already exited. With this new implementation the task coordinator behaves more like a finite state machine where tasks may be blocked/unblocked multiple times by performing a state transition. This initial part of the work only refactors the task coordinator and is functionally equivalent to the previous implementation. Future work will build upon this to provide bug fixes and enhancements.	2022-08-22 18:38:49 -04:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
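The reason a pointer is needed in the commit above is that a plain `int` cannot distinguish "not set" from an explicit `0`. The sketch below uses hypothetical names (`templateOwner`, `describeID`) and is not the Nomad struct definition.

```go
package main

import "fmt"

// templateOwner is a hypothetical stand-in for the template ownership fields.
type templateOwner struct {
	UID *int
	GID *int
}

// describeID reports whether an ID was left unset or set explicitly.
func describeID(id *int) string {
	if id == nil {
		return "unset"
	}
	return fmt.Sprintf("explicit %d", *id)
}

func main() {
	root := 0

	// The zero value of the struct leaves both IDs nil, meaning "unset".
	fmt.Println(describeID(templateOwner{}.UID))

	// An explicit 0 (root) is now distinguishable from "unset".
	fmt.Println(describeID(templateOwner{UID: &root}.UID))
}
```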
James Rasell	2736cf0dfa	acl: make listing RPC and HTTP API a stub return object. (#14211 ) Making the ACL Role listing return object a stub future-proofs the endpoint. In the event the role object grows, we are not bound by having to return all fields within the list endpoint or change the signature of the endpoint to reduce the list return size.	2022-08-22 17:20:23 +02:00
James Rasell	802d005ef5	acl: add replication to ACL Roles from authoritative region. (#14176 ) ACL Roles along with policies and global token will be replicated from the authoritative region to all federated regions. This involves a new replication loop running on the federated leader. Policies and roles may be replicated at different times, meaning the policies and role references may not be present within the local state upon replication upsert. In order to bypass the RPC and state check, a new RPC request parameter has been added. This is used by the replication process; all other callers will trigger the ACL role policy validation check. There is a new ACL RPC endpoint to allow the reading of a set of ACL Roles which is required by the replication process and matches ACL Policies and Tokens. A bug within the ACL Role listing RPC has also been fixed which returned incorrect data during blocking queries where a deletion had occurred.	2022-08-22 08:54:07 +02:00
Seth Hoenig	88a1353149	cli: display nomad service check status output in CLI commands This PR adds some NSD check status output to the CLI. 1. The 'nomad alloc status' command produces nsd check summary output (if present) 2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)	2022-08-19 09:18:29 -05:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. 
Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
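The approach in points 1 to 4 above can be sketched as a copy-on-write config guarded by a mutex, together with a copy helper that keeps nil as nil and empty as empty. All names here (`client`, `config`, `copySlice`, `updateConfig`) are hypothetical and simplified; this is not the Nomad implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// copySlice copies a slice, preserving the nil/empty distinction.
func copySlice[T any](s []T) []T {
	if s == nil {
		return nil
	}
	out := make([]T, len(s))
	copy(out, s)
	return out
}

// config is a hypothetical config with one container field.
type config struct {
	Servers []string
}

// copy returns a deep copy of the config.
func (c *config) copy() *config {
	return &config{Servers: copySlice(c.Servers)}
}

// client holds the only config pointer; the mutex protects the pointer itself.
type client struct {
	mu  sync.Mutex
	cfg *config
}

// getConfig returns the current config pointer; callers must not mutate it.
func (c *client) getConfig() *config {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.cfg
}

// updateConfig mutates a copy and then swaps the pointer under the lock.
func (c *client) updateConfig(mutate func(*config)) {
	c.mu.Lock()
	defer c.mu.Unlock()
	next := c.cfg.copy()
	mutate(next)
	c.cfg = next
}

func main() {
	c := &client{cfg: &config{Servers: []string{"a"}}}
	old := c.getConfig()
	c.updateConfig(func(cfg *config) { cfg.Servers[0] = "b" })

	// Holders of the old pointer do not observe the update.
	fmt.Println(old.Servers[0], c.getConfig().Servers[0])
}
```

Readers that keep the old pointer see a consistent snapshot, at the cost of not observing later updates, which matches the trade-off the commit message describes.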
Seth Hoenig	c5d36eaa2f	cleanup: fixing warnings and refactoring of command package, part 2 This PR continues the cleanup of the command package, removing linter warnings, refactoring to use helpers, making tests easier to read, etc.	2022-08-18 09:43:39 -05:00
Seth Hoenig	4c1a0d4907	cleanup: first pass at fixing command package warnings This PR is the first of several for cleaning up warnings, and refactoring bits of code in the command package. First pass is over acl_ files and gets some helpers in place.	2022-08-17 15:33:37 -05:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
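With generics available from Go 1.18, a single helper can replace one pointer helper per type. The sketch below shows what such a helper might look like; the body is an assumption and may differ from the actual `pointer.Of` in the Nomad repository.

```go
package main

import "fmt"

// Of returns a pointer to a copy of v. This is a sketch of a generic
// pointer helper, written here as an assumption about its shape.
func Of[T any](v T) *T {
	return &v
}

func main() {
	enabled := Of(true)
	count := Of(3)
	name := Of("web")

	fmt.Println(*enabled, *count, *name)
}
```

Before generics, each type needed its own helper, which is the duplication the commit removes.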
James Rasell	51a7df50bb	cli: add ability to create and view tokens with ACL role links.	2022-08-17 14:49:52 +01:00
Seth Hoenig	7728cf5a9a	Merge pull request #14132 from hashicorp/build-update-go1.19 build: update to go1.19	2022-08-16 11:20:27 -05:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forcefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Seth Hoenig	56b0b456dc	Merge pull request #14102 from hashicorp/cleanup-mesh-gateway-value cleanup: consul mesh gateway type need not be pointer	2022-08-16 10:07:16 -05:00
Charlie Voiselle	dba6b39815	SV CLI: var init (#13820 ) * Nomad dep: add muesli/reflow * SV CLI: var init	2022-08-15 13:43:29 -04:00
Tim Gross	4005759d28	move secure variable conflict resolution to state store (#13922 ) Move conflict resolution implementation into the state store with a new Apply RPC. This also makes the RPC for secure variables much more similar to Consul's KV, which will help us support soft deletes in a post-1.4.0 version of Nomad. Reimplement quotas in the state store functions. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-08-15 11:19:53 -04:00
Seth Hoenig	f9355c29fb	cleanup: consul mesh gateway type need not be pointer This PR changes the use of structs.ConsulMeshGateway to value types instead of via pointers. This will help in a follow up PR where we cleanup a lot of custom comparison code with helper functions instead.	2022-08-13 11:26:58 -05:00
James Rasell	9c97560ded	cli: add new acl role subcommands for CRUD role actions. (#14087 )	2022-08-12 09:52:32 +02:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
Seth Hoenig	1901cfaba8	Merge pull request #14069 from brian-athinkingape/cli-fix-memstats-cgroupsv2 cli: for systems with cgroups v2, fix allocation resource utilization showing 0 memory used	2022-08-11 07:27:48 -05:00
James Rasell	9cd0dd2ff7	http: add ACL Role HTTP endpoints for CRUD actions. These new endpoints are exposed under the /v1/acl/roles and /v1/acl/role endpoints.	2022-08-11 08:44:19 +01:00
Luiz Aoqui	815adbada5	Post 1.3.3 release (#14064 ) * Generate files for 1.3.3 release * Prepare for next release * Merge release 1.3.3 files Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-09 17:27:29 -04:00
Brian Chau	6621bb9db5	cli: for systems with cgroups v2, fix allocation resource utilization showing 0 memory used	2022-08-09 14:09:14 -07:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
James Rasell	bb5b510c9d	cli: do not import structs, use API package only. (#13938 )	2022-08-02 16:33:08 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to do the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
James Rasell	663aa92b7a	Merge branch 'main' into f-gh-13120-sso-umbrella	2022-08-02 08:30:03 +01:00
Tim Gross	8404f998f7	fix flaky `TestAgent_ProxyRPC_Dev` test (#13925 ) This test is a fairly trivial test of the agent RPC, but the test setup waits for a short fixed window after the node starts to send the RPC. After looking at detailed logs for recent test failures, it looks like the node registration for the first node doesn't get a chance to happen before we make the RPC call. Use `WaitForResultUntil` to give the test more time to run in slower test environments, while allowing it to finish quickly if possible.	2022-07-28 14:47:15 -04:00
Lars Lehtonen	a80df0480e	testing: fix dropped test errors in command/agent (#13926 )	2022-07-28 11:04:31 -04:00
Seth Hoenig	d8fe1d10ba	cleanup: use constants for on_update values	2022-07-21 13:09:47 -05:00
Seth Hoenig	c61e779b48	Merge pull request #13715 from hashicorp/dev-nsd-checks client: add support for checks in nomad services	2022-07-21 10:22:57 -05:00
Seth Hoenig	67c6336c67	Merge pull request #13870 from hashicorp/exp-fp-optimization client: use test timeouts for network fingerprinters in dev mode	2022-07-21 08:18:02 -05:00
Tim Gross	9c43c28575	search: use secure vars ACL policy for secure vars context (#13788 ) The search RPC used a placeholder policy for searching within the secure variables context. Now that we have ACL policies built for secure variables, we can use them for search. Requires a new loose policy for checking if a token has any secure variables access within a namespace, so that we can filter on specific paths in the iterator.	2022-07-21 08:39:36 -04:00
Seth Hoenig	6f93aca63e	devmode: use minimal network timeouts for network fingerprinters in dev mode	2022-07-20 15:13:14 -05:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849 ) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690 ) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621 ) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add `nomad.agent.http.exceeded` metric counting the number of HTTP connections exceeding concurrency limit.	2022-07-20 14:12:21 -04:00
James Rasell	f6d12a3c00	acl: enable configuration and visualisation of token expiration for users (#13846 ) * api: add ACL token expiry params to HTTP API * cli: allow setting and displaying ACL token expiry	2022-07-20 10:06:23 +02:00
hc-github-team-nomad-core	fa09c13016	Generate files for 1.3.2 release	2022-07-13 19:33:41 -04:00
Michael Schurter	d54d90edfa	http: only log alloc/exec errors when non-nil (#13730 )	2022-07-13 09:44:51 -07:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Seth Hoenig	297d386bdc	client: add support for checks in nomad services This PR adds support for specifying checks in services registered to the built-in nomad service provider. Currently only HTTP and TCP checks are supported, though more types could be added later.	2022-07-12 17:09:50 -05:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Charlie Voiselle	6be7a41351	SV: CLI: var list command (#13707 ) * SV CLI: var list * Fix wildcard prefix filtering Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-12 12:49:39 -04:00
James Rasell	0292f48396	server: add ACL token expiration config parameters. (#13667 ) This commit adds configuration parameters to control ACL token expirations. This includes both limits on the min and max TTL expiration values, as well as a GC threshold for expired tokens.	2022-07-12 13:43:25 +02:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	555ac432cd	SV: CAS: Implement Check and Set for Delete and Upsert (#13429 ) * SV: CAS * Implement Check and Set for Delete and Upsert * Reading the conflict from the state store * Update endpoint for new error text * Updated HTTP api tests * Conflicts to the HTTP api * SV: structs: Update SV time to UnixNanos * update mock to UnixNano; refactor * SV: encrypter: quote KeyID in error * SV: mock: add mock for namespace w/ SV	2022-07-11 13:34:06 -04:00
Tim Gross	eaf430bfd5	secure variable server configuration (#13307 ) Add fields for configuring root key garbage collection and automatic rotation. Fix the keystore path so that we write to a tempdir when in dev mode.	2022-07-11 13:34:06 -04:00
Tim Gross	4d011d4c53	move gossip keyring command to their own subcommands (#13383 ) Move all the gossip keyring and key generation commands under `operator gossip keyring` subcommands to align with the new `operator secure-variables keyring` subcommands. Deprecate the `operator keyring` and `operator keygen` commands.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Charlie Voiselle	06c6a950c4	Secure Variables: Separate Encrypted and Decrypted structs (#13355 ) This PR splits SecureVariable into SecureVariableDecrypted and SecureVariableEncrypted in order to use the type system to help verify that cleartext secret material is not committed to file. * Make Encrypt function return KeyID * Split SecureVariable Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:05 -04:00
Tim Gross	56deb6f8cc	keyring CLI: refactor to use subcommands (#13351 ) Split the flag options for the `secure-variables keyring` into their own subcommands. The gossip keyring CLI will be similarly refactored and the old version will be deprecated.	2022-07-11 13:34:05 -04:00
Tim Gross	81b0c4fd36	keyring command line (#13169 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-07-11 13:34:04 -04:00

3623 Commits