open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	055434cca9	add metric for count of RPC requests (#15515 ) Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through them all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place. Includes telemetry configuration to omit identity labels.	2023-01-24 11:54:20 -05:00
Tim Gross	2030d62920	implement pre-forwarding auth on select RPCs (#15513 ) In #15417 we added a new `Authenticate` method to the server that returns an `AuthenticatedIdentity` struct. This changeset implements this method for a small number of RPC endpoints that together represent all the various ways in which RPCs are sent, so that we can validate that we're happy with this approach.	2023-01-24 10:52:07 -05:00
Karl Johann Schubert	b773a1b77f	client: add disk_total_mb and disk_free_mb config options (#15852 )	2023-01-24 09:14:22 -05:00
Charlie Voiselle	5ea1d8a970	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00
James Rasell	f8f1d45e8a	cli: use localhost for default login callback address. (#15820 )	2023-01-19 16:46:17 +01:00
James Rasell	fad9b40e53	Merge branch 'main' into sso/gh-13120-oidc-login	2023-01-18 10:05:31 +00:00
Phil Renaud	98c5259f3e	[sso] OIDC Updates for the UI (#15804 ) * Updated UI to handle OIDC method changes * Remove redundant store unload call	2023-01-17 17:01:47 -05:00
Dao Thanh Tung	e2ae6d62e1	fix bug in nomad fmt -check does not return error code (#15797 )	2023-01-17 09:15:34 -05:00
James Rasell	d09138a7c5	cli: add login command to allow OIDC provider SSO login.	2023-01-13 13:16:09 +00:00
James Rasell	b3a6cfecc4	api: add OIDC HTTP API endpoints and SDK.	2023-01-13 13:15:58 +00:00
Seth Hoenig	fe7795ce16	consul/connect: support for proxy upstreams opaque config (#15761 ) This PR adds support for configuring `proxy.upstreams[].config` for Consul Connect upstreams. This is an opaque config value to Nomad - the data is passed directly to Consul and is unknown to Nomad.	2023-01-12 08:20:54 -06:00
Anthony Davis	1c32471805	Fix rejoin_after_leave behavior (#15552 )	2023-01-11 16:39:24 -05:00
Dao Thanh Tung	09b25d71b8	cli: Add a nomad operator client state command (#15469 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-11 10:03:31 -05:00
Dao Thanh Tung	ca2f509e82	agent: Make agent syslog log level inherit from Nomad agent log (#15625 )	2023-01-04 09:38:06 -05:00
Tim Gross	8859e1bff1	csi: Fix parsing of '=' in secrets at command line and HTTP (#15670 ) The command line flag parsing and the HTTP header parsing for CSI secrets incorrectly split at more than one '=' rune, making it impossible to use secrets that included that rune.	2023-01-03 16:28:38 -05:00
Seth Hoenig	7214e21402	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Seth Hoenig	9eb2433871	command: fixup parsing of stale query parameter (#15631 ) In #15605 we fixed the bug where the presence of "stale" query parameter was meant to imply stale, even if the value of the parameter was "false" or malformed. In parsing, we missed the case where the slice of values would be nil which led to a failing test case that was missed because CI didn't run against the original PR.	2023-01-03 08:21:20 -06:00
Seth Hoenig	266ca25a81	cleanup: remove usage of consul/sdk/testutil/retry (#15609 ) This PR removes usages of `consul/sdk/testutil/retry`, as part of the ongoing effort to remove use of any non-API module from Consul. There is one remaining usage in the helper/freeport package, but that will get removed as part of #15589	2023-01-02 08:06:20 -06:00
Dao Thanh Tung	53cd1b4871	fix: `stale` querystring parameter value as boolean (#15605 ) * Add changes to make stale querystring param boolean Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Make error message more consistent Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review + Adding CHANGELOG file Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Changes from code review to use github.com/shoenig/test package Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Change must.Nil() to must.NoError() Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor fix on the import order Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Fix existing code format too Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * Minor changes addressing code review feedbacks Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> * swap must.EqOp() order of param provided Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg> Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-01 13:04:14 -06:00
Seth Hoenig	92dfa41286	command: fixup tests concerning multi job stop (#15606 ) * command: fixup job multi-stop test This PR refactors the StopCommand test that runs 10 jobs and then passes them all to one invocation of 'job stop'. * test: swap use of assert for must * test: cleanup job files we create * command: fixup job stop failure tests Now that JobStop works on concurrent jobs, the error messages are different. * cleanup: use multiple post scripts	2022-12-21 16:21:48 -06:00
Seth Hoenig	83f9fc9db4	tests: do not return error from testagent shutdown (#15595 )	2022-12-21 08:23:58 -06:00
Danish Prakash	dc81568f93	command/job_stop: accept multiple jobs, stop concurrently (#12582 ) * command/job_stop: accept multiple jobs, stop concurrently Signed-off-by: danishprakash <grafitykoncept@gmail.com> * command/job_stop_test: add test for multiple job stops Signed-off-by: danishprakash <grafitykoncept@gmail.com> * improve output, add changelog and docs Signed-off-by: danishprakash <grafitykoncept@gmail.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-12-16 15:46:58 -08:00
James Rasell	b0730ebb02	cli: add ACL binding rule commands for CRUD actions. (#15554 )	2022-12-15 16:57:44 +01:00
James Rasell	95c9ffa505	ACL: add ACL binding rule RPC and HTTP API handlers. (#15529 ) This change adds the RPC ACL binding rule handlers. These handlers are responsible for the creation, updating, reading, and deletion of binding rules. The write handlers are feature gated so that they can only be used when all federated servers are running the required version. The HTTP API handlers and API SDK have also been added where required. This allows the endpoints to be called from the API by users and clients.	2022-12-15 09:18:55 +01:00
Piotr Kazmierczak	d62a869caa	acl: numerous small bugfixes for acl auth methods CLI (#15539 ) This PR contains a number of small bugfixes discovered during #15538 work.	2022-12-14 13:25:40 +01:00
Piotr Kazmierczak	db98e26375	bugfix: acl sso auth methods test failures (#15512 ) This PR fixes unit test failures introduced in f4e89e2	2022-12-09 18:47:32 +01:00
Piotr Kazmierczak	777173e8da	acl: added type to ACL Auth Method stub (#15480 )	2022-12-06 14:47:05 +01:00
Piotr Kazmierczak	9c3f04b488	bugfix: corrected indentation for ACL auth method create CLI command (#15481 )	2022-12-06 14:45:24 +01:00
Seth Hoenig	119f7b1cd1	consul: fixup expected consul tagged_addresses when using ipv6 (#15411 ) This PR is a continuation of #14917, where we missed the ipv6 cases. Consul auto-inserts tagged_addresses for keys - lan_ipv4 - wan_ipv4 - lan_ipv6 - wan_ipv6 even though the service registration coming from Nomad does not contain such elements. When doing the differential between services Nomad expects to be registered vs. the services actually registered into Consul, we must first purge these automatically inserted tagged_addresses if they do not exist in the Nomad view of the Consul service.	2022-12-01 07:38:30 -06:00
Piotr Kazmierczak	0eccd3286c	acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410 ) Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens. This PR fixes it. Relates to #13120	2022-11-29 07:36:36 +01:00
Piotr Kazmierczak	db9316c4d3	acl: sso auth methods cli commands (#15322 ) This PR implements CLI commands to interact with SSO auth methods. This PR is part of the SSO work captured under ☂️ ticket #13120.	2022-11-28 10:51:45 +01:00
Piotr Kazmierczak	9c85315bd2	bugfix: typos in acl role commands (#15382 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2022-11-25 10:28:33 +01:00
Luiz Aoqui	4208cfcfbd	cli: improve errors for multiregion deployments (#15326 ) Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2022-11-23 16:40:13 -05:00
Jack	62f7de7ed5	cli: `wait` flag for use with `deployment status -monitor` (#15262 )	2022-11-23 16:36:13 -05:00
James Rasell	32dfa431f3	sso: add ACL auth-method HTTP API CRUD endpoints (#15338 ) * core: remove custom auth-method TTLS and use ACL token TTLS. * agent: add ACL auth-method HTTP endpoints for CRUD actions. * api: add ACL auth-method client.	2022-11-23 09:38:02 +01:00
Lance Haig	0263e7af34	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
hc-github-team-nomad-core	031d75e158	Generate files for 1.4.3 release	2022-11-22 12:56:29 -05:00
Seth Hoenig	bf4b5f9a8d	consul: add trace logging around service registrations (#15311 ) This PR adds trace logging around the differential done between a Nomad service registration and its corresponding Consul service registration, in an effort to shed light on why a service registration request is being made.	2022-11-21 08:03:56 -06:00
James Rasell	a7350853ae	api: ensure ACL role upsert decode error returns a 400 status code. (#15253 )	2022-11-18 17:47:43 +01:00
James Rasell	3225cf77b6	api: ensure all request body decode error return a 400 status code. (#15252 )	2022-11-18 17:04:33 +01:00
James Rasell	2e19e9639e	agent: ensure all HTTP Server methods are pointer receivers. (#15250 )	2022-11-15 16:31:44 +01:00
Tim Gross	37134a4a37	eval delete: move batching of deletes into RPC handler and state (#15117 ) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3 node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes. 
Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
Derek Strickland	80b6f27efd	api: remove `mapstructure` tags from `Port` struct (#12916 ) This PR solves a defect in the deserialization of api.Port structs when returning structs from the EventStream. Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0. Upon further inspection, a few things became apparent. The struct already has hcl tags that support the indirection during job submission. Serialization/deserialization with both the json and hcl packages produce the desired result. The use of the mapstructure tags provided no value as the Port struct contains only fields with primitive types. This PR: Removes the mapstructure tags from the api.Port structs Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances. Closes #11044 Co-authored-by: DerekStrickland <dstrickland@hashicorp.com> Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>	2022-11-08 11:26:28 +01:00
Drew Gonzales	aac9404ee5	server: add git revision to serf tags (#9159 )	2022-11-07 10:34:33 -05:00
Tim Gross	9e1c0b46d8	API for `Eval.Count` (#15147 ) Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is designed to support interactive use in the `nomad eval delete` command to get a count of evals expected to be deleted before doing so. The state store operations to do this sort of thing are somewhat expensive, but it's cheaper than serializing a big list of evals to JSON. Note that although it seems like this could be done as an extra parameter and response field on `Eval.List`, having it as its own endpoint avoids having to change the response body shape and lets us avoid handling the legacy filter params supported by `Eval.List`.	2022-11-07 08:53:19 -05:00
Charlie Voiselle	79c4478f5b	template: error on missing key (#15141 ) * Support error_on_missing_value for templates * Update docs for template stanza	2022-11-04 13:23:01 -04:00
Phil Renaud	ffb4c63af7	[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833 ) * Adds meta to job list stub and displays a pack logo on the jobs index * Changelog * Modifying struct for optional meta param * Explicitly ask for meta anytime I look up a job from index or job page * Test case for the endpoint * adding meta field to API struct and omitting from response if empty * passthru method added to api/jobs.list * Meta param listed in docs for jobs list * Update api/jobs.go Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-11-02 16:58:24 -04:00
hc-github-team-nomad-core	fbef8881cd	Generate files for 1.4.2 release	2022-10-27 13:08:05 -04:00
Tim Gross	b9922631bd	keyring: fix missing GC config, don't rotate on manual GC (#15009 ) The configuration knobs for root keyring garbage collection are present in the consumer and present in the user-facing config, but we missed the spot where we copy from one to the other. Fix this so that users can set their own thresholds. The root key is automatically rotated every ~30d, but the function that does both rotation and key GC was wired up such that `nomad system gc` caused an unexpected key rotation. Split this into two functions so that `nomad system gc` cleans up old keys without forcing a rotation, which will be done periodically or by the `nomad operator root keyring rotate` command.	2022-10-24 08:43:42 -04:00
Luiz Aoqui	593e48e826	cli: prevent panic on `operator debug` (#14992 ) If the API returns an error during debug bundle collection the CLI was expanding the wrong error object, resulting in a panic since `err` is `nil`.	2022-10-20 15:53:58 -04:00
Luiz Aoqui	0fddb4d7e8	Post 1.4.1 release (#14988 ) * Generate files for 1.4.1 release * Prepare for next release Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-10-20 13:09:41 -04:00
Seth Hoenig	756b71b7d2	deps: bump shoenig for str func bugfixes (#14974 ) And fix the one place we use them.	2022-10-20 08:11:43 -05:00
James Rasell	d7b311ce55	acl: correctly resolve ACL roles within client cache. (#14922 ) The client ACL cache was not accounting for tokens which included ACL role links. This change modifies the behaviour to resolve role links to policies. It will also now store ACL roles within the cache for quick lookup. The cache TTL is configurable in the same manner as policies or tokens. Another small fix is included that takes into account the ACL token expiry time. This was not included, which meant tokens with expiry could be used past the expiry time, until they were GC'd.	2022-10-20 09:37:32 +02:00
Seth Hoenig	57375566d4	consul: register checks along with service on initial registration (#14944 ) * consul: register checks along with service on initial registration This PR updates Nomad's Consul service client to include checks in an initial service registration, so that the checks associated with the service are registered "atomically" with the service. Before, we would only register the checks after the service registration, which causes problems where the service is deemed healthy, even if one or more checks are unhealthy - especially problematic in the case where SuccessBeforePassing is configured. Fixes #3935 * cr: followup to fix cause of extra consul logging * cr: fix another bug * cr: fixup changelog	2022-10-19 12:40:56 -05:00
Seth Hoenig	f1b902beac	consul: do not re-register already registered services (#14917 ) This PR updates Nomad's Consul service client to do map comparisons using maps.Equal instead of reflect.DeepEqual. The bug fix is in how DeepEqual treats nil slices differently from empty slices, when actually they should be treated the same.	2022-10-18 08:10:59 -05:00
Seth Hoenig	306b4dd38e	cleanup: remove another string-set helper function (#14902 )	2022-10-17 14:14:52 -05:00
Michael Schurter	45ce8c13cf	client: remove unused LogOutput and LogLevel (#14867 ) * client: remove unused LogOutput * client: remove unused config.LogLevel	2022-10-11 09:24:40 -07:00
Seth Hoenig	5e38a0e82c	cleanup: rename Equals to Equal for consistency (#14759 )	2022-10-10 09:28:46 -05:00
Damian Czaja	95f969c4bf	cli: add `nomad fmt` (#14779 )	2022-10-06 17:00:29 -04:00
Gabriel Villalonga Simon	b974c32ba6	Check that JobPlanResponse Diff Type is None before checking for changes on getExitCode (#14492 )	2022-10-06 16:23:22 -04:00
Giovani Avelar	a625de2062	Allow specification of a custom job name/prefix for parameterized jobs (#14631 )	2022-10-06 16:21:40 -04:00
hc-github-team-nomad-core	4fdcd197c0	Generate files for 1.4.0 release	2022-10-06 09:16:00 -07:00
Tim Gross	341dc84a77	variables: use correct URL in ref to docs (#14792 )	2022-10-04 11:30:49 -04:00
Seth Hoenig	c68ed3b4c8	client: protect user lookups with global lock (#14742 ) * client: protect user lookups with global lock This PR updates Nomad client to always do user lookups while holding a global process lock. This is to prevent concurrency unsafe implementations of NSS, but still enabling NSS lookups of users (i.e. cannot not use osusergo). * cl: add cl	2022-09-29 09:30:13 -05:00
hc-github-team-nomad-core	2fe5a962f3	Generate files for 1.4.0-rc.1 release	2022-09-27 17:33:32 -04:00
Tim Gross	a661399b41	cli: fix doc strings for `var get` command (#14697 )	2022-09-26 15:05:22 -04:00
Luiz Aoqui	f7c6534a79	cli: set content length on `operator api` requests (#14634 ) http.NewRequestWithContext will only set the right value for Content-Length if the input is *bytes.Buffer, *bytes.Reader, or *strings.Reader [0]. Since os.Stdin is an *os.File, POST requests made with the `nomad operator api` command would always have Content-Length set to -1, which is interpreted as an unknown length by web servers. [0]: https://pkg.go.dev/net/http#NewRequestWithContext	2022-09-26 14:21:40 -04:00
Tim Gross	c29c4bd66c	cli: remove deprecated `eval status -json` list behavior (#14651 ) In Nomad 1.2.6 we shipped `eval list`, which accepts a `-json` flag, and deprecated the usage of `eval status` without an evaluation ID with an upgrade note that it would be removed in Nomad 1.4.0. This changeset completes that work.	2022-09-22 10:56:32 -04:00
Jorge Marey	584ddfe859	Add Namespace, Job and Group to envoy stats (#14311 )	2022-09-22 10:38:21 -04:00
Luiz Aoqui	e0ba6400a7	cli: print success message on var put (#14620 )	2022-09-22 10:18:01 -04:00
Tim Gross	d327a68696	operator debug: write NDJSON for large collections (#14610 ) The `operator debug` command writes JSON files from API responses as a single line containing an array of JSON objects. But some of these files can be extremely large (GB's) for large production clusters, which makes it difficult to parse them using typical line-oriented Unix command line tools that can stream their inputs without consuming a lot of memory. For collections that are typically large, instead emit newline-delimited JSON. This changeset includes some first-pass refactoring of this command. It breaks up monolithic methods that validate a path, create a file, serialize objects, and write them to disk into smaller functions, some of which can now be standalone to take advantage of generics.	2022-09-22 10:02:00 -04:00
James Rasell	a25028c412	cli: fix a bug in operator API when setting HTTPS via address. (#14635 ) Operators may have a setup whereby the TLS config comes from a source other than setting Nomad specific env vars. In this case, we should attempt to identify the scheme using the config setting as a fallback.	2022-09-22 15:43:58 +02:00
Seth Hoenig	2088ca3345	cleanup more helper updates (#14638 ) * cleanup: refactor MapStringStringSliceValueSet to be cleaner * cleanup: replace SliceStringToSet with actual set * cleanup: replace SliceStringSubset with real set * cleanup: replace SliceStringContains with slices.Contains * cleanup: remove unused function SliceStringHasPrefix * cleanup: fixup StringHasPrefixInSlice doc string * cleanup: refactor SliceSetDisjoint to use real set * cleanup: replace CompareSliceSetString with SliceSetEq * cleanup: replace CompareMapStringString with maps.Equal * cleanup: replace CopyMapStringString with CopyMap * cleanup: replace CopyMapStringInterface with CopyMap * cleanup: fixup more CopyMapStringString and CopyMapStringInt * cleanup: replace CopySliceString with slices.Clone * cleanup: remove unused CopySliceInt * cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice * cleanup: replace CopyMap with maps.Clone * cleanup: run go mod tidy	2022-09-21 14:53:25 -05:00
Luiz Aoqui	c3c8ae584f	api: provide more detail on ACL bootstrap request error (#14629 )	2022-09-20 21:20:04 -04:00
Derek Strickland	24af28dc30	Merge pull request #14602 from hashicorp/release/1.4.0-beta.1 Release/1.4.0 beta.1	2022-09-15 13:57:40 -04:00
Tim Gross	81516db4b2	variables: fix ENT-only test failure in command tests (#14599 ) The `TestVarGetCommand` test uses the wrong namespace in the autocomplete test. The namespace only gets validated against if we have quota enforcement (or more typically by ACL checks), so the test only fails in the ENT repo test runs.	2022-09-15 10:37:57 -04:00
hc-github-team-nomad-core	a3a718e167	Generate files for 1.4.0-beta.1 release	2022-09-14 19:32:18 +00:00
hc-github-team-nomad-core	b91437bb68	Generate files for 1.4.0-beta.1 release	2022-09-14 18:59:59 +00:00
Seth Hoenig	5187f92c5e	cleanup: create interface for check watcher and mock it in nsd tests (#14577 ) * cleanup: create interface for check watcher and mock it in nsd tests * cleanup: add comments for check watcher interface	2022-09-14 08:25:20 -05:00
Seth Hoenig	bf4dd30919	Merge pull request #14553 from hashicorp/f-nsd-check-watcher servicedisco: implement check_restart support for nomad service checks	2022-09-13 09:55:51 -05:00
Tim Gross	357e7f4521	docs: include path in ACL requirements for variables (#14561 ) Also add links to the ACL policy reference and variables concepts docs near the top of the page.	2022-09-13 10:21:29 -04:00
Seth Hoenig	9a943107c7	servicedisco: implement check_restart for nomad service checks This PR implements support for check_restart for checks registered in the Nomad service provider. Unlike Consul, Nomad service checks never report a "warning" status, and so the check_restart.ignore_warnings configuration is not valid for Nomad service checks.	2022-09-13 08:59:23 -05:00
Seth Hoenig	b960925939	Merge pull request #14546 from hashicorp/f-refactor-check-watcher client: refactor check watcher to be reusable	2022-09-13 07:32:32 -05:00
Tim Gross	cd7aba96fc	variables: change spec file extension to match rename (#14552 ) Also fixes a typo in the `var put` help text.	2022-09-12 16:26:18 -04:00
Charlie Voiselle	4c9554f87c	Update flags to align with other var commands. (#14550 )	2022-09-12 15:26:12 -04:00
Seth Hoenig	feff36f3f7	client: refactor check watcher to be reusable This PR refactors agent/consul/check_watcher into client/serviceregistration, and abstracts away the Consul-specific check lookups. In doing so we should be able to reuse the existing check watcher logic for also watching NSD checks in a followup PR. A chunk of consul/unit_test.go is removed - we'll cover that in e2e tests in a follow PR if needed. In the long run I'd like to remove this whole file.	2022-09-12 10:13:31 -05:00
Charlie Voiselle	b55112714f	Vars: CLI commands for `var get`, `var put`, `var purge` (#14400 ) * Includes updates to `var init`	2022-09-09 17:55:20 -04:00
Seth Hoenig	31234d6a62	cleanup: consolidate interfaces for workload restarting This PR combines two of the same interface definitions around workload restarting	2022-09-09 08:59:04 -05:00
Tim Gross	9259a373cd	remove root keyring install API (#14514 ) * keyring rotate API should require put/post method * remove keyring install API	2022-09-09 08:50:35 -04:00
James Rasell	3fa8b0b270	client: fix RPC forwarding when querying checks for alloc. (#14498 ) When querying the checks for an allocation, the request must be forwarded to the agent that is running the allocation. If the initial request is made to a server agent, the request can be made directly to the client agent running the allocation. If the request is made to a client agent not running the alloc, the request needs to be forwarded to a server and then the correct client.	2022-09-08 16:55:23 +02:00
Tim Gross	6ff59e71a5	cli: remove network from `quota status` output (#14468 ) Network quotas were removed in Nomad 1.0.4. Remove the fields no longer in use from the `quota status` output.	2022-09-06 09:37:16 -04:00
Tim Gross	7921f044e5	migrate autopilot implementation to raft-autopilot (#14441 ) Nomad's original autopilot was importing from a private package in Consul. It has been moved out to a shared library. Switch Nomad to use this library so that we can eliminate the import of Consul, which is necessary to build Nomad ENT with the current version of the Consul SDK. This also will let us pick up autopilot improvements shared with Consul more easily.	2022-09-01 14:27:10 -04:00
Luiz Aoqui	94d7dddccd	cli: set -hcl2-strict to false if -hcl1 is defined (#14426 ) These options are mutually exclusive but, since `-hcl2-strict` defaults to `true`, users had to explicitly set it to `false` when using `-hcl1`. Also return `255` when job plan fails validation as this is the expected code in this situation.	2022-09-01 10:42:08 -04:00
Derek Strickland	35e91ff376	Merge release 1.3.5 files (#14425 ) * Merge release 1.3.5 files * Generate files for 1.3.5 release * Prepare for next release Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-31 18:31:56 -04:00
Charlie Voiselle	5c0e34dd33	Vars: Update CT dependency to support variables. (#14399 ) * Update Consul Template dep to support Nomad vars * Remove `Peering` config for Consul Testservers Upgrading to the 1.14 Consul SDK introduces an additional default configuration—`Peering`—that is not compatible with versions of Consul before v1.13.0. Because Nomad tests against Consul v1.11.1, this configuration has to be nil'ed out before passing it to the Consul binary.	2022-08-30 15:26:01 -04:00
Tim Gross	cc9b480996	testing: setting env var incompatible with parallel tests (#14405 ) Neither the `os.Setenv` nor the `t.Setenv` helper is safe to use in parallel tests because environment variables are process-global. The stdlib panics if you try to do this. Remove the `ci.Parallel()` call from all tests where we're setting environment variables.	2022-08-30 14:49:03 -04:00
Tim Gross	5784fb8c58	search: enforce correct ACL for search over variables (#14397 )	2022-08-30 13:27:31 -04:00
Seth Hoenig	52de2dc09d	Merge pull request #14290 from hashicorp/cleanup-more-helper-cleanup cleanup: tidy up helper package some more	2022-08-30 08:19:48 -05:00
James Rasell	755b4745ed	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-30 08:59:13 +01:00
Tim Gross	62a968f443	Merge pull request #14351 from hashicorp/variables-rename Variables rename	2022-08-29 11:36:50 -04:00
Michael Schurter	dbffe22465	consul: allow stale namespace results (#12953 ) Nomad reconciles services it expects to be registered in Consul with what is actually registered in the local Consul agent. This is necessary to prevent leaking service registrations if Nomad crashes at certain points (or if there are bugs). When Consul has namespaces enabled, we must iterate over each available namespace to be sure no services were leaked into non-default namespaces. Since this reconciliation happens often, there's no need to require results from the Consul leader server. In large clusters this creates far more load than the "freshness" of the response is worth. Therefore this patch switches the request to AllowStale=true	2022-08-26 16:05:12 -07:00
Tim Gross	1dc053b917	rename SecureVariables to Variables throughout	2022-08-26 16:06:24 -04:00
Tim Gross	dcfd31296b	file rename	2022-08-26 16:06:24 -04:00
Vladimir Sokolov	b646810401	cli: force periodic job if its id equals search prefix	2022-08-26 10:54:37 -04:00
Luiz Aoqui	ad84b22a72	Post 1.3.4 release (#14329 ) * Generate files for 1.3.4 release * Prepare for next release * Update CHANGELOG.md Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-26 10:09:13 -04:00
Charlie Voiselle	ad737d008b	SV API: return upserted variable to caller (#14325 ) * Return created variable to caller in HTTP and Go APIs * Update tests for returned values	2022-08-25 17:38:15 -04:00
Seth Hoenig	fd9744b9eb	Merge pull request #14301 from hashicorp/b-fix-check-status-test-racey testing: fix flakey check status test	2022-08-25 08:30:46 -05:00
James Rasell	601588df6b	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-25 12:14:29 +01:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. 
* taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle` otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Tim Gross	c732b215f0	vault: detect namespace change in config reload (#14298 ) The `namespace` field was not included in the equality check between old and new Vault configurations, which meant that a Vault config change that only changed the namespace would not be detected as a change and the clients would not be reloaded. Also, the comparison for boolean fields such as `enabled` and `allow_unauthenticated` was on the pointer and not the value of that pointer, which results in spurious reloads in real config reload that is easily missed in typical test scenarios. Includes a minor refactor of the order of fields for `Copy` and `Merge` to match the struct fields in hopes it makes it harder to make this mistake in the future, as well as additional test coverage.	2022-08-24 17:03:29 -04:00
Seth Hoenig	ff59b90d41	testing: fix flakey check status test This PR fixes a flakey test where we did not wait on the check status to actually become failing (go too fast and you just get a pending check). Instead add a helper for waiting on any check in the alloc to become the state we are looking for.	2022-08-24 15:11:41 -05:00
Seth Hoenig	062c817450	cleanup: move fs helpers into escapingfs	2022-08-24 14:45:34 -05:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
James Rasell	2ccc48c167	cli: use policy flag for role creation and update.	2022-08-24 15:15:02 +01:00
James Rasell	7401677e4e	cli: output none when a token has no expiration.	2022-08-24 15:14:49 +01:00
James Rasell	9782d6d7ff	acl: allow tokens to lookup linked roles. (#14227 ) When listing or reading an ACL role, roles linked to the ACL token used for authentication can be returned to the caller.	2022-08-24 13:51:51 +02:00
Luiz Aoqui	7ee3de3ea5	fix minor issues found during ENT merge (#14250 )	2022-08-23 17:22:18 -04:00
Luiz Aoqui	7a8cacc9ec	allocrunner: refactor task coordinator (#14009 ) The current implementation for the task coordinator unblocks tasks by performing destructive operations over its internal state (like closing channels and deleting maps from keys). This presents a problem in situations where we would like to revert the state of a task, such as when restarting an allocation with tasks that have already exited. With this new implementation the task coordinator behaves more like a finite state machine where task may be blocked/unblocked multiple times by performing a state transition. This initial part of the work only refactors the task coordinator and is functionally equivalent to the previous implementation. Future work will build upon this to provide bug fixes and enhancements.	2022-08-22 18:38:49 -04:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
James Rasell	2736cf0dfa	acl: make listing RPC and HTTP API a stub return object. (#14211 ) Making the ACL Role listing return object a stub future-proofs the endpoint. In the event the role object grows, we are not bound by having to return all fields within the list endpoint or change the signature of the endpoint to reduce the list return size.	2022-08-22 17:20:23 +02:00
James Rasell	802d005ef5	acl: add replication to ACL Roles from authoritative region. (#14176 ) ACL Roles along with policies and global token will be replicated from the authoritative region to all federated regions. This involves a new replication loop running on the federated leader. Policies and roles may be replicated at different times, meaning the policies and role references may not be present within the local state upon replication upsert. In order to bypass the RPC and state check, a new RPC request parameter has been added. This is used by the replication process; all other callers will trigger the ACL role policy validation check. There is a new ACL RPC endpoint to allow the reading of a set of ACL Roles which is required by the replication process and matches ACL Policies and Tokens. A bug within the ACL Role listing RPC has also been fixed which returned incorrect data during blocking queries where a deletion had occurred.	2022-08-22 08:54:07 +02:00
Seth Hoenig	88a1353149	cli: display nomad service check status output in CLI commands This PR adds some NSD check status output to the CLI. 1. The 'nomad alloc status' command produces nsd check summary output (if present) 2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)	2022-08-19 09:18:29 -05:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. 
Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Seth Hoenig	c5d36eaa2f	cleanup: fixing warnings and refactoring of command package, part 2 This PR continues the cleanup of the command package, removing linter warnings, refactoring to use helpers, making tests easier to read, etc.	2022-08-18 09:43:39 -05:00
Seth Hoenig	4c1a0d4907	cleanup: first pass at fixing command package warnings This PR is the first of several for cleaning up warnings, and refactoring bits of code in the command package. First pass is over acl_ files and gets some helpers in place.	2022-08-17 15:33:37 -05:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
James Rasell	51a7df50bb	cli: add ability to create and view tokens with ACL role links.	2022-08-17 14:49:52 +01:00
Seth Hoenig	7728cf5a9a	Merge pull request #14132 from hashicorp/build-update-go1.19 build: update to go1.19	2022-08-16 11:20:27 -05:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forcefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Seth Hoenig	56b0b456dc	Merge pull request #14102 from hashicorp/cleanup-mesh-gateway-value cleanup: consul mesh gateway type need not be pointer	2022-08-16 10:07:16 -05:00
Charlie Voiselle	dba6b39815	SV CLI: var init (#13820 ) * Nomad dep: add museli/reflow * SV CLI: var init	2022-08-15 13:43:29 -04:00
Tim Gross	4005759d28	move secure variable conflict resolution to state store (#13922 ) Move conflict resolution implementation into the state store with a new Apply RPC. This also makes the RPC for secure variables much more similar to Consul's KV, which will help us support soft deletes in a post-1.4.0 version of Nomad. Reimplement quotas in the state store functions. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-08-15 11:19:53 -04:00
Seth Hoenig	f9355c29fb	cleanup: consul mesh gateway type need not be pointer This PR changes the use of structs.ConsulMeshGateway to value types instead of via pointers. This will help in a follow up PR where we cleanup a lot of custom comparison code with helper functions instead.	2022-08-13 11:26:58 -05:00
James Rasell	9c97560ded	cli: add new acl role subcommands for CRUD role actions. (#14087 )	2022-08-12 09:52:32 +02:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
Seth Hoenig	1901cfaba8	Merge pull request #14069 from brian-athinkingape/cli-fix-memstats-cgroupsv2 cli: for systems with cgroups v2, fix allocation resource utilization showing 0 memory used	2022-08-11 07:27:48 -05:00
James Rasell	9cd0dd2ff7	http: add ACL Role HTTP endpoints for CRUD actions. These new endpoints are exposed under the /v1/acl/roles and /v1/acl/role endpoints.	2022-08-11 08:44:19 +01:00
Luiz Aoqui	815adbada5	Post 1.3.3 release (#14064 ) * Generate files for 1.3.3 release * Prepare for next release * Merge release 1.3.3 files Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-09 17:27:29 -04:00
Brian Chau	6621bb9db5	cli: for systems with cgroups v2, fix allocation resource utilization showing 0 memory used	2022-08-09 14:09:14 -07:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
James Rasell	bb5b510c9d	cli: do not import structs, use API package only. (#13938 )	2022-08-02 16:33:08 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
James Rasell	663aa92b7a	Merge branch 'main' into f-gh-13120-sso-umbrella	2022-08-02 08:30:03 +01:00
Tim Gross	8404f998f7	fix flaky `TestAgent_ProxyRPC_Dev` test (#13925 ) This test is a fairly trivial test of the agent RPC, but the test setup waits for a short fixed window after the node starts to send the RPC. After looking at detailed logs for recent test failures, it looks like the node registration for the first node doesn't get a chance to happen before we make the RPC call. Use `WaitForResultUntil` to give the test more time to run in slower test environments, while allowing it to finish quickly if possible.	2022-07-28 14:47:15 -04:00
Lars Lehtonen	a80df0480e	testing: fix dropped test errors in command/agent (#13926 )	2022-07-28 11:04:31 -04:00
Seth Hoenig	d8fe1d10ba	cleanup: use constants for on_update values	2022-07-21 13:09:47 -05:00
Seth Hoenig	c61e779b48	Merge pull request #13715 from hashicorp/dev-nsd-checks client: add support for checks in nomad services	2022-07-21 10:22:57 -05:00
Seth Hoenig	67c6336c67	Merge pull request #13870 from hashicorp/exp-fp-optimization client: use test timeouts for network fingerprinters in dev mode	2022-07-21 08:18:02 -05:00
Tim Gross	9c43c28575	search: use secure vars ACL policy for secure vars context (#13788 ) The search RPC used a placeholder policy for searching within the secure variables context. Now that we have ACL policies built for secure variables, we can use them for search. Requires a new loose policy for checking if a token has any secure variables access within a namespace, so that we can filter on specific paths in the iterator.	2022-07-21 08:39:36 -04:00
Seth Hoenig	6f93aca63e	devmode: use minimal network timeouts for network fingerprinters in dev mode	2022-07-20 15:13:14 -05:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849 ) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690 ) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621 ) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections exceeding concurrency limit.	2022-07-20 14:12:21 -04:00
James Rasell	f6d12a3c00	acl: enable configuration and visualisation of token expiration for users (#13846 ) * api: add ACL token expiry params to HTTP API * cli: allow setting and displaying ACL token expiry	2022-07-20 10:06:23 +02:00
hc-github-team-nomad-core	fa09c13016	Generate files for 1.3.2 release	2022-07-13 19:33:41 -04:00
Michael Schurter	d54d90edfa	http: only log alloc/exec errors when non-nil (#13730 )	2022-07-13 09:44:51 -07:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler work and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at one, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Seth Hoenig	297d386bdc	client: add support for checks in nomad services This PR adds support for specifying checks in services registered to the built-in nomad service provider. Currently only HTTP and TCP checks are supported, though more types could be added later.	2022-07-12 17:09:50 -05:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Charlie Voiselle	6be7a41351	SV: CLI: var list command (#13707 ) * SV CLI: var list * Fix wildcard prefix filtering Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-12 12:49:39 -04:00
James Rasell	0292f48396	server: add ACL token expiration config parameters. (#13667 ) This commit adds configuration parameters to control ACL token expirations. This includes both limits on the min and max TTL expiration values, as well as a GC threshold for expired tokens.	2022-07-12 13:43:25 +02:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	555ac432cd	SV: CAS: Implement Check and Set for Delete and Upsert (#13429 ) * SV: CAS * Implement Check and Set for Delete and Upsert * Reading the conflict from the state store * Update endpoint for new error text * Updated HTTP api tests * Conflicts to the HTTP api * SV: structs: Update SV time to UnixNanos * update mock to UnixNano; refactor * SV: encrypter: quote KeyID in error * SV: mock: add mock for namespace w/ SV	2022-07-11 13:34:06 -04:00
Tim Gross	eaf430bfd5	secure variable server configuration (#13307 ) Add fields for configuring root key garbage collection and automatic rotation. Fix the keystore path so that we write to a tempdir when in dev mode.	2022-07-11 13:34:06 -04:00
Tim Gross	4d011d4c53	move gossip keyring command to their own subcommands (#13383 ) Move all the gossip keyring and key generation commands under `operator gossip keyring` subcommands to align with the new `operator secure-variables keyring` subcommands. Deprecate the `operator keyring` and `operator keygen` commands.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Charlie Voiselle	06c6a950c4	Secure Variables: Separate Encrypted and Decrypted structs (#13355 ) This PR splits SecureVariable into SecureVariableDecrypted and SecureVariableEncrypted in order to use the type system to help verify that cleartext secret material is not committed to file. * Make Encrypt function return KeyID * Split SecureVariable Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:05 -04:00
Tim Gross	56deb6f8cc	keyring CLI: refactor to use subcommands (#13351 ) Split the flag options for the `secure-variables keyring` into their own subcommands. The gossip keyring CLI will be similarly refactored and the old version will be deprecated.	2022-07-11 13:34:05 -04:00
Tim Gross	81b0c4fd36	keyring command line (#13169 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-07-11 13:34:04 -04:00
Charlie Voiselle	619e0cbafd	Don't write a SecureVariable with no Items (#13258 )	2022-07-11 13:34:04 -04:00
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	f2ee585830	bootstrap keyring (#13124 ) When a server becomes leader, it will check if there are any keys in the state store, and create one if there is not. The key metadata will be replicated via raft to all followers, who will then get the key material via key replication (not implemented in this changeset).	2022-07-11 13:34:04 -04:00
Charlie Voiselle	3717688f3e	Secure Variables: Variables - State store, FSM, RPC (#13098 ) * Secure Variables: State Store * Secure Variables: FSM * Secure Variables: RPC * Secure Variables: HTTP API Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	c6929a6c1e	keyring HTTP API (#13077 )	2022-07-11 13:34:04 -04:00
Charlie Voiselle	2019eab2c8	Provide mock secure variables implementation (#12980 ) * Add SecureVariable mock * Add SecureVariableStub * Add SecureVariable Copy and Stub funcs	2022-07-11 13:34:03 -04:00
Tim Gross	b6dd1191b2	snapshot restore-from-archive streaming and filtering (#13658 ) Stream snapshot to FSM when restoring from archive The `RestoreFromArchive` helper decompresses the snapshot archive to a temporary file before reading it into the FSM. For large snapshots this performs a lot of disk IO. Stream decompress the snapshot as we read it, without first writing to a temporary file. Add bexpr filters to the `RestoreFromArchive` helper. The operator can pass these as `-filter` arguments to `nomad operator snapshot state` (and other commands in the future) to include only desired data when reading the snapshot.	2022-07-11 10:48:00 -04:00
James Rasell	353323d171	agent: test full object when performing test config parse. (#13668 )	2022-07-11 16:21:36 +02:00
James Rasell	9eb63c9e03	cli: ensure node status and drain use correct cmd name. (#13656 )	2022-07-11 09:50:42 +02:00
Luiz Aoqui	03433dd8af	cli: improve output of eval commands (#13581 ) Use the same output format when listing multiple evals in the `eval list` command and when `eval status <prefix>` matches more than one eval. Include the eval namespace in all output formats and always include the job ID in `eval status` since, even `node-update` evals are related to a job. Add Node ID to the evals table output to help differentiate `node-update` evals. Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 13:13:34 -04:00
Tim Gross	1fc8995590	query for leader in `operator debug` command (#13472 ) The `operator debug` command doesn't output the leader anywhere in the output, which adds extra burden to offline debugging (away from an ongoing incident where you can simply check manually). Query the `/v1/status/leader` API but degrade gracefully.	2022-07-06 10:57:44 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Seth Hoenig	97726c2fd8	Merge pull request #12862 from hashicorp/f-choose-services api: enable selecting subset of services using rendezvous hashing	2022-06-30 15:17:40 -05:00
Yoan Blanc	3d96145ea5	feat: docker/docker/pkg/term has been deprecated in favor of moby/term See https://github.com/moby/moby/pull/40825 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-06-26 15:35:27 +02:00
Seth Hoenig	9467bc9eb3	api: enable selecting subset of services using rendezvous hashing This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint. The value of 'choose' is in the form '<number>|<key>', number is the number of desired services and key is a value unique but consistent to the requester (e.g. allocID). Folks aren't really expected to use this API directly, but rather through consul-template which will soon be getting a new helper function making use of this query parameter. Example, curl 'localhost:4646/v1/service/redis?choose=2|abc123' Note: consul-template v0.29.1 includes the necessary nomadServices functionality.	2022-06-25 10:37:37 -05:00
Seth Hoenig	91e08d5e23	core: remove support for raft protocol version 2 This PR checks server config for raft_protocol, which must now be set to 3 or unset (0). When unset, version 3 is used as the default.	2022-06-23 14:37:50 +00:00
Derek Strickland	9de4d7367c	cli: fix detach handling (#13405 ) Fix detach handling for: - `deployment fail` - `deployment promote` - `deployment resume` - `deployment unblock` - `job promote`	2022-06-21 06:01:23 -04:00
Joseph Martin	4aa96d5bfc	Return evalID if `-detach` flag is passed to job revert (#13364 ) * Return evalID if `-detach` flag is passed to job revert	2022-06-15 14:20:29 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Derek Strickland	34dea90d7a	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
Kevin Schoonover	544c276128	parse ACL token from authorization header (#12534 )	2022-06-06 15:51:02 -04:00
Karan Sharma	9426be01fc	feat: Warn if bootstrap_expect is even number (#12961 )	2022-06-06 15:22:59 +02:00
Lance Haig	4bf27d743d	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	7d15157635	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Seth Hoenig	54efec5dfe	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Jorge Marey	f966614602	Allow setting tagged addresses on services	2022-05-31 10:06:55 -05:00

3595 Commits