open-nomad

Author	SHA1	Message	Date
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	973b474b3c	provide state store query for variables by key ID (#13195 ) The core jobs to garbage collect unused keys and perform full key rotations will need to be able to query secure variables by key ID for efficiency. Add an index to the state store and associated query function and test.	2022-07-11 13:34:04 -04:00
Tim Gross	f2ee585830	bootstrap keyring (#13124 ) When a server becomes leader, it will check if there are any keys in the state store, and create one if there is not. The key metadata will be replicated via raft to all followers, who will then get the key material via key replication (not implemented in this changeset).	2022-07-11 13:34:04 -04:00
Charlie Voiselle	3717688f3e	Secure Variables: Variables - State store, FSM, RPC (#13098 ) * Secure Variables: State Store * Secure Variables: FSM * Secure Variables: RPC * Secure Variables: HTTP API Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	2f0fd556ad	keyring RPC handlers (#13075 ) Implement the upsert, list, delete, and rotate RPC handlers for the secure variables keyring. Operations on the keyring itself are still stubbed out.	2022-07-11 13:34:04 -04:00
Tim Gross	b1dc6dcef0	keyring state store operations (#13016 ) Implement the basic upsert, list, and delete operations for `RootKeyMeta` needed by the Keyring RPCs. This changeset also implements two convenience methods `RootKeyMetaByID` and `GetActiveRootKeyMeta` which are useful for testing but also will be needed to implement the rest of the RPCs.	2022-07-11 13:34:04 -04:00
Charlie Voiselle	2019eab2c8	Provide mock secure variables implementation (#12980 ) * Add SecureVariable mock * Add SecureVariableStub * Add SecureVariable Copy and Stub funcs	2022-07-11 13:34:03 -04:00
Tim Gross	d29e85d150	secure variables: initial state store (#12932 ) Implement the core SecureVariable and RootKey structs in memdb, provide the minimal skeleton for FSM, and a dummy storage and keyring RPC endpoint.	2022-07-11 13:34:01 -04:00
Tim Gross	b6dd1191b2	snapshot restore-from-archive streaming and filtering (#13658 ) Stream snapshot to FSM when restoring from archive The `RestoreFromArchive` helper decompresses the snapshot archive to a temporary file before reading it into the FSM. For large snapshots this performs a lot of disk IO. Stream decompress the snapshot as we read it, without first writing to a temporary file. Add bexpr filters to the `RestoreFromArchive` helper. The operator can pass these as `-filter` arguments to `nomad operator snapshot state` (and other commands in the future) to include only desired data when reading the snapshot.	2022-07-11 10:48:00 -04:00
Seth Hoenig	239eaf9a29	Merge pull request #13626 from hashicorp/b-client-max-kill-timeout client: enforce max_kill_timeout client configuration	2022-07-07 13:44:39 -05:00
Luiz Aoqui	85908415f9	state: fix eval list by prefix with * namespace (#13551 )	2022-07-07 14:21:51 -04:00
Michael Schurter	f21272065d	core: emit node evals only for sys jobs in dc (#12955 ) Whenever a node joins the cluster, either for the first time or after being `down`, we emit an evaluation for every system job to ensure all applicable system jobs are running on the node. This patch adds an optimization to skip creating evaluations for system jobs not in the current node's DC. While the scheduler performs the same feasibility check, skipping the creation of the evaluation altogether saves disk, network, and memory.	2022-07-06 14:35:18 -07:00
Seth Hoenig	5dd8aa3e27	client: enforce max_kill_timeout client configuration This PR fixes a bug where client configuration max_kill_timeout was not being enforced. The feature was introduced in 9f44780 but seems to have been removed during the major drivers refactoring. We can make sure the value is enforced by plumbing it through the DriverHandler, which now uses the lesser of the task.killTimeout or client.maxKillTimeout. Also updates Event.SetKillTimeout to require both the task.killTimeout and client.maxKillTimeout so that we don't make the mistake of using the wrong value - as it was being given only the task.killTimeout before.	2022-07-06 15:29:38 -05:00
Luiz Aoqui	a9a66ad018	api: apply new ACL check for wildcard namespace (#13608 ) api: apply new ACL check for wildcard namespace In #13606 the ACL check was refactored to better support the all namespaces wildcard (`*`). This commit applies the changes to the jobs and alloc list endpoints.	2022-07-06 16:17:16 -04:00
Luiz Aoqui	74c5578432	api: refactor ACL check for namespace wildcard (#13606 ) Improve how the all namespaces wildcard (`*`) is handled when checking ACL permissions. When using the wildcard namespace the `AllowNsOp` would return false since it looks for a namespace called `*` to match. This commit changes this behavior to return `true` when the queried namespace is `*` and the token allows the operation in _any_ namespace. Actual permission must be checked per object. The helper function `AllowNsOpFunc` returns a function that can be used to make this verification.	2022-07-06 15:22:30 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Seth Hoenig	97726c2fd8	Merge pull request #12862 from hashicorp/f-choose-services api: enable selecting subset of services using rendezvous hashing	2022-06-30 15:17:40 -05:00
Michael Schurter	1cc0ae8795	docs: fix Plan{,Result}.NodeUpdate comment (#13534 ) It appears way back when this was first implemented in 9a917281af9c0a97a6c59575eaa52c5c86ffc60d, it was renamed from NodeEvict (with a correct comment) to NodeUpdate. The comment was changed from referring to only evictions to referring to "all allocs" in the first sentence and "stop or evict" in the second. This confuses every time I see it because I read the name (NodeUpdate) and first sentence ("all the allocs") and assume this represents all allocations... which isn't true. I'm going to assume I'm the only one who doesn't read the 2nd sentence and that's why this suboptimal wording has lasted 7 years, but can we change it for my sake?	2022-06-30 12:47:14 -07:00
James Rasell	d080eed9ae	client: fixed a problem calculating a service namespace. (#13493 ) When calculating a services namespace for registration, the code assumed the first task within the task array would include a service block. This is incorrect as it is possible only a latter task within the array contains a service definition. This change fixes the logic, so we correctly search for a service definition before identifying the namespace.	2022-06-28 09:47:28 +02:00
Shishir Mahajan	6ba8245283	Fix typo: orthogonal. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-06-27 12:12:51 -07:00
Seth Hoenig	9467bc9eb3	api: enable selecting subset of services using rendezvous hashing This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint. The value of 'choose' is in the form '<number>|<key>', number is the number of desired services and key is a value unique but consistent to the requester (e.g. allocID). Folks aren't really expected to use this API directly, but rather through consul-template which will soon be getting a new helper function making use of this query parameter. Example, curl 'localhost:4646/v1/service/redis?choose=2|abc123' Note: consul-template v0.29.1 includes the necessary nomadServices functionality.	2022-06-25 10:37:37 -05:00
Tim Gross	4368dcc02f	fix deadlock in plan_apply (#13407 ) The plan applier has to get a snapshot with a minimum index for the plan it's working on in order to ensure consistency. Under heavy raft loads, we can exceed the timeout. When this happens, we hit a bug where the plan applier blocks waiting on the `indexCh` forever, and all schedulers will block in `Plan.Submit`. Closing the `indexCh` when the `asyncPlanWait` is done with it will prevent the deadlock without impacting correctness of the previous snapshot index. This changeset includes a PoC failing test that works by injecting a large timeout into the state store. We need to turn this into a test we can run normally without breaking the state store before we can merge this PR. Increase `snapshotMinIndex` timeout to 10s. This timeout creates backpressure where any concurrent `Plan.Submit` RPCs will block waiting for results. This sheds load across all servers and gives raft some CPU to catch up, because schedulers won't dequeue more work while waiting. Increase it to 10s based on observations of large production clusters.	2022-06-23 12:06:27 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Tim Gross	9d5523a72d	CSI: skip node unpublish on GC'd or down nodes (#13301 ) If the node has been GC'd or is down, we can't send it a node unpublish. The CSI spec requires that we don't send the controller unpublish before the node unpublish, but in the case where a node is gone we can't know the final fate of the node unpublish step. The `csi_hook` on the client will unpublish if the allocation has stopped and if the host is terminated there's no mount for the volume anyways. So we'll now assume that the node has unpublished at its end. If it hasn't, any controller unpublish will potentially hang or error and need to be retried.	2022-06-09 11:33:22 -04:00
James Rasell	f5e78a3791	state: only update index on change when deleting evals. (#13227 ) When deleting evaluations and allocations during a reap event, the index table entries for evals and allocs were updated regardless of whether changes were made. This change modifies the state logic so that the index table is only modified when the corresponding table has actually been modified. Along with matching expected behaviour, this change has the potential to reduce the number of times blocking queries will return without any real state change.	2022-06-07 11:56:43 +02:00
Lance Haig	4bf27d743d	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	7d15157635	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Seth Hoenig	0399b7e4c5	Merge pull request #12951 from jorgemarey/f-srv-tagged-addresses Allow setting tagged addresses on services	2022-06-01 10:51:49 -05:00
Seth Hoenig	189176f052	consul: avoid reflection in comparing service map types	2022-06-01 10:22:00 -05:00
Tim Gross	6873670dd6	refactor index threshold calculation for core GC jobs (#13196 ) Almost all GC jobs check the index of the objects being GC'd to see if they're older than a configured threshold. This code was repeated six times in `CoreScheduler` with only logging changes, so it seems safe to extract it as its own method.	2022-06-01 11:12:20 -04:00
Seth Hoenig	dca954faac	build: update golangci-lint to v1.46.2 This version of golangci-lint improves support for generics, but also is more strict in copy vs. loop for slice copying.	2022-05-31 23:32:01 +00:00
Seth Hoenig	54efec5dfe	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Jorge Marey	f966614602	Allow setting tagged addresses on services	2022-05-31 10:06:55 -05:00
Seth Hoenig	4631045d83	connect: enable setting connect upstream destination namespace	2022-05-26 09:39:36 -05:00
Luiz Aoqui	769ff1dcc3	Merge pull request #13109 from hashicorp/merge-release-1.3.1-branch Merge release 1.3.1 branch	2022-05-25 10:45:09 -04:00
Seth Hoenig	626a345fb2	ci: switch to 22.04 LTS for GHA Core CI tests	2022-05-25 08:19:40 -05:00
Michael Schurter	2965dc6a1a	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Luiz Aoqui	0a00059f3c	core: test duplicated blocked eval stats In the original test, the eval generator would use a random value for the job ID, resulting in an unexercised code path for duplicate blocked evals.	2022-05-24 15:44:06 -04:00
Seth Hoenig	a5943da0c7	core: add tests for blocked evals math	2022-05-24 09:05:18 -05:00
Seth Hoenig	0c145ac1e4	core: remove correct set of resources on blocked eval	2022-05-23 15:18:55 -05:00
Seth Hoenig	fc58f4972c	cli: correctly use and validate job with vault token set This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN', '-vault-namespace' if set.	2022-05-19 12:13:34 -05:00
Eng Zer Jun	97d1bc735c	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
James Rasell	9ea1a6faf6	fsm: add service registration snapshot persistence. (#12896 )	2022-05-06 15:53:27 +02:00
James Rasell	a05114fdac	core: add namespace to plan for node rejected log line. (#12868 )	2022-05-05 10:56:40 +02:00
Tim Gross	45b238ec82	CSI: node drain should end once only plugins remain (#12846 ) In #12324 we made it so that plugins wait until the node drain is complete, as we do for system jobs. But we neglected to mark the node drain as complete once only plugins (or system jobs) remain, which means that the node drain is left in a draining state until the `deadline` time expires. This was incorrectly documented as expected behavior in #12324.	2022-05-03 10:20:22 -04:00
Luiz Aoqui	a8cc633156	vault: revert support for entity aliases (#12723 ) After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading themselves as another Vault entity.	2022-04-22 10:46:34 -04:00
Seth Hoenig	c4aab10e53	services: cr followup	2022-04-22 09:14:29 -05:00
Seth Hoenig	3fcac242c6	services: enable setting arbitrary address value in service registrations This PR introduces the `address` field in the `service` block so that Nomad or Consul services can be registered with a custom `.Address.` to advertise. The address can be an IP address or domain name. If the `address` field is set, the `service.address_mode` must be set in `auto` mode.	2022-04-22 09:14:29 -05:00
James Rasell	716b8e658b	api: Add support for filtering and pagination to the node list endpoint (#12727 )	2022-04-21 17:04:33 +02:00
Derek Strickland	5e309f3f33	reconciler: Handle canaries when client disconnects (#12539 ) * plan_apply: Allow node updates in disconnected node plans * plan: Keep the job when persisting unknown allocs * reconciler: stop unknown allocs when stopping all * reconcile_util: reorder filtering to handle canaries; skip rescheduling unknown * heartbeat: Fix bug in node heartbeating	2022-04-21 10:05:58 -04:00
James Rasell	257e1c4f96	autopilot: correctly return errors within state functions. (#12714 )	2022-04-21 08:54:50 +02:00
James Rasell	010acce59f	job_hooks: add implicit constraint when using Consul for services. (#12602 )	2022-04-20 14:09:13 +02:00
Seth Hoenig	411158acff	Merge pull request #12586 from hashicorp/f-local-si-token connect: create SI tokens in local scope	2022-04-19 07:53:01 -05:00
chavacava	eb1c42e643	QueryOptions.SetTimeToBlock should take pointer receiver Fixes a bug where blocking queries that are retried don't have their blocking timeout reset, resulting in them running longer than expected.	2022-04-18 10:41:27 -04:00
Seth Hoenig	df587d8263	docs: update documentation with connect acls changes This PR updates the changelog, adds notes to the 1.3 upgrade guide, and updates the connect integration docs with documentation about the new requirement on Consul ACL policies of Consul agent default anonymous ACL tokens.	2022-04-18 08:22:33 -05:00
Jorge Marey	707c7f3a11	Change consul SI tokens to be local	2022-04-18 08:22:33 -05:00
Shishir	f5121d261e	Add os to NodeListStub struct. (#12497 ) * Add os to NodeListStub struct. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add test: os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-04-15 17:22:45 -07:00
Tim Gross	826d9d47f9	CSI: replace structs->api with serialization extension (#12583 ) The CSI HTTP API has to transform the CSI volume to redact secrets, remove the claims fields, and to consolidate the allocation stubs into a single slice of alloc stubs. This was done manually in #8590 but this is a large amount of code and has proven both very bug prone (see #8659, #8666, #8699, #8735, and #12150) and requires updating lots of code every time we add a field to volumes or plugins. In #10202 we introduce encoding improvements for the `Node` struct that allow a more minimal transformation. Apply this same approach to serializing `structs.CSIVolume` to API responses. Also, the original reasoning behind #8590 for plugins no longer holds because the counts are now denormalized within the state store, so we can simply remove this transformation entirely.	2022-04-15 14:29:34 -04:00
Derek Strickland	4d3a0aae6d	heartbeat: Handle transitioning from disconnected to down (#12559 )	2022-04-15 09:47:45 -04:00
Derek Strickland	0891218ee9	system_scheduler: support disconnected clients (#12555 ) * structs: Add helper method for checking if alloc is configured to disconnect * system_scheduler: Add support for disconnected clients	2022-04-15 09:31:32 -04:00
James Rasell	4cdc46ae75	service discovery: add pagination and filtering support to info requests (#12552 ) * services: add pagination and filter support to info RPC. * cli: add filter flag to service info command. * docs: add pagination and filter details to services info API. * paginator: minor updates to comment and func signature.	2022-04-13 07:41:44 +02:00
Luiz Aoqui	82027edb2f	add some godocs for the API pagination tokenizer options (#12547 )	2022-04-12 10:27:22 -04:00
Tim Gross	4078e6ea0e	scripts: fix interpreter for bash (#12549 ) Many of our scripts have a non-portable interpreter line for bash and use bash-specific variables like `BASH_SOURCE`. Update the interpreter line to be portable between various Linuxes and macOS without complaint from posix shell users.	2022-04-12 10:08:21 -04:00
Yoan Blanc	5e8254beda	feat: remove dependency to consul/lib Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-09 13:22:44 +02:00
Lars Lehtonen	df1edf5cf4	nomad/state: fix dropped test errors (#12406 )	2022-04-07 10:48:10 -04:00
Derek Strickland	065b3ed886	plan_apply: Add missing unit test for validating plans for disconnected clients (#12495 )	2022-04-07 09:58:09 -04:00
Jasmine Dahilig	38efb3c8d8	metrics: emit stats for vault token next_renewal & last_renewal #5222 (#12435 )	2022-04-06 10:03:11 -07:00
Jorge Marey	96dd3f53c6	Fix in-place updates over ineligible nodes (#12264 )	2022-04-06 11:30:40 -04:00
Derek Strickland	0ab89b1728	Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling disconnected clients: Feature branch merge	2022-04-06 10:11:57 -04:00
Derek Strickland	d1d6009e2c	disconnected clients: Support operator manual interventions (#12436 ) * allocrunner: Remove Shutdown call in Reconnect * Node.UpdateAlloc: Stop orphaned allocs. * reconciler: Stop failed reconnects. * Apply feedback from code review. Handle rebase conflict. * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-06 09:33:32 -04:00
Seth Hoenig	2e2ff3f75e	Merge pull request #12419 from hashicorp/exec-cleanup raw_exec: make raw exec driver work with cgroups v2	2022-04-05 16:42:01 -05:00
Derek Strickland	43d20ebdbd	disconnected clients: `TaskGroup` validation (#12418 ) * TaskGroup: Validate that max_client_disconnect and stop_after_client_disconnect are mutually exclusive.	2022-04-05 17:14:50 -04:00
Derek Strickland	bd719bc7b8	reconciler: 2 phase reconnects and tests (#12333 ) * structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by. * node_endpoint: Emit new eval for reconnecting unknown allocs. * filterByTainted: handle 2 phase commit filtering rules. * reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects. * allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.	2022-04-05 17:13:10 -04:00
Derek Strickland	bb376320a2	comments: update some stale comments referencing deprecated config name (#12271 ) * comments: update some stale comments referencing deprecated config name	2022-04-05 17:12:23 -04:00
Derek Strickland	fd04a24ac7	disconnected clients: ensure servers meet minimum required version (#12202 ) * planner: expose ServerMeetsMinimumVersion via Planner interface * filterByTainted: add flag indicating disconnect support * allocReconciler: accept and pass disconnect support flag * tests: update dependent tests	2022-04-05 17:12:23 -04:00
Derek Strickland	8e9f8be511	`MaxClientDisconnect` Jobspec checklist (#12177 ) * api: Add struct, conversion function, and tests * TaskGroup: Add field, validation, and tests * diff: Add diff handler and test * docs: Update docs	2022-04-05 17:12:23 -04:00
Derek Strickland	d7f44448e1	disconnected clients: Observability plumbing (#12141 ) * Add disconnects/reconnect to log output and emit reschedule metrics * TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics	2022-04-05 17:12:23 -04:00
Derek Strickland	3cbd76ea9d	disconnected clients: Add reconnect task event (#12133 ) * Add TaskClientReconnectedEvent constant * Add allocRunner.Reconnect function to manage task state manually * Removes server-side push	2022-04-05 17:12:23 -04:00
DerekStrickland	da9bc350c8	evaluateNodePlan: validate plans for disconnected nodes	2022-04-05 17:12:22 -04:00
DerekStrickland	73fdf5a919	NodeStatusDisconnected: support state transitions for new node status	2022-04-05 17:12:18 -04:00
Derek Strickland	b128769e19	reconciler: support disconnected clients (#12058 ) * Add merge helper for string maps * structs: add statuses, MaxClientDisconnect, and helper funcs * taintedNodes: Include disconnected nodes * upsertAllocsImpl: don't use existing ClientStatus when upserting unknown * allocSet: update filterByTainted and add delayByMaxClientDisconnect * allocReconciler: support disconnecting and reconnecting allocs * GenericScheduler: upsert unknown and queue reconnecting Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-05 17:10:37 -04:00
Seth Hoenig	e0d5845fda	raw_exec: fixup review comments	2022-04-05 15:21:28 -05:00
Luiz Aoqui	ab7eb5de6e	Support Vault entity aliases (#12449 ) Move some common Vault API data struct decoding out of the Vault client so it can be reused in other situations. Make Vault job validation its own function so it's easier to expand it. Rename the `Job.VaultPolicies` method to just `Job.Vault` since it returns the full Vault block, not just their policies. Set `ChangeMode` on `Vault.Canonicalize`. Add some missing tests. Allows specifying an entity alias that will be used by Nomad when deriving the task Vault token. An entity alias assigns an identity to a token, allowing better control and management of Vault clients since all tokens with the same identity alias will now be considered the same client. This helps track Nomad activity in Vault's audit logs and better control over Vault billing. Add support for a new Nomad server configuration to define a default entity alias to be used when deriving Vault tokens. This default value will be used if the task doesn't have an entity alias defined.	2022-04-05 14:18:10 -04:00
James Rasell	e2b730d7c9	Merge pull request #12454 from hashicorp/f-rename-service-event-stream events: add service API logic and rename topic to service from serviceregistration	2022-04-05 16:19:14 +02:00
Grant Griffiths	18a0a2c9a4	CSI: Add secrets flag support for delete volume (#11245 )	2022-04-05 08:59:11 -04:00
James Rasell	cc7b448d63	events: fixup service events and rename topic to service.	2022-04-05 08:25:22 +01:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Michael Schurter	a1f1294dc4	Merge pull request #12442 from hashicorp/f-sd-add-mixed-auth-read-endpoints service-disco: add mixed auth to list and read RPC endpoints.	2022-04-04 12:19:29 -07:00
Tim Gross	759310d13a	CSI: volume watcher shutdown fixes (#12439 ) The volume watcher design was based on deploymentwatcher and drainer, but has an important difference: we don't want to maintain a goroutine for the lifetime of the volume. So we stop the volumewatcher goroutine for a volume when that volume has no more claims to free. But the shutdown races with updates on the parent goroutine, and it's possible to drop updates. Fortunately these updates are picked up on the next core GC job, but we're most likely to hit this race when we're replacing an allocation and that's the time we least want to wait. Wait until the volume has "settled" before stopping this goroutine so that the race between shutdown and the parent goroutine sending on `<-updateCh` is pushed to after the window we most care about quick freeing of claims. * Fixes a resource leak when volumewatchers are no longer needed. The volume is nil and can't ever be started again, so the volume's `watcher` should be removed from the top-level `Watcher`. * De-flakes the GC job test: the test throws an error because the claimed node doesn't exist and is unreachable. This flaked instead of failed because we didn't correctly wait for the first pass through the volumewatcher. Make the GC job wait for the volumewatcher to reach the quiescent timeout window state before running the GC eval under test, so that we're sure the GC job's work isn't being picked up by processing one of the earlier claims. Update the claims used so that we're sure the GC pass won't hit a node unpublish error. * Adds trace logging to unpublish operations	2022-04-04 10:46:45 -04:00
James Rasell	ea4d7366fc	service-disco: add mixed auth to list and read RPC endpoints. In the same manner as the delete RPC, the list and read service registration endpoints can be called either by external operators or Nomad nodes. The latter occurs when a template is being rendered which includes Nomad API template funcs. In this case, the auth token is looked up as the node secret ID for auth.	2022-04-04 13:45:43 +01:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Tim Gross	de191e8068	Test lint touchup (#12434 ) * lint: require should not be aliased in core_sched_test * lint: require should not be aliased in volumes_watcher_test * testing: don't alias state package in core_sched_test	2022-04-01 15:17:58 -04:00
Tim Gross	03c1904112	csi: allow `namespace` field to be passed in volume spec (#12400 ) Use the volume spec's `namespace` field to override the value of the `-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.	2022-03-29 14:46:39 -04:00
Tim Gross	a6652bffad	CSI: reorder controller volume detachment (#12387 ) In #12112 and #12113 we solved for the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec but plugins are supposed to handle it safely. But if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly. Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed. This has a couple side-effects that also had to get fixed here: * Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fallthrough to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain but now that we've fixed that we may want to revisit it in future work. * We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases. * Several tests had to be updated because they were assuming we fail in a particular order that we're no longer doing.	2022-03-29 09:44:00 -04:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
Luiz Aoqui	e7382a6b45	tests: fix rpc limit tests (#12364 )	2022-03-24 15:31:47 -04:00
Michael Schurter	654d458960	core: add deprecated mvn tag to serf (#12327 ) Revert a small part of #11600 after @lgfa29 discovered it would break compatibility with Nomad <= v1.2! Nomad <= v1.2 expects the `vsn` tag to exist in Serf. It has always been `1`. It has no functional purpose. However it causes a parsing error if it is not set: https://github.com/hashicorp/nomad/blob/v1.2.6/nomad/util.go#L103-L108 This means Nomad servers at version v1.2 or older will not allow servers without this tag to join. The `mvn` minor version tag is also checked, but soft fails. I'm not setting that because I want as much of this cruft gone as possible.	2022-03-24 14:44:21 -04:00
Luiz Aoqui	64b558c14c	core: store and check for Raft version changes (#12362 ) Downgrading the Raft version protocol is not a supported operation. Checking for a downgrade is hard since this information is not stored in any persistent place. When a server re-joins a cluster with a prior Raft version, the Serf tag is updated so Nomad can't tell that the version changed. Mixed version clusters must be supported to allow for zero-downtime rolling upgrades. During this it's expected that the cluster will have mixed Raft versions. Enforcing strong version consistency would disrupt this flow. The approach taken here is to store the Raft version on disk. When the server starts the `raft_protocol` value is written to the file `data_dir/raft/version`. If that file already exists, its content is checked against the current `raft_protocol` value to detect downgrades and prevent the server from starting. Any other types of errors are ignored to prevent disruptions that are outside the control of operators. The only option in cases of an invalid or corrupt file would be to delete it, making this check useless. So just overwrite its content with the new version and provide guidance on how to check that their cluster is in an expected state.	2022-03-24 14:42:00 -04:00


4055 commits