open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	c7a11a86c6	block deleting namespaces if the namespace contains a volume (#13880 ) When we delete a namespace, we check to ensure that there are no non-terminal jobs, which effectively covers evals, allocs, etc. CSI volumes are also namespaced, so extend this check to cover CSI volumes.	2022-07-21 16:13:52 -04:00
Tim Gross	cfa2cb140e	fsm: one-time token expiration should be deterministic (#13737 ) When applying a raft log to expire ACL tokens, we need to use a timestamp provided by the leader so that the result is deterministic across servers. Use leader's timestamp from RPC call	2022-07-18 14:19:29 -04:00
Tim Gross	aa15e0fe7e	secure vars: updates should reduce quota tracking if smaller (#13742 ) When secure variables are updated, we were adding the update to the existing quota tracking without first checking whether it was an update to an existing variable. In that case we need to add/subtract only the difference between the new and existing quota usage.	2022-07-15 11:08:53 -04:00
Tim Gross	cc9fb1c876	keyring: upserting key metadata in FSM must be deterministic (#13733 )	2022-07-14 08:38:14 -04:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	555ac432cd	SV: CAS: Implement Check and Set for Delete and Upsert (#13429 ) * SV: CAS * Implement Check and Set for Delete and Upsert * Reading the conflict from the state store * Update endpoint for new error text * Updated HTTP api tests * Conflicts to the HTTP api * SV: structs: Update SV time to UnixNanos * update mock to UnixNano; refactor * SV: encrypter: quote KeyID in error * SV: mock: add mock for namespace w/ SV	2022-07-11 13:34:06 -04:00
Tim Gross	8a50d2c3e8	implement quota tracking for secure variables (#13453 ) We need to track per-namespace storage usage for secure variables even in Nomad OSS so that a cluster can be seamlessly upgraded from OSS to ENT without having to re-calculate quota usage. Provide a hook in the upsert RPC for enforcement of quotas in ENT. This will be a no-op in Nomad OSS.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Charlie Voiselle	06c6a950c4	Secure Variables: Separate Encrypted and Decrypted structs (#13355 ) This PR splits SecureVariable into SecureVariableDecrypted and SecureVariableEncrypted in order to use the type system to help verify that cleartext secret material is not committed to file. * Make Encrypt function return KeyID * Split SecureVariable Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:05 -04:00
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	973b474b3c	provide state store query for variables by key ID (#13195 ) The core jobs to garbage collect unused keys and perform full key rotations will need to be able to query secure variables by key ID for efficiency. Add an index to the state store and associated query function and test.	2022-07-11 13:34:04 -04:00
Charlie Voiselle	3717688f3e	Secure Variables: Variables - State store, FSM, RPC (#13098 ) * Secure Variables: State Store * Secure Variables: FSM * Secure Variables: RPC * Secure Variables: HTTP API Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	b1dc6dcef0	keyring state store operations (#13016 ) Implement the basic upsert, list, and delete operations for `RootKeyMeta` needed by the Keyring RPCs. This changeset also implements two convenience methods `RootKeyMetaByID` and `GetActiveRootKeyMeta` which are useful for testing but also will be needed to implement the rest of the RPCs.	2022-07-11 13:34:04 -04:00
Tim Gross	d29e85d150	secure variables: initial state store (#12932 ) Implement the core SecureVariable and RootKey structs in memdb, provide the minimal skeleton for FSM, and a dummy storage and keyring RPC endpoint.	2022-07-11 13:34:01 -04:00
Luiz Aoqui	85908415f9	state: fix eval list by prefix with * namespace (#13551 )	2022-07-07 14:21:51 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	f5e78a3791	state: only update index on change when deleting evals. (#13227 ) When deleting evaluations and allocations during a reap event, the index table entries for evals and allocs were updated regardless of whether changes were made. This change modifies the state logic so that the index table is only modified when the corresponding table has actually been modified. Along with matching expected behaviour, this change has the potential to reduce the number of times blocking queries will return without any real state change.	2022-06-07 11:56:43 +02:00
James Rasell	257e1c4f96	autopilot: correctly return errors within state functions. (#12714 )	2022-04-21 08:54:50 +02:00
James Rasell	4cdc46ae75	service discovery: add pagination and filtering support to info requests (#12552 ) * services: add pagination and filter support to info RPC. * cli: add filter flag to service info command. * docs: add pagination and filter details to services info API. * paginator: minor updates to comment and func signature.	2022-04-13 07:41:44 +02:00
Luiz Aoqui	82027edb2f	add some godocs for the API pagination tokenizer options (#12547 )	2022-04-12 10:27:22 -04:00
Lars Lehtonen	df1edf5cf4	nomad/state: fix dropped test errors (#12406 )	2022-04-07 10:48:10 -04:00
Derek Strickland	d7f44448e1	disconnected clients: Observability plumbing (#12141 ) * Add disconnects/reconnect to log output and emit reschedule metrics * TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics	2022-04-05 17:12:23 -04:00
Derek Strickland	b128769e19	reconciler: support disconnected clients (#12058 ) * Add merge helper for string maps * structs: add statuses, MaxClientDisconnect, and helper funcs * taintedNodes: Include disconnected nodes * upsertAllocsImpl: don't use existing ClientStatus when upserting unknown * allocSet: update filterByTainted and add delayByMaxClientDisconnect * allocReconciler: support disconnecting and reconnecting allocs * GenericScheduler: upsert unknown and queue reconnecting Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-05 17:10:37 -04:00
James Rasell	cc7b448d63	events: fixup service events and rename topic to service.	2022-04-05 08:25:22 +01:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Tim Gross	2a2ebd0537	CSI: presentation improvements (#12325 ) * Fix plugin capability sorting. The `sort.StringSlice` method in the stdlib doesn't actually sort, but instead constructs a sorting type which you call `Sort()` on. * Sort allocations for plugins by modify index. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad plugin status :id`, just like `nomad job status :id`. * Sort allocations for volumes in HTTP response. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad volume status :id`, just like `nomad job status :id`. This is implemented in the HTTP response and not in the state store because the state store maintains two separate lists of allocs that are merged before sending over the API. * Fix length of alloc IDs in `nomad volume status` output	2022-03-22 09:48:38 -04:00
Luiz Aoqui	15089f055f	api: add related evals to eval details (#12305 ) The `related` query param is used to indicate that the request should return a list of related (next, previous, and blocked) evaluations. Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>	2022-03-17 13:56:14 -04:00
Luiz Aoqui	83d834d84c	tests: move state store namespace tests from ENT (#12308 )	2022-03-16 11:56:11 -04:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	2876739a51	api: apply consistent behaviour of the reverse query parameter (#12244 )	2022-03-11 19:44:52 -05:00
Luiz Aoqui	ab8ce87bba	Add pagination, filtering and sort to more API endpoints (#12186 )	2022-03-08 20:54:17 -05:00
Tim Gross	2dafe46fe3	CSI: allow updates to volumes on re-registration (#12167 ) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	b776c1c196	csi: fix prefix queries for plugin list RPC (#12194 ) The `CSIPlugin.List` RPC was intended to accept a prefix to filter the list of plugins being listed. This was accidentally being done in the state store instead, which contributed to incorrect filtering behavior for plugins in the `volume plugin status` command. Move the prefix matching into the RPC so that it calls the prefix-matching method in the state store if we're looking for a prefix. Update the `plugin status` command to accept a prefix for the plugin ID argument so that it matches the expected behavior of other commands.	2022-03-04 16:44:09 -05:00
Luiz Aoqui	b1809eb48c	Fix CSI volume list with prefix and `*` namespace (#12184 ) When using a prefix value and the wildcard for namespace, the endpoint would not take the prefix value into consideration due to the order in which the checks were executed but also the logic for retrieving volumes from the state store. This commit changes the order to check for a prefix first and wraps the result iterator of the state store query in a filter to apply the prefix.	2022-03-03 17:27:04 -05:00
Luiz Aoqui	01931587ba	api: paginated results with different ordering (#12128 ) The paginator logic was built when go-memdb iterators would return items ordered lexicographically by their ID prefixes, but #12054 added the option for some tables to return results ordered by their `CreateIndex` instead, which invalidated the previous paginator assumption. The iterator used for pagination must still return results in some order so that the paginator can properly handle requests where the next_token value is not present in the results anymore (e.g., the eval was GC'ed). In these situations, the paginator will start the returned page in the first element right after where the requested token should've been. This commit moves the logic to generate pagination tokens from the elements being paginated to the iterator itself so that callers can have more control over the token format to make sure they are properly ordered and stable. It also allows configuring the paginator as being ordered in ascending or descending order, which is relevant when looking for a token that may not be present anymore.	2022-03-01 15:36:49 -05:00
James Rasell	8a23afdb56	events: add state objects and logic for service registrations.	2022-02-28 10:44:58 +01:00
James Rasell	20249bb761	state: add service registration restore functionality.	2022-02-28 10:15:27 +01:00
James Rasell	74b367553e	state: add service registration state interaction functions.	2022-02-28 10:15:03 +01:00
James Rasell	cf0b63d561	state: add the table schema for the service_registrations table.	2022-02-28 10:14:10 +01:00
Tim Gross	5b7b9fdafb	csi: tolerate missing plugins on job delete (#12114 ) If a plugin job fails before successfully fingerprinting the plugins, the plugin will not exist when we try to delete the job. Tolerate missing plugins.	2022-02-24 08:53:15 -05:00
Tim Gross	17dc0adee3	csi: fix broken test (#12110 )	2022-02-23 13:48:39 -05:00
Luiz Aoqui	de91954582	initial base work for implementing sorting and filter across API endpoints (#12076 )	2022-02-16 14:34:36 -05:00
Luiz Aoqui	110dbeeb9d	Add `go-bexpr` filters to evals and deployment list endpoints (#12034 )	2022-02-16 11:40:30 -05:00
Seth Hoenig	40c714a681	api: return sorted results in certain list endpoints These API endpoints now return results in chronological order. They can return results in reverse chronological order by setting the query parameter ascending=true. - Eval.List - Deployment.List	2022-02-15 13:48:28 -06:00
Tim Gross	6bd33d3fb9	CSI: use job status not alloc status for plugin updates from summary (#12027 ) When an allocation is updated, the job summary for the associated job is also updated. CSI uses the job summary to set the expected count for controller and node plugins. We incorrectly used the allocation's server status instead of the job status when deciding whether to update or remove the job from the plugins. This caused a node drain or other terminal state for an allocation to clear the expected count for the entire plugin. Use the job status to guide whether to update or remove the expected count. The existing CSI tests for the state store incorrectly modeled the updates we received from servers vs those we received from clients, leading to test assertions that passed when they should not. Rework the tests to clarify each step in the lifecycle and rename CSI state store functions for clarity	2022-02-09 11:51:49 -05:00
Tim Gross	b20a6c9ffb	CSI: move terminal alloc handling into denormalization (#11931 ) * The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims. * Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.	2022-01-27 10:39:08 -05:00
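
The stdlib pitfall described in commit 2a2ebd0537 (`sort.StringSlice` converts but does not sort) can be illustrated with a minimal Go sketch. This is not the code from that commit: the `sortedCopy` helper and the capability strings are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy is a hypothetical helper (not from the Nomad codebase).
// It returns a sorted copy of caps and leaves the input untouched.
func sortedCopy(caps []string) []string {
	out := append([]string(nil), caps...)
	sort.Strings(out) // same effect as sort.StringSlice(out).Sort()
	return out
}

func main() {
	// Illustrative values only.
	caps := []string{"node", "controller", "expand"}

	// sort.StringSlice(caps) is only a type conversion to a type that
	// implements sort.Interface; by itself it does not reorder anything.
	_ = sort.StringSlice(caps)
	fmt.Println(caps) // still in the original order

	// Calling Sort() on the converted value sorts the shared backing
	// array in place, so caps is now in ascending order.
	sort.StringSlice(caps).Sort()
	fmt.Println(caps)

	// sortedCopy sorts without mutating its argument.
	fmt.Println(sortedCopy([]string{"b", "c", "a"}))
}
```

`sort.Strings(x)` is the shorter way to get the same in-place result as `sort.StringSlice(x).Sort()`.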


557 Commits