The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.
This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
Move conflict resolution implementation into the state store with a new Apply RPC.
This also makes the RPC for secure variables much more similar to Consul's KV,
which will help us support soft deletes in a post-1.4.0 version of Nomad.
Reimplement quotas in the state store functions.
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Move the secure variables quota enforcement calls into the state store to ensure
quota checks are atomic with quota updates (in the same transaction).
Switch to a machine-size int instead of a uint64 for quota tracking. The
ENT-side quota spec is described as int, and negative values have a meaning as
"not permitted at all". Using the same type for tracking will make it easier to
the math around checks, and uint64 is infeasibly large anyways.
Add secure vars to quota HTTP API and CLI outputs and API docs.
When we delete a namespace, we check to ensure that there are no non-terminal
jobs or CSI volume, which also covers evals, allocs, etc. Secure variables are
also namespaces, so extend this check to them as well.
When we delete a namespace, we check to ensure that there are no non-terminal
jobs, which effectively covers evals, allocs, etc. CSI volumes are also
namespaced, so extend this check to cover CSI volumes.
When applying a raft log to expire ACL tokens, we need to use a
timestamp provided by the leader so that the result is deterministic
across servers. Use leader's timestamp from RPC call
When secure variables are updated, we were adding the update to the
existing quota tracking without first checking whether it was an
update to an existing variable. In that case we need to add/subtract
only the difference between the new and existing quota usage.
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.
As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.
In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.
While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.
To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
When the `Full` flag is passed for key rotation, we kick off a core
job to decrypt and re-encrypt all the secure variables so that they
use the new key.
* SV: CAS
* Implement Check and Set for Delete and Upsert
* Reading the conflict from the state store
* Update endpoint for new error text
* Updated HTTP api tests
* Conflicts to the HTTP api
* SV: structs: Update SV time to UnixNanos
* update mock to UnixNano; refactor
* SV: encrypter: quote KeyID in error
* SV: mock: add mock for namespace w/ SV
We need to track per-namespace storage usage for secure variables even
in Nomad OSS so that a cluster can be seamlessly upgraded from OSS to
ENT without having to re-calculate quota usage.
Provide a hook in the upsert RPC for enforcement of quotas in
ENT. This will be a no-op in Nomad OSS.
* Add Path only index for SecureVariables
* Add GetSecureVariablesByPrefix; refactor tests
* Add search for SecureVariables
* Add prefix search for secure variables
This PR splits SecureVariable into SecureVariableDecrypted and
SecureVariableEncrypted in order to use the type system to help
verify that cleartext secret material is not committed to file.
* Make Encrypt function return KeyID
* Split SecureVariable
Co-authored-by: Tim Gross <tgross@hashicorp.com>
After internal design review, we decided to remove exposing algorithm
choice to the end-user for the initial release. We'll solve nonce
rotation by forcing rotations automatically on key GC (in a core job,
not included in this changeset). Default to AES-256 GCM for the
following criteria:
* faster implementation when hardware acceleration is available
* FIPS compliant
* implementation in pure go
* post-quantum resistance
Also fixed a bug in the decoding from keystore and switched to a
harder-to-misuse encoding method.
The core jobs to garbage collect unused keys and perform full key
rotations will need to be able to query secure variables by key ID for
efficiency. Add an index to the state store and associated query
function and test.
This changeset implements the keystore serialization/deserialization:
* Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft.
* Implements methods for loading/saving keys to the keystore.
* Implements methods for restoring the whole keystore from disk.
* Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.
Implement the basic upsert, list, and delete operations for
`RootKeyMeta` needed by the Keyring RPCs.
This changeset also implements two convenience methods
`RootKeyMetaByID` and `GetActiveRootKeyMeta` which are useful for
testing but also will be needed to implement the rest of the RPCs.
When deleting evaluations and allocations during a reap event, the
index table entries for evals and allocs was updated irregardless
of whether changes were made.
This change modifies the state logic so that the index table is
only modified when the corresponding table has actually been
modified. Along with matching expected behaviour, this change has
the potential to reduce the number of times blocking queries will
return without any real state change.
* services: add pagination and filter support to info RPC.
* cli: add filter flag to service info command.
* docs: add pagination and filter details to services info API.
* paginator: minor updates to comment and func signature.
* Fix plugin capability sorting.
The `sort.StringSlice` method in the stdlib doesn't actually sort, but
instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad volume status :id`, just like `nomad job status :id`.
This is implemented in the HTTP response and not in the state store
because the state store maintains two separate lists of allocs that
are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.
Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
CSI `CreateVolume` RPC is idempotent given that the topology,
capabilities, and parameters are unchanged. CSI volumes have many
user-defined fields that are immutable once set, and many fields that
are not user-settable.
Update the `Register` RPC so that updating a volume via the API merges
onto any existing volume without touching Nomad-controlled fields,
while validating it with the same strict requirements expected for
idempotent `CreateVolume` RPCs.
Also, clarify that this state store method is used for everything, not just
for the `Register` RPC.
The `CSIPlugin.List` RPC was intended to accept a prefix to filter the
list of plugins being listed. This was being accidentally being done
in the state store instead, which contributed to incorrect filtering
behavior for plugins in the `volume plugin status` command.
Move the prefix matching into the RPC so that it calls the
prefix-matching method in the state store if we're looking for a
prefix.
Update the `plugin status command` to accept a prefix for the plugin
ID argument so that it matches the expected behavior of other commands.
When using a prefix value and the * wildcard for namespace, the endpoint
would not take the prefix value into consideration due to the order in
which the checks were executed but also the logic for retrieving volumes
from the state store.
This commit changes the order to check for a prefix first and wraps the
result iterator of the state store query in a filter to apply the
prefix.
The paginator logic was built when go-memdb iterators would return items
ordered lexicographically by their ID prefixes, but #12054 added the
option for some tables to return results ordered by their `CreateIndex`
instead, which invalidated the previous paginator assumption.
The iterator used for pagination must still return results in some order
so that the paginator can properly handle requests where the next_token
value is not present in the results anymore (e.g., the eval was GC'ed).
In these situations, the paginator will start the returned page in the
first element right after where the requested token should've been.
This commit moves the logic to generate pagination tokens from the
elements being paginated to the iterator itself so that callers can have
more control over the token format to make sure they are properly
ordered and stable.
It also allows configuring the paginator as being ordered in ascending
or descending order, which is relevant when looking for a token that may
not be present anymore.