* structs: CSIInfo include AllocID, CSIPlugins no Jobs
* state_store: eliminate plugin Jobs, delete an empty plugin
* nomad/structs/csi: detect empty plugins correctly
* client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID
* client/pluginmanager/csimanager/instance: allocID
* client/pluginmanager/csimanager/fingerprint: set AllocID
* client/node_updater: split controller and node plugins
* api/csi: remove Jobs
The CSI Plugin API will map plugins to allocations, which allows
plugins to be defined by jobs in many configurations. In particular,
multiple plugins can be defined in the same job, and multiple jobs can
be used to define a single plugin.
Because we now map the allocation context directly from the node, it's
no longer necessary to track the jobs associated with a plugin
directly.
* nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint
* client/dynamicplugins: lift AllocID into the struct from Options
* api/csi_test: remove Jobs test
* nomad/structs/csi: CSIPlugins has an array of allocs
* nomad/state/state_store: implement CSIPluginDenormalize
* nomad/state/state_store: CSIPluginDenormalize npe on missing alloc
* nomad/csi_endpoint_test: defer deleteNodes for clarity
* api/csi_test: disable this test awaiting mocks:
https://github.com/hashicorp/nomad/issues/7123
When an alloc is marked terminal (and after node unstage/unpublish
have been called), the client syncs the terminal alloc state with the
server via `Node.UpdateAlloc RPC`.
For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC
handler at the server will emit an eval for a new core job to garbage
collect CSI volume claims. When this eval is handled on the core
scheduler, it will call a `volumeReap` method to release the claims
for all terminal allocs on the job.
The volume reap will issue a `ControllerUnpublishVolume` RPC for any
node that has no alloc claiming the volume. Once this returns (or
is skipped), the volume reap will send a new `CSIVolume.Claim` RPC
that releases the volume claim for that allocation in the state store,
making it available for scheduling again.
This same `volumeReap` method will be called from the core job GC,
which gives us a second chance to reclaim volumes during GC if there
were controller RPC failures.
* state_store: csi volumes/plugins store the index in the txn
* nomad: csi_endpoint_test require index checks need uint64()
* nomad: other tests using int 0 not uint64(0)
* structs: pass index into New, but not other struct methods
* state_store: csi plugin indexes, use new struct interface
* nomad: csi_endpoint_test check index/query meta (on explicit 0)
* structs: NewCSIVolume takes an index arg now
* scheduler/test: NewCSIVolume takes an index arg now
state_store: change claim counts
state_store: get volumes by all, by driver
state_store: process volume claims
state_store: csi volume register error on update
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
Enable any Server to lookup the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.
The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
It has been decided we're going to live in a many core world.
Let's take advantage of that and parallelize these state store
tests which all run in memory and are largely CPU bound.
An unscientific benchmark demonstrating the improvement:
[mp state (master)] $ go test
PASS
ok github.com/hashicorp/nomad/nomad/state 5.162s
[mp state (f-parallelize-state-store-tests)] $ go test
PASS
ok github.com/hashicorp/nomad/nomad/state 1.527s
If ACL Request is unauthenticated, we should honor the anonymous token.
This PR makes few changes:
* `GetPolicy` endpoints may return policy if anonymous policy allows it,
or return permission denied otherwise.
* `ListPolicies` returns an empty policy list, or one with anonymous
policy if one exists.
Without this PR, the we return an incomprehensible error.
Before:
```
$ curl http://localhost:4646/v1/acl/policy/doesntexist; echo
acl token lookup failed: index error: UUID must be 36 characters
$ curl http://localhost:4646/v1/acl/policies; echo
acl token lookup failed: index error: UUID must be 36 characters
```
After:
```
$ curl http://localhost:4646/v1/acl/policy/doesntexist; echo
Permission denied
$ curl http://localhost:4646/v1/acl/policies; echo
[]
```
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).
SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.
This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.
This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
Fixes an issue in which the versions were improperly sorted which would
cause pruning of the wrong job version. This essentially meant that job
versions above 255 would be dropped from the job version table (note
this was due to the prefix walk crossing from the 1-byte to 2-byte
threshold).
Fixes https://github.com/hashicorp/nomad/issues/3357
This PR exposes errors returned by the FSM to the deployment watcher and
thus the API. It also adds an error to handle the case of promoting a
deployment that has no eligible canaries.
Prior to this commit they would be marked as dead if they had no
currently running allocations -- even though they would spring back to
life (running) if the cluster state changed such that a new eval+alloc
was created.
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.
Fixes https://github.com/hashicorp/nomad/issues/2124
Token revocation
Remove from the statestore
Revoke tokens
Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled
update server interface to allow enable/disable and config loading
test the new functions
Leader revoke
Use active