* state_store: csi volumes/plugins store the index in the txn
* nomad: csi_endpoint_test require index checks need uint64()
* nomad: other tests using int 0 not uint64(0)
* structs: pass index into New, but not other struct methods
* state_store: csi plugin indexes, use new struct interface
* nomad: csi_endpoint_test check index/query meta (on explicit 0)
* structs: NewCSIVolume takes an index arg now
* scheduler/test: NewCSIVolume takes an index arg now
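The index-related items above can be illustrated with a small sketch. The `Volume` type, `NewVolume` constructor, and `sameValue` helper are hypothetical stand-ins, not the real structs. The sketch assumes the test assertions compare values in a type-sensitive way, which is why a literal `0` does not match a `uint64` index.

```go
package main

import (
	"fmt"
	"reflect"
)

// Volume is a hypothetical stand-in for a CSI volume struct; only the
// index fields matter for this sketch.
type Volume struct {
	ID          string
	CreateIndex uint64
	ModifyIndex uint64
}

// NewVolume takes the index at construction time, so a caller such as the
// state store can pass the index of the current transaction.
func NewVolume(id string, index uint64) *Volume {
	return &Volume{ID: id, CreateIndex: index, ModifyIndex: index}
}

// sameValue mimics a type-sensitive equality check: values of different
// types never compare equal, even when both are numerically zero.
func sameValue(expected, actual interface{}) bool {
	return reflect.DeepEqual(expected, actual)
}

func main() {
	v := NewVolume("vol-1", 0)
	fmt.Println(sameValue(0, v.CreateIndex))         // false: int vs uint64
	fmt.Println(sameValue(uint64(0), v.CreateIndex)) // true
}
```

Passing the index only to the constructor keeps the other struct methods free of transaction details.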
This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to
the leader's loop for scheduling core job evals. Right now this is an
empty method body without even a config file stanza. Later changesets
will implement the logic of volume publication GC.
state_store: change claim counts
state_store: get volumes by all, by driver
state_store: process volume claims
state_store: csi volume register error on update
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:
* A `csi_plugin` stanza as part of a Nomad task configuration, to
allow a task to expose that it is a plugin.
* A new task runner hook: `csi_plugin_supervisor`. This hook does two
things. When the `csi_plugin` stanza is detected, it will
automatically configure the plugin task to receive bidirectional
mounts to the CSI intermediary directory. At runtime, it will then
perform an initial heartbeat of the plugin and handle submitting it to
the new `dynamicplugins.Registry` for further use by the client, and
then run a lightweight heartbeat loop that will emit task events
when health changes.
* The `dynamicplugins.Registry` for handling plugins that run
as Nomad tasks, in contrast to the existing catalog that requires
`go-plugin` type plugins and to know the plugin configuration in
advance.
* The `csimanager` which fingerprints CSI plugins, in a similar way to
`drivermanager` and `devicemanager`. It currently only fingerprints
the NodeID from the plugin, and assumes that all plugins are
monolithic.
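As a rough sketch of the supervisor behaviour described above: register the plugin after its first successful probe, then emit an event only when health changes. Every name here (`PluginInfo`, `Registry`, `supervise`) is hypothetical and does not reflect the actual `dynamicplugins` API. Probe results are passed in as a slice rather than gathered on a timer so the logic stays easy to follow.

```go
package main

import (
	"fmt"
	"sync"
)

// PluginInfo is a hypothetical record for a plugin that runs as a task.
type PluginInfo struct {
	Type string
	Name string
}

// Registry is a hypothetical in-memory registry for plugins whose
// configuration is only known once the task is running.
type Registry struct {
	mu      sync.Mutex
	plugins map[string]PluginInfo
}

func NewRegistry() *Registry {
	return &Registry{plugins: make(map[string]PluginInfo)}
}

func (r *Registry) Register(p PluginInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.plugins[p.Type+"/"+p.Name] = p
}

func (r *Registry) Get(typ, name string) (PluginInfo, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	p, ok := r.plugins[typ+"/"+name]
	return p, ok
}

// supervise walks a sequence of probe results. It registers the plugin on
// the first healthy probe and returns one event per health transition.
// The plugin is never deregistered, matching the limitation noted in the
// commit message.
func supervise(reg *Registry, p PluginInfo, probes []bool) []string {
	var events []string
	registered, healthy := false, false
	for _, ok := range probes {
		if ok && !registered {
			reg.Register(p)
			registered = true
		}
		if ok != healthy {
			healthy = ok
			if ok {
				events = append(events, "plugin became healthy")
			} else {
				events = append(events, "plugin became unhealthy")
			}
		}
	}
	return events
}

func main() {
	reg := NewRegistry()
	p := PluginInfo{Type: "csi-node", Name: "example"}
	for _, e := range supervise(reg, p, []bool{false, true, true, false}) {
		fmt.Println(e)
	}
	_, found := reg.Get("csi-node", "example")
	fmt.Println("registered:", found)
}
```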
Missing features
* We do not use the live updates of the `dynamicplugins` registry in
the `csimanager` yet.
* We do not yet deregister plugins from the client when they shut
down; they simply remain marked as unhealthy indefinitely. This is
deliberate until we figure out how to manage deploying new versions
of plugins and transitioning between them.
The test starts enough connections to hit the limit, then closes one
connection and immediately starts a new one, expecting the new one to
succeed. We must wait until the server side recognizes the closed
connection and frees up a limits slot. The current test attempts to
achieve that by waiting for an error on conn.Read; however, this error
is returned by the local client without waiting for the server to
update.
As such, I changed the logic so that it retries on connection rejection
but forces the first non-EOF failure to be a deadline error.
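The retry logic can be sketched roughly as follows. This is not the actual test code: `waitForSlot`, `fakeTimeout`, and `fakeAttempts` are hypothetical names. The sketch assumes that a rejected connection surfaces as `io.EOF` on read, while an accepted connection that the server holds open surfaces as a read deadline (timeout) error.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// waitForSlot retries attempt while it reports io.EOF (assumed to mean the
// server rejected the connection). The first non-EOF result must be a
// timeout error, which is taken to mean the connection was accepted.
func waitForSlot(attempt func() error, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := attempt()
		if errors.Is(err, io.EOF) {
			if time.Now().After(deadline) {
				return fmt.Errorf("connection still rejected after %s", timeout)
			}
			time.Sleep(5 * time.Millisecond)
			continue
		}
		var ne net.Error
		if errors.As(err, &ne) && ne.Timeout() {
			return nil
		}
		return fmt.Errorf("expected a deadline error, got %v", err)
	}
}

// fakeTimeout is a stand-in for a read deadline error.
type fakeTimeout struct{}

func (fakeTimeout) Error() string   { return "i/o timeout" }
func (fakeTimeout) Timeout() bool   { return true }
func (fakeTimeout) Temporary() bool { return true }

// fakeAttempts simulates a server that rejects the first n attempts and
// then accepts the connection.
func fakeAttempts(n int) func() error {
	count := 0
	return func() error {
		if count < n {
			count++
			return io.EOF
		}
		return fakeTimeout{}
	}
}

func main() {
	err := waitForSlot(fakeAttempts(3), time.Second)
	fmt.Println("slot acquired:", err == nil)
}
```

Retrying on rejection avoids depending on when the local client notices the close, which says nothing about when the server has released the slot.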
This fixes a bug where a forwarded node update request may be assumed
to be the actual direct client connection if the server just lost
leadership.
When a non-leader Nomad server receives a Node.UpdateStatus request, it
forwards the RPC request to the leader and holds on to the request's
Yamux connection in a cache to allow for server<->client forwarding.
When the leader handles the request, it must differentiate between a
forwarded connection vs the actual connection. This is done in
https://github.com/hashicorp/nomad/blob/v0.10.4/nomad/node_endpoint.go#L412
Now, consider what happens if the non-leader server forwards the
request to a recently deposed Nomad leader, which in turn forwards the
RPC request to the new leader.
Without this change, the deposed leader will mistake the forwarded
connection for the actual client connection and cache it mapped to the
client ID. If the server attempts to connect to that client, it will
attempt to start a connection/session to the other server instead and
the call will hang forever.
This change ensures that we only add node connection mapping if the
request is not a forwarded request, regardless of circumstances.
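A minimal sketch of the guard, using hypothetical `Request` and `Server` types rather than the real endpoint code: the connection is cached against the node ID only when the request arrived directly from the client.

```go
package main

import "fmt"

// Request is a hypothetical RPC request. Forwarded is assumed to be set by
// a server before it relays the request to another server.
type Request struct {
	NodeID    string
	Forwarded bool
}

// Server is a hypothetical server that tracks which connection reaches
// which client node.
type Server struct {
	nodeConns map[string]string // node ID -> connection ID
}

func NewServer() *Server {
	return &Server{nodeConns: make(map[string]string)}
}

// handleUpdateStatus caches the connection only for direct requests. A
// forwarded request arrives over a server-to-server connection, so caching
// it would map the node ID to another server instead of the client.
func (s *Server) handleUpdateStatus(req Request, connID string) {
	if !req.Forwarded {
		s.nodeConns[req.NodeID] = connID
	}
}

func main() {
	s := NewServer()
	s.handleUpdateStatus(Request{NodeID: "node-1", Forwarded: true}, "server-conn")
	fmt.Println(len(s.nodeConns)) // 0: forwarded request is not cached
	s.handleUpdateStatus(Request{NodeID: "node-1"}, "client-conn")
	fmt.Println(s.nodeConns["node-1"]) // client-conn
}
```

The check does not depend on whether the receiving server currently believes it is the leader, so a deposed leader behaves the same way.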
This deflakes the tests in the deploymentwatcher package. The package
uses a mock deployment watcher backend, where the watcher in a
background goroutine calls UpdateDeploymentStatus. If the mock isn't
configured to expect the call, the background goroutine will fail. One
UpdateDeploymentStatus call is made at the end of the background
goroutine, which may occur after the test completes, thus explaining the
flakiness.
This change updates tests to honor `BootstrapExpect` exclusively when
forming test clusters and removes test-only knobs, e.g.
`config.DevDisableBootstrap`.
Background:
Test cluster creation is fragile. Test servers don't follow the
BootstrapExpect route like production clusters. Instead, they start as
single-node clusters and are then joined together, which risks causing
a split brain or other test flakiness.
The test framework exposes a few knobs (e.g.
`config.DevDisableBootstrap` and `config.Bootstrap`) that control
whether a server should bootstrap the cluster. These flags are
confusing and it's unclear when to use them: their usage in multi-node
clusters isn't properly documented. Furthermore, they have some bad
side effects because they don't control the Raft library: if
`config.DevDisableBootstrap` is true, the test server may not
immediately attempt to bootstrap a cluster, but after an election
timeout (~50ms), Raft may force a leadership election, win it (with
only one vote), and cause a split brain.
The knobs are also confusing as Bootstrap is an overloaded term. In
BootstrapExpect, we refer to bootstrapping the cluster only after N
servers are connected. But in tests and the knobs above, it refers to
whether the server is a single node cluster and shouldn't wait for any
other server.
Changes:
This commit makes two changes:
First, it relies on `BootstrapExpect` instead of the `Bootstrap` and/or
`DevMode` flags. This change is relatively trivial.
Second, it introduces a `Bootstrapped` flag to track whether the
cluster is bootstrapped. This allows us to keep `BootstrapExpect`
immutable. Previously, `BootstrapExpect` was a config value that was
reset to 0 once cluster bootstrap completed.
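One way to picture the second change is the sketch below. The `Config` type and `maybeBootstrap` method are hypothetical, not the real server code, and treating a non-positive `BootstrapExpect` as "never bootstrap" is an assumption of this sketch. The expected count is only read, while a separate atomic flag records that bootstrap has happened.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical server config. BootstrapExpect is read-only
// after startup; Bootstrapped is accessed atomically (0 = no, 1 = yes).
type Config struct {
	BootstrapExpect int
	Bootstrapped    int32
}

// maybeBootstrap reports whether this call performed the bootstrap. It
// succeeds at most once, and only when enough peers are known.
func (c *Config) maybeBootstrap(knownPeers int) bool {
	if c.BootstrapExpect <= 0 || knownPeers < c.BootstrapExpect {
		return false
	}
	// CompareAndSwap ensures only one caller flips the flag.
	return atomic.CompareAndSwapInt32(&c.Bootstrapped, 0, 1)
}

func main() {
	c := &Config{BootstrapExpect: 3}
	fmt.Println(c.maybeBootstrap(2)) // false: not enough peers yet
	fmt.Println(c.maybeBootstrap(3)) // true: bootstrap happens once
	fmt.Println(c.maybeBootstrap(3)) // false: already bootstrapped
	fmt.Println(c.BootstrapExpect)   // 3: the config value is unchanged
}
```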