Fixes#16517
Given a 3 Server cluster with at least 1 Client connected to Follower 1:
If a NodeMeta.{Apply,Read} for the Client request is received by
Follower 1 with `AllowStale = false` the Follower will forward the
request to the Leader.
The Leader, not being connected to the target Client, will forward the
RPC to Follower 1.
Follower 1, seeing AllowStale=false, will forward the request to the
Leader.
The Leader, not being connected to... well hoppefully you get the
picture: an infinite loop occurs.
This changeset implements the keystore serialization/deserialization:
* Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft.
* Implements methods for loading/saving keys to the keystore.
* Implements methods for restoring the whole keystore from disk.
* Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.
Downgrading the Raft version protocol is not a supported operation.
Checking for a downgrade is hard since this information is not stored in
any persistent place. When a server re-joins a cluster with a prior Raft
version, the Serf tag is updated so Nomad can't tell that the version
changed.
Mixed version clusters must be supported to allow for zero-downtime
rolling upgrades. During this it's expected that the cluster will have
mixed Raft versions. Enforcing consistency strong version consistency
would disrupt this flow.
The approach taken here is to store the Raft version on disk. When the
server starts the `raft_protocol` value is written to the file
`data_dir/raft/version`. If that file already exists, its content is
checked against the current `raft_protocol` value to detect downgrades
and prevent the server from starting.
Any other types of errors are ignore to prevent disruptions that are
outside the control of operators. The only option in cases of an invalid
or corrupt file would be to delete it, making this check useless. So
just overwrite its content with the new version and provide guidance on
how to check that their cluster is an expected state.
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255
This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.
The reason for these is we'll want to disable logging in GHA,
where spamming the disk with logs really drags performance.
## Development Environment Changes
* Added stringer to build deps
## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API
## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
- `workerLock` for the `workers` slice
- `workerConfigLock` for the `Server.Config.NumSchedulers` and
`Server.Config.EnabledSchedulers` values
## Other
* Adding docs for scheduler worker api
* Add changelog message
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
Ensure that all servers are joined to each other before test proceed,
instead of just joining them to the first server and relying on
background serf propagation.
Relying on backgorund serf propagation is a cause of flakiness,
specially for tests with multiple regions. The server receiving the RPC
may not be aware of the region and fail to forward RPC accordingly.
For example, consider `TestMonitor_Monitor_RemoteServer` failure in https://app.circleci.com/pipelines/github/hashicorp/nomad/16402/workflows/7f327235-7d0c-40ba-9757-600522afca51/jobs/158045 you can observe:
* `nomad-117` is joined to `nomad-118` and `nomad-119`
* `nomad-119` is the foreign region
* `nomad-117` gains leadership in the default region, `nomad-118` is the non-leader
* search logs for `nomad: adding server` and notice that `nomad-118`
only added `nomad-118` and `nomad-118`, but not `nomad-119`!
* so the query to the non-leader in the test fails to be forwarded to
the appopriate region.
This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy
searching objects in Nomad. The fuzzy search endpoint routes requests
to the Nomad Server leader, which implements the Search.FuzzySearch RPC
method.
Requests to the fuzzy search API are based on the api.FuzzySearchRequest
object, e.g.
{
"Text": "ed",
"Context": "all"
}
Responses from the fuzzy search API are based on the api.FuzzySearchResponse
object, e.g.
{
"Index": 27,
"KnownLeader": true,
"LastContact": 0,
"Matches": {
"tasks": [
{
"ID": "redis",
"Scope": [
"default",
"example",
"cache"
]
}
],
"evals": [],
"deployment": [],
"volumes": [],
"scaling_policy": [],
"images": [
{
"ID": "redis:3.2",
"Scope": [
"default",
"example",
"cache",
"redis"
]
}
]
},
"Truncations": {
"volumes": false,
"scaling_policy": false,
"evals": false,
"deployment": false
}
}
The API is tunable using the new server.search stanza, e.g.
server {
search {
fuzzy_enabled = true
limit_query = 200
limit_results = 1000
min_term_length = 5
}
}
These values can be increased or decreased, so as to provide more
search results or to reduce load on the Nomad Server. The fuzzy search
API can be disabled entirely by setting `fuzzy_enabled` to `false`.
* upsertaclpolicies
* delete acl policies msgtype
* upsert acl policies msgtype
* delete acl tokens msgtype
* acl bootstrap msgtype
wip unsubscribe on token delete
test that subscriptions are closed after an ACL token has been deleted
Start writing policyupdated test
* update test to use before/after policy
* add SubscribeWithACLCheck to run acl checks on subscribe
* update rpc endpoint to use broker acl check
* Add and use subscriptions.closeSubscriptionFunc
This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.
handle acl policy updates
* rpc endpoint test for terminating acl change
* add comments
Co-authored-by: Kris Hicks <khicks@hashicorp.com>
properly wire up durable event count
move newline responsibility
moves newline creation from NDJson to the http handler, json stream only encodes and sends now
ignore snapshot restore if broker is disabled
enable dev mode to access event steam without acl
use mapping instead of switch
use pointers for config sizes, remove unused ttl, simplify closed conn logic
This Commit adds an /v1/events/stream endpoint to stream events from.
The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.
If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
* nomad/structs/csi: delete just one plugin type from a node
* nomad/structs/csi: add DeleteAlloc
* nomad/state/state_store: add deleteJobFromPlugin
* nomad/state/state_store: use DeleteAlloc not DeleteNodeType
* move CreateTestCSIPlugin to state to avoid an import cycle
* nomad/state/state_store_test: delete a plugin by deleting its jobs
* nomad/*_test: move CreateTestCSIPlugin to state
* nomad/state/state_store: update one plugin per transaction
* command/plugin_status_test: move CreateTestCSIPlugin
* nomad: csi: handle nils CSIPlugin methods, clarity
* command/csi: csi, csi_plugin, csi_volume
* helper/funcs: move ExtraKeys from parse_config to UnusedKeys
* command/agent/config_parse: use helper.UnusedKeys
* api/csi: annotate CSIVolumes with hcl fields
* command/csi_plugin: add Synopsis
* command/csi_volume_register: use hcl.Decode style parsing
* command/csi_volume_list
* command/csi_volume_status: list format, cleanup
* command/csi_plugin_list
* command/csi_plugin_status
* command/csi_volume_deregister
* command/csi_volume: add Synopsis
* api/contexts/contexts: add csi search contexts to the constants
* command/commands: register csi commands
* api/csi: fix struct tag for linter
* command/csi_plugin_list: unused struct vars
* command/csi_plugin_status: unused struct vars
* command/csi_volume_list: unused struct vars
* api/csi: add allocs to CSIPlugin
* command/csi_plugin_status: format the allocs
* api/allocations: copy Allocation.Stub in from structs
* nomad/client_rpc: add some error context with Errorf
* api/csi: collapse read & write alloc maps to a stub list
* command/csi_volume_status: cleanup allocation display
* command/csi_volume_list: use Schedulable instead of Healthy
* command/csi_volume_status: use Schedulable instead of Healthy
* command/csi_volume_list: sprintf string
* command/csi: delete csi.go, csi_plugin.go
* command/plugin: refactor csi components to sub-command plugin status
* command/plugin: remove csi
* command/plugin_status: remove csi
* command/volume: remove csi
* command/volume_status: split out csi specific
* helper/funcs: add RemoveEqualFold
* command/agent/config_parse: use helper.RemoveEqualFold
* api/csi: do ,unusedKeys right
* command/volume: refactor csi components to `nomad volume`
* command/volume_register: split out csi specific
* command/commands: use the new top level commands
* command/volume_deregister: hardwired type csi for now
* command/volume_status: csiFormatVolumes rescued from volume_list
* command/plugin_status: avoid a panic on no args
* command/volume_status: avoid a panic on no args
* command/plugin_status: predictVolumeType
* command/volume_status: predictVolumeType
* nomad/csi_endpoint_test: move CreateTestPlugin to testing
* command/plugin_status_test: use CreateTestCSIPlugin
* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts
* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix
* nomad/search_endpoint: add CSIPlugins and CSIVolumes
* command/plugin_status: move the header to the csi specific
* command/volume_status: move the header to the csi specific
* nomad/state/state_store: CSIPluginByID prefix
* command/status: rename the search context to just Plugins/Volumes
* command/plugin,volume_status: test return ids now
* command/status: rename the search context to just Plugins/Volumes
* command/plugin_status: support -json and -t
* command/volume_status: support -json and -t
* command/plugin_status_csi: comments
* command/*_status: clean up text
* api/csi: fix stale comments
* command/volume: make deregister sound less fearsome
* command/plugin_status: set the id length
* command/plugin_status_csi: more compact plugin health
* command/volume: better error message, comment
This change updates tests to honor `BootstrapExpect` exclusively when
forming test clusters and removes test only knobs, e.g.
`config.DevDisableBootstrap`.
Background:
Test cluster creation is fragile. Test servers don't follow the
BootstapExpected route like production clusters. Instead they start as
single node clusters and then get rejoin and may risk causing brain
split or other test flakiness.
The test framework expose few knobs to control those (e.g.
`config.DevDisableBootstrap` and `config.Bootstrap`) that control
whether a server should bootstrap the cluster. These flags are
confusing and it's unclear when to use: their usage in multi-node
cluster isn't properly documented. Furthermore, they have some bad
side-effects as they don't control Raft library: If
`config.DevDisableBootstrap` is true, the test server may not
immediately attempt to bootstrap a cluster, but after an election
timeout (~50ms), Raft may force a leadership election and win it (with
only one vote) and cause a split brain.
The knobs are also confusing as Bootstrap is an overloaded term. In
BootstrapExpect, we refer to bootstrapping the cluster only after N
servers are connected. But in tests and the knobs above, it refers to
whether the server is a single node cluster and shouldn't wait for any
other server.
Changes:
This commit makes two changes:
First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or
`DevMode` flags. This change is relatively trivial.
Introduce a `Bootstrapped` flag to track if the cluster is bootstrapped.
This allows us to keep `BootstrapExpected` immutable. Previously, the
flag was a config value but it gets set to 0 after cluster bootstrap
completes.
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.
Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.
This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
Tests typically call join cluster directly rather than rely on consul
discovery. Worse, consul discovery seems to cause additional leadership
transitions when a server is shutdown in tests than tests expect.