The `Job.Deregister` call will block on the client CSI controller RPCs
while the alloc still exists on the Nomad client node. So we need to
make the volume claim reaping async from the `Job.Deregister`. This
allows `nomad job stop` to return immediately. In order to make this
work, this changeset changes the volume GC so that the GC jobs are on a
by-volume basis rather than a by-job basis; we won't have to query
the (possibly deleted) job at the time of volume GC. We smuggle the
volume ID and whether it's a purge into the GC eval ID the same way we
smuggled the job ID previously.
The CSI plugins uses the external volume ID for all operations, but
the Client CSI RPCs uses the Nomad volume ID (human-friendly) for the
mount paths. Pass the External ID as an arg in the RPC call so that
the unpublish workflows have it without calling back to the server to
find the external ID.
The controller CSI plugins need the CSI node ID (or in other words,
the storage provider's view of node ID like the EC2 instance ID), not
the Nomad node ID, to determine how to detach the external volume.
* nomad/state/state_store: error message copy/paste error
* nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse
* nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister
* nomad/core_sched: make volumeClaimReap available without a CoreSched
* nomad/job_endpoint: Deregister return early if the job is missing
* nomad/job_endpoint_test: job Deregistion is idempotent
* nomad/core_sched: conditionally ignore alloc status in volumeClaimReap
* nomad/job_endpoint: volumeClaimReap all allocations, even running
* nomad/core_sched_test: extra argument to collectClaimsToGCImpl
* nomad/job_endpoint: job deregistration is not idempotent
I hypothesize that the flakiness in rolling update is due to shutting
down s3 server before s4 is properly added as a voter.
The chain of the flakiness is as follows:
1. Bootstrap with s1, s2, s3
2. Add s4
3. Wait for servers to register with 3 voting peers
* But we already have 3 voters (s1, s2, and s3)
* s4 is added as a non-voter in Raft v3 and must wait until autopilot promots it
4. Test proceeds without s4 being a voter
5. s3 shutdown
6. cluster changes stall due to leader election and too many pending configuration
changes (e.g. removing s3 from raft, promoting s4).
Here, I have the test wait until s4 is marked as a voter before shutting
down s3, so we don't have too many configuration changes at once.
In https://circleci.com/gh/hashicorp/nomad/57092, I noticed the
following events:
```
TestAutopilot_RollingUpdate: autopilot_test.go:204: adding server s4
TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO] nomad/serf.go:60: nomad: adding server: server="nomad-137.global (Addr: 127.0.0.1:9177) (DC: dc1)"
TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO] raft/raft.go:1018: nomad.raft: updating configuration: command=AddNonvoter server-id=c54b5bf4-1159-34f6-032d-56aefeb08425 server-addr=127.0.0.1:9177 servers="[{Suffrage:Voter ID:df01ba65-d1b2-17a9-f792-a4459b3a7c09 Address:127.0.0.1:9171} {Suffrage:Voter ID:c3337778-811e-2675-87f5-006309888387 Address:127.0.0.1:9173} {Suffrage:Voter ID:186d5e15-c473-e2b3-b5a4-3259a84e10ef Address:127.0.0.1:9169} {Suffrage:Nonvoter ID:c54b5bf4-1159-34f6-032d-56aefeb08425 Address:127.0.0.1:9177}]"
TestAutopilot_RollingUpdate: autopilot_test.go:218: shutting down server s3
TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.797Z [INFO] raft/replication.go:456: nomad.raft: aborting pipeline replication: peer="{Nonvoter c54b5bf4-1159-34f6-032d-56aefeb08425 127.0.0.1:9177}"
TestAutopilot_RollingUpdate: autopilot_test.go:235: waiting for s4 to stabalize and be promoted
TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.975Z [ERROR] raft/raft.go:1656: nomad.raft: failed to make requestVote RPC: target="{Voter c3337778-811e-2675-87f5-006309888387 127.0.0.1:9173}" error="dial tcp 127.0.0.1:9173: connect: connection refused"
TestAutopilot_RollingUpdate: retry.go:121: autopilot_test.go:241: don't want "c3337778-811e-2675-87f5-006309888387"
autopilot_test.go:241: didn't find map[c54b5bf4-1159-34f6-032d-56aefeb08425:true] in []raft.ServerID{"df01ba65-d1b2-17a9-f792-a4459b3a7c09", "186d5e15-c473-e2b3-b5a4-3259a84e10ef"}
```
Note how s3, c3337778, is present in the peers list in the final
failure, but s4, c54b5bf4, is added as a Nonvoter and isn't present in
the final peers list.
If a volume-claiming alloc stops and the CSI Node plugin that serves
that alloc's volumes is missing, there's no way for the allocrunner
hook to send the `NodeUnpublish` and `NodeUnstage` RPCs.
This changeset addresses this issue with a redesign of the client-side
for CSI. Rather than unmounting in the alloc runner hook, the alloc
runner hook will simply exit. When the server gets the
`Node.UpdateAlloc` for the terminal allocation that had a volume claim,
it creates a volume claim GC job. This job will made client RPCs to a
new node plugin RPC endpoint, and only once that succeeds, move on to
making the client RPCs to the controller plugin. If the node plugin is
unavailable, the GC job will fail and be requeued.
* nomad/state/state_store: enforce that the volume namespace exists
* nomad/csi_endpoint_test: a couple of broken namespaces now
* nomad/csi_endpoint_test: one more test
* nomad/node_endpoint_test: use structs.DefaultNamespace
* nomad/state/state_store_test: use DefaultNamespace
Part of #6120
Building on the support for enabling connect proxy paths in #7323, this change
adds the ability to configure the 'service.check.expose' flag on group-level
service check definitions for services that are connect-enabled. This is a slight
deviation from the "magic" that Consul provides. With Consul, the 'expose' flag
exists on the connect.proxy stanza, which will then auto-generate expose paths
for every HTTP and gRPC service check associated with that connect-enabled
service.
A first attempt at providing similar magic for Nomad's Consul Connect integration
followed that pattern exactly, as seen in #7396. However, on reviewing the PR
we realized having the `expose` flag on the proxy stanza inseperably ties together
the automatic path generation with every HTTP/gRPC defined on the service. This
makes sense in Consul's context, because a service definition is reasonably
associated with a single "task". With Nomad's group level service definitions
however, there is a reasonable expectation that a service definition is more
abstractly representative of multiple services within the task group. In this
case, one would want to define checks of that service which concretely make HTTP
or gRPC requests to different underlying tasks. Such a model is not possible
with the course `proxy.expose` flag.
Instead, we now have the flag made available within the check definitions themselves.
By making the expose feature resolute to each check, it is possible to have
some HTTP/gRPC checks which make use of the envoy exposed paths, as well as
some HTTP/gRPC checks which make use of some orthongonal port-mapping to do
checks on some other task (or even some other bound port of the same task)
within the task group.
Given this example,
group "server-group" {
network {
mode = "bridge"
port "forchecks" {
to = -1
}
}
service {
name = "myserver"
port = 2000
connect {
sidecar_service {
}
}
check {
name = "mycheck-myserver"
type = "http"
port = "forchecks"
interval = "3s"
timeout = "2s"
method = "GET"
path = "/classic/responder/health"
expose = true
}
}
}
Nomad will automatically inject (via job endpoint mutator) the
extrapolated expose path configuration, i.e.
expose {
path {
path = "/classic/responder/health"
protocol = "http"
local_path_port = 2000
listener_port = "forchecks"
}
}
Documentation is coming in #7440 (needs updating, doing next)
Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6
which will make the examples in the documentation actually runnable.
Will add some e2e tests based on the above when it becomes available.
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the
connect {
sidecar_service {
proxy {
expose {
paths = [{
path = <exposed endpoint>
protocol = <http or grpc>
local_path_port = <local endpoint port>
listener_port = <inbound mesh port>
}, ... ]
}
}
}
stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference
Combined with a task-group level network port-mapping in the form:
port "exposeExample" { to = -1 }
it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.
A future PR may introduce more automagic behavior, where we can do things like
1) auto-fill the 'expose.path.local_path_port' with the default value of the
'service.port' value for task-group level connect-enabled services.
2) automatically generate a port-mapping
3) enable an 'expose.checks' flag which automatically creates exposed endpoints
for every compatible consul service check (http/grpc checks on connect
enabled services).
* nomad/structs/structs: new NodeEventSubsystemCSI
* client/client: pass triggerNodeEvent in the CSIConfig
* client/pluginmanager/csimanager/instance: add eventer to instanceManager
* client/pluginmanager/csimanager/manager: pass triggerNodeEvent
* client/pluginmanager/csimanager/volume: node event on [un]mount
* nomad/structs/structs: use storage, not CSI
* client/pluginmanager/csimanager/volume: use storage, not CSI
* client/pluginmanager/csimanager/volume_test: eventer
* client/pluginmanager/csimanager/volume: event on error
* client/pluginmanager/csimanager/volume_test: check event on error
* command/node_status: remove an extra space in event detail format
* client/pluginmanager/csimanager/volume: use snake_case for details
* client/pluginmanager/csimanager/volume_test: snake_case details
The `csi_plugins` and `csi_volumes` tables were missing support for
snapshot persist and restore. This means restoring a snapshot would
result in missing information for CSI.
In tests where the logger is a test logger, emitting a trace log in a
background thread while it's shutting down may trigger a panic. Thus
avoid logging Trace if err != nil. Note that we already log an error
when err isn't a trace.
This fixes cases where tests panic with a trace like:
```
panic: Log in goroutine after TestAllocGarbageCollector_MakeRoomFor_MaxAllocs has completed
goroutine 30 [running]:
testing.(*common).logDepth(0xc000aa9e60, 0xc000c4a000, 0xab, 0x3)
/usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:680 +0x4d3
testing.(*common).log(...)
/usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:662
testing.(*common).Logf(0xc000aa9e60, 0x690b941, 0x4, 0xc001366c00, 0x2, 0x2)
/usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:701 +0x7e
github.com/hashicorp/nomad/helper/testlog.(*writer).Write(0xc000a82a60, 0xc0000b48c0, 0xab, 0x13f, 0x0, 0x0, 0x0)
/Users/notnoop/go/src/github.com/hashicorp/nomad/helper/testlog/testlog.go:34 +0x106
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*writer).Flush(0xc000a80900, 0xbf9870f000000001, 0x20a87556e, 0x8b12bc0)
/Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/writer.go:29 +0x14f
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).log(0xc000e2c180, 0xc0003b6880, 0x17, 0x1, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
/Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:139 +0x15d
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).Trace(0xc000e2c180, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
/Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:446 +0x7a
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*interceptLogger).Trace(0xc0002f1ad0, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6)
/Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/interceptlogger.go:48 +0x9c
github.com/hashicorp/nomad/nomad/drainer.(*drainingJobWatcher).watch(0xc0002f2380)
/Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:147 +0x1125
created by github.com/hashicorp/nomad/nomad/drainer.NewDrainingJobWatcher
/Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:89 +0x1e3
FAIL github.com/hashicorp/nomad/client 10.605s
FAIL
```
* added method to retrieve all scaling policies for use in snapshotting, plus test
* better testing for ScalingPoliciesByNamespace
* added scaling policy snapshot persist and restore (and test of restore)
manually tested snapshot restore.
resolves#7539
* nomad/structs/csi: delete just one plugin type from a node
* nomad/structs/csi: add DeleteAlloc
* nomad/state/state_store: add deleteJobFromPlugin
* nomad/state/state_store: use DeleteAlloc not DeleteNodeType
* move CreateTestCSIPlugin to state to avoid an import cycle
* nomad/state/state_store_test: delete a plugin by deleting its jobs
* nomad/*_test: move CreateTestCSIPlugin to state
* nomad/state/state_store: update one plugin per transaction
* command/plugin_status_test: move CreateTestCSIPlugin
* nomad: csi: handle nils CSIPlugin methods, clarity
If not in use and not changing external ids, it should not be an error to register a volume again.
* nomad/state/state_store: make volume registration idempotent
TestStateStore_Indexes specifically tests for `nodes` index, but asserts
on the exact number of indexes present in the state. This is fragile
and will break almost everytime we add a state index.