Merge branch 'master' into b-reserved-scoring

Michael Schurter 2020-04-30 14:48:14 -07:00 committed by GitHub
commit c901d0e7dd
154 changed files with 5992 additions and 1727 deletions


@ -1,9 +1,30 @@
## 0.11.1 (Unreleased) ## 0.11.2 (Unreleased)
FEATURES:
* **Task dependencies UI**: task lifecycle charts and details
BUG FIXES:
* api: autoscaling policies should not be returned for stopped jobs [[GH-7768](https://github.com/hashicorp/nomad/issues/7768)]
* core: job scale status endpoint was returning incorrect counts [[GH-7789](https://github.com/hashicorp/nomad/issues/7789)]
* core: Fixed a bug where scores for allocations were biased toward nodes with resource reservations [[GH-7730](https://github.com/hashicorp/nomad/issues/7730)]
* jobspec: autoscaling policy block should return a parsing error when multiple `policy` blocks are provided [[GH-7716](https://github.com/hashicorp/nomad/issues/7716)]
* ui: Fixed a bug where exec popup had incorrect URL for jobs where name ≠ id [[GH-7814](https://github.com/hashicorp/nomad/issues/7814)]
## 0.11.1 (April 22, 2020)
BUG FIXES: BUG FIXES:
* core: Fixed a bug that only ran a task `shutdown_delay` if the task had a registered service [[GH-7663](https://github.com/hashicorp/nomad/issues/7663)] * core: Fixed a bug that only ran a task `shutdown_delay` if the task had a registered service [[GH-7663](https://github.com/hashicorp/nomad/issues/7663)]
* core: Fixed a bug where scores for allocations were biased toward nodes with resource reservations [[GH-7730](https://github.com/hashicorp/nomad/issues/7730)] * core: Fixed a panic when garbage collecting a job with allocations spanning multiple versions [[GH-7758](https://github.com/hashicorp/nomad/issues/7758)]
* agent: Fixed a bug where http server logs did not honor json log formatting, and reduced http server logging level to Trace [[GH-7748](https://github.com/hashicorp/nomad/issues/7748)]
* connect: Fixed bugs where some connect parameters would be ignored [[GH-7690](https://github.com/hashicorp/nomad/pull/7690)] [[GH-7684](https://github.com/hashicorp/nomad/pull/7684)]
* connect: Fixed a bug where an absent connect sidecar_service stanza would trigger panic [[GH-7683](https://github.com/hashicorp/nomad/pull/7683)]
* connect: Fixed a bug where some connect proxy fields would be dropped from 'job inspect' output [[GH-7397](https://github.com/hashicorp/nomad/issues/7397)]
* csi: Fixed a panic when claiming a volume for an allocation that was already garbage collected [[GH-7760](https://github.com/hashicorp/nomad/issues/7760)]
* csi: Fixed a bug where CSI plugins with `NODE_STAGE_VOLUME` capabilities were receiving an incorrect volume ID [[GH-7754](https://github.com/hashicorp/nomad/issues/7754)]
* driver/docker: Fixed a bug where retrying failed docker creation may in rare cases trigger a panic [[GH-7749](https://github.com/hashicorp/nomad/issues/7749)]
* scheduler: Fixed a bug in managing allocated devices for a job allocation in in-place update scenarios [[GH-7762](https://github.com/hashicorp/nomad/issues/7762)]
* vault: Upgrade http2 library to fix Vault API calls that fail with `http2: no cached connection was available` [[GH-7673](https://github.com/hashicorp/nomad/issues/7673)] * vault: Upgrade http2 library to fix Vault API calls that fail with `http2: no cached connection was available` [[GH-7673](https://github.com/hashicorp/nomad/issues/7673)]
## 0.11.0 (April 8, 2020) ## 0.11.0 (April 8, 2020)
@ -62,7 +83,7 @@ BUG FIXES:
SECURITY: SECURITY:
* server: Override content-type headers for unsafe content. CVE-TBD [[GH-7468](https://github.com/hashicorp/nomad/issues/7468)] * server: Override content-type headers for unsafe content. CVE-2020-10944 [[GH-7468](https://github.com/hashicorp/nomad/issues/7468)]
## 0.10.4 (February 19, 2020) ## 0.10.4 (February 19, 2020)


@ -175,11 +175,7 @@ deps: ## Install build and development dependencies
GO111MODULE=on go get -u gotest.tools/gotestsum GO111MODULE=on go get -u gotest.tools/gotestsum
GO111MODULE=on go get -u github.com/fatih/hclfmt GO111MODULE=on go get -u github.com/fatih/hclfmt
GO111MODULE=on go get -u github.com/golang/protobuf/protoc-gen-go@v1.3.4 GO111MODULE=on go get -u github.com/golang/protobuf/protoc-gen-go@v1.3.4
GO111MODULE=on go get -u github.com/hashicorp/go-msgpack/codec/codecgen@v1.1.5
# The tag here must correspond to the codec version nomad uses, e.g. v1.1.5.
# Though, v1.1.5 codecgen has a bug in code generator, so using a specific sha
# here instead.
GO111MODULE=on go get -u github.com/hashicorp/go-msgpack/codec/codecgen@f51b5189210768cf0d476580cf287620374d4f02
.PHONY: lint-deps .PHONY: lint-deps
lint-deps: ## Install linter dependencies lint-deps: ## Install linter dependencies
@ -200,11 +196,15 @@ check: ## Lint the source code
@golangci-lint run -j 1 @golangci-lint run -j 1
@echo "==> Spell checking website..." @echo "==> Spell checking website..."
@misspell -error -source=text website/source/ @misspell -error -source=text website/pages/
@echo "==> Check proto files are in-sync..." @echo "==> Check proto files are in-sync..."
@$(MAKE) proto @$(MAKE) proto
@if (git status | grep -q .pb.go); then echo the following proto files are out of sync; git status |grep .pb.go; exit 1; fi @if (git status -s | grep -q .pb.go); then echo the following proto files are out of sync; git status -s | grep .pb.go; exit 1; fi
@echo "==> Check format of jobspecs and HCL files..."
@$(MAKE) hclfmt
@if (git status -s | grep -q -e '\.hcl$$' -e '\.nomad$$'); then echo the following HCL files are out of sync; git status -s | grep -e '\.hcl$$' -e '\.nomad$$'; exit 1; fi
@echo "==> Check API package is isolated from rest" @echo "==> Check API package is isolated from rest"
@if go list --test -f '{{ join .Deps "\n" }}' ./api | grep github.com/hashicorp/nomad/ | grep -v -e /vendor/ -e /nomad/api/ -e nomad/api.test; then echo " /api package depends the ^^ above internal nomad packages. Remove such dependency"; exit 1; fi @if go list --test -f '{{ join .Deps "\n" }}' ./api | grep github.com/hashicorp/nomad/ | grep -v -e /vendor/ -e /nomad/api/ -e nomad/api.test; then echo " /api package depends the ^^ above internal nomad packages. Remove such dependency"; exit 1; fi
@ -229,7 +229,7 @@ generate-structs: ## Update generated code
.PHONY: proto .PHONY: proto
proto: proto:
@echo "--> Generating proto bindings..." @echo "--> Generating proto bindings..."
@for file in $$(git ls-files "*.proto" | grep -v "vendor\/.*.proto"); do \ @for file in $$(git ls-files "*.proto" | grep -E -v -- "vendor\/.*.proto|demo\/.*.proto"); do \
protoc -I . -I ../../.. --go_out=plugins=grpc:. $$file; \ protoc -I . -I ../../.. --go_out=plugins=grpc:. $$file; \
done done


@ -10,7 +10,7 @@ import (
const ( const (
// The following levels are the only valid values for the `policy = "read"` stanza. // The following levels are the only valid values for the `policy = "read"` stanza.
// When policies are merged together, the most privilege is granted, except for deny // When policies are merged together, the most privilege is granted, except for deny
// which always takes precedence and supercedes. // which always takes precedence and supersedes.
PolicyDeny = "deny" PolicyDeny = "deny"
PolicyRead = "read" PolicyRead = "read"
PolicyList = "list" PolicyList = "list"


@ -45,7 +45,7 @@ type Jobs struct {
client *Client client *Client
} }
// JobsParseRequest is used for arguments of the /vi/jobs/parse endpoint // JobsParseRequest is used for arguments of the /v1/jobs/parse endpoint
type JobsParseRequest struct { type JobsParseRequest struct {
// JobHCL is an hcl jobspec // JobHCL is an hcl jobspec
JobHCL string JobHCL string
@ -60,7 +60,7 @@ func (c *Client) Jobs() *Jobs {
return &Jobs{client: c} return &Jobs{client: c}
} }
// Parse is used to convert the HCL representation of a Job to JSON server side. // ParseHCL is used to convert the HCL representation of a Job to JSON server side.
// To parse the HCL client side see package github.com/hashicorp/nomad/jobspec // To parse the HCL client side see package github.com/hashicorp/nomad/jobspec
func (j *Jobs) ParseHCL(jobHCL string, canonicalize bool) (*Job, error) { func (j *Jobs) ParseHCL(jobHCL string, canonicalize bool) (*Job, error) {
var job Job var job Job
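As context for the `ParseHCL` rename above, here is a minimal client-side sketch of calling this endpoint. It assumes a Nomad agent reachable at the API client's default address, and the one-line jobspec is purely illustrative:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	// Assumes a Nomad agent is listening at the default address
	// (http://127.0.0.1:4646).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Illustrative jobspec; ParseHCL posts it to /v1/jobs/parse and returns
	// the server-side JSON representation as an *api.Job.
	hcl := `job "example" { datacenters = ["dc1"] }`

	job, err := client.Jobs().ParseHCL(hcl, true)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("parsed job:", *job.ID)
}
```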


@ -125,7 +125,7 @@ func (a *allocHealthSetter) SetHealth(healthy, isDeploy bool, trackerTaskEvents
a.ar.allocBroadcaster.Send(calloc) a.ar.allocBroadcaster.Send(calloc)
} }
// initRunnerHooks intializes the runners hooks. // initRunnerHooks initializes the runners hooks.
func (ar *allocRunner) initRunnerHooks(config *clientconfig.Config) error { func (ar *allocRunner) initRunnerHooks(config *clientconfig.Config) error {
hookLogger := ar.logger.Named("runner_hook") hookLogger := ar.logger.Named("runner_hook")


@ -104,6 +104,7 @@ func (c *csiHook) claimVolumesFromAlloc() (map[string]*volumeAndRequest, error)
req := &structs.CSIVolumeClaimRequest{ req := &structs.CSIVolumeClaimRequest{
VolumeID: pair.request.Source, VolumeID: pair.request.Source,
AllocationID: c.alloc.ID, AllocationID: c.alloc.ID,
NodeID: c.alloc.NodeID,
Claim: claimType, Claim: claimType,
} }
req.Region = c.alloc.Job.Region req.Region = c.alloc.Job.Region


@ -143,12 +143,12 @@ func (c *CSI) ControllerDetachVolume(req *structs.ClientCSIControllerDetachVolum
csiReq := req.ToCSIRequest() csiReq := req.ToCSIRequest()
// Submit the request for a volume to the CSI Plugin. // Submit the request for a volume to the CSI Plugin.
ctx, cancelFn := context.WithTimeout(context.Background(), 30*time.Second) ctx, cancelFn := c.requestContext()
defer cancelFn() defer cancelFn()
// CSI ControllerUnpublishVolume errors for timeout, codes.Unavailable and // CSI ControllerUnpublishVolume errors for timeout, codes.Unavailable and
// codes.ResourceExhausted are retried; all other errors are fatal. // codes.ResourceExhausted are retried; all other errors are fatal.
_, err = plugin.ControllerUnpublishVolume(ctx, csiReq, _, err = plugin.ControllerUnpublishVolume(ctx, csiReq,
grpc_retry.WithPerRetryTimeout(10*time.Second), grpc_retry.WithPerRetryTimeout(CSIPluginRequestTimeout),
grpc_retry.WithMax(3), grpc_retry.WithMax(3),
grpc_retry.WithBackoff(grpc_retry.BackoffExponential(100*time.Millisecond))) grpc_retry.WithBackoff(grpc_retry.BackoffExponential(100*time.Millisecond)))
if err != nil { if err != nil {


@ -21,14 +21,14 @@ import (
const ( const (
// AwsMetadataTimeout is the timeout used when contacting the AWS metadata // AwsMetadataTimeout is the timeout used when contacting the AWS metadata
// service // services.
AwsMetadataTimeout = 2 * time.Second AwsMetadataTimeout = 2 * time.Second
) )
// map of instance type to approximate speed, in Mbits/s // map of instance type to approximate speed, in Mbits/s
// Estimates from http://stackoverflow.com/a/35806587 // Estimates from http://stackoverflow.com/a/35806587
// This data is meant for a loose approximation // This data is meant for a loose approximation
var ec2InstanceSpeedMap = map[*regexp.Regexp]int{ var ec2NetSpeedTable = map[*regexp.Regexp]int{
regexp.MustCompile("t2.nano"): 30, regexp.MustCompile("t2.nano"): 30,
regexp.MustCompile("t2.micro"): 70, regexp.MustCompile("t2.micro"): 70,
regexp.MustCompile("t2.small"): 125, regexp.MustCompile("t2.small"): 125,
@ -46,6 +46,353 @@ var ec2InstanceSpeedMap = map[*regexp.Regexp]int{
regexp.MustCompile(`.*\.32xlarge`): 10000, regexp.MustCompile(`.*\.32xlarge`): 10000,
} }
type ec2Specs struct {
mhz float64
cores int
model string
}
func (e ec2Specs) ticks() int {
return int(e.mhz) * e.cores
}
func specs(ghz float64, vCores int, model string) ec2Specs {
return ec2Specs{
mhz: ghz * 1000,
cores: vCores,
model: model,
}
}
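A small, self-contained sketch of how these helpers combine; the t3a.2xlarge figures are taken from the table below and line up with the 20000 total-compute value asserted in the new fingerprint tests:

```go
package main

import "fmt"

// Local copies of the unexported helpers above, for illustration only.
type ec2Specs struct {
	mhz   float64
	cores int
	model string
}

func (e ec2Specs) ticks() int { return int(e.mhz) * e.cores }

func specs(ghz float64, vCores int, model string) ec2Specs {
	return ec2Specs{mhz: ghz * 1000, cores: vCores, model: model}
}

func main() {
	// t3a.2xlarge: 2.5 GHz and 8 vCPUs per the table below.
	s := specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series")
	fmt.Println(s.ticks()) // 2500 MHz * 8 cores = 20000, reported as cpu.totalcompute
}
```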
// Map of instance type to documented CPU speed.
//
// Most values are taken from https://aws.amazon.com/ec2/instance-types/.
// Values for a1 & m6g (Graviton) are taken from https://en.wikichip.org/wiki/annapurna_labs/alpine/al73400
// Values for inf1 are taken from launching an inf1.xlarge and looking at /proc/cpuinfo
//
// In a few cases, AWS has upgraded the generation of CPU while keeping the same
// instance designation. Since it is possible to launch on the lower performance
// CPU, that one is used as the spec for the instance type.
//
// This table is provided as a best-effort to determine the number of CPU ticks
// available for use by Nomad tasks. If an instance type is missing, the fallback
// behavior is to use values from go-psutil, which is only capable of reading
// "current" CPU MHz.
var ec2ProcSpeedTable = map[string]ec2Specs{
// -- General Purpose --
// a1
"a1.medium": specs(2.3, 1, "AWS Graviton"),
"a1.large": specs(2.3, 2, "AWS Graviton"),
"a1.xlarge": specs(2.3, 4, "AWS Graviton"),
"a1.2xlarge": specs(2.3, 8, "AWS Graviton"),
"a1.4xlarge": specs(2.3, 16, "AWS Graviton"),
"a1.metal": specs(2.3, 16, "AWS Graviton"),
// t3
"t3.nano": specs(2.5, 2, "2.5 GHz Intel Scalable"),
"t3.micro": specs(2.5, 2, "2.5 GHz Intel Scalable"),
"t3.small": specs(2.5, 2, "2.5 GHz Intel Scalable"),
"t3.medium": specs(2.5, 2, "2.5 GHz Intel Scalable"),
"t3.large": specs(2.5, 2, "2.5 GHz Intel Scalable"),
"t3.xlarge": specs(2.5, 4, "2.5 GHz Intel Scalable"),
"t3.2xlarge": specs(2.5, 8, "2.5 GHz Intel Scalable"),
// t3a
"t3a.nano": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"t3a.micro": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"t3a.small": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"t3a.medium": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"t3a.large": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"t3a.xlarge": specs(2.5, 4, "2.5 GHz AMD EPYC 7000 series"),
"t3a.2xlarge": specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series"),
// t2
"t2.nano": specs(3.3, 1, "3.3 GHz Intel Scalable"),
"t2.micro": specs(3.3, 1, "3.3 GHz Intel Scalable"),
"t2.small": specs(3.3, 1, "3.3 GHz Intel Scalable"),
"t2.medium": specs(3.3, 2, "3.3 GHz Intel Scalable"),
"t2.large": specs(3.0, 2, "3.0 GHz Intel Scalable"),
"t2.xlarge": specs(3.0, 4, "3.0 GHz Intel Scalable"),
"t2.2xlarge": specs(3.0, 8, "3.0 GHz Intel Scalable"),
// m6g
"m6g.medium": specs(2.3, 1, "AWS Graviton2 Neoverse"),
"m6g.large": specs(2.3, 2, "AWS Graviton2 Neoverse"),
"m6g.xlarge": specs(2.3, 4, "AWS Graviton2 Neoverse"),
"m6g.2xlarge": specs(2.3, 8, "AWS Graviton2 Neoverse"),
"m6g.4xlarge": specs(2.3, 16, "AWS Graviton2 Neoverse"),
"m6g.8xlarge": specs(2.3, 32, "AWS Graviton2 Neoverse"),
"m6g.12xlarge": specs(2.3, 48, "AWS Graviton2 Neoverse"),
"m6g.16xlarge": specs(2.3, 64, "AWS Graviton2 Neoverse"),
// m5, m5d
"m5.large": specs(3.1, 2, "3.1 GHz Intel Xeon Platinum"),
"m5.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Platinum"),
"m5.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Platinum"),
"m5.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Platinum"),
"m5.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Platinum"),
"m5.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Platinum"),
"m5.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Platinum"),
"m5.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum"),
"m5.metal": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum"),
"m5d.large": specs(3.1, 2, "3.1 GHz Intel Xeon Platinum"),
"m5d.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Platinum"),
"m5d.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Platinum"),
"m5d.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Platinum"),
"m5d.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Platinum"),
"m5d.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Platinum"),
"m5d.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Platinum"),
"m5d.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum"),
"m5d.metal": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum"),
// m5a, m5ad
"m5a.large": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"m5a.xlarge": specs(2.5, 4, "2.5 GHz AMD EPYC 7000 series"),
"m5a.2xlarge": specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series"),
"m5a.4xlarge": specs(2.5, 16, "2.5 GHz AMD EPYC 7000 series"),
"m5a.8xlarge": specs(2.5, 32, "2.5 GHz AMD EPYC 7000 series"),
"m5a.12xlarge": specs(2.5, 48, "2.5 GHz AMD EPYC 7000 series"),
"m5a.16xlarge": specs(2.5, 64, "2.5 GHz AMD EPYC 7000 series"),
"m5a.24xlarge": specs(2.5, 96, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.large": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.xlarge": specs(2.5, 4, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.2xlarge": specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.4xlarge": specs(2.5, 16, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.12xlarge": specs(2.5, 48, "2.5 GHz AMD EPYC 7000 series"),
"m5ad.24xlarge": specs(2.5, 96, "2.5 GHz AMD EPYC 7000 series"),
// m5n, m5dn
"m5n.large": specs(3.1, 2, "3.1 GHz Intel Xeon Scalable"),
"m5n.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Scalable"),
"m5n.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Scalable"),
"m5n.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Scalable"),
"m5n.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Scalable"),
"m5n.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Scalable"),
"m5n.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Scalable"),
"m5n.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
"m5dn.large": specs(3.1, 2, "3.1 GHz Intel Xeon Scalable"),
"m5dn.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Scalable"),
"m5dn.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Scalable"),
"m5dn.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Scalable"),
"m5dn.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Scalable"),
"m5dn.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Scalable"),
"m5dn.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Scalable"),
"m5dn.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
// m4
"m4.large": specs(2.3, 2, "2.3 GHz Intel Xeon® E5-2686 v4"),
"m4.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon® E5-2686 v4"),
"m4.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon® E5-2686 v4"),
"m4.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon® E5-2686 v4"),
"m4.10xlarge": specs(2.3, 40, "2.3 GHz Intel Xeon® E5-2686 v4"),
"m4.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon® E5-2686 v4"),
// -- Compute Optimized --
// c5, c5d
"c5.large": specs(3.4, 2, "3.4 GHz Intel Xeon Platinum 8000"),
"c5.xlarge": specs(3.4, 4, "3.4 GHz Intel Xeon Platinum 8000"),
"c5.2xlarge": specs(3.4, 8, "3.4 GHz Intel Xeon Platinum 8000"),
"c5.4xlarge": specs(3.4, 16, "3.4 GHz Intel Xeon Platinum 8000"),
"c5.9xlarge": specs(3.4, 36, "3.4 GHz Intel Xeon Platinum 8000"),
"c5.12xlarge": specs(3.6, 48, "3.6 GHz Intel Xeon Scalable"),
"c5.18xlarge": specs(3.6, 72, "3.6 GHz Intel Xeon Scalable"),
"c5.24xlarge": specs(3.6, 96, "3.6 GHz Intel Xeon Scalable"),
"c5.metal": specs(3.6, 96, "3.6 GHz Intel Xeon Scalable"),
"c5d.large": specs(3.4, 2, "3.4 GHz Intel Xeon Platinum 8000"),
"c5d.xlarge": specs(3.4, 4, "3.4 GHz Intel Xeon Platinum 8000"),
"c5d.2xlarge": specs(3.4, 8, "3.4 GHz Intel Xeon Platinum 8000"),
"c5d.4xlarge": specs(3.4, 16, "3.4 GHz Intel Xeon Platinum 8000"),
"c5d.9xlarge": specs(3.4, 36, "3.4 GHz Intel Xeon Platinum 8000"),
"c5d.12xlarge": specs(3.6, 48, "3.6 GHz Intel Xeon Scalable"),
"c5d.18xlarge": specs(3.6, 72, "3.6 GHz Intel Xeon Scalable"),
"c5d.24xlarge": specs(3.6, 96, "3.6 GHz Intel Xeon Scalable"),
"c5d.metal": specs(3.6, 96, "3.6 GHz Intel Xeon Scalable"),
// c5n
"c5n.large": specs(3.0, 2, "3.0 GHz Intel Xeon Platinum"),
"c5n.xlarge": specs(3.0, 4, "3.0 GHz Intel Xeon Platinum"),
"c5n.2xlarge": specs(3.0, 8, "3.0 GHz Intel Xeon Platinum"),
"c5n.4xlarge": specs(3.0, 16, "3.0 GHz Intel Xeon Platinum"),
"c5n.9xlarge": specs(3.0, 36, "3.0 GHz Intel Xeon Platinum"),
"c5n.18xlarge": specs(3.0, 72, "3.0 GHz Intel Xeon Platinum"),
"c5n.metal": specs(3.0, 72, "3.0 GHz Intel Xeon Platinum"),
// c4
"c4.large": specs(2.9, 2, "2.9 GHz Intel Xeon E5-2666 v3"),
"c4.xlarge": specs(2.9, 4, "2.9 GHz Intel Xeon E5-2666 v3"),
"c4.2xlarge": specs(2.9, 8, "2.9 GHz Intel Xeon E5-2666 v3"),
"c4.4xlarge": specs(2.9, 16, "2.9 GHz Intel Xeon E5-2666 v3"),
"c4.8xlarge": specs(2.9, 36, "2.9 GHz Intel Xeon E5-2666 v3"),
// -- Memory Optimized --
// r5, r5d
"r5.large": specs(3.1, 2, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum 8175"),
"r5.metal": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.large": specs(3.1, 2, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum 8175"),
"r5d.metal": specs(3.1, 96, "3.1 GHz Intel Xeon Platinum 8175"),
// r5a, r5ad
"r5a.large": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"r5a.xlarge": specs(2.5, 4, "2.5 GHz AMD EPYC 7000 series"),
"r5a.2xlarge": specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series"),
"r5a.4xlarge": specs(2.5, 16, "2.5 GHz AMD EPYC 7000 series"),
"r5a.8xlarge": specs(2.5, 32, "2.5 GHz AMD EPYC 7000 series"),
"r5a.12xlarge": specs(2.5, 48, "2.5 GHz AMD EPYC 7000 series"),
"r5a.16xlarge": specs(2.5, 64, "2.5 GHz AMD EPYC 7000 series"),
"r5a.24xlarge": specs(2.5, 96, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.large": specs(2.5, 2, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.xlarge": specs(2.5, 4, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.2xlarge": specs(2.5, 8, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.4xlarge": specs(2.5, 16, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.8xlarge": specs(2.5, 32, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.12xlarge": specs(2.5, 48, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.16xlarge": specs(2.5, 64, "2.5 GHz AMD EPYC 7000 series"),
"r5ad.24xlarge": specs(2.5, 96, "2.5 GHz AMD EPYC 7000 series"),
// r5n
"r5n.large": specs(3.1, 2, "3.1 GHz Intel Xeon Scalable"),
"r5n.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Scalable"),
"r5n.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Scalable"),
"r5n.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Scalable"),
"r5n.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Scalable"),
"r5n.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Scalable"),
"r5n.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Scalable"),
"r5n.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
"r5dn.large": specs(3.1, 2, "3.1 GHz Intel Xeon Scalable"),
"r5dn.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Scalable"),
"r5dn.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Scalable"),
"r5dn.4xlarge": specs(3.1, 16, "3.1 GHz Intel Xeon Scalable"),
"r5dn.8xlarge": specs(3.1, 32, "3.1 GHz Intel Xeon Scalable"),
"r5dn.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Scalable"),
"r5dn.16xlarge": specs(3.1, 64, "3.1 GHz Intel Xeon Scalable"),
"r5dn.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
// r4
"r4.large": specs(2.3, 2, "2.3 GHz Intel Xeon E5-2686 v4"),
"r4.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon E5-2686 v4"),
"r4.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon E5-2686 v4"),
"r4.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon E5-2686 v4"),
"r4.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5-2686 v4"),
"r4.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5-2686 v4"),
// x1e
"x1e.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1e.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1e.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1e.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1e.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1e.32xlarge": specs(2.3, 128, "2.3 GHz Intel Xeon E7-8880 v3"),
// x1
"x1.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E7-8880 v3"),
"x1.32xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E7-8880 v3"),
// high-memory
"u-6tb1.metal": specs(2.1, 448, "2.1 GHz Intel Xeon Platinum 8176M"),
"u-9tb1.metal": specs(2.1, 448, "2.1 GHz Intel Xeon Platinum 8176M"),
"u-12tb1.metal": specs(2.1, 448, "2.1 GHz Intel Xeon Platinum 8176M"),
"u-18tb1.metal": specs(2.7, 448, "2.7 GHz Intel Xeon Scalable"),
"u-24tb1.metal": specs(2.7, 448, "2.7 GHz Intel Xeon Scalable"),
// z1d
"z1d.large": specs(4.0, 2, "4.0 GHz Intel Xeon Scalable"),
"z1d.xlarge": specs(4.0, 4, "4.0 GHz Intel Xeon Scalable"),
"z1d.2xlarge": specs(4.0, 8, "4.0 GHz Intel Xeon Scalable"),
"z1d.3xlarge": specs(4.0, 12, "4.0 GHz Intel Xeon Scalable"),
"z1d.6xlarge": specs(4.0, 24, "4.0 GHz Intel Xeon Scalable"),
"z1d.12xlarge": specs(4.0, 48, "4.0 GHz Intel Xeon Scalable"),
"z1d.metal": specs(4.0, 48, "4.0 GHz Intel Xeon Scalable"),
// -- Accelerated Computing --
// p3, p3dn
"p3.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon E5-2686 v4"),
"p3.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5-2686 v4"),
"p3.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5-2686 v4"),
"p3dn.24xlarge": specs(2.5, 96, "2.5 GHz Intel Xeon P-8175M"),
// p2
"p2.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon E5-2686 v4"),
"p2.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5-2686 v4"),
"p2.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5-2686 v4"),
// inf1
"inf1.xlarge": specs(3.0, 4, "3.0 GHz Intel Xeon Platinum 8275CL"),
"inf1.2xlarge": specs(3.0, 8, "3.0 GHz Intel Xeon Platinum 8275CL"),
"inf1.6xlarge": specs(3.0, 24, "3.0 GHz Intel Xeon Platinum 8275CL"),
"inf1.24xlarge": specs(3.0, 96, "3.0 GHz Intel Xeon Platinum 8275CL"),
// g4dn
"g4dn.xlarge": specs(2.5, 4, "2.5 GHz Cascade Lake 24C"),
"g4dn.2xlarge": specs(2.5, 8, "2.5 GHz Cascade Lake 24C"),
"g4dn.4xlarge": specs(2.5, 16, "2.5 GHz Cascade Lake 24C"),
"g4dn.8xlarge": specs(2.5, 32, "2.5 GHz Cascade Lake 24C"),
"g4dn.16xlarge": specs(2.5, 64, "2.5 GHz Cascade Lake 24C"),
"g4dn.12xlarge": specs(2.5, 48, "2.5 GHz Cascade Lake 24C"),
"g4dn.metal": specs(2.5, 96, "2.5 GHz Cascade Lake 24C"),
// g3
"g3s.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon E5-2686 v4"),
"g3s.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon E5-2686 v4"),
"g3s.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5-2686 v4"),
"g3s.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5-2686 v4"),
// f1
"f1.2xlarge": specs(2.3, 8, "Intel Xeon E5-2686 v4"),
"f1.4xlarge": specs(2.3, 16, "Intel Xeon E5-2686 v4"),
"f1.16xlarge": specs(2.3, 64, "Intel Xeon E5-2686 v4"),
// -- Storage Optimized --
// i3
"i3.large": specs(2.3, 2, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.xlarge": specs(2.3, 4, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5 2686 v4"),
"i3.metal": specs(2.3, 72, "2.3 GHz Intel Xeon E5 2686 v4"),
// i3en
"i3en.large": specs(3.1, 2, "3.1 GHz Intel Xeon Scalable"),
"i3en.xlarge": specs(3.1, 4, "3.1 GHz Intel Xeon Scalable"),
"i3en.2xlarge": specs(3.1, 8, "3.1 GHz Intel Xeon Scalable"),
"i3en.3xlarge": specs(3.1, 12, "3.1 GHz Intel Xeon Scalable"),
"i3en.6xlarge": specs(3.1, 24, "3.1 GHz Intel Xeon Scalable"),
"i3en.12xlarge": specs(3.1, 48, "3.1 GHz Intel Xeon Scalable"),
"i3en.24xlarge": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
"i3en.metal": specs(3.1, 96, "3.1 GHz Intel Xeon Scalable"),
// d2
"d2.xlarge": specs(2.4, 4, "2.4 GHz Intel Xeon E5-2676 v3"),
"d2.2xlarge": specs(2.4, 8, "2.4 GHz Intel Xeon E5-2676 v3"),
"d2.4xlarge": specs(2.4, 16, "2.4 GHz Intel Xeon E5-2676 v3"),
"d2.8xlarge": specs(2.4, 36, "2.4 GHz Intel Xeon E5-2676 v3"),
// h1
"h1.2xlarge": specs(2.3, 8, "2.3 GHz Intel Xeon E5 2686 v4"),
"h1.4xlarge": specs(2.3, 16, "2.3 GHz Intel Xeon E5 2686 v4"),
"h1.8xlarge": specs(2.3, 32, "2.3 GHz Intel Xeon E5 2686 v4"),
"h1.16xlarge": specs(2.3, 64, "2.3 GHz Intel Xeon E5 2686 v4"),
}
// EnvAWSFingerprint is used to fingerprint AWS metadata // EnvAWSFingerprint is used to fingerprint AWS metadata
type EnvAWSFingerprint struct { type EnvAWSFingerprint struct {
StaticFingerprinter StaticFingerprinter
@ -128,25 +475,48 @@ func (f *EnvAWSFingerprint) Fingerprint(request *FingerprintRequest, response *F
response.AddAttribute(key, v) response.AddAttribute(key, v)
} }
// newNetwork is populated and added to the Nodes resources // accumulate resource information, then assign to response
var newNetwork *structs.NetworkResource var resources *structs.Resources
var nodeResources *structs.NodeResources
// copy over network specific information // copy over network specific information
if val, ok := response.Attributes["unique.platform.aws.local-ipv4"]; ok && val != "" { if val, ok := response.Attributes["unique.platform.aws.local-ipv4"]; ok && val != "" {
response.AddAttribute("unique.network.ip-address", val) response.AddAttribute("unique.network.ip-address", val)
nodeResources = new(structs.NodeResources)
newNetwork = &structs.NetworkResource{ nodeResources.Networks = []*structs.NetworkResource{
Device: "eth0", {
IP: val, Device: "eth0",
CIDR: val + "/32", IP: val,
MBits: f.throughput(request, ec2meta, val), CIDR: val + "/32",
} MBits: f.throughput(request, ec2meta, val),
},
response.NodeResources = &structs.NodeResources{
Networks: []*structs.NetworkResource{newNetwork},
} }
} }
// copy over CPU speed information
if specs := f.lookupCPU(ec2meta); specs != nil {
response.AddAttribute("cpu.modelname", specs.model)
response.AddAttribute("cpu.frequency", fmt.Sprintf("%.0f", specs.mhz))
response.AddAttribute("cpu.numcores", fmt.Sprintf("%d", specs.cores))
f.logger.Debug("lookup ec2 cpu", "cores", specs.cores, "MHz", log.Fmt("%.0f", specs.mhz), "model", specs.model)
if ticks := specs.ticks(); request.Config.CpuCompute <= 0 {
response.AddAttribute("cpu.totalcompute", fmt.Sprintf("%d", ticks))
f.logger.Debug("setting ec2 cpu ticks", "ticks", ticks)
resources = new(structs.Resources)
resources.CPU = ticks
if nodeResources == nil {
nodeResources = new(structs.NodeResources)
}
nodeResources.Cpu = structs.NodeCpuResources{CpuShares: int64(ticks)}
}
} else {
f.logger.Warn("failed to find the cpu specification for this instance type")
}
response.Resources = resources
response.NodeResources = nodeResources
// populate Links // populate Links
response.AddLink("aws.ec2", fmt.Sprintf("%s.%s", response.AddLink("aws.ec2", fmt.Sprintf("%s.%s",
response.Attributes["platform.aws.placement.availability-zone"], response.Attributes["platform.aws.placement.availability-zone"],
@ -156,6 +526,28 @@ func (f *EnvAWSFingerprint) Fingerprint(request *FingerprintRequest, response *F
return nil return nil
} }
func (f *EnvAWSFingerprint) instanceType(ec2meta *ec2metadata.EC2Metadata) (string, error) {
response, err := ec2meta.GetMetadata("instance-type")
if err != nil {
return "", err
}
return strings.TrimSpace(response), nil
}
func (f *EnvAWSFingerprint) lookupCPU(ec2meta *ec2metadata.EC2Metadata) *ec2Specs {
instanceType, err := f.instanceType(ec2meta)
if err != nil {
f.logger.Warn("failed to read EC2 metadata instance-type", "error", err)
return nil
}
for iType, specs := range ec2ProcSpeedTable {
if strings.EqualFold(iType, instanceType) {
return &specs
}
}
return nil
}
func (f *EnvAWSFingerprint) throughput(request *FingerprintRequest, ec2meta *ec2metadata.EC2Metadata, ip string) int { func (f *EnvAWSFingerprint) throughput(request *FingerprintRequest, ec2meta *ec2metadata.EC2Metadata, ip string) int {
throughput := request.Config.NetworkSpeed throughput := request.Config.NetworkSpeed
if throughput != 0 { if throughput != 0 {
@ -180,17 +572,15 @@ func (f *EnvAWSFingerprint) throughput(request *FingerprintRequest, ec2meta *ec2
// EnvAWSFingerprint uses lookup table to approximate network speeds // EnvAWSFingerprint uses lookup table to approximate network speeds
func (f *EnvAWSFingerprint) linkSpeed(ec2meta *ec2metadata.EC2Metadata) int { func (f *EnvAWSFingerprint) linkSpeed(ec2meta *ec2metadata.EC2Metadata) int {
instanceType, err := f.instanceType(ec2meta)
resp, err := ec2meta.GetMetadata("instance-type")
if err != nil { if err != nil {
f.logger.Error("error reading instance-type", "error", err) f.logger.Error("error reading instance-type", "error", err)
return 0 return 0
} }
key := strings.Trim(resp, "\n")
netSpeed := 0 netSpeed := 0
for reg, speed := range ec2InstanceSpeedMap { for reg, speed := range ec2NetSpeedTable {
if reg.MatchString(key) { if reg.MatchString(instanceType) {
netSpeed = speed netSpeed = speed
break break
} }
@ -210,11 +600,11 @@ func ec2MetaClient(endpoint string, timeout time.Duration) (*ec2metadata.EC2Meta
c = c.WithEndpoint(endpoint) c = c.WithEndpoint(endpoint)
} }
session, err := session.NewSession(c) sess, err := session.NewSession(c)
if err != nil { if err != nil {
return nil, err return nil, err
} }
return ec2metadata.New(session, c), nil return ec2metadata.New(sess, c), nil
} }
func isAWS(ec2meta *ec2metadata.EC2Metadata) bool { func isAWS(ec2meta *ec2metadata.EC2Metadata) bool {


@ -202,6 +202,74 @@ func TestNetworkFingerprint_AWS_IncompleteImitation(t *testing.T) {
require.Nil(t, response.NodeResources) require.Nil(t, response.NodeResources)
} }
func TestCPUFingerprint_AWS_InstanceFound(t *testing.T) {
endpoint, cleanup := startFakeEC2Metadata(t, awsStubs)
defer cleanup()
f := NewEnvAWSFingerprint(testlog.HCLogger(t))
f.(*EnvAWSFingerprint).endpoint = endpoint
node := &structs.Node{Attributes: make(map[string]string)}
request := &FingerprintRequest{Config: &config.Config{}, Node: node}
var response FingerprintResponse
err := f.Fingerprint(request, &response)
require.NoError(t, err)
require.True(t, response.Detected)
require.Equal(t, "2.5 GHz AMD EPYC 7000 series", response.Attributes["cpu.modelname"])
require.Equal(t, "2500", response.Attributes["cpu.frequency"])
require.Equal(t, "8", response.Attributes["cpu.numcores"])
require.Equal(t, "20000", response.Attributes["cpu.totalcompute"])
require.Equal(t, 20000, response.Resources.CPU)
require.Equal(t, int64(20000), response.NodeResources.Cpu.CpuShares)
}
func TestCPUFingerprint_AWS_OverrideCompute(t *testing.T) {
endpoint, cleanup := startFakeEC2Metadata(t, awsStubs)
defer cleanup()
f := NewEnvAWSFingerprint(testlog.HCLogger(t))
f.(*EnvAWSFingerprint).endpoint = endpoint
node := &structs.Node{Attributes: make(map[string]string)}
request := &FingerprintRequest{Config: &config.Config{
CpuCompute: 99999,
}, Node: node}
var response FingerprintResponse
err := f.Fingerprint(request, &response)
require.NoError(t, err)
require.True(t, response.Detected)
require.Equal(t, "2.5 GHz AMD EPYC 7000 series", response.Attributes["cpu.modelname"])
require.Equal(t, "2500", response.Attributes["cpu.frequency"])
require.Equal(t, "8", response.Attributes["cpu.numcores"])
require.NotContains(t, response.Attributes, "cpu.totalcompute")
require.Nil(t, response.Resources) // defaults in cpu fingerprinter
require.Zero(t, response.NodeResources.Cpu) // defaults in cpu fingerprinter
}
func TestCPUFingerprint_AWS_InstanceNotFound(t *testing.T) {
endpoint, cleanup := startFakeEC2Metadata(t, unknownInstanceType)
defer cleanup()
f := NewEnvAWSFingerprint(testlog.HCLogger(t))
f.(*EnvAWSFingerprint).endpoint = endpoint
node := &structs.Node{Attributes: make(map[string]string)}
request := &FingerprintRequest{Config: &config.Config{}, Node: node}
var response FingerprintResponse
err := f.Fingerprint(request, &response)
require.NoError(t, err)
require.True(t, response.Detected)
require.NotContains(t, response.Attributes, "cpu.modelname")
require.NotContains(t, response.Attributes, "cpu.frequency")
require.NotContains(t, response.Attributes, "cpu.numcores")
require.NotContains(t, response.Attributes, "cpu.totalcompute")
require.Nil(t, response.Resources)
require.Nil(t, response.NodeResources)
}
/// Utility functions for tests /// Utility functions for tests
func startFakeEC2Metadata(t *testing.T, endpoints []endpoint) (endpoint string, cleanup func()) { func startFakeEC2Metadata(t *testing.T, endpoints []endpoint) (endpoint string, cleanup func()) {
@ -252,7 +320,7 @@ var awsStubs = []endpoint{
{ {
Uri: "/latest/meta-data/instance-type", Uri: "/latest/meta-data/instance-type",
ContentType: "text/plain", ContentType: "text/plain",
Body: "m3.2xlarge", Body: "t3a.2xlarge",
}, },
{ {
Uri: "/latest/meta-data/local-hostname", Uri: "/latest/meta-data/local-hostname",
@ -276,6 +344,34 @@ var awsStubs = []endpoint{
}, },
} }
var unknownInstanceType = []endpoint{
{
Uri: "/latest/meta-data/ami-id",
ContentType: "text/plain",
Body: "ami-1234",
},
{
Uri: "/latest/meta-data/hostname",
ContentType: "text/plain",
Body: "ip-10-0-0-207.us-west-2.compute.internal",
},
{
Uri: "/latest/meta-data/placement/availability-zone",
ContentType: "text/plain",
Body: "us-west-2a",
},
{
Uri: "/latest/meta-data/instance-id",
ContentType: "text/plain",
Body: "i-b3ba3875",
},
{
Uri: "/latest/meta-data/instance-type",
ContentType: "text/plain",
Body: "xyz123.uber",
},
}
// noNetworkAWSStubs mimics an EC2 instance but without local ip address // noNetworkAWSStubs mimics an EC2 instance but without local ip address
// may happen in environments with odd EC2 Metadata emulation // may happen in environments with odd EC2 Metadata emulation
var noNetworkAWSStubs = []endpoint{ var noNetworkAWSStubs = []endpoint{


@ -2,7 +2,6 @@ package csimanager
import ( import (
"context" "context"
"fmt"
"sync" "sync"
"testing" "testing"
"time" "time"
@ -47,7 +46,6 @@ func TestInstanceManager_Shutdown(t *testing.T) {
im.shutdownCtxCancelFn = cancelFn im.shutdownCtxCancelFn = cancelFn
im.shutdownCh = make(chan struct{}) im.shutdownCh = make(chan struct{})
im.updater = func(_ string, info *structs.CSIInfo) { im.updater = func(_ string, info *structs.CSIInfo) {
fmt.Println(info)
lock.Lock() lock.Lock()
defer lock.Unlock() defer lock.Unlock()
pluginHealth = info.Healthy pluginHealth = info.Healthy


@ -166,7 +166,7 @@ func (v *volumeManager) stageVolume(ctx context.Context, vol *structs.CSIVolume,
// CSI NodeStageVolume errors for timeout, codes.Unavailable and // CSI NodeStageVolume errors for timeout, codes.Unavailable and
// codes.ResourceExhausted are retried; all other errors are fatal. // codes.ResourceExhausted are retried; all other errors are fatal.
return v.plugin.NodeStageVolume(ctx, return v.plugin.NodeStageVolume(ctx,
vol.ID, vol.RemoteID(),
publishContext, publishContext,
pluginStagingPath, pluginStagingPath,
capability, capability,


@ -1,6 +0,0 @@
#!/bin/bash
set -e
codecgen -d 102 -t codegen_generated -o structs.generated.go structs.go
sed -i'' -e 's|"github.com/ugorji/go/codec|"github.com/hashicorp/go-msgpack/codec|g' structs.generated.go


@ -1,6 +1,6 @@
package structs package structs
//go:generate ./generate.sh //go:generate codecgen -c github.com/hashicorp/go-msgpack/codec -d 102 -t codegen_generated -o structs.generated.go structs.go
import ( import (
"errors" "errors"


@ -640,10 +640,12 @@ func (c *Command) Run(args []string) int {
logGate.Flush() logGate.Flush()
return 1 return 1
} }
defer c.agent.Shutdown()
// Shutdown the HTTP server at the end
defer func() { defer func() {
c.agent.Shutdown()
// Shutdown the http server at the end, to ease debugging if
// the agent takes long to shutdown
if c.httpServer != nil { if c.httpServer != nil {
c.httpServer.Shutdown() c.httpServer.Shutdown()
} }


@ -146,6 +146,7 @@ func NewHTTPServer(agent *Agent, config *Config) (*HTTPServer, error) {
Addr: srv.Addr, Addr: srv.Addr,
Handler: gzip(mux), Handler: gzip(mux),
ConnState: makeConnState(config.TLSConfig.EnableHTTP, handshakeTimeout, maxConns), ConnState: makeConnState(config.TLSConfig.EnableHTTP, handshakeTimeout, maxConns),
ErrorLog: newHTTPServerLogger(srv.logger),
} }
go func() { go func() {
@ -466,7 +467,11 @@ func (s *HTTPServer) wrap(handler func(resp http.ResponseWriter, req *http.Reque
resp.WriteHeader(code) resp.WriteHeader(code)
resp.Write([]byte(errMsg)) resp.Write([]byte(errMsg))
s.logger.Error("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code) if isAPIClientError(code) {
s.logger.Debug("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code)
} else {
s.logger.Error("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code)
}
return return
} }
@ -520,7 +525,11 @@ func (s *HTTPServer) wrapNonJSON(handler func(resp http.ResponseWriter, req *htt
code, errMsg := errCodeFromHandler(err) code, errMsg := errCodeFromHandler(err)
resp.WriteHeader(code) resp.WriteHeader(code)
resp.Write([]byte(errMsg)) resp.Write([]byte(errMsg))
s.logger.Error("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code) if isAPIClientError(code) {
s.logger.Debug("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code)
} else {
s.logger.Error("request failed", "method", req.Method, "path", reqURL, "error", err, "code", code)
}
return return
} }
@ -532,6 +541,11 @@ func (s *HTTPServer) wrapNonJSON(handler func(resp http.ResponseWriter, req *htt
return f return f
} }
// isAPIClientError returns true if the passed http code represents a client error
func isAPIClientError(code int) bool {
return 400 <= code && code <= 499
}
// decodeBody is used to decode a JSON request body // decodeBody is used to decode a JSON request body
func decodeBody(req *http.Request, out interface{}) error { func decodeBody(req *http.Request, out interface{}) error {
dec := json.NewDecoder(req.Body) dec := json.NewDecoder(req.Body)


@ -0,0 +1,33 @@
package agent
import (
"bytes"
"log"
hclog "github.com/hashicorp/go-hclog"
)
func newHTTPServerLogger(logger hclog.Logger) *log.Logger {
return log.New(&httpServerLoggerAdapter{logger}, "", 0)
}
// a logger adapter that forwards http server logs as Trace level
// hclog log entries. Logs related to panics are forwarded at Error level.
//
// HTTP server logs are typically spurious as they represent HTTP
// client errors (e.g. TLS handshake failures).
type httpServerLoggerAdapter struct {
logger hclog.Logger
}
func (l *httpServerLoggerAdapter) Write(data []byte) (int, error) {
if bytes.Contains(data, []byte("panic")) {
str := string(bytes.TrimRight(data, " \t\n"))
l.logger.Error(str)
} else if l.logger.IsTrace() {
str := string(bytes.TrimRight(data, " \t\n"))
l.logger.Trace(str)
}
return len(data), nil
}
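A standalone sketch of how this kind of adapter bridges the standard library's `http.Server`, whose only logging hook is a `*log.Logger` in its `ErrorLog` field; the agent wires it up the same way via `newHTTPServerLogger(srv.logger)` above. The logger options here are hypothetical:

```go
package main

import (
	"bytes"
	"log"
	"net/http"
	"os"

	hclog "github.com/hashicorp/go-hclog"
)

// stdlogAdapter mirrors the adapter above: an io.Writer that forwards each
// stdlib log line to hclog, escalating panic-related lines to Error.
type stdlogAdapter struct{ logger hclog.Logger }

func (l *stdlogAdapter) Write(data []byte) (int, error) {
	msg := string(bytes.TrimRight(data, " \t\n"))
	if bytes.Contains(data, []byte("panic")) {
		l.logger.Error(msg)
	} else if l.logger.IsTrace() {
		l.logger.Trace(msg)
	}
	return len(data), nil
}

func main() {
	// Hypothetical logger; in the agent this is the server's configured hclog
	// logger, so level filtering and JSON formatting carry over automatically.
	logger := hclog.New(&hclog.LoggerOptions{Name: "http", Level: hclog.Trace, Output: os.Stderr})

	srv := &http.Server{
		Addr:     "127.0.0.1:0",
		ErrorLog: log.New(&stdlogAdapter{logger: logger}, "", 0),
	}
	_ = srv
}
```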


@ -0,0 +1,48 @@
package agent
import (
"bytes"
"testing"
"github.com/hashicorp/go-hclog"
"github.com/stretchr/testify/require"
)
func TestHttpServerLoggerFilters_Level_Info(t *testing.T) {
var buf bytes.Buffer
hclogger := hclog.New(&hclog.LoggerOptions{
Name: "testlog",
Output: &buf,
Level: hclog.Info,
})
stdlogger := newHTTPServerLogger(hclogger)
// spurious logging would be filtered out
stdlogger.Printf("spurious logging: %v", "arg")
require.Empty(t, buf.String())
// panics are included
stdlogger.Printf("panic while processing: %v", "endpoint")
require.Contains(t, buf.String(), "[ERROR] testlog: panic while processing: endpoint")
}
func TestHttpServerLoggerFilters_Level_Trace(t *testing.T) {
var buf bytes.Buffer
hclogger := hclog.New(&hclog.LoggerOptions{
Name: "testlog",
Output: &buf,
Level: hclog.Trace,
})
stdlogger := newHTTPServerLogger(hclogger)
// spurious logging will be included as Trace level
stdlogger.Printf("spurious logging: %v", "arg")
require.Contains(t, buf.String(), "[TRACE] testlog: spurious logging: arg")
stdlogger.Printf("panic while processing: %v", "endpoint")
require.Contains(t, buf.String(), "[ERROR] testlog: panic while processing: endpoint")
}


@ -1082,6 +1082,18 @@ func TestHTTPServer_Limits_OK(t *testing.T) {
} }
} }
func Test_IsAPIClientError(t *testing.T) {
trueCases := []int{400, 403, 404, 499}
for _, c := range trueCases {
require.Truef(t, isAPIClientError(c), "code: %v", c)
}
falseCases := []int{100, 300, 500, 501, 505}
for _, c := range falseCases {
require.Falsef(t, isAPIClientError(c), "code: %v", c)
}
}
func httpTest(t testing.TB, cb func(c *Config), f func(srv *TestAgent)) { func httpTest(t testing.TB, cb func(c *Config), f func(srv *TestAgent)) {
s := makeHTTPServer(t, cb) s := makeHTTPServer(t, cb)
defer s.Shutdown() defer s.Shutdown()


@ -85,14 +85,12 @@ func (c *DeploymentStatusCommand) Run(args []string) int {
// Check that we got exactly one argument // Check that we got exactly one argument
args = flags.Args() args = flags.Args()
if l := len(args); l != 1 { if l := len(args); l > 1 {
c.Ui.Error("This command takes one argument: <deployment id>") c.Ui.Error("This command takes one argument: <deployment id>")
c.Ui.Error(commandErrorText(c)) c.Ui.Error(commandErrorText(c))
return 1 return 1
} }
dID := args[0]
// Truncate the id unless full length is requested // Truncate the id unless full length is requested
length := shortId length := shortId
if verbose { if verbose {
@ -106,7 +104,20 @@ func (c *DeploymentStatusCommand) Run(args []string) int {
return 1 return 1
} }
// List if no arguments are provided
if len(args) == 0 {
deploys, _, err := client.Deployments().List(nil)
if err != nil {
c.Ui.Error(fmt.Sprintf("Error retrieving deployments: %s", err))
return 1
}
c.Ui.Output(formatDeployments(deploys, length))
return 0
}
// Do a prefix lookup // Do a prefix lookup
dID := args[0]
deploy, possible, err := getDeployment(client.Deployments(), dID) deploy, possible, err := getDeployment(client.Deployments(), dID)
if err != nil { if err != nil {
c.Ui.Error(fmt.Sprintf("Error retrieving deployment: %s", err)) c.Ui.Error(fmt.Sprintf("Error retrieving deployment: %s", err))


@ -1,13 +1,13 @@
package command package command
import ( import (
"strings"
"testing" "testing"
"github.com/hashicorp/nomad/nomad/mock" "github.com/hashicorp/nomad/nomad/mock"
"github.com/mitchellh/cli" "github.com/mitchellh/cli"
"github.com/posener/complete" "github.com/posener/complete"
"github.com/stretchr/testify/assert" "github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
) )
func TestDeploymentStatusCommand_Implements(t *testing.T) { func TestDeploymentStatusCommand_Implements(t *testing.T) {
@ -21,20 +21,23 @@ func TestDeploymentStatusCommand_Fails(t *testing.T) {
cmd := &DeploymentStatusCommand{Meta: Meta{Ui: ui}} cmd := &DeploymentStatusCommand{Meta: Meta{Ui: ui}}
// Fails on misuse // Fails on misuse
if code := cmd.Run([]string{"some", "bad", "args"}); code != 1 { code := cmd.Run([]string{"some", "bad", "args"})
t.Fatalf("expected exit code 1, got: %d", code) require.Equal(t, 1, code)
} out := ui.ErrorWriter.String()
if out := ui.ErrorWriter.String(); !strings.Contains(out, commandErrorText(cmd)) { require.Contains(t, out, commandErrorText(cmd))
t.Fatalf("expected help output, got: %s", out)
}
ui.ErrorWriter.Reset() ui.ErrorWriter.Reset()
if code := cmd.Run([]string{"-address=nope", "12"}); code != 1 { code = cmd.Run([]string{"-address=nope", "12"})
t.Fatalf("expected exit code 1, got: %d", code) require.Equal(t, 1, code)
} out = ui.ErrorWriter.String()
if out := ui.ErrorWriter.String(); !strings.Contains(out, "Error retrieving deployment") { require.Contains(t, out, "Error retrieving deployment")
t.Fatalf("expected failed query error, got: %s", out) ui.ErrorWriter.Reset()
}
code = cmd.Run([]string{"-address=nope"})
require.Equal(t, 1, code)
out = ui.ErrorWriter.String()
// "deployments" indicates that we attempted to list all deployments
require.Contains(t, out, "Error retrieving deployments")
ui.ErrorWriter.Reset() ui.ErrorWriter.Reset()
} }


@ -17,7 +17,7 @@ const (
nomad job run -check-index %d %s nomad job run -check-index %d %s
When running the job with the check-index flag, the job will only be run if the When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are changed, another user has modified the job and the plan's results are
potentially invalid.` potentially invalid.`


@ -15,6 +15,7 @@
* [ ] Add structs/fields to `nomad/structs` package * [ ] Add structs/fields to `nomad/structs` package
* Validation happens in this package and must be implemented * Validation happens in this package and must be implemented
* Implement other methods and tests from `api/` package * Implement other methods and tests from `api/` package
* Note that analogous struct field names should match the `api/` package (see the sketch below)
* [ ] Add conversion between `api/` and `nomad/structs` in `command/agent/job_endpoint.go` * [ ] Add conversion between `api/` and `nomad/structs` in `command/agent/job_endpoint.go`
* [ ] Add check for job diff in `nomad/structs/diff.go` * [ ] Add check for job diff in `nomad/structs/diff.go`
* Note that fields must be listed in alphabetical order in `FieldDiff` slices in `nomad/structs/diff_test.go` * Note that fields must be listed in alphabetical order in `FieldDiff` slices in `nomad/structs/diff_test.go`
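To make the field-name note concrete, a purely illustrative sketch (every name here is hypothetical) of keeping the `api/` and `nomad/structs` fields parallel and converting between them, in the spirit of `command/agent/job_endpoint.go`:

```go
package main

import "fmt"

// Hypothetical mirror of an api/ package struct; the api/ side uses
// pointers for optional fields.
type apiTaskGroup struct {
	ExampleField *string
}

// Hypothetical nomad/structs counterpart; the field name matches the api/
// package so diffing and conversion stay mechanical.
type structsTaskGroup struct {
	ExampleField string
}

// Hypothetical conversion helper of the kind job_endpoint.go implements.
func apiTgToStructsTg(in *apiTaskGroup) *structsTaskGroup {
	out := &structsTaskGroup{}
	if in.ExampleField != nil {
		out.ExampleField = *in.ExampleField
	}
	return out
}

func main() {
	v := "demo"
	fmt.Println(apiTgToStructsTg(&apiTaskGroup{ExampleField: &v}).ExampleField)
}
```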

demo/grpc-checks/.gitignore vendored Normal file

@ -0,0 +1 @@
grpc-checks


@ -0,0 +1,18 @@
FROM golang:alpine as builder
WORKDIR /build
ADD . /build
RUN apk add protoc && \
go get -u github.com/golang/protobuf/protoc-gen-go
RUN go version && \
go env && \
go generate && \
CGO_ENABLED=0 GOOS=linux go build
FROM alpine:latest
MAINTAINER nomadproject.io
WORKDIR /opt
COPY --from=builder /build/grpc-checks /opt
ENTRYPOINT ["/opt/grpc-checks"]


@ -0,0 +1,38 @@
# grpc-checks
An example service that exposes a gRPC healthcheck endpoint
### generate protobuf
Note that main.go also includes this as a go:generate directive
so that running this by hand is not necessary
```bash
$ protoc -I ./health ./health/health.proto --go_out=plugins=grpc:health
```
### build & run example
Generate, compile, and run the example server.
```bash
go generate
go build
go run main.go
```
### publish
#### Testing locally
```bash
$ docker build -t hashicorpnomad/grpc-checks:test .
$ docker run --rm hashicorpnomad/grpc-checks:test
```
#### Upload to Docker Hub
```bash
# replace <version> with the next version number
$ docker login
$ docker build -t hashicorpnomad/grpc-checks:<version> .
$ docker push hashicorpnomad/grpc-checks:<version>
```


@ -0,0 +1,31 @@
package example
import (
"context"
"log"
ghc "google.golang.org/grpc/health/grpc_health_v1"
)
// Server is a trivial gRPC server that implements the standard grpc.health.v1
// interface.
type Server struct {
}
func New() *Server {
return new(Server)
}
func (s *Server) Check(ctx context.Context, hcr *ghc.HealthCheckRequest) (*ghc.HealthCheckResponse, error) {
log.Printf("Check:%s (%s)", hcr.Service, hcr.String())
return &ghc.HealthCheckResponse{
Status: ghc.HealthCheckResponse_SERVING,
}, nil
}
func (s *Server) Watch(hcr *ghc.HealthCheckRequest, hws ghc.Health_WatchServer) error {
log.Printf("Watch:%s (%s)", hcr.Service, hcr.String())
return hws.Send(&ghc.HealthCheckResponse{
Status: ghc.HealthCheckResponse_SERVING,
})
}

demo/grpc-checks/go.mod Normal file

@ -0,0 +1,8 @@
module github.com/hashicorp/nomad/demo/grpc-checks
go 1.14
require (
github.com/golang/protobuf v1.3.5
google.golang.org/grpc v1.28.1
)

demo/grpc-checks/go.sum Normal file

@ -0,0 +1,53 @@
cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98=
github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.3/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw=
github.com/golang/protobuf v1.3.5 h1:F768QJ1E9tib+q5Sc8MkdJi1RxLTbRcTf8LJV56aRls=
github.com/golang/protobuf v1.3.5/go.mod h1:6O5/vntMXwX2lRkT1hjjk0nAC1IDOTvTlVgjlRvqsdk=
github.com/google/go-cmp v0.2.0/go.mod h1:oXzfMopK8JAjlY9xF4vHSVASa0yLyX7SntLO5aqRK0M=
github.com/hashicorp/nomad v0.11.1 h1:ow411q+bAduxC0X0V3NLx9slQzwG9wiB66yVzpQ0aEg=
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190311183353-d8887717615a h1:oWX7TPOiFAMXLq8o0ikBYfCJVlRHBcsciT5bXOrH628=
golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a h1:1BGLXjeY4akVXGgbC9HugT3Jv3hCI0z56oJR5vAMgBU=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/text v0.3.0 h1:g61tztE5qeGQ89tm6NTjjM9VPIm088od1l6aSorWRWg=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY=
golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55 h1:gSJIx1SDwno+2ElGhA4+qG2zF97qiUzTM+rQ0klBOcE=
google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc=
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=
google.golang.org/grpc v1.28.1 h1:C1QC6KzgSiLyBabDi87BbjaGreoRgGUF5nOyvfrAZ1k=
google.golang.org/grpc v1.28.1/go.mod h1:rpkK4SK4GF4Ach/+MFLZUBavHOvF2JJB5uozKKal+60=
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=

demo/grpc-checks/main.go Normal file

@ -0,0 +1,40 @@
package main
import (
"fmt"
"log"
"net"
"os"
"github.com/hashicorp/nomad/demo/grpc-checks/example"
"google.golang.org/grpc"
ghc "google.golang.org/grpc/health/grpc_health_v1"
)
func main() {
port := os.Getenv("GRPC_HC_PORT")
if port == "" {
port = "3333"
}
address := fmt.Sprintf(":%s", port)
log.Printf("creating tcp listener on %s", address)
listener, err := net.Listen("tcp", address)
if err != nil {
log.Printf("unable to create listener: %v", err)
os.Exit(1)
}
log.Printf("creating grpc server")
grpcServer := grpc.NewServer()
log.Printf("registering health server")
ghc.RegisterHealthServer(grpcServer, example.New())
log.Printf("listening ...")
if err := grpcServer.Serve(listener); err != nil {
log.Printf("unable to listen: %v", err)
os.Exit(1)
}
}
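For reference, a minimal client-side sketch that exercises the standard gRPC health endpoint the demo server above registers. The localhost address and port 3333 are assumptions taken from the demo's defaults (it listens on `GRPC_HC_PORT`, falling back to 3333); an empty HealthCheckRequest queries overall server health.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Assumes the demo server above is running locally on its default port.
	conn, err := grpc.Dial("localhost:3333", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("unable to dial: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// Issue the same standard health check a gRPC-aware checker would perform.
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("health check failed: %v", err)
	}
	log.Printf("serving status: %s", resp.Status)
}
```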

View file

@ -25,7 +25,7 @@ sudo docker --version
sudo apt-get install unzip curl vim -y sudo apt-get install unzip curl vim -y
echo "Installing Nomad..." echo "Installing Nomad..."
NOMAD_VERSION=0.10.4 NOMAD_VERSION=0.11.0
cd /tmp/ cd /tmp/
curl -sSL https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip -o nomad.zip curl -sSL https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip -o nomad.zip
unzip nomad.zip unzip nomad.zip

View file

@ -21,3 +21,16 @@ client {
ports { ports {
http = 5656 http = 5656
} }
# Because we will potentially have two clients talking to the same
# Docker daemon, we have to disable the dangling container cleanup,
# otherwise they will stop each other's work thinking it was orphaned.
plugin "docker" {
config {
gc {
dangling_containers {
enabled = false
}
}
}
}

View file

@ -21,3 +21,16 @@ client {
ports { ports {
http = 5657 http = 5657
} }
# Because we will potentially have two clients talking to the same
# Docker daemon, we have to disable the dangling container cleanup,
# otherwise they will stop each other's work thinking it was orphaned.
plugin "docker" {
config {
gc {
dangling_containers {
enabled = false
}
}
}
}

View file

@ -439,16 +439,21 @@ CREATE:
return container, nil return container, nil
} }
// Delete matching containers // Purge conflicting container if found.
err = client.RemoveContainer(docker.RemoveContainerOptions{ // If container is nil here, the conflicting container was
ID: container.ID, // deleted in our check here, so retry again.
Force: true, if container != nil {
}) // Delete matching containers
if err != nil { err = client.RemoveContainer(docker.RemoveContainerOptions{
d.logger.Error("failed to purge container", "container_id", container.ID) ID: container.ID,
return nil, recoverableErrTimeouts(fmt.Errorf("Failed to purge container %s: %s", container.ID, err)) Force: true,
} else { })
d.logger.Info("purged container", "container_id", container.ID) if err != nil {
d.logger.Error("failed to purge container", "container_id", container.ID)
return nil, recoverableErrTimeouts(fmt.Errorf("Failed to purge container %s: %s", container.ID, err))
} else {
d.logger.Info("purged container", "container_id", container.ID)
}
} }
if attempted < 5 { if attempted < 5 {
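The hunk above only purges the conflicting container when one is still present; if another client sharing the Docker daemon already removed it, creation is simply retried. A rough, self-contained sketch of that guard-then-retry flow, using stand-in helpers rather than the driver's real API:

```go
package main

import (
	"errors"
	"fmt"
)

var errNameConflict = errors.New("container name already in use")

// createWithRetry retries creation after purging a conflicting container,
// but only purges when the conflict is still present; if it has already
// been removed (e.g. by another client sharing the daemon), it just retries.
func createWithRetry(create func() error, findConflict func() (string, bool), purge func(id string) error) error {
	for attempted := 0; attempted < 5; attempted++ {
		err := create()
		if err == nil || !errors.Is(err, errNameConflict) {
			return err
		}
		if id, ok := findConflict(); ok {
			if perr := purge(id); perr != nil {
				return fmt.Errorf("failed to purge container %s: %w", id, perr)
			}
			fmt.Println("purged container", id)
		}
	}
	return errors.New("gave up creating container after repeated name conflicts")
}

func main() {
	attempts := 0
	err := createWithRetry(
		func() error {
			attempts++
			if attempts < 3 {
				return errNameConflict
			}
			return nil
		},
		func() (string, bool) {
			// Pretend the conflicting container disappeared after the first look.
			return "abc123", attempts == 1
		},
		func(id string) error { return nil },
	)
	fmt.Println("result:", err)
}
```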

View file

@ -1,4 +1,5 @@
NOMAD_SHA ?= $(shell git rev-parse HEAD) NOMAD_SHA ?= $(shell git rev-parse HEAD)
PKG_PATH = $(shell pwd)/../../pkg/linux_amd64/nomad
dev-cluster: dev-cluster:
terraform apply -auto-approve -var-file=terraform.tfvars.dev terraform apply -auto-approve -var-file=terraform.tfvars.dev
@ -6,5 +7,11 @@ dev-cluster:
cd .. && NOMAD_E2E=1 go test -v . -nomad.sha=$(NOMAD_SHA) -provision.terraform ./provisioning.json -skipTests cd .. && NOMAD_E2E=1 go test -v . -nomad.sha=$(NOMAD_SHA) -provision.terraform ./provisioning.json -skipTests
terraform output message terraform output message
dev-cluster-from-local:
terraform apply -auto-approve -var-file=terraform.tfvars.dev
terraform output provisioning | jq . > ../provisioning.json
cd .. && NOMAD_E2E=1 go test -v . -nomad.local_file=$(PKG_PATH) -provision.terraform ./provisioning.json -skipTests
terraform output message
clean: clean:
terraform destroy -auto-approve terraform destroy -auto-approve

View file

@ -366,15 +366,16 @@ func parseScalingPolicy(out **api.ScalingPolicy, list *ast.ObjectList) error {
// If we have policy, then parse that // If we have policy, then parse that
if o := listVal.Filter("policy"); len(o.Items) > 0 { if o := listVal.Filter("policy"); len(o.Items) > 0 {
for _, o := range o.Elem().Items { if len(o.Elem().Items) > 1 {
var m map[string]interface{} return fmt.Errorf("only one 'policy' block allowed per 'scaling' block")
if err := hcl.DecodeObject(&m, o.Val); err != nil { }
return err p := o.Elem().Items[0]
} var m map[string]interface{}
if err := hcl.DecodeObject(&m, p.Val); err != nil {
if err := mapstructure.WeakDecode(m, &result.Policy); err != nil { return err
return err }
} if err := mapstructure.WeakDecode(m, &result.Policy); err != nil {
return err
} }
} }

View file

@ -281,7 +281,7 @@ func parseSidecarTask(item *ast.ObjectItem) (*api.SidecarTask, error) {
KillSignal: task.KillSignal, KillSignal: task.KillSignal,
} }
// Parse ShutdownDelay separately to get pointer // Parse ShutdownDelay separatly to get pointer
var m map[string]interface{} var m map[string]interface{}
if err := hcl.DecodeObject(&m, item.Val); err != nil { if err := hcl.DecodeObject(&m, item.Val); err != nil {
return nil, err return nil, err
@ -320,6 +320,24 @@ func parseProxy(o *ast.ObjectItem) (*api.ConsulProxy, error) {
} }
var proxy api.ConsulProxy var proxy api.ConsulProxy
var m map[string]interface{}
if err := hcl.DecodeObject(&m, o.Val); err != nil {
return nil, err
}
delete(m, "upstreams")
delete(m, "expose")
delete(m, "config")
dec, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{
Result: &proxy,
})
if err != nil {
return nil, err
}
if err := dec.Decode(m); err != nil {
return nil, fmt.Errorf("proxy: %v", err)
}
var listVal *ast.ObjectList var listVal *ast.ObjectList
if ot, ok := o.Val.(*ast.ObjectType); ok { if ot, ok := o.Val.(*ast.ObjectType); ok {
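The new parseProxy code decodes the proxy's scalar fields with mapstructure only after deleting the keys for nested blocks, which are parsed separately. A small self-contained sketch of that decode-then-handle-blocks pattern; the struct and map here are illustrative, not the api package's types:

```go
package main

import (
	"fmt"

	"github.com/mitchellh/mapstructure"
)

// proxyConfig is an illustrative stand-in for a struct with both scalar
// fields and nested blocks that need custom handling.
type proxyConfig struct {
	LocalServiceAddress string `mapstructure:"local_service_address"`
	LocalServicePort    int    `mapstructure:"local_service_port"`
}

func main() {
	// Imagine this map came from decoding an HCL object.
	m := map[string]interface{}{
		"local_service_address": "10.0.1.2",
		"local_service_port":    "8080", // string on purpose: weak decoding coerces it
		"upstreams":             []interface{}{},
		"config":                map[string]interface{}{"foo": "bar"},
	}

	// Strip the keys that correspond to nested blocks before decoding the
	// scalar fields, mirroring the delete(m, ...) calls in the hunk above.
	delete(m, "upstreams")
	delete(m, "config")

	var out proxyConfig
	if err := mapstructure.WeakDecode(m, &out); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```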

View file

@ -894,28 +894,6 @@ func TestParse(t *testing.T) {
}, },
false, false,
}, },
{
"service-connect-sidecar_task-name.hcl",
&api.Job{
ID: helper.StringToPtr("sidecar_task_name"),
Name: helper.StringToPtr("sidecar_task_name"),
Type: helper.StringToPtr("service"),
TaskGroups: []*api.TaskGroup{{
Name: helper.StringToPtr("group"),
Services: []*api.Service{{
Name: "example",
Connect: &api.ConsulConnect{
Native: false,
SidecarService: &api.ConsulSidecarService{},
SidecarTask: &api.SidecarTask{
Name: "my-sidecar",
},
},
}},
}},
},
false,
},
{ {
"reschedule-job.hcl", "reschedule-job.hcl",
&api.Job{ &api.Job{
@ -1051,6 +1029,7 @@ func TestParse(t *testing.T) {
SidecarService: &api.ConsulSidecarService{ SidecarService: &api.ConsulSidecarService{
Tags: []string{"side1", "side2"}, Tags: []string{"side1", "side2"},
Proxy: &api.ConsulProxy{ Proxy: &api.ConsulProxy{
LocalServicePort: 8080,
Upstreams: []*api.ConsulUpstream{ Upstreams: []*api.ConsulUpstream{
{ {
DestinationName: "other-service", DestinationName: "other-service",
@ -1172,6 +1151,99 @@ func TestParse(t *testing.T) {
}, },
false, false,
}, },
{
"tg-service-connect-sidecar_task-name.hcl",
&api.Job{
ID: helper.StringToPtr("sidecar_task_name"),
Name: helper.StringToPtr("sidecar_task_name"),
Type: helper.StringToPtr("service"),
TaskGroups: []*api.TaskGroup{{
Name: helper.StringToPtr("group"),
Services: []*api.Service{{
Name: "example",
Connect: &api.ConsulConnect{
Native: false,
SidecarService: &api.ConsulSidecarService{},
SidecarTask: &api.SidecarTask{
Name: "my-sidecar",
},
},
}},
}},
},
false,
},
{
"tg-service-connect-proxy.hcl",
&api.Job{
ID: helper.StringToPtr("service-connect-proxy"),
Name: helper.StringToPtr("service-connect-proxy"),
Type: helper.StringToPtr("service"),
TaskGroups: []*api.TaskGroup{{
Name: helper.StringToPtr("group"),
Services: []*api.Service{{
Name: "example",
Connect: &api.ConsulConnect{
Native: false,
SidecarService: &api.ConsulSidecarService{
Proxy: &api.ConsulProxy{
LocalServiceAddress: "10.0.1.2",
LocalServicePort: 8080,
ExposeConfig: &api.ConsulExposeConfig{
Path: []*api.ConsulExposePath{{
Path: "/metrics",
Protocol: "http",
LocalPathPort: 9001,
ListenerPort: "metrics",
}, {
Path: "/health",
Protocol: "http",
LocalPathPort: 9002,
ListenerPort: "health",
}},
},
Upstreams: []*api.ConsulUpstream{{
DestinationName: "upstream1",
LocalBindPort: 2001,
}, {
DestinationName: "upstream2",
LocalBindPort: 2002,
}},
Config: map[string]interface{}{
"foo": "bar",
},
},
},
},
}},
}},
},
false,
},
{
"tg-service-connect-local-service.hcl",
&api.Job{
ID: helper.StringToPtr("connect-proxy-local-service"),
Name: helper.StringToPtr("connect-proxy-local-service"),
Type: helper.StringToPtr("service"),
TaskGroups: []*api.TaskGroup{{
Name: helper.StringToPtr("group"),
Services: []*api.Service{{
Name: "example",
Connect: &api.ConsulConnect{
Native: false,
SidecarService: &api.ConsulSidecarService{
Proxy: &api.ConsulProxy{
LocalServiceAddress: "10.0.1.2",
LocalServicePort: 9876,
},
},
},
}},
}},
},
false,
},
{ {
"tg-service-check-expose.hcl", "tg-service-check-expose.hcl",
&api.Job{ &api.Job{
@ -1238,6 +1310,32 @@ func TestParse(t *testing.T) {
}, },
false, false,
}, },
{
"tg-scaling-policy-minimal.hcl",
&api.Job{
ID: helper.StringToPtr("elastic"),
Name: helper.StringToPtr("elastic"),
TaskGroups: []*api.TaskGroup{
{
Name: helper.StringToPtr("group"),
Scaling: &api.ScalingPolicy{
Min: nil,
Max: 0,
Policy: nil,
Enabled: nil,
},
},
},
},
false,
},
{
"tg-scaling-policy-multi-policy.hcl",
nil,
true,
},
} }
for _, tc := range cases { for _, tc := range cases {

View file

@ -0,0 +1,5 @@
job "elastic" {
group "group" {
scaling {}
}
}

View file

@ -0,0 +1,19 @@
job "elastic" {
group "group" {
scaling {
enabled = false
min = 5
max = 100
policy {
foo = "right"
b = true
}
policy {
foo = "wrong"
c = false
}
}
}
}

View file

@ -0,0 +1,18 @@
job "connect-proxy-local-service" {
type = "service"
group "group" {
service {
name = "example"
connect {
sidecar_service {
proxy {
local_service_port = 9876
local_service_address = "10.0.1.2"
}
}
}
}
}
}

View file

@ -0,0 +1,48 @@
job "service-connect-proxy" {
type = "service"
group "group" {
service {
name = "example"
connect {
sidecar_service {
proxy {
local_service_port = 8080
local_service_address = "10.0.1.2"
upstreams {
destination_name = "upstream1"
local_bind_port = 2001
}
upstreams {
destination_name = "upstream2"
local_bind_port = 2002
}
expose {
path {
path = "/metrics"
protocol = "http"
local_path_port = 9001
listener_port = "metrics"
}
path {
path = "/health"
protocol = "http"
local_path_port = 9002
listener_port = "health"
}
}
config {
foo = "bar"
}
}
}
}
}
}
}

View file

@ -4,12 +4,14 @@ job "sidecar_task_name" {
group "group" { group "group" {
service { service {
name = "example" name = "example"
connect { connect {
sidecar_service {} sidecar_service = {}
sidecar_task { sidecar_task {
name = "my-sidecar" name = "my-sidecar"
} }
} }
} }
} }
} }

View file

@ -8,9 +8,7 @@ import (
log "github.com/hashicorp/go-hclog" log "github.com/hashicorp/go-hclog"
memdb "github.com/hashicorp/go-memdb" memdb "github.com/hashicorp/go-memdb"
multierror "github.com/hashicorp/go-multierror"
version "github.com/hashicorp/go-version" version "github.com/hashicorp/go-version"
cstructs "github.com/hashicorp/nomad/client/structs"
"github.com/hashicorp/nomad/nomad/state" "github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs" "github.com/hashicorp/nomad/nomad/structs"
"github.com/hashicorp/nomad/scheduler" "github.com/hashicorp/nomad/scheduler"
@ -711,188 +709,30 @@ func allocGCEligible(a *structs.Allocation, job *structs.Job, gcTime time.Time,
return timeDiff > interval.Nanoseconds() return timeDiff > interval.Nanoseconds()
} }
// TODO: we need a periodic trigger to iterate over all the volumes and split
// them up into separate work items, same as we do for jobs.
// csiVolumeClaimGC is used to garbage collect CSI volume claims // csiVolumeClaimGC is used to garbage collect CSI volume claims
func (c *CoreScheduler) csiVolumeClaimGC(eval *structs.Evaluation) error { func (c *CoreScheduler) csiVolumeClaimGC(eval *structs.Evaluation) error {
c.logger.Trace("garbage collecting unclaimed CSI volume claims") c.logger.Trace("garbage collecting unclaimed CSI volume claims", "eval.JobID", eval.JobID)
// Volume ID smuggled in with the eval's own JobID // Volume ID smuggled in with the eval's own JobID
evalVolID := strings.Split(eval.JobID, ":") evalVolID := strings.Split(eval.JobID, ":")
if len(evalVolID) != 3 {
// COMPAT(1.0): 0.11.0 shipped with 3 fields. tighten this check to len == 2
if len(evalVolID) < 2 {
c.logger.Error("volume gc called without volID") c.logger.Error("volume gc called without volID")
return nil return nil
} }
volID := evalVolID[1] volID := evalVolID[1]
runningAllocs := evalVolID[2] == "purge"
return volumeClaimReap(c.srv, volID, eval.Namespace,
c.srv.config.Region, eval.LeaderACL, runningAllocs)
}
func volumeClaimReap(srv RPCServer, volID, namespace, region, leaderACL string, runningAllocs bool) error {
ws := memdb.NewWatchSet()
vol, err := srv.State().CSIVolumeByID(ws, namespace, volID)
if err != nil {
return err
}
if vol == nil {
return nil
}
vol, err = srv.State().CSIVolumeDenormalize(ws, vol)
if err != nil {
return err
}
plug, err := srv.State().CSIPluginByID(ws, vol.PluginID)
if err != nil {
return err
}
gcClaims, nodeClaims := collectClaimsToGCImpl(vol, runningAllocs)
var result *multierror.Error
for _, claim := range gcClaims {
nodeClaims, err = volumeClaimReapImpl(srv,
&volumeClaimReapArgs{
vol: vol,
plug: plug,
allocID: claim.allocID,
nodeID: claim.nodeID,
mode: claim.mode,
namespace: namespace,
region: region,
leaderACL: leaderACL,
nodeClaims: nodeClaims,
},
)
if err != nil {
result = multierror.Append(result, err)
continue
}
}
return result.ErrorOrNil()
}
type gcClaimRequest struct {
allocID string
nodeID string
mode structs.CSIVolumeClaimMode
}
func collectClaimsToGCImpl(vol *structs.CSIVolume, runningAllocs bool) ([]gcClaimRequest, map[string]int) {
gcAllocs := []gcClaimRequest{}
nodeClaims := map[string]int{} // node IDs -> count
collectFunc := func(allocs map[string]*structs.Allocation,
mode structs.CSIVolumeClaimMode) {
for _, alloc := range allocs {
// we call denormalize on the volume above to populate
// Allocation pointers. But the alloc might have been
// garbage collected concurrently, so if the alloc is
// still nil we can safely skip it.
if alloc == nil {
continue
}
nodeClaims[alloc.NodeID]++
if runningAllocs || alloc.Terminated() {
gcAllocs = append(gcAllocs, gcClaimRequest{
allocID: alloc.ID,
nodeID: alloc.NodeID,
mode: mode,
})
}
}
}
collectFunc(vol.WriteAllocs, structs.CSIVolumeClaimWrite)
collectFunc(vol.ReadAllocs, structs.CSIVolumeClaimRead)
return gcAllocs, nodeClaims
}
type volumeClaimReapArgs struct {
vol *structs.CSIVolume
plug *structs.CSIPlugin
allocID string
nodeID string
mode structs.CSIVolumeClaimMode
region string
namespace string
leaderACL string
nodeClaims map[string]int // node IDs -> count
}
func volumeClaimReapImpl(srv RPCServer, args *volumeClaimReapArgs) (map[string]int, error) {
vol := args.vol
nodeID := args.nodeID
// (1) NodePublish / NodeUnstage must be completed before controller
// operations or releasing the claim.
nReq := &cstructs.ClientCSINodeDetachVolumeRequest{
PluginID: args.plug.ID,
VolumeID: vol.ID,
ExternalID: vol.RemoteID(),
AllocID: args.allocID,
NodeID: nodeID,
AttachmentMode: vol.AttachmentMode,
AccessMode: vol.AccessMode,
ReadOnly: args.mode == structs.CSIVolumeClaimRead,
}
err := srv.RPC("ClientCSI.NodeDetachVolume", nReq,
&cstructs.ClientCSINodeDetachVolumeResponse{})
if err != nil {
return args.nodeClaims, err
}
args.nodeClaims[nodeID]--
// (2) we only emit the controller unpublish if no other allocs
// on the node need it, but we also only want to make this
// call at most once per node
if vol.ControllerRequired && args.nodeClaims[nodeID] < 1 {
// we need to get the CSI Node ID, which is not the same as
// the Nomad Node ID
ws := memdb.NewWatchSet()
targetNode, err := srv.State().NodeByID(ws, nodeID)
if err != nil {
return args.nodeClaims, err
}
if targetNode == nil {
return args.nodeClaims, fmt.Errorf("%s: %s",
structs.ErrUnknownNodePrefix, nodeID)
}
targetCSIInfo, ok := targetNode.CSINodePlugins[args.plug.ID]
if !ok {
return args.nodeClaims, fmt.Errorf("Failed to find NodeInfo for node: %s", targetNode.ID)
}
cReq := &cstructs.ClientCSIControllerDetachVolumeRequest{
VolumeID: vol.RemoteID(),
ClientCSINodeID: targetCSIInfo.NodeInfo.ID,
}
cReq.PluginID = args.plug.ID
err = srv.RPC("ClientCSI.ControllerDetachVolume", cReq,
&cstructs.ClientCSIControllerDetachVolumeResponse{})
if err != nil {
return args.nodeClaims, err
}
}
// (3) release the claim from the state store, allowing it to be rescheduled
req := &structs.CSIVolumeClaimRequest{ req := &structs.CSIVolumeClaimRequest{
VolumeID: vol.ID, VolumeID: volID,
AllocationID: args.allocID, Claim: structs.CSIVolumeClaimRelease,
Claim: structs.CSIVolumeClaimRelease,
WriteRequest: structs.WriteRequest{
Region: args.region,
Namespace: args.namespace,
AuthToken: args.leaderACL,
},
} }
err = srv.RPC("CSIVolume.Claim", req, &structs.CSIVolumeClaimResponse{}) req.Namespace = eval.Namespace
if err != nil { req.Region = c.srv.config.Region
return args.nodeClaims, err
} err := c.srv.RPC("CSIVolume.Claim", req, &structs.CSIVolumeClaimResponse{})
return args.nodeClaims, nil return err
} }
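The slimmed-down csiVolumeClaimGC recovers the volume ID smuggled into the eval's JobID, tolerating the extra field that 0.11.0 appended. A hedged sketch of just that parsing step, assuming an ID layout of `<prefix>:<volume-id>[:<legacy-flag>]`:

```go
package main

import (
	"fmt"
	"strings"
)

// volumeIDFromGCEval extracts the volume ID smuggled into a GC eval's JobID.
// 0.11.0 emitted three colon-separated fields, later versions emit two, so
// the length check stays permissive (mirroring the COMPAT note above).
func volumeIDFromGCEval(jobID string) (string, bool) {
	parts := strings.Split(jobID, ":")
	if len(parts) < 2 {
		return "", false
	}
	return parts[1], true
}

func main() {
	for _, id := range []string{
		"csi-volume-claim-gc:vol-12345",       // current form (assumed prefix)
		"csi-volume-claim-gc:vol-12345:purge", // legacy 0.11.0 form
		"csi-volume-claim-gc",                 // malformed: no volume ID
	} {
		if vol, ok := volumeIDFromGCEval(id); ok {
			fmt.Println("volume:", vol)
		} else {
			fmt.Println("skipping malformed job id:", id)
		}
	}
}
```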

View file

@ -6,10 +6,8 @@ import (
"time" "time"
memdb "github.com/hashicorp/go-memdb" memdb "github.com/hashicorp/go-memdb"
cstructs "github.com/hashicorp/nomad/client/structs"
"github.com/hashicorp/nomad/helper/uuid" "github.com/hashicorp/nomad/helper/uuid"
"github.com/hashicorp/nomad/nomad/mock" "github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs" "github.com/hashicorp/nomad/nomad/structs"
"github.com/hashicorp/nomad/testutil" "github.com/hashicorp/nomad/testutil"
"github.com/stretchr/testify/assert" "github.com/stretchr/testify/assert"
@ -2195,270 +2193,3 @@ func TestAllocation_GCEligible(t *testing.T) {
alloc.ClientStatus = structs.AllocClientStatusComplete alloc.ClientStatus = structs.AllocClientStatusComplete
require.True(allocGCEligible(alloc, nil, time.Now(), 1000)) require.True(allocGCEligible(alloc, nil, time.Now(), 1000))
} }
func TestCSI_GCVolumeClaims_Collection(t *testing.T) {
t.Parallel()
srv, shutdownSrv := TestServer(t, func(c *Config) { c.NumSchedulers = 0 })
defer shutdownSrv()
testutil.WaitForLeader(t, srv.RPC)
state := srv.fsm.State()
ws := memdb.NewWatchSet()
index := uint64(100)
// Create a client node, plugin, and volume
node := mock.Node()
node.Attributes["nomad.version"] = "0.11.0" // client RPCs not supported on early version
node.CSINodePlugins = map[string]*structs.CSIInfo{
"csi-plugin-example": {
PluginID: "csi-plugin-example",
Healthy: true,
RequiresControllerPlugin: true,
NodeInfo: &structs.CSINodeInfo{},
},
}
node.CSIControllerPlugins = map[string]*structs.CSIInfo{
"csi-plugin-example": {
PluginID: "csi-plugin-example",
Healthy: true,
RequiresControllerPlugin: true,
ControllerInfo: &structs.CSIControllerInfo{
SupportsReadOnlyAttach: true,
SupportsAttachDetach: true,
SupportsListVolumes: true,
SupportsListVolumesAttachedNodes: false,
},
},
}
err := state.UpsertNode(99, node)
require.NoError(t, err)
volId0 := uuid.Generate()
ns := structs.DefaultNamespace
vols := []*structs.CSIVolume{{
ID: volId0,
Namespace: ns,
PluginID: "csi-plugin-example",
AccessMode: structs.CSIVolumeAccessModeMultiNodeSingleWriter,
AttachmentMode: structs.CSIVolumeAttachmentModeFilesystem,
}}
err = state.CSIVolumeRegister(index, vols)
index++
require.NoError(t, err)
vol, err := state.CSIVolumeByID(ws, ns, volId0)
require.NoError(t, err)
require.True(t, vol.ControllerRequired)
require.Len(t, vol.ReadAllocs, 0)
require.Len(t, vol.WriteAllocs, 0)
// Create a job with 2 allocations
job := mock.Job()
job.TaskGroups[0].Volumes = map[string]*structs.VolumeRequest{
"_": {
Name: "someVolume",
Type: structs.VolumeTypeCSI,
Source: volId0,
ReadOnly: false,
},
}
err = state.UpsertJob(index, job)
index++
require.NoError(t, err)
alloc1 := mock.Alloc()
alloc1.JobID = job.ID
alloc1.NodeID = node.ID
err = state.UpsertJobSummary(index, mock.JobSummary(alloc1.JobID))
index++
require.NoError(t, err)
alloc1.TaskGroup = job.TaskGroups[0].Name
alloc2 := mock.Alloc()
alloc2.JobID = job.ID
alloc2.NodeID = node.ID
err = state.UpsertJobSummary(index, mock.JobSummary(alloc2.JobID))
index++
require.NoError(t, err)
alloc2.TaskGroup = job.TaskGroups[0].Name
err = state.UpsertAllocs(104, []*structs.Allocation{alloc1, alloc2})
require.NoError(t, err)
// Claim the volumes and verify the claims were set
err = state.CSIVolumeClaim(index, ns, volId0, alloc1, structs.CSIVolumeClaimWrite)
index++
require.NoError(t, err)
err = state.CSIVolumeClaim(index, ns, volId0, alloc2, structs.CSIVolumeClaimRead)
index++
require.NoError(t, err)
vol, err = state.CSIVolumeByID(ws, ns, volId0)
require.NoError(t, err)
require.Len(t, vol.ReadAllocs, 1)
require.Len(t, vol.WriteAllocs, 1)
// Update both allocs as failed/terminated
alloc1.ClientStatus = structs.AllocClientStatusFailed
alloc2.ClientStatus = structs.AllocClientStatusFailed
err = state.UpdateAllocsFromClient(index, []*structs.Allocation{alloc1, alloc2})
require.NoError(t, err)
vol, err = state.CSIVolumeDenormalize(ws, vol)
require.NoError(t, err)
gcClaims, nodeClaims := collectClaimsToGCImpl(vol, false)
require.Equal(t, nodeClaims[node.ID], 2)
require.Len(t, gcClaims, 2)
}
func TestCSI_GCVolumeClaims_Reap(t *testing.T) {
t.Parallel()
require := require.New(t)
s, shutdownSrv := TestServer(t, func(c *Config) { c.NumSchedulers = 0 })
defer shutdownSrv()
testutil.WaitForLeader(t, s.RPC)
node := mock.Node()
plugin := mock.CSIPlugin()
vol := mock.CSIVolume(plugin)
alloc := mock.Alloc()
cases := []struct {
Name string
Claim gcClaimRequest
ClaimsCount map[string]int
ControllerRequired bool
ExpectedErr string
ExpectedCount int
ExpectedClaimsCount int
ExpectedNodeDetachVolumeCount int
ExpectedControllerDetachVolumeCount int
ExpectedVolumeClaimCount int
srv *MockRPCServer
}{
{
Name: "NodeDetachVolume fails",
Claim: gcClaimRequest{
allocID: alloc.ID,
nodeID: node.ID,
mode: structs.CSIVolumeClaimRead,
},
ClaimsCount: map[string]int{node.ID: 1},
ControllerRequired: true,
ExpectedErr: "node plugin missing",
ExpectedClaimsCount: 1,
ExpectedNodeDetachVolumeCount: 1,
srv: &MockRPCServer{
state: s.State(),
nextCSINodeDetachVolumeError: fmt.Errorf("node plugin missing"),
},
},
{
Name: "ControllerDetachVolume no controllers",
Claim: gcClaimRequest{
allocID: alloc.ID,
nodeID: node.ID,
mode: structs.CSIVolumeClaimRead,
},
ClaimsCount: map[string]int{node.ID: 1},
ControllerRequired: true,
ExpectedErr: fmt.Sprintf(
"Unknown node: %s", node.ID),
ExpectedClaimsCount: 0,
ExpectedNodeDetachVolumeCount: 1,
ExpectedControllerDetachVolumeCount: 0,
srv: &MockRPCServer{
state: s.State(),
},
},
{
Name: "ControllerDetachVolume node-only",
Claim: gcClaimRequest{
allocID: alloc.ID,
nodeID: node.ID,
mode: structs.CSIVolumeClaimRead,
},
ClaimsCount: map[string]int{node.ID: 1},
ControllerRequired: false,
ExpectedClaimsCount: 0,
ExpectedNodeDetachVolumeCount: 1,
ExpectedControllerDetachVolumeCount: 0,
ExpectedVolumeClaimCount: 1,
srv: &MockRPCServer{
state: s.State(),
},
},
}
for _, tc := range cases {
t.Run(tc.Name, func(t *testing.T) {
vol.ControllerRequired = tc.ControllerRequired
nodeClaims, err := volumeClaimReapImpl(tc.srv, &volumeClaimReapArgs{
vol: vol,
plug: plugin,
allocID: tc.Claim.allocID,
nodeID: tc.Claim.nodeID,
mode: tc.Claim.mode,
region: "global",
namespace: "default",
leaderACL: "not-in-use",
nodeClaims: tc.ClaimsCount,
})
if tc.ExpectedErr != "" {
require.EqualError(err, tc.ExpectedErr)
} else {
require.NoError(err)
}
require.Equal(tc.ExpectedClaimsCount,
nodeClaims[tc.Claim.nodeID], "expected claims")
require.Equal(tc.ExpectedNodeDetachVolumeCount,
tc.srv.countCSINodeDetachVolume, "node detach RPC count")
require.Equal(tc.ExpectedControllerDetachVolumeCount,
tc.srv.countCSIControllerDetachVolume, "controller detach RPC count")
require.Equal(tc.ExpectedVolumeClaimCount,
tc.srv.countCSIVolumeClaim, "volume claim RPC count")
})
}
}
type MockRPCServer struct {
state *state.StateStore
// mock responses for ClientCSI.NodeDetachVolume
nextCSINodeDetachVolumeResponse *cstructs.ClientCSINodeDetachVolumeResponse
nextCSINodeDetachVolumeError error
countCSINodeDetachVolume int
// mock responses for ClientCSI.ControllerDetachVolume
nextCSIControllerDetachVolumeResponse *cstructs.ClientCSIControllerDetachVolumeResponse
nextCSIControllerDetachVolumeError error
countCSIControllerDetachVolume int
// mock responses for CSI.VolumeClaim
nextCSIVolumeClaimResponse *structs.CSIVolumeClaimResponse
nextCSIVolumeClaimError error
countCSIVolumeClaim int
}
func (srv *MockRPCServer) RPC(method string, args interface{}, reply interface{}) error {
switch method {
case "ClientCSI.NodeDetachVolume":
reply = srv.nextCSINodeDetachVolumeResponse
srv.countCSINodeDetachVolume++
return srv.nextCSINodeDetachVolumeError
case "ClientCSI.ControllerDetachVolume":
reply = srv.nextCSIControllerDetachVolumeResponse
srv.countCSIControllerDetachVolume++
return srv.nextCSIControllerDetachVolumeError
case "CSIVolume.Claim":
reply = srv.nextCSIVolumeClaimResponse
srv.countCSIVolumeClaim++
return srv.nextCSIVolumeClaimError
default:
return fmt.Errorf("unexpected method %q passed to mock", method)
}
}
func (srv *MockRPCServer) State() *state.StateStore { return srv.state }

View file

@ -348,15 +348,31 @@ func (v *CSIVolume) Claim(args *structs.CSIVolumeClaimRequest, reply *structs.CS
return structs.ErrPermissionDenied return structs.ErrPermissionDenied
} }
// if this is a new claim, add a Volume and PublishContext from the // COMPAT(1.0): the NodeID field was added after 0.11.0 and so we
// controller (if any) to the reply // need to ensure it's been populated during upgrades from 0.11.0
// to later patch versions. Remove this block in 1.0
if args.Claim != structs.CSIVolumeClaimRelease && args.NodeID == "" {
state := v.srv.fsm.State()
ws := memdb.NewWatchSet()
alloc, err := state.AllocByID(ws, args.AllocationID)
if err != nil {
return err
}
if alloc == nil {
return fmt.Errorf("%s: %s",
structs.ErrUnknownAllocationPrefix, args.AllocationID)
}
args.NodeID = alloc.NodeID
}
if args.Claim != structs.CSIVolumeClaimRelease { if args.Claim != structs.CSIVolumeClaimRelease {
// if this is a new claim, add a Volume and PublishContext from the
// controller (if any) to the reply
err = v.controllerPublishVolume(args, reply) err = v.controllerPublishVolume(args, reply)
if err != nil { if err != nil {
return fmt.Errorf("controller publish: %v", err) return fmt.Errorf("controller publish: %v", err)
} }
} }
resp, index, err := v.srv.raftApply(structs.CSIVolumeClaimRequestType, args) resp, index, err := v.srv.raftApply(structs.CSIVolumeClaimRequestType, args)
if err != nil { if err != nil {
v.logger.Error("csi raft apply failed", "error", err, "method", "claim") v.logger.Error("csi raft apply failed", "error", err, "method", "claim")
@ -400,6 +416,7 @@ func (v *CSIVolume) controllerPublishVolume(req *structs.CSIVolumeClaimRequest,
return nil return nil
} }
// get Nomad's ID for the client node (not the storage provider's ID)
targetNode, err := state.NodeByID(ws, alloc.NodeID) targetNode, err := state.NodeByID(ws, alloc.NodeID)
if err != nil { if err != nil {
return err return err
@ -407,15 +424,19 @@ func (v *CSIVolume) controllerPublishVolume(req *structs.CSIVolumeClaimRequest,
if targetNode == nil { if targetNode == nil {
return fmt.Errorf("%s: %s", structs.ErrUnknownNodePrefix, alloc.NodeID) return fmt.Errorf("%s: %s", structs.ErrUnknownNodePrefix, alloc.NodeID)
} }
// get the storage provider's ID for the client node (not
// Nomad's ID for the node)
targetCSIInfo, ok := targetNode.CSINodePlugins[plug.ID] targetCSIInfo, ok := targetNode.CSINodePlugins[plug.ID]
if !ok { if !ok {
return fmt.Errorf("Failed to find NodeInfo for node: %s", targetNode.ID) return fmt.Errorf("Failed to find NodeInfo for node: %s", targetNode.ID)
} }
externalNodeID := targetCSIInfo.NodeInfo.ID
method := "ClientCSI.ControllerAttachVolume" method := "ClientCSI.ControllerAttachVolume"
cReq := &cstructs.ClientCSIControllerAttachVolumeRequest{ cReq := &cstructs.ClientCSIControllerAttachVolumeRequest{
VolumeID: vol.RemoteID(), VolumeID: vol.RemoteID(),
ClientCSINodeID: targetCSIInfo.NodeInfo.ID, ClientCSINodeID: externalNodeID,
AttachmentMode: vol.AttachmentMode, AttachmentMode: vol.AttachmentMode,
AccessMode: vol.AccessMode, AccessMode: vol.AccessMode,
ReadOnly: req.Claim == structs.CSIVolumeClaimRead, ReadOnly: req.Claim == structs.CSIVolumeClaimRead,

View file

@ -201,11 +201,22 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
defer shutdown() defer shutdown()
testutil.WaitForLeader(t, srv.RPC) testutil.WaitForLeader(t, srv.RPC)
index := uint64(1000)
state := srv.fsm.State() state := srv.fsm.State()
codec := rpcClient(t, srv) codec := rpcClient(t, srv)
id0 := uuid.Generate() id0 := uuid.Generate()
alloc := mock.BatchAlloc() alloc := mock.BatchAlloc()
// Create a client node and alloc
node := mock.Node()
alloc.NodeID = node.ID
summary := mock.JobSummary(alloc.JobID)
index++
require.NoError(t, state.UpsertJobSummary(index, summary))
index++
require.NoError(t, state.UpsertAllocs(index, []*structs.Allocation{alloc}))
// Create an initial volume claim request; we expect it to fail // Create an initial volume claim request; we expect it to fail
// because there's no such volume yet. // because there's no such volume yet.
claimReq := &structs.CSIVolumeClaimRequest{ claimReq := &structs.CSIVolumeClaimRequest{
@ -222,8 +233,8 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
require.EqualError(t, err, fmt.Sprintf("controller publish: volume not found: %s", id0), require.EqualError(t, err, fmt.Sprintf("controller publish: volume not found: %s", id0),
"expected 'volume not found' error because volume hasn't yet been created") "expected 'volume not found' error because volume hasn't yet been created")
// Create a client node, plugin, alloc, and volume // Create a plugin and volume
node := mock.Node()
node.CSINodePlugins = map[string]*structs.CSIInfo{ node.CSINodePlugins = map[string]*structs.CSIInfo{
"minnie": { "minnie": {
PluginID: "minnie", PluginID: "minnie",
@ -231,7 +242,8 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
NodeInfo: &structs.CSINodeInfo{}, NodeInfo: &structs.CSINodeInfo{},
}, },
} }
err = state.UpsertNode(1002, node) index++
err = state.UpsertNode(index, node)
require.NoError(t, err) require.NoError(t, err)
vols := []*structs.CSIVolume{{ vols := []*structs.CSIVolume{{
@ -244,7 +256,8 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
Segments: map[string]string{"foo": "bar"}, Segments: map[string]string{"foo": "bar"},
}}, }},
}} }}
err = state.CSIVolumeRegister(1003, vols) index++
err = state.CSIVolumeRegister(index, vols)
require.NoError(t, err) require.NoError(t, err)
// Verify that the volume exists, and is healthy // Verify that the volume exists, and is healthy
@ -263,12 +276,6 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
require.Len(t, volGetResp.Volume.ReadAllocs, 0) require.Len(t, volGetResp.Volume.ReadAllocs, 0)
require.Len(t, volGetResp.Volume.WriteAllocs, 0) require.Len(t, volGetResp.Volume.WriteAllocs, 0)
// Upsert the job and alloc
alloc.NodeID = node.ID
summary := mock.JobSummary(alloc.JobID)
require.NoError(t, state.UpsertJobSummary(1004, summary))
require.NoError(t, state.UpsertAllocs(1005, []*structs.Allocation{alloc}))
// Now our claim should succeed // Now our claim should succeed
err = msgpackrpc.CallWithCodec(codec, "CSIVolume.Claim", claimReq, claimResp) err = msgpackrpc.CallWithCodec(codec, "CSIVolume.Claim", claimReq, claimResp)
require.NoError(t, err) require.NoError(t, err)
@ -284,8 +291,10 @@ func TestCSIVolumeEndpoint_Claim(t *testing.T) {
alloc2 := mock.Alloc() alloc2 := mock.Alloc()
alloc2.JobID = uuid.Generate() alloc2.JobID = uuid.Generate()
summary = mock.JobSummary(alloc2.JobID) summary = mock.JobSummary(alloc2.JobID)
require.NoError(t, state.UpsertJobSummary(1005, summary)) index++
require.NoError(t, state.UpsertAllocs(1006, []*structs.Allocation{alloc2})) require.NoError(t, state.UpsertJobSummary(index, summary))
index++
require.NoError(t, state.UpsertAllocs(index, []*structs.Allocation{alloc2}))
claimReq.AllocationID = alloc2.ID claimReq.AllocationID = alloc2.ID
err = msgpackrpc.CallWithCodec(codec, "CSIVolume.Claim", claimReq, claimResp) err = msgpackrpc.CallWithCodec(codec, "CSIVolume.Claim", claimReq, claimResp)
require.EqualError(t, err, "volume max claim reached", require.EqualError(t, err, "volume max claim reached",

View file

@ -270,6 +270,8 @@ func (n *nomadFSM) Apply(log *raft.Log) interface{} {
return n.applyCSIVolumeDeregister(buf[1:], log.Index) return n.applyCSIVolumeDeregister(buf[1:], log.Index)
case structs.CSIVolumeClaimRequestType: case structs.CSIVolumeClaimRequestType:
return n.applyCSIVolumeClaim(buf[1:], log.Index) return n.applyCSIVolumeClaim(buf[1:], log.Index)
case structs.CSIVolumeClaimBatchRequestType:
return n.applyCSIVolumeBatchClaim(buf[1:], log.Index)
case structs.ScalingEventRegisterRequestType: case structs.ScalingEventRegisterRequestType:
return n.applyUpsertScalingEvent(buf[1:], log.Index) return n.applyUpsertScalingEvent(buf[1:], log.Index)
} }
@ -1156,6 +1158,24 @@ func (n *nomadFSM) applyCSIVolumeDeregister(buf []byte, index uint64) interface{
return nil return nil
} }
func (n *nomadFSM) applyCSIVolumeBatchClaim(buf []byte, index uint64) interface{} {
var batch *structs.CSIVolumeClaimBatchRequest
if err := structs.Decode(buf, &batch); err != nil {
panic(fmt.Errorf("failed to decode request: %v", err))
}
defer metrics.MeasureSince([]string{"nomad", "fsm", "apply_csi_volume_batch_claim"}, time.Now())
for _, req := range batch.Claims {
err := n.state.CSIVolumeClaim(index, req.RequestNamespace(),
req.VolumeID, req.ToClaim())
if err != nil {
n.logger.Error("CSIVolumeClaim for batch failed", "error", err)
return err // note: fails the remaining batch
}
}
return nil
}
func (n *nomadFSM) applyCSIVolumeClaim(buf []byte, index uint64) interface{} { func (n *nomadFSM) applyCSIVolumeClaim(buf []byte, index uint64) interface{} {
var req structs.CSIVolumeClaimRequest var req structs.CSIVolumeClaimRequest
if err := structs.Decode(buf, &req); err != nil { if err := structs.Decode(buf, &req); err != nil {
@ -1163,26 +1183,10 @@ func (n *nomadFSM) applyCSIVolumeClaim(buf []byte, index uint64) interface{} {
} }
defer metrics.MeasureSince([]string{"nomad", "fsm", "apply_csi_volume_claim"}, time.Now()) defer metrics.MeasureSince([]string{"nomad", "fsm", "apply_csi_volume_claim"}, time.Now())
ws := memdb.NewWatchSet() if err := n.state.CSIVolumeClaim(index, req.RequestNamespace(), req.VolumeID, req.ToClaim()); err != nil {
alloc, err := n.state.AllocByID(ws, req.AllocationID)
if err != nil {
n.logger.Error("AllocByID failed", "error", err)
return err
}
if alloc == nil {
n.logger.Error("AllocByID failed to find alloc", "alloc_id", req.AllocationID)
if err != nil {
return err
}
return structs.ErrUnknownAllocationPrefix
}
if err := n.state.CSIVolumeClaim(index, req.RequestNamespace(), req.VolumeID, alloc, req.Claim); err != nil {
n.logger.Error("CSIVolumeClaim failed", "error", err) n.logger.Error("CSIVolumeClaim failed", "error", err)
return err return err
} }
return nil return nil
} }

View file

@ -1,11 +0,0 @@
package nomad
import "github.com/hashicorp/nomad/nomad/state"
// RPCServer is a minimal interface of the Server, intended as
// an aid for testing logic surrounding server-to-server or
// server-to-client RPC calls
type RPCServer interface {
RPC(method string, args interface{}, reply interface{}) error
State() *state.StateStore
}

View file

@ -737,19 +737,13 @@ func (j *Job) Deregister(args *structs.JobDeregisterRequest, reply *structs.JobD
for _, vol := range volumesToGC { for _, vol := range volumesToGC {
// we have to build this eval by hand rather than calling srv.CoreJob // we have to build this eval by hand rather than calling srv.CoreJob
// here because we need to use the volume's namespace // here because we need to use the volume's namespace
runningAllocs := ":ok"
if args.Purge {
runningAllocs = ":purge"
}
eval := &structs.Evaluation{ eval := &structs.Evaluation{
ID: uuid.Generate(), ID: uuid.Generate(),
Namespace: job.Namespace, Namespace: job.Namespace,
Priority: structs.CoreJobPriority, Priority: structs.CoreJobPriority,
Type: structs.JobTypeCore, Type: structs.JobTypeCore,
TriggeredBy: structs.EvalTriggerAllocStop, TriggeredBy: structs.EvalTriggerAllocStop,
JobID: structs.CoreJobCSIVolumeClaimGC + ":" + vol.Source + runningAllocs, JobID: structs.CoreJobCSIVolumeClaimGC + ":" + vol.Source,
LeaderACL: j.srv.getLeaderAcl(), LeaderACL: j.srv.getLeaderAcl(),
Status: structs.EvalStatusPending, Status: structs.EvalStatusPending,
CreateTime: now, CreateTime: now,
@ -1806,10 +1800,6 @@ func (j *Job) ScaleStatus(args *structs.JobScaleStatusRequest,
reply.JobScaleStatus = nil reply.JobScaleStatus = nil
return nil return nil
} }
deployment, err := state.LatestDeploymentByJobID(ws, args.RequestNamespace(), args.JobID)
if err != nil {
return err
}
events, eventsIndex, err := state.ScalingEventsByJob(ws, args.RequestNamespace(), args.JobID) events, eventsIndex, err := state.ScalingEventsByJob(ws, args.RequestNamespace(), args.JobID)
if err != nil { if err != nil {
@ -1819,6 +1809,13 @@ func (j *Job) ScaleStatus(args *structs.JobScaleStatusRequest,
events = make(map[string][]*structs.ScalingEvent) events = make(map[string][]*structs.ScalingEvent)
} }
var allocs []*structs.Allocation
var allocsIndex uint64
allocs, err = state.AllocsByJob(ws, job.Namespace, job.ID, false)
if err != nil {
return err
}
// Setup the output // Setup the output
reply.JobScaleStatus = &structs.JobScaleStatus{ reply.JobScaleStatus = &structs.JobScaleStatus{
JobID: job.ID, JobID: job.ID,
@ -1832,24 +1829,45 @@ func (j *Job) ScaleStatus(args *structs.JobScaleStatusRequest,
tgScale := &structs.TaskGroupScaleStatus{ tgScale := &structs.TaskGroupScaleStatus{
Desired: tg.Count, Desired: tg.Count,
} }
if deployment != nil {
if ds, ok := deployment.TaskGroups[tg.Name]; ok {
tgScale.Placed = ds.PlacedAllocs
tgScale.Healthy = ds.HealthyAllocs
tgScale.Unhealthy = ds.UnhealthyAllocs
}
}
tgScale.Events = events[tg.Name] tgScale.Events = events[tg.Name]
reply.JobScaleStatus.TaskGroups[tg.Name] = tgScale reply.JobScaleStatus.TaskGroups[tg.Name] = tgScale
} }
maxIndex := job.ModifyIndex for _, alloc := range allocs {
if deployment != nil && deployment.ModifyIndex > maxIndex { // TODO: ignore canaries until we figure out what we should do with canaries
maxIndex = deployment.ModifyIndex if alloc.DeploymentStatus != nil && alloc.DeploymentStatus.Canary {
continue
}
if alloc.TerminalStatus() {
continue
}
tgScale, ok := reply.JobScaleStatus.TaskGroups[alloc.TaskGroup]
if !ok || tgScale == nil {
continue
}
tgScale.Placed++
if alloc.ClientStatus == structs.AllocClientStatusRunning {
tgScale.Running++
}
if alloc.DeploymentStatus != nil && alloc.DeploymentStatus.HasHealth() {
if alloc.DeploymentStatus.IsHealthy() {
tgScale.Healthy++
} else if alloc.DeploymentStatus.IsUnhealthy() {
tgScale.Unhealthy++
}
}
if alloc.ModifyIndex > allocsIndex {
allocsIndex = alloc.ModifyIndex
}
} }
maxIndex := job.ModifyIndex
if eventsIndex > maxIndex { if eventsIndex > maxIndex {
maxIndex = eventsIndex maxIndex = eventsIndex
} }
if allocsIndex > maxIndex {
maxIndex = allocsIndex
}
reply.Index = maxIndex reply.Index = maxIndex
// Set the query response // Set the query response

View file

@ -1,9 +1,11 @@
package nomad package nomad
import ( import (
"fmt"
"strconv" "strconv"
"strings" "strings"
"github.com/hashicorp/nomad/helper/uuid"
"github.com/hashicorp/nomad/nomad/structs" "github.com/hashicorp/nomad/nomad/structs"
"github.com/pkg/errors" "github.com/pkg/errors"
) )
@ -197,6 +199,21 @@ func exposePathForCheck(tg *structs.TaskGroup, s *structs.Service, check *struct
return nil, nil return nil, nil
} }
// If the check is exposable but doesn't have a port label set build
// a port with a generated label, add it to the group's Dynamic ports
// and set the check port label to the generated label.
//
// This lets PortLabel be optional for any exposed check.
if check.PortLabel == "" {
port := structs.Port{
Label: fmt.Sprintf("svc_%s_ck_%s", s.Name, uuid.Generate()[:6]),
To: -1,
}
tg.Networks[0].DynamicPorts = append(tg.Networks[0].DynamicPorts, port)
check.PortLabel = port.Label
}
// Determine the local service port (i.e. what port the service is actually // Determine the local service port (i.e. what port the service is actually
// listening to inside the network namespace). // listening to inside the network namespace).
// //
@ -216,9 +233,7 @@ func exposePathForCheck(tg *structs.TaskGroup, s *structs.Service, check *struct
} }
// The Path, Protocol, and PortLabel are just copied over from the service // The Path, Protocol, and PortLabel are just copied over from the service
// check definition. It is required that the user configure their own port // check definition.
// mapping for each check, including setting the 'to = -1' sentinel value
// enabling the network namespace pass-through.
return &structs.ConsulExposePath{ return &structs.ConsulExposePath{
Path: check.Path, Path: check.Path,
Protocol: check.Protocol, Protocol: check.Protocol,

View file

@ -346,6 +346,36 @@ func TestJobExposeCheckHook_exposePathForCheck(t *testing.T) {
}, s, c) }, s, c)
require.EqualError(t, err, `unable to determine local service port for service check group1->service1->check1`) require.EqualError(t, err, `unable to determine local service port for service check group1->service1->check1`)
}) })
t.Run("empty check port", func(t *testing.T) {
c := &structs.ServiceCheck{
Name: "check1",
Type: "http",
Path: "/health",
}
s := &structs.Service{
Name: "service1",
PortLabel: "9999",
Checks: []*structs.ServiceCheck{c},
}
tg := &structs.TaskGroup{
Name: "group1",
Services: []*structs.Service{s},
Networks: structs.Networks{{
Mode: "bridge",
DynamicPorts: []structs.Port{},
}},
}
ePath, err := exposePathForCheck(tg, s, c)
require.NoError(t, err)
require.Len(t, tg.Networks[0].DynamicPorts, 1)
require.Equal(t, &structs.ConsulExposePath{
Path: "/health",
Protocol: "",
LocalPathPort: 9999,
ListenerPort: tg.Networks[0].DynamicPorts[0].Label,
}, ePath)
})
} }
func TestJobExposeCheckHook_containsExposePath(t *testing.T) { func TestJobExposeCheckHook_containsExposePath(t *testing.T) {

View file

@ -5627,42 +5627,104 @@ func TestJobEndpoint_GetScaleStatus(t *testing.T) {
testutil.WaitForLeader(t, s1.RPC) testutil.WaitForLeader(t, s1.RPC)
state := s1.fsm.State() state := s1.fsm.State()
job := mock.Job() jobV1 := mock.Job()
// check before job registration // check before registration
// Fetch the scaling status // Fetch the scaling status
get := &structs.JobScaleStatusRequest{ get := &structs.JobScaleStatusRequest{
JobID: job.ID, JobID: jobV1.ID,
QueryOptions: structs.QueryOptions{ QueryOptions: structs.QueryOptions{
Region: "global", Region: "global",
Namespace: job.Namespace, Namespace: jobV1.Namespace,
}, },
} }
var resp2 structs.JobScaleStatusResponse var resp2 structs.JobScaleStatusResponse
require.NoError(msgpackrpc.CallWithCodec(codec, "Job.ScaleStatus", get, &resp2)) require.NoError(msgpackrpc.CallWithCodec(codec, "Job.ScaleStatus", get, &resp2))
require.Nil(resp2.JobScaleStatus) require.Nil(resp2.JobScaleStatus)
// Create the register request // stopped (previous version)
err := state.UpsertJob(1000, job) require.NoError(state.UpsertJob(1000, jobV1), "UpsertJob")
require.Nil(err) a0 := mock.Alloc()
a0.Job = jobV1
a0.Namespace = jobV1.Namespace
a0.JobID = jobV1.ID
a0.ClientStatus = structs.AllocClientStatusComplete
require.NoError(state.UpsertAllocs(1010, []*structs.Allocation{a0}), "UpsertAllocs")
jobV2 := jobV1.Copy()
require.NoError(state.UpsertJob(1100, jobV2), "UpsertJob")
a1 := mock.Alloc()
a1.Job = jobV2
a1.Namespace = jobV2.Namespace
a1.JobID = jobV2.ID
a1.ClientStatus = structs.AllocClientStatusRunning
// healthy
a1.DeploymentStatus = &structs.AllocDeploymentStatus{
Healthy: helper.BoolToPtr(true),
}
a2 := mock.Alloc()
a2.Job = jobV2
a2.Namespace = jobV2.Namespace
a2.JobID = jobV2.ID
a2.ClientStatus = structs.AllocClientStatusPending
// unhealthy
a2.DeploymentStatus = &structs.AllocDeploymentStatus{
Healthy: helper.BoolToPtr(false),
}
a3 := mock.Alloc()
a3.Job = jobV2
a3.Namespace = jobV2.Namespace
a3.JobID = jobV2.ID
a3.ClientStatus = structs.AllocClientStatusRunning
// canary
a3.DeploymentStatus = &structs.AllocDeploymentStatus{
Healthy: helper.BoolToPtr(true),
Canary: true,
}
// no health
a4 := mock.Alloc()
a4.Job = jobV2
a4.Namespace = jobV2.Namespace
a4.JobID = jobV2.ID
a4.ClientStatus = structs.AllocClientStatusRunning
// upsert allocations
require.NoError(state.UpsertAllocs(1110, []*structs.Allocation{a1, a2, a3, a4}), "UpsertAllocs")
event := &structs.ScalingEvent{
Time: time.Now().Unix(),
Count: helper.Int64ToPtr(5),
Message: "message",
Error: false,
Meta: map[string]interface{}{
"a": "b",
},
EvalID: nil,
}
require.NoError(state.UpsertScalingEvent(1003, &structs.ScalingEventRequest{
Namespace: jobV2.Namespace,
JobID: jobV2.ID,
TaskGroup: jobV2.TaskGroups[0].Name,
ScalingEvent: event,
}), "UpsertScalingEvent")
// check after job registration // check after job registration
require.NoError(msgpackrpc.CallWithCodec(codec, "Job.ScaleStatus", get, &resp2)) require.NoError(msgpackrpc.CallWithCodec(codec, "Job.ScaleStatus", get, &resp2))
require.NotNil(resp2.JobScaleStatus) require.NotNil(resp2.JobScaleStatus)
expectedStatus := structs.JobScaleStatus{ expectedStatus := structs.JobScaleStatus{
JobID: job.ID, JobID: jobV2.ID,
JobCreateIndex: job.CreateIndex, JobCreateIndex: jobV2.CreateIndex,
JobModifyIndex: job.ModifyIndex, JobModifyIndex: a1.CreateIndex,
JobStopped: job.Stop, JobStopped: jobV2.Stop,
TaskGroups: map[string]*structs.TaskGroupScaleStatus{ TaskGroups: map[string]*structs.TaskGroupScaleStatus{
job.TaskGroups[0].Name: { jobV2.TaskGroups[0].Name: {
Desired: job.TaskGroups[0].Count, Desired: jobV2.TaskGroups[0].Count,
Placed: 0, Placed: 3,
Running: 0, Running: 2,
Healthy: 0, Healthy: 1,
Unhealthy: 0, Unhealthy: 1,
Events: nil, Events: []*structs.ScalingEvent{event},
}, },
}, },
} }

View file

@ -241,6 +241,9 @@ func (s *Server) establishLeadership(stopCh chan struct{}) error {
// Enable the NodeDrainer // Enable the NodeDrainer
s.nodeDrainer.SetEnabled(true, s.State()) s.nodeDrainer.SetEnabled(true, s.State())
// Enable the volume watcher, since we are now the leader
s.volumeWatcher.SetEnabled(true, s.State())
// Restore the eval broker state // Restore the eval broker state
if err := s.restoreEvals(); err != nil { if err := s.restoreEvals(); err != nil {
return err return err
@ -870,6 +873,9 @@ func (s *Server) revokeLeadership() error {
// Disable the node drainer // Disable the node drainer
s.nodeDrainer.SetEnabled(false, nil) s.nodeDrainer.SetEnabled(false, nil)
// Disable the volume watcher
s.volumeWatcher.SetEnabled(false, nil)
// Disable any enterprise systems required. // Disable any enterprise systems required.
if err := s.revokeEnterpriseLeadership(); err != nil { if err := s.revokeEnterpriseLeadership(); err != nil {
return err return err

View file

@ -1313,6 +1313,9 @@ func CSIVolume(plugin *structs.CSIPlugin) *structs.CSIVolume {
MountOptions: &structs.CSIMountOptions{}, MountOptions: &structs.CSIMountOptions{},
ReadAllocs: map[string]*structs.Allocation{}, ReadAllocs: map[string]*structs.Allocation{},
WriteAllocs: map[string]*structs.Allocation{}, WriteAllocs: map[string]*structs.Allocation{},
ReadClaims: map[string]*structs.CSIVolumeClaim{},
WriteClaims: map[string]*structs.CSIVolumeClaim{},
PastClaims: map[string]*structs.CSIVolumeClaim{},
PluginID: plugin.ID, PluginID: plugin.ID,
Provider: plugin.Provider, Provider: plugin.Provider,
ProviderVersion: plugin.Version, ProviderVersion: plugin.Version,

View file

@ -1149,7 +1149,7 @@ func (n *Node) UpdateAlloc(args *structs.AllocUpdateRequest, reply *structs.Gene
Priority: structs.CoreJobPriority, Priority: structs.CoreJobPriority,
Type: structs.JobTypeCore, Type: structs.JobTypeCore,
TriggeredBy: structs.EvalTriggerAllocStop, TriggeredBy: structs.EvalTriggerAllocStop,
JobID: structs.CoreJobCSIVolumeClaimGC + ":" + volAndNamespace[0] + ":no", JobID: structs.CoreJobCSIVolumeClaimGC + ":" + volAndNamespace[0],
LeaderACL: n.srv.getLeaderAcl(), LeaderACL: n.srv.getLeaderAcl(),
Status: structs.EvalStatusPending, Status: structs.EvalStatusPending,
CreateTime: now.UTC().UnixNano(), CreateTime: now.UTC().UnixNano(),

View file

@ -2381,9 +2381,17 @@ func TestClientEndpoint_UpdateAlloc_UnclaimVolumes(t *testing.T) {
require.NoError(t, err) require.NoError(t, err)
// Claim the volumes and verify the claims were set // Claim the volumes and verify the claims were set
err = state.CSIVolumeClaim(105, ns, volId0, alloc1, structs.CSIVolumeClaimWrite) err = state.CSIVolumeClaim(105, ns, volId0, &structs.CSIVolumeClaim{
AllocationID: alloc1.ID,
NodeID: alloc1.NodeID,
Mode: structs.CSIVolumeClaimWrite,
})
require.NoError(t, err) require.NoError(t, err)
err = state.CSIVolumeClaim(106, ns, volId0, alloc2, structs.CSIVolumeClaimRead) err = state.CSIVolumeClaim(106, ns, volId0, &structs.CSIVolumeClaim{
AllocationID: alloc2.ID,
NodeID: alloc2.NodeID,
Mode: structs.CSIVolumeClaimRead,
})
require.NoError(t, err) require.NoError(t, err)
vol, err = state.CSIVolumeByID(ws, ns, volId0) vol, err = state.CSIVolumeByID(ws, ns, volId0)
require.NoError(t, err) require.NoError(t, err)
@ -2406,7 +2414,7 @@ func TestClientEndpoint_UpdateAlloc_UnclaimVolumes(t *testing.T) {
// Verify the eval for the claim GC was emitted // Verify the eval for the claim GC was emitted
// Lookup the evaluations // Lookup the evaluations
eval, err := state.EvalsByJob(ws, job.Namespace, structs.CoreJobCSIVolumeClaimGC+":"+volId0+":no") eval, err := state.EvalsByJob(ws, job.Namespace, structs.CoreJobCSIVolumeClaimGC+":"+volId0)
require.NotNil(t, eval) require.NotNil(t, eval)
require.Nil(t, err) require.Nil(t, err)
} }

View file

@ -35,6 +35,7 @@ import (
"github.com/hashicorp/nomad/nomad/state" "github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs" "github.com/hashicorp/nomad/nomad/structs"
"github.com/hashicorp/nomad/nomad/structs/config" "github.com/hashicorp/nomad/nomad/structs/config"
"github.com/hashicorp/nomad/nomad/volumewatcher"
"github.com/hashicorp/nomad/scheduler" "github.com/hashicorp/nomad/scheduler"
"github.com/hashicorp/raft" "github.com/hashicorp/raft"
raftboltdb "github.com/hashicorp/raft-boltdb" raftboltdb "github.com/hashicorp/raft-boltdb"
@ -186,6 +187,9 @@ type Server struct {
// nodeDrainer is used to drain allocations from nodes. // nodeDrainer is used to drain allocations from nodes.
nodeDrainer *drainer.NodeDrainer nodeDrainer *drainer.NodeDrainer
// volumeWatcher is used to release volume claims
volumeWatcher *volumewatcher.Watcher
// evalBroker is used to manage the in-progress evaluations // evalBroker is used to manage the in-progress evaluations
// that are waiting to be brokered to a sub-scheduler // that are waiting to be brokered to a sub-scheduler
evalBroker *EvalBroker evalBroker *EvalBroker
@ -399,6 +403,12 @@ func NewServer(config *Config, consulCatalog consul.CatalogAPI, consulACLs consu
return nil, fmt.Errorf("failed to create deployment watcher: %v", err) return nil, fmt.Errorf("failed to create deployment watcher: %v", err)
} }
// Setup the volume watcher
if err := s.setupVolumeWatcher(); err != nil {
s.logger.Error("failed to create volume watcher", "error", err)
return nil, fmt.Errorf("failed to create volume watcher: %v", err)
}
// Setup the node drainer. // Setup the node drainer.
s.setupNodeDrainer() s.setupNodeDrainer()
@ -993,6 +1003,27 @@ func (s *Server) setupDeploymentWatcher() error {
return nil return nil
} }
// setupVolumeWatcher creates a volume watcher that consumes the RPC
// endpoints for state information and makes transitions via Raft through a
// shim that provides the appropriate methods.
func (s *Server) setupVolumeWatcher() error {
// Create the raft shim type to restrict the set of raft methods that can be
// made
raftShim := &volumeWatcherRaftShim{
apply: s.raftApply,
}
// Create the volume watcher
s.volumeWatcher = volumewatcher.NewVolumesWatcher(
s.logger, raftShim,
s.staticEndpoints.ClientCSI,
volumewatcher.LimitStateQueriesPerSecond,
volumewatcher.CrossVolumeUpdateBatchDuration)
return nil
}
// setupNodeDrainer creates a node drainer which will be enabled when a server // setupNodeDrainer creates a node drainer which will be enabled when a server
// becomes a leader. // becomes a leader.
func (s *Server) setupNodeDrainer() { func (s *Server) setupNodeDrainer() {
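setupVolumeWatcher hands the watcher a shim that exposes only raft apply, not the whole server. A generic sketch of that interface-narrowing pattern; the apply signature here is an assumption for illustration, not the server's actual raftApply:

```go
package main

import "fmt"

// applier is the only capability the watcher is given; anything else on the
// server stays out of reach. The signature is an assumption for illustration.
type applier interface {
	Apply(msgType uint8, msg interface{}) (index uint64, err error)
}

// raftShim adapts a raw apply function to the narrow applier interface,
// mirroring the volumeWatcherRaftShim wiring in the hunk above.
type raftShim struct {
	apply func(msgType uint8, msg interface{}) (uint64, error)
}

func (s *raftShim) Apply(msgType uint8, msg interface{}) (uint64, error) {
	return s.apply(msgType, msg)
}

// watcher only ever sees the applier, not the server it came from.
type watcher struct {
	raft applier
}

func main() {
	shim := &raftShim{
		apply: func(msgType uint8, msg interface{}) (uint64, error) {
			fmt.Printf("raft apply: type=%d msg=%v\n", msgType, msg)
			return 1, nil
		},
	}
	w := &watcher{raft: shim}
	if idx, err := w.raft.Apply(42, "release volume claim"); err == nil {
		fmt.Println("applied at index", idx)
	}
}
```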

View file

@ -1187,15 +1187,14 @@ func (s *StateStore) deleteJobFromPlugin(index uint64, txn *memdb.Txn, job *stru
plugins := map[string]*structs.CSIPlugin{} plugins := map[string]*structs.CSIPlugin{}
for _, a := range allocs { for _, a := range allocs {
tg := job.LookupTaskGroup(a.TaskGroup)
// if it's nil, we can just panic
tg := a.Job.LookupTaskGroup(a.TaskGroup)
for _, t := range tg.Tasks { for _, t := range tg.Tasks {
if t.CSIPluginConfig != nil { if t.CSIPluginConfig != nil {
plugAllocs = append(plugAllocs, &pair{ plugAllocs = append(plugAllocs, &pair{
pluginID: t.CSIPluginConfig.ID, pluginID: t.CSIPluginConfig.ID,
alloc: a, alloc: a,
}) })
} }
} }
} }
@ -1479,16 +1478,10 @@ func (s *StateStore) DeleteJobTxn(index uint64, namespace, jobID string, txn Txn
return fmt.Errorf("index update failed: %v", err) return fmt.Errorf("index update failed: %v", err)
} }
// Delete any job scaling policies // Delete any remaining job scaling policies
numDeletedScalingPolicies, err := txn.DeleteAll("scaling_policy", "target_prefix", namespace, jobID) if err := s.deleteJobScalingPolicies(index, job, txn); err != nil {
if err != nil {
return fmt.Errorf("deleting job scaling policies failed: %v", err) return fmt.Errorf("deleting job scaling policies failed: %v", err)
} }
if numDeletedScalingPolicies > 0 {
if err := txn.Insert("index", &IndexEntry{"scaling_policy", index}); err != nil {
return fmt.Errorf("index update failed: %v", err)
}
}
// Delete the scaling events // Delete the scaling events
if _, err = txn.DeleteAll("scaling_event", "id", namespace, jobID); err != nil { if _, err = txn.DeleteAll("scaling_event", "id", namespace, jobID); err != nil {
@ -1507,6 +1500,20 @@ func (s *StateStore) DeleteJobTxn(index uint64, namespace, jobID string, txn Txn
return nil return nil
} }
// deleteJobScalingPolicies deletes any scaling policies associated with the job
func (s *StateStore) deleteJobScalingPolicies(index uint64, job *structs.Job, txn *memdb.Txn) error {
numDeletedScalingPolicies, err := txn.DeleteAll("scaling_policy", "target_prefix", job.Namespace, job.ID)
if err != nil {
return fmt.Errorf("deleting job scaling policies failed: %v", err)
}
if numDeletedScalingPolicies > 0 {
if err := txn.Insert("index", &IndexEntry{"scaling_policy", index}); err != nil {
return fmt.Errorf("index update failed: %v", err)
}
}
return nil
}
// deleteJobVersions deletes all versions of the given job. // deleteJobVersions deletes all versions of the given job.
func (s *StateStore) deleteJobVersions(index uint64, job *structs.Job, txn *memdb.Txn) error { func (s *StateStore) deleteJobVersions(index uint64, job *structs.Job, txn *memdb.Txn) error {
iter, err := txn.Get("job_version", "id_prefix", job.Namespace, job.ID) iter, err := txn.Get("job_version", "id_prefix", job.Namespace, job.ID)
@ -2018,9 +2025,10 @@ func (s *StateStore) CSIVolumesByNamespace(ws memdb.WatchSet, namespace string)
} }
// CSIVolumeClaim updates the volume's claim count and allocation list // CSIVolumeClaim updates the volume's claim count and allocation list
func (s *StateStore) CSIVolumeClaim(index uint64, namespace, id string, alloc *structs.Allocation, claim structs.CSIVolumeClaimMode) error { func (s *StateStore) CSIVolumeClaim(index uint64, namespace, id string, claim *structs.CSIVolumeClaim) error {
txn := s.db.Txn(true) txn := s.db.Txn(true)
defer txn.Abort() defer txn.Abort()
ws := memdb.NewWatchSet()
row, err := txn.First("csi_volumes", "id", namespace, id) row, err := txn.First("csi_volumes", "id", namespace, id)
if err != nil { if err != nil {
@ -2035,7 +2043,21 @@ func (s *StateStore) CSIVolumeClaim(index uint64, namespace, id string, alloc *s
return fmt.Errorf("volume row conversion error") return fmt.Errorf("volume row conversion error")
} }
ws := memdb.NewWatchSet() var alloc *structs.Allocation
if claim.Mode != structs.CSIVolumeClaimRelease {
alloc, err = s.AllocByID(ws, claim.AllocationID)
if err != nil {
s.logger.Error("AllocByID failed", "error", err)
return fmt.Errorf(structs.ErrUnknownAllocationPrefix)
}
if alloc == nil {
s.logger.Error("AllocByID failed to find alloc", "alloc_id", claim.AllocationID)
return fmt.Errorf(structs.ErrUnknownAllocationPrefix)
}
}
volume, err := s.CSIVolumeDenormalizePlugins(ws, orig.Copy()) volume, err := s.CSIVolumeDenormalizePlugins(ws, orig.Copy())
if err != nil { if err != nil {
return err return err
@ -2046,9 +2068,14 @@ func (s *StateStore) CSIVolumeClaim(index uint64, namespace, id string, alloc *s
return err return err
} }
err = volume.Claim(claim, alloc) // in the case of a job deregistration, there will be no allocation ID
if err != nil { // for the claim but we still want to write an updated index to the volume
return err // so that volume reaping is triggered
if claim.AllocationID != "" {
err = volume.Claim(claim, alloc)
if err != nil {
return err
}
} }
volume.ModifyIndex = index volume.ModifyIndex = index
@ -2144,14 +2171,27 @@ func (s *StateStore) CSIVolumeDenormalizePlugins(ws memdb.WatchSet, vol *structs
return vol, nil return vol, nil
} }
// csiVolumeDenormalizeAllocs returns a CSIVolume with allocations // CSIVolumeDenormalize returns a CSIVolume with allocations
func (s *StateStore) CSIVolumeDenormalize(ws memdb.WatchSet, vol *structs.CSIVolume) (*structs.CSIVolume, error) { func (s *StateStore) CSIVolumeDenormalize(ws memdb.WatchSet, vol *structs.CSIVolume) (*structs.CSIVolume, error) {
for id := range vol.ReadAllocs { for id := range vol.ReadAllocs {
a, err := s.AllocByID(ws, id) a, err := s.AllocByID(ws, id)
if err != nil { if err != nil {
return nil, err return nil, err
} }
vol.ReadAllocs[id] = a if a != nil {
vol.ReadAllocs[id] = a
// COMPAT(1.0): the CSIVolumeClaim fields were added
// after 0.11.1, so claims made before that may be
// missing this value. (same for WriteAlloc below)
if _, ok := vol.ReadClaims[id]; !ok {
vol.ReadClaims[id] = &structs.CSIVolumeClaim{
AllocationID: a.ID,
NodeID: a.NodeID,
Mode: structs.CSIVolumeClaimRead,
State: structs.CSIVolumeClaimStateTaken,
}
}
}
} }
for id := range vol.WriteAllocs { for id := range vol.WriteAllocs {
@ -2159,7 +2199,17 @@ func (s *StateStore) CSIVolumeDenormalize(ws memdb.WatchSet, vol *structs.CSIVol
if err != nil { if err != nil {
return nil, err return nil, err
} }
vol.WriteAllocs[id] = a if a != nil {
vol.WriteAllocs[id] = a
if _, ok := vol.WriteClaims[id]; !ok {
vol.WriteClaims[id] = &structs.CSIVolumeClaim{
AllocationID: a.ID,
NodeID: a.NodeID,
Mode: structs.CSIVolumeClaimWrite,
State: structs.CSIVolumeClaimStateTaken,
}
}
}
} }
return vol, nil return vol, nil
@ -4244,6 +4294,13 @@ func (s *StateStore) updateJobScalingPolicies(index uint64, job *structs.Job, tx
ws := memdb.NewWatchSet() ws := memdb.NewWatchSet()
if job.Stop {
if err := s.deleteJobScalingPolicies(index, job, txn); err != nil {
return fmt.Errorf("deleting job scaling policies failed: %v", err)
}
return nil
}
scalingPolicies := job.GetScalingPolicies() scalingPolicies := job.GetScalingPolicies()
newTargets := map[string]struct{}{} newTargets := map[string]struct{}{}
for _, p := range scalingPolicies { for _, p := range scalingPolicies {

View file

@ -2941,18 +2941,33 @@ func TestStateStore_CSIVolume(t *testing.T) {
vs = slurp(iter) vs = slurp(iter)
require.Equal(t, 1, len(vs)) require.Equal(t, 1, len(vs))
// Allocs
a0 := mock.Alloc()
a1 := mock.Alloc()
index++
err = state.UpsertAllocs(index, []*structs.Allocation{a0, a1})
require.NoError(t, err)
// Claims // Claims
a0 := &structs.Allocation{ID: uuid.Generate()}
a1 := &structs.Allocation{ID: uuid.Generate()}
r := structs.CSIVolumeClaimRead r := structs.CSIVolumeClaimRead
w := structs.CSIVolumeClaimWrite w := structs.CSIVolumeClaimWrite
u := structs.CSIVolumeClaimRelease u := structs.CSIVolumeClaimRelease
claim0 := &structs.CSIVolumeClaim{
AllocationID: a0.ID,
NodeID: node.ID,
Mode: r,
}
claim1 := &structs.CSIVolumeClaim{
AllocationID: a1.ID,
NodeID: node.ID,
Mode: w,
}
index++ index++
err = state.CSIVolumeClaim(index, ns, vol0, a0, r) err = state.CSIVolumeClaim(index, ns, vol0, claim0)
require.NoError(t, err) require.NoError(t, err)
index++ index++
err = state.CSIVolumeClaim(index, ns, vol0, a1, w) err = state.CSIVolumeClaim(index, ns, vol0, claim1)
require.NoError(t, err) require.NoError(t, err)
ws = memdb.NewWatchSet() ws = memdb.NewWatchSet()
@ -2961,7 +2976,8 @@ func TestStateStore_CSIVolume(t *testing.T) {
vs = slurp(iter) vs = slurp(iter)
require.False(t, vs[0].WriteFreeClaims()) require.False(t, vs[0].WriteFreeClaims())
err = state.CSIVolumeClaim(2, ns, vol0, a0, u) claim0.Mode = u
err = state.CSIVolumeClaim(2, ns, vol0, claim0)
require.NoError(t, err) require.NoError(t, err)
ws = memdb.NewWatchSet() ws = memdb.NewWatchSet()
iter, err = state.CSIVolumesByPluginID(ws, ns, "minnie") iter, err = state.CSIVolumesByPluginID(ws, ns, "minnie")
@ -2980,10 +2996,13 @@ func TestStateStore_CSIVolume(t *testing.T) {
// release claims to unblock deregister // release claims to unblock deregister
index++ index++
err = state.CSIVolumeClaim(index, ns, vol0, a0, u) claim0.State = structs.CSIVolumeClaimStateReadyToFree
err = state.CSIVolumeClaim(index, ns, vol0, claim0)
require.NoError(t, err) require.NoError(t, err)
index++ index++
err = state.CSIVolumeClaim(index, ns, vol0, a1, u) claim1.Mode = u
claim1.State = structs.CSIVolumeClaimStateReadyToFree
err = state.CSIVolumeClaim(index, ns, vol0, claim1)
require.NoError(t, err) require.NoError(t, err)
index++ index++
@ -8427,7 +8446,96 @@ func TestStateStore_DeleteScalingPolicies(t *testing.T) {
require.False(watchFired(ws)) require.False(watchFired(ws))
} }
func TestStateStore_DeleteJob_ChildScalingPolicies(t *testing.T) { func TestStateStore_StopJob_DeleteScalingPolicies(t *testing.T) {
t.Parallel()
require := require.New(t)
state := testStateStore(t)
job := mock.Job()
err := state.UpsertJob(1000, job)
require.NoError(err)
policy := mock.ScalingPolicy()
policy.Target[structs.ScalingTargetJob] = job.ID
err = state.UpsertScalingPolicies(1100, []*structs.ScalingPolicy{policy})
require.NoError(err)
// Ensure the scaling policy is present and start some watches
wsGet := memdb.NewWatchSet()
out, err := state.ScalingPolicyByTarget(wsGet, policy.Target)
require.NoError(err)
require.NotNil(out)
wsList := memdb.NewWatchSet()
_, err = state.ScalingPolicies(wsList)
require.NoError(err)
// Stop the job
job, err = state.JobByID(nil, job.Namespace, job.ID)
require.NoError(err)
job.Stop = true
err = state.UpsertJob(1200, job)
require.NoError(err)
// Ensure:
// * the scaling policy was deleted
// * the watches were fired
// * the table index was advanced
require.True(watchFired(wsGet))
require.True(watchFired(wsList))
out, err = state.ScalingPolicyByTarget(nil, policy.Target)
require.NoError(err)
require.Nil(out)
index, err := state.Index("scaling_policy")
require.GreaterOrEqual(index, uint64(1200))
}
func TestStateStore_UnstopJob_UpsertScalingPolicies(t *testing.T) {
t.Parallel()
require := require.New(t)
state := testStateStore(t)
job, policy := mock.JobWithScalingPolicy()
job.Stop = true
// establish watcher, verify there are no scaling policies yet
ws := memdb.NewWatchSet()
list, err := state.ScalingPolicies(ws)
require.NoError(err)
require.Nil(list.Next())
// upsert a stopped job, verify that we don't fire the watcher or add any scaling policies
err = state.UpsertJob(1000, job)
require.NoError(err)
require.False(watchFired(ws))
// stopped job should have no scaling policies, watcher doesn't fire
list, err = state.ScalingPolicies(ws)
require.NoError(err)
require.Nil(list.Next())
// Establish a new watcher
ws = memdb.NewWatchSet()
_, err = state.ScalingPolicies(ws)
require.NoError(err)
// Unstop this job, say you'll run it again...
job.Stop = false
err = state.UpsertJob(1100, job)
require.NoError(err)
// Ensure the scaling policy was added, watch was fired, index was advanced
require.True(watchFired(ws))
out, err := state.ScalingPolicyByTarget(nil, policy.Target)
require.NoError(err)
require.NotNil(out)
index, err := state.Index("scaling_policy")
require.GreaterOrEqual(index, uint64(1100))
}
func TestStateStore_DeleteJob_DeleteScalingPolicies(t *testing.T) {
t.Parallel() t.Parallel()
require := require.New(t) require := require.New(t)

View file

@ -185,6 +185,22 @@ func (v *CSIMountOptions) GoString() string {
return v.String() return v.String()
} }
type CSIVolumeClaim struct {
AllocationID string
NodeID string
Mode CSIVolumeClaimMode
State CSIVolumeClaimState
}
type CSIVolumeClaimState int
const (
CSIVolumeClaimStateTaken CSIVolumeClaimState = iota
CSIVolumeClaimStateNodeDetached
CSIVolumeClaimStateControllerDetached
CSIVolumeClaimStateReadyToFree
)
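The new CSIVolumeClaimState constants define the checkpoints that the claim-reaping logic later walks through: taken, node detached, controller detached, ready to free. The following is a small, self-contained sketch of that progression using illustrative local types rather than the structs package; it only shows the intended ordering, not Nomad's API.

package main

import "fmt"

// claimState is an illustrative copy of the claim-state progression.
type claimState int

const (
	stateTaken claimState = iota
	stateNodeDetached
	stateControllerDetached
	stateReadyToFree
)

// nextState advances a past claim one checkpoint at a time, mirroring the
// node-unpublish -> controller-unpublish -> release ordering.
func nextState(s claimState, controllerRequired bool) claimState {
	switch s {
	case stateTaken:
		return stateNodeDetached
	case stateNodeDetached:
		if controllerRequired {
			return stateControllerDetached
		}
		return stateReadyToFree
	default:
		return stateReadyToFree
	}
}

func main() {
	s := stateTaken
	for s != stateReadyToFree {
		s = nextState(s, true)
		fmt.Println("claim advanced to state", s)
	}
}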
// CSIVolume is the full representation of a CSI Volume // CSIVolume is the full representation of a CSI Volume
type CSIVolume struct { type CSIVolume struct {
// ID is a namespace unique URL safe identifier for the volume // ID is a namespace unique URL safe identifier for the volume
@ -200,8 +216,12 @@ type CSIVolume struct {
MountOptions *CSIMountOptions MountOptions *CSIMountOptions
// Allocations, tracking claim status // Allocations, tracking claim status
ReadAllocs map[string]*Allocation ReadAllocs map[string]*Allocation // AllocID -> Allocation
WriteAllocs map[string]*Allocation WriteAllocs map[string]*Allocation // AllocID -> Allocation
ReadClaims map[string]*CSIVolumeClaim // AllocID -> claim
WriteClaims map[string]*CSIVolumeClaim // AllocID -> claim
PastClaims map[string]*CSIVolumeClaim // AllocID -> claim
// Schedulable is true if all the denormalized plugin health fields are true, and the // Schedulable is true if all the denormalized plugin health fields are true, and the
// volume has not been marked for garbage collection // volume has not been marked for garbage collection
@ -262,6 +282,10 @@ func (v *CSIVolume) newStructs() {
v.ReadAllocs = map[string]*Allocation{} v.ReadAllocs = map[string]*Allocation{}
v.WriteAllocs = map[string]*Allocation{} v.WriteAllocs = map[string]*Allocation{}
v.ReadClaims = map[string]*CSIVolumeClaim{}
v.WriteClaims = map[string]*CSIVolumeClaim{}
v.PastClaims = map[string]*CSIVolumeClaim{}
} }
func (v *CSIVolume) RemoteID() string { func (v *CSIVolume) RemoteID() string {
@ -350,27 +374,43 @@ func (v *CSIVolume) Copy() *CSIVolume {
out.WriteAllocs[k] = v out.WriteAllocs[k] = v
} }
for k, v := range v.ReadClaims {
claim := *v
out.ReadClaims[k] = &claim
}
for k, v := range v.WriteClaims {
claim := *v
out.WriteClaims[k] = &claim
}
for k, v := range v.PastClaims {
claim := *v
out.PastClaims[k] = &claim
}
return out return out
} }
// Claim updates the allocations and changes the volume state // Claim updates the allocations and changes the volume state
func (v *CSIVolume) Claim(claim CSIVolumeClaimMode, alloc *Allocation) error { func (v *CSIVolume) Claim(claim *CSIVolumeClaim, alloc *Allocation) error {
switch claim { switch claim.Mode {
case CSIVolumeClaimRead: case CSIVolumeClaimRead:
return v.ClaimRead(alloc) return v.ClaimRead(claim, alloc)
case CSIVolumeClaimWrite: case CSIVolumeClaimWrite:
return v.ClaimWrite(alloc) return v.ClaimWrite(claim, alloc)
case CSIVolumeClaimRelease: case CSIVolumeClaimRelease:
return v.ClaimRelease(alloc) return v.ClaimRelease(claim)
} }
return nil return nil
} }
// ClaimRead marks an allocation as using a volume read-only // ClaimRead marks an allocation as using a volume read-only
func (v *CSIVolume) ClaimRead(alloc *Allocation) error { func (v *CSIVolume) ClaimRead(claim *CSIVolumeClaim, alloc *Allocation) error {
if _, ok := v.ReadAllocs[alloc.ID]; ok { if _, ok := v.ReadAllocs[claim.AllocationID]; ok {
return nil return nil
} }
if alloc == nil {
return fmt.Errorf("allocation missing: %s", claim.AllocationID)
}
if !v.ReadSchedulable() { if !v.ReadSchedulable() {
return fmt.Errorf("unschedulable") return fmt.Errorf("unschedulable")
@ -378,16 +418,24 @@ func (v *CSIVolume) ClaimRead(alloc *Allocation) error {
// Allocations are copy on write, so we want to keep the id but don't need the // Allocations are copy on write, so we want to keep the id but don't need the
// pointer. We'll get it from the db in denormalize. // pointer. We'll get it from the db in denormalize.
v.ReadAllocs[alloc.ID] = nil v.ReadAllocs[claim.AllocationID] = nil
delete(v.WriteAllocs, alloc.ID) delete(v.WriteAllocs, claim.AllocationID)
v.ReadClaims[claim.AllocationID] = claim
delete(v.WriteClaims, claim.AllocationID)
delete(v.PastClaims, claim.AllocationID)
return nil return nil
} }
// ClaimWrite marks an allocation as using a volume as a writer // ClaimWrite marks an allocation as using a volume as a writer
func (v *CSIVolume) ClaimWrite(alloc *Allocation) error { func (v *CSIVolume) ClaimWrite(claim *CSIVolumeClaim, alloc *Allocation) error {
if _, ok := v.WriteAllocs[alloc.ID]; ok { if _, ok := v.WriteAllocs[claim.AllocationID]; ok {
return nil return nil
} }
if alloc == nil {
return fmt.Errorf("allocation missing: %s", claim.AllocationID)
}
if !v.WriteSchedulable() { if !v.WriteSchedulable() {
return fmt.Errorf("unschedulable") return fmt.Errorf("unschedulable")
@ -406,13 +454,26 @@ func (v *CSIVolume) ClaimWrite(alloc *Allocation) error {
// pointer. We'll get it from the db in denormalize. // pointer. We'll get it from the db in denormalize.
v.WriteAllocs[alloc.ID] = nil v.WriteAllocs[alloc.ID] = nil
delete(v.ReadAllocs, alloc.ID) delete(v.ReadAllocs, alloc.ID)
v.WriteClaims[alloc.ID] = claim
delete(v.ReadClaims, alloc.ID)
delete(v.PastClaims, alloc.ID)
return nil return nil
} }
// ClaimRelease is called when the allocation has terminated and already stopped using the volume // ClaimRelease is called when the allocation has terminated and
func (v *CSIVolume) ClaimRelease(alloc *Allocation) error { // already stopped using the volume
delete(v.ReadAllocs, alloc.ID) func (v *CSIVolume) ClaimRelease(claim *CSIVolumeClaim) error {
delete(v.WriteAllocs, alloc.ID) if claim.State == CSIVolumeClaimStateReadyToFree {
delete(v.ReadAllocs, claim.AllocationID)
delete(v.WriteAllocs, claim.AllocationID)
delete(v.ReadClaims, claim.AllocationID)
delete(v.WriteClaims, claim.AllocationID)
delete(v.PastClaims, claim.AllocationID)
} else {
v.PastClaims[claim.AllocationID] = claim
}
return nil return nil
} }
@ -513,13 +574,28 @@ const (
CSIVolumeClaimRelease CSIVolumeClaimRelease
) )
type CSIVolumeClaimBatchRequest struct {
Claims []CSIVolumeClaimRequest
}
type CSIVolumeClaimRequest struct { type CSIVolumeClaimRequest struct {
VolumeID string VolumeID string
AllocationID string AllocationID string
NodeID string
Claim CSIVolumeClaimMode Claim CSIVolumeClaimMode
State CSIVolumeClaimState
WriteRequest WriteRequest
} }
func (req *CSIVolumeClaimRequest) ToClaim() *CSIVolumeClaim {
return &CSIVolumeClaim{
AllocationID: req.AllocationID,
NodeID: req.NodeID,
Mode: req.Claim,
State: req.State,
}
}
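ToClaim is the bridge between the RPC request shape and the claim record stored on the volume. A short usage sketch follows, relying only on the fields added in this change; the IDs are placeholders.

package main

import (
	"fmt"

	"github.com/hashicorp/nomad/nomad/structs"
)

func main() {
	// Build a claim request the way the volume watcher's checkpoint step
	// does, then convert it to the state-store representation.
	req := structs.CSIVolumeClaimRequest{
		VolumeID:     "vol-1",
		AllocationID: "alloc-1",
		NodeID:       "node-1",
		Claim:        structs.CSIVolumeClaimRelease,
		State:        structs.CSIVolumeClaimStateNodeDetached,
	}
	claim := req.ToClaim()
	fmt.Printf("claim %s on node %s, mode=%v state=%v\n",
		claim.AllocationID, claim.NodeID, claim.Mode, claim.State)
}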
type CSIVolumeClaimResponse struct { type CSIVolumeClaimResponse struct {
// Opaque static publish properties of the volume. SP MAY use this // Opaque static publish properties of the volume. SP MAY use this
// field to ensure subsequent `NodeStageVolume` or `NodePublishVolume` // field to ensure subsequent `NodeStageVolume` or `NodePublishVolume`

View file

@ -12,17 +12,28 @@ func TestCSIVolumeClaim(t *testing.T) {
vol.Schedulable = true vol.Schedulable = true
alloc := &Allocation{ID: "a1", Namespace: "n", JobID: "j"} alloc := &Allocation{ID: "a1", Namespace: "n", JobID: "j"}
claim := &CSIVolumeClaim{
AllocationID: alloc.ID,
NodeID: "foo",
Mode: CSIVolumeClaimRead,
}
require.NoError(t, vol.ClaimRead(alloc)) require.NoError(t, vol.ClaimRead(claim, alloc))
require.True(t, vol.ReadSchedulable()) require.True(t, vol.ReadSchedulable())
require.True(t, vol.WriteSchedulable()) require.True(t, vol.WriteSchedulable())
require.NoError(t, vol.ClaimRead(alloc)) require.NoError(t, vol.ClaimRead(claim, alloc))
require.NoError(t, vol.ClaimWrite(alloc)) claim.Mode = CSIVolumeClaimWrite
require.NoError(t, vol.ClaimWrite(claim, alloc))
require.True(t, vol.ReadSchedulable()) require.True(t, vol.ReadSchedulable())
require.False(t, vol.WriteFreeClaims()) require.False(t, vol.WriteFreeClaims())
vol.ClaimRelease(alloc) vol.ClaimRelease(claim)
require.True(t, vol.ReadSchedulable())
require.False(t, vol.WriteFreeClaims())
claim.State = CSIVolumeClaimStateReadyToFree
vol.ClaimRelease(claim)
require.True(t, vol.ReadSchedulable()) require.True(t, vol.ReadSchedulable())
require.True(t, vol.WriteFreeClaims()) require.True(t, vol.WriteFreeClaims())
} }

View file

@ -2,5 +2,9 @@
set -e set -e
FILES="$(ls ./*.go | grep -v -e _test.go -e .generated.go | tr '\n' ' ')" FILES="$(ls ./*.go | grep -v -e _test.go -e .generated.go | tr '\n' ' ')"
codecgen -d 100 -t codegen_generated -o structs.generated.go ${FILES} codecgen \
sed -i'' -e 's|"github.com/ugorji/go/codec|"github.com/hashicorp/go-msgpack/codec|g' structs.generated.go -c github.com/hashicorp/go-msgpack/codec \
-d 100 \
-t codegen_generated \
-o structs.generated.go \
${FILES}

View file

@ -331,7 +331,7 @@ func (idx *NetworkIndex) AssignNetwork(ask *NetworkResource) (out *NetworkResour
// getDynamicPortsPrecise takes the nodes used port bitmap which may be nil if // getDynamicPortsPrecise takes the nodes used port bitmap which may be nil if
// no ports have been allocated yet, the network ask and returns a set of unused // no ports have been allocated yet, the network ask and returns a set of unused
// ports to fullfil the ask's DynamicPorts or an error if it failed. An error // ports to fulfil the ask's DynamicPorts or an error if it failed. An error
// means the ask can not be satisfied as the method does a precise search. // means the ask can not be satisfied as the method does a precise search.
func getDynamicPortsPrecise(nodeUsed Bitmap, ask *NetworkResource) ([]int, error) { func getDynamicPortsPrecise(nodeUsed Bitmap, ask *NetworkResource) ([]int, error) {
// Create a copy of the used ports and apply the new reserves // Create a copy of the used ports and apply the new reserves
@ -373,7 +373,7 @@ func getDynamicPortsPrecise(nodeUsed Bitmap, ask *NetworkResource) ([]int, error
// getDynamicPortsStochastic takes the nodes used port bitmap which may be nil if // getDynamicPortsStochastic takes the nodes used port bitmap which may be nil if
// no ports have been allocated yet, the network ask and returns a set of unused // no ports have been allocated yet, the network ask and returns a set of unused
// ports to fullfil the ask's DynamicPorts or an error if it failed. An error // ports to fulfil the ask's DynamicPorts or an error if it failed. An error
// does not mean the ask can not be satisfied as the method has a fixed amount // does not mean the ask can not be satisfied as the method has a fixed amount
// of random probes and if these fail, the search is aborted. // of random probes and if these fail, the search is aborted.
func getDynamicPortsStochastic(nodeUsed Bitmap, ask *NetworkResource) ([]int, error) { func getDynamicPortsStochastic(nodeUsed Bitmap, ask *NetworkResource) ([]int, error) {

View file

@ -889,7 +889,9 @@ type ConsulProxy struct {
// Expose configures the consul proxy.expose stanza to "open up" endpoints // Expose configures the consul proxy.expose stanza to "open up" endpoints
// used by task-group level service checks using HTTP or gRPC protocols. // used by task-group level service checks using HTTP or gRPC protocols.
Expose *ConsulExposeConfig //
// Use json tag to match with field name in api/
Expose *ConsulExposeConfig `json:"ExposeConfig"`
// Config is a proxy configuration. It is opaque to Nomad and passed // Config is a proxy configuration. It is opaque to Nomad and passed
// directly to Consul. // directly to Consul.
@ -905,7 +907,7 @@ func (p *ConsulProxy) Copy() *ConsulProxy {
newP := &ConsulProxy{ newP := &ConsulProxy{
LocalServiceAddress: p.LocalServiceAddress, LocalServiceAddress: p.LocalServiceAddress,
LocalServicePort: p.LocalServicePort, LocalServicePort: p.LocalServicePort,
Expose: p.Expose, Expose: p.Expose.Copy(),
} }
if n := len(p.Upstreams); n > 0 { if n := len(p.Upstreams); n > 0 {
@ -1009,7 +1011,8 @@ func (u *ConsulUpstream) Equals(o *ConsulUpstream) bool {
// ExposeConfig represents a Consul Connect expose jobspec stanza. // ExposeConfig represents a Consul Connect expose jobspec stanza.
type ConsulExposeConfig struct { type ConsulExposeConfig struct {
Paths []ConsulExposePath // Use json tag to match with field name in api/
Paths []ConsulExposePath `json:"Path"`
} }
type ConsulExposePath struct { type ConsulExposePath struct {
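The Copy change above matters because the previous code aliased the Expose pointer, so the "copy" and the original shared a single ConsulExposeConfig and mutations leaked between them. A generic sketch of the difference, with illustrative local types rather than Nomad's:

package main

import "fmt"

// exposeConfig and proxy are illustrative stand-ins only.
type exposeConfig struct {
	Paths []string
}

type proxy struct {
	Expose *exposeConfig
}

// shallowCopy reuses the pointer, so both values share one exposeConfig.
func (p *proxy) shallowCopy() *proxy {
	return &proxy{Expose: p.Expose}
}

// deepCopy clones the pointed-to struct and its slice.
func (p *proxy) deepCopy() *proxy {
	if p.Expose == nil {
		return &proxy{}
	}
	e := &exposeConfig{Paths: append([]string(nil), p.Expose.Paths...)}
	return &proxy{Expose: e}
}

func main() {
	orig := &proxy{Expose: &exposeConfig{Paths: []string{"/health"}}}
	shallow := orig.shallowCopy()
	shallow.Expose.Paths = append(shallow.Expose.Paths, "/metrics")
	fmt.Println("after shallow copy, original paths:", orig.Expose.Paths) // mutated

	orig = &proxy{Expose: &exposeConfig{Paths: []string{"/health"}}}
	deep := orig.deepCopy()
	deep.Expose.Paths = append(deep.Expose.Paths, "/metrics")
	fmt.Println("after deep copy, original paths:", orig.Expose.Paths) // unchanged
}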

View file

@ -90,6 +90,7 @@ const (
CSIVolumeRegisterRequestType CSIVolumeRegisterRequestType
CSIVolumeDeregisterRequestType CSIVolumeDeregisterRequestType
CSIVolumeClaimRequestType CSIVolumeClaimRequestType
CSIVolumeClaimBatchRequestType
ScalingEventRegisterRequestType ScalingEventRegisterRequestType
) )
@ -1708,7 +1709,7 @@ type Node struct {
// COMPAT: Remove in Nomad 0.9 // COMPAT: Remove in Nomad 0.9
// Drain is controlled by the servers, and not the client. // Drain is controlled by the servers, and not the client.
// If true, no jobs will be scheduled to this node, and existing // If true, no jobs will be scheduled to this node, and existing
// allocations will be drained. Superceded by DrainStrategy in Nomad // allocations will be drained. Superseded by DrainStrategy in Nomad
// 0.8 but kept for backward compat. // 0.8 but kept for backward compat.
Drain bool Drain bool

View file

@ -423,7 +423,7 @@ func TestVaultClient_ValidateRole_Deprecated_Success(t *testing.T) {
}) })
} }
func TestVaultClient_ValidateRole_NonExistant(t *testing.T) { func TestVaultClient_ValidateRole_NonExistent(t *testing.T) {
t.Parallel() t.Parallel()
v := testutil.NewTestVault(t) v := testutil.NewTestVault(t)
defer v.Stop() defer v.Stop()

View file

@ -0,0 +1,125 @@
package volumewatcher
import (
"context"
"time"
"github.com/hashicorp/nomad/nomad/structs"
)
// VolumeUpdateBatcher is used to batch the updates for volume claims
type VolumeUpdateBatcher struct {
// batch is the batching duration
batch time.Duration
// raft is used to actually commit the updates
raft VolumeRaftEndpoints
// workCh is used to pass claim update wrappers to the batcher goroutine
workCh chan *updateWrapper
// ctx is used to exit the batcher goroutine
ctx context.Context
}
// NewVolumeUpdateBatcher returns a VolumeUpdateBatcher that uses the
// passed raft endpoints to create the updates to volume claims, and
// exits the batcher when the passed context is canceled.
func NewVolumeUpdateBatcher(batchDuration time.Duration, raft VolumeRaftEndpoints, ctx context.Context) *VolumeUpdateBatcher {
b := &VolumeUpdateBatcher{
batch: batchDuration,
raft: raft,
ctx: ctx,
workCh: make(chan *updateWrapper, 10),
}
go b.batcher()
return b
}
// CreateUpdate batches the volume claim update and returns a future
// that tracks the completion of the request.
func (b *VolumeUpdateBatcher) CreateUpdate(claims []structs.CSIVolumeClaimRequest) *BatchFuture {
wrapper := &updateWrapper{
claims: claims,
f: make(chan *BatchFuture, 1),
}
b.workCh <- wrapper
return <-wrapper.f
}
type updateWrapper struct {
claims []structs.CSIVolumeClaimRequest
f chan *BatchFuture
}
// batcher is the long lived batcher goroutine
func (b *VolumeUpdateBatcher) batcher() {
var timerCh <-chan time.Time
claims := make(map[string]structs.CSIVolumeClaimRequest)
future := NewBatchFuture()
for {
select {
case <-b.ctx.Done():
// note: we can't flush here because we're likely no
// longer the leader
return
case w := <-b.workCh:
if timerCh == nil {
timerCh = time.After(b.batch)
}
// de-dupe and store the claim update, and attach the future
for _, upd := range w.claims {
claims[upd.VolumeID+upd.RequestNamespace()] = upd
}
w.f <- future
case <-timerCh:
// Capture the future and create a new one
f := future
future = NewBatchFuture()
// Create the batch request
req := structs.CSIVolumeClaimBatchRequest{}
for _, claim := range claims {
req.Claims = append(req.Claims, claim)
}
// Upsert the claims in a go routine
go f.Set(b.raft.UpsertVolumeClaims(&req))
// Reset the claims list and timer
claims = make(map[string]structs.CSIVolumeClaimRequest)
timerCh = nil
}
}
}
// BatchFuture is a future that can be used to retrieve the index for
// the update or any error in the update process
type BatchFuture struct {
index uint64
err error
waitCh chan struct{}
}
// NewBatchFuture returns a new BatchFuture
func NewBatchFuture() *BatchFuture {
return &BatchFuture{
waitCh: make(chan struct{}),
}
}
// Set sets the results of the future, unblocking any client.
func (f *BatchFuture) Set(index uint64, err error) {
f.index = index
f.err = err
close(f.waitCh)
}
// Results returns the creation index and any error.
func (f *BatchFuture) Results() (uint64, error) {
<-f.waitCh
return f.index, f.err
}
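Callers of CreateUpdate block on the shared BatchFuture, so claim updates from many volume watchers coalesce into a single raft apply per batch window. The following is a hedged usage sketch; it assumes the batcher is importable at github.com/hashicorp/nomad/nomad/volumewatcher and substitutes a fake VolumeRaftEndpoints, so it is an illustration rather than real server wiring.

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/hashicorp/nomad/nomad/structs"
	"github.com/hashicorp/nomad/nomad/volumewatcher"
)

// fakeRaft satisfies VolumeRaftEndpoints for illustration; it just counts
// how many batched applies it receives.
type fakeRaft struct{ applies int }

func (f *fakeRaft) UpsertVolumeClaims(req *structs.CSIVolumeClaimBatchRequest) (uint64, error) {
	f.applies++
	fmt.Printf("apply %d carried %d claims\n", f.applies, len(req.Claims))
	return uint64(f.applies), nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	raft := &fakeRaft{}
	batcher := volumewatcher.NewVolumeUpdateBatcher(50*time.Millisecond, raft, ctx)

	// Two callers submit within the same batch window; both block on the
	// same future and see the index of the single raft apply.
	done := make(chan struct{})
	for i := 0; i < 2; i++ {
		id := fmt.Sprintf("vol-%d", i)
		go func() {
			index, err := batcher.CreateUpdate([]structs.CSIVolumeClaimRequest{{
				VolumeID: id,
				Claim:    structs.CSIVolumeClaimRelease,
			}}).Results()
			fmt.Println("claim update committed at index", index, "err:", err)
			done <- struct{}{}
		}()
	}
	<-done
	<-done
}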

View file

@ -0,0 +1,85 @@
package volumewatcher
import (
"context"
"fmt"
"sync"
"testing"
"github.com/hashicorp/nomad/helper/testlog"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/stretchr/testify/require"
)
// TestVolumeWatch_Batcher tests the update batching logic
func TestVolumeWatch_Batcher(t *testing.T) {
t.Parallel()
require := require.New(t)
ctx, exitFn := context.WithCancel(context.Background())
defer exitFn()
srv := &MockBatchingRPCServer{}
srv.state = state.TestStateStore(t)
srv.volumeUpdateBatcher = NewVolumeUpdateBatcher(CrossVolumeUpdateBatchDuration, srv, ctx)
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
// because we wait for the results to return from the batch for each
// Watcher.updateClaims, we can't test that we're batching except across
// multiple volume watchers. create 2 volumes and their watchers here.
alloc0 := mock.Alloc()
alloc0.ClientStatus = structs.AllocClientStatusComplete
vol0 := testVolume(nil, plugin, alloc0, node.ID)
w0 := &volumeWatcher{
v: vol0,
rpc: srv,
state: srv.State(),
updateClaims: srv.UpdateClaims,
logger: testlog.HCLogger(t),
}
alloc1 := mock.Alloc()
alloc1.ClientStatus = structs.AllocClientStatusComplete
vol1 := testVolume(nil, plugin, alloc1, node.ID)
w1 := &volumeWatcher{
v: vol1,
rpc: srv,
state: srv.State(),
updateClaims: srv.UpdateClaims,
logger: testlog.HCLogger(t),
}
srv.nextCSIControllerDetachError = fmt.Errorf("some controller plugin error")
var wg sync.WaitGroup
wg.Add(2)
go func() {
w0.volumeReapImpl(vol0)
wg.Done()
}()
go func() {
w1.volumeReapImpl(vol1)
wg.Done()
}()
wg.Wait()
require.Equal(structs.CSIVolumeClaimStateNodeDetached, vol0.PastClaims[alloc0.ID].State)
require.Equal(structs.CSIVolumeClaimStateNodeDetached, vol1.PastClaims[alloc1.ID].State)
require.Equal(2, srv.countCSINodeDetachVolume)
require.Equal(2, srv.countCSIControllerDetachVolume)
require.Equal(2, srv.countUpdateClaims)
// note: it's technically possible that the volumeReapImpl
// goroutines get de-scheduled and we don't write both updates in
// the same batch. but this seems really unlikely, so we assert
// exactly one batch below; the GreaterOrEqual check is kept so a
// future flake fails with a clearer cause.
require.GreaterOrEqual(srv.countUpsertVolumeClaims, 1)
require.Equal(1, srv.countUpsertVolumeClaims)
}

View file

@ -0,0 +1,28 @@
package volumewatcher
import (
cstructs "github.com/hashicorp/nomad/client/structs"
"github.com/hashicorp/nomad/nomad/structs"
)
// VolumeRaftEndpoints exposes the volume watcher to a set of functions
// to apply data transforms via Raft.
type VolumeRaftEndpoints interface {
// UpsertVolumeClaims applies a batch of claims to raft
UpsertVolumeClaims(*structs.CSIVolumeClaimBatchRequest) (uint64, error)
}
// ClientRPC is a minimal interface of the Server, intended as an aid
// for testing logic surrounding server-to-server or server-to-client
// RPC calls and to avoid circular references between the nomad
// package and the volumewatcher
type ClientRPC interface {
ControllerDetachVolume(args *cstructs.ClientCSIControllerDetachVolumeRequest, reply *cstructs.ClientCSIControllerDetachVolumeResponse) error
NodeDetachVolume(args *cstructs.ClientCSINodeDetachVolumeRequest, reply *cstructs.ClientCSINodeDetachVolumeResponse) error
}
// updateClaimsFn is the function used to update claims on behalf of a volume
// (used to wrap batch updates so that we can test
// volumeWatcher methods synchronously without batching)
type updateClaimsFn func(claims []structs.CSIVolumeClaimRequest) (uint64, error)

View file

@ -0,0 +1,148 @@
package volumewatcher
import (
cstructs "github.com/hashicorp/nomad/client/structs"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
)
// Create a client node with plugin info
func testNode(node *structs.Node, plugin *structs.CSIPlugin, s *state.StateStore) *structs.Node {
if node != nil {
return node
}
node = mock.Node()
node.Attributes["nomad.version"] = "0.11.0" // client RPCs not supported on early version
node.CSINodePlugins = map[string]*structs.CSIInfo{
plugin.ID: {
PluginID: plugin.ID,
Healthy: true,
RequiresControllerPlugin: plugin.ControllerRequired,
NodeInfo: &structs.CSINodeInfo{},
},
}
if plugin.ControllerRequired {
node.CSIControllerPlugins = map[string]*structs.CSIInfo{
plugin.ID: {
PluginID: plugin.ID,
Healthy: true,
RequiresControllerPlugin: true,
ControllerInfo: &structs.CSIControllerInfo{
SupportsReadOnlyAttach: true,
SupportsAttachDetach: true,
SupportsListVolumes: true,
SupportsListVolumesAttachedNodes: false,
},
},
}
} else {
node.CSIControllerPlugins = map[string]*structs.CSIInfo{}
}
s.UpsertNode(99, node)
return node
}
// Create a test volume with claim info
func testVolume(vol *structs.CSIVolume, plugin *structs.CSIPlugin, alloc *structs.Allocation, nodeID string) *structs.CSIVolume {
if vol != nil {
return vol
}
vol = mock.CSIVolume(plugin)
vol.ControllerRequired = plugin.ControllerRequired
vol.ReadAllocs = map[string]*structs.Allocation{alloc.ID: alloc}
vol.ReadClaims = map[string]*structs.CSIVolumeClaim{
alloc.ID: {
AllocationID: alloc.ID,
NodeID: nodeID,
Mode: structs.CSIVolumeClaimRead,
State: structs.CSIVolumeClaimStateTaken,
},
}
return vol
}
// COMPAT(1.0): the claim fields were added after 0.11.1; this
// mock and the associated test cases can be removed for 1.0
func testOldVolume(vol *structs.CSIVolume, plugin *structs.CSIPlugin, alloc *structs.Allocation, nodeID string) *structs.CSIVolume {
if vol != nil {
return vol
}
vol = mock.CSIVolume(plugin)
vol.ControllerRequired = plugin.ControllerRequired
vol.ReadAllocs = map[string]*structs.Allocation{alloc.ID: alloc}
return vol
}
type MockRPCServer struct {
state *state.StateStore
// mock responses for ClientCSI.NodeDetachVolume
nextCSINodeDetachResponse *cstructs.ClientCSINodeDetachVolumeResponse
nextCSINodeDetachError error
countCSINodeDetachVolume int
// mock responses for ClientCSI.ControllerDetachVolume
nextCSIControllerDetachVolumeResponse *cstructs.ClientCSIControllerDetachVolumeResponse
nextCSIControllerDetachError error
countCSIControllerDetachVolume int
countUpdateClaims int
countUpsertVolumeClaims int
}
func (srv *MockRPCServer) ControllerDetachVolume(args *cstructs.ClientCSIControllerDetachVolumeRequest, reply *cstructs.ClientCSIControllerDetachVolumeResponse) error {
reply = srv.nextCSIControllerDetachVolumeResponse
srv.countCSIControllerDetachVolume++
return srv.nextCSIControllerDetachError
}
func (srv *MockRPCServer) NodeDetachVolume(args *cstructs.ClientCSINodeDetachVolumeRequest, reply *cstructs.ClientCSINodeDetachVolumeResponse) error {
reply = srv.nextCSINodeDetachResponse
srv.countCSINodeDetachVolume++
return srv.nextCSINodeDetachError
}
func (srv *MockRPCServer) UpsertVolumeClaims(*structs.CSIVolumeClaimBatchRequest) (uint64, error) {
srv.countUpsertVolumeClaims++
return 0, nil
}
func (srv *MockRPCServer) State() *state.StateStore { return srv.state }
func (srv *MockRPCServer) UpdateClaims(claims []structs.CSIVolumeClaimRequest) (uint64, error) {
srv.countUpdateClaims++
return 0, nil
}
type MockBatchingRPCServer struct {
MockRPCServer
volumeUpdateBatcher *VolumeUpdateBatcher
}
func (srv *MockBatchingRPCServer) UpdateClaims(claims []structs.CSIVolumeClaimRequest) (uint64, error) {
srv.countUpdateClaims++
return srv.volumeUpdateBatcher.CreateUpdate(claims).Results()
}
type MockStatefulRPCServer struct {
MockRPCServer
volumeUpdateBatcher *VolumeUpdateBatcher
}
func (srv *MockStatefulRPCServer) UpsertVolumeClaims(batch *structs.CSIVolumeClaimBatchRequest) (uint64, error) {
srv.countUpsertVolumeClaims++
index, _ := srv.state.LatestIndex()
for _, req := range batch.Claims {
index++
err := srv.state.CSIVolumeClaim(index, req.RequestNamespace(),
req.VolumeID, req.ToClaim())
if err != nil {
return 0, err
}
}
return index, nil
}

View file

@ -0,0 +1,382 @@
package volumewatcher
import (
"context"
"fmt"
"sync"
log "github.com/hashicorp/go-hclog"
memdb "github.com/hashicorp/go-memdb"
multierror "github.com/hashicorp/go-multierror"
cstructs "github.com/hashicorp/nomad/client/structs"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
)
// volumeWatcher is used to watch a single volume and release its
// claims once the claiming allocations are terminal.
type volumeWatcher struct {
// v is the volume being watched
v *structs.CSIVolume
// state is the state that is watched for state changes.
state *state.StateStore
// updateClaims is the function used to apply claims to raft
updateClaims updateClaimsFn
// server interface for CSI client RPCs
rpc ClientRPC
logger log.Logger
shutdownCtx context.Context // parent context
ctx context.Context // own context
exitFn context.CancelFunc
// updateCh is triggered when there is an updated volume
updateCh chan *structs.CSIVolume
wLock sync.RWMutex
running bool
}
// newVolumeWatcher returns a volume watcher that is used to watch
// volumes
func newVolumeWatcher(parent *Watcher, vol *structs.CSIVolume) *volumeWatcher {
w := &volumeWatcher{
updateCh: make(chan *structs.CSIVolume, 1),
updateClaims: parent.updateClaims,
v: vol,
state: parent.state,
rpc: parent.rpc,
logger: parent.logger.With("volume_id", vol.ID, "namespace", vol.Namespace),
shutdownCtx: parent.ctx,
}
// Start the long lived watcher that scans for allocation updates
w.Start()
return w
}
// Notify signals an update to the tracked volume.
func (vw *volumeWatcher) Notify(v *structs.CSIVolume) {
if !vw.isRunning() {
vw.Start()
}
select {
case vw.updateCh <- v:
case <-vw.shutdownCtx.Done(): // prevent deadlock if we stopped
case <-vw.ctx.Done(): // prevent deadlock if we stopped
}
}
func (vw *volumeWatcher) Start() {
vw.logger.Trace("starting watcher", "id", vw.v.ID, "namespace", vw.v.Namespace)
vw.wLock.Lock()
defer vw.wLock.Unlock()
vw.running = true
ctx, exitFn := context.WithCancel(vw.shutdownCtx)
vw.ctx = ctx
vw.exitFn = exitFn
go vw.watch()
}
// Stop stops watching the volume. This should be called whenever a
// volume's claims are fully reaped or the watcher is no longer needed.
func (vw *volumeWatcher) Stop() {
vw.logger.Trace("no more claims", "id", vw.v.ID, "namespace", vw.v.Namespace)
vw.exitFn()
}
func (vw *volumeWatcher) isRunning() bool {
vw.wLock.RLock()
defer vw.wLock.RUnlock()
select {
case <-vw.shutdownCtx.Done():
return false
case <-vw.ctx.Done():
return false
default:
return vw.running
}
}
// watch is the long-running function that watches for changes to a volume.
// Each pass steps the volume's claims through the various states of reaping
// until the volume has no more claims eligible to be reaped.
func (vw *volumeWatcher) watch() {
for {
select {
// TODO(tgross): currently server->client RPC have no cancellation
// context, so we can't stop the long-running RPCs gracefully
case <-vw.shutdownCtx.Done():
return
case <-vw.ctx.Done():
return
case vol := <-vw.updateCh:
// while we won't make raft writes if we get a stale update,
// we can still fire extra CSI RPC calls if we don't check this
if vol == nil || vw.v == nil || vol.ModifyIndex >= vw.v.ModifyIndex {
vol = vw.getVolume(vol)
if vol == nil {
return
}
vw.volumeReap(vol)
}
}
}
}
// getVolume returns the tracked volume, fully populated with the current
// state
func (vw *volumeWatcher) getVolume(vol *structs.CSIVolume) *structs.CSIVolume {
// take the write lock: this method updates vw.v below
vw.wLock.Lock()
defer vw.wLock.Unlock()
var err error
ws := memdb.NewWatchSet()
vol, err = vw.state.CSIVolumeDenormalizePlugins(ws, vol.Copy())
if err != nil {
vw.logger.Error("could not query plugins for volume", "error", err)
return nil
}
vol, err = vw.state.CSIVolumeDenormalize(ws, vol)
if err != nil {
vw.logger.Error("could not query allocs for volume", "error", err)
return nil
}
vw.v = vol
return vol
}
// volumeReap collects errors for logging but doesn't return them
// to the main loop.
func (vw *volumeWatcher) volumeReap(vol *structs.CSIVolume) {
vw.logger.Trace("releasing unused volume claims", "id", vol.ID, "namespace", vol.Namespace)
err := vw.volumeReapImpl(vol)
if err != nil {
vw.logger.Error("error releasing volume claims", "error", err)
}
if vw.isUnclaimed(vol) {
vw.Stop()
}
}
func (vw *volumeWatcher) isUnclaimed(vol *structs.CSIVolume) bool {
return len(vol.ReadClaims) == 0 && len(vol.WriteClaims) == 0 && len(vol.PastClaims) == 0
}
func (vw *volumeWatcher) volumeReapImpl(vol *structs.CSIVolume) error {
var result *multierror.Error
nodeClaims := map[string]int{} // node IDs -> count
jobs := map[string]bool{} // jobID -> stopped
// if a job is purged, the subsequent alloc updates can't
// trigger a GC job because there's no job for them to query.
// Job.Deregister will send a claim release on all claims
// but the allocs will not yet be terminated. save the status
// for each job so that we don't requery in this pass
checkStopped := func(jobID string) bool {
namespace := vw.v.Namespace
isStopped, ok := jobs[jobID]
if !ok {
ws := memdb.NewWatchSet()
job, err := vw.state.JobByID(ws, namespace, jobID)
if err != nil {
isStopped = true
}
if job == nil || job.Stopped() {
isStopped = true
}
jobs[jobID] = isStopped
}
return isStopped
}
collect := func(allocs map[string]*structs.Allocation,
claims map[string]*structs.CSIVolumeClaim) {
for allocID, alloc := range allocs {
if alloc == nil {
_, exists := vol.PastClaims[allocID]
if !exists {
vol.PastClaims[allocID] = &structs.CSIVolumeClaim{
AllocationID: allocID,
State: structs.CSIVolumeClaimStateReadyToFree,
}
}
continue
}
nodeClaims[alloc.NodeID]++
if alloc.Terminated() || checkStopped(alloc.JobID) {
// don't overwrite the PastClaim if we've seen it before,
// so that we can track state between subsequent calls
_, exists := vol.PastClaims[allocID]
if !exists {
claim, ok := claims[allocID]
if !ok {
claim = &structs.CSIVolumeClaim{
AllocationID: allocID,
NodeID: alloc.NodeID,
}
}
claim.State = structs.CSIVolumeClaimStateTaken
vol.PastClaims[allocID] = claim
}
}
}
}
collect(vol.ReadAllocs, vol.ReadClaims)
collect(vol.WriteAllocs, vol.WriteClaims)
if len(vol.PastClaims) == 0 {
return nil
}
for _, claim := range vol.PastClaims {
var err error
// previous checkpoints may have set the past claim state already.
// in practice we should never see CSIVolumeClaimStateControllerDetached
// but having an option for the state makes it easy to add a checkpoint
// in a backwards compatible way if we need one later
switch claim.State {
case structs.CSIVolumeClaimStateNodeDetached:
goto NODE_DETACHED
case structs.CSIVolumeClaimStateControllerDetached:
goto RELEASE_CLAIM
case structs.CSIVolumeClaimStateReadyToFree:
goto RELEASE_CLAIM
}
err = vw.nodeDetach(vol, claim)
if err != nil {
result = multierror.Append(result, err)
break
}
NODE_DETACHED:
nodeClaims[claim.NodeID]--
err = vw.controllerDetach(vol, claim, nodeClaims)
if err != nil {
result = multierror.Append(result, err)
break
}
RELEASE_CLAIM:
err = vw.checkpoint(vol, claim)
if err != nil {
result = multierror.Append(result, err)
break
}
// the checkpoint deletes from the state store, but this operates
// on our local copy which aids in testing
delete(vol.PastClaims, claim.AllocationID)
}
return result.ErrorOrNil()
}
// nodeDetach makes the client NodePublish / NodeUnstage RPCs, which
// must be completed before controller operations or releasing the claim.
func (vw *volumeWatcher) nodeDetach(vol *structs.CSIVolume, claim *structs.CSIVolumeClaim) error {
vw.logger.Trace("detaching node", "id", vol.ID, "namespace", vol.Namespace)
nReq := &cstructs.ClientCSINodeDetachVolumeRequest{
PluginID: vol.PluginID,
VolumeID: vol.ID,
ExternalID: vol.RemoteID(),
AllocID: claim.AllocationID,
NodeID: claim.NodeID,
AttachmentMode: vol.AttachmentMode,
AccessMode: vol.AccessMode,
ReadOnly: claim.Mode == structs.CSIVolumeClaimRead,
}
err := vw.rpc.NodeDetachVolume(nReq,
&cstructs.ClientCSINodeDetachVolumeResponse{})
if err != nil {
return fmt.Errorf("could not detach from node: %v", err)
}
claim.State = structs.CSIVolumeClaimStateNodeDetached
return vw.checkpoint(vol, claim)
}
// controllerDetach makes the client RPC to the controller to
// unpublish the volume if a controller is required and no other
// allocs on the node need it
func (vw *volumeWatcher) controllerDetach(vol *structs.CSIVolume, claim *structs.CSIVolumeClaim, nodeClaims map[string]int) error {
if !vol.ControllerRequired || nodeClaims[claim.NodeID] > 1 {
claim.State = structs.CSIVolumeClaimStateReadyToFree
return nil
}
vw.logger.Trace("detaching controller", "id", vol.ID, "namespace", vol.Namespace)
// note: we need to get the CSI Node ID, which is not the same as
// the Nomad Node ID
ws := memdb.NewWatchSet()
targetNode, err := vw.state.NodeByID(ws, claim.NodeID)
if err != nil {
return err
}
if targetNode == nil {
return fmt.Errorf("%s: %s", structs.ErrUnknownNodePrefix, claim.NodeID)
}
targetCSIInfo, ok := targetNode.CSINodePlugins[vol.PluginID]
if !ok {
return fmt.Errorf("failed to find NodeInfo for node: %s", targetNode.ID)
}
plug, err := vw.state.CSIPluginByID(ws, vol.PluginID)
if err != nil {
return fmt.Errorf("plugin lookup error: %s %v", vol.PluginID, err)
}
if plug == nil {
return fmt.Errorf("plugin lookup error: %s missing plugin", vol.PluginID)
}
cReq := &cstructs.ClientCSIControllerDetachVolumeRequest{
VolumeID: vol.RemoteID(),
ClientCSINodeID: targetCSIInfo.NodeInfo.ID,
}
cReq.PluginID = plug.ID
err = vw.rpc.ControllerDetachVolume(cReq,
&cstructs.ClientCSIControllerDetachVolumeResponse{})
if err != nil {
return fmt.Errorf("could not detach from controller: %v", err)
}
claim.State = structs.CSIVolumeClaimStateReadyToFree
return nil
}
func (vw *volumeWatcher) checkpoint(vol *structs.CSIVolume, claim *structs.CSIVolumeClaim) error {
vw.logger.Trace("checkpointing claim", "id", vol.ID, "namespace", vol.Namespace)
req := structs.CSIVolumeClaimRequest{
VolumeID: vol.ID,
AllocationID: claim.AllocationID,
NodeID: claim.NodeID,
Claim: structs.CSIVolumeClaimRelease,
State: claim.State,
WriteRequest: structs.WriteRequest{
Namespace: vol.Namespace,
// Region: vol.Region, // TODO(tgross) should volumes have regions?
},
}
index, err := vw.updateClaims([]structs.CSIVolumeClaimRequest{req})
if err == nil && index != 0 {
vw.wLock.Lock()
defer vw.wLock.Unlock()
vw.v.ModifyIndex = index
}
if err != nil {
return fmt.Errorf("could not checkpoint claim release: %v", err)
}
return nil
}
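volumeReapImpl is effectively a checkpointed state machine: node unpublish first, then controller unpublish, then claim release, with claim.State persisted at each successful step so a later pass (or a new leader) resumes where the last one failed. The sketch below shows that resume-from-checkpoint shape without the Nomad types or the goto labels; all names are illustrative.

package main

import "fmt"

type reapState int

const (
	reapTaken reapState = iota
	reapNodeDetached
	reapControllerDetached
	reapReadyToFree
)

type pastClaim struct {
	allocID string
	state   reapState
}

// step performs only the remaining work for one claim, recording the state
// after each successful stage so a retry can skip completed stages.
func step(c *pastClaim, nodeDetach, controllerDetach, release func(string) error) error {
	if c.state < reapNodeDetached {
		if err := nodeDetach(c.allocID); err != nil {
			return err
		}
		c.state = reapNodeDetached // checkpoint
	}
	if c.state < reapControllerDetached {
		if err := controllerDetach(c.allocID); err != nil {
			return err
		}
		c.state = reapControllerDetached // checkpoint
	}
	if err := release(c.allocID); err != nil {
		return err
	}
	c.state = reapReadyToFree
	return nil
}

func main() {
	claim := &pastClaim{allocID: "alloc-1", state: reapTaken}
	fail := func(string) error { return fmt.Errorf("controller unavailable") }
	ok := func(string) error { return nil }

	// First pass fails at the controller stage but keeps the node-detach
	// checkpoint, so the retry skips straight to the controller step.
	if err := step(claim, ok, fail, ok); err != nil {
		fmt.Println("first pass:", err, "state:", claim.state)
	}
	if err := step(claim, ok, ok, ok); err == nil {
		fmt.Println("second pass: released, state:", claim.state)
	}
}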

View file

@ -0,0 +1,294 @@
package volumewatcher
import (
"context"
"fmt"
"testing"
"github.com/hashicorp/nomad/helper/testlog"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/stretchr/testify/require"
)
// TestVolumeWatch_OneReap tests one pass through the reaper
func TestVolumeWatch_OneReap(t *testing.T) {
t.Parallel()
require := require.New(t)
cases := []struct {
Name string
Volume *structs.CSIVolume
Node *structs.Node
ControllerRequired bool
ExpectedErr string
ExpectedClaimsCount int
ExpectedNodeDetachCount int
ExpectedControllerDetachCount int
ExpectedUpdateClaimsCount int
srv *MockRPCServer
}{
{
Name: "No terminal allocs",
Volume: mock.CSIVolume(mock.CSIPlugin()),
ControllerRequired: true,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSINodeDetachError: fmt.Errorf("should never see this"),
},
},
{
Name: "NodeDetachVolume fails",
ControllerRequired: true,
ExpectedErr: "some node plugin error",
ExpectedNodeDetachCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSINodeDetachError: fmt.Errorf("some node plugin error"),
},
},
{
Name: "NodeDetachVolume node-only happy path",
ControllerRequired: false,
ExpectedNodeDetachCount: 1,
ExpectedUpdateClaimsCount: 2,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
{
Name: "ControllerDetachVolume no controllers available",
Node: mock.Node(),
ControllerRequired: true,
ExpectedErr: "Unknown node",
ExpectedNodeDetachCount: 1,
ExpectedUpdateClaimsCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
{
Name: "ControllerDetachVolume controller error",
ControllerRequired: true,
ExpectedErr: "some controller error",
ExpectedNodeDetachCount: 1,
ExpectedControllerDetachCount: 1,
ExpectedUpdateClaimsCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSIControllerDetachError: fmt.Errorf("some controller error"),
},
},
{
Name: "ControllerDetachVolume happy path",
ControllerRequired: true,
ExpectedNodeDetachCount: 1,
ExpectedControllerDetachCount: 1,
ExpectedUpdateClaimsCount: 2,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
}
for _, tc := range cases {
t.Run(tc.Name, func(t *testing.T) {
plugin := mock.CSIPlugin()
plugin.ControllerRequired = tc.ControllerRequired
node := testNode(tc.Node, plugin, tc.srv.State())
alloc := mock.Alloc()
alloc.NodeID = node.ID
alloc.ClientStatus = structs.AllocClientStatusComplete
vol := testVolume(tc.Volume, plugin, alloc, node.ID)
ctx, exitFn := context.WithCancel(context.Background())
w := &volumeWatcher{
v: vol,
rpc: tc.srv,
state: tc.srv.State(),
updateClaims: tc.srv.UpdateClaims,
ctx: ctx,
exitFn: exitFn,
logger: testlog.HCLogger(t),
}
err := w.volumeReapImpl(vol)
if tc.ExpectedErr != "" {
require.Error(err, fmt.Sprintf("expected: %q", tc.ExpectedErr))
require.Contains(err.Error(), tc.ExpectedErr)
} else {
require.NoError(err)
}
require.Equal(tc.ExpectedNodeDetachCount,
tc.srv.countCSINodeDetachVolume, "node detach RPC count")
require.Equal(tc.ExpectedControllerDetachCount,
tc.srv.countCSIControllerDetachVolume, "controller detach RPC count")
require.Equal(tc.ExpectedUpdateClaimsCount,
tc.srv.countUpdateClaims, "update claims count")
})
}
}
// TestVolumeWatch_OldVolume_OneReap tests one pass through the reaper
// COMPAT(1.0): the claim fields were added after 0.11.1; this test
// can be removed for 1.0
func TestVolumeWatch_OldVolume_OneReap(t *testing.T) {
t.Parallel()
require := require.New(t)
cases := []struct {
Name string
Volume *structs.CSIVolume
Node *structs.Node
ControllerRequired bool
ExpectedErr string
ExpectedClaimsCount int
ExpectedNodeDetachCount int
ExpectedControllerDetachCount int
ExpectedUpdateClaimsCount int
srv *MockRPCServer
}{
{
Name: "No terminal allocs",
Volume: mock.CSIVolume(mock.CSIPlugin()),
ControllerRequired: true,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSINodeDetachError: fmt.Errorf("should never see this"),
},
},
{
Name: "NodeDetachVolume fails",
ControllerRequired: true,
ExpectedErr: "some node plugin error",
ExpectedNodeDetachCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSINodeDetachError: fmt.Errorf("some node plugin error"),
},
},
{
Name: "NodeDetachVolume node-only happy path",
ControllerRequired: false,
ExpectedNodeDetachCount: 1,
ExpectedUpdateClaimsCount: 2,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
{
Name: "ControllerDetachVolume no controllers available",
Node: mock.Node(),
ControllerRequired: true,
ExpectedErr: "Unknown node",
ExpectedNodeDetachCount: 1,
ExpectedUpdateClaimsCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
{
Name: "ControllerDetachVolume controller error",
ControllerRequired: true,
ExpectedErr: "some controller error",
ExpectedNodeDetachCount: 1,
ExpectedControllerDetachCount: 1,
ExpectedUpdateClaimsCount: 1,
srv: &MockRPCServer{
state: state.TestStateStore(t),
nextCSIControllerDetachError: fmt.Errorf("some controller error"),
},
},
{
Name: "ControllerDetachVolume happy path",
ControllerRequired: true,
ExpectedNodeDetachCount: 1,
ExpectedControllerDetachCount: 1,
ExpectedUpdateClaimsCount: 2,
srv: &MockRPCServer{
state: state.TestStateStore(t),
},
},
}
for _, tc := range cases {
t.Run(tc.Name, func(t *testing.T) {
plugin := mock.CSIPlugin()
plugin.ControllerRequired = tc.ControllerRequired
node := testNode(tc.Node, plugin, tc.srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusComplete
alloc.NodeID = node.ID
vol := testOldVolume(tc.Volume, plugin, alloc, node.ID)
ctx, exitFn := context.WithCancel(context.Background())
w := &volumeWatcher{
v: vol,
rpc: tc.srv,
state: tc.srv.State(),
updateClaims: tc.srv.UpdateClaims,
ctx: ctx,
exitFn: exitFn,
logger: testlog.HCLogger(t),
}
err := w.volumeReapImpl(vol)
if tc.ExpectedErr != "" {
require.Error(err, fmt.Sprintf("expected: %q", tc.ExpectedErr))
require.Contains(err.Error(), tc.ExpectedErr)
} else {
require.NoError(err)
}
require.Equal(tc.ExpectedNodeDetachCount,
tc.srv.countCSINodeDetachVolume, "node detach RPC count")
require.Equal(tc.ExpectedControllerDetachCount,
tc.srv.countCSIControllerDetachVolume, "controller detach RPC count")
require.Equal(tc.ExpectedUpdateClaimsCount,
tc.srv.countUpdateClaims, "update claims count")
})
}
}
// TestVolumeWatch_ReapStates tests multiple passes through the reaper,
// updating state after each one
func TestVolumeWatch_ReapStates(t *testing.T) {
t.Parallel()
require := require.New(t)
srv := &MockRPCServer{state: state.TestStateStore(t)}
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusComplete
vol := testVolume(nil, plugin, alloc, node.ID)
w := &volumeWatcher{
v: vol,
rpc: srv,
state: srv.State(),
updateClaims: srv.UpdateClaims,
logger: testlog.HCLogger(t),
}
srv.nextCSINodeDetachError = fmt.Errorf("some node plugin error")
err := w.volumeReapImpl(vol)
require.Error(err)
require.Equal(structs.CSIVolumeClaimStateTaken, vol.PastClaims[alloc.ID].State)
require.Equal(1, srv.countCSINodeDetachVolume)
require.Equal(0, srv.countCSIControllerDetachVolume)
require.Equal(0, srv.countUpdateClaims)
srv.nextCSINodeDetachError = nil
srv.nextCSIControllerDetachError = fmt.Errorf("some controller plugin error")
err = w.volumeReapImpl(vol)
require.Error(err)
require.Equal(structs.CSIVolumeClaimStateNodeDetached, vol.PastClaims[alloc.ID].State)
require.Equal(1, srv.countUpdateClaims)
srv.nextCSIControllerDetachError = nil
err = w.volumeReapImpl(vol)
require.NoError(err)
require.Equal(0, len(vol.PastClaims))
require.Equal(2, srv.countUpdateClaims)
}

View file

@ -0,0 +1,232 @@
package volumewatcher
import (
"context"
"sync"
"time"
log "github.com/hashicorp/go-hclog"
memdb "github.com/hashicorp/go-memdb"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"golang.org/x/time/rate"
)
const (
// LimitStateQueriesPerSecond is the number of state queries allowed per
// second
LimitStateQueriesPerSecond = 100.0
// CrossVolumeUpdateBatchDuration is the duration in which volume
// claim updates are batched across all volume watchers before
// being committed to Raft.
CrossVolumeUpdateBatchDuration = 250 * time.Millisecond
)
// Watcher is used to watch volumes and their allocations created
// by the scheduler, and to release their claims once the claiming
// allocations are terminal.
type Watcher struct {
enabled bool
logger log.Logger
// queryLimiter is used to limit the rate of blocking queries
queryLimiter *rate.Limiter
// updateBatchDuration is the duration in which volume
// claim updates are batched across all volume watchers
// before being committed to Raft.
updateBatchDuration time.Duration
// raft contains the set of Raft endpoints that can be used by the
// volumes watcher
raft VolumeRaftEndpoints
// rpc contains the set of Server methods that can be used by
// the volumes watcher for RPC
rpc ClientRPC
// state is the state that is watched for state changes.
state *state.StateStore
// watchers is the set of active watchers, one per volume
watchers map[string]*volumeWatcher
// volumeUpdateBatcher is used to batch volume claim updates
volumeUpdateBatcher *VolumeUpdateBatcher
// ctx and exitFn are used to cancel the watcher
ctx context.Context
exitFn context.CancelFunc
wlock sync.RWMutex
}
// NewVolumesWatcher returns a volumes watcher that is used to watch
// volumes and trigger the scheduler as needed.
func NewVolumesWatcher(logger log.Logger,
raft VolumeRaftEndpoints, rpc ClientRPC, stateQueriesPerSecond float64,
updateBatchDuration time.Duration) *Watcher {
// the leader step-down calls SetEnabled(false) which is what
// cancels this context, rather than passing in its own shutdown
// context
ctx, exitFn := context.WithCancel(context.Background())
return &Watcher{
raft: raft,
rpc: rpc,
queryLimiter: rate.NewLimiter(rate.Limit(stateQueriesPerSecond), 100),
updateBatchDuration: updateBatchDuration,
logger: logger.Named("volumes_watcher"),
ctx: ctx,
exitFn: exitFn,
}
}
// SetEnabled is used to control if the watcher is enabled. The
// watcher should only be enabled on the active leader. When being
// enabled the state is passed in as it is no longer valid once a
// leader election has taken place.
func (w *Watcher) SetEnabled(enabled bool, state *state.StateStore) {
w.wlock.Lock()
defer w.wlock.Unlock()
wasEnabled := w.enabled
w.enabled = enabled
if state != nil {
w.state = state
}
// Flush the state to create the necessary objects
w.flush()
// If we are starting now, launch the watch daemon
if enabled && !wasEnabled {
go w.watchVolumes(w.ctx)
}
}
// flush is used to clear the state of the watcher
func (w *Watcher) flush() {
// Stop all the watchers and clear it
for _, watcher := range w.watchers {
watcher.Stop()
}
// Kill everything associated with the watcher
if w.exitFn != nil {
w.exitFn()
}
w.watchers = make(map[string]*volumeWatcher, 32)
w.ctx, w.exitFn = context.WithCancel(context.Background())
w.volumeUpdateBatcher = NewVolumeUpdateBatcher(w.updateBatchDuration, w.raft, w.ctx)
}
// watchVolumes is the long lived go-routine that watches for volumes to
// add and remove watchers on.
func (w *Watcher) watchVolumes(ctx context.Context) {
vIndex := uint64(1)
for {
volumes, idx, err := w.getVolumes(ctx, vIndex)
if err != nil {
if err == context.Canceled {
return
}
w.logger.Error("failed to retrieve volumes", "error", err)
}
vIndex = idx // last-seen index
for _, v := range volumes {
if err := w.add(v); err != nil {
w.logger.Error("failed to track volume", "volume_id", v.ID, "error", err)
}
}
}
}
// getVolumes retrieves all volumes blocking at the given index.
func (w *Watcher) getVolumes(ctx context.Context, minIndex uint64) ([]*structs.CSIVolume, uint64, error) {
resp, index, err := w.state.BlockingQuery(w.getVolumesImpl, minIndex, ctx)
if err != nil {
return nil, 0, err
}
return resp.([]*structs.CSIVolume), index, nil
}
// getVolumesImpl retrieves all volumes from the passed state store.
func (w *Watcher) getVolumesImpl(ws memdb.WatchSet, state *state.StateStore) (interface{}, uint64, error) {
iter, err := state.CSIVolumes(ws)
if err != nil {
return nil, 0, err
}
var volumes []*structs.CSIVolume
for {
raw := iter.Next()
if raw == nil {
break
}
volume := raw.(*structs.CSIVolume)
volumes = append(volumes, volume)
}
// Use the last index that affected the volume table
index, err := state.Index("csi_volumes")
if err != nil {
return nil, 0, err
}
return volumes, index, nil
}
// add adds a volume to the watch list
func (w *Watcher) add(d *structs.CSIVolume) error {
w.wlock.Lock()
defer w.wlock.Unlock()
_, err := w.addLocked(d)
return err
}
// addLocked adds a volume to the watch list and should only be called when
// locked. Creating the volumeWatcher starts a go routine to .watch() it
func (w *Watcher) addLocked(v *structs.CSIVolume) (*volumeWatcher, error) {
// Not enabled so no-op
if !w.enabled {
return nil, nil
}
// Already watched so trigger an update for the volume
if watcher, ok := w.watchers[v.ID+v.Namespace]; ok {
watcher.Notify(v)
return nil, nil
}
watcher := newVolumeWatcher(w, v)
w.watchers[v.ID+v.Namespace] = watcher
return watcher, nil
}
// TODO: this is currently dead code; we'll call a public remove
// method on the Watcher once we have a periodic GC job
// remove stops watching a volume and should only be called when locked.
func (w *Watcher) removeLocked(volID, namespace string) {
if !w.enabled {
return
}
if watcher, ok := w.watchers[volID+namespace]; ok {
watcher.Stop()
delete(w.watchers, volID+namespace)
}
}
// updateClaims sends the claims to the batch updater and waits for
// the results
func (w *Watcher) updateClaims(claims []structs.CSIVolumeClaimRequest) (uint64, error) {
return w.volumeUpdateBatcher.CreateUpdate(claims).Results()
}
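SetEnabled and flush encode the leader step-up/step-down contract: disabling cancels the previous term's context so stale goroutines cannot write through it, and re-enabling starts a fresh context, watcher map, and batcher. Below is a stripped-down sketch of just that enable/disable shape, with illustrative types rather than the real Watcher.

package main

import (
	"context"
	"fmt"
	"time"
)

// watcher is an illustrative stand-in: step-down cancels everything from the
// previous leadership term, step-up starts a fresh context and watch loop.
type watcher struct {
	enabled bool
	ctx     context.Context
	exitFn  context.CancelFunc
}

func (w *watcher) flush() {
	if w.exitFn != nil {
		w.exitFn() // stop goroutines from the previous term
	}
	w.ctx, w.exitFn = context.WithCancel(context.Background())
}

func (w *watcher) SetEnabled(enabled bool) {
	wasEnabled := w.enabled
	w.enabled = enabled
	w.flush()
	if enabled && !wasEnabled {
		go w.run(w.ctx)
	}
}

func (w *watcher) run(ctx context.Context) {
	<-ctx.Done()
	fmt.Println("watch loop exited for this leadership term")
}

func main() {
	w := &watcher{}
	w.SetEnabled(true)                 // leader step-up starts the loop
	w.SetEnabled(false)                // step-down cancels it
	time.Sleep(100 * time.Millisecond) // give the loop time to observe cancellation
}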

View file

@ -0,0 +1,311 @@
package volumewatcher
import (
"context"
"testing"
"time"
memdb "github.com/hashicorp/go-memdb"
"github.com/hashicorp/nomad/helper/testlog"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/state"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/stretchr/testify/require"
)
// TestVolumeWatch_EnableDisable tests the watcher registration logic that needs
// to happen during leader step-up/step-down
func TestVolumeWatch_EnableDisable(t *testing.T) {
t.Parallel()
require := require.New(t)
srv := &MockRPCServer{}
srv.state = state.TestStateStore(t)
index := uint64(100)
watcher := NewVolumesWatcher(testlog.HCLogger(t),
srv, srv,
LimitStateQueriesPerSecond,
CrossVolumeUpdateBatchDuration)
watcher.SetEnabled(true, srv.State())
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusComplete
vol := testVolume(nil, plugin, alloc, node.ID)
index++
err := srv.State().CSIVolumeRegister(index, []*structs.CSIVolume{vol})
require.NoError(err)
claim := &structs.CSIVolumeClaim{Mode: structs.CSIVolumeClaimRelease}
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
require.Eventually(func() bool {
return 1 == len(watcher.watchers)
}, time.Second, 10*time.Millisecond)
watcher.SetEnabled(false, srv.State())
require.Equal(0, len(watcher.watchers))
}
// TestVolumeWatch_Checkpoint tests the checkpointing of progress across
// leader step-up/step-down
func TestVolumeWatch_Checkpoint(t *testing.T) {
t.Parallel()
require := require.New(t)
srv := &MockRPCServer{}
srv.state = state.TestStateStore(t)
index := uint64(100)
watcher := NewVolumesWatcher(testlog.HCLogger(t),
srv, srv,
LimitStateQueriesPerSecond,
CrossVolumeUpdateBatchDuration)
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusComplete
vol := testVolume(nil, plugin, alloc, node.ID)
watcher.SetEnabled(true, srv.State())
index++
err := srv.State().CSIVolumeRegister(index, []*structs.CSIVolume{vol})
require.NoError(err)
// we should get or start up a watcher when we get an update for
// the volume from the state store
require.Eventually(func() bool {
return 1 == len(watcher.watchers)
}, time.Second, 10*time.Millisecond)
// step-down (this is sync, but step-up is async)
watcher.SetEnabled(false, srv.State())
require.Equal(0, len(watcher.watchers))
// step-up again
watcher.SetEnabled(true, srv.State())
require.Eventually(func() bool {
return 1 == len(watcher.watchers)
}, time.Second, 10*time.Millisecond)
require.True(watcher.watchers[vol.ID+vol.Namespace].isRunning())
}
// TestVolumeWatch_StartStop tests the start and stop of the watcher when
// it receives notifications and has completed its work
func TestVolumeWatch_StartStop(t *testing.T) {
t.Parallel()
require := require.New(t)
ctx, exitFn := context.WithCancel(context.Background())
defer exitFn()
srv := &MockStatefulRPCServer{}
srv.state = state.TestStateStore(t)
index := uint64(100)
srv.volumeUpdateBatcher = NewVolumeUpdateBatcher(
CrossVolumeUpdateBatchDuration, srv, ctx)
watcher := NewVolumesWatcher(testlog.HCLogger(t),
srv, srv,
LimitStateQueriesPerSecond,
CrossVolumeUpdateBatchDuration)
watcher.SetEnabled(true, srv.State())
require.Equal(0, len(watcher.watchers))
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusRunning
alloc2 := mock.Alloc()
alloc2.Job = alloc.Job
alloc2.ClientStatus = structs.AllocClientStatusRunning
index++
err := srv.State().UpsertJob(index, alloc.Job)
require.NoError(err)
index++
err = srv.State().UpsertAllocs(index, []*structs.Allocation{alloc, alloc2})
require.NoError(err)
// register a volume
vol := testVolume(nil, plugin, alloc, node.ID)
index++
err = srv.State().CSIVolumeRegister(index, []*structs.CSIVolume{vol})
require.NoError(err)
// assert we get a running watcher
require.Eventually(func() bool {
return 1 == len(watcher.watchers)
}, time.Second, 10*time.Millisecond)
require.True(watcher.watchers[vol.ID+vol.Namespace].isRunning())
// claim the volume for both allocs
claim := &structs.CSIVolumeClaim{
AllocationID: alloc.ID,
NodeID: node.ID,
Mode: structs.CSIVolumeClaimRead,
}
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
claim.AllocationID = alloc2.ID
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
// reap the volume and assert nothing has happened
claim = &structs.CSIVolumeClaim{
AllocationID: alloc.ID,
NodeID: node.ID,
Mode: structs.CSIVolumeClaimRelease,
}
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
require.True(watcher.watchers[vol.ID+vol.Namespace].isRunning())
// alloc becomes terminal
alloc.ClientStatus = structs.AllocClientStatusComplete
index++
err = srv.State().UpsertAllocs(index, []*structs.Allocation{alloc})
require.NoError(err)
index++
claim.State = structs.CSIVolumeClaimStateReadyToFree
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
// 1 claim has been released but watcher is still running
require.Eventually(func() bool {
ws := memdb.NewWatchSet()
vol, _ := srv.State().CSIVolumeByID(ws, vol.Namespace, vol.ID)
return len(vol.ReadAllocs) == 1 && len(vol.PastClaims) == 0
}, time.Second*2, 10*time.Millisecond)
require.True(watcher.watchers[vol.ID+vol.Namespace].isRunning())
// the watcher will have incremented the index so we need to make sure
// our inserts will trigger new events
index, _ = srv.State().LatestIndex()
// remaining alloc's job is stopped (alloc is not marked terminal)
alloc2.Job.Stop = true
index++
err = srv.State().UpsertJob(index, alloc2.Job)
require.NoError(err)
// job deregistration writes a claim with no allocations or nodes
claim = &structs.CSIVolumeClaim{
Mode: structs.CSIVolumeClaimRelease,
}
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim)
require.NoError(err)
// all claims have been released and watcher is stopped
require.Eventually(func() bool {
ws := memdb.NewWatchSet()
vol, _ := srv.State().CSIVolumeByID(ws, vol.Namespace, vol.ID)
return len(vol.ReadAllocs) == 1 && len(vol.PastClaims) == 0
}, time.Second*2, 10*time.Millisecond)
require.Eventually(func() bool {
return !watcher.watchers[vol.ID+vol.Namespace].isRunning()
}, time.Second*1, 10*time.Millisecond)
// the watcher will have incremented the index so we need to make sure
// our inserts will trigger new events
index, _ = srv.State().LatestIndex()
// create a new claim
alloc3 := mock.Alloc()
alloc3.ClientStatus = structs.AllocClientStatusRunning
index++
err = srv.State().UpsertAllocs(index, []*structs.Allocation{alloc3})
require.NoError(err)
claim3 := &structs.CSIVolumeClaim{
AllocationID: alloc3.ID,
NodeID: node.ID,
Mode: structs.CSIVolumeClaimRelease,
}
index++
err = srv.State().CSIVolumeClaim(index, vol.Namespace, vol.ID, claim3)
require.NoError(err)
// a stopped watcher should restore itself on notification
require.Eventually(func() bool {
return watcher.watchers[vol.ID+vol.Namespace].isRunning()
}, time.Second*1, 10*time.Millisecond)
}
// TestVolumeWatch_RegisterDeregister tests the start and stop of
// watchers around registration
func TestVolumeWatch_RegisterDeregister(t *testing.T) {
t.Parallel()
require := require.New(t)
ctx, exitFn := context.WithCancel(context.Background())
defer exitFn()
srv := &MockStatefulRPCServer{}
srv.state = state.TestStateStore(t)
srv.volumeUpdateBatcher = NewVolumeUpdateBatcher(
CrossVolumeUpdateBatchDuration, srv, ctx)
index := uint64(100)
watcher := NewVolumesWatcher(testlog.HCLogger(t),
srv, srv,
LimitStateQueriesPerSecond,
CrossVolumeUpdateBatchDuration)
watcher.SetEnabled(true, srv.State())
require.Equal(0, len(watcher.watchers))
plugin := mock.CSIPlugin()
node := testNode(nil, plugin, srv.State())
alloc := mock.Alloc()
alloc.ClientStatus = structs.AllocClientStatusComplete
// register a volume
vol := testVolume(nil, plugin, alloc, node.ID)
index++
err := srv.State().CSIVolumeRegister(index, []*structs.CSIVolume{vol})
require.NoError(err)
require.Eventually(func() bool {
return 1 == len(watcher.watchers)
}, time.Second, 10*time.Millisecond)
// reap the volume and assert we've cleaned up
w := watcher.watchers[vol.ID+vol.Namespace]
w.Notify(vol)
require.Eventually(func() bool {
ws := memdb.NewWatchSet()
vol, _ := srv.State().CSIVolumeByID(ws, vol.Namespace, vol.ID)
return len(vol.ReadAllocs) == 0 && len(vol.PastClaims) == 0
}, time.Second*2, 10*time.Millisecond)
require.Eventually(func() bool {
return !watcher.watchers[vol.ID+vol.Namespace].isRunning()
}, time.Second*1, 10*time.Millisecond)
require.Equal(1, srv.countCSINodeDetachVolume, "node detach RPC count")
require.Equal(1, srv.countCSIControllerDetachVolume, "controller detach RPC count")
require.Equal(2, srv.countUpsertVolumeClaims, "upsert claims count")
// deregistering the volume doesn't cause an update that triggers
// a watcher; we'll clean up this watcher in a GC later
err = srv.State().CSIVolumeDeregister(index, vol.Namespace, []string{vol.ID})
require.NoError(err)
require.Equal(1, len(watcher.watchers))
require.False(watcher.watchers[vol.ID+vol.Namespace].isRunning())
}

View file

@ -0,0 +1,31 @@
package nomad
import (
"github.com/hashicorp/nomad/nomad/structs"
)
// volumeWatcherRaftShim is the shim that provides the Raft apply
// methods. It should be created by the server and passed to the
// volume watcher.
type volumeWatcherRaftShim struct {
// apply is used to apply a message to Raft
apply raftApplyFn
}
// convertApplyErrors parses the results of a raftApply and returns the index at
// which it was applied and any error that occurred. Raft Apply returns two
// separate errors: Raft library errors and user-returned errors from the FSM.
// This helper joins them by inspecting the applyResponse for an error.
func (shim *volumeWatcherRaftShim) convertApplyErrors(applyResp interface{}, index uint64, err error) (uint64, error) {
if applyResp != nil {
if fsmErr, ok := applyResp.(error); ok && fsmErr != nil {
return index, fsmErr
}
}
return index, err
}
func (shim *volumeWatcherRaftShim) UpsertVolumeClaims(req *structs.CSIVolumeClaimBatchRequest) (uint64, error) {
fsmErrIntf, index, raftErr := shim.apply(structs.CSIVolumeClaimBatchRequestType, req)
return shim.convertApplyErrors(fsmErrIntf, index, raftErr)
}
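// Illustrative sketch, not part of this change: how the two error channels
// collapse. If the FSM rejected the request, the apply response itself is the
// error and takes priority; otherwise any Raft library error is returned
// unchanged. The function and parameters here are placeholders.
func exampleConvertApplyErrors(shim *volumeWatcherRaftShim, fsmErr, raftErr error) {
    // FSM rejection: applyResp carries the error, the Raft error is nil.
    if _, err := shim.convertApplyErrors(fsmErr, 10, nil); err != nil {
        _ = err // err == fsmErr
    }
    // Raft library failure: no FSM response, so the Raft error is returned.
    if _, err := shim.convertApplyErrors(nil, 0, raftErr); err != nil {
        _ = err // err == raftErr
    }
}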

View file

@ -82,6 +82,7 @@ type client struct {
identityClient csipbv1.IdentityClient identityClient csipbv1.IdentityClient
controllerClient CSIControllerClient controllerClient CSIControllerClient
nodeClient CSINodeClient nodeClient CSINodeClient
logger hclog.Logger
} }
func (c *client) Close() error { func (c *client) Close() error {
@ -106,6 +107,7 @@ func NewClient(addr string, logger hclog.Logger) (CSIPlugin, error) {
identityClient: csipbv1.NewIdentityClient(conn), identityClient: csipbv1.NewIdentityClient(conn),
controllerClient: csipbv1.NewControllerClient(conn), controllerClient: csipbv1.NewControllerClient(conn),
nodeClient: csipbv1.NewNodeClient(conn), nodeClient: csipbv1.NewNodeClient(conn),
logger: logger,
}, nil }, nil
} }
@ -318,17 +320,50 @@ func (c *client) ControllerValidateCapabilities(ctx context.Context, volumeID st
return err
}
- if resp.Confirmed == nil {
- if resp.Message != "" {
- return fmt.Errorf("Volume validation failed, message: %s", resp.Message)
- }
- return fmt.Errorf("Volume validation failed")
+ if resp.Message != "" {
+ // this should only ever be set if Confirmed isn't set, but
+ // it's not a validation failure.
+ c.logger.Debug(resp.Message)
+ }
+ // The protobuf accessors below safely handle nil pointers.
+ // The CSI spec says we can only assert the plugin has
+ // confirmed the volume capabilities, not that it hasn't
+ // confirmed them, so if the field is nil we have to assume
+ // the volume is ok.
+ confirmedCaps := resp.GetConfirmed().GetVolumeCapabilities()
+ if confirmedCaps != nil {
+ for _, requestedCap := range req.VolumeCapabilities {
+ if !compareCapabilities(requestedCap, confirmedCaps) {
+ return fmt.Errorf("volume capability validation failed: missing %v", req)
+ }
+ }
}
return nil
}
// compareCapabilities returns true if the 'got' capabilities contains
// the 'expected' capability
func compareCapabilities(expected *csipbv1.VolumeCapability, got []*csipbv1.VolumeCapability) bool {
for _, cap := range got {
if expected.GetAccessMode().GetMode() != cap.GetAccessMode().GetMode() {
continue
}
// AccessType Block is an empty struct even if set, so the
// only way to test for it is to check that the AccessType
// isn't Mount.
if expected.GetMount() == nil && cap.GetMount() != nil {
continue
}
if expected.GetMount() != cap.GetMount() {
continue
}
return true
}
return false
}
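// Illustrative sketch, not part of this change: a requested block-access
// capability matches a confirmed entry only when the access modes agree and
// the confirmed entry is not a mount capability. The types mirror those used
// in the tests for this change.
func exampleCompareCapabilities() bool {
    requested := &csipbv1.VolumeCapability{
        AccessType: &csipbv1.VolumeCapability_Block{
            Block: &csipbv1.VolumeCapability_BlockVolume{},
        },
        AccessMode: &csipbv1.VolumeCapability_AccessMode{
            Mode: csipbv1.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER,
        },
    }
    confirmed := []*csipbv1.VolumeCapability{requested}
    return compareCapabilities(requested, confirmed) // true
}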
// //
// Node Endpoints // Node Endpoints
// //

View file

@ -8,6 +8,7 @@ import (
csipbv1 "github.com/container-storage-interface/spec/lib/go/csi" csipbv1 "github.com/container-storage-interface/spec/lib/go/csi"
"github.com/golang/protobuf/ptypes/wrappers" "github.com/golang/protobuf/ptypes/wrappers"
"github.com/hashicorp/nomad/nomad/structs"
fake "github.com/hashicorp/nomad/plugins/csi/testing" fake "github.com/hashicorp/nomad/plugins/csi/testing"
"github.com/stretchr/testify/require" "github.com/stretchr/testify/require"
) )
@ -473,6 +474,95 @@ func TestClient_RPC_ControllerUnpublishVolume(t *testing.T) {
} }
} }
func TestClient_RPC_ControllerValidateVolume(t *testing.T) {
cases := []struct {
Name string
ResponseErr error
Response *csipbv1.ValidateVolumeCapabilitiesResponse
ExpectedErr error
}{
{
Name: "handles underlying grpc errors",
ResponseErr: fmt.Errorf("some grpc error"),
ExpectedErr: fmt.Errorf("some grpc error"),
},
{
Name: "handles empty success",
Response: &csipbv1.ValidateVolumeCapabilitiesResponse{},
ResponseErr: nil,
ExpectedErr: nil,
},
{
Name: "handles validate success",
Response: &csipbv1.ValidateVolumeCapabilitiesResponse{
Confirmed: &csipbv1.ValidateVolumeCapabilitiesResponse_Confirmed{
VolumeContext: map[string]string{},
VolumeCapabilities: []*csipbv1.VolumeCapability{
{
AccessType: &csipbv1.VolumeCapability_Block{
Block: &csipbv1.VolumeCapability_BlockVolume{},
},
AccessMode: &csipbv1.VolumeCapability_AccessMode{
Mode: csipbv1.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER,
},
},
},
},
},
ResponseErr: nil,
ExpectedErr: nil,
},
{
Name: "handles validation failure",
Response: &csipbv1.ValidateVolumeCapabilitiesResponse{
Confirmed: &csipbv1.ValidateVolumeCapabilitiesResponse_Confirmed{
VolumeContext: map[string]string{},
VolumeCapabilities: []*csipbv1.VolumeCapability{
{
AccessType: &csipbv1.VolumeCapability_Block{
Block: &csipbv1.VolumeCapability_BlockVolume{},
},
AccessMode: &csipbv1.VolumeCapability_AccessMode{
Mode: csipbv1.VolumeCapability_AccessMode_SINGLE_NODE_WRITER,
},
},
},
},
},
ResponseErr: nil,
ExpectedErr: fmt.Errorf("volume capability validation failed"),
},
}
for _, c := range cases {
t.Run(c.Name, func(t *testing.T) {
_, cc, _, client := newTestClient()
defer client.Close()
requestedCaps := &VolumeCapability{
AccessType: VolumeAccessTypeBlock,
AccessMode: VolumeAccessModeMultiNodeMultiWriter,
MountVolume: &structs.CSIMountOptions{ // should be ignored
FSType: "ext4",
MountFlags: []string{"noatime", "errors=remount-ro"},
},
}
cc.NextValidateVolumeCapabilitiesResponse = c.Response
cc.NextErr = c.ResponseErr
err := client.ControllerValidateCapabilities(
context.TODO(), "volumeID", requestedCaps)
if c.ExpectedErr != nil {
require.Error(t, c.ExpectedErr, err, c.Name)
} else {
require.NoError(t, err, c.Name)
}
})
}
}
func TestClient_RPC_NodeStageVolume(t *testing.T) { func TestClient_RPC_NodeStageVolume(t *testing.T) {
cases := []struct { cases := []struct {
Name string Name string

View file

@ -44,10 +44,11 @@ func (f *IdentityClient) Probe(ctx context.Context, in *csipbv1.ProbeRequest, op
// ControllerClient is a CSI controller client used for testing // ControllerClient is a CSI controller client used for testing
type ControllerClient struct { type ControllerClient struct {
NextErr error NextErr error
NextCapabilitiesResponse *csipbv1.ControllerGetCapabilitiesResponse NextCapabilitiesResponse *csipbv1.ControllerGetCapabilitiesResponse
NextPublishVolumeResponse *csipbv1.ControllerPublishVolumeResponse NextPublishVolumeResponse *csipbv1.ControllerPublishVolumeResponse
NextUnpublishVolumeResponse *csipbv1.ControllerUnpublishVolumeResponse NextUnpublishVolumeResponse *csipbv1.ControllerUnpublishVolumeResponse
NextValidateVolumeCapabilitiesResponse *csipbv1.ValidateVolumeCapabilitiesResponse
} }
// NewControllerClient returns a new ControllerClient // NewControllerClient returns a new ControllerClient
@ -60,6 +61,7 @@ func (f *ControllerClient) Reset() {
f.NextCapabilitiesResponse = nil f.NextCapabilitiesResponse = nil
f.NextPublishVolumeResponse = nil f.NextPublishVolumeResponse = nil
f.NextUnpublishVolumeResponse = nil f.NextUnpublishVolumeResponse = nil
f.NextValidateVolumeCapabilitiesResponse = nil
} }
func (c *ControllerClient) ControllerGetCapabilities(ctx context.Context, in *csipbv1.ControllerGetCapabilitiesRequest, opts ...grpc.CallOption) (*csipbv1.ControllerGetCapabilitiesResponse, error) { func (c *ControllerClient) ControllerGetCapabilities(ctx context.Context, in *csipbv1.ControllerGetCapabilitiesRequest, opts ...grpc.CallOption) (*csipbv1.ControllerGetCapabilitiesResponse, error) {
@ -75,7 +77,7 @@ func (c *ControllerClient) ControllerUnpublishVolume(ctx context.Context, in *cs
}
func (c *ControllerClient) ValidateVolumeCapabilities(ctx context.Context, in *csipbv1.ValidateVolumeCapabilitiesRequest, opts ...grpc.CallOption) (*csipbv1.ValidateVolumeCapabilitiesResponse, error) {
- panic("not implemented") // TODO: Implement
+ return c.NextValidateVolumeCapabilitiesResponse, c.NextErr
}
// NodeClient is a CSI Node client used for testing // NodeClient is a CSI Node client used for testing

View file

@ -426,7 +426,7 @@ var xxx_messageInfo_FingerprintRequest proto.InternalMessageInfo
type FingerprintResponse struct { type FingerprintResponse struct {
// Attributes are key/value pairs that annotate the nomad client and can be // Attributes are key/value pairs that annotate the nomad client and can be
// used in scheduling contraints and affinities. // used in scheduling constraints and affinities.
Attributes map[string]*proto1.Attribute `protobuf:"bytes,1,rep,name=attributes,proto3" json:"attributes,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"` Attributes map[string]*proto1.Attribute `protobuf:"bytes,1,rep,name=attributes,proto3" json:"attributes,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
// Health is used to determine the state of the health the driver is in. // Health is used to determine the state of the health the driver is in.
// Health can be one of the following states: // Health can be one of the following states:

View file

@ -109,7 +109,7 @@ message FingerprintResponse {
// Attributes are key/value pairs that annotate the nomad client and can be // Attributes are key/value pairs that annotate the nomad client and can be
// used in scheduling contraints and affinities. // used in scheduling constraints and affinities.
map<string, hashicorp.nomad.plugins.shared.structs.Attribute> attributes = 1; map<string, hashicorp.nomad.plugins.shared.structs.Attribute> attributes = 1;
enum HealthState { enum HealthState {

View file

@ -78,7 +78,7 @@ func (h *DriverHarness) Kill() {
// MkAllocDir creates a temporary directory and allocdir structure. // MkAllocDir creates a temporary directory and allocdir structure.
// If enableLogs is set to true a logmon instance will be started to write logs // If enableLogs is set to true a logmon instance will be started to write logs
// to the LogDir of the task // to the LogDir of the task
// A cleanup func is returned and should be defered so as to not leak dirs // A cleanup func is returned and should be deferred so as to not leak dirs
// between tests. // between tests.
func (h *DriverHarness) MkAllocDir(t *drivers.TaskConfig, enableLogs bool) func() { func (h *DriverHarness) MkAllocDir(t *drivers.TaskConfig, enableLogs bool) func() {
dir, err := ioutil.TempDir("", "nomad_driver_harness-") dir, err := ioutil.TempDir("", "nomad_driver_harness-")

View file

@ -2072,6 +2072,15 @@ func TestServiceSched_JobModify_InPlace(t *testing.T) {
require.NoError(t, h.State.UpsertJob(h.NextIndex(), job)) require.NoError(t, h.State.UpsertJob(h.NextIndex(), job))
require.NoError(t, h.State.UpsertDeployment(h.NextIndex(), d)) require.NoError(t, h.State.UpsertDeployment(h.NextIndex(), d))
taskName := job.TaskGroups[0].Tasks[0].Name
adr := structs.AllocatedDeviceResource{
Type: "gpu",
Vendor: "nvidia",
Name: "1080ti",
DeviceIDs: []string{uuid.Generate()},
}
// Create allocs that are part of the old deployment // Create allocs that are part of the old deployment
var allocs []*structs.Allocation var allocs []*structs.Allocation
for i := 0; i < 10; i++ { for i := 0; i < 10; i++ {
@ -2082,6 +2091,7 @@ func TestServiceSched_JobModify_InPlace(t *testing.T) {
alloc.Name = fmt.Sprintf("my-job.web[%d]", i) alloc.Name = fmt.Sprintf("my-job.web[%d]", i)
alloc.DeploymentID = d.ID alloc.DeploymentID = d.ID
alloc.DeploymentStatus = &structs.AllocDeploymentStatus{Healthy: helper.BoolToPtr(true)} alloc.DeploymentStatus = &structs.AllocDeploymentStatus{Healthy: helper.BoolToPtr(true)}
alloc.AllocatedResources.Tasks[taskName].Devices = []*structs.AllocatedDeviceResource{&adr}
allocs = append(allocs, alloc) allocs = append(allocs, alloc)
} }
require.NoError(t, h.State.UpsertAllocs(h.NextIndex(), allocs)) require.NoError(t, h.State.UpsertAllocs(h.NextIndex(), allocs))
@ -2155,13 +2165,16 @@ func TestServiceSched_JobModify_InPlace(t *testing.T) {
}
h.AssertEvalStatus(t, structs.EvalStatusComplete)
- // Verify the network did not change
+ // Verify the allocated networks and devices did not change
rp := structs.Port{Label: "admin", Value: 5000}
for _, alloc := range out {
- for _, resources := range alloc.TaskResources {
+ for _, resources := range alloc.AllocatedResources.Tasks {
if resources.Networks[0].ReservedPorts[0] != rp {
t.Fatalf("bad: %#v", alloc)
}
if len(resources.Devices) == 0 || !reflect.DeepEqual(resources.Devices[0], &adr) {
t.Fatalf("bad: devices changed: %#v", alloc)
}
}
}

View file

@ -614,22 +614,25 @@ func inplaceUpdate(ctx Context, eval *structs.Evaluation, job *structs.Job,
continue continue
} }
// Restore the network offers from the existing allocation. // Restore the network and device offers from the existing allocation.
// We do not allow network resources (reserved/dynamic ports) // We do not allow network resources (reserved/dynamic ports)
// to be updated. This is guarded in taskUpdated, so we can // to be updated. This is guarded in taskUpdated, so we can
// safely restore those here. // safely restore those here.
for task, resources := range option.TaskResources { for task, resources := range option.TaskResources {
var networks structs.Networks var networks structs.Networks
var devices []*structs.AllocatedDeviceResource
if update.Alloc.AllocatedResources != nil { if update.Alloc.AllocatedResources != nil {
if tr, ok := update.Alloc.AllocatedResources.Tasks[task]; ok { if tr, ok := update.Alloc.AllocatedResources.Tasks[task]; ok {
networks = tr.Networks networks = tr.Networks
devices = tr.Devices
} }
} else if tr, ok := update.Alloc.TaskResources[task]; ok { } else if tr, ok := update.Alloc.TaskResources[task]; ok {
networks = tr.Networks networks = tr.Networks
} }
// Add thhe networks back // Add the networks and devices back
resources.Networks = networks resources.Networks = networks
resources.Devices = devices
} }
// Create a shallow copy // Create a shallow copy
@ -892,15 +895,17 @@ func genericAllocUpdateFn(ctx Context, stack Stack, evalID string) allocUpdateTy
return false, true, nil return false, true, nil
} }
// Restore the network offers from the existing allocation. // Restore the network and device offers from the existing allocation.
// We do not allow network resources (reserved/dynamic ports) // We do not allow network resources (reserved/dynamic ports)
// to be updated. This is guarded in taskUpdated, so we can // to be updated. This is guarded in taskUpdated, so we can
// safely restore those here. // safely restore those here.
for task, resources := range option.TaskResources { for task, resources := range option.TaskResources {
var networks structs.Networks var networks structs.Networks
var devices []*structs.AllocatedDeviceResource
if existing.AllocatedResources != nil { if existing.AllocatedResources != nil {
if tr, ok := existing.AllocatedResources.Tasks[task]; ok { if tr, ok := existing.AllocatedResources.Tasks[task]; ok {
networks = tr.Networks networks = tr.Networks
devices = tr.Devices
} }
} else if tr, ok := existing.TaskResources[task]; ok { } else if tr, ok := existing.TaskResources[task]; ok {
networks = tr.Networks networks = tr.Networks
@ -908,6 +913,7 @@ func genericAllocUpdateFn(ctx Context, stack Stack, evalID string) allocUpdateTy
// Add the networks back // Add the networks back
resources.Networks = networks resources.Networks = networks
resources.Devices = devices
} }
// Create a shallow copy // Create a shallow copy
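
// Illustrative sketch, not part of this change: the restore step above,
// factored into a hypothetical helper. Network and device offers from the
// existing allocation are carried over verbatim because a change to either
// forces a destructive update in taskUpdated, so they cannot legitimately
// differ during an in-place update.
func restoreNetworksAndDevices(existing *structs.Allocation, task string, resources *structs.AllocatedTaskResources) {
    var networks structs.Networks
    var devices []*structs.AllocatedDeviceResource
    if existing.AllocatedResources != nil {
        if tr, ok := existing.AllocatedResources.Tasks[task]; ok {
            networks = tr.Networks
            devices = tr.Devices
        }
    } else if tr, ok := existing.TaskResources[task]; ok {
        networks = tr.Networks
    }
    resources.Networks = networks
    resources.Devices = devices
}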

View file

@ -87,7 +87,7 @@ compile
EOF EOF
echo '=======>>>> Retreiving mac compiled binaries' echo '=======>>>> Retrieving mac compiled binaries'
rsync -avz --ignore-existing ${remote_macos_host}:"${REPO_REMOTE_PATH}/pkg/" "${REPO}/pkg" rsync -avz --ignore-existing ${remote_macos_host}:"${REPO_REMOTE_PATH}/pkg/" "${REPO}/pkg"
ssh ${remote_macos_host} rm -rf "${TMP_WORKSPACE}" ssh ${remote_macos_host} rm -rf "${TMP_WORKSPACE}"

View file

@ -5,5 +5,6 @@
Setting `disableAnalytics` to true will prevent any data from being sent. Setting `disableAnalytics` to true will prevent any data from being sent.
*/ */
"disableAnalytics": false "disableAnalytics": false,
"proxy": "http://127.0.0.1:4646"
} }

View file

@ -15,22 +15,11 @@ export default Component.extend({
}, },
generateUrl() { generateUrl() {
let urlSegments = { return generateExecUrl(this.router, {
job: this.job.get('name'), job: this.job,
}; taskGroup: this.taskGroup,
task: this.task,
if (this.taskGroup) { allocation: this.task
urlSegments.taskGroup = this.taskGroup.get('name'); });
}
if (this.task) {
urlSegments.task = this.task.get('name');
}
if (this.allocation) {
urlSegments.allocation = this.allocation.get('shortId');
}
return generateExecUrl(this.router, urlSegments);
}, },
}); });

View file

@ -70,9 +70,9 @@ export default Component.extend({
openInNewWindow(job, taskGroup, task) { openInNewWindow(job, taskGroup, task) {
let url = generateExecUrl(this.router, {
- job: job.name,
- taskGroup: taskGroup.name,
- task: task.name,
+ job,
+ taskGroup,
+ task,
});
openExecUrl(url);

View file

@ -0,0 +1,18 @@
import Component from '@ember/component';
import { computed } from '@ember/object';
export default Component.extend({
tagName: '',
activeClass: computed('taskState.state', function() {
if (this.taskState && this.taskState.state === 'running') {
return 'is-active';
}
}),
finishedClass: computed('taskState.finishedAt', function() {
if (this.taskState && this.taskState.finishedAt) {
return 'is-finished';
}
}),
});

View file

@ -0,0 +1,61 @@
import Component from '@ember/component';
import { computed } from '@ember/object';
import { sort } from '@ember/object/computed';
export default Component.extend({
tagName: '',
tasks: null,
taskStates: null,
lifecyclePhases: computed('tasks.@each.lifecycle', 'taskStates.@each.state', function() {
const tasksOrStates = this.taskStates || this.tasks;
const lifecycles = {
prestarts: [],
sidecars: [],
mains: [],
};
tasksOrStates.forEach(taskOrState => {
const task = taskOrState.task || taskOrState;
lifecycles[`${task.lifecycleName}s`].push(taskOrState);
});
const phases = [];
if (lifecycles.prestarts.length || lifecycles.sidecars.length) {
phases.push({
name: 'Prestart',
isActive: lifecycles.prestarts.some(state => state.state === 'running'),
});
}
if (lifecycles.sidecars.length || lifecycles.mains.length) {
phases.push({
name: 'Main',
isActive: lifecycles.mains.some(state => state.state === 'running'),
});
}
return phases;
}),
sortedLifecycleTaskStates: sort('taskStates', function(a, b) {
return getTaskSortPrefix(a.task).localeCompare(getTaskSortPrefix(b.task));
}),
sortedLifecycleTasks: sort('tasks', function(a, b) {
return getTaskSortPrefix(a).localeCompare(getTaskSortPrefix(b));
}),
});
const lifecycleNameSortPrefix = {
prestart: 0,
sidecar: 1,
main: 2,
};
function getTaskSortPrefix(task) {
// Prestarts first, then sidecars, then mains
return `${lifecycleNameSortPrefix[task.lifecycleName]}-${task.name}`;
}

View file

@ -5,6 +5,12 @@ import RSVP from 'rsvp';
import { logger } from 'nomad-ui/utils/classes/log'; import { logger } from 'nomad-ui/utils/classes/log';
import timeout from 'nomad-ui/utils/timeout'; import timeout from 'nomad-ui/utils/timeout';
class MockAbortController {
abort() {
/* noop */
}
}
export default Component.extend({ export default Component.extend({
token: service(), token: service(),
@ -45,12 +51,25 @@ export default Component.extend({
logger: logger('logUrl', 'logParams', function logFetch() { logger: logger('logUrl', 'logParams', function logFetch() {
// If the log request can't settle in one second, the client // If the log request can't settle in one second, the client
// must be unavailable and the server should be used instead // must be unavailable and the server should be used instead
// AbortControllers don't exist in IE11, so provide a mock if it doesn't exist
const aborter = window.AbortController ? new AbortController() : new MockAbortController();
const timing = this.useServer ? this.serverTimeout : this.clientTimeout; const timing = this.useServer ? this.serverTimeout : this.clientTimeout;
// Capture the state of useServer at logger create time to avoid a race
// between the stdout logger and stderr logger running at once.
const useServer = this.useServer;
return url =>
- RSVP.race([this.token.authorizedRequest(url), timeout(timing)]).then(
- response => response,
+ RSVP.race([
+ this.token.authorizedRequest(url, { signal: aborter.signal }),
+ timeout(timing),
+ ]).then(
+ response => {
+ return response;
+ },
error => {
- if (this.useServer) {
+ aborter.abort();
+ if (useServer) {
this.set('noConnection', true);
} else {
this.send('failoverToServer');
@ -62,6 +81,7 @@ export default Component.extend({
actions: { actions: {
setMode(mode) { setMode(mode) {
if (this.mode === mode) return;
this.logger.stop(); this.logger.stop();
this.set('mode', mode); this.set('mode', mode);
}, },

View file

@ -5,6 +5,15 @@ import { alias } from '@ember/object/computed';
import { task } from 'ember-concurrency'; import { task } from 'ember-concurrency';
export default Controller.extend({ export default Controller.extend({
otherTaskStates: computed('model.task.taskGroup.tasks.@each.name', function() {
const taskName = this.model.task.name;
return this.model.allocation.states.rejectBy('name', taskName);
}),
prestartTaskStates: computed('otherTaskStates.@each.lifecycle', function() {
return this.otherTaskStates.filterBy('task.lifecycle');
}),
network: alias('model.resources.networks.firstObject'), network: alias('model.resources.networks.firstObject'),
ports: computed('network.reservedPorts.[]', 'network.dynamicPorts.[]', function() { ports: computed('network.reservedPorts.[]', 'network.dynamicPorts.[]', function() {
return (this.get('network.reservedPorts') || []) return (this.get('network.reservedPorts') || [])

View file

@ -0,0 +1,10 @@
import attr from 'ember-data/attr';
import Fragment from 'ember-data-model-fragments/fragment';
import { fragmentOwner } from 'ember-data-model-fragments/attributes';
export default Fragment.extend({
task: fragmentOwner(),
hook: attr('string'),
sidecar: attr('boolean'),
});

View file

@ -1,6 +1,7 @@
import attr from 'ember-data/attr'; import attr from 'ember-data/attr';
import Fragment from 'ember-data-model-fragments/fragment'; import Fragment from 'ember-data-model-fragments/fragment';
import { fragmentArray, fragmentOwner } from 'ember-data-model-fragments/attributes'; import { fragment, fragmentArray, fragmentOwner } from 'ember-data-model-fragments/attributes';
import { computed } from '@ember/object';
export default Fragment.extend({ export default Fragment.extend({
taskGroup: fragmentOwner(), taskGroup: fragmentOwner(),
@ -9,6 +10,14 @@ export default Fragment.extend({
driver: attr('string'), driver: attr('string'),
kind: attr('string'), kind: attr('string'),
lifecycle: fragment('lifecycle'),
lifecycleName: computed('lifecycle', 'lifecycle.sidecar', function() {
if (this.lifecycle && this.lifecycle.sidecar) return 'sidecar';
if (this.lifecycle && this.lifecycle.hook === 'prestart') return 'prestart';
return 'main';
}),
reservedMemory: attr('number'), reservedMemory: attr('number'),
reservedCPU: attr('number'), reservedCPU: attr('number'),
reservedDisk: attr('number'), reservedDisk: attr('number'),

View file

@ -72,7 +72,8 @@ export default Service.extend({
// This authorizedRawRequest is necessary in order to fetch data // This authorizedRawRequest is necessary in order to fetch data
// with the guarantee of a token but without the automatic region // with the guarantee of a token but without the automatic region
// param since the region cannot be known at this point. // param since the region cannot be known at this point.
- authorizedRawRequest(url, options = { credentials: 'include' }) {
+ authorizedRawRequest(url, options = {}) {
+ const credentials = 'include';
const headers = {};
const token = this.secret;
@ -80,7 +81,7 @@
headers['X-Nomad-Token'] = token;
}
- return fetch(url, assign(options, { headers }));
+ return fetch(url, assign(options, { headers, credentials }));
}, },
authorizedRequest(url, options) { authorizedRequest(url, options) {

View file

@ -8,13 +8,15 @@
@import './components/ember-power-select'; @import './components/ember-power-select';
@import './components/empty-message'; @import './components/empty-message';
@import './components/error-container'; @import './components/error-container';
@import './components/exec'; @import './components/exec-button';
@import './components/exec-window';
@import './components/fs-explorer'; @import './components/fs-explorer';
@import './components/gutter'; @import './components/gutter';
@import './components/gutter-toggle'; @import './components/gutter-toggle';
@import './components/image-file.scss'; @import './components/image-file.scss';
@import './components/inline-definitions'; @import './components/inline-definitions';
@import './components/job-diff'; @import './components/job-diff';
@import './components/lifecycle-chart';
@import './components/loading-spinner'; @import './components/loading-spinner';
@import './components/metrics'; @import './components/metrics';
@import './components/node-status-light'; @import './components/node-status-light';

View file

@ -0,0 +1,16 @@
.exec-button {
color: $ui-gray-800;
border-color: $ui-gray-300;
span {
color: $ui-gray-800;
}
.icon:first-child:not(:last-child) {
width: 0.9rem;
height: 0.9rem;
margin-left: 0;
margin-right: 0.5em;
fill: currentColor;
}
}

View file

@ -0,0 +1,152 @@
.exec-window {
display: flex;
position: absolute;
left: 0;
right: 0;
top: 3.5rem; // nav.navbar.is-popup height
bottom: 0;
.terminal-container {
flex-grow: 1;
background: black;
padding: 16px;
height: 100%;
position: relative;
color: white;
.terminal {
height: 100%;
.xterm .xterm-viewport {
overflow-y: auto;
}
}
}
&.loading {
justify-content: center;
align-items: center;
background: black;
height: 100%;
}
.task-group-tree {
background-color: $ui-gray-900;
color: white;
padding: 16px;
width: 200px;
flex-shrink: 0;
overflow-y: auto;
.title {
text-transform: uppercase;
color: $grey-lighter;
font-size: 11px;
}
.icon {
color: $ui-gray-500;
}
.toggle-button {
position: relative;
background: transparent;
border: 0;
color: white;
font-size: inherit;
line-height: 1.5;
width: 100%;
text-align: left;
overflow-wrap: break-word;
padding: 6px 0 5px 17px;
.icon {
position: absolute;
left: 0;
padding: 3px 3px 0 0;
margin-left: -3px;
}
// Adapted from fs-explorer
&.is-loading::after {
animation: spinAround 750ms infinite linear;
border: 2px solid $grey-light;
border-radius: 290486px;
border-right-color: transparent;
border-top-color: transparent;
opacity: 0.3;
content: '';
display: inline-block;
height: 1em;
width: 1em;
margin-left: 0.5em;
}
}
.task-list {
.task-item {
padding: 0 8px 0 19px;
color: white;
text-decoration: none;
display: flex;
align-items: center;
justify-content: space-between;
.border-and-label {
display: flex;
align-items: center;
height: 100%;
width: 100%;
position: relative;
}
.border {
position: absolute;
border-left: 1px solid $ui-gray-700;
height: 100%;
}
.is-active {
position: absolute;
top: 7.5px;
left: -9.75px;
stroke: $ui-gray-900;
stroke-width: 5px;
fill: white;
}
.task-label {
padding: 6px 0 5px 13px;
overflow-wrap: break-word;
width: 100%;
}
.icon {
visibility: hidden;
width: 16px;
flex-shrink: 0;
}
&:hover .icon.show-on-hover {
visibility: visible;
}
}
}
.toggle-button,
.task-item {
font-weight: 500;
&:hover {
background-color: $ui-gray-800;
border-radius: 4px;
.is-active {
stroke: $ui-gray-800;
}
}
}
}
}

View file

@ -1,169 +0,0 @@
.tree-and-terminal {
display: flex;
position: absolute;
left: 0;
right: 0;
top: 3.5rem; // nav.navbar.is-popup height
bottom: 0;
.terminal-container {
flex-grow: 1;
background: black;
padding: 16px;
height: 100%;
position: relative;
color: white;
.terminal {
height: 100%;
.xterm .xterm-viewport {
overflow-y: auto;
}
}
}
&.loading {
justify-content: center;
align-items: center;
background: black;
height: 100%;
}
}
.task-group-tree {
background-color: $ui-gray-900;
color: white;
padding: 16px;
width: 200px;
flex-shrink: 0;
overflow-y: auto;
.title {
text-transform: uppercase;
color: $grey-lighter;
font-size: 11px;
}
.icon {
color: $ui-gray-500;
}
.toggle-button {
position: relative;
background: transparent;
border: 0;
color: white;
font-size: inherit;
line-height: 1.5;
width: 100%;
text-align: left;
overflow-wrap: break-word;
padding: 6px 0 5px 17px;
.icon {
position: absolute;
left: 0;
padding: 3px 3px 0 0;
margin-left: -3px;
}
// Adapted from fs-explorer
&.is-loading::after {
animation: spinAround 750ms infinite linear;
border: 2px solid $grey-light;
border-radius: 290486px;
border-right-color: transparent;
border-top-color: transparent;
opacity: 0.3;
content: '';
display: inline-block;
height: 1em;
width: 1em;
margin-left: 0.5em;
}
}
.task-list {
.task-item {
padding: 0 8px 0 19px;
color: white;
text-decoration: none;
display: flex;
align-items: center;
justify-content: space-between;
.border-and-label {
display: flex;
align-items: center;
height: 100%;
width: 100%;
position: relative;
}
.border {
position: absolute;
border-left: 1px solid $ui-gray-700;
height: 100%;
}
.is-active {
position: absolute;
top: 7.5px;
left: -9.75px;
stroke: $ui-gray-900;
stroke-width: 5px;
fill: white;
}
.task-label {
padding: 6px 0 5px 13px;
overflow-wrap: break-word;
width: 100%;
}
.icon {
visibility: hidden;
width: 16px;
flex-shrink: 0;
}
&:hover .icon.show-on-hover {
visibility: visible;
}
}
}
.toggle-button,
.task-item {
font-weight: 500;
&:hover {
background-color: $ui-gray-800;
border-radius: 4px;
.is-active {
stroke: $ui-gray-800;
}
}
}
}
.exec-button {
color: $ui-gray-800;
border-color: $ui-gray-300;
span {
color: $ui-gray-800;
}
.icon:first-child:not(:last-child) {
width: 0.9rem;
height: 0.9rem;
margin-left: 0;
margin-right: 0.5em;
fill: currentColor;
}
}

View file

@ -0,0 +1,123 @@
.lifecycle-chart {
padding-top: 2rem;
position: relative;
.lifecycle-phases {
position: absolute;
top: 1.5em;
bottom: 1.5em;
right: 1.5em;
left: 1.5em;
.divider {
position: absolute;
left: 25%;
height: 100%;
stroke: $ui-gray-200;
stroke-width: 3px;
stroke-dasharray: 1, 7;
stroke-dashoffset: 1;
stroke-linecap: square;
}
}
.lifecycle-phase {
position: absolute;
bottom: 0;
top: 0;
border-top: 2px solid transparent;
.name {
padding: 0.5rem 0.9rem;
font-size: $size-7;
font-weight: $weight-semibold;
color: $ui-gray-500;
}
&.is-active {
background: $white-bis;
border-top: 2px solid $vagrant-blue;
.name {
color: $vagrant-blue;
}
}
&.prestart {
left: 0;
right: 75%;
}
&.main {
left: 25%;
right: 0;
}
}
.lifecycle-chart-rows {
margin-top: 2.5em;
}
.lifecycle-chart-row {
position: relative;
.task {
margin: 0.55em 0.9em;
padding: 0.3em 0.55em;
border: 1px solid $grey-blue;
border-radius: $radius;
background: white;
.name {
font-weight: $weight-semibold;
a {
color: inherit;
text-decoration: none;
}
}
&:hover {
.name a {
text-decoration: underline;
}
}
.lifecycle {
font-size: $size-7;
color: $ui-gray-400;
}
}
&.is-active {
.task {
border-color: $nomad-green;
background: lighten($nomad-green, 50%);
.lifecycle {
color: $ui-gray-500;
}
}
}
&.is-finished {
.task {
color: $ui-gray-400;
}
}
&.main {
margin-left: 25%;
}
&.prestart {
margin-right: 75%;
}
&:last-child .task {
margin-bottom: 0.9em;
}
}
}

View file

@ -1,4 +1,6 @@
$ui-gray-200: #dce0e6;
$ui-gray-300: #bac1cc; $ui-gray-300: #bac1cc;
$ui-gray-400: #8e96a3;
$ui-gray-500: #6f7682; $ui-gray-500: #6f7682;
$ui-gray-700: #525761; $ui-gray-700: #525761;
$ui-gray-800: #373a42; $ui-gray-800: #373a42;

Some files were not shown because too many files have changed in this diff.