open-nomad

Author	SHA1	Message	Date
Mahmood Ali	e76ff9f679	Merge pull request #7543 from hashicorp/test-flakiness-20200330_1 Test flakiness fixes - 2020-03-30 Edition	2020-03-30 09:26:26 -04:00
Mahmood Ali	57bebfdb5c	tests: avoid logging after test completion	2020-03-30 09:08:34 -04:00
Mahmood Ali	13381448e0	avoid logging in draining job watcher In tests where the logger is a test logger, emitting a trace log in a background thread while it's shutting down may trigger a panic. Thus avoid logging Trace if err != nil. Note that we already log an error when err isn't a trace. This fixes cases where tests panic with a trace like: ``` panic: Log in goroutine after TestAllocGarbageCollector_MakeRoomFor_MaxAllocs has completed goroutine 30 [running]: testing.(*common).logDepth(0xc000aa9e60, 0xc000c4a000, 0xab, 0x3) /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:680 +0x4d3 testing.(*common).log(...) /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:662 testing.(*common).Logf(0xc000aa9e60, 0x690b941, 0x4, 0xc001366c00, 0x2, 0x2) /usr/local/Cellar/go/1.14/libexec/src/testing/testing.go:701 +0x7e github.com/hashicorp/nomad/helper/testlog.(*writer).Write(0xc000a82a60, 0xc0000b48c0, 0xab, 0x13f, 0x0, 0x0, 0x0) /Users/notnoop/go/src/github.com/hashicorp/nomad/helper/testlog/testlog.go:34 +0x106 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*writer).Flush(0xc000a80900, 0xbf9870f000000001, 0x20a87556e, 0x8b12bc0) /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/writer.go:29 +0x14f github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).log(0xc000e2c180, 0xc0003b6880, 0x17, 0x1, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6) /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:139 +0x15d github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*intLogger).Trace(0xc000e2c180, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6) /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/intlogger.go:446 +0x7a github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog.(*interceptLogger).Trace(0xc0002f1ad0, 0x6974edc, 0x22, 0xc000db57a0, 0x6, 0x6) /Users/notnoop/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-hclog/interceptlogger.go:48 +0x9c github.com/hashicorp/nomad/nomad/drainer.(*drainingJobWatcher).watch(0xc0002f2380) /Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:147 +0x1125 created by github.com/hashicorp/nomad/nomad/drainer.NewDrainingJobWatcher /Users/notnoop/go/src/github.com/hashicorp/nomad/nomad/drainer/watch_jobs.go:89 +0x1e3 FAIL github.com/hashicorp/nomad/client 10.605s FAIL ```	2020-03-30 07:06:53 -04:00
Mahmood Ali	36ad8ee2e0	tests: add debugging for TestAutopilot_RollingUpdate	2020-03-30 07:06:53 -04:00
Chris Baker	d6287c43b9	clean up some tests	2020-03-29 23:38:36 +00:00
Chris Baker	5e3c38be2f	state_store: * added method to retrieve all scaling policies for use in snapshotting, plus test * better testing for ScalingPoliciesByNamespace * added scaling policy snapshot persist and restore (and test of restore) manually tested snapshot restore. resolves #7539	2020-03-29 13:32:44 +00:00
Lang Martin	50ff9ccd44	csi: plugin deregistration on plugin job GC (#7502) * nomad/structs/csi: delete just one plugin type from a node * nomad/structs/csi: add DeleteAlloc * nomad/state/state_store: add deleteJobFromPlugin * nomad/state/state_store: use DeleteAlloc not DeleteNodeType * move CreateTestCSIPlugin to state to avoid an import cycle * nomad/state/state_store_test: delete a plugin by deleting its jobs * nomad/_test: move CreateTestCSIPlugin to state nomad/state/state_store: update one plugin per transaction * command/plugin_status_test: move CreateTestCSIPlugin * nomad: csi: handle nils CSIPlugin methods, clarity	2020-03-26 17:07:18 -04:00
Lang Martin	3375c92aa0	csi: make volume registration idempotent (#7490) If not in use and not changing external ids, it should not be an error to register a volume again. * nomad/state/state_store: make volume registration idempotent	2020-03-26 12:27:19 -04:00
Lang Martin	ea80330aaa	csi: nomad/structs: test volume denormalize without plugin (#7472)	2020-03-26 09:43:59 -04:00
Mahmood Ali	b33dbe539b	tests: TestCSIPluginEndpoint_ACLNamespaceAlloc is ent TestCSIPluginEndpoint_ACLNamespaceAlloc uses namespace features not present in OSS.	2020-03-25 08:45:44 -04:00
Mahmood Ali	281fc9837c	tests: relax index checks TestStateStore_Indexes specifically tests for `nodes` index, but asserts on the exact number of indexes present in the state. This is fragile and will break almost every time we add a state index.	2020-03-25 08:45:38 -04:00
Mahmood Ali	ceed57b48f	per-task restart policy	2020-03-24 17:00:41 -04:00
Chris Baker	ffd79583f6	Merge pull request #7474 from hashicorp/f-scaling-changes-from-review more testing for scaling API	2020-03-24 15:32:10 -05:00
Chris Baker	c638c2c352	update RPC scaling endpoint tests to use renamed 'scale' policy disposition	2020-03-24 20:18:12 +00:00
Chris Baker	5979d6a81e	more testing for ScalingPolicy, mainly around parsing and canonicalization for Min/Max	2020-03-24 19:43:50 +00:00
Chris Baker	aa5beafe64	Job.Scale should not result in job update or eval create if args.Count == nil plus tests	2020-03-24 17:36:06 +00:00
Tim Gross	913da68296	csi: remove client from plugin on client node update (#7462) Plugins track the client nodes where they are placed. On client updates, remove the client from the plugin tracking if the client is no longer running an instance of that controller/node plugin. Extends the state store tests to ensure deregistration works as expected and that controllers and nodes are being tracked independently.	2020-03-24 13:26:31 -04:00
Chris Baker	9e530e167d	Merge pull request #7409 from hashicorp/scaling-api Scaling API changes	2020-03-24 11:02:09 -05:00
Chris Baker	606c79b320	add acl validation to Scaling.ListPolicies and Scaling.GetPolicy	2020-03-24 14:39:05 +00:00
Chris Baker	f6ec5f9624	made count optional during job scaling actions; added ACL protection in Job.Scale; in Job.Scale, only perform a Job.Register if the Count was non-nil	2020-03-24 14:39:05 +00:00
Chris Baker	41b002eecc	wip: ACL checking for RPC Job.ScaleStatus	2020-03-24 14:39:05 +00:00
Lang Martin	bd22afd003	csi: volume deregister fails for volumes actively in use (#7445) * nomad/structs/csi: add InUse to CSIVolume * nomad/state/state_store: block volume deregistration for in use vols	2020-03-24 10:10:44 -04:00
Chris Baker	233db5258a	changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults. - tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent) - Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and	2020-03-24 13:57:17 +00:00
Chris Baker	f9876a487e	finished Job.ScaleStatus RPC, need to work on http endpoint	2020-03-24 13:57:16 +00:00
Chris Baker	925b59e1d2	wip: scaling status return, almost done	2020-03-24 13:57:15 +00:00
James Rasell	f125b5fb2d	scaling: ensure min and max int64s are in toplevel of block.	2020-03-24 13:57:15 +00:00
Chris Baker	42270d862c	wip: some tests still failing; updating job scaling endpoints to match RFC, cleaning up the API object as well	2020-03-24 13:57:14 +00:00
Chris Baker	abc7a52f56	finished refactoring state store, schema, etc	2020-03-24 13:57:14 +00:00
Chris Baker	116aa98ed7	wip: removed some commented junk from scaling poc	2020-03-24 13:57:13 +00:00
Chris Baker	3d54f1feba	wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request	2020-03-24 13:57:12 +00:00
Chris Baker	024d203267	wip: added tests for client methods around group scaling	2020-03-24 13:57:11 +00:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Chris Baker	8453e667c2	wip: working on job group scaling endpoint	2020-03-24 13:55:20 +00:00
Chris Baker	6665d0bfb0	wip: added policy get endpoint, added UUID to policy	2020-03-24 13:55:20 +00:00
Chris Baker	9c2560ceeb	wip: upsert/delete scaling policies on job upsert/delete	2020-03-24 13:55:18 +00:00
Chris Baker	65d92f1fbf	WIP: adding ScalingPolicy to api/structs and state store	2020-03-24 13:55:18 +00:00
Tim Gross	fa01a6ea59	csi: fix missing health count from volume list stub	2020-03-24 09:42:59 -04:00
Lang Martin	0847cb513c	csi: volume/plugin list should return an empty array, not nil (#7443) * nomad/csi_endpoint: return an empty list, not nil * nomad/csi_endpoint_test: volume list returns non-nil	2020-03-23 21:21:40 -04:00
Lang Martin	d994990ef0	csi: the scheduler allows a job with a volume write claim to be updated (#7438) * nomad/structs/csi: split CanWrite into health, in use * scheduler/scheduler: expose AllocByID in the state interface * nomad/state/state_store_test * scheduler/stack: SetJobID on the matcher * scheduler/feasible: when a volume writer is in use, check if it's us * scheduler/feasible: remove SetJob * nomad/state/state_store: denormalize allocs before Claim * nomad/structs/csi: return errors on claim, with context * nomad/csi_endpoint_test: new alloc doesn't look like an update * nomad/state/state_store_test: change test reference to CanWrite	2020-03-23 21:21:04 -04:00
Tim Gross	076fbbf08f	Merge pull request #7012 from hashicorp/f-csi-volumes Container Storage Interface Support	2020-03-23 14:19:46 -04:00
Lang Martin	e100444740	csi: add mount_options to volumes and volume requests (#7398 ) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * 
command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Lang Martin	6b6ae6c2bd	csi: ACLs for plugin endpoints (#7380) * acl/policy: add PolicyList for global ACLs * acl/acl: plugin policy * acl/acl: maxPrivilege is required to allow "list" * nomad/csi_endpoint: enforce plugin access with PolicyPlugin * nomad/csi_endpoint: check job ACL swapped params * nomad/csi_endpoint_test: test alloc filtering * acl/policy: add namespace csi-register-plugin * nomad/job_endpoint: check csi-register-plugin ACL on registration * nomad/job_endpoint_test: add plugin job cases	2020-03-23 13:59:25 -04:00
Lang Martin	b596e67f47	csi: implement volume ACLs (#7339) * acl/policy: add the volume ACL policies * nomad/csi_endpoint: enforce ACLs for volume access * nomad/search_endpoint_oss: volume acls * acl/acl: add plugin read as a global policy * acl/policy: add PluginPolicy global cap type * nomad/csi_endpoint: check the global plugin ACL policy * nomad/mock/acl: PluginPolicy * nomad/csi_endpoint: fix list rebase * nomad/core_sched_test: new test since #7358 * nomad/csi_endpoint_test: use correct permissions for list * nomad/csi_endpoint: allowCSIMount keeps ACL checks together * nomad/job_endpoint: check mount permission for jobs * nomad/job_endpoint_test: need plugin read, too	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Tim Gross	22e9f679c3	csi: implement controller detach RPCs (#7356) This changeset implements the remaining controller detach RPCs: server-to-client and client-to-controller. The tests also uncovered a bug in our RPC for claims which is fixed here; the volume claim RPC is used for both claiming and releasing a claim on a volume. We should only submit a controller publish RPC when the claim is new and not when it's being released.	2020-03-23 13:59:25 -04:00
Tim Gross	0cd2d3cc29	csi: make claims on volumes idempotent for the same alloc (#7328) Nomad clients will push node updates during client restart which can cause an extra claim for a volume by the same alloc. If an alloc already claims a volume, we can allow it to be treated as a valid claim and continue.	2020-03-23 13:58:30 -04:00
Lang Martin	6750c262a4	csi: use `ExternalID`, when set, to identify volumes for outside RPC calls (#7326) * nomad/structs/csi: new RemoteID() uses the ExternalID if set * nomad/csi_endpoint: pass RemoteID to volume request types * client/pluginmanager/csimanager/volume: pass RemoteID to NodePublishVolume	2020-03-23 13:58:30 -04:00
Lang Martin	80619137ab	csi: volumes listed in `nomad node status` (#7318) * api/allocations: GetTaskGroup finds the taskgroup struct * command/node_status: display CSI volume names * nomad/state/state_store: new CSIVolumesByNodeID * nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator * nomad/csi_endpoint: deal with a slice of volumes * nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator * nomad/structs/csi: CSIVolumeListRequest takes a NodeID * nomad/csi_endpoint: use the return iterator * command/agent/csi_endpoint: parse query params for CSIVolumes.List * api/nodes: new CSIVolumes to list volumes by node * command/node_status: use the new list endpoint to print volumes * nomad/state/state_store: error messages consider the operator * command/node_status: include the Provider	2020-03-23 13:58:30 -04:00
Lang Martin	de25fc6cf4	csi: csi-hostpath plugin unimplemented error on controller publish (#7299) * client/allocrunner/csi_hook: tag errors * nomad/client_csi_endpoint: tag errors * nomad/client_rpc: remove an unnecessary error tag * nomad/state/state_store: ControllerRequired fix intent We use ControllerRequired to indicate that a volume should use the publish/unpublish workflow, rather than that it has a controller. We need to check both RequiresControllerPlugin and SupportsAttachDetach from the fingerprint to check that. * nomad/csi_endpoint: tag errors * nomad/csi_endpoint_test: longer error messages, mock fingerprints	2020-03-23 13:58:30 -04:00
Tim Gross	b04d23dae0	csi: ensure volume query is idempotent (#7303) We denormalize the `CSIVolume` struct when we query it from the state store by getting the plugin and its health. But unless we copy the volume, this denormalization gets synced back to the state store without passing through the fsm (which is invalid).	2020-03-23 13:58:30 -04:00


3188 commits