open-nomad

Author	SHA1	Message	Date
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	dfed1ba5bc	remove most static RPC handlers (#15451 ) Nomad server components that aren't in the `nomad` package like the deployment watcher and volume watcher need to make RPC calls but can't import the Server struct to do so because it creates a circular reference. These components have a "shim" object that gets populated to pass a "static" handler that has no RPC context. Most RPC handlers are never used in this way, but during server setup we were constructing a set of static handlers for most RPC endpoints anyways. This is slightly wasteful but also confusing to developers who end up being encouraged to just copy what was being done for previous RPCs. This changeset includes the following refactorings: * Remove the static handlers field on the server * Instead construct just the specific static handlers we need to pass into the deployment watcher and volume watcher. * Remove the unnecessary static handler from heartbeater * Update various tests to avoid needing the static endpoints and have them use an endpoint constructed on the spot. Follow-up work will examine whether we can remove the RPCs from deployment watcher and volume watcher entirely, falling back to raft applies like node drainer does currently.	2022-12-02 10:12:05 -05:00
Tim Gross	87681fca68	CSI: ensure initial unpublish state is checkpointed (#14675 ) A test flake revealed a bug in the CSI unpublish workflow, where an unpublish that comes from a client that's successfully done the node-unpublish step will not have the claim checkpointed if the controller-unpublish step fails. This will result in a delay in releasing the volume claim until the next GC. This changeset also ensures we're using a new snapshot after each write to raft, and fixes two timing issues in tests where either the volume watcher can unpublish before the unpublish RPC is sent or we don't wait long enough in resource-restricted environments like GHA.	2022-09-27 08:43:45 -04:00
Tim Gross	3fc7482ecd	CSI: failed allocation should not block its own controller unpublish (#14484 ) A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC. The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. The check has a bug where the allocation belonging to the claim being freed was included in the check incorrectly. During a normal allocation stop for job stop or a new version of the job, the allocation is terminal. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server. For CSI plugins that support controller attach/detach, this means that the controller will not be able to detach the volume from the allocation's host and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, and makes the logic easier to read. Include a test case covering this path. Also includes two minor extra bugfixes: * Entities we get from the state store should always be copied before altering. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps. * The list stub object for volumes in `nomad/structs` did not match the stub object in `api`. The `api` package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.	2022-09-08 13:30:05 -04:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. 
Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Tim Gross	9d5523a72d	CSI: skip node unpublish on GC'd or down nodes (#13301 ) If the node has been GC'd or is down, we can't send it a node unpublish. The CSI spec requires that we don't send the controller unpublish before the node unpublish, but in the case where a node is gone we can't know the final fate of the node unpublish step. The `csi_hook` on the client will unpublish if the allocation has stopped and if the host is terminated there's no mount for the volume anyways. So we'll now assume that the node has unpublished at its end. If it hasn't, any controller unpublish will potentially hang or error and need to be retried.	2022-06-09 11:33:22 -04:00
Grant Griffiths	18a0a2c9a4	CSI: Add secrets flag support for delete volume (#11245 )	2022-04-05 08:59:11 -04:00
Tim Gross	03c1904112	csi: allow `namespace` field to be passed in volume spec (#12400 ) Use the volume spec's `namespace` field to override the value of the `-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.	2022-03-29 14:46:39 -04:00
Tim Gross	a6652bffad	CSI: reorder controller volume detachment (#12387 ) In #12112 and #12113 we solved for the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec but plugins are supposed to handle it safely. But if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly. Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed. This has a couple side-effects that also had to get fixed here: * Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fallthrough to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain but now that we've fixed that we may want to revisit it in future work. * We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases. * Several tests had to be updated because they were assuming we fail in a particular order that we're no longer doing.	2022-03-29 09:44:00 -04:00
Tim Gross	1743648901	CSI: fix timestamp from volume snapshot responses (#12352 ) Listing snapshots was incorrectly returning nanoseconds instead of seconds, and formatting of timestamps for both list and create snapshot was treating the timestamp as though it were nanoseconds instead of seconds. This resulted in create timestamps always being displayed as zero values. Fix the unit conversion error in the command line and the incorrect extraction in the CSI plugin client code. Beef up the unit tests to make sure this code is actually exercised.	2022-03-23 10:39:28 -04:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	550c5ab6ec	fix TestCSIVolumeEndpoint_List_PaginationFiltering test (#12245 )	2022-03-09 09:40:40 -05:00
Luiz Aoqui	ab8ce87bba	Add pagination, filtering and sort to more API endpoints (#12186 )	2022-03-08 20:54:17 -05:00
Tim Gross	2dafe46fe3	CSI: allow updates to volumes on re-registration (#12167 ) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	3a692a4360	csi: get plugin ID for creating snapshot from volume, not args (#12195 ) The `CreateSnapshot` RPC expects a plugin ID to be set by the API, but in the common case of the `nomad volume snapshot create` command, we don't ask the user for the plugin ID because it's available from the volume we're snapshotting. Change the order of the RPC so that we get the volume first and then use the volume's plugin ID for the plugin if the API didn't set the value.	2022-03-07 09:06:50 -05:00
Tim Gross	b776c1c196	csi: fix prefix queries for plugin list RPC (#12194 ) The `CSIPlugin.List` RPC was intended to accept a prefix to filter the list of plugins being listed. This was accidentally being done in the state store instead, which contributed to incorrect filtering behavior for plugins in the `volume plugin status` command. Move the prefix matching into the RPC so that it calls the prefix-matching method in the state store if we're looking for a prefix. Update the `plugin status` command to accept a prefix for the plugin ID argument so that it matches the expected behavior of other commands.	2022-03-04 16:44:09 -05:00
Luiz Aoqui	b1809eb48c	Fix CSI volume list with prefix and `*` namespace (#12184 ) When using a prefix value and the wildcard for namespace, the endpoint would not take the prefix value into consideration due to the order in which the checks were executed but also the logic for retrieving volumes from the state store. This commit changes the order to check for a prefix first and wraps the result iterator of the state store query in a filter to apply the prefix.	2022-03-03 17:27:04 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Tim Gross	cfe3117af8	CSI: enforce usage at claim time (#12112 ) * Remove redundant schedulable check in `FreeWriteClaims`. If a volume has been created but not yet claimed, its capabilities will be checked in `WriteSchedulable` at both scheduling time and claim time. We don't need to also check them in the `FreeWriteClaims` method. * Enforce maximum volume claims for writers. When the scheduler checks feasibility for CSI volumes, the check is fairly loose: earlier versions of the same job are not counted as active claims. This allows the scheduler to place new allocations for the new version of a job, under the assumption that we'll replace the existing allocations and their volume claims. But when the alloc runner claims the volume, we need to enforce the active claims even if they're for allocations of an earlier version of the job. Otherwise we'll try to mount a volume that's currently being unmounted, and this will cause replacement allocations to frequently fail. * Enforce single-node reader check for read-only volumes. When the alloc runner makes a claim for a read-only volume, we only check that the volume is potentially schedulable and not that it actually has free read claims.	2022-02-24 09:37:37 -05:00
Tim Gross	57a546489f	CSI: minor refactoring (#12105 ) * rename method checking that free write claims are available * use package-level variables for claim errors * semgrep fix for testify	2022-02-23 11:13:51 -05:00
Luiz Aoqui	40093f97cd	api: support namespace wildcard in CSI volume list (#11724 )	2021-12-21 17:19:45 -05:00
Grant Griffiths	fecbbaee22	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
Tim Gross	0892d34ff9	CSI: capability block is required for volume registration	2021-04-08 13:02:24 -04:00
Tim Gross	d2e479505c	CSI: capability check ListVolumes at RPC for nicer error messages The plugin stub object does not include fine-grained capability checks, which means `nomad volume status -verbose` will return ugly and verbose error "Unimplemented" messages from the plugin if it does not support the CSI `ListVolumes` RPC. Return a nicer error message from our RPC handler instead.	2021-04-07 12:00:22 -04:00
Tim Gross	466b620fa4	CSI: volume snapshot	2021-04-01 11:16:52 -04:00
Tim Gross	9fc4cf1419	CSI: fingerprint detailed controller capabilities In order to support new controller RPCs, we need to fingerprint volume capabilities in more detail and perform controller RPCs only when the specific capability is present. This fixes a bug in Ceph support where the plugin can only support create/delete but we assume that it also supports attach/detach.	2021-03-31 16:37:09 -04:00
Tim Gross	f149abfa41	CSI: volume creation/registration should not validate attachment The CSI specification requires that we validate a list of `Capability` (access mode + accessibility) when we create a volume, but the existing volume registration workflow incorrectly validates a single capability. The specific capability required by a volume claim is checked at the time we make the claim, so remove the check for `AttachmentMode`/`AccessMode`.	2021-03-31 16:37:09 -04:00
Tim Gross	aec5337862	CSI: HTTP handlers for create/delete/list	2021-03-31 16:37:09 -04:00
Tim Gross	d38008176e	CSI: create/delete/list volume RPCs This commit implements the RPC handlers on the client that talk to the CSI plugins on that client for the Create/Delete/List RPC.	2021-03-31 16:37:09 -04:00
Drew Bailey	9adca240f8	Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447 ) * upsertaclpolicies * delete acl policies msgtype * upsert acl policies msgtype * delete acl tokens msgtype * acl bootstrap msgtype wip unsubscribe on token delete test that subscriptions are closed after an ACL token has been deleted Start writing policyupdated test * update test to use before/after policy * add SubscribeWithACLCheck to run acl checks on subscribe * update rpc endpoint to use broker acl check * Add and use subscriptions.closeSubscriptionFunc This fixes the issue of not being able to defer unlocking the mutex on the event broker in the for loop. handle acl policy updates * rpc endpoint test for terminating acl change * add comments Co-authored-by: Kris Hicks <khicks@hashicorp.com>	2020-12-01 11:11:34 -05:00
Tim Gross	c320c1ba57	CSI: fix struct copying errors (#9239 ) The CSIVolume struct "denormalizes" allocations when it's first queried from the state store. The CSIVolumeByID method on the state store copies the volume before denormalizing so that we don't end up with unexpected changes. The copying has some subtle bugs that meant that Allocations (as well as Topologies and MountOptions) were not getting copied when expected. Also, ensure we never write allocations attached to volumes to the state store during claims.	2020-11-18 10:59:25 -05:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Tim Gross	e8c13a2307	csi: validate mount options during volume registration (#9044 ) Volumes using attachment mode `file-system` use the CSI filesystem API when they're mounted, and can be passed mount options. But `block-device` mode volumes don't have this option. When RPCs are made to plugins, we are silently dropping the mount options we don't expect to see, but this results in a poor operator experience when the mount options aren't honored. This changeset makes passing mount options to a `block-device` volume a validation error.	2020-10-08 09:23:21 -04:00
Tim Gross	3ceb5b36b1	csi: allow more than 1 writer claim for multi-writer mode (#9040 ) Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling. Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims` is exercised for multi-writer in feasibility test.	2020-10-07 10:43:23 -04:00
Lang Martin	7d483f93c0	csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699 ) * nomad/structs/csi: add explicit job support * nomad/state/state_store: capture job updates directly * api/nodes: CSIInfo needs the AllocID * command/agent/csi_endpoint: AllocID was missing Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-08-27 17:20:00 -04:00
Lang Martin	a27913e699	CSI RPC Token (#8626 ) * client/allocrunner/csi_hook: use the Node SecretID * client/allocrunner/csi_hook: include the namespace for Claim	2020-08-11 13:08:39 -04:00
Tim Gross	443fdaa86b	csi: nomad volume detach command (#8584 ) The soundness guarantees of the CSI specification leave a little to be desired in our ability to provide a 100% reliable automated solution for managing volumes. This changeset provides a new command to bridge this gap by providing the operator the ability to intervene. The command doesn't take an allocation ID so that the operator doesn't have to keep track of alloc IDs that may have been GC'd. Handle this case in the unpublish RPC by sending the client RPC for all the terminal/nil allocs on the selected node.	2020-08-11 10:18:54 -04:00
Tim Gross	eaa14ab64c	csi: add unpublish RPC (#8572 ) This changeset is plumbing for a `nomad volume detach` command that will be reused by the volumewatcher claim GC as well.	2020-08-06 13:51:29 -04:00
Lars Lehtonen	f32e80175d	nomad: fix dropped test error (#8356 )	2020-07-06 08:46:54 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Tim Gross	ce86a594a6	csi: fix plugin counts on node update (#7844 ) In this changeset: * If a Nomad client node is running both a controller and a node plugin (which is a common case), then if only the controller or the node is removed, the plugin was not being updated with the correct counts. * The existing test for plugin cleanup didn't go back to the state store, which normally is ok but is complicated in this case by denormalization which changes the behavior. This commit makes the test more comprehensive. * Set "controller required" when plugin has `PUBLISH_READONLY`. All known controllers that support `PUBLISH_READONLY` also support `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this. * Only create plugins when the allocs for those plugins are healthy. If we allow a plugin to be created for the first time when the alloc is not healthy, then we'll recreate deleted plugins when the job's allocs all get marked terminal. * Terminal plugin alloc updates should cleanup the plugin. The client fingerprint can't tell if the plugin is unhealthy intentionally (for the case of updates or job stop). Allocations that are server-terminal should delete themselves from the plugin and trigger a plugin self-GC, the same as an unused node.	2020-05-05 15:39:57 -04:00
Tim Gross	e34f099d20	csi: read-repair CSI volume claims (#7824 ) The `CSIVolumeClaim` fields were added after 0.11.1, so claims made before that may be missing the value. Repair this when we read the volume out of the state store. The `NodeID` field was added after 0.11.0, so we need to ensure it's been populated during upgrades from 0.11.0.	2020-04-29 11:57:19 -04:00
Tim Gross	4e9bd1e1d1	refactor: consolidate private methods for CSI RPC (#7702 ) Follow-up for a method missed in the refactor for #7688. The `volAndPluginLookup` method is only ever called from the server's `CSI` RPC and never the `ClientCSI` RPC, so move it into that scope.	2020-04-13 10:46:43 -04:00
Tim Gross	f37e986b1b	refactor: make nodeForControllerPlugin private to ClientCSI (#7688 ) The current design of `ClientCSI` RPC requires that callers in the server know about the free-standing `nodeForControllerPlugin` function. This makes it difficult to send `ClientCSI` RPC messages from subpackages of `nomad` and adds a bunch of boilerplate to every server-side caller of a controller RPC. This changeset makes it so that the `ClientCSI` RPCs will populate and validate the controller's client node ID if it hasn't been passed by the caller, centralizing the logic of picking and validating controller targets into the `nomad.ClientCSI` struct.	2020-04-10 16:47:21 -04:00
Tim Gross	f6b3d38eb8	CSI: move node unmount to server-driven RPCs (#7596 ) If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the `NodeUnpublish` and `NodeUnstage` RPCs. This changeset addresses this issue with a redesign of the client-side for CSI. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the `Node.UpdateAlloc` for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job will make client RPCs to a new node plugin RPC endpoint, and only once that succeeds, move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job will fail and be requeued.	2020-04-02 16:04:56 -04:00
Lang Martin	24449e23af	csi: volume validate namespace (#7587 ) * nomad/state/state_store: enforce that the volume namespace exists * nomad/csi_endpoint_test: a couple of broken namespaces now * nomad/csi_endpoint_test: one more test * nomad/node_endpoint_test: use structs.DefaultNamespace * nomad/state/state_store_test: use DefaultNamespace	2020-04-02 10:13:41 -04:00
Lang Martin	50ff9ccd44	csi: plugin deregistration on plugin job GC (#7502 ) * nomad/structs/csi: delete just one plugin type from a node * nomad/structs/csi: add DeleteAlloc * nomad/state/state_store: add deleteJobFromPlugin * nomad/state/state_store: use DeleteAlloc not DeleteNodeType * move CreateTestCSIPlugin to state to avoid an import cycle * nomad/state/state_store_test: delete a plugin by deleting its jobs * nomad/_test: move CreateTestCSIPlugin to state nomad/state/state_store: update one plugin per transaction * command/plugin_status_test: move CreateTestCSIPlugin * nomad: csi: handle nils CSIPlugin methods, clarity	2020-03-26 17:07:18 -04:00
Mahmood Ali	b33dbe539b	tests: TestCSIPluginEndpoint_ACLNamespaceAlloc is ent TestCSIPluginEndpoint_ACLNamespaceAlloc uses namespace features not present in OSS.	2020-03-25 08:45:44 -04:00
Lang Martin	0847cb513c	csi: volume/plugin list should return an empty array, not nil (#7443 ) * nomad/csi_endpoint: return an empty list, not nil * nomad/csi_endpoint_test: volume list returns non-nil	2020-03-23 21:21:40 -04:00
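
The config-handling approach described in commit 3b57df33e3 above (a single config pointer guarded by a mutex, mutation only on a deep copy, and nil containers copied as nil while 0-element containers yield a new 0-element container) can be sketched roughly as follows. This is a minimal illustration, not the Nomad implementation: only the names `Client.config` and `Client.configLock` come from the commit message, and the `Config` fields, `copySlice`, `GetConfig`, and `UpdateConfig` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical stand-in for the client config struct.
type Config struct {
	Region  string
	Servers []string
}

// copySlice (hypothetical helper) copies nil as nil, but returns a new
// 0-element slice for a 0-element input instead of downgrading it to nil.
func copySlice(in []string) []string {
	if in == nil {
		return nil
	}
	out := make([]string, len(in))
	copy(out, in)
	return out
}

// Copy returns a deep copy of the config.
func (c *Config) Copy() *Config {
	if c == nil {
		return nil
	}
	nc := *c
	nc.Servers = copySlice(c.Servers)
	return &nc
}

// Client holds the only copy of the config; the mutex protects the
// pointer itself, not the fields inside the Config struct.
type Client struct {
	configLock sync.Mutex
	config     *Config
}

// GetConfig returns the current config pointer. Callers must treat the
// returned value as read-only.
func (c *Client) GetConfig() *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	return c.config
}

// UpdateConfig applies mutate to a deep copy and then overwrites the
// pointer while the mutex is held. Holders of the old pointer do not
// see the update.
func (c *Client) UpdateConfig(mutate func(*Config)) *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	next := c.config.Copy()
	mutate(next)
	c.config = next
	return next
}

func main() {
	c := &Client{config: &Config{Region: "global", Servers: []string{}}}
	old := c.GetConfig()
	c.UpdateConfig(func(cfg *Config) {
		cfg.Servers = append(cfg.Servers, "10.0.0.1:4647")
	})
	// The old snapshot is unchanged; the client now holds the updated copy.
	fmt.Println(len(old.Servers), len(c.GetConfig().Servers))
}
```

In this sketch a goroutine that fetched the config before the update keeps reading a consistent, unmodified snapshot, which matches the commit's note that alloc runners holding the old pointer will not see config updates.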

67 commits