open-nomad

Author	SHA1	Message	Date
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
James Rasell	bb8514fc75	core: remove node service registrations when node is down. When a node fails its heartbeat, a number of actions are taken to ensure state is cleaned. Service registrations are loosely tied to nodes, therefore we should remove these from state when a node is considered terminally down.	2022-03-23 09:42:46 +01:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Tim Gross	33558cb51e	csi: fix handling of garbage collected node in node unpublish (#12350) When a node is garbage collected, we assume that the volume is no longer attached to it and ignore the `ErrUnknownNode` error. But we used `errors.Is` to check for a wrapped error, and RPC flattens the errors during serialization. This results in an error check that works in automated testing but not in real clusters. Use a string contains check instead.	2022-03-22 15:40:24 -04:00
Luiz Aoqui	f8973d364e	core: use the new Raft API when removing peers (#12340) Raft v3 introduced a new API for adding and removing peers that takes the peer ID instead of the address. Prior to this change, Nomad would use the remote peer Raft version for deciding which API to use, but this would not work in the scenario where a Raft v3 server tries to remove a Raft v2 server; the code running uses v3 so it's unable to call the v2 API. This change uses the Raft version of the server running the code to decide which API to use. If the remote peer is a Raft v2, it uses the server address as the ID.	2022-03-22 15:07:31 -04:00
Luiz Aoqui	b5a42cd55d	set raft v3 as the default config (#12341)	2022-03-22 15:06:25 -04:00
Tim Gross	60cfeacd76	drainer: defer CSI plugins until last (#12324) When a node is drained, system jobs are left until last so that operators can rely on things like log shippers running even as their applications are getting drained off. Include CSI plugins in this set so that Controller plugins deployed as services can be handled as gracefully as Node plugins that are running as system jobs.	2022-03-22 10:26:56 -04:00
Tim Gross	2a2ebd0537	CSI: presentation improvements (#12325) * Fix plugin capability sorting. The `sort.StringSlice` method in the stdlib doesn't actually sort, but instead constructs a sorting type which you call `Sort()` on. * Sort allocations for plugins by modify index. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad plugin status :id`, just like `nomad job status :id`. * Sort allocations for volumes in HTTP response. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad volume status :id`, just like `nomad job status :id`. This is implemented in the HTTP response and not in the state store because the state store maintains two separate lists of allocs that are merged before sending over the API. * Fix length of alloc IDs in `nomad volume status` output	2022-03-22 09:48:38 -04:00
James Rasell	68cd3d89fe	core: fixup node drain update message spelling.	2022-03-21 13:37:08 +01:00
James Rasell	042bf0fa57	client: hookup service wrapper for use within client hooks.	2022-03-21 10:29:57 +01:00
Seth Hoenig	4d86f5d94d	ci: limit gotestsum to circle ci Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255 This PR makes it so gotestsum is invoked only in CircleCI. Also the HCLogger(t) is plumbed more correctly in TestServer and TestAgent so that they respect NOMAD_TEST_LOG_LEVEL. The reason for these is we'll want to disable logging in GHA, where spamming the disk with logs really drags performance.	2022-03-18 09:15:01 -05:00
Luiz Aoqui	15089f055f	api: add related evals to eval details (#12305) The `related` query param is used to indicate that the request should return a list of related (next, previous, and blocked) evaluations. Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>	2022-03-17 13:56:14 -04:00
Luiz Aoqui	8db12c2a17	server: transfer leadership in case of error (#12293) When a Nomad server becomes the Raft leader, it must perform several actions defined in the establishLeadership function. If any of these actions fail, Raft will think the node is the leader, but it will not actually be able to act as a Nomad leader. In this scenario, leadership must be revoked and transferred to another server if possible, or the node should retry the establishLeadership steps.	2022-03-17 11:10:57 -04:00
Luiz Aoqui	83d834d84c	tests: move state store namespace tests from ENT (#12308)	2022-03-16 11:56:11 -04:00
Seth Hoenig	aca50349f4	Merge pull request #12299 from hashicorp/ci-parallel ci: trade test parallelization for unconstrained gomaxprocs	2022-03-16 08:55:39 -05:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	8cf599c7fc	fix alloc list test (#12297) The alloc list test with pagination was creating allocs before the target namespace existed. This works in OSS but fails in ENT because quotas are checked before the alloc can be created, so the namespace must exist beforehand.	2022-03-15 10:41:07 -04:00
James Rasell	dc1378d6eb	job: add native service discovery job constraint mutator.	2022-03-14 12:42:12 +01:00
James Rasell	783d7fdc31	jobspec: add service block provider parameter and validation.	2022-03-14 09:21:20 +01:00
Luiz Aoqui	2876739a51	api: apply consistent behaviour of the reverse query parameter (#12244)	2022-03-11 19:44:52 -05:00
Tim Gross	066a820209	job summary query in `Job.List` RPC should use job's namespace (#12249) The `Job.List` RPC attaches a `JobSummary` to each job stub. We're using the request namespace and not the job namespace for that query, which results in a nil `JobSummary` whenever we pass the wildcard namespace. This is incorrect and causes panics in downstream consumers like the CLI, which assume the `JobSummary` is non-nil as an unstated invariant.	2022-03-09 10:47:19 -05:00
Luiz Aoqui	550c5ab6ec	fix TestCSIVolumeEndpoint_List_PaginationFiltering test (#12245)	2022-03-09 09:40:40 -05:00
Luiz Aoqui	ab8ce87bba	Add pagination, filtering and sort to more API endpoints (#12186)	2022-03-08 20:54:17 -05:00
Michael Schurter	7bb8de68e5	Merge pull request #12138 from jorgemarey/f-ns-meta Add metadata to namespaces	2022-03-07 10:19:33 -08:00
Tim Gross	2dafe46fe3	CSI: allow updates to volumes on re-registration (#12167) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	3a692a4360	csi: get plugin ID for creating snapshot from volume, not args (#12195) The `CreateSnapshot` RPC expects a plugin ID to be set by the API, but in the common case of the `nomad volume snapshot create` command, we don't ask the user for the plugin ID because it's available from the volume we're snapshotting. Change the order of the RPC so that we get the volume first and then use the volume's plugin ID for the plugin if the API didn't set the value.	2022-03-07 09:06:50 -05:00
Jorge Marey	372ea7479b	Add changelog file. Add meta to ns mock for testing	2022-03-07 10:56:56 +01:00
Tim Gross	b776c1c196	csi: fix prefix queries for plugin list RPC (#12194) The `CSIPlugin.List` RPC was intended to accept a prefix to filter the list of plugins being listed. This was accidentally being done in the state store instead, which contributed to incorrect filtering behavior for plugins in the `volume plugin status` command. Move the prefix matching into the RPC so that it calls the prefix-matching method in the state store if we're looking for a prefix. Update the `plugin status` command to accept a prefix for the plugin ID argument so that it matches the expected behavior of other commands.	2022-03-04 16:44:09 -05:00
Luiz Aoqui	b1809eb48c	Fix CSI volume list with prefix and `*` namespace (#12184) When using a prefix value and the wildcard for namespace, the endpoint would not take the prefix value into consideration due to the order in which the checks were executed but also the logic for retrieving volumes from the state store. This commit changes the order to check for a prefix first and wraps the result iterator of the state store query in a filter to apply the prefix.	2022-03-03 17:27:04 -05:00
James Rasell	ca6ba2e047	rpc: add job service registration list RPC endpoint.	2022-03-03 11:26:14 +01:00
James Rasell	b68d573aa5	rpc: add alloc service registration list RPC endpoint.	2022-03-03 11:25:55 +01:00
James Rasell	1ad8ea558a	rpc: add service registration RPC endpoints.	2022-03-03 11:25:29 +01:00
James Rasell	52283f057f	fsm: add FSM functionality for service registration endpoints.	2022-03-03 11:24:29 +01:00
Luiz Aoqui	01931587ba	api: paginated results with different ordering (#12128) The paginator logic was built when go-memdb iterators would return items ordered lexicographically by their ID prefixes, but #12054 added the option for some tables to return results ordered by their `CreateIndex` instead, which invalidated the previous paginator assumption. The iterator used for pagination must still return results in some order so that the paginator can properly handle requests where the next_token value is not present in the results anymore (e.g., the eval was GC'ed). In these situations, the paginator will start the returned page in the first element right after where the requested token should've been. This commit moves the logic to generate pagination tokens from the elements being paginated to the iterator itself so that callers can have more control over the token format to make sure they are properly ordered and stable. It also allows configuring the paginator as being ordered in ascending or descending order, which is relevant when looking for a token that may not be present anymore.	2022-03-01 15:36:49 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129)	2022-03-01 10:15:46 -05:00
James Rasell	8a23afdb56	events: add state objects and logic for service registrations.	2022-02-28 10:44:58 +01:00
James Rasell	20249bb761	state: add service registration restore functionality.	2022-02-28 10:15:27 +01:00
James Rasell	74b367553e	state: add service registration state interaction functions.	2022-02-28 10:15:03 +01:00
James Rasell	cfdb5a3c66	structs: add service registration struct and basic composed funcs.	2022-02-28 10:14:40 +01:00
James Rasell	1da859c60e	mock: add service registration mock generation for test use.	2022-02-28 10:14:25 +01:00
James Rasell	cf0b63d561	state: add the table schema for the service_registrations table.	2022-02-28 10:14:10 +01:00
Jorge Marey	a466f01120	Add metadata to namespaces	2022-02-27 09:09:10 +01:00
Seth Hoenig	1274aa690f	tests: deflake test that joins a server with non-voting servers to form quorum This PR: upgrades the serf library; has the test start the join process using the un-joined server first; disables schedulers on the servers; uses the WaitForLeader and wantPeers helpers. Not sure which, if any, of these actually improves the flakiness of this test.	2022-02-24 17:02:58 -06:00
Tim Gross	cfe3117af8	CSI: enforce usage at claim time (#12112) * Remove redundant schedulable check in `FreeWriteClaims`. If a volume has been created but not yet claimed, its capabilities will be checked in `WriteSchedulable` at both scheduling time and claim time. We don't need to also check them in the `FreeWriteClaims` method. * Enforce maximum volume claims for writers. When the scheduler checks feasibility for CSI volumes, the check is fairly loose: earlier versions of the same job are not counted as active claims. This allows the scheduler to place new allocations for the new version of a job, under the assumption that we'll replace the existing allocations and their volume claims. But when the alloc runner claims the volume, we need to enforce the active claims even if they're for allocations of an earlier version of the job. Otherwise we'll try to mount a volume that's currently being unmounted, and this will cause replacement allocations to frequently fail. * Enforce single-node reader check for read-only volumes. When the alloc runner makes a claim for a read-only volume, we only check that the volume is potentially schedulable and not that it actually has free read claims.	2022-02-24 09:37:37 -05:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084)	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	57b9c64b8f	Merge pull request #12107 from hashicorp/use-bbolt core: swap bolt impl and enable configuring raft freelist sync behavior	2022-02-24 08:25:54 -06:00
Tim Gross	5b7b9fdafb	csi: tolerate missing plugins on job delete (#12114) If a plugin job fails before successfully fingerprinting the plugins, the plugin will not exist when we try to delete the job. Tolerate missing plugins.	2022-02-24 08:53:15 -05:00
Seth Hoenig	de95998faa	core: switch to go.etcd.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Tim Gross	17dc0adee3	csi: fix broken test (#12110)	2022-02-23 13:48:39 -05:00


3901 commits