open-nomad

Author	SHA1	Message	Date
Tim Gross	466b620fa4	CSI: volume snapshot	2021-04-01 11:16:52 -04:00
Tim Gross	71b9daffb9	CSI: fix misleading HTTP test The HTTP test to create CSI volumes depends on having a controller plugin to talk to, but the test was using a node-only plugin, which allows it to silently ignore the missing controller.	2021-03-31 16:37:09 -04:00
Tim Gross	9fc4cf1419	CSI: fingerprint detailed controller capabilities In order to support new controller RPCs, we need to fingerprint volume capabilities in more detail and perform controller RPCs only when the specific capability is present. This fixes a bug in Ceph support where the plugin can only suport create/delete but we assume that it also supports attach/detach.	2021-03-31 16:37:09 -04:00
Tim Gross	f149abfa41	CSI: volume creation/registration should not validate attachment The CSI specification requires that we validate a list of `Capability` (access mode + accessibility) when we create volume, but the existing volume registration workflow incorrectly validates a single capability. The specific capability required by a volume claim is checked at the time we make the claim, so remove the check for `AttachmentMode`/`AcccessMode`.	2021-03-31 16:37:09 -04:00
Tim Gross	aec5337862	CSI: HTTP handlers for create/delete/list	2021-03-31 16:37:09 -04:00
Tim Gross	d38008176e	CSI: create/delete/list volume RPCs This commit implements the RPC handlers on the client that talk to the CSI plugins on that client for the Create/Delete/List RPC.	2021-03-31 16:37:09 -04:00
Tim Gross	43622680fa	test infrastructure for mock client RPCs (#10193 ) This commit includes a new test client that allows overriding the RPC protocols. Only the RPCs that are passed in are registered, which lets you implement a mock RPC in the server tests. This commit includes an example of this for the ClientCSI RPC server.	2021-03-31 16:37:09 -04:00
Mahmood Ali	e24dac2f39	fixup! oversubscription: Add MemoryMaxMB to internal structs	2021-03-31 09:26:26 -04:00
Mahmood Ali	4ec2d8f5e4	fixup! oversubscription: Add MemoryMaxMB to internal structs	2021-03-31 08:52:20 -04:00
Mahmood Ali	0c2551270a	oversubscription: Add MemoryMaxMB to internal structs Start tracking a new MemoryMaxMB field that represents the maximum memory a task may use in the client. This allows tasks to specify a memory reservation (to be used by scheduler when placing the task) but use excess memory used on the client if the client has any. This commit adds the server tracking for the value, and ensures that allocations AllocatedResource fields include the value.	2021-03-30 16:55:58 -04:00
Nick Ethier	daecfa61e6	Merge pull request #10203 from hashicorp/f-cpu-cores Reserved Cores [1/4]: Structs and scheduler implementation	2021-03-29 14:05:54 -04:00
Mahmood Ali	dbc3850358	Merge pull request #10145 from hashicorp/b-periodic-init-status periodic: always reset periodic children status	2021-03-26 09:19:08 -04:00
Mahmood Ali	5d75705edd	dispatched parameterized job should clear status too	2021-03-25 15:14:21 -04:00
Mahmood Ali	e643742a38	Add a test for parameterized summary counts	2021-03-25 11:27:09 -04:00
Mahmood Ali	b0e048bfa4	periodic: always reset periodic children status Fixes a bug where Nomad reports negative or incorrect running children counts for periodic jobs. The periodic dispatcher derives a child job without reseting the status. If the periodic job has a `running` status, the derived job will start as `running` status and transition to `pending`. Since this is unexpected transition, the counting in StateStore.setJobSummary gets out of sync and result in negative/incorrect values. Note that this only affects periodic jobs after a leader transition. During the first job registration, the job is added with `pending` or `""` status. However, after a leader transition, the new leader repopulates the dispatcher heap with `"running"` status and triggers the bug.	2021-03-25 11:27:09 -04:00
Drew Bailey	74836b95b2	configuration and oss components for licensing (#10216 ) * configuration and oss components for licensing * vendor sync	2021-03-23 09:08:14 -04:00
Mahmood Ali	73b7b98018	testing: default nomad test nodes to 1.0.0 (#10213 )	2021-03-23 08:24:26 -04:00
Nick Ethier	648ade63ad	scheduler: implement scheduling of reserved cores	2021-03-19 00:29:07 -04:00
Nick Ethier	26b200e8bd	api: add new 'cores' field to task resources	2021-03-18 23:13:30 -04:00
Nick Ethier	4b2912d343	structs: add struct fields and funcs for reservable cpu cores	2021-03-18 22:49:06 -04:00
Tim Gross	fa25e048b2	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Tim Gross	9b2b580d1a	CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158 ) Callers of `CSIVolumeByID` are generally assuming they should receive a single volume. This potentially results in feasibility checking being performed against the wrong volume if a volume's ID is a prefix substring of other volume (for example: "test" and "testing"). Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix matching in the command line client. Add the required elements for prefix matching to the commands and API.	2021-03-18 14:32:40 -04:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Mahmood Ali	1d48433356	server: handle invalid jobs in expose handler hook (#10154 ) The expose handler hook must handle if the submitted job is invalid. Without this validation, the rpc handler panics on invalid input. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-03-10 09:12:46 -05:00
Tim Gross	7010a344d6	one-time token: never return expired tokens	2021-03-10 08:17:56 -05:00
Tim Gross	97b0e26d1f	RPC endpoints to support 'nomad ui -login' RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and `ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`). Includes adding expiration to the periodic core GC job.	2021-03-10 08:17:56 -05:00
Tim Gross	6b2c4b56d0	state store updates for one-time tokens The `OneTimeToken` struct is to support the `nomad ui -login` command. This changeset adds the struct to the Nomad state store.	2021-03-10 08:17:56 -05:00
Andre Ilhicas	f45fc6c899	consul/connect: enable setting local_bind_address in upstream	2021-02-26 11:37:31 +00:00
Drew Bailey	86d9e1ff90	Merge pull request #9955 from hashicorp/on-update-services Service and Check on_update configuration option (readiness checks)	2021-02-24 10:11:05 -05:00
Tim Gross	b764f52ab9	deploymentwatcher: reset progress deadline on promotion (#10042 ) In a deployment with two groups (ex. A and B), if group A's canary becomes healthy before group B's, the deadline for the overall deployment will be set to that of group A. When the deployment is promoted, if group A is done it will not contribute to the next deadline cutoff. Group B's old deadline will be used instead, which will be in the past and immediately trigger a deployment progress failure. Reset the progress deadline when the job is promotion to avoid this bug, and to better conform with implicit user expectations around how the progress deadline should interact with promotions.	2021-02-22 16:44:03 -05:00
James Rasell	6553cc3da7	drainer: fix error message when handling drain deadlined nodes.	2021-02-18 11:45:44 +01:00
AndrewChubatiuk	cd152643fb	fixed connect port label	2021-02-13 02:42:14 +02:00
AndrewChubatiuk	3d0aa2ef56	allocate sidecar task port on host_network interface	2021-02-13 02:42:13 +02:00
Nick Ethier	fcc1f4c805	Merge pull request #9946 from hashicorp/b-9477 structs: namespace port validation by host_network	2021-02-11 12:53:28 -05:00
Seth Hoenig	45e0e70a50	consul/connect: enable custom sidecars to use expose checks This PR enables jobs configured with a custom sidecar_task to make use of the `service.expose` feature for creating checks on services in the service mesh. Before we would check that sidecar_task had not been set (indicating that something other than envoy may be in use, which would not support envoy's expose feature). However Consul has not added support for anything other than envoy and probably never will, so having the restriction in place seems like an unnecessary hindrance. If Consul ever does support something other than Envoy, they will likely find a way to provide the expose feature anyway. Fixes #9854	2021-02-09 10:49:37 -06:00
Drew Bailey	8507d54e3b	e2e test for on_update service checks check_restart not compatible with on_update=ignore reword caveat	2021-02-08 08:32:40 -05:00
Drew Bailey	82f971f289	OnUpdate configuration for services and checks Allow for readiness type checks by configuring nomad to ignore warnings or errors reported by a service check. This allows the deployment to progress and while Consul handles introducing the sercive into a resource pool once the check passes.	2021-02-08 08:32:40 -05:00
Nick Ethier	eacc4da499	Merge branch 'master' into b-9477	2021-02-05 11:58:13 -05:00
Tim Gross	eb3dd17fb2	volumes: implement plan diff for volume requests The details of host volume and CSI volume requests do not show up in `nomad plan` outputs, although the updates are detected by the scheduler and result in an update as expected.	2021-02-04 16:55:17 -05:00
Chris Baker	ebbb760ec4	support for scaling_policy in global prefix search	2021-02-03 19:26:57 +00:00
Nick Ethier	43a4d72fda	structs: namespace port validation by host_network	2021-02-02 14:56:52 -05:00
Seth Hoenig	720780992c	consul/connect: copy bind address map if empty This parameter is now supposed to be non-nil even if empty, and the Copy method should also maintain that invariant.	2021-01-25 10:36:04 -06:00
Seth Hoenig	1ad219c441	consul/connect: remove debug line	2021-01-25 10:36:04 -06:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateay task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Drew Bailey	007158ee75	ignore setting job summary when oldstatus == newstatus (#9884 )	2021-01-25 10:34:27 -05:00
Drew Bailey	630babb886	prevent double job status update (#9768 ) * Prevent Job Statuses from being calculated twice https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval insertion iwth job (de-)registration. This change removes a now obsolete guard which checked if the index was equal to the job.CreateIndex, which would empty the status. Now that the job regisration eval insetion is atomic with the registration this check is no longer necessary to set the job statuses correctly. * test to ensure only single job event for job register * periodic e2e * separate job update summary step * fix updatejobstability to use copy instead of modified reference of job * update envoygatewaybindaddresses copy to prevent job diff on null vs empty * set ConsulGatewayBindAddress to empty map instead of nil fix nil assertions for empty map rm unnecessary guard	2021-01-22 09:18:17 -05:00
Kris Hicks	8f9e47a8e7	Clean up Task Validation tests (#9833 ) Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2021-01-21 11:53:02 -08:00
Dennis Schön	3eaf1432aa	validate connect block allowed only within group.service	2021-01-20 14:34:23 -05:00
Seth Hoenig	f213b8c51b	consul/connect: always set gateway proxy default timeout If the connect.proxy stanza is left unset, the connection timeout value is not set but is assumed to be, and may cause a non-fatal NPE on job submission.	2021-01-19 11:23:41 -06:00
Kris Hicks	d71a90c8a4	Fix some errcheck errors (#9811 ) * Throw away result of multierror.Append When given a multierror.Error, it is mutated, therefore the return value is not needed. Simplify MergeMultierrorWarnings, use StringBuilder * Hash.Write() never returns an error * Remove error that was always nil * Remove error from Resources.Add signature When this was originally written it could return an error, but that was refactored away, and callers of it as of today never handle the error. * Throw away results of io.Copy during Bridge * Handle errors when computing node class in test	2021-01-14 12:46:35 -08:00

1 2 3 4 5 ...

3645 commits