open-nomad

Author	SHA1	Message	Date
Danielle Lancashire	564f5cec93	csimanager: Fingerprint controller capabilities	2020-03-23 13:58:29 -04:00
Danielle Lancashire	9a23e27439	client_csi: Validate Access/Attachment modes	2020-03-23 13:58:28 -04:00
Danielle Lancashire	2fc65371a8	csi: ClientCSIControllerPublish* -> ClientCSIControllerAttach*	2020-03-23 13:58:28 -04:00
Danielle Lancashire	259852b05f	csi: Model Attachment and Access modes	2020-03-23 13:58:28 -04:00
Danielle Lancashire	2c29b1c53d	client: Setup CSI RPC Endpoint This commit introduces a new set of endpoints to a Nomad Client: ClientCSI. ClientCSI is responsible for mediating requests from a Nomad Server to a CSI Plugin running on a Nomad Client. It should only really be used to make controller RPCs.	2020-03-23 13:58:28 -04:00
Danielle Lancashire	426c26d7c0	CSI Plugin Registration (#6555) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog, which requires `go-plugin`-type plugins and advance knowledge of the plugin configuration. * The `csimanager`, which fingerprints CSI plugins in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features: * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shut down yet; they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Mahmood Ali	5801039214	address review feedback	2020-03-21 17:52:58 -04:00
Mahmood Ali	e1f53347e9	tr: proceed to mark other tasks as dead if alloc fails	2020-03-21 17:52:58 -04:00
Mahmood Ali	e30d26b404	fix test	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	73a64e4397	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	89778bc88d	fix restart policy for system jobs with no lifecycle	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	56e0b8e933	refactor TaskHookCoordinator tests to use mock package and add failed init and sidecar test cases	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	2a8dac077c	remove debugging test code from TestAllocRunner_TaskLeader_StopRestoredTG	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	deb26aefab	fix bug in lifecycle restore tests after refactor	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	2e93d7a875	fix failing ci test: TestTaskRunner_UnregisterConsul_Retries	2020-03-21 17:52:54 -04:00
Jasmine Dahilig	d54a83afee	fix linting errors	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	3d1ffb9337	add task hook coordinator many init tasks test case	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	80f0256cb4	refactor task hook coordinator helper method and tests	2020-03-21 17:52:53 -04:00
Jasmine Dahilig	a0fe570317	clean up restore test	2020-03-21 17:52:52 -04:00
Jasmine Dahilig	7ed08eb75a	partial test for restore functionality	2020-03-21 17:52:52 -04:00
Jasmine Dahilig	0c44d0017d	account for client restarts in task lifecycle hooks	2020-03-21 17:52:51 -04:00
Jasmine Dahilig	4ab39318cc	clean up restart conditions and restart tests for task lifecycle	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	7064deaafb	put lifecycle nil and empty checks in api Canonicalize	2020-03-21 17:52:50 -04:00
Jasmine Dahilig	c27223207c	update task hook coordinator tests	2020-03-21 17:52:46 -04:00
Jasmine Dahilig	12393f90e7	add test for lifecycle coordinator	2020-03-21 17:52:42 -04:00
Jasmine Dahilig	b9a258ed7b	incorporate lifecycle into restart tracker	2020-03-21 17:52:40 -04:00
Mahmood Ali	d7354b8920	Add a coordinator for alloc runners	2020-03-21 17:52:38 -04:00
Yoan Blanc	67692789b7	vendor: vault api and sdk Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-21 17:57:48 +01:00
Mahmood Ali	92712c48eb	Merge pull request #7236 from hashicorp/b-remove-rkt Remove rkt as a built-in driver	2020-03-17 09:07:35 -04:00
Mahmood Ali	d59f149597	Update gopsutil code Latest gopsutil includes two backward-incompatible changes: First, it removed the unused Stolen field in `cae8efcffa (diff-d9747e2da342bdb995f6389533ad1a3d)`. Second, it updated the Windows CPU stats calculation to be in line with other platforms, where it returns absolute stats rather than percentages. See https://github.com/shirou/gopsutil/pull/611.	2020-03-15 09:37:05 +01:00
Yoan Blanc	f85cbddaf1	gopsutils: v2.20.2 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-15 09:36:59 +01:00
Michael Schurter	b72b3e765c	Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade Update consul-template to v0.24.1 and remove deprecated vault grace	2020-03-10 14:15:47 -07:00
Mahmood Ali	21e19ef40d	Merge pull request #7255 from hashicorp/vendor-update-grpc-20200302 update grpc	2020-03-04 09:32:16 -05:00
Mahmood Ali	88cfe504a0	update grpc Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.	2020-03-03 08:39:54 -05:00
Mahmood Ali	acbfeb5815	Simplify Bootstrap logic in tests This change updates tests to honor `BootstrapExpect` exclusively when forming test clusters and removes test-only knobs, e.g. `config.DevDisableBootstrap`. Background: Test cluster creation is fragile. Test servers don't follow the `BootstrapExpect` route like production clusters. Instead, they start as single-node clusters and then rejoin, which risks causing a split brain or other test flakiness. The test framework exposes a few knobs (e.g. `config.DevDisableBootstrap` and `config.Bootstrap`) that control whether a server should bootstrap the cluster. These flags are confusing and it's unclear when to use them; their usage in multi-node clusters isn't properly documented. Furthermore, they have some bad side effects, as they don't control the Raft library: if `config.DevDisableBootstrap` is true, the test server may not immediately attempt to bootstrap a cluster, but after an election timeout (~50ms), Raft may force a leadership election and win it (with only one vote), causing a split brain. The knobs are also confusing because Bootstrap is an overloaded term. In BootstrapExpect, we refer to bootstrapping the cluster only after N servers are connected. But in tests and the knobs above, it refers to whether the server is a single-node cluster that shouldn't wait for any other server. Changes: This commit makes two changes. First, it relies on `BootstrapExpect` instead of the `Bootstrap` and/or `DevMode` flags. This change is relatively trivial. Second, it introduces a `Bootstrapped` flag to track whether the cluster is bootstrapped. This allows us to keep `BootstrapExpect` immutable. Previously, `BootstrapExpect` was a config value that got reset to 0 after cluster bootstrap completed.	2020-03-02 13:47:43 -05:00
Mahmood Ali	a8d6950007	Remove rkt as a built-in driver Rkt has been archived and is no longer an active project: * https://github.com/rkt/rkt * https://github.com/rkt/rkt/issues/4024 The rkt driver will continue to live as an external plugin.	2020-02-26 22:16:41 -05:00
Fredrik Hoem Grelland	edb3bd0f3f	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170)	2020-02-23 16:24:53 +01:00
Nick Ethier	eb9c8593ba	Merge pull request #7163 from hashicorp/b-driver-plugin-recovery drivermanager: attempt dispense on reattachment failure	2020-02-21 10:33:20 -05:00
Mahmood Ali	98ad59b1de	update rest of consul packages	2020-02-16 16:25:04 -06:00
Nick Ethier	d8eed3119d	drivermanager: attempt dispense on reattachment failure	2020-02-15 00:50:06 -05:00
Seth Hoenig	543354aabe	Merge pull request #7106 from hashicorp/f-ctag-override client: enable configuring enable_tag_override for services	2020-02-13 12:34:48 -06:00
Michael Schurter	8c332a3757	Merge pull request #7102 from hashicorp/test-limits Fix some race conditions and flaky tests	2020-02-13 10:19:11 -08:00
Seth Hoenig	7f33b92e0b	command: use consistent CONSUL_HTTP_TOKEN name Consul CLI uses CONSUL_HTTP_TOKEN, so Nomad should use the same. Note that consul-template uses CONSUL_TOKEN, which Nomad also uses, so be careful to preserve any reference to that in the consul-template context.	2020-02-12 10:42:33 -06:00
Seth Hoenig	0e44094d1a	client: enable configuring enable_tag_override for services Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition. Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable. Because Nomad itself acts as a state machine around the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why. Consider the basic case where there is no Nomad, and your service is provided to Consul as a plain JSON file. The ultimate source of truth for the definition of that service is the file, and the definition is stored in the agent. Later, Consul performs "anti-entropy", which synchronizes the Catalog (stored only on the leaders). Then, with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed and the config reloaded, OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth. In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-syncs with Consul will no longer try to restore the externally modified tag values (as long as enable_tag_override=true). Fixes #2057	2020-02-10 08:00:55 -06:00
Michael Schurter	2896f78f77	client: fix race accessing Node.status * Call Node.Canonicalize once when Node is created. * Lock when accessing fields mutated by node update goroutine	2020-02-07 15:50:47 -08:00
Seth Hoenig	db7bcba027	tests: set consul token for nomad client for testing SIDS TR hook	2020-01-31 19:06:15 -06:00
Seth Hoenig	9b20ca5b25	e2e: setup consul ACLs a little more correctly	2020-01-31 19:06:11 -06:00
Seth Hoenig	4152254c3a	tests: skip some SIDS hook tests if running tests as root	2020-01-31 19:05:32 -06:00
Seth Hoenig	441e8c7db7	client: additional test cases around failures in SIDS hook	2020-01-31 19:05:27 -06:00
Seth Hoenig	c281b05fc0	client: PR cleanup - improved logging around kill task in SIDS hook	2020-01-31 19:05:23 -06:00

4044 commits