open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	8219c78667	nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses.	2020-01-31 19:04:14 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Nick Ethier	5636203d4e	consul: fix var name from rebase	2020-01-27 14:00:19 -05:00
Nick Ethier	0ae99b3c9c	consul: fix var name from rebase	2020-01-27 12:55:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Michael Schurter	9fed8d1bed	client: fix panic from 0.8 -> 0.10 upgrade makeAllocTaskServices did not do a nil check on AllocatedResources which causes a panic when upgrading directly from 0.8 to 0.10. While skipping 0.9 is not supported we intend to fix serious crashers caused by such upgrades to prevent cluster outages. I did a quick audit of the client package and everywhere else that accesses AllocatedResources appears to be properly guarded by a nil check.	2019-11-01 07:47:03 -07:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Tim Gross	e3e30c15a9	remove resolved TODO from UpdateTTL docstring (#6336 )	2019-09-16 16:26:06 -04:00
Tim Gross	0f29dcc935	support script checks for task group services (#6197 ) In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task. When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.	2019-09-03 15:09:04 -04:00
Evan Ercolano	fcf66918d0	Remove unused canary param from MakeTaskServiceID	2019-08-31 16:53:23 -04:00
Michael Schurter	67b7bc1e90	consul: ignore connect services when syncing Consul registers Connect services automatically, however Nomad thinks it owns them due to the _nomad prefix. Since the services are managed by Consul, Nomad needs to explicitly ignore them or otherwies they will be removed.	2019-08-30 11:53:41 -07:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Michael Schurter	db4de5fae9	Merge pull request #5975 from hashicorp/b-check-watcher-deadlock consul: fix deadlock in check-based restarts	2019-07-18 13:13:40 -07:00
Michael Schurter	6d095b3b36	consul: add test for check watcher deadlock	2019-07-18 08:24:09 -07:00
Michael Schurter	826d2503e6	Update command/agent/consul/check_watcher.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-07-18 07:08:27 -07:00
Michael Schurter	5407584bc3	consul: fix deadlock in check-based restarts Fixes #5395 Alternative to #5957 Make task restarting asynchronous when handling check-based restarts. This matches the pre-0.9 behavior where TaskRunner.Restart was an asynchronous signal. The check-based restarting code was not designed to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and could easily overwhelm the buffered update chan and deadlock. Many thanks to @byronwolfman for his excellent debugging, PR, and reproducer! I created this alternative as changing the functionality of TaskRunner.Restart has a much larger impact. This approach reverts to old known-good behavior and minimizes the number of places changes are made.	2019-07-17 15:22:21 -07:00
Mahmood Ali	ec7e258d71	address review feedback	2019-07-17 10:43:13 +07:00
Mahmood Ali	e07413c420	Avoid de-registering slowly restored services When a nomad client restarts/upgraded, nomad restores state from running task and starts the sync loop. If sync loop runs early, it may deregister services from Consul prematurely even when Consul has the running service as healthy. This is not ideal, as re-registering the service means potentially waiting a whole service health check interval before declaring the service healthy. We attempt to mitigate this by introducing an initialization probation period. During this time, we only deregister services and checks that were explicitly deregistered, and leave unrecognized ones alone. This serves as a grace period for restoring to complete, or for operators to restore should they recognize they restored with the wrong nomad data directory.	2019-06-14 11:15:21 -04:00
Danielle Lancashire	8112177503	consul: Include port-label in service registration It is possible to provide multiple identically named services with different port assignments in a Nomad configuration. We introduced a regression when migrating to stable service identifiers where multiple services with the same name would conflict, and the last definition would take precedence. This commit includes the port label in the stable service identifier to allow the previous behaviour where this was supported, for example providing: ```hcl service { name = "redis-cache" tags = ["global", "cache"] port = "db" check { name = "alive" type = "tcp" interval = "10s" timeout = "2s" } } service { name = "redis-cache" tags = ["global", "foo"] port = "foo" check { name = "alive" type = "tcp" port = "db" interval = "10s" timeout = "2s" } } service { name = "redis-cache" tags = ["global", "bar"] port = "bar" check { name = "alive" type = "tcp" port = "db" interval = "10s" timeout = "2s" } } ``` in a nomad task definition is now completely valid. Each service definition with the same name must still have a unique port label however.	2019-06-13 15:24:54 +02:00
Mahmood Ali	2ddc39973d	Merge pull request #5668 from hashicorp/flaky-test-20190430 fix flaky test by allowing for call invocation overhead	2019-05-13 12:33:44 -04:00
Danielle Lancashire	0da2924b2a	consul: Document example check id	2019-05-09 13:22:22 +02:00
Mahmood Ali	d405fcb093	fix flaky test by allowing for call invocation overhead	2019-05-08 18:04:37 -04:00
Danielle Lancashire	d824e00d1a	consul: Do not deregister external checks This commit causes sync to skip deregistering checks that are not managed by nomad, such as service maintenance mode checks. This is handled in the same way as service registrations - by doing a Nomad specific prefix match.	2019-05-02 16:54:18 +02:00
Danielle Lancashire	0b8e85118e	consul: Use a stable identifier for services The current implementation of Service Registration uses a hash of the nomad-internal state of a service to register it with Consul, this means that any update to the service invalidates this name and we then deregister, and recreate the service in Consul. While this behaviour slightly simplifies reasoning about service registration, this becomes problematic when we add consul health checks to a service. When the service is re-registered, so are the checks, which default to failing for at least one check period. This commit migrates us to using a stable identifier based on the allocation, task, and service identifiers, and uses the difference between the remote and local state to decide when to push updates. It uses the existing hashing mechanic to decide when UpdateTask should regenerate service registrations for providing to Sync, but this should be removable as part of a future refactor. It additionally introduces the _nomad-check- prefix for check definitions, to allow for future allowing of consul features like maintenance mode.	2019-05-02 16:54:18 +02:00
Michael Schurter	e3e1797850	consul: squelch noisy useless logs Only log when syncing actually did something.	2019-02-04 11:07:57 -08:00
Michael Schurter	fc1bb95ef8	Remove old comment; it's been fixed!	2019-01-14 09:56:53 -08:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incomming fingerprint and task events from each driver. The mananger exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint mananger and has been removed.	2018-12-18 22:55:18 -05:00
Michael Schurter	8fa5e90095	consul: add ScriptExecutor context wrapper Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take a timeout duration instead of a context. This broke the script check removal code which used context cancelation propagation to remove script checks while they were executing. This commit adds a wrapper around ScriptExecutors that obeys context cancelation again. The only downside is that it leaks a goroutine until the underlying Exec call completes or timeouts. Since check removal is relatively rare, check timeouts usually low, and scripts usually fast, the risk of leaking a goroutine seems very small.	2018-12-03 20:26:31 -08:00
Michael Schurter	6459c19ffc	consul: fix script checks exiting after 1 run Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3 The removal of the inner context made the remaining cancels cancel the outer context and cause script checks to exit prematurely.	2018-12-03 18:50:02 -08:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Preetha Appan	18708d3f0b	Pass service metadata "external-source" for consul UI integration	2018-11-16 11:28:56 -06:00
Mahmood Ali	c7610d8c22	mark and skip failing consul failing tests	2018-11-13 10:21:40 -05:00
Michael Schurter	6bdbfb8129	tests: get consul integration tests building	2018-11-05 12:32:05 -08:00
Michael Schurter	d71a1b4547	tests: more fixes due to api changes	2018-10-29 15:25:22 -07:00
Michael Schurter	2b1b3d7e1e	tests: get tests building if not yet passing	2018-10-16 16:56:57 -07:00
Nick Ethier	f192c3752a	client: refactor post allocrunnerv2 finalization	2018-10-16 16:56:56 -07:00
Nick Ethier	4a4c7dbbfc	client: begin driver plugin integration client: fingerprint driver plugins	2018-10-16 16:56:56 -07:00
Alex Dadgar	7946a14aa8	Fix lints	2018-10-16 16:56:56 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	ca28afa3b2	small fixes	2018-09-15 16:42:38 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00

1 2 3 4 5

215 Commits