open-nomad

Author	SHA1	Message	Date
Michael Schurter	fc1bb95ef8	Remove old comment; it's been fixed!	2019-01-14 09:56:53 -08:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Nick Ethier	82175d1328	client/drivermanager: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incoming fingerprint and task events from each driver. The manager exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint manager and has been removed.	2018-12-18 22:55:18 -05:00
Michael Schurter	8fa5e90095	consul: add ScriptExecutor context wrapper Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take a timeout duration instead of a context. This broke the script check removal code which used context cancelation propagation to remove script checks while they were executing. This commit adds a wrapper around ScriptExecutors that obeys context cancelation again. The only downside is that it leaks a goroutine until the underlying Exec call completes or times out. Since check removal is relatively rare, check timeouts usually low, and scripts usually fast, the risk of leaking a goroutine seems very small.	2018-12-03 20:26:31 -08:00
Michael Schurter	6459c19ffc	consul: fix script checks exiting after 1 run Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3 The removal of the inner context made the remaining cancels cancel the outer context and cause script checks to exit prematurely.	2018-12-03 18:50:02 -08:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Preetha Appan	18708d3f0b	Pass service metadata "external-source" for consul UI integration	2018-11-16 11:28:56 -06:00
Mahmood Ali	c7610d8c22	mark and skip failing consul tests	2018-11-13 10:21:40 -05:00
Michael Schurter	6bdbfb8129	tests: get consul integration tests building	2018-11-05 12:32:05 -08:00
Michael Schurter	d71a1b4547	tests: more fixes due to api changes	2018-10-29 15:25:22 -07:00
Michael Schurter	2b1b3d7e1e	tests: get tests building if not yet passing	2018-10-16 16:56:57 -07:00
Nick Ethier	f192c3752a	client: refactor post allocrunnerv2 finalization	2018-10-16 16:56:56 -07:00
Nick Ethier	4a4c7dbbfc	client: begin driver plugin integration client: fingerprint driver plugins	2018-10-16 16:56:56 -07:00
Alex Dadgar	7946a14aa8	Fix lints	2018-10-16 16:56:56 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	ca28afa3b2	small fixes	2018-09-15 16:42:38 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Alex Dadgar	90c2108bfb	Fix gc tests + parallel destroy + small test fixes	2018-06-12 10:23:45 -07:00
Preetha Appan	ce6d4a8d7a	Fix tests and move isClient to constructor	2018-06-01 15:59:53 -05:00
Preetha Appan	a5bfaa098c	Fix unnecessary deregistration in consul sync This commit fixes an issue where if a nomad client and server shared the same consul instance, the server would deregister any services and checks registered by clients for running tasks.	2018-06-01 14:48:25 -05:00
Alex Dadgar	51e67daf69	Use Tags when CanaryTags isn't specified This PR fixes a bug where we weren't defaulting to `tags` when `canary_tags` was empty and adds documentation.	2018-05-23 13:07:47 -07:00
Michael Schurter	f1d13683e6	consul: remove services with/without canary tags Guard against Canary being set to false at the same time as an allocation is being stopped: this could cause RemoveTask to be called with the wrong Canary value and leaking a service. Deleting both Canary values is the safest route.	2018-05-07 14:55:01 -05:00
Michael Schurter	50e04c976e	consul: support canary tags for services Also refactor Consul ServiceClient to take a struct instead of a massive set of arguments. Meant updating a lot of code but it should be far easier to extend in the future as you will only need to update a single struct instead of every single call site. Adds an e2e test for canary tags.	2018-05-07 14:55:01 -05:00
Michael Schurter	f6a4713141	consul: make grpc checks more like http checks	2018-05-04 11:08:11 -07:00
Michael Schurter	382caec1e1	consul: initial grpc implementation Needs to be more like http.	2018-05-04 11:08:11 -07:00
Michael Schurter	cfcbb9fa21	consul: periodically reconcile services/checks Periodically sync services and checks from Nomad to Consul. This is mostly useful when testing with the Consul dev agent which does not persist state across restarts. However, this is a reasonable safety measure to prevent skew between Consul's state and Nomad's services+checks. Also modernized the test suite a bit.	2018-04-19 15:45:42 -07:00
Michael Schurter	d3650fb2cd	test: build with mock_driver by default `make release` and `make prerelease` set a `release` tag to disable enabling the `mock_driver`	2018-04-18 14:45:33 -07:00
Michael Schurter	86f562be3a	Remove unnecessary conversions	2018-03-16 16:32:59 -07:00
Michael Schurter	c3e8f6319c	gofmt -s (simplify) files	2018-03-16 16:31:16 -07:00
Michael Schurter	0971114f0c	Replace Consul TLSSkipVerify handling Instead of checking Consul's version on startup to see if it supports TLSSkipVerify, assume that it does and only log in the job service handler if we discover Consul does not support TLSSkipVerify. The old code would break TLSSkipVerify support if Nomad started before Consul (such as on system boot) as TLSSkipVerify would default to false if Consul wasn't running. Since TLSSkipVerify has been supported since Consul 0.7.2, it's safe to relax our handling.	2018-03-14 17:43:06 -07:00
Josh Soref	1359fd2c3d	spelling: unexpected	2018-03-11 19:08:07 +00:00
Josh Soref	42d7f19861	spelling: supports	2018-03-11 19:00:11 +00:00
Josh Soref	05305afcd9	spelling: services	2018-03-11 18:53:58 +00:00
Josh Soref	ad55e85e73	spelling: registrations	2018-03-11 18:40:53 +00:00
Josh Soref	3c1ce6d16d	spelling: otherwise	2018-03-11 18:34:27 +00:00
Josh Soref	85fabc63c8	spelling: expected	2018-03-11 17:57:01 +00:00
Josh Soref	7a6dfa4b1a	spelling: current	2018-03-11 17:52:32 +00:00
Josh Soref	5f87691df1	spelling: asynchronously	2018-03-11 17:41:50 +00:00
Michael Schurter	8a0cf66822	Improve invalid port error message for services Related to #3681 If a user specifies an invalid port label when using address_mode=driver they'll get an error message about the label being an invalid number which is very confusing. I also added a bunch of testing around Service.AddressMode validation since I was concerned by the linked issue that there were cases I was missing. Unfortunately when address_mode=driver is used there's only so much validation that can be done as structs/structs.go validation never peeks into the driver config which would be needed to verify the port labels/map.	2018-01-18 15:35:24 -08:00
Michael Schurter	447dc5bbd3	Fix test	2018-01-18 15:35:24 -08:00
Michael Schurter	583e17fad5	Always advertise driver IP when in driver mode Fixes #3681 When in driver address mode Nomad should always advertise the driver's IP in Consul even when no network exists. This matches the 0.6 behavior. When in host address mode Nomad advertises the alloc's network's IP if one exists. Otherwise it lets Consul determine the IP. I also added some much needed logging around Docker's network discovery.	2018-01-18 15:35:24 -08:00
Michael Schurter	714eb0b266	Services should not require a port Fixes #3673	2017-12-19 15:50:23 -08:00
Michael Schurter	cdcefd0908	Use the Service.Hash() method in agent service ids The allocID and taskName parameters are useless for agents, but it's still nice to reuse the same hash method for agent and task services. This brings in the lowercase mode for the agent hash as well.	2017-12-11 16:50:15 -08:00
Michael Schurter	4f1002c1a8	Be more defensive in port checks	2017-12-08 12:27:57 -08:00
Michael Schurter	d613e0aaf5	Move service hash logic to Service.Hash method	2017-12-08 12:03:43 -08:00
Michael Schurter	b71edf846f	Hash fields used in task service IDs Fixes #3620 Previously we concatenated tags into task service IDs. This could break deregistration of tag names that contained double //s like some Fabio tags. This change breaks service ID backward compatibility so on upgrade all users services and checks will be removed and re-added with new IDs. This change has the side effect of including all service fields in the ID's hash, so we no longer have to track PortLabel and AddressMode changes independently.	2017-12-08 12:03:43 -08:00
Michael Schurter	91282315d1	Prevent using port 0 with address_mode=driver	2017-12-08 12:03:43 -08:00
Michael Schurter	4b20441eef	Validate port label for host address mode Also skip getting an address for script checks which don't use them. Fixed a weird invalid reserved port in a TaskRunner test helper as well as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause other tests to fail, but we were referencing an invalid PortLabel and just not catching it before.	2017-12-08 12:03:43 -08:00
Michael Schurter	4347026f83	Test Consul from TaskRunner thoroughly Rely less on the mockConsulServiceClient because the real consul.ServiceClient needs all the testing it can get!	2017-12-08 12:03:00 -08:00
Michael Schurter	4ae115dc59	Allow custom ports for services and checks Fixes #3380 Adds address_mode to checks (but no auto) and allows services and checks to set literal port numbers when using address_mode=driver. This allows SDNs, overlays, etc to advertise internal and host addresses as well as do checks against either.	2017-12-08 12:03:00 -08:00
Jens Herrmann	5680fcccc2	Fix typos in metric names. #3610	2017-12-01 15:24:14 +01:00
Michael Schurter	0aace3d749	Don't set Interval on TTL health checks	2017-10-16 17:35:47 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Michael Schurter	a844fba8d2	Fix comments: task -> check	2017-09-15 15:19:53 -07:00
Michael Schurter	0f2a3dcec9	Test check watch updates	2017-09-14 16:48:39 -07:00
Michael Schurter	847fe080f6	Rename unhealthy var and fix test indeterminism	2017-09-14 16:48:39 -07:00
Michael Schurter	573a0df03d	Watched -> TriggersRestart Watched was a silly name	2017-09-14 16:48:39 -07:00
Michael Schurter	4ea19baa52	Handle multiple failing checks on a single task Before this commit if a task had 2 checks cause restarts at the same time, both would trigger restarts of the task! This change removes all checks for a task whenever one of them is restarted.	2017-09-14 16:48:39 -07:00
Michael Schurter	73fb71ca10	RestartDelay isn't needed as checks are re-added on restarts @dadgar made the excellent observation in #3105 that TaskRunner removes and re-registers checks on restarts. This means checkWatcher doesn't need to do any internal restart tracking. Individual checks can just remove themselves and be re-added when the task restarts.	2017-09-14 16:48:39 -07:00
Michael Schurter	448ad3945f	Simplify from 2 select loops to one	2017-09-14 16:48:39 -07:00
Michael Schurter	550e631eea	Wrap check watch updates in a struct Reusing checkRestart for both adds/removes and the main check restarting logic was confusing.	2017-09-14 16:48:39 -07:00
Michael Schurter	72e5c0c0aa	Fix whitespace	2017-09-14 16:47:41 -07:00
Michael Schurter	ade29ecbed	Improve check watcher logging and add tests Also expose a mock Consul Agent to allow testing ServiceClient and checkWatcher from TaskRunner without actually talking to a real Consul.	2017-09-14 16:47:41 -07:00
Michael Schurter	a137676358	Add comments and move delay calc to TaskRunner	2017-09-14 16:46:54 -07:00
Michael Schurter	a180c00fc3	on_warning=false -> ignore_warnings=false Treat warnings as unhealthy by default	2017-09-14 16:46:54 -07:00
Michael Schurter	8a87475498	Use existing restart policy infrastructure	2017-09-14 16:46:54 -07:00
Michael Schurter	22690c5f4c	Add check watcher for restarting unhealthy tasks	2017-09-14 16:46:54 -07:00
Michael Schurter	7f6e1f3a9c	Initializing embedded structs is weird	2017-08-17 16:49:14 -07:00
Michael Schurter	0634eef12a	Test createCheckReg	2017-08-17 16:49:14 -07:00
Michael Schurter	bb8d5689d8	Add Header and Method support for HTTP checks	2017-08-17 16:44:21 -07:00
Alex Dadgar	43dff0a11d	Fix integration test	2017-08-14 10:52:49 -07:00
Alex Dadgar	6e20acb503	Merge pull request #2984 from hashicorp/b-tags Fix alloc health with checks using interpolation	2017-08-10 13:07:25 -07:00
Alex Dadgar	c8f74ac43b	Address comments	2017-08-10 13:07:08 -07:00
Alex Dadgar	d86b3977b9	Fix alloc health with checks using interpolation Fixes an issue in which the allocation health watcher was checking for allocations health based on un-interpolated services and checks. Change the interface for retrieving check information from Consul to retrieving all registered services and checks by allocation. In the future this will allow us to output nicer messages. Fixes https://github.com/hashicorp/nomad/issues/2969	2017-08-07 16:27:08 -07:00
Luke Farnell	f0ced87b95	fixed all spelling mistakes for goreport	2017-08-07 17:13:05 -04:00
Michael Schurter	5794e5ece7	Use int32 for atomic ops to avoid alignment issues From https://golang.org/pkg/sync/atomic/#pkg-note-BUG : On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a global variable or in an allocated struct or slice can be relied upon to be 64-bit aligned.	2017-08-04 10:14:16 -07:00
Michael Schurter	d2f8fdcad5	Fix comment	2017-07-25 12:13:05 -07:00
Michael Schurter	3e6231842d	Forgot to setcmdenv This would leak a consul agent	2017-07-25 12:09:57 -07:00
Michael Schurter	4b83eba599	Use seen more conservatively	2017-07-24 16:48:40 -07:00
Michael Schurter	cdf138eb27	Always increment failures... ...as it's used in calculating the backoff	2017-07-24 15:37:53 -07:00
Michael Schurter	809724ad8d	Track whether Consul has ever been seen Need a way to squelch Consul operation errors on shutdown. If it's never been seen don't log errors about deregs failing.	2017-07-24 12:12:02 -07:00
Michael Schurter	edbe62a879	Synchronously deregister agent on shutdown Fixes #2891 Previously the agent services and checks were being asynchronously deregistered on shutdown, so it was a race between the sync goroutine deregistering them and Nomad shutting down. This switches to synchronously deregister agent services and checks which doesn't really have a downside since the sync goroutine's retry behavior doesn't help on shutdown anyway.	2017-07-24 11:40:37 -07:00
Alex Dadgar	553bc91725	Parallel client tests (#2890) * alloc_runner * Random tests * parallel task_runner and no exec compatible check * Parallel client * Fail fast and use random ports * Fix docker port mapping * Make concurrent pull less timing dependent * up parallel * Fixes * don't build chroots in parallel on travis * Reduce parallelism on travis with lxc/rkt * make java test app not run forever * drop parallelism a little * use docker ports that are out of the os's ephemeral port range * Limit even more on travis * rkt deadline	2017-07-22 19:04:36 -07:00
Alex Dadgar	4dd5d943c7	remove root requirement on consul integration check	2017-07-21 19:32:41 -07:00
Michael Schurter	125a3fb2f9	Error -> Errorf	2017-07-19 10:00:57 -07:00
Michael Schurter	99d1486f32	Never remove unknown agent services Fixes #2827 This is a tradeoff. The pro is that you can run separate client and server agents on the same node and advertise both. The con is that if a Nomad agent crashes and isn't restarted on that node in the same mode its entry will not be cleaned up. That con scenario seems far less likely to occur than the scenario on the pro side, and even if we do leak an agent entry the checks will be failing so nothing should attempt to use it.	2017-07-18 13:23:01 -07:00
Alex Dadgar	bf2dafb8e9	check id method name changed	2017-07-07 12:15:09 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00
Michael Schurter	a863ead30e	Fix test error formats	2017-06-26 12:53:43 -07:00
Michael Schurter	9da78ae25f	Remove debug logging	2017-06-21 17:19:08 -07:00
Michael Schurter	c0eff81383	Fix Service.AddressMode changes during task updates	2017-06-21 17:19:08 -07:00
Michael Schurter	67d154a274	Test driver network advertisement and checks	2017-06-21 17:19:08 -07:00
Michael Schurter	b9bfb84b53	Implement DriverNetwork and Service.AddressMode Ideally DriverNetwork would be fully populated in Driver.Prestart, but Docker doesn't assign the container's IP until you start the container. However, it's important to setup the port env vars before calling Driver.Start, so Prestart should populate that.	2017-06-21 17:19:08 -07:00
Michael Schurter	06f937bf28	Merge pull request #2591 from hashicorp/b-2180-script-updates Properly interpolate services on updated tasks	2017-05-17 09:09:01 -07:00
Alex Dadgar	ba70cc4f01	Merge branch 'master' into f-bolt-db	2017-05-09 11:11:55 -07:00
Michael Schurter	85210eb92f	Update consul/api to support unix socket addrs Fixes #2594	2017-05-08 11:57:04 -07:00
Alex Dadgar	2d54ee2925	Fix tests	2017-05-03 15:14:19 -07:00
Alex Dadgar	9faa98e13b	Fix tests	2017-05-03 12:38:49 -07:00


234 commits