open-nomad

Author	SHA1	Message	Date
Mahmood Ali	967452a3f0	fifo: Use plain fifo file in Unix This PR switches to using plain fifo files instead of golang structs managed by the containerd/fifo library. The library's main benefit is management of opening fifo files. In Linux, a reader `open()` request would block until a writer opens the file (and vice-versa). The library uses goroutines so that it's the first IO operation that blocks. This benefit isn't really useful for us: Given that logmon simply streams output in a separate process, blocking on open or on first read is effectively the same. The library additionally adds complications for managing state and tracking read/write permissions that seem like overhead for our use, compared to using a file directly. While here, I made the following incidental changes: * document that we do handle the case where fifo files are already created, as we rely on that behavior for logmon restarts * use the type system to lock read vs write: currently, the fifo library returns `io.ReadWriteCloser` even if the fifo is opened for writing only!	2019-04-01 13:18:03 -04:00
Michael Schurter	a4572919cd	Merge pull request #5456 from hashicorp/test-taskenv tests: port pre-0.9 task env tests	2019-03-25 10:41:38 -07:00
Michael Schurter	8efad12538	tests: port pre-0.9 task env tests I chose to make them more of integration tests since there's a lot more plumbing involved. The internal implementation details of how we craft task envs can now change and these tests will still properly assert the task runtime environment is set up properly.	2019-03-25 09:46:53 -07:00
Michael Schurter	9afbc45cff	Bump to dev post-0.9.0-rc1 release	2019-03-22 08:26:30 -07:00
Nomad Release bot	3ab3dd4105	Generate files for 0.9.0-rc1 release	2019-03-21 19:06:13 +00:00
Mahmood Ali	b08a2744f8	Merge pull request #5428 from hashicorp/b-dropped-logs-on-task-restart client/logmon: restart log collection correctly when a task is restarted	2019-03-21 14:02:08 -04:00
Mahmood Ali	729458f110	fix TestLogmon_Start_restart	2019-03-21 13:36:46 -04:00
Nick Ethier	b252d712df	logmon: fix test assertion	2019-03-20 21:37:17 -04:00
Nick Ethier	c1f5011181	logmon: remove sleeps from tests	2019-03-20 10:45:09 -04:00
Nick Ethier	e14041bdec	logmon: add tests for rotation and open/closing of fifos	2019-03-19 14:41:23 -04:00
Nick Ethier	dc18b8928a	logmon: make Start rpc idempotent and simplify hook	2019-03-19 14:02:36 -04:00
Nick Ethier	ac7fbee1b8	logmon: add static check for logmon exited hook	2019-03-18 15:59:43 -04:00
Nick Ethier	7dc3d83634	client/logmon: restart log collection correctly when a task is restarted	2019-03-15 23:59:18 -04:00
Mahmood Ali	fb55717b0c	Regenerate Proto files (#5421) Noticed that the protobuf files are out of sync with the ones generated by the 1.2.0 protoc go plugin. The cause seems to be related to release processes, e.g. [0.9.0-beta1 preparation](`ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)`), and [0.9.0-beta3 preparation](`b849d84f2f`). This restores the changes to that of the pinned protoc version and fails the build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085	2019-03-14 10:56:27 -04:00
Michael Schurter	b126e9eec4	Merge pull request #5386 from hashicorp/b-logmon-stop Fix task/logmon leak after crash	2019-03-12 15:23:02 -07:00
Michael Schurter	0ba1a5251b	client: cleanup and document context uses Some of the context uses in TR hooks are useless (Killed during Stop never seems meaningful). None of the hooks are interruptible for graceful shutdown which is unfortunate and probably needs fixing.	2019-03-12 15:03:54 -07:00
Mahmood Ali	8deb532be2	run TestAllocations_Stats in CI	2019-03-08 07:57:37 -05:00
Michael Schurter	32d31575cc	client: emit event and call exited hooks during cleanup Builds upon earlier commit that cleans up restored handles of terminal allocs by also emitting terminated events and calling exited hooks when appropriate.	2019-03-05 15:12:02 -08:00
Michael Schurter	a4bc46b6e6	test: fix NewMemDB API change	2019-03-04 13:37:20 -08:00
Michael Schurter	64e145ebdb	logmon: drop reattach log level as it's expected Logged once per terminal task on agent restart.	2019-03-04 13:26:01 -08:00
Michael Schurter	c5271d3fa5	client: test logmon cleanup The test is sadly quite complicated and peeks into things (logmon's reattach config) AR doesn't normally have access to. However, I couldn't find another way of asserting logmon got cleaned up without resorting to smaller unit tests. Smaller unit tests risk re-implementing dependencies in an unrealistic way, so I opted for an ugly integration test.	2019-03-04 13:15:15 -08:00
Preetha Appan	0e547d29ad	s/mananger/manager	2019-03-04 12:25:54 -06:00
Michael Schurter	ef8d284352	client: ensure task is cleaned up when terminal This commit is a significant change. TR.Run is now always executed, even for terminal allocations. This was changed to allow TR.Run to clean up (run stop hooks) if a handle was recovered. This is intended to handle the case of Nomad receiving a DesiredStatus=Stop allocation update, persisting it, but crashing before stopping AR/TR. The commit also renames task runner hook data as it was very easy to accidentally set state on Requests instead of Responses using the old field names.	2019-03-01 14:00:23 -08:00
Michael Schurter	3f386e3951	Remove generated files for 0.9.0-beta3	2019-02-26 10:34:08 -08:00
Michael Schurter	d74755900e	Generate files for 0.9.0-beta3 release	2019-02-26 09:44:49 -08:00
Michael Schurter	812f1679e2	Merge pull request #5352 from hashicorp/b-leaked-logmon logmon fixes	2019-02-26 08:35:46 -08:00
Michael Schurter	e39a10a1f4	tests: move unix-specific test to its own file Other logmon tests should be portable.	2019-02-26 07:56:44 -08:00
Mahmood Ali	45b6392d4e	tests: port some fingerprint tests from 0.8 (#5359) Port some integration tests of driver fingerprinting. Some tests (e.g. `TestFingerprintManager_Run_DriversInBlacklist`) have been substituted by more isolated tests in `client/pluginmanager/drivermanager/manager_test.go`	2019-02-26 10:54:16 -05:00
Michael Schurter	3b2a592e93	client: restart task on logmon failures This code chooses to be conservative as opposed to optimal: when failing to reattach to logmon simply return a recoverable error instead of immediately trying to restart logmon. The recoverable error will cause the task's restart policy to be applied and a new logmon will be launched upon restart. Trying to do the optimal approach of simply starting a new logmon requires error string comparison and should be tested against a task actively logging to assert the behavior (are writes blocked? dropped?).	2019-02-25 15:42:45 -08:00
Michael Schurter	8830b00866	client: test logmon_hook	2019-02-23 15:36:48 -08:00
Preetha Appan	43679f4ce1	More alloc runner tests ported from 0.8.7	2019-02-22 17:58:06 -06:00
Mahmood Ali	32551fb0e5	emit TaskRestartSignal event on vault restart When Vault token expires and task is restarted, emit `TaskRestartSignal` similar to v0.8.7	2019-02-22 15:56:14 -05:00
Mahmood Ali	8cb4bbcc08	address review comments	2019-02-22 15:56:14 -05:00
Mahmood Ali	216eaa4843	tests: port TestTaskRunner_VaultManager_Signal From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1427	2019-02-22 15:53:04 -05:00
Mahmood Ali	8e9e732319	tests: port TestTaskRunner_VaultManager_Restart From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1352	2019-02-22 15:53:04 -05:00
Mahmood Ali	33122ca7c0	tests: port TestTaskRunner_UnregisterConsul_Retries From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L620	2019-02-22 15:53:04 -05:00
Mahmood Ali	0128b0ce7a	tests: port TestTaskRunner_Template_NewVaultToken From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1275	2019-02-22 15:53:04 -05:00
Mahmood Ali	cfb80583af	tests: port TestTaskRunner_Template_Artifact From https://github.com/hashicorp/nomad/blob/v0.8.7/client/task_runner_test.go#L1195	2019-02-22 15:52:59 -05:00
Mahmood Ali	1b14214a88	tests: port TestAllocRunner_RetryArtifact Port TestAllocRunner_RetryArtifact from https://github.com/hashicorp/nomad/blob/v0.8.7/client/alloc_runner_test.go#L610-L672 I changed the test name because it doesn't actually test that artifact hooks is retried	2019-02-22 15:50:39 -05:00
Mahmood Ali	c827e6e05a	tests: port TestAllocRunner_MoveAllocDir test	2019-02-22 15:50:39 -05:00
Michael Schurter	a2e3ea6dc9	logmon: fix reattach configuration There were multiple bugs here: 1. Reattach unmarshalling always returned an error because you can't unmarshal into a nil pointer. 2. The hook data wasn't being saved because it was put on the request struct, not the response struct. 3. The plugin configuration should only have reattach or a command set. Not both. 4. Setting Done=true meant the hook was never re-run on agent restart so reattaching was never attempted.	2019-02-21 15:32:18 -08:00
Michael Schurter	f5e0dba9d1	fingerprint: improve initial fingerprint message The initial fingerprint message is actually fairly useful, so I bumped it to Debug and fixed the output formatting.	2019-02-21 15:32:18 -08:00
Michael Schurter	01cabdff88	client: restart on recoverable StartTask errors Fixes restarting on recoverable errors from StartTask. Ports TestTaskRunner_Run_RecoverableStartError from 0.8 which discovered the bug.	2019-02-21 15:30:49 -08:00
Michael Schurter	e3f321cd27	test: port TestTaskRunner_RestartSignalTask_NotRunning from 0.8	2019-02-21 15:30:49 -08:00
Michael Schurter	f3aa945a00	test: port TestTaskRunner_DriverNetwork from 0.8	2019-02-21 15:30:49 -08:00
Michael Schurter	518405ac33	Merge pull request #5322 from hashicorp/b-artifact-retries Fix regression by restarting on artifact download errors	2019-02-21 15:28:51 -08:00
Mahmood Ali	6d30284ec9	Merge pull request #5341 from hashicorp/ci-windows-docker Run Docker tests in Windows AppVeyor CI	2019-02-21 13:17:33 -05:00
Michael Schurter	2553800eb8	tests: port TestAllocRunner_Destroy from 0.8 Also add destroy(ar) helper to fix a bunch of shutdown races in AR tests.	2019-02-20 12:35:09 -08:00
Michael Schurter	6580ed668e	client: don't redownload completed artifacts on retries Track the download status of each artifact independently so that if only one of many artifacts fails to download, completed artifacts aren't downloaded again.	2019-02-20 08:45:12 -08:00
Michael Schurter	908bfab4c2	client: artifact errors are retry-able 0.9.0beta2 contains a regression where artifact download errors would not cause a task restart and instead immediately fail the task. This restores the pre-0.9 behavior of retrying all artifact errors and adds missing tests.	2019-02-20 07:21:27 -08:00
Michael Schurter	79ccf00b72	tests: add new task runner test helper Adds a new helper and removes a duplicated test.	2019-02-20 07:21:27 -08:00
Mahmood Ali	33ff8c3e8d	tests: expect Docker on AppVeyor Prepare to run docker on AppVeyor Windows environment	2019-02-20 07:41:47 -05:00
Michael Schurter	159042a1a3	client: fix setting alloc unhealthy at deadline During the 0.9 client refactor the code to fail a deployment when the deadline was reached was broken. This restores and tests that behavior.	2019-02-19 07:44:14 -08:00
Mahmood Ali	87be233aca	test: improve readability of duration Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-02-14 08:12:06 -08:00
Mahmood Ali	16d3414842	test: improve failure message Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-02-14 08:11:37 -08:00
Michael Schurter	4814f0fb0b	tests: port TestTaskRunner_Download_List from 0.8	2019-02-12 15:48:04 -08:00
Michael Schurter	a152e3ef17	consul: fix task deregistration hook Broke ShutdownDelay but the test was timing dependent so it just appeared flaky. Made the test slower so that it should never incorrectly pass.	2019-02-12 15:36:02 -08:00
Michael Schurter	4ad879e75e	tests: port TaskRunner_DeriveToken tests from 0.8	2019-02-12 15:36:02 -08:00
Michael Schurter	6743ed9fdc	tests: port TestTaskRunner_BlockForVault from 0.8 Also fix race conditions in the mock vault client.	2019-02-12 13:46:09 -08:00
Michael Schurter	6c0cc65b2e	simplify hcl2 parsing helper No need to pass in the entire eval context	2019-02-04 11:07:57 -08:00
Michael Schurter	fec2752fb2	client: log when allocs have been processed Will hopefully help us catch deadlocks/livelocks/slowdowns in the add/remove allocs pipeline which should be fast.	2019-02-04 11:07:57 -08:00
Michael Schurter	2db91425e3	Remove 0.9.0-beta2 generated files	2019-02-01 08:28:44 -08:00
Alex Dadgar	84d0afccae	Generate files for 0.9.0-beta2	2019-01-30 13:31:50 -08:00
Alex Dadgar	449e582ffc	Merge pull request #5281 from hashicorp/f-affinity-weight-int Change types of weights on spread/affinity	2019-01-30 13:25:56 -08:00
Alex Dadgar	d2e5ede119	remove generated structs	2019-01-30 12:38:34 -08:00
Nick Ethier	e7ea26449e	client: fix bug during 0.8 state upgrade that causes external drivers to fail	2019-01-30 14:22:29 -05:00
Alex Dadgar	bc804dda2e	Nomad 0.9.0-beta1 generated code	2019-01-30 10:49:44 -08:00
Alex Dadgar	5062c54874	Fix usage of fsi variable	2019-01-29 14:07:55 -08:00
Alex Dadgar	6f418ebaf0	Always populate task dir environment variables Fixes an issue where if a task was restarted after restarting the client, the task dir environment variables would not be populated. This PR fixes this for both upgrades from 0.8.X and for normal 0.9 restarts.	2019-01-29 13:17:10 -08:00
Nick Ethier	bcbed3c532	Merge pull request #5248 from hashicorp/b-rawexec-leak Fix leaked executor in raw_exec	2019-01-28 21:18:31 -05:00
Alex Dadgar	5da21635fb	Fix env templates having interpolated destinations Fixes an issue where env templates that had interpolated destinations would not work. Fixes https://github.com/hashicorp/nomad/issues/5250	2019-01-28 10:28:53 -08:00
Nick Ethier	8d7a47340c	drivermanager: don't store nil reattach configs	2019-01-25 23:07:04 -05:00
Alex Dadgar	d6412fd8e7	Fix double restart counting for templates This PR fixes an issue where template restarts would count twice since it was emitting a restarting event.	2019-01-25 15:38:13 -08:00
Nick Ethier	be976d9c9a	Merge branch 'master' into f-driver-upgradepath-test * master: (23 commits) tests: avoid assertion in goroutine; spell check; ci: run checkscripts; tests: deflake TestRktDriver_StartWaitRecoverWaitStop; drivers/rkt: Remove unused github.com/rkt/rkt; drivers/rkt: allow development on non-linux; cli: Hide `nomad docker_logger` from help output; api: test api and structs are in sync; goimports until make check is happy; nil check node resources to prevent panic; tr: use context in as select statement; move pluginutils -> helper/pluginutils; vet; goimports; gofmt; Split hclspec; move hclutils; Driver tests do not use hcl2/hcl, hclspec, or hclutils; move reattach config; loader and singleton; ...	2019-01-23 21:01:24 -05:00
Nick Ethier	5b9013528e	drivers: add docker upgrade path and e2e test	2019-01-23 14:44:42 -05:00
Nick Ethier	a36c4320ff	Merge pull request #5227 from hashicorp/b-client-highcpu-usage Fix bug related to high cpu usage	2019-01-23 14:27:51 -05:00
Michael Schurter	13f061a83f	Merge pull request #5196 from hashicorp/f-plugin-utils Make plugins/shared external and make pluginutils/	2019-01-23 06:59:32 -08:00
Preetha	05bf183ba3	Merge pull request #5225 from hashicorp/b-notaskevent-terminalallocs Don't emit task events after alloc is in a terminal DesiredState	2019-01-23 08:54:10 -06:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Nick Ethier	bcc3935228	tr: use context in as select statement	2019-01-22 20:11:39 -05:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Alex Dadgar	b7a65676fe	gofmt	2019-01-22 15:43:34 -08:00
Alex Dadgar	2ca0e97361	Split hclspec	2019-01-22 15:43:34 -08:00
Alex Dadgar	5ca6dd7988	move hclutils	2019-01-22 15:43:34 -08:00
Alex Dadgar	72a5691897	Driver tests do not use hcl2/hcl, hclspec, or hclutils	2019-01-22 15:43:34 -08:00
Alex Dadgar	b2c7268843	move reattach config	2019-01-22 15:11:58 -08:00
Alex Dadgar	cdcd3c929c	loader and singleton	2019-01-22 15:11:57 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Preetha Appan	38422642cb	Use DesiredState to determine whether to stop sending task events	2019-01-22 16:43:32 -06:00
Preetha Appan	862c9b7de5	don't emit events for terminal allocs	2019-01-22 16:26:33 -06:00
Michael Schurter	1fa376cac6	Merge pull request #5211 from hashicorp/test-porting-08 Port some 0.8 TaskRunner tests	2019-01-22 14:05:53 -08:00
Michael Schurter	8ced0adb67	test: port TestTaskRunner_CheckWatcher_Restart Added ability to adjust the number of events the TaskRunner keeps as there's no way to observe all events otherwise. Task events differ slightly from 0.8 because 0.9 emits Terminated every time a task exits instead of only when it exits on its own (not due to restart or kill). 0.9 does not emit Killing/Killed for restarts like 0.8 which seems fine as `Restart Signaled/Terminated/Restarting` is more descriptive. Original v0.8 events emitted: ``` expected := []string{ "Received", "Task Setup", "Started", "Restart Signaled", "Killing", "Killed", "Restarting", "Started", "Restart Signaled", "Killing", "Killed", "Restarting", "Started", "Restart Signaled", "Killing", "Killed", "Not Restarting", } ```	2019-01-22 09:46:46 -08:00
Michael Schurter	1719752a9d	test: port RestartTask from 0.8	2019-01-22 08:08:08 -08:00
Michael Schurter	9edff19625	test: port SignalFailure test from 0.8 Also fix signal error handling in mock_driver.	2019-01-22 08:08:08 -08:00
Preetha Appan	299a5fc821	Rename TaskKillRequest/Response to TaskPreKillRequest/Response	2019-01-22 09:54:02 -06:00
Preetha Appan	5a5b9c5666	Fix log comments	2019-01-22 09:45:58 -06:00
Preetha Appan	06e15f8381	Rename TaskKillHook to TaskPreKillHook to more closely match usage Also added/fixed comments	2019-01-22 09:41:56 -06:00
Michael Schurter	3b02af9386	Fix comment Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-01-22 09:41:21 -06:00
Preetha Appan	09291c689b	Rename TaskKillHook to TaskPreKillHook to more closely match usage Also added/fixed comments	2019-01-22 09:41:21 -06:00
Mahmood Ali	a9b73e6b86	Merge pull request #5216 from hashicorp/b-fix-tests-20180118 tests: deflake client TestFS_Logs_TaskPending test	2019-01-21 09:54:15 -05:00
Mahmood Ali	d19ba5bd8e	tests: deflake client TestFS_Logs_TaskPending test	2019-01-18 21:26:48 -05:00
Nick Ethier	47127de671	ar: return error from hooks if occurred	2019-01-18 18:31:02 -05:00
Nick Ethier	e3c6f89b9a	drivers: use consts for task handle version	2019-01-18 18:31:01 -05:00
Nick Ethier	6804450c69	cleanup code comments and small fixes from refactor	2019-01-18 18:31:01 -05:00
Nick Ethier	05bd369d1f	driver: add pre09 migration logic	2019-01-18 18:31:01 -05:00
Mahmood Ali	5df63fda7c	Merge pull request #5190 from hashicorp/f-memory-usage Track Basic Memory Usage as reported by cgroups	2019-01-18 16:46:02 -05:00
Chris Baker	290c3f36ad	set TaskGroupName in task_runner	2019-01-18 20:25:11 +00:00
Chris Baker	8917961caa	documenting test for task runner failure to set TaskGroupName	2019-01-18 20:00:49 +00:00
Michael Schurter	cfadacfd95	Merge pull request #5203 from hashicorp/b-terminated client: restore Terminated event on every exit	2019-01-18 08:54:15 -08:00
Danielle Tomlinson	bf21612e2b	Merge pull request #5174 from hashicorp/dani/windows Some Windows fixes and CI	2019-01-18 11:21:53 +01:00
Preetha Appan	e0b68a19c6	Fix one more place that should be using taskResources taskResources handles new resource fields in a backwards compatible way	2019-01-17 15:52:51 -06:00
Michael Schurter	a20ac7c1de	client: restore Terminated event on every exit v0.9.0-dev started emitting a Terminated event every time a task process exited. While this wasn't true in previous versions, it's a useful task event because it's the only place for job operators to view the task's exit code. This behavior is asserted in the e2e/taskevents tests.	2019-01-17 10:02:25 -08:00
Danielle Tomlinson	11c733faa8	allocwatcher: Stat_t is unavailable on win	2019-01-17 18:43:14 +01:00
Danielle Tomlinson	62e06eda56	chore: Cleanup formatting	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	580b8c5dda	client/fs: Skip delete-while-streaming test on win	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	4dbddd0620	client/fs: windows error message for not found	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	915bab2365	vaultclient: use require for error assertions	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	dc55d3e353	vaultclient: Update tests for vault 1.0	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	7a5d511349	fingerprinter: Use HCLogger for windows	2019-01-17 18:43:13 +01:00
Danielle Tomlinson	a695b3562c	Merge pull request #5193 from hashicorp/dani/logmon-reattach logmon: Reattach to existing loggers	2019-01-16 17:34:13 +01:00
Danielle Tomlinson	99da4c780d	logmon: Reattach to existing loggers This commit prevents us from creating duplicate logmon hooks when restoring allocations by persisting the logmon reattach config using HookData.	2019-01-16 14:56:10 +01:00
Michael Schurter	daa7d029a1	test: porting TestTaskRunner_SimpleRun_Dispatch Porting test from 0.8 to 0.9.	2019-01-15 15:22:13 -08:00
Michael Schurter	48afda786b	Merge pull request #5187 from hashicorp/test-consul Port a bunch of pre-0.9 Consul tests to 0.9	2019-01-15 07:41:50 -08:00
Alex Dadgar	471fdb3ccf	Merge pull request #5173 from hashicorp/b-log-levels Plugins use parent loggers	2019-01-14 16:14:30 -08:00
Mahmood Ali	9909d98bee	Track Basic Memory Usage as reported by cgroups Track current memory usage, `memory.usage_in_bytes`, in addition to `memory.max_memory_usage_in_bytes` and friends. This number is closer to what Docker reports. Related to https://github.com/hashicorp/nomad/issues/5165 .	2019-01-14 18:47:52 -05:00
Nick Ethier	c619e70d39	Merge pull request #5018 from hashicorp/f-executor-stats executor: streaming stats api	2019-01-14 15:02:35 -05:00
Michael Schurter	4e7ea460e8	test: port some pre-0.9 DeploymentHealth tests Skipping a failing one as I need to move to some other work and don't want to leave this work orphaned on my machine.	2019-01-14 09:56:53 -08:00
Michael Schurter	ff2f23f5f9	test: assert service interpolation behavior Ported from pre-0.9 tests.	2019-01-14 09:56:53 -08:00
Michael Schurter	5746be5844	test: add some extra logging	2019-01-14 09:56:53 -08:00
Michael Schurter	e877bb6370	test: assert shutdown delay deregs first Restore a pre-0.9 test that asserts Consul services are deregistered before a task's shutdown delay.	2019-01-14 09:56:53 -08:00
Michael Schurter	1ca858fa92	Update client/allocrunner/taskrunner/stats_hook.go Co-Authored-By: nickethier <ncethier@gmail.com>	2019-01-14 12:31:27 -05:00
Nick Ethier	fbd403df96	tr: stop stats collection on Exited hook	2019-01-14 12:30:14 -05:00
Nick Ethier	597b7b751d	tr: add retry /w backoff to stats_hook failure	2019-01-12 12:18:24 -05:00
Nick Ethier	7e306afde3	executor: fix failing stats related test	2019-01-12 12:18:23 -05:00
Nick Ethier	9fea54e0dc	executor: implement streaming stats API; plugins/driver: update driver interface to support streaming stats; client/tr: use streaming stats api; TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error; drivers: update drivers TaskStats RPC to handle streaming results; executor: better error handling in stats rpc; docker: better control and error handling of stats rpc; driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
Preetha Appan	9e8dbf6a4b	linting fixes	2019-01-12 10:38:20 -06:00
Preetha Appan	c94179578d	Make unit test for allocrunner failure much nicer	2019-01-12 10:38:20 -06:00
Preetha Appan	da0d083b03	Add unit test to simulate alloc runner creation failure	2019-01-12 10:38:20 -06:00
Preetha Appan	e7b59ac08c	Only set deployment health if not already set	2019-01-12 10:38:20 -06:00
Michael Schurter	dbf4c3a3c8	Apply suggestions from code review Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-01-12 10:38:20 -06:00
Preetha Appan	7bd1440710	Refactor statedb factory config to set it directly in client config	2019-01-12 10:38:20 -06:00
Preetha Appan	e237f19b38	Remove invalid allocs	2019-01-12 10:38:20 -06:00
Preetha Appan	f059ef8a47	Modified destroy failure handling to rely on allocrunner's destroy method Added a unit test with custom statedb implementation that errors, to use to verify destroy errors	2019-01-12 10:37:12 -06:00
Preetha Appan	6c95da8f67	Add back code to mark alloc as failed when restore fails Also modify restore such that any handled errors don't propagate back to the client	2019-01-12 10:37:12 -06:00
Preetha Appan	5fde0b0f5c	Revert code that made an alloc update when restore fails Restore currently shuts down the client so the alloc update can't always make it to the server	2019-01-12 10:37:12 -06:00
Preetha Appan	41bfdd764b	Handle client initialization errors when adding allocs or restoring allocs We mark the alloc as failed and track failed allocs so that we don't send updates after the first time	2019-01-12 10:37:12 -06:00
Alex Dadgar	14ed757a56	Plugins use parent loggers This PR fixes various instances of plugins being launched without using the parent loggers. This meant that logs would not all go to the same output, break formatting etc.	2019-01-11 11:36:37 -08:00
Danielle Tomlinson	3e586e93da	client: Cleanup allocrunner access	2019-01-11 18:39:18 +01:00
Mahmood Ali	c3eaa0f4c8	tests: enable and fix tests requiring mock driver	2019-01-10 10:10:11 -05:00
Alex Dadgar	bd12e0b1f7	Merge pull request #5168 from hashicorp/b-kill-race Improve Kill handling on task runner	2019-01-09 12:05:10 -08:00
Alex Dadgar	069e181e8f	add more comments	2019-01-09 12:04:22 -08:00
Michael Schurter	e5ddff861c	Spelling fix Co-Authored-By: dadgar <alex@hashicorp.com>	2019-01-09 11:42:40 -08:00
Mahmood Ali	90f3cea187	Merge pull request #5157 from hashicorp/r-drivers-no-cstructs drivers: avoid referencing client/structs package	2019-01-09 13:06:46 -05:00
Mahmood Ali	ff48dbb8a9	Merge pull request #5163 from hashicorp/r-minor-changes-20180108 Fix a panic on node.Deregister fail	2019-01-09 09:56:00 -05:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00
Mahmood Ali	4952f2a182	Merge pull request #5159 from hashicorp/r-macos-tests Fix Travis MacOS job	2019-01-09 08:22:30 -05:00
Alex Dadgar	149dec2169	Improve Kill handling on task runner This PR improves how killing a task is handled. Previously, the kill function directly orchestrated the killing and was only valid while the task was running. The new behavior is to mark the desired state and wait for the task runner to converge to that state.	2019-01-08 16:42:26 -08:00
Mahmood Ali	9f7eb1bdfa	tests: fix test job constraints failing on macOS Allow scheduling mock job when running on macOS (or Windows) hosts.	2019-01-08 12:37:42 -05:00
Michael Schurter	c24f4f94c1	Merge pull request #5151 from hashicorp/b-task-events Emit Killing task events and add e2e tests	2019-01-08 09:33:04 -08:00
Mahmood Ali	6d36b52412	run gofmt	2019-01-08 11:15:38 -05:00
Michael Schurter	92f9cda5f4	Merge pull request #5035 from hashicorp/test-client test: re-enable periodic fingerprint test	2019-01-08 07:37:39 -08:00
Michael Schurter	5925424c7c	client: emit Killing/Killed task events We were just emitting Killed/Terminated events before. In v0.8 we emitted Killing/Killed, but lacked Terminated when explicitly stopping a task. This change makes it so Terminated is always included, whether explicitly stopping a task or it exiting on its own. New output: 2019-01-04T14:58:51-08:00 Killed Task successfully killed; 2019-01-04T14:58:51-08:00 Terminated Exit Code: 130, Signal: 2; 2019-01-04T14:58:51-08:00 Killing Sent interrupt; 2019-01-04T14:58:51-08:00 Leader Task Dead Leader Task in Group dead; 2019-01-04T14:58:49-08:00 Started Task started by client; 2019-01-04T14:58:49-08:00 Task Setup Building Task Directory; 2019-01-04T14:58:49-08:00 Received Task received by client. Old (v0.8.6) output: 2019-01-04T22:14:54Z Killed Task successfully killed; 2019-01-04T22:14:54Z Killing Sent interrupt. Waiting 5s before force killing; 2019-01-04T22:14:54Z Leader Task Dead Leader Task in Group dead; 2019-01-04T22:14:53Z Started Task started by client; 2019-01-04T22:14:53Z Task Setup Building Task Directory; 2019-01-04T22:14:53Z Received Task received by client	2019-01-08 07:20:54 -08:00
Michael Schurter	324e989327	Merge pull request #5034 from hashicorp/test-fix-races Test fix races	2019-01-08 07:04:09 -08:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Mahmood Ali	9369b123de	use drivers.FSIsolation	2019-01-08 09:11:47 -05:00
Mahmood Ali	c10a8fd7fe	remove deprecated allocrunner	2019-01-08 09:11:47 -05:00
Mahmood Ali	f475a56087	remove always false parameter Simplify allocDir.Build() function to avoid depending on client/structs, and remove a parameter that's always set to `false`. The motivation here is to avoid a dependency cycle between drivers/cstructs and alloc_dir.	2019-01-08 09:11:47 -05:00
Danielle Tomlinson	8df20f49f7	drivers: Add internal interface for Shutdown This allows us to correctly terminate internal state during runs of the nomad test suite, e.g. closing eventer contexts correctly.	2019-01-08 13:48:49 +01:00
Alex Dadgar	edf132758d	Merge pull request #5152 from hashicorp/f-recover Task runner recovers from external plugin exiting	2019-01-07 15:27:33 -08:00
Alex Dadgar	0106f23aaa	Review comments	2019-01-07 14:50:28 -08:00
Alex Dadgar	79cfe26021	vet	2019-01-07 14:49:41 -08:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Alex Dadgar	f40f8ce02e	Mock driver has recovery, stats	2019-01-07 14:49:40 -08:00
Alex Dadgar	3f24e4d6ca	comments	2019-01-07 14:49:40 -08:00
Alex Dadgar	44dca19012	Fix hooks	2019-01-07 14:49:40 -08:00
Alex Dadgar	c9825a9c36	recover	2019-01-07 14:49:40 -08:00
Alex Dadgar	c3f05f2476	Don't log event error on driver shutdown	2019-01-07 14:49:40 -08:00
Michael Schurter	d686ad51fb	Merge pull request #5043 from hashicorp/b-taskenv-conflicts taskenv: have maps take precedence over primitives	2019-01-07 12:34:48 -08:00
Mahmood Ali	0ba7b0c132	tests: helper function for checking docker presence	2019-01-07 08:27:06 -05:00
Mahmood Ali	cd3c6cf60b	taskrunner: emit TaskReceived event Preserve pre-0.9 behavior, where the task runner emits a `Received: Task received by client` event on task runner creation.	2019-01-04 14:32:29 -05:00
Michael Schurter	875e231511	Merge pull request #5038 from hashicorp/b-drivermanager-tests WIP: fix failing tests caused by async driver manager	2019-01-03 12:32:18 -08:00
Danielle Tomlinson	35a4790740	Merge pull request #5142 from hashicorp/dani/cleanup-allocrunner-logs allocrunner: Standardised discard logs	2019-01-03 18:40:48 +01:00
Preetha	8078cb79f0	Merge pull request #5140 from hashicorp/dani/b-taskrunner taskrunner: Persist environment from hooks	2019-01-03 09:30:52 -06:00
Danielle Tomlinson	29196ca70e	allocrunner: Standardised discard logs Follow up from https://github.com/hashicorp/nomad/pull/5007#pullrequestreview-186739124	2019-01-03 14:04:31 +01:00
Danielle Tomlinson	1c8baf7db7	chore: Fix environement->environment typo	2019-01-03 13:31:30 +01:00
Danielle Tomlinson	28aa34ea78	taskrunner: Persist environment from hooks https://github.com/hashicorp/nomad/pull/5032 introduced a regression where the origHookState was used in place of the response from the hook.	2019-01-03 13:13:57 +01:00
Alex Dadgar	d7d32c2f61	Merge pull request #5032 from hashicorp/f-driver-env Store device envs separately and pass to drivers	2018-12-20 13:38:27 -08:00
Michael Schurter	e47a3ceed6	taskenv: have maps take precedence over primitives The Bug: You may have seen log lines like this when running 0.9.0-dev: ``` ... client.alloc_runner.task_runner: some environment variables not available for rendering: ... keys="attr.driver.docker.volumes.enabled, attr.driver.docker.version, attr.driver.docker.bridge_ip, attr.driver.qemu.version" ``` Not only should we not be erroring on builtin driver attributes, but the results were nondeterministic due to map iteration order! The root cause is that we have an old root attribute for all drivers like: ``` attr.driver.docker = "1" ``` When attributes were opaque variable names it was fine to also have "nested" attributes like: ``` attr.driver.docker.version = "1.2.3" ``` However in the HCLv2 world the variable names are no longer opaque: they form an object tree. The `docker` object can no longer both hold a value (`"1"`) and nested attributes (`version = "1.2.3"`). The Fix: Since the old `attr.driver.<name> = "1"` attributes are useless for task config interpolation, create a new precedence rule for creating the task config evaluation context: Maps take precedence over primitives. This means `attr.driver.docker.version` will always take precedence over `attr.driver.docker`. The results are deterministic and give users access to the more useful metadata. I made this a general precedence rule instead of special-casing driver attrs because it seemed like better default behavior than spamming WARNings to logs that were likely unactionable by users.	2018-12-20 11:37:46 -08:00
Nick Ethier	a96afb6c91	fix tests that fail as a result of async client startup	2018-12-20 00:53:44 -05:00
Nick Ethier	6c43ccf628	client: add proper build flag to allocrunner testing.go	2018-12-19 20:22:07 -05:00
Michael Schurter	0a0fb6f86d	test: re-enable periodic fingerprint test	2018-12-19 17:08:24 -08:00
Michael Schurter	add2dd8c2d	test: copy AR's Alloc before mutating Fixes a race in client tests	2018-12-19 15:48:02 -08:00
Michael Schurter	17ed3f27ae	drivermgr: fix race in building driver list	2018-12-19 15:48:02 -08:00
Michael Schurter	4448f19413	Merge pull request #5030 from hashicorp/test-client-statusupdate client: assert alloc status updates work	2018-12-19 14:55:34 -08:00
Alex Dadgar	9d34802f7a	Store device envs separately and pass to drivers	2018-12-19 14:23:09 -08:00
Michael Schurter	951100af16	client: assert alloc status updates work Re-enabling and updating an old test. Able to cut out a ton of extra work by using WaitForRunning which does almost everything this test needs.	2018-12-19 11:41:53 -08:00
Michael Schurter	ee23bdafbc	client/state: missing deploy status isn't an error Fixes TestClient_SaveRestoreState	2018-12-19 10:39:27 -08:00
Michael Schurter	c84998e996	tests: implement HasHealth for mock health	2018-12-19 10:39:27 -08:00
Michael Schurter	ba1ddd2238	gofmt -s -w upgrade_int_test.go	2018-12-19 10:39:27 -08:00
Michael Schurter	337d07fdd8	client/state: improve upgradeTaskBucket error handling And add a test	2018-12-19 10:39:27 -08:00
Michael Schurter	c5ddcb6a15	client/state: add context to errors Unfortunately I don't know how to test these errors. As far as I can tell they should only happen if there was a programming error in the upgrade code or the underlying boltdb was corrupted somehow.	2018-12-19 10:39:27 -08:00
Michael Schurter	99bd5b3422	client/state: use 2 as version; test error path	2018-12-19 10:39:27 -08:00
Michael Schurter	d9ea8252a7	client/state: support upgrading from 0.8->0.9 Also persist and load DeploymentStatus to avoid rechecking health after client restarts.	2018-12-19 10:39:27 -08:00
Michael Schurter	0018b2f659	client/state: reorg state buckets to ease transition * Prefix task bucket with task- to prevent name conflicts * Shorten device manager bucket name * Remove commented out outdated var * Update layout comment	2018-12-19 10:22:28 -08:00
Michael Schurter	461599ff20	tr: fix HookState Copy() and Equal() methods They did not take into account the Env field.	2018-12-19 09:58:06 -08:00
Danielle Tomlinson	c580512d32	allocrunner: Close updates routine correctly	2018-12-19 18:32:51 +01:00
Nick Ethier	969ec51730	devicemanager: fix devicemanager tests	2018-12-19 00:35:12 -05:00
Nick Ethier	6f1777284d	drivermanager: use correct plugin config types	2018-12-18 23:07:01 -05:00
Nick Ethier	a02308ee6a	drivermanager: attempt to reattach and shutdown driver plugin if blocked by allow/block lists	2018-12-18 23:01:57 -05:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Nick Ethier	bda32f9c79	client/pluginmanager: add plugin manager interface to device/driver managers	2018-12-18 22:56:23 -05:00
Nick Ethier	d8a0265e68	client: batch initial fingerprinting in plugin managers drivermanager: fix pr comments/feedback	2018-12-18 22:56:19 -05:00
Nick Ethier	7d23cbf448	client/drivermananger: fixup issues from rebase and address PR comments	2018-12-18 22:55:38 -05:00
Nick Ethier	1543335710	tr: deregister task handler on cleanup	2018-12-18 22:55:38 -05:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incoming fingerprint and task events from each driver. The manager exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint manager and has been removed.	2018-12-18 22:55:18 -05:00
Alex Dadgar	730a6f5b9a	lint	2018-12-18 16:48:00 -08:00
Alex Dadgar	4c57d2ec4d	Add plugin API versioning to plugin loader and plugins	2018-12-18 16:48:00 -08:00
Alex Dadgar	9d1403d617	Merge pull request #5002 from hashicorp/b-task-config-resources Convert driver resource to AllocatedTaskResource	2018-12-18 16:46:34 -08:00
Danielle Tomlinson	0edc65631a	Merge pull request #5007 from hashicorp/dani/f-allocrunner-async allocrunner: Async api for shutdown/destroy/update	2018-12-19 01:26:41 +01:00
Alex Dadgar	8efac7ec81	Fix unit tests + upgrade pathing resources	2018-12-18 15:50:44 -08:00
Alex Dadgar	b8268d9a46	Lint	2018-12-18 15:50:44 -08:00
Alex Dadgar	66cf3156b2	LinuxResources doesn't use task.Resources	2018-12-18 15:50:44 -08:00
Alex Dadgar	327b551b39	Drivers	2018-12-18 15:50:11 -08:00
Alex Dadgar	b653ae2af7	utilities	2018-12-18 15:48:52 -08:00
Danielle Tomlinson	95a0c4fb29	taskrunner: Use a random suffix for Task Config The RestartCount is not really suitable for use as a source of uniqueness within task invocations as it is not monotonic, and interacts with the restart stanza in a user's config, so conflates restarts due to task failures, with restarts due to environmental changes, such as consul template or vault secrets changing. Here we instead use a substring from a uuid, which is more random than we strictly need, but is nicer than rolling our own random string generator here.	2018-12-19 00:38:54 +01:00
Danielle Tomlinson	1be0170ebe	client: Update tests for async destroy	2018-12-18 23:38:34 +01:00
Danielle Tomlinson	d6eb084d8a	allocrunner: Drop and log updates after closing waitCh	2018-12-18 23:38:34 +01:00
Danielle Tomlinson	0d91285cd6	allocrunner: Documentation for ShutdownCh/DestroyCh	2018-12-18 23:38:34 +01:00
Danielle Tomlinson	f2bb13818e	fixup: Log when we detect out of order updates	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	986fde0f5a	allocrunner: Handle updates asynchronously This creates a new buffered channel and goroutine on the allocrunner for serializing updates to allocations. This allows us to take updates off the routine that is used for processing updates from the server, without having complicated machinery for tracking update lifetimes, or other external synchronization. This results in a nice performance improvement and significantly better throughput on batch changes such as preempting a large number of jobs for a larger placement.	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	f3fa9d1406	gc: Wait for allocrunners to be destroyed	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	cb78a90f40	client: Async API for shutdown/destroy allocrunners	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	d1fbac1aad	allocrunner: Async shutdown and destroy This commit reduces the locking required to shutdown or destroy allocrunners, and allows parallel shutdown and destroy of allocrunners during shutdown.	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	d9174d8dcf	Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition client: Give a copy of clientconfig to allocrunner	2018-12-17 10:49:46 +01:00
Danielle Tomlinson	53aa1bc198	Merge pull request #5004 from hashicorp/dani/f-hook-errors client: Emit TaskEvents when task hooks fail	2018-12-17 10:42:57 +01:00
Danielle Tomlinson	a50ea29da4	taskrunner: Use hook errors for artifacts	2018-12-17 10:39:38 +01:00
Mahmood Ali	2d2c562e18	Remove implicit check I intended to remove this line in 29ef7ecf2372f980d12a9900e1b2a351568dd415 - see my notes there for details.	2018-12-16 09:14:26 -05:00
Mahmood Ali	d58e38e912	tests: avoid implicitly asserting clean shutdown The assertion here is causing many spurious failures that aren't actually relevant to the test itself. We are tracking the cause for this failure independently, and it would make more sense to have a dedicated test for clean shutdown.	2018-12-15 15:30:09 -05:00
Danielle Tomlinson	3647b701a6	taskrunner: Emit task events when a hook fails	2018-12-13 18:20:18 +01:00
Danielle Tomlinson	8b06e8d297	Merge pull request #4990 from hashicorp/dani/b-alloc-lock client: updateAlloc release lock after read	2018-12-13 12:43:59 +01:00
Danielle Tomlinson	3823599da9	client: Give a copy of clientconfig to allocrunner Currently, there is a race condition between creating a taskrunner, and updating node attributes via fingerprinting. This is because the taskenv builder will try to iterate over the clientconfig.Node.Attributes map, which can be concurrently updated by the fingerprinting process, thus causing a panic. This fixes that by providing a copy of the clientconfig to the allocrunner inside the Read lock during config creation.	2018-12-13 12:42:15 +01:00
Alex Dadgar	20c59df8b9	Merge pull request #4969 from hashicorp/f-alloc-hooks Make alloc health watcher a postrun hook rather than shutdown hook	2018-12-12 14:34:36 -08:00
Danielle Tomlinson	4184eadaf4	client: updateAlloc release lock after read The allocLock is used to synchronize access to the alloc runner map, not to ensure internal consistency of the alloc runners themselves. This updates the updateAlloc process to avoid hanging on to an exclusive lock of the map while applying changes to allocrunners themselves, as they should be internally consistent. This fixes a bug where any client allocation api will block during the shutdown or updating of an allocrunner and its child taskrunners.	2018-12-12 16:30:01 +01:00
Mahmood Ali	3d166e6e9c	Merge pull request #4984 from hashicorp/b-client-update-driver client: update driver info on new driver fingerprint	2018-12-11 18:01:03 -05:00
Mahmood Ali	69b2355274	Merge pull request #4975 from hashicorp/fix-master-20181209 Some test fixes and remedies	2018-12-11 18:00:21 -05:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	ba515947c2	client: update driver info on new fingerprint Fixes a bug where a driver health and attributes are never updated from their initial status. If a driver started unhealthy, it may never go into a healthy status.	2018-12-11 14:25:10 -05:00
Danielle Tomlinson	ed1791f4bf	client: Style: use fluent style for building loggers	2018-12-11 18:03:45 +01:00
Danielle Tomlinson	805669ead4	client: Correctly pass a noop PrevAllocMigrator when restoring	2018-12-11 15:46:58 +01:00
Mahmood Ali	3babda5d45	tests: no need for buffer channel	2018-12-11 09:35:26 -05:00
Mahmood Ali	5a487ac884	tests: prevent indefinite blocking in some tests Noticed few places where tests seem to block indefinitely and panic after the test run reaches the test package timeout. I intend to follow up with the proper fix later, but timing out is much better than indefinitely blocking.	2018-12-11 09:35:26 -05:00
Mahmood Ali	4635168f20	test: fix TestFingerprintManager_Run_Combination Let's use a fingerprinter that doesn't have values prepopulated in test fixtures.	2018-12-11 09:35:26 -05:00
Danielle Tomlinson	6fb5ca6ad5	allocrunner: Test alloc runners should include a noop migrator	2018-12-11 13:12:35 +01:00
Danielle Tomlinson	4b4b85e3f4	allocwatcher: Cleanup new migrator/watcher interface	2018-12-11 13:12:35 +01:00
Danielle Tomlinson	83720575de	client: Unify handling of previous and preempted allocs	2018-12-11 13:12:35 +01:00
Danielle Tomlinson	dff7093243	client: Wait for preempted allocs to terminate When starting an allocation that is preempting other allocs, we create a new group allocation watcher, and then wait for the allocations to terminate in the allocation PreRun hooks. If there's no preempted allocations, then we simply provide a NoopAllocWatcher.	2018-12-11 00:59:18 +01:00
Danielle Tomlinson	2cdef6a7b4	allocwatcher: Add Group AllocWatcher The Group Alloc watcher is an implementation of a PrevAllocWatcher that can wait for multiple previous allocs before terminating. This is to be used when running an allocation that is preempting upstream allocations, and thus only supports being run with a local alloc watcher. It also currently requires all of its child watchers to correctly handle context cancellation. Should this be a problem, it should be fairly easy to implement a replacement using channels rather than a waitgroup. It obeys the PrevAllocWatcher interface for convenience, but it may be better to extract Migration capabilities into a separate interface for greater clarity.	2018-12-11 00:58:27 +01:00
Marcin Matlaszek	39eec70f31	Recover from any possible io error when invoking Write on FileRotator As of now, FileRotator uses bufio.Write under the hood to write data to configured output file. Due to the way how bufio handles any occurred io error - saves it into `err` variable never resetting it automatically - any operation like `Write`, `Flush` etc will become a no-op, returning the very same, saved error (eg. Out of disk space) even when the problem is fixed (eg. disk space is available again). That automatically means that FileRotator will stop writing any logs, reporting the same error over and over again, even if it's no longer valid. This PR fixes it by resetting the bufio Writer, which resets any errors and tries to write requested data.	2018-12-07 18:22:29 +01:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Danielle Tomlinson	e3621c55fa	gc: Fix maxallocs integration test	2018-12-06 21:50:50 +01:00
Alex Dadgar	c4b5f80918	Make alloc health watcher a postrun hook rather than shutdown hook	2018-12-06 12:30:31 -08:00
Danielle Tomlinson	62b98e64ca	client/gc: Replace GC integration test with unit The previous integration test was broken during the client refactor, and it seems to be some sort of race with state updating. I'm going to try and construct a replacement test as part of work on performance, but for now, the underlying behaviour is still being tested.	2018-12-06 12:28:23 +01:00
Danielle Tomlinson	f6e474fd55	client: Re-enable GC tests	2018-12-06 12:28:23 +01:00
Danielle Tomlinson	d043532cb0	allocrunner: Basic test alloc runner	2018-12-06 12:28:23 +01:00
Alex Dadgar	b39c21d49c	Fix various bugs with task events Fixes the following: * Emitting events when the task fails to start * Don't double emit events on task shutdown (nomad stop) * Don't emit a OOM kill metric unless actually OOM'd	2018-12-05 14:27:07 -08:00
Danielle Tomlinson	10b3e68a6d	Merge pull request #4925 from hashicorp/f-driver-plugins-dani Third Party Driver Plugins Support	2018-12-03 20:48:19 +01:00
Mahmood Ali	88622b97bd	libcontainer to manage /dev and /proc (#4945 ) libcontainer already manages `/dev`, overriding task_dir - so let's use it for `/proc` as well and remove deadcode.	2018-12-03 10:41:01 -05:00
Danielle Tomlinson	9bd77e9295	testfix: Fix import cycle in allocdir tests	2018-12-01 17:25:30 +01:00
Danielle Tomlinson	66c521ca17	client: Move fingerprint structs to pkg This removes a cyclical dependency when importing client/structs from dependencies of the plugin_loader, specifically, drivers. Due to client/config also depending on the plugin_loader. It also better reflects the ownership of fingerprint structs, as they are fairly internal to the fingerprint manager.	2018-12-01 17:10:39 +01:00
Danielle Tomlinson	2db5ae38d8	client: Rename drivers/shared/env => client/taskenv	2018-11-30 12:18:39 +01:00
Danielle Tomlinson	f3a77b8084	client: Merge driver/shared/structs and client/structs	2018-11-30 10:56:45 +01:00
Danielle Tomlinson	b9295f0d56	client/driver: Remove package	2018-11-30 10:47:08 +01:00
Danielle Tomlinson	fdfe93aa25	fixup: executorplugin: fix rkt build	2018-11-30 10:47:08 +01:00
Danielle Tomlinson	d72ecd95ec	client/driver: Vendor setEnvvars into docker_test	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	d26a310db0	client: Move executor plugins into own package	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	d259c36844	driver: Flatten SetEnvvars into taskdirhook	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	6b72e96eba	client: Move driver/logging to logmon/logging The logging package is used by logmon and the legacy mock_driver. Because the legacy drivers are going away, I'm moving it here to signify its actual ownership.	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	04c8851b4c	client: Migrate DriverStats optout to drivers/shared/structs	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	dbd82e1af4	client: Remove test dependency on client/driver	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	0544a57abe	drivers: Move client/drivers/executor to drivers/shared/executor	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	1a29811169	drivers: Move client/drivers/env to drivers/shared/env As part of deprecating legacy drivers, we're moving the env package to a new drivers/shared tree, as it is used by the modern docker and rkt driver packages, and is useful for 3rd party plugins.	2018-11-30 10:46:13 +01:00
Nick Ethier	bbe420718a	Merge pull request #4922 from hashicorp/f-drivermananger add generic plugin manager interface and orchestration	2018-11-28 22:17:04 -05:00
Preetha	1f526db414	Merge pull request #4919 from hashicorp/f-fingerprint-attribute-type Modify fingerprint interface to use typed attribute struct	2018-11-28 14:18:28 -06:00
Michael Schurter	1bd9a9f9dd	Merge pull request #4894 from hashicorp/f-device-hook Device hook and devices affect computed node class	2018-11-28 12:10:43 -06:00
Preetha Appan	f89dbcd9cc	modify fingerprint interface to use typed attribute struct	2018-11-28 10:01:03 -06:00
Nick Ethier	60c6907ea5	client/plugin: remove println from plugin group func	2018-11-27 22:45:09 -05:00
Nick Ethier	600738e991	client/plugin: lint/spelling errors	2018-11-27 22:45:09 -05:00
Nick Ethier	45a6bf7acd	client/plugin: add generic plugin manager interface and orchestration	2018-11-27 22:45:03 -05:00
Mahmood Ali	ad1f8d8c20	Fixes in old lxc driver	2018-11-27 21:40:43 -05:00
Michael Schurter	3e56ee005a	add nil check around task resources in device hook Looking at NewTaskRunner I'm unsure whether TaskRunner.TaskResources (from which req.TaskResources is set) is intended to be nil at times or if the TODO in NewTaskRunner is intended to ensure it is always non-nil.	2018-11-27 17:25:33 -08:00
Michael Schurter	b75e9fce37	assume that slices contain only non-nil items	2018-11-27 17:25:33 -08:00
Michael Schurter	85073f9d29	client: properly support hook env vars The old approach was incomplete. Hook env vars are now: * persisted and restored between agent restarts * deterministic (LWW if 2 hooks set the same key)	2018-11-27 17:25:33 -08:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Michael Schurter	27e07f657e	Merge pull request #4896 from hashicorp/b-prevalloc-deadlock Fix deadlock in previous alloc watcher by emitting last alloc update	2018-11-27 19:07:16 -06:00
Michael Schurter	b75f79a793	fix test breakage caused by rebase	2018-11-27 16:34:01 -08:00
Michael Schurter	91da566935	fix misspellings	2018-11-27 16:33:55 -08:00
Chris Baker	a1fb1f3830	Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names drivers/rkt: fix invalid volumes	2018-11-27 18:55:00 -05:00
Danielle Tomlinson	3651dbdc25	Merge pull request #4909 from hashicorp/b-restart-delay taskrunner: Return the restart delay correctly	2018-11-27 23:55:54 +01:00
Michael Schurter	22149a661e	client: comment on importance of chan ops ordering	2018-11-27 14:11:32 -08:00

3898 commits