Commit graph

2416 commits

Author SHA1 Message Date
Alex Dadgar 8cf9d15b01 typo 2017-07-22 12:33:07 -07:00
Alex Dadgar 9e9c20ca77 small fixes 2017-07-22 12:25:02 -07:00
Alex Dadgar 5a3df2ed89 Merge pull request #2888 from hashicorp/b-fix-allocrunner-test
Fix TestAllocRunner_TaskLeader_StopTG and unrelated races
2017-07-22 11:44:04 -07:00
Alex Dadgar 46c8bec9b0 faster vaultclient 2017-07-21 19:38:37 -07:00
Michael Schurter d840fc8c95 Fix tr race by not sharing alloc/task
prestart only needs the original alloc/task so pass their pointers in.
Task updates may concurrently replace the pointer on tr.
2017-07-21 16:17:42 -07:00
Michael Schurter a22cfa8387 Minor test race fix 2017-07-21 16:17:23 -07:00
Michael Schurter 9a7a1d8c13 Fix race by not accessing tr.task from ar 2017-07-21 16:16:53 -07:00
Michael Schurter 2e9a1e3fa6 Remove unneeded saveTaskRunnerState method
Collapse it into the one place it's called
2017-07-21 16:16:02 -07:00
Michael Schurter 996ce9286e Fix test race by locking around ar.tasks access 2017-07-21 14:25:51 -07:00
Michael Schurter 8d1d8eac46 Fix handle race 2017-07-21 14:00:32 -07:00
Michael Schurter 5f40901422 Fix more test races 2017-07-21 14:00:21 -07:00
Michael Schurter b9ba447399 Fixup a few more even rarer test races 2017-07-21 13:43:32 -07:00
Michael Schurter 38cb2021dd Always interpolate task before calling with Consul
Also switch to returning a copy of the task to avoid races between
altering the Task and persistence.
2017-07-21 13:37:16 -07:00
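
The copy-on-interpolate idea in commit 38cb2021dd can be illustrated with a minimal sketch. The `task` type, the `interpolate` function, and the `${VAR}` syntax below are hypothetical stand-ins chosen for illustration; they are not Nomad's actual types or API. The point is only that the caller receives a fresh copy, so the original can be persisted concurrently without seeing partial edits.

```go
package main

import (
	"fmt"
	"strings"
)

// task is a hypothetical, simplified task definition (not Nomad's struct).
type task struct {
	Name string
	Env  map[string]string
}

// interpolate returns a new task with ${KEY} placeholders replaced.
// The input task is never modified, so code that persists the original
// cannot race with the substitution.
func interpolate(t *task, vars map[string]string) *task {
	pairs := make([]string, 0, 2*len(vars))
	for k, v := range vars {
		pairs = append(pairs, "${"+k+"}", v)
	}
	r := strings.NewReplacer(pairs...)

	out := &task{
		Name: r.Replace(t.Name),
		Env:  make(map[string]string, len(t.Env)),
	}
	for k, v := range t.Env {
		out.Env[k] = r.Replace(v)
	}
	return out
}

func main() {
	orig := &task{Name: "web-${IDX}", Env: map[string]string{"ADDR": "${HOST}:80"}}
	cp := interpolate(orig, map[string]string{"IDX": "0", "HOST": "10.0.0.1"})
	fmt.Println(orig.Name, cp.Name, cp.Env["ADDR"])
}
```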
Michael Schurter 6e80a8ee39 Fix TestAllocRunner_TaskLeader_StopTG
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar e509661cf9 executor and logging pkg 2017-07-21 12:14:54 -07:00
Alex Dadgar 7c433a1767 Parallel 2017-07-21 12:06:39 -07:00
Alex Dadgar 56f9cf86df Speed up client startup 2017-07-20 22:34:24 -07:00
Michael Schurter 0d7f7e2b9d Merge pull request #2878 from hashicorp/b-save-state
Fix state handling on restart
2017-07-20 17:16:46 -07:00
Karel Malec cf985f011c Pass task group name as NOMAD_GROUP_NAME environment variable 2017-07-21 01:22:54 +02:00
Alex Dadgar 09c8ee621b Destroy tasks that are part of terminal alloc 2017-07-20 12:02:04 -07:00
Michael Schurter 9a7f649e56 Don't save task runner state if it is destroyed 2017-07-20 10:17:41 -07:00
Alex Dadgar 64776b1370 Should not persist state after alloc_runner is garbage collected 2017-07-19 17:31:30 -07:00
Michael Schurter c1b8bef813 Use broadcast send retry logic everywhere 2017-07-18 14:36:32 -07:00
Alex Dadgar d2381c9263 Merge pull request #2853 from hashicorp/b-watcher
Improve alloc health watcher
2017-07-18 14:12:28 -07:00
Alex Dadgar bd43bd509c Save deployment status 2017-07-18 12:37:52 -07:00
Alex Dadgar 41f67e3535 Small fixes 2017-07-18 12:19:57 -07:00
Michael Schurter c24e73ede7 Fix deadlock caused by syncing during destroy
When replacing an alloc the new alloc is blocked until the old alloc is
destroyed. This could cause a deadlock:

1. Destroying the old alloc includes a final sync of its status
2. Syncing status causes a GC
3. A GC looks for terminal allocs to clean up
4. The GC waits for an alloc to stop completely before GC'ing

If the GC chooses the currently-being-destroyed-alloc to GC, the GC
deadlocks. If `client.max_parallel` deadlocks happen, the GC is wedged
until the Nomad process is restarted.

Performing the final sync asynchronously is an ugly hack but prevents
the deadlock by allowing the final sync to occur after the alloc runner
has shut down and been destroyed.
2017-07-18 11:12:56 -07:00
Michael Schurter 420be86e39 Test AllocDir.Copy 2017-07-17 15:46:54 -07:00
Michael Schurter cdb2e96d99 Add AllocRunner.allocID for ease-of-use
Since the AllocRunner.alloc struct can be mutated, most of AllocRunner
needs to acquire a lock to get the alloc's ID. Log lines always need to
include the alloc ID, so we often skipped acquiring a lock just to grab
the ID and accepted the race.

Let's make the race detector a little happier by storing the ID in a
single assignment field.
2017-07-17 15:46:54 -07:00
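
A minimal sketch of the single-assignment idea from commit cdb2e96d99, assuming a simplified runner: the struct layout, `newAllocRunner`, `update`, and `logLine` are hypothetical names for illustration only. Because `allocID` is written once in the constructor and never again, it can be read from any goroutine without taking the lock that guards the mutable `alloc` pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// allocation is a hypothetical, simplified allocation record.
type allocation struct {
	ID     string
	Status string
}

type allocRunner struct {
	allocID string // set once in newAllocRunner, never written again

	allocLock sync.Mutex
	alloc     *allocation // may be replaced; guarded by allocLock
}

func newAllocRunner(a *allocation) *allocRunner {
	return &allocRunner{allocID: a.ID, alloc: a}
}

// update replaces the alloc pointer under the lock.
func (r *allocRunner) update(a *allocation) {
	r.allocLock.Lock()
	r.alloc = a
	r.allocLock.Unlock()
}

// logLine reads only the immutable allocID, so it needs no lock.
func (r *allocRunner) logLine(msg string) string {
	return fmt.Sprintf("alloc %q: %s", r.allocID, msg)
}

func main() {
	r := newAllocRunner(&allocation{ID: "a1", Status: "pending"})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		r.update(&allocation{ID: "a1", Status: "running"})
	}()
	fmt.Println(r.logLine("starting")) // safe while update runs concurrently
	wg.Wait()
}
```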
Michael Schurter 181fda825a Fix log level 2017-07-17 15:46:54 -07:00
Michael Schurter 98f6e7f10f Don't fail if task dirs don't exist on creation
Task dir metadata is created in AllocRunner.Run which may not run before
an alloc is sync'd and Nomad exits. There's no reason not to just create
task dir metadata on restore if it doesn't exist.
2017-07-17 15:46:54 -07:00
Michael Schurter 51515cbe0c Ensure allocDir is never nil and persisted safely
Fixes #2834
2017-07-17 15:46:54 -07:00
Alex Dadgar 0821ee67f5 Fix alloc broadcaster panic on double close 2017-07-17 14:09:05 -07:00
Michael Schurter 0a6bf87365 Fix nil panic in Docker error condition
Fixes #2835

Yet another bug caused by overwriting container and then trying to
reference container.ID in the err handling block. Did a quick audit of
docker.go and it seems to be the last offender. See #2804 for previous
bug.
2017-07-14 10:48:19 -07:00
Alex Dadgar 05894f4611 Small fixes 2017-07-07 17:34:50 -07:00
Michael Schurter fecb16cfb2 Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername
Propagate vault.tls_server_name to consul-template
2017-07-07 16:44:19 -07:00
Michael Schurter 95a9a5da71 Merge pull request #2787 from hashicorp/f-docker-test-mac
Test #2652 - Docker MAC Address option
2017-07-07 16:22:10 -07:00
Michael Schurter 4be4df21c9 Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip
Add driver.docker.bridge_ip node attribute
2017-07-07 16:20:20 -07:00
Michael Schurter 94389c3ecc Remove debug logging 2017-07-07 16:19:42 -07:00
Michael Schurter 5e3e3818db Merge pull request #2804 from hashicorp/b-2802-docker-panic
Don't panic in container list/remove/inspect race
2017-07-07 15:35:51 -07:00
Michael Schurter 67a7b0eac9 Don't panic in container list/remove/inspect race
Fixes #2802

While it's hard to reproduce, the theoretical race is:

1. This goroutine calls ListContainers()
2. Another goroutine removes a container X
3. This goroutine attempts to InspectContainer(X)

However, this bug could be hit in the much simpler case of
InspectContainer() timing out.

In those cases an error is returned and the old code attempted to wrap
the error with the now-nil container.ID. Storing the container ID fixes
that panic.
2017-07-07 15:10:59 -07:00
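
The nil-dereference pattern described in commit 67a7b0eac9 can be sketched as below. The `container` type and the `inspect` and `refresh` functions are hypothetical stand-ins, not the Docker client API; `inspect` always fails here to simulate a removed container or a timeout. Saving the ID before the variable is overwritten keeps the error path from touching a nil pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical, simplified container record.
type container struct {
	ID string
}

// inspect is a hypothetical lookup that fails, returning a nil container.
func inspect(id string) (*container, error) {
	return nil, errors.New("no such container")
}

// refresh re-inspects a container. The ID is stored first because c is
// overwritten with the inspect result, which is nil when inspect fails.
func refresh(c *container) (*container, error) {
	id := c.ID

	var err error
	c, err = inspect(id)
	if err != nil {
		// c is nil here, so c.ID would panic; use the saved id instead.
		return nil, fmt.Errorf("failed to inspect container %s: %v", id, err)
	}
	return c, nil
}

func main() {
	_, err := refresh(&container{ID: "abc"})
	fmt.Println(err)
}
```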
Alex Dadgar bf97a2455c Vet and small improvement on watcher failure detection 2017-07-07 14:53:01 -07:00
Alex Dadgar 45712c6ca3 test fixes 2017-07-07 14:11:27 -07:00
Alex Dadgar ade9a7c768 @jippi Changed my mind! Good suggestion 2017-07-07 12:12:48 -07:00
Alex Dadgar c063eba836 Warn log 2017-07-07 12:10:04 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar 001058227e watcher per alloc 2017-07-07 12:07:08 -07:00
Alex Dadgar 2e2fd26bed Update index 2017-07-07 12:07:08 -07:00
Alex Dadgar ecee5e370e initial watcher 2017-07-07 12:07:08 -07:00
Alex Dadgar c77944ed29 assign names 2017-07-07 12:03:11 -07:00