Commit graph

8712 commits

Author SHA1 Message Date
Michael Schurter ed77c0944b DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter 573a0df03d Watched -> TriggersRestart
Watched was a silly name
2017-09-14 16:48:39 -07:00
Michael Schurter 4ea19baa52 Handle multiple failing checks on a single task
Before this commit, if a task had 2 checks fail at the same time, both
would trigger restarts of the task! This change removes all checks for
a task whenever one of them triggers a restart.
2017-09-14 16:48:39 -07:00
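The behavior described in 4ea19baa52 can be sketched roughly as follows. This is a minimal illustration and not the actual checkWatcher code: the `watchedCheck` and `checkSet` types, the `taskKey` field, and both method names are hypothetical names chosen for this example.

```go
package main

import "fmt"

// watchedCheck is a hypothetical record for one health check being watched.
type watchedCheck struct {
	checkID string
	taskKey string // identifies the task that owns the check
}

// checkSet is a hypothetical stand-in for the watcher's internal state,
// keyed by check ID.
type checkSet map[string]watchedCheck

// removeTask deletes every check that belongs to taskKey and reports how
// many were removed. Deleting from a map while ranging over it is safe in Go.
func (cs checkSet) removeTask(taskKey string) int {
	removed := 0
	for id, c := range cs {
		if c.taskKey == taskKey {
			delete(cs, id)
			removed++
		}
	}
	return removed
}

// handleUnhealthy walks the IDs of failing checks and returns the tasks to
// restart. Once one check triggers a restart, all checks of that task are
// dropped, so a second failing check on the same task is skipped.
func (cs checkSet) handleUnhealthy(unhealthy []string) []string {
	var restarts []string
	for _, id := range unhealthy {
		c, ok := cs[id]
		if !ok {
			continue // already removed by an earlier restart of the same task
		}
		restarts = append(restarts, c.taskKey)
		cs.removeTask(c.taskKey)
	}
	return restarts
}

func main() {
	cs := checkSet{
		"check-a": {checkID: "check-a", taskKey: "web"},
		"check-b": {checkID: "check-b", taskKey: "web"},
		"check-c": {checkID: "check-c", taskKey: "db"},
	}
	// Both checks on "web" fail at once; only one restart should result.
	fmt.Println(cs.handleUnhealthy([]string{"check-a", "check-b"}), len(cs))
}
```

Per 73fb71ca10, the removed checks would come back when the task restarts and its checks are re-registered, so the sketch keeps no restart bookkeeping of its own.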
Michael Schurter 73fb71ca10 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter 448ad3945f Simplify from 2 select loops to one 2017-09-14 16:48:39 -07:00
Michael Schurter 550e631eea Wrap check watch updates in a struct
Reusing checkRestart for both adds/removes and the main check restarting
logic was confusing.
2017-09-14 16:48:39 -07:00
Michael Schurter d299d42089 Canonicalize and Merge CheckRestart in api 2017-09-14 16:48:39 -07:00
Michael Schurter 72e5c0c0aa Fix whitespace 2017-09-14 16:47:41 -07:00
Michael Schurter 06dd86adbd Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter 0447f79288 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter 6a9e0c63c4 Add changelog entry for #3105 2017-09-14 16:47:41 -07:00
Michael Schurter 95c6077435 Document new check_restart stanza 2017-09-14 16:46:54 -07:00
Michael Schurter 99f4aa999a Default grace period to 1s 2017-09-14 16:46:54 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter a180c00fc3 on_warning=false -> ignore_warnings=false
Treat warnings as unhealthy by default
2017-09-14 16:46:54 -07:00
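One way to read the `ignore_warnings=false` default in a180c00fc3 is as the mapping below from a Consul check status to a healthy/unhealthy decision. This is only a sketch: the `isHealthy` function and the constant names are hypothetical, and the status strings are the standard Consul values `passing`, `warning`, and `critical`.

```go
package main

import "fmt"

// Consul health check statuses.
const (
	statusPassing  = "passing"
	statusWarning  = "warning"
	statusCritical = "critical"
)

// isHealthy is a hypothetical helper. A warning counts as healthy only
// when ignoreWarnings is true; with the default (false) it is unhealthy.
func isHealthy(status string, ignoreWarnings bool) bool {
	switch status {
	case statusPassing:
		return true
	case statusWarning:
		return ignoreWarnings
	default:
		// critical and any unrecognized status are treated as unhealthy
		return false
	}
}

func main() {
	fmt.Println(isHealthy(statusWarning, false)) // default: warning is unhealthy
	fmt.Println(isHealthy(statusWarning, true))  // warnings ignored
	fmt.Println(isHealthy(statusCritical, true)) // critical is never healthy
}
```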
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Michael Schurter b35d208428 Nest restart fields in CheckRestart 2017-09-14 16:46:54 -07:00
Michael Schurter bf34505509 Add restart fields 2017-09-14 16:46:54 -07:00
Chelsea Komlo 3b857c5e8f Merge pull request #3213 from hashicorp/f-acl-job-summary
Add job endpoint ACL
2017-09-14 18:21:19 -04:00
Alex Dadgar c08f9e729f Merge pull request #3217 from hashicorp/b-batch-filter
Fix batch handling of complete allocs/node drains
2017-09-14 15:11:40 -07:00
Alex Dadgar b2f892b2ac changelog 2017-09-14 15:11:26 -07:00
Alex Dadgar 3904bde9a3 Fix batch handling of complete allocs/node drains
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
  would cause a replacement
* An issue in which allocations with the same name during a scale
  down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
  have been stopped properly.

Fixes https://github.com/hashicorp/nomad/issues/3210
2017-09-14 15:08:57 -07:00
Alex Dadgar 96442414b8 Changelog 2017-09-14 14:35:53 -07:00
Alex Dadgar 6c935f7303 Update CHANGELOG.md 2017-09-14 14:34:02 -07:00
Alex Dadgar d156cb48b3 Merge pull request #3206 from hashicorp/b-eval-index
Worker waits til max ModifyIndex across EvalsByJob
2017-09-14 14:29:32 -07:00
Alex Dadgar e862bbc78a Changelog 2017-09-14 14:29:02 -07:00
Alex Dadgar 567eef50a8 Address feedback 2017-09-14 14:28:43 -07:00
Alex Dadgar 6911bd7676 Worker waits til max ModifyIndex across EvalsByJob
This PR fixes a scheduling race condition in which the plan results from
one invocation of the scheduler were not being considered by the next
since the Worker was not waiting for the correct index.

Fixes https://github.com/hashicorp/nomad/issues/3198
2017-09-14 14:28:43 -07:00
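The index selection named in the title of 6911bd7676 can be illustrated with a small sketch. The `evaluation` type and `maxModifyIndex` function here are hypothetical simplifications, not the Worker's actual code; the assumption is only that the Worker should wait for the highest ModifyIndex among a job's evaluations rather than a lower one.

```go
package main

import "fmt"

// evaluation is a hypothetical, pared-down stand-in for an evaluation
// record; only the field needed for this example is modeled.
type evaluation struct {
	ID          string
	ModifyIndex uint64
}

// maxModifyIndex returns the highest ModifyIndex across the given
// evaluations, or 0 for an empty slice.
func maxModifyIndex(evals []evaluation) uint64 {
	var highest uint64
	for _, e := range evals {
		if e.ModifyIndex > highest {
			highest = e.ModifyIndex
		}
	}
	return highest
}

func main() {
	evals := []evaluation{
		{ID: "eval-1", ModifyIndex: 100},
		{ID: "eval-2", ModifyIndex: 140},
		{ID: "eval-3", ModifyIndex: 120},
	}
	// Waiting for index 140 rather than 100 or 120 means the results tied
	// to every earlier evaluation of the job are visible before scheduling.
	fmt.Println(maxModifyIndex(evals))
}
```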
Alex Dadgar 765de546d8 Merge pull request #3214 from hashicorp/f-agent-servers
Sort /v1/agent/servers output
2017-09-14 14:22:00 -07:00
Alex Dadgar 08a0b1c2b6 changelog 2017-09-14 14:21:41 -07:00
Alex Dadgar 01180fec58 use assert 2017-09-14 14:20:22 -07:00
Alex Dadgar c55b7ce4d6 Sort /v1/agent/servers output
This PR sorts the output of the endpoint. Its results are used as part
of Consul checks, so a stable order avoids the value changing
unnecessarily.

Fixes https://github.com/hashicorp/nomad/issues/3211
2017-09-14 14:20:22 -07:00
Alex Dadgar 90a3c20017 Merge pull request #3195 from hashicorp/b-node-locking
Non-locked accessors to common Node fields
2017-09-14 14:09:35 -07:00
Alex Dadgar d306da846c changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar 07ed83fdd5 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo be7efd71d4 fixups from code review 2017-09-14 20:14:38 +00:00
Chelsea Holland Komlo 0d28c95b6b use separate response object 2017-09-14 19:17:05 +00:00
Chelsea Holland Komlo 79abb9810b update to use ACL test helpers 2017-09-14 19:08:25 +00:00
Chelsea Holland Komlo 3eff2a06c5 add job endpoint ACL 2017-09-14 18:17:35 +00:00
Alex Dadgar 2102bc3968 Merge pull request #3209 from dezmodue/patch-1
Adding missing <
2017-09-14 10:53:26 -07:00
Alex Dadgar fa2dd57071 Merge pull request #3205 from hashicorp/f-deployment-acl
Deployment.GetDeployment ACL enforcement
2017-09-14 10:50:17 -07:00
Alex Dadgar 1e644393aa review feedback 2017-09-14 10:50:04 -07:00
Alex Dadgar 9b997d2670 fix multierror merge 2017-09-13 21:48:52 -07:00
Alex Dadgar 5502f46951 changelog 2017-09-13 15:46:41 -07:00
Alex Dadgar 0de4df881f Merge pull request #3203 from hashicorp/b-search-hyphens
Fix UUID search with hyphens
2017-09-13 15:45:22 -07:00
Chelsea Komlo a8adee10c0 Merge pull request #3171 from hashicorp/f-prometheus-metrics
Prometheus metrics
2017-09-13 17:02:32 -04:00
Chelsea Holland Komlo 4ccb73ac67 vendor go-metrics 2017-09-13 19:31:26 +00:00
Chelsea Holland Komlo 2939751811 vendor gzip library 2017-09-13 19:21:21 +00:00