Commit graph

140 commits

Author SHA1 Message Date
Michael Schurter cdcefd0908 Use the Service.Hash() method in agent service ids
The allocID and taskName parameters are useless for agents, but it's
still nice to reuse the same hash method for agent and task services.
This brings in the lowercase mode for the agent hash as well.
2017-12-11 16:50:15 -08:00
Michael Schurter 4f1002c1a8 Be more defensive in port checks 2017-12-08 12:27:57 -08:00
Michael Schurter d613e0aaf5 Move service hash logic to Service.Hash method 2017-12-08 12:03:43 -08:00
Michael Schurter b71edf846f Hash fields used in task service IDs
Fixes #3620

Previously we concatenated tags into task service IDs. This could break
deregistration of tag names that contained double //s like some Fabio
tags.

This change breaks service ID backward compatibility so on upgrade all
users services and checks will be removed and re-added with new IDs.

This change has the side effect of including all service fields in the
ID's hash, so we no longer have to track PortLabel and AddressMode
changes independently.
2017-12-08 12:03:43 -08:00
Michael Schurter 91282315d1 Prevent using port 0 with address_mode=driver 2017-12-08 12:03:43 -08:00
Michael Schurter 4b20441eef Validate port label for host address mode
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Michael Schurter 4ae115dc59 Allow custom ports for services and checks
Fixes #3380

Adds address_mode to checks (but no auto) and allows services and checks
to set literal port numbers when using address_mode=driver.

This allows SDNs, overlays, etc to advertise internal and host addresses
as well as do checks against either.
2017-12-08 12:03:00 -08:00
Jens Herrmann 5680fcccc2 Fix typos in metric names. #3610 2017-12-01 15:24:14 +01:00
Michael Schurter 0aace3d749 Don't set Interval on TTL health checks 2017-10-16 17:35:47 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Michael Schurter a844fba8d2 Fix comments: task -> check 2017-09-15 15:19:53 -07:00
Michael Schurter 0f2a3dcec9 Test check watch updates 2017-09-14 16:48:39 -07:00
Michael Schurter 847fe080f6 Rename unhealthy var and fix test indeterminism 2017-09-14 16:48:39 -07:00
Michael Schurter 573a0df03d Watched -> TriggersRestart
Watched was a silly name
2017-09-14 16:48:39 -07:00
Michael Schurter 4ea19baa52 Handle multiple failing checks on a single task
Before this commit if a task had 2 checks cause restarts at the same
time, both would trigger restarts of the task! This change removes all
checks for a task whenever one of them is restarted.
2017-09-14 16:48:39 -07:00
Michael Schurter 73fb71ca10 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter 448ad3945f Simplify from 2 select loops to one 2017-09-14 16:48:39 -07:00
Michael Schurter 550e631eea Wrap check watch updates in a struct
Reusing checkRestart for both adds/removes and the main check restarting
logic was confusing.
2017-09-14 16:48:39 -07:00
Michael Schurter 72e5c0c0aa Fix whitespace 2017-09-14 16:47:41 -07:00
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter a180c00fc3 on_warning=false -> ignore_warnings=false
Treat warnings as unhealthy by default
2017-09-14 16:46:54 -07:00
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Michael Schurter 7f6e1f3a9c Initializing embedded structs is weird 2017-08-17 16:49:14 -07:00
Michael Schurter 0634eef12a Test createCheckReg 2017-08-17 16:49:14 -07:00
Michael Schurter bb8d5689d8 Add Header and Method support for HTTP checks 2017-08-17 16:44:21 -07:00
Alex Dadgar 43dff0a11d Fix integration test 2017-08-14 10:52:49 -07:00
Alex Dadgar 6e20acb503 Merge pull request #2984 from hashicorp/b-tags
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar c8f74ac43b Address comments 2017-08-10 13:07:08 -07:00
Alex Dadgar d86b3977b9 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking for
allocations health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Michael Schurter 5794e5ece7 Use int32 for atomic ops to avoid alignment issues
From https://golang.org/pkg/sync/atomic/#pkg-note-BUG :

On both ARM and x86-32, it is the caller's responsibility to arrange for
64-bit alignment of 64-bit words accessed atomically. The first word in
a global variable or in an allocated struct or slice can be relied upon
to be 64-bit aligned.
2017-08-04 10:14:16 -07:00
Michael Schurter d2f8fdcad5 Fix comment 2017-07-25 12:13:05 -07:00
Michael Schurter 3e6231842d Forgot to setcmdenv
This would leak a consul agent
2017-07-25 12:09:57 -07:00
Michael Schurter 4b83eba599 Use seen more conservatively 2017-07-24 16:48:40 -07:00
Michael Schurter cdf138eb27 Always increment failures...
...as it's used in calculating the backoff
2017-07-24 15:37:53 -07:00
Michael Schurter 809724ad8d Track whether Consul has ever been seen
Need a way to squelch Consul operation errors on shutdown. If it's never
been seen don't log errors about deregs failing.
2017-07-24 12:12:02 -07:00
Michael Schurter edbe62a879 Synchronously deregister agent on shutdown
Fixes #2891

Previously the agent services and checks were being asynchrously
deregistered on shutdown, so it was a race between the sync goroutine
deregistering them and Nomad shutting down.

This switches to synchronously deregister agent serivces and checks
which doesn't really have a downside since the sync goroutines retry
behavior doesn't help on shutdown anyway.
2017-07-24 11:40:37 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar 4dd5d943c7 remove root requirement on consul integration check 2017-07-21 19:32:41 -07:00
Michael Schurter 125a3fb2f9 Error -> Errof 2017-07-19 10:00:57 -07:00
Michael Schurter 99d1486f32 Never remove unknown agent services
Fixes #2827

This is a tradeoff. The pro is that you can run separate client and
server agents on the same node and advertise both. The con is that if a
Nomad agent crashes and isn't restarted on that node in the same mode
its entry will not be cleaned up.

That con scenario seems far less likely to occur than the scenario on
the pro side, and even if we do leak an agent entry the checks will be
failing so nothing should attempt to use it.
2017-07-18 13:23:01 -07:00
Alex Dadgar bf2dafb8e9 check id method name changed 2017-07-07 12:15:09 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Michael Schurter a863ead30e Fix test error formats 2017-06-26 12:53:43 -07:00
Michael Schurter 9da78ae25f Remove debug logging 2017-06-21 17:19:08 -07:00
Michael Schurter c0eff81383 Fix Service.AddressMode changes during task updates 2017-06-21 17:19:08 -07:00
Michael Schurter 67d154a274 Test driver network advertisement and checks 2017-06-21 17:19:08 -07:00