Commit graph

320 commits

Author SHA1 Message Date
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter a180c00fc3 on_warning=false -> ignore_warnings=false
Treat warnings as unhealthy by default
2017-09-14 16:46:54 -07:00
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Michael Schurter 7f6e1f3a9c Initializing embedded structs is weird 2017-08-17 16:49:14 -07:00
Michael Schurter 0634eef12a Test createCheckReg 2017-08-17 16:49:14 -07:00
Michael Schurter bb8d5689d8 Add Header and Method support for HTTP checks 2017-08-17 16:44:21 -07:00
Alex Dadgar 43dff0a11d Fix integration test 2017-08-14 10:52:49 -07:00
Alex Dadgar 6e20acb503 Merge pull request #2984 from hashicorp/b-tags
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar c8f74ac43b Address comments 2017-08-10 13:07:08 -07:00
Alex Dadgar d86b3977b9 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking for
allocations health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Michael Schurter 5794e5ece7 Use int32 for atomic ops to avoid alignment issues
From https://golang.org/pkg/sync/atomic/#pkg-note-BUG :

On both ARM and x86-32, it is the caller's responsibility to arrange for
64-bit alignment of 64-bit words accessed atomically. The first word in
a global variable or in an allocated struct or slice can be relied upon
to be 64-bit aligned.
2017-08-04 10:14:16 -07:00
Michael Schurter d2f8fdcad5 Fix comment 2017-07-25 12:13:05 -07:00
Michael Schurter 3e6231842d Forgot to setcmdenv
This would leak a consul agent
2017-07-25 12:09:57 -07:00
Michael Schurter 4b83eba599 Use seen more conservatively 2017-07-24 16:48:40 -07:00
Michael Schurter cdf138eb27 Always increment failures...
...as it's used in calculating the backoff
2017-07-24 15:37:53 -07:00
Michael Schurter 809724ad8d Track whether Consul has ever been seen
Need a way to squelch Consul operation errors on shutdown. If it's never
been seen don't log errors about deregs failing.
2017-07-24 12:12:02 -07:00
Michael Schurter edbe62a879 Synchronously deregister agent on shutdown
Fixes #2891

Previously the agent services and checks were being asynchrously
deregistered on shutdown, so it was a race between the sync goroutine
deregistering them and Nomad shutting down.

This switches to synchronously deregister agent serivces and checks
which doesn't really have a downside since the sync goroutines retry
behavior doesn't help on shutdown anyway.
2017-07-24 11:40:37 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar 4dd5d943c7 remove root requirement on consul integration check 2017-07-21 19:32:41 -07:00
Michael Schurter 125a3fb2f9 Error -> Errof 2017-07-19 10:00:57 -07:00
Michael Schurter 99d1486f32 Never remove unknown agent services
Fixes #2827

This is a tradeoff. The pro is that you can run separate client and
server agents on the same node and advertise both. The con is that if a
Nomad agent crashes and isn't restarted on that node in the same mode
its entry will not be cleaned up.

That con scenario seems far less likely to occur than the scenario on
the pro side, and even if we do leak an agent entry the checks will be
failing so nothing should attempt to use it.
2017-07-18 13:23:01 -07:00
Alex Dadgar bf2dafb8e9 check id method name changed 2017-07-07 12:15:09 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Michael Schurter a863ead30e Fix test error formats 2017-06-26 12:53:43 -07:00
Michael Schurter 9da78ae25f Remove debug logging 2017-06-21 17:19:08 -07:00
Michael Schurter c0eff81383 Fix Service.AddressMode changes during task updates 2017-06-21 17:19:08 -07:00
Michael Schurter 67d154a274 Test driver network advertisement and checks 2017-06-21 17:19:08 -07:00
Michael Schurter b9bfb84b53 Implement DriverNetwork and Service.AddressMode
Ideally DriverNetwork would be fully populated in Driver.Prestart, but
Docker doesn't assign the container's IP until you start the container.

However, it's important to setup the port env vars before calling
Driver.Start, so Prestart should populate that.
2017-06-21 17:19:08 -07:00
Michael Schurter 06f937bf28 Merge pull request #2591 from hashicorp/b-2180-script-updates
Properly interpolate services on updated tasks
2017-05-17 09:09:01 -07:00
Alex Dadgar ba70cc4f01 Merge branch 'master' into f-bolt-db 2017-05-09 11:11:55 -07:00
Michael Schurter 85210eb92f Update consul/api to support unix socket addrs
Fixes #2594
2017-05-08 11:57:04 -07:00
Alex Dadgar 2d54ee2925 Fix tests 2017-05-03 15:14:19 -07:00
Alex Dadgar 9faa98e13b Fix tests 2017-05-03 12:38:49 -07:00
Michael Schurter cafefa049b Properly interpolate services on updated tasks
Previously was interpolating the original task's services again.

Fixes #2180

Also fixes a slight memory leak in the new consul agent. Script check
handles weren't being deleted after cancellation.
2017-04-26 11:22:01 -07:00
Michael Schurter 8926743106 Fix consul test build on Windows 2017-04-19 16:14:11 -07:00
Michael Schurter 83f9591d75 Thanks go vet! 2017-04-19 13:05:41 -07:00
Michael Schurter e997ae44a5 Skip checks with TLSSkipVerify if it's unsupported
Fixes #2218
2017-04-19 12:45:34 -07:00
Michael Schurter 4910f867e7 Use spiffy new Go 1.8 subtest feature 2017-04-19 12:42:48 -07:00
Michael Schurter 3e8dd386ee Forgot an important word 2017-04-19 12:42:48 -07:00
Michael Schurter 947e31e9c2 Only register HTTPS agent check when Consul>=0.7.2
Support for TLSSkipVerify in other checks coming soon!
2017-04-19 12:42:48 -07:00
Michael Schurter 383f21559e Explain weird timer logic 2017-04-19 12:42:48 -07:00
Michael Schurter 4db840b54e Metricsify new Consul client 2017-04-19 12:42:48 -07:00
Michael Schurter 59c687c940 Always fail script checks when deadline exceeded 2017-04-19 12:42:48 -07:00
Michael Schurter ca29bb2cac Test script check exit codes 2017-04-19 12:42:47 -07:00
Michael Schurter 6f6431ce0b Follow _testing.go convention for testing tools 2017-04-19 12:42:47 -07:00
Michael Schurter 2a11508cc8 Rework to account for ports not being in IDs
Previous implementation assumed all struct fields were included in
service and check IDs. Service IDs never include port labels and check
IDs *optionally* include port labels, so lots of things had to change.

Added a really big test to exercise this.
2017-04-19 12:42:47 -07:00
Michael Schurter 6e0fd86361 Remove commits return value
...and still protect against leaking agent entries in Consul on
shutdown.
2017-04-19 12:42:47 -07:00
Michael Schurter fb52bbfa45 Explain PortLabel handling in RegisterAgent 2017-04-19 12:42:47 -07:00
Michael Schurter 2728216f6f Plumb alloc id + task name into script check logs 2017-04-19 12:42:47 -07:00
Michael Schurter 01b20cbb59 Stop being lazy and just type out struct{}{} 2017-04-19 12:42:47 -07:00
Michael Schurter 5e908dbf38 Use nifty testtask sleep command for xplat compat 2017-04-19 12:42:47 -07:00
Michael Schurter 8c433cfba1 Add comments, clarify names, fix PR comments 2017-04-19 12:42:47 -07:00
Michael Schurter 40b3987e90 Explain cleanup defer in test 2017-04-19 12:42:47 -07:00
Michael Schurter 7c8ad71da5 Fix comment to reflect reality 2017-04-19 12:42:47 -07:00
Michael Schurter 745ad9521f Move ScriptExecutor to driver 2017-04-19 12:42:47 -07:00
Michael Schurter 926d141532 Fix shutdown when consul is down 2017-04-19 12:42:47 -07:00
Michael Schurter 87b28cfb75 Switch ServiceClient to synchronizing state
Previously it applied a stream of operations. Reconciling state is less
complex and error prone at the cost of slightly higher CPU/memory usage.
2017-04-19 12:42:47 -07:00
Michael Schurter 244251490a Add UpdateTask method instead of Remove/Add 2017-04-19 12:42:47 -07:00
Michael Schurter 8118c41931 Remove some lies 2017-04-19 12:42:47 -07:00
Michael Schurter 0143bef8c7 Remove unused syncInterval 2017-04-19 12:42:47 -07:00
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Michael Schurter 9798e7c7c1 Add SyncNow test 2016-12-02 16:44:18 -08:00
Michael Schurter 92f579891c Remove lie from comment 2016-11-18 11:05:22 -08:00
Michael Schurter fc1f557a36 Fix incorrect lock usage around consul maps 2016-11-18 10:12:02 -08:00
Michael Schurter da9a8bbcef Fix SyncNow 2016-11-18 10:12:02 -08:00
Michael Schurter 57e5e03ccc New chaos test to exercise consul syncer 2016-11-18 10:12:02 -08:00
Michael Schurter 393efe2579 Switch syncer tests to use embedded consul 2016-11-15 10:22:25 -08:00
Diptanu Choudhury 7367d6adb4 Removing warn msg 2016-09-04 17:22:49 -07:00
Michael Schurter ec96e03563 Query consul without helpers in test to be safe 2016-08-31 14:14:17 -07:00
Michael Schurter aa8cbe777a Make comment more precise 2016-08-31 10:43:55 -07:00
Michael Schurter 7f6b5f4d2a Fix error message when querying consul fails 2016-08-31 10:42:51 -07:00
Michael Schurter db1cf6385e Assert syncer state is consistent with consul's state 2016-08-31 09:19:54 -07:00
Michael Schurter f6bf81270d Remove extra consul service tracking entirely
Instead just remove all associated services on shutdown.
2016-08-30 17:10:15 -07:00
Michael Schurter 27912a95d6 Fix old services not getting removed from consul on update
Fixes #1661
2016-08-30 16:36:42 -07:00
capone ce4d236a40 Fixed CR defects 2016-08-24 01:33:44 +03:00
capone 10cc50ad29 First attemt to fix issue #1636 2016-08-23 19:09:20 +03:00
Diptanu Choudhury 4683aa3dc6 Cleaning up some code 2016-08-16 15:22:26 -07:00
Marin 69bc3a8fc8 Add support for initial check status 2016-08-16 12:05:15 -07:00
David Bresson 7d0fb58f03 Add a test for query string support 2016-08-11 12:22:46 -07:00
David Bresson c651cbbfd6 Add error handling to check registration 2016-08-11 11:49:48 -07:00
David Bresson f47a543a4d Add support for query strings in check paths 2016-08-11 11:27:29 -07:00
Alex Dadgar c203885352 consul syncer uses multi-error 2016-08-09 12:24:50 -07:00
Diptanu Choudhury 48b9684b1e Using net.JoinHostPort instead of handcrafting addrs 2016-07-08 16:45:14 -07:00
Diptanu Choudhury b180223f4b Allowing ports to be overriden in check definitions 2016-07-08 14:14:25 -07:00
Sean Chittenden 871a31a8ec
Teach config.ConsulConfig how to construct a consulapi TLS client.
Said differently, centralize the creation of consul's client config
in one place and use it everywhere.
2016-06-16 22:51:06 -07:00
Sean Chittenden 59c54e9aeb
Fix tests, don't take the address of DefaultConsulCommand() 2016-06-16 21:21:39 -07:00
Sean Chittenden d17af396ca
Create config.DefaultConsulConfig() 2016-06-16 20:41:05 -07:00
Sean Chittenden 10ce8f27d4
Temporarily disable various syncer checks due to the API changes made earlier today. 2016-06-13 19:52:17 -07:00
Sean Chittenden a54b4e08f8
Drive-by comment correction 2016-06-13 18:14:50 -07:00
Alex Dadgar 4b04e503f3 address comments 2016-06-13 17:32:18 -07:00
Alex Dadgar 8bbf4a55e5 Fix IDs and domain scoping 2016-06-13 16:30:58 -07:00
Diptanu Choudhury 2e10458367 Removing unwated line of code 2016-06-13 15:37:55 +02:00
Diptanu Choudhury d019d8ef8e implemented reconciliation of unwanted services 2016-06-13 14:52:26 +02:00
Alex Dadgar 232654cdee register checks 2016-06-12 21:28:56 -07:00
Alex Dadgar a82c2bb058 Do not reconcile in client and cleanup executor a bit 2016-06-12 18:22:07 -07:00
Alex Dadgar c68d23b7d6 Merge pull request #1264 from hashicorp/b-rename-services
Rename ConsulService back to Service
2016-06-12 16:44:49 -07:00
Diptanu Choudhury 751b2e1bcb Remove initial delay of registering services with consul 2016-06-13 01:42:56 +02:00
Alex Dadgar 8e231fa382 Rename ConsulService back to Service 2016-06-12 16:36:49 -07:00
Sean Chittenden 917766a3df
Prefer %+q over %q in log messages. 2016-06-11 18:17:20 -04:00
Sean Chittenden bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC) 2016-06-10 23:38:28 -04:00
Sean Chittenden bc771d35df
Query for the Nomad service across multiple Consul datacenters. 2016-06-10 23:05:14 -04:00
Sean Chittenden d99467ef5e
Always create a consul.Syncer. Use a default Consul Config if necessary. 2016-06-10 15:55:27 -04:00
Sean Chittenden f6a0459ae5
Always create a consul.Syncer. Use a default Consul Config if necessary. 2016-06-10 15:55:27 -04:00
Sean Chittenden 447fe59fd2
Hand wave over the syncer tests atm, these will be fixed shortly. 2016-06-10 15:54:39 -04:00
Sean Chittenden fa7285cb5b
Don't spam the consul if Consul is not available.
Log once when Consul goes away, and log when Consul comes back.
2016-06-10 15:54:39 -04:00
Sean Chittenden a2109af862
Properly cover Syncer attributes with the registryLock.
trackedServices, delegateChecks, trackedChecks, and checkRunners
should all be covered.  This lock needs to be reasonably narrow and
can't use defer due to possible recursive locking concerns further
downstream from the call sites.
2016-06-10 15:54:39 -04:00
Sean Chittenden f9b561b52f
On Syncer Shutdown, remove all services that match a Syncer's prefix. 2016-06-10 15:54:39 -04:00
Sean Chittenden 57e084e4df
Sync checks with Consul by comparing the AgentCheckReg w/ ConsulService
The source of truth is the local Nomad Agent.  Any checks are not local that
have a matching prefix are removed.  Changed checks are re-registered
and missing checks are re-added.
2016-06-10 15:54:39 -04:00
Sean Chittenden 197feae679
Sync services with Consul by comparing the AgentServiceReg w/ ConsulService
The source of truth is the local Nomad Agent.  Any services not local that
have a matching prefix are removed.  Changed services are re-registered
and missing services are re-added.
2016-06-10 15:54:39 -04:00
Sean Chittenden 9a223936bb
Generate and sync Consul ServiceIDs consistently 2016-06-10 15:54:39 -04:00
Sean Chittenden bcad11cc2a
Rename runChecks to consulAvailable
Apologies in advance for the variable thrash, the fingerprinter is
no longer used to gate whether or not Consul is available any more.
2016-06-10 15:54:39 -04:00
Sean Chittenden efe00d2ccd
Update Syncer.Run() to call SyncServices(). 2016-06-10 15:54:39 -04:00
Sean Chittenden 83775c5c8b
Add "Service Groups" to the Syncer.
Now the right way to register services with the Syncer is to call
`SetServices(groupName, []*services)`.  This was required to allow
the Syncer to sync either the Client, Server, or Both using a
single Syncer.
2016-06-10 15:54:39 -04:00
Sean Chittenden 06e5e615a8
Rename command/agent/consul/sync.go to syncer.go 2016-06-10 15:54:39 -04:00
Sean Chittenden 7956eb0c80
Rename structs.Task's Service attribute to ConsulService 2016-06-10 15:54:39 -04:00
Sean Chittenden 05a2b9528c
Remove Syncer.registerService()
This call is obsolete by a future commit that changes the canonical
source of truth to be consul.AgentServiceRegistration structs, which
means it is not necessary to construct AgentServiceRegistration
objects every time a registration is made, we just reuse the existing
object.
2016-06-10 15:54:39 -04:00
Sean Chittenden 8c813630e6
Move package client/consul/sync to command/agent/consul.
This has been done to allow the Server and Client to reuse the same
Syncer because the Agent may be running Client, Server, or both
simultaneously and we only want one Syncer object alive in the agent.
2016-06-10 15:54:39 -04:00