Commit graph

522 commits

Author SHA1 Message Date
Alex Dadgar 9d612c8cb0 Cleanup 2018-04-16 15:48:34 -07:00
Alex Dadgar 32adaf9dfc Copy the config given to the alloc runner 2018-04-16 15:45:52 -07:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar 0b799822ff Operate on copy 2018-04-16 15:45:49 -07:00
Alex Dadgar ff1a1a63e8 Move where attribute for driver detection is set 2018-04-12 15:50:25 -07:00
Alex Dadgar f24ce2c50c Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Andrei Burd 502d17fa90 Added node class to tagged metrics 2018-04-11 12:20:59 +03:00
Alex Dadgar 3d367d6fd7 Fix client uptime metric missing client prefix 2018-04-10 10:39:36 -07:00
Alex Dadgar ae1f76477e Start rebalance after discovering new servers 2018-04-05 15:41:59 -07:00
Alex Dadgar be2513e0f9 more jitter 2018-04-05 13:48:33 -07:00
Alex Dadgar bd3345942c Handle no leader and faster retries near limit
Handle the ErrNoLeader case and apply slower retries. Also when we have
missed the heartbeat retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar 279b5c22e5 Scale heartbeat retrying based on remaining heartbeat time 2018-04-05 10:58:13 -07:00
Alex Dadgar 7941f4eb2d Fire retry only when consul discovers new servers 2018-04-05 10:40:17 -07:00
Alex Dadgar 86c32358d4 Spelling error 2018-04-03 18:30:01 -07:00
Alex Dadgar 01a6beafbf RPC Retry Watcher 2018-04-03 18:05:28 -07:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Chelsea Holland Komlo 2174ede6b9 add clarifying comment 2018-03-29 10:58:39 -04:00
Chelsea Holland Komlo e3319afee1 emit first node event 2018-03-28 17:26:53 -04:00
Chelsea Holland Komlo efc03e252c specify driver health messages 2018-03-28 11:35:21 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 739784736a remove unused function 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 53a5bc2bb3 Code review feedback 2018-03-21 15:15:26 -04:00
Alex Dadgar 34dc58421c notes from walk through 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 44b6951dda improve tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha Appan 3c38eededd Fix spelling in comment 2018-03-14 15:54:25 -05:00
Alex Dadgar bef4a8ee09 fix clearing node events 2018-03-14 09:48:59 -07:00
Chelsea Komlo 810eedfa2a Merge pull request #3945 from hashicorp/f-add-node-events
Add node events
2018-03-14 08:42:55 -04:00
Preetha 360d6e5a92 Merge pull request #3968 from hashicorp/f-nicer-vault-error
Make server side error messages from vault clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8655320fd fix up go check warnings 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 0934769b04 add client side emitting of node events
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan 914eaed64f Address some code review comments 2018-03-13 18:19:16 -05:00
Preetha Appan 09c231ce43 Return the err from server correctly 2018-03-13 18:10:14 -05:00
Preetha Appan 9618f52746 Remove error wrapping and make vault connection server side errors clearer. 2018-03-13 17:09:03 -05:00
Alex Dadgar 4844317cc2 Merge pull request #3890 from hashicorp/b-heartbeat
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Josh Soref 173ce63fe9 spelling: transition 2018-03-11 19:06:05 +00:00
Josh Soref 782c704de6 spelling: thresholds 2018-03-11 19:03:47 +00:00
Josh Soref 8149694f3a spelling: server 2018-03-11 18:55:30 +00:00
Josh Soref 258d76ec13 spelling: registry 2018-03-11 18:41:13 +00:00
Josh Soref 3c1ce6d16d spelling: otherwise 2018-03-11 18:34:27 +00:00
Josh Soref 1ef6d6319e spelling: labels 2018-03-11 18:21:44 +00:00
Josh Soref 52b83328fc spelling: heartbeating 2018-03-11 18:12:19 +00:00
Josh Soref c9b86bbc2f spelling: controls 2018-03-11 17:50:39 +00:00
Josh Soref e78cf9c81a spelling: already 2018-03-11 17:39:04 +00:00
Josh Soref b8b46d3f74 spelling: allocation 2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo 122d1c4e4a simplify retry logic 2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo 355805db56 reset timer after updating node copy 2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo a72aaaf47f add network resources equal method, use time ticker
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo e736e31820 use time ticker, update how network resources are compared 2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo 5059065b52 improved testing; node networks comparison 2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo 1f31b39fe8 code review fixups 2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo ed8c8afbcd edge trigger node update
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar 49a47483d1 Registering back to initializing
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occurred.
2018-02-16 17:49:31 -08:00
Alex Dadgar eff4455c68 Fix original client server list behavior 2018-02-15 16:04:53 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar 2c0ad26374 New RPC Modes and basic setup for streaming RPC handlers 2018-02-15 13:59:01 -08:00
Alex Dadgar 9bc75f0ad4 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f1f8604bb initial round of comment review 2018-02-15 13:59:01 -08:00
Alex Dadgar c8c1284bc3 SetServer command actually returns an error if given an invalid server 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f786b904b use server manager 2018-02-15 13:59:01 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00
Alex Dadgar 1472b943d6 Stats Endpoint 2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo 4a26959825 code review feedback 2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo d626d24488 remove dependency on client for fingerprint manager 2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo e012e5ab8a add fingerprint manager 2018-02-07 18:10:33 -05:00
Chelsea Holland Komlo b21233fe23 update log message 2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo 6f9c0ab361 req/resp should be within config locks; rename for detected fingerprints
changelog
2018-02-01 19:00:39 -05:00
Chelsea Holland Komlo b8e8064835 code review fixup 2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo 7b53474a6e add applicable boolean to fingerprint response
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo 9482c322b7 locks for fingerprint reads/writes 2018-01-30 11:32:45 -05:00
Chelsea Holland Komlo 7c19de797c create safe getters and setters for fingerprint response 2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo 896d6f8058 fixups from code review 2018-01-26 07:04:32 -05:00
Chelsea Holland Komlo 9a8344333b refactor Fingerprint to request/response construct 2018-01-24 11:54:02 -05:00
Chelsea Holland Komlo 649f86f094 refactor creating a new tls configuration 2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo 6c9f9c8ac3 adding additional test assertions; differentiate reloading agent and http server 2018-01-16 07:34:39 -05:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo 9741097406 reloading tls config should be atomic for clients/servers 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo acd3d1b162 fix up downgrading client to plaintext
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Alex Dadgar 91ffbbb517 Review feedback 2017-12-07 16:10:57 -08:00
Alex Dadgar 02baa6c52b Handle race between fingerprinters and registration 2017-12-07 13:09:37 -08:00
Alex Dadgar 4409fdacc0 Drop trace logging 2017-12-06 18:02:24 -08:00
Alex Dadgar cd9a7f14b8 Add logging around heartbeats 2017-12-06 17:57:50 -08:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initialize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter 1769db98b7 Fix regression by returning error on unknown alloc 2017-11-01 15:16:38 -05:00
Michael Schurter 73e9b57908 Trigger GCs after alloc changes
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00