Commit graph

549 commits

Author SHA1 Message Date
Michael Schurter 533bc93b3a implement all boltdb interactions behind StateDB 2018-10-16 16:53:30 -07:00
Michael Schurter a5d3e3fb0a Implement alloc updates in arv2
Updates are applied asynchronously but sequentially
2018-10-16 16:53:30 -07:00
Michael Schurter a4b4d7b266 consul service hook
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Michael Schurter 5be982e674 restore vault client 2018-10-16 16:53:29 -07:00
Alex Dadgar fd3bc1bd39 Update state with server 2018-10-16 16:53:29 -07:00
Michael Schurter 7f4ec50906 missed locking around c.allocs access 2018-10-16 16:53:29 -07:00
Michael Schurter 516d641db0 client: implement all-or-nothing alloc restoration
Restoring calls NewAR -> Restore -> Run

NewAR now calls NewTR
AR.Restore calls TR.Restore
AR.Run calls TR.Run
2018-10-16 16:53:29 -07:00
Alex Dadgar 80f6ce50c0 vault hook 2018-10-16 16:53:29 -07:00
Michael Schurter b360f6f96e fix hclog level 2018-10-16 16:53:29 -07:00
Michael Schurter 4f43ff5c51 pass statedb into allocrunnerv2 2018-10-16 16:53:29 -07:00
Michael Schurter 0f7dcfdc9a example redis job "runs" on arv2! see below
Tons left to do and lots of churn:
1. No state saving
2. No shutdown or gc
3. Removed AR factory *for now*
4. Made all "Config" structs local to the package they configure
5. Added allocID to GC to avoid a lookup

Really hating how many things use *structs.Allocation. It's not bad
without state saving, but if AllocRunner starts updating its copy things
get racy fast.
2018-10-16 16:53:29 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 9971b3393f yamux 2018-09-17 14:22:40 -07:00
Alex Dadgar 7739ef51ce agent + consul 2018-09-13 10:43:40 -07:00
Michael Schurter 08862fc177 fix race around error handling 2018-09-05 17:34:17 -07:00
Preetha 043f4c208b
Merge pull request #3882 from burdandrei/telemetry-add-node-class-tag
Added node class to tagged metrics
2018-06-21 17:04:35 -05:00
Alex Dadgar b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar 90c2108bfb Fix gc tests + parallel destroy + small test fixes 2018-06-12 10:23:45 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo f74e74b22d add client logic to determine whether TLS RPC connections should reload 2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo 064b5481e0 add server join info to server and client 2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Chelsea Holland Komlo 9b8a079558 fix up comments 2018-04-17 11:53:08 -04:00
Alex Dadgar 9d612c8cb0 Cleanup 2018-04-16 15:48:34 -07:00
Alex Dadgar 32adaf9dfc Copy the config given to the alloc runner 2018-04-16 15:45:52 -07:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar 0b799822ff Operate on copy 2018-04-16 15:45:49 -07:00
Alex Dadgar ff1a1a63e8 Move where attribute for driver detection is set 2018-04-12 15:50:25 -07:00
Alex Dadgar f24ce2c50c Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Andrei Burd 502d17fa90 Added node class to tagged metrics 2018-04-11 12:20:59 +03:00
Alex Dadgar 3d367d6fd7 Fix client uptime metric missing client prefix 2018-04-10 10:39:36 -07:00
Alex Dadgar ae1f76477e Start rebalance after discovering new servers 2018-04-05 15:41:59 -07:00
Alex Dadgar be2513e0f9 more jitter 2018-04-05 13:48:33 -07:00
Alex Dadgar bd3345942c Handle no leader and faster retries near limit
Handle the ErrNoLeader case and apply slower retries. Also when we have
missed the heartbeat retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar 279b5c22e5 Scale heartbeat retrying based on remaining heartbeat time 2018-04-05 10:58:13 -07:00
Alex Dadgar 7941f4eb2d Fire retry only when consul discovers new servers 2018-04-05 10:40:17 -07:00
Alex Dadgar 86c32358d4
Spelling error 2018-04-03 18:30:01 -07:00
Alex Dadgar 01a6beafbf RPC Retry Watcher 2018-04-03 18:05:28 -07:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Chelsea Holland Komlo 2174ede6b9 add clarifying comment 2018-03-29 10:58:39 -04:00
Chelsea Holland Komlo e3319afee1 emit first node event 2018-03-28 17:26:53 -04:00
Chelsea Holland Komlo efc03e252c specify driver health messages 2018-03-28 11:35:21 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 739784736a remove unused function 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 53a5bc2bb3 Code review feedback 2018-03-21 15:15:26 -04:00
Alex Dadgar 34dc58421c notes from walk through 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 44b6951dda improve tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha Appan 3c38eededd
Fix spelling in comment 2018-03-14 15:54:25 -05:00
Alex Dadgar bef4a8ee09 fix clearing node events 2018-03-14 09:48:59 -07:00
Chelsea Komlo 810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
Add node events
2018-03-14 08:42:55 -04:00
Preetha 360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
Make server side error messages from vault more clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8655320fd fix up go check warnings 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 0934769b04 add client side emitting of node events
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan 914eaed64f
Address some code review comments 2018-03-13 18:19:16 -05:00
Preetha Appan 09c231ce43
Return the err from server correctly 2018-03-13 18:10:14 -05:00
Preetha Appan 9618f52746
Remove error wrapping and make vault connection server side errors clearer. 2018-03-13 17:09:03 -05:00
Alex Dadgar 4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Josh Soref 173ce63fe9 spelling: transition 2018-03-11 19:06:05 +00:00
Josh Soref 782c704de6 spelling: thresholds 2018-03-11 19:03:47 +00:00
Josh Soref 8149694f3a spelling: server 2018-03-11 18:55:30 +00:00
Josh Soref 258d76ec13 spelling: registry 2018-03-11 18:41:13 +00:00
Josh Soref 3c1ce6d16d spelling: otherwise 2018-03-11 18:34:27 +00:00
Josh Soref 1ef6d6319e spelling: labels 2018-03-11 18:21:44 +00:00
Josh Soref 52b83328fc spelling: heartbeating 2018-03-11 18:12:19 +00:00
Josh Soref c9b86bbc2f spelling: controls 2018-03-11 17:50:39 +00:00
Josh Soref e78cf9c81a spelling: already 2018-03-11 17:39:04 +00:00
Josh Soref b8b46d3f74 spelling: allocation 2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo 122d1c4e4a simplify retry logic 2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo 355805db56 reset timer after updating node copy 2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo a72aaaf47f add network resources equal method, use time ticker
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo e736e31820 use time ticker, update how network resources are compared 2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo 5059065b52 improved testing; node networks comparison 2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo 1f31b39fe8 code review fixups 2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo ed8c8afbcd edge trigger node update
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar 49a47483d1 Registering back to initializing
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occured.
2018-02-16 17:49:31 -08:00
Alex Dadgar eff4455c68 Fix original client server list behavior 2018-02-15 16:04:53 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar 2c0ad26374 New RPC Modes and basic setup for streaming RPC handlers 2018-02-15 13:59:01 -08:00
Alex Dadgar 9bc75f0ad4 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f1f8604bb initial round of comment review 2018-02-15 13:59:01 -08:00
Alex Dadgar c8c1284bc3 SetServer command actually returns an error if given an invalid server 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f786b904b use server manager 2018-02-15 13:59:01 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00