Michael Schurter
8d1419c62b
client: fix accessing alloc runners
...
* GetClientAlloc() gains nothing from using allAllocs()
* getAllocatedResources was calling getAllocRunners() twice
2018-10-16 16:56:55 -07:00
Michael Schurter
e6e2930a00
tr: implement stats collection hook
...
Tested except for the net/rpc specific error case which may need
changing in the gRPC world.
2018-10-16 16:53:31 -07:00
Alex Dadgar
cebfead6bc
add logger back
2018-10-16 16:53:30 -07:00
Alex Dadgar
8504505c0d
client uses passed logger and fix fingerprinters
2018-10-16 16:53:30 -07:00
Michael Schurter
9d1ea3b228
client: hclog-ify most of the client
...
Leaving fingerprinters in case that interface changes with plugins.
2018-10-16 16:53:30 -07:00
Michael Schurter
e42154fc46
implement stopping, destroying, and disk migration
...
* Stopping an alloc is implemented via Updates but update hooks are
*not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
its own package to avoid cycles. Now only depends on alloc broadcaster
instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
latest version.
* Made AllocDir safe for concurrent use
Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.
2018-10-16 16:53:30 -07:00
Michael Schurter
4236255686
lots of comment/log fixes
2018-10-16 16:53:30 -07:00
Michael Schurter
357641c364
persist alloc state on changes, not periodically
...
Allow alloc and task runners to persist their own state when something
changes instead of periodically syncing all state.
2018-10-16 16:53:30 -07:00
Michael Schurter
a3fe0510d1
Move all encoding and put deduping into state db
...
Still WIP as it does not handle deletions.
2018-10-16 16:53:30 -07:00
Michael Schurter
533bc93b3a
implement all boltdb interactions behind StateDB
2018-10-16 16:53:30 -07:00
Michael Schurter
a5d3e3fb0a
Implement alloc updates in arv2
...
Updates are applied asynchronously but sequentially
2018-10-16 16:53:30 -07:00
Michael Schurter
a4b4d7b266
consul service hook
...
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Michael Schurter
5be982e674
restore vault client
2018-10-16 16:53:29 -07:00
Alex Dadgar
fd3bc1bd39
Update state with server
2018-10-16 16:53:29 -07:00
Michael Schurter
7f4ec50906
missed locking around c.allocs access
2018-10-16 16:53:29 -07:00
Michael Schurter
516d641db0
client: implement all-or-nothing alloc restoration
...
Restoring calls NewAR -> Restore -> Run
NewAR now calls NewTR
AR.Restore calls TR.Restore
AR.Run calls TR.Run
2018-10-16 16:53:29 -07:00
Alex Dadgar
80f6ce50c0
vault hook
2018-10-16 16:53:29 -07:00
Michael Schurter
b360f6f96e
fix hclog level
2018-10-16 16:53:29 -07:00
Michael Schurter
4f43ff5c51
pass statedb into allocrunnerv2
2018-10-16 16:53:29 -07:00
Michael Schurter
0f7dcfdc9a
example redis job "runs" on arv2! see below
...
Tons left to do and lots of churn:
1. No state saving
2. No shutdown or gc
3. Removed AR factory *for now*
4. Made all "Config" structs local to the package they configure
5. Added allocID to GC to avoid a lookup
Really hating how many things use *structs.Allocation. It's not bad
without state saving, but if AllocRunner starts updating its copy things
get racy fast.
2018-10-16 16:53:29 -07:00
Alex Dadgar
01f8e5b95f
renames
2018-10-04 14:57:25 -07:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
5c8697667e
Node reserved resources
2018-09-29 18:44:55 -07:00
Alex Dadgar
3183153315
Node resources on client
2018-09-29 17:23:41 -07:00
Alex Dadgar
9971b3393f
yamux
2018-09-17 14:22:40 -07:00
Alex Dadgar
7739ef51ce
agent + consul
2018-09-13 10:43:40 -07:00
Michael Schurter
08862fc177
fix race around error handling
2018-09-05 17:34:17 -07:00
Preetha
043f4c208b
Merge pull request #3882 from burdandrei/telemetry-add-node-class-tag
...
Added node class to tagged metrics
2018-06-21 17:04:35 -05:00
Alex Dadgar
b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
...
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar
90c2108bfb
Fix gc tests + parallel destroy + small test fixes
2018-06-12 10:23:45 -07:00
Alex Dadgar
f5ff509fa5
Refactor - wip
2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo
f74e74b22d
add client logic to determine whether TLS RPC connections should reload
2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo
064b5481e0
add server join info to server and client
2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo
38f611a7f2
refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
...
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo
796bae6f1b
allow configurable cipher suites
...
disallow 3DES and RC4 ciphers
add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Chelsea Holland Komlo
9b8a079558
fix up comments
2018-04-17 11:53:08 -04:00
Alex Dadgar
9d612c8cb0
Cleanup
2018-04-16 15:48:34 -07:00
Alex Dadgar
32adaf9dfc
Copy the config given to the alloc runner
2018-04-16 15:45:52 -07:00
Alex Dadgar
4f2a7b6949
Fix copying drivers
2018-04-16 15:45:51 -07:00
Alex Dadgar
0b799822ff
Operate on copy
2018-04-16 15:45:49 -07:00
Alex Dadgar
ff1a1a63e8
Move where attribute for driver detection is set
2018-04-12 15:50:25 -07:00
Alex Dadgar
f24ce2c50c
Driver health detection cleanups
...
This PR does:
1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Andrei Burd
502d17fa90
Added node class to tagged metrics
2018-04-11 12:20:59 +03:00
Alex Dadgar
3d367d6fd7
Fix client uptime metric missing client prefix
2018-04-10 10:39:36 -07:00
Alex Dadgar
ae1f76477e
Start rebalance after discovering new servers
2018-04-05 15:41:59 -07:00
Alex Dadgar
be2513e0f9
more jitter
2018-04-05 13:48:33 -07:00
Alex Dadgar
bd3345942c
Handle no leader and faster retries near limit
...
Handle the ErrNoLeader case and apply slower retries. Also when we have
missed the heartbeat retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar
279b5c22e5
Scale heartbeat retrying based on remaining heartbeat time
2018-04-05 10:58:13 -07:00
Alex Dadgar
7941f4eb2d
Fire retry only when consul discovers new servers
2018-04-05 10:40:17 -07:00
Alex Dadgar
86c32358d4
Spelling error
2018-04-03 18:30:01 -07:00
Alex Dadgar
01a6beafbf
RPC Retry Watcher
2018-04-03 18:05:28 -07:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Chelsea Holland Komlo
2174ede6b9
add clarifying comment
2018-03-29 10:58:39 -04:00
Chelsea Holland Komlo
e3319afee1
emit first node event
2018-03-28 17:26:53 -04:00
Chelsea Holland Komlo
efc03e252c
specify driver health messages
2018-03-28 11:35:21 -04:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
f801709a0a
fix issue when updating node events
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
60f12d206f
improve comments; update watchDriver
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
739784736a
remove unused function
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d92703617c
simplify logic
...
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
86b7b3d2d9
fix up health check logic comparison; add node events to client driver checks
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
53a5bc2bb3
Code review feedback
2018-03-21 15:15:26 -04:00
Alex Dadgar
34dc58421c
notes from walk through
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
44b6951dda
improve tests
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
0425be8f48
updating comments; locking concurrent node access
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c50d02ae93
go style; update comments
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3aa726baab
fix scheduler driver name; create node structs file
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3cba95e8a7
allow nomad to schedule based on the status of a client driver health check
...
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
0bde357731
add concept of health checks to fingerprinters and nodes
...
fix up feedback from code review
add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha Appan
3c38eededd
Fix spelling in comment
2018-03-14 15:54:25 -05:00
Alex Dadgar
bef4a8ee09
fix clearing node events
2018-03-14 09:48:59 -07:00
Chelsea Komlo
810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
...
Add node events
2018-03-14 08:42:55 -04:00
Preetha
360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
...
Make server side error messages from vault more clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar
de6ebb6e6c
small cleanup
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
b41501e442
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8655320fd
fix up go check warnings
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
0934769b04
add client side emitting of node events
...
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan
914eaed64f
Address some code review comments
2018-03-13 18:19:16 -05:00
Preetha Appan
09c231ce43
Return the err from server correctly
2018-03-13 18:10:14 -05:00
Preetha Appan
9618f52746
Remove error wrapping and make vault connection server side errors clearer.
2018-03-13 17:09:03 -05:00
Alex Dadgar
4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
...
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Josh Soref
173ce63fe9
spelling: transition
2018-03-11 19:06:05 +00:00
Josh Soref
782c704de6
spelling: thresholds
2018-03-11 19:03:47 +00:00
Josh Soref
8149694f3a
spelling: server
2018-03-11 18:55:30 +00:00
Josh Soref
258d76ec13
spelling: registry
2018-03-11 18:41:13 +00:00
Josh Soref
3c1ce6d16d
spelling: otherwise
2018-03-11 18:34:27 +00:00
Josh Soref
1ef6d6319e
spelling: labels
2018-03-11 18:21:44 +00:00
Josh Soref
52b83328fc
spelling: heartbeating
2018-03-11 18:12:19 +00:00
Josh Soref
c9b86bbc2f
spelling: controls
2018-03-11 17:50:39 +00:00
Josh Soref
e78cf9c81a
spelling: already
2018-03-11 17:39:04 +00:00
Josh Soref
b8b46d3f74
spelling: allocation
2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo
122d1c4e4a
simplify retry logic
2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo
355805db56
reset timer after updating node copy
2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo
a72aaaf47f
add network resources equal method, use time ticker
...
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo
e736e31820
use time ticker, update how network resources are compared
2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo
5059065b52
improved testing; node networks comparison
2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo
1f31b39fe8
code review fixups
2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo
ed8c8afbcd
edge trigger node update
...
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar
49a47483d1
Registering back to initializing
...
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.
It would change the status of the Node to initializing until the next
heartbeat occured.
2018-02-16 17:49:31 -08:00
Alex Dadgar
eff4455c68
Fix original client server list behavior
2018-02-15 16:04:53 -08:00
Alex Dadgar
f9cf642436
Client tls
2018-02-15 15:22:57 -08:00
Alex Dadgar
e685211892
Code review feedback
2018-02-15 13:59:02 -08:00
Alex Dadgar
2c0ad26374
New RPC Modes and basic setup for streaming RPC handlers
2018-02-15 13:59:01 -08:00
Alex Dadgar
9bc75f0ad4
Fix manager tests and make testagent recover from port conflicts
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f1f8604bb
initial round of comment review
2018-02-15 13:59:01 -08:00
Alex Dadgar
c8c1284bc3
SetServer command actually returns an error if given an invalid server
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f786b904b
use server manager
2018-02-15 13:59:01 -08:00
Alex Dadgar
6dd1c9f49d
Refactor
2018-02-15 13:59:00 -08:00
Alex Dadgar
1472b943d6
Stats Endpoint
2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo
4a26959825
code review feedback
2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo
d626d24488
remove dependency on client for fingerprint manager
2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo
e012e5ab8a
add fingerprint manager
2018-02-07 18:10:33 -05:00
Chelsea Holland Komlo
b21233fe23
update log message
2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo
6f9c0ab361
req/resp should be within config locks; rename for detected fingerprints
...
changelog
2018-02-01 19:00:39 -05:00
Chelsea Holland Komlo
b8e8064835
code review fixup
2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo
7b53474a6e
add applicable boolean to fingerprint response
...
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo
9482c322b7
locks for fingerprint reads/writes
2018-01-30 11:32:45 -05:00
Chelsea Holland Komlo
7c19de797c
create safe getters and setters for fingerprint response
2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo
896d6f8058
fixups from code review
2018-01-26 07:04:32 -05:00
Chelsea Holland Komlo
9a8344333b
refactor Fingerprint to request/response construct
2018-01-24 11:54:02 -05:00
Chelsea Holland Komlo
649f86f094
refactor creating a new tls configuration
2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo
6c9f9c8ac3
adding additional test assertions; differentiate reloading agent and http server
2018-01-16 07:34:39 -05:00
Chelsea Holland Komlo
214d128eb9
reload raft transport layer
...
fix up linting
2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo
0708d34135
call reload on agent, client, and server separately
2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo
9741097406
reloading tls config should be atomic for clients/servers
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
ae7fc4695e
fixups from code review
...
Revert "close raft long-lived connections"
This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.
reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
acd3d1b162
fix up downgrading client to plaintext
...
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
c0ad9a4627
add ability to upgrade/downgrade nomad agents tls configurations via sighup
2018-01-08 09:21:06 -05:00
Alex Dadgar
91ffbbb517
Review feedback
2017-12-07 16:10:57 -08:00
Alex Dadgar
02baa6c52b
Handle race between fingerprinters and registration
2017-12-07 13:09:37 -08:00
Alex Dadgar
4409fdacc0
Drop trace logging
2017-12-06 18:02:24 -08:00
Alex Dadgar
cd9a7f14b8
Add logging around heartbeats
2017-12-06 17:57:50 -08:00
Chelsea Komlo
2dfda33703
Nomad agent reload TLS configuration on SIGHUP ( #3479 )
...
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initalize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter
1769db98b7
Fix regression by returning error on unknown alloc
2017-11-01 15:16:38 -05:00
Michael Schurter
73e9b57908
Trigger GCs after alloc changes
...
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter
2a81160dcd
Fix GC'd alloc tracking
...
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar
4831380e57
Node access is done using locked Node copy
...
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Michael Schurter
15b991e039
base64 migrate token
...
HTTP header values must be ASCII.
Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
Chelsea Holland Komlo
e1c4701a43
fix up build warnings
2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo
b018ca4d46
fixing up code review comments
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
410adaf726
Add functionality for authenticated volumes
2017-10-11 17:09:20 -07:00
Michael Schurter
a66c53d45a
Remove structs
import from api
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo
b26454cf99
Move setGaugeForAllocationStats to emitClientMetrics
2017-09-25 16:05:49 +00:00
Alex Dadgar
d306da846c
changelog and feedback
2017-09-14 14:08:58 -07:00
Alex Dadgar
07ed83fdd5
Non-locked accessors to common Node fields
...
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.
Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo
848af92183
fix panic in emitting tagged metrics
2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo
0ef43c3c5f
final code review fixups
2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
a8cbd0b559
fixups from code review
2017-09-05 14:13:34 +00:00