open-nomad

Author	SHA1	Message	Date
Alex Dadgar	45e41cca03	allocrunnerv2 -> allocrunner	2018-10-16 16:56:56 -07:00
Alex Dadgar	6c9d9d5173	move files around	2018-10-16 16:56:55 -07:00
Michael Schurter	960f3be76c	client: expose task state to client The interesting decision in this commit was to expose AR's state and not a fully materialized Allocation struct. AR.clientAlloc builds an Alloc that contains the task state, so I considered simply memoizing and exposing that method. However, that would lead to AR having two awkwardly similar methods: - Alloc() - which returns the server-sent alloc - ClientAlloc() - which returns the fully materialized client alloc Since ClientAlloc() could be memoized it would be just as cheap to call as Alloc(), so why not replace Alloc() entirely? Replacing Alloc() entirely would require Update() to immediately materialize the task states on server-sent Allocs as there may have been local task state changes since the server received an Alloc update. This quickly becomes difficult to reason about: should Update hooks use the TaskStates? Are state changes caused by TR Update hooks immediately reflected in the Alloc? Should AR persist its copy of the Alloc? If so, are its TaskStates canonical or the TaskStates on TR? So! Forget that. Let's separate the static Allocation from the dynamic AR & TR state! - AR.Alloc() is for static Allocation access (often for the Job) - AR.AllocState() is for the dynamic AR & TR runtime state (deployment status, task states, etc). If code needs to know the status of a task: AllocState() If code needs to know the names of tasks: Alloc() It should be very easy for a developer to reason about which method they should call and what they can do with the return values.	2018-10-16 16:56:55 -07:00
Michael Schurter	8d1419c62b	client: fix accessing alloc runners * GetClientAlloc() gains nothing from using allAllocs() * getAllocatedResources was calling getAllocRunners() twice	2018-10-16 16:56:55 -07:00
Michael Schurter	e6e2930a00	tr: implement stats collection hook Tested except for the net/rpc specific error case which may need changing in the gRPC world.	2018-10-16 16:53:31 -07:00
Alex Dadgar	cebfead6bc	add logger back	2018-10-16 16:53:30 -07:00
Alex Dadgar	8504505c0d	client uses passed logger and fix fingerprinters	2018-10-16 16:53:30 -07:00
Michael Schurter	9d1ea3b228	client: hclog-ify most of the client Leaving fingerprinters in case that interface changes with plugins.	2018-10-16 16:53:30 -07:00
Michael Schurter	e42154fc46	implement stopping, destroying, and disk migration * Stopping an alloc is implemented via Updates but update hooks are not run. * Destroying an alloc is a best effort cleanup. * AllocRunner destroy hooks implemented. * Disk migration and blocking on a previous allocation exiting moved to its own package to avoid cycles. Now only depends on alloc broadcaster instead of also using a waitch. * AllocBroadcaster now only drops stale allocations and always keeps the latest version. * Made AllocDir safe for concurrent use Lots of internal contexts that are currently unused. Unsure if they should be used or removed.	2018-10-16 16:53:30 -07:00
Michael Schurter	4236255686	lots of comment/log fixes	2018-10-16 16:53:30 -07:00
Michael Schurter	357641c364	persist alloc state on changes, not periodically Allow alloc and task runners to persist their own state when something changes instead of periodically syncing all state.	2018-10-16 16:53:30 -07:00
Michael Schurter	a3fe0510d1	Move all encoding and put deduping into state db Still WIP as it does not handle deletions.	2018-10-16 16:53:30 -07:00
Michael Schurter	533bc93b3a	implement all boltdb interactions behind StateDB	2018-10-16 16:53:30 -07:00
Michael Schurter	a5d3e3fb0a	Implement alloc updates in arv2 Updates are applied asynchronously but sequentially	2018-10-16 16:53:30 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Michael Schurter	5be982e674	restore vault client	2018-10-16 16:53:29 -07:00
Alex Dadgar	fd3bc1bd39	Update state with server	2018-10-16 16:53:29 -07:00
Michael Schurter	7f4ec50906	missed locking around c.allocs access	2018-10-16 16:53:29 -07:00
Michael Schurter	516d641db0	client: implement all-or-nothing alloc restoration. Restoring calls NewAR -> Restore -> Run; NewAR now calls NewTR; AR.Restore calls TR.Restore; AR.Run calls TR.Run	2018-10-16 16:53:29 -07:00
Alex Dadgar	80f6ce50c0	vault hook	2018-10-16 16:53:29 -07:00
Michael Schurter	b360f6f96e	fix hclog level	2018-10-16 16:53:29 -07:00
Michael Schurter	4f43ff5c51	pass statedb into allocrunnerv2	2018-10-16 16:53:29 -07:00
Michael Schurter	0f7dcfdc9a	example redis job "runs" on arv2! see below Tons left to do and lots of churn: 1. No state saving 2. No shutdown or gc 3. Removed AR factory for now 4. Made all "Config" structs local to the package they configure 5. Added allocID to GC to avoid a lookup Really hating how many things use *structs.Allocation. It's not bad without state saving, but if AllocRunner starts updating its copy things get racy fast.	2018-10-16 16:53:29 -07:00
Alex Dadgar	01f8e5b95f	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	5c8697667e	Node reserved resources	2018-09-29 18:44:55 -07:00
Alex Dadgar	3183153315	Node resources on client	2018-09-29 17:23:41 -07:00
Alex Dadgar	9971b3393f	yamux	2018-09-17 14:22:40 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Michael Schurter	08862fc177	fix race around error handling	2018-09-05 17:34:17 -07:00
Preetha	043f4c208b	Merge pull request #3882 from burdandrei/telemetry-add-node-class-tag Added node class to tagged metrics	2018-06-21 17:04:35 -05:00
Alex Dadgar	b61051b3cd	Merge pull request #4409 from hashicorp/r-client-packages Refactor client packages	2018-06-13 17:32:25 -07:00
Alex Dadgar	90c2108bfb	Fix gc tests + parallel destroy + small test fixes	2018-06-12 10:23:45 -07:00
Alex Dadgar	f5ff509fa5	Refactor - wip	2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo	f74e74b22d	add client logic to determine whether TLS RPC connections should reload	2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo	064b5481e0	add server join info to server and client	2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo	38f611a7f2	refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing; add missing fields to TLS merge method	2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo	796bae6f1b	allow configurable cipher suites; disallow 3DES and RC4 ciphers; add documentation for tls_cipher_suites	2018-05-09 17:15:31 -04:00
Chelsea Holland Komlo	9b8a079558	fix up comments	2018-04-17 11:53:08 -04:00
Alex Dadgar	9d612c8cb0	Cleanup	2018-04-16 15:48:34 -07:00
Alex Dadgar	32adaf9dfc	Copy the config given to the alloc runner	2018-04-16 15:45:52 -07:00
Alex Dadgar	4f2a7b6949	Fix copying drivers	2018-04-16 15:45:51 -07:00
Alex Dadgar	0b799822ff	Operate on copy	2018-04-16 15:45:49 -07:00
Alex Dadgar	ff1a1a63e8	Move where attribute for driver detection is set	2018-04-12 15:50:25 -07:00
Alex Dadgar	f24ce2c50c	Driver health detection cleanups This PR does: 1. Health message based on detection has format "Driver XXX detected" and "Driver XXX not detected" 2. Set initial health description based on detection status and don't wait for the first health check. 3. Combine updating attributes on the node, fingerprint and health checking update for drivers into a single call back. 4. Condensed driver info in `node status` only shows detected drivers and make the output less wide by removing spaces.	2018-04-12 12:46:40 -07:00
Andrei Burd	502d17fa90	Added node class to tagged metrics	2018-04-11 12:20:59 +03:00
Alex Dadgar	3d367d6fd7	Fix client uptime metric missing client prefix	2018-04-10 10:39:36 -07:00
Alex Dadgar	ae1f76477e	Start rebalance after discovering new servers	2018-04-05 15:41:59 -07:00
Alex Dadgar	be2513e0f9	more jitter	2018-04-05 13:48:33 -07:00
Alex Dadgar	bd3345942c	Handle no leader and faster retries near limit Handle the ErrNoLeader case and apply slower retries. Also when we have missed the heartbeat retry aggressively, backing off after we have missed for more than 30 seconds.	2018-04-05 11:22:47 -07:00

511 commits