open-nomad

Author	SHA1	Message	Date
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. 
Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	22d87e4538	client fingerprint updateNetworks preserves the network configuration	2019-04-11 09:56:22 -04:00
Lang Martin	f211500cea	add client updateNodeResources to merge but preserve manual config	2019-04-11 09:56:22 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Michael Schurter	fec2752fb2	client: log when allocs have been processed Will hopefully help us catch deadlocks/livelocks/slowdowns in the add/remove allocs pipeline which should be fast.	2019-02-04 11:07:57 -08:00
Preetha Appan	e7b59ac08c	Only set deployment health if not already set	2019-01-12 10:38:20 -06:00
Michael Schurter	dbf4c3a3c8	Apply suggestions from code review Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-01-12 10:38:20 -06:00
Preetha Appan	7bd1440710	REfactor statedb factory config to set it directly in client config	2019-01-12 10:38:20 -06:00
Preetha Appan	e237f19b38	Remove invalid allocs	2019-01-12 10:38:20 -06:00
Preetha Appan	f059ef8a47	Modified destroy failure handling to rely on allocrunner's destroy method Added a unit test with custom statedb implementation that errors, to use to verify destroy errors	2019-01-12 10:37:12 -06:00
Preetha Appan	6c95da8f67	Add back code to mark alloc as failed when restore fails Also modify restore such that any handled errors don't propagate back to the client	2019-01-12 10:37:12 -06:00
Preetha Appan	5fde0b0f5c	Revert code that made an alloc update when restore fails Restore currently shuts down the client so the alloc update cant always make it to the server	2019-01-12 10:37:12 -06:00
Preetha Appan	41bfdd764b	Handle client initialization errors when adding allocs or restoring allocs We mark the alloc as failed and track failed allocs so that we don't send updates after the first time	2019-01-12 10:37:12 -06:00
Danielle Tomlinson	3e586e93da	client: Cleanup allocrunner access	2019-01-11 18:39:18 +01:00
Alex Dadgar	c9825a9c36	recover	2019-01-07 14:49:40 -08:00
Nick Ethier	a96afb6c91	fix tests that fail as a result of async client startup	2018-12-20 00:53:44 -05:00
Michael Schurter	d9ea8252a7	client/state: support upgrading from 0.8->0.9 Also persist and load DeploymentStatus to avoid rechecking health after client restarts.	2018-12-19 10:39:27 -08:00
Nick Ethier	a02308ee6a	drivermanager: attempt to reattach and shutdown driver plugin if blocked by allow/block lists	2018-12-18 23:01:57 -05:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Nick Ethier	d8a0265e68	client: batch initial fingerprinting in plugin manangers drivermanager: fix pr comments/feedback	2018-12-18 22:56:19 -05:00
Nick Ethier	7d23cbf448	client/drivermananger: fixup issues from rebase and address PR comments	2018-12-18 22:55:38 -05:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incomming fingerprint and task events from each driver. The mananger exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint mananger and has been removed.	2018-12-18 22:55:18 -05:00
Danielle Tomlinson	cb78a90f40	client: Async API for shutdown/destroy allocrunners	2018-12-18 23:38:33 +01:00
Danielle Tomlinson	d9174d8dcf	Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition client: Give a copy of clientconfig to allocrunner	2018-12-17 10:49:46 +01:00
Danielle Tomlinson	8b06e8d297	Merge pull request #4990 from hashicorp/dani/b-alloc-lock client: updateAlloc release lock after read	2018-12-13 12:43:59 +01:00
Danielle Tomlinson	3823599da9	client: Give a copy of clientconfig to allocrunner Currently, there is a race condition between creating a taskrunner, and updating node attributes via fingerprinting. This is because the taskenv builder will try to iterate over the clientconfig.Node.Attributes map, which can be concurrently updated by the fingerprinting process, thus causing a panic. This fixes that by providing a copy of the clientconfg to the allocrunner inside the Read lock during config creation.	2018-12-13 12:42:15 +01:00
Danielle Tomlinson	4184eadaf4	client: updateAlloc release lock after read The allocLock is used to synchronize access to the alloc runner map, not to ensure internal consistency of the alloc runners themselves. This updates the updateAlloc process to avoid hanging on to an exclusive lock of the map while applying changes to allocrunners themselves, as they should be internally consistent. This fixes a bug where any client allocation api will block during the shutdown or updating of an allocrunner and its child taskrunners.	2018-12-12 16:30:01 +01:00
Mahmood Ali	3d166e6e9c	Merge pull request #4984 from hashicorp/b-client-update-driver client: update driver info on new driver fingerprint	2018-12-11 18:01:03 -05:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	ba515947c2	client: update driver info on new fingerprint Fixes a bug where a driver health and attributes are never updated from their initial status. If a driver started unhealthy, it may never go into a healthy status.	2018-12-11 14:25:10 -05:00
Danielle Tomlinson	805669ead4	client: Correctly pass a noop PrevAllocMigrator when restoring	2018-12-11 15:46:58 +01:00
Danielle Tomlinson	83720575de	client: Unify handling of previous and preempted allocs	2018-12-11 13:12:35 +01:00
Danielle Tomlinson	dff7093243	client: Wait for preempted allocs to terminate When starting an allocation that is preempting other allocs, we create a new group allocation watcher, and then wait for the allocations to terminate in the allocation PreRun hooks. If there's no preempted allocations, then we simply provide a NoopAllocWatcher.	2018-12-11 00:59:18 +01:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Danielle Tomlinson	66c521ca17	client: Move fingerprint structs to pkg This removes a cyclical dependency when importing client/structs from dependencies of the plugin_loader, specifically, drivers. Due to client/config also depending on the plugin_loader. It also better reflects the ownership of fingerprint structs, as they are fairly internal to the fingerprint manager.	2018-12-01 17:10:39 +01:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Michael Schurter	1e4ef139dd	Merge pull request #4883 from hashicorp/f-graceful-shutdown Support graceful shutdowns in agent	2018-11-27 15:55:15 -06:00
Michael Schurter	4f7e6f9464	client: fix races in use of goroutine group The group utility struct does not support asynchronously launched goroutines (goroutines-inside-of-goroutines), so switch those uses to a normal go call. This means watchNodeUpdates and watchNodeEvents may not be shutdown when Shutdown() exits. During nomad agent shutdown this does not matter. During tests this means a test may leak those goroutines or be unable to know when those goroutines have exited. Since there's no runtime impact and these goroutines do not affect alloc state syncing it seems ok to risk leaking them.	2018-11-26 12:52:55 -08:00
Michael Schurter	9f43fb6d29	client: reuse group instead of diy'ing it	2018-11-26 12:52:31 -08:00
Michael Schurter	5bd744ac3d	client: support graceful shutdowns Client.Shutdown now blocks until all AllocRunners and TaskRunners have exited their Run loops. Tasks are left running.	2018-11-19 16:39:30 -08:00
Mahmood Ali	f139234372	address review comments	2018-11-16 17:13:01 -05:00
Mahmood Ali	f72e599ee7	Populate alloc stats API with device stats This change makes few compromises: * Looks up the devices associated with tasks at look up time. Given that `nomad alloc status` is called rarely generally (compared to stats telemetry and general job reporting), it seems fine. However, the lookup overhead grows bounded by number of `tasks x total-host-devices`, which can be significant. * `client.Client` performs the task devices->statistics lookup. It passes self to alloc/task runners so they can look up the device statistics allocated to them. * Currently alloc/task runners are responsible for constructing the entire RPC response for stats * The alternatives for making task runners device statistics aware don't seem appealing (e.g. having task runners contain reference to hostStats) * On the alloc aggregation resource usage, I did a naive merging of task device statistics. * Personally, I question the value of such aggregation, compared to costs of struct duplication and bloating the response - but opted to be consistent in the API. * With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.	2018-11-16 10:26:32 -05:00
Mahmood Ali	046f098bac	Track Node Device attributes and serve them in API	2018-11-14 14:42:29 -05:00
Mahmood Ali	b74ccc742c	Expose Device Stats in /client/stats API endpoint	2018-11-14 14:41:19 -05:00
Alex Dadgar	a7ca737fb6	review comments	2018-11-07 11:31:52 -08:00
Alex Dadgar	204ca8230c	Device manager Introduce a device manager that manages the lifecycle of device plugins on the client. It fingerprints, collects stats, and forwards Reserve requests to the correct plugin. The manager, also handles device plugins failing and validates their output.	2018-11-07 10:43:15 -08:00
Michael Schurter	b7a9d61a38	ar: initialize allocwatcher on restore Fixes a panic. Left a comment on how the behavior could be improved, but this is what releases <0.9.0 did.	2018-10-19 09:45:45 -07:00
Michael Schurter	e060174130	ar: fix leader handling, state restoring, and destroying unrun ARs * Migrated all of the old leader task tests and got them passing * Refactor and consolidate task killing code in AR to always kill leader tasks first * Fixed lots of issues with state restoring * Fixed deadlock in AR.Destroy if AR.Run had never been called * Added a new in memory statedb for testing	2018-10-19 09:45:45 -07:00
Nick Ethier	3183b33d24	client: review comments and fixup/skip tests	2018-10-16 16:56:56 -07:00
Nick Ethier	f192c3752a	client: refactor post allocrunnerv2 finalization	2018-10-16 16:56:56 -07:00
Nick Ethier	4a4c7dbbfc	client: begin driver plugin integration client: fingerprint driver plugins	2018-10-16 16:56:56 -07:00
Alex Dadgar	45e41cca03	allocrunnerv2 -> allocrunner	2018-10-16 16:56:56 -07:00
Alex Dadgar	6c9d9d5173	move files around	2018-10-16 16:56:55 -07:00
Michael Schurter	960f3be76c	client: expose task state to client The interesting decision in this commit was to expose AR's state and not a fully materialized Allocation struct. AR.clientAlloc builds an Alloc that contains the task state, so I considered simply memoizing and exposing that method. However, that would lead to AR having two awkwardly similar methods: - Alloc() - which returns the server-sent alloc - ClientAlloc() - which returns the fully materialized client alloc Since ClientAlloc() could be memoized it would be just as cheap to call as Alloc(), so why not replace Alloc() entirely? Replacing Alloc() entirely would require Update() to immediately materialize the task states on server-sent Allocs as there may have been local task state changes since the server received an Alloc update. This quickly becomes difficult to reason about: should Update hooks use the TaskStates? Are state changes caused by TR Update hooks immediately reflected in the Alloc? Should AR persist its copy of the Alloc? If so, are its TaskStates canonical or the TaskStates on TR? So! Forget that. Let's separate the static Allocation from the dynamic AR & TR state! - AR.Alloc() is for static Allocation access (often for the Job) - AR.AllocState() is for the dynamic AR & TR runtime state (deployment status, task states, etc). If code needs to know the status of a task: AllocState() If code needs to know the names of tasks: Alloc() It should be very easy for a developer to reason about which method they should call and what they can do with the return values.	2018-10-16 16:56:55 -07:00
Michael Schurter	8d1419c62b	client: fix accessing alloc runners * GetClientAlloc() gains nothing from using allAllocs() * getAllocatedResources was calling getAllocRunners() twice	2018-10-16 16:56:55 -07:00
Michael Schurter	e6e2930a00	tr: implement stats collection hook Tested except for the net/rpc specific error case which may need changing in the gRPC world.	2018-10-16 16:53:31 -07:00
Alex Dadgar	cebfead6bc	add logger back	2018-10-16 16:53:30 -07:00
Alex Dadgar	8504505c0d	client uses passed logger and fix fingerprinters	2018-10-16 16:53:30 -07:00
Michael Schurter	9d1ea3b228	client: hclog-ify most of the client Leaving fingerprinters in case that interface changes with plugins.	2018-10-16 16:53:30 -07:00
Michael Schurter	e42154fc46	implement stopping, destroying, and disk migration * Stopping an alloc is implemented via Updates but update hooks are not run. * Destroying an alloc is a best effort cleanup. * AllocRunner destroy hooks implemented. * Disk migration and blocking on a previous allocation exiting moved to its own package to avoid cycles. Now only depends on alloc broadcaster instead of also using a waitch. * AllocBroadcaster now only drops stale allocations and always keeps the latest version. * Made AllocDir safe for concurrent use Lots of internal contexts that are currently unused. Unsure if they should be used or removed.	2018-10-16 16:53:30 -07:00
Michael Schurter	4236255686	lots of comment/log fixes	2018-10-16 16:53:30 -07:00
Michael Schurter	357641c364	persist alloc state on changes, not periodically Allow alloc and task runners to persist their own state when something changes instead of periodically syncing all state.	2018-10-16 16:53:30 -07:00
Michael Schurter	a3fe0510d1	Move all encoding and put deduping into state db Still WIP as it does not handle deletions.	2018-10-16 16:53:30 -07:00
Michael Schurter	533bc93b3a	implement all boltdb interactions behind StateDB	2018-10-16 16:53:30 -07:00
Michael Schurter	a5d3e3fb0a	Implement alloc updates in arv2 Updates are applied asynchronously but sequentially	2018-10-16 16:53:30 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Michael Schurter	5be982e674	restore vault client	2018-10-16 16:53:29 -07:00
Alex Dadgar	fd3bc1bd39	Update state with server	2018-10-16 16:53:29 -07:00
Michael Schurter	7f4ec50906	missed locking around c.allocs access	2018-10-16 16:53:29 -07:00
Michael Schurter	516d641db0	client: implement all-or-nothing alloc restoration Restoring calls NewAR -> Restore -> Run NewAR now calls NewTR AR.Restore calls TR.Restore AR.Run calls TR.Run	2018-10-16 16:53:29 -07:00
Alex Dadgar	80f6ce50c0	vault hook	2018-10-16 16:53:29 -07:00
Michael Schurter	b360f6f96e	fix hclog level	2018-10-16 16:53:29 -07:00
Michael Schurter	4f43ff5c51	pass statedb into allocrunnerv2	2018-10-16 16:53:29 -07:00
Michael Schurter	0f7dcfdc9a	example redis job "runs" on arv2! see below Tons left to do and lots of churn: 1. No state saving 2. No shutdown or gc 3. Removed AR factory for now 4. Made all "Config" structs local to the package they configure 5. Added allocID to GC to avoid a lookup Really hating how many things use *structs.Allocation. It's not bad without state saving, but if AllocRunner starts updating its copy things get racy fast.	2018-10-16 16:53:29 -07:00
Alex Dadgar	01f8e5b95f	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	5c8697667e	Node reserved resources	2018-09-29 18:44:55 -07:00
Alex Dadgar	3183153315	Node resources on client	2018-09-29 17:23:41 -07:00
Alex Dadgar	9971b3393f	yamux	2018-09-17 14:22:40 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Michael Schurter	08862fc177	fix race around error handling	2018-09-05 17:34:17 -07:00
Preetha	043f4c208b	Merge pull request #3882 from burdandrei/telemetry-add-node-class-tag Added node class to tagged metrics	2018-06-21 17:04:35 -05:00
Alex Dadgar	b61051b3cd	Merge pull request #4409 from hashicorp/r-client-packages Refactor client packages	2018-06-13 17:32:25 -07:00
Alex Dadgar	90c2108bfb	Fix gc tests + parallel destroy + small test fixes	2018-06-12 10:23:45 -07:00
Alex Dadgar	f5ff509fa5	Refactor - wip	2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo	f74e74b22d	add client logic to determine whether TLS RPC connections should reload	2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo	064b5481e0	add server join info to server and client	2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo	38f611a7f2	refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing add missing fields to TLS merge method	2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo	796bae6f1b	allow configurable cipher suites disallow 3DES and RC4 ciphers add documentation for tls_cipher_suites	2018-05-09 17:15:31 -04:00
Chelsea Holland Komlo	9b8a079558	fix up comments	2018-04-17 11:53:08 -04:00
Alex Dadgar	9d612c8cb0	Cleanup	2018-04-16 15:48:34 -07:00
Alex Dadgar	32adaf9dfc	Copy the config given to the alloc runner	2018-04-16 15:45:52 -07:00
Alex Dadgar	4f2a7b6949	Fix copying drivers	2018-04-16 15:45:51 -07:00
Alex Dadgar	0b799822ff	Operate on copy	2018-04-16 15:45:49 -07:00
Alex Dadgar	ff1a1a63e8	Move where attribute for driver detection is set	2018-04-12 15:50:25 -07:00
Alex Dadgar	f24ce2c50c	Driver health detection cleanups This PR does: 1. Health message based on detection has format "Driver XXX detected" and "Driver XXX not detected" 2. Set initial health description based on detection status and don't wait for the first health check. 3. Combine updating attributes on the node, fingerprint and health checking update for drivers into a single call back. 4. Condensed driver info in `node status` only shows detected drivers and make the output less wide by removing spaces.	2018-04-12 12:46:40 -07:00
Andrei Burd	502d17fa90	Added node class to tagged metrics	2018-04-11 12:20:59 +03:00
Alex Dadgar	3d367d6fd7	Fix client uptime metric missing client prefix	2018-04-10 10:39:36 -07:00
Alex Dadgar	ae1f76477e	Start rebalance after discovering new servers	2018-04-05 15:41:59 -07:00
Alex Dadgar	be2513e0f9	more jitter	2018-04-05 13:48:33 -07:00
Alex Dadgar	bd3345942c	Handle no leader and faster retries near limit Handle the ErrNoLeader case and apply slower retries. Also when we have missed the heartbeat retry aggressively, backing off after we have missed for more than 30 seconds.	2018-04-05 11:22:47 -07:00
Alex Dadgar	279b5c22e5	Scale heartbeat retrying based on remaining heartbeat time	2018-04-05 10:58:13 -07:00
Alex Dadgar	7941f4eb2d	Fire retry only when consul discovers new servers	2018-04-05 10:40:17 -07:00
Alex Dadgar	86c32358d4	Spelling error	2018-04-03 18:30:01 -07:00
Alex Dadgar	01a6beafbf	RPC Retry Watcher	2018-04-03 18:05:28 -07:00
Alex Dadgar	58a3ec3fb2	Improve Vault error handling	2018-04-03 14:29:22 -07:00
Chelsea Holland Komlo	2174ede6b9	add clarifying comment	2018-03-29 10:58:39 -04:00
Chelsea Holland Komlo	e3319afee1	emit first node event	2018-03-28 17:26:53 -04:00
Chelsea Holland Komlo	efc03e252c	specify driver health messages	2018-03-28 11:35:21 -04:00
Chelsea Holland Komlo	003bc209b9	use time.Time for node events for compatibility	2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo	f801709a0a	fix issue when updating node events	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	60f12d206f	improve comments; update watchDriver	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	739784736a	remove unused function	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	d92703617c	simplify logic bump log level	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	86b7b3d2d9	fix up health check logic comparison; add node events to client driver checks	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	53a5bc2bb3	Code review feedback	2018-03-21 15:15:26 -04:00
Alex Dadgar	34dc58421c	notes from walk through	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	44b6951dda	improve tests	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	0425be8f48	updating comments; locking concurrent node access	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	c50d02ae93	go style; update comments	2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo	3aa726baab	fix scheduler driver name; create node structs file	2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo	3cba95e8a7	allow nomad to schedule based on the status of a client driver health check Slight updates for go style	2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo	0bde357731	add concept of health checks to fingerprinters and nodes fix up feedback from code review add driver info for all drivers to node	2018-03-21 15:15:25 -04:00
Preetha Appan	3c38eededd	Fix spelling in comment	2018-03-14 15:54:25 -05:00
Alex Dadgar	bef4a8ee09	fix clearing node events	2018-03-14 09:48:59 -07:00
Chelsea Komlo	810eedfa2a	Merge pull request #3945 from hashicorp/f-add-node-events Add node events	2018-03-14 08:42:55 -04:00
Preetha	360d6e5a92	Merge pull request #3968 from hashicorp/f-nicer-vault-error Make server side error messages from vault more clearer	2018-03-13 20:49:39 -05:00
Alex Dadgar	de6ebb6e6c	small cleanup	2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo	b41501e442	code review feedback	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	1488b076d1	code review feedback	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	a8655320fd	fix up go check warnings	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	0934769b04	add client side emitting of node events Changelog	2018-03-13 18:08:21 -07:00
Preetha Appan	914eaed64f	Address some code review comments	2018-03-13 18:19:16 -05:00
Preetha Appan	09c231ce43	Return the err from server correctly	2018-03-13 18:10:14 -05:00
Preetha Appan	9618f52746	Remove error wrapping and make vault connection server side errors clearer.	2018-03-13 17:09:03 -05:00
Alex Dadgar	4844317cc2	Merge pull request #3890 from hashicorp/b-heartbeat Heartbeat improvements and handling failures during establishing leadership	2018-03-12 14:41:59 -07:00
Josh Soref	173ce63fe9	spelling: transition	2018-03-11 19:06:05 +00:00
Josh Soref	782c704de6	spelling: thresholds	2018-03-11 19:03:47 +00:00
Josh Soref	8149694f3a	spelling: server	2018-03-11 18:55:30 +00:00
Josh Soref	258d76ec13	spelling: registry	2018-03-11 18:41:13 +00:00
Josh Soref	3c1ce6d16d	spelling: otherwise	2018-03-11 18:34:27 +00:00
Josh Soref	1ef6d6319e	spelling: labels	2018-03-11 18:21:44 +00:00
Josh Soref	52b83328fc	spelling: heartbeating	2018-03-11 18:12:19 +00:00
Josh Soref	c9b86bbc2f	spelling: controls	2018-03-11 17:50:39 +00:00
Josh Soref	e78cf9c81a	spelling: already	2018-03-11 17:39:04 +00:00
Josh Soref	b8b46d3f74	spelling: allocation	2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo	122d1c4e4a	simplify retry logic	2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo	355805db56	reset timer after updating node copy	2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo	a72aaaf47f	add network resources equal method, use time ticker remove impossible test case	2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo	e736e31820	use time ticker, update how network resources are compared	2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo	5059065b52	improved testing; node networks comparison	2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo	1f31b39fe8	code review fixups	2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo	ed8c8afbcd	edge trigger node update test update config copy trigger	2018-02-26 12:36:04 -05:00
Alex Dadgar	49a47483d1	Registering back to initializing Fix a bug in which if the node attributes/meta changed, we would re-register the node in status initializing. This would incorrectly trigger the client to log that it missed its heartbeat. It would change the status of the Node to initializing until the next heartbeat occured.	2018-02-16 17:49:31 -08:00
Alex Dadgar	eff4455c68	Fix original client server list behavior	2018-02-15 16:04:53 -08:00
Alex Dadgar	f9cf642436	Client tls	2018-02-15 15:22:57 -08:00
Alex Dadgar	e685211892	Code review feedback	2018-02-15 13:59:02 -08:00
Alex Dadgar	2c0ad26374	New RPC Modes and basic setup for streaming RPC handlers	2018-02-15 13:59:01 -08:00
Alex Dadgar	9bc75f0ad4	Fix manager tests and make testagent recover from port conflicts	2018-02-15 13:59:01 -08:00
Alex Dadgar	3f1f8604bb	initial round of comment review	2018-02-15 13:59:01 -08:00
Alex Dadgar	c8c1284bc3	SetServer command actually returns an error if given an invalid server	2018-02-15 13:59:01 -08:00
Alex Dadgar	3f786b904b	use server manager	2018-02-15 13:59:01 -08:00
Alex Dadgar	6dd1c9f49d	Refactor	2018-02-15 13:59:00 -08:00
Alex Dadgar	1472b943d6	Stats Endpoint	2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo	4a26959825	code review feedback	2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo	d626d24488	remove dependency on client for fingerprint manager	2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo	e012e5ab8a	add fingerprint manager	2018-02-07 18:10:33 -05:00
Chelsea Holland Komlo	b21233fe23	update log message	2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo	6f9c0ab361	req/resp should be within config locks; rename for detected fingerprints changelog	2018-02-01 19:00:39 -05:00
Chelsea Holland Komlo	b8e8064835	code review fixup	2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo	7b53474a6e	add applicable boolean to fingerprint response public fields and remove getter functions	2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo	9482c322b7	locks for fingerprint reads/writes	2018-01-30 11:32:45 -05:00
Chelsea Holland Komlo	7c19de797c	create safe getters and setters for fingerprint response	2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo	896d6f8058	fixups from code review	2018-01-26 07:04:32 -05:00
Chelsea Holland Komlo	9a8344333b	refactor Fingerprint to request/response construct	2018-01-24 11:54:02 -05:00
Chelsea Holland Komlo	649f86f094	refactor creating a new tls configuration	2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo	6c9f9c8ac3	adding additional test assertions; differentiate reloading agent and http server	2018-01-16 07:34:39 -05:00
Chelsea Holland Komlo	214d128eb9	reload raft transport layer fix up linting	2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo	0708d34135	call reload on agent, client, and server separately	2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo	9741097406	reloading tls config should be atomic for clients/servers	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	ae7fc4695e	fixups from code review Revert "close raft long-lived connections" This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625. reload raft connections on changing tls	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	acd3d1b162	fix up downgrading client to plaintext add locks around changing server configuration	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	c0ad9a4627	add ability to upgrade/downgrade nomad agents tls configurations via sighup	2018-01-08 09:21:06 -05:00
Alex Dadgar	91ffbbb517	Review feedback	2017-12-07 16:10:57 -08:00
Alex Dadgar	02baa6c52b	Handle race between fingerprinters and registration	2017-12-07 13:09:37 -08:00
Alex Dadgar	4409fdacc0	Drop trace logging	2017-12-06 18:02:24 -08:00
Alex Dadgar	cd9a7f14b8	Add logging around heartbeats	2017-12-06 17:57:50 -08:00
Chelsea Komlo	2dfda33703	Nomad agent reload TLS configuration on SIGHUP (#3479 ) * Allow server TLS configuration to be reloaded via SIGHUP * dynamic tls reloading for nomad agents * code cleanup and refactoring * ensure keyloader is initialized, add comments * allow downgrading from TLS * initalize keyloader if necessary * integration test for tls reload * fix up test to assert success on reloaded TLS configuration * failure in loading a new TLS config should remain at current Reload only the config if agent is already using TLS * reload agent configuration before specific server/client lock keyloader before loading/caching a new certificate * introduce a get-or-set method for keyloader * fixups from code review * fix up linting errors * fixups from code review * add lock for config updates; improve copy of tls config * GetCertificate only reloads certificates dynamically for the server * config updates/copies should be on agent * improve http integration test * simplify agent reloading storing a local copy of config * reuse the same keyloader when reloading * Test that server and client get reloaded but keep keyloader * Keyloader exposes GetClientCertificate as well for outgoing connections * Fix spelling * correct changelog style	2017-11-14 17:53:23 -08:00
Michael Schurter	1769db98b7	Fix regression by returning error on unknown alloc	2017-11-01 15:16:38 -05:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00

713 commits