open-nomad

Author	SHA1	Message	Date
Lang Martin	a4784ef258	csi add allocation context to fingerprinting results (#7133) * structs: CSIInfo include AllocID, CSIPlugins no Jobs * state_store: eliminate plugin Jobs, delete an empty plugin * nomad/structs/csi: detect empty plugins correctly * client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID * client/pluginmanager/csimanager/instance: allocID * client/pluginmanager/csimanager/fingerprint: set AllocID * client/node_updater: split controller and node plugins * api/csi: remove Jobs The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly. * nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint * client/dynamicplugins: lift AllocID into the struct from Options * api/csi_test: remove Jobs test * nomad/structs/csi: CSIPlugins has an array of allocs * nomad/state/state_store: implement CSIPluginDenormalize * nomad/state/state_store: CSIPluginDenormalize npe on missing alloc * nomad/csi_endpoint_test: defer deleteNodes for clarity * api/csi_test: disable this test awaiting mocks: https://github.com/hashicorp/nomad/issues/7123	2020-03-23 13:58:30 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via `Node.UpdateAlloc` RPC. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00
Lang Martin	7b675f89ac	csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049) * state_store: csi volumes/plugins store the index in the txn * nomad: csi_endpoint_test require index checks need uint64() * nomad: other tests using int 0 not uint64(0) * structs: pass index into New, but not other struct methods * state_store: csi plugin indexes, use new struct interface * nomad: csi_endpoint_test check index/query meta (on explicit 0) * structs: NewCSIVolume takes an index arg now * scheduler/test: NewCSIVolume takes an index arg now	2020-03-23 13:58:29 -04:00
Lang Martin	a0a6766740	CSI: Scheduler knows about CSI constraints and availability (#6995) * structs: piggyback csi volumes on host volumes for job specs * state_store: CSIVolumeByID always includes plugins, matches usecase * scheduler/feasible: csi volume checker * scheduler/stack: add csi volumes * contributing: update rpc checklist * scheduler: add volumes to State interface * scheduler/feasible: introduce new checker collection tgAvailable * scheduler/stack: taskGroupCSIVolumes checker is transient * state_store CSIVolumeDenormalizePlugins comment clarity * structs: remove TODO comment in TaskGroup Validate * scheduler/feasible: CSIVolumeChecker hasPlugins improve comment * scheduler/feasible_test: set t.Parallel * Update nomad/state/state_store.go Co-Authored-By: Danielle <dani@hashicorp.com> * Update scheduler/feasible.go Co-Authored-By: Danielle <dani@hashicorp.com> * structs: lift ControllerRequired to each volume * state_store: store plug.ControllerRequired, use it for volume health * feasible: csi match fast path remove stale host volume copied logic * scheduler/feasible: improve comments Co-authored-by: Danielle <dani@builds.terrible.systems>	2020-03-23 13:58:29 -04:00
Lang Martin	88316208a0	csi: server-side plugin state tracking and api (#6966) * structs: CSIPlugin indexes jobs acting as plugins and node updates * schema: csi_plugins table for CSIPlugin * nomad: csi_endpoint use vol.Denormalize, plugin requests * nomad: csi_volume_endpoint: rename to csi_endpoint * agent: add CSI plugin endpoints * state_store_test: use generated ids to avoid t.Parallel conflicts * contributing: add note about registering new RPC structs * command: agent http register plugin lists * api: CSI plugin queries, ControllerHealthy -> ControllersHealthy * state_store: copy on write for volumes and plugins * structs: copy on write for volumes and plugins * state_store: CSIVolumeByID returns an unhealthy volume, denormalize * nomad: csi_endpoint use CSIVolumeDenormalizePlugins * structs: remove struct errors for missing objects * nomad: csi_endpoint return nil for missing objects, not errors * api: return meta from Register to avoid EOF error * state_store: CSIVolumeDenormalize keep allocs in their own maps * state_store: CSIVolumeDeregister error on missing volume * state_store: CSIVolumeRegister set indexes * nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests	2020-03-23 13:58:29 -04:00
Lang Martin	5b31b140c3	csi: do not use namespace specific identifiers	2020-03-23 13:58:29 -04:00
Lang Martin	4bb4dd98eb	state_store: CSIVolume insert, get, delete, claim state_store: change claim counts state_store: get volumes by all, by driver state_store: process volume claims state_store: csi volume register error on update	2020-03-23 13:58:29 -04:00
Lang Martin	0422b967db	schema: csi_volumes schema	2020-03-23 13:58:29 -04:00
Mahmood Ali	f492ab6d9e	implement MinQuorum	2020-02-16 16:04:59 -06:00
Drew Bailey	6bd6c6638c	include pro tag in several oss.go files	2020-02-10 15:56:14 -05:00
Drew Bailey	9a65556211	add state store test to ensure PlacedCanaries is updated	2020-02-03 13:58:01 -05:00
Drew Bailey	f51a3d1f37	nomad state store must be modified through raft, rm local state change	2020-02-03 13:57:34 -05:00
Drew Bailey	74779f23e6	keep placed canaries aligned with alloc status	2020-02-03 13:57:33 -05:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to receive RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	2b66ce93bb	nomad: ensure a unique ClusterID exists when leader (gh-6702) Enable any Server to lookup the unique ClusterID. If one has not been generated, and this node is the leader, generate a UUID and attempt to apply it through raft. The value is not yet used anywhere in this changeset, but is a prerequisite for gh-6701.	2020-01-31 19:03:26 -06:00
Mahmood Ali	bfa33cf471	canonicalize allocs from plan results too	2020-01-10 10:41:12 -05:00
Michael Schurter	92cdc9de01	nomad/state: remove dead upgrade path code It is uncalled so there should be no runtime changes.	2019-12-20 11:10:22 -08:00
Seth Hoenig	d45dec1ca8	tests: parallelize state store tests It has been decided we're going to live in a many core world. Let's take advantage of that and parallelize these state store tests which all run in memory and are largely CPU bound. An unscientific benchmark demonstrating the improvement: [mp state (master)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 5.162s [mp state (f-parallelize-state-store-tests)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 1.527s	2019-12-11 09:36:37 -06:00
Mahmood Ali	02e20c720b	acl_endpoint: permission denied for unauthenticated requests If ACL Request is unauthenticated, we should honor the anonymous token. This PR makes a few changes: * `GetPolicy` endpoints may return policy if anonymous policy allows it, or return permission denied otherwise. * `ListPolicies` returns an empty policy list, or one with anonymous policy if one exists. Without this PR, we return an incomprehensible error. Before: ``` $ curl http://localhost:4646/v1/acl/policy/doesntexist; echo acl token lookup failed: index error: UUID must be 36 characters $ curl http://localhost:4646/v1/acl/policies; echo acl token lookup failed: index error: UUID must be 36 characters ``` After: ``` $ curl http://localhost:4646/v1/acl/policy/doesntexist; echo Permission denied $ curl http://localhost:4646/v1/acl/policies; echo [] ```	2019-11-22 08:43:09 -05:00
Michael Schurter	81b4b6f19b	Merge pull request #5791 from hashicorp/b-plan-snapshotindex nomad: include snapshot index when submitting plans	2019-07-17 09:25:00 -07:00
Lang Martin	a8e72a5b68	state_store error if called without node_ids	2019-07-10 13:56:20 -04:00
Lang Martin	8e53c105fc	state_store just one index update, test deletion	2019-07-10 13:56:19 -04:00
Lang Martin	5a6a947e98	state_store improve error messages	2019-07-10 13:56:19 -04:00
Lang Martin	be2d6853cb	state_store DeleteNode operates on a batch of ids	2019-07-10 13:56:19 -04:00
Preetha Appan	3cb798235d	Missed one revert of backwards compatibility for node drain	2019-07-01 16:46:05 -05:00
Preetha Appan	aa2b4b4e00	Undo removal of node drain compat changes Decided to remove that in 0.10	2019-07-01 15:12:01 -05:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	23319e04d6	Restore accidentally deleted block	2019-06-26 13:59:14 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Michael Schurter	e4bc943a68	nomad: SnapshotAfter -> SnapshotMinIndex Rename SnapshotAfter to SnapshotMinIndex. The old name was not technically accurate. SnapshotAtOrAfter is more accurate, but wordy and still lacks context about what precisely it is at or after (the index). SnapshotMinIndex was chosen as it describes the action (snapshot), a constraint (minimum), and the object of the constraint (index).	2019-06-24 12:16:46 -07:00
Mahmood Ali	07f2c77c44	comment DenormalizeAllocationDiffSlice applies to terminal allocs only	2019-06-12 08:28:43 -04:00
Mahmood Ali	392f5bac44	Stop updating allocs.Job on stopping or preemption	2019-06-10 18:30:20 -04:00
Mahmood Ali	6c8e329819	test that stopped alloc jobs aren't modified When an alloc is stopped, test that we don't update the job found in alloc with a new job that is no longer relevant for this alloc.	2019-06-10 17:14:26 -04:00
Mahmood Ali	87173111de	Merge pull request #5746 from hashicorp/b-no-updating-inmem-node set node.StatusUpdatedAt in raft	2019-06-05 19:05:21 -04:00
Lang Martin	0c403eafde	state_store typo in a comment	2019-05-22 12:32:08 -04:00
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Michael Schurter	1bc731da47	nomad: remove unused NotifyGroup struct I don't think it's been used for a long time.	2019-05-17 13:30:23 -07:00
Michael Schurter	9732bc37ff	nomad: refactor waitForIndex into SnapshotAfter Generalize wait for index logic in the state store for reuse elsewhere. Also begin plumbing in a context to combine handling of timeouts and shutdown.	2019-05-17 13:30:23 -07:00
Preetha	c8fdf20c66	Merge pull request #5717 from hashicorp/b-plan-apply-preemptions Fix bug in plan applier introduced in PR-5602	2019-05-16 11:01:05 -05:00
Preetha	555dd23c2c	remove stray newline Co-Authored-By: Danielle <dani@builds.terrible.systems>	2019-05-15 21:11:52 -05:00
Preetha Appan	2b787aad7e	Fix bug in plan applier introduced in PR-5602 This fixes a bug in the state store during plan apply. When denormalizing preempted allocations it incorrectly set the preemptor's job during the update. This eventually causes a panic downstream in the client. Added a test assertion that failed before and passes after this fix	2019-05-15 20:34:06 -05:00
Preetha Appan	d448750449	Lookup job only once, and fix tests	2019-05-13 18:33:41 -05:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Arshneet Singh	9cc39edb67	Return error when preempted/stopped alloc doesn't exist during denormalization	2019-04-24 12:36:07 -07:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	510d7839e4	code review comments	2019-01-18 17:41:39 -06:00
Preetha Appan	be9656d195	fix linting	2019-01-17 15:36:33 -06:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more efficiently also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	be36fee48e	Use IsParameterized/isPeriodic methods	2019-01-17 12:15:42 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Preetha Appan	8656d3379f	Add guards around subtracting summary count	2018-12-03 11:16:35 -06:00
Mahmood Ali	6281700c0c	address review comments	2018-11-20 13:21:39 -05:00
Mahmood Ali	d744e71fa9	add a missing no-error assertion	2018-11-19 21:44:00 -05:00
Mahmood Ali	b93643cd96	Fix a panic related to batch GC `deleteJobVersions` does concurrent modifications to iterated items while iterating, by deleting job versions while it's iterating on them,	2018-11-19 20:59:45 -05:00
Mahmood Ali	bff9c3b3e9	Reproduce a panic related to batch GC Test case that reproduces a panic with the following stacktrace: ``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715] goroutine 35 [running]: testing.tRunner.func1(0xc0001e2200) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387 panic(0x167e400, 0x1c43a30) /usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(radixIterator).Next(0xc0003a0420, 0x1756059, 0xb) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e github.com/hashicorp/nomad/nomad/state.(StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1 github.com/hashicorp/nomad/nomad/state.(StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2 github.com/hashicorp/nomad/nomad/state.(StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79 github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685 testing.tRunner(0xc0001e2200, 0x1777138) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf created by testing.(T).Run /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353 ```	2018-11-19 20:58:32 -05:00
Mahmood Ali	a4a9347501	fix comment typos	2018-11-14 08:36:14 -05:00
Mahmood Ali	1403ad21b9	Changelog job re-run fix	2018-11-13 07:52:51 -05:00
Mahmood Ali	e2d668f21c	Merge pull request #4861 from hashicorp/b-batch-deregister-transaction Run job deregistering in a single transaction	2018-11-12 20:59:44 -05:00
Mahmood Ali	8513b3cccb	Comment public functions and batch write txn	2018-11-12 16:09:39 -05:00
Preetha Appan	7ef126a027	Smaller methods, and added tests for RPC layer	2018-11-10 17:37:33 -06:00
Mahmood Ali	9c0a15f3ce	Run job deregistering in a single transaction Fixes https://github.com/hashicorp/nomad/issues/4299 Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals. Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, applying any single job deregistration accidentally incremented the snapshot index and resulted in processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation. This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking the processing of related evals until deregistering is complete.	2018-11-09 22:35:26 -05:00
Alex Dadgar	98398a8a44	Merge pull request #4842 from hashicorp/b-deployment-progress-deadline Fix multiple bugs with progress deadline handling	2018-11-08 13:31:54 -08:00
Alex Dadgar	261aae32b1	more robust merging of the deployment status when getting updates from the client	2018-11-05 16:39:09 -08:00
Alex Dadgar	1c31970464	Fix multiple tgs with progress deadline handling Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of if the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it can not be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.	2018-11-05 16:06:17 -08:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	8f7eb61823	Introduce a response object for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	1415032c13	More review comments	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	eb38488d08	Fix logic bug, unit test for plan apply method in state store	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	b2449ae1ce	Fix deployment watcher index usage Fixes three issues: 1. Retrieving the latest evaluation index was not properly selecting the greatest index. This would undermine checks we had to reduce the number of evaluations created when the latest eval index was greater than any alloc change 2. Fix an issue where the blocking query code was using the incorrect index such that the index was higher than necessary. 3. Special case handling of blocked evaluation since the create/snapshot index is not particularly useful since they can be reblocked.	2018-09-21 13:59:11 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Alex Dadgar	21c5ed850d	Register events	2018-05-22 14:06:33 -07:00
Alex Dadgar	17aac1c9de	node heartbeat missed event	2018-05-22 14:05:46 -07:00
Alex Dadgar	5f2080bc26	Emit events based on eligibility	2018-05-22 14:04:59 -07:00
Alex Dadgar	a35248d1d8	Plumb event via FSM	2018-05-10 16:30:54 -07:00
Preetha Appan	cba13e4ec5	Fix test set up to set ModifyTime for alloc	2018-05-07 14:55:01 -05:00
Preetha	02d63432b4	Fix typo	2018-05-07 14:55:01 -05:00
Alex Dadgar	738056634e	Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment	2018-05-07 14:55:01 -05:00
Alex Dadgar	319763a5d8	remove unnecessary merge of DeploymentStatus.Timestamp	2018-05-07 14:50:01 -05:00
Alex Dadgar	f95ab4ade8	Mark canaries on creation, and unmark on promotion	2018-05-07 14:50:01 -05:00
Alex Dadgar	641ef81cbf	Test fixes	2018-05-07 14:50:01 -05:00
Alex Dadgar	99e00fb774	Pass through timestamp	2018-05-07 14:50:01 -05:00
Alex Dadgar	1336002255	Progress deadline in deployment state	2018-05-07 14:50:01 -05:00
Preetha Appan	52b3b53181	Update ModifyIndex of alloc when setting NextAllocation value	2018-05-03 17:04:36 -05:00
Michael Schurter	91b5bb58d9	add HasHealth helper for nil checks We performed the DeploymentStatus nil checks a couple different ways, so hopefully this helper will consolidate them and make it more clear what the code is doing.	2018-03-29 09:29:19 -07:00
Chelsea Holland Komlo	31557cc44f	move tests to use time.Time	2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo	003bc209b9	use time.Time for node events for compatibility	2018-03-27 15:43:57 -04:00
Michael Schurter	cb61a4bdc7	Fix linting errors	2018-03-21 16:51:45 -07:00
Alex Dadgar	2d91b9dfba	Batch drain update	2018-03-21 16:51:44 -07:00
Alex Dadgar	7b2bad8c5e	Toggle Drain allows resetting eligibility This PR allows marking a node as eligible for scheduling while toggling drain. By default the `nomad node drain -disable` command will mark it as eligible but the drainer will maintain in-eligibility.	2018-03-21 16:51:44 -07:00
Alex Dadgar	4754366640	job watcher	2018-03-21 16:51:44 -07:00
Alex Dadgar	93871c18f8	Fix retaining the drain	2018-03-21 16:51:44 -07:00
Alex Dadgar	0fba0101b6	RPC/FSM/State Store for Eligibility	2018-03-21 16:51:44 -07:00
Alex Dadgar	2f5309d82a	Remove update time	2018-03-21 16:51:43 -07:00
Alex Dadgar	0965c9ed28	Fix tests	2018-03-21 16:51:43 -07:00
Alex Dadgar	e459a666ed	Node.Drain takes strategy	2018-03-21 16:49:48 -07:00
Michael Schurter	03d0e5b8a0	improve drain fsm/statestore tests	2018-03-21 16:49:48 -07:00
Michael Schurter	d1ec65d765	switch to new raft DesiredTransition message	2018-03-21 16:49:48 -07:00
Alex Dadgar	db4a634072	RPC, FSM, State Store for marking DesiredTransition fix build tag	2018-03-21 16:49:48 -07:00
Michael Schurter	c0542474db	drain: initial drainv2 structs and impl	2018-03-21 16:49:48 -07:00
Alex Dadgar	de6ebb6e6c	small cleanup	2018-03-13 18:08:22 -07:00
Alex Dadgar	63e14b7d63	nodeevents -> events	2018-03-13 18:08:22 -07:00
Alex Dadgar	d3c3deffad	fixes	2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo	b41501e442	code review feedback	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	8f109c344c	make check fixes	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	1488b076d1	code review feedback	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	19ef872769	keep state store functions in one file	2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo	a8bcbd81e6	batch submitting node events	2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo	d30c269fbe	code review feedback	2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo	00d9923454	Ensure node updates don't strip node events Add node events to CLI	2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo	93b732f97e	move adding node registration event to the state store	2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo	e7e4a31f5d	fix up error logging	2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo	4ede27a3c8	RPC, FSM, state store for Node.EmitEvent add node event when registering a node for the first time	2018-03-13 18:05:40 -07:00
Michael Schurter	7dd7fbcda2	non-Existent -> nonexistent Reverting from #3963 https://www.merriam-webster.com/dictionary/existent	2018-03-12 11:59:33 -07:00
Josh Soref	64546b4f99	spelling: referenced	2018-03-11 18:40:45 +00:00
Josh Soref	7ab998803b	spelling: periodic	2018-03-11 18:37:05 +00:00
Josh Soref	7f6e4012a0	spelling: existent	2018-03-11 18:30:37 +00:00
Kyle Havlovitz	2ccf565bf6	Refactor redundancy_zone/upgrade_version out of client meta	2018-01-29 20:03:38 -08:00
Preetha Appan	288ff0b6f0	Add test case to verify setting next alloc id correctly	2018-01-24 17:55:29 -06:00
Preetha Appan	fd2fbefa4c	Add a field to track the next allocation during a replacement	2018-01-24 17:55:05 -06:00
Kyle Havlovitz	8d41f4ad40	Formatting/test adjustments	2018-01-18 15:03:35 -08:00
Kyle Havlovitz	12ff22ea70	Merge branch 'master' into autopilot	2018-01-18 13:29:25 -08:00
Preetha Appan	d788c0464c	Clean up error logging	2017-12-18 17:56:12 -06:00
Alex Dadgar	1791cc3ca5	Handle upgrade path	2017-12-18 15:51:35 -08:00
Kyle Havlovitz	1c07066064	Add autopilot functionality based on Consul's autopilot	2017-12-18 14:29:41 -08:00
Preetha Appan	40cb1d327c	Address some code review comments	2017-12-18 15:22:23 -06:00
Preetha Appan	51bd0b59c7	Return an error if evaluation doesn't exist in state store at plan apply time.	2017-12-18 14:55:36 -06:00
Preetha Appan	3c36abfe14	Update eval modify index as part of plan apply.	2017-12-18 10:03:55 -06:00
Preetha Appan	39d70be009	Add ModifyTime to Allocation and update it both on plan applies and client initiated updates	2017-11-01 15:13:48 -05:00
Alex Dadgar	5d9db4c2df	Bypass status checks for system, periodic, parameterized jobs	2017-10-27 09:34:50 -07:00
Alex Dadgar	f4aa5ea0c7	lax timing	2017-10-24 10:58:06 -07:00
Alex Dadgar	1192385c63	Lax blocking query test timing	2017-10-20 13:07:17 -07:00
Alex Dadgar	c1cc51dbee	sync	2017-10-13 14:36:02 -07:00
Michael Schurter	dfd2967cdb	Merge pull request #3376 from hashicorp/f-node-acls Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc	2017-10-13 11:51:48 -07:00
Michael Schurter	a003e3dd43	Add StateStore.NodeBySecretID	2017-10-12 15:27:29 -07:00
Michael Schurter	51bce7b1a3	Add index to Node.SecretID	2017-10-12 15:21:20 -07:00
Alex Dadgar	e7e18c931c	Fix sorting of job versions Fixes an issue in which the versions were improperly sorted which would cause pruning of the wrong job version. This essentially meant that job versions above 255 would be dropped from the job version table (note this was due to the prefix walk crossing from the 1-byte to 2-byte threshold). Fixes https://github.com/hashicorp/nomad/issues/3357	2017-10-12 13:33:55 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Alex Dadgar	828c4abc44	Fix upgrading from 0.6.x to 0.7.0	2017-09-19 10:28:14 -05:00
Alex Dadgar	e5ec915ac3	sync	2017-09-19 10:08:23 -05:00
Armon Dadgar	20a8e590a0	nomad: support ACL bootstrap reset	2017-09-10 16:03:30 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Armon Dadgar	10500c39e5	nomad: fixing test	2017-09-04 13:21:01 -07:00
Armon Dadgar	97404e3f8c	nomad: compute hash for ACL policies and tokens	2017-09-04 13:09:34 -07:00
Armon Dadgar	1ace912341	nomad: adding bootstrapping checks	2017-09-04 13:05:53 -07:00
Armon Dadgar	06a7f12fad	nomad: adding bootstrap state store method	2017-09-04 13:05:53 -07:00
Armon Dadgar	583a11cebd	nomad: Adding ability to filter list of tokens to global only	2017-09-04 13:04:45 -07:00
Armon Dadgar	bc697dc50e	Address @dadgar feedback	2017-09-04 13:04:45 -07:00
Armon Dadgar	f91d2608cb	nomad: rename PublicID to AccessorID for consistency	2017-09-04 13:04:45 -07:00
Armon Dadgar	a17991e907	nomad: CRUD methods for ACLTokens	2017-09-04 13:04:45 -07:00
Armon Dadgar	8623bf9a5b	nomad: adding ACLToken table	2017-09-04 13:04:45 -07:00
Armon Dadgar	cde8e9301b	nomad: fixing state store tests due to signature mismatch	2017-09-04 13:04:44 -07:00
Armon Dadgar	351afa0069	nomad: Upsert and Delete ACL policies can take a list	2017-09-04 13:03:14 -07:00
Armon Dadgar	4cb544e8f3	nomad: Adding CRUD to state store for ACL Policies	2017-09-04 13:03:14 -07:00
Armon Dadgar	85cad11885	nomad: adding policy table to state store	2017-09-04 13:03:14 -07:00
Alex Dadgar	4cc8bac48d	fix blocking query due to ctx change	2017-08-31 15:34:55 -07:00
Alex Dadgar	590ff91bf3	Deployment watcher takes state store	2017-08-30 18:51:59 -07:00
Alex Dadgar	dfcb73c896	Fix purging job versions This PR fixes an issue in which the job versions weren't properly cleaned when removing a job. Fixes https://github.com/hashicorp/nomad/issues/3052	2017-08-18 15:46:03 -07:00
Luke Farnell	f0ced87b95	fixed all spelling mistakes for goreport	2017-08-07 17:13:05 -04:00
Alex Dadgar	5e98c3ce95	Expose FSM errors into deployment watcher and API This PR exposes errors returned by the FSM to the deployment watcher and thus the API. It also adds an error to handle the case of promoting a deployment that has no eligible canaries.	2017-07-25 16:23:22 -07:00
Alex Dadgar	311084c724	Allow the deployment to not exist and just no-op	2017-07-17 14:09:59 -07:00
Alex Dadgar	3a29b38108	Status description shows requiring promotion	2017-07-07 12:12:48 -07:00
Alex Dadgar	08bf34f9a3	Fix JobModifyIndex changing when job is marked stable	2017-07-07 12:12:48 -07:00
Alex Dadgar	de54ffd1f6	Deployment from inplace updates tracks placed properly.	2017-07-07 12:10:04 -07:00
Alex Dadgar	a7fdc74bd4	feedback	2017-07-07 12:10:04 -07:00
Alex Dadgar	5457bb7962	Job stability	2017-07-07 12:10:04 -07:00
Alex Dadgar	d07a5a2008	Complete deployments mark jobs as stable This PR allows jobs to be marked as stable automatically by a successful deployment.	2017-07-07 12:10:04 -07:00
Alex Dadgar	454083ba1b	Remove canary	2017-07-07 12:10:04 -07:00
Alex Dadgar	c10d7ab871	Remove promoted bit from allocation	2017-07-07 12:10:04 -07:00
Alex Dadgar	09dfa2fc10	Rename CreateDeployments and remove cancelling behavior in state_store	2017-07-07 12:10:04 -07:00
Alex Dadgar	2e2fd26bed	Update index	2017-07-07 12:07:08 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Alex Dadgar	e7034691ea	deployment status	2017-07-07 12:07:07 -07:00
Alex Dadgar	b64185a3f1	Deployment GC This PR implements the garbage collector for deployments. Deployments will by default be garbage collected after 1 hour.	2017-07-07 12:05:57 -07:00
Alex Dadgar	73325f888f	deployment api	2017-07-07 12:03:11 -07:00
Alex Dadgar	dad9e69822	more comment fixes	2017-07-07 12:03:11 -07:00
Alex Dadgar	c189948ad2	comments on watcher	2017-07-07 12:03:11 -07:00
Alex Dadgar	87d187d777	Tests	2017-07-07 12:03:11 -07:00
Alex Dadgar	6f821beec4	fix integration slightly	2017-07-07 12:03:11 -07:00
Alex Dadgar	b4c8f56570	Deployment watcher tests	2017-07-07 12:03:11 -07:00
Alex Dadgar	80dc4d66d8	Deployments list	2017-07-07 12:03:11 -07:00
Alex Dadgar	eec3cefee4	state store tests	2017-07-07 12:03:11 -07:00
Alex Dadgar	d04877d23c	initial impl	2017-07-07 12:03:11 -07:00
Alex Dadgar	0d42b5d421	initial reconciler	2017-07-07 12:01:17 -07:00
Michael Schurter	8d3e13ab8a	System jobs without evals are running too	2017-07-03 13:48:51 -07:00
Michael Schurter	f7d2a74ddf	System jobs should be running until stopped Prior to this commit they would be marked as dead if they had no currently running allocations -- even though they would spring back to life (running) if the cluster state changed such that a new eval+alloc was created.	2017-06-28 11:39:24 -07:00
Alex Dadgar	83f5e65aae	Plan allows updating the status of deployments	2017-05-11 12:49:04 -07:00
Alex Dadgar	9a576bafd1	Use a detected struct to hold deployment status for an allocation	2017-05-11 11:09:29 -07:00
Alex Dadgar	71788faacd	Easy feedback fixes	2017-05-10 15:26:00 -07:00
Alex Dadgar	7078d563cb	Create Deployments through plan application	2017-05-05 15:33:19 -07:00
Alex Dadgar	343ff03f02	Deployment struct, state store, fsm persist/restore	2017-05-04 13:37:18 -07:00
Alex Dadgar	aed852782f	Merge pull request #2592 from hashicorp/b-gc-race Protect against nil job in new allocation	2017-05-01 13:54:43 -07:00
Alex Dadgar	efa91c3d89	Protect against nil job in new allocation	2017-04-26 18:27:27 -07:00
Alex Dadgar	367f4b592f	docs	2017-04-20 11:14:06 -07:00
Alex Dadgar	1b97c9abdd	Revert server endpoint	2017-04-20 11:14:06 -07:00
Alex Dadgar	5a2449d236	Respond to review comments	2017-04-19 10:54:03 -07:00
Alex Dadgar	1769fe468a	Fix some tests	2017-04-17 19:39:20 -07:00
Alex Dadgar	34332af70e	GC and some fixes	2017-04-15 17:08:05 -07:00
Alex Dadgar	3145086a42	non-purge deregisters	2017-04-15 17:08:05 -07:00
Alex Dadgar	fda44689b7	Histories -> Versions	2017-04-15 17:08:05 -07:00
Alex Dadgar	f97664512b	Upsert Job Histories	2017-04-15 17:08:05 -07:00
Alex Dadgar	d489ed3c7d	Job History schema	2017-04-15 17:08:05 -07:00
Alex Dadgar	787be30f13	Fix periodic job state This PR fixes an issue in which a periodic job would incorrectly transition to status dead. Fixes https://github.com/hashicorp/nomad/issues/2268	2017-03-27 10:35:36 -07:00
Alex Dadgar	5be806a3df	Fix vet script and fix vet problems This PR fixes our vet script and fixes all the missed vet changes. It also fixes pointers being printed in `nomad stop <job>` and `nomad node-status <node>`.	2017-02-27 16:00:19 -08:00
Alex Dadgar	21ef1ce685	Add guard	2017-02-10 16:29:28 -08:00
Alex Dadgar	5d293c0f1e	Add abandon tests and use snapshot for blocking queries	2017-02-08 11:18:03 -08:00
Alex Dadgar	36d018514b	Fix test	2017-02-07 11:35:38 -08:00
Alex Dadgar	bc2e6b0cc2	Fix state store tests	2017-02-06 16:46:23 -08:00
Alex Dadgar	c026a97ce7	Use watchset on getter methods	2017-02-05 12:45:57 -08:00
Alex Dadgar	570efcaebd	Update state store and blocking query helper	2017-02-05 12:03:11 -08:00
Alex Dadgar	7f9c6466d4	Disallow GC of parameterized jobs This PR makes it so parameterized jobs do not get garbage collected and adds a test.	2017-01-26 11:57:32 -08:00
Michael Schurter	1f7b5b4b47	Rename Constructor -> Parameterized Job	2017-01-20 12:43:10 -08:00
Alex Dadgar	a616e9d970	Merge pull request #2163 from hashicorp/b-summary Job Summary: Fix queued accounting and remove in-place state store updates	2017-01-11 13:36:36 -08:00
Alex Dadgar	efbb2894c7	Review fixes	2017-01-11 13:18:36 -08:00
Alex Dadgar	0f046b179a	Merge pull request #2155 from hashicorp/f-cancel Cancel blocked evals upon successful one for job	2017-01-11 13:10:35 -08:00
Alex Dadgar	d0f918495b	Store pointer of JobSummary in state store and remove in-place modifications of the object and replace with Copy-Update-Insert operations	2017-01-08 13:55:03 -08:00
Alex Dadgar	8d5f0fea69	Merge pull request #2128 from hashicorp/f-dispatch Nomad Constructor Jobs and Dispatch	2017-01-06 05:22:49 +08:00
Alex Dadgar	86980e08f0	Cancel blocked evals upon successful one for job This PR causes blocked evaluations to be cancelled if there is a subsequent successful evaluation for the job. This fixes UX problems showing failed placements when there are not any in reality and makes GC possible for these jobs in certain cases. Fixes https://github.com/hashicorp/nomad/issues/2124	2017-01-04 16:16:04 -08:00
Diptanu Choudhury	9786708526	Added comments	2016-12-19 18:10:02 -08:00
Alex Dadgar	2761e1d8ea	fix tests	2016-12-16 10:21:56 -08:00
Alex Dadgar	bf1e157bd8	Children fixes + nomad status outputs summaries Children object is always initialized instead of lazily. `nomad status` outputs children summaries and has specialized view for constructor jobs.	2016-12-14 16:58:54 -08:00
Alex Dadgar	4a5c3c8db0	Rename structs	2016-12-14 14:28:43 -08:00
Alex Dadgar	1235fc6581	summary tests	2016-12-13 16:15:40 -08:00
Alex Dadgar	af2865ea48	Don't modify jobs status inplace	2016-12-12 11:42:47 -08:00
Alex Dadgar	8885ab70a6	Handle the delete case	2016-12-06 20:15:10 -08:00
Alex Dadgar	ef79e77e52	Children summary	2016-12-06 17:06:57 -08:00
Alex Dadgar	c005fcb973	Add structs	2016-12-05 17:24:37 -08:00
Diptanu Choudhury	5191b4d33a	Making the status command return the allocs of currently registered job	2016-11-24 16:31:30 +01:00
Diptanu Choudhury	56ed1d3cd8	Fixing the upgrade path for ephemeral disk	2016-11-08 15:24:51 -08:00
Alex Dadgar	df4398beac	Implement blocking queries for /v1/job/evaluations	2016-10-29 17:30:34 -07:00
Diptanu Choudhury	1b3c5e98c8	Renaming LocalDisk to EphemeralDisk (#1710 ) Renaming LocalDisk to EphemeralDisk	2016-09-14 15:43:42 -07:00
Diptanu Choudhury	6028682ad2	Adding LocalDisk to alloc.Job	2016-09-01 17:41:50 -07:00
Alex Dadgar	3c9936ae4a	Merge pull request #1659 from hashicorp/f-revoke-accessors Token revocation and keeping only a single Vault client active among servers	2016-08-31 14:10:46 -07:00
Diptanu Choudhury	bfee7b30a3	Introducing shared resources in alloc	2016-08-29 13:49:25 -07:00
Alex Dadgar	48696ba0cc	Use tomb to shutdown Token revocation Remove from the statestore Revoke tokens Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled update server interface to allow enable/disable and config loading test the new functions Leader revoke Use active	2016-08-28 14:06:25 -07:00
Diptanu Choudhury	13497913f9	Ensuring resources are re-calculated properly in fsm	2016-08-26 20:13:11 -07:00
Diptanu Choudhury	2f681b6415	Added copy method to LocalDisk	2016-08-26 14:24:47 -05:00
Diptanu Choudhury	3447658bba	Added scheduler tests to ensure disk constraints are honored	2016-08-25 15:31:56 -05:00
Diptanu Choudhury	ffaf6c6299	Fixed some tests	2016-08-25 13:56:39 -05:00

566 commits