open-nomad

Author	SHA1	Message	Date
Mahmood Ali	9c0a15f3ce	Run job deregistering in a single transaction Fixes https://github.com/hashicorp/nomad/issues/4299 Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing job-deregister evals. Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, each individual job deregistration that was applied accidentally incremented the snapshot index, which allowed job-deregister evals to be processed early. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of the allocation. This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking processing of related evals until deregistering completes.	2018-11-09 22:35:26 -05:00
Preetha	3739713ce1	Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion Remove terminal allocations associated with older job modify index	2018-11-09 12:21:42 -06:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Alex Dadgar	2f06d88f47	Merge pull request #4847 from hashicorp/b-blocked-eval Blocked evaluation fixes	2018-11-08 13:40:01 -08:00
Alex Dadgar	98398a8a44	Merge pull request #4842 from hashicorp/b-deployment-progress-deadline Fix multiple bugs with progress deadline handling	2018-11-08 13:31:54 -08:00
Alex Dadgar	991791a513	typo fix	2018-11-08 13:28:27 -08:00
Alex Dadgar	be54e56570	review fixes	2018-11-08 09:48:36 -08:00
Preetha Appan	5f0a9d2cfd	Show preemption output in plan CLI	2018-11-08 09:48:43 -06:00
Alex Dadgar	dbb05357bc	fix test	2018-11-07 11:59:24 -08:00
Alex Dadgar	36abd3a3d8	review comments	2018-11-07 10:33:22 -08:00
Alex Dadgar	e3cbb2c82e	allocs fit checks if devices get oversubscribed	2018-11-07 10:33:22 -08:00
Alex Dadgar	4f9b3ede87	Split device accounter and allocator	2018-11-07 10:32:03 -08:00
Alex Dadgar	6fa893c801	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	2d2248e209	Add devices to allocated resources	2018-11-07 10:32:03 -08:00
Alex Dadgar	b1c5d52817	Track jobs by namespace	2018-11-07 10:22:08 -08:00
Alex Dadgar	6d8bb3a7bd	Duplicate blocked evals cancelling improved The old logic for cancelling duplicate blocked evaluations by job id had the issue where the newer evaluation could have additional node classes that it is (in)eligible for that we would not capture. As a result, cluster state could change in a way that would let the job make progress, yet no evaluation would be unblocked.	2018-11-07 10:08:23 -08:00
Preetha Appan	a9aec7e628	Fix failing resource subtraction test	2018-11-06 12:26:26 -06:00
Alex Dadgar	261aae32b1	more robust merging of the deployment status when getting updates from the client	2018-11-05 16:39:09 -08:00
Alex Dadgar	1c31970464	Fix multiple tgs with progress deadline handling Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of whether the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it cannot be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.	2018-11-05 16:06:17 -08:00
Preetha Appan	6fdc84cce3	add comment	2018-11-02 18:11:36 -05:00
Preetha Appan	a6b714b81c	update preemption tests to use new node resource structs also includes a fix to remove unnecessary subtraction of network mbits	2018-11-02 17:59:53 -05:00
Preetha	b2b52b1ada	Merge pull request #4794 from hashicorp/f-preemption-systemjobs Preemption for system jobs	2018-11-02 16:28:06 -05:00
Preetha Appan	c33469157d	unit test plan apply with preemptions	2018-11-01 20:06:32 -05:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	fd60e66f86	Plumb alloc resource cache in a few more places. also removed now unused method	2018-11-01 16:44:43 -05:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Mahmood Ali	9da19c6450	address review comments	2018-10-30 13:58:52 -04:00
Mahmood Ali	4937095389	Allow artifacts checksum interpolation Fixes https://github.com/hashicorp/nomad/issues/4814	2018-10-30 13:24:30 -04:00
Preetha Appan	f1c3eb2792	Introduce interface with multiple implementations for resource distance	2018-10-30 11:06:32 -05:00
Preetha Appan	8f7eb61823	Introduce a response object for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	1415032c13	More review comments	2018-10-30 11:06:32 -05:00
Preetha Appan	b97f85e3e0	style fixes	2018-10-30 11:06:32 -05:00
Preetha Appan	12278527c7	make default config a variable	2018-10-30 11:06:32 -05:00
Preetha Appan	32cc764072	Add fsm layer tests	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	8807c25b11	Modify preemption code to use new style of resource structs	2018-10-30 11:06:32 -05:00
Preetha Appan	c1c1c230e4	Make preemption config a struct to allow for enabling based on scheduler type	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	3190a2c29b	Fix linting	2018-10-30 11:06:32 -05:00
Preetha Appan	eb38488d08	Fix logic bug, unit test for plan apply method in state store	2018-10-30 11:06:32 -05:00
Preetha Appan	9e4a35fff0	Fix comment	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Preetha Appan	d11064d6ba	structs and API changes to plan and alloc structs needed for preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	9257387a69	Add number of evictions to DesiredUpdates struct to use in CLI/API	2018-10-30 11:06:32 -05:00
Preetha Appan	5ff4b8e36f	Review feedback	2018-10-30 11:06:32 -05:00
Preetha Appan	5b3bfb63eb	structs and API changes to plan and alloc structs needed for preemption	2018-10-30 11:06:32 -05:00
Michael Schurter	5d49832de4	tests: fix usages of TestClient cleanup and mock driver	2018-10-29 14:21:05 -07:00
Michael Schurter	e060174130	ar: fix leader handling, state restoring, and destroying unrun ARs * Migrated all of the old leader task tests and got them passing * Refactor and consolidate task killing code in AR to always kill leader tasks first * Fixed lots of issues with state restoring * Fixed deadlock in AR.Destroy if AR.Run had never been called * Added a new in memory statedb for testing	2018-10-19 09:45:45 -07:00
Alex Dadgar	6f0ed6184b	Fix client reloading and pass the plugin loaders to server and client	2018-10-16 16:56:55 -07:00
Michael Schurter	a4b4d7b266	consul service hook Deregistration works but difficult to test due to terminal updates not being fully implemented in the new client/ar/tr.	2018-10-16 16:53:29 -07:00
Alex Dadgar	e401c660e7	Implement lifecycle hooks on the task runner	2018-10-16 16:53:29 -07:00
Alex Dadgar	a78cefec18	use int64	2018-10-16 15:34:32 -07:00
Preetha Appan	7c0d8c646c	Change CPU/Disk/MemoryMB to int everywhere in new resource structs	2018-10-16 16:21:42 -05:00
Alex Dadgar	f5a76d8411	review comments	2018-10-15 15:31:13 -07:00
Alex Dadgar	f9b056e1d1	Replace attributes map with new Attribute object	2018-10-13 14:08:58 -07:00
Alex Dadgar	04ba425dd5	validate constraints/affinities	2018-10-13 12:27:49 -07:00
Alex Dadgar	9b5aaac410	Device feasibility checker	2018-10-13 12:27:49 -07:00
Alex Dadgar	bfb4caa2e7	node devices	2018-10-13 12:27:49 -07:00
Alex Dadgar	5a07f9f96e	parse affinities and constraints on devices	2018-10-11 14:05:19 -07:00
Alex Dadgar	a2a56a930c	Diff	2018-10-08 17:02:58 -07:00
Alex Dadgar	6b08b9d6b6	Define device request structs	2018-10-08 15:38:03 -07:00
Alex Dadgar	01f8e5b95f	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	bac5cb1e8b	Scheduler uses allocated resources	2018-10-02 17:08:25 -07:00
Alex Dadgar	147d2430a1	allocated resources structs	2018-09-29 18:47:28 -07:00
Alex Dadgar	5c8697667e	Node reserved resources	2018-09-29 18:44:55 -07:00
Alex Dadgar	3183153315	Node resources on client	2018-09-29 17:23:41 -07:00
Alex Dadgar	9b793531d6	Merge pull request #4720 from hashicorp/b-jet-fixes Series of scheduler fixes / debugging enhancements	2018-09-25 13:25:11 -07:00
Alex Dadgar	bd420692f3	fix logging	2018-09-25 10:49:55 -07:00
Preetha Appan	86e725e84c	Added logging around nacked evals in the scheduler worker	2018-09-25 10:49:02 -07:00
Alex Dadgar	6a21f9fe96	Unique TriggerBy for blocked evals Give blocked evals a unique triggerby reason to make debugging a chain of evaluations easier.	2018-09-24 14:47:49 -07:00
Alex Dadgar	e1a102f58c	test allocs fit	2018-09-24 13:59:01 -07:00
Alex Dadgar	d7f5be9148	Better comment on snapshotindex	2018-09-24 13:53:43 -07:00
Alex Dadgar	99498da6ed	Denormalize jobs in plan and ignore resources of terminal allocs Denormalize jobs in AppendAllocs: AppendAlloc was originally only ever called for inplace upgrades and new allocations. Both these code paths would remove the job from the allocation. Now we use this to also add fields such as FollowupEvalID which did not normalize the job. This is only a performance enhancement. Ignore terminal allocs: Failed allocations are annotated with the followup Eval ID when one is created to replace the failed allocation. However, in the plan applier, when we check if allocations fit, these terminal allocations were not filtered. This could result in the plan being rejected if the node would be overcommitted if the terminal allocations' resources were considered.	2018-09-24 13:53:43 -07:00
Alex Dadgar	de442226ae	Fix other instances of blocking queries	2018-09-24 13:52:39 -07:00
Alex Dadgar	7f0d241ef4	always handle failed allocation	2018-09-21 15:13:54 -07:00
Alex Dadgar	b2449ae1ce	Fix deployment watcher index usage Fixes three issues: 1. Retrieving the latest evaluation index was not properly selecting the greatest index. This would undermine checks we had to reduce the number of evaluations created when the latest eval index was greater than any alloc change. 2. Fix an issue where the blocking query code was using the incorrect index such that the index was higher than necessary. 3. Special case handling of blocked evaluations since the create/snapshot index is not particularly useful since they can be reblocked.	2018-09-21 13:59:11 -07:00
Alex Dadgar	5009566503	do not bootstrap with non voters	2018-09-19 17:17:39 -07:00
Alex Dadgar	e8f89597f5	fix rpc test	2018-09-19 10:17:54 -07:00
Alex Dadgar	9971b3393f	yamux	2018-09-17 14:22:40 -07:00
Alex Dadgar	b2f500b48c	Serf/Raft/Memberlist logger	2018-09-17 13:57:52 -07:00
Alex Dadgar	ca28afa3b2	small fixes	2018-09-15 16:42:38 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Preetha Appan	996484981c	Fix panic when reschedule policy for allocation can't be looked up because its task group changed	2018-09-05 17:01:02 -05:00
Alex Dadgar	4f89cabd34	Merge pull request #4631 from hashicorp/f-plugin-config Parse plugin configs	2018-09-04 17:04:13 -07:00
Alex Dadgar	cc92cd92cd	Merge pull request #4642 from hashicorp/b-vet Fix vet errors and use newer go version in travis	2018-09-04 17:04:02 -07:00
Alex Dadgar	c6576ddac1	Fix make check errors	2018-09-04 16:03:52 -07:00
Preetha Appan	26288b9522	Fix more review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	751c0eb5a5	code review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	4f8e925b54	Move topk and delay heap to separate packages under lib	2018-09-04 16:10:11 -05:00
Preetha Appan	9bc0962527	Track top k nodes by norm score rather than top k nodes per scorer	2018-09-04 16:10:11 -05:00
Preetha Appan	6ed527c636	Use heap to store top K scoring nodes. Scoring metadata is now aggregated by scorer type to make it easier to parse when reading it in the CLI.	2018-09-04 16:10:11 -05:00
Preetha Appan	dd5fe6373f	Fix scoring logic for uneven spread to incorporate current alloc count Also addressed other small code review comments	2018-09-04 16:10:11 -05:00
Preetha Appan	e72c0fe527	more cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	92d37acc2a	comment and formatting cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	5812f906c8	Allow empty spread targets, and validate target percentages.	2018-09-04 16:10:11 -05:00
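
Note on 6ed527c636 and 9bc0962527: these commits describe keeping the top K scoring nodes in a heap. The following is a minimal sketch of that general technique, not the code from the repository. The names `scoredNode`, `minHeap`, and `topK` are hypothetical and chosen only for illustration. It assumes a min-heap of size K, so the weakest retained node sits at the root and can be replaced cheaply when a better-scoring node arrives.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// scoredNode is a hypothetical stand-in for a node and its normalized score.
type scoredNode struct {
	ID    string
	Score float64
}

// minHeap orders nodes by ascending score, so the lowest-scoring
// retained node is always at index 0.
type minHeap []scoredNode

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(scoredNode)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// topK returns the k highest-scoring nodes, best first.
func topK(nodes []scoredNode, k int) []scoredNode {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, n := range nodes {
		if h.Len() < k {
			heap.Push(h, n)
		} else if n.Score > (*h)[0].Score {
			// Replace the current minimum and restore the heap property.
			(*h)[0] = n
			heap.Fix(h, 0)
		}
	}
	out := make([]scoredNode, h.Len())
	copy(out, *h)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	nodes := []scoredNode{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	for _, n := range topK(nodes, 2) {
		fmt.Println(n.ID, n.Score)
	}
}
```

With a heap bounded at K entries, each candidate costs at most O(log K) to consider, so memory stays proportional to K rather than to the number of nodes scored.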

2590 commits