open-nomad

Commit Graph

Author	SHA1	Message	Date
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Michael Schurter	1bc731da47	nomad: remove unused NotifyGroup struct I don't think it's been used for a long time.	2019-05-17 13:30:23 -07:00
Michael Schurter	9732bc37ff	nomad: refactor waitForIndex into SnapshotAfter Generalize wait for index logic in the state store for reuse elsewhere. Also begin plumbing in a context to combine handling of timeouts and shutdown.	2019-05-17 13:30:23 -07:00
Preetha	c8fdf20c66	Merge pull request #5717 from hashicorp/b-plan-apply-preemptions Fix bug in plan applier introduced in PR-5602	2019-05-16 11:01:05 -05:00
Preetha	555dd23c2c	remove stray newline Co-Authored-By: Danielle <dani@builds.terrible.systems>	2019-05-15 21:11:52 -05:00
Preetha Appan	2b787aad7e	Fix bug in plan applier introduced in PR-5602 This fixes a bug in the state store during plan apply. When denormalizing preempted allocations it incorrectly set the preemptor's job during the update. This eventually causes a panic downstream in the client. Added a test assertion that failed before and passes after this fix	2019-05-15 20:34:06 -05:00
Preetha Appan	d448750449	Lookup job only once, and fix tests	2019-05-13 18:33:41 -05:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Arshneet Singh	9cc39edb67	Return error when preempted/stopped alloc doesn't exist during denormalization	2019-04-24 12:36:07 -07:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	510d7839e4	code review comments	2019-01-18 17:41:39 -06:00
Preetha Appan	be9656d195	fix linting	2019-01-17 15:36:33 -06:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more effeciently also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	be36fee48e	Use IsParameterized/isPeriodic methods	2019-01-17 12:15:42 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Preetha Appan	8656d3379f	Add guards around subtracting summary count	2018-12-03 11:16:35 -06:00
Mahmood Ali	6281700c0c	address review comments	2018-11-20 13:21:39 -05:00
Mahmood Ali	d744e71fa9	add a missing no errorassertion	2018-11-19 21:44:00 -05:00
Mahmood Ali	b93643cd96	Fix a panic related to batch GC `deleteJobVersions` does concurrent modifications to iterated items while iterating, by deleting job versions while it's iterating on them,	2018-11-19 20:59:45 -05:00
Mahmood Ali	bff9c3b3e9	Reproduce a panic related to batch GC Test case that reproduces a panic with the following stacktrace: ``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715] goroutine 35 [running]: testing.tRunner.func1(0xc0001e2200) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387 panic(0x167e400, 0x1c43a30) /usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(radixIterator).Next(0xc0003a0420, 0x1756059, 0xb) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e github.com/hashicorp/nomad/nomad/state.(StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1 github.com/hashicorp/nomad/nomad/state.(StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2 github.com/hashicorp/nomad/nomad/state.(StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79 github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685 testing.tRunner(0xc0001e2200, 0x1777138) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf created by testing.(T).Run /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353 ```	2018-11-19 20:58:32 -05:00
Mahmood Ali	a4a9347501	fix comment typos	2018-11-14 08:36:14 -05:00
Mahmood Ali	1403ad21b9	Changelog job re-run fix	2018-11-13 07:52:51 -05:00
Mahmood Ali	e2d668f21c	Merge pull request #4861 from hashicorp/b-batch-deregister-transaction Run job deregistering in a single transaction	2018-11-12 20:59:44 -05:00
Mahmood Ali	8513b3cccb	Comment public functions and batch write txn	2018-11-12 16:09:39 -05:00
Preetha Appan	7ef126a027	Smaller methods, and added tests for RPC layer	2018-11-10 17:37:33 -06:00
Mahmood Ali	9c0a15f3ce	Run job deregistering in a single transaction Fixes https://github.com/hashicorp/nomad/issues/4299 Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals. Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation. This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.	2018-11-09 22:35:26 -05:00
Alex Dadgar	98398a8a44	Merge pull request #4842 from hashicorp/b-deployment-progress-deadline Fix multiple bugs with progress deadline handling	2018-11-08 13:31:54 -08:00
Alex Dadgar	261aae32b1	more robust merging of the deployment status when getting updates from the client	2018-11-05 16:39:09 -08:00
Alex Dadgar	1c31970464	Fix multiple tgs with progress deadline handling Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of if the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it can not be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.	2018-11-05 16:06:17 -08:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	8f7eb61823	Introduce a response object for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readablity fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	1415032c13	More review comments	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	eb38488d08	Fix logic bug, unit test for plan apply method in state store	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	b2449ae1ce	Fix deployment watcher index usage Fixes three issues: 1. Retrieving the latest evaluation index was not properly selecting the greatest index. This would undermine checks we had to reduce the number of evaluations created when the latest eval index was greater than any alloc change 2. Fix an issue where the blocking query code was using the incorrect index such that the index was higher than necassary. 3. Special case handling of blocked evaluation since the create/snapshot index is no particularly useful since they can be reblocked.	2018-09-21 13:59:11 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Alex Dadgar	21c5ed850d	Register events	2018-05-22 14:06:33 -07:00
Alex Dadgar	17aac1c9de	node heartbeat missed event	2018-05-22 14:05:46 -07:00
Alex Dadgar	5f2080bc26	Emit events based on eligibility	2018-05-22 14:04:59 -07:00
Alex Dadgar	a35248d1d8	Plumb event via FSM	2018-05-10 16:30:54 -07:00

1 2 3 4 5 ...

331 Commits