Arshneet Singh
9cc39edb67
Return error when preempted/stopped alloc doesn't exist during denormalization
2019-04-24 12:36:07 -07:00
Arshneet Singh
d4e7a5c005
Add comments to functions, and use require instead of assert
2019-04-23 09:57:21 -07:00
Arshneet Singh
4cf4324b8f
Remove allowPlanOptimization from schedulers
2019-04-23 09:18:02 -07:00
Arshneet Singh
65f5fab131
Add tests for plan normalization
2019-04-23 09:18:01 -07:00
Arshneet Singh
b977748a4b
Add code for plan normalization
2019-04-23 09:18:01 -07:00
Alex Dadgar
4bdccab550
goimports
2019-01-22 15:44:31 -08:00
Preetha Appan
510d7839e4
code review comments
2019-01-18 17:41:39 -06:00
Preetha Appan
be9656d195
fix linting
2019-01-17 15:36:33 -06:00
Preetha Appan
0f8a113ead
Refactor to find jobs with child instances more effeciently
...
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan
be36fee48e
Use IsParameterized/isPeriodic methods
2019-01-17 12:15:42 -06:00
Preetha Appan
81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
...
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Preetha Appan
8656d3379f
Add guards around subtracting summary count
2018-12-03 11:16:35 -06:00
Mahmood Ali
6281700c0c
address review comments
2018-11-20 13:21:39 -05:00
Mahmood Ali
d744e71fa9
add a missing no errorassertion
2018-11-19 21:44:00 -05:00
Mahmood Ali
b93643cd96
Fix a panic related to batch GC
...
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
2018-11-19 20:59:45 -05:00
Mahmood Ali
bff9c3b3e9
Reproduce a panic related to batch GC
...
Test case that reproduces a panic with the following stacktrace:
```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]
goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Mahmood Ali
a4a9347501
fix comment typos
2018-11-14 08:36:14 -05:00
Mahmood Ali
1403ad21b9
Changelog job re-run fix
2018-11-13 07:52:51 -05:00
Mahmood Ali
e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
...
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Mahmood Ali
8513b3cccb
Comment public functions and batch write txn
2018-11-12 16:09:39 -05:00
Preetha Appan
7ef126a027
Smaller methods, and added tests for RPC layer
2018-11-10 17:37:33 -06:00
Mahmood Ali
9c0a15f3ce
Run job deregistering in a single transaction
...
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.
This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Alex Dadgar
98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
...
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar
261aae32b1
more robust merging of the deployment status when getting updates from the client
2018-11-05 16:39:09 -08:00
Alex Dadgar
1c31970464
Fix multiple tgs with progress deadline handling
...
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan
57fe5050f0
more minor review feedback
2018-11-01 17:05:17 -05:00
Preetha Appan
8f7eb61823
Introduce a response object for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
1a5421f5d7
more minor cleanup
2018-10-30 11:06:32 -05:00
Preetha Appan
0494a098ce
More style and readablity fixes from review
2018-10-30 11:06:32 -05:00
Preetha Appan
1415032c13
More review comments
2018-10-30 11:06:32 -05:00
Preetha Appan
7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
eb38488d08
Fix logic bug, unit test for plan apply method in state store
2018-10-30 11:06:32 -05:00
Preetha Appan
cc295b90de
Implement preemption for system jobs.
...
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
b2449ae1ce
Fix deployment watcher index usage
...
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar
3c19d01d7a
server
2018-09-15 16:23:13 -07:00
Alex Dadgar
300b1a7a15
Tests only use testlog package logger
2018-06-13 15:40:56 -07:00
Alex Dadgar
21c5ed850d
Register events
2018-05-22 14:06:33 -07:00
Alex Dadgar
17aac1c9de
node heartbeat missed event
2018-05-22 14:05:46 -07:00
Alex Dadgar
5f2080bc26
Emit events based on eligibility
2018-05-22 14:04:59 -07:00
Alex Dadgar
a35248d1d8
Plumb event via FSM
2018-05-10 16:30:54 -07:00
Preetha Appan
cba13e4ec5
Fix test set up to set ModifyTime for alloc
2018-05-07 14:55:01 -05:00
Preetha
02d63432b4
Fix typo
2018-05-07 14:55:01 -05:00
Alex Dadgar
738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment
2018-05-07 14:55:01 -05:00
Alex Dadgar
319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Alex Dadgar
641ef81cbf
Test fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
99e00fb774
Pass through timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00