Michael Schurter
81b4b6f19b
Merge pull request #5791 from hashicorp/b-plan-snapshotindex
...
nomad: include snapshot index when submitting plans
2019-07-17 09:25:00 -07:00
Lang Martin
a8e72a5b68
state_store error if called without node_ids
2019-07-10 13:56:20 -04:00
Lang Martin
8e53c105fc
state_store just one index update, test deletion
2019-07-10 13:56:19 -04:00
Lang Martin
5a6a947e98
state_store improve error messages
2019-07-10 13:56:19 -04:00
Lang Martin
be2d6853cb
state_store DeleteNode operates on a batch of ids
2019-07-10 13:56:19 -04:00
Preetha Appan
3cb798235d
Missed one revert of backwards compatibility for node drain
2019-07-01 16:46:05 -05:00
Preetha Appan
aa2b4b4e00
Undo removal of node drain compat changes
...
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Preetha Appan
3484f18984
Fix more tests
2019-06-26 16:30:53 -05:00
Preetha Appan
23319e04d6
Restore accidentally deleted block
2019-06-26 13:59:14 -05:00
Preetha Appan
10e7d6df6d
Remove compat code associated with many previous versions of nomad
...
This removes compat code for namespaces (0.7), Drain (0.8), and other
older features from releases older than Nomad 0.7.
2019-06-25 19:05:25 -05:00
Michael Schurter
e4bc943a68
nomad: SnapshotAfter -> SnapshotMinIndex
...
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).
SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
2019-06-24 12:16:46 -07:00
Mahmood Ali
07f2c77c44
comment DenormalizeAllocationDiffSlice applies to terminal allocs only
2019-06-12 08:28:43 -04:00
Mahmood Ali
392f5bac44
Stop updating allocs.Job on stopping or preemption
2019-06-10 18:30:20 -04:00
Mahmood Ali
6c8e329819
test that stopped alloc jobs aren't modified
...
When an alloc is stopped, test that we don't replace the job stored in
the alloc with a new job that is no longer relevant to this alloc.
2019-06-10 17:14:26 -04:00
Mahmood Ali
87173111de
Merge pull request #5746 from hashicorp/b-no-updating-inmem-node
...
set node.StatusUpdatedAt in raft
2019-06-05 19:05:21 -04:00
Lang Martin
0c403eafde
state_store typo in a comment
2019-05-22 12:32:08 -04:00
Mahmood Ali
6bdbeed319
set node.StatusUpdatedAt in raft
...
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.
This ensures that StatusUpdatedAt is set in the raft layer, and that
the field is also updated when node drain/eligibility is updated.
2019-05-21 16:13:32 -04:00
Michael Schurter
1bc731da47
nomad: remove unused NotifyGroup struct
...
I don't think it's been used for a long time.
2019-05-17 13:30:23 -07:00
Michael Schurter
9732bc37ff
nomad: refactor waitForIndex into SnapshotAfter
...
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha
c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
...
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha
555dd23c2c
remove stray newline
...
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan
2b787aad7e
Fix bug in plan applier introduced in PR-5602
...
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix.
2019-05-15 20:34:06 -05:00
Preetha Appan
d448750449
Lookup job only once, and fix tests
2019-05-13 18:33:41 -05:00
Preetha Appan
07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest
2019-05-13 18:33:41 -05:00
Arshneet Singh
9cc39edb67
Return error when preempted/stopped alloc doesn't exist during denormalization
2019-04-24 12:36:07 -07:00
Arshneet Singh
d4e7a5c005
Add comments to functions, and use require instead of assert
2019-04-23 09:57:21 -07:00
Arshneet Singh
4cf4324b8f
Remove allowPlanOptimization from schedulers
2019-04-23 09:18:02 -07:00
Arshneet Singh
65f5fab131
Add tests for plan normalization
2019-04-23 09:18:01 -07:00
Arshneet Singh
b977748a4b
Add code for plan normalization
2019-04-23 09:18:01 -07:00
Alex Dadgar
4bdccab550
goimports
2019-01-22 15:44:31 -08:00
Preetha Appan
510d7839e4
code review comments
2019-01-18 17:41:39 -06:00
Preetha Appan
be9656d195
fix linting
2019-01-17 15:36:33 -06:00
Preetha Appan
0f8a113ead
Refactor to find jobs with child instances more efficiently
...
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan
be36fee48e
Use IsParameterized/isPeriodic methods
2019-01-17 12:15:42 -06:00
Preetha Appan
81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
...
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Preetha Appan
8656d3379f
Add guards around subtracting summary count
2018-12-03 11:16:35 -06:00
Mahmood Ali
6281700c0c
address review comments
2018-11-20 13:21:39 -05:00
Mahmood Ali
d744e71fa9
add a missing no-error assertion
2018-11-19 21:44:00 -05:00
Mahmood Ali
b93643cd96
Fix a panic related to batch GC
...
`deleteJobVersions` modifies the items it is iterating over: it deletes
job versions while it is still iterating on them.
2018-11-19 20:59:45 -05:00
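A common way to avoid modifying a collection while iterating over it is to collect the matching items first and delete them afterwards. The sketch below illustrates that pattern only; it does not show the actual fix in `deleteJobVersions`, and `versionTable`, `jobVersion`, and `deleteVersions` are hypothetical names backed by a plain slice rather than go-memdb.

```go
package main

import "fmt"

// jobVersion is a hypothetical row: one version of one job.
type jobVersion struct {
	JobID   string
	Version int
}

// versionTable is a hypothetical slice-backed table, used only to
// illustrate the pattern. It is not go-memdb.
type versionTable struct {
	rows []jobVersion
}

// delete removes the first row equal to v, if any.
func (t *versionTable) delete(v jobVersion) {
	for i, r := range t.rows {
		if r == v {
			t.rows = append(t.rows[:i], t.rows[i+1:]...)
			return
		}
	}
}

// deleteVersions removes every version of jobID and returns how many rows
// were removed. Matching rows are collected in a first pass and deleted in
// a second pass, so the table is never modified while it is being iterated.
func (t *versionTable) deleteVersions(jobID string) int {
	var doomed []jobVersion
	for _, r := range t.rows {
		if r.JobID == jobID {
			doomed = append(doomed, r)
		}
	}
	for _, v := range doomed {
		t.delete(v)
	}
	return len(doomed)
}

func main() {
	tbl := &versionTable{rows: []jobVersion{
		{"a", 0}, {"b", 0}, {"a", 1}, {"a", 2},
	}}
	n := tbl.deleteVersions("a")
	fmt.Println(n, len(tbl.rows))
}
```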
Mahmood Ali
bff9c3b3e9
Reproduce a panic related to batch GC
...
Test case that reproduces a panic with the following stacktrace:
```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]
goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Mahmood Ali
a4a9347501
fix comment typos
2018-11-14 08:36:14 -05:00
Mahmood Ali
1403ad21b9
Changelog job re-run fix
2018-11-13 07:52:51 -05:00
Mahmood Ali
e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
...
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Mahmood Ali
8513b3cccb
Comment public functions and batch write txn
2018-11-12 16:09:39 -05:00
Preetha Appan
7ef126a027
Smaller methods, and added tests for RPC layer
2018-11-10 17:37:33 -06:00
Mahmood Ali
9c0a15f3ce
Run job deregistering in a single transaction
...
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, each individual job deregistration that was applied accidentally incremented the snapshot index, which allowed job-deregister evals to be processed early. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it, depending on the state of the allocation.
This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking the processing of related evals until deregistering completes.
2018-11-09 22:35:26 -05:00
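The single-transaction idea from the commit above can be sketched as follows. This is an illustration under stated assumptions, not Nomad's code: `store`, `newStore`, `batchDeregister`, and `snapshot` are hypothetical, and a mutex stands in for a write transaction. All deletions and eval inserts become visible together, and the index is written once, so a reader never observes the new index with part of the batch still present.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory state store; the mutex plays the role
// of a write transaction.
type store struct {
	mu    sync.RWMutex
	index uint64
	jobs  map[string]bool
	evals []string
}

func newStore(jobIDs ...string) *store {
	s := &store{jobs: make(map[string]bool)}
	for _, id := range jobIDs {
		s.jobs[id] = true
	}
	return s
}

// batchDeregister removes every job in jobIDs and appends every eval while
// holding a single write lock, then sets the index exactly once.
func (s *store) batchDeregister(index uint64, jobIDs, evals []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range jobIDs {
		delete(s.jobs, id)
	}
	s.evals = append(s.evals, evals...)
	s.index = index // single index update for the whole batch
}

// snapshot returns the index and the number of remaining jobs as one
// consistent view.
func (s *store) snapshot() (uint64, int) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.index, len(s.jobs)
}

func main() {
	st := newStore("j1", "j2", "j3")
	st.batchDeregister(10, []string{"j1", "j2"}, []string{"e1", "e2"})
	idx, remaining := st.snapshot()
	fmt.Println(idx, remaining)
}
```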
Alex Dadgar
98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
...
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar
261aae32b1
more robust merging of the deployment status when getting updates from the client
2018-11-05 16:39:09 -08:00
Alex Dadgar
1c31970464
Fix multiple tgs with progress deadline handling
...
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment, regardless of
whether the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it cannot be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan
57fe5050f0
more minor review feedback
2018-11-01 17:05:17 -05:00