
285 Commits

Author SHA1 Message Date
Michael Schurter 81b4b6f19b Merge pull request #5791 from hashicorp/b-plan-snapshotindex
nomad: include snapshot index when submitting plans
2019-07-17 09:25:00 -07:00
Lang Martin a8e72a5b68 state_store error if called without node_ids 2019-07-10 13:56:20 -04:00
Lang Martin 8e53c105fc state_store just one index update, test deletion 2019-07-10 13:56:19 -04:00
Lang Martin 5a6a947e98 state_store improve error messages 2019-07-10 13:56:19 -04:00
Lang Martin be2d6853cb state_store DeleteNode operates on a batch of ids 2019-07-10 13:56:19 -04:00
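The four `state_store` commits above change `DeleteNode` to take a batch of node IDs, return an error when no IDs are given, and record a single index update for the whole batch. A minimal sketch of those semantics follows. It assumes a toy map-backed store rather than Nomad's go-memdb tables; the `StateStore` and `Node` types, the error strings, and the validate-before-delete step are illustrative assumptions, not the code from these commits.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, minimal stand-in for a Nomad node record.
type Node struct {
	ID string
}

// StateStore is a toy in-memory store; Nomad's real state store is
// backed by go-memdb and differs substantially.
type StateStore struct {
	nodes map[string]*Node
	index uint64 // last index at which the nodes table changed
}

// DeleteNode removes a batch of nodes at a single index. It rejects an
// empty batch and, if any ID is unknown, fails without deleting anything.
func (s *StateStore) DeleteNode(index uint64, nodeIDs []string) error {
	if len(nodeIDs) == 0 {
		return errors.New("node ids missing")
	}
	// Validate every ID first so an unknown ID leaves the store untouched.
	for _, id := range nodeIDs {
		if _, ok := s.nodes[id]; !ok {
			return fmt.Errorf("node not found: %s", id)
		}
	}
	for _, id := range nodeIDs {
		delete(s.nodes, id)
	}
	// One index update for the whole batch, not one per node.
	s.index = index
	return nil
}
```

Validating every ID before deleting anything stands in for aborting a write transaction: either the whole batch is removed at one index or nothing changes.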
Preetha Appan 3cb798235d Missed one revert of backwards compatibility for node drain 2019-07-01 16:46:05 -05:00
Preetha Appan 23319e04d6 Restore accidentally deleted block 2019-06-26 13:59:14 -05:00
Preetha Appan 10e7d6df6d Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain (0.8), and other
features from releases older than Nomad 0.7.
2019-06-25 19:05:25 -05:00
Michael Schurter e4bc943a68 nomad: SnapshotAfter -> SnapshotMinIndex
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).

SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
2019-06-24 12:16:46 -07:00
Mahmood Ali 07f2c77c44 comment DenormalizeAllocationDiffSlice applies to terminal allocs only 2019-06-12 08:28:43 -04:00
Mahmood Ali 392f5bac44 Stop updating allocs.Job on stopping or preemption 2019-06-10 18:30:20 -04:00
Mahmood Ali 87173111de Merge pull request #5746 from hashicorp/b-no-updating-inmem-node
set node.StatusUpdatedAt in raft
2019-06-05 19:05:21 -04:00
Lang Martin 0c403eafde state_store typo in a comment 2019-05-22 12:32:08 -04:00
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in the raft layer, and that the
field is also updated when node drain/eligibility is updated.
2019-05-21 16:13:32 -04:00
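This fix moves the `StatusUpdatedAt` change out of in-memory mutation and into the replicated write path. A rough sketch of that shape follows, assuming a copy-on-write store in which stored objects are never modified in place, and assuming the timestamp travels with the log entry so every server records the same value. The `store`, `Node`, and `UpdateNodeEligibility` names here are hypothetical.

```go
package main

// Node is a hypothetical, trimmed-down node record.
type Node struct {
	ID                    string
	Status                string
	SchedulingEligibility string
	StatusUpdatedAt       int64 // Unix seconds
	ModifyIndex           uint64
}

// store is a toy copy-on-write store: objects already inserted are
// treated as immutable and are never modified in place.
type store struct {
	nodes map[string]*Node
}

// UpdateNodeEligibility is meant to run when a log entry is applied.
// updatedAt is assumed to be carried in that entry, so every server
// writes the same timestamp instead of reading its own clock.
func (s *store) UpdateNodeEligibility(index uint64, nodeID, elig string, updatedAt int64) bool {
	existing, ok := s.nodes[nodeID]
	if !ok {
		return false
	}
	// Copy instead of mutating the shared object in memory.
	updated := *existing
	updated.SchedulingEligibility = elig
	updated.StatusUpdatedAt = updatedAt
	updated.ModifyIndex = index
	s.nodes[nodeID] = &updated
	return true
}
```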
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha c8fdf20c66 Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha Appan 2b787aad7e Fix bug in plan applier introduced in PR-5602
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before this fix and passes after it.
2019-05-15 20:34:06 -05:00
Preetha Appan d448750449 Look up job only once, and fix tests 2019-05-13 18:33:41 -05:00
Preetha Appan 07690d6f9e Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
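Taken together, the denormalization commits above (the preemptor-job bug fix, the missing-alloc error, and no longer updating `allocs.Job` on stop or preemption) describe expanding a normalized plan entry into a full allocation from existing state. The sketch below is hypothetical: `AllocDiff`, `denormalizeAllocDiffs`, and the trimmed `Job`/`Allocation` types are illustrative, and the `"run"`/`"evict"` status strings are only example values.

```go
package main

import "fmt"

// Job and Allocation are hypothetical, trimmed-down records.
type Job struct{ ID string }

type Allocation struct {
	ID                 string
	Job                *Job
	DesiredStatus      string
	DesiredDescription string
}

// AllocDiff is a hypothetical normalized plan entry for a stopped or
// preempted alloc: it carries only the ID and the new description.
type AllocDiff struct {
	ID                 string
	DesiredDescription string
}

// denormalizeAllocDiffs expands diffs into full allocations using the
// allocations already in state. Each result keeps the existing alloc's
// own Job; it is never overwritten with the plan's job (for preemptions,
// the preemptor's job). A missing alloc is reported as an error.
func denormalizeAllocDiffs(existing map[string]*Allocation, diffs []AllocDiff, desiredStatus string) ([]*Allocation, error) {
	out := make([]*Allocation, 0, len(diffs))
	for _, d := range diffs {
		cur, ok := existing[d.ID]
		if !ok {
			return nil, fmt.Errorf("alloc %s doesn't exist", d.ID)
		}
		updated := *cur // copy; never mutate the stored alloc
		updated.DesiredStatus = desiredStatus
		updated.DesiredDescription = d.DesiredDescription
		out = append(out, &updated)
	}
	return out, nil
}
```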
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Preetha Appan 510d7839e4 code review comments 2019-01-18 17:41:39 -06:00
Preetha Appan be9656d195 fix linting 2019-01-17 15:36:33 -06:00
Preetha Appan 0f8a113ead Refactor to find jobs with child instances more efficiently
Also added unit tests.
2019-01-17 14:29:48 -06:00
Preetha Appan be36fee48e Use IsParameterized/IsPeriodic methods 2019-01-17 12:15:42 -06:00
Preetha Appan 81a8f18cac Fix bug in reconcile summaries that affects periodic/parameterized jobs
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Preetha Appan 8656d3379f Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Mahmood Ali b93643cd96 Fix a panic related to batch GC
`deleteJobVersions` modifies the items it is iterating over: it deletes
job versions while still iterating on them.
2018-11-19 20:59:45 -05:00
Mahmood Ali a4a9347501 fix comment typos 2018-11-14 08:36:14 -05:00
Mahmood Ali 1403ad21b9 Changelog job re-run fix 2018-11-13 07:52:51 -05:00
Mahmood Ali e2d668f21c Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Mahmood Ali 8513b3cccb Comment public functions and batch write txn 2018-11-12 16:09:39 -05:00
Preetha Appan 7ef126a027 Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Mahmood Ali 9c0a15f3ce Run job deregistering in a single transaction
Fixes https://github.com/hashicorp/nomad/issues/4299

Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing job-deregister evals.

Processing job-deregister evals should wait, by using the snapshot index, until the FSM log message finishes applying. However, with `JobBatchDeregister`, each individual job deregistration incremented the snapshot index as it was applied, which allowed job-deregister evals to be processed before the whole batch had been applied. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run the job, depending on the state of its allocations.

This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking processing of related evals until deregistration completes.
2018-11-09 22:35:26 -05:00
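The fix relies on every deregistration and eval insert in a batch becoming visible at one index. A toy sketch of that shape follows, assuming a hypothetical buffered `txn` that applies its writes and advances the index once on commit. It is single-threaded and only loosely mimics a go-memdb write transaction; a real store also needs the commit to be atomic with respect to concurrent readers.

```go
package main

// store is a toy state store tracking registered jobs, evals, and the
// index of the last committed change.
type store struct {
	jobs  map[string]bool   // job ID -> registered
	evals map[string]string // eval ID -> job ID
	index uint64
}

// txn buffers writes until commit.
type txn struct {
	s        *store
	delJobs  []string
	addEvals map[string]string
}

func (s *store) newTxn() *txn { return &txn{s: s, addEvals: map[string]string{}} }

func (t *txn) deleteJob(id string)             { t.delJobs = append(t.delJobs, id) }
func (t *txn) upsertEval(evalID, jobID string) { t.addEvals[evalID] = jobID }

// commit applies every buffered write and advances the index exactly once.
func (t *txn) commit(index uint64) {
	for _, id := range t.delJobs {
		delete(t.s.jobs, id)
	}
	for e, j := range t.addEvals {
		t.s.evals[e] = j
	}
	t.s.index = index
}

// batchDeregister deregisters all jobs and inserts their evals in one
// transaction, so anything waiting on the index sees the whole batch.
func (s *store) batchDeregister(index uint64, jobToEval map[string]string) {
	t := s.newTxn()
	for jobID, evalID := range jobToEval {
		t.deleteJob(jobID)
		t.upsertEval(evalID, jobID)
	}
	t.commit(index)
}
```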
Alex Dadgar 98398a8a44 Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00
Alex Dadgar 1c31970464 Fix progress deadline handling with multiple task groups
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment, regardless of
whether the task group with that deadline had finished.

Further, fix an issue where the blocked eval optimization meant that no
evals were created to progress the deployment. To reproduce this issue
prior to this commit, create a job with two task groups.
The first group has count 1 and resources such that it cannot be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that blocked eval doesn't capture the fact that
there are two groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan 57fe5050f0 more minor review feedback 2018-11-01 17:05:17 -05:00
Preetha Appan 1415032c13 More review comments 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47 Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7 Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08 Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently handles network resource asks
separately from the others (CPU/memory/disk/IOPS).
2018-10-30 11:06:32 -05:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change.
2. Fix an issue where the blocking query code was using the incorrect
index, such that the index was higher than necessary.
3. Special-case handling of blocked evaluations, since their create/snapshot
index is not particularly useful: they can be reblocked.
2018-09-21 13:59:11 -07:00
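For the first issue, selecting the greatest index means comparing every evaluation rather than relying on the order in which they are returned. A trivial sketch follows, assuming a hypothetical trimmed `Evaluation` type with only an ID and a `ModifyIndex`; it does not model the blocked-eval special case from item 3.

```go
package main

// Evaluation is a hypothetical, trimmed-down evaluation record.
type Evaluation struct {
	ID          string
	ModifyIndex uint64
}

// latestEvalIndex returns the greatest ModifyIndex among evals by
// comparing every entry, and 0 when there are none.
func latestEvalIndex(evals []*Evaluation) uint64 {
	var latest uint64
	for _, e := range evals {
		if e.ModifyIndex > latest {
			latest = e.ModifyIndex
		}
	}
	return latest
}
```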
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00