Commit Graph

293 Commits

Author SHA1 Message Date
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha 555dd23c2c
remove stray newline
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan 2b787aad7e
Fix bug in plan applier introduced in PR-5602
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix
2019-05-15 20:34:06 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Mahmood Ali 6281700c0c address review comments 2018-11-20 13:21:39 -05:00
Mahmood Ali d744e71fa9 add a missing no errorassertion 2018-11-19 21:44:00 -05:00
Mahmood Ali bff9c3b3e9 Reproduce a panic related to batch GC
Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readablity fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08
Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Alex Dadgar 319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Preetha Appan 52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value 2018-05-03 17:04:36 -05:00
Chelsea Holland Komlo 31557cc44f move tests to use time.Time 2018-03-27 15:43:57 -04:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 0965c9ed28 Fix tests 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter 03d0e5b8a0 improve drain fsm/statestore tests 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Alex Dadgar d3c3deffad fixes 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo 8f109c344c make check fixes 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 19ef872769 keep state store functions in one file 2018-03-13 18:08:21 -07:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Preetha Appan 288ff0b6f0
Add test case to verify setting next alloc id correctly 2018-01-24 17:55:29 -06:00
Preetha Appan 40cb1d327c
Address some code review comments 2017-12-18 15:22:23 -06:00
Preetha Appan 3c36abfe14
Update eval modify index as part of plan apply. 2017-12-18 10:03:55 -06:00
Alex Dadgar f4aa5ea0c7 lax timing 2017-10-24 10:58:06 -07:00
Alex Dadgar 1192385c63 Lax blocking query test timing 2017-10-20 13:07:17 -07:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Michael Schurter dfd2967cdb Merge pull request #3376 from hashicorp/f-node-acls
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter a003e3dd43 Add StateStore.NodeBySecretID 2017-10-12 15:27:29 -07:00
Alex Dadgar e7e18c931c Fix sorting of job versions
Fixes an issue in which the versions were improperly sorted which would
cause pruning of the wrong job version. This essentially meant that job
versions above 255 would be dropped from the job version table (note
this was due to the prefix walk crossing from the 1-byte to 2-byte
threshold).

Fixes https://github.com/hashicorp/nomad/issues/3357
2017-10-12 13:33:55 -07:00
Michael Schurter a66c53d45a Remove `structs` import from `api`
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar e5ec915ac3 sync 2017-09-19 10:08:23 -05:00
Armon Dadgar 20a8e590a0 nomad: support ACL bootstrap reset 2017-09-10 16:03:30 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Armon Dadgar 10500c39e5 nomad: fixing test 2017-09-04 13:21:01 -07:00
Armon Dadgar 1ace912341 nomad: adding bootstrapping checks 2017-09-04 13:05:53 -07:00
Armon Dadgar 06a7f12fad nomad: adding bootstrap state store method 2017-09-04 13:05:53 -07:00
Armon Dadgar 583a11cebd nomad: Adding ability to filter list of tokens to global only 2017-09-04 13:04:45 -07:00
Armon Dadgar f91d2608cb nomad: renambe PublicID to AccessorID for consistency 2017-09-04 13:04:45 -07:00
Armon Dadgar a17991e907 nomad: CRUD methods for ACLTokens 2017-09-04 13:04:45 -07:00
Armon Dadgar cde8e9301b nomad: fixing state store tests due to signature mismatch 2017-09-04 13:04:44 -07:00
Armon Dadgar 351afa0069 nomad: Upsert and Delete ACL policies can take a list 2017-09-04 13:03:14 -07:00
Armon Dadgar 4cb544e8f3 nomad: Adding CRUD to state store for ACL Policies 2017-09-04 13:03:14 -07:00
Alex Dadgar 4cc8bac48d fix blocking query due to ctx change 2017-08-31 15:34:55 -07:00
Alex Dadgar 590ff91bf3 Deployment watcher takes state store 2017-08-30 18:51:59 -07:00
Alex Dadgar dfcb73c896 Fix purging job versions
This PR fixes an issue in which the job versions weren't properly
cleaned when removing a job.

Fixes https://github.com/hashicorp/nomad/issues/3052
2017-08-18 15:46:03 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Alex Dadgar 5e98c3ce95 Expose FSM errors into deployment watcher and API
This PR exposes errors returned by the FSM to the deployment watcher and
thus the API. It also adds an error to handle the case of promoting a
deployment that has no eligible canaries.
2017-07-25 16:23:22 -07:00
Alex Dadgar 3a29b38108 Status description shows requiring promotion 2017-07-07 12:12:48 -07:00
Alex Dadgar de54ffd1f6 Deployment from inplace updates tracks placed properly. 2017-07-07 12:10:04 -07:00
Alex Dadgar 5457bb7962 Job stability 2017-07-07 12:10:04 -07:00
Alex Dadgar d07a5a2008 Complete deployments mark jobs as stable
This PR allows jobs to be marked as stable automatically by a successful
deployment.
2017-07-07 12:10:04 -07:00
Alex Dadgar 454083ba1b Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar c10d7ab871 Remove promoted bit from allocation 2017-07-07 12:10:04 -07:00
Alex Dadgar 09dfa2fc10 Rename CreateDeployments and remove cancelling behavior in state_store 2017-07-07 12:10:04 -07:00
Alex Dadgar b64185a3f1 Deployment GC
This PR implements the garbage collector for deployments. Deployments
will by default be garbage collected after 1 hour.
2017-07-07 12:05:57 -07:00
Alex Dadgar 73325f888f deployment api 2017-07-07 12:03:11 -07:00
Alex Dadgar dad9e69822 more comment fixes 2017-07-07 12:03:11 -07:00
Alex Dadgar 80dc4d66d8 Deployments list 2017-07-07 12:03:11 -07:00
Alex Dadgar eec3cefee4 state store tests 2017-07-07 12:03:11 -07:00
Alex Dadgar d04877d23c initial impl 2017-07-07 12:03:11 -07:00
Michael Schurter 8d3e13ab8a System jobs without evals are running too 2017-07-03 13:48:51 -07:00
Michael Schurter f7d2a74ddf System jobs should be running until stopped
Prior to this commit they would be marked as dead if they had no
currently running allocations -- even though they would spring back to
life (running) if the cluster state changed such that a new eval+alloc
was created.
2017-06-28 11:39:24 -07:00
Alex Dadgar 83f5e65aae Plan allows updating the status of deployments 2017-05-11 12:49:04 -07:00
Alex Dadgar 7078d563cb Create Deployments through plan application 2017-05-05 15:33:19 -07:00
Alex Dadgar 343ff03f02 Deployment struct, state store, fsm persist/restore 2017-05-04 13:37:18 -07:00
Alex Dadgar aed852782f Merge pull request #2592 from hashicorp/b-gc-race
Protect against nil job in new allocation
2017-05-01 13:54:43 -07:00
Alex Dadgar efa91c3d89 Protect against nil job in new allocation 2017-04-26 18:27:27 -07:00
Alex Dadgar 1b97c9abdd Revert server endpoint 2017-04-20 11:14:06 -07:00
Alex Dadgar 1769fe468a Fix some tests 2017-04-17 19:39:20 -07:00
Alex Dadgar 34332af70e GC and some fixes 2017-04-15 17:08:05 -07:00
Alex Dadgar 3145086a42 non-purge deregisters 2017-04-15 17:08:05 -07:00
Alex Dadgar fda44689b7 Histories -> Versions 2017-04-15 17:08:05 -07:00
Alex Dadgar f97664512b Upsert Job Histories 2017-04-15 17:08:05 -07:00
Alex Dadgar 787be30f13 Fix periodic job state
This PR fixes an issue in which a periodic job would incorrectly
transistion to status dead.

Fixes https://github.com/hashicorp/nomad/issues/2268
2017-03-27 10:35:36 -07:00
Alex Dadgar 5d293c0f1e Add abandon tests and use snapshot for blocking queries 2017-02-08 11:18:03 -08:00
Alex Dadgar 36d018514b Fix test 2017-02-07 11:35:38 -08:00
Alex Dadgar bc2e6b0cc2 Fix state store tests 2017-02-06 16:46:23 -08:00
Alex Dadgar 0f046b179a Merge pull request #2155 from hashicorp/f-cancel
Cancel blocked evals upon successful one for job
2017-01-11 13:10:35 -08:00
Alex Dadgar 8d5f0fea69 Merge pull request #2128 from hashicorp/f-dispatch
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Alex Dadgar 86980e08f0 Cancel blocked evals upon successful one for job
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.

Fixes https://github.com/hashicorp/nomad/issues/2124
2017-01-04 16:16:04 -08:00
Alex Dadgar 2761e1d8ea fix tests 2016-12-16 10:21:56 -08:00
Alex Dadgar 1235fc6581 summary tests 2016-12-13 16:15:40 -08:00
Diptanu Choudhury 5191b4d33a Making the status command return the allocs of currently registered job 2016-11-24 16:31:30 +01:00
Alex Dadgar df4398beac Implement blocking queries for /v1/job/evaluations 2016-10-29 17:30:34 -07:00
Diptanu Choudhury 1b3c5e98c8 Renaming LocalDisk to EphemeralDisk (#1710)
Renaming LocalDisk to EphemeralDisk
2016-09-14 15:43:42 -07:00
Diptanu Choudhury 6028682ad2 Adding LocalDisk to alloc.Job 2016-09-01 17:41:50 -07:00
Alex Dadgar 3c9936ae4a Merge pull request #1659 from hashicorp/f-revoke-accessors
Token revocation and keeping only a single Vault client active among servers
2016-08-31 14:10:46 -07:00
Alex Dadgar 48696ba0cc Use tomb to shutdown
Token revocation

Remove from the statestore

Revoke tokens

Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled

update server interface to allow enable/disable and config loading

test the new functions

Leader revoke

Use active
2016-08-28 14:06:25 -07:00
Diptanu Choudhury 3447658bba Added scheduler tests to ensure disk constraints are honored 2016-08-25 15:31:56 -05:00
Diptanu Choudhury 8105613c25 Added an upgrade path for existing jobs with no local disk 2016-08-25 13:00:20 -05:00
Alex Dadgar 901000f789 Raft message, fsm and state store table 2016-08-19 16:40:37 -07:00
Diptanu Choudhury 6dc5b1972c Setting job's create index as summary create index during reconciliation 2016-08-04 15:14:01 -07:00
Alex Dadgar 2fb67fefb5 Merge pull request #1516 from hashicorp/f-lost-state-sched
Make scheduler mark allocations as lost
2016-08-04 11:36:02 -07:00
Diptanu Choudhury 88d383c47f Updated tests and comments 2016-08-04 11:29:36 -07:00
Diptanu Choudhury c24e8ba7d8 Not updating summary if job is de-registered 2016-08-03 17:00:08 -07:00
Alex Dadgar ac3328e812 Make scheduler mark allocations as lost 2016-08-03 15:57:46 -07:00
Diptanu Choudhury 1b60e0823a Added a test for restoring the summaries in fsm 2016-08-03 11:58:36 -07:00
Alex Dadgar 4197e62e78 Remove old way of marking lost 2016-08-03 11:20:56 -07:00
Diptanu Choudhury 6f8c40fca7 Not updating summary if create index of summary not same as job's create index 2016-08-02 18:59:45 -07:00
Diptanu Choudhury 87fdeb5393 Updated the logic to update job summary 2016-08-02 16:08:20 -07:00
Diptanu Choudhury b69b7129a6 Using the parnet transaction to query the allocation while updating summary 2016-08-01 16:46:05 -07:00
Diptanu Choudhury b0e1f02e26 Not updating job summaries if jobs are not present 2016-07-28 15:24:27 -07:00
Diptanu Choudhury 1bab053490 Updated some tests 2016-07-26 15:11:48 -07:00
Diptanu Choudhury 5bded8d54d Setting the right indexes while creating Job Summary 2016-07-25 17:51:20 -07:00
Diptanu Choudhury 3089833397 Reconciling the queued allocations during restore 2016-07-25 17:31:40 -07:00
Diptanu Choudhury f1c9427c37 Added code to create missing job summaries 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 50842b88c7 Fixed some bugs 2016-07-25 17:26:38 -07:00
Alex Dadgar e0114fee05 InitFields to Canonicalize 2016-07-20 16:08:52 -07:00
Diptanu Choudhury 487c66b84d Removing the queued state of Job Summary and alloc desired status false 2016-07-13 13:20:46 -06:00
Diptanu Choudhury 5d782abd50 Refactored the test 2016-07-12 14:37:51 -06:00
Diptanu Choudhury 00b9b4c6e8 Accounting lost state of allocations 2016-07-12 14:27:45 -06:00
Diptanu Choudhury 313d7aa7f5 Added a test to ensure client alloc updates are happening properly 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 91b828d299 Updated logic to handle change in desired status of allocation when client status is still pending 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 6937c0f7f3 Added test for job summary restore 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 5e6f9ef69e Added methods to save and restore job summary snapshots 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 67953b1583 Added a test to ensure correctness of job summary when client updates alloc 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 837b70f285 Added test to make sure summary gets deleted when job gets deleted 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 0606840080 Implemented logic to update the job summary when allocs are inserted 2016-07-12 11:41:13 -06:00