Mahmood Ali
6bdbeed319
set node.StatusUpdatedAt in raft
...
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.
This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Michael Schurter
9732bc37ff
nomad: refactor waitForIndex into SnapshotAfter
...
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha
c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
...
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha
555dd23c2c
remove stray newline
...
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan
2b787aad7e
Fix bug in plan applier introduced in PR-5602
...
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix.
2019-05-15 20:34:06 -05:00
Preetha Appan
07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest
2019-05-13 18:33:41 -05:00
Arshneet Singh
9cc39edb67
Return error when preempted/stopped alloc doesn't exist during denormalization
2019-04-24 12:36:07 -07:00
Arshneet Singh
d4e7a5c005
Add comments to functions, and use require instead of assert
2019-04-23 09:57:21 -07:00
Arshneet Singh
4cf4324b8f
Remove allowPlanOptimization from schedulers
2019-04-23 09:18:02 -07:00
Arshneet Singh
65f5fab131
Add tests for plan normalization
2019-04-23 09:18:01 -07:00
Preetha Appan
0f8a113ead
Refactor to find jobs with child instances more efficiently
...
also added unit tests
2019-01-17 14:29:48 -06:00
Mahmood Ali
6281700c0c
address review comments
2018-11-20 13:21:39 -05:00
Mahmood Ali
d744e71fa9
add a missing no-error assertion
2018-11-19 21:44:00 -05:00
Mahmood Ali
bff9c3b3e9
Reproduce a panic related to batch GC
...
Test case that reproduces a panic with the following stacktrace:
```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]
goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Alex Dadgar
98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
...
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar
261aae32b1
more robust merging of the deployment status when getting updates from the client
2018-11-05 16:39:09 -08:00
Alex Dadgar
1c31970464
Fix multiple tgs with progress deadline handling
...
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it cannot be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan
1a5421f5d7
more minor cleanup
2018-10-30 11:06:32 -05:00
Preetha Appan
0494a098ce
More style and readability fixes from review
2018-10-30 11:06:32 -05:00
Preetha Appan
7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
eb38488d08
Fix logic bug, unit test for plan apply method in state store
2018-10-30 11:06:32 -05:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
21c5ed850d
Register events
2018-05-22 14:06:33 -07:00
Alex Dadgar
17aac1c9de
node heartbeat missed event
2018-05-22 14:05:46 -07:00
Alex Dadgar
5f2080bc26
Emit events based on eligibility
2018-05-22 14:04:59 -07:00
Alex Dadgar
a35248d1d8
Plumb event via FSM
2018-05-10 16:30:54 -07:00
Preetha Appan
cba13e4ec5
Fix test set up to set ModifyTime for alloc
2018-05-07 14:55:01 -05:00
Alex Dadgar
319763a5d8
remove unnecessary merge of DeploymentStatus.Timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Alex Dadgar
641ef81cbf
Test fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
99e00fb774
Pass through timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00
Preetha Appan
52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value
2018-05-03 17:04:36 -05:00
Chelsea Holland Komlo
31557cc44f
move tests to use time.Time
2018-03-27 15:43:57 -04:00
Michael Schurter
cb61a4bdc7
Fix linting errors
2018-03-21 16:51:45 -07:00
Alex Dadgar
2d91b9dfba
Batch drain update
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` command will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
0fba0101b6
RPC/FSM/State Store for Eligibility
2018-03-21 16:51:44 -07:00
Alex Dadgar
2f5309d82a
Remove update time
2018-03-21 16:51:43 -07:00
Alex Dadgar
0965c9ed28
Fix tests
2018-03-21 16:51:43 -07:00
Alex Dadgar
e459a666ed
Node.Drain takes strategy
2018-03-21 16:49:48 -07:00
Michael Schurter
03d0e5b8a0
improve drain fsm/statestore tests
2018-03-21 16:49:48 -07:00
Michael Schurter
d1ec65d765
switch to new raft DesiredTransition message
2018-03-21 16:49:48 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransition
...
fix build tag
2018-03-21 16:49:48 -07:00
Alex Dadgar
63e14b7d63
nodeevents -> events
2018-03-13 18:08:22 -07:00
Alex Dadgar
d3c3deffad
fixes
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
8f109c344c
make check fixes
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
19ef872769
keep state store functions in one file
2018-03-13 18:08:21 -07:00
Michael Schurter
7dd7fbcda2
non-Existent -> nonexistent
...
Reverting from #3963
https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref
7f6e4012a0
spelling: existent
2018-03-11 18:30:37 +00:00
Preetha Appan
288ff0b6f0
Add test case to verify setting next alloc id correctly
2018-01-24 17:55:29 -06:00
Preetha Appan
40cb1d327c
Address some code review comments
2017-12-18 15:22:23 -06:00
Preetha Appan
3c36abfe14
Update eval modify index as part of plan apply.
2017-12-18 10:03:55 -06:00
Alex Dadgar
f4aa5ea0c7
lax timing
2017-10-24 10:58:06 -07:00
Alex Dadgar
1192385c63
Lax blocking query test timing
2017-10-20 13:07:17 -07:00
Alex Dadgar
c1cc51dbee
sync
2017-10-13 14:36:02 -07:00
Michael Schurter
dfd2967cdb
Merge pull request #3376 from hashicorp/f-node-acls
...
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter
a003e3dd43
Add StateStore.NodeBySecretID
2017-10-12 15:27:29 -07:00
Alex Dadgar
e7e18c931c
Fix sorting of job versions
...
Fixes an issue in which the versions were improperly sorted which would
cause pruning of the wrong job version. This essentially meant that job
versions above 255 would be dropped from the job version table (note
this was due to the prefix walk crossing from the 1-byte to 2-byte
threshold).
Fixes https://github.com/hashicorp/nomad/issues/3357
2017-10-12 13:33:55 -07:00
Michael Schurter
a66c53d45a
Remove `structs` import from `api`
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Alex Dadgar
e5ec915ac3
sync
2017-09-19 10:08:23 -05:00
Armon Dadgar
20a8e590a0
nomad: support ACL bootstrap reset
2017-09-10 16:03:30 -07:00
Alex Dadgar
84d06f6abe
Sync namespace changes
2017-09-07 17:04:21 -07:00
Armon Dadgar
10500c39e5
nomad: fixing test
2017-09-04 13:21:01 -07:00
Armon Dadgar
1ace912341
nomad: adding bootstrapping checks
2017-09-04 13:05:53 -07:00
Armon Dadgar
06a7f12fad
nomad: adding bootstrap state store method
2017-09-04 13:05:53 -07:00
Armon Dadgar
583a11cebd
nomad: Adding ability to filter list of tokens to global only
2017-09-04 13:04:45 -07:00
Armon Dadgar
f91d2608cb
nomad: rename PublicID to AccessorID for consistency
2017-09-04 13:04:45 -07:00
Armon Dadgar
a17991e907
nomad: CRUD methods for ACLTokens
2017-09-04 13:04:45 -07:00
Armon Dadgar
cde8e9301b
nomad: fixing state store tests due to signature mismatch
2017-09-04 13:04:44 -07:00
Armon Dadgar
351afa0069
nomad: Upsert and Delete ACL policies can take a list
2017-09-04 13:03:14 -07:00
Armon Dadgar
4cb544e8f3
nomad: Adding CRUD to state store for ACL Policies
2017-09-04 13:03:14 -07:00
Alex Dadgar
4cc8bac48d
fix blocking query due to ctx change
2017-08-31 15:34:55 -07:00
Alex Dadgar
590ff91bf3
Deployment watcher takes state store
2017-08-30 18:51:59 -07:00
Alex Dadgar
dfcb73c896
Fix purging job versions
...
This PR fixes an issue in which the job versions weren't properly
cleaned when removing a job.
Fixes https://github.com/hashicorp/nomad/issues/3052
2017-08-18 15:46:03 -07:00
Luke Farnell
f0ced87b95
fixed all spelling mistakes for goreport
2017-08-07 17:13:05 -04:00
Alex Dadgar
5e98c3ce95
Expose FSM errors into deployment watcher and API
...
This PR exposes errors returned by the FSM to the deployment watcher and
thus the API. It also adds an error to handle the case of promoting a
deployment that has no eligible canaries.
2017-07-25 16:23:22 -07:00
Alex Dadgar
3a29b38108
Status description shows requiring promotion
2017-07-07 12:12:48 -07:00
Alex Dadgar
de54ffd1f6
Deployment from inplace updates tracks placed properly.
2017-07-07 12:10:04 -07:00
Alex Dadgar
5457bb7962
Job stability
2017-07-07 12:10:04 -07:00
Alex Dadgar
d07a5a2008
Complete deployments mark jobs as stable
...
This PR allows jobs to be marked as stable automatically by a successful
deployment.
2017-07-07 12:10:04 -07:00
Alex Dadgar
454083ba1b
Remove canary
2017-07-07 12:10:04 -07:00
Alex Dadgar
c10d7ab871
Remove promoted bit from allocation
2017-07-07 12:10:04 -07:00
Alex Dadgar
09dfa2fc10
Rename CreateDeployments and remove cancelling behavior in state_store
2017-07-07 12:10:04 -07:00
Alex Dadgar
b64185a3f1
Deployment GC
...
This PR implements the garbage collector for deployments. Deployments
will by default be garbage collected after 1 hour.
2017-07-07 12:05:57 -07:00
Alex Dadgar
73325f888f
deployment api
2017-07-07 12:03:11 -07:00
Alex Dadgar
dad9e69822
more comment fixes
2017-07-07 12:03:11 -07:00
Alex Dadgar
80dc4d66d8
Deployments list
2017-07-07 12:03:11 -07:00
Alex Dadgar
eec3cefee4
state store tests
2017-07-07 12:03:11 -07:00
Alex Dadgar
d04877d23c
initial impl
2017-07-07 12:03:11 -07:00
Michael Schurter
8d3e13ab8a
System jobs without evals are running too
2017-07-03 13:48:51 -07:00
Michael Schurter
f7d2a74ddf
System jobs should be running until stopped
...
Prior to this commit they would be marked as dead if they had no
currently running allocations -- even though they would spring back to
life (running) if the cluster state changed such that a new eval+alloc
was created.
2017-06-28 11:39:24 -07:00
Alex Dadgar
83f5e65aae
Plan allows updating the status of deployments
2017-05-11 12:49:04 -07:00
Alex Dadgar
7078d563cb
Create Deployments through plan application
2017-05-05 15:33:19 -07:00
Alex Dadgar
343ff03f02
Deployment struct, state store, fsm persist/restore
2017-05-04 13:37:18 -07:00
Alex Dadgar
aed852782f
Merge pull request #2592 from hashicorp/b-gc-race
...
Protect against nil job in new allocation
2017-05-01 13:54:43 -07:00
Alex Dadgar
efa91c3d89
Protect against nil job in new allocation
2017-04-26 18:27:27 -07:00
Alex Dadgar
1b97c9abdd
Revert server endpoint
2017-04-20 11:14:06 -07:00
Alex Dadgar
1769fe468a
Fix some tests
2017-04-17 19:39:20 -07:00
Alex Dadgar
34332af70e
GC and some fixes
2017-04-15 17:08:05 -07:00
Alex Dadgar
3145086a42
non-purge deregisters
2017-04-15 17:08:05 -07:00
Alex Dadgar
fda44689b7
Histories -> Versions
2017-04-15 17:08:05 -07:00
Alex Dadgar
f97664512b
Upsert Job Histories
2017-04-15 17:08:05 -07:00
Alex Dadgar
787be30f13
Fix periodic job state
...
This PR fixes an issue in which a periodic job would incorrectly
transition to status dead.
Fixes https://github.com/hashicorp/nomad/issues/2268
2017-03-27 10:35:36 -07:00
Alex Dadgar
5d293c0f1e
Add abandon tests and use snapshot for blocking queries
2017-02-08 11:18:03 -08:00
Alex Dadgar
36d018514b
Fix test
2017-02-07 11:35:38 -08:00
Alex Dadgar
bc2e6b0cc2
Fix state store tests
2017-02-06 16:46:23 -08:00
Alex Dadgar
0f046b179a
Merge pull request #2155 from hashicorp/f-cancel
...
Cancel blocked evals upon successful one for job
2017-01-11 13:10:35 -08:00
Alex Dadgar
8d5f0fea69
Merge pull request #2128 from hashicorp/f-dispatch
...
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Alex Dadgar
86980e08f0
Cancel blocked evals upon successful one for job
...
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.
Fixes https://github.com/hashicorp/nomad/issues/2124
2017-01-04 16:16:04 -08:00
Alex Dadgar
2761e1d8ea
fix tests
2016-12-16 10:21:56 -08:00
Alex Dadgar
1235fc6581
summary tests
2016-12-13 16:15:40 -08:00
Diptanu Choudhury
5191b4d33a
Making the status command return the allocs of currently registered job
2016-11-24 16:31:30 +01:00
Alex Dadgar
df4398beac
Implement blocking queries for /v1/job/evaluations
2016-10-29 17:30:34 -07:00
Diptanu Choudhury
1b3c5e98c8
Renaming LocalDisk to EphemeralDisk (#1710)
...
Renaming LocalDisk to EphemeralDisk
2016-09-14 15:43:42 -07:00
Diptanu Choudhury
6028682ad2
Adding LocalDisk to alloc.Job
2016-09-01 17:41:50 -07:00
Alex Dadgar
3c9936ae4a
Merge pull request #1659 from hashicorp/f-revoke-accessors
...
Token revocation and keeping only a single Vault client active among servers
2016-08-31 14:10:46 -07:00
Alex Dadgar
48696ba0cc
Use tomb to shutdown
...
Token revocation
Remove from the statestore
Revoke tokens
Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled
update server interface to allow enable/disable and config loading
test the new functions
Leader revoke
Use active
2016-08-28 14:06:25 -07:00
Diptanu Choudhury
3447658bba
Added scheduler tests to ensure disk constraints are honored
2016-08-25 15:31:56 -05:00
Diptanu Choudhury
8105613c25
Added an upgrade path for existing jobs with no local disk
2016-08-25 13:00:20 -05:00
Alex Dadgar
901000f789
Raft message, fsm and state store table
2016-08-19 16:40:37 -07:00
Diptanu Choudhury
6dc5b1972c
Setting job's create index as summary create index during reconciliation
2016-08-04 15:14:01 -07:00
Alex Dadgar
2fb67fefb5
Merge pull request #1516 from hashicorp/f-lost-state-sched
...
Make scheduler mark allocations as lost
2016-08-04 11:36:02 -07:00
Diptanu Choudhury
88d383c47f
Updated tests and comments
2016-08-04 11:29:36 -07:00
Diptanu Choudhury
c24e8ba7d8
Not updating summary if job is de-registered
2016-08-03 17:00:08 -07:00
Alex Dadgar
ac3328e812
Make scheduler mark allocations as lost
2016-08-03 15:57:46 -07:00
Diptanu Choudhury
1b60e0823a
Added a test for restoring the summaries in fsm
2016-08-03 11:58:36 -07:00
Alex Dadgar
4197e62e78
Remove old way of marking lost
2016-08-03 11:20:56 -07:00
Diptanu Choudhury
6f8c40fca7
Not updating summary if create index of summary not same as job's create index
2016-08-02 18:59:45 -07:00
Diptanu Choudhury
87fdeb5393
Updated the logic to update job summary
2016-08-02 16:08:20 -07:00
Diptanu Choudhury
b69b7129a6
Using the parent transaction to query the allocation while updating summary
2016-08-01 16:46:05 -07:00
Diptanu Choudhury
b0e1f02e26
Not updating job summaries if jobs are not present
2016-07-28 15:24:27 -07:00
Diptanu Choudhury
1bab053490
Updated some tests
2016-07-26 15:11:48 -07:00
Diptanu Choudhury
5bded8d54d
Setting the right indexes while creating Job Summary
2016-07-25 17:51:20 -07:00
Diptanu Choudhury
3089833397
Reconciling the queued allocations during restore
2016-07-25 17:31:40 -07:00
Diptanu Choudhury
f1c9427c37
Added code to create missing job summaries
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
50842b88c7
Fixed some bugs
2016-07-25 17:26:38 -07:00
Alex Dadgar
e0114fee05
InitFields to Canonicalize
2016-07-20 16:08:52 -07:00
Diptanu Choudhury
487c66b84d
Removing the queued state of Job Summary and alloc desired status false
2016-07-13 13:20:46 -06:00
Diptanu Choudhury
5d782abd50
Refactored the test
2016-07-12 14:37:51 -06:00
Diptanu Choudhury
00b9b4c6e8
Accounting lost state of allocations
2016-07-12 14:27:45 -06:00
Diptanu Choudhury
313d7aa7f5
Added a test to ensure client alloc updates are happening properly
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
91b828d299
Updated logic to handle change in desired status of allocation when client status is still pending
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
6937c0f7f3
Added test for job summary restore
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
5e6f9ef69e
Added methods to save and restore job summary snapshots
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
67953b1583
Added a test to ensure correctness of job summary when client updates alloc
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
837b70f285
Added test to make sure summary gets deleted when job gets deleted
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
0606840080
Implemented logic to update the job summary when allocs are inserted
2016-07-12 11:41:13 -06:00