Commit Graph

423 Commits

Author SHA1 Message Date
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Preetha Appan 510d7839e4
code review comments 2019-01-18 17:41:39 -06:00
Preetha Appan be9656d195
fix linting 2019-01-17 15:36:33 -06:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan be36fee48e
Use IsParameterized/isPeriodic methods 2019-01-17 12:15:42 -06:00
Preetha Appan 81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Preetha Appan 8656d3379f
Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Mahmood Ali 6281700c0c address review comments 2018-11-20 13:21:39 -05:00
Mahmood Ali d744e71fa9 add a missing no errorassertion 2018-11-19 21:44:00 -05:00
Mahmood Ali b93643cd96 Fix a panic related to batch GC
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
2018-11-19 20:59:45 -05:00
Mahmood Ali bff9c3b3e9 Reproduce a panic related to batch GC
Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Mahmood Ali a4a9347501 fix comment typos 2018-11-14 08:36:14 -05:00
Mahmood Ali 1403ad21b9 Changelog job re-run fix 2018-11-13 07:52:51 -05:00
Mahmood Ali e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Mahmood Ali 8513b3cccb Comment public functions and batch write txn 2018-11-12 16:09:39 -05:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Mahmood Ali 9c0a15f3ce Run job deregistering in a single transaction
Fixes https://github.com/hashicorp/nomad/issues/4299

Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.

Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index.  However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals.  When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.

This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan 57fe5050f0
more minor review feedback 2018-11-01 17:05:17 -05:00
Preetha Appan 8f7eb61823
Introduce a response object for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readablity fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 1415032c13
More review comments 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08
Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de
Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Preetha 02d63432b4
Fix typo 2018-05-07 14:55:01 -05:00
Alex Dadgar 738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment 2018-05-07 14:55:01 -05:00
Alex Dadgar 319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Preetha Appan 52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value 2018-05-03 17:04:36 -05:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Holland Komlo 31557cc44f move tests to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar 4754366640 job watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 93871c18f8 Fix retaining the drain 2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 0965c9ed28 Fix tests 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter 03d0e5b8a0 improve drain fsm/statestore tests 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Alex Dadgar d3c3deffad fixes 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 8f109c344c make check fixes 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 19ef872769 keep state store functions in one file 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8bcbd81e6 batch submitting node events 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo d30c269fbe code review feedback 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 00d9923454 Ensure node updates don't strip node events
Add node events to CLI
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 93b732f97e move adding node registration event to the state store 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo e7e4a31f5d fix up error logging 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 4ede27a3c8 RPC, FSM, state store for Node.EmitEvent
add node event when registering a node for the first time
2018-03-13 18:05:40 -07:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 64546b4f99 spelling: referenced 2018-03-11 18:40:45 +00:00
Josh Soref 7ab998803b spelling: periodic 2018-03-11 18:37:05 +00:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Kyle Havlovitz 2ccf565bf6 Refactor redundancy_zone/upgrade_version out of client meta 2018-01-29 20:03:38 -08:00
Preetha Appan 288ff0b6f0
Add test case to verify setting next alloc id correctly 2018-01-24 17:55:29 -06:00
Preetha Appan fd2fbefa4c
Add a field to track the next allocation during a replacement 2018-01-24 17:55:05 -06:00
Kyle Havlovitz 8d41f4ad40 Formatting/test adjustments 2018-01-18 15:03:35 -08:00
Kyle Havlovitz 12ff22ea70 Merge branch 'master' into autopilot 2018-01-18 13:29:25 -08:00
Preetha Appan d788c0464c
Clean up error logging 2017-12-18 17:56:12 -06:00
Alex Dadgar 1791cc3ca5 Handle upgrade path 2017-12-18 15:51:35 -08:00
Kyle Havlovitz 1c07066064 Add autopilot functionality based on Consul's autopilot 2017-12-18 14:29:41 -08:00
Preetha Appan 40cb1d327c
Address some code review comments 2017-12-18 15:22:23 -06:00
Preetha Appan 51bd0b59c7
Return an error if evaluation doesn't exist in state store at plan apply time. 2017-12-18 14:55:36 -06:00
Preetha Appan 3c36abfe14
Update eval modify index as part of plan apply. 2017-12-18 10:03:55 -06:00
Preetha Appan 39d70be009 Add ModifyTime to Allocation and update it both on plan applies and client initiated updates 2017-11-01 15:13:48 -05:00
Alex Dadgar 5d9db4c2df Bypass status checks for system, periodic, parameterized jobs 2017-10-27 09:34:50 -07:00
Alex Dadgar f4aa5ea0c7 lax timing 2017-10-24 10:58:06 -07:00
Alex Dadgar 1192385c63 Lax blocking query test timing 2017-10-20 13:07:17 -07:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Michael Schurter dfd2967cdb Merge pull request #3376 from hashicorp/f-node-acls
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter a003e3dd43 Add StateStore.NodeBySecretID 2017-10-12 15:27:29 -07:00
Michael Schurter 51bce7b1a3 Add index to Node.SecretID 2017-10-12 15:21:20 -07:00
Alex Dadgar e7e18c931c Fix sorting of job versions
Fixes an issue in which the versions were improperly sorted which would
cause pruning of the wrong job version. This essentially meant that job
versions above 255 would be dropped from the job version table (note
this was due to the prefix walk crossing from the 1-byte to 2-byte
threshold).

Fixes https://github.com/hashicorp/nomad/issues/3357
2017-10-12 13:33:55 -07:00
Michael Schurter a66c53d45a Remove `structs` import from `api`
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar 828c4abc44 Fix upgrading from 0.6.x to 0.7.0 2017-09-19 10:28:14 -05:00
Alex Dadgar e5ec915ac3 sync 2017-09-19 10:08:23 -05:00
Armon Dadgar 20a8e590a0 nomad: support ACL bootstrap reset 2017-09-10 16:03:30 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Armon Dadgar 10500c39e5 nomad: fixing test 2017-09-04 13:21:01 -07:00
Armon Dadgar 97404e3f8c nomad: compute hash for ACL policies and tokens 2017-09-04 13:09:34 -07:00
Armon Dadgar 1ace912341 nomad: adding bootstrapping checks 2017-09-04 13:05:53 -07:00
Armon Dadgar 06a7f12fad nomad: adding bootstrap state store method 2017-09-04 13:05:53 -07:00
Armon Dadgar 583a11cebd nomad: Adding ability to filter list of tokens to global only 2017-09-04 13:04:45 -07:00
Armon Dadgar bc697dc50e Address @dadgar feedback 2017-09-04 13:04:45 -07:00
Armon Dadgar f91d2608cb nomad: renambe PublicID to AccessorID for consistency 2017-09-04 13:04:45 -07:00
Armon Dadgar a17991e907 nomad: CRUD methods for ACLTokens 2017-09-04 13:04:45 -07:00
Armon Dadgar 8623bf9a5b nomad: adding ACLToken table 2017-09-04 13:04:45 -07:00
Armon Dadgar cde8e9301b nomad: fixing state store tests due to signature mismatch 2017-09-04 13:04:44 -07:00
Armon Dadgar 351afa0069 nomad: Upsert and Delete ACL policies can take a list 2017-09-04 13:03:14 -07:00
Armon Dadgar 4cb544e8f3 nomad: Adding CRUD to state store for ACL Policies 2017-09-04 13:03:14 -07:00
Armon Dadgar 85cad11885 nomad: adding policy table to state store 2017-09-04 13:03:14 -07:00
Alex Dadgar 4cc8bac48d fix blocking query due to ctx change 2017-08-31 15:34:55 -07:00
Alex Dadgar 590ff91bf3 Deployment watcher takes state store 2017-08-30 18:51:59 -07:00
Alex Dadgar dfcb73c896 Fix purging job versions
This PR fixes an issue in which the job versions weren't properly
cleaned when removing a job.

Fixes https://github.com/hashicorp/nomad/issues/3052
2017-08-18 15:46:03 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Alex Dadgar 5e98c3ce95 Expose FSM errors into deployment watcher and API
This PR exposes errors returned by the FSM to the deployment watcher and
thus the API. It also adds an error to handle the case of promoting a
deployment that has no eligible canaries.
2017-07-25 16:23:22 -07:00
Alex Dadgar 311084c724 Allow the deployment to not exist and just no-op 2017-07-17 14:09:59 -07:00
Alex Dadgar 3a29b38108 Status description shows requiring promotion 2017-07-07 12:12:48 -07:00
Alex Dadgar 08bf34f9a3 Fix JobModifyIndex changing when job is marked stable 2017-07-07 12:12:48 -07:00
Alex Dadgar de54ffd1f6 Deployment from inplace updates tracks placed properly. 2017-07-07 12:10:04 -07:00
Alex Dadgar a7fdc74bd4 feedback 2017-07-07 12:10:04 -07:00
Alex Dadgar 5457bb7962 Job stability 2017-07-07 12:10:04 -07:00
Alex Dadgar d07a5a2008 Complete deployments mark jobs as stable
This PR allows jobs to be marked as stable automatically by a successful
deployment.
2017-07-07 12:10:04 -07:00
Alex Dadgar 454083ba1b Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar c10d7ab871 Remove promoted bit from allocation 2017-07-07 12:10:04 -07:00
Alex Dadgar 09dfa2fc10 Rename CreateDeployments and remove cancelling behavior in state_store 2017-07-07 12:10:04 -07:00
Alex Dadgar 2e2fd26bed Update index 2017-07-07 12:07:08 -07:00
Alex Dadgar ecee5e370e initial watcher 2017-07-07 12:07:08 -07:00
Alex Dadgar e7034691ea deployment status 2017-07-07 12:07:07 -07:00
Alex Dadgar b64185a3f1 Deployment GC
This PR implements the garbage collector for deployments. Deployments
will by default be garbage collected after 1 hour.
2017-07-07 12:05:57 -07:00
Alex Dadgar 73325f888f deployment api 2017-07-07 12:03:11 -07:00
Alex Dadgar dad9e69822 more comment fixes 2017-07-07 12:03:11 -07:00
Alex Dadgar c189948ad2 comments on watcher 2017-07-07 12:03:11 -07:00
Alex Dadgar 87d187d777 Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar 6f821beec4 fix integration slightly 2017-07-07 12:03:11 -07:00
Alex Dadgar b4c8f56570 Deployment watcher tests 2017-07-07 12:03:11 -07:00
Alex Dadgar 80dc4d66d8 Deployments list 2017-07-07 12:03:11 -07:00
Alex Dadgar eec3cefee4 state store tests 2017-07-07 12:03:11 -07:00