Preetha Appan
be36fee48e
Use IsParameterized/isPeriodic methods
2019-01-17 12:15:42 -06:00
Preetha Appan
81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
...
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Mahmood Ali
a4a9347501
fix comment typos
2018-11-14 08:36:14 -05:00
Alex Dadgar
08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
...
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Mahmood Ali
8513b3cccb
Comment public functions and batch write txn
2018-11-12 16:09:39 -05:00
Mahmood Ali
9c0a15f3ce
Run job deregistering in a single transaction
...
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.
This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Alex Dadgar
b1c5d52817
Track jobs by namespace
2018-11-07 10:22:08 -08:00
Preetha Appan
32cc764072
Add fsm layer tests
2018-10-30 11:06:32 -05:00
Preetha Appan
7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
cc295b90de
Implement preemption for system jobs.
...
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
ca28afa3b2
small fixes
2018-09-15 16:42:38 -07:00
Alex Dadgar
3c19d01d7a
server
2018-09-15 16:23:13 -07:00
Alex Dadgar
300b1a7a15
Tests only use testlog package logger
2018-06-13 15:40:56 -07:00
Alex Dadgar
352f2e03b5
Clean up leaked deployments on restoration
...
This PR cancels deployments that are active but do not have a job
associated with them. This is a broken invariant that causes issues in
the deployment watcher since it will not track them. Thus they are
objects that can't be operated on or cleaned up.
Fixes https://github.com/hashicorp/nomad/issues/4286
2018-05-23 16:44:21 -07:00
Alex Dadgar
17aac1c9de
node heartbeat missed event
2018-05-22 14:05:46 -07:00
Alex Dadgar
5f2080bc26
Emit events based on eligibility
2018-05-22 14:04:59 -07:00
Alex Dadgar
a35248d1d8
Plumb event via FSM
2018-05-10 16:30:54 -07:00
Alex Dadgar
c91ce5cc38
Fix not enqueuing eval
2018-05-07 14:50:01 -05:00
Alex Dadgar
641ef81cbf
Test fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
d0f237086b
UX touchups
2018-04-26 15:24:27 -07:00
Alex Dadgar
2b14371db5
Fix spelling
2018-04-03 15:58:03 -07:00
Alex Dadgar
9617a13a2b
Correctly handle the upgrade path of a node being drained when applying Raft logs
2018-04-03 15:32:44 -07:00
Alex Dadgar
301704091b
Handle upgrade where Node doesn't have eligiblity
...
This PR handles upgrading a node that has no scheduling eligiblity set.
2018-03-29 16:52:23 -07:00
Alex Dadgar
2d91b9dfba
Batch drain update
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
010a6b8ca5
Unblock evals once eligible
2018-03-21 16:51:44 -07:00
Alex Dadgar
0fba0101b6
RPC/FSM/State Store for Eligibility
2018-03-21 16:51:44 -07:00
Alex Dadgar
2f5309d82a
Remove update time
2018-03-21 16:51:43 -07:00
Alex Dadgar
e459a666ed
Node.Drain takes strategy
2018-03-21 16:49:48 -07:00
Michael Schurter
d1ec65d765
switch to new raft DesiredTransition message
2018-03-21 16:49:48 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransistion
...
fix build tag
2018-03-21 16:49:48 -07:00
Alex Dadgar
586ae36d13
Batch Deregister RPC
2018-03-16 10:53:03 -07:00
Alex Dadgar
de6ebb6e6c
small cleanup
2018-03-13 18:08:22 -07:00
Alex Dadgar
d3c3deffad
fixes
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8bcbd81e6
batch submitting node events
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo
d30c269fbe
code review feedback
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo
4ede27a3c8
RPC, FSM, state store for Node.EmitEvent
...
add node event when registering a node for the first time
2018-03-13 18:05:40 -07:00
Josh Soref
f28efbbc79
spelling: sanitize
2018-03-11 18:52:59 +00:00
Josh Soref
7ab998803b
spelling: periodic
2018-03-11 18:37:05 +00:00
Josh Soref
d300623abe
spelling: evaluation
2018-03-11 18:01:35 +00:00
Josh Soref
d9ce1f7882
spelling: deregister
2018-03-11 17:53:22 +00:00
Preetha Appan
52a665836e
Remove extra fields set in client allocations during update
2018-01-31 09:58:05 -06:00
Preetha Appan
2567b51c58
Edge trigger evaluation when allocations client status is failed
2018-01-31 09:56:53 -06:00
Kyle Havlovitz
12ff22ea70
Merge branch 'master' into autopilot
2018-01-18 13:29:25 -08:00
Kyle Havlovitz
1c07066064
Add autopilot functionality based on Consul's autopilot
2017-12-18 14:29:41 -08:00
Preetha Appan
51bd0b59c7
Return an error if evaluation doesn't exist in state store at plan apply time.
2017-12-18 14:55:36 -06:00
Alex Dadgar
86608124ca
Fix followers not creating periodic launch
...
Fix an issue in which periodic launches wouldn't be made on followers.
2017-12-11 13:55:17 -08:00
Alex Dadgar
c1cc51dbee
sync
2017-10-13 14:36:02 -07:00
Michael Schurter
a66c53d45a
Remove structs
import from api
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
54e04b5c0e
Merge pull request #3201 from hashicorp/b-periodic-restore
...
Fix restoration of stopped periodic jobs
2017-09-13 11:42:29 -07:00
Alex Dadgar
a2363e7583
sync acls
2017-09-13 11:38:29 -07:00
Alex Dadgar
e3dbcdcb44
Fix restoration of stopped periodic jobs
...
This PR fixes an issue in which we would add a stopped periodic job to
the periodic launcher.
2017-09-12 14:25:40 -07:00
Armon Dadgar
20a8e590a0
nomad: support ACL bootstrap reset
2017-09-10 16:03:30 -07:00
Alex Dadgar
84d06f6abe
Sync namespace changes
2017-09-07 17:04:21 -07:00
Armon Dadgar
e24a4abf2c
nomad: adding ACL bootstrap endpoints
2017-09-04 13:05:53 -07:00
Armon Dadgar
e5c69f162c
nomad: implement ACL token endpoints
2017-09-04 13:04:45 -07:00
Armon Dadgar
e9bad0bf37
nomad: Add ACL Token snapshot/restore to FSM
2017-09-04 13:04:45 -07:00
Armon Dadgar
afdde24799
nomad: adding upsert policy endpoint
2017-09-04 13:03:15 -07:00
Armon Dadgar
e3e243f433
nomad: implement policy delete endpoint
2017-09-04 13:03:15 -07:00
Armon Dadgar
f147f25fe5
Addressing @dadgar review feedback
2017-09-04 13:03:15 -07:00
Armon Dadgar
10b583ea38
nomad: adding FSM snapshot/restore of ACL policies
2017-09-04 13:03:14 -07:00
Alex Dadgar
5457bb7962
Job stability
2017-07-07 12:10:04 -07:00
Alex Dadgar
b64185a3f1
Deployment GC
...
This PR implements the garbage collector for deployments. Deployments
will by default be garbage collected after 1 hour.
2017-07-07 12:05:57 -07:00
Alex Dadgar
dad9e69822
more comment fixes
2017-07-07 12:03:11 -07:00
Alex Dadgar
6688a3f76c
FSM Tests
2017-07-07 12:03:11 -07:00
Alex Dadgar
d04877d23c
initial impl
2017-07-07 12:03:11 -07:00
Alex Dadgar
919c50ba5c
Merge branch 'master' into f-update-block
2017-05-11 13:08:31 -07:00
Alex Dadgar
50eec3ef35
handle upgrading old update block syntax
2017-05-10 13:48:53 -07:00
Alex Dadgar
7078d563cb
Create Deployments through plan application
2017-05-05 15:33:19 -07:00
Alex Dadgar
343ff03f02
Deployment struct, state store, fsm persist/restore
2017-05-04 13:37:18 -07:00
Alex Dadgar
b67c40f717
Proper denormalization in optimistic state store
2017-05-01 14:49:57 -07:00
Alex Dadgar
3145086a42
non-purge deregisters
2017-04-15 17:08:05 -07:00
Alex Dadgar
103e8d21fb
Fix dispatch of periodic job
...
This PR fixes an issue in which when a periodic and parameterized job
was dispatched, an allocation would be immediately created.
Fixes https://github.com/hashicorp/nomad/issues/2470
2017-03-27 16:55:17 -07:00
Alex Dadgar
eaf285b208
Fix missing summary restoration
2017-02-08 11:51:48 -08:00
Alex Dadgar
5d293c0f1e
Add abandon tests and use snapshot for blocking queries
2017-02-08 11:18:03 -08:00
Alex Dadgar
b69b357c7f
Nomad builds
2017-02-07 20:31:23 -08:00
Alex Dadgar
570efcaebd
Update state store and blocking query helper
2017-02-05 12:03:11 -08:00
Alex Dadgar
a616e9d970
Merge pull request #2163 from hashicorp/b-summary
...
Job Summary: Fix queued accounting and remove in-place state store updates
2017-01-11 13:36:36 -08:00
Alex Dadgar
efbb2894c7
Review fixes
2017-01-11 13:18:36 -08:00
Alex Dadgar
d0f918495b
Store pointer of JobSummary in state store and remove in-place modifications of the object and replace with Copy-Update-Insert operations
2017-01-08 13:55:03 -08:00
Alex Dadgar
86980e08f0
Cancel blocked evals upon successful one for job
...
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.
Fixes https://github.com/hashicorp/nomad/issues/2124
2017-01-04 16:16:04 -08:00
Diptanu Choudhury
56ed1d3cd8
Fixing the upgrade path for ephemeral disk
2016-11-08 15:24:51 -08:00
Alex Dadgar
3c9936ae4a
Merge pull request #1659 from hashicorp/f-revoke-accessors
...
Token revocation and keeping only a single Vault client active among servers
2016-08-31 14:10:46 -07:00
Alex Dadgar
6047414fb9
address comments
2016-08-31 14:10:33 -07:00
Diptanu Choudhury
bfee7b30a3
Introducing shared resources in alloc
2016-08-29 13:49:25 -07:00
Alex Dadgar
48696ba0cc
Use tomb to shutdown
...
Token revocation
Remove from the statestore
Revoke tokens
Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled
update server interface to allow enable/disable and config loading
test the new functions
Leader revoke
Use active
2016-08-28 14:06:25 -07:00
Diptanu Choudhury
13497913f9
Ensuring resources are re-calculated properly in fsm
2016-08-26 20:13:11 -07:00
Alex Dadgar
76d324a8f0
fix comment
2016-08-22 11:41:47 -07:00
Alex Dadgar
901000f789
Raft message, fsm and state store table
2016-08-19 16:40:37 -07:00
Diptanu Choudhury
1518f23d0a
Making servers reconcile job summaries when they acquire leadership
2016-08-05 16:47:36 -07:00
Diptanu Choudhury
88d383c47f
Updated tests and comments
2016-08-04 11:29:36 -07:00
Diptanu Choudhury
74caed0c7a
Added an endpoint for users to reconcile job summaries
2016-08-03 16:12:47 -07:00
Diptanu Choudhury
1b60e0823a
Added a test for restoring the summaries in fsm
2016-08-03 11:58:36 -07:00
Alex Dadgar
2332a58944
Do not update the job of allocations that are being stopped
2016-08-02 17:53:31 -07:00
Diptanu Choudhury
10a5c06a5a
Running the tests in verbose mode
2016-07-26 14:02:47 -07:00
Diptanu Choudhury
d1a6bdb4ba
Making the queued allocations bind late
2016-07-25 22:11:11 -07:00
Diptanu Choudhury
3089833397
Reconciling the queued allocations during restore
2016-07-25 17:31:40 -07:00