Commit graph

2617 commits

Author SHA1 Message Date
Preetha Appan c33469157d
unit test plan apply with preemptions 2018-11-01 20:06:32 -05:00
Preetha Appan 57fe5050f0
more minor review feedback 2018-11-01 17:05:17 -05:00
Preetha Appan fd60e66f86
Plumb alloc resource cache in a few more places.
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan e586817ce7
batch jobs GC removes terminal allocs if job modifyindex is older than running job 2018-11-01 00:05:31 -05:00
Mahmood Ali 9da19c6450 address review comments 2018-10-30 13:58:52 -04:00
Mahmood Ali 4937095389 Allow artifacts checksum interpolation
Fixes https://github.com/hashicorp/nomad/issues/4814
2018-10-30 13:24:30 -04:00
Preetha Appan f1c3eb2792
Introduce interface with multiple implementations for resource distance 2018-10-30 11:06:32 -05:00
Preetha Appan 8f7eb61823
Introduce a response object for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readablity fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 1415032c13
More review comments 2018-10-30 11:06:32 -05:00
Preetha Appan b97f85e3e0
style fixes 2018-10-30 11:06:32 -05:00
Preetha Appan 12278527c7
make default config a variable 2018-10-30 11:06:32 -05:00
Preetha Appan 32cc764072
Add fsm layer tests 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan 8807c25b11
Modify preemption code to use new style of resource structs 2018-10-30 11:06:32 -05:00
Preetha Appan c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 3190a2c29b
Fix linting 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08
Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Preetha Appan 9e4a35fff0
Fix comment 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de
Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan d11064d6ba
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 9257387a69
Add number of evictions to DesiredUpdates struct to use in CLI/API 2018-10-30 11:06:32 -05:00
Preetha Appan 5ff4b8e36f
REview feedback 2018-10-30 11:06:32 -05:00
Preetha Appan 5b3bfb63eb
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Michael Schurter 5d49832de4 tests: fix usages of TestClient cleanup and mock driver 2018-10-29 14:21:05 -07:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Alex Dadgar 6f0ed6184b Fix client reloading and pass the plugin loaders to server and client 2018-10-16 16:56:55 -07:00
Michael Schurter a4b4d7b266 consul service hook
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar e401c660e7 Implement lifecycle hooks on the task runner 2018-10-16 16:53:29 -07:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar f5a76d8411 review comments 2018-10-15 15:31:13 -07:00
Alex Dadgar f9b056e1d1 Replace attributes map with new Attribute object 2018-10-13 14:08:58 -07:00
Alex Dadgar 04ba425dd5 validate constraints/affinities 2018-10-13 12:27:49 -07:00
Alex Dadgar 9b5aaac410 Device feasability checker 2018-10-13 12:27:49 -07:00
Alex Dadgar bfb4caa2e7 node devices 2018-10-13 12:27:49 -07:00
Alex Dadgar 5a07f9f96e parse affinities and constraints on devices 2018-10-11 14:05:19 -07:00
Alex Dadgar a2a56a930c Diff 2018-10-08 17:02:58 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar e1a102f58c test allocs fit 2018-09-24 13:59:01 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommited if the terminal allocations resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar de442226ae Fix other instances of blocking queries 2018-09-24 13:52:39 -07:00
Alex Dadgar 7f0d241ef4 always handle failed allocation 2018-09-21 15:13:54 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar 5009566503 do not bootstrap with non voters 2018-09-19 17:17:39 -07:00
Alex Dadgar e8f89597f5 fix rpc test 2018-09-19 10:17:54 -07:00
Alex Dadgar 9971b3393f yamux 2018-09-17 14:22:40 -07:00
Alex Dadgar b2f500b48c Serf/Raft/Memberlist logger 2018-09-17 13:57:52 -07:00
Alex Dadgar ca28afa3b2 small fixes 2018-09-15 16:42:38 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 7739ef51ce agent + consul 2018-09-13 10:43:40 -07:00
Preetha Appan 996484981c
Fix panic when reschedule policy for allocation can't be looked up
because its task group changed
2018-09-05 17:01:02 -05:00
Alex Dadgar 4f89cabd34
Merge pull request #4631 from hashicorp/f-plugin-config
Parse plugin configs
2018-09-04 17:04:13 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 26288b9522
Fix more review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 4f8e925b54
Move topk and delay heap to separate packages under lib 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
Preetha Appan 71bff00326
validate spread from job/task group validate methods 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan 5eb82b6260
Validate method, and rename ratio field to percent 2018-09-04 16:10:11 -05:00
Preetha Appan 0037d72fa8
Structs and validation for spread 2018-09-04 16:10:11 -05:00
Preetha Appan c407e3626f
More review comments 2018-09-04 16:10:11 -05:00
Preetha Appan dbbb4a957a
Fail validation if system job has affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 0bc030c6fb
Treat set_contains as a synonym of set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan e85a721cfb
Include affinities in job and task diff, and more test cases 2018-09-04 16:10:11 -05:00
Preetha Appan f06c7ab2ad
Fix Copy method for job and task to include affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 9f0caa9c3d
Affinity parsing, api and structs 2018-09-04 16:10:11 -05:00
Preetha Appan 9e29cfee76
Use readlock 2018-09-04 11:45:05 -05:00
Preetha Appan 062e5f1898
Use eval broker lock when reading/modifying delay heap 2018-08-31 10:59:48 -05:00
Alex Dadgar bff1669ee4 Plugin config parsing 2018-08-29 17:06:01 -07:00
Chelsea Komlo 0a69cdb304
Merge pull request #4565 from hashicorp/b-compare-cert-alg
Error if TLS Certificate signature algorithm isn't supported in cipher suites
2018-08-15 16:09:46 -04:00
Xopherus 8d747578e8 Close multiplexer when context is cancelled
Multiplexer continues to create rpc connections even when
the context which is passed to the underlying rpc connections
is cancelled by the server.

This was causing #4413 - when a SIGHUP causes everything to reload,
it uses context to cancel the underlying http/rpc connections
so that they may come up with the new configuration.
The multiplexer was not being cancelled properly so it would
continue to create rpc connections and constantly fail,
causing communication issues with other nomad agents.

Fixes #4413
2018-08-13 19:32:49 -04:00
Chelsea Holland Komlo 31d6d00381 add simple getter for certificate 2018-08-10 12:37:21 -04:00
Andrei Burd 444ee45aff Parametrized/periodic jobs per child tagged metric emmision 2018-06-21 10:40:56 +03:00
Alex Dadgar b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Chelsea Komlo 03075b603a
Merge pull request #4399 from hashicorp/r-reload-refactor
Refactor logic for dynamic reloading
2018-06-13 13:35:12 -04:00
Alex Dadgar d0043691fb remove structs + bump version 2018-06-11 13:52:19 -07:00
Alex Dadgar af5753d2cd bump version + generated files 2018-06-11 13:39:42 -07:00
Nick Ethier e75e3ae665
nomad: use require pkg for tests 2018-06-11 13:50:50 -04:00
Nick Ethier 50c72adbd7
nomad: code review comments 2018-06-11 13:27:48 -04:00
Nick Ethier a581cc9c01
nomad/structs: fix job diff test 2018-06-11 13:06:49 -04:00
Nick Ethier 41e010cdc2
nomad: add 'Dispatch' field to Job
New -bash: Dispatch: command not found field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Chelsea Holland Komlo de03ce8070 move logic to determine whether to reload tls configuration to tlsutil helper 2018-06-08 14:33:58 -04:00
Chelsea Komlo d738976234
Merge pull request #4395 from hashicorp/b-vault-second
Fix for dynamically reloading vault
2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo dcc9cdfeb7 fixup! comment and move to always log server reload operation 2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo 41e35edf0c fix test that now requires different config for test assertions 2018-06-07 17:07:06 -04:00
Chelsea Holland Komlo 9f6bd7bf3a move logic for testing equality for vault config 2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo 282f37b1ee fix for dynamically reloading vault 2018-06-07 15:34:18 -04:00
Nick Ethier 2555bff4f5
nomad: add error check in test 2018-06-06 14:08:42 -04:00
Nick Ethier d35bf6d184
nomad: handle edge case where node drain event shouldn't be emitted 2018-06-06 14:02:10 -04:00
Alex Dadgar 23cd56dc78 remove generated structs 2018-06-01 16:11:28 -07:00
Alex Dadgar c0386819b3 bump version/lint/generated files 2018-06-01 15:23:10 -07:00
Preetha Appan 4134fcd2c7
Fix test setup for FSMSnapshotRestore_Deployments to use a valid job that exists 2018-05-31 14:39:39 -05:00
Alex Dadgar 7f25fcc1bd
Merge pull request #4354 from hashicorp/b-job-modify
Deployment adds JobSpecModifyIndex
2018-05-31 17:57:38 +00:00
Alex Dadgar f2b2e0482b code review fixes 2018-05-31 10:57:08 -07:00
Preetha c2291a4cf7
Merge pull request #4349 from hashicorp/b-reconcile-raft-upgrade
Remove an unnecessary check in nomad member reconciliation
2018-05-30 13:17:38 -07:00
Alex Dadgar 92777a8018
Merge pull request #4329 from hashicorp/b-leaked-deployments
Clean up leaked deployments on restoration
2018-05-30 20:06:03 +00:00
Preetha 6ec6bcdd87
Merge pull request #4352 from hashicorp/b-update-drain-no-drainstrategy
fix bug with node eligibility staying disabled even after drain is disabled
2018-05-30 11:48:48 -07:00
Alex Dadgar 195e19827b Deployment adds JobSpecModifyIndex
Deployment tracks the Job.JobModifyIndex so that PUTS against /v1/jobs
can be more easily coorelated with the created deployment.

Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Preetha Appan c896a85a96
better test comment 2018-05-30 13:05:15 -05:00
Preetha Appan 647ccc2dc3
fix bug where disabling a node drain when there is no drain strategy set causes scheduling eligibility to stay ineligible 2018-05-30 12:28:46 -05:00
Alex Dadgar ad3dbe8ed0
Merge pull request #4338 from hashicorp/tls_prefer_server_cipher_suites
Add support for tls PreferServerCipherSuites
2018-05-30 17:25:17 +00:00
Preetha Appan 2fd20310ea
Remove checks in member reconcile that was causing servers in protocol 3 to not change their ID in raft forever 2018-05-30 11:34:45 -05:00
Chelsea Holland Komlo 19e4a5489b add support for tls PreferServerCipherSuites
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Alex Dadgar 15a71cc16e
Merge pull request #4331 from capone212/b-3595-fix-heartbeat
Fixed #3595
2018-05-25 00:57:03 +00:00
capone212 a0d4d4a336 Fixed #3595 (https://github.com/hashicorp/nomad/issues/3595)
Stopping heartbeat timer before remove
2018-05-24 13:15:06 +00:00
Alex Dadgar 352f2e03b5 Clean up leaked deployments on restoration
This PR cancels deployments that are active but do not have a job
associated with them. This is a broken invariant that causes issues in
the deployment watcher since it will not track them. Thus they are
objects that can't be operated on or cleaned up.

Fixes https://github.com/hashicorp/nomad/issues/4286
2018-05-23 16:44:21 -07:00
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar c268640c02 Fix noisy log 2018-05-22 14:45:34 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 1fe9cb4f00 update error message 2018-05-22 14:04:59 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar 86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Alex Dadgar b6ecb75af9 update error message 2018-05-22 14:01:43 -07:00
Preetha b409a3ed5b
Merge pull request #4313 from hashicorp/b-alloc-gc-desiredstate
Check allocation's desired state in GC eligibility logic
2018-05-21 16:49:49 -07:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan a9d63c0df3
Check allocation's desired state in GC eligibility logic in core scheduler 2018-05-21 13:28:31 -05:00
Chelsea Komlo 687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Alex Dadgar 9a2237bdab Drain complete 2018-05-10 17:22:06 -07:00
Alex Dadgar 0cb31feb1f Add node event when draining is set/removed/updated 2018-05-10 16:54:43 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Preetha Appan ca5758741b
Update serf to pick up graceful leave fix 2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo 620558c107 log error if unable to create TLS configuration 2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00