Commit Graph

2586 Commits

Author SHA1 Message Date
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar f5a76d8411 review comments 2018-10-15 15:31:13 -07:00
Alex Dadgar f9b056e1d1 Replace attributes map with new Attribute object 2018-10-13 14:08:58 -07:00
Alex Dadgar 04ba425dd5 validate constraints/affinities 2018-10-13 12:27:49 -07:00
Alex Dadgar 9b5aaac410 Device feasibility checker 2018-10-13 12:27:49 -07:00
Alex Dadgar bfb4caa2e7 node devices 2018-10-13 12:27:49 -07:00
Alex Dadgar 5a07f9f96e parse affinities and constraints on devices 2018-10-11 14:05:19 -07:00
Alex Dadgar a2a56a930c Diff 2018-10-08 17:02:58 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar e1a102f58c test allocs fit 2018-09-24 13:59:01 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommitted if the terminal allocations' resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar de442226ae Fix other instances of blocking queries 2018-09-24 13:52:39 -07:00
Alex Dadgar 7f0d241ef4 always handle failed allocation 2018-09-21 15:13:54 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necessary.
3. Special case handling of blocked evaluations since the create/snapshot
index is not particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar 5009566503 do not bootstrap with non voters 2018-09-19 17:17:39 -07:00
Alex Dadgar e8f89597f5 fix rpc test 2018-09-19 10:17:54 -07:00
Alex Dadgar 9971b3393f yamux 2018-09-17 14:22:40 -07:00
Alex Dadgar b2f500b48c Serf/Raft/Memberlist logger 2018-09-17 13:57:52 -07:00
Alex Dadgar ca28afa3b2 small fixes 2018-09-15 16:42:38 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 7739ef51ce agent + consul 2018-09-13 10:43:40 -07:00
Preetha Appan 996484981c
Fix panic when reschedule policy for allocation can't be looked up
because its task group changed
2018-09-05 17:01:02 -05:00
Alex Dadgar 4f89cabd34
Merge pull request #4631 from hashicorp/f-plugin-config
Parse plugin configs
2018-09-04 17:04:13 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 26288b9522
Fix more review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 4f8e925b54
Move topk and delay heap to separate packages under lib 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
Preetha Appan 71bff00326
validate spread from job/task group validate methods 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan 5eb82b6260
Validate method, and rename ratio field to percent 2018-09-04 16:10:11 -05:00
Preetha Appan 0037d72fa8
Structs and validation for spread 2018-09-04 16:10:11 -05:00
Preetha Appan c407e3626f
More review comments 2018-09-04 16:10:11 -05:00
Preetha Appan dbbb4a957a
Fail validation if system job has affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 0bc030c6fb
Treat set_contains as a synonym of set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan e85a721cfb
Include affinities in job and task diff, and more test cases 2018-09-04 16:10:11 -05:00
Preetha Appan f06c7ab2ad
Fix Copy method for job and task to include affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 9f0caa9c3d
Affinity parsing, api and structs 2018-09-04 16:10:11 -05:00
Preetha Appan 9e29cfee76
Use readlock 2018-09-04 11:45:05 -05:00
Preetha Appan 062e5f1898
Use eval broker lock when reading/modifying delay heap 2018-08-31 10:59:48 -05:00
Alex Dadgar bff1669ee4 Plugin config parsing 2018-08-29 17:06:01 -07:00
Chelsea Komlo 0a69cdb304
Merge pull request #4565 from hashicorp/b-compare-cert-alg
Error if TLS Certificate signature algorithm isn't supported in cipher suites
2018-08-15 16:09:46 -04:00
Xopherus 8d747578e8 Close multiplexer when context is cancelled
Multiplexer continues to create rpc connections even when
the context which is passed to the underlying rpc connections
is cancelled by the server.

This was causing #4413 - when a SIGHUP causes everything to reload,
it uses context to cancel the underlying http/rpc connections
so that they may come up with the new configuration.
The multiplexer was not being cancelled properly so it would
continue to create rpc connections and constantly fail,
causing communication issues with other nomad agents.

Fixes #4413
2018-08-13 19:32:49 -04:00
Chelsea Holland Komlo 31d6d00381 add simple getter for certificate 2018-08-10 12:37:21 -04:00
Andrei Burd 444ee45aff Parametrized/periodic jobs per child tagged metric emission 2018-06-21 10:40:56 +03:00
Alex Dadgar b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Chelsea Komlo 03075b603a
Merge pull request #4399 from hashicorp/r-reload-refactor
Refactor logic for dynamic reloading
2018-06-13 13:35:12 -04:00
Alex Dadgar d0043691fb remove structs + bump version 2018-06-11 13:52:19 -07:00
Alex Dadgar af5753d2cd bump version + generated files 2018-06-11 13:39:42 -07:00
Nick Ethier e75e3ae665
nomad: use require pkg for tests 2018-06-11 13:50:50 -04:00
Nick Ethier 50c72adbd7
nomad: code review comments 2018-06-11 13:27:48 -04:00
Nick Ethier a581cc9c01
nomad/structs: fix job diff test 2018-06-11 13:06:49 -04:00
Nick Ethier 41e010cdc2
nomad: add 'Dispatch' field to Job
New 'Dispatch' field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Chelsea Holland Komlo de03ce8070 move logic to determine whether to reload tls configuration to tlsutil helper 2018-06-08 14:33:58 -04:00
Chelsea Komlo d738976234
Merge pull request #4395 from hashicorp/b-vault-second
Fix for dynamically reloading vault
2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo dcc9cdfeb7 fixup! comment and move to always log server reload operation 2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo 41e35edf0c fix test that now requires different config for test assertions 2018-06-07 17:07:06 -04:00
Chelsea Holland Komlo 9f6bd7bf3a move logic for testing equality for vault config 2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo 282f37b1ee fix for dynamically reloading vault 2018-06-07 15:34:18 -04:00
Nick Ethier 2555bff4f5
nomad: add error check in test 2018-06-06 14:08:42 -04:00
Nick Ethier d35bf6d184
nomad: handle edge case where node drain event shouldn't be emitted 2018-06-06 14:02:10 -04:00
Alex Dadgar 23cd56dc78 remove generated structs 2018-06-01 16:11:28 -07:00
Alex Dadgar c0386819b3 bump version/lint/generated files 2018-06-01 15:23:10 -07:00
Preetha Appan 4134fcd2c7
Fix test setup for FSMSnapshotRestore_Deployments to use a valid job that exists 2018-05-31 14:39:39 -05:00
Alex Dadgar 7f25fcc1bd
Merge pull request #4354 from hashicorp/b-job-modify
Deployment adds JobSpecModifyIndex
2018-05-31 17:57:38 +00:00
Alex Dadgar f2b2e0482b code review fixes 2018-05-31 10:57:08 -07:00
Preetha c2291a4cf7
Merge pull request #4349 from hashicorp/b-reconcile-raft-upgrade
Remove an unnecessary check in nomad member reconciliation
2018-05-30 13:17:38 -07:00
Alex Dadgar 92777a8018
Merge pull request #4329 from hashicorp/b-leaked-deployments
Clean up leaked deployments on restoration
2018-05-30 20:06:03 +00:00
Preetha 6ec6bcdd87
Merge pull request #4352 from hashicorp/b-update-drain-no-drainstrategy
fix bug with node eligibility staying disabled even after drain is disabled
2018-05-30 11:48:48 -07:00
Alex Dadgar 195e19827b Deployment adds JobSpecModifyIndex
Deployment tracks the Job.JobModifyIndex so that PUTS against /v1/jobs
can be more easily correlated with the created deployment.

Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Preetha Appan c896a85a96
better test comment 2018-05-30 13:05:15 -05:00
Preetha Appan 647ccc2dc3
fix bug where disabling a node drain when there is no drain strategy set causes scheduling eligibility to stay ineligible 2018-05-30 12:28:46 -05:00
Alex Dadgar ad3dbe8ed0
Merge pull request #4338 from hashicorp/tls_prefer_server_cipher_suites
Add support for tls PreferServerCipherSuites
2018-05-30 17:25:17 +00:00
Preetha Appan 2fd20310ea
Remove checks in member reconcile that were causing servers in protocol 3 to not change their ID in raft forever 2018-05-30 11:34:45 -05:00
Chelsea Holland Komlo 19e4a5489b add support for tls PreferServerCipherSuites
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Alex Dadgar 15a71cc16e
Merge pull request #4331 from capone212/b-3595-fix-heartbeat
Fixed #3595
2018-05-25 00:57:03 +00:00
capone212 a0d4d4a336 Fixed #3595 (https://github.com/hashicorp/nomad/issues/3595)
Stopping heartbeat timer before remove
2018-05-24 13:15:06 +00:00
Alex Dadgar 352f2e03b5 Clean up leaked deployments on restoration
This PR cancels deployments that are active but do not have a job
associated with them. This is a broken invariant that causes issues in
the deployment watcher since it will not track them. Thus they are
objects that can't be operated on or cleaned up.

Fixes https://github.com/hashicorp/nomad/issues/4286
2018-05-23 16:44:21 -07:00
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar c268640c02 Fix noisy log 2018-05-22 14:45:34 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 1fe9cb4f00 update error message 2018-05-22 14:04:59 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar 86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Alex Dadgar b6ecb75af9 update error message 2018-05-22 14:01:43 -07:00
Preetha b409a3ed5b
Merge pull request #4313 from hashicorp/b-alloc-gc-desiredstate
Check allocation's desired state in GC eligibility logic
2018-05-21 16:49:49 -07:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan a9d63c0df3
Check allocation's desired state in GC eligibility logic in core scheduler 2018-05-21 13:28:31 -05:00
Chelsea Komlo 687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Alex Dadgar 9a2237bdab Drain complete 2018-05-10 17:22:06 -07:00
Alex Dadgar 0cb31feb1f Add node event when draining is set/removed/updated 2018-05-10 16:54:43 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Preetha Appan ca5758741b
Update serf to pick up graceful leave fix 2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo 620558c107 log error if unable to create TLS configuration 2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
Preetha Appan 1b8d8b2186
Fix logic inversion in force rescheduling 2018-05-08 20:00:06 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Preetha Appan c7edbd5f41
newlines in test 2018-05-07 14:55:01 -05:00
Preetha Appan 4e75456beb
Fix deadlock in deadline timer logic when progress deadline is passed and the deployment is updated. 2018-05-07 14:55:01 -05:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Preetha Appan 19b096d203
Set modify time for allocs in unit test, and define current time in one spot 2018-05-07 14:55:01 -05:00
Preetha Appan 4c377b112e
Fix panic in deployment watcher when deployment is not in the state store due to a gc 2018-05-07 14:55:01 -05:00
Preetha 02d63432b4
Fix typo 2018-05-07 14:55:01 -05:00
Alex Dadgar 738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment 2018-05-07 14:55:01 -05:00
Michael Schurter e90d051c43
consul: change hashed canary bytes 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter a3038cefb4
typo: transistion -> transition 2018-05-07 14:50:01 -05:00
Alex Dadgar bd38675365
Fix tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 319763a5d8
remove unnecessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f4af30fbb5
Canary tags structs 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Preetha Appan b2b773e696
better comments and remove commented code 2018-05-07 14:50:01 -05:00
Preetha Appan 90a2311cef
Fix deadlock in deployment watcher when deployment starts with no allocations and eventually has failed allocations 2018-05-07 14:50:01 -05:00
Alex Dadgar 224b3092ae
change default to 10m and docs 2018-05-07 14:50:01 -05:00
Alex Dadgar c91ce5cc38
Fix not enqueuing eval 2018-05-07 14:50:01 -05:00
Alex Dadgar 8d50955054
Fix typos 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 8a81038cdb
Set Reschedule from deployment watcher 2018-05-07 14:50:01 -05:00
Alex Dadgar a510774451
Use UpdateAllocDesiredTransistion instead of UpsertEval but no transistions yet 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar e5caaf3358
Small test fix 2018-05-07 14:50:01 -05:00
Alex Dadgar 9bff9024b3
add latest eval back 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00