Commit Graph

3001 Commits

Author SHA1 Message Date
Arshneet Singh b7b050cdd1 Change min version required for plan optimization 2019-04-24 12:36:07 -07:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Lang Martin 19ba0f4882 structs_test use testify require.True instead of t.Fatal 2019-04-23 17:00:11 -04:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Lang Martin 8aa97cff13 tests over setwise equality of fingerprinted parts 2019-04-19 15:49:24 -04:00
Lang Martin 7de6e28ddc structs need to keep assert Equal interface implementation for tests 2019-04-19 15:23:49 -04:00
Lang Martin 977d33970b structs equals use labeled continue for clarity 2019-04-19 15:23:48 -04:00
Lang Martin 7b99488afa struct equals use a working pattern for setwise comparison 2019-04-19 15:23:48 -04:00
Lang Martin eba4e29440 client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.
2019-04-19 15:23:48 -04:00
Preetha Appan 22109d1e20
Add preemption related fields to AllocationListStub 2019-04-18 10:36:44 -05:00
Lang Martin a2a1e7829d Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2.

Revert "client, structs comments"
This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411cab328215def6508955b160a53452da3.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 689782524090e51183474516715aa2f34908b8e6.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.
2019-04-11 10:29:40 -04:00
Lang Martin 07ff740408 fingerprint Constraints and Affinities have Equals, as set 2019-04-11 09:56:22 -04:00
Lang Martin 8f07698c03 structs missed applying a style change from the review 2019-04-11 09:56:22 -04:00
Lang Martin 7258a13c72 client, structs comments 2019-04-11 09:56:22 -04:00
Lang Martin 1878bf694e client Networks Equals is set equality 2019-04-11 09:56:22 -04:00
Lang Martin e1c91afd19 struct cleanup indentation in RequestedDevice Equals 2019-04-11 09:56:22 -04:00
Lang Martin 0c90efebdc struct Equals checks for identity before value checking 2019-04-11 09:56:22 -04:00
Lang Martin 1a594b53f6 refactor struts.RequestedDevice to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin ec1ccdeda0 refactor structs.Resource.Networks to have its own Equals
NodeResource.Networks uses the same function
2019-04-11 09:56:21 -04:00
Lang Martin 06008465c4 refactor structs.Resource.Devices to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin 36f3022246 add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources 2019-04-11 09:56:21 -04:00
Lang Martin d4567e9909 add structs.Resources Equals 2019-04-11 09:56:21 -04:00
Danielle Lancashire e135876493 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Chris Baker 34e100cc96
server vault client: use two vault clients, one with namespace, one without for /sys calls 2019-04-10 10:34:10 -05:00
Michael Schurter cc7768c170
Update nomad/structs/config/vault.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker a26d4fe1e5
docs: -vault-namespace, VAULT_NAMESPACE, and config
agent: added VAULT_NAMESPACE env-based configuration
2019-04-10 10:34:10 -05:00
Chris Baker d3041cdb17
wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation 2019-04-10 10:34:10 -05:00
Chris Baker 0eaeef872f
config/docs: added `namespace` to vault config
server/client: process `namespace` config, setting on the instantiated vault client
2019-04-10 10:34:10 -05:00
Michael Schurter c0cd96ef75
Update nomad/job_endpoint_test.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Michael Schurter 188c32421a
Update nomad/job_endpoint.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker 0ba1600545
server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555] 2019-04-10 10:34:10 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Michael Schurter 45b4827ad7 Bump to 0.9.1-dev 2019-04-09 09:01:48 -07:00
Nomad Release bot e307734e4a Generate files for 0.9.0 release 2019-04-09 01:56:00 +00:00
Michael Schurter 3af602b633 Remove 0.9.0-rc2 generated files 2019-04-03 07:41:09 -07:00
Nomad Release bot 16b4336ccf Generate files for 0.9.0-rc2 release 2019-04-03 01:54:29 +00:00
Michael Schurter 9afbc45cff Bump to dev post-0.9.0-rc1 release 2019-03-22 08:26:30 -07:00
Nomad Release bot 3ab3dd4105 Generate files for 0.9.0-rc1 release 2019-03-21 19:06:13 +00:00
HashedDan caad68e799 server: inconsistent receiver notation corrected
Signed-off-by: HashedDan <georgedanielmangum@gmail.com>
2019-03-16 17:53:53 -05:00
Alex Dadgar e779d9444b
Update nomad/eval_endpoint_test.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:15 -08:00
Alex Dadgar 1857f5d7c1
Update nomad/eval_endpoint.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:07 -08:00
Michael Schurter e37bbb21a5 nomad: simplify code and improve parameter name 2019-03-04 13:44:14 -08:00
Michael Schurter 05f51499ba nomad: compare current eval when setting WaitIndex
Consider currently dequeued Evaluation's ModifyIndex when determining
its WaitIndex. Normally the Evaluation itself would already be in the
state store snapshot used to determine the WaitIndex. However, since the FSM
applies Raft messages to the state store concurrently with Dequeueing,
it's possible the currently dequeued Evaluation won't yet exist in the
state store snapshot used by JobsForEval.

This can be solved by always considering the current eval's modify index
and using it if it is greater than all of the evals returned by the
state store.
2019-03-01 15:23:39 -08:00
Michael Schurter 3f386e3951 Remove generated files for 0.9.0-beta3 2019-02-26 10:34:08 -08:00
Michael Schurter d74755900e Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Charlie Voiselle 604c49beb8
Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up
Set NextEval when making `failed-follow-up` evals
2019-02-22 14:14:41 -08:00
Charlie Voiselle 006afdca9b Added comments
* caller should created eval id
* prev/next eval used in failed-follow-up
2019-02-22 10:22:52 -08:00
Charlie Voiselle c28c195f42 Set NextEval when making `failed-follow-up` evals
This allows users to locate failed-follow-up evals more easily
2019-02-20 16:07:11 -08:00
Michael Schurter 6580ed668e client: don't redownload completed artifacts on retries
Track the download status of each artifact independently so that if only
one of many artifacts fails to download, completed artifacts aren't
downloaded again.
2019-02-20 08:45:12 -08:00
Michael Schurter 2db91425e3 Remove 0.9.0-beta2 generated files 2019-02-01 08:28:44 -08:00
Alex Dadgar 84d0afccae Generate files for 0.9.0-beta2 2019-01-30 13:31:50 -08:00
Alex Dadgar d2e5ede119 remove generated structs 2019-01-30 12:38:34 -08:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Preetha Appan c848a1d387
ensure tests run a 0.9 server 2019-01-29 16:19:45 -06:00
Preetha Appan 496eb1de0c
Guard operator endpoints for minimum server version 2019-01-29 15:50:36 -06:00
Preetha Appan 7578522f58
variable name fix 2019-01-29 13:48:45 -06:00
Preetha Appan a6cebbbf9e
Make sure that all servers are 0.9 before applying scheduler config entry 2019-01-29 12:47:42 -06:00
Michael Schurter 3aba7ee826 nomad: fix panic when no node conn found
A missing return would cause a panic when a server could find no route
to a client.
2019-01-28 21:55:35 -08:00
Mahmood Ali f9164dae67
Merge pull request #5228 from hashicorp/f-vault-err-tweaks
server/vault: tweak error messages
2019-01-25 11:17:31 -05:00
Mahmood Ali f4560d8a2a server/vault: tweak error messages
Closes #5139
2019-01-25 10:33:54 -05:00
Preetha ec92bf673c
Merge pull request #5223 from hashicorp/f-jobs-list-datacenters
Add Datacenters to the JobListStub struct
2019-01-24 08:13:30 -06:00
Michael Schurter 13f061a83f
Merge pull request #5196 from hashicorp/f-plugin-utils
Make plugins/shared external and make pluginutls/
2019-01-23 06:59:32 -08:00
Michael Schurter 32daa7b47b goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter be0bab7c3f move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar cdcd3c929c loader and singleton 2019-01-22 15:11:57 -08:00
Alex Dadgar 6c2782f037 move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Preetha Appan 38422642cb
Use DesiredState to determine whether to stop sending task events 2019-01-22 16:43:32 -06:00
Michael Lange ce7bc4f56f Add Datacenters to the JobsListStub struct
So it can be used for filtering the full list of jobs
2019-01-22 11:16:35 -08:00
Mahmood Ali e1803b685b tests: deflake TestClientAllocations_GarbageCollect_Remote
Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28
2019-01-19 09:07:27 -05:00
Mahmood Ali b2203a3a22
Merge pull request #5215 from hashicorp/test-fix-garbagecollect
test: fix flaky garbage collect test
2019-01-18 21:10:01 -05:00
Mahmood Ali 05e32fb525
Merge pull request #5213 from hashicorp/b-api-separate
Slimmer /api package
2019-01-18 20:52:53 -05:00
Michael Schurter 0cd35ba335 test: fix flaky garbage collect test
This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.

This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.

A good old fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.

Passes with -race.
2019-01-18 16:01:30 -08:00
Mahmood Ali 7bdd43f3e0 api: avoid codegen for syncing
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change.  As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
2019-01-18 18:52:31 -05:00
Preetha Appan 510d7839e4
code review comments 2019-01-18 17:41:39 -06:00
Mahmood Ali 253532ec00 api: avoid import nomad/structs pkg
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).

Here, we add a code generator that imports the relevant constants from
`nomad/structs`.

I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication.  `nomad/structs` use values as
struct fields (e.g. `string`), while `api` uses value pointer (e.g.
`*string`) instead.  Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opt to keep the status quo.
2019-01-18 14:51:19 -05:00
Preetha Appan be9656d195
fix linting 2019-01-17 15:36:33 -06:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan be36fee48e
Use IsParameterized/isPeriodic methods 2019-01-17 12:15:42 -06:00
Preetha Appan 81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Nick Ethier 597b7b751d
tr: add retry /w backoff to stats_hook failure 2019-01-12 12:18:24 -05:00
Mahmood Ali 4414a2ce1c tests: remove tests for unsupported features
With switching to driver plugins, driver validation is quite tricky and
we need to do some design thinking before supporting it against.
2019-01-10 10:21:48 -05:00
Nick Wales 7a7b5da0df Adds optional Consul service tags to nomad server and agent services, gh#4297 2019-01-09 22:02:46 +00:00
Mahmood Ali 1f2473263e fix more cases of logging arity errors 2019-01-09 09:22:47 -05:00
Mahmood Ali 6f077a73dc Fix panic on failure
Error expects an odd number of arguments, and panics otherwise.
2019-01-08 12:19:44 -05:00
Michael Schurter 324e989327
Merge pull request #5034 from hashicorp/test-fix-races
Test fix races
2019-01-08 07:04:09 -08:00
Alex Dadgar 79cfe26021 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Nick Ethier a96afb6c91
fix tests that fail as a result of async client startup 2018-12-20 00:53:44 -05:00
Michael Schurter 6c1dbb659d test: fix race and nil panic in nomad/ tests
Race was test only and due to unlocked map access.

Panic was test only and due to checking a field on a struct even when we
knew the struct was nil.

Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:439 +0x0
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402
+0xe6
  github.com/hashicorp/nomad/testutil.WaitForResultRetries()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30
+0x5a
  github.com/hashicorp/nomad/testutil.WaitForResult()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22
+0x57
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401
+0xb53
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous write at 0x00c000697dd0 by goroutine 569:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:549 +0x0
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224
+0x2eb
  github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394
+0x29a
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234
+0x593
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Goroutine 768 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x650
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:1150 +0x221

Goroutine 569 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 004fa574cb test: fix race in eval broker update chan
Similar to previous commits the delayed eval update chan was set and
access from different goroutines causing a race. Passing the chan on the
stack resolves the race.

Race output from `go test -race -run 'Server_RPC$'` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708
+0x3dc
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174
+0xc4
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718
+0x1fd
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c000339150 by goroutine 73:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771
+0x176

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 73 (running) created at:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170
+0x173
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207
+0x355
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 1c137690c4 test: fix race around block eval chans
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.

Output from `go test -race -run 'Server_RPC$' in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648
+0x32a
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e10 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483
+0xfe

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 75 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141
+0xba
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649
+0x388
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e50 by goroutine 77:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690
+0xae

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 77 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142
+0xdc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 80263861aa test: fix race around updateCh handling
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.

The race can be prevented by having SetEnabled pass updateCh to run.

Race detector output from `go test -race -run TestServer_RPC` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0001d3f48 by goroutine 515:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177

Goroutine 75 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 515 (running) created at:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Danielle Tomlinson 3647b701a6 taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Chris Baker af593c401c
Merge pull request #4974 from hashicorp/b-1173-log-spam
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00
Chris Baker 121a9eb8cb some changes for more idiomatic code 2018-12-12 23:11:17 +00:00
Alex Dadgar fbe4d67d1b fix iops related tests 2018-12-12 14:32:22 -08:00
Chris Baker 34600f8b75 fixed bug in loop delay 2018-12-12 19:16:41 +00:00
Chris Baker 89c64932c1 gofmt 2018-12-12 19:09:06 +00:00
Chris Baker 22c11d8799 improved code for readability 2018-12-12 18:52:06 +00:00
Preetha f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Chris Baker 59beae35df nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap) 2018-12-07 22:14:15 +00:00
Chris Baker 707bac0a7b rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173) 2018-12-07 20:12:55 +00:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar 14a61ea3ea Don't GC running but desired stop allocations
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).

Fixes https://github.com/hashicorp/nomad/issues/4940
2018-12-05 13:01:12 -08:00
Mahmood Ali adb4d69576
Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup
server/vault: Lock Vault expiration tracking
2018-12-04 19:46:59 -05:00
Mahmood Ali 50e38104a5 server/nomad: Lock Vault expiration tracking
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.

I don't anticipate high contention, so simple mutex suffices.
2018-12-04 09:29:48 -05:00
Preetha Appan 8656d3379f
Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson d4cbd608ff nomad: Remove on-submission job validation
With the introduction of driver plugins, we're temporarily relying on
_run time validation_ of driver configurations, rather than submission
time.
2018-11-30 10:47:08 +01:00
Nick Ethier 80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier b1484aec33
nomad: fix hclog usage 2018-11-29 22:27:39 -05:00
Mahmood Ali 0a2611e41f vault: protect against empty Vault secret response
Also, fix a case where a successful second attempt of loading token can
cause a panic.
2018-11-29 09:34:17 -05:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier 95362eaa02
Merge pull request #4844 from hashicorp/f-docker-plugin
Docker driver plugin
2018-11-20 20:43:03 -05:00
Mahmood Ali 2e6133fd33 nil secrets as recoverable to keep renew attempts 2018-11-20 17:11:55 -05:00
Mahmood Ali 5827438983 Renew past recorded expiry till unrecoverable error
Keep attempting to renew Vault token past locally recorded expiry, just
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
2018-11-20 17:10:55 -05:00
Mahmood Ali 5836a341dd fix typo 2018-11-20 17:10:55 -05:00
Mahmood Ali 93add67e04 round ttl duration for users 2018-11-20 17:10:55 -05:00
Mahmood Ali 4a0544b369 Track renewal expiration properly 2018-11-20 17:10:55 -05:00
Mahmood Ali 79aa934a4b reconcile interface 2018-11-20 17:10:55 -05:00
Mahmood Ali 6efea6d8fc Populate agent-info with vault
Return Vault TTL info to /agent/self API and `nomad agent-info` command.
2018-11-20 17:10:55 -05:00
Mahmood Ali 6034af5084 Avoid explicit precomputed stats field
Seems like the stats field is a micro-optimization that doesn't justify
the complexity it introduces.  Removing it and computing the stats from
revoking field directly.
2018-11-20 17:10:54 -05:00
Mahmood Ali 14842200ec More metrics for Server vault
Add a gauge to track remaining time-to-live, duration of renewal request API call.
2018-11-20 17:10:54 -05:00
Mahmood Ali e1994e59bd address review comments 2018-11-20 17:10:54 -05:00
Mahmood Ali 35179c9655 Wrap Vault API api errors for easing debugging 2018-11-20 17:10:54 -05:00
Mahmood Ali 55456fc823 Set a 1s floor for Vault renew operation backoff 2018-11-20 17:10:54 -05:00
Mahmood Ali 7ad8f6c103
Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter
Fix a panic related to batch GC
2018-11-20 15:16:02 -05:00
Mahmood Ali 6281700c0c address review comments 2018-11-20 13:21:39 -05:00
Nick Ethier 5c5cae79ab
nomad: only lookup job is disable_dispatched_job_summary_metrics is set 2018-11-19 23:22:23 -05:00
Nick Ethier 8ac69f440d
nomad: lookup job instead of adding Dispatched to summary 2018-11-19 23:22:02 -05:00
Nick Ethier 85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Nick Ethier 29591a7c2e
task_runner: emit event on task exit with exit result details 2018-11-19 22:59:17 -05:00
Mahmood Ali d744e71fa9 add a missing no errorassertion 2018-11-19 21:44:00 -05:00
Mahmood Ali b93643cd96 Fix a panic related to batch GC
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
2018-11-19 20:59:45 -05:00
Mahmood Ali bff9c3b3e9 Reproduce a panic related to batch GC
Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Michael Schurter 56ed4f01be vault: fix panic by checking for nil secret
Vault's RenewSelf(...) API may return (nil, nil). We failed to check if
secret was nil before attempting to use it.

RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)

Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)

If anyone has an idea on how to test this I didn't see any options. We
use a real Vault service, so there's no opportunity to mock the
response.
2018-11-19 17:07:59 -08:00
Danielle Tomlinson 8bf17fe22d
Merge pull request #4875 from hashicorp/f-constraints
scheduler: Make != constraints more flexible
2018-11-15 11:04:21 -08:00
Danielle Tomlinson 9c72dafc95 scheduler: Add is_set/is_not_set constraints
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to =, or !=
operators, e.g:

```hcl
constraint {
        attribute = "${attrs.type}"
        operator  = "!="
        value     = "database"
}

constraint {
        attribute = "${attrs.type}"
        operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Preetha Appan e5de50fba8
Initial implementation of device preemption 2018-11-15 11:09:26 -06:00
Mahmood Ali 046f098bac Track Node Device attributes and serve them in API 2018-11-14 14:42:29 -05:00
Mahmood Ali a4a9347501 fix comment typos 2018-11-14 08:36:14 -05:00
Mahmood Ali 1e92161f14
Merge pull request #4858 from hashicorp/b-fix-master-20181109
Fix some tests in master
2018-11-13 16:08:26 -05:00
Alex Dadgar 08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Mahmood Ali 865419e756 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Mahmood Ali 4e18846fd9 Adjust streaming duration
This test expects 11 repeats of the same message emitted at intervals of
200ms; so we need more than 2 seconds to adjust for time sleep
variations and the like.  So raising it to 3s here that should be
enough.
2018-11-13 10:21:40 -05:00
Mahmood Ali 1403ad21b9 Changelog job re-run fix 2018-11-13 07:52:51 -05:00
Mahmood Ali e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Alex Dadgar a90dc978e1 Handle new eval being the duplicate properly 2018-11-12 16:02:23 -08:00
Mahmood Ali 8513b3cccb Comment public functions and batch write txn 2018-11-12 16:09:39 -05:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Mahmood Ali 9c0a15f3ce Run job deregistering in a single transaction
Fixes https://github.com/hashicorp/nomad/issues/4299

Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.

Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index.  However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals.  When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.

This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Preetha 3739713ce1
Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion
Remove terminal allocations associated with older job modify index
2018-11-09 12:21:42 -06:00
Preetha Appan 39072977d6
Use create index as trigger condition to gc old terminal allocs 2018-11-09 11:44:21 -06:00
Alex Dadgar 2f06d88f47
Merge pull request #4847 from hashicorp/b-blocked-eval
Blocked evaluation fixes
2018-11-08 13:40:01 -08:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 991791a513 typo fix 2018-11-08 13:28:27 -08:00
Alex Dadgar be54e56570 review fixes 2018-11-08 09:48:36 -08:00
Preetha Appan 5f0a9d2cfd
Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Alex Dadgar dbb05357bc fix test 2018-11-07 11:59:24 -08:00
Alex Dadgar 36abd3a3d8 review comments 2018-11-07 10:33:22 -08:00
Alex Dadgar e3cbb2c82e allocs fit checks if devices get oversubscribed 2018-11-07 10:33:22 -08:00
Alex Dadgar 4f9b3ede87 Split device accounter and allocator 2018-11-07 10:32:03 -08:00
Alex Dadgar 6fa893c801 affinities 2018-11-07 10:32:03 -08:00
Alex Dadgar feb83a2be3 assign devices 2018-11-07 10:32:03 -08:00
Alex Dadgar 2d2248e209 Add devices to allocated resources 2018-11-07 10:32:03 -08:00
Alex Dadgar b1c5d52817 Track jobs by namespace 2018-11-07 10:22:08 -08:00
Alex Dadgar 6d8bb3a7bd Duplicate blocked evals cancelling improved
The old logic for cancelling duplicate blocked evaluations by job id had
the issue where the newer evaluation could have additional node classes
that it is (in)eligible for that we would not capture. This could make
it such that cluster state could change such that the job would make
progress but no evaluation was unblocked.
2018-11-07 10:08:23 -08:00
Preetha Appan a9aec7e628
Fix failing resource subtraction test 2018-11-06 12:26:26 -06:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan 6fdc84cce3
add comment 2018-11-02 18:11:36 -05:00
Preetha Appan a6b714b81c
update preemption tests to use new node resource structs
also includes a fix to remove unnecessary subtraction of network mbits
2018-11-02 17:59:53 -05:00
Preetha b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan c33469157d
unit test plan apply with preemptions 2018-11-01 20:06:32 -05:00
Preetha Appan 57fe5050f0
more minor review feedback 2018-11-01 17:05:17 -05:00
Preetha Appan fd60e66f86
Plumb alloc resource cache in a few more places.
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan e586817ce7
batch jobs GC removes terminal allocs if job modifyindex is older than running job 2018-11-01 00:05:31 -05:00
Mahmood Ali 9da19c6450 address review comments 2018-10-30 13:58:52 -04:00
Mahmood Ali 4937095389 Allow artifacts checksum interpolation
Fixes https://github.com/hashicorp/nomad/issues/4814
2018-10-30 13:24:30 -04:00
Preetha Appan f1c3eb2792
Introduce interface with multiple implementations for resource distance 2018-10-30 11:06:32 -05:00
Preetha Appan 8f7eb61823
Introduce a response object for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readablity fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 1415032c13
More review comments 2018-10-30 11:06:32 -05:00
Preetha Appan b97f85e3e0
style fixes 2018-10-30 11:06:32 -05:00
Preetha Appan 12278527c7
make default config a variable 2018-10-30 11:06:32 -05:00
Preetha Appan 32cc764072
Add fsm layer tests 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan 8807c25b11
Modify preemption code to use new style of resource structs 2018-10-30 11:06:32 -05:00
Preetha Appan c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 3190a2c29b
Fix linting 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08
Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Preetha Appan 9e4a35fff0
Fix comment 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de
Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan d11064d6ba
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 9257387a69
Add number of evictions to DesiredUpdates struct to use in CLI/API 2018-10-30 11:06:32 -05:00
Preetha Appan 5ff4b8e36f
REview feedback 2018-10-30 11:06:32 -05:00
Preetha Appan 5b3bfb63eb
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Michael Schurter 5d49832de4 tests: fix usages of TestClient cleanup and mock driver 2018-10-29 14:21:05 -07:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Alex Dadgar 6f0ed6184b Fix client reloading and pass the plugin loaders to server and client 2018-10-16 16:56:55 -07:00
Michael Schurter a4b4d7b266 consul service hook
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar e401c660e7 Implement lifecycle hooks on the task runner 2018-10-16 16:53:29 -07:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar f5a76d8411 review comments 2018-10-15 15:31:13 -07:00
Alex Dadgar f9b056e1d1 Replace attributes map with new Attribute object 2018-10-13 14:08:58 -07:00
Alex Dadgar 04ba425dd5 validate constraints/affinities 2018-10-13 12:27:49 -07:00
Alex Dadgar 9b5aaac410 Device feasability checker 2018-10-13 12:27:49 -07:00
Alex Dadgar bfb4caa2e7 node devices 2018-10-13 12:27:49 -07:00
Alex Dadgar 5a07f9f96e parse affinities and constraints on devices 2018-10-11 14:05:19 -07:00
Alex Dadgar a2a56a930c Diff 2018-10-08 17:02:58 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar e1a102f58c test allocs fit 2018-09-24 13:59:01 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommited if the terminal allocations resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar de442226ae Fix other instances of blocking queries 2018-09-24 13:52:39 -07:00
Alex Dadgar 7f0d241ef4 always handle failed allocation 2018-09-21 15:13:54 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar 5009566503 do not bootstrap with non voters 2018-09-19 17:17:39 -07:00
Alex Dadgar e8f89597f5 fix rpc test 2018-09-19 10:17:54 -07:00
Alex Dadgar 9971b3393f yamux 2018-09-17 14:22:40 -07:00
Alex Dadgar b2f500b48c Serf/Raft/Memberlist logger 2018-09-17 13:57:52 -07:00
Alex Dadgar ca28afa3b2 small fixes 2018-09-15 16:42:38 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 7739ef51ce agent + consul 2018-09-13 10:43:40 -07:00
Preetha Appan 996484981c
Fix panic when reschedule policy for allocation can't be looked up
because its task group changed
2018-09-05 17:01:02 -05:00
Alex Dadgar 4f89cabd34
Merge pull request #4631 from hashicorp/f-plugin-config
Parse plugin configs
2018-09-04 17:04:13 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 26288b9522
Fix more review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 4f8e925b54
Move topk and delay heap to separate packages under lib 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
Preetha Appan 71bff00326
validate spread from job/task group validate methods 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan 5eb82b6260
Validate method, and rename ratio field to percent 2018-09-04 16:10:11 -05:00
Preetha Appan 0037d72fa8
Structs and validation for spread 2018-09-04 16:10:11 -05:00
Preetha Appan c407e3626f
More review comments 2018-09-04 16:10:11 -05:00
Preetha Appan dbbb4a957a
Fail validation if system job has affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 0bc030c6fb
Treat set_contains as a synonym of set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan e85a721cfb
Include affinities in job and task diff, and more test cases 2018-09-04 16:10:11 -05:00
Preetha Appan f06c7ab2ad
Fix Copy method for job and task to include affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 9f0caa9c3d
Affinity parsing, api and structs 2018-09-04 16:10:11 -05:00
Preetha Appan 9e29cfee76
Use readlock 2018-09-04 11:45:05 -05:00
Preetha Appan 062e5f1898
Use eval broker lock when reading/modifying delay heap 2018-08-31 10:59:48 -05:00
Alex Dadgar bff1669ee4 Plugin config parsing 2018-08-29 17:06:01 -07:00
Chelsea Komlo 0a69cdb304
Merge pull request #4565 from hashicorp/b-compare-cert-alg
Error if TLS Certificate signature algorithm isn't supported in cipher suites
2018-08-15 16:09:46 -04:00
Xopherus 8d747578e8 Close multiplexer when context is cancelled
Multiplexer continues to create rpc connections even when
the context which is passed to the underlying rpc connections
is cancelled by the server.

This was causing #4413 - when a SIGHUP causes everything to reload,
it uses context to cancel the underlying http/rpc connections
so that they may come up with the new configuration.
The multiplexer was not being cancelled properly so it would
continue to create rpc connections and constantly fail,
causing communication issues with other nomad agents.

Fixes #4413
2018-08-13 19:32:49 -04:00
Chelsea Holland Komlo 31d6d00381 add simple getter for certificate 2018-08-10 12:37:21 -04:00
Andrei Burd 444ee45aff Parametrized/periodic jobs per child tagged metric emmision 2018-06-21 10:40:56 +03:00
Alex Dadgar b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Chelsea Komlo 03075b603a
Merge pull request #4399 from hashicorp/r-reload-refactor
Refactor logic for dynamic reloading
2018-06-13 13:35:12 -04:00
Alex Dadgar d0043691fb remove structs + bump version 2018-06-11 13:52:19 -07:00
Alex Dadgar af5753d2cd bump version + generated files 2018-06-11 13:39:42 -07:00
Nick Ethier e75e3ae665
nomad: use require pkg for tests 2018-06-11 13:50:50 -04:00
Nick Ethier 50c72adbd7
nomad: code review comments 2018-06-11 13:27:48 -04:00
Nick Ethier a581cc9c01
nomad/structs: fix job diff test 2018-06-11 13:06:49 -04:00
Nick Ethier 41e010cdc2
nomad: add 'Dispatch' field to Job
New -bash: Dispatch: command not found field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Chelsea Holland Komlo de03ce8070 move logic to determine whether to reload tls configuration to tlsutil helper 2018-06-08 14:33:58 -04:00
Chelsea Komlo d738976234
Merge pull request #4395 from hashicorp/b-vault-second
Fix for dynamically reloading vault
2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo dcc9cdfeb7 fixup! comment and move to always log server reload operation 2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo 41e35edf0c fix test that now requires different config for test assertions 2018-06-07 17:07:06 -04:00
Chelsea Holland Komlo 9f6bd7bf3a move logic for testing equality for vault config 2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo 282f37b1ee fix for dynamically reloading vault 2018-06-07 15:34:18 -04:00
Nick Ethier 2555bff4f5
nomad: add error check in test 2018-06-06 14:08:42 -04:00
Nick Ethier d35bf6d184
nomad: handle edge case where node drain event shouldn't be emitted 2018-06-06 14:02:10 -04:00
Alex Dadgar 23cd56dc78 remove generated structs 2018-06-01 16:11:28 -07:00
Alex Dadgar c0386819b3 bump version/lint/generated files 2018-06-01 15:23:10 -07:00
Preetha Appan 4134fcd2c7
Fix test setup for FSMSnapshotRestore_Deployments to use a valid job that exists 2018-05-31 14:39:39 -05:00
Alex Dadgar 7f25fcc1bd
Merge pull request #4354 from hashicorp/b-job-modify
Deployment adds JobSpecModifyIndex
2018-05-31 17:57:38 +00:00
Alex Dadgar f2b2e0482b code review fixes 2018-05-31 10:57:08 -07:00
Preetha c2291a4cf7
Merge pull request #4349 from hashicorp/b-reconcile-raft-upgrade
Remove an unnecessary check in nomad member reconciliation
2018-05-30 13:17:38 -07:00
Alex Dadgar 92777a8018
Merge pull request #4329 from hashicorp/b-leaked-deployments
Clean up leaked deployments on restoration
2018-05-30 20:06:03 +00:00
Preetha 6ec6bcdd87
Merge pull request #4352 from hashicorp/b-update-drain-no-drainstrategy
fix bug with node eligibility staying disabled even after drain is disabled
2018-05-30 11:48:48 -07:00
Alex Dadgar 195e19827b Deployment adds JobSpecModifyIndex
Deployment tracks the Job.JobModifyIndex so that PUTS against /v1/jobs
can be more easily coorelated with the created deployment.

Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Preetha Appan c896a85a96
better test comment 2018-05-30 13:05:15 -05:00
Preetha Appan 647ccc2dc3
fix bug where disabling a node drain when there is no drain strategy set causes scheduling eligibility to stay ineligible 2018-05-30 12:28:46 -05:00
Alex Dadgar ad3dbe8ed0
Merge pull request #4338 from hashicorp/tls_prefer_server_cipher_suites
Add support for tls PreferServerCipherSuites
2018-05-30 17:25:17 +00:00
Preetha Appan 2fd20310ea
Remove checks in member reconcile that was causing servers in protocol 3 to not change their ID in raft forever 2018-05-30 11:34:45 -05:00
Chelsea Holland Komlo 19e4a5489b add support for tls PreferServerCipherSuites
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Alex Dadgar 15a71cc16e
Merge pull request #4331 from capone212/b-3595-fix-heartbeat
Fixed #3595
2018-05-25 00:57:03 +00:00
capone212 a0d4d4a336 Fixed #3595 (https://github.com/hashicorp/nomad/issues/3595)
Stopping heartbeat timer before remove
2018-05-24 13:15:06 +00:00
Alex Dadgar 352f2e03b5 Clean up leaked deployments on restoration
This PR cancels deployments that are active but do not have a job
associated with them. This is a broken invariant that causes issues in
the deployment watcher since it will not track them. Thus they are
objects that can't be operated on or cleaned up.

Fixes https://github.com/hashicorp/nomad/issues/4286
2018-05-23 16:44:21 -07:00
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar c268640c02 Fix noisy log 2018-05-22 14:45:34 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 1fe9cb4f00 update error message 2018-05-22 14:04:59 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar 86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Alex Dadgar b6ecb75af9 update error message 2018-05-22 14:01:43 -07:00
Preetha b409a3ed5b
Merge pull request #4313 from hashicorp/b-alloc-gc-desiredstate
Check allocation's desired state in GC eligibility logic
2018-05-21 16:49:49 -07:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan a9d63c0df3
Check allocation's desired state in GC eligibility logic in core scheduler 2018-05-21 13:28:31 -05:00
Chelsea Komlo 687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Alex Dadgar 9a2237bdab Drain complete 2018-05-10 17:22:06 -07:00
Alex Dadgar 0cb31feb1f Add node event when draining is set/removed/updated 2018-05-10 16:54:43 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Preetha Appan ca5758741b
Update serf to pick up graceful leave fix 2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo 620558c107 log error if unable to create TLS configuration 2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
Preetha Appan 1b8d8b2186
Fix logic inversion in force rescheduling 2018-05-08 20:00:06 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Preetha Appan c7edbd5f41
newlines in test 2018-05-07 14:55:01 -05:00
Preetha Appan 4e75456beb
Fix deadlock in deadline timer logic when progress deadline is passed and the deployment is updated. 2018-05-07 14:55:01 -05:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Preetha Appan 19b096d203
Set modify time for allocs in unit test, and define current time in one spot 2018-05-07 14:55:01 -05:00
Preetha Appan 4c377b112e
Fix panic in deployment watcher when deployment is not in the state store due to a gc 2018-05-07 14:55:01 -05:00
Preetha 02d63432b4
Fix typo 2018-05-07 14:55:01 -05:00
Alex Dadgar 738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment 2018-05-07 14:55:01 -05:00
Michael Schurter e90d051c43
consul: change hashed canary bytes 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter a3038cefb4
typo: transistion -> transition 2018-05-07 14:50:01 -05:00
Alex Dadgar bd38675365
Fix tests 2018-05-07 14:50:01 -05:00