Commit graph

2671 commits

Author SHA1 Message Date
Michael Schurter 0cd35ba335 test: fix flaky garbage collect test
This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.

This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.

A good old fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.

Passes with -race.
2019-01-18 16:01:30 -08:00
Mahmood Ali 7bdd43f3e0 api: avoid codegen for syncing
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change.  As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
2019-01-18 18:52:31 -05:00
Preetha Appan 510d7839e4
code review comments 2019-01-18 17:41:39 -06:00
Mahmood Ali 253532ec00 api: avoid import nomad/structs pkg
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).

Here, we add a code generator that imports the relevant constants from
`nomad/structs`.

I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication.  `nomad/structs` use values as
struct fields (e.g. `string`), while `api` uses value pointer (e.g.
`*string`) instead.  Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opt to keep the status quo.
2019-01-18 14:51:19 -05:00
Preetha Appan be9656d195
fix linting 2019-01-17 15:36:33 -06:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan be36fee48e
Use IsParameterized/isPeriodic methods 2019-01-17 12:15:42 -06:00
Preetha Appan 81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Nick Ethier 597b7b751d
tr: add retry /w backoff to stats_hook failure 2019-01-12 12:18:24 -05:00
Mahmood Ali 4414a2ce1c tests: remove tests for unsupported features
With switching to driver plugins, driver validation is quite tricky and
we need to do some design thinking before supporting it against.
2019-01-10 10:21:48 -05:00
Mahmood Ali 1f2473263e fix more cases of logging arity errors 2019-01-09 09:22:47 -05:00
Mahmood Ali 6f077a73dc Fix panic on failure
Error expects an odd number of arguments, and panics otherwise.
2019-01-08 12:19:44 -05:00
Michael Schurter 324e989327
Merge pull request #5034 from hashicorp/test-fix-races
Test fix races
2019-01-08 07:04:09 -08:00
Alex Dadgar 79cfe26021 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Nick Ethier a96afb6c91
fix tests that fail as a result of async client startup 2018-12-20 00:53:44 -05:00
Michael Schurter 6c1dbb659d test: fix race and nil panic in nomad/ tests
Race was test only and due to unlocked map access.

Panic was test only and due to checking a field on a struct even when we
knew the struct was nil.

Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:439 +0x0
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402
+0xe6
  github.com/hashicorp/nomad/testutil.WaitForResultRetries()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30
+0x5a
  github.com/hashicorp/nomad/testutil.WaitForResult()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22
+0x57
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401
+0xb53
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous write at 0x00c000697dd0 by goroutine 569:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:549 +0x0
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224
+0x2eb
  github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394
+0x29a
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234
+0x593
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Goroutine 768 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x650
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:1150 +0x221

Goroutine 569 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 004fa574cb test: fix race in eval broker update chan
Similar to previous commits the delayed eval update chan was set and
access from different goroutines causing a race. Passing the chan on the
stack resolves the race.

Race output from `go test -race -run 'Server_RPC$'` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708
+0x3dc
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174
+0xc4
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718
+0x1fd
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c000339150 by goroutine 73:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771
+0x176

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 73 (running) created at:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170
+0x173
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207
+0x355
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 1c137690c4 test: fix race around block eval chans
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.

Output from `go test -race -run 'Server_RPC$' in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648
+0x32a
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e10 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483
+0xfe

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 75 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141
+0xba
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649
+0x388
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e50 by goroutine 77:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690
+0xae

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 77 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142
+0xdc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 80263861aa test: fix race around updateCh handling
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.

The race can be prevented by having SetEnabled pass updateCh to run.

Race detector output from `go test -race -run TestServer_RPC` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0001d3f48 by goroutine 515:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177

Goroutine 75 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 515 (running) created at:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Danielle Tomlinson 3647b701a6 taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Chris Baker af593c401c
Merge pull request #4974 from hashicorp/b-1173-log-spam
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00
Chris Baker 121a9eb8cb some changes for more idiomatic code 2018-12-12 23:11:17 +00:00
Alex Dadgar fbe4d67d1b fix iops related tests 2018-12-12 14:32:22 -08:00
Chris Baker 34600f8b75 fixed bug in loop delay 2018-12-12 19:16:41 +00:00
Chris Baker 89c64932c1 gofmt 2018-12-12 19:09:06 +00:00
Chris Baker 22c11d8799 improved code for readability 2018-12-12 18:52:06 +00:00
Preetha f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Chris Baker 59beae35df nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap) 2018-12-07 22:14:15 +00:00
Chris Baker 707bac0a7b rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173) 2018-12-07 20:12:55 +00:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar 14a61ea3ea Don't GC running but desired stop allocations
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).

Fixes https://github.com/hashicorp/nomad/issues/4940
2018-12-05 13:01:12 -08:00
Mahmood Ali adb4d69576
Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup
server/vault: Lock Vault expiration tracking
2018-12-04 19:46:59 -05:00
Mahmood Ali 50e38104a5 server/nomad: Lock Vault expiration tracking
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.

I don't anticipate high contention, so simple mutex suffices.
2018-12-04 09:29:48 -05:00
Preetha Appan 8656d3379f
Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson d4cbd608ff nomad: Remove on-submission job validation
With the introduction of driver plugins, we're temporarily relying on
_run time validation_ of driver configurations, rather than submission
time.
2018-11-30 10:47:08 +01:00
Nick Ethier 80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier b1484aec33
nomad: fix hclog usage 2018-11-29 22:27:39 -05:00
Mahmood Ali 0a2611e41f vault: protect against empty Vault secret response
Also, fix a case where a successful second attempt of loading token can
cause a panic.
2018-11-29 09:34:17 -05:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier 95362eaa02
Merge pull request #4844 from hashicorp/f-docker-plugin
Docker driver plugin
2018-11-20 20:43:03 -05:00
Mahmood Ali 2e6133fd33 nil secrets as recoverable to keep renew attempts 2018-11-20 17:11:55 -05:00
Mahmood Ali 5827438983 Renew past recorded expiry till unrecoverable error
Keep attempting to renew Vault token past locally recorded expiry, just
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
2018-11-20 17:10:55 -05:00
Mahmood Ali 5836a341dd fix typo 2018-11-20 17:10:55 -05:00
Mahmood Ali 93add67e04 round ttl duration for users 2018-11-20 17:10:55 -05:00
Mahmood Ali 4a0544b369 Track renewal expiration properly 2018-11-20 17:10:55 -05:00
Mahmood Ali 79aa934a4b reconcile interface 2018-11-20 17:10:55 -05:00
Mahmood Ali 6efea6d8fc Populate agent-info with vault
Return Vault TTL info to /agent/self API and `nomad agent-info` command.
2018-11-20 17:10:55 -05:00
Mahmood Ali 6034af5084 Avoid explicit precomputed stats field
Seems like the stats field is a micro-optimization that doesn't justify
the complexity it introduces.  Removing it and computing the stats from
revoking field directly.
2018-11-20 17:10:54 -05:00
Mahmood Ali 14842200ec More metrics for Server vault
Add a gauge to track remaining time-to-live, duration of renewal request API call.
2018-11-20 17:10:54 -05:00
Mahmood Ali e1994e59bd address review comments 2018-11-20 17:10:54 -05:00
Mahmood Ali 35179c9655 Wrap Vault API api errors for easing debugging 2018-11-20 17:10:54 -05:00
Mahmood Ali 55456fc823 Set a 1s floor for Vault renew operation backoff 2018-11-20 17:10:54 -05:00
Mahmood Ali 7ad8f6c103
Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter
Fix a panic related to batch GC
2018-11-20 15:16:02 -05:00
Mahmood Ali 6281700c0c address review comments 2018-11-20 13:21:39 -05:00
Nick Ethier 5c5cae79ab
nomad: only lookup job is disable_dispatched_job_summary_metrics is set 2018-11-19 23:22:23 -05:00
Nick Ethier 8ac69f440d
nomad: lookup job instead of adding Dispatched to summary 2018-11-19 23:22:02 -05:00
Nick Ethier 85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Nick Ethier 29591a7c2e
task_runner: emit event on task exit with exit result details 2018-11-19 22:59:17 -05:00
Mahmood Ali d744e71fa9 add a missing no errorassertion 2018-11-19 21:44:00 -05:00
Mahmood Ali b93643cd96 Fix a panic related to batch GC
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
2018-11-19 20:59:45 -05:00
Mahmood Ali bff9c3b3e9 Reproduce a panic related to batch GC
Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Michael Schurter 56ed4f01be vault: fix panic by checking for nil secret
Vault's RenewSelf(...) API may return (nil, nil). We failed to check if
secret was nil before attempting to use it.

RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)

Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)

If anyone has an idea on how to test this I didn't see any options. We
use a real Vault service, so there's no opportunity to mock the
response.
2018-11-19 17:07:59 -08:00
Danielle Tomlinson 8bf17fe22d
Merge pull request #4875 from hashicorp/f-constraints
scheduler: Make != constraints more flexible
2018-11-15 11:04:21 -08:00
Danielle Tomlinson 9c72dafc95 scheduler: Add is_set/is_not_set constraints
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to =, or !=
operators, e.g:

```hcl
constraint {
        attribute = "${attrs.type}"
        operator  = "!="
        value     = "database"
}

constraint {
        attribute = "${attrs.type}"
        operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Preetha Appan e5de50fba8
Initial implementation of device preemption 2018-11-15 11:09:26 -06:00
Mahmood Ali 046f098bac Track Node Device attributes and serve them in API 2018-11-14 14:42:29 -05:00
Mahmood Ali a4a9347501 fix comment typos 2018-11-14 08:36:14 -05:00
Mahmood Ali 1e92161f14
Merge pull request #4858 from hashicorp/b-fix-master-20181109
Fix some tests in master
2018-11-13 16:08:26 -05:00
Alex Dadgar 08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Mahmood Ali 865419e756 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Mahmood Ali 4e18846fd9 Adjust streaming duration
This test expects 11 repeats of the same message emitted at intervals of
200ms; so we need more than 2 seconds to adjust for time sleep
variations and the like.  So raising it to 3s here that should be
enough.
2018-11-13 10:21:40 -05:00
Mahmood Ali 1403ad21b9 Changelog job re-run fix 2018-11-13 07:52:51 -05:00
Mahmood Ali e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Alex Dadgar a90dc978e1 Handle new eval being the duplicate properly 2018-11-12 16:02:23 -08:00
Mahmood Ali 8513b3cccb Comment public functions and batch write txn 2018-11-12 16:09:39 -05:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Mahmood Ali 9c0a15f3ce Run job deregistering in a single transaction
Fixes https://github.com/hashicorp/nomad/issues/4299

Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.

Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index.  However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals.  When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.

This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Preetha 3739713ce1
Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion
Remove terminal allocations associated with older job modify index
2018-11-09 12:21:42 -06:00
Preetha Appan 39072977d6
Use create index as trigger condition to gc old terminal allocs 2018-11-09 11:44:21 -06:00
Alex Dadgar 2f06d88f47
Merge pull request #4847 from hashicorp/b-blocked-eval
Blocked evaluation fixes
2018-11-08 13:40:01 -08:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 991791a513 typo fix 2018-11-08 13:28:27 -08:00
Alex Dadgar be54e56570 review fixes 2018-11-08 09:48:36 -08:00
Preetha Appan 5f0a9d2cfd
Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Alex Dadgar dbb05357bc fix test 2018-11-07 11:59:24 -08:00
Alex Dadgar 36abd3a3d8 review comments 2018-11-07 10:33:22 -08:00
Alex Dadgar e3cbb2c82e allocs fit checks if devices get oversubscribed 2018-11-07 10:33:22 -08:00
Alex Dadgar 4f9b3ede87 Split device accounter and allocator 2018-11-07 10:32:03 -08:00
Alex Dadgar 6fa893c801 affinities 2018-11-07 10:32:03 -08:00
Alex Dadgar feb83a2be3 assign devices 2018-11-07 10:32:03 -08:00
Alex Dadgar 2d2248e209 Add devices to allocated resources 2018-11-07 10:32:03 -08:00
Alex Dadgar b1c5d52817 Track jobs by namespace 2018-11-07 10:22:08 -08:00
Alex Dadgar 6d8bb3a7bd Duplicate blocked evals cancelling improved
The old logic for cancelling duplicate blocked evaluations by job id had
the issue where the newer evaluation could have additional node classes
that it is (in)eligible for that we would not capture. This could make
it such that cluster state could change such that the job would make
progress but no evaluation was unblocked.
2018-11-07 10:08:23 -08:00
Preetha Appan a9aec7e628
Fix failing resource subtraction test 2018-11-06 12:26:26 -06:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00