Preetha Appan
7578522f58
variable name fix
2019-01-29 13:48:45 -06:00
Preetha Appan
a6cebbbf9e
Make sure that all servers are 0.9 before applying scheduler config entry
2019-01-29 12:47:42 -06:00
Michael Schurter
3aba7ee826
nomad: fix panic when no node conn found
...
A missing return would cause a panic when a server could find no route
to a client.
2019-01-28 21:55:35 -08:00
Mahmood Ali
f9164dae67
Merge pull request #5228 from hashicorp/f-vault-err-tweaks
...
server/vault: tweak error messages
2019-01-25 11:17:31 -05:00
Mahmood Ali
f4560d8a2a
server/vault: tweak error messages
...
Closes #5139
2019-01-25 10:33:54 -05:00
Preetha
ec92bf673c
Merge pull request #5223 from hashicorp/f-jobs-list-datacenters
...
Add Datacenters to the JobListStub struct
2019-01-24 08:13:30 -06:00
Michael Schurter
13f061a83f
Merge pull request #5196 from hashicorp/f-plugin-utils
...
Make plugins/shared external and make pluginutls/
2019-01-23 06:59:32 -08:00
Michael Schurter
32daa7b47b
goimports until make check is happy
2019-01-23 06:27:14 -08:00
Michael Schurter
be0bab7c3f
move pluginutils -> helper/pluginutils
...
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar
4bdccab550
goimports
2019-01-22 15:44:31 -08:00
Alex Dadgar
cdcd3c929c
loader and singleton
2019-01-22 15:11:57 -08:00
Alex Dadgar
6c2782f037
move catalog + grpcutils
2019-01-22 15:11:57 -08:00
Preetha Appan
38422642cb
Use DesiredState to determine whether to stop sending task events
2019-01-22 16:43:32 -06:00
Michael Lange
ce7bc4f56f
Add Datacenters to the JobsListStub struct
...
So it can be used for filtering the full list of jobs
2019-01-22 11:16:35 -08:00
Mahmood Ali
e1803b685b
tests: deflake TestClientAllocations_GarbageCollect_Remote
...
Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28
2019-01-19 09:07:27 -05:00
Mahmood Ali
b2203a3a22
Merge pull request #5215 from hashicorp/test-fix-garbagecollect
...
test: fix flaky garbage collect test
2019-01-18 21:10:01 -05:00
Mahmood Ali
05e32fb525
Merge pull request #5213 from hashicorp/b-api-separate
...
Slimmer /api package
2019-01-18 20:52:53 -05:00
Michael Schurter
0cd35ba335
test: fix flaky garbage collect test
...
This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.
This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.
A good old fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.
Passes with -race.
2019-01-18 16:01:30 -08:00
Mahmood Ali
7bdd43f3e0
api: avoid codegen for syncing
...
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change. As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
2019-01-18 18:52:31 -05:00
Preetha Appan
510d7839e4
code review comments
2019-01-18 17:41:39 -06:00
Mahmood Ali
253532ec00
api: avoid import nomad/structs pkg
...
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).
Here, we add a code generator that imports the relevant constants from
`nomad/structs`.
I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication. `nomad/structs` use values as
struct fields (e.g. `string`), while `api` uses value pointer (e.g.
`*string`) instead. Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opt to keep the status quo.
2019-01-18 14:51:19 -05:00
Preetha Appan
be9656d195
fix linting
2019-01-17 15:36:33 -06:00
Preetha Appan
0f8a113ead
Refactor to find jobs with child instances more effeciently
...
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan
be36fee48e
Use IsParameterized/isPeriodic methods
2019-01-17 12:15:42 -06:00
Preetha Appan
81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
...
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Nick Ethier
597b7b751d
tr: add retry /w backoff to stats_hook failure
2019-01-12 12:18:24 -05:00
Mahmood Ali
4414a2ce1c
tests: remove tests for unsupported features
...
With switching to driver plugins, driver validation is quite tricky and
we need to do some design thinking before supporting it against.
2019-01-10 10:21:48 -05:00
Nick Wales
7a7b5da0df
Adds optional Consul service tags to nomad server and agent services, gh#4297
2019-01-09 22:02:46 +00:00
Mahmood Ali
1f2473263e
fix more cases of logging arity errors
2019-01-09 09:22:47 -05:00
Mahmood Ali
6f077a73dc
Fix panic on failure
...
Error expects an odd number of arguments, and panics otherwise.
2019-01-08 12:19:44 -05:00
Michael Schurter
324e989327
Merge pull request #5034 from hashicorp/test-fix-races
...
Test fix races
2019-01-08 07:04:09 -08:00
Alex Dadgar
79cfe26021
vet
2019-01-07 14:49:41 -08:00
Alex Dadgar
8a35d7b1dd
Test recovery
2019-01-07 14:49:41 -08:00
Nick Ethier
a96afb6c91
fix tests that fail as a result of async client startup
2018-12-20 00:53:44 -05:00
Michael Schurter
6c1dbb659d
test: fix race and nil panic in nomad/ tests
...
Race was test only and due to unlocked map access.
Panic was test only and due to checking a field on a struct even when we
knew the struct was nil.
Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
runtime.mapaccess2()
/usr/local/go/src/runtime/map.go:439 +0x0
github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402
+0xe6
github.com/hashicorp/nomad/testutil.WaitForResultRetries()
/home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30
+0x5a
github.com/hashicorp/nomad/testutil.WaitForResult()
/home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22
+0x57
github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401
+0xb53
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous write at 0x00c000697dd0 by goroutine 569:
runtime.mapassign()
/usr/local/go/src/runtime/map.go:549 +0x0
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224
+0x2eb
github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394
+0x29a
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234
+0x593
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Goroutine 768 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x650
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:1150 +0x221
Goroutine 569 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter
004fa574cb
test: fix race in eval broker update chan
...
Similar to previous commits the delayed eval update chan was set and
access from different goroutines causing a race. Passing the chan on the
stack resolves the race.
Race output from `go test -race -run 'Server_RPC$'` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708
+0x3dc
github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174
+0xc4
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718
+0x1fd
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c000339150 by goroutine 73:
github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771
+0x176
Goroutine 63 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 73 (running) created at:
github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170
+0x173
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207
+0x355
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter
1c137690c4
test: fix race around block eval chans
...
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.
Output from `go test -race -run 'Server_RPC$' in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648
+0x32a
github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c0002b4e10 by goroutine 75:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483
+0xfe
Goroutine 63 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 75 (finished) created at:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141
+0xba
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649
+0x388
github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c0002b4e50 by goroutine 77:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690
+0xae
Goroutine 63 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 77 (finished) created at:
github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142
+0xdc
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter
80263861aa
test: fix race around updateCh handling
...
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.
The race can be prevented by having SetEnabled pass updateCh to run.
Race detector output from `go test -race -run TestServer_RPC` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c0001d3f48 by goroutine 515:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177
Goroutine 75 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 515 (running) created at:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Danielle Tomlinson
3647b701a6
taskrunner: Emit task events when a hook fails
2018-12-13 18:20:18 +01:00
Chris Baker
af593c401c
Merge pull request #4974 from hashicorp/b-1173-log-spam
...
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00
Chris Baker
121a9eb8cb
some changes for more idiomatic code
2018-12-12 23:11:17 +00:00
Alex Dadgar
fbe4d67d1b
fix iops related tests
2018-12-12 14:32:22 -08:00
Chris Baker
34600f8b75
fixed bug in loop delay
2018-12-12 19:16:41 +00:00
Chris Baker
89c64932c1
gofmt
2018-12-12 19:09:06 +00:00
Chris Baker
22c11d8799
improved code for readability
2018-12-12 18:52:06 +00:00
Preetha
f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
...
Device preemption
2018-12-11 18:34:19 -06:00
Alex Dadgar
1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
...
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Chris Baker
59beae35df
nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap)
2018-12-07 22:14:15 +00:00
Chris Baker
707bac0a7b
rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173)
2018-12-07 20:12:55 +00:00
Alex Dadgar
c918a96490
Warn if IOPS is being used
2018-12-06 16:17:09 -08:00
Alex Dadgar
1e3c3cb287
Deprecate IOPS
...
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar
14a61ea3ea
Don't GC running but desired stop allocations
...
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).
Fixes https://github.com/hashicorp/nomad/issues/4940
2018-12-05 13:01:12 -08:00
Mahmood Ali
adb4d69576
Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup
...
server/vault: Lock Vault expiration tracking
2018-12-04 19:46:59 -05:00
Mahmood Ali
50e38104a5
server/nomad: Lock Vault expiration tracking
...
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.
I don't anticipate high contention, so simple mutex suffices.
2018-12-04 09:29:48 -05:00
Preetha Appan
8656d3379f
Add guards around subtracting summary count
2018-12-03 11:16:35 -06:00
Danielle Tomlinson
51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
...
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson
d4cbd608ff
nomad: Remove on-submission job validation
...
With the introduction of driver plugins, we're temporarily relying on
_run time validation_ of driver configurations, rather than submission
time.
2018-11-30 10:47:08 +01:00
Nick Ethier
80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
...
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier
b1484aec33
nomad: fix hclog usage
2018-11-29 22:27:39 -05:00
Mahmood Ali
0a2611e41f
vault: protect against empty Vault secret response
...
Also, fix a case where a successful second attempt of loading token can
cause a panic.
2018-11-29 09:34:17 -05:00
Alex Dadgar
4ee603c382
Device hook and devices affect computed node class
...
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.
TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier
95362eaa02
Merge pull request #4844 from hashicorp/f-docker-plugin
...
Docker driver plugin
2018-11-20 20:43:03 -05:00
Mahmood Ali
2e6133fd33
nil secrets as recoverable to keep renew attempts
2018-11-20 17:11:55 -05:00
Mahmood Ali
5827438983
Renew past recorded expiry till unrecoverable error
...
Keep attempting to renew Vault token past locally recorded expiry, just
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
2018-11-20 17:10:55 -05:00
Mahmood Ali
5836a341dd
fix typo
2018-11-20 17:10:55 -05:00
Mahmood Ali
93add67e04
round ttl duration for users
2018-11-20 17:10:55 -05:00
Mahmood Ali
4a0544b369
Track renewal expiration properly
2018-11-20 17:10:55 -05:00
Mahmood Ali
79aa934a4b
reconcile interface
2018-11-20 17:10:55 -05:00
Mahmood Ali
6efea6d8fc
Populate agent-info with vault
...
Return Vault TTL info to /agent/self API and `nomad agent-info` command.
2018-11-20 17:10:55 -05:00
Mahmood Ali
6034af5084
Avoid explicit precomputed stats field
...
Seems like the stats field is a micro-optimization that doesn't justify
the complexity it introduces. Removing it and computing the stats from
revoking field directly.
2018-11-20 17:10:54 -05:00
Mahmood Ali
14842200ec
More metrics for Server vault
...
Add a gauge to track remaining time-to-live, duration of renewal request API call.
2018-11-20 17:10:54 -05:00
Mahmood Ali
e1994e59bd
address review comments
2018-11-20 17:10:54 -05:00
Mahmood Ali
35179c9655
Wrap Vault API api errors for easing debugging
2018-11-20 17:10:54 -05:00
Mahmood Ali
55456fc823
Set a 1s floor for Vault renew operation backoff
2018-11-20 17:10:54 -05:00
Mahmood Ali
7ad8f6c103
Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter
...
Fix a panic related to batch GC
2018-11-20 15:16:02 -05:00
Mahmood Ali
6281700c0c
address review comments
2018-11-20 13:21:39 -05:00
Nick Ethier
5c5cae79ab
nomad: only lookup job is disable_dispatched_job_summary_metrics is set
2018-11-19 23:22:23 -05:00
Nick Ethier
8ac69f440d
nomad: lookup job instead of adding Dispatched to summary
2018-11-19 23:22:02 -05:00
Nick Ethier
85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs
2018-11-19 23:21:19 -05:00
Nick Ethier
29591a7c2e
task_runner: emit event on task exit with exit result details
2018-11-19 22:59:17 -05:00
Mahmood Ali
d744e71fa9
add a missing no errorassertion
2018-11-19 21:44:00 -05:00
Mahmood Ali
b93643cd96
Fix a panic related to batch GC
...
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
2018-11-19 20:59:45 -05:00
Mahmood Ali
bff9c3b3e9
Reproduce a panic related to batch GC
...
Test case that reproduces a panic with the following stacktrace:
```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]
goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Michael Schurter
56ed4f01be
vault: fix panic by checking for nil secret
...
Vault's RenewSelf(...) API may return (nil, nil). We failed to check if
secret was nil before attempting to use it.
RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)
Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)
If anyone has an idea on how to test this I didn't see any options. We
use a real Vault service, so there's no opportunity to mock the
response.
2018-11-19 17:07:59 -08:00
Danielle Tomlinson
8bf17fe22d
Merge pull request #4875 from hashicorp/f-constraints
...
scheduler: Make != constraints more flexible
2018-11-15 11:04:21 -08:00
Danielle Tomlinson
9c72dafc95
scheduler: Add is_set/is_not_set constraints
...
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to =, or !=
operators, e.g:
```hcl
constraint {
attribute = "${attrs.type}"
operator = "!="
value = "database"
}
constraint {
attribute = "${attrs.type}"
operator = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Preetha Appan
e5de50fba8
Initial implementation of device preemption
2018-11-15 11:09:26 -06:00
Mahmood Ali
046f098bac
Track Node Device attributes and serve them in API
2018-11-14 14:42:29 -05:00
Mahmood Ali
a4a9347501
fix comment typos
2018-11-14 08:36:14 -05:00
Mahmood Ali
1e92161f14
Merge pull request #4858 from hashicorp/b-fix-master-20181109
...
Fix some tests in master
2018-11-13 16:08:26 -05:00
Alex Dadgar
08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
...
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Mahmood Ali
865419e756
convert all config durations to strings in tests
2018-11-13 10:21:40 -05:00
Mahmood Ali
4e18846fd9
Adjust streaming duration
...
This test expects 11 repeats of the same message emitted at intervals of
200ms; so we need more than 2 seconds to adjust for time sleep
variations and the like. So raising it to 3s here that should be
enough.
2018-11-13 10:21:40 -05:00
Mahmood Ali
1403ad21b9
Changelog job re-run fix
2018-11-13 07:52:51 -05:00
Mahmood Ali
e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
...
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Alex Dadgar
a90dc978e1
Handle new eval being the duplicate properly
2018-11-12 16:02:23 -08:00
Mahmood Ali
8513b3cccb
Comment public functions and batch write txn
2018-11-12 16:09:39 -05:00
Preetha Appan
7ef126a027
Smaller methods, and added tests for RPC layer
2018-11-10 17:37:33 -06:00
Preetha Appan
75662b50d1
Use response object/querymeta/writemeta in scheduler config API
2018-11-10 10:31:10 -06:00
Mahmood Ali
9c0a15f3ce
Run job deregistering in a single transaction
...
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, any single individual job deregistering was applied accidentally incremented the snapshot index and resulted into processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation.
This change ensures that we delete deregister all of the jobs and inserts all evals in a single transactions, thus blocking processing related evals until deregistering complete.
2018-11-09 22:35:26 -05:00
Preetha
3739713ce1
Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion
...
Remove terminal allocations associated with older job modify index
2018-11-09 12:21:42 -06:00
Preetha Appan
39072977d6
Use create index as trigger condition to gc old terminal allocs
2018-11-09 11:44:21 -06:00
Alex Dadgar
2f06d88f47
Merge pull request #4847 from hashicorp/b-blocked-eval
...
Blocked evaluation fixes
2018-11-08 13:40:01 -08:00
Alex Dadgar
98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
...
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar
991791a513
typo fix
2018-11-08 13:28:27 -08:00
Alex Dadgar
be54e56570
review fixes
2018-11-08 09:48:36 -08:00
Preetha Appan
5f0a9d2cfd
Show preemption output in plan CLI
2018-11-08 09:48:43 -06:00
Alex Dadgar
dbb05357bc
fix test
2018-11-07 11:59:24 -08:00
Alex Dadgar
36abd3a3d8
review comments
2018-11-07 10:33:22 -08:00
Alex Dadgar
e3cbb2c82e
allocs fit checks if devices get oversubscribed
2018-11-07 10:33:22 -08:00
Alex Dadgar
4f9b3ede87
Split device accounter and allocator
2018-11-07 10:32:03 -08:00
Alex Dadgar
6fa893c801
affinities
2018-11-07 10:32:03 -08:00
Alex Dadgar
feb83a2be3
assign devices
2018-11-07 10:32:03 -08:00
Alex Dadgar
2d2248e209
Add devices to allocated resources
2018-11-07 10:32:03 -08:00
Alex Dadgar
b1c5d52817
Track jobs by namespace
2018-11-07 10:22:08 -08:00
Alex Dadgar
6d8bb3a7bd
Duplicate blocked evals cancelling improved
...
The old logic for cancelling duplicate blocked evaluations by job id had
the issue where the newer evaluation could have additional node classes
that it is (in)eligible for that we would not capture. This could make
it such that cluster state could change such that the job would make
progress but no evaluation was unblocked.
2018-11-07 10:08:23 -08:00
Preetha Appan
a9aec7e628
Fix failing resource subtraction test
2018-11-06 12:26:26 -06:00
Alex Dadgar
261aae32b1
more robust merging of the deployment status when getting updates from the client
2018-11-05 16:39:09 -08:00
Alex Dadgar
1c31970464
Fix multiple tgs with progress deadline handling
...
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan
6fdc84cce3
add comment
2018-11-02 18:11:36 -05:00
Preetha Appan
a6b714b81c
update preemption tests to use new node resource structs
...
also includes a fix to remove unnecessary subtraction of network mbits
2018-11-02 17:59:53 -05:00
Preetha
b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
...
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan
c33469157d
unit test plan apply with preemptions
2018-11-01 20:06:32 -05:00
Preetha Appan
57fe5050f0
more minor review feedback
2018-11-01 17:05:17 -05:00
Preetha Appan
fd60e66f86
Plumb alloc resource cache in a few more places.
...
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan
e586817ce7
batch jobs GC removes terminal allocs if job modifyindex is older than running job
2018-11-01 00:05:31 -05:00
Mahmood Ali
9da19c6450
address review comments
2018-10-30 13:58:52 -04:00
Mahmood Ali
4937095389
Allow artifacts checksum interpolation
...
Fixes https://github.com/hashicorp/nomad/issues/4814
2018-10-30 13:24:30 -04:00
Preetha Appan
f1c3eb2792
Introduce interface with multiple implementations for resource distance
2018-10-30 11:06:32 -05:00
Preetha Appan
8f7eb61823
Introduce a response object for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
1a5421f5d7
more minor cleanup
2018-10-30 11:06:32 -05:00
Preetha Appan
0494a098ce
More style and readablity fixes from review
2018-10-30 11:06:32 -05:00
Preetha Appan
1415032c13
More review comments
2018-10-30 11:06:32 -05:00
Preetha Appan
b97f85e3e0
style fixes
2018-10-30 11:06:32 -05:00
Preetha Appan
12278527c7
make default config a variable
2018-10-30 11:06:32 -05:00
Preetha Appan
32cc764072
Add fsm layer tests
2018-10-30 11:06:32 -05:00
Preetha Appan
7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
8807c25b11
Modify preemption code to use new style of resource structs
2018-10-30 11:06:32 -05:00
Preetha Appan
c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type
2018-10-30 11:06:32 -05:00
Preetha Appan
bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
3190a2c29b
Fix linting
2018-10-30 11:06:32 -05:00
Preetha Appan
eb38488d08
Fix logic bug, unit test for plan apply method in state store
2018-10-30 11:06:32 -05:00
Preetha Appan
9e4a35fff0
Fix comment
2018-10-30 11:06:32 -05:00
Preetha Appan
cc295b90de
Implement preemption for system jobs.
...
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan
d11064d6ba
structs and API changes to plan and alloc structs needed for preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
9257387a69
Add number of evictions to DesiredUpdates struct to use in CLI/API
2018-10-30 11:06:32 -05:00
Preetha Appan
5ff4b8e36f
REview feedback
2018-10-30 11:06:32 -05:00
Preetha Appan
5b3bfb63eb
structs and API changes to plan and alloc structs needed for preemption
2018-10-30 11:06:32 -05:00
Michael Schurter
5d49832de4
tests: fix usages of TestClient cleanup and mock driver
2018-10-29 14:21:05 -07:00
Michael Schurter
e060174130
ar: fix leader handling, state restoring, and destroying unrun ARs
...
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Alex Dadgar
6f0ed6184b
Fix client reloading and pass the plugin loaders to server and client
2018-10-16 16:56:55 -07:00
Michael Schurter
a4b4d7b266
consul service hook
...
Deregistration works but difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar
e401c660e7
Implement lifecycle hooks on the task runner
2018-10-16 16:53:29 -07:00
Alex Dadgar
a78cefec18
use int64
2018-10-16 15:34:32 -07:00
Preetha Appan
7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs
2018-10-16 16:21:42 -05:00
Alex Dadgar
f5a76d8411
review comments
2018-10-15 15:31:13 -07:00
Alex Dadgar
f9b056e1d1
Replace attributes map with new Attribute object
2018-10-13 14:08:58 -07:00
Alex Dadgar
04ba425dd5
validate constraints/affinities
2018-10-13 12:27:49 -07:00
Alex Dadgar
9b5aaac410
Device feasability checker
2018-10-13 12:27:49 -07:00
Alex Dadgar
bfb4caa2e7
node devices
2018-10-13 12:27:49 -07:00
Alex Dadgar
5a07f9f96e
parse affinities and constraints on devices
2018-10-11 14:05:19 -07:00
Alex Dadgar
a2a56a930c
Diff
2018-10-08 17:02:58 -07:00
Alex Dadgar
6b08b9d6b6
Define device request structs
2018-10-08 15:38:03 -07:00
Alex Dadgar
01f8e5b95f
renames
2018-10-04 14:57:25 -07:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
bac5cb1e8b
Scheduler uses allocated resources
2018-10-02 17:08:25 -07:00
Alex Dadgar
147d2430a1
allocated resources structs
2018-09-29 18:47:28 -07:00
Alex Dadgar
5c8697667e
Node reserved resources
2018-09-29 18:44:55 -07:00
Alex Dadgar
3183153315
Node resources on client
2018-09-29 17:23:41 -07:00
Alex Dadgar
9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
...
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar
bd420692f3
fix logging
2018-09-25 10:49:55 -07:00
Preetha Appan
86e725e84c
Added logging around nacked evals in the scheduler worker
2018-09-25 10:49:02 -07:00
Alex Dadgar
6a21f9fe96
Unique TriggerBy for blocked evals
...
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar
e1a102f58c
test allocs fit
2018-09-24 13:59:01 -07:00
Alex Dadgar
d7f5be9148
Better comment on snapshotindex
2018-09-24 13:53:43 -07:00
Alex Dadgar
99498da6ed
Denormalize jobs in plan and ignore resources of terminal allocs
...
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.
Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommited if the terminal allocations resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar
de442226ae
Fix other instances of blocking queries
2018-09-24 13:52:39 -07:00
Alex Dadgar
7f0d241ef4
always handle failed allocation
2018-09-21 15:13:54 -07:00
Alex Dadgar
b2449ae1ce
Fix deployment watcher index usage
...
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necassary.
3. Special case handling of blocked evaluation since the create/snapshot
index is no particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Alex Dadgar
5009566503
do not bootstrap with non voters
2018-09-19 17:17:39 -07:00
Alex Dadgar
e8f89597f5
fix rpc test
2018-09-19 10:17:54 -07:00
Alex Dadgar
9971b3393f
yamux
2018-09-17 14:22:40 -07:00
Alex Dadgar
b2f500b48c
Serf/Raft/Memberlist logger
2018-09-17 13:57:52 -07:00
Alex Dadgar
ca28afa3b2
small fixes
2018-09-15 16:42:38 -07:00
Alex Dadgar
3c19d01d7a
server
2018-09-15 16:23:13 -07:00
Alex Dadgar
7739ef51ce
agent + consul
2018-09-13 10:43:40 -07:00
Preetha Appan
996484981c
Fix panic when reschedule policy for allocation can't be looked up
...
because its task group changed
2018-09-05 17:01:02 -05:00
Alex Dadgar
4f89cabd34
Merge pull request #4631 from hashicorp/f-plugin-config
...
Parse plugin configs
2018-09-04 17:04:13 -07:00
Alex Dadgar
cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
...
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar
c6576ddac1
Fix make check errors
2018-09-04 16:03:52 -07:00
Preetha Appan
26288b9522
Fix more review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
751c0eb5a5
code review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
4f8e925b54
Move topk and delay heap to separate packages under lib
2018-09-04 16:10:11 -05:00
Preetha Appan
9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer
2018-09-04 16:10:11 -05:00
Preetha Appan
6ed527c636
Use heap to store top K scoring nodes.
...
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan
dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
...
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
e72c0fe527
more cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
92d37acc2a
comment and formatting cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
5812f906c8
Allow empty spread targets, and validate target percentages.
2018-09-04 16:10:11 -05:00
Preetha Appan
71bff00326
validate spread from job/task group validate methods
2018-09-04 16:10:11 -05:00
Preetha Appan
fbd0004707
Fix warnings
2018-09-04 16:10:11 -05:00
Preetha Appan
5eb82b6260
Validate method, and rename ratio field to percent
2018-09-04 16:10:11 -05:00
Preetha Appan
0037d72fa8
Structs and validation for spread
2018-09-04 16:10:11 -05:00
Preetha Appan
c407e3626f
More review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
dbbb4a957a
Fail validation if system job has affinities
2018-09-04 16:10:11 -05:00
Preetha Appan
0bc030c6fb
Treat set_contains as a synonym of set_contains_all
2018-09-04 16:10:11 -05:00
Preetha Appan
e85a721cfb
Include affinities in job and task diff, and more test cases
2018-09-04 16:10:11 -05:00
Preetha Appan
f06c7ab2ad
Fix Copy method for job and task to include affinities
2018-09-04 16:10:11 -05:00
Preetha Appan
9f0caa9c3d
Affinity parsing, api and structs
2018-09-04 16:10:11 -05:00
Preetha Appan
9e29cfee76
Use readlock
2018-09-04 11:45:05 -05:00
Preetha Appan
062e5f1898
Use eval broker lock when reading/modifying delay heap
2018-08-31 10:59:48 -05:00
Alex Dadgar
bff1669ee4
Plugin config parsing
2018-08-29 17:06:01 -07:00
Chelsea Komlo
0a69cdb304
Merge pull request #4565 from hashicorp/b-compare-cert-alg
...
Error if TLS Certificate signature algorithm isn't supported in cipher suites
2018-08-15 16:09:46 -04:00
Xopherus
8d747578e8
Close multiplexer when context is cancelled
...
Multiplexer continues to create rpc connections even when
the context which is passed to the underlying rpc connections
is cancelled by the server.
This was causing #4413 - when a SIGHUP causes everything to reload,
it uses context to cancel the underlying http/rpc connections
so that they may come up with the new configuration.
The multiplexer was not being cancelled properly so it would
continue to create rpc connections and constantly fail,
causing communication issues with other nomad agents.
Fixes #4413
2018-08-13 19:32:49 -04:00
Chelsea Holland Komlo
31d6d00381
add simple getter for certificate
2018-08-10 12:37:21 -04:00
Andrei Burd
444ee45aff
Parametrized/periodic jobs per child tagged metric emmision
2018-06-21 10:40:56 +03:00
Alex Dadgar
b61051b3cd
Merge pull request #4409 from hashicorp/r-client-packages
...
Refactor client packages
2018-06-13 17:32:25 -07:00
Alex Dadgar
300b1a7a15
Tests only use testlog package logger
2018-06-13 15:40:56 -07:00
Chelsea Komlo
03075b603a
Merge pull request #4399 from hashicorp/r-reload-refactor
...
Refactor logic for dynamic reloading
2018-06-13 13:35:12 -04:00
Alex Dadgar
d0043691fb
remove structs + bump version
2018-06-11 13:52:19 -07:00
Alex Dadgar
af5753d2cd
bump version + generated files
2018-06-11 13:39:42 -07:00
Nick Ethier
e75e3ae665
nomad: use require pkg for tests
2018-06-11 13:50:50 -04:00
Nick Ethier
50c72adbd7
nomad: code review comments
2018-06-11 13:27:48 -04:00
Nick Ethier
a581cc9c01
nomad/structs: fix job diff test
2018-06-11 13:06:49 -04:00
Nick Ethier
41e010cdc2
nomad: add 'Dispatch' field to Job
...
New -bash: Dispatch: command not found field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Chelsea Holland Komlo
de03ce8070
move logic to determine whether to reload tls configuration to tlsutil helper
2018-06-08 14:33:58 -04:00
Chelsea Komlo
d738976234
Merge pull request #4395 from hashicorp/b-vault-second
...
Fix for dynamically reloading vault
2018-06-07 18:03:00 -04:00
Chelsea Holland Komlo
dcc9cdfeb7
fixup! comment and move to always log server reload operation
2018-06-07 17:12:36 -04:00
Chelsea Holland Komlo
41e35edf0c
fix test that now requires different config for test assertions
2018-06-07 17:07:06 -04:00
Chelsea Holland Komlo
9f6bd7bf3a
move logic for testing equality for vault config
2018-06-07 16:23:50 -04:00
Chelsea Holland Komlo
282f37b1ee
fix for dynamically reloading vault
2018-06-07 15:34:18 -04:00
Nick Ethier
2555bff4f5
nomad: add error check in test
2018-06-06 14:08:42 -04:00
Nick Ethier
d35bf6d184
nomad: handle edge case where node drain event shouldn't be emitted
2018-06-06 14:02:10 -04:00
Alex Dadgar
23cd56dc78
remove generated structs
2018-06-01 16:11:28 -07:00
Alex Dadgar
c0386819b3
bump version/lint/generated files
2018-06-01 15:23:10 -07:00
Preetha Appan
4134fcd2c7
Fix test setup for FSMSnapshotRestore_Deployments to use a valid job that exists
2018-05-31 14:39:39 -05:00
Alex Dadgar
7f25fcc1bd
Merge pull request #4354 from hashicorp/b-job-modify
...
Deployment adds JobSpecModifyIndex
2018-05-31 17:57:38 +00:00
Alex Dadgar
f2b2e0482b
code review fixes
2018-05-31 10:57:08 -07:00
Preetha
c2291a4cf7
Merge pull request #4349 from hashicorp/b-reconcile-raft-upgrade
...
Remove an unnecessary check in nomad member reconciliation
2018-05-30 13:17:38 -07:00
Alex Dadgar
92777a8018
Merge pull request #4329 from hashicorp/b-leaked-deployments
...
Clean up leaked deployments on restoration
2018-05-30 20:06:03 +00:00
Preetha
6ec6bcdd87
Merge pull request #4352 from hashicorp/b-update-drain-no-drainstrategy
...
fix bug with node eligibility staying disabled even after drain is disabled
2018-05-30 11:48:48 -07:00
Alex Dadgar
195e19827b
Deployment adds JobSpecModifyIndex
...
Deployment tracks the Job.JobModifyIndex so that PUTS against /v1/jobs
can be more easily coorelated with the created deployment.
Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Preetha Appan
c896a85a96
better test comment
2018-05-30 13:05:15 -05:00
Preetha Appan
647ccc2dc3
fix bug where disabling a node drain when there is no drain strategy set causes scheduling eligibility to stay ineligible
2018-05-30 12:28:46 -05:00
Alex Dadgar
ad3dbe8ed0
Merge pull request #4338 from hashicorp/tls_prefer_server_cipher_suites
...
Add support for tls PreferServerCipherSuites
2018-05-30 17:25:17 +00:00
Preetha Appan
2fd20310ea
Remove checks in member reconcile that was causing servers in protocol 3 to not change their ID in raft forever
2018-05-30 11:34:45 -05:00
Chelsea Holland Komlo
19e4a5489b
add support for tls PreferServerCipherSuites
...
add further tests for tls configuration
2018-05-25 13:20:00 -04:00
Alex Dadgar
15a71cc16e
Merge pull request #4331 from capone212/b-3595-fix-heartbeat
...
Fixed #3595
2018-05-25 00:57:03 +00:00
capone212
a0d4d4a336
Fixed #3595 ( https://github.com/hashicorp/nomad/issues/3595 )
...
Stopping heartbeat timer before remove
2018-05-24 13:15:06 +00:00
Alex Dadgar
352f2e03b5
Clean up leaked deployments on restoration
...
This PR cancels deployments that are active but do not have a job
associated with them. This is a broken invariant that causes issues in
the deployment watcher since it will not track them. Thus they are
objects that can't be operated on or cleaned up.
Fixes https://github.com/hashicorp/nomad/issues/4286
2018-05-23 16:44:21 -07:00
Chelsea Holland Komlo
38f611a7f2
refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
...
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar
c268640c02
Fix noisy log
2018-05-22 14:45:34 -07:00
Alex Dadgar
21c5ed850d
Register events
2018-05-22 14:06:33 -07:00
Alex Dadgar
17aac1c9de
node heartbeat missed event
2018-05-22 14:05:46 -07:00
Alex Dadgar
1fe9cb4f00
update error message
2018-05-22 14:04:59 -07:00
Alex Dadgar
5f2080bc26
Emit events based on eligibility
2018-05-22 14:04:59 -07:00
Alex Dadgar
86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
...
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Alex Dadgar
b6ecb75af9
update error message
2018-05-22 14:01:43 -07:00
Preetha
b409a3ed5b
Merge pull request #4313 from hashicorp/b-alloc-gc-desiredstate
...
Check allocation's desired state in GC eligibility logic
2018-05-21 16:49:49 -07:00
Preetha
159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
...
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan
a9d63c0df3
Check allocation's desired state in GC eligibility logic in core scheduler
2018-05-21 13:28:31 -05:00
Chelsea Komlo
687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
...
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Alex Dadgar
9a2237bdab
Drain complete
2018-05-10 17:22:06 -07:00
Alex Dadgar
0cb31feb1f
Add node event when draining is set/removed/updated
2018-05-10 16:54:43 -07:00
Alex Dadgar
a35248d1d8
Plumb event via FSM
2018-05-10 16:30:54 -07:00
Preetha Appan
bfa0937bbb
Code review feedback
2018-05-10 14:42:24 -05:00
Preetha Appan
ca5758741b
Update serf to pick up graceful leave fix
2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo
620558c107
log error if unable to create TLS configuration
2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo
44f536f18e
add support for configurable TLS minimum version
2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo
796bae6f1b
allow configurable cipher suites
...
disallow 3DES and RC4 ciphers
add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan
b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload
2018-05-09 15:04:27 -05:00
Preetha Appan
ef531b0f34
Add unit tests for forced rescheduling
2018-05-09 11:30:42 -05:00
Chelsea Holland Komlo
d51611040f
Add driver health information to node list stub
2018-05-09 11:21:54 -04:00
Preetha Appan
1b8d8b2186
Fix logic inversion in force rescheduling
2018-05-08 20:00:06 -05:00
Preetha Appan
c1b92c284e
Work in progress - force rescheduling of failed allocs
2018-05-08 17:26:57 -05:00
Preetha Appan
c7edbd5f41
newlines in test
2018-05-07 14:55:01 -05:00
Preetha Appan
4e75456beb
Fix deadlock in deadline timer logic when progress deadline is passed and the deployment is updated.
2018-05-07 14:55:01 -05:00
Preetha Appan
cba13e4ec5
Fix test set up to set ModifyTime for alloc
2018-05-07 14:55:01 -05:00
Preetha Appan
19b096d203
Set modify time for allocs in unit test, and define current time in one spot
2018-05-07 14:55:01 -05:00
Preetha Appan
4c377b112e
Fix panic in deployment watcher when deployment is not in the state store due to a gc
2018-05-07 14:55:01 -05:00
Preetha
02d63432b4
Fix typo
2018-05-07 14:55:01 -05:00
Alex Dadgar
738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment
2018-05-07 14:55:01 -05:00
Michael Schurter
e90d051c43
consul: change hashed canary bytes
2018-05-07 14:55:01 -05:00
Alex Dadgar
768fec8505
Allow healthy canary deployment to skip progress deadline
2018-05-07 14:55:01 -05:00
Alex Dadgar
8626c1b94a
Reschedule when we have canaries properly
2018-05-07 14:55:01 -05:00
Michael Schurter
50e04c976e
consul: support canary tags for services
...
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.
Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter
a3038cefb4
typo: transistion -> transition
2018-05-07 14:50:01 -05:00
Alex Dadgar
bd38675365
Fix tests
2018-05-07 14:50:01 -05:00
Alex Dadgar
319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
f4af30fbb5
Canary tags structs
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Preetha Appan
b2b773e696
better comments and remove commented code
2018-05-07 14:50:01 -05:00
Preetha Appan
90a2311cef
Fix deadlock in deployment watcher when deployment starts with no allocations and eventually has failed allocations
2018-05-07 14:50:01 -05:00
Alex Dadgar
224b3092ae
change default to 10m and docs
2018-05-07 14:50:01 -05:00
Alex Dadgar
c91ce5cc38
Fix not enqueuing eval
2018-05-07 14:50:01 -05:00
Alex Dadgar
8d50955054
Fix typos
2018-05-07 14:50:01 -05:00
Alex Dadgar
641ef81cbf
Test fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
8a81038cdb
Set Reschedule from deployment watcher
2018-05-07 14:50:01 -05:00
Alex Dadgar
a510774451
Use UpdateAllocDesiredTransistion instead of UpsertEval but no transistions yet
2018-05-07 14:50:01 -05:00
Alex Dadgar
fcf4f582d0
small review feedback fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
e5caaf3358
Small test fix
2018-05-07 14:50:01 -05:00
Alex Dadgar
9bff9024b3
add latest eval back
2018-05-07 14:50:01 -05:00
Alex Dadgar
99e00fb774
Pass through timestamp
2018-05-07 14:50:01 -05:00
Alex Dadgar
c49b5f9949
Handle progressed deployments and tests
2018-05-07 14:50:01 -05:00
Alex Dadgar
9e75ea0a11
Deployment watcher based on deployment having progress deadline
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00
Alex Dadgar
55b483709f
Fix tests
2018-05-07 14:50:01 -05:00
Alex Dadgar
ee50789c22
Initial implementation
2018-05-07 14:50:01 -05:00
Michael Schurter
a4caf8208b
tests: fix grpc fields in task diff
2018-05-04 11:08:45 -07:00
Michael Schurter
f6a4713141
consul: make grpc checks more like http checks
2018-05-04 11:08:11 -07:00
Michael Schurter
382caec1e1
consul: initial grpc implementation
...
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Preetha Appan
52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value
2018-05-03 17:04:36 -05:00
Preetha Appan
274bed1892
Add RescheduleTracker to allocs list stub struct
2018-05-01 14:53:47 -05:00
Alex Dadgar
de4af37249
version bump and remove generated
2018-04-27 11:10:00 -07:00
Alex Dadgar
845a43864a
generated files
2018-04-27 10:45:40 -07:00
Alex Dadgar
d03c881802
small cleanup and logging
2018-04-27 10:36:28 -07:00
Alex Dadgar
da3a552d8d
Fix issue where node connection map wasn't being pruned
2018-04-27 10:16:03 -07:00
Alex Dadgar
35e06ddb31
Remove generated and version bump
2018-04-26 16:49:19 -07:00
Alex Dadgar
43192cefae
generated files
2018-04-26 16:28:58 -07:00
Alex Dadgar
265a6d4f8b
Merge pull request #4224 from hashicorp/b-cron-parse
...
Handle potential panic in cron parsing
2018-04-26 16:22:37 -07:00
Alex Dadgar
05eccb063f
Merge branch 'b-cron-parse' of github.com:hashicorp/nomad into b-cron-parse
2018-04-26 15:51:56 -07:00
Alex Dadgar
ea24513d38
Allow nomad to restore bad periodic job
2018-04-26 15:51:47 -07:00
Chelsea Holland Komlo
ce1c3e0c2d
add unit tests for panic cron parsing bug
...
add comments for cron parsing wrapper
2018-04-26 18:47:08 -04:00
Alex Dadgar
15ad3f94af
Fix command line
2018-04-26 15:46:22 -07:00
Alex Dadgar
dc2907c2c9
Codecgen full package
2018-04-26 15:24:53 -07:00
Alex Dadgar
d0f237086b
UX touchups
2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo
fca0169dbc
handle potential panic in cron parsing
2018-04-26 16:57:45 -04:00
Alex Dadgar
ff7e2b960f
Add test
2018-04-26 13:28:24 -07:00
Alex Dadgar
4a23307baf
Track all client connections
2018-04-26 13:22:09 -07:00
Alex Dadgar
5320205853
Sort signals in implicit constraint
...
Fixes https://github.com/hashicorp/nomad/issues/4212
2018-04-26 10:12:47 -07:00
Alex Dadgar
79844f1d01
Safety guard
2018-04-25 16:00:56 -07:00
Alex Dadgar
d45f39f24e
Fix detecting drain strategy on GC'd node
2018-04-25 16:00:56 -07:00
Nick Ethier
2e6c95f511
Merge pull request #4138 from hashicorp/i-hcl-json-endpoint
...
HCL to JSON api endpoint
2018-04-19 14:18:34 -04:00
Alex Dadgar
eeb85299ff
gofmt -s nomad/structs/structs_test.go
2018-04-17 13:39:32 -07:00
Chelsea Holland Komlo
788b23e17e
add test for node copy
2018-04-17 12:58:07 -04:00
Nick Ethier
31da01856a
command/agent: add HCL mock for parse endpoint
2018-04-16 19:21:09 -04:00
Alex Dadgar
4f2a7b6949
Fix copying drivers
2018-04-16 15:45:51 -07:00
Alex Dadgar
adaf4fa7e0
Remove generated structs
2018-04-12 16:35:31 -07:00
Alex Dadgar
663c4d0433
Version bump and generated files
2018-04-12 16:21:50 -07:00
Preetha
bdc17ebf10
Merge pull request #4139 from hashicorp/b-reschedule-invalid-system-jobs
...
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 20:01:19 -05:00
Preetha Appan
9f84e17bfd
dont print reschedule policy in error message
2018-04-11 17:07:14 -05:00
Preetha Appan
fa90f036c6
Fix more tests
2018-04-11 15:51:24 -05:00
Preetha Appan
81f856e7c9
Fix one more failing test
2018-04-11 15:49:23 -05:00
Preetha
0b6fbb8e16
Merge pull request #4131 from hashicorp/b-rescheduling-fix-gc
...
Update garbage collection logic to make sure allocs with pending evals are not GCed
2018-04-11 15:44:36 -05:00
Preetha Appan
1da4d88f3d
Make test descriptions better
2018-04-11 15:12:23 -05:00
Preetha Appan
a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 14:56:20 -05:00
Preetha Appan
688fd9ee37
Update alloc GC eligility logic to not rely on follow up evals
2018-04-11 13:58:02 -05:00
Charlie Voiselle
ba88f00ccb
Changed "til" to "until"
...
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha
dec5b99478
Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
...
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan
59cce1d620
Fix unit test for core scheduler GC
2018-04-10 17:12:06 -05:00
Preetha Appan
7040884002
Simplify and update allocation gc eligibility logic
2018-04-10 16:08:37 -05:00
Preetha
c88fef4c4b
Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
...
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan
a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust
2018-04-10 15:30:15 -05:00
Preetha Appan
d17bfd8045
Make leader election test run on all three protocol versions
2018-04-10 14:20:02 -05:00
Preetha Appan
b3402efd0b
Adds a new custom description for update alloc triggered evals to make it easier to unit test.
2018-04-10 14:00:07 -05:00
Preetha Appan
6d0e1c9fea
Use preconfigured nodeID if there isn't a persisted node ID, and persist it if its not persisted.
2018-04-10 08:47:33 -05:00
Preetha Appan
216c053742
Remove debug print statements
2018-04-10 08:16:50 -05:00
Alex Dadgar
d179a09b83
WIP: Not setting node id properlperly
2018-04-09 18:01:28 -07:00
Preetha Appan
868f4f19f4
Unit tests for rolling upgrade and killing a leader
2018-04-09 17:42:30 -05:00
Preetha Appan
24203ae2f7
Remove duplicate commit
2018-04-09 15:08:09 -05:00
Preetha Appan
d1cb5df477
Batch evals for rescheduling failed allocs correctly and group them by job ID
2018-04-09 14:05:31 -05:00
Michael Schurter
d086f17708
rpc: wrap up old version check in a helper
...
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter
e1cbcf0b3c
rpc: give min rpc version variable a better name
2018-04-09 11:09:05 -07:00
Michael Schurter
88a9409f8e
rpc: only attempt NodeRpc for nodes>=0.8
...
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
Preetha
6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
...
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests.
2018-04-04 14:38:15 -05:00
Michael Schurter
b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
...
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar
4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
...
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar
af1b185ce4
Fix flaky deadline tests
2018-04-03 16:51:57 -07:00
Michael Schurter
ba6628a1b6
drain: return on first error
...
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
Alex Dadgar
2b14371db5
Fix spelling
2018-04-03 15:58:03 -07:00
Alex Dadgar
9617a13a2b
Correctly handle the upgrade path of a node being drained when applying Raft logs
2018-04-03 15:32:44 -07:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Michael Schurter
edc4891283
drain: improve tests and fix spelling
...
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter
6840becf46
drain: refactor batch_future into its own file
...
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter
44a749a7cc
drain: fix double-close panic on drain future
2018-04-02 16:39:18 -07:00
Alex Dadgar
86f9044676
remove generated files
2018-03-30 16:52:49 -07:00
Alex Dadgar
af81349dbe
Generated files
2018-03-30 16:14:40 -07:00
Alex Dadgar
23ec54a372
Merge pull request #4089 from hashicorp/tls-error-fix
...
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar
7f28cfcdfe
small cleanup
2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo
a77dd08dd9
prevent double close due to error in creating listener
2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo
402a026c88
add further error handling for rpc connection handling
2018-03-30 17:03:36 -04:00
Alex Dadgar
e8809f40dc
Test transistion from both infinite and a future deadline to force
2018-03-30 11:24:39 -07:00
Alex Dadgar
32a673a7e1
Fix force deadline notification
2018-03-30 09:58:29 -07:00
Alex Dadgar
1aa415b0d8
Integration test
2018-03-30 09:33:23 -07:00
Alex Dadgar
dc03fab29b
Canonicalize migrate
2018-03-29 17:42:58 -07:00
Alex Dadgar
e458ab9031
Merge branch 'master' into b-drain-batch
2018-03-29 17:10:34 -07:00
Michael Schurter
62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
...
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
301704091b
Handle upgrade where Node doesn't have eligiblity
...
This PR handles upgrading a node that has no scheduling eligiblity set.
2018-03-29 16:52:23 -07:00
Alex Dadgar
7d2aae2c11
test handleTaskGroup
2018-03-29 16:38:47 -07:00
Alex Dadgar
049a9213d2
Watch batch jobs
2018-03-29 16:07:51 -07:00
Preetha
9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
...
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar
f12194328c
Integration test for batch complete case
2018-03-29 13:51:04 -07:00
Preetha
81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
...
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan
c8317532ff
Use time from task events if task state does not have FinishedAt set
2018-03-29 14:05:56 -05:00
Alex Dadgar
b194f93f2f
Disallow Update stanza on Batch
2018-03-29 11:28:56 -07:00
Michael Schurter
91b5bb58d9
add HasHealth helper for nil checks
...
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
607e631714
Merge pull request #4046 from hashicorp/tls-same-file-reload
...
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan
5090fefe96
Filter out allocs with DesiredState = stop, and unit tests
2018-03-29 09:28:52 -05:00
Preetha Appan
8776f4b942
Fix failing test
2018-03-29 07:59:38 -05:00
Preetha Appan
2da661595d
If FinishedAt is not set use alloc's modify time for rescheduling logic
2018-03-29 07:42:58 -05:00
Alex Dadgar
b18f789020
Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test
2018-03-28 17:25:58 -07:00
Chelsea Holland Komlo
b33d909bf9
add test to assert invalid files return error
2018-03-28 18:31:35 -04:00
Chelsea Holland Komlo
58ada9bc42
return error when setting checksum; don't reload
2018-03-28 18:15:50 -04:00
Chelsea Holland Komlo
2d5af7ff4d
set TLS checksum when parsing config
...
Refactor checksum comparison, always set checksum if it is empty
2018-03-28 09:56:11 -04:00
Michael Schurter
65ddae86f8
Merge pull request #4054 from hashicorp/b-drainer-index-fix
...
drainer: reset index when new job registered
2018-03-27 16:28:25 -07:00
Michael Schurter
79a2781585
Merge pull request #4053 from hashicorp/b-drain-sys-jobs-2
...
drain: fix draining of system jobs
2018-03-27 16:26:45 -07:00
Alex Dadgar
de4b3772f1
Create evals for system jobs when drain is unset
...
This PR creates evals for system jobs when:
* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo
dd5f627feb
set server configuration checksum on reload
2018-03-27 18:03:52 -04:00
Michael Schurter
ec60a1d3e3
drain: improve comments
2018-03-27 14:27:09 -07:00
Michael Schurter
e5dfb7e487
drain: unittest draining node logic
2018-03-27 14:24:01 -07:00
Michael Schurter
a1ed305a24
test: add mock batch and system allocs
...
Since the BatchJob helper had a different task group than the Alloc
helper, it was difficult to create a valid batch alloc.
2018-03-27 14:24:01 -07:00
Michael Schurter
77bddc7941
drain: stop sys jobs after drain completes
...
System allocs should be drained when a node's deadline is hit or when
all other allocs on the node have stopped/migrated.
2018-03-27 14:24:01 -07:00
Michael Schurter
fae77b874b
drainer: reset index when new job registered
2018-03-27 14:12:59 -07:00
Chelsea Holland Komlo
b522a0fadc
fix up to string to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
31557cc44f
move tests to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
6e6d6b7e33
check file contents when determining if agent should reload TLS configuration
2018-03-27 15:42:20 -04:00
Alex Dadgar
59005d1d26
Merge pull request #4049 from hashicorp/b-tunnel
...
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar
5dacb057b7
Only track nodes if the conn is from the node
...
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Chelsea Komlo
57e2cd04bd
Merge pull request #4025 from hashicorp/reload-http-tls
...
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan
539114124e
Fix too long token test case
2018-03-26 16:28:33 -05:00
Preetha Appan
33e170c15d
s/linear/constant/g
2018-03-26 14:45:09 -05:00
Preetha Appan
7db930b3c3
Extra test case and better error message for ambiguous config
2018-03-26 13:30:09 -05:00
Chelsea Holland Komlo
c2a95f9d7d
add test for upgrading only RPC connections
2018-03-26 10:55:27 -04:00
Preetha Appan
fbd56c35a8
Adds additional validation for ambigous settings (having both unlimited and attempts set)
2018-03-24 10:29:20 -05:00
Alex Dadgar
39987d5236
Merge branch 'master' into b-acl-name
2018-03-22 14:51:40 -07:00
Michael Schurter
a7f627e34c
eligbile -> eligible
2018-03-21 16:55:22 -07:00
Michael Schurter
a4f346abeb
remove spurious TODOs and FIXMEs
2018-03-21 16:55:22 -07:00
Michael Schurter
9f3086a268
test: must initialize jobResults with new func
2018-03-21 16:51:45 -07:00
Michael Schurter
e432c9af55
test: disable node drainer during tests
...
Node drainer would throw off the index checks
2018-03-21 16:51:45 -07:00
Michael Schurter
5c8c4bce2a
test: disable drain during fsm test
...
drainer was unsetting drain before fsm could read written value
2018-03-21 16:51:45 -07:00
Michael Schurter
341d87aa48
tests: use mock.BatchJob to fix tests
2018-03-21 16:51:45 -07:00
Michael Schurter
8b107acc06
mock: add BatchJob() helper
2018-03-21 16:51:45 -07:00
Michael Schurter
cb61a4bdc7
Fix linting errors
2018-03-21 16:51:45 -07:00
Alex Dadgar
640ebdaef6
fix race in drain integration tests
2018-03-21 16:51:45 -07:00
Michael Schurter
c401d5a098
Refactor assertOps into a helper func
2018-03-21 16:51:45 -07:00
Michael Schurter
187b0e1a48
Remove debug prints
2018-03-21 16:51:45 -07:00
Michael Schurter
f67eca48ac
Deregister garbage collected jobs
2018-03-21 16:51:45 -07:00
Michael Schurter
922842546c
JobNs -> NamespacedID
...
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter
8dc7d9fb6a
drainer: RegisterJob -> RegisterJobs
...
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter
3116897099
Fix deadline heap triggering
...
Chan must be buffered to avoid skipping triggering altogether
Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
Alex Dadgar
9d23c965da
fix comment
2018-03-21 16:51:45 -07:00
Alex Dadgar
fb4badf1bc
sharding
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d91b9dfba
Batch drain update
2018-03-21 16:51:44 -07:00
Alex Dadgar
92b636dd32
Fix deadline handling
2018-03-21 16:51:44 -07:00
Michael Schurter
9898edfa90
Switch to drainerv2 impl
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
ad80e655cc
code review
2018-03-21 16:51:44 -07:00
Alex Dadgar
11f9fe4960
spelling fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
bc7385812d
Comments
2018-03-21 16:51:44 -07:00
Alex Dadgar
e87c677a42
handle empty node case
2018-03-21 16:51:44 -07:00
Alex Dadgar
405dab2253
integration test and basic fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
e63bcb474d
Drainer
2018-03-21 16:51:44 -07:00
Alex Dadgar
4754366640
job watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
504bfabb4d
Node's being untracked or having updated deadlines, updates the deadliner
2018-03-21 16:51:44 -07:00
Alex Dadgar
66eaaa6a4d
node watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
527ac0b39d
drain heap
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d4c193a0a
Initial design
2018-03-21 16:51:44 -07:00
Alex Dadgar
33ca319080
System test runs on mac
2018-03-21 16:51:44 -07:00
Alex Dadgar
f8d4a3a9e6
Fix file names
2018-03-21 16:51:44 -07:00
Michael Schurter
32a7649359
refactor main drainloop into 2 more methods
2018-03-21 16:51:44 -07:00
Michael Schurter
5e52f84bb7
drainer: refactor newStopAllocs, applyMigrations
2018-03-21 16:51:44 -07:00
Michael Schurter
62960ed7bd
client: don't monitor health of non-service jobs
...
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
a37329189a
Improve DeadlineTime helper
2018-03-21 16:51:44 -07:00
Michael Schurter
b7c993f0e5
drainer: convert fsm errors to go errors
2018-03-21 16:51:44 -07:00
Michael Schurter
ab0de41884
drainer: factor job & node watchers out of drainer.go
2018-03-21 16:51:44 -07:00
Michael Schurter
5922aef623
Restart every time SetEnabled(true) is called
2018-03-21 16:51:44 -07:00
Michael Schurter
959d447d38
Remove unused context
2018-03-21 16:51:44 -07:00
Michael Schurter
8b41e9b2e1
drainer: drainer should shutdown with server
2018-03-21 16:51:44 -07:00
Michael Schurter
0a17076ad2
refactor drainer into a subpkg
2018-03-21 16:51:44 -07:00
Alex Dadgar
93871c18f8
Fix retaining the drain
2018-03-21 16:51:44 -07:00
Alex Dadgar
010a6b8ca5
Unblock evals once eligible
2018-03-21 16:51:44 -07:00
Alex Dadgar
8289cc3c6f
HTTP and API
2018-03-21 16:51:44 -07:00
Alex Dadgar
0fba0101b6
RPC/FSM/State Store for Eligibility
2018-03-21 16:51:44 -07:00
Alex Dadgar
b3d2346419
Upgrade path
2018-03-21 16:51:43 -07:00
Alex Dadgar
2f5309d82a
Remove update time
2018-03-21 16:51:43 -07:00
Alex Dadgar
0965c9ed28
Fix tests
2018-03-21 16:51:43 -07:00
Alex Dadgar
010228577e
Drain cli, api, http
2018-03-21 16:51:43 -07:00
Alex Dadgar
e459a666ed
Node.Drain takes strategy
2018-03-21 16:49:48 -07:00
Michael Schurter
03d0e5b8a0
improve drain fsm/statestore tests
2018-03-21 16:49:48 -07:00
Michael Schurter
d1ec65d765
switch to new raft DesiredTransition message
2018-03-21 16:49:48 -07:00
Michael Schurter
acf59ee75e
drainer: switch to job based watching
2018-03-21 16:49:48 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransistion
...
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter
c0542474db
drain: initial drainv2 structs and impl
2018-03-21 16:49:48 -07:00
Chelsea Komlo
6fc9231dac
Merge pull request #3856 from hashicorp/f-client-add-health-checks
...
Client driver health checks for Docker
2018-03-21 18:05:00 -04:00
Chelsea Holland Komlo
66e44cdb73
Allow TLS configurations for HTTP and RPC connections to be reloaded separately
2018-03-21 17:51:08 -04:00
Preetha
01898b2c25
Merge pull request #4007 from hashicorp/f-show-rescheduling-cli-job-status
...
Show a section on upcoming delayed evaluations when applicable
2018-03-21 14:37:38 -05:00
Chelsea Holland Komlo
f801709a0a
fix issue when updating node events
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
86b7b3d2d9
fix up health check logic comparison; add node events to client driver checks
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d8f68e5ef8
fix up codereview feedback
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c7fd0bd8a1
fix up scheduler mocks
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
c50d02ae93
go style; update comments
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
a522da6994
fix up gofmt
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3aa726baab
fix scheduler driver name; create node structs file
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3cba95e8a7
allow nomad to schedule based on the status of a client driver health check
...
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
0bde357731
add concept of health checks to fingerprinters and nodes
...
fix up feedback from code review
add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha
17f2f52f08
Merge pull request #3979 from hashicorp/b_update_compat_delete
...
Delete compatibility code for job level update stanza
2018-03-21 09:17:01 -05:00
Michael Schurter
70c370c6fe
Merge pull request #4003 from jrasell/f_gh_3988
...
Allow Nomads Consul health check names to be configurable.
2018-03-20 16:44:08 -07:00
James Rasell
121c3bc997
Update Consul check params from using health-check to check.
2018-03-20 16:03:58 +01:00
Preetha Appan
31a3c81c3b
Show a section on upcoming delayed evaluations when applicable
2018-03-19 21:42:37 -05:00
Preetha Appan
33a5a72323
Make suggested interval round to seconds, and more end to end test cases
2018-03-19 14:56:52 -05:00
James Rasell
15afef9b77
Allow Nomads Consul health checks to be configurable.
...
This change allows the client HTTP and the server HTTP, Serf and
RPC health check names within Consul to be configurable with the
defaults as previous. The configuration can be done via either a
config file or using CLI flags.
Closes #3988
2018-03-19 19:37:56 +01:00
Alex Dadgar
9e05c9a50e
Merge pull request #3997 from hashicorp/b-serf-addr
...
RPC Advertise used exclusively for Clients
2018-03-19 09:30:20 -07:00
Alex Dadgar
2baa1c38f2
clarify comment
2018-03-16 16:47:08 -07:00
Alex Dadgar
b8607ad6d6
Heartbeat uses client rpc advertise and server defaults server rpc advertise addr
2018-03-16 16:47:08 -07:00
Alex Dadgar
52b7fb5361
Separate client and server rpc advertise addresses
2018-03-16 16:47:08 -07:00
Michael Schurter
c3e8f6319c
gofmt -s (simplify) files
2018-03-16 16:31:16 -07:00
Alex Dadgar
b3ab063132
Merge pull request #3992 from hashicorp/f-vault-orphan
...
Allow and recommend Orphaned Vault tokens
2018-03-16 10:59:54 -07:00
Alex Dadgar
6a44e6092f
Pull snapshotting out of loop
2018-03-16 10:54:26 -07:00
Alex Dadgar
7545c0053e
job gc uses batch endpoint
2018-03-16 10:53:03 -07:00
Alex Dadgar
586ae36d13
Batch Deregister RPC
2018-03-16 10:53:03 -07:00
Alex Dadgar
c152774997
Allow and recommend Orphaned Vault tokens
...
This PR removes enforcement that the Vault token role disallows orphaned
tokens and recommends orphaned tokens to simplify the
bootstrapping/upgrading of Nomad clusters. The requirement that Nomad's
Vault token never expire and be shared by all instances of Nomad servers
is not operationally friendly.
2018-03-15 15:32:08 -07:00
Alex Dadgar
fc782d5942
List unblocks on summary changes
2018-03-15 10:22:03 -07:00
Alex Dadgar
85be2d99b3
Drop ACL todo
2018-03-14 16:41:46 -07:00
Alex Dadgar
3537c73289
Merge pull request #3978 from hashicorp/b-core-sched
...
Always add core scheduler
2018-03-14 16:13:15 -07:00
Preetha Appan
56e60e5840
Fix linting warning
2018-03-14 16:12:22 -05:00
Preetha Appan
9a5e6edf1f
Rename DelayCeiling to MaxDelay
2018-03-14 16:10:32 -05:00
Preetha Appan
4193015e48
Update comment
2018-03-14 16:10:32 -05:00
Preetha Appan
3e96c6c4e0
Address more code review feedback
2018-03-14 16:10:32 -05:00
Preetha Appan
9749b7a05c
Avoids unnecessary timer object creation
2018-03-14 16:10:32 -05:00
Preetha Appan
9fed0d2103
Get reschedule policy from the alloc directly
2018-03-14 16:10:32 -05:00
Preetha Appan
4d5e9bcb45
Extra comments, remove unnecessary if condition
2018-03-14 16:10:32 -05:00
Preetha Appan
c6f333c90f
Move delayheap to lib package
2018-03-14 16:10:32 -05:00
Preetha Appan
1ab8f2b57a
Address some code review comments
2018-03-14 16:10:32 -05:00
Preetha Appan
7887f39ff4
Added a delay heap to track evals with WaitUntil set, and use in eval broker
2018-03-14 16:10:32 -05:00
Preetha Appan
342c3fb961
Added FollowupEvalID field and helper methods to calculate reschedule eligibility based on delay
2018-03-14 16:10:32 -05:00
Preetha Appan
87538fc87d
Fix formatting
2018-03-14 16:10:32 -05:00
Preetha Appan
51ec6ec15e
Formatting and linting fixes
2018-03-14 16:10:32 -05:00
Preetha Appan
5f50c3d618
Add new reschedule options to API layer and unit tests
2018-03-14 16:10:32 -05:00
Preetha Appan
10c9662222
New delayed rescheduling options, validation function and unit tests
2018-03-14 16:10:32 -05:00
Alex Dadgar
92cb552ff6
Always add core scheduler and detect invalid schedulers
2018-03-14 10:53:27 -07:00
Alex Dadgar
55e4f5cdc4
Require core scheduler
2018-03-14 10:37:49 -07:00
Preetha Appan
948d917a60
lint warning fixed
2018-03-14 11:30:09 -05:00
Preetha Appan
a924183604
Remove compat code for upgrade stanza that copied state from job level update stanza
2018-03-14 10:21:46 -05:00
Chelsea Komlo
810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
...
Add node events
2018-03-14 08:42:55 -04:00
Preetha
360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
...
Make server side error messages from vault more clearer
2018-03-13 20:49:39 -05:00
Preetha Appan
7b5955826d
Fix lint warning
2018-03-13 20:49:01 -05:00
Alex Dadgar
de6ebb6e6c
small cleanup
2018-03-13 18:08:22 -07:00
Alex Dadgar
63e14b7d63
nodeevents -> events
2018-03-13 18:08:22 -07:00
Alex Dadgar
d3c3deffad
fixes
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
b41501e442
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
8f109c344c
make check fixes
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
19ef872769
keep state store functions in one file
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8655320fd
fix up go check warnings
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8bcbd81e6
batch submitting node events
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo
d30c269fbe
code review feedback
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo
0f306aa0dd
move all structs to structs file
2018-03-13 18:05:40 -07:00