This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.
This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.
A good old-fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.
Passes with -race.
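Roughly, the added wait looks like the sketch below (illustrative only;
`s`, `nodeID`, and the failure message are not the test's exact code),
using the repo's `testutil.WaitForResult` helper:

```go
testutil.WaitForResult(func() (bool, error) {
	// Look up the node in the server's state store; nil means the
	// client has not registered yet.
	node, err := s.State().NodeByID(nil, nodeID)
	if err != nil {
		return false, err
	}
	if node == nil {
		return false, fmt.Errorf("node %q not found yet", nodeID)
	}
	return true, nil
}, func(err error) {
	t.Fatalf("error waiting for client to register: %v", err)
})
```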
The values will rarely change, especially since any change would be a
backward-incompatible change. As such, it's simpler to keep syncing
them manually on the rare occasion they do change and avoid the
overhead of code that syncs them automatically.
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).
Here, we add a code generator that imports the relevant constants from
`nomad/structs`.
I considered using this approach for other structs, but didn't find a
quick, viable way to reduce duplication. `nomad/structs` uses values as
struct fields (e.g. `string`), while `api` uses pointers (e.g.
`*string`) instead. Also, the `api` structs sometimes contain deprecated
fields or additional documentation, so a simple copy-paste doesn't work.
For these reasons, I opted to keep the status quo.
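To illustrate the shape mismatch (simplified, not the real definitions):

```go
package sketch

// nomad/structs flavor: plain value fields.
type StructsExample struct {
	Region string
}

// api flavor: pointer fields so callers can distinguish "unset" from the
// zero value, sometimes alongside deprecated fields or extra documentation,
// which is why a straight copy-paste between the packages doesn't work.
type APIExample struct {
	Region *string
}
```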
The race was test-only and due to unlocked map access.
The panic was test-only and due to checking a field on a struct even
when we knew the struct was nil.
Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:439 +0x0
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402 +0xe6
  github.com/hashicorp/nomad/testutil.WaitForResultRetries()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30 +0x5a
  github.com/hashicorp/nomad/testutil.WaitForResult()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22 +0x57
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401 +0xb53
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous write at 0x00c000697dd0 by goroutine 569:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:549 +0x0
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224 +0x2eb
  github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394 +0x29a
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234 +0x593
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Goroutine 768 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x650
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:1150 +0x221

Goroutine 569 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269
==================
```
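One way to address the test-only race, assuming the dispatcher's map is
guarded by a lock the test can take (field and key names here are
illustrative, not the actual struct):

```go
testutil.WaitForResult(func() (bool, error) {
	// Read the tracked map under the dispatcher's lock instead of
	// racing with PeriodicDispatch.Add on the leader goroutine.
	s1.periodicDispatcher.l.RLock()
	_, ok := s1.periodicDispatcher.tracked[job.ID]
	s1.periodicDispatcher.l.RUnlock()
	return ok, nil
}, func(err error) {
	t.Fatalf("periodic job not restored: %v", err)
})
```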
Similar to previous commits, the delayed eval update chan was set and
accessed from different goroutines, causing a race. Passing the chan on
the stack resolves the race.
Race output from `go test -race -run 'Server_RPC$'` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708 +0x3dc
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174 +0xc4
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718 +0x1fd
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c000339150 by goroutine 73:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771 +0x176

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 73 (running) created at:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170 +0x173
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207 +0x355
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
```
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.
The race can be prevented by having SetEnabled pass updateCh to run; a
sketch of the pattern follows the race output below.
Race detector output from `go test -race -run TestServer_RPC` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468 +0x256
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724 +0x267
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131 +0x3c
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163 +0x4dd
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c0001d3f48 by goroutine 515:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338 +0x177

Goroutine 75 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 515 (running) created at:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176 +0x1bc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231 +0x582
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
```
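The sketch below condenses the pattern used for both of these fixes on
a stripped-down stand-in type (the fields and their shapes are
illustrative, not Nomad's actual structs): the watcher goroutine
receives the channel as an argument instead of re-reading a struct
field that SetEnabled may rewrite concurrently.

```go
package sketch

import (
	"context"
	"sync"
)

// periodicDispatch is a stripped-down stand-in, just enough to show the
// pass-the-channel-on-the-stack pattern.
type periodicDispatch struct {
	l        sync.Mutex
	enabled  bool
	ctx      context.Context
	updateCh chan struct{}
}

func (p *periodicDispatch) SetEnabled(enabled bool) {
	p.l.Lock()
	defer p.l.Unlock()

	wasEnabled := p.enabled
	p.enabled = enabled
	if !wasEnabled && enabled {
		updateCh := make(chan struct{}, 1)
		// Other methods still signal through p.updateCh under p.l ...
		p.updateCh = updateCh
		// ... but the watcher goroutine gets the channel on the stack and
		// never reads the struct field, so there is no unsynchronized access.
		go p.run(p.ctx, updateCh)
	}
}

func (p *periodicDispatch) run(ctx context.Context, updateCh <-chan struct{}) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-updateCh:
			// Recompute the next launch time, etc.
		}
	}
}
```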
IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected, and there is no short-term plan to add
detection. IOPS is too simplistic a unit to define the performance
requirements of the underlying storage system. In its current state it
adds unnecessary confusion and can be removed without impacting any
users. This PR leaves IOPS defined at the jobspec parsing level and in
the api/ resources since these are the two public uses of the field.
They should be considered deprecated and only exist to give users time
to stop using them during the Nomad 0.9.x release. In the future, there
should be no expectation that the field will exist.
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not yet terminated. This can be hit if the
client hasn't shut down the allocation yet or if the allocation is still
shutting down (long kill_timeout).
Fixes https://github.com/hashicorp/nomad/issues/4940
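A hedged sketch of the stricter condition (not the actual
implementation): a desired status of stop alone is not enough, the
client-reported status must be terminal too before the alloc is
GC-eligible.

```go
package sketch

import "github.com/hashicorp/nomad/nomad/structs"

// gcEligible is a hypothetical helper: an alloc only becomes GC-eligible once
// both the server-desired status and the client-reported status are terminal.
func gcEligible(a *structs.Allocation) bool {
	serverTerminal := a.DesiredStatus == structs.AllocDesiredStatusStop ||
		a.DesiredStatus == structs.AllocDesiredStatusEvict

	// The client may still be shutting the alloc down (e.g. a long
	// kill_timeout), so wait for a terminal client status as well.
	clientTerminal := a.ClientStatus == structs.AllocClientStatusComplete ||
		a.ClientStatus == structs.AllocClientStatusFailed ||
		a.ClientStatus == structs.AllocClientStatusLost

	return serverTerminal && clientTerminal
}
```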
The `currentExpiration` field is accessed from multiple goroutines
(Stats and renewal), so it needs locking.
I don't anticipate high contention, so a simple mutex suffices.
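A minimal sketch of the locking on an illustrative stand-in type:

```go
package sketch

import (
	"sync"
	"time"
)

// renewalState is an illustrative stand-in for the relevant fields.
type renewalState struct {
	lock              sync.Mutex
	currentExpiration time.Time
}

// extendExpiration is called from the renewal goroutine.
func (r *renewalState) extendExpiration(ttlSeconds int) {
	r.lock.Lock()
	r.currentExpiration = time.Now().Add(time.Duration(ttlSeconds) * time.Second)
	r.lock.Unlock()
}

// expiration is what Stats (or anything else) reads instead of the raw field.
func (r *renewalState) expiration() time.Time {
	r.lock.Lock()
	defer r.lock.Unlock()
	return r.currentExpiration
}
```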
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.
TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
Keep attempting to renew the Vault token past the locally recorded
expiry, in case the token was renewed out of band (e.g. on another
Nomad server), until Vault returns an unrecoverable error.
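A sketch of the intended behaviour, with invented helpers (`stopCh`,
`interval`, `renewSelf`, `isUnrecoverable` are placeholders, not the
real code):

```go
package sketch

import "time"

// renewLoop keeps renewing until told to stop or until the error is one
// Vault reports as unrecoverable.
func renewLoop(stopCh <-chan struct{}, interval time.Duration,
	renewSelf func() error, isUnrecoverable func(error) bool) {

	for {
		select {
		case <-stopCh:
			return
		case <-time.After(interval):
			err := renewSelf()
			if err == nil {
				continue
			}
			if isUnrecoverable(err) {
				// e.g. Vault says the token no longer exists.
				return
			}
			// Recoverable error, or we are simply past the locally recorded
			// expiry: keep trying, since the token may have been renewed out
			// of band by another Nomad server.
		}
	}
}
```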
The stats field seems like a micro-optimization that doesn't justify
the complexity it introduces. Remove it and compute the stats from the
`revoking` field directly.
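Roughly the shape of the change, on a stand-in type with illustrative
names: the stat is derived from the map on demand instead of being kept
in a separate counter that has to stay in sync.

```go
package sketch

import (
	"strconv"
	"sync"
)

// revocationState is an illustrative stand-in for the server's Vault client.
type revocationState struct {
	revLock  sync.Mutex
	revoking map[string]struct{} // accessors pending revocation
}

// Stats derives the count from the map on demand.
func (v *revocationState) Stats() map[string]string {
	v.revLock.Lock()
	defer v.revLock.Unlock()

	return map[string]string{
		"tracked_for_revoke": strconv.Itoa(len(v.revoking)),
	}
}
```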
Vault's RenewSelf(...) API may return (nil, nil), i.e. no secret and no
error. We failed to check whether the secret was nil before attempting
to use it.
RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)
Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)
If anyone has an idea on how to test this, let me know; I didn't see
any options. We use a real Vault service, so there's no opportunity to
mock the response.
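A sketch of the guard, assuming a Vault API client `client` and a
renewal increment `increment` (the helper itself is hypothetical):

```go
package sketch

import (
	"fmt"
	"time"

	vaultapi "github.com/hashicorp/vault/api"
)

// renewOnce returns the granted TTL, guarding against a nil secret.
func renewOnce(client *vaultapi.Client, increment int) (time.Duration, error) {
	secret, err := client.Auth().Token().RenewSelf(increment)
	if err != nil {
		return 0, err
	}
	// RenewSelf can legitimately return (nil, nil); check before
	// dereferencing instead of panicking.
	if secret == nil || secret.Auth == nil {
		return 0, fmt.Errorf("renewal failed: Vault returned an empty secret")
	}
	return time.Duration(secret.Auth.LeaseDuration) * time.Second, nil
}
```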
This adds constraints for asserting that a given attribute or value
exists or does not exist. They act as companions to the `=` and `!=`
operators, e.g.:
```hcl
constraint {
attribute = "${attrs.type}"
operator = "!="
value = "database"
}
constraint {
attribute = "${attrs.type}"
operator = "is_set"
}
```
This test expects 11 repeats of the same message emitted at 200ms
intervals, so it needs more than 2 seconds, plus headroom for sleep
timing variations and the like. Raising the timeout to 3s here should
be enough.
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon further investigation, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log entry finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, deregistering any single job in the batch incremented the snapshot index, prematurely allowing the job-deregister evals to be processed. When a Nomad server received an eval for a job in the batch that was yet to be deleted, we could accidentally re-run it depending on the state of its allocations.
This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking processing of the related evals until deregistration completes.
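A rough sketch of the idea using go-memdb directly (table names and the
helper are placeholders, not Nomad's actual state store code):

```go
package sketch

import memdb "github.com/hashicorp/go-memdb"

// applyBatchDeregister deletes every job in the batch and inserts the
// corresponding evals inside one write transaction, so readers never observe
// a partially applied batch.
func applyBatchDeregister(db *memdb.MemDB, jobs, evals []interface{}) error {
	txn := db.Txn(true) // single write transaction for the whole batch
	defer txn.Abort()   // no-op once Commit has run

	for _, job := range jobs {
		if err := txn.Delete("jobs", job); err != nil {
			return err
		}
	}
	for _, eval := range evals {
		if err := txn.Insert("evals", eval); err != nil {
			return err
		}
	}

	// Nothing is visible to readers until the whole batch commits.
	txn.Commit()
	return nil
}
```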
The old logic for cancelling duplicate blocked evaluations by job ID
had an issue: the newer evaluation could be (in)eligible for additional
node classes that we would not capture. Cluster state could then change
in a way that would let the job make progress, yet no evaluation would
be unblocked.
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment, regardless
of whether the task group had finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue prior to this commit, create a job with two task groups. The
first group has count 1 and resources such that it cannot be placed.
The second group has count 3, max_parallel=1, and can be placed. Run
this first and then update the second group to do a deployment. It will
place the first of three, but never progress, since a blocked eval
exists; however, that eval doesn't capture the fact that there are two
groups being deployed.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special-cases network resource
asks, handling them separately from the others (cpu/memory/disk/iops).
* Migrated all of the old leader task tests and got them passing
* Refactored and consolidated the task killing code in AR to always
  kill leader tasks first
* Fixed lots of issues with state restoring
* Fixed a deadlock in AR.Destroy if AR.Run had never been called
* Added a new in-memory statedb for testing
Denormalize jobs in AppendAlloc:
AppendAlloc was originally only ever called for in-place upgrades and
new allocations. Both of these code paths would remove the job from the
allocation. We now also use it to add fields such as FollowupEvalID, a
code path which did not normalize the job. This is only a performance
enhancement.
Ignore terminal allocs:
Failed allocations are annotated with the followup eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check whether allocations fit, these terminal allocations were
not filtered out. This could result in the plan being rejected because
the node would be overcommitted if the terminal allocations' resources
were counted.
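A hypothetical helper showing the filtering idea: only non-terminal
allocations should count toward a node's used resources when checking
plan fit.

```go
package sketch

import "github.com/hashicorp/nomad/nomad/structs"

// nonTerminalAllocs drops allocations whose status is terminal before the
// fit check, so a node isn't treated as overcommitted by allocations that
// are already stopping or stopped.
func nonTerminalAllocs(allocs []*structs.Allocation) []*structs.Allocation {
	filtered := make([]*structs.Allocation, 0, len(allocs))
	for _, alloc := range allocs {
		if alloc.TerminalStatus() {
			continue
		}
		filtered = append(filtered, alloc)
	}
	return filtered
}
```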
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting
the greatest index. This undermined the checks we had in place to
reduce the number of evaluations created when the latest eval index was
greater than any alloc change.
2. Fix an issue where the blocking query code was using an incorrect
index, such that the index was higher than necessary.
3. Special-case the handling of blocked evaluations, since their
create/snapshot index is not particularly useful given that they can be
reblocked.
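For fix (1), the intent is simply to take the maximum index across the
job's evaluations rather than whichever happened to be read last
(hypothetical helper, not the actual scheduler code):

```go
package sketch

import "github.com/hashicorp/nomad/nomad/structs"

// latestEvalIndex returns the greatest modify index across the evaluations.
func latestEvalIndex(evals []*structs.Evaluation) uint64 {
	var max uint64
	for _, eval := range evals {
		if eval.ModifyIndex > max {
			max = eval.ModifyIndex
		}
	}
	return max
}
```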