open-nomad

Commit Graph

Author	SHA1	Message	Date
Preetha Appan	aa2b4b4e00	Undo removal of node drain compat changes Decided to remove that in 0.10	2019-07-01 15:12:01 -05:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	ff1b80dba6	Fix node drain test	2019-06-26 16:12:07 -05:00
Preetha Appan	23319e04d6	Restore accidentally deleted block	2019-06-26 13:59:14 -05:00
Michael Schurter	69ba495f0c	nomad: expand comments on subtle plan apply behaviors	2019-06-26 08:49:24 -07:00
Preetha Appan	66fa6a67ec	newline	2019-06-25 19:41:09 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Michael Schurter	e4bc943a68	nomad: SnapshotAfter -> SnapshotMinIndex Rename SnapshotAfter to SnapshotMinIndex. The old name was not technically accurate. SnapshotAtOrAfter is more accurate, but wordy and still lacks context about what precisely it is at or after (the index). SnapshotMinIndex was chosen as it describes the action (snapshot), a constraint (minimum), and the object of the constraint (index).	2019-06-24 12:16:46 -07:00
Michael Schurter	0f8164b2f1	nomad: evaluate plans after previous plan index The previous commit prevented evaluating plans against a state snapshot which is older than the snapshot at which the plan was created. This is correct and prevents failures trying to retrieve referenced objects that may not exist until the plan's snapshot. However, this is insufficient to guarantee consistency if the following events occur: 1. P1, P2, and P3 are enqueued with snapshot @ 100 2. Leader evaluates and applies Plan P1 with snapshot @ 100 3. Leader evaluates Plan P2 with snapshot+P1 @ 100 4. P1 commits @ 101 5. Leader evaluates and applies Plan P3 with snapshot+P2 @ 100 Since only the previous plan is optimistically applied to the state store, the snapshot used to evaluate a plan may not contain the N-2 plan! To ensure plans are evaluated and applied serially we must consider all previous plans' committed indexes when evaluating further plans. Therefore combined with the last PR, the minimum index at which to evaluate a plan is: min(previousPlanResultIndex, plan.SnapshotIndex)	2019-06-24 12:16:46 -07:00
Michael Schurter	e10fea1d7a	nomad: include snapshot index when submitting plans Plan application should use a state snapshot at or after the Raft index at which the plan was created otherwise it risks being rejected based on stale data. This commit adds a Plan.SnapshotIndex which is set by workers when submitting plan. SnapshotIndex is set to the Raft index of the snapshot the worker used to generate the plan. Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex. While RefreshIndex informs workers their StateStore is behind the leader's, SnapshotIndex is a way to prevent the leader from using a StateStore behind the worker's. Plan.SnapshotIndex should be considered the lower bound index for consistently handling plan application. Plans must also be committed serially, so Plan N+1 should use a state snapshot containing Plan N. This is guaranteed for plans after the first plan after a leader election. The Raft barrier on leader election ensures the leader's statestore has caught up to the log index at which it was elected. This guarantees its StateStore is at an index > lastPlanIndex.	2019-06-24 12:16:46 -07:00
Chris Baker	59fac48d92	alloc lifecycle: 404 when attempting to stop non-existent allocation	2019-06-20 21:27:22 +00:00
Preetha	586e50d1a4	Merge pull request #5841 from hashicorp/f-raft-snapshot-metrics Raft and state store indexes as metrics	2019-06-19 12:01:03 -05:00
Preetha Appan	dc0ac81609	Change interval of raft stats collection to 10s	2019-06-19 11:58:46 -05:00
Preetha Appan	104d66f10c	Changed name of metric	2019-06-17 15:51:31 -05:00
Chris Baker	e0170e1c67	metrics: add namespace label to allocation metrics	2019-06-17 20:50:26 +00:00
Preetha Appan	c54b4a5b17	Emit metrics with raft commit and apply index and statestore latest index	2019-06-14 16:30:27 -05:00
Jasmine Dahilig	ed9740db10	Merge pull request #5664 from hashicorp/f-http-hcl-region backfill region from hcl for jobUpdate and jobPlan	2019-06-13 12:25:01 -07:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Nick Ethier	1b7fa4fe29	Optional Consul service tags for nomad server and agent services (#5706) Optional Consul service tags for nomad server and agent services	2019-06-13 09:00:35 -04:00
Mahmood Ali	e31159bf1f	Prepare for 0.9.4 dev cycle	2019-06-12 18:47:50 +00:00
Nomad Release bot	4803215109	Generate files for 0.9.3 release	2019-06-12 16:11:16 +00:00
Mahmood Ali	07f2c77c44	comment DenormalizeAllocationDiffSlice applies to terminal allocs only	2019-06-12 08:28:43 -04:00
Lang Martin	fe8a4781d8	config merge maintains *HCL string fields used for duration conversion	2019-06-11 16:34:04 -04:00
Mahmood Ali	392f5bac44	Stop updating allocs.Job on stopping or preemption	2019-06-10 18:30:20 -04:00
Mahmood Ali	6c8e329819	test that stopped alloc jobs aren't modified When an alloc is stopped, test that we don't update the job found in alloc with new job that is no longer relevant for this alloc.	2019-06-10 17:14:26 -04:00
Mahmood Ali	d30c3d10b0	Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1 More test fixes	2019-06-05 19:09:18 -04:00
Mahmood Ali	87173111de	Merge pull request #5746 from hashicorp/b-no-updating-inmem-node set node.StatusUpdatedAt in raft	2019-06-05 19:05:21 -04:00
Mahmood Ali	97957fbf75	Prepare for 0.9.3 dev cycle	2019-06-05 14:54:00 +00:00
Nomad Release bot	43bfbf3fcc	Generate files for 0.9.2 release	2019-06-05 11:59:27 +00:00
Michael Schurter	073893f529	nomad: disable service+batch preemption by default Enterprise only. Disable preemption for service and batch jobs by default. Maintain backward compatibility in a x.y.Z release. Consider switching the default for new clusters in the future.	2019-06-04 15:54:50 -07:00
Michael Schurter	a8fc50cc1b	nomad: revert use of SnapshotAfter in planApply Revert plan_apply.go changes from #5411 Since non-Command Raft messages do not update the StateStore index, SnapshotAfter may unnecessarily block and needlessly fail in idle clusters where the last Raft message is a non-Command message. This is trivially reproducible with the dev agent and a job that has 2 tasks, 1 of which fails. The correct logic would be to SnapshotAfter the previous plan's index to ensure consistency. New clusters or newly elected leaders will not have a previous plan, so the index the leader was elected should be used instead.	2019-06-03 15:34:21 -07:00
Mahmood Ali	a4ead8ff79	remove 0.9.2-rc1 generated code	2019-05-23 11:14:24 -04:00
Nomad Release bot	6d6bc59732	Generate files for 0.9.2-rc1 release	2019-05-22 19:29:30 +00:00
Lang Martin	d46613ff44	structs check TaskGroup.Update for nil	2019-05-22 12:34:57 -04:00
Lang Martin	10a3fd61b0	comment replace COMPAT 0.7.0 for job.Update with more current info	2019-05-22 12:34:57 -04:00
Lang Martin	67ebcc47dd	structs comment todo DeploymentStatus & DeploymentStatusDescription	2019-05-22 12:34:57 -04:00
Lang Martin	21bf9fdf90	structs job warnings for taskgroup with mixed auto_promote settings	2019-05-22 12:34:57 -04:00
Lang Martin	0f6f543a5f	deployment_watcher auto promote iff every task group is auto promotable	2019-05-22 12:34:57 -04:00
Lang Martin	d27d6f8ede	structs validate requires Canary for AutoPromote	2019-05-22 12:32:08 -04:00
Lang Martin	0c668ecc7a	log error on autoPromoteDeployment failure	2019-05-22 12:32:08 -04:00
Lang Martin	f23f9fd99e	describe a pending deployment without auto_promote more explicitly	2019-05-22 12:32:08 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	b5fd735960	add update AutoPromote bool	2019-05-22 12:32:08 -04:00
Lang Martin	3c5a9fed22	deployments_watcher_test new TestWatcher_AutoPromoteDeployment	2019-05-22 12:32:08 -04:00
Lang Martin	0bebf5d7f8	deployment_watcher when it's ok to autopromote, do so	2019-05-22 12:32:08 -04:00
Lang Martin	0cf4168ed9	deployments_watcher comments	2019-05-22 12:32:08 -04:00
Lang Martin	0c403eafde	state_store typo in a comment	2019-05-22 12:32:08 -04:00
Lang Martin	e1e28307be	new deploymentwatcher/doc.go for package level documentation	2019-05-22 12:32:08 -04:00
Mahmood Ali	9ff5f163b5	update callers in tests	2019-05-21 21:10:17 -04:00
Mahmood Ali	6bdbeed319	set node.StatusUpdatedAt in raft Fix a case where `node.StatusUpdatedAt` was manipulated directly in memory. This ensures that StatusUpdatedAt is set in raft layer, and ensures that the field is updated when node drain/eligibility is updated too.	2019-05-21 16:13:32 -04:00
Mahmood Ali	2159d0f3ac	tests: fix some nomad/drainer test data races	2019-05-21 14:40:58 -04:00
Mahmood Ali	3b0152d778	tests: fix deploymentwatcher tests data races	2019-05-21 14:29:45 -04:00
Michael Schurter	689794e08d	nomad: fix deadlock in UnblockClassAndQuota Previous commit could introduce a deadlock if the capacityChangeCh was full and the receiving side exited before freeing a slot for the sending side could send. Flush would then block forever waiting to acquire the lock just to throw the pending update away. The race is around getting/setting the chan field, not chan operations, so only lock around getting the chan field.	2019-05-20 15:41:52 -07:00
Michael Schurter	8c99214f69	nomad: fix race in BlockedEvals I assume the mutex was being released before sending on capacityChangeCh to avoid blocking in the critical section, but: 1. This is a race. 2. capacityChangeCh has a huge buffer (8096). If it's full things already seem Very Bad, and a little backpressure seems appropriate.	2019-05-20 15:26:20 -07:00
Michael Schurter	05a9c6aedb	Merge pull request #5411 from hashicorp/b-snapshotafter Block plan application until state store has caught up to raft	2019-05-20 14:03:10 -07:00
Mahmood Ali	cd64ada95d	Run TestClientAllocations_Restart_ACL test	2019-05-17 20:30:23 -04:00
Michael Schurter	0e39927782	nomad: emit more detailed error Avoid returning context.DeadlineExceeded as it lacks helpful information and is often ignored or handled specially by callers.	2019-05-17 14:37:42 -07:00
Michael Schurter	b80a7e0feb	nomad: wait for state store to sync in plan apply Wait for state store to catch up with raft when applying plans.	2019-05-17 14:37:12 -07:00
Michael Schurter	1bc731da47	nomad: remove unused NotifyGroup struct I don't think it's been used for a long time.	2019-05-17 13:30:23 -07:00
Michael Schurter	9732bc37ff	nomad: refactor waitForIndex into SnapshotAfter Generalize wait for index logic in the state store for reuse elsewhere. Also begin plumbing in a context to combine handling of timeouts and shutdown.	2019-05-17 13:30:23 -07:00
Preetha	c8fdf20c66	Merge pull request #5717 from hashicorp/b-plan-apply-preemptions Fix bug in plan applier introduced in PR-5602	2019-05-16 11:01:05 -05:00
Preetha	2dcd4291f8	Merge pull request #5702 from hashicorp/f-filter-by-create-index Filter deployments by create index	2019-05-15 21:50:41 -05:00
Preetha	555dd23c2c	remove stray newline Co-Authored-By: Danielle <dani@builds.terrible.systems>	2019-05-15 21:11:52 -05:00
Preetha Appan	2b787aad7e	Fix bug in plan applier introduced in PR-5602 This fixes a bug in the state store during plan apply. When denormalizing preempted allocations it incorrectly set the preemptor's job during the update. This eventually causes a panic downstream in the client. Added a test assertion that failed before and passes after this fix	2019-05-15 20:34:06 -05:00
Danielle	d202582502	Merge pull request #5699 from hashicorp/dani/b-eval-broker-lifetime Eval Broker: Prevent redundant enqueues when a node is not a leader	2019-05-15 23:30:52 +01:00
Danielle Lancashire	2fb93a6229	evalbroker: test for no enqueue on disabled	2019-05-15 11:02:21 +02:00
Nick Ethier	ade97bc91f	fixup #5172 and rebase against master	2019-05-14 14:37:34 -04:00
Nick Ethier	cab6a95668	Merge branch 'master' into pr/5172 * master: (912 commits) Update redirects.txt Added redirect for Spark guide link client: log when server list changes docs: mention regression in task config validation fix update to changelog update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665 typo: "atleast" -> "at least" implement nomad exec for rkt docs: fixed typo use pty/tty terminology similar to github.com/kr/pty vendor github.com/kr/pty drivers: implement streaming exec for executor based drivers executors: implement streaming exec executor: scaffolding for executor grpc handling client: expose allocated memory per task client improve a comment in updateNetworks stalebot: Add 'thinking' as an exempt label (#5684) Added Sparrow link update links to use new canonical location Add redirects for restructing done in GH-5667 ...	2019-05-14 14:10:33 -04:00
Michael Schurter	d7e5ace1ed	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Danielle Lancashire	d9815888ed	evalbroker: Simplify nextDelayedEval locking	2019-05-14 14:06:27 +02:00
Danielle Lancashire	38562afbc1	evalbroker: No new enqueues when disabled Currently when an evalbroker is disabled, it still receives delayed enqueues via log application in the fsm. This causes an ever growing heap of evaluations that will never be drained, and can cause memory issues in larger clusters, or when left running for an extended period of time without a leader election. This commit prevents the enqueuing of evaluations while we are disabled, and relies on the leader restoreEvals routine to handle reconciling state during a leadership transition. Existing dequeues during an Enabled->Disabled broker state transition are handled by the enqueueLocked function dropping evals.	2019-05-14 13:59:10 +02:00
Danielle Lancashire	c91ae21a6c	evalbroker: Flush within update lock Primarily a cleanup commit, however, currently there is a potential race condition (that I'm not sure we've ever actually hit) during a flapping SetEnabled/Disabled state where we may never correctly restart the eval broker, if it was being called from multiple routines.	2019-05-14 13:26:56 +02:00
Preetha Appan	4d3f74e161	Fix test setup to have correct jobcreateindex for deployments	2019-05-13 18:53:47 -05:00
Preetha Appan	d448750449	Lookup job only once, and fix tests	2019-05-13 18:33:41 -05:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Jasmine Dahilig	30d346ca15	Merge pull request #5665 from hashicorp/b-empty-datacenters add non-empty string validation for datacenters	2019-05-13 10:23:26 -07:00
Mahmood Ali	cf1f3625b4	Update ugorji/go to latest Our testing so far indicates that ugorji/go/codec maintains backward compatibility with the version we are using now, for purposes of Nomad serialization. Using latest ugorji/go allows us to get back to using upstream library, and get the optimization benefits in RPC paths (including code generation optimizations). ugorji/go introduced two significant changes: * time binary format in `debb8e2d2e`. Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior * ugorji/go started honoring `json` tag as well: v1.1.4 is the latest but has a bug in handling RawString that's fixed in `d09a80c1e0`.	2019-05-09 19:35:58 -04:00
Mahmood Ali	919827f2df	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	3c668732af	server: server forwarding logic for nomad exec endpoint	2019-05-09 16:49:08 -04:00
Jasmine Dahilig	0ba2bd15b9	add unit tests for datacenter non-empty string validation	2019-05-08 11:51:52 -07:00
Mahmood Ali	9d3f13e9b3	remove Index field from EmitNodeEventsResponse `Index` is already included as part of `WriteMeta` embedding. This is a backward compatible change: Clients never read the field; and Server references to `EmitNodeEventsResponse.Index` would be using the value in `WriteMeta`, which is consistent with other response structs.	2019-05-08 08:42:26 -04:00
Preetha	1538913a2a	Merge pull request #5628 from hashicorp/f-preemption-config Add config to disable preemption for batch/service jobs	2019-05-06 15:40:35 -05:00
Mahmood Ali	f35ad92a8b	Merge pull request #5646 from hashicorp/some-ugorji-fixes Codegen codec helpers for all nomad structs	2019-05-06 13:23:12 -04:00
Lang Martin	9f3f11df97	Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl config parse direct hcl	2019-05-06 12:05:19 -04:00
Mahmood Ali	92c133b905	Update peers info with new raft config details	2019-05-03 16:55:53 -04:00
Preetha Appan	ad3c263d3f	Rename to match system scheduler config. Also added docs	2019-05-03 14:06:12 -05:00
Jasmine Dahilig	016495c368	add non-empty string validation for datacenters	2019-05-03 06:48:02 -07:00
Hemanth Basappa	3fef02aa93	Add support in nomad for supporting raft 3 protocol peers.json	2019-05-02 09:11:23 -07:00
Mahmood Ali	21d21baf8b	codegen codecs for nomad structs `ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of the letter inside `[]`), including `structs.go`!	2019-05-01 12:42:55 -04:00
Lang Martin	598112a1cc	tag HCL bookkeeping keys with json:"-" to keep them out of the api	2019-04-30 10:29:14 -04:00
Lang Martin	5ebae65d1a	agent/config, config/* mapstructure tags -> hcl tags	2019-04-30 10:29:14 -04:00
Preetha Appan	6615d5c868	Add config to disable preemption for batch/service jobs	2019-04-29 18:48:07 -05:00
Lang Martin	371014b781	Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config client fingerprinter doesn't overwrite manual configuration	2019-04-26 12:55:34 -04:00
Danielle Lancashire	3409e0be89	allocs: Add nomad alloc signal command This command will be used to send a signal to either a single task within an allocation, or all of the tasks if <task-name> is omitted. If the sent signal terminates the allocation, it will be treated as if the allocation has crashed, rather than as if it was operator-terminated. Signal validation is currently handled by the driver itself and nomad does not attempt to restrict or validate them.	2019-04-25 12:43:32 +02:00
Arshneet Singh	b7b050cdd1	Change min version required for plan optimization	2019-04-24 12:36:07 -07:00
Arshneet Singh	9cc39edb67	Return error when preempted/stopped alloc doesn't exist during denormalization	2019-04-24 12:36:07 -07:00
Lang Martin	19ba0f4882	structs_test use testify require.True instead of t.Fatal	2019-04-23 17:00:11 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Lang Martin	8aa97cff13	tests over setwise equality of fingerprinted parts	2019-04-19 15:49:24 -04:00
Lang Martin	7de6e28ddc	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	977d33970b	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	7b99488afa	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	eba4e29440	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.	2019-04-19 15:23:48 -04:00
Preetha Appan	22109d1e20	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	07ff740408	fingerprint Constraints and Affinities have Equals, as set	2019-04-11 09:56:22 -04:00
Lang Martin	8f07698c03	structs missed applying a style change from the review	2019-04-11 09:56:22 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	1878bf694e	client Networks Equals is set equality	2019-04-11 09:56:22 -04:00
Lang Martin	e1c91afd19	struct cleanup indentation in RequestedDevice Equals	2019-04-11 09:56:22 -04:00
Lang Martin	0c90efebdc	struct Equals checks for identity before value checking	2019-04-11 09:56:22 -04:00
Lang Martin	1a594b53f6	refactor struts.RequestedDevice to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	ec1ccdeda0	refactor structs.Resource.Networks to have its own Equals NodeResource.Networks uses the same function	2019-04-11 09:56:21 -04:00
Lang Martin	06008465c4	refactor structs.Resource.Devices to have its own Equals	2019-04-11 09:56:21 -04:00
Lang Martin	36f3022246	add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources	2019-04-11 09:56:21 -04:00
Lang Martin	d4567e9909	add structs.Resources Equals	2019-04-11 09:56:21 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	34e100cc96	server vault client: use two vault clients, one with namespace, one without for /sys calls	2019-04-10 10:34:10 -05:00
Michael Schurter	cc7768c170	Update nomad/structs/config/vault.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	a26d4fe1e5	docs: -vault-namespace, VAULT_NAMESPACE, and config agent: added VAULT_NAMESPACE env-based configuration	2019-04-10 10:34:10 -05:00
Chris Baker	d3041cdb17	wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation	2019-04-10 10:34:10 -05:00
Chris Baker	0eaeef872f	config/docs: added `namespace` to vault config server/client: process `namespace` config, setting on the instantiated vault client	2019-04-10 10:34:10 -05:00
Michael Schurter	c0cd96ef75	Update nomad/job_endpoint_test.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Michael Schurter	188c32421a	Update nomad/job_endpoint.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-04-10 10:34:10 -05:00
Chris Baker	0ba1600545	server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555]	2019-04-10 10:34:10 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Michael Schurter	45b4827ad7	Bump to 0.9.1-dev	2019-04-09 09:01:48 -07:00
Nomad Release bot	e307734e4a	Generate files for 0.9.0 release	2019-04-09 01:56:00 +00:00
Michael Schurter	3af602b633	Remove 0.9.0-rc2 generated files	2019-04-03 07:41:09 -07:00
Nomad Release bot	16b4336ccf	Generate files for 0.9.0-rc2 release	2019-04-03 01:54:29 +00:00
Michael Schurter	9afbc45cff	Bump to dev post-0.9.0-rc1 release	2019-03-22 08:26:30 -07:00
Nomad Release bot	3ab3dd4105	Generate files for 0.9.0-rc1 release	2019-03-21 19:06:13 +00:00
HashedDan	caad68e799	server: inconsistent receiver notation corrected Signed-off-by: HashedDan <georgedanielmangum@gmail.com>	2019-03-16 17:53:53 -05:00
Alex Dadgar	e779d9444b	Update nomad/eval_endpoint_test.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:15 -08:00
Alex Dadgar	1857f5d7c1	Update nomad/eval_endpoint.go Co-Authored-By: schmichael <michael.schurter@gmail.com>	2019-03-05 15:19:07 -08:00
Michael Schurter	e37bbb21a5	nomad: simplify code and improve parameter name	2019-03-04 13:44:14 -08:00
Michael Schurter	05f51499ba	nomad: compare current eval when setting WaitIndex Consider currently dequeued Evaluation's ModifyIndex when determining its WaitIndex. Normally the Evaluation itself would already be in the state store snapshot used to determine the WaitIndex. However, since the FSM applies Raft messages to the state store concurrently with Dequeueing, it's possible the currently dequeued Evaluation won't yet exist in the state store snapshot used by JobsForEval. This can be solved by always considering the current eval's modify index and using it if it is greater than all of the evals returned by the state store.	2019-03-01 15:23:39 -08:00
Michael Schurter	3f386e3951	Remove generated files for 0.9.0-beta3	2019-02-26 10:34:08 -08:00
Michael Schurter	d74755900e	Generate files for 0.9.0-beta3 release	2019-02-26 09:44:49 -08:00
Charlie Voiselle	604c49beb8	Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up Set NextEval when making `failed-follow-up` evals	2019-02-22 14:14:41 -08:00
Charlie Voiselle	006afdca9b	Added comments * caller should create eval id * prev/next eval used in failed-follow-up	2019-02-22 10:22:52 -08:00
Charlie Voiselle	c28c195f42	Set NextEval when making `failed-follow-up` evals This allows users to locate failed-follow-up evals more easily	2019-02-20 16:07:11 -08:00
Michael Schurter	6580ed668e	client: don't redownload completed artifacts on retries Track the download status of each artifact independently so that if only one of many artifacts fails to download, completed artifacts aren't downloaded again.	2019-02-20 08:45:12 -08:00
Michael Schurter	2db91425e3	Remove 0.9.0-beta2 generated files	2019-02-01 08:28:44 -08:00
Alex Dadgar	84d0afccae	Generate files for 0.9.0-beta2	2019-01-30 13:31:50 -08:00
Alex Dadgar	d2e5ede119	remove generated structs	2019-01-30 12:38:34 -08:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Alex Dadgar	bc804dda2e	Nomad 0.9.0-beta1 generated code	2019-01-30 10:49:44 -08:00
Preetha Appan	c848a1d387	ensure tests run a 0.9 server	2019-01-29 16:19:45 -06:00
Preetha Appan	496eb1de0c	Guard operator endpoints for minimum server version	2019-01-29 15:50:36 -06:00
Preetha Appan	7578522f58	variable name fix	2019-01-29 13:48:45 -06:00
Preetha Appan	a6cebbbf9e	Make sure that all servers are 0.9 before applying scheduler config entry	2019-01-29 12:47:42 -06:00
Michael Schurter	3aba7ee826	nomad: fix panic when no node conn found A missing return would cause a panic when a server could find no route to a client.	2019-01-28 21:55:35 -08:00
Mahmood Ali	f9164dae67	Merge pull request #5228 from hashicorp/f-vault-err-tweaks server/vault: tweak error messages	2019-01-25 11:17:31 -05:00
Mahmood Ali	f4560d8a2a	server/vault: tweak error messages Closes #5139	2019-01-25 10:33:54 -05:00
Preetha	ec92bf673c	Merge pull request #5223 from hashicorp/f-jobs-list-datacenters Add Datacenters to the JobListStub struct	2019-01-24 08:13:30 -06:00
Michael Schurter	13f061a83f	Merge pull request #5196 from hashicorp/f-plugin-utils Make plugins/shared external and make pluginutils/	2019-01-23 06:59:32 -08:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Alex Dadgar	cdcd3c929c	loader and singleton	2019-01-22 15:11:57 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Preetha Appan	38422642cb	Use DesiredState to determine whether to stop sending task events	2019-01-22 16:43:32 -06:00
Michael Lange	ce7bc4f56f	Add Datacenters to the JobsListStub struct So it can be used for filtering the full list of jobs	2019-01-22 11:16:35 -08:00
Mahmood Ali	e1803b685b	tests: deflake TestClientAllocations_GarbageCollect_Remote Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28	2019-01-19 09:07:27 -05:00
Mahmood Ali	b2203a3a22	Merge pull request #5215 from hashicorp/test-fix-garbagecollect test: fix flaky garbage collect test	2019-01-18 21:10:01 -05:00
Mahmood Ali	05e32fb525	Merge pull request #5213 from hashicorp/b-api-separate Slimmer /api package	2019-01-18 20:52:53 -05:00
Michael Schurter	0cd35ba335	test: fix flaky garbage collect test This seems to fix TestClientAllocations_GarbageCollectAll_Remote being flaky. This test confuses me. It joins 2 servers, but then goes out of its way to make sure the test client only interacts with one. There are not enough comments for me to figure out the precise assertions this test is trying to make. A good old fashioned wait-for-the-client-to-register seems to fix the flakiness though. The error was that the node could not be found, so this makes some sense. However, lots of other tests seem to use the same "wait for node" logic and don't appear to be flaky, so who knows why waiting fixes this one. Passes with -race.	2019-01-18 16:01:30 -08:00
Mahmood Ali	7bdd43f3e0	api: avoid codegen for syncing Given that the values will rarely change, especially considering that any change would be backward incompatible, it's simpler to keep syncing manually on the rare occasion and avoid the syncing code overhead.	2019-01-18 18:52:31 -05:00
Preetha Appan	510d7839e4	code review comments	2019-01-18 17:41:39 -06:00
Mahmood Ali	253532ec00	api: avoid import nomad/structs pkg nomad/structs is an internal package and imports many libraries (e.g. raft, codec) that are not relevant to api clients, and may cause unnecessary dependency pain (e.g. `github.com/ugorji/go/codec` version is very old now). Here, we add a code generator that imports the relevant constants from `nomad/structs`. I considered using this approach for other structs, but didn't find a quick viable way to reduce duplication. `nomad/structs` use values as struct fields (e.g. `string`), while `api` uses value pointer (e.g. `*string`) instead. Also, sometimes, `api` structs contain deprecated fields or additional documentation, so simple copy-paste doesn't work. For these reasons, I opt to keep the status quo.	2019-01-18 14:51:19 -05:00
Preetha Appan	be9656d195	fix linting	2019-01-17 15:36:33 -06:00
Preetha Appan	0f8a113ead	Refactor to find jobs with child instances more efficiently; also added unit tests	2019-01-17 14:29:48 -06:00
Preetha Appan	be36fee48e	Use IsParameterized/isPeriodic methods	2019-01-17 12:15:42 -06:00
Preetha Appan	81a8f18cac	Fix bug in reconcile summaries that affects periodic/parameterized jobs This fixes incorrect parent job summaries by recomputing them in the ReconcileJobSummaries method in the state store	2019-01-17 12:01:01 -06:00
Nick Ethier	597b7b751d	tr: add retry /w backoff to stats_hook failure	2019-01-12 12:18:24 -05:00
Mahmood Ali	4414a2ce1c	tests: remove tests for unsupported features With switching to driver plugins, driver validation is quite tricky and we need to do some design thinking before supporting it again.	2019-01-10 10:21:48 -05:00
Nick Wales	7a7b5da0df	Adds optional Consul service tags to nomad server and agent services, gh#4297	2019-01-09 22:02:46 +00:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00
Mahmood Ali	6f077a73dc	Fix panic on failure Error expects an odd number of arguments, and panics otherwise.	2019-01-08 12:19:44 -05:00
Michael Schurter	324e989327	Merge pull request #5034 from hashicorp/test-fix-races Test fix races	2019-01-08 07:04:09 -08:00
Alex Dadgar	79cfe26021	vet	2019-01-07 14:49:41 -08:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Nick Ethier	a96afb6c91	fix tests that fail as a result of async client startup	2018-12-20 00:53:44 -05:00
Michael Schurter	6c1dbb659d	test: fix race and nil panic in nomad/ tests Race was test only and due to unlocked map access. Panic was test only and due to checking a field on a struct even when we knew the struct was nil. Race output that was fixed: ``` ================== WARNING: DATA RACE Read at 0x00c000697dd0 by goroutine 768: runtime.mapaccess2() /usr/local/go/src/runtime/map.go:439 +0x0 github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402 +0xe6 github.com/hashicorp/nomad/testutil.WaitForResultRetries() /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30 +0x5a github.com/hashicorp/nomad/testutil.WaitForResult() /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22 +0x57 github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401 +0xb53 testing.tRunner() /usr/local/go/src/testing/testing.go:827 +0x162 Previous write at 0x00c000697dd0 by goroutine 569: runtime.mapassign() /usr/local/go/src/runtime/map.go:549 +0x0 github.com/hashicorp/nomad/nomad.(PeriodicDispatch).Add() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224 +0x2eb github.com/hashicorp/nomad/nomad.(Server).restorePeriodicDispatcher() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394 +0x29a github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234 +0x593 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Goroutine 768 (running) created at: testing.(T).Run() /usr/local/go/src/testing/testing.go:878 +0x650 testing.runTests.func1() 
/usr/local/go/src/testing/testing.go:1119 +0xa8 testing.tRunner() /usr/local/go/src/testing/testing.go:827 +0x162 testing.runTests() /usr/local/go/src/testing/testing.go:1117 +0x4ee testing.(M).Run() /usr/local/go/src/testing/testing.go:1034 +0x2ee main.main() _testmain.go:1150 +0x221 Goroutine 569 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	004fa574cb	test: fix race in eval broker update chan Similar to previous commits the delayed eval update chan was set and access from different goroutines causing a race. Passing the chan on the stack resolves the race. Race output from `go test -race -run 'Server_RPC$'` in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c000339150 by goroutine 63: github.com/hashicorp/nomad/nomad.(EvalBroker).flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708 +0x3dc github.com/hashicorp/nomad/nomad.(EvalBroker).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174 +0xc4 github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718 +0x1fd github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c000339150 by goroutine 73: github.com/hashicorp/nomad/nomad.(EvalBroker).runDelayedEvalsWatcher() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771 +0x176 Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 73 (running) created at: github.com/hashicorp/nomad/nomad.(EvalBroker).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170 +0x173 github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207 +0x355 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	1c137690c4	test: fix race around block eval chans Similar to previous commit, stop and change chans were being set and accessed from different goroutines. Passing the chans on the stack resolves the race. Output from `go test -race -run 'Server_RPC$' in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c0002b4e10 by goroutine 63: github.com/hashicorp/nomad/nomad.(BlockedEvals).Flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648 +0x32a github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0002b4e10 by goroutine 75: github.com/hashicorp/nomad/nomad.(BlockedEvals).watchCapacity() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483 +0xfe Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 75 (finished) created at: github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141 +0xba github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ================== WARNING: DATA RACE Write at 0x00c0002b4e50 by goroutine 63: github.com/hashicorp/nomad/nomad.(BlockedEvals).Flush() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649 +0x388 github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0002b4e50 by goroutine 77: github.com/hashicorp/nomad/nomad.(BlockedEvals).prune() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690 +0xae Goroutine 63 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 77 (finished) created at: github.com/hashicorp/nomad/nomad.(BlockedEvals).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142 +0xdc github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Michael Schurter	80263861aa	test: fix race around updateCh handling PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and PeriodicDispatch.run accesses updateCh in another. The race can be prevented by having SetEnabled pass updateCh to run. Race detector output from `go test -race -run TestServer_RPC` in nomad/ ``` ================== WARNING: DATA RACE Write at 0x00c0001d3f48 by goroutine 75: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468 +0x256 github.com/hashicorp/nomad/nomad.(Server).revokeLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724 +0x267 github.com/hashicorp/nomad/nomad.(Server).leaderLoop.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131 +0x3c github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163 +0x4dd github.com/hashicorp/nomad/nomad.(Server).monitorLeadership.func1() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c Previous read at 0x00c0001d3f48 by goroutine 515: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).run() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338 +0x177 Goroutine 75 (running) created at: github.com/hashicorp/nomad/nomad.(Server).monitorLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269 Goroutine 515 (running) created at: github.com/hashicorp/nomad/nomad.(PeriodicDispatch).SetEnabled() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176 +0x1bc github.com/hashicorp/nomad/nomad.(Server).establishLeadership() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231 +0x582 github.com/hashicorp/nomad/nomad.(Server).leaderLoop() /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1() 
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c ================== ```	2018-12-19 15:48:02 -08:00
Danielle Tomlinson	3647b701a6	taskrunner: Emit task events when a hook fails	2018-12-13 18:20:18 +01:00
Chris Baker	af593c401c	Merge pull request #4974 from hashicorp/b-1173-log-spam rpc accept loop: added backoff on logging	2018-12-12 16:54:42 -08:00
Chris Baker	121a9eb8cb	some changes for more idiomatic code	2018-12-12 23:11:17 +00:00
Alex Dadgar	fbe4d67d1b	fix iops related tests	2018-12-12 14:32:22 -08:00
Chris Baker	34600f8b75	fixed bug in loop delay	2018-12-12 19:16:41 +00:00
Chris Baker	89c64932c1	gofmt	2018-12-12 19:09:06 +00:00
Chris Baker	22c11d8799	improved code for readability	2018-12-12 18:52:06 +00:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Chris Baker	59beae35df	nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap)	2018-12-07 22:14:15 +00:00
Chris Baker	707bac0a7b	rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173)	2018-12-07 20:12:55 +00:00
Alex Dadgar	c918a96490	Warn if IOPS is being used	2018-12-06 16:17:09 -08:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Alex Dadgar	14a61ea3ea	Don't GC running but desired stop allocations This PR fixes an edge case where we could GC an allocation that was in a desired stop state but had not terminated yet. This can be hit if the client hasn't shutdown the allocation yet or if the allocation is still shutting down (long kill_timeout). Fixes https://github.com/hashicorp/nomad/issues/4940	2018-12-05 13:01:12 -08:00
Mahmood Ali	adb4d69576	Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup server/vault: Lock Vault expiration tracking	2018-12-04 19:46:59 -05:00
Mahmood Ali	50e38104a5	server/nomad: Lock Vault expiration tracking `currentExpiration` field is accessed in multiple goroutines: Stats and renewal, so needs locking. I don't anticipate high contention, so simple mutex suffices.	2018-12-04 09:29:48 -05:00
Preetha Appan	8656d3379f	Add guards around subtracting summary count	2018-12-03 11:16:35 -06:00
Danielle Tomlinson	51a9f7369e	Merge pull request #4936 from hashicorp/f-legacy-refactor Refactor and repackage client/driver	2018-11-30 13:38:06 +01:00
Danielle Tomlinson	d4cbd608ff	nomad: Remove on-submission job validation With the introduction of driver plugins, we're temporarily relying on _run time validation_ of driver configurations, rather than submission time.	2018-11-30 10:47:08 +01:00
Nick Ethier	80ae7e34f4	Merge pull request #4906 from hashicorp/f-metric-prefix-master Port metric prefix filtering to master	2018-11-29 22:27:47 -05:00
Nick Ethier	b1484aec33	nomad: fix hclog usage	2018-11-29 22:27:39 -05:00
Mahmood Ali	0a2611e41f	vault: protect against empty Vault secret response Also, fix a case where a successful second attempt of loading token can cause a panic.	2018-11-29 09:34:17 -05:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Nick Ethier	95362eaa02	Merge pull request #4844 from hashicorp/f-docker-plugin Docker driver plugin	2018-11-20 20:43:03 -05:00
Mahmood Ali	2e6133fd33	nil secrets as recoverable to keep renew attempts	2018-11-20 17:11:55 -05:00
Mahmood Ali	5827438983	Renew past recorded expiry till unrecoverable error Keep attempting to renew Vault token past locally recorded expiry, just in case the token was renewed out of band, e.g. on another Nomad server, until Vault returns an unrecoverable error.	2018-11-20 17:10:55 -05:00
Mahmood Ali	5836a341dd	fix typo	2018-11-20 17:10:55 -05:00
Mahmood Ali	93add67e04	round ttl duration for users	2018-11-20 17:10:55 -05:00
Mahmood Ali	4a0544b369	Track renewal expiration properly	2018-11-20 17:10:55 -05:00
Mahmood Ali	79aa934a4b	reconcile interface	2018-11-20 17:10:55 -05:00
Mahmood Ali	6efea6d8fc	Populate agent-info with vault Return Vault TTL info to /agent/self API and `nomad agent-info` command.	2018-11-20 17:10:55 -05:00
Mahmood Ali	6034af5084	Avoid explicit precomputed stats field Seems like the stats field is a micro-optimization that doesn't justify the complexity it introduces. Removing it and computing the stats from revoking field directly.	2018-11-20 17:10:54 -05:00
Mahmood Ali	14842200ec	More metrics for Server vault Add a gauge to track remaining time-to-live, duration of renewal request API call.	2018-11-20 17:10:54 -05:00
Mahmood Ali	e1994e59bd	address review comments	2018-11-20 17:10:54 -05:00
Mahmood Ali	35179c9655	Wrap Vault API api errors for easing debugging	2018-11-20 17:10:54 -05:00
Mahmood Ali	55456fc823	Set a 1s floor for Vault renew operation backoff	2018-11-20 17:10:54 -05:00
Mahmood Ali	7ad8f6c103	Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter Fix a panic related to batch GC	2018-11-20 15:16:02 -05:00
Mahmood Ali	6281700c0c	address review comments	2018-11-20 13:21:39 -05:00
Nick Ethier	5c5cae79ab	nomad: only lookup job if disable_dispatched_job_summary_metrics is set	2018-11-19 23:22:23 -05:00
Nick Ethier	8ac69f440d	nomad: lookup job instead of adding Dispatched to summary	2018-11-19 23:22:02 -05:00
Nick Ethier	85b221a1d6	nomad: add flag to disable publishing of job_summary metrics for dispatched jobs	2018-11-19 23:21:19 -05:00
Nick Ethier	29591a7c2e	task_runner: emit event on task exit with exit result details	2018-11-19 22:59:17 -05:00
Mahmood Ali	d744e71fa9	add a missing no-error assertion	2018-11-19 21:44:00 -05:00
Mahmood Ali	b93643cd96	Fix a panic related to batch GC `deleteJobVersions` does concurrent modifications to iterated items while iterating, by deleting job versions while it's iterating on them,	2018-11-19 20:59:45 -05:00
Mahmood Ali	bff9c3b3e9	Reproduce a panic related to batch GC Test case that reproduces a panic with the following stacktrace: ``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715] goroutine 35 [running]: testing.tRunner.func1(0xc0001e2200) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387 panic(0x167e400, 0x1c43a30) /usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5 github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(radixIterator).Next(0xc0003a0420, 0x1756059, 0xb) /go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e github.com/hashicorp/nomad/nomad/state.(StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1 github.com/hashicorp/nomad/nomad/state.(StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100) /go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2 github.com/hashicorp/nomad/nomad/state.(StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0) /go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79 github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200) 
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685 testing.tRunner(0xc0001e2200, 0x1777138) /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf created by testing.(T).Run /usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353 ```	2018-11-19 20:58:32 -05:00
Michael Schurter	56ed4f01be	vault: fix panic by checking for nil secret Vault's RenewSelf(...) API may return (nil, nil). We failed to check if secret was nil before attempting to use it. RenewSelf: `e3eee5b4fb/api/auth_token.go (L138-L155)` Calls ParseSecret: `e3eee5b4fb/api/secret.go (L309-L311)` If anyone has an idea on how to test this I didn't see any options. We use a real Vault service, so there's no opportunity to mock the response.	2018-11-19 17:07:59 -08:00
Danielle Tomlinson	8bf17fe22d	Merge pull request #4875 from hashicorp/f-constraints scheduler: Make != constraints more flexible	2018-11-15 11:04:21 -08:00
Danielle Tomlinson	9c72dafc95	scheduler: Add is_set/is_not_set constraints This adds constraints for asserting that a given attribute or value exists, or does not exist. This acts as a companion to =, or != operators, e.g: ```hcl constraint { attribute = "${attrs.type}" operator = "!=" value = "database" } constraint { attribute = "${attrs.type}" operator = "is_set" } ```	2018-11-15 11:00:32 -08:00
Preetha Appan	e5de50fba8	Initial implementation of device preemption	2018-11-15 11:09:26 -06:00
Mahmood Ali	046f098bac	Track Node Device attributes and serve them in API	2018-11-14 14:42:29 -05:00
Mahmood Ali	a4a9347501	fix comment typos	2018-11-14 08:36:14 -05:00
Mahmood Ali	1e92161f14	Merge pull request #4858 from hashicorp/b-fix-master-20181109 Fix some tests in master	2018-11-13 16:08:26 -05:00
Alex Dadgar	08dc2ea702	Merge pull request #4867 from hashicorp/b-deployment-progress-deadline Blocked evaluation fixes	2018-11-13 10:29:03 -08:00
Mahmood Ali	865419e756	convert all config durations to strings in tests	2018-11-13 10:21:40 -05:00
Mahmood Ali	4e18846fd9	Adjust streaming duration This test expects 11 repeats of the same message emitted at intervals of 200ms; so we need more than 2 seconds to adjust for time sleep variations and the like. So raising it to 3s here that should be enough.	2018-11-13 10:21:40 -05:00
Mahmood Ali	1403ad21b9	Changelog job re-run fix	2018-11-13 07:52:51 -05:00
Mahmood Ali	e2d668f21c	Merge pull request #4861 from hashicorp/b-batch-deregister-transaction Run job deregistering in a single transaction	2018-11-12 20:59:44 -05:00
Alex Dadgar	a90dc978e1	Handle new eval being the duplicate properly	2018-11-12 16:02:23 -08:00
Mahmood Ali	8513b3cccb	Comment public functions and batch write txn	2018-11-12 16:09:39 -05:00
Preetha Appan	7ef126a027	Smaller methods, and added tests for RPC layer	2018-11-10 17:37:33 -06:00
Preetha Appan	75662b50d1	Use response object/querymeta/writemeta in scheduler config API	2018-11-10 10:31:10 -06:00
Mahmood Ali	9c0a15f3ce	Run job deregistering in a single transaction Fixes https://github.com/hashicorp/nomad/issues/4299 Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` fsm operation and processing job-deregister evals. Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, applying any single job deregistration accidentally incremented the snapshot index and resulted in processing job-deregister evals. When a Nomad server receives an eval for a job in the batch that is yet to be deleted, we accidentally re-run it depending on the state of allocation. This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking processing of related evals until deregistering completes.	2018-11-09 22:35:26 -05:00
Preetha	3739713ce1	Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion Remove terminal allocations associated with older job modify index	2018-11-09 12:21:42 -06:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Alex Dadgar	2f06d88f47	Merge pull request #4847 from hashicorp/b-blocked-eval Blocked evaluation fixes	2018-11-08 13:40:01 -08:00
Alex Dadgar	98398a8a44	Merge pull request #4842 from hashicorp/b-deployment-progress-deadline Fix multiple bugs with progress deadline handling	2018-11-08 13:31:54 -08:00
Alex Dadgar	991791a513	typo fix	2018-11-08 13:28:27 -08:00
Alex Dadgar	be54e56570	review fixes	2018-11-08 09:48:36 -08:00
Preetha Appan	5f0a9d2cfd	Show preemption output in plan CLI	2018-11-08 09:48:43 -06:00
Alex Dadgar	dbb05357bc	fix test	2018-11-07 11:59:24 -08:00
Alex Dadgar	36abd3a3d8	review comments	2018-11-07 10:33:22 -08:00
Alex Dadgar	e3cbb2c82e	allocs fit checks if devices get oversubscribed	2018-11-07 10:33:22 -08:00
Alex Dadgar	4f9b3ede87	Split device accounter and allocator	2018-11-07 10:32:03 -08:00
Alex Dadgar	6fa893c801	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	2d2248e209	Add devices to allocated resources	2018-11-07 10:32:03 -08:00
Alex Dadgar	b1c5d52817	Track jobs by namespace	2018-11-07 10:22:08 -08:00
Alex Dadgar	6d8bb3a7bd	Duplicate blocked evals cancelling improved The old logic for cancelling duplicate blocked evaluations by job id had the issue where the newer evaluation could have additional node classes that it is (in)eligible for that we would not capture. This could make it such that cluster state could change such that the job would make progress but no evaluation was unblocked.	2018-11-07 10:08:23 -08:00
Preetha Appan	a9aec7e628	Fix failing resource subtraction test	2018-11-06 12:26:26 -06:00
Alex Dadgar	261aae32b1	more robust merging of the deployment status when getting updates from the client	2018-11-05 16:39:09 -08:00
Alex Dadgar	1c31970464	Fix multiple tgs with progress deadline handling Fix an issue in which the deployment watcher would fail the deployment based on the earliest progress deadline of the deployment regardless of if the task group has finished. Further fix an issue where the blocked eval optimization would make it so no evals were created to progress the deployment. To reproduce this issue, prior to this commit, you can create a job with two task groups. The first group has count 1 and resources such that it can not be placed. The second group has count 3, max_parallel=1, and can be placed. Run this first and then update the second group to do a deployment. It will place the first of three, but never progress since there exists a blocked eval. However, that doesn't capture the fact that there are two groups being deployed.	2018-11-05 16:06:17 -08:00
Preetha Appan	6fdc84cce3	add comment	2018-11-02 18:11:36 -05:00
Preetha Appan	a6b714b81c	update preemption tests to use new node resource structs also includes a fix to remove unnecessary subtraction of network mbits	2018-11-02 17:59:53 -05:00
Preetha	b2b52b1ada	Merge pull request #4794 from hashicorp/f-preemption-systemjobs Preemption for system jobs	2018-11-02 16:28:06 -05:00
Preetha Appan	c33469157d	unit test plan apply with preemptions	2018-11-01 20:06:32 -05:00
Preetha Appan	57fe5050f0	more minor review feedback	2018-11-01 17:05:17 -05:00
Preetha Appan	fd60e66f86	Plumb alloc resource cache in a few more places. also removed now unused method	2018-11-01 16:44:43 -05:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Mahmood Ali	9da19c6450	address review comments	2018-10-30 13:58:52 -04:00
Mahmood Ali	4937095389	Allow artifacts checksum interpolation Fixes https://github.com/hashicorp/nomad/issues/4814	2018-10-30 13:24:30 -04:00
Preetha Appan	f1c3eb2792	Introduce interface with multiple implementations for resource distance	2018-10-30 11:06:32 -05:00
Preetha Appan	8f7eb61823	Introduce a response object for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	1415032c13	More review comments	2018-10-30 11:06:32 -05:00
Preetha Appan	b97f85e3e0	style fixes	2018-10-30 11:06:32 -05:00
Preetha Appan	12278527c7	make default config a variable	2018-10-30 11:06:32 -05:00
Preetha Appan	32cc764072	Add fsm layer tests	2018-10-30 11:06:32 -05:00
Preetha Appan	7b8156fc47	Restore/Snapshot plus unit tests for scheduler configuration	2018-10-30 11:06:32 -05:00
Preetha Appan	8807c25b11	Modify preemption code to use new style of resource structs	2018-10-30 11:06:32 -05:00
Preetha Appan	c1c1c230e4	Make preemption config a struct to allow for enabling based on scheduler type	2018-10-30 11:06:32 -05:00
Preetha Appan	bd34cbb1f7	Support for new scheduler config API, first use case is to disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	3190a2c29b	Fix linting	2018-10-30 11:06:32 -05:00
Preetha Appan	eb38488d08	Fix logic bug, unit test for plan apply method in state store	2018-10-30 11:06:32 -05:00
Preetha Appan	9e4a35fff0	Fix comment	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00


3045 Commits