Commit Graph

2981 Commits

Author SHA1 Message Date
Michael Schurter 81b4b6f19b
Merge pull request #5791 from hashicorp/b-plan-snapshotindex
nomad: include snapshot index when submitting plans
2019-07-17 09:25:00 -07:00
Mahmood Ali ad39bcef60 rpc: use tls wrapped connection for streaming rpc
This ensures that server-to-server streaming RPC calls use the tls
wrapped connections.

Prior to this, `streamingRpcImpl` function uses tls for setting header
and invoking the rpc method, but returns unwrapped tls connection.
Thus, streaming writes fail with tls errors.

This tls streaming bug existed since 0.8.0[1], but PR #5654[2]
exacerbated it in 0.9.2.  Prior to PR #5654, nomad client used to
shuffle servers at every heartbeat -- `servers.Manager.setServers`[3]
always shuffled servers and was called by heartbeat code[4].  Shuffling
servers meant that a nomad client would heartbeat and establish a
connection against all nomad servers eventually.  When handling
streaming RPC calls, nomad servers used these local connection to
communicate directly to the client.  The server-to-server forwarding
logic was left mostly unexercised.

PR #5654 means that a nomad client may connect to a single server only
and caused the server-to-server forward streaming RPC code to get
exercised more and unearthed the problem.

[1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515
[2] https://github.com/hashicorp/nomad/pull/5654
[3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216
[4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603
2019-07-12 14:41:44 +08:00
Mahmood Ali 9c9bec62fd rpc: add positive tests for server streaming RPC 2019-07-12 14:32:52 +08:00
Lang Martin 0b97175a16 node_endpoint preserve both messages as rpcs and in raft 2019-07-10 13:56:20 -04:00
Lang Martin ee4848167c core_sched add compat comment for later removal 2019-07-10 13:56:20 -04:00
Lang Martin c13c97c6c2 structs drop deprecation warning, revert unnecessary comment change 2019-07-10 13:56:20 -04:00
Lang Martin a95225d754 NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern 2019-07-10 13:56:20 -04:00
Lang Martin a8e72a5b68 state_store error if called without node_ids 2019-07-10 13:56:20 -04:00
Lang Martin 44cbca9b98 fsm new NodeDeregisterBatchRequestType sorted at the end of the case 2019-07-10 13:56:20 -04:00
Lang Martin 91e139dcb5 structs NodeDeregisterBatchRequestType must go at the end 2019-07-10 13:56:20 -04:00
Lang Martin 1cc6b4062c fsm label batch_deregister_node metrics explicitly
Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>
2019-07-10 13:56:20 -04:00
Lang Martin ad3549f906 core_sched use the new rpc names 2019-07-10 13:56:20 -04:00
Lang Martin ce0f03651a fsm support new NodeDeregisterBatchRequest 2019-07-10 13:56:20 -04:00
Lang Martin fa5649998e node endpoint support new NodeDeregisterBatchRequest 2019-07-10 13:56:19 -04:00
Lang Martin 683ab8d1d2 structs add NodeDeregisterBatchRequest 2019-07-10 13:56:19 -04:00
Lang Martin 82349aba5d node_endpoint argument setup 2019-07-10 13:56:19 -04:00
Lang Martin 6dbf5d7d13 fsm return an error on both NodeDeregisterRequest fields set 2019-07-10 13:56:19 -04:00
Lang Martin fbc78ba96c fsm variable names for consistency 2019-07-10 13:56:19 -04:00
Lang Martin 09fd05bd8f node_endpoint raft store then shutdown, test deprecation 2019-07-10 13:56:19 -04:00
Lang Martin 4610c70777 util simplify partitionAll 2019-07-10 13:56:19 -04:00
Lang Martin d22d9fb5b2 core_sched check ServersMeetMinimumVersion 2019-07-10 13:56:19 -04:00
Lang Martin 3bf41211fb fsm honor new and old style NodeDeregisterRequests 2019-07-10 13:56:19 -04:00
Lang Martin 3fb82e83a5 structs add back NodeDeregisterRequest.NodeID, compatibility 2019-07-10 13:56:19 -04:00
Lang Martin a4472e3d34 core_sched check ServersMeetMinimumVersion, send old node deregister 2019-07-10 13:56:19 -04:00
Lang Martin 8e53c105fc state_store just one index update, test deletion 2019-07-10 13:56:19 -04:00
Lang Martin 3e2d1f0338 node_endpoint improve error messages 2019-07-10 13:56:19 -04:00
Lang Martin 5a6a947e98 state_store improve error messages 2019-07-10 13:56:19 -04:00
Lang Martin fd14cedf95 drainer watch_nodes_test batch of 1 2019-07-10 13:56:19 -04:00
Lang Martin b176066d42 node_endpoint deregister the batch of nodes 2019-07-10 13:56:19 -04:00
Lang Martin a97407e030 fsm NodeDeregisterRequest is now a batch 2019-07-10 13:56:19 -04:00
Lang Martin d5ff2834ca core_sched batch node deregistration requests 2019-07-10 13:56:19 -04:00
Lang Martin 10848841be util partitionAll for paging 2019-07-10 13:56:19 -04:00
Lang Martin be2d6853cb state_store DeleteNode operates on a batch of ids 2019-07-10 13:56:19 -04:00
Lang Martin 77cf037bff struct NodeDeregisterRequest has a batch of NodeIDs 2019-07-10 13:56:19 -04:00
Mahmood Ali ea3a98357f Block rpc handling until state store is caught up
Here, we ensure that when leader only responds to RPC calls when state
store is up to date.  At leadership transition or launch with restored
state, the server local store might not be caught up with latest raft
logs and may return a stale read.

The solution here is to have an RPC consistency read gate, enabled when
`establishLeadership` completes before we respond to RPC calls.
`establishLeadership` is gated by a `raft.Barrier` which ensures that
all prior raft logs have been applied.

Conversely, the gate is disabled when leadership is lost.

This is very much inspired by https://github.com/hashicorp/consul/pull/3154/files
2019-07-02 16:07:37 +08:00
Preetha Appan 3cb798235d
Missed one revert of backwards compatibility for node drain 2019-07-01 16:46:05 -05:00
Preetha Appan aa2b4b4e00
Undo removal of node drain compat changes
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Preetha Appan 3484f18984
Fix more tests 2019-06-26 16:30:53 -05:00
Preetha Appan ff1b80dba6
Fix node drain test 2019-06-26 16:12:07 -05:00
Preetha Appan 23319e04d6
Restore accidentally deleted block 2019-06-26 13:59:14 -05:00
Michael Schurter 69ba495f0c nomad: expand comments on subtle plan apply behaviors 2019-06-26 08:49:24 -07:00
Preetha Appan 66fa6a67ec
newline 2019-06-25 19:41:09 -05:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00
Michael Schurter e4bc943a68 nomad: SnapshotAfter -> SnapshotMinIndex
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).

SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
2019-06-24 12:16:46 -07:00
Michael Schurter 0f8164b2f1 nomad: evaluate plans after previous plan index
The previous commit prevented evaluating plans against a state snapshot
which is older than the snapshot at which the plan was created.  This is
correct and prevents failures trying to retrieve referenced objects that
may not exist until the plan's snapshot. However, this is insufficient
to guarantee consistency if the following events occur:

1. P1, P2, and P3 are enqueued with snapshot @ 100
2. Leader evaluates and applies Plan P1 with snapshot @ 100
3. Leader evaluates Plan P2 with snapshot+P1 @ 100
4. P1 commits @ 101
4. Leader evaluates applies Plan P3 with snapshot+P2 @ 100

Since only the previous plan is optimistically applied to the state
store, the snapshot used to evaluate a plan may not contain the N-2
plan!

To ensure plans are evaluated and applied serially we must consider all
previous plan's committed indexes when evaluating further plans.

Therefore combined with the last PR, the minimum index at which to
evaluate a plan is:

    min(previousPlanResultIndex, plan.SnapshotIndex)
2019-06-24 12:16:46 -07:00
Michael Schurter e10fea1d7a nomad: include snapshot index when submitting plans
Plan application should use a state snapshot at or after the Raft index
at which the plan was created otherwise it risks being rejected based on
stale data.

This commit adds a Plan.SnapshotIndex which is set by workers when
submitting plan. SnapshotIndex is set to the Raft index of the snapshot
the worker used to generate the plan.

Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex.
While RefreshIndex informs workers their StateStore is behind the
leader's, SnapshotIndex is a way to prevent the leader from using a
StateStore behind the worker's.

Plan.SnapshotIndex should be considered the *lower bound* index for
consistently handling plan application.

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans *after* the
first plan after a leader election.

The Raft barrier on leader election ensures the leader's statestore has
caught up to the log index at which it was elected. This guarantees its
StateStore is at an index > lastPlanIndex.
2019-06-24 12:16:46 -07:00
Chris Baker 59fac48d92 alloc lifecycle: 404 when attempting to stop non-existent allocation 2019-06-20 21:27:22 +00:00
Preetha 586e50d1a4
Merge pull request #5841 from hashicorp/f-raft-snapshot-metrics
Raft and state store indexes as metrics
2019-06-19 12:01:03 -05:00
Preetha Appan dc0ac81609
Change interval of raft stats collection to 10s 2019-06-19 11:58:46 -05:00
Preetha Appan 104d66f10c
Changed name of metric 2019-06-17 15:51:31 -05:00
Chris Baker e0170e1c67 metrics: add namespace label to allocation metrics 2019-06-17 20:50:26 +00:00
Preetha Appan c54b4a5b17
Emit metrics with raft commit and apply index and statestore latest index 2019-06-14 16:30:27 -05:00
Jasmine Dahilig ed9740db10
Merge pull request #5664 from hashicorp/f-http-hcl-region
backfill region from hcl for jobUpdate and jobPlan
2019-06-13 12:25:01 -07:00
Jasmine Dahilig 51e141be7a backfill region from job hcl in jobUpdate and jobPlan endpoints
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
2019-06-13 08:03:16 -07:00
Nick Ethier 1b7fa4fe29
Optional Consul service tags for nomad server and agent services (#5706)
Optional Consul service tags for nomad server and agent services
2019-06-13 09:00:35 -04:00
Mahmood Ali e31159bf1f Prepare for 0.9.4 dev cycle 2019-06-12 18:47:50 +00:00
Nomad Release bot 4803215109 Generate files for 0.9.3 release 2019-06-12 16:11:16 +00:00
Mahmood Ali 07f2c77c44 comment DenormalizeAllocationDiffSlice applies to terminal allocs only 2019-06-12 08:28:43 -04:00
Lang Martin fe8a4781d8 config merge maintains *HCL string fields used for duration conversion 2019-06-11 16:34:04 -04:00
Mahmood Ali 392f5bac44 Stop updating allocs.Job on stopping or preemption 2019-06-10 18:30:20 -04:00
Mahmood Ali 6c8e329819 test that stopped alloc jobs aren't modified
When an alloc is stopped, test that we don't update the job found in
alloc with new job  that is no longer relevent for this alloc.
2019-06-10 17:14:26 -04:00
Mahmood Ali d30c3d10b0
Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1
More test fixes
2019-06-05 19:09:18 -04:00
Mahmood Ali 87173111de
Merge pull request #5746 from hashicorp/b-no-updating-inmem-node
set node.StatusUpdatedAt in raft
2019-06-05 19:05:21 -04:00
Mahmood Ali 97957fbf75 Prepare for 0.9.3 dev cycle 2019-06-05 14:54:00 +00:00
Nomad Release bot 43bfbf3fcc Generate files for 0.9.2 release 2019-06-05 11:59:27 +00:00
Michael Schurter 073893f529 nomad: disable service+batch preemption by default
Enterprise only.

Disable preemption for service and batch jobs by default.

Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
2019-06-04 15:54:50 -07:00
Michael Schurter a8fc50cc1b nomad: revert use of SnapshotAfter in planApply
Revert plan_apply.go changes from #5411

Since non-Command Raft messages do not update the StateStore index,
SnapshotAfter may unnecessarily block and needlessly fail in idle
clusters where the last Raft message is a non-Command message.

This is trivially reproducible with the dev agent and a job that has 2
tasks, 1 of which fails.

The correct logic would be to SnapshotAfter the previous plan's index to
ensure consistency. New clusters or newly elected leaders will not have
a previous plan, so the index the leader was elected should be used
instead.
2019-06-03 15:34:21 -07:00
Mahmood Ali a4ead8ff79 remove 0.9.2-rc1 generated code 2019-05-23 11:14:24 -04:00
Nomad Release bot 6d6bc59732 Generate files for 0.9.2-rc1 release 2019-05-22 19:29:30 +00:00
Lang Martin d46613ff44 structs check TaskGroup.Update for nil 2019-05-22 12:34:57 -04:00
Lang Martin 10a3fd61b0 comment replace COMPAT 0.7.0 for job.Update with more current info 2019-05-22 12:34:57 -04:00
Lang Martin 67ebcc47dd structs comment todo DeploymentStatus & DeploymentStatusDescription 2019-05-22 12:34:57 -04:00
Lang Martin 21bf9fdf90 structs job warnings for taskgroup with mixed auto_promote settings 2019-05-22 12:34:57 -04:00
Lang Martin 0f6f543a5f deployment_watcher auto promote iff every task group is auto promotable 2019-05-22 12:34:57 -04:00
Lang Martin d27d6f8ede structs validate requires Canary for AutoPromote 2019-05-22 12:32:08 -04:00
Lang Martin 0c668ecc7a log error on autoPromoteDeployment failure 2019-05-22 12:32:08 -04:00
Lang Martin f23f9fd99e describe a pending deployment without auto_promote more explicitly 2019-05-22 12:32:08 -04:00
Lang Martin 34230577df describe a pending deployment with auto_promote accurately 2019-05-22 12:32:08 -04:00
Lang Martin b5fd735960 add update AutoPromote bool 2019-05-22 12:32:08 -04:00
Lang Martin 3c5a9fed22 deployments_watcher_test new TestWatcher_AutoPromoteDeployment 2019-05-22 12:32:08 -04:00
Lang Martin 0bebf5d7f8 deployment_watcher when it's ok to autopromote, do so 2019-05-22 12:32:08 -04:00
Lang Martin 0cf4168ed9 deployments_watcher comments 2019-05-22 12:32:08 -04:00
Lang Martin 0c403eafde state_store typo in a comment 2019-05-22 12:32:08 -04:00
Lang Martin e1e28307be new deploymentwatcher/doc.go for package level documentation 2019-05-22 12:32:08 -04:00
Mahmood Ali 9ff5f163b5 update callers in tests 2019-05-21 21:10:17 -04:00
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Mahmood Ali 2159d0f3ac tests: fix some nomad/drainer test data races 2019-05-21 14:40:58 -04:00
Mahmood Ali 3b0152d778 tests: fix deploymentwatcher tests data races 2019-05-21 14:29:45 -04:00
Michael Schurter 689794e08d nomad: fix deadlock in UnblockClassAndQuota
Previous commit could introduce a deadlock if the capacityChangeCh was
full and the receiving side exited before freeing a slot for the sending
side could send. Flush would then block forever waiting to acquire the
lock just to throw the pending update away.

The race is around getting/setting the chan field, not chan operations,
so only lock around getting the chan field.
2019-05-20 15:41:52 -07:00
Michael Schurter 8c99214f69 nomad: fix race in BlockedEvals
I assume the mutex was being released before sending on capacityChangeCh
to avoid blocking in the critical section, but:

1. This is race.
2. capacityChangeCh has a *huge* buffer (8096). If it's full things
   already seem Very Bad, and a little backpressure seems appropriate.
2019-05-20 15:26:20 -07:00
Michael Schurter 05a9c6aedb
Merge pull request #5411 from hashicorp/b-snapshotafter
Block plan application until state store has caught up to raft
2019-05-20 14:03:10 -07:00
Mahmood Ali cd64ada95d Run TestClientAllocations_Restart_ACL test 2019-05-17 20:30:23 -04:00
Michael Schurter 0e39927782 nomad: emit more detailed error
Avoid returning context.DeadlineExceeded as it lacks helpful information
and is often ignored or handled specially by callers.
2019-05-17 14:37:42 -07:00
Michael Schurter b80a7e0feb nomad: wait for state store to sync in plan apply
Wait for state store to catch up with raft when applying plans.
2019-05-17 14:37:12 -07:00
Michael Schurter 1bc731da47 nomad: remove unused NotifyGroup struct
I don't think it's been used for a long time.
2019-05-17 13:30:23 -07:00
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha 2dcd4291f8
Merge pull request #5702 from hashicorp/f-filter-by-create-index
Filter deployments by create index
2019-05-15 21:50:41 -05:00
Preetha 555dd23c2c
remove stray newline
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan 2b787aad7e
Fix bug in plan applier introduced in PR-5602
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix
2019-05-15 20:34:06 -05:00
Danielle d202582502
Merge pull request #5699 from hashicorp/dani/b-eval-broker-lifetime
Eval Broker: Prevent redundant enqueue's when a node is not a leader
2019-05-15 23:30:52 +01:00
Danielle Lancashire 2fb93a6229 evalbroker: test for no enqueue on disabled 2019-05-15 11:02:21 +02:00
Nick Ethier ade97bc91f
fixup #5172 and rebase against master 2019-05-14 14:37:34 -04:00
Nick Ethier cab6a95668
Merge branch 'master' into pr/5172
* master: (912 commits)
  Update redirects.txt
  Added redirect for Spark guide link
  client: log when server list changes
  docs: mention regression in task config validation
  fix update to changelog
  update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
  typo: "atleast" -> "at least"
  implement nomad exec for rkt
  docs: fixed typo
  use pty/tty terminology similar to github.com/kr/pty
  vendor github.com/kr/pty
  drivers: implement streaming exec for executor based drivers
  executors: implement streaming exec
  executor: scaffolding for executor grpc handling
  client: expose allocated memory per task
  client improve a comment in updateNetworks
  stalebot: Add 'thinking' as an exempt label (#5684)
  Added Sparrow link
  update links to use new canonical location
  Add redirects for restructing done in GH-5667
  ...
2019-05-14 14:10:33 -04:00
Michael Schurter d7e5ace1ed client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Danielle Lancashire d9815888ed evalbroker: Simplify nextDelayedEval locking 2019-05-14 14:06:27 +02:00
Danielle Lancashire 38562afbc1 evalbroker: No new enqueues when disabled
Currently when an evalbroker is disabled, it still recieves delayed
enqueues via log application in the fsm. This causes an ever growing
heap of evaluations that will never be drained, and can cause memory
issues in larger clusters, or when left running for an extended period
of time without a leader election.

This commit prevents the enqueuing of evaluations while we are
disabled, and relies on the leader restoreEvals routine to handle
reconciling state during a leadership transition.

Existing dequeues during an Enabled->Disabled broker state transition are
handled by the enqueueLocked function dropping evals.
2019-05-14 13:59:10 +02:00
Danielle Lancashire c91ae21a6c evalbroker: Flush within update lock
Primarily a cleanup commit, however, currently there is a potential race
condition (that I'm not sure we've ever actually hit) during a flapping
SetEnabled/Disabled state where we may never correctly restart the eval
broker, if it was being called from multiple routines.
2019-05-14 13:26:56 +02:00
Preetha Appan 4d3f74e161
Fix test setup to have correct jobcreateindex for deployments 2019-05-13 18:53:47 -05:00
Preetha Appan d448750449
Lookup job only once, and fix tests 2019-05-13 18:33:41 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Jasmine Dahilig 30d346ca15
Merge pull request #5665 from hashicorp/b-empty-datacenters
add non-empty string validation for datacenters
2019-05-13 10:23:26 -07:00
Mahmood Ali cf1f3625b4 Update ugorji/go to latest
Our testing so far indicates that ugorji/go/codec maintains backward
compatiblity with the version we are using now, for purposes of Nomad
serialization.

Using latest ugorji/go allows us to get back to using upstream library,
get get the optimizations benefits in RPC paths (including code
generation optimizations).

ugorji/go introduced two significant changes:
* time binary format in debb8e2d2e.  Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior
* ugorji/go started honoring `json` tag as well:

v1.1.4 is the latest but has a bug in handling RawString that's fixed in
d09a80c1e0
.
2019-05-09 19:35:58 -04:00
Mahmood Ali 919827f2df
Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali 3c668732af server: server forwarding logic for nomad exec endpoint 2019-05-09 16:49:08 -04:00
Jasmine Dahilig 0ba2bd15b9 add unit tests for datacenter non-empty string validation 2019-05-08 11:51:52 -07:00
Mahmood Ali 9d3f13e9b3 remove Index field from EmitNodeEventsResponse
`Index` is already included as part of `WriteMeta` embedding.

This is a backward compatible change: Clients never read the field; and
Server refernces to `EmitNodeEventsResponse.Index` would be using the
value in `WriteMeta`, which is consistent with other response structs.
2019-05-08 08:42:26 -04:00
Preetha 1538913a2a
Merge pull request #5628 from hashicorp/f-preemption-config
Add config to disable preemption for batch/service jobs
2019-05-06 15:40:35 -05:00
Mahmood Ali f35ad92a8b
Merge pull request #5646 from hashicorp/some-ugorji-fixes
Codegen codec helpers for all nomad structs
2019-05-06 13:23:12 -04:00
Lang Martin 9f3f11df97
Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl
config parse direct hcl
2019-05-06 12:05:19 -04:00
Mahmood Ali 92c133b905 Update peers info with new raft config details 2019-05-03 16:55:53 -04:00
Preetha Appan ad3c263d3f
Rename to match system scheduler config.
Also added docs
2019-05-03 14:06:12 -05:00
Jasmine Dahilig 016495c368 add non-empty string validation for datacenters 2019-05-03 06:48:02 -07:00
Hemanth Basappa 3fef02aa93 Add support in nomad for supporting raft 3 protocol peers.json 2019-05-02 09:11:23 -07:00
Mahmood Ali 21d21baf8b codegen codecs for nomad structs
`ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of
the letter inside `[]`), including `structs.go`!
2019-05-01 12:42:55 -04:00
Lang Martin 598112a1cc tag HCL bookkeeping keys with json:"-" to keep them out of the api 2019-04-30 10:29:14 -04:00
Lang Martin 5ebae65d1a agent/config, config/* mapstructure tags -> hcl tags 2019-04-30 10:29:14 -04:00
Preetha Appan 6615d5c868
Add config to disable preemption for batch/service jobs 2019-04-29 18:48:07 -05:00
Lang Martin 371014b781
Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config
client fingerprinter doesn't overwrite manual configuration
2019-04-26 12:55:34 -04:00
Danielle Lancashire 3409e0be89 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Arshneet Singh b7b050cdd1 Change min version required for plan optimization 2019-04-24 12:36:07 -07:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Lang Martin 19ba0f4882 structs_test use testify require.True instead of t.Fatal 2019-04-23 17:00:11 -04:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Lang Martin 8aa97cff13 tests over setwise equality of fingerprinted parts 2019-04-19 15:49:24 -04:00
Lang Martin 7de6e28ddc structs need to keep assert Equal interface implementation for tests 2019-04-19 15:23:49 -04:00
Lang Martin 977d33970b structs equals use labeled continue for clarity 2019-04-19 15:23:48 -04:00
Lang Martin 7b99488afa struct equals use a working pattern for setwise comparison 2019-04-19 15:23:48 -04:00
Lang Martin eba4e29440 client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.
2019-04-19 15:23:48 -04:00
Preetha Appan 22109d1e20
Add preemption related fields to AllocationListStub 2019-04-18 10:36:44 -05:00
Lang Martin a2a1e7829d Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2.

Revert "client, structs comments"
This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411cab328215def6508955b160a53452da3.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 689782524090e51183474516715aa2f34908b8e6.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.
2019-04-11 10:29:40 -04:00
Lang Martin 07ff740408 fingerprint Constraints and Affinities have Equals, as set 2019-04-11 09:56:22 -04:00
Lang Martin 8f07698c03 structs missed applying a style change from the review 2019-04-11 09:56:22 -04:00
Lang Martin 7258a13c72 client, structs comments 2019-04-11 09:56:22 -04:00
Lang Martin 1878bf694e client Networks Equals is set equality 2019-04-11 09:56:22 -04:00
Lang Martin e1c91afd19 struct cleanup indentation in RequestedDevice Equals 2019-04-11 09:56:22 -04:00
Lang Martin 0c90efebdc struct Equals checks for identity before value checking 2019-04-11 09:56:22 -04:00
Lang Martin 1a594b53f6 refactor struts.RequestedDevice to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin ec1ccdeda0 refactor structs.Resource.Networks to have its own Equals
NodeResource.Networks uses the same function
2019-04-11 09:56:21 -04:00
Lang Martin 06008465c4 refactor structs.Resource.Devices to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin 36f3022246 add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources 2019-04-11 09:56:21 -04:00
Lang Martin d4567e9909 add structs.Resources Equals 2019-04-11 09:56:21 -04:00
Danielle Lancashire e135876493 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Chris Baker 34e100cc96
server vault client: use two vault clients, one with namespace, one without for /sys calls 2019-04-10 10:34:10 -05:00
Michael Schurter cc7768c170
Update nomad/structs/config/vault.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker a26d4fe1e5
docs: -vault-namespace, VAULT_NAMESPACE, and config
agent: added VAULT_NAMESPACE env-based configuration
2019-04-10 10:34:10 -05:00
Chris Baker d3041cdb17
wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation 2019-04-10 10:34:10 -05:00
Chris Baker 0eaeef872f
config/docs: added `namespace` to vault config
server/client: process `namespace` config, setting on the instantiated vault client
2019-04-10 10:34:10 -05:00
Michael Schurter c0cd96ef75
Update nomad/job_endpoint_test.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Michael Schurter 188c32421a
Update nomad/job_endpoint.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker 0ba1600545
server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555] 2019-04-10 10:34:10 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Michael Schurter 45b4827ad7 Bump to 0.9.1-dev 2019-04-09 09:01:48 -07:00
Nomad Release bot e307734e4a Generate files for 0.9.0 release 2019-04-09 01:56:00 +00:00
Michael Schurter 3af602b633 Remove 0.9.0-rc2 generated files 2019-04-03 07:41:09 -07:00
Nomad Release bot 16b4336ccf Generate files for 0.9.0-rc2 release 2019-04-03 01:54:29 +00:00
Michael Schurter 9afbc45cff Bump to dev post-0.9.0-rc1 release 2019-03-22 08:26:30 -07:00
Nomad Release bot 3ab3dd4105 Generate files for 0.9.0-rc1 release 2019-03-21 19:06:13 +00:00
HashedDan caad68e799 server: inconsistent receiver notation corrected
Signed-off-by: HashedDan <georgedanielmangum@gmail.com>
2019-03-16 17:53:53 -05:00
Alex Dadgar e779d9444b
Update nomad/eval_endpoint_test.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:15 -08:00
Alex Dadgar 1857f5d7c1
Update nomad/eval_endpoint.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:07 -08:00
Michael Schurter e37bbb21a5 nomad: simplify code and improve parameter name 2019-03-04 13:44:14 -08:00
Michael Schurter 05f51499ba nomad: compare current eval when setting WaitIndex
Consider currently dequeued Evaluation's ModifyIndex when determining
its WaitIndex. Normally the Evaluation itself would already be in the
state store snapshot used to determine the WaitIndex. However, since the FSM
applies Raft messages to the state store concurrently with Dequeueing,
it's possible the currently dequeued Evaluation won't yet exist in the
state store snapshot used by JobsForEval.

This can be solved by always considering the current eval's modify index
and using it if it is greater than all of the evals returned by the
state store.
2019-03-01 15:23:39 -08:00
Michael Schurter 3f386e3951 Remove generated files for 0.9.0-beta3 2019-02-26 10:34:08 -08:00
Michael Schurter d74755900e Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Charlie Voiselle 604c49beb8
Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up
Set NextEval when making `failed-follow-up` evals
2019-02-22 14:14:41 -08:00
Charlie Voiselle 006afdca9b Added comments
* caller should created eval id
* prev/next eval used in failed-follow-up
2019-02-22 10:22:52 -08:00
Charlie Voiselle c28c195f42 Set NextEval when making `failed-follow-up` evals
This allows users to locate failed-follow-up evals more easily
2019-02-20 16:07:11 -08:00
Michael Schurter 6580ed668e client: don't redownload completed artifacts on retries
Track the download status of each artifact independently so that if only
one of many artifacts fails to download, completed artifacts aren't
downloaded again.
2019-02-20 08:45:12 -08:00
Michael Schurter 2db91425e3 Remove 0.9.0-beta2 generated files 2019-02-01 08:28:44 -08:00
Alex Dadgar 84d0afccae Generate files for 0.9.0-beta2 2019-01-30 13:31:50 -08:00
Alex Dadgar d2e5ede119 remove generated structs 2019-01-30 12:38:34 -08:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Preetha Appan c848a1d387
ensure tests run a 0.9 server 2019-01-29 16:19:45 -06:00
Preetha Appan 496eb1de0c
Guard operator endpoints for minimum server version 2019-01-29 15:50:36 -06:00
Preetha Appan 7578522f58
variable name fix 2019-01-29 13:48:45 -06:00
Preetha Appan a6cebbbf9e
Make sure that all servers are 0.9 before applying scheduler config entry 2019-01-29 12:47:42 -06:00
Michael Schurter 3aba7ee826 nomad: fix panic when no node conn found
A missing return would cause a panic when a server could find no route
to a client.
2019-01-28 21:55:35 -08:00
Mahmood Ali f9164dae67
Merge pull request #5228 from hashicorp/f-vault-err-tweaks
server/vault: tweak error messages
2019-01-25 11:17:31 -05:00
Mahmood Ali f4560d8a2a server/vault: tweak error messages
Closes #5139
2019-01-25 10:33:54 -05:00
Preetha ec92bf673c
Merge pull request #5223 from hashicorp/f-jobs-list-datacenters
Add Datacenters to the JobListStub struct
2019-01-24 08:13:30 -06:00
Michael Schurter 13f061a83f
Merge pull request #5196 from hashicorp/f-plugin-utils
Make plugins/shared external and make pluginutls/
2019-01-23 06:59:32 -08:00
Michael Schurter 32daa7b47b goimports until make check is happy 2019-01-23 06:27:14 -08:00