Commit Graph

2740 Commits

Author SHA1 Message Date
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Michael Schurter 689794e08d nomad: fix deadlock in UnblockClassAndQuota
Previous commit could introduce a deadlock if the capacityChangeCh was
full and the receiving side exited before freeing a slot for the sending
side could send. Flush would then block forever waiting to acquire the
lock just to throw the pending update away.

The race is around getting/setting the chan field, not chan operations,
so only lock around getting the chan field.
2019-05-20 15:41:52 -07:00
Michael Schurter 8c99214f69 nomad: fix race in BlockedEvals
I assume the mutex was being released before sending on capacityChangeCh
to avoid blocking in the critical section, but:

1. This is race.
2. capacityChangeCh has a *huge* buffer (8096). If it's full things
   already seem Very Bad, and a little backpressure seems appropriate.
2019-05-20 15:26:20 -07:00
Michael Schurter 05a9c6aedb
Merge pull request #5411 from hashicorp/b-snapshotafter
Block plan application until state store has caught up to raft
2019-05-20 14:03:10 -07:00
Mahmood Ali cd64ada95d Run TestClientAllocations_Restart_ACL test 2019-05-17 20:30:23 -04:00
Michael Schurter 0e39927782 nomad: emit more detailed error
Avoid returning context.DeadlineExceeded as it lacks helpful information
and is often ignored or handled specially by callers.
2019-05-17 14:37:42 -07:00
Michael Schurter b80a7e0feb nomad: wait for state store to sync in plan apply
Wait for state store to catch up with raft when applying plans.
2019-05-17 14:37:12 -07:00
Michael Schurter 1bc731da47 nomad: remove unused NotifyGroup struct
I don't think it's been used for a long time.
2019-05-17 13:30:23 -07:00
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha 2dcd4291f8
Merge pull request #5702 from hashicorp/f-filter-by-create-index
Filter deployments by create index
2019-05-15 21:50:41 -05:00
Preetha 555dd23c2c
remove stray newline
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan 2b787aad7e
Fix bug in plan applier introduced in PR-5602
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix
2019-05-15 20:34:06 -05:00
Danielle d202582502
Merge pull request #5699 from hashicorp/dani/b-eval-broker-lifetime
Eval Broker: Prevent redundant enqueue's when a node is not a leader
2019-05-15 23:30:52 +01:00
Danielle Lancashire 2fb93a6229 evalbroker: test for no enqueue on disabled 2019-05-15 11:02:21 +02:00
Michael Schurter d7e5ace1ed client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Danielle Lancashire d9815888ed evalbroker: Simplify nextDelayedEval locking 2019-05-14 14:06:27 +02:00
Danielle Lancashire 38562afbc1 evalbroker: No new enqueues when disabled
Currently when an evalbroker is disabled, it still recieves delayed
enqueues via log application in the fsm. This causes an ever growing
heap of evaluations that will never be drained, and can cause memory
issues in larger clusters, or when left running for an extended period
of time without a leader election.

This commit prevents the enqueuing of evaluations while we are
disabled, and relies on the leader restoreEvals routine to handle
reconciling state during a leadership transition.

Existing dequeues during an Enabled->Disabled broker state transition are
handled by the enqueueLocked function dropping evals.
2019-05-14 13:59:10 +02:00
Danielle Lancashire c91ae21a6c evalbroker: Flush within update lock
Primarily a cleanup commit, however, currently there is a potential race
condition (that I'm not sure we've ever actually hit) during a flapping
SetEnabled/Disabled state where we may never correctly restart the eval
broker, if it was being called from multiple routines.
2019-05-14 13:26:56 +02:00
Preetha Appan 4d3f74e161
Fix test setup to have correct jobcreateindex for deployments 2019-05-13 18:53:47 -05:00
Preetha Appan d448750449
Lookup job only once, and fix tests 2019-05-13 18:33:41 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Jasmine Dahilig 30d346ca15
Merge pull request #5665 from hashicorp/b-empty-datacenters
add non-empty string validation for datacenters
2019-05-13 10:23:26 -07:00
Mahmood Ali 919827f2df
Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali 3c668732af server: server forwarding logic for nomad exec endpoint 2019-05-09 16:49:08 -04:00
Jasmine Dahilig 0ba2bd15b9 add unit tests for datacenter non-empty string validation 2019-05-08 11:51:52 -07:00
Mahmood Ali 9d3f13e9b3 remove Index field from EmitNodeEventsResponse
`Index` is already included as part of `WriteMeta` embedding.

This is a backward compatible change: Clients never read the field; and
Server refernces to `EmitNodeEventsResponse.Index` would be using the
value in `WriteMeta`, which is consistent with other response structs.
2019-05-08 08:42:26 -04:00
Preetha 1538913a2a
Merge pull request #5628 from hashicorp/f-preemption-config
Add config to disable preemption for batch/service jobs
2019-05-06 15:40:35 -05:00
Mahmood Ali f35ad92a8b
Merge pull request #5646 from hashicorp/some-ugorji-fixes
Codegen codec helpers for all nomad structs
2019-05-06 13:23:12 -04:00
Lang Martin 9f3f11df97
Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl
config parse direct hcl
2019-05-06 12:05:19 -04:00
Mahmood Ali 92c133b905 Update peers info with new raft config details 2019-05-03 16:55:53 -04:00
Preetha Appan ad3c263d3f
Rename to match system scheduler config.
Also added docs
2019-05-03 14:06:12 -05:00
Jasmine Dahilig 016495c368 add non-empty string validation for datacenters 2019-05-03 06:48:02 -07:00
Hemanth Basappa 3fef02aa93 Add support in nomad for supporting raft 3 protocol peers.json 2019-05-02 09:11:23 -07:00
Mahmood Ali 21d21baf8b codegen codecs for nomad structs
`ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of
the letter inside `[]`), including `structs.go`!
2019-05-01 12:42:55 -04:00
Lang Martin 598112a1cc tag HCL bookkeeping keys with json:"-" to keep them out of the api 2019-04-30 10:29:14 -04:00
Lang Martin 5ebae65d1a agent/config, config/* mapstructure tags -> hcl tags 2019-04-30 10:29:14 -04:00
Preetha Appan 6615d5c868
Add config to disable preemption for batch/service jobs 2019-04-29 18:48:07 -05:00
Lang Martin 371014b781
Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config
client fingerprinter doesn't overwrite manual configuration
2019-04-26 12:55:34 -04:00
Danielle Lancashire 3409e0be89 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Arshneet Singh b7b050cdd1 Change min version required for plan optimization 2019-04-24 12:36:07 -07:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Lang Martin 19ba0f4882 structs_test use testify require.True instead of t.Fatal 2019-04-23 17:00:11 -04:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00