Commit Graph

293 Commits

Author SHA1 Message Date
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Drew Bailey 28aa0387e9
remove node events for state track changing pr
remove Txn and update calls with ReadTxn()

constructor for changetrackerdb
2020-09-04 10:23:35 -04:00
Drew Bailey 0af749c92e
Transaction change tracking
This commit wraps memdb.DB with a changeTrackerDB, which is a thin
wrapper around memdb.DB which enables go-memdb's TrackChanges on all write
transactions. When the transaction is comitted the changes are sent to
an eventPublisher which will be used to create and emit change events.

debugging TestFSM_ReconcileSummaries

wip

revert back rebase

revert back rebase

fix snapshot to actually use a snapshot
2020-09-01 10:27:20 -04:00
Lang Martin 7d483f93c0
csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Chris Baker 655cbb4d3c documenting tests for prefix bugs around job scaling policies 2020-08-27 03:22:13 +00:00
Lars Lehtonen f8a42f587f
nomad/state: fix dropped scaling_policy test errors 2020-08-07 23:05:33 -07:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Drew Bailey 01e2cc5054
allow ClusterMetadata to accept a watchset (#8299)
* allow ClusterMetadata to accept a watchset

* use nil instead of empty watchset
2020-06-26 13:23:32 -04:00
Tim Gross ce86a594a6
csi: fix plugin counts on node update (#7844)
In this changeset:

* If a Nomad client node is running both a controller and a node
  plugin (which is a common case), then if only the controller or the
  node is removed, the plugin was not being updated with the correct
  counts.
* The existing test for plugin cleanup didn't go back to the state
  store, which normally is ok but is complicated in this case by
  denormalization which changes the behavior. This commit makes the
  test more comprehensive.
* Set "controller required" when plugin has `PUBLISH_READONLY`. All
  known controllers that support `PUBLISH_READONLY` also support
  `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this.
* Only create plugins when the allocs for those plugins are
  healthy. If we allow a plugin to be created for the first time when
  the alloc is not healthy, then we'll recreate deleted plugins when
  the job's allocs all get marked terminal.
* Terminal plugin alloc updates should cleanup the plugin. The client
  fingerprint can't tell if the plugin is unhealthy intentionally (for
  the case of updates or job stop). Allocations that are
  server-terminal should delete themselves from the plugin and trigger
  a plugin self-GC, the same as an unused node.
2020-05-05 15:39:57 -04:00
Tim Gross 52e805a6a6
csi: ensure Read/WriteAllocs aren't released early (#7841)
We should only remove the `ReadAllocs`/`WriteAllocs` values for a
volume after the claim has entered the "ready to free"
state. The volume will eventually be released as expected. But
querying the volume API will show the volume is released before the
controller unpublish has finished and this can cause a race with
starting new jobs.

Test updates are to cover cases where we're dropping claims but not
running through the whole reaping process.
2020-04-30 17:11:31 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Chris Baker 09d980be2b modify state store so that autoscaling policies are deleted from their
table as job is stopped (and recreated when job is started)
2020-04-21 23:01:26 +00:00
Lang Martin 24449e23af
csi: volume validate namespace (#7587)
* nomad/state/state_store: enforce that the volume namespace exists

* nomad/csi_endpoint_test: a couple of broken namespaces now

* nomad/csi_endpoint_test: one more test

* nomad/node_endpoint_test: use structs.DefaultNamespace

* nomad/state/state_store_test: use DefaultNamespace
2020-04-02 10:13:41 -04:00
Chris Baker 8ec252e627 added indices to the job scaling events, so we could properly do
blocking queries on the job scaling status
2020-04-01 17:28:19 +00:00
Chris Baker 40d6b3bbd1 adding raft and state_store support to track job scaling events
updated ScalingEvent API to record "message string,error bool" instead
of confusing "reason,error *string"
2020-04-01 16:15:14 +00:00
Lang Martin e03c328792
csi: use node MaxVolumes during scheduling (#7565)
* nomad/state/state_store: CSIVolumesByNodeID ignores namespace

* scheduler/scheduler: add CSIVolumesByNodeID to the state interface

* scheduler/feasible: check node MaxVolumes

* nomad/csi_endpoint: no namespace inn CSIVolumesByNodeID anymore

* nomad/state/state_store: avoid DenormalizeAllocationSlice

* nomad/state/iterator: clean up SliceIterator Next

* scheduler/feasible_test: block with MaxVolumes

* nomad/state/state_store_test: fix args to CSIVolumesByNodeID
2020-03-31 17:16:47 -04:00
Tim Gross 54b3573fc9
state: support snapshot of CSI plugin and volume tables (#7546)
The `csi_plugins` and `csi_volumes` tables were missing support for
snapshot persist and restore. This means restoring a snapshot would
result in missing information for CSI.
2020-03-30 11:17:16 -04:00
Chris Baker d6287c43b9 clean up some tests 2020-03-29 23:38:36 +00:00
Chris Baker 5e3c38be2f state_store:
* added method to retrieve all scaling policies for use in snapshotting, plus test
* better testing for ScalingPoliciesByNamespace
* added scaling policy snapshot persist and restore (and test of restore)

manually tested snapshot restore.

resolves #7539
2020-03-29 13:32:44 +00:00
Lang Martin 50ff9ccd44
csi: plugin deregistration on plugin job GC (#7502)
* nomad/structs/csi: delete just one plugin type from a node

* nomad/structs/csi: add DeleteAlloc

* nomad/state/state_store: add deleteJobFromPlugin

* nomad/state/state_store: use DeleteAlloc not DeleteNodeType

* move CreateTestCSIPlugin to state to avoid an import cycle

* nomad/state/state_store_test: delete a plugin by deleting its jobs

* nomad/*_test: move CreateTestCSIPlugin to state

* nomad/state/state_store: update one plugin per transaction

* command/plugin_status_test: move CreateTestCSIPlugin

* nomad: csi: handle nils CSIPlugin methods, clarity
2020-03-26 17:07:18 -04:00
Lang Martin 3375c92aa0
csi: make volume registration idempotent (#7490)
If not in use and not changing external ids, it should not be an error to register a volume again.

* nomad/state/state_store: make volume registration idempotent
2020-03-26 12:27:19 -04:00
Lang Martin ea80330aaa
csi: nomad/structs: test volume denormalize without plugin (#7472) 2020-03-26 09:43:59 -04:00
Mahmood Ali 281fc9837c tests: relax index checks
TestStateStore_Indexes specifically tests for `nodes` index, but asserts
on the exact number of indexes present in the state.  This is fragile
and will break almost everytime we add a state index.
2020-03-25 08:45:38 -04:00
Tim Gross 913da68296
csi: remove client from plugin on client node update (#7462)
Plugins track the client nodes where they are placed. On client
updates, remove the client from the plugin tracking if the client is
no longer running an instance of that controller/node plugin.

Extends the state store tests to ensure deregistration works as
expected and that controllers and nodes are being tracked
independently.
2020-03-24 13:26:31 -04:00
Chris Baker 9e530e167d
Merge pull request #7409 from hashicorp/scaling-api
Scaling API changes
2020-03-24 11:02:09 -05:00
Lang Martin bd22afd003
csi: volume deregister fails for volumes actively in use (#7445)
* nomad/structs/csi: add InUse to CSIVolume

* nomad/state/state_store: block volume deregistration for in use vols
2020-03-24 10:10:44 -04:00
Chris Baker abc7a52f56 finished refactoring state store, schema, etc 2020-03-24 13:57:14 +00:00
Chris Baker 6665d0bfb0 wip: added policy get endpoint, added UUID to policy 2020-03-24 13:55:20 +00:00
Chris Baker 9c2560ceeb wip: upsert/delete scaling policies on job upsert/delete 2020-03-24 13:55:18 +00:00
Chris Baker 65d92f1fbf WIP: adding ScalingPolicy to api/structs and state store 2020-03-24 13:55:18 +00:00
Lang Martin d994990ef0
csi: the scheduler allows a job with a volume write claim to be updated (#7438)
* nomad/structs/csi: split CanWrite into health, in use

* scheduler/scheduler: expose AllocByID in the state interface

* nomad/state/state_store_test

* scheduler/stack: SetJobID on the matcher

* scheduler/feasible: when a volume writer is in use, check if it's us

* scheduler/feasible: remove SetJob

* nomad/state/state_store: denormalize allocs before Claim

* nomad/structs/csi: return errors on claim, with context

* nomad/csi_endpoint_test: new alloc doesn't look like an update

* nomad/state/state_store_test: change test reference to CanWrite
2020-03-23 21:21:04 -04:00
Lang Martin 3621df1dbf csi: volume ids are only unique per namespace (#7358)
* nomad/state/schema: use the namespace compound index

* scheduler/scheduler: CSIVolumeByID interface signature namespace

* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace

* scheduler/feasible: pass the captured namespace to CSIVolumeByID

* nomad/state/state_store: use namespace in csi_volume index

* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim

* nomad/core_sched: pass the namespace in volumeClaimReap

* nomad/node_endpoint_test: namespaces in Claim testing

* nomad/csi_endpoint: pass RequestNamespace to state.*

* nomad/csi_endpoint_test: appropriately failed test

* command/alloc_status_test: appropriately failed test

* node_endpoint_test: avoid notTheNamespace for the job

* scheduler/feasible_test: call SetJob to capture the namespace

* nomad/csi_endpoint: ACL check the req namespace, query by namespace

* nomad/state/state_store: remove deregister namespace check

* nomad/state/state_store: remove unused CSIVolumes

* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace

* nomad/csi_endpoint: ACL check

* nomad/state/state_store_test: remove call to state.CSIVolumes

* nomad/core_sched_test: job namespace match so claim gc works
2020-03-23 13:59:25 -04:00
Lang Martin 80619137ab csi: volumes listed in `nomad node status` (#7318)
* api/allocations: GetTaskGroup finds the taskgroup struct

* command/node_status: display CSI volume names

* nomad/state/state_store: new CSIVolumesByNodeID

* nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator

* nomad/csi_endpoint: deal with a slice of volumes

* nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator

* nomad/structs/csi: CSIVolumeListRequest takes a NodeID

* nomad/csi_endpoint: use the return iterator

* command/agent/csi_endpoint: parse query params for CSIVolumes.List

* api/nodes: new CSIVolumes to list volumes by node

* command/node_status: use the new list endpoint to print volumes

* nomad/state/state_store: error messages consider the operator

* command/node_status: include the Provider
2020-03-23 13:58:30 -04:00
Tim Gross b04d23dae0 csi: ensure volume query is idempotent (#7303)
We denormalize the `CSIVolume` struct when we query it from the state
store by getting the plugin and its health. But unless we copy the
volume, this denormalization gets synced back to the state store
without passing through the fsm (which is invalid).
2020-03-23 13:58:30 -04:00
Lang Martin 369b0e54b9 csi: volumes use `Schedulable` rather than `Healthy` (#7250)
* structs: add ControllerRequired, volume.Name, no plug.Type

* structs: Healthy -> Schedulable

* state_store: Healthy -> Schedulable

* api: add ControllerRequired to api data types

* api: copy csi structs changes

* nomad/structs/csi: include name and external id

* api/csi: include Name and ExternalID

* nomad/structs/csi: comments for the 3 ids
2020-03-23 13:58:30 -04:00
Lang Martin a4784ef258 csi add allocation context to fingerprinting results (#7133)
* structs: CSIInfo include AllocID, CSIPlugins no Jobs

* state_store: eliminate plugin Jobs, delete an empty plugin

* nomad/structs/csi: detect empty plugins correctly

* client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID

* client/pluginmanager/csimanager/instance: allocID

* client/pluginmanager/csimanager/fingerprint: set AllocID

* client/node_updater: split controller and node plugins

* api/csi: remove Jobs

The CSI Plugin API will map plugins to allocations, which allows
plugins to be defined by jobs in many configurations. In particular,
multiple plugins can be defined in the same job, and multiple jobs can
be used to define a single plugin.

Because we now map the allocation context directly from the node, it's
no longer necessary to track the jobs associated with a plugin
directly.

* nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint

* client/dynamicplugins: lift AllocID into the struct from Options

* api/csi_test: remove Jobs test

* nomad/structs/csi: CSIPlugins has an array of allocs

* nomad/state/state_store: implement CSIPluginDenormalize

* nomad/state/state_store: CSIPluginDenormalize npe on missing alloc

* nomad/csi_endpoint_test: defer deleteNodes for clarity

* api/csi_test: disable this test awaiting mocks:
https://github.com/hashicorp/nomad/issues/7123
2020-03-23 13:58:30 -04:00
Lang Martin 7b675f89ac csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049)
* state_store: csi volumes/plugins store the index in the txn

* nomad: csi_endpoint_test require index checks need uint64()

* nomad: other tests using int 0 not uint64(0)

* structs: pass index into New, but not other struct methods

* state_store: csi plugin indexes, use new struct interface

* nomad: csi_endpoint_test check index/query meta (on explicit 0)

* structs: NewCSIVolume takes an index arg now

* scheduler/test: NewCSIVolume takes an index arg now
2020-03-23 13:58:29 -04:00
Lang Martin 88316208a0 csi: server-side plugin state tracking and api (#6966)
* structs: CSIPlugin indexes jobs acting as plugins and node updates

* schema: csi_plugins table for CSIPlugin

* nomad: csi_endpoint use vol.Denormalize, plugin requests

* nomad: csi_volume_endpoint: rename to csi_endpoint

* agent: add CSI plugin endpoints

* state_store_test: use generated ids to avoid t.Parallel conflicts

* contributing: add note about registering new RPC structs

* command: agent http register plugin lists

* api: CSI plugin queries, ControllerHealthy -> ControllersHealthy

* state_store: copy on write for volumes and plugins

* structs: copy on write for volumes and plugins

* state_store: CSIVolumeByID returns an unhealthy volume, denormalize

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins

* structs: remove struct errors for missing objects

* nomad: csi_endpoint return nil for missing objects, not errors

* api: return meta from Register to avoid EOF error

* state_store: CSIVolumeDenormalize keep allocs in their own maps

* state_store: CSIVolumeDeregister error on missing volume

* state_store: CSIVolumeRegister set indexes

* nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests
2020-03-23 13:58:29 -04:00
Lang Martin 5b31b140c3 csi: do not use namespace specific identifiers 2020-03-23 13:58:29 -04:00
Lang Martin 4bb4dd98eb state_store: CSIVolume insert, get, delete, claim
state_store: change claim counts

state_store: get volumes by all, by driver

state_store: process volume claims

state_store: csi volume register error on update
2020-03-23 13:58:29 -04:00
Drew Bailey 9a65556211
add state store test to ensure PlacedCanaries is updated 2020-02-03 13:58:01 -05:00
Seth Hoenig 9df33f622f nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig 2b66ce93bb nomad: ensure a unique ClusterID exists when leader (gh-6702)
Enable any Server to lookup the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.

The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
2020-01-31 19:03:26 -06:00
Seth Hoenig d45dec1ca8 tests: parallelize state store tests
It has been decided we're going to live in a many core world.
Let's take advantage of that and parallelize these state store
tests which all run in memory and are largely CPU bound.

An unscientific benchmark demonstrating the improvement:

[mp state (master)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	5.162s

[mp state (f-parallelize-state-store-tests)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	1.527s
2019-12-11 09:36:37 -06:00
Michael Schurter 81b4b6f19b
Merge pull request #5791 from hashicorp/b-plan-snapshotindex
nomad: include snapshot index when submitting plans
2019-07-17 09:25:00 -07:00
Lang Martin 8e53c105fc state_store just one index update, test deletion 2019-07-10 13:56:19 -04:00
Lang Martin be2d6853cb state_store DeleteNode operates on a batch of ids 2019-07-10 13:56:19 -04:00
Michael Schurter e4bc943a68 nomad: SnapshotAfter -> SnapshotMinIndex
Rename SnapshotAfter to SnapshotMinIndex. The old name was not
technically accurate. SnapshotAtOrAfter is more accurate, but wordy and
still lacks context about what precisely it is at or after (the index).

SnapshotMinIndex was chosen as it describes the action (snapshot), a
constraint (minimum), and the object of the constraint (index).
2019-06-24 12:16:46 -07:00
Mahmood Ali 392f5bac44 Stop updating allocs.Job on stopping or preemption 2019-06-10 18:30:20 -04:00
Mahmood Ali 6c8e329819 test that stopped alloc jobs aren't modified
When an alloc is stopped, test that we don't update the job found in
alloc with new job  that is no longer relevent for this alloc.
2019-06-10 17:14:26 -04:00
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Michael Schurter 9732bc37ff nomad: refactor waitForIndex into SnapshotAfter
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Preetha c8fdf20c66
Merge pull request #5717 from hashicorp/b-plan-apply-preemptions
Fix bug in plan applier introduced in PR-5602
2019-05-16 11:01:05 -05:00
Preetha 555dd23c2c
remove stray newline
Co-Authored-By: Danielle <dani@builds.terrible.systems>
2019-05-15 21:11:52 -05:00
Preetha Appan 2b787aad7e
Fix bug in plan applier introduced in PR-5602
This fixes a bug in the state store during plan apply. When
denormalizing preempted allocations it incorrectly set the preemptor's
job during the update. This eventually causes a panic downstream in the
client. Added a test assertion that failed before and passes after this fix
2019-05-15 20:34:06 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Mahmood Ali 6281700c0c address review comments 2018-11-20 13:21:39 -05:00
Mahmood Ali d744e71fa9 add a missing no errorassertion 2018-11-19 21:44:00 -05:00
Mahmood Ali bff9c3b3e9 Reproduce a panic related to batch GC
Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar 261aae32b1 more robust merging of the deployment status when getting updates from the client 2018-11-05 16:39:09 -08:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readablity fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 7b8156fc47
Restore/Snapshot plus unit tests for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan eb38488d08
Fix logic bug, unit test for plan apply method in state store 2018-10-30 11:06:32 -05:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Alex Dadgar 319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Preetha Appan 52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value 2018-05-03 17:04:36 -05:00
Chelsea Holland Komlo 31557cc44f move tests to use time.Time 2018-03-27 15:43:57 -04:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 0965c9ed28 Fix tests 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter 03d0e5b8a0 improve drain fsm/statestore tests 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Alex Dadgar d3c3deffad fixes 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo 8f109c344c make check fixes 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 19ef872769 keep state store functions in one file 2018-03-13 18:08:21 -07:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00