Commit Graph

457 Commits

Author SHA1 Message Date
Drew Bailey 1ae39a9ed9
event sink crud operation api (#9155)
* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate
2020-10-23 14:23:00 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Drew Bailey f3dcefe5a9
remove event durability (#9147)
* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how to best handled increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations
2020-10-22 12:21:03 -04:00
Drew Bailey 8451de99b2
adds two base event stream e2e tests (#9126)
* adds two base event stream e2e tests

test evaluation filter keys are included

* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* gc aftereach

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-10-20 08:26:21 -04:00
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default  to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Drew Bailey c57e760933
remove special node drain event type
rely on standardized events instead of special node drain event
2020-10-15 16:44:36 -04:00
Michael Schurter dd09fa1a4a
Merge pull request #9055 from hashicorp/f-9017-resources
api: add field filters to /v1/{allocations,nodes}
2020-10-14 14:49:39 -07:00
Drew Bailey c463479848
filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Michael Schurter 8ccbd92cb6 api: add field filters to /v1/{allocations,nodes}
Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
2020-10-14 10:35:22 -07:00
Drew Bailey 684807bddb
namespace filtering 2020-10-14 12:44:43 -04:00
Drew Bailey fdc576af09
handle txn returning error 2020-10-14 12:44:42 -04:00
Drew Bailey df96b89958
Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey 315f77a301
rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey d793529d61
event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 9d48818eb8
writetxn can return error, add alloc and job generic events. Add events
table for durability
2020-10-14 12:44:39 -04:00
Drew Bailey 400455d302
Events/eval alloc events (#9012)
* generic eval update event

first pass at alloc client update events

* api/event client
2020-10-14 12:44:37 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Drew Bailey a4a2975edf
Event Stream API/RPC (#8947)
This Commit adds an /v1/events/stream endpoint to stream events from.

The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.

If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
2020-10-14 12:44:36 -04:00
Drew Bailey 207068ca28
Events/event source node (#8918)
* Node Register/Deregister event sourcing

example upsert node with context

fill in writetxnwithctx

ctx passing to handle event type creation, wip test

node deregistration event

drop Node from registration event

* node batch deregistration
2020-10-14 12:44:35 -04:00
Drew Bailey 4753904b90
Events/cfg enable publisher (#8916)
* only enable publisher based on config

* add default prune tick

* back out state abandon changes on fsm close
2020-10-14 12:44:35 -04:00
Drew Bailey fddac3af00
Event Buffer Implemenation
adds an event buffer to hold events from raft changes.

update events to use event buffer

fix append call

provide way to prune buffer items after TTL

event publisher tests

basic publish test

wire up max item ttl

rename package to stream, cleanup exploratory work

subscription filtering

subscription plumbing

allow subscribers to consume events, handle closing subscriptions

back out old exploratory ctx work

fix lint

remove unused ctx bits

add a few comments

fix test

stop publisher on abandon
2020-10-14 12:44:34 -04:00
Tim Gross 4335d847a4
Allow job Version to start at non-zero value (#9071)
Stop coercing version of new job to 0 in the state_store, so that we can add
regions to a multi-region deployment. Send new version, rather than existing
version, to MRD to accomodate version-choosing logic changes in ENT.

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-10-12 13:59:48 -04:00
Nick Ethier d45be0b5a6
client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Drew Bailey 28aa0387e9
remove node events for state track changing pr
remove Txn and update calls with ReadTxn()

constructor for changetrackerdb
2020-09-04 10:23:35 -04:00
Drew Bailey d5f6d3b3c5
fix a few missed txn changes 2020-09-01 10:27:21 -04:00
Drew Bailey 9253146bf4
fix bad merge from scalingpoliciesbynamespace 2020-09-01 10:27:20 -04:00
Drew Bailey 45762d8df8
noop changetracker for snapshots 2020-09-01 10:27:20 -04:00
Drew Bailey 0af749c92e
Transaction change tracking
This commit wraps memdb.DB with a changeTrackerDB, which is a thin
wrapper around memdb.DB which enables go-memdb's TrackChanges on all write
transactions. When the transaction is comitted the changes are sent to
an eventPublisher which will be used to create and emit change events.

debugging TestFSM_ReconcileSummaries

wip

revert back rebase

revert back rebase

fix snapshot to actually use a snapshot
2020-09-01 10:27:20 -04:00
Lang Martin 7d483f93c0
csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Chris Baker 8b9145fabd state_store/fix the prefix bugs for scaling policies documented in 1a9318 2020-08-27 04:25:37 +00:00
Chris Baker 655cbb4d3c documenting tests for prefix bugs around job scaling policies 2020-08-27 03:22:13 +00:00
Lang Martin e8a5565c1a
nomad/state/state_store: handle type conversion failure explicitly (#8660) 2020-08-12 17:53:12 -04:00
Lang Martin c82b2a2454
CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Lang Martin f245ba91c4
nomad/state/state_store: two cases of incorrect CSIPlugin in-place (#8630) 2020-08-10 18:15:29 -04:00
Tim Gross e5496c7994
csi: missing plugins during node delete are not an error (#8619)
When deregistering a client, CSI plugins running on that client may not get a
chance to fingerprint before being stopped. Account for the case where a
plugin allocation is the last instance of the plugin and has been deleted from
the state store to avoid errors during node deregistration.
2020-08-10 11:02:01 -04:00
Lars Lehtonen f8a42f587f
nomad/state: fix dropped scaling_policy test errors 2020-08-07 23:05:33 -07:00
Tim Gross d3341a2019 refactor: make it clear where we're accessing dstate
The field name `Deployment.TaskGroups` contains a map of `DeploymentState`,
which makes it a little harder to follow state updates when combined with
inconsistent naming conventions, particularly when we also have the state
store or actual `TaskGroup`s in scope. This changeset changes all uses to
`dstate` so as not to be confused with actual TaskGroups.
2020-07-20 11:25:53 -04:00
Mahmood Ali 78568b8e63 Remove unused state.TestInitState 2020-07-20 09:55:55 -04:00
Mahmood Ali 5adbd9f666 enterprise specific state store objects 2020-07-20 09:22:26 -04:00
Chris Baker 5aa46e9a8f modified state store to allow version skipping, to support multiregion version syncing also, passing existing version into multiregionRegister to support this 2020-07-06 14:16:55 +00:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Drew Bailey 01e2cc5054
allow ClusterMetadata to accept a watchset (#8299)
* allow ClusterMetadata to accept a watchset

* use nil instead of empty watchset
2020-06-26 13:23:32 -04:00
Mahmood Ali c0aa06d9c7 rpc: allow querying allocs across namespaces
This implements the backend handling for querying across namespaces for
allocation list endpoints.
2020-06-17 16:31:06 -04:00
Lang Martin 069840bef8
scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138)
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect

* scheduler/reconcile: thread follupEvalIDs through to results.stop

* scheduler/reconcile: comment typo

* nomad/_test: correct arguments for plan.AppendStoppedAlloc

* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
2020-06-09 17:13:53 -04:00
Mahmood Ali a73cd01a00
Merge pull request #8001 from hashicorp/f-jobs-list-across-nses
endpoint to expose all jobs across all namespaces
2020-05-31 21:28:03 -04:00
Mahmood Ali 37c6160b96 Handle nil/empty cluster metadata
Handle case where a snapshot is made before cluster metadata is created.

This fixes a bug where a server may have empty cluster metadata if it
created and installed a Raft snapshot before a new cluster metadata ID is
generated.

This case is very unlikely to arise.  Most likely reason is when
upgrading from an old version slowly where servers may use snapshots
before all servers upgrade.  This happened for a user with a log line
like:

```
2020-05-21T15:21:56.996Z [ERROR] nomad.fsm: ClusterSetMetadata failed: error=""set cluster metadata failed: refusing to set new cluster id, previous: , new: <<redacted>
```
2020-05-29 13:34:21 -04:00
Drew Bailey 23d24c7a7f
removes pro tags (#8014) 2020-05-28 15:40:17 -04:00
Mahmood Ali 5ab2d52e27 endpoint to expose all jobs across all namespaces
Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all
namespaces.  The returned list is to contain a `Namespace` field
indicating the job namespace.

If ACL is enabled, the request token needs to be a management token or
have `namespace:list-jobs` capability on all existing namespaces.
2020-05-18 13:50:46 -04:00