Commit graph

206 commits

Author SHA1 Message Date
Drew Bailey c463479848
filter on additional filter keys, remove switch statement duplication
properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event steam without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic
2020-10-14 14:14:33 -04:00
Drew Bailey df96b89958
Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey 315f77a301
rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey d793529d61
event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 9d48818eb8
writetxn can return error, add alloc and job generic events. Add events
table for durability
2020-10-14 12:44:39 -04:00
Drew Bailey 400455d302
Events/eval alloc events (#9012)
* generic eval update event

first pass at alloc client update events

* api/event client
2020-10-14 12:44:37 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Drew Bailey a4a2975edf
Event Stream API/RPC (#8947)
This Commit adds an /v1/events/stream endpoint to stream events from.

The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.

If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
2020-10-14 12:44:36 -04:00
Drew Bailey 207068ca28
Events/event source node (#8918)
* Node Register/Deregister event sourcing

example upsert node with context

fill in writetxnwithctx

ctx passing to handle event type creation, wip test

node deregistration event

drop Node from registration event

* node batch deregistration
2020-10-14 12:44:35 -04:00
Drew Bailey 4753904b90
Events/cfg enable publisher (#8916)
* only enable publisher based on config

* add default prune tick

* back out state abandon changes on fsm close
2020-10-14 12:44:35 -04:00
Drew Bailey f820744746
abandon current state on server shutdown 2020-10-14 12:44:34 -04:00
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Michael Schurter 9dd59ceaa7 core: improve job deregister error logging
Noticed this error in some production logs, and they were far from
helpful. Changes:

1. Include job ID in logs
2. Wrap errors and log once instead of double log lines
3. Test fsm error handling behavior
2020-09-21 08:59:03 -07:00
Drew Bailey 0af749c92e
Transaction change tracking
This commit wraps memdb.DB with a changeTrackerDB, which is a thin
wrapper around memdb.DB which enables go-memdb's TrackChanges on all write
transactions. When the transaction is comitted the changes are sent to
an eventPublisher which will be used to create and emit change events.

debugging TestFSM_ReconcileSummaries

wip

revert back rebase

revert back rebase

fix snapshot to actually use a snapshot
2020-09-01 10:27:20 -04:00
Mahmood Ali aa500f7ba3 comment compat concern in fsm.go 2020-07-15 11:23:49 -04:00
Mahmood Ali f4a921f2be no need to handle duplicate evals anymore 2020-07-15 11:14:49 -04:00
Mahmood Ali fbfe4ab1bd Atomic eval insertion with job (de-)registration
This fixes a bug where jobs may get "stuck" unprocessed that
dispropotionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.

To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discripency doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
2020-07-14 11:59:29 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Drew Bailey 01e2cc5054
allow ClusterMetadata to accept a watchset (#8299)
* allow ClusterMetadata to accept a watchset

* use nil instead of empty watchset
2020-06-26 13:23:32 -04:00
Mahmood Ali 37c6160b96 Handle nil/empty cluster metadata
Handle case where a snapshot is made before cluster metadata is created.

This fixes a bug where a server may have empty cluster metadata if it
created and installed a Raft snapshot before a new cluster metadata ID is
generated.

This case is very unlikely to arise.  Most likely reason is when
upgrading from an old version slowly where servers may use snapshots
before all servers upgrade.  This happened for a user with a log line
like:

```
2020-05-21T15:21:56.996Z [ERROR] nomad.fsm: ClusterSetMetadata failed: error=""set cluster metadata failed: refusing to set new cluster id, previous: , new: <<redacted>
```
2020-05-29 13:34:21 -04:00
Mahmood Ali 2c963885b0 handle upgrade path and defaults
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.

The scheduler already treats `""` value as binpack.  This PR merely
ensures that the operator API returns the effective value.
2020-05-09 12:34:08 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00
Tim Gross a7a64443e1
csi: move volume claim release into volumewatcher (#7794)
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers

The volume watcher is enabled on leader step-up and disabled on leader
step-down.

The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
2020-04-30 09:13:00 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Chris Baker b2ab42afbb scaling api: more testing around the scaling events api 2020-04-01 16:39:23 +00:00
Chris Baker 40d6b3bbd1 adding raft and state_store support to track job scaling events
updated ScalingEvent API to record "message string,error bool" instead
of confusing "reason,error *string"
2020-04-01 16:15:14 +00:00
Yoan Blanc 225c9c1215 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc 761d014071 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Tim Gross 54b3573fc9
state: support snapshot of CSI plugin and volume tables (#7546)
The `csi_plugins` and `csi_volumes` tables were missing support for
snapshot persist and restore. This means restoring a snapshot would
result in missing information for CSI.
2020-03-30 11:17:16 -04:00
Chris Baker 5e3c38be2f state_store:
* added method to retrieve all scaling policies for use in snapshotting, plus test
* better testing for ScalingPoliciesByNamespace
* added scaling policy snapshot persist and restore (and test of restore)

manually tested snapshot restore.

resolves #7539
2020-03-29 13:32:44 +00:00
Lang Martin 3621df1dbf csi: volume ids are only unique per namespace (#7358)
* nomad/state/schema: use the namespace compound index

* scheduler/scheduler: CSIVolumeByID interface signature namespace

* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace

* scheduler/feasible: pass the captured namespace to CSIVolumeByID

* nomad/state/state_store: use namespace in csi_volume index

* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim

* nomad/core_sched: pass the namespace in volumeClaimReap

* nomad/node_endpoint_test: namespaces in Claim testing

* nomad/csi_endpoint: pass RequestNamespace to state.*

* nomad/csi_endpoint_test: appropriately failed test

* command/alloc_status_test: appropriately failed test

* node_endpoint_test: avoid notTheNamespace for the job

* scheduler/feasible_test: call SetJob to capture the namespace

* nomad/csi_endpoint: ACL check the req namespace, query by namespace

* nomad/state/state_store: remove deregister namespace check

* nomad/state/state_store: remove unused CSIVolumes

* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace

* nomad/csi_endpoint: ACL check

* nomad/state/state_store_test: remove call to state.CSIVolumes

* nomad/core_sched_test: job namespace match so claim gc works
2020-03-23 13:59:25 -04:00
Danielle Lancashire 9d4307a3ef csi_endpoint: Provide AllocID in req, and return Volume
Currently, the client has to ship an entire allocation to the server as
part of performing a VolumeClaim, this has a few problems:

Firstly, it means the client is sending significantly more data than is
required (an allocation contains the entire contents of a Nomad job,
alongside other irrelevant state) which has a non-zero (de)serialization
cost.

Secondly, because the allocation was never re-fetched from the state
store, it means that we were potentially open to issues caused by stale
state on a misbehaving or malicious client.

The change removes both of those issues at the cost of a couple of more
state store lookups, but they should be relatively cheap.

We also now provide the CSIVolume in the response for a claim, so the
client can perform a Claim without first going ahead and fetching all of
the volumes.
2020-03-23 13:58:30 -04:00
Lang Martin 5b31b140c3 csi: do not use namespace specific identifiers 2020-03-23 13:58:29 -04:00
Lang Martin 857cd37ab5 fsm: dispatch CSIVolume register, deregister, claim 2020-03-23 13:58:29 -04:00
Seth Hoenig 9df33f622f nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig 2b66ce93bb nomad: ensure a unique ClusterID exists when leader (gh-6702)
Enable any Server to lookup the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.

The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
2020-01-31 19:03:26 -06:00
Mahmood Ali d740d347ce Migrate old alloc structs on read
This commit ensures that Alloc.AllocatedResources is properly populated
when read from persistence stores (namely Raft and client state store).
The alloc struct may have been written previously by an arbitrary old
version that may only populate Alloc.TaskResources.
2020-01-09 08:46:50 -05:00
Lang Martin ea275d5ce7 fsm attach UnblockNode on node updates 2019-07-18 10:32:12 -04:00
Lang Martin a95225d754 NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern 2019-07-10 13:56:20 -04:00
Lang Martin 44cbca9b98 fsm new NodeDeregisterBatchRequestType sorted at the end of the case 2019-07-10 13:56:20 -04:00
Lang Martin 1cc6b4062c fsm label batch_deregister_node metrics explicitly
Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>
2019-07-10 13:56:20 -04:00
Lang Martin ce0f03651a fsm support new NodeDeregisterBatchRequest 2019-07-10 13:56:20 -04:00
Lang Martin 6dbf5d7d13 fsm return an error on both NodeDeregisterRequest fields set 2019-07-10 13:56:19 -04:00
Lang Martin fbc78ba96c fsm variable names for consistency 2019-07-10 13:56:19 -04:00
Lang Martin 3bf41211fb fsm honor new and old style NodeDeregisterRequests 2019-07-10 13:56:19 -04:00
Lang Martin a97407e030 fsm NodeDeregisterRequest is now a batch 2019-07-10 13:56:19 -04:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00