Commit graph

27 commits

Author SHA1 Message Date
Lang Martin ad2fb4b297 client/heartbeatstop: don't store client state, use timeout
In order to minimize this change while keeping a simple version of the
behavior, we set `lastOk` to the current time less the intial server
connection timeout. If the client starts and never contacts the
server, it will stop all configured tasks after the initial server
connection grace period, on the assumption that we've been out of
touch longer than any configured `stop_after_client_disconnect`.

The more complex state behavior might be justified later, but we
should learn about failure modes first.
2020-05-01 12:35:49 -04:00
Lang Martin 28bac139cb client/heartbeatstop: destroy allocs when disconnected from servers
- track lastHeartbeat, the client local time of the last successful
  heartbeat round trip
- track allocations with `stop_after_client_disconnect` configured
- trigger allocation destroy (which handles cleanup)
- restore heartbeat/killable allocs tracking when allocs are recovered from disk
- on client restart, stop those allocs after a grace period if the
  servers are still partioned
2020-05-01 12:35:49 -04:00
Tim Gross eda7be552c csi: add dynamicplugins registry to client state store (#7330)
In order to correctly fingerprint dynamic plugins on client restarts,
we need to persist a handle to the plugin (that is, connection info)
to the client state store.

The dynamic registry will sync automatically to the client state
whenever it receives a register/deregister call.
2020-03-23 13:58:30 -04:00
Tim Gross 1cf7ef44ed csi: docstring and log message fixups (#7327)
Fix some docstring typos and fix noisy log message during client restarts.
A log for the common case where the plugin socket isn't ready yet
isn't actionable by the operator so having it at info is just noise.
2020-03-23 13:58:30 -04:00
Mahmood Ali a0340016b9 client: canonicalize alloc.Job on restore
There is a case for always canonicalizing alloc.Job field when
canonicalizing the alloc.  I'm less certain of implications though, and
the job canonicalize hasn't changed for a long time.

Here, we special case client restore from database as it's probably the
most relevant part.  When receiving an alloc from RPC, the data should
be fresh enough.
2020-01-28 09:59:05 -05:00
Mahmood Ali f36cc54efd actually always canonicalize alloc.Job
alloc.Job may be stale as well and need to migrate it.  It does cost
extra cycles but should be negligible.
2020-01-15 09:02:48 -05:00
Mahmood Ali b1b714691c address review comments 2020-01-15 08:57:05 -05:00
Mahmood Ali d740d347ce Migrate old alloc structs on read
This commit ensures that Alloc.AllocatedResources is properly populated
when read from persistence stores (namely Raft and client state store).
The alloc struct may have been written previously by an arbitrary old
version that may only populate Alloc.TaskResources.
2020-01-09 08:46:50 -05:00
Jasmine Dahilig 0780adfa7f
timeout after 5 seconds when client opens a data directory (#6348) 2019-09-24 16:28:21 -07:00
Michael Schurter ee23bdafbc client/state: missing deploy status isn't an error
Fixes TestClient_SaveRestoreState
2018-12-19 10:39:27 -08:00
Michael Schurter 99bd5b3422 client/state: use 2 as version; test error path 2018-12-19 10:39:27 -08:00
Michael Schurter d9ea8252a7 client/state: support upgrading from 0.8->0.9
Also persist and load DeploymentStatus to avoid rechecking health after
client restarts.
2018-12-19 10:39:27 -08:00
Michael Schurter 0018b2f659 client/state: reorg state buckets to ease transition
* Prefix task bucket with task- to prevent name conflicts
* Shorten device manager bucket name
* Remove commented out outdated var
* Update layout comment
2018-12-19 10:22:28 -08:00
Nick Ethier 82175d1328
client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar a7ca737fb6 review comments 2018-11-07 11:31:52 -08:00
Alex Dadgar f0c7a8159b tests 2018-11-07 10:43:15 -08:00
Alex Dadgar 204ca8230c Device manager
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
2018-11-07 10:43:15 -08:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Alex Dadgar 45e41cca03 allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Michael Schurter 1d747048ea tests: ensure task state is initialized in NewAR
Also expose NoopDB for use in tests.
2018-10-16 16:56:55 -07:00
Michael Schurter 4236255686 lots of comment/log fixes 2018-10-16 16:53:30 -07:00
Michael Schurter 820af27171 wrap boltdb in a write deduplicator
Saves a tiny bit of cpu and some IO. Sadly doesn't prevent all IO on
duplicate writes as the transactions are still created and committed.

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/hashicorp/nomad/helper/boltdd
BenchmarkWriteDeduplication_On-4             500           4059591 ns/op           23736 B/op         56 allocs/op
BenchmarkWriteDeduplication_Off-4            300           4115319 ns/op           25942 B/op         55 allocs/op
2018-10-16 16:53:30 -07:00
Michael Schurter 990228a6e2 wip wrap boltdb to get path information
finished but doesn't handle deleting deeply nested buckets
2018-10-16 16:53:30 -07:00
Michael Schurter a3fe0510d1 Move all encoding and put deduping into state db
Still WIP as it does not handle deletions.
2018-10-16 16:53:30 -07:00
Michael Schurter 533bc93b3a implement all boltdb interactions behind StateDB 2018-10-16 16:53:30 -07:00
Michael Schurter 516d641db0 client: implement all-or-nothing alloc restoration
Restoring calls NewAR -> Restore -> Run

NewAR now calls NewTR
AR.Restore calls TR.Restore
AR.Run calls TR.Run
2018-10-16 16:53:29 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Renamed from client/state_database.go (Browse further)