Commit graph

176 commits

Author SHA1 Message Date
Tim Gross 1dc053b917 rename SecureVariables to Variables throughout 2022-08-26 16:06:24 -04:00
Seth Hoenig b3ea68948b build: run gofmt on all go source files
Go 1.19 will forecefully format all your doc strings. To get this
out of the way, here is one big commit with all the changes gofmt
wants to make.
2022-08-16 11:14:11 -05:00
Tim Gross a5a9eedc81 core job for secure variables re-key (#13440)
When the `Full` flag is passed for key rotation, we kick off a core
job to decrypt and re-encrypt all the secure variables so that they
use the new key.
2022-07-11 13:34:06 -04:00
Tim Gross 6300427228 core job for key rotation (#13309)
Extend the GC job to support periodic key rotation.

Update the GC process to safely support signed workload identity. We
can't GC any key used to sign a workload identity. Finding which key
was used to sign every allocation will be expensive, but there are not
that many keys. This lets us take a conservative approach: find the
oldest live allocation and ensure that we don't GC any key older than
that key.
2022-07-11 13:34:06 -04:00
Tim Gross 7055ce89b1 keyring replication (#13167)
Replication for the secure variables keyring. Because only key
metadata is stored in raft, we need to distribute key material
out-of-band from raft replication. A goroutine runs on each server and
watches for changes to the `RootKeyMeta`. When a new key is received,
attempt to fetch the key from the leader. If the leader doesn't have
the key (which may happen if a key is rotated right before a leader
transition), try to get the key from any peer.
2022-07-11 13:34:04 -04:00
Tim Gross d5a214484c core job for root key GC (#13199)
Inactive and unused keys older than a threshold will be periodically
garbage collected.
2022-07-11 13:34:04 -04:00
Tim Gross 5a85d96322 remove end-user algorithm selection (#13190)
After internal design review, we decided to remove exposing algorithm
choice to the end-user for the initial release. We'll solve nonce
rotation by forcing rotations automatically on key GC (in a core job,
not included in this changeset). Default to AES-256 GCM for the
following criteria:

* faster implementation when hardware acceleration is available
* FIPS compliant
* implementation in pure go
* post-quantum resistance

Also fixed a bug in the decoding from keystore and switched to a 
harder-to-misuse encoding method.
2022-07-11 13:34:04 -04:00
Tim Gross f2ee585830 bootstrap keyring (#13124)
When a server becomes leader, it will check if there are any keys in
the state store, and create one if there is not. The key metadata will
be replicated via raft to all followers, who will then get the key
material via key replication (not implemented in this changeset).
2022-07-11 13:34:04 -04:00
James Rasell 181b247384
core: allow pausing and un-pausing of leader broker routine (#13045)
* core: allow pause/un-pause of eval broker on region leader.

* agent: add ability to pause eval broker via scheduler config.

* cli: add operator scheduler commands to interact with config.

* api: add ability to pause eval broker via scheduler config

* e2e: add operator scheduler test for eval broker pause.

* docs: include new opertor scheduler CLI and pause eval API info.
2022-07-06 16:13:48 +02:00
Derek Strickland d7f44448e1 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Seth Hoenig 9670adb6c6 cleanup: purge github.com/pkg/errors 2022-04-01 19:24:02 -05:00
Luiz Aoqui f8973d364e
core: use the new Raft API when removing peers (#12340)
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.

Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.

This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
2022-03-22 15:07:31 -04:00
Luiz Aoqui 8db12c2a17
server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Luiz Aoqui 2876739a51
api: apply consistent behaviour of the reverse query parameter (#12244) 2022-03-11 19:44:52 -05:00
Seth Hoenig 40c714a681 api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Tim Gross 04977525dd
csi: update leader's ACL in volumewatcher (#11891)
The volumewatcher that runs on the leader needs to make RPC calls
rather than writing to raft (as we do in the deploymentwatcher)
because the unpublish workflow needs to make RPC calls to the
clients. This requires that the volumewatcher has access to the
leader's ACL token.

But when leadership transitions, the new leader creates a new leader
ACL token. This ACL token needs to be passed into the volumewatcher
when we enable it, otherwise the volumewatcher can find itself with a
stale token.
2022-01-24 11:49:50 -05:00
Charlie Voiselle 98a240cd99
Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
Mahmood Ali ac3cf10849
nomad: only activate one-time auth tokens with 1.1.0 (#10952)
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.

If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.

Fixed #10943 .
2021-07-27 13:17:55 -04:00
Tim Gross 7a55a6af16 leader: call eval log formatting lazily
Arguments to our logger's various write methods are evaluated eagerly, so
method calls in log parameters will always be called, regardless of log
level. Move some logger messages to the logger's `Fmt` method so that
`GoString` is evaluated lazily instead.
2021-06-02 09:59:55 -04:00
Tim Gross 97b0e26d1f RPC endpoints to support 'nomad ui -login'
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
2021-03-10 08:17:56 -05:00
Drew Bailey a0b7f05a7b
Remove Managed Sinks from Nomad (#9470)
* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog
2020-11-30 14:00:31 -05:00
Drew Bailey d62d8a8587
Event sink manager improvements (#9206)
* Improve managed sink run loop and reloading

resetCh no longer needed

length of buffer equal to count of items, not count of events in each item

update equality fn name, pr feedback

clean up sink manager sink creation

* update test to reflect changes

* bad editor find and replace

* pr feedback
2020-11-02 09:21:32 -05:00
Drew Bailey 86080e25a9
Send events to EventSinks (#9171)
* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger
2020-10-26 17:27:54 -04:00
Michael Schurter c2dd9bc996 core: open source namespaces 2020-10-22 15:26:32 -07:00
Chris Baker 1d35578bed removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
Tim Gross 1aa242c15a
failed core jobs should not have follow-ups (#8682)
If a core job fails more than the delivery limit, the leader will create a new
eval with the TriggeredBy field set to `failed-follow-up`.

Evaluations for core jobs have the leader's ACL, which is not valid on another
leader after an election. The `failed-follow-up` evals do not have ACLs, so
core job evals that fail more than the delivery limit or core job evals that
span leader elections will never succeed and will be re-enqueued forever. So
we should not retry with a `failed-follow-up`.
2020-08-18 16:48:43 -04:00
Mahmood Ali fbfe4ab1bd Atomic eval insertion with job (de-)registration
This fixes a bug where jobs may get "stuck" unprocessed that
dispropotionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.

To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discripency doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
2020-07-14 11:59:29 -04:00
Mahmood Ali 63e048e972 clarify ccomments, esp related to leadership code 2020-06-09 12:01:31 -04:00
Mahmood Ali 47a163b63f reassert leadership 2020-06-07 15:47:06 -04:00
Mahmood Ali 70fbcb99c2 leadership: pause and unpause workers consistently
This fixes a bug where leadership establishment pauses 3/4 of workers
but stepping down unpause only 1/2!
2020-06-01 10:57:53 -04:00
Mahmood Ali 082c085068
Merge pull request #8036 from hashicorp/f-background-vault-revoke-on-restore
Speed up leadership establishment
2020-05-31 21:27:16 -04:00
Mahmood Ali 1af32e65bc clarify rpc consistency readiness comment 2020-05-31 21:26:41 -04:00
Mahmood Ali 0819ea60ea
Apply suggestions from code review
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-05-31 21:04:39 -04:00
Mahmood Ali ccc89f940a terminate leader goroutines on shutdown
Ensure that nomad steps down (and terminate leader goroutines) on
shutdown, when the server is the leader.

Without this change, `monitorLeadership` may handle `shutdownCh` event
and exit early before handling the raft `leaderCh` event and end up
leaking leadership goroutines.
2020-05-26 10:18:10 -04:00
Mahmood Ali aa8e79e55b Reorder leadership handling
Start serving RPC immediately after leader components are enabled, and
move clean up to the bottom as they don't block leadership
responsibilities.
2020-05-21 08:30:31 -04:00
Mahmood Ali 1cf1114627 apply the same change to consul revocation 2020-05-21 08:30:31 -04:00
Mahmood Ali 6e749d12a0 on leadership establishment, revoke Vault tokens in background
Establishing leadership should be very fast and never make external API
calls.

This fixes a situation where there is a long backlog of Vault tokens to
be revoked on when leadership is gained.  In such case, revoking the
tokens will significantly slow down leadership establishment and slow
down processing.  Worse, the revocation call does not honor leadership
`stopCh` signals, so it will not stop when the leader loses leadership.
2020-05-21 07:38:27 -04:00
Tim Gross 3aa761b151
Periodic GC for volume claims (#7881)
This changeset implements a periodic garbage collection of CSI volumes
with missing allocations. This can happen in a scenario where a node
update fails partially and the allocation updates are written to raft
but the evaluations to GC the volumes are dropped. This feature will
cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1
get any stray claims cleaned up.
2020-05-11 08:20:50 -04:00
Tim Gross 801ebcfe8d
periodic GC for CSI plugins (#7878)
This changeset implements a periodic garbage collection of unused CSI
plugins. Plugins are self-cleaning when the last allocation for a
plugin is stopped, but this feature will cover any missing edge cases
and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins
cleaned up.
2020-05-06 16:49:12 -04:00
Tim Gross a7a64443e1
csi: move volume claim release into volumewatcher (#7794)
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers

The volume watcher is enabled on leader step-up and disabled on leader
step-down.

The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
2020-04-30 09:13:00 -04:00
Mahmood Ali 367133a399 Use latest raft patterns 2020-02-13 18:56:52 -05:00
Seth Hoenig 6bfa50acdc nomad: remove unused default schedular variable
This is from a merge conflict resolution that went the wrong direction.

I assumed the block had been added, but really it had been removed. Now,
it is removed once again.
2020-01-31 19:06:37 -06:00
Seth Hoenig d3cd6afd7e nomad: min cluster version for connect ACLs is now v0.10.4 2020-01-31 19:06:19 -06:00
Seth Hoenig 4ee55fcd6c nomad,client: apply more comment/style PR tweaks 2020-01-31 19:04:52 -06:00
Seth Hoenig 8219c78667 nomad: handle SI token revocations concurrently
Be able to revoke SI token accessors concurrently, and also
ratelimit the requests being made to Consul for the various
ACL API uses.
2020-01-31 19:04:14 -06:00
Seth Hoenig 9df33f622f nomad: proxy requests for Service Identity tokens between Clients and Consul
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
2020-01-31 19:03:53 -06:00
Seth Hoenig 2b66ce93bb nomad: ensure a unique ClusterID exists when leader (gh-6702)
Enable any Server to lookup the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.

The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
2020-01-31 19:03:26 -06:00
Mahmood Ali 90cae566e5
Merge pull request #6935 from hashicorp/b-default-preemption-flag
scheduler: allow configuring default preemption for system scheduler
2020-01-28 15:11:06 -05:00
Mahmood Ali af17b4afc7 Support customizing full scheduler config 2020-01-28 14:51:42 -05:00
Mahmood Ali ded4233c27 tweak leadership flapping log messages 2020-01-28 09:49:36 -05:00