Commit graph

23 commits

Author SHA1 Message Date
Tim Gross 17aee4d69c
fingerprint: don't clear Consul/Vault attributes on failure (#14673)
Clients periodically fingerprint Vault and Consul to ensure the server has
updated attributes in the client's fingerprint. If the client can't reach
Vault/Consul, the fingerprinter clears the attributes and requires a node
update. Although this seems like correct behavior so that we can detect
intentional removal of Vault/Consul access, it has two serious failure modes:

(1) If a local Consul agent is restarted to pick up configuration changes and the
client happens to fingerprint at that moment, the client will update its
fingerprint and result in evaluations for all its jobs and all the system jobs
in the cluster.

(2) If a client loses Vault connectivity, the same thing happens. But the
consequences are much worse in the Vault case because Vault is not run as a
local agent, so Vault connectivity failures are highly correlated across the
entire cluster. A 15 second Vault outage will cause a new `node-update`
evalution for every system job on the cluster times the number of nodes, plus
one `node-update` evaluation for every non-system job on each node. On large
clusters of 1000s of nodes, we've seen this create a large backlog of evaluations.

This changeset updates the fingerprinting behavior to keep the last fingerprint
if Consul or Vault queries fail. This prevents a storm of evaluations at the
cost of requiring a client restart if Consul or Vault is intentionally removed
from the client.
2022-09-23 14:45:12 -04:00
Dave May 3c04d7927b
cli: refactor operator debug capture (#11466)
* debug: refactor Consul API collection
* debug: refactor Vault API collection
* debug: cleanup test timing
* debug: extend test to multiregion
* debug: save cmdline flags in bundle
* debug: add cli version to output
* Add changelog entry
2021-11-05 19:43:10 -04:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Danielle Tomlinson 7fca934509 chore: General Cleanup 2019-01-17 18:43:14 +01:00
Michael Schurter 9692271926 Update testutil/vault.go
Co-Authored-By: dantoml <dani@tomlinson.io>
2019-01-17 18:43:14 +01:00
Danielle Tomlinson 160c8d80e8 testutil: Start vault in the same routine as waiting
This is a workaround for the windows process model.

Go os/exec does not pass the parent process handle to the child
processes STARTUPINFO struct, this means that unless we wait in
the _same_ execution context as Starting the process, the
handle will be lost, and we cannot kill it without regaining
a handle.

A better long term solution would be a higher level process
abstraction that uses windows.CreateProcess on windows.
2019-01-17 18:43:13 +01:00
Danielle Tomlinson 5d54a0408f fingerprint: Limit vault shutdown waiting
When vault is installed through chocolatey, it also installs a shim that
will not pass kill signals to the child. This means the process will
never actually terminate, and we lose the process handle.

Here, rather than waiting forever, we timeout fast.
2019-01-17 18:43:13 +01:00
Alex Dadgar e546215046 add a vault test matrix 2018-09-19 10:18:10 -07:00
Alex Dadgar cb0d0ef009 move to consul freeport implementation 2017-10-23 16:51:40 -07:00
Alex Dadgar dbc014b360 Standardize retrieving a free port into a helper package 2017-10-23 16:48:20 -07:00
Michael Schurter a66c53d45a Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar b5d782c493 Use LookupPath to add the exe extension 2017-07-25 17:42:36 -07:00
Alex Dadgar a9c786a4fe Make test Vault pick random ports 2017-07-25 17:40:59 -07:00
Michael Schurter 5f1f91a46c Use go-testing-interface instead of testing
This drops the testings stdlib pkg from our dependencies. Saves a
whopping 46kb on our binary (was really hoping for more of a win there),
but also avoids potential ugliness with how testing sets flags.
2017-07-25 15:35:19 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar 57f643b252 Use a lock to increment the vault port 2017-05-16 13:25:07 -04:00
Alex Dadgar 09502334e1 vault testutil changes bin name based on OS 2017-04-04 15:14:35 -07:00
Alex Dadgar 183d0bdd15 Cleanup and skip test 2017-01-27 15:06:01 -08:00
Alex Dadgar 751aa114bf Fix Vault parsing of booleans 2016-10-10 18:04:39 -07:00
Alex Dadgar 48696ba0cc Use tomb to shutdown
Token revocation

Remove from the statestore

Revoke tokens

Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled

update server interface to allow enable/disable and config loading

test the new functions

Leader revoke

Use active
2016-08-28 14:06:25 -07:00
Alex Dadgar 19be6b57b2 fixes 2016-08-19 20:02:32 -07:00
Alex Dadgar a981fb4e0e test renewal 2016-08-17 16:25:38 -07:00
Alex Dadgar a8efce874f Token renewal and beginning of tests 2016-08-17 16:25:38 -07:00