Commit graph

77 commits

Author SHA1 Message Date
Dave May e89302aa4b
nomad operator debug - add client node filtering arguments (#9331)
* operator debug - add client node filtering arguments

* add WaitForClient helper function

* use RPC in WaitForClient to avoid unnecessary imports

* guard against nil values

* move initialization up and shorten test duration

* cleanup nodeLookupFailCount logic

* only display max node notice if we actually tried to capture nodes
2020-11-12 11:25:28 -05:00
Dave May f37e90be18
Metrics gotemplate support, debug bundle features (#9067)
* add goroutine text profiles to nomad operator debug

* add server-id=all to nomad operator debug

* fix bug from changing metrics from string to []byte

* Add function to return MetricsSummary struct, metrics gotemplate support

* fix bug resolving 'server-id=all' when no servers are available

* add url to operator_debug tests

* removed test section which is used for future operator_debug.go changes

* separate metrics from operator, use only structs from go-metrics

* ensure parent directories are created as needed

* add suggested comments for text debug pprof

* move check down to where it is used

* add WaitForFiles helper function to wait for multiple files to exist

* compact metrics check

Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>

* fix github's silly apply suggestion

Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>
2020-10-14 15:16:10 -04:00
Mahmood Ali 2ab54fe060 gracefully shutdown test server 2020-05-27 08:59:06 -04:00
Mahmood Ali 0a539c629f tests: wait until leadership loop finishes
Reverts d5c7d6e491e36a11159211f5236c19a41bed4d8e .

We actually need to forward the request to ensure that the leader is
properly configured and that establishedLeadership completes.
2020-03-06 14:41:59 -05:00
Mahmood Ali 6daac02548 avoid forwarding leadership checks in tests
The tests only care if a test server recognizes the leader.
2020-03-02 13:47:43 -05:00
Michael Schurter c82b14b0c4 core: add limits to unauthorized connections
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:

 * `{https,rpc}_handshake_timeout`
 * `{http,rpc}_max_conns_per_client`

The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.

The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.

All limits are configurable and may be disabled by setting them to `0`.

This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
2020-01-30 10:38:25 -08:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Mahmood Ali 4b2ba62e35 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00
Mahmood Ali 6cefd8f97e tests: attempt to fix TestAutopilot_CleanupStaleRaftServer
Also add a utility function for waiting for stable leadership
2019-09-04 08:49:33 -04:00
Mahmood Ali fc72fff0ed test helper for registering jobs with acl
Test helper that allows registration of jobs when ACL is activated.
2019-04-30 10:23:56 -04:00
Mahmood Ali 8c82c19831 tests: IsTravis() -> IsCI()
Replace IsTravis() references that is intended for more CI environments
rather than for Travis environment specifically.
2019-02-20 08:21:03 -05:00
Mahmood Ali 33ff8c3e8d tests: expect Docker on AppVeyor
Prepare to run docker on AppVeyor Windows environment
2019-02-20 07:41:47 -05:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Danielle Tomlinson 7fca934509 chore: General Cleanup 2019-01-17 18:43:14 +01:00
Michael Schurter 9692271926 Update testutil/vault.go
Co-Authored-By: dantoml <dani@tomlinson.io>
2019-01-17 18:43:14 +01:00
Danielle Tomlinson 160c8d80e8 testutil: Start vault in the same routine as waiting
This is a workaround for the windows process model.

Go os/exec does not pass the parent process handle to the child
processes STARTUPINFO struct, this means that unless we wait in
the _same_ execution context as Starting the process, the
handle will be lost, and we cannot kill it without regaining
a handle.

A better long term solution would be a higher level process
abstraction that uses windows.CreateProcess on windows.
2019-01-17 18:43:13 +01:00
Danielle Tomlinson 5d54a0408f fingerprint: Limit vault shutdown waiting
When vault is installed through chocolatey, it also installs a shim that
will not pass kill signals to the child. This means the process will
never actually terminate, and we lose the process handle.

Here, rather than waiting forever, we timeout fast.
2019-01-17 18:43:13 +01:00
Mahmood Ali b7d421a149 tests: WaitForRunning checks for pending only
WaitForRunning risks a race condition where the allocation succeeds and
completes before WaitForRunning is called (or while it is running).

Here, I made the behavior match the function documentation.

I considered making it stricter, but callers need to account for
allocation terminating immediately after WaitForRunning terminates
anyway.
2019-01-10 15:36:57 -05:00
Mahmood Ali c3eaa0f4c8 tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter 1c9ccdeab5 tests: fix races caused by sharing a buffer
httptest.ResponseRecorder exposes a bytes.Buffer which we were reading
and writing concurrently to test streaming log APIs. This is a race, so
I wrapped the struct in a lock with some helpers.
2018-10-16 16:56:55 -07:00
Alex Dadgar e546215046 add a vault test matrix 2018-09-19 10:18:10 -07:00
Michael Schurter 6def5bc4f9 client: set host name when migrating over tls
Not setting the host name led the Go HTTP client to expect a certificate
with a DNS-resolvable name. Since Nomad uses `${role}.${region}.nomad`
names ephemeral dir migrations were broken when TLS was enabled.

Added an e2e test to ensure this doesn't break again as it's very
difficult to test and the TLS configuration is very easy to get wrong.
2018-09-05 17:24:17 -07:00
Michael Schurter d3650fb2cd test: build with mock_driver by default
`make release` and `make prerelease` set a `release` tag to disable
enabling the `mock_driver`
2018-04-18 14:45:33 -07:00
Alex Dadgar 345169a3cc
Merge pull request #4105 from hashicorp/b-flaky-deadline-tests
Fix flaky deadline tests
2018-04-03 17:21:38 -07:00
Alex Dadgar af1b185ce4 Fix flaky deadline tests 2018-04-03 16:51:57 -07:00
Oz Katz 1b3eaf70ae
Support custom Consul config for TestServer
Adds a Consul field to the TestServerConfig that allows passing in non-default values for e.g. consul address.
This will allow the TestServer to integrate with Consul's testutil/TestServer.
2018-04-04 01:40:11 +03:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Alex Dadgar d498fa950a Remove fake advertise address and fix TestAPI_OperatorAutopilotServerHealth 2018-03-19 15:49:12 -07:00
Michael Schurter 0ac43a7622 Skip QEMU graceful shutdown test except on Travis
Hopefully we can reuse the SkipSlow helper elsewhere.
2018-01-31 15:47:26 -08:00
Kyle Havlovitz 1c07066064 Add autopilot functionality based on Consul's autopilot 2017-12-18 14:29:41 -08:00
Alex Dadgar cb0d0ef009 move to consul freeport implementation 2017-10-23 16:51:40 -07:00
Alex Dadgar dbc014b360 Standardize retrieving a free port into a helper package 2017-10-23 16:48:20 -07:00
Michael Schurter a66c53d45a Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Chelsea Holland Komlo 09d340be23 fix access for health check 2017-09-15 22:57:31 +00:00
Armon Dadgar 1062637abd testutil: Allow enabling ACLs 2017-09-04 13:07:44 -07:00
Alex Dadgar b5d782c493 Use LookupPath to add the exe extension 2017-07-25 17:42:36 -07:00
Alex Dadgar a9c786a4fe Make test Vault pick random ports 2017-07-25 17:40:59 -07:00
Michael Schurter 5f1f91a46c Use go-testing-interface instead of testing
This drops the testings stdlib pkg from our dependencies. Saves a
whopping 46kb on our binary (was really hoping for more of a win there),
but also avoids potential ugliness with how testing sets flags.
2017-07-25 15:35:19 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependant

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar 789751d84e Parallel 2017-07-21 16:33:04 -07:00
Alex Dadgar 57f643b252 Use a lock to increment the vault port 2017-05-16 13:25:07 -04:00
Michael Schurter aede1478db Create AssertUntil helper func 2017-04-06 17:05:09 -07:00
Alex Dadgar 09502334e1 vault testutil changes bin name based on OS 2017-04-04 15:14:35 -07:00
Alex Dadgar 7a958bdfdd Change testserver binary lookup 2017-04-04 14:45:29 -07:00
Alex Dadgar 183d0bdd15 Cleanup and skip test 2017-01-27 15:06:01 -08:00
Alex Dadgar d82747bd33 Benchmark 2016-12-09 14:44:50 -08:00
Michael Schurter a0d39ddd48 Advertise a non-loopback ip in tests 2016-11-14 14:42:59 -08:00
Alex Dadgar 751aa114bf Fix Vault parsing of booleans 2016-10-10 18:04:39 -07:00
Alex Dadgar 48696ba0cc Use tomb to shutdown
Token revocation

Remove from the statestore

Revoke tokens

Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled

update server interface to allow enable/disable and config loading

test the new functions

Leader revoke

Use active
2016-08-28 14:06:25 -07:00