Commit graph

3946 commits

Author SHA1 Message Date
Chris Dickson 4d8ba272d1 client: expose allocated CPU per task (#6784) 2019-12-09 15:40:22 -05:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Mahmood Ali ded2a725db
Merge pull request #6788 from hashicorp/b-timeout-logmon-stop
logmon: add timeout to RPC operations
2019-12-06 19:12:06 -05:00
Danielle Lancashire d2075ebae9 spellcheck: Fix spelling of retrieve 2019-12-05 18:59:47 -06:00
Mahmood Ali b2ae27863e
Merge pull request #6779 from hashicorp/r-aws-fingerprint-via-library
Use AWS SDK to access EC2 Metadata
2019-12-02 13:30:51 -05:00
Mahmood Ali 83089feff5 logmon: add timeout to RPC operations
Add an RPC timeout for logmon.  In
https://github.com/hashicorp/nomad/issues/6461#issuecomment-559747758 ,
`logmonClient.Stop` locked up and indefinitely blocked the task runner
destroy operation.

This is an incremental improvement.  We still need to follow up to
understand how we got to that state, and the full impact of locked-up
Stop and its link to pending allocations on restart.
2019-12-02 10:33:05 -05:00
Mahmood Ali 293276a457 fingerprint code refactor
Some code cleanup:

* Use a field for setting EC2 metadata instead of env-vars in testing;
but keep environment variables for backward compatibility reasons

* Update tests to use testify
2019-11-26 10:51:28 -05:00
Mahmood Ali 1e48f8e20d fingerprint: avoid api query if config overrides it 2019-11-26 10:51:28 -05:00
Mahmood Ali 5bb9089431 fingerprint: use ec2metadata package 2019-11-26 10:51:27 -05:00
Lars Lehtonen 0d344e8578
client: fix use of T.Fatal inside TestFS_logsImpl_NoFollow() goroutine. 2019-11-25 23:51:28 -08:00
Mahmood Ali e89108fb01 fixup! tests: don't assume eth0 network is available 2019-11-21 08:28:20 -05:00
Mahmood Ali 443804b5c7 tests: don't assume eth0 network is available
TestClient_UpdateNodeFromFingerprintKeepsConfig checks a test node
network interface, which is hardcoded to `eth0` and is updated
asynchronously.  This causes flakiness when eth0 isn't available.

Here, we hardcode the value to an arbitrary network interface.
2019-11-20 20:37:30 -05:00
Mahmood Ali ed3f1957e7 tests: run TestClient_WatchAllocs in non-linux environments 2019-11-20 20:37:29 -05:00
Mahmood Ali 521f51a929 testS: fix TestClient_RestoreError
When spinning a second client, ensure that it uses new driver
instances, rather than reuse the already shutdown unhealthy drivers from
first instance.

This speeds up tests significantly, but cutting ~50 seconds or so, the
timeout in NewClient until drivers fingerprints.  They never do because
drivers were shutdown already.
2019-11-20 20:37:28 -05:00
Mahmood Ali 4efb71cf0c tests: remove TestClient_RestoreError test
TestClient_RestoreError is very slow, taking ~81 seconds.

It has few problematic patterns.  It's unclear what it tests, it
simulates a failure condition where all state db lookup fails and
asserts that alloc fails.  Though starting from
https://github.com/hashicorp/nomad/pull/6216 , we don't fail allocs in
that condition but rather restart them.

Also, the drivers used in second client `c2` are the same singleton
instances used in `c1` and already shutdown.  We ought to start healthy
new driver instances.
2019-11-20 20:37:27 -05:00
Preetha be4a51d5b8
Merge pull request #6349 from hashicorp/b-host-stats
client: Return empty values when host stats fail
2019-11-20 10:13:02 -06:00
Lang Martin aa985ebe21
getter: allow the gcs download scheme (#6692) 2019-11-19 09:10:56 -05:00
Nick Ethier bd454a4c6f
client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Drew Bailey b644e1f47d
add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey f4a7e3dc75
coordinate closing of doneCh, use interface to simplify callers
comments
2019-11-05 11:44:26 -05:00
Drew Bailey fe542680dc
log-json -> json
fix typo command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

Update command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

address feedback, lock to prevent send on closed channel

fix lock/unlock for dropped messages
2019-11-05 09:51:59 -05:00
Drew Bailey ddfa20b993
address feedback, fix gauge metric name 2019-11-05 09:51:57 -05:00
Drew Bailey 298b8358a9
move forwarded monitor request into helper 2019-11-05 09:51:56 -05:00
Drew Bailey 318b6c91bf
monitor command takes no args
rm extra new line

fix lint errors

return after close

fix, simplify test
2019-11-05 09:51:55 -05:00
Drew Bailey c7b633b6c1
lock in sub select
rm redundant lock

wip to use framing

wip switch to stream frames
2019-11-05 09:51:54 -05:00
Drew Bailey 17d876d5ef
rename function, initialize log level better
underscores instead of dashes for query params
2019-11-05 09:51:53 -05:00
Drew Bailey 8178beecf0
address feedback, use agent_endpoint instead of monitor 2019-11-05 09:51:53 -05:00
Drew Bailey db65b1f4a5
agent:read acl policy for monitor 2019-11-05 09:51:52 -05:00
Drew Bailey 2533617888
rpc acl tests for both monitor endpoints 2019-11-05 09:51:51 -05:00
Drew Bailey a45ae1cd58
enable json formatting, use queryoptions 2019-11-05 09:51:49 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Drew Bailey e076204820
get local rpc endpoint working 2019-11-05 09:51:48 -05:00
Drew Bailey 976c43157c
remove log_writer
prefix output with proper spacing

update gzip handler, adjust first byte flow to allow gzip handler bypass

wip, first stab at wiring up rpc endpoint
2019-11-05 09:51:48 -05:00
Michael Schurter 9fed8d1bed client: fix panic from 0.8 -> 0.10 upgrade
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.

I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
2019-11-01 07:47:03 -07:00
Lars Lehtonen 4ed9427c77 client/allocwatcher: fix dropped test error (#6592) 2019-10-31 08:29:25 -04:00
Michael Schurter eba4d4cd6f vault: remove dead lease code 2019-10-25 15:08:35 -07:00
Michael Schurter 39437a5c5b
Merge branch 'master' into release-0100 2019-10-22 08:17:57 -07:00
Michael Schurter b6bb561854 cleanup post 0.10.0 release 2019-10-22 07:48:09 -07:00
Nomad Release bot 3e6c9dd40e Generate files for 0.10.0 release 2019-10-22 12:34:56 +00:00
Mahmood Ali 262dcb0842 Revert "lint: ignore generated windows syscall wrappers"
This reverts commit 482862e6ab0f8db748367bb1eefc2efd11fbe11a.
2019-10-22 08:23:44 -04:00
Michael Schurter 460bd63db0 client: expose group network ports in env vars
Fixes #6375

Intentionally omitted IPs prior to 0.10.0 release to minimize changes
and risk.
2019-10-21 13:28:35 -07:00
Michael Schurter 8634533e82 client: expose group network ports in env vars
Fixes #6375

Intentionally omitted IPs prior to 0.10.0 release to minimize changes
and risk.
2019-10-21 12:31:13 -07:00
Michael Schurter bb82f365ff connect: upgrade to envoy 1.11.2 and add sha
Append the Docker image sha to the Envoy image to ensure users default
to using the version that Nomad was tested against.
2019-10-18 10:16:58 -07:00
Michael Schurter ee5ea3ecc7 connect: upgrade to envoy 1.11.2 and add sha
Append the Docker image sha to the Envoy image to ensure users default
to using the version that Nomad was tested against.
2019-10-18 07:46:53 -07:00
Mahmood Ali 4e4a9b252c
Merge pull request #6290 from hashicorp/r-generated-code-refactor
dev: avoid codecgen code in downstream projects
2019-10-15 08:22:31 -04:00
Michael Schurter 2992cb80b0 Remove 0.10.0-rc1 generated files 2019-10-10 13:31:42 -07:00
Nomad Release bot 3007f1662e Generate files for 0.10.0-rc1 release 2019-10-10 19:08:23 +00:00
Michael Schurter f54f1cb321
Revert "Revert "Use joint context to cancel prestart hooks"" 2019-10-08 11:34:09 -07:00
Michael Schurter 81a30ae106
Revert "Use joint context to cancel prestart hooks" 2019-10-08 11:27:08 -07:00
Mahmood Ali 4b2ba62e35 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00