open-nomad

Author	SHA1	Message	Date
Michael Schurter	9905dec6a3	test: workaround limits race	2020-02-07 15:50:53 -08:00
Michael Schurter	19a1932bbb	test: wait longer than timeout The 1s timeout raced with the 1s deadline it was trying to detect.	2020-02-07 15:50:53 -08:00
Michael Schurter	fd81208db7	test: fix flaky health test Test set Agent.client=nil which prevented the client from being shutdown. This leaked goroutines and could cause panics due to the leaked client goroutines logging after their parent test had finished. Removed ACLs from the server test because I couldn't get it to work with the test agent, and it tested very little.	2020-02-07 15:50:53 -08:00
Michael Schurter	2896f78f77	client: fix race accessing Node.status * Call Node.Canonicalize once when Node is created. * Lock when accessing fields mutated by node update goroutine	2020-02-07 15:50:47 -08:00
Drew Bailey	d830998572	agent Profile req nil check s.agent.Server() clean up logic and tests	2020-02-03 13:20:05 -05:00
Drew Bailey	c4f45f9bde	Fix panic when monitoring a local client node Fixes a panic when accessing a.agent.Server() when agent is a client instead. This pr removes a redundant ACL check since ACLs are validated at the RPC layer. It also nil checks the agent server and uses Client() when appropriate.	2020-02-03 13:20:04 -05:00
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	076cb4754e	agent: re-enable the server in dev mode	2020-01-31 19:04:19 -06:00
Seth Hoenig	8219c78667	nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses.	2020-01-31 19:04:14 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to receive RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to the first byte being read, so RPC connections with TLS enabled have `rpc_handshake_timeout * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Mahmood Ali	90cae566e5	Merge pull request #6935 from hashicorp/b-default-preemption-flag scheduler: allow configuring default preemption for system scheduler	2020-01-28 15:11:06 -05:00
Mahmood Ali	af17b4afc7	Support customizing full scheduler config	2020-01-28 14:51:42 -05:00
Nick Ethier	5636203d4e	consul: fix var name from rebase	2020-01-27 14:00:19 -05:00
Nick Ethier	0ae99b3c9c	consul: fix var name from rebase	2020-01-27 12:55:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Mahmood Ali	1ab682f622	scheduler: allow configuring default preemption for system scheduler Some operators want a greater control over when preemption is enabled, especially during an upgrade to limit potential side-effects.	2020-01-13 08:30:49 -05:00
Drew Bailey	f97d2e96c1	refactor api profile methods comment why we ignore errors parsing params	2020-01-09 15:15:12 -05:00
Drew Bailey	b702dede49	adds qc param, address pr feedback	2020-01-09 15:15:11 -05:00
Drew Bailey	085659f6ff	condense table test	2020-01-09 15:15:10 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	1b8af920f3	address pr feedback	2020-01-09 15:15:09 -05:00
Drew Bailey	4ced73875b	leave acl checking to rpc endpoints fix test expectation test wrapNonJSON	2020-01-09 15:15:08 -05:00
Drew Bailey	279512c7f8	provide helpful error, cleanup logic	2020-01-09 15:15:08 -05:00
Drew Bailey	7bbba613a5	prevent doubly wrapping with rpc error	2020-01-09 15:15:07 -05:00
Drew Bailey	fd42020ad6	RPC server EnableDebug option Passes in agent enable_debug config to nomad server and client configs. This allows for rpc endpoints to have more granular control if they should be enabled or not in combination with ACLs. enable debug on client test	2020-01-09 15:15:07 -05:00
Drew Bailey	9a80938fb1	region forwarding; prevent recursive forwards for impossible requests prevent region forwarding loop, backfill tests fix failing test	2020-01-09 15:15:06 -05:00
Drew Bailey	46121fe3fd	move shared structs out of client and into nomad	2020-01-09 15:15:05 -05:00
Drew Bailey	3672414888	test pprof headers and profile methods tidy up, add comments clean up seconds param assignment	2020-01-09 15:15:04 -05:00
Drew Bailey	fc37448683	warn when enabled debug is on when registering m -> a receiver name return codederrors, fix query	2020-01-09 15:15:04 -05:00
Drew Bailey	62eb2d76a6	acl and debug test table rename implementation method	2020-01-09 15:15:03 -05:00
Drew Bailey	50288461c9	Server request forwarding for Agent.Profile Return rpc errors for profile requests, set up remote forwarding to target leader or server id for profile requests. server forwarding, endpoint tests	2020-01-09 15:15:03 -05:00
Drew Bailey	901f362858	test for known pprof endpoints	2020-01-09 15:15:02 -05:00
Drew Bailey	49ad5fbc85	agent pprof endpoints wip, agent endpoint and client endpoint for pprof profiles agent endpoint test	2020-01-09 15:15:02 -05:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Michael Schurter	3008473f9b	Merge branch 'master' into release-0102	2019-12-04 14:13:34 -08:00
Mahmood Ali	7b8cfee162	tests: deflake TestHTTP_FreshClientAllocMetrics The test asserts that alloc counts get reported accurately in metrics by inspecting the metrics endpoint directly. Sadly, the metrics as collected by `armon/go-metrics` seem to be stateful and may contain info from other tests. This means that the test can fail depending on the order of returned metrics. Inspecting the metrics output of one failing run, you can see the duplicate gauge entries but for different node_ids: ``` { "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal", "Value": 10, "Labels": { "datacenter": "dc1", "node_class": "none", "node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a" } }, { "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal", "Value": 0, "Labels": { "datacenter": "dc1", "node_class": "none", "node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c" } }, ```	2019-11-22 18:41:21 -05:00
Nomad Release bot	db6420367d	Generate files for 0.10.2-rc1 release	2019-11-22 18:42:49 +00:00
Mahmood Ali	be6c60455e	Merge pull request #6669 from hashicorp/b-cors-allow-credentials Allow UI to query client directly for task logs/state	2019-11-20 15:14:01 -05:00
Preetha	42c1c85285	Merge pull request #6421 from hashicorp/b-acl-bootstrap-codes api: acl bootstrap errors aren't 500	2019-11-20 10:36:08 -06:00
Preetha	be4a51d5b8	Merge pull request #6349 from hashicorp/b-host-stats client: Return empty values when host stats fail	2019-11-20 10:13:02 -06:00
Mahmood Ali	6f8bb5e90b	api: acl bootstrap errors aren't 500 Noticed that ACL endpoints return 500 status code for user errors. This is confusing and can lead to false monitoring alerts. Here, I introduce a concept of RPCCoded errors to be returned by RPC that signal a code in addition to error message. Codes for now match HTTP codes to ease reasoning. ``` $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9)) $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9)) ```	2019-11-19 15:51:57 -05:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Drew Bailey	9b63828658	serverID to target remote leader or server handle the case where we request a server-id which is this current server update docs, error on node and server id params more accurate names for tests use shared no leader err, formatting rm bad comment remove redundant variable	2019-11-14 10:07:35 -05:00
Drew Bailey	b644e1f47d	add server-id to monitor specific server	2019-11-14 09:53:41 -05:00
Drew Bailey	acd97d0731	Merge pull request #6670 from hashicorp/api/fallthrough-test test rootfallthrough handler	2019-11-13 10:51:31 -05:00
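
The per-client connection limit described in commit c82b14b0c4 can be illustrated with a minimal sketch. This is not Nomad's implementation: `ConnLimiter`, `NewConnLimiter`, `Acquire`, and `Release` are hypothetical names invented for this example. The only behavior taken from the commit message is that a single remote peer is capped at a configured number of connections and that a limit of `0` disables the check.

```go
package main

import (
	"fmt"
	"sync"
)

// ConnLimiter is a hypothetical per-client connection counter.
// It illustrates the idea only and is not Nomad's actual code.
type ConnLimiter struct {
	mu    sync.Mutex
	max   int            // maximum connections per client; 0 disables the limit
	conns map[string]int // open connections keyed by remote IP
}

// NewConnLimiter returns a limiter allowing max connections per client.
func NewConnLimiter(max int) *ConnLimiter {
	return &ConnLimiter{max: max, conns: make(map[string]int)}
}

// Acquire reports whether a new connection from ip is allowed,
// and counts the connection if it is.
func (l *ConnLimiter) Acquire(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.max > 0 && l.conns[ip] >= l.max {
		return false
	}
	l.conns[ip]++
	return true
}

// Release frees one slot previously taken by Acquire for ip.
func (l *ConnLimiter) Release(ip string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.conns[ip] > 0 {
		l.conns[ip]--
	}
	if l.conns[ip] == 0 {
		delete(l.conns, ip) // avoid keeping entries for idle clients
	}
}

func main() {
	l := NewConnLimiter(2)
	fmt.Println(l.Acquire("10.0.0.1")) // true
	fmt.Println(l.Acquire("10.0.0.1")) // true
	fmt.Println(l.Acquire("10.0.0.1")) // false: limit of 2 reached
	l.Release("10.0.0.1")
	fmt.Println(l.Acquire("10.0.0.1")) // true: a slot was freed
	fmt.Println(l.Acquire("10.0.0.2")) // true: other clients are counted separately
}
```

In a real server, `Acquire` would presumably run when a connection is accepted and `Release` when it closes. The reserved pool of 20 connections for Raft and non-streaming RPCs mentioned in the commit is not modeled here.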

1491 commits