open-nomad

Author	SHA1	Message	Date
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139). Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners); both are shared among goroutines and mutated in data-racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data-race-safe ways, but that isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config, and all access must go through the `Client.configLock` mutex. 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct, all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication, but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward-incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151). Bumping the compile-time requirement to Go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Lars Lehtonen	a80df0480e	testing: fix dropped test errors in command/agent (#13926)	2022-07-28 11:04:31 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621). Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for the connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add a `nomad.agent.http.exceeded` metric counting the number of HTTP connections exceeding the concurrency limit.	2022-07-20 14:12:21 -04:00
Kevin Schoonover	544c276128	parse ACL token from authorization header (#12534)	2022-06-06 15:51:02 -04:00
Lars Lehtonen	81bb1ef030	command/agent: check err before close (#12574)	2022-04-15 08:54:03 -04:00
Tim Gross	1724765096	api: use `cleanhttp.DefaultPooledTransport` for default API client (#12492). We expect every Nomad API client to use a single connection to any given agent, so take advantage of keep-alive by switching the default transport to `DefaultPooledClient`. Provide a facility to close idle connections for testing purposes. Restores the previously reverted #12409. Co-authored-by: Ben Buzbee <bbuzbee@cloudflare.com>	2022-04-06 16:14:53 -04:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	15f9d54dea	api: prevent excessive CPU load on job parse. Add a new namespace ACL requirement for the /v1/jobs/parse endpoint and return early if HCLv2 parsing fails. The endpoint now requires the new `parse-job` ACL capability or `submit-job`.	2022-02-09 19:51:47 -05:00
Kevin Schoonover	5d9a506bc0	agent: support multiple http addresses in addresses.http (#11582)	2022-01-03 09:33:53 -05:00
Tim Gross	7770eda3f1	config: fix test-only failures in UI handler setup (#11571). The `TestHTTPServer_Limits_Error` test never starts the agent, so it had an incomplete configuration, which caused panics in the test. Fix the configuration. PR #11555 had a branch name like `f-ui-*`, which caused CI to skip the unit tests over the HTTP handler setup, so this wasn't caught in PR review.	2021-11-24 16:19:04 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532). This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations resulting from the request use the same eval priority, the priority is shared with the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user-supplied value is validated within the agent HTTP handler, removing the need to pass invalid requests to the server. The register and deregister opts functions now allow setting the eval priority on requests. The change also includes a small change so that the DeregisterOpts function handles nil opts, bringing it in line with RegisterOpts.	2021-11-23 09:23:31 +01:00
Mahmood Ali	aa77c2731b	tests: use standard library testing.TB. Glint pulled in an updated version of mitchellh/go-testing-interface, which broke some existing tests because the update added a Parallel() method to testing.T. This switches to the standard library testing.TB, which doesn't have a Parallel() method.	2021-06-09 16:18:45 -07:00
Isabel Suchanek	dfaef2468c	cli: add monitor flag to deployment status. Adding '-verbose' will print out the allocation information for the deployment. This also changes the job run command so that it now blocks until the deployment is complete and adds timestamps to the output so that it's more in line with the output of node drain. This uses glint to print in place when running in a tty. Because glint doesn't yet support cmd/powershell, Windows workflows use a different library to print in place, which results in slightly different formatting: 1) different margins, and 2) no spinner indicating deployment in progress.	2021-06-09 16:18:45 -07:00
Chris Baker	b11a092d2d	added missing import from command/agent	2021-04-02 13:53:28 +00:00
Chris Baker	21bc48ca29	json handles were moved to a new package in #10202. This was unnecessary after refactoring, so this moves them back to their original location in package structs.	2021-04-02 13:31:10 +00:00
Chris Baker	436d46bd19	Merge branch 'main' into f-node-drain-api	2021-04-01 15:22:57 -05:00
Tim Gross	aec5337862	CSI: HTTP handlers for create/delete/list	2021-03-31 16:37:09 -04:00
Tim Gross	b0d2eed932	redirect from HTTP root to UI should include query params. The OTT feature relies on having a query parameter for a one-time token which gets handled by the UI. We need to make sure that query param is preserved when redirecting from the root URL to the `/ui/` URI.	2021-03-26 14:54:41 -04:00
Chris Baker	770c9cecb5	restored Node.Sanitize() for RPC endpoints; multiple other updates from code review	2021-03-26 17:03:15 +00:00
Chris Baker	ff0b9a4d3e	added benchmark test for JSON encoding extensions	2021-03-23 20:23:06 +00:00
Chris Baker	cb540ed691	added tests that the API doesn't leak Node.SecretID; added more documentation on JSON encoding to the contributing guide	2021-03-23 18:09:20 +00:00
Seth Hoenig	40d36fc0ec	agent: revert use of http connlimit. https://github.com/hashicorp/nomad/pull/9608 introduced the use of the built-in HTTP 429 response handler provided by go-connlimit. There is concern, though, around plausible DoS attacks that need to be addressed, so this PR reverts that functionality. It keeps a fix in the tests around the use of an HTTPS-enabled client for when the server is listening on HTTPS. Previously, the tests would fail deterministically with io.EOF because that's how the TLS server terminates invalid connections. Now, the result is much less deterministic: the state of the client connection and the server socket depends on when the connection is closed and how far along the handshake was.	2020-12-14 14:40:14 -06:00
Seth Hoenig	a28cd45988	client: fix plumbing of testing object into helper	2020-12-10 11:04:38 -06:00
Seth Hoenig	2cc5787f97	client: fix https test cases in client rate limits	2020-12-10 09:20:28 -06:00
Dennis Schön	a9c97d9257	use os.ErrDeadlineExceeded in tests	2020-12-07 10:40:28 -05:00
Michael Schurter	6890cffd7a	unify boolean parameter parsing	2020-10-14 12:23:25 -07:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes}. Fixes #9017. The ?resources=true query parameter includes resources in the object stub listings. Specifically: for `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` fields are included; for `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Mahmood Ali	d4f385d6e1	Upgrade to golang 1.15 (#8858). Starting with golang 1.15, setting a Ctty value results in a `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458. This commit lifts the fix in https://github.com/creack/pty/pull/97.	2020-09-09 15:59:29 -04:00
James Rasell	ae0fb98c6b	api: return custom error if API attempts to decode empty body.	2020-05-19 15:46:31 +02:00
Mahmood Ali	b8fb32f5d2	http: adjust log level for request failure. Failed requests due to API client errors are to be marked as DEBUG. The Error log level should be reserved to signal problems with the cluster that are actionable for Nomad system operators. Logs due to misbehaving API clients don't represent a system-level problem and seem spurious to Nomad maintainers at best. These log messages can also be attack vectors for denial-of-service attacks by filling servers' disk space with spurious log messages.	2020-04-22 16:19:59 -04:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Michael Schurter	e903501e65	test: improve error messages when failing	2020-02-07 15:50:53 -08:00
Michael Schurter	9905dec6a3	test: workaround limits race	2020-02-07 15:50:53 -08:00
Michael Schurter	19a1932bbb	test: wait longer than timeout. The 1s timeout raced with the 1s deadline it was trying to detect.	2020-02-07 15:50:53 -08:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections. Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: `{https,rpc}_handshake_timeout` and `{http,rpc}_max_conns_per_client`. The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to the first byte being read, so RPC connections with TLS enabled have `rpc_handshake_timeout * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but it can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Drew Bailey	4ced73875b	leave acl checking to rpc endpoints; fix test expectation; test wrapNonJSON	2020-01-09 15:15:08 -05:00
Drew Bailey	acd97d0731	Merge pull request #6670 from hashicorp/api/fallthrough-test; test rootfallthrough handler	2019-11-13 10:51:31 -05:00
Lars Lehtonen	1dbf44bc40	command/agent: Prune Dead Code (#6682) * remove unused MockPeriodicJob() from tests * remove unused getIndex() from tests * remove unused checkIndex() from tests * remove unused assertIndex() from tests * remove unused Agent.findLoopbackDevice()	2019-11-13 08:20:01 -05:00
Drew Bailey	f5310ff63f	fix so assertions are test case driven	2019-11-12 14:28:21 -05:00
Drew Bailey	f989f38594	test /ui/ path	2019-11-11 12:12:42 -05:00
Drew Bailey	a0548824f3	test rootfallthrough handler	2019-11-11 12:08:44 -05:00
Michael Schurter	9f179e9fab	Fix HTTP code for permission denied errors. Fixes #3697. The existing code and test case only covered the leader behavior. When querying against non-leaders the error has an "rpc error: " prefix. To provide consistency in HTTP error responses I also strip the "rpc error: " prefix for 403 responses, as they offer no beneficial additional information (and in theory disclose a tiny bit of data to unauthorized users, but it would be a pretty weird bit of data to use in a malicious way).	2018-01-09 15:25:53 -08:00
Chelsea Komlo	2dfda33703	Nomad agent reload TLS configuration on SIGHUP (#3479) * Allow server TLS configuration to be reloaded via SIGHUP * dynamic tls reloading for nomad agents * code cleanup and refactoring * ensure keyloader is initialized, add comments * allow downgrading from TLS * initialize keyloader if necessary * integration test for tls reload * fix up test to assert success on reloaded TLS configuration * failure in loading a new TLS config should remain at current * Reload only the config if agent is already using TLS * reload agent configuration before specific server/client * lock keyloader before loading/caching a new certificate * introduce a get-or-set method for keyloader * fixups from code review * fix up linting errors * fixups from code review * add lock for config updates; improve copy of tls config * GetCertificate only reloads certificates dynamically for the server * config updates/copies should be on agent * improve http integration test * simplify agent reloading storing a local copy of config * reuse the same keyloader when reloading * Test that server and client get reloaded but keep keyloader * Keyloader exposes GetClientCertificate as well for outgoing connections * Fix spelling * correct changelog style	2017-11-14 17:53:23 -08:00
Alex Dadgar	dbc014b360	Standardize retrieving a free port into a helper package	2017-10-23 16:48:20 -07:00
Alex Dadgar	d6b970eec9	Handle invalid token as well	2017-10-12 15:39:05 -07:00
Alex Dadgar	0b538ded83	403 instead of 500 for permission denied	2017-10-12 14:10:20 -07:00
Armon Dadgar	5c94e7e99f	agent: thread through token for ACL endpoint tests	2017-09-04 13:05:53 -07:00
Armon Dadgar	4107335cb2	agent: Adding X-Nomad-Token header parsing	2017-09-04 13:05:53 -07:00
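
The copy-on-write config handling described in commit 3b57df33e3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nomad's implementation: the trimmed `Config` type, the `copySlice`/`copyMap` helpers, and the `GetConfig`/`UpdateConfig` method names are hypothetical; only `Client.config` and `Client.configLock` come from the commit message. It shows the two rules the message states: the mutex guards only the pointer, so every mutation happens on a deep copy that is then swapped in, and copying keeps nil containers nil while giving empty containers a fresh empty container.

```go
package main

import "sync"

// Config is a hypothetical, heavily trimmed stand-in for the client
// configuration; the real struct has many more fields.
type Config struct {
	Servers []string
	Meta    map[string]string
}

// copySlice keeps nil as nil but returns a new zero-length slice for an
// empty one, instead of "downgrading" it to nil.
func copySlice(s []string) []string {
	if s == nil {
		return nil
	}
	out := make([]string, len(s))
	copy(out, s)
	return out
}

// copyMap applies the same nil-versus-empty rule to maps.
func copyMap(m map[string]string) map[string]string {
	if m == nil {
		return nil
	}
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Copy returns a deep copy that can be mutated without racing readers
// of the original.
func (c *Config) Copy() *Config {
	if c == nil {
		return nil
	}
	return &Config{Servers: copySlice(c.Servers), Meta: copyMap(c.Meta)}
}

// Client guards only the config pointer; a published Config is treated
// as immutable.
type Client struct {
	configLock sync.Mutex
	config     *Config
}

// GetConfig returns the current config pointer. Callers must not mutate it.
func (c *Client) GetConfig() *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	return c.config
}

// UpdateConfig deep-copies the current config, applies fn to the copy, and
// swaps the pointer while the lock is held. It assumes config is non-nil.
func (c *Client) UpdateConfig(fn func(*Config)) *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	newConf := c.config.Copy()
	fn(newConf)
	c.config = newConf
	return newConf
}
```

A goroutine that called `GetConfig` before an update keeps reading its old, unchanged pointer, which matches the commit's note that alloc runners holding the old pointer do not see config updates.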


69 commits