open-nomad

Author	SHA1	Message	Date
Mahmood Ali	9611324654	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	bc183a3654	tests: run_for is already a string	2020-01-28 14:58:57 -05:00
Mahmood Ali	a0340016b9	client: canonicalize alloc.Job on restore There is a case for always canonicalizing alloc.Job field when canonicalizing the alloc. I'm less certain of implications though, and the job canonicalize hasn't changed for a long time. Here, we special case client restore from database as it's probably the most relevant part. When receiving an alloc from RPC, the data should be fresh enough.	2020-01-28 09:59:05 -05:00
Mahmood Ali	f36cc54efd	actually always canonicalize alloc.Job alloc.Job may be stale as well and need to migrate it. It does cost extra cycles but should be negligible.	2020-01-15 09:02:48 -05:00
Mahmood Ali	b1b714691c	address review comments	2020-01-15 08:57:05 -05:00
Drew Bailey	ff4bfb8809	Merge pull request #6841 from hashicorp/f-agent-pprof-acl Remote agent pprof endpoints	2020-01-10 14:52:39 -05:00
Nick Ethier	1f28633954	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	1b8af920f3	address pr feedback	2020-01-09 15:15:09 -05:00
Drew Bailey	279512c7f8	provide helpful error, cleanup logic	2020-01-09 15:15:08 -05:00
Drew Bailey	fd42020ad6	RPC server EnableDebug option Passes in agent enable_debug config to nomad server and client configs. This allows for rpc endpoints to have more granular control if they should be enabled or not in combination with ACLs. enable debug on client test	2020-01-09 15:15:07 -05:00
Drew Bailey	9a80938fb1	region forwarding; prevent recursive forwards for impossible requests prevent region forwarding loop, backfill tests fix failing test	2020-01-09 15:15:06 -05:00
Drew Bailey	46121fe3fd	move shared structs out of client and into nomad	2020-01-09 15:15:05 -05:00
Drew Bailey	3672414888	test pprof headers and profile methods tidy up, add comments clean up seconds param assignment	2020-01-09 15:15:04 -05:00
Drew Bailey	fc37448683	warn when enabled debug is on when registering m -> a receiver name return codederrors, fix query	2020-01-09 15:15:04 -05:00
Drew Bailey	50288461c9	Server request forwarding for Agent.Profile Return rpc errors for profile requests, set up remote forwarding to target leader or server id for profile requests. server forwarding, endpoint tests	2020-01-09 15:15:03 -05:00
Drew Bailey	49ad5fbc85	agent pprof endpoints wip, agent endpoint and client endpoint for pprof profiles agent endpoint test	2020-01-09 15:15:02 -05:00
Mahmood Ali	4e5d867644	client: stop using alloc.TaskResources Now that alloc.Canonicalize() is called in all alloc sources in the client (i.e. on state restore and RPC fetching), we no longer need to check alloc.TaskResources. alloc.AllocatedResources is always non-nil through alloc runner. Though, early on, we check for alloc validity, so NewTaskRunner and TaskEnv must still check. `TestClient_AddAllocError` test validates that behavior.	2020-01-09 09:25:07 -05:00
Mahmood Ali	7c153e1a64	client: canonicalize alloc runner on RPC	2020-01-09 08:46:50 -05:00
Mahmood Ali	d740d347ce	Migrate old alloc structs on read This commit ensures that Alloc.AllocatedResources is properly populated when read from persistence stores (namely Raft and client state store). The alloc struct may have been written previously by an arbitrary old version that may only populate Alloc.TaskResources.	2020-01-09 08:46:50 -05:00
Tim Gross	fa4da93578	interpolate environment for services in script checks (#6916 ) In 0.10.2 (specifically 387b016) we added interpolation to group service blocks and centralized the logic for task environment interpolation. This wasn't also added to script checks, which caused a regression where the IDs for script checks for services w/ interpolated fields (ex. the service name) didn't match the service ID that was registered with Consul. This changeset calls the same taskenv interpolation logic during `script_check` configuration, and adds tests to reduce the risk of future regressions by comparing the IDs of service hook and the check hook.	2020-01-09 08:12:54 -05:00
Nick Ethier	9c3cc63cd1	tr: initialize envoybootstrap prestart hook response.Env field	2020-01-08 13:41:38 -05:00
Nick Ethier	105cbf6df9	tr: expose envoy sidecar admin port as environment variable	2020-01-06 21:53:45 -05:00
Nick Ethier	677e9cdc16	connect: configure envoy such that multiple sidecars can run in the same alloc	2020-01-06 11:26:27 -05:00
Tim Gross	e9bac50c76	client: fix trace log message in alloc hook update (#6881 )	2019-12-19 16:44:04 -05:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	ae145c9a37	allow only positive shutdown delay more explicit test case, remove select statement	2019-12-16 11:38:30 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Danielle	b006be623d	Update client/fingerprint/env_aws.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-12-16 14:48:52 +01:00
Danielle Lancashire	5a87b3ab4b	env_aws: Disable Retries and set Session cfg Previously, Nomad used hand rolled HTTP requests to interact with the EC2 metadata API. Recently however, we switched to using the AWS SDK for this fingerprinting. The default behaviour of the AWS SDK is to perform retries with exponential backoff when a request fails. This is problematic for Nomad, because interacting with the EC2 API is in our client start path. Here we revert to our pre-existing behaviour of not performing retries in the fast path, as if the metadata service is unavailable, it's likely that nomad is not running in AWS.	2019-12-16 10:56:32 +01:00
Mahmood Ali	4a1cc67f58	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	a7361612b6	Merge pull request #6556 from hashicorp/c-vendor-multierror-20191025 Update go-multierror library	2019-12-13 11:32:42 -05:00
Mahmood Ali	46bc3b57e6	address review comments	2019-12-13 11:21:00 -05:00
Mahmood Ali	b3a1e571e5	tests: fix error format assertion multierror library changed formatting slightly.	2019-12-13 11:01:20 -05:00
Chris Dickson	4d8ba272d1	client: expose allocated CPU per task (#6784 )	2019-12-09 15:40:22 -05:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must corrolate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Mahmood Ali	ded2a725db	Merge pull request #6788 from hashicorp/b-timeout-logmon-stop logmon: add timeout to RPC operations	2019-12-06 19:12:06 -05:00
Danielle Lancashire	d2075ebae9	spellcheck: Fix spelling of retrieve	2019-12-05 18:59:47 -06:00
Mahmood Ali	b2ae27863e	Merge pull request #6779 from hashicorp/r-aws-fingerprint-via-library Use AWS SDK to access EC2 Metadata	2019-12-02 13:30:51 -05:00
Mahmood Ali	83089feff5	logmon: add timeout to RPC operations Add an RPC timeout for logmon. In https://github.com/hashicorp/nomad/issues/6461#issuecomment-559747758 , `logmonClient.Stop` locked up and indefinitely blocked the task runner destroy operation. This is an incremental improvement. We still need to follow up to understand how we got to that state, and the full impact of locked-up Stop and its link to pending allocations on restart.	2019-12-02 10:33:05 -05:00
Mahmood Ali	293276a457	fingerprint code refactor Some code cleanup: * Use a field for setting EC2 metadata instead of env-vars in testing; but keep environment variables for backward compatibility reasons * Update tests to use testify	2019-11-26 10:51:28 -05:00
Mahmood Ali	1e48f8e20d	fingerprint: avoid api query if config overrides it	2019-11-26 10:51:28 -05:00
Mahmood Ali	5bb9089431	fingerprint: use ec2metadata package	2019-11-26 10:51:27 -05:00
Lars Lehtonen	0d344e8578	client: fix use of T.Fatal inside TestFS_logsImpl_NoFollow() goroutine.	2019-11-25 23:51:28 -08:00
Mahmood Ali	e89108fb01	fixup! tests: don't assume eth0 network is available	2019-11-21 08:28:20 -05:00
Mahmood Ali	443804b5c7	tests: don't assume eth0 network is available TestClient_UpdateNodeFromFingerprintKeepsConfig checks a test node network interface, which is hardcoded to `eth0` and is updated asynchronously. This causes flakiness when eth0 isn't available. Here, we hardcode the value to an arbitrary network interface.	2019-11-20 20:37:30 -05:00
Mahmood Ali	ed3f1957e7	tests: run TestClient_WatchAllocs in non-linux environments	2019-11-20 20:37:29 -05:00
Mahmood Ali	521f51a929	testS: fix TestClient_RestoreError When spinning a second client, ensure that it uses new driver instances, rather than reuse the already shutdown unhealthy drivers from first instance. This speeds up tests significantly, but cutting ~50 seconds or so, the timeout in NewClient until drivers fingerprints. They never do because drivers were shutdown already.	2019-11-20 20:37:28 -05:00
Mahmood Ali	4efb71cf0c	tests: remove TestClient_RestoreError test TestClient_RestoreError is very slow, taking ~81 seconds. It has few problematic patterns. It's unclear what it tests, it simulates a failure condition where all state db lookup fails and asserts that alloc fails. Though starting from https://github.com/hashicorp/nomad/pull/6216 , we don't fail allocs in that condition but rather restart them. Also, the drivers used in second client `c2` are the same singleton instances used in `c1` and already shutdown. We ought to start healthy new driver instances.	2019-11-20 20:37:27 -05:00

1 2 3 4 5 ...

3981 commits