open-nomad

Commit Graph

Author	SHA1	Message	Date
Mahmood Ali	504ce7f12c	Merge pull request #6865 from hashicorp/b-cli-error-formatting cli: sequence cli.Ui operations	2019-12-16 16:06:13 -05:00
Drew Bailey	2030a9bd16	update changelog for group shutdown_delay	2019-12-16 11:41:37 -05:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	ae145c9a37	allow only positive shutdown delay more explicit test case, remove select statement	2019-12-16 11:38:30 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Mahmood Ali	76be9b4afb	cli: sequence cli.Ui operations Fixes a bug where, if command flag parsing fails, the resulting error and help usage messages get interleaved in an unexpected and non-user-friendly way. The reason is that the flag parsing library effectively writes to ui.Error in a goroutine. This is problematic: first, we lose the sequencing between help usage and error message; second, cli.Ui methods are not concurrent safe. Here, we introduce a custom error writer that buffers the result and calls ui.Error() in the write method and in the same goroutine. For context, we need to wrap ui.Error because it's line-oriented, while the flags library expects an io.Writer, which is byte-oriented.	2019-12-16 10:08:17 -05:00
Danielle	fc85ec7295	env_aws: Disable Retries and set Session cfg (#6860) env_aws: Disable Retries and set Session cfg	2019-12-16 15:25:29 +01:00
Danielle	b006be623d	Update client/fingerprint/env_aws.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-12-16 14:48:52 +01:00
Seth Hoenig	90716f13d3	Merge pull request #6848 from hashicorp/b-log-less-in-node-drainer tests: remove trace statements from nodeDrainWatcher.watch	2019-12-16 07:47:52 -06:00
Tim Gross	9b2b4da3a4	e2e: run client/allocs metrics nightly tests vs Windows (#6850) Adds Windows targets to the client/allocs metrics tests. Removes the `allocstats` test, which covers less than these tests and is now redundant. Adds a firewall rule to our Windows instances so that the prometheus server can scrape the Nomad HTTP API for metrics.	2019-12-16 08:34:17 -05:00
Michel Vocks	5cb462fd13	Add raw field for ClientCert and ClientKey	2019-12-16 14:30:00 +01:00
Seth Hoenig	270233e23d	tests: remove trace statements from nodeDrainWatcher.watch Avoid logging in the `watch` function as much as possible, since it is not waited on during a server shutdown. When the logger logs after a test passes, it may or may not cause the testing framework to panic. More info in: https://github.com/golang/go/issues/29388#issuecomment-453648436	2019-12-16 07:08:11 -06:00
Michel Vocks	6e413b3929	Update go mod	2019-12-16 12:47:10 +01:00
Michel Vocks	3864d91d03	Add option to set certificate in-memory via SDK	2019-12-16 10:59:27 +01:00
Danielle Lancashire	5a87b3ab4b	env_aws: Disable Retries and set Session cfg Previously, Nomad used hand rolled HTTP requests to interact with the EC2 metadata API. Recently however, we switched to using the AWS SDK for this fingerprinting. The default behaviour of the AWS SDK is to perform retries with exponential backoff when a request fails. This is problematic for Nomad, because interacting with the EC2 API is in our client start path. Here we revert to our pre-existing behaviour of not performing retries in the fast path, as if the metadata service is unavailable, it's likely that nomad is not running in AWS.	2019-12-16 10:56:32 +01:00
Michael Schurter	9b08968c2e	Merge pull request #6855 from hashicorp/b-interp-connect-task connect: canonicalize before adding sidecar	2019-12-13 09:26:44 -08:00
Mahmood Ali	4a1cc67f58	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	a7361612b6	Merge pull request #6556 from hashicorp/c-vendor-multierror-20191025 Update go-multierror library	2019-12-13 11:32:42 -05:00
Mahmood Ali	46bc3b57e6	address review comments	2019-12-13 11:21:00 -05:00
Mahmood Ali	b3a1e571e5	tests: fix error format assertion multierror library changed formatting slightly.	2019-12-13 11:01:20 -05:00
Buck Doyle	d1b689c0f4	Update changelog with #6817	2019-12-13 09:16:31 -06:00
Mahmood Ali	ea30ab9c56	Update go-multierror to 72917a1 To pick up https://github.com/hashicorp/go-multierror/pull/28	2019-12-13 10:13:31 -05:00
Buck Doyle	6ff6a41df8	Fix flapping status light test (#6852) I unintentionally introduced a flapping test in #6817. The draining status of the node will be randomly chosen and that flag takes precedence over eligibility. This forces the draining flag to be false rather than random so the test should no longer flap. See here for an example failure: https://circleci.com/gh/hashicorp/nomad/26368	2019-12-13 09:02:02 -06:00
Preetha Appan	5f7cfa5050	update changelog	2019-12-13 08:22:14 -06:00
Mahmood Ali	5a81cd3309	Merge pull request #6839 from hashicorp/b-cgroup-cleanup executor: stop joining executor to container cgroup	2019-12-13 09:05:09 -05:00
Michael Schurter	5a25766714	docs: add #6855 to changelog Also make Connect related fixes more consistent in the changelog. I suspect users won't care if a Connect related fix is in the server's admission controller or in the client's groupservice hook or somewhere else, so I think grouping them by `consul/connect:` makes the most sense.	2019-12-12 20:58:49 -08:00
Michael Schurter	95fd2643d7	connect: canonicalize before adding sidecar Fixes #6853 Canonicalize jobs first before adding any sidecars. This fixes a bug where sidecar tasks were added without interpolated names and broke validation. Sidecar tasks must be canonicalized independently. Also adds a group network to the mock connect job because it wasn't a valid connect job before!	2019-12-12 20:55:56 -08:00
Mahmood Ali	570d81966f	Merge pull request #6854 from hashicorp/update-changelog Add notarization details to changelog	2019-12-12 20:56:53 -05:00
Michele	161780ba00	Add clarifying update	2019-12-12 15:28:47 -08:00
Michele	49d0ef63aa	Add apple notarization note	2019-12-12 15:24:18 -08:00
Preetha	2692949556	Merge pull request #6849 from hashicorp/b-debug-preemption Use debug logging for scheduler internals	2019-12-12 16:15:46 -06:00
Preetha Appan	afff27b69b	More error->debug for logging in the bin packing iterator	2019-12-12 15:50:16 -06:00
ebarriosjr	b953239227	driver/pot: Added extra_hosts and args commands (#6577)	2019-12-12 16:29:45 -05:00
Buck Doyle	09067b4eb7	UI: Fix client sorting (#6817) There are two changes here, and some caveats/commentary: 1. The “State” table column was actually sorting only by status. The state was not an actual property, just something calculated in each client row, as a product of status, isEligible, and isDraining. This PR adds isDraining as a component of compositeState so it can be used for sorting. 2. The Sortable mixin declares dependent keys that cause the sort to be live-updating, but only if the members of the array change, such as if a new client is added, but not if any of the sortable properties change. This PR adds a SortableFactory function that generates a mixin whose listSorted computed property includes dependent keys for the sortable properties, so the table will live-update if any of the sortable properties change, not just the array members. There’s a warning if you use SortableFactory without dependent keys and via the original Sortable interface, so we can eventually migrate away from it.	2019-12-12 13:06:54 -06:00
Michael Lange	ed8fd28a10	Merge pull request #6808 from hashicorp/b-ui/unclosed-log-streams UI: Unclosed log streams	2019-12-12 10:55:49 -08:00
Preetha Appan	3458b41290	Use debug logging for scheduler internals We currently log an error if preemption is unable to find a suitable set of allocations to preempt. This commit changes that to debug level since not finding preemptable allocations is not an error condition.	2019-12-12 12:05:29 -06:00
Tim Gross	e439e927ed	e2e: run client/allocs metrics tests nightly (#6842) Refactor the metrics end-to-end tests so they can be run with our e2e test framework. Runs fabio/prometheus and a collection of jobs that will cause metrics to be measured. We then query Prometheus to ensure we're publishing those allocation metrics and some metrics from the clients as well. Includes adding a placeholder for running the same tests on Windows.	2019-12-12 12:45:16 -05:00
Mahmood Ali	d80ae6765b	simplify cgroup path lookup	2019-12-11 12:43:25 -05:00
Seth Hoenig	2600c95af7	Merge pull request #6838 from hashicorp/f-parallelize-state-store-tests tests: parallelize state store tests	2019-12-11 11:05:52 -06:00
Mahmood Ali	94ab62dfb4	executor: stop joining executor to container cgroup Stop joining libcontainer executor process into the newly created task container cgroup, to ensure that the cgroups are fully destroyed on shutdown, and to make it consistent with other plugin processes. Previously, executor process was added to the container cgroup so the executor process resources get aggregated along with user processes in our metric aggregation. However, adding executor process to container cgroup adds some complications without much benefit: First, it complicates cleanup. We must ensure that the executor is removed from container cgroup on shutdown. Though, we had a bug where we missed removing it from the systemd cgroup. Because executor uses `containerState.CgroupPaths` on launch, which includes systemd, but `cgroups.GetAllSubsystems` which doesn't. Second, it may have adverse side-effects. When a user process is cpu bound or uses too much memory, executor should remain functioning without risk of being killed (by OOM killer) or throttled. Third, it is inconsistent with other drivers and plugins. Logmon and DockerLogger processes aren't in the task cgroups. Neither are containerd processes, though it is equivalent to executor in responsibility. Fourth, in my experience when executor process moves cgroup while it's running, the cgroup aggregation is odd. The cgroup `memory.usage_in_bytes` doesn't seem to capture the full memory usage of the executor process and becomes a red herring when investigating memory issues. For all the reasons above, I opted to have executor remain in nomad agent cgroup and we can revisit this when we have a better story for plugin process cgroup management.	2019-12-11 11:28:09 -05:00
Mahmood Ali	739e5e8811	drivers/exec: test all cgroups are destroyed	2019-12-11 11:12:29 -05:00
Seth Hoenig	d45dec1ca8	tests: parallelize state store tests It has been decided we're going to live in a many core world. Let's take advantage of that and parallelize these state store tests which all run in memory and are largely CPU bound. An unscientific benchmark demonstrating the improvement: [mp state (master)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 5.162s [mp state (f-parallelize-state-store-tests)] $ go test PASS ok github.com/hashicorp/nomad/nomad/state 1.527s	2019-12-11 09:36:37 -06:00
Tim Gross	b25713a837	doc: spread is inherited from job to group (#6837)	2019-12-11 09:59:26 -05:00
Drew Bailey	a8bb422500	Merge pull request #6834 from hashicorp/monitor-changelog add 6828 to changelog	2019-12-11 08:17:12 -05:00
Michael Schurter	cdfaa3ca8a	Merge pull request #6833 from hashicorp/sentinel-imports-note Make note of Sentinel standard imports	2019-12-10 13:56:01 -08:00
Chris Arcand	deb84a41f6	Make note of Sentinel standard imports > Sentinel-embedded applications can choose to whitelist or blacklist certain standard imports. Please reference the documentation for the Sentinel-enabled application you're using to determine if all standard imports are available.	2019-12-10 14:44:51 -06:00
Drew Bailey	fc6f49dba9	add 6828 to changelog	2019-12-10 15:02:34 -05:00
Tim Gross	5289c1e0aa	doc: explain ALLOC_INDEX uniqueness guarantees (#6830) The `ALLOC_INDEX` isn't guaranteed to be unique, and this has caused some user confusion. The servers make a best-effort attempt to make this value unique from 0 to count-1 but when you have canaries on the task group, there are reused indexes because you have multiple job versions running at the same time. If a user needs a unique number for interpolating a value in your application, they can get this by combining the job version and the alloc index. Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-12-10 10:30:26 -05:00
Danielle	246a4e898b	Merge pull request #6828 from hashicorp/b/nomad-monitor-panic command: error when no node is found for `monitor`	2019-12-10 14:29:32 +01:00
Danielle Lancashire	cd764ab0e9	command: error when no node is found for `monitor` Currently `nomad monitor -node-id` will panic when a node-id does not match any nodes, as there is no empty result bounds checking. Here we return an error to the user when no nodes are found.	2019-12-10 13:10:47 +01:00


16748 Commits
