Commit Graph

16743 Commits

Author SHA1 Message Date
Mahmood Ali 76be9b4afb cli: sequence cli.Ui operations
Fixes a bug where if a command flag parsing errors, the resulting error
and help usage messages get interleaved in unexpected and non-user
friendly way.

The reason is that we have flag parsing library effectively writes to
ui.Error in a goroutine.  This is problematic: first, we lose the sequencing between help
usage and error message; second, cli.Ui methods are not concurrent safe.

Here, we introduce a custom error writer that buffers result and calls
ui.Error() in the write method and in the same goroutine.

For context, we need to wrap ui.Error because it's line-oriented, while
flags library expects a io.Writer which is bytes oriented.
2019-12-16 10:08:17 -05:00
Danielle fc85ec7295
env_aws: Disable Retries and set Session cfg (#6860)
env_aws: Disable Retries and set Session cfg
2019-12-16 15:25:29 +01:00
Danielle b006be623d
Update client/fingerprint/env_aws.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-12-16 14:48:52 +01:00
Seth Hoenig 90716f13d3
Merge pull request #6848 from hashicorp/b-log-less-in-node-drainer
tests: remove trace statements from nodeDrainWatcher.watch
2019-12-16 07:47:52 -06:00
Tim Gross 9b2b4da3a4
e2e: run client/allocs metrics nightly tests vs Windows (#6850)
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.

Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
2019-12-16 08:34:17 -05:00
Michel Vocks 5cb462fd13 Add raw field for ClientCert and ClientKey 2019-12-16 14:30:00 +01:00
Seth Hoenig 270233e23d tests: remove trace statements from nodeDrainWatcher.watch
Avoid logging in the `watch` function as much as possible, since
it is not waited on during a server shutdown. When the logger
logs after a test passes, it may or may not cause the testing
framework to panic.

More info in:
https://github.com/golang/go/issues/29388#issuecomment-453648436
2019-12-16 07:08:11 -06:00
Michel Vocks 6e413b3929 Update go mod 2019-12-16 12:47:10 +01:00
Michel Vocks 3864d91d03 Add option to set certificate in-memory via SDK 2019-12-16 10:59:27 +01:00
Danielle Lancashire 5a87b3ab4b
env_aws: Disable Retries and set Session cfg
Previously, Nomad used hand rolled HTTP requests to interact with the
EC2 metadata API. Recently however, we switched to using the AWS SDK for
this fingerprinting.

The default behaviour of the AWS SDK is to perform retries with
exponential backoff when a request fails. This is problematic for Nomad,
because interacting with the EC2 API is in our client start path.

Here we revert to our pre-existing behaviour of not performing retries
in the fast path, as if the metadata service is unavailable, it's likely
that nomad is not running in AWS.
2019-12-16 10:56:32 +01:00
Michael Schurter 9b08968c2e
Merge pull request #6855 from hashicorp/b-interp-connect-task
connect: canonicalize before adding sidecar
2019-12-13 09:26:44 -08:00
Mahmood Ali 4a1cc67f58
Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob
driver: allow disabling log collection
2019-12-13 11:41:20 -05:00
Mahmood Ali a7361612b6
Merge pull request #6556 from hashicorp/c-vendor-multierror-20191025
Update go-multierror library
2019-12-13 11:32:42 -05:00
Mahmood Ali 46bc3b57e6 address review comments 2019-12-13 11:21:00 -05:00
Mahmood Ali b3a1e571e5 tests: fix error format assertion
multierror library changed formatting slightly.
2019-12-13 11:01:20 -05:00
Buck Doyle d1b689c0f4 Update changelog with #6817 2019-12-13 09:16:31 -06:00
Mahmood Ali ea30ab9c56 Update go-multierror to 72917a1
To pick up https://github.com/hashicorp/go-multierror/pull/28
2019-12-13 10:13:31 -05:00
Buck Doyle 6ff6a41df8
Fix flapping status light test (#6852)
I unintentionally introduced a flapping test in #6817. The
draining status of the node will be randomly chosen and
that flag takes precedence over eligibility. This forces
the draining flag to be false rather than random so the
test should no longer flap.

See here for an example failure:
https://circleci.com/gh/hashicorp/nomad/26368
2019-12-13 09:02:02 -06:00
Preetha Appan 5f7cfa5050 update changelog 2019-12-13 08:22:14 -06:00
Mahmood Ali 5a81cd3309
Merge pull request #6839 from hashicorp/b-cgroup-cleanup
executor: stop joining executor to container cgroup
2019-12-13 09:05:09 -05:00
Michael Schurter 5a25766714 docs: add #6855 to changelog
Also make Connect related fixes more consistent in the changelog. I
suspect users won't care if a Connect related fix is in the server's
admission controller or in the client's groupservice hook or somewhere
else, so I think grouping them by `consul/connect:` makes the most
sense.
2019-12-12 20:58:49 -08:00
Michael Schurter 95fd2643d7 connect: canonicalize before adding sidecar
Fixes #6853

Canonicalize jobs first before adding any sidecars. This fixes a bug
where sidecar tasks were added without interpolated names and broke
validation. Sidecar tasks must be canonicalized independently.

Also adds a group network to the mock connect job because it wasn't a
valid connect job before!
2019-12-12 20:55:56 -08:00
Mahmood Ali 570d81966f
Merge pull request #6854 from hashicorp/update-changelog
Add notarization details to changelog
2019-12-12 20:56:53 -05:00
Michele 161780ba00 Add clarifying update 2019-12-12 15:28:47 -08:00
Michele 49d0ef63aa Add apple notarization note 2019-12-12 15:24:18 -08:00
Preetha 2692949556
Merge pull request #6849 from hashicorp/b-debug-preemption
Use debug logging for scheduler internals
2019-12-12 16:15:46 -06:00
Preetha Appan afff27b69b More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
ebarriosjr b953239227 driver/pot: Added extra_hosts and args commands (#6577) 2019-12-12 16:29:45 -05:00
Buck Doyle 09067b4eb7
UI: Fix client sorting (#6817)
There are two changes here, and some caveats/commentary:

1. The “State“ table column was actually sorting only by status. The state was not an actual property, just something calculated in each client row, as a product of status, isEligible, and isDraining. This PR adds isDraining as a component of compositeState so it can be used for sorting.

2. The Sortable mixin declares dependent keys that cause the sort to be live-updating, but only if the members of the array change, such as if a new client is added, but not if any of the sortable properties change. This PR adds a SortableFactory function that generates a mixin whose listSorted computed property includes dependent keys for the sortable properties, so the table will live-update if any of the sortable properties change, not just the array members. There’s a warning if you use SortableFactory without dependent keys and via the original Sortable interface, so we can eventually migrate away from it.
2019-12-12 13:06:54 -06:00
Michael Lange ed8fd28a10
Merge pull request #6808 from hashicorp/b-ui/unclosed-log-streams
UI: Unclosed log streams
2019-12-12 10:55:49 -08:00
Preetha Appan 3458b41290 Use debug logging for scheduler internals
We currently log an error if preemption is unable to find a suitable set of
allocations to preempt. This commit changes that to debug level since not finding
preemptable allocations is not an error condition.
2019-12-12 12:05:29 -06:00
Tim Gross e439e927ed
e2e: run client/allocs metrics tests nightly (#6842)
Refactor the metrics end-to-end tests so they can be run with our e2e
test framework. Runs fabio/prometheus and a collection of jobs that
will cause metrics to be measured. We then query Prometheus to ensure
we're publishing those allocation metrics and some metrics from the
clients as well.

Includes adding a placeholder for running the same tests on Windows.
2019-12-12 12:45:16 -05:00
Mahmood Ali d80ae6765b simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Seth Hoenig 2600c95af7
Merge pull request #6838 from hashicorp/f-parallelize-state-store-tests
tests: parallelize state store tests
2019-12-11 11:05:52 -06:00
Mahmood Ali 94ab62dfb4 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, executor process is added to the container cgroup so the
executor process resources get aggregated along with user processes in
our metric aggregation.

However, adding executor process to container cgroup adds some
complications with much benefits:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  Though, we had a bug where
we missed removing it from the systemd cgroup.  Because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` which doesn't.

Second, it may have advese side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red-harring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali 739e5e8811 drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00
Seth Hoenig d45dec1ca8 tests: parallelize state store tests
It has been decided we're going to live in a many core world.
Let's take advantage of that and parallelize these state store
tests which all run in memory and are largely CPU bound.

An unscientific benchmark demonstrating the improvement:

[mp state (master)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	5.162s

[mp state (f-parallelize-state-store-tests)] $ go test
PASS
ok  	github.com/hashicorp/nomad/nomad/state	1.527s
2019-12-11 09:36:37 -06:00
Tim Gross b25713a837
doc: spread is inherited from job to group (#6837) 2019-12-11 09:59:26 -05:00
Drew Bailey a8bb422500
Merge pull request #6834 from hashicorp/monitor-changelog
add 6828 to changelog
2019-12-11 08:17:12 -05:00
Michael Schurter cdfaa3ca8a
Merge pull request #6833 from hashicorp/sentinel-imports-note
Make note of Sentinel standard imports
2019-12-10 13:56:01 -08:00
Chris Arcand deb84a41f6 Make note of Sentinel standard imports
> Sentinel-embedded applications can choose to whitelist or blacklist
certain standard imports. Please reference the documentation for the
Sentinel-enabled application you're using to determine if all standard
imports are available.
2019-12-10 14:44:51 -06:00
Drew Bailey fc6f49dba9
add 6828 to changelog 2019-12-10 15:02:34 -05:00
Tim Gross 5289c1e0aa
doc: explain ALLOC_INDEX uniqueness guarantees (#6830)
The `ALLOC_INDEX` isn't guaranteed to be unique, and this has caused
some user confusion. The servers make a best-effort attempt to make
this value unique from 0 to count-1 but when you have canaries on the
task group, there are reused indexes because you have multiple job
versions running at the same time. If a user needs a unique number for
interpolating a value in your application, they can get this by
combining the job version and the alloc index.

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-12-10 10:30:26 -05:00
Danielle 246a4e898b
Merge pull request #6828 from hashicorp/b/nomad-monitor-panic
command: error when no node is found for `monitor`
2019-12-10 14:29:32 +01:00
Danielle Lancashire cd764ab0e9
command: error when no node is found for `monitor`
Currently `nomad monitor -node-id` will panic when a node-id does not
match any nodes, as there is no empty result bounds checking. Here we
return an error to the user when no nodes are found.
2019-12-10 13:10:47 +01:00
Chris Dickson 4d8ba272d1 client: expose allocated CPU per task (#6784) 2019-12-09 15:40:22 -05:00
Seth Hoenig 2508892973
Merge pull request #6800 from hashicorp/b-update-freeport
tests: swap lib/freeport for tweaked helper/freeport
2019-12-09 09:50:26 -06:00
Tim Gross 74a01477fd
Merge pull request #6631 from hashicorp/dependabot/npm_and_yarn/ui/lodash.mergewith-4.6.2
Bump lodash.mergewith from 4.6.1 to 4.6.2 in /ui
2019-12-09 09:47:14 -05:00
Tim Gross 06e30473c0
Merge pull request #6629 from hashicorp/dependabot/npm_and_yarn/ui/lodash.defaultsdeep-4.6.1
Bump lodash.defaultsdeep from 4.6.0 to 4.6.1 in /ui
2019-12-09 09:47:05 -05:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00