Commit graph

17990 commits

Author SHA1 Message Date
Tim Gross 52e805a6a6
csi: ensure Read/WriteAllocs aren't released early (#7841)
We should only remove the `ReadAllocs`/`WriteAllocs` values for a
volume after the claim has entered the "ready to free"
state. The volume will eventually be released as expected. But
querying the volume API will show the volume is released before the
controller unpublish has finished and this can cause a race with
starting new jobs.

Test updates are to cover cases where we're dropping claims but not
running through the whole reaping process.
2020-04-30 17:11:31 -04:00
Jasmine Dahilig a9004faa11
UI: Add representations for task lifecycles (#7659)
This adds details about task lifecycles to allocations, task groups,
and tasks. It includes a live-updating timeline-like chart on allocations.
2020-04-30 08:15:19 -05:00
Tim Gross a7a64443e1
csi: move volume claim release into volumewatcher (#7794)
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers.

The volume watcher is enabled on leader step-up and disabled on leader
step-down.

The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
2020-04-30 09:13:00 -04:00
Michael Lange c3085f04b6
Merge pull request #7820 from hashicorp/b-ui/ui-log-races
UI: Log streaming bug fix medley
2020-04-29 18:06:47 -07:00
Michael Lange 21ef3633be Make the no connection error on the logs page dismissable 2020-04-29 17:36:17 -07:00
Michael Lange e74cd16252 Fix race condition where stdout and stderr requests can cause a no connection error
This would happen because a no connection error happens after the second request fails, but
that's because it's assumed the second request is to a server node. However, if a user clicks
stderr fast enough, the first and second requests are both to the client node. This changes
the logic to check if the request is to the server before deeming log streaming a total failure.
2020-04-29 17:36:17 -07:00
Michael Lange aafbeaba75 Clicking stdout/stderr when already on that tab is now a noop 2020-04-29 17:36:16 -07:00
Michael Lange 7452a9a57d Abort log fetch request when failing over from client to server
Typically a failover means that the client can't be reached. However, if
the client does eventually return after the timeout period, the log will
stream indefinitely. This fixes that using an API that wasn't broadly
available at the time this was first written.
2020-04-29 17:34:49 -07:00
Michael Lange 9ba563c48e Always pass credential in fetch requests, but also treat options reasonably
Now options can be provided without also having to remember to pass
credentials. This is convenient for abort controller signals.
2020-04-29 17:34:49 -07:00
Seth Hoenig dee7f3ea11
Merge pull request #7828 from hashicorp/b-ec2-speeds
env_aws: use best-effort lookup table for CPU performance in EC2
2020-04-29 11:25:54 -06:00
Seth Hoenig 880c4e23d3 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig 67303b666c
env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig 5ddc607701
env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Tim Gross e34f099d20
csi: read-repair CSI volume claims (#7824)
The `CSIVolumeClaim` fields were added after 0.11.1, so claims made
before that may be missing the value. Repair this when we read the
volume out of the state store.

The `NodeID` field was added after 0.11.0, so we need to ensure it's
been populated during upgrades from 0.11.0.
2020-04-29 11:57:19 -04:00
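As a rough illustration of the read-repair idea in the commit above, the sketch below fills in missing claim data when a volume is read. The `claim` and `volume` types, the `repairClaims` function, and the `allocNodes` lookup are all hypothetical simplifications; they are not Nomad's actual structs or state store API.

```go
package main

import "fmt"

// claim is a hypothetical, simplified stand-in for a volume claim.
type claim struct {
	AllocID string
	NodeID  string
}

// volume is a hypothetical volume holding claims keyed by allocation ID.
// A nil entry models a claim written before the claim fields existed.
type volume struct {
	ReadClaims map[string]*claim
}

// repairClaims fills in missing claims and empty NodeIDs using a lookup
// from allocation ID to node ID. It returns how many NodeIDs it populated.
func repairClaims(v *volume, allocNodes map[string]string) int {
	repaired := 0
	for allocID, c := range v.ReadClaims {
		if c == nil {
			c = &claim{AllocID: allocID}
			v.ReadClaims[allocID] = c
		}
		if c.NodeID == "" {
			if node, ok := allocNodes[allocID]; ok {
				c.NodeID = node
				repaired++
			}
		}
	}
	return repaired
}

func main() {
	v := &volume{ReadClaims: map[string]*claim{
		"alloc-1": nil,                                       // pre-upgrade entry
		"alloc-2": {AllocID: "alloc-2", NodeID: "node-b"},    // already complete
	}}
	n := repairClaims(v, map[string]string{"alloc-1": "node-a"})
	fmt.Println("repaired:", n, "alloc-1 node:", v.ReadClaims["alloc-1"].NodeID)
}
```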
Buck Doyle d4708860f0
UI: Fix exec popup link for job id ≠ name (#7815)
This closes #7814. It makes URL-generation more central and changes
the exec URL to include job id instead of name.
2020-04-29 07:54:04 -05:00
Mahmood Ali 0ab0463d20
Merge pull request #7829 from ccn/vendor-go-dockerclient-v1.6.5
Vendor: update fsouza/go-dockerclient to v1.6.5
2020-04-29 08:48:40 -04:00
ccn 889816d65c Remove unused internal subpackages 2020-04-29 20:21:44 +08:00
ccn a4c36add17 Vendor: update fsouza/go-dockerclient to v1.6.5 2020-04-29 18:54:55 +08:00
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
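The best-effort lookup described in the env_aws commit above might look roughly like the sketch below. It is illustrative only: the `cpuSpec` type, the `lookupCPU` function, and the table entries are hypothetical, and the instance type names and clock speeds are placeholders rather than values from AWS documentation or from Nomad's table.

```go
package main

import "fmt"

// cpuSpec is a hypothetical record of an instance type's CPU resources.
type cpuSpec struct {
	Cores int
	MHz   float64
}

// ec2CPUTable holds placeholder entries. Real entries would be transcribed
// from the AWS instance type documentation.
var ec2CPUTable = map[string]cpuSpec{
	"example.large":  {Cores: 2, MHz: 2500},
	"example.xlarge": {Cores: 4, MHz: 2500},
}

// lookupCPU prefers the table and falls back to the measured values, which
// keeps the previous behavior for instance types missing from the table.
func lookupCPU(instanceType string, measuredMHz float64, measuredCores int) (cpuSpec, bool) {
	if spec, ok := ec2CPUTable[instanceType]; ok {
		return spec, true
	}
	return cpuSpec{Cores: measuredCores, MHz: measuredMHz}, false
}

// totalCompute is the aggregate MHz across all cores.
func totalCompute(s cpuSpec) float64 {
	return float64(s.Cores) * s.MHz
}

func main() {
	// An idle instance reports ~800 MHz, but the table supplies the rated speed.
	spec, fromTable := lookupCPU("example.large", 800, 2)
	fmt.Println(spec, fromTable, totalCompute(spec))
}
```

Falling back to the measured speed for unknown types means a missing table entry is no worse than the behavior before the change.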
Mahmood Ali 18ac17b189
Merge pull request #7827 from hashicorp/deps-go-msgpack-v1.1.5
Harmonize go-msgpack/codec/codecgen
2020-04-28 18:13:09 -04:00
Mahmood Ali 18dba6fdad Harmonize go-msgpack/codec/codecgen
Use v1.1.5 of go-msgpack/codec/codecgen, so go-msgpack codecgen matches
the library version.

We branched off earlier to pick up f51b518921, but apparently that's
not needed as we could customize the package via the `-c` argument.
2020-04-28 17:12:31 -04:00
Tim Gross 4935b304a0
e2e: add helper to Makefile for local file deployments (#7822) 2020-04-28 16:15:58 -04:00
Lang Martin e32b5b12dd
command: deployment status without a prefix lists deployments (#7821) 2020-04-28 15:11:32 -04:00
Mahmood Ali 18f16cfb12
Merge pull request #7818 from greut/codegen
structs: give codecgen import
2020-04-28 12:16:41 -04:00
Buck Doyle 438aec636a
UI: update exec styles to match conventions (#7811) 2020-04-28 08:33:07 -05:00
Chris Baker 315bcf1060
Merge pull request #7816 from hashicorp/b-7789-job-scaling-status-issues
fix issues in Job.ScaleStatus
2020-04-28 06:33:42 -05:00
Yoan Blanc 5ca31f23e5
structs: give codecgen import
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-04-28 08:23:20 +02:00
Nick Ethier 4b810b697a
nomad: build dynamic port for exposed checks if not specified (#7800) 2020-04-28 00:07:41 -04:00
Chris Baker 6e48d73be8 updated changelog 2020-04-27 21:46:56 +00:00
Chris Baker 73f1390316 modified Job.ScaleStatus to ignore deployments and look directly at the
allocations, ignoring canaries
2020-04-27 21:45:39 +00:00
Charlie Voiselle 10ed58cee6 Adding API homepage to sidebar. 2020-04-27 13:41:11 -04:00
Charlie Voiselle 59470f4e90
Merge pull request #7801 from hashicorp/d-fix-docker-credhelper-example
[docs] Update credential helper example in docker.mdx
2020-04-27 11:44:54 -04:00
Mahmood Ali 57008ce95a
Merge pull request #7809 from greut/typos
api: fix some documentation typos
2020-04-27 08:50:25 -04:00
Mahmood Ali f68bfa9e55
Merge pull request #7805 from hashicorp/vendor-go-metrics-v0.3.3
Vendor: update armon/go-metrics to v0.3.3
2020-04-27 08:49:50 -04:00
Yoan Blanc 417c2995c9
api: fix some documentation typos
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-04-27 10:25:29 +02:00
Mahmood Ali e4f28e24a3 Vendor: update armon/go-metrics to v0.3.3
To pick up a lock contention fix in prometheus sink:
https://github.com/armon/go-metrics/pull/107 .
2020-04-26 08:54:50 -04:00
Charlie Voiselle f1ababc31b
Update docker.mdx 2020-04-24 23:20:02 -04:00
Charlie Voiselle e5ebce0e6b
Merge pull request #7792 from angrycub/f-disable_dangling_container_gc
Disable dangling container GC for demo
2020-04-24 23:12:16 -04:00
Seth Hoenig 4fa0c395df
Merge pull request #7784 from hashicorp/demo-grpc-checks
demo: create a demo service for grpc healthchecks
2020-04-24 11:35:58 -06:00
Seth Hoenig f2ef576510 demo: create a demo service for grpc healthchecks
Examples for HTTP based task-group service healthchecks are
covered by the `countdash` demo, but gRPC checks currently
have no runnable examples.

This PR adds a trivial gRPC enabled application that provides
a Service implementing the standard gRPC healthcheck interface.
2020-04-24 10:59:50 -06:00
Tim Gross bad9a82df8
ci: add a linting check for HCL files (#7791)
Running `make dev` runs `hclfmt`, but this isn't checked as part of
CI. That makes it possible to merge un-formatted HCL and Nomad
jobspecs that later will make for dirty git staging areas when
developers pull master.

This changeset adds HCL linting to the `make check` target.
2020-04-23 14:32:44 -04:00
Charlie Voiselle 14b5a00932
Disable dangling container GC for demo 2020-04-23 11:51:03 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Tim Gross e7e9c83aa3
website: fix path for spellchecking and correct errors (#7790) 2020-04-23 10:38:08 -04:00
Chris Baker 2f7372d29d
Merge pull request #7788 from hashicorp/b-7716-scaling-policy-parsing
parsing should error if scaling block includes multiple policy blocks
2020-04-23 08:57:31 -05:00
Chris Baker beeccc26e4 changelog entries for 7772 and 7788 2020-04-23 12:45:52 +00:00
Chris Baker 8ea4a7e84b return parsing error if scaling policy includes more than one policy block
also, check that parsing a minimal scaling block doesn't throw any errors
2020-04-23 12:37:45 +00:00
Michael Lange 0dac605902
Merge pull request #7689 from hashicorp/ui/plumb-proxy-config-to-proxy
UI Plumb proxy config to proxy
2020-04-22 19:31:27 -07:00
Mahmood Ali 018e39b456
Merge pull request #7785 from hashicorp/b-http-fail-log-level
http: adjust log level for request failure
2020-04-22 17:03:11 -04:00
Mahmood Ali b8fb32f5d2 http: adjust log level for request failure
Failed requests due to API client errors are to be marked as DEBUG.

The Error log level should be reserved for problems with the cluster
that are actionable for Nomad system operators.  Logs due to
misbehaving API clients don't represent a system-level problem and seem
spurious to Nomad maintainers at best.  These log messages can also be
an attack vector for denial-of-service attacks by filling servers' disk
space with spurious log messages.
2020-04-22 16:19:59 -04:00