open-nomad

Commit Graph

Author	SHA1	Message	Date
Mahmood Ali	bc25da9dac	Merge pull request #7851 from hashicorp/spread-configuration-followup Follow up fix for spread	2020-05-01 13:48:31 -04:00
Mahmood Ali	759eade78b	missed fixing one invocation	2020-05-01 13:38:46 -04:00
Tim Gross	139c65c436	e2e: csi test can purge target job (#7823)	2020-05-01 13:25:50 -04:00
Mahmood Ali	78ae7b885a	Merge pull request #7810 from hashicorp/spread-configuration spread scheduling algorithm	2020-05-01 13:15:19 -04:00
Mahmood Ali	3da74068dd	changelog and fix typo	2020-05-01 13:14:20 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	d8e5e02398	Wiring algorithm to scheduler calls	2020-05-01 13:13:29 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Lang Martin	ad2fb4b297	client/heartbeatstop: don't store client state, use timeout In order to minimize this change while keeping a simple version of the behavior, we set `lastOk` to the current time less the initial server connection timeout. If the client starts and never contacts the server, it will stop all configured tasks after the initial server connection grace period, on the assumption that we've been out of touch longer than any configured `stop_after_client_disconnect`. The more complex state behavior might be justified later, but we should learn about failure modes first.	2020-05-01 12:35:49 -04:00
Lang Martin	28bac139cb	client/heartbeatstop: destroy allocs when disconnected from servers - track lastHeartbeat, the client local time of the last successful heartbeat round trip - track allocations with `stop_after_client_disconnect` configured - trigger allocation destroy (which handles cleanup) - restore heartbeat/killable allocs tracking when allocs are recovered from disk - on client restart, stop those allocs after a grace period if the servers are still partitioned	2020-05-01 12:35:49 -04:00
Drew Bailey	fdd61e4a63	Merge pull request #7847 from hashicorp/ent-license-404 temporarily test for 404 until endpoint is ready	2020-05-01 11:31:10 -04:00
Drew Bailey	581ad558a8	temporarily test for 404 until endpoint is ready	2020-05-01 11:24:37 -04:00
Drew Bailey	2b8fc650c9	Merge pull request #7778 from hashicorp/license-cli License cli	2020-05-01 08:51:40 -04:00
changli0617	a37176ab44	Update _app.js	2020-04-30 15:25:43 -07:00
Michael Schurter	cebc6b939e	Merge pull request #7730 from hashicorp/b-reserved-scoring core: fix node reservation scoring	2020-04-30 14:48:36 -07:00
Michael Schurter	c901d0e7dd	Merge branch 'master' into b-reserved-scoring	2020-04-30 14:48:14 -07:00
Michael Schurter	439a9f7301	Update website/pages/docs/upgrade/upgrade-specific.mdx Co-authored-by: Alex Dadgar <alex@hashicorp.com>	2020-04-30 14:47:12 -07:00
Tim Gross	cbae10333c	csi: check returned volume capability validation (#7831) This changeset corrects handling of the `ValidationVolumeCapabilities` response: * The CSI spec for the `ValidationVolumeCapabilities` requires that plugins only set the `Confirmed` field if they've validated all capabilities. The Nomad client improperly assumes that the lack of a `Confirmed` field should be treated as a failure. This breaks the Azure and Linode block storage plugins, which don't set this optional field. * The CSI spec also requires that the orchestrator check the validation responses to guard against older versions of a plugin reporting "valid" for newer fields it doesn't understand.	2020-04-30 17:12:32 -04:00
Tim Gross	cc7dbad1c7	csi: restore long timeout for controller plugins (#7840) During MVP development, we reduced the timeout for controller plugins to avoid long hangs in GC workers. But now that this work has been moved to the volume watcher, we can restore the original timeout which is better suited for the characteristic timescales of some cloud provider APIs and better matches the behavior of k8s.	2020-04-30 17:12:05 -04:00
Tim Gross	52e805a6a6	csi: ensure Read/WriteAllocs aren't released early (#7841) We should only remove the `ReadAllocs`/`WriteAllocs` values for a volume after the claim has entered the "ready to free" state. The volume will eventually be released as expected. But querying the volume API will show the volume is released before the controller unpublish has finished and this can cause a race with starting new jobs. Test updates are to cover cases where we're dropping claims but not running through the whole reaping process.	2020-04-30 17:11:31 -04:00
Drew Bailey	41c7d49eb7	properly format license output	2020-04-30 14:46:26 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	acacecc67b	add license reset command to commands help text formatting remove reset no signed option	2020-04-30 14:46:20 -04:00
Drew Bailey	a266284f60	test all commands oss err	2020-04-30 14:46:19 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Drew Bailey	74abe6ef48	license cli commands cli changes, formatting	2020-04-30 14:46:17 -04:00
Jasmine Dahilig	a9004faa11	UI: Add representations for task lifecycles (#7659) This adds details about task lifecycles to allocations, task groups, and tasks. It includes a live-updating timeline-like chart on allocations.	2020-04-30 08:15:19 -05:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Michael Lange	c3085f04b6	Merge pull request #7820 from hashicorp/b-ui/ui-log-races UI: Log streaming bug fix medley	2020-04-29 18:06:47 -07:00
Michael Lange	21ef3633be	Make the no connection error on the logs page dismissable	2020-04-29 17:36:17 -07:00
Michael Lange	e74cd16252	Fix race condition where stdout and stderr requests can cause a no connection error This would happen because a no connection error happens after the second request fails, but that's because it's assumed the second request is to a server node. However, if a user clicks stderr fast enough, the first and second requests are both to the client node. This changes the logic to check if the request is to the server before deeming log streaming a total failure.	2020-04-29 17:36:17 -07:00
Michael Lange	aafbeaba75	Clicking stdout/stderr when already on that tab is now a noop	2020-04-29 17:36:16 -07:00
Michael Lange	7452a9a57d	Abort log fetch request when failing over from client to server Typically a failover means that the client can't be reached. However, if the client does eventually return after the timeout period, the log will stream indefinitely. This fixes that using an API that wasn't broadly available at the time this was first written.	2020-04-29 17:34:49 -07:00
Michael Lange	9ba563c48e	Always pass credential in fetch requests, but also treat options reasonably Now options can be provided without also having to remember to pass credentials. This is convenient for abort controller signals.	2020-04-29 17:34:49 -07:00
Seth Hoenig	dee7f3ea11	Merge pull request #7828 from hashicorp/b-ec2-speeds env_aws: use best-effort lookup table for CPU performance in EC2	2020-04-29 11:25:54 -06:00
Seth Hoenig	880c4e23d3	env_aws: combine 3 log lines into 1	2020-04-29 10:47:36 -06:00
Seth Hoenig	67303b666c	env_aws: downgrade log line Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2020-04-29 10:34:26 -06:00
Seth Hoenig	5ddc607701	env_aws: fixup log line Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2020-04-29 10:33:53 -06:00
Tim Gross	e34f099d20	csi: read-repair CSI volume claims (#7824) The `CSIVolumeClaim` fields were added after 0.11.1, so claims made before that may be missing the value. Repair this when we read the volume out of the state store. The `NodeID` field was added after 0.11.0, so we need to ensure it's been populated during upgrades from 0.11.0.	2020-04-29 11:57:19 -04:00
Buck Doyle	d4708860f0	UI: Fix exec popup link for job id ≠ name (#7815) This closes #7814. It makes URL-generation more central and changes the exec URL to include job id instead of name.	2020-04-29 07:54:04 -05:00
Mahmood Ali	0ab0463d20	Merge pull request #7829 from ccn/vendor-go-dockerclient-v1.6.5 Vendor: update fsouza/go-dockerclient to v1.6.5	2020-04-29 08:48:40 -04:00
ccn	889816d65c	Remove unused internal subpackages	2020-04-29 20:21:44 +08:00
ccn	a4c36add17	Vendor: update fsouza/go-dockerclient to v1.6.5	2020-04-29 18:54:55 +08:00
Seth Hoenig	f8596a3602	env_aws: use best-effort lookup table for CPU performance in EC2 Fixes #7681 The current behavior of the CPU fingerprinter in AWS is that it reads the current speed from `/proc/cpuinfo` (`CPU MHz` field). This is because the max CPU frequency is not available by reading anything on the EC2 instance itself. Normally on Linux one would look at e.g. `sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq` or perhaps parse the values from the `CPU max MHz` field in `/proc/cpuinfo`, but those values are not available. Furthermore, no metadata about the CPU is made available in the EC2 metadata service. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html Since `go-psutil` cannot determine the max CPU speed it defaults to the current CPU speed, which could be basically any number between 0 and the true max. This is particularly bad on large, powerful reserved instances which often idle at ~800 MHz while Nomad does its fingerprinting (typically IO bound), which Nomad then uses as the max, which results in severe loss of available resources. Since the CPU specification is unavailable programmatically (at least not without sudo) use a best-effort lookup table. This table was generated by going through every instance type in AWS documentation and copy-pasting the numbers. https://aws.amazon.com/ec2/instance-types/ This approach obviously is not ideal as future instance types will need to be added as they are introduced to AWS. However, using the table should only be an improvement over the status quo since right now Nomad miscalculates available CPU resources on all instance types.	2020-04-28 19:01:33 -06:00
Mahmood Ali	18ac17b189	Merge pull request #7827 from hashicorp/deps-go-msgpack-v1.1.5 Harmonize go-msgpack/codec/codecgen	2020-04-28 18:13:09 -04:00
Mahmood Ali	18dba6fdad	Harmonize go-msgpack/codec/codecgen Use v1.1.5 of go-msgpack/codec/codecgen, so go-msgpack codecgen matches the library version. We branched off earlier to pick up `f51b518921` , but apparently that's not needed as we could customize the package via `-c` argument.	2020-04-28 17:12:31 -04:00
Tim Gross	4935b304a0	e2e: add helper to Makefile for local file deployments (#7822)	2020-04-28 16:15:58 -04:00
Lang Martin	e32b5b12dd	command: deployment status without a prefix lists deployments (#7821)	2020-04-28 15:11:32 -04:00
Mahmood Ali	18f16cfb12	Merge pull request #7818 from greut/codegen structs: give codecgen import	2020-04-28 12:16:41 -04:00
Buck Doyle	438aec636a	UI: update exec styles to match conventions (#7811)	2020-04-28 08:33:07 -05:00
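
Commit f8596a3602 above describes replacing the measured CPU speed on EC2 with a best-effort lookup table keyed by instance type. The sketch below is only an illustration of that idea, not the code from the commit: the names `ec2MaxMHz` and `lookupMaxMHz`, the instance-type keys, and the MHz values are all hypothetical placeholders, and the fallback to the measured speed for unknown types is an assumption.

```go
package main

import "fmt"

// ec2MaxMHz is a hypothetical lookup table from instance type to maximum
// CPU speed in MHz. The keys and values are placeholders for illustration,
// not figures taken from the commit or from AWS documentation.
var ec2MaxMHz = map[string]float64{
	"example.large":  3100,
	"example.xlarge": 2500,
}

// lookupMaxMHz returns the table value when the instance type is known.
// For unknown types it falls back to the measured speed (an assumption
// made for this sketch) and reports false so callers can log the miss.
func lookupMaxMHz(instanceType string, measuredMHz float64) (float64, bool) {
	if mhz, ok := ec2MaxMHz[instanceType]; ok {
		return mhz, true
	}
	return measuredMHz, false
}

func main() {
	// A known type ignores the (possibly idle) measured speed.
	mhz, fromTable := lookupMaxMHz("example.large", 800)
	fmt.Println(mhz, fromTable)

	// An unknown type keeps the measured speed.
	mhz, fromTable = lookupMaxMHz("unknown.type", 800)
	fmt.Println(mhz, fromTable)
}
```

Returning a boolean alongside the speed keeps the miss visible, which matters because, as the commit message notes, new instance types have to be added to the table by hand.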

18321 Commits
