open-nomad

Author	SHA1	Message	Date
Tim Gross	00c9bd7ff0	reorder volume claim batch request raft message (#7871) For backwards compatibility during upgrades, new raft message types need to come at the end of the enum.	2020-05-06 08:57:51 -04:00
Mahmood Ali	24e0c7f081	ui: only count running allocations in client view In the client view list, only show the running allocations count for each client, rather than including already completed tasks. This is done for two reasons: First, consistency with the CLI: `nomad node status --allocs` only shows running allocs. Second, and more importantly, the count is useful to estimate how loaded the clients are. Allocs that have completed (but not GCed yet) have very little value to operators.	2020-05-05 21:31:58 -04:00
Tim Gross	ce86a594a6	csi: fix plugin counts on node update (#7844) In this changeset: * If a Nomad client node is running both a controller and a node plugin (which is a common case), then if only the controller or the node is removed, the plugin was not being updated with the correct counts. * The existing test for plugin cleanup didn't go back to the state store, which normally is ok but is complicated in this case by denormalization which changes the behavior. This commit makes the test more comprehensive. * Set "controller required" when plugin has `PUBLISH_READONLY`. All known controllers that support `PUBLISH_READONLY` also support `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this. * Only create plugins when the allocs for those plugins are healthy. If we allow a plugin to be created for the first time when the alloc is not healthy, then we'll recreate deleted plugins when the job's allocs all get marked terminal. * Terminal plugin alloc updates should clean up the plugin. The client fingerprint can't tell if the plugin is unhealthy intentionally (for the case of updates or job stop). Allocations that are server-terminal should delete themselves from the plugin and trigger a plugin self-GC, the same as an unused node.	2020-05-05 15:39:57 -04:00
Tim Gross	3cca738478	csi: fix mount validation (#7869) Several of the CSI `VolumeCapability` methods return pointers, which we were then comparing to pointers in the request rather than dereferencing them and comparing their contents. This changeset does a more fine-grained comparison of the request vs the capabilities, and adds better error messaging.	2020-05-05 15:13:07 -04:00
Drew Bailey	13beb103a4	Merge pull request #7867 from hashicorp/license-command-updates update license command output to reflect api changes	2020-05-05 14:10:22 -04:00
Tim Gross	22e3815e8c	docstring improvements and typo fixes (#7862)	2020-05-05 10:30:50 -04:00
Drew Bailey	48c451709e	update license command output to reflect api changes	2020-05-05 10:28:58 -04:00
Juan Larriba	a0df437c62	Run Linux Images (LCOW) and Windows Containers side by side (#7850) Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. The fingerprint now only prevents running Nomad on Windows 10 with Linux Containers.	2020-05-04 13:08:47 -04:00
Tim Gross	1c6dcab56b	volumewatcher: remove spurious nil-check (#7858) The nil-check here is left over from an earlier approach that didn't get merged. It doesn't do anything for us now, as we can't ever pass it `nil`, and if we leave it in, the `getVolume` call it guards will panic anyway.	2020-05-04 12:28:32 -04:00
Mahmood Ali	0d730326c1	Merge pull request #7856 from Renerick/patch-1 Fix URL schema in `drain` documentation	2020-05-03 13:51:05 -04:00
Denis Palashevskiy	4095860116	Fix URL schema in `drain` documentation	2020-05-03 20:50:40 +04:00
Michael Schurter	1d7f8391ee	Merge pull request #7846 from hashicorp/changli0617-patch-1 Update _app.js for Nomad Virtual Day	2020-05-01 14:33:23 -07:00
Michael Lange	260da00852	Add embedded task group to allocation to reference when allocation is historical	2020-05-01 14:30:02 -07:00
Michael Lange	91b97e0170	Stabilize job and allocation job versions in fixtures	2020-05-01 14:29:24 -07:00
Michael Lange	9a857a7042	Comment why the allocation has to be reloaded	2020-05-01 14:27:53 -07:00
Mahmood Ali	bc25da9dac	Merge pull request #7851 from hashicorp/spread-configuration-followup Follow up fix for spread	2020-05-01 13:48:31 -04:00
Mahmood Ali	759eade78b	missed fixing one invocation	2020-05-01 13:38:46 -04:00
Tim Gross	139c65c436	e2e: csi test can purge target job (#7823)	2020-05-01 13:25:50 -04:00
Mahmood Ali	78ae7b885a	Merge pull request #7810 from hashicorp/spread-configuration spread scheduling algorithm	2020-05-01 13:15:19 -04:00
Mahmood Ali	3da74068dd	changelog and fix typo	2020-05-01 13:14:20 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	d8e5e02398	Wiring algorithm to scheduler calls	2020-05-01 13:13:29 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Lang Martin	ad2fb4b297	client/heartbeatstop: don't store client state, use timeout In order to minimize this change while keeping a simple version of the behavior, we set `lastOk` to the current time less the initial server connection timeout. If the client starts and never contacts the server, it will stop all configured tasks after the initial server connection grace period, on the assumption that we've been out of touch longer than any configured `stop_after_client_disconnect`. The more complex state behavior might be justified later, but we should learn about failure modes first.	2020-05-01 12:35:49 -04:00
Lang Martin	28bac139cb	client/heartbeatstop: destroy allocs when disconnected from servers - track lastHeartbeat, the client local time of the last successful heartbeat round trip - track allocations with `stop_after_client_disconnect` configured - trigger allocation destroy (which handles cleanup) - restore heartbeat/killable allocs tracking when allocs are recovered from disk - on client restart, stop those allocs after a grace period if the servers are still partitioned	2020-05-01 12:35:49 -04:00
Drew Bailey	fdd61e4a63	Merge pull request #7847 from hashicorp/ent-license-404 temporarily test for 404 until endpoint is ready	2020-05-01 11:31:10 -04:00
Drew Bailey	581ad558a8	temporarily test for 404 until endpoint is ready	2020-05-01 11:24:37 -04:00
Drew Bailey	2b8fc650c9	Merge pull request #7778 from hashicorp/license-cli License cli	2020-05-01 08:51:40 -04:00
changli0617	a37176ab44	Update _app.js	2020-04-30 15:25:43 -07:00
Michael Schurter	cebc6b939e	Merge pull request #7730 from hashicorp/b-reserved-scoring core: fix node reservation scoring	2020-04-30 14:48:36 -07:00
Michael Schurter	c901d0e7dd	Merge branch 'master' into b-reserved-scoring	2020-04-30 14:48:14 -07:00
Michael Schurter	439a9f7301	Update website/pages/docs/upgrade/upgrade-specific.mdx Co-authored-by: Alex Dadgar <alex@hashicorp.com>	2020-04-30 14:47:12 -07:00
Tim Gross	cbae10333c	csi: check returned volume capability validation (#7831) This changeset corrects handling of the `ValidateVolumeCapabilities` response: * The CSI spec for `ValidateVolumeCapabilities` requires that plugins only set the `Confirmed` field if they've validated all capabilities. The Nomad client improperly assumes that the lack of a `Confirmed` field should be treated as a failure. This breaks the Azure and Linode block storage plugins, which don't set this optional field. * The CSI spec also requires that the orchestrator check the validation responses to guard against older versions of a plugin reporting "valid" for newer fields it doesn't understand.	2020-04-30 17:12:32 -04:00
Tim Gross	cc7dbad1c7	csi: restore long timeout for controller plugins (#7840) During MVP development, we reduced the timeout for controller plugins to avoid long hangs in GC workers. But now that this work has been moved to the volume watcher, we can restore the original timeout which is better suited for the characteristic timescales of some cloud provider APIs and better matches the behavior of k8s.	2020-04-30 17:12:05 -04:00
Tim Gross	52e805a6a6	csi: ensure Read/WriteAllocs aren't released early (#7841) We should only remove the `ReadAllocs`/`WriteAllocs` values for a volume after the claim has entered the "ready to free" state. The volume will eventually be released as expected. But querying the volume API will show the volume is released before the controller unpublish has finished and this can cause a race with starting new jobs. Test updates are to cover cases where we're dropping claims but not running through the whole reaping process.	2020-04-30 17:11:31 -04:00
Drew Bailey	41c7d49eb7	properly format license output	2020-04-30 14:46:26 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	acacecc67b	add license reset command to commands help text formatting remove reset no signed option	2020-04-30 14:46:20 -04:00
Drew Bailey	a266284f60	test all commands oss err	2020-04-30 14:46:19 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Drew Bailey	74abe6ef48	license cli commands cli changes, formatting	2020-04-30 14:46:17 -04:00
Jasmine Dahilig	a9004faa11	UI: Add representations for task lifecycles (#7659) This adds details about task lifecycles to allocations, task groups, and tasks. It includes a live-updating timeline-like chart on allocations.	2020-04-30 08:15:19 -05:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Michael Lange	c3085f04b6	Merge pull request #7820 from hashicorp/b-ui/ui-log-races UI: Log streaming bug fix medley	2020-04-29 18:06:47 -07:00
Michael Lange	21ef3633be	Make the no connection error on the logs page dismissable	2020-04-29 17:36:17 -07:00
Michael Lange	e74cd16252	Fix race condition where stdout and stderr requests can cause a no connection error This would happen because a no connection error happens after the second request fails, but that's because it's assumed the second request is to a server node. However, if a user clicks stderr fast enough, the first and second requests are both to the client node. This changes the logic to check if the request is to the server before deeming log streaming a total failure.	2020-04-29 17:36:17 -07:00
Michael Lange	aafbeaba75	Clicking stdout/stderr when already on that tab is now a noop	2020-04-29 17:36:16 -07:00
Michael Lange	7452a9a57d	Abort log fetch request when failing over from client to server Typically a failover means that the client can't be reached. However, if the client does eventually return after the timeout period, the log will stream indefinitely. This fixes that using an API that wasn't broadly available at the time this was first written.	2020-04-29 17:34:49 -07:00
Michael Lange	9ba563c48e	Always pass credential in fetch requests, but also treat options reasonably Now options can be provided without also having to remember to pass credentials. This is convenient for abort controller signals.	2020-04-29 17:34:49 -07:00
Seth Hoenig	dee7f3ea11	Merge pull request #7828 from hashicorp/b-ec2-speeds env_aws: use best-effort lookup table for CPU performance in EC2	2020-04-29 11:25:54 -06:00
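
The pointer-comparison pitfall described in 3cca738478 (csi: fix mount validation) can be sketched in Go roughly as follows. This is only an illustration: `MountVolume`, `Capability`, `samePointer`, and `sameContents` are hypothetical stand-ins, not the CSI library's or Nomad's actual types or functions.

```go
package main

import "fmt"

// MountVolume is a hypothetical stand-in for a capability message.
type MountVolume struct {
	FsType     string
	MountFlags []string
}

// Capability is a hypothetical wrapper whose getter returns a pointer.
type Capability struct {
	mount *MountVolume
}

// GetMount returns a pointer, so == on its result compares addresses.
func (c *Capability) GetMount() *MountVolume { return c.mount }

// samePointer shows the pitfall: two equal values at different
// addresses compare as unequal.
func samePointer(req, got *Capability) bool {
	return req.GetMount() == got.GetMount()
}

// sameContents dereferences both sides, compares field by field, and
// returns a descriptive error on the first mismatch.
func sameContents(req, got *Capability) error {
	a, b := req.GetMount(), got.GetMount()
	if a == nil || b == nil {
		if a == b {
			return nil // both unset
		}
		return fmt.Errorf("mount capability set on only one side")
	}
	if a.FsType != b.FsType {
		return fmt.Errorf("fs type mismatch: requested %q, got %q", a.FsType, b.FsType)
	}
	if len(a.MountFlags) != len(b.MountFlags) {
		return fmt.Errorf("mount flags mismatch: requested %v, got %v", a.MountFlags, b.MountFlags)
	}
	for i := range a.MountFlags {
		if a.MountFlags[i] != b.MountFlags[i] {
			return fmt.Errorf("mount flags mismatch: requested %v, got %v", a.MountFlags, b.MountFlags)
		}
	}
	return nil
}

func main() {
	req := &Capability{mount: &MountVolume{FsType: "ext4", MountFlags: []string{"noatime"}}}
	got := &Capability{mount: &MountVolume{FsType: "ext4", MountFlags: []string{"noatime"}}}

	fmt.Println(samePointer(req, got))         // false: different addresses
	fmt.Println(sameContents(req, got) == nil) // true: same contents
}
```

Returning an error instead of a bare boolean is one way to get the "better error messaging" the commit mentions; the actual changeset may structure this differently.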


18186 commits
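
The ordering constraint noted in 00c9bd7ff0 (new raft message types go at the end of the enum) can be illustrated with a minimal Go sketch. The constant names below are hypothetical and are not Nomad's actual message types; the point is only that `iota` assigns values by position, so appending keeps every existing value stable, while inserting in the middle would renumber everything after the insertion point.

```go
package main

import "fmt"

// MessageType is a hypothetical stand-in for a raft log message type.
// Its numeric value is what gets written to the log, so servers running
// different versions must agree on it during a rolling upgrade.
type MessageType uint8

const (
	NodeRegister MessageType = iota // 0
	JobRegister                     // 1
	VolumeClaim                     // 2

	// New types are appended here. Placing VolumeClaimBatch above
	// VolumeClaim instead would shift VolumeClaim from 2 to 3, and an
	// older server would misread entries written by a newer one.
	VolumeClaimBatch // 3
)

func main() {
	fmt.Println(NodeRegister, JobRegister, VolumeClaim, VolumeClaimBatch) // 0 1 2 3
}
```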