Commit graph

19331 commits

Author SHA1 Message Date
Tim Gross 3169839653
docs: always use -ignore-system on node drain with CSI (#8606)
Postrun hooks for allocation runners don't currently block the registration of
terminal health with the servers, which is what allows system jobs to be
drained. So draining nodes with jobs that claim CSI volumes requires the
`-ignore-system` flag to ensure that the postrun hook for service jobs gets a
chance to execute.
2020-08-07 11:22:28 -04:00
Tim Gross 7d53ed88d6
csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
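The wrapped-error idea in #8605 above can be illustrated with Go's standard `%w` verb and `errors.Is`. This is only a minimal sketch: `ErrAlreadyCleanedUp`, `errPermission`, `unmountVolume`, and `safeToContinue` are hypothetical names made up for the example and are not taken from the Nomad codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyCleanedUp is a hypothetical sentinel error for this sketch.
var ErrAlreadyCleanedUp = errors.New("client state already cleaned up")

// errPermission is a second hypothetical error that must NOT be ignored.
var errPermission = errors.New("permission denied")

// unmountVolume stands in for a client-side action. On a retry the mount
// point is already gone, so it returns the sentinel wrapped with context.
func unmountVolume(volID string, alreadyGone bool) error {
	if alreadyGone {
		return fmt.Errorf("unmount %q: %w", volID, ErrAlreadyCleanedUp)
	}
	return nil
}

// safeToContinue reports whether a retried RPC can treat err as
// "nothing left to do" and carry on with its remaining steps.
func safeToContinue(err error) bool {
	return err == nil || errors.Is(err, ErrAlreadyCleanedUp)
}

func main() {
	err := unmountVolume("vol-1", true)
	fmt.Println(err)
	fmt.Println(safeToContinue(err))           // wrapped sentinel is recognized
	fmt.Println(safeToContinue(errPermission)) // unrelated errors still fail
}
```

Because the sentinel is wrapped rather than replaced, the caller keeps the contextual message for logs while still being able to check the underlying cause.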
Tim Gross 81b604fa13
csi: controller unpublish should check current alloc count (#8604)
Using the count of node claims from earlier in the `CSIVolume.Unpublish` RPC
doesn't correctly account for cases where the RPC was interrupted but
checkpointed. Instead, we'll check the current allocation count and status to
determine whether we need to send a controller unpublish.
2020-08-07 10:43:45 -04:00
Seth Hoenig d6a60ff4b1
Merge pull request #8603 from hashicorp/f-upgrade-consul-api
deps: upgrade import of consul/api
2020-08-07 08:46:19 -05:00
Seth Hoenig fb1c85a956 deps: upgrade import of consul/api
Upgrade our consul/api import to the equivalent of consul@v1.8.1, which includes
a bug fix necessary for #6913. If Consul published a proper api/ submodule tag,
we could reference that instead.
2020-08-06 21:02:33 -05:00
Michael Lange 286e56ed82 Make eq-by helper resilient to a lack of prop since handlebars doesn't short-circuit evaluation 2020-08-06 17:59:26 -07:00
Michael Lange 476002d727 Key the annotations each loop by annotationKey for stable dom nodes 2020-08-06 17:58:43 -07:00
Michael Lange a04d4f2d76 Add integration test for line-chart annotation staggering 2020-08-06 17:37:09 -07:00
Michael Lange 59d98b80ca
Add missing word "two" to test name
Co-authored-by: Buck Doyle <buck@hashicorp.com>
2020-08-06 15:43:29 -07:00
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Michael Schurter c7c603eda5 build: update from Go 1.14.6 to Go 1.14.7
Go 1.14.7 fixes CVE-2020-16845 which is not believed to impact Nomad.
2020-08-06 11:50:29 -07:00
Michael Schurter 057e1c021f
Merge pull request #8597 from hashicorp/b-vault-revoke-log-line
vault: log once per interval if batching revocation
2020-08-06 11:32:47 -07:00
Tim Gross 314458ebdb
csi: update volumewatcher to use unpublish RPC (#8579)
This changeset updates `nomad/volumewatcher` to take advantage of the
`CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and
associated tests. The raft batching code can be safely dropped, as the
characteristic times of the CSI RPCs are on the order of seconds or even
minutes, so batching up raft RPCs added complexity without any real world
performance wins.

Includes refactor w/ test cleanup and dead code elimination in volumewatcher
2020-08-06 14:31:18 -04:00
Tim Gross eaa14ab64c
csi: add unpublish RPC (#8572)
This changeset is plumbing for a `nomad volume detach` command that will be
reused by the volumewatcher claim GC as well.
2020-08-06 13:51:29 -04:00
Tim Gross 4bbf18703f
csi: retry controller client RPCs on next controller (#8561)
The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this by retrying
when the RPC fails in cases when the plugin is unavailable (because the node
has drained or the alloc has failed but we haven't received an updated
fingerprint yet).

This changeset tries all known controllers on ready nodes before giving up,
and adds tests that exercise the client RPC routing and retries.
2020-08-06 13:24:24 -04:00
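The retry behavior described in #8561 above (try every known controller on a ready node before giving up) might look roughly like the following. This is a sketch under stated assumptions, not the actual Nomad implementation: the `controller` type, `callAnyController`, and both error values are hypothetical, and the RPC is simulated.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors used only in this sketch.
var (
	errUnavailable   = errors.New("plugin unavailable")
	errNoControllers = errors.New("no controllers on ready nodes")
)

// controller is a hypothetical stand-in for a controller plugin instance
// running on some client node.
type controller struct {
	nodeID string
	ready  bool  // whether the node is eligible to receive the RPC
	err    error // error the simulated RPC returns, nil on success
}

// call simulates sending the client RPC to this controller.
func (c controller) call() error { return c.err }

// callAnyController tries each controller on a ready node in order and
// returns the ID of the node that served the request. It gives up only
// after every candidate has failed.
func callAnyController(controllers []controller) (string, error) {
	var lastErr error
	tried := 0
	for _, c := range controllers {
		if !c.ready {
			continue // skip nodes that are not ready
		}
		tried++
		if err := c.call(); err != nil {
			lastErr = fmt.Errorf("node %s: %w", c.nodeID, err)
			continue // fall through to the next controller
		}
		return c.nodeID, nil
	}
	if tried == 0 {
		return "", errNoControllers
	}
	return "", fmt.Errorf("all %d controllers failed, last error: %w", tried, lastErr)
}

func main() {
	controllers := []controller{
		{nodeID: "node-a", ready: false},                     // not ready, skipped
		{nodeID: "node-b", ready: true, err: errUnavailable}, // fails, retried elsewhere
		{nodeID: "node-c", ready: true},                      // serves the request
	}
	node, err := callAnyController(controllers)
	fmt.Println(node, err)
}
```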
Luiz Aoqui 0fadad46d3
Merge pull request #8595 from hashicorp/docs/fix-connect-log-level
docs: fix Consul Connect log_level meta key
2020-08-06 11:00:09 -04:00
Buck Doyle 67f8d95917 Add override for null events collection
This removes some errors in the console if there are no
autoscaling events.
2020-08-06 07:51:56 -05:00
James Rasell 38f23b79df
Merge pull request #8574 from shishir-a412ed/f-ui-containerd-driver
Add nomad-driver-containerd to nomad UI docs.
2020-08-06 09:36:11 +02:00
Michael Schurter 2385fee0d2 vault: log once per interval if batching revocation
This log line should be rare since:

1. Most tokens should be revoked synchronously, not via this async
   batched method. Async revocation only takes place when Vault
   connectivity is lost and after leader election so no revocations are
   missed.
2. There should rarely be more than one batch (1,000 tokens) to revoke since the
   above conditions should be brief and infrequent.
3. Interval is 5 minutes, so this log line will be emitted at *most*
   once every 5 minutes.

What makes this log line rare is also what makes it interesting: due to
a bug prior to Nomad 0.11.2 some tokens may never get revoked. Therefore
Nomad tries to re-revoke them on every leader election. This caused a
massive buildup of old tokens that would never be properly revoked and
purged. Nomad 0.11.3 mostly fixed this but still had a bug in purging
revoked tokens via Raft (fixed in #8553).

The nomad.vault.distributed_tokens_revoked metric is only ticked upon
successful revocation and purging, making any bugs or slowness in the
process difficult to detect.

Logging before a potentially slow revocation+purge operation is
performed will give users much better indications of what activity is
going on should the process fail to make it to the metric.
2020-08-05 15:39:21 -07:00
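The "log once per interval" behavior described in the commit above can be sketched as a small time-based gate. The `intervalLogger` type and the `simulate` helper are hypothetical and exist only for this illustration; the 5-minute interval comes from the commit message, and the real Nomad code may be structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// intervalLogger is a hypothetical helper that allows at most one log
// line per interval. It is not safe for concurrent use as written.
type intervalLogger struct {
	interval time.Duration
	last     time.Time
}

// shouldLog reports whether a message may be emitted at time now and, if
// so, records now as the time of the latest emission.
func (l *intervalLogger) shouldLog(now time.Time) bool {
	if !l.last.IsZero() && now.Sub(l.last) < l.interval {
		return false
	}
	l.last = now
	return true
}

// simulate pretends a batch is revoked once a minute for the given number
// of minutes and returns the minutes at which a line would be logged.
func simulate(minutes int) []int {
	l := &intervalLogger{interval: 5 * time.Minute}
	start := time.Date(2020, 8, 5, 15, 0, 0, 0, time.UTC)
	var logged []int
	for i := 0; i <= minutes; i++ {
		if l.shouldLog(start.Add(time.Duration(i) * time.Minute)) {
			logged = append(logged, i)
		}
	}
	return logged
}

func main() {
	fmt.Println(simulate(11)) // prints [0 5 10]
}
```

Passing the current time in as an argument, rather than calling `time.Now()` inside the gate, keeps the sketch deterministic and easy to test.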
Luiz Aoqui 602d3373ed
docs: fix Consul Connect log_level meta key 2020-08-05 17:01:03 -04:00
Buck Doyle 9074d33f28
UI: Add truncation of rendered search results (#8571)
This closes #8549. Thanks to @optiz0r for the bug report. Having
the global search attempt to render every returned result is
obviously a mistake!
2020-08-05 15:58:44 -05:00
Michael Lange 3b59b52bca Compare scale events by their UID instead of reference equality 2020-08-05 12:02:23 -07:00
Michael Lange ecaee67ff1 Use the correct gray for the info details 2020-08-05 12:02:22 -07:00
Michael Lange 033618c46e Safestr the annotation style property 2020-08-05 12:02:22 -07:00
Michael Lange ebdb0c4101 Conditionally show the scaling timeline or accordion 2020-08-05 12:02:22 -07:00
Michael Lange 4c4e2e505f Unit test coverage for the ScaleEventsChart data domain logic 2020-08-05 12:02:22 -07:00
Michael Lange 792fa64101 Integration tests for the ScaleEventsChart component 2020-08-05 12:02:22 -07:00
Michael Lange 39583e0ce1 Force mock error scale events to be annotations 2020-08-05 12:02:22 -07:00
Michael Lange 09f6bca470 New ScaleEventsChart component
Displays all scale events in the form of an annotated line chart. When
annotations are clicked, the timestamp, message, and meta properties for
the event are displayed below the chart.
2020-08-05 12:02:22 -07:00
Michael Lange 21f3b7dfcc Add activeAnnotation property to line-chart 2020-08-05 12:02:22 -07:00
Michael Lange 2903d1f504 Stagger line chart annotations when they are too close 2020-08-05 12:02:22 -07:00
Michael Lange b332e186b2 Add curve options to line chart 2020-08-05 12:02:22 -07:00
Michael Lange a891e907f5 Test coverage for line chart annotations 2020-08-05 12:02:22 -07:00
Michael Lange 24b6aeb746 Story for line chart annotations 2020-08-05 12:02:22 -07:00
Michael Lange 8445e22faf Add annotations to the line chart component 2020-08-05 12:02:22 -07:00
Michael Lange 299f2b6453 Make the default time series date format for line chart more useful 2020-08-05 12:02:21 -07:00
Shishir Mahajan 874f948520
Fix review comments. 2020-08-05 11:51:00 -07:00
Shishir Mahajan 088b0694b4 Add nomad-driver-containerd to nomad UI docs. 2020-08-04 11:29:05 -07:00
Chris Baker a5dc6df0ff
Merge pull request #8583 from hashicorp/cgbaker-patch-1
Update CHANGELOG.md
2020-07-31 11:16:48 -05:00
Chris Baker 07e8b405d2
Update CHANGELOG.md
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-31 11:14:13 -05:00
Chris Baker e920bd22bb
Update CHANGELOG.md
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-31 11:13:40 -05:00
Chris Baker 8ba61e60d6
Update CHANGELOG.md
label in changelog listed wrong issue number
2020-07-31 11:05:53 -05:00
Drew Bailey c06a84e4a2
ignore VAULT_NAMESPACE (#8581)
VAULT_NAMESPACE is already ignored in 0.12.1 and previous versions.
Revert the change that used it as a default, since it would break OSS users.
2020-07-31 10:33:21 -04:00
Tim Gross 5dba653b43
csi/e2e: add 2nd controller for node drain testing (#8573) 2020-07-31 08:03:49 -04:00
Buck Doyle c2ce0a1dec
Add linting for acceptance accessibility audits (#8570)
This makes use of the PR I recently had merged to eslint-plugin-ember-a11y-testing
to add linting that ensures an accessibility audit is called at least once per acceptance
test file. When I have added linting for component tests, it can apply there too.

I added exclusions for the filesystem browser tests, which are covered by behaviors/fs
and for the search test which will involve significant overrides to Ember Power Select
default templates.
2020-07-30 12:40:05 -05:00
James Rasell 90903bb625
Merge pull request #8555 from hashicorp/remove-size-detail-from-docs-homepage
docs: remove Nomad binary size from README and website.
2020-07-30 19:20:23 +02:00
Mahmood Ali 490b9ce3a0
Handle Scaling Policies in Job Plan endpoint (#8567)
Fixes https://github.com/hashicorp/nomad/issues/8544

This PR fixes a bug where `nomad job plan ...` always reports no change if the submitted job contains a scaling policy.

The issue has three contributing factors:
1. The plan endpoint doesn't populate the required scaling policy ID, unlike the job register endpoint
2. The plan endpoint suppresses errors on job insertion - the job insertion fails here, because the scaling policy is missing the required ID
3. The scheduler reports no update necessary when the relevant job isn't in store (because the insertion failed)

This PR fixes the first two factors.  Changing the scheduler to be more strict might make sense, but may violate some idempotency invariant or make the scheduler more brittle.
2020-07-30 12:27:36 -04:00
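The first two contributing factors listed in #8567 above can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `job` and `scalingPolicy` types, the map used as a store, and the placeholder ID scheme are hypothetical and much simpler than Nomad's real structures.

```go
package main

import (
	"errors"
	"fmt"
)

// scalingPolicy and job are simplified, hypothetical types.
type scalingPolicy struct{ ID string }

type job struct {
	Name     string
	Policies []*scalingPolicy
}

// insertJob mimics a store that rejects scaling policies without an ID.
func insertJob(store map[string]*job, j *job) error {
	for _, p := range j.Policies {
		if p.ID == "" {
			return errors.New("scaling policy is missing its ID")
		}
	}
	store[j.Name] = j
	return nil
}

// plan fills in missing policy IDs before insertion (factor 1) and returns
// insertion errors to the caller instead of suppressing them (factor 2).
func plan(store map[string]*job, j *job) error {
	for i, p := range j.Policies {
		if p.ID == "" {
			p.ID = fmt.Sprintf("%s-policy-%d", j.Name, i) // placeholder ID scheme
		}
	}
	if err := insertJob(store, j); err != nil {
		return fmt.Errorf("plan: inserting job: %w", err)
	}
	return nil
}

func main() {
	store := map[string]*job{}
	j := &job{Name: "web", Policies: []*scalingPolicy{{}}}
	fmt.Println(plan(store, j), store["web"] != nil, j.Policies[0].ID)
}
```

With both steps in place, the job reaches the store, so a later comparison sees the submitted job instead of concluding that nothing changed.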
Michael Lange acecdbf4a2
Merge pull request #8569 from hashicorp/d/update-scale-request-params
Docs: Update Reason to Message in the job scale docs
2020-07-30 09:18:18 -07:00
Michael Lange 42517d87b4
Merge pull request #8568 from hashicorp/b-ui/scale-post-message
Use the correct Message property instead of Reason in scale POST request
2020-07-30 09:13:15 -07:00
Michael Lange af446cec10 Update Reason to Message in the job scale docs 2020-07-30 09:06:08 -07:00