Commit graph

19074 commits

Author SHA1 Message Date
Tim Gross 7d53ed88d6
csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (e.g. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
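A minimal Go sketch of the wrapped-error idea from the commit above (#8605). It is not the actual Nomad code: `ErrAlreadyCleanedUp`, `unmountVolume`, and `canIgnore` are hypothetical names chosen for illustration. The point is only that wrapping with `%w` lets a retrying caller recognise "already cleaned up" with `errors.Is` and carry on.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyCleanedUp is a hypothetical sentinel error, not Nomad's own.
var ErrAlreadyCleanedUp = errors.New("client-side state already cleaned up")

// unmountVolume stands in for a client RPC handler. It wraps the sentinel
// with %w so callers can still identify it after context has been added.
func unmountVolume(volID string, mounted bool) error {
	if !mounted {
		return fmt.Errorf("unmount volume %q: %w", volID, ErrAlreadyCleanedUp)
	}
	return nil
}

// canIgnore reports whether a retried RPC may treat err as success.
func canIgnore(err error) bool {
	return err == nil || errors.Is(err, ErrAlreadyCleanedUp)
}

func main() {
	// Simulate a retry after an earlier call already removed the mount point.
	err := unmountVolume("vol-1", false)
	fmt.Println("error:", err)
	fmt.Println("safe to continue:", canIgnore(err))
}
```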
Tim Gross 81b604fa13
csi: controller unpublish should check current alloc count (#8604)
Using the count of node claims from earlier in the `CSIVolume.Unpublish` RPC
doesn't correctly account for cases where the RPC was interrupted but
checkpointed. Instead, we'll check the current allocation count and status to
determine whether we need to send a controller unpublish.
2020-08-07 10:43:45 -04:00
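The check described in the commit above (#8604) might look roughly like the Go sketch below. The `alloc` type and `needsControllerUnpublish` function are hypothetical, and the rule itself (send a controller unpublish only when no non-terminal allocation on the node remains) is an assumption about the intent, not the real implementation.

```go
package main

import "fmt"

// alloc is a hypothetical, pared-down view of an allocation claiming a volume.
type alloc struct {
	NodeID   string
	Terminal bool // true once the allocation has stopped for good
}

// needsControllerUnpublish inspects the current allocations rather than a
// count captured earlier in the RPC. Assumed rule: unpublish only when no
// live allocation on nodeID is left.
func needsControllerUnpublish(current []alloc, nodeID string) bool {
	for _, a := range current {
		if a.NodeID == nodeID && !a.Terminal {
			return false
		}
	}
	return true
}

func main() {
	allocs := []alloc{
		{NodeID: "node-a", Terminal: true},
		{NodeID: "node-b", Terminal: false},
	}
	fmt.Println("node-a:", needsControllerUnpublish(allocs, "node-a"))
	fmt.Println("node-b:", needsControllerUnpublish(allocs, "node-b"))
}
```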
Seth Hoenig d6a60ff4b1
Merge pull request #8603 from hashicorp/f-upgrade-consul-api
deps: upgrade import of consul/api
2020-08-07 08:46:19 -05:00
Seth Hoenig fb1c85a956 deps: upgrade import of consul/api
Upgrade our consul/api import to the equivalent of consul@v1.8.1, which includes
a bug fix necessary for #6913. If Consul published a proper api/ submodule tag,
we could reference that instead.
2020-08-06 21:02:33 -05:00
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Michael Schurter 057e1c021f
Merge pull request #8597 from hashicorp/b-vault-revoke-log-line
vault: log once per interval if batching revocation
2020-08-06 11:32:47 -07:00
Tim Gross 314458ebdb
csi: update volumewatcher to use unpublish RPC (#8579)
This changeset updates `nomad/volumewatcher` to take advantage of the
`CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and
associated tests. The raft batching code can be safely dropped, as the
characteristic times of the CSI RPCs are on the order of seconds or even
minutes, so batching up raft RPCs added complexity without any real world
performance wins.

Includes refactor w/ test cleanup and dead code elimination in volumewatcher
2020-08-06 14:31:18 -04:00
Tim Gross eaa14ab64c
csi: add unpublish RPC (#8572)
This changeset is plumbing for a `nomad volume detach` command that will be
reused by the volumewatcher claim GC as well.
2020-08-06 13:51:29 -04:00
Tim Gross 4bbf18703f
csi: retry controller client RPCs on next controller (#8561)
The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this by retrying
when the RPC fails in cases when the plugin is unavailable (because the node
has drained or the alloc has failed but we haven't received an updated
fingerprint yet).

This changeset tries all known controllers on ready nodes before giving up,
and adds tests that exercise the client RPC routing and retries.
2020-08-06 13:24:24 -04:00
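The retry behaviour described in the commit above (#8561) can be sketched in Go as follows. This is an illustration under assumptions, not the Nomad implementation: the `controller` type, `callAnyController`, and `errNoControllers` are hypothetical, and the sketch retries on every error, whereas real code would likely retry only on errors that indicate an unavailable plugin.

```go
package main

import (
	"errors"
	"fmt"
)

// controller is a hypothetical record of one controller plugin instance.
type controller struct {
	NodeID    string
	NodeReady bool
}

var errNoControllers = errors.New("no controller plugins on ready nodes")

// callAnyController tries each controller on a ready node in turn and returns
// nil on the first success. If every attempt fails it returns the last error.
func callAnyController(cs []controller, call func(controller) error) error {
	err := errNoControllers
	for _, c := range cs {
		if !c.NodeReady {
			continue // skip nodes that are not ready
		}
		if err = call(c); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	cs := []controller{
		{NodeID: "n1", NodeReady: false},
		{NodeID: "n2", NodeReady: true},
		{NodeID: "n3", NodeReady: true},
	}
	err := callAnyController(cs, func(c controller) error {
		fmt.Println("trying controller on", c.NodeID)
		if c.NodeID == "n2" {
			return errors.New("plugin unavailable")
		}
		return nil
	})
	fmt.Println("result:", err)
}
```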
Luiz Aoqui 0fadad46d3
Merge pull request #8595 from hashicorp/docs/fix-connect-log-level
docs: fix Consul Connect log_level meta key
2020-08-06 11:00:09 -04:00
James Rasell 38f23b79df
Merge pull request #8574 from shishir-a412ed/f-ui-containerd-driver
Add nomad-driver-containerd to nomad UI docs.
2020-08-06 09:36:11 +02:00
Michael Schurter 2385fee0d2 vault: log once per interval if batching revocation
This log line should be rare since:

1. Most tokens should be revoked synchronously, not via this async
   batched method. Async revocation only takes place when Vault
   connectivity is lost and after leader election so no revocations are
   missed.
2. There should rarely be more than one batch (1,000 tokens) to revoke
   since the above conditions should be brief and infrequent.
3. Interval is 5 minutes, so this log line will be emitted at *most*
   once every 5 minutes.

What makes this log line rare is also what makes it interesting: due to
a bug prior to Nomad 0.11.2 some tokens may never get revoked. Therefore
Nomad tries to re-revoke them on every leader election. This caused a
massive buildup of old tokens that would never be properly revoked and
purged. Nomad 0.11.3 mostly fixed this but still had a bug in purging
revoked tokens via Raft (fixed in #8553).

The nomad.vault.distributed_tokens_revoked metric is only ticked upon
successful revocation and purging, making any bugs or slowness in the
process difficult to detect.

Logging before a potentially slow revocation+purge operation is
performed will give users much better indications of what activity is
going on should the process fail to make it to the metric.
2020-08-05 15:39:21 -07:00
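One way to get "at most once per interval" logging like the commit above describes is a small timestamp gate. The Go sketch below is an assumption about the general technique, not the code in the commit. `intervalLogger` and `shouldLog` are hypothetical names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// intervalLogger is a hypothetical gate that allows one log line per interval.
type intervalLogger struct {
	interval time.Duration
	last     time.Time // zero until the first line is emitted
}

// shouldLog reports whether a line may be emitted at time now. When it
// returns true it also records now as the time of the last emitted line.
func (l *intervalLogger) shouldLog(now time.Time) bool {
	if !l.last.IsZero() && now.Sub(l.last) < l.interval {
		return false
	}
	l.last = now
	return true
}

func main() {
	l := &intervalLogger{interval: 5 * time.Minute}
	start := time.Date(2020, 8, 5, 12, 0, 0, 0, time.UTC)
	// Three batches arrive at t+0, t+1m and t+6m; the gate decides which log.
	for _, offset := range []time.Duration{0, time.Minute, 6 * time.Minute} {
		if l.shouldLog(start.Add(offset)) {
			fmt.Printf("batching token revocation (t+%s)\n", offset)
		}
	}
}
```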
Luiz Aoqui 602d3373ed
docs: fix Consul Connect log_level meta key 2020-08-05 17:01:03 -04:00
Buck Doyle 9074d33f28
UI: Add truncation of rendered search results (#8571)
This closes #8549. Thanks to @optiz0r for the bug report. Having
the global search attempt to render every returned result is
obviously a mistake!
2020-08-05 15:58:44 -05:00
Shishir Mahajan 874f948520
Fix review comments. 2020-08-05 11:51:00 -07:00
Shishir Mahajan 088b0694b4 Add nomad-driver-containerd to nomad UI docs. 2020-08-04 11:29:05 -07:00
Chris Baker a5dc6df0ff
Merge pull request #8583 from hashicorp/cgbaker-patch-1
Update CHANGELOG.md
2020-07-31 11:16:48 -05:00
Chris Baker 07e8b405d2
Update CHANGELOG.md
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-31 11:14:13 -05:00
Chris Baker e920bd22bb
Update CHANGELOG.md
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-31 11:13:40 -05:00
Chris Baker 8ba61e60d6
Update CHANGELOG.md
label in changelog listed wrong issue number
2020-07-31 11:05:53 -05:00
Drew Bailey c06a84e4a2
ignore VAULT_NAMESPACE (#8581)
VAULT_NAMESPACE in 0.12.1 and previous versions is already ignored.
Revert the change that used it as a default, since it will break OSS users.
2020-07-31 10:33:21 -04:00
Tim Gross 5dba653b43
csi/e2e: add 2nd controller for node drain testing (#8573) 2020-07-31 08:03:49 -04:00
Buck Doyle c2ce0a1dec
Add linting for acceptance accessibility audits (#8570)
This makes use of the PR I recently had merged to eslint-plugin-ember-a11y-testing
to add linting that ensures an accessibility audit is called at least once per acceptance
test file. When I have added linting for component tests, it can apply there too.

I added exclusions for the filesystem browser tests, which are covered by behaviors/fs
and for the search test which will involve significant overrides to Ember Power Select
default templates.
2020-07-30 12:40:05 -05:00
James Rasell 90903bb625
Merge pull request #8555 from hashicorp/remove-size-detail-from-docs-homepage
docs: remove Nomad binary size from README and website.
2020-07-30 19:20:23 +02:00
Mahmood Ali 490b9ce3a0
Handle Scaling Policies in Job Plan endpoint (#8567)
Fixes https://github.com/hashicorp/nomad/issues/8544

This PR fixes a bug where `nomad job plan ...` always reports no change if the submitted job contains a scaling policy.

The issue has three contributing factors:
1. The plan endpoint doesn't populate the required scaling policy ID, unlike the job register endpoint.
2. The plan endpoint suppresses errors on job insertion. The insertion fails here because the scaling policy is missing the required ID.
3. The scheduler reports no update necessary when the relevant job isn't in the store (because the insertion failed).

This PR fixes the first two factors.  Changing the scheduler to be more strict might make sense, but may violate some idempotency invariant or make the scheduler more brittle.
2020-07-30 12:27:36 -04:00
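The first two factors in the commit above (#8567) can be illustrated with a small Go sketch. Every type and function here (`Job`, `TaskGroup`, `ScalingPolicy`, `populateScalingIDs`, `insertJob`, `planJob`) is a hypothetical stand-in rather than Nomad's real structs or endpoint code. It only shows the shape of the fix: fill in missing policy IDs before insertion, and return insertion errors instead of swallowing them.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, pared-down job model for illustration only.
type ScalingPolicy struct{ ID string }

type TaskGroup struct {
	Name    string
	Scaling *ScalingPolicy
}

type Job struct{ TaskGroups []*TaskGroup }

// populateScalingIDs assigns an ID to every scaling policy that lacks one
// and leaves existing IDs untouched.
func populateScalingIDs(job *Job, newID func() string) {
	for _, tg := range job.TaskGroups {
		if tg.Scaling != nil && tg.Scaling.ID == "" {
			tg.Scaling.ID = newID()
		}
	}
}

// insertJob mimics a store that rejects scaling policies without an ID.
func insertJob(job *Job) error {
	for _, tg := range job.TaskGroups {
		if tg.Scaling != nil && tg.Scaling.ID == "" {
			return errors.New("scaling policy for group " + tg.Name + " has no ID")
		}
	}
	return nil
}

// planJob populates IDs first, then surfaces any insertion error to the caller.
func planJob(job *Job, newID func() string) error {
	populateScalingIDs(job, newID)
	if err := insertJob(job); err != nil {
		return fmt.Errorf("plan: inserting job: %w", err)
	}
	return nil
}

func main() {
	n := 0
	newID := func() string { n++; return fmt.Sprintf("policy-%d", n) }
	job := &Job{TaskGroups: []*TaskGroup{{Name: "web", Scaling: &ScalingPolicy{}}}}
	err := planJob(job, newID)
	fmt.Println("error:", err, "id:", job.TaskGroups[0].Scaling.ID)
}
```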
Michael Lange acecdbf4a2
Merge pull request #8569 from hashicorp/d/update-scale-request-params
Docs: Update Reason to Message in the job scale docs
2020-07-30 09:18:18 -07:00
Michael Lange 42517d87b4
Merge pull request #8568 from hashicorp/b-ui/scale-post-message
Use the correct Message property instead of Reason in scale POST request
2020-07-30 09:13:15 -07:00
Michael Lange af446cec10 Update Reason to Message in the job scale docs 2020-07-30 09:06:08 -07:00
Michael Lange 7e5cfa216e Use the correct Message property instead of Reason in scale POST request
Also use a more informative default message (one that includes the new
count)
2020-07-30 08:43:15 -07:00
Michael Lange 868509de5f
Merge pull request #8563 from hashicorp/b-ui/missing-job-distribution-chart-texture
UI: Restore striped texture used in the job distribution bar
2020-07-30 08:20:48 -07:00
Buck Doyle 58ce7c298f
Add documentation for job name parameter (#8566) 2020-07-30 10:13:39 -05:00
Buck Doyle bf056b3011
Change capitalisation
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-07-30 10:09:47 -05:00
Buck Doyle 7855adf127 Add documentation for job name parameter 2020-07-30 10:03:50 -05:00
Buck Doyle adada0d5b0
Fix placement invocations (#8558) 2020-07-30 09:56:58 -05:00
Buck Doyle 7596cfd5e7
Change job search navigation to use id (#8560)
This fixes #8548. It was a significant oversight to use the
name instead of the id!
2020-07-30 09:10:08 -05:00
Tim Gross 87f9bfaf1e
e2e/csi: update EFS plugin test to use v1.0 (#8562) 2020-07-30 08:41:48 -04:00
Michael Lange 5dfa8f6350 Remove now superfluous svg-patterns.js component file 2020-07-29 22:55:16 -07:00
Michael Lange 8a78999019 Move the svg-patterns template into the component dir 2020-07-29 22:54:30 -07:00
Michael Lange 9dafbf52cf
Merge pull request #8551 from hashicorp/f-ui/scaling-observability
UI: Scaling observability
2020-07-29 22:38:12 -07:00
Michael Lange b7ade13e85 Add scaling observability feature to the changelog 2020-07-29 22:27:54 -07:00
Michael Lange 4d2f322e10 Add a tooltip to explain the count change icons 2020-07-29 19:30:00 -07:00
Michael Lange b0f2a9f51d Fix scale and summary adapters to correct live reloading 2020-07-29 19:26:32 -07:00
Michael Lange 602b6771ba Assert that the scale up/down indicator is not shown when the count is null 2020-07-29 19:26:32 -07:00
Michael Lange 13af67ac80 Integration tests for the scale-events-accordion component 2020-07-29 19:26:32 -07:00
Michael Lange 69795e8b7d Refactor scale events into their own component 2020-07-29 19:26:32 -07:00
Michael Lange 4b7f431981 Acceptance tests for scaling events 2020-07-29 19:07:24 -07:00
Michael Lange 8a995a0db8 Make scale event properties more conditional and serialized correctly 2020-07-29 19:07:24 -07:00
Michael Lange 203f7e06b8 Present scaling events on the job task group page 2020-07-29 19:07:24 -07:00
Michael Lange d92ade8c54 Load and watch the job scale endpoint on the task group page 2020-07-29 19:07:24 -07:00
Michael Lange 8eabff06d5 Finish modeling behaviors within job scale events 2020-07-29 19:07:24 -07:00