Commit graph

22603 commits

Author SHA1 Message Date
Tim Gross 5f30279cd2
e2e: StopJob should tolerate progress deadline expired (#12179)
The `TestRescheduleProgressDeadlineFail` E2E test failed during test
cleanup because the error message "progress deadline expired" that it
emits when we stop the job does not match the one expected from
monitoring the `job stop` command. Update the `StopJob` helper to
tolerate this use case as well.
2022-03-04 08:55:22 -05:00
Tim Gross 4c4895e19c
e2e: configure prometheus for mTLS for Metrics suite (#12181)
The `Metrics` suite uses prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure prometheus also
has mTLS configuration because the metrics endpoint is protected.

Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.
2022-03-04 08:55:06 -05:00
Tim Gross f470eb9f1e
csi: ensure WriteOptions aren't nil when handling secrets (#12182)
When we set the headers for CSI secrets in the `WriteOptions`, it
turns out that we're not always passing a non-nil object. In that
case, instanstiate it on demand in the API.
2022-03-04 08:49:04 -05:00
Luiz Aoqui b1809eb48c
Fix CSI volume list with prefix and * namespace (#12184)
When using a prefix value and the * wildcard for namespace, the endpoint
would not take the prefix value into consideration due to the order in
which the checks were executed but also the logic for retrieving volumes
from the state store.

This commit changes the order to check for a prefix first and wraps the
result iterator of the state store query in a filter to apply the
prefix.
2022-03-03 17:27:04 -05:00
Tim Gross b8b08fb32d
e2e: use UUID for CSI idempotency token (#12183)
The AWS EBS plugin appears to use the name field of the volume as an
idempotency token that persists across the entire AWS account, not
just the plugin lifespan.

Also fix the regex for the volume ID, which was originally taken from
the job ID regex but isn't actually the same. This hasn't failed tests
for us because we've always passed in the same volume ID.
2022-03-03 17:00:00 -05:00
Tim Gross 1502af3523
e2e: use operator api for Networking suite validation (#12180)
With mTLS enabled, using `curl` in a bash script for validation
involves having to configure arguments to `curl` based on whether or
not the test infrastructure is using mTLS, whether ACLs are enabled,
etc. Use the new `operator api` command instead to pick up the client
configuration from the test environment automatically.
2022-03-03 15:17:29 -05:00
Tim Gross 3247e422d1
csi: add missing fields to HTTP API response (#12178)
The HTTP endpoint for CSI manually serializes the internal struct to
the API struct for purposes of redaction (see also #10470). Add fields
that were missing from this serialization so they don't show up as
always empty in the API response.
2022-03-03 15:15:28 -05:00
Luiz Aoqui fe38da1137
ci: disable Go test semgrep rules (#12175) 2022-03-02 20:30:27 -05:00
Michael Schurter 0f6923c750
Merge pull request #10808 from hashicorp/f-curl
cli: add operator api command
2022-03-02 10:12:16 -08:00
Michael Schurter a8833b7d86 docs: add op api examples 2022-03-01 17:15:26 -08:00
Michael Schurter 72134ef5a7 docs: add op api examples 2022-03-01 17:12:58 -08:00
Michael Schurter 0bb9f06637 cli: fix op api method handling 2022-03-01 16:44:15 -08:00
Michael Schurter fcf4515875 docs: add op api options 2022-03-01 16:43:53 -08:00
Ashlee M Boyer c3691a44df
docs: Fixing path for autoscaling/agent/source nav item (#12166) 2022-03-01 17:24:12 -05:00
Luiz Aoqui 01931587ba
api: paginated results with different ordering (#12128)
The paginator logic was built when go-memdb iterators would return items
ordered lexicographically by their ID prefixes, but #12054 added the
option for some tables to return results ordered by their `CreateIndex`
instead, which invalidated the previous paginator assumption.

The iterator used for pagination must still return results in some order
so that the paginator can properly handle requests where the next_token
value is not present in the results anymore (e.g., the eval was GC'ed).

In these situations, the paginator will start the returned page in the
first element right after where the requested token should've been.

This commit moves the logic to generate pagination tokens from the
elements being paginated to the iterator itself so that callers can have
more control over the token format to make sure they are properly
ordered and stable.

It also allows configuring the paginator as being ordered in ascending
or descending order, which is relevant when looking for a token that may
not be present anymore.
2022-03-01 15:36:49 -05:00
Tim Gross f65c804544
csi: subcommand for volume snapshot (#12152) 2022-03-01 13:30:30 -05:00
Tim Gross f4dfaec589
CSI: set plugin socket path on restore (#12149)
The Prestart hook for task runner hooks doesn't get called when we
restore a task, because the task is already running. The Postrun hook
for CSI plugin supervisors needs the socket path to have been
populated so that the client has a valid path.
2022-03-01 10:22:52 -05:00
Tim Gross f2a4ad0949
CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Tim Gross c90e674918
CSI: use HTTP headers for passing CSI secrets (#12144) 2022-03-01 08:47:01 -05:00
Tim Gross a499401b34
csi: fix redaction of volume status mount flags (#12150)
The `volume status` command and associated API redacts the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
2022-03-01 08:34:03 -05:00
Tim Gross 99d03cdc6c
CSI: sort capabilities in plugin status (#12154)
Also fix `LIST_SNAPSHOTS` capability name
2022-03-01 07:59:31 -05:00
Tim Gross ca06f6153a
docs: clarify that plugin commands are for CSI only (#12151) 2022-03-01 07:57:41 -05:00
Tim Gross 02ae95ab22
csi: respect -verbose flag for allocs in volume status (#12153) 2022-03-01 07:57:29 -05:00
Kevin Wang 166011237b
fix(website): hide version select on /plugins & /tools (#12145)
* fix(website/plugins): display version select

* fix: hide version select on `/tools` + `/plugins`
2022-02-28 12:44:08 -05:00
Tim Gross 77fac26d5e
CI: increase test run timeout (#12143) 2022-02-28 11:30:59 -05:00
Seth Hoenig 5cf57e429a
Merge pull request #12137 from hashicorp/rpc-advertise-docs
docs: clairfy advertise.rpc effect
2022-02-28 08:15:28 -06:00
Michael Schurter cbf6ba843d
cli: fix op api typos
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-02-25 16:31:56 -08:00
Michael Schurter 4550c5fb80 cli: only return 1 on errors from op api
We don't want people to expect stable error codes for errors, and I
don't think these were useful for scripts anyway.
2022-02-25 16:23:31 -08:00
Michael Schurter aeff156177 docs: fix nav for op api 2022-02-25 16:21:14 -08:00
Seth Hoenig 5269b2e02f docs: clairfy advertise.rpc effect
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.

The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.

Closes https://github.com/hashicorp/nomad/issues/11075
2022-02-25 16:02:29 -06:00
Jai 817e66f930
Merge pull request #12134 from hashicorp/b-ui/target-link
ui:  external links open in new tabs
2022-02-25 10:29:04 -05:00
Seth Hoenig 34d46cd4c4
Merge pull request #12130 from hashicorp/flakey-serf-non-voter
tests: deflake test that joins a server with non-voting servers to form quorum
2022-02-25 09:12:42 -06:00
Jai Bhagat 8958d48ca9 ui: external links open in new tabs 2022-02-25 09:24:37 -05:00
Michael Schurter f6342d1d45 docs: add changelog for #10808 2022-02-24 17:13:42 -08:00
Michael Schurter a42d832f98 cli: add tests and minor fixes for op api
Trimmed spaces around header values.

Fixed method getting forced to GET.
2022-02-24 17:06:07 -08:00
Michael Schurter 238a732098 cli: add filter support 2022-02-24 15:52:54 -08:00
Michael Schurter bb3daac628 rename nomad curl to nomad operator api 2022-02-24 15:52:54 -08:00
Michael Schurter 141db0c562 cli: add curl command
Just a hackweek project at this point.
2022-02-24 15:52:54 -08:00
Seth Hoenig 1274aa690f tests: deflake test that joins a server with non-voting servers to form qourum
This PR
 - upgrades the serf library
 - has the test start the join process using the un-joined server first
 - disables schedulers on the servers
 - uses the WaitForLeader and wantPeers helpers

Not sure which, if any of these actually improves the flakiness of this test.
2022-02-24 17:02:58 -06:00
Zachary Shilton 81521ca248
chore: bump docs-page for code-block fix (#12117)
* chore: bump to latest docs-page

* fix: bump to react-consent-manager patch

* chore: bump to consent-manager with events dep

* chore: bump to stable consent-manager release
2022-02-24 15:34:54 -05:00
Tim Gross 31ee2a3c67
CSI: ensure all fields are mapped from structs to api response (#12124)
In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.
2022-02-24 14:17:15 -05:00
Tim Gross 13ea2c7fb3
CSI: display plugin capabilities in verbose status (#12116)
The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.

Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.
2022-02-24 13:51:38 -05:00
Luiz Aoqui 61d79e75b0
docs: add docs for the autoscaler on_error and on_check_error configuration (#12083) 2022-02-24 12:12:29 -05:00
James Rasell bc6056cbbe
Merge pull request #12122 from hashicorp/b-api-remove-namespace-test-ent-tag
api: remove ent build tag on namespace test file.
2022-02-24 17:13:15 +01:00
James Rasell 8f175d44da
api: remove ent build tag on namespace test file. 2022-02-24 16:40:04 +01:00
Tim Gross 22cf24a6bd
CSI: retry claims from client when max claims are reached (#12113)
When the alloc runner claims a volume, an allocation for a previous
version of the job may still have the volume claimed because it's
still shutting down. In this case we'll receive an error from the
server. Retry this error until we succeed or until a very long timeout
expires, to give operators a chance to recover broken plugins.

Make the alloc runner hook tolerant of temporary RPC failures.
2022-02-24 10:39:07 -05:00
Tim Gross cfe3117af8
CSI: enforce usage at claim time (#12112)
* Remove redundant schedulable check in `FreeWriteClaims`. If a volume
  has been created but not yet claimed, its capabilities will be checked
  in `WriteSchedulable` at both scheduling time and claim time. We don't
  need to also check them in the `FreeWriteClaims` method.

* Enforce maximum volume claims for writers.

  When the scheduler checks feasibility for CSI volumes, the check is
  fairly loose: earlier versions of the same job are not counted as
  active claims. This allows the scheduler to place new allocations
  for the new version of a job, under the assumption that we'll replace
  the existing allocations and their volume claims.

  But when the alloc runner claims the volume, we need to enforce the
  active claims even if they're for allocations of an earlier version of
  the job. Otherwise we'll try to mount a volume that's currently being
  unmounted, and this will cause replacement allocations to frequently
  fail.

* Enforce single-node reader check for read-only volumes. When the
  alloc runner makes a claim for a read-only volume, we only check that
  the volume is potentially schedulable and not that it actually has
  free read claims.
2022-02-24 09:37:37 -05:00
Sander Mol 42b338308f
add go-sockaddr templating support to nomad consul address (#12084) 2022-02-24 09:34:54 -05:00
Florian Apolloner 3bced8f558
namespaces: allow enabling/disabling allowed drivers per namespace 2022-02-24 09:27:32 -05:00
Seth Hoenig 57b9c64b8f
Merge pull request #12107 from hashicorp/use-bbolt
core: swap bolt impl and enable configuring raft freelist sync behavior
2022-02-24 08:25:54 -06:00