Commit Graph

23936 Commits

Author SHA1 Message Date
Luiz Aoqui 1b87d292a3
client: retry RPC call when no server is available (#15140)
When a Nomad service starts it tries to establish a connection with
servers, but it also runs alloc runners to manage whatever allocations
it needs to run.

The alloc runner will invoke several hooks to perform actions, with some
of them requiring access to the Nomad servers, such as Native Service
Discovery Registration.

If the alloc runner starts before a connection is established, the alloc
runner will fail, causing the allocation to be shut down. This is
particularly problematic for disconnected allocations that are
reconnecting, as they may fail as soon as the client reconnects.

This commit changes the RPC request logic to retry it, using the
existing retry mechanism, if there are no servers available.
2022-11-04 14:09:39 -04:00
Charlie Voiselle 79c4478f5b
template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Seth Hoenig d7aa37a5c9
e2e: explicitly wait on task status in chroot download exec test (#15145)
Also add some debug log lines for this test, because it doesn't make sense
for the allocation to be complete yet a task in the allocation to be not
started yet, which is what the test failures are implying.
2022-11-04 09:50:11 -05:00
Charlie Voiselle 83e43e01c1
Add missing timer reset (#15134) 2022-11-03 18:57:57 -04:00
Ethan 654ae1d591
fix: batchFirstFingerprints does not update device on node after v1.3.5 (#15125)
* fix: update device in batch first fingerprint

* cl: add cl note

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-11-03 16:31:39 -05:00
Phil Renaud 147df77e62
Ember patched for security release (#15126) 2022-11-03 16:29:23 -04:00
Tim Gross 672fb46d12
WI: set identity to client secret if missing (#15121)
Allocations created before 1.4.0 will not have a workload identity token. When
the client running these allocs is upgraded to 1.4.x, the identity hook will run
and replace the node secret ID token used previously with an empty string. This
causes service discovery queries to fail.

Fall back to the node's secret ID when the allocation doesn't have a signed
identity. Note that pre-1.4.0 allocations won't have templates that read
Variables, so there's no threat that this new node ID secret will be able to
read data that the allocation shouldn't have access to.
2022-11-03 11:10:11 -04:00
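The fallback described in #15121 above amounts to a simple selection rule. The sketch below assumes both values are plain strings; `tokenFor` and its parameter names are hypothetical and are not taken from the Nomad identity hook.

```go
package main

import "fmt"

// tokenFor returns the allocation's signed workload identity when one
// exists, and otherwise falls back to the node's secret ID.
// Both parameter names are hypothetical.
func tokenFor(signedIdentity, nodeSecretID string) string {
	if signedIdentity != "" {
		return signedIdentity
	}
	return nodeSecretID
}

func main() {
	// An allocation created before 1.4.0 has no signed identity.
	fmt.Println(tokenFor("", "node-secret"))
	// A newer allocation keeps using its own identity.
	fmt.Println(tokenFor("signed-jwt", "node-secret"))
}
```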
Phil Renaud ab5bfa8149
Accidentally trailed off on a docs paragraph (#15118) 2022-11-02 23:33:41 -04:00
Phil Renaud ffb4c63af7
[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and omitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
Phil Renaud 6d5fe56fa1
Job spec upload (#14747)
* Job spec upload by click or drag

* pseudo-restrict formats

* Changelog

* Tweak to job spec upload to be above editor layer

* Within the job-editor again tho

* Beginning testcase cleanup

* Test progression

* refact: update codemirror fillin logic

Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>
2022-11-02 10:34:10 -04:00
Seth Hoenig a0bdc67d6a
build: update to go1.19.3 (#15099) 2022-11-01 15:54:49 -05:00
Tim Gross 4d7a4171cd
volumewatcher: prevent panic on nil volume (#15101)
If a GC claim is written and then the volume is deleted before the `volumewatcher`
enters its run loop, we panic on the nil-pointer access. Simply doing a
nil-check at the top of the loop reveals a race condition around shutting down
the loop just as a new update is coming in.

Have the parent `volumeswatcher` send an initial update on the channel before
returning, so that we're still holding the lock. Update the watcher's `Stop`
method to set the running state, which lets us avoid having a second context and
makes stopping synchronous. This reduces the cases we have to handle in the run
loop.

Updated the tests now that we'll safely return from the goroutine and stop the
runner in a larger set of cases. Ran the tests with the `-race` detection flag
and fixed up any problems found here as well.
2022-11-01 16:53:10 -04:00
Tim Gross 38542f256e
variables: limit rekey eval to half the nack timeout (#15102)
In order to limit how much the rekey job can monopolize a scheduler worker, we
limit how long it can run to 1min before stopping work and emitting a new
eval. But this exactly matches the default nack timeout, so it'll fail the eval
rather than getting a chance to emit a new one.

Set the timeout for the rekey eval to half the configured nack timeout.
2022-11-01 16:50:50 -04:00
Tim Gross 903b5baaa4
keyring: safely handle missing keys and restore GC (#15092)
When replication of a single key fails, the replication loop breaks early and
therefore keys that fall later in the sorting order will never get
replicated. This is particularly a problem for clusters impacted by the bug that
caused #14981 and that were later upgraded; the keys that were never replicated
can now never be replicated, and so we need to handle them safely.

Included in the replication fix:
* Refactor the replication loop so that each key is replicated in a function call
  that returns an error, to make the workflow more clear and reduce nesting. Log
  the error and continue.
* Improve stability of keyring replication tests. We no longer block leadership
  on initializing the keyring, so there's a race condition in the keyring tests
  where we can test for the existence of the root key before the keyring has
  been initialized. Change this to an "eventually" test.

But these fixes aren't enough to fix #14981 because they'll end up seeing an
error once a second complaining about the missing key, so we also need to fix
keyring GC so the keys can be removed from the state store. Now we'll store the
key ID used to sign a workload identity in the Allocation, and we'll index the
Allocation table on that so we can track whether any live Allocation was signed
with a particular key ID.
2022-11-01 15:00:50 -04:00
dependabot[bot] acc94d523f
build(deps): bump github.com/docker/cli from 20.10.18+incompatible to 20.10.21+incompatible (#15078)
* build(deps): bump github.com/docker/cli

Bumps [github.com/docker/cli](https://github.com/docker/cli) from 20.10.18+incompatible to 20.10.21+incompatible.
- [Release notes](https://github.com/docker/cli/releases)
- [Commits](https://github.com/docker/cli/compare/v20.10.18...v20.10.21)

---
updated-dependencies:
- dependency-name: github.com/docker/cli
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: updated github.com/docker/cli from 20.10.18+incompatible to 20.10.21+incompatible

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-31 08:50:33 -05:00
dependabot[bot] c0fc174e0d
build(deps): bump github.com/hashicorp/serf from 0.10.0 to 0.10.1 (#15077)
Bumps [github.com/hashicorp/serf](https://github.com/hashicorp/serf) from 0.10.0 to 0.10.1.
- [Release notes](https://github.com/hashicorp/serf/releases)
- [Changelog](https://github.com/hashicorp/serf/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/serf/compare/v0.10.0...v0.10.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/serf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-31 08:48:34 -05:00
dependabot[bot] 369e4da4ad
build(deps): bump github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126 (#15081)
* build(deps): bump github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.44.84 to 1.44.126.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.44.84...v1.44.126)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: update github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-31 08:47:48 -05:00
dependabot[bot] 0ae4c4a241
build(deps): bump github.com/stretchr/testify in /api (#15082)
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.8.0...v1.8.1)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-31 08:45:04 -05:00
dependabot[bot] fc9a731813
build(deps): bump github.com/hashicorp/memberlist from 0.4.0 to 0.5.0 (#15080)
Bumps [github.com/hashicorp/memberlist](https://github.com/hashicorp/memberlist) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/hashicorp/memberlist/releases)
- [Commits](https://github.com/hashicorp/memberlist/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/memberlist
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-31 08:44:27 -05:00
dependabot[bot] 6d4fef46fe
build(deps): bump go.uber.org/goleak from 1.1.12 to 1.2.0 (#15079)
Bumps [go.uber.org/goleak](https://github.com/uber-go/goleak) from 1.1.12 to 1.2.0.
- [Release notes](https://github.com/uber-go/goleak/releases)
- [Changelog](https://github.com/uber-go/goleak/blob/master/CHANGELOG.md)
- [Commits](https://github.com/uber-go/goleak/compare/v1.1.12...v1.2.0)

---
updated-dependencies:
- dependency-name: go.uber.org/goleak
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-31 08:43:29 -05:00
Tim Gross dab9388c75
refactor eval delete safety check (#15070)
The `Eval.Delete` endpoint has a helper that takes a list of jobs and allocs and
determines whether the eval associated with those is safe to delete (based on
their state). Filtering improvements to the `Eval.Delete` endpoint are going to
need this check to run in the state store itself for consistency.

Refactor to push this check down into the state store to keep the eventual diff
for that work reasonable.
2022-10-28 09:10:33 -04:00
Seth Hoenig ec915b2436
build: update linters (#15063)
Remove dead linters and add some interesting new ones.
2022-10-27 15:02:30 -05:00
Tim Gross 9c37a234e7
test: refactor EvalEndpoint_Delete (#15065)
While working on filtering improvements to the `Eval.Delete` endpoint I noticed
that this test was going to need to expand significantly and needed some
refactoring to make that work nicely. In order to reduce the size of the
eventual diff, I've pulled this refactoring out into its own changeset.
2022-10-27 15:29:22 -04:00
Charlie Voiselle d57e333534
Update architecture-state-store.md (#15049) 2022-10-27 14:03:43 -04:00
Tim Gross 8ac41c167f
Merge pull request #15062 from hashicorp/post-1.4.2-release
Post 1.4.2 release
2022-10-27 13:38:36 -04:00
Tim Gross 2ce1728fa6 Merge release 1.4.2 files
Changelog updates for 1.4.2 and backports.
2022-10-27 13:31:29 -04:00
hc-github-team-nomad-core 38b1c8a22a Prepare for next release 2022-10-27 13:08:05 -04:00
hc-github-team-nomad-core fbef8881cd Generate files for 1.4.2 release 2022-10-27 13:08:05 -04:00
Tim Gross 9d906d4632 variables: fix filter on List RPC
The List RPC correctly authorized against the prefix argument. But when
filtering results underneath the prefix, it only checked authorization for
standard ACL tokens and not Workload Identity. This results in WI tokens being
able to read List results (metadata only: variable paths and timestamps) for
variables under the `nomad/` prefix that belong to other jobs in the same
namespace.

Fix the filtering and split the `handleMixedAuthEndpoint` function into
separate authentication and authorization steps so that we don't need to
re-verify the claim token on each filtered object.

Also includes:
* update semgrep rule for mixed auth endpoints
* variables: List returns empty set when all results are filtered
2022-10-27 13:08:05 -04:00
James Rasell da5069bded event stream: ensure token expiry is correctly checked for subs.
This change ensures that a token's expiry is checked before every
event is sent to the caller. Previously, a token could still be
used to listen for events after it had expired, as long as the
subscription was made while it was unexpired. This would last until
the token was garbage collected from state.

The check occurs within the RPC as there is currently no state
update when a token expires.
2022-10-27 13:08:05 -04:00
dependabot[bot] 81ac5d93f1
build(deps): bump github.com/kr/pretty from 0.3.0 to 0.3.1 in /api (#14859)
* build(deps): bump github.com/kr/pretty from 0.3.0 to 0.3.1 in /api

Bumps [github.com/kr/pretty](https://github.com/kr/pretty) from 0.3.0 to 0.3.1.
- [Release notes](https://github.com/kr/pretty/releases)
- [Commits](https://github.com/kr/pretty/compare/v0.3.0...v0.3.1)

---
updated-dependencies:
- dependency-name: github.com/kr/pretty
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: update in root as well

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-27 11:58:00 -05:00
dependabot[bot] 7324d90ba7
build(deps): bump github.com/ryanuber/columnize (#14858)
Bumps [github.com/ryanuber/columnize](https://github.com/ryanuber/columnize) from 2.1.1-0.20170703205827-abc90934186a+incompatible to 2.1.2+incompatible.
- [Release notes](https://github.com/ryanuber/columnize/releases)
- [Commits](https://github.com/ryanuber/columnize/commits/v2.1.2)

---
updated-dependencies:
- dependency-name: github.com/ryanuber/columnize
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-27 11:34:27 -05:00
dependabot[bot] 2f86f92d87
build(deps): bump github.com/shirou/gopsutil/v3 from 3.22.8 to 3.22.9 (#14857)
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.22.8 to 3.22.9.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.22.8...v3.22.9)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-27 11:33:50 -05:00
dependabot[bot] 07796965b1
build(deps): bump google.golang.org/grpc from 1.48.0 to 1.50.1 (#14897)
* build(deps): bump google.golang.org/grpc from 1.48.0 to 1.50.1

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.48.0 to 1.50.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.48.0...v1.50.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: add changelog entry for grpc

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-27 11:32:48 -05:00
dependabot[bot] eb210f2af7
build(deps): bump github.com/fsouza/go-dockerclient from 1.8.2 to 1.9.0 (#14898)
* build(deps): bump github.com/fsouza/go-dockerclient from 1.8.2 to 1.9.0

Bumps [github.com/fsouza/go-dockerclient](https://github.com/fsouza/go-dockerclient) from 1.8.2 to 1.9.0.
- [Release notes](https://github.com/fsouza/go-dockerclient/releases)
- [Changelog](https://github.com/fsouza/go-dockerclient/blob/main/container_changes_test.go)
- [Commits](https://github.com/fsouza/go-dockerclient/compare/v1.8.2...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/fsouza/go-dockerclient
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: add changelog entry

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-27 11:05:45 -05:00
Seth Hoenig 4f3a1e6f7d
ci: use groups of tests in gha (#15018)
* [no ci] use json for grouping packages for testing

* [no ci] able to get packages in group

* [no ci] able to run groups of tests

* [no ci] more

* [no ci] try disable circle unit tests

* ci: use actions/checkout@v3

* ci: rename to quick

* ci: need make dev in mods cache step

* ci: make compile step depend on checks step

* ci: bump consul and vault versions

* ci: need make dev for group tests

* ci: update ci unit testing docs

* docs: spell plumbing correctly

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-10-27 09:02:58 -05:00
Charlie Voiselle 28cd831085
Update consul-template dep (#15045) 2022-10-26 11:51:45 -04:00
Tim Gross f29c781fa7
docs: improved documentation on hardening and required capabilities (#15036)
The existing docs on required capabilities are a little sparse and have been the
subject of a lot of questions. Expand on this information and provide a pointer
to the ongoing design discussion around rootless Nomad.
2022-10-26 09:46:13 -04:00
Tim Gross aca95c0bc6
keyring: remove root key GC (#15034) 2022-10-25 17:06:18 -04:00
Seth Hoenig d69556fb35
client: ensure minimal cgroup controllers enabled (#15027)
* client: ensure minimal cgroup controllers enabled

This PR fixes a bug where Nomad could not operate properly on operating
systems that set the root cgroup.subtree_control to a set of controllers that
do not include the minimal set of controllers needed by Nomad.

Nomad needs these controllers enabled to operate:
- cpuset
- cpu
- io
- memory
- pids

Now, Nomad will ensure these controllers are enabled during Client initialization,
adding them to cgroup.subtree_control as necessary. This should be particularly
helpful on the RHEL/CentOS/Fedora family of systems. Ubuntu systems should be
unaffected as they enable all controllers by default.

Fixes: https://github.com/hashicorp/nomad/issues/14494

* docs: cleanup doc string

* client: cleanup controller writes, enhance log messages
2022-10-24 16:08:54 -05:00
Tim Gross c45d9a9ea8
keyring: refactor to hold locks for less time (#15026)
Follow-up from https://github.com/hashicorp/nomad/pull/14987/files#r1003611644

We don't need to hold the lock when querying the state store, so move the
read-lock to the interior of the `activeKeySet` function.
2022-10-24 16:23:44 -04:00
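The locking change in #15026 above follows a common pattern: do the lookup that needs no lock first, then take the read lock only around the shared map access. The `keyring` type and `lookupActiveID` callback below are hypothetical simplifications, not the Nomad types.

```go
package main

import (
	"fmt"
	"sync"
)

// keyring is a hypothetical stand-in holding decrypted keys by ID.
type keyring struct {
	lock sync.RWMutex
	keys map[string][]byte
}

// activeKey resolves the active key ID without holding the lock (standing in
// for a state store query), then takes the read lock only for the map read.
func (k *keyring) activeKey(lookupActiveID func() string) ([]byte, bool) {
	id := lookupActiveID() // no lock held during the state store query

	k.lock.RLock()
	defer k.lock.RUnlock()
	key, ok := k.keys[id]
	return key, ok
}

func main() {
	k := &keyring{keys: map[string][]byte{"key-1": []byte("material")}}
	key, ok := k.activeKey(func() string { return "key-1" })
	fmt.Println(string(key), ok) // prints "material true"
}
```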
Zach Shilton 4dd0bd916b
docs: add details to redirects file (#15020) 2022-10-24 13:16:07 -04:00
Seth Hoenig 32744a3548
deps: update hashicorp/raft to v1.3.11 (#15021)
* deps: update hashicorp/raft to v1.3.11

Includes part of the fix for https://github.com/hashicorp/raft/issues/524

* cl: add changelog entry
2022-10-24 12:10:24 -05:00
Seth Hoenig dd2999d6af
ci: add -core suffix to mods action (#15015)
Forgot to add this line to the new mods action; without it, it
creates a cache different from the one used by the other jobs.
2022-10-24 08:49:01 -05:00
Jai f4138a88e0
refact: preserve promise.then behavior for acceptance tests (#15003) 2022-10-24 09:04:39 -04:00
Tim Gross b9922631bd
keyring: fix missing GC config, don't rotate on manual GC (#15009)
The configuration knobs for root keyring garbage collection are present in the
consumer and present in the user-facing config, but we missed the spot where we
copy from one to the other. Fix this so that users can set their own thresholds.

The root key is automatically rotated every ~30d, but the function that does
both rotation and key GC was wired up such that `nomad system gc` caused an
unexpected key rotation. Split this into two functions so that `nomad system gc`
cleans up old keys without forcing a rotation, which will be done periodically
or by the `nomad operator root keyring rotate` command.
2022-10-24 08:43:42 -04:00
Seth Hoenig 91d29e6449
ci: use the same go mod cache across test-core jobs (#15006)
* ci: use the same go mod cache for test-core jobs

* ci: precache go modules

* ci: add a mods precache job
2022-10-21 17:38:45 -05:00
Tim Gross 3a811ac5e7
keyring: fixes for keyring replication on cluster join (#14987)
* keyring: don't unblock early if rate limit burst exceeded

The rate limiter returns an error and unblocks early if its burst limit is
exceeded (unless the burst limit is Inf). Ensure we're not unblocking early,
otherwise we'll only slow down the cases where we're already pausing to make
external RPC requests.

* keyring: set MinQueryIndex on stale queries

When keyring replication makes a stale query to non-leader peers to find a key
the leader doesn't have, we need to make sure the peer we're querying has had a
chance to catch up to the most current index for that key. Otherwise it's
possible for newly-added servers to query another newly-added server and get a
non-error nil response for that key ID.

Ensure that we're setting the correct reply index in the blocking query.

Note that the "not found" case does not return an error, just an empty key. So
as a belt-and-suspenders, update the handling of empty responses so that we
don't break the loop early if we hit a server that doesn't have the key.

* test for adding new servers to keyring

* leader: initialize keyring after we have consistent reads

Wait until we're sure the FSM is current before we try to initialize the
keyring.

Also, if a key is rotated immediately following a leader election, plans that
are in-flight may get signed before the new leader has the key. Allow for a
short timeout-and-retry to avoid rejecting plans.
2022-10-21 12:33:16 -04:00
Michael Schurter 9cac60dbed
test: use port collision instead of cpu exhaustion (#14994)
Originally this test relied on Job 1 blocking Job 2 until Job 1 had a
terminal *ClientStatus.* Job 2 ensured it would get blocked using 2
mechanisms:

1. A constraint requiring it is placed on the same node as Job 1.
2. Job 2 would require all unreserved CPU on the node to ensure it would
   be blocked until Job 1's resources were free.

That 2nd assertion breaks if *any previous job is still running on the
target node!* That seems very likely to happen in the flaky world of our
e2e tests. In fact there may be some jobs we intentionally want running
throughout; in hindsight it was never safe to assume my test would be
the only thing scheduled when it ran.

*Ports to the rescue!* Reserving a static port means that Job 2
will now block on Job 1 being terminal. It will only conflict with other
tests if those tests use that port *on every node.* I ensured no
existing tests were using the port I chose.

Other changes:
- Gave job a bit more breathing room resource-wise.
- Tightened timings a bit since previous failure ran into the `go test`
  time limit.
- Cleaned up the DumpEvals output. It's quite nice and handy now!
2022-10-21 07:53:26 -07:00
Luiz Aoqui 8b8d85bce7
docs: use of `node_class` when autoscaling (#14950)
Document how the value of `node_class` is used during cluster scaling.

https://github.com/hashicorp/nomad-autoscaler/issues/255
2022-10-21 10:35:45 -04:00