Commit graph

22450 commits

Author SHA1 Message Date
Jai Bhagat fd5d766c41 temp: csi refactor 2022-02-10 09:14:32 -05:00
Jai Bhagat 4f802449dc temp: namespace model refact 2022-02-10 09:11:42 -05:00
Tim Gross 6bd33d3fb9
CSI: use job status not alloc status for plugin updates from summary (#12027)
When an allocation is updated, the job summary for the associated job
is also updated. CSI uses the job summary to set the expected count
for controller and node plugins. We incorrectly used the allocation's
server status instead of the job status when deciding whether to
update or remove the job from the plugins. This caused a node drain or
other terminal state for an allocation to clear the expected count for
the entire plugin.

Use the job status to guide whether to update or remove the expected
count.

The existing CSI tests for the state store incorrectly modeled the
updates we received from servers vs those we received from clients,
leading to test assertions that passed when they should not.

Rework the tests to clarify each step in the lifecycle and rename CSI state
store functions for clarity.
2022-02-09 11:51:49 -05:00
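The rule described in this commit message (decide from the job status, not from an allocation's status, whether to keep or drop a job's expected count) can be illustrated with a minimal sketch. Everything here is hypothetical: the `plugin` type, `updateExpected`, and the use of the string `"dead"` for a stopped job are assumptions for illustration, not the state store code the commit changes.

```go
package main

import "fmt"

// plugin is a hypothetical stand-in for a CSI plugin record. It tracks
// the expected instance count contributed by each job, keyed by job ID.
type plugin struct {
	expected map[string]int
}

func newPlugin() *plugin {
	return &plugin{expected: map[string]int{}}
}

// updateExpected keeps or removes a job's expected count based only on
// the job status. The status of any single allocation is deliberately
// not an input, so a drained or otherwise terminal allocation cannot
// clear the count for the whole plugin.
// Assumption: "dead" marks a job that no longer runs the plugin.
func (p *plugin) updateExpected(jobID, jobStatus string, count int) {
	if jobStatus == "dead" {
		delete(p.expected, jobID)
		return
	}
	p.expected[jobID] = count
}

func main() {
	p := newPlugin()
	p.updateExpected("plugin-job", "running", 3)
	fmt.Println(p.expected)
	p.updateExpected("plugin-job", "dead", 3)
	fmt.Println(p.expected)
}
```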
Tim Gross 59c8558969
docs and changelog for nomad config validate (#12031) 2022-02-09 10:20:45 -05:00
Kevin Schoonover 1dcfff2f70
fingerprint: remove metadata from digitalocean (#12032) 2022-02-09 07:31:45 -05:00
Thomas Lefebvre 3b57f3af9d
Add config command and config validate subcommand to nomad CLI (#9198) 2022-02-08 16:52:35 -05:00
Tim Gross 21bd4835bd
fingerprint: digitalocean fingerprint test requires metadata header (#12028) 2022-02-08 16:35:13 -05:00
Seth Hoenig 5dad1cbb98
Merge pull request #12026 from hashicorp/f-update-aws
env: update aws cpu configs
2022-02-08 13:56:50 -06:00
Seth Hoenig 5cb365b36b env: update aws cpu configs
By running the tools/ec2info tool
2022-02-08 12:44:00 -06:00
Tim Gross d9d4da1e9f
scheduler: seed random shuffle nodes with eval ID (#12008)
Processing an evaluation is nearly a pure function over the state
snapshot, but we randomly shuffle the nodes. This means that
developers can't take a given state snapshot and pass an evaluation
through it and be guaranteed the same plan results.

But the evaluation ID is already random, so if we use this as the seed
for shuffling the nodes we can greatly reduce the sources of
non-determinism. Unfortunately golang map iteration uses a global
source of randomness and not a goroutine-local one, but arguably
if the scheduler behavior is impacted by this, that's a bug in the
iteration.
2022-02-08 12:16:33 -05:00
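The idea in this commit message, seeding the node shuffle from the evaluation ID, can be sketched as follows. This is only an illustration under stated assumptions: `seedFromID` and `shuffleNodes` are hypothetical names, nodes are plain strings, and hashing the ID with FNV-1a to get a seed is a choice made for the sketch, not necessarily how the scheduler derives it.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// seedFromID derives a deterministic seed from an ID string.
// Assumption: FNV-1a is used here only because it is in the standard
// library; any stable hash of the ID would serve the same purpose.
func seedFromID(id string) int64 {
	h := fnv.New64a()
	h.Write([]byte(id)) // Write on a hash.Hash never returns an error
	return int64(h.Sum64())
}

// shuffleNodes shuffles nodes in place using a private random source
// seeded from the evaluation ID, so the same ID and the same input
// order always produce the same output order.
func shuffleNodes(evalID string, nodes []string) {
	r := rand.New(rand.NewSource(seedFromID(evalID)))
	r.Shuffle(len(nodes), func(i, j int) {
		nodes[i], nodes[j] = nodes[j], nodes[i]
	})
}

func main() {
	a := []string{"node-1", "node-2", "node-3", "node-4"}
	b := []string{"node-1", "node-2", "node-3", "node-4"}
	shuffleNodes("example-eval-id", a)
	shuffleNodes("example-eval-id", b)
	fmt.Println(a)
	fmt.Println(b) // same order as a, because the seed is the same
}
```

Using a dedicated `rand.Rand` rather than the package-level functions keeps the shuffle independent of any other code that draws from the global source.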
Seth Hoenig aece0ddda8
Merge pull request #12024 from hashicorp/docs-update-cl
changelog: update changelog for DO
2022-02-08 10:29:09 -06:00
Seth Hoenig a06ae106f0
cl: fix DO name
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-02-08 10:28:57 -06:00
Seth Hoenig a21776a82d changelog: update changelog for DO 2022-02-08 08:43:49 -06:00
Seth Hoenig 451d9b0dd2
Merge pull request #12015 from kevinschoonover/main
client/fingerprint: add digitalocean fingerprinter
2022-02-08 08:41:03 -06:00
Dylan Staley fdf67e6bb5
Merge pull request #11936 from hashicorp/ds.ie11-warning
website: display warning in IE 11
2022-02-07 13:59:41 -08:00
Kevin Schoonover b13573d4ab address comments
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-02-07 09:03:48 -08:00
Tim Gross 464026c87b
scheduler: recover from panic (#12009)
If processing a specific evaluation causes the scheduler (and
therefore the entire server) to panic, that evaluation will never
get a chance to be nack'd and cleared from the state store. It will
get dequeued by another scheduler, causing that server to panic, and
so forth until all servers are in a panic loop. This prevents the
operator from intervening to remove the evaluation or update the
state.

Recover the goroutine from the top-level `Process` methods for each
scheduler so that this condition can be detected without panicking the
server process. This will lead to a loop of recovering the scheduler
goroutine until the eval can be removed or nack'd, but that's much
better than taking a downtime.
2022-02-07 11:47:53 -05:00
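The pattern described in this commit message, recovering at the top of a `Process`-style method so a panic becomes an error, can be sketched like this. The `Evaluation` type and the `process` wrapper are hypothetical stand-ins for illustration; the real scheduler types and signatures are not shown in this log.

```go
package main

import "fmt"

// Evaluation is a hypothetical, minimal stand-in for a scheduler
// evaluation.
type Evaluation struct {
	ID string
}

// process runs schedule for one evaluation. A deferred recover turns a
// panic inside schedule into an ordinary error, so the caller can log
// it or nack the evaluation instead of the whole process crashing.
func process(eval *Evaluation, schedule func(*Evaluation) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("processing eval %q panicked: %v", eval.ID, r)
		}
	}()
	return schedule(eval)
}

func main() {
	bad := func(*Evaluation) error { panic("boom") }
	good := func(*Evaluation) error { return nil }

	fmt.Println(process(&Evaluation{ID: "e1"}, bad))  // an error describing the panic
	fmt.Println(process(&Evaluation{ID: "e2"}, good)) // <nil>
}
```

The named return value `err` is what lets the deferred function replace the result after the panic has unwound the call to `schedule`.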
Kevin Schoonover 68eeaa7a18 small fixes 2022-02-05 22:23:43 -08:00
Kevin Schoonover 5523275e95 add digitalocean fingerprinter 2022-02-05 22:17:36 -08:00
Derek Strickland 7a63a249ca
reconciler: improve variable names and extract methods from inline logic (#12010)
* reconciler: improved variable names and extract methods from inline logic

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-02-05 04:54:19 -05:00
Luiz Aoqui 0e09b120e4
fix mTLS certificate check on agent to agent RPCs (#11998)
PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:

  1. client-only endpoints did not accept server certificates so the
     request would fail when forwarded from one server to another.
  2. the certificate was being checked after the request was forwarded,
     so the check would happen over the server certificate, not the
     actual source.

This commit checks for the desired mTLS level, where the client level
accepts either a server or a client certificate. It also validates the
certificate before the request is forwarded.
2022-02-04 20:35:20 -05:00
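The two fixes in this commit message (a client level that accepts either role, and a check that runs before forwarding) can be illustrated with a small sketch. All names here (`certRole`, `tlsLevel`, `allowed`, `handle`) are hypothetical and do not come from the Nomad code; how the role is read from a real certificate is out of scope.

```go
package main

import (
	"errors"
	"fmt"
)

// certRole is the role a presented certificate was issued for.
type certRole int

const (
	roleClient certRole = iota
	roleServer
)

// tlsLevel is the minimum role an endpoint requires.
type tlsLevel int

const (
	levelClient tlsLevel = iota // accepts client or server certificates
	levelServer                 // accepts server certificates only
)

// allowed reports whether a certificate with the presented role
// satisfies the required level.
func allowed(required tlsLevel, presented certRole) bool {
	switch required {
	case levelClient:
		return presented == roleClient || presented == roleServer
	case levelServer:
		return presented == roleServer
	}
	return false
}

// handle validates the caller's certificate first and only then
// forwards, so the check applies to the original source rather than
// to the forwarding server's own certificate.
func handle(required tlsLevel, presented certRole, forward func() error) error {
	if !allowed(required, presented) {
		return errors.New("certificate role not permitted for this endpoint")
	}
	return forward()
}

func main() {
	forward := func() error { return nil }
	fmt.Println(handle(levelClient, roleServer, forward)) // accepted
	fmt.Println(handle(levelServer, roleClient, forward)) // rejected
}
```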
ttys3 5faf344152
style: fix up very long tag word breaking the allocation service table width (#11995) 2022-02-04 19:40:03 -05:00
Karthick Ramachandran 0600bc32e2
improve error message on service length (#12012) 2022-02-04 19:39:34 -05:00
Dylan Staley e135369549 feat: display warning in IE 11 2022-02-04 14:25:52 -08:00
Seth Hoenig 420fd17459
Merge pull request #12002 from hashicorp/dependabot/go_modules/github.com/hashicorp/go-version-1.4.0
build(deps): bump github.com/hashicorp/go-version from 1.3.0 to 1.4.0
2022-02-04 08:31:53 -06:00
Seth Hoenig 77f4015426
Merge pull request #11937 from hashicorp/dependabot/go_modules/google.golang.org/grpc-1.44.0
build(deps): bump google.golang.org/grpc from 1.42.0 to 1.44.0
2022-02-04 08:29:58 -06:00
Luiz Aoqui c459c17579
add semgrep rule to check for potential time.After leaks (#12001) 2022-02-03 17:33:07 -05:00
dependabot[bot] 898107e311
build(deps): bump github.com/hashicorp/go-version from 1.3.0 to 1.4.0
Bumps [github.com/hashicorp/go-version](https://github.com/hashicorp/go-version) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/hashicorp/go-version/releases)
- [Changelog](https://github.com/hashicorp/go-version/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/go-version/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-version
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-03 21:41:10 +00:00
dependabot[bot] 685f011d07
build(deps): bump google.golang.org/grpc from 1.42.0 to 1.44.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.44.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.44.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-02-03 21:40:06 +00:00
Seth Hoenig 34cb21fecc
Merge pull request #11929 from hashicorp/dependabot/go_modules/github.com/mitchellh/copystructure-1.2.0
build(deps): bump github.com/mitchellh/copystructure from 1.1.1 to 1.2.0
2022-02-03 15:28:21 -06:00
Tim Gross 790e49b0dd
core: log CSI GC cutoff index only on non-forced GC (#11997)
Non-CSI garbage collection tasks on the server only log the cutoff
index in the case where it's not a forced GC from `nomad system gc`.
Do the same for CSI for consistency.
2022-02-03 15:03:39 -05:00
Tim Gross 7ad15b2b42
raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
2022-02-03 15:03:12 -05:00
Seth Hoenig 5f48e18189
Merge pull request #11983 from hashicorp/b-select-after
cleanup: prevent leaks from time.After
2022-02-03 09:38:06 -06:00
Glen Yu 8abc28fc24
added Int32ToPtr helper function (#11985) 2022-02-02 17:12:54 -05:00
ttys3 1ab3b4d3d8
correct task row memory unit (#11980) 2022-02-02 17:00:25 -05:00
René Moser 05db861938
api-docs: add SysBatchSchedulerEnabled docs (#11973) 2022-02-02 16:54:47 -05:00
Samantha 54f8c04c91
Fix health checking for ephemeral poststart tasks (#11945)
Update the logic in the Nomad client's alloc health tracker which
erroneously marks existing healthy allocations with dead poststart ephemeral
tasks as unhealthy even if they were already successful during a previous
deployment.
2022-02-02 16:29:49 -05:00
Seth Hoenig db2347a86c cleanup: prevent leaks from time.After
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.

Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Luiz Aoqui c4cff5359f
Verify TLS certificate on endpoints that are used between agents only (#11956) 2022-02-02 15:03:18 -05:00
Seth Hoenig f6217fe424
Merge pull request #11972 from hashicorp/b-disable-semgrep-structs
build: disable semgrep on structs.go for now
2022-02-02 12:57:47 -06:00
James Rasell ba735bc35f
Merge pull request #11976 from hashicorp/b-gh-11950-missed
e2e: moved missed volume test stop command to util helper.
2022-02-02 09:58:11 +01:00
James Rasell adc3c44e29
e2e: moved missed volume test stop command to util helper. 2022-02-02 08:42:58 +01:00
Tim Gross 0b1978736e
Merge pull request #11971 from hashicorp/merge-release-1.2.5-branch
prepare for next release
2022-02-01 11:16:38 -05:00
Tim Gross 7a0d151fab prepare for next release 2022-02-01 11:13:22 -05:00
Seth Hoenig 60ca29161f build: disable semgrep on structs.go for now 2022-02-01 10:09:49 -06:00
Tim Gross 95f26b307d
update download to Nomad v1.2.5 (#11969) 2022-02-01 11:04:06 -05:00
James Rasell fb7dbdf35d
Merge pull request #11968 from hashicorp/b-gh-11950
e2e: account for new job stop CLI exit behaviour.
2022-02-01 15:56:56 +01:00
Seth Hoenig 5f07ab5c80
Merge pull request #11966 from hashicorp/deps-no-special-vendor
deps: import libtime the normal way
2022-02-01 08:46:30 -06:00
James Rasell 0a50d9fd2a
e2e: account for new job stop CLI exit behaviour.
PR #11550 changed the job stop exit behaviour when monitoring the
deployment. When stopping a job, the deployment becomes cancelled
and therefore the CLI now exits with status code 1 as it sees this
as an error.

This change adds a new utility e2e function that accounts for this
behaviour.
2022-02-01 14:16:37 +01:00
Michael Schurter fd242ab7f8
Merge pull request #11878 from kainoaseto/fix/multi-task-group-canary-deploys
Bugfix: auto-promote canary taskgroups when mixed with non-canary taskgroups
2022-01-31 16:22:51 -08:00