Commit Graph

22674 Commits

Author SHA1 Message Date
Tim Gross 5c91bc877c
csi: set gRPC authority header for unix domain socket (#12359)
The go-grpc library used by most CSI plugins doesn't require the
authority header to be set, which violates the HTTP2 spec but doesn't
impact Nomad because both sides of the connection are using the same
library. But plugins written in other languages (`democratic-csi` for
example) may have more strictly conforming gRPC server libraries and
we need to set the authority header manually.
2022-03-23 12:01:08 -04:00
Tim Gross 1743648901
CSI: fix timestamp from volume snapshot responses (#12352)
Listing snapshots was incorrectly returning nanoseconds instead of
seconds, and formatting of timestamps both list and create snapshot
was treating the timestamp as though it were nanoseconds instead of
seconds. This resulted in create timestamps always being displayed as
zero values.

Fix the unit conversion error in the command line and the incorrect
extraction in the CSI plugin client code. Beef up the unit tests to
make sure this code is actually exercised.
2022-03-23 10:39:28 -04:00
Tim Gross b7075f04fd
CSI: enforce single access mode at validation time (#12337)
A volume that has single-use access mode is feasibility checked during
scheduling to ensure that only a single reader or writer claim
exists. However, because feasibility checking is done one alloc at a
time before the plan is written, a job that's misconfigured to have
count > 1 that mounts one of these volumes will pass feasibility
checking.

Enforce the check at validation time instead to prevent us from even
trying to evaluation a job that's misconfigured this way.
2022-03-23 09:21:26 -04:00
Tim Gross 33558cb51e
csi: fix handling of garbage collected node in node unpublish (#12350)
When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
2022-03-22 15:40:24 -04:00
Luiz Aoqui f8973d364e
core: use the new Raft API when removing peers (#12340)
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.

Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.

This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
2022-03-22 15:07:31 -04:00
Luiz Aoqui b5a42cd55d
set raft v3 as the default config (#12341) 2022-03-22 15:06:25 -04:00
Tim Gross 60cfeacd76
drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Jonathan Tey 4635be07ab
demo: add missing file for Kadalu CSI demo (#12336) 2022-03-22 09:50:50 -04:00
Tim Gross 2a2ebd0537
CSI: presentation improvements (#12325)
* Fix plugin capability sorting.
  The `sort.StringSlice` method in the stdlib doesn't actually sort, but
  instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad volume status :id`, just like `nomad job status :id`.
  This is implemented in the HTTP response and not in the state store
  because the state store maintains two separate lists of allocs that
  are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
2022-03-22 09:48:38 -04:00
James Rasell f1e4db70d2
Merge pull request #12332 from hashicorp/b-node-fixup-drainupdate-msg-spelling
core: fixup node drain update message spelling.
2022-03-22 10:44:03 +01:00
Tim Gross e687a21da9
CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00
Tim Gross bd403f2f88
E2E: ensure `ConnectACLsE2ETest` has clean state before starting (#12334)
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
2022-03-21 11:05:02 -04:00
James Rasell 68cd3d89fe
core: fixup node drain update message spelling. 2022-03-21 13:37:08 +01:00
Seth Hoenig 3303a4534a
Merge pull request #12322 from hashicorp/ci-gha
ci: turn on testing in github actions
2022-03-18 12:58:09 -05:00
Seth Hoenig 8eea6e3aa3 ci: scope to push, ignore more dirs, update go update script 2022-03-18 12:47:38 -05:00
Seth Hoenig 57bd480062 ci: turn on testing in github actions 2022-03-18 11:12:24 -05:00
Seth Hoenig f2914ea36c
Merge pull request #12321 from hashicorp/ci-less-logging
ci: limit gotestsum to circle ci
2022-03-18 10:02:13 -05:00
Seth Hoenig 4d86f5d94d ci: limit gotestsum to circle ci
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255

This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.

The reason for these is we'll want to disable logging in GHA,
where spamming the disk with logs really drags performance.
2022-03-18 09:15:01 -05:00
Tim Gross 1561f66d99
api: fix ENT-only test imports for moved testutil package (#12320)
The `api/testutil` package was moved to `api/internal/testutil` but
this wasn't caught in the ENT tests because they're not run here in
the OSS repo.
2022-03-18 10:12:28 -04:00
Tim Gross 9f05d62338
E2E with HCP Consul/Vault (#12267)
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:

* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.

tl;dr way less janky shell scripting!
2022-03-18 09:27:28 -04:00
Seth Hoenig ab9a639a0a
Merge pull request #12313 from hashicorp/purge-parallel-2
ci: more parallel removal
2022-03-17 13:48:37 -05:00
Seth Hoenig b73d911f05 ci: do not exclude Parallel semgrep rule 2022-03-17 13:45:56 -05:00
Luiz Aoqui 68e5b58007
cli: display Raft version in `server members` (#12317)
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
2022-03-17 14:15:10 -04:00
Luiz Aoqui 15089f055f
api: add related evals to eval details (#12305)
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
2022-03-17 13:56:14 -04:00
Luiz Aoqui 8db12c2a17
server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Seth Hoenig 373d8f7241 ci: missing import for nomad09upgrade 2022-03-17 08:49:15 -05:00
Seth Hoenig 58b3d1711b ci: semgrep rule for parallel tests
Adds a semgrep rule warning about using ci.Parallel instead of t.Parallel
2022-03-17 08:43:37 -05:00
Seth Hoenig f87eb666c7 e2e: have e2e use ci.Parallel
This is a followup to having tests run in serial in CI.

The e2e package isn't in CI, but lets use the helper anyway
so we can setup semgrep rules covering the entire repository.
2022-03-17 08:37:34 -05:00
Seth Hoenig 3943dd1e16 ci: use serial testing for api in CI
This is a followup to running tests in serial in CI.
Since the API package cannot import anything outside of api/,
copy the ci.Parallel function into api/internal/testutil, and
have api tests use that.
2022-03-17 08:35:01 -05:00
James Rasell 91e4d20b1d
Merge pull request #12307 from hashicorp/b-groupservices-avoid-double-tg-lookup
client: avoid double group lookup within groupservice hook setup.
2022-03-16 17:00:01 +01:00
Luiz Aoqui 83d834d84c
tests: move state store namespace tests from ENT (#12308) 2022-03-16 11:56:11 -04:00
Seth Hoenig aca50349f4
Merge pull request #12299 from hashicorp/ci-parallel
ci: trade test parallelization for unconstrained gomaxprocs
2022-03-16 08:55:39 -05:00
Seth Hoenig 2b83614a26 ci: explain why ci runs tests in serial now 2022-03-16 08:38:42 -05:00
James Rasell e7d0dfbc8b
client: avoid double group lookup within groupservice hook setup. 2022-03-16 09:42:57 +01:00
Seth Hoenig 2631659551 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Luiz Aoqui 8cf599c7fc
fix alloc list test (#12297)
The alloc list test with pagination was creating allocs before the
target namespace existed. This works in OSS but fails in ENT because
quotas are checked before the alloc can be created, so the namespace
must exist beforehand.
2022-03-15 10:41:07 -04:00
Tim Gross 3bf948dc00
docs: clarify `restart` inheritance and add examples (#12275)
Clarify the behavior of `restart` inheritance with respect to Connect
sidecar tasks. Remove incorrect language about the scheduler being
involved in restart decisions. Try to make the `delay` mode
documentation more clear, and provide examples of delay vs fail.
2022-03-14 15:49:08 -04:00
Luiz Aoqui 9b393d0535
docs: initial docs for the new API features (#12094) 2022-03-14 10:58:42 -04:00
Lars Lehtonen 93cc3392ad
scheduler: fix unused dstate variable (#12268) 2022-03-14 10:00:59 -04:00
Luiz Aoqui 2876739a51
api: apply consistent behaviour of the reverse query parameter (#12244) 2022-03-11 19:44:52 -05:00
Luiz Aoqui a42e64c039
docs: add namespace param to job parse API (#12258) 2022-03-10 16:35:07 -05:00
Seth Hoenig 098767ce57
Merge pull request #12253 from hashicorp/hack-tiny-chroot
testing: use a smaller chroot when running exec driver tests
2022-03-09 17:12:13 -06:00
Seth Hoenig 6f0998fcad testing: use a smaller chroot when running exec driver tests
The default chroot copies all of /bin, /usr, etc. which can ammount
to gigabytes of stuff not actually needed for running our tests.

Use a smaller chroot in test cases so that CI infra with poor disk
IO has a chance.
2022-03-09 16:24:07 -06:00
Seth Hoenig abf46fa6aa
Merge pull request #12251 from hashicorp/docs-contributing-cgo
docs: describe the cgo dependency
2022-03-09 13:12:48 -06:00
Seth Hoenig ebae987480 docs: describe the cgo dependency 2022-03-09 12:46:57 -06:00
Tim Gross 066a820209
job summary query in `Job.List` RPC should use job's namespace (#12249)
The `Job.List` RPC attaches a `JobSummary` to each job stub. We're
using the request namespace and not the job namespace for that query,
which results in a nil `JobSummary` whenever we pass the wildcard
namespace. This is incorrect and causes panics in downstream consumers
like the CLI, which assume the `JobSummary` is non-nil as an unstate
invariant.
2022-03-09 10:47:19 -05:00
Luiz Aoqui 550c5ab6ec
fix TestCSIVolumeEndpoint_List_PaginationFiltering test (#12245) 2022-03-09 09:40:40 -05:00
Luiz Aoqui ab8ce87bba
Add pagination, filtering and sort to more API endpoints (#12186) 2022-03-08 20:54:17 -05:00
Michael Klein e096a0a5ab
Upgrade Ember and friends 3.28 (#12215)
* chore: upgrade forward compatible packages

* chore: v3.20.2...v3.24.0

* chore: silence string prototype extension deprecation

* refact: don't test clicking disabled button job-list

Recent test-helper upgrades will guard against clicking disabled buttons
as this is not something that real users can do. We need to change our
tests accordingly.

* fix: await async test helper `expectError`

We have to await this async test function otherwise the test's
rendering context will be torn down before we run assertions
against it.

* fix: don't try to click disabled two-step-button

Recent test-helper updates prohibit clicking disabled buttons. We need
to adapt the tests accordingly.

* fix: recommendation-accordion

Use up-to-date semantics for handling list-accordion closing
in recommendation-accordion.

* fixes toggling recommendation-accordion toggle.

* fix: simple-unless linting error application.hbs

There's no reason to use unless here - we can use if instead.

* fix: no-quoteless-attributes recommendation accordion

* fix: no-quoteless-attributes recommendation-chart

* fix: allow `unless` - global-header.hbs

This is a valid use of unless in our opinion.

* fix: allow unless in job-diff

This is not a great use for unless but we don't want to change this
behavior atm.

* fix: no-attrs-in-components list-pager

There is no need to use this.attrs in classic components. When we
will convert to glimmer we will use `@`-instead.

* fix: simple-unless job/definition

We can convert to a simple if here.

* fix: allow inline-styles stats-box component

To make linter happy.

* fix: disable no-action and no-invalid-interactive

Will be adressed in follow-up PRs.

* chore: update ember-classic-decorator to latest

* chore: upgrade ember-can to latest

* chore: upgrade ember-composable-helpers to latest

* chore: upgrade ember-concurrency

* fix: recomputation deprecation `Trigger`

schedule `do` on actions queue to work around recomputation deprecation
when triggering Trigger on `did-insert`.

* chore: upgrade ember-cli-string-helpers

* chore: upgrade ember-copy

* chore: upgrade ember-data-model-fragments

* chore: upgrade ember-deprecation-workflow

* chore: upgrade ember-inline-svg

* chore: upgrade ember-modifier

* chore: upgrade ember-truth-helpers

* chore: upgrade ember-moment & ember-cli-moment-shim

* chore: upgrade ember-power-select

* chore: upgrade ember-responsive

* chore: upgrade ember-sinon

* chore: upgrade ember-cli-mirage

For now we will stay on 2.2 - upgrades > 2.3 break the build.

* chore: upgrade 3.24.0 to 3.28.5

* fix: add missing classic decorators on adapters

* fix: missing classic decorators to serializers

* fix: don't reopen Ember.Object anymore

* fix: remove unused useNativeEvents

ember-cli-page-objects doesn't provide this method anymore

* fix: add missing attributeBindings for test-selectors

ember-test-selectors doesn't provides automatic bindings for
data-test-* attributes anymore.

* fix: classic decorator for application serializer test

* fix: remove `removeContext` from tests.

It is unneeded and ember-cli-page-objects doesn't provides
this method anymore.

* fix: remove deprecations `run.*`-invocations

* fix: `collapseWhitespace` in optimize test

* fix: make sure to load async relationship before access

* fix: dependent keys for relationship computeds

We need to add `*.isFulfilled` as dependent keys for computeds that
access async relationships.

* fix: `computed.read`-invocations use `read` instead

* chore: prettify templates

* fix: use map instead of mapBy ember-cli-page-object

Doesn't work with updated ember-cli-page-object anymore.

* fix: remove remaining deprecated `run.*`-calls

* chore: add more deprecations deprecation-workflow

* fix: `implicit-injection`-deprecation

All routes that add watchers will need to inject the store-service
as the store service is internally used in watchers.

* fix: more implicit injection deprecations

* chore: silence implicit-injection deprecation

We can tackle the deprecation when we find the time.

* fix: new linting errors after upgrade

* fix: remove merge conflicts prettierignore

* chore: upgrade to run node 12.22 when building binaries
2022-03-08 12:28:36 -05:00
Tim Gross 5ae30849a9
docs: add note about docker DNS config when using bridge mode (#12229)
The Docker DNS configuration options are not compatible with a
group-level network in `bridge` mode. Warn users about this in the
Docker task configuration docs.
2022-03-08 11:59:20 -05:00