When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
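A minimal sketch of why the check behaves differently in tests and in
real clusters. The sentinel's message text and the `flatten` helper are
assumptions for illustration; `flatten` only imitates an error being
reduced to its message when it crosses an RPC boundary.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrUnknownNode stands in for the sentinel error named above; the
// message string is an assumption for illustration.
var ErrUnknownNode = errors.New("unknown node")

// flatten imitates serialization over RPC: only the message survives,
// so the wrap chain that errors.Is relies on is lost.
func flatten(err error) error {
	return errors.New(err.Error())
}

// isUnknownNode uses a substring match, which still works after the
// error has been flattened.
func isUnknownNode(err error) bool {
	return err != nil && strings.Contains(err.Error(), ErrUnknownNode.Error())
}

func main() {
	local := fmt.Errorf("detach failed: %w", ErrUnknownNode)
	remote := flatten(local)

	fmt.Println(errors.Is(local, ErrUnknownNode))  // true: in-process, chain intact
	fmt.Println(errors.Is(remote, ErrUnknownNode)) // false: chain lost in transit
	fmt.Println(isUnknownNode(remote))             // true: message still matches
}
```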
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.
Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.
This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
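The selection logic described above could be sketched roughly as
follows. The `peer` type, its fields, and the "by-address"/"by-id"
labels are hypothetical; this does not call the real Raft library.

```go
package main

import "fmt"

// peer describes a remote server; the type and field names are
// hypothetical.
type peer struct {
	ID          string
	Addr        string
	RaftVersion int
}

// removalTarget picks the API style from the Raft version of the server
// running the code (localVersion), not from the remote peer's version,
// and returns the identifier to pass to that API.
func removalTarget(localVersion int, p peer) (api string, id string) {
	if localVersion < 3 {
		// Older servers only have the address-based API.
		return "by-address", p.Addr
	}
	if p.RaftVersion < 3 {
		// A v2 peer has no separate ID, so its address is used as the ID.
		return "by-id", p.Addr
	}
	return "by-id", p.ID
}

func main() {
	v2Peer := peer{ID: "", Addr: "10.0.0.2:4647", RaftVersion: 2}
	api, id := removalTarget(3, v2Peer)
	fmt.Println(api, id) // by-id 10.0.0.2:4647
}
```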
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
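As a rough illustration of the ordering, assuming a simplified `job`
type with a hypothetical `CSIPlugin` flag (the real drainer works on
allocations and is more involved):

```go
package main

import "fmt"

// job is a simplified, hypothetical stand-in for a scheduled job.
type job struct {
	Name      string
	Type      string // "service", "batch", or "system"
	CSIPlugin bool   // true if the job runs a CSI plugin (hypothetical flag)
}

// drainsLast reports whether a job stays on the node until the end of
// the drain: system jobs and any job that runs a CSI plugin.
func drainsLast(j job) bool {
	return j.Type == "system" || j.CSIPlugin
}

// drainOrder returns ordinary workloads first and the "drain last" set
// at the end, preserving relative order within each group.
func drainOrder(jobs []job) []job {
	var first, last []job
	for _, j := range jobs {
		if drainsLast(j) {
			last = append(last, j)
		} else {
			first = append(first, j)
		}
	}
	return append(first, last...)
}

func main() {
	jobs := []job{
		{Name: "log-shipper", Type: "system"},
		{Name: "csi-controller", Type: "service", CSIPlugin: true},
		{Name: "web", Type: "service"},
	}
	for _, j := range drainOrder(jobs) {
		fmt.Println(j.Name) // web, then log-shipper, then csi-controller
	}
}
```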
* Fix plugin capability sorting.
`sort.StringSlice` in the stdlib doesn't actually sort; it is a type
conversion that yields a sortable type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad volume status :id`, just like `nomad job status :id`.
This is implemented in the HTTP response and not in the state store
because the state store maintains two separate lists of allocs that
are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
`CSI_ENDPOINT` variable, and unfortunately not all plugins
agree. Allow the user to override the `CSI_ENDPOINT` to work around
those cases.
* Update all demos and tests with CSI_ENDPOINT
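The plugin capability sorting fix above comes down to a common stdlib
pitfall, sketched here with made-up capability names. Converting a
slice to `sort.StringSlice` does not reorder it; either call `Sort()`
on the converted value or use `sort.Strings`.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a sorted copy of in without modifying the input.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

func main() {
	// Capability names are hypothetical.
	caps := []string{"snapshot", "attach", "expand"}

	// Conversion only: nothing is reordered here.
	_ = sort.StringSlice(caps)
	fmt.Println(caps) // [snapshot attach expand]

	// Calling Sort() on the converted value sorts the underlying slice.
	sort.StringSlice(caps).Sort()
	fmt.Println(caps) // [attach expand snapshot]
}
```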
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255
This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.
The reason for these changes is that we'll want to disable logging in
GHA, where spamming the disk with logs really drags down performance.
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:
* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.
tl;dr way less janky shell scripting!
The `nomad server members` command previously output a column named
`Protocol` that displayed the Serf protocol currently used by the
servers.
This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.
This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also renames the `-detailed` flag to `-verbose` so it
matches other commands. The verbose output now includes the same
information as the standard output, plus the previous `Protocol`
column and `Tags`.
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.
Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.
In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
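One way to picture the recovery flow, as a sketch only. The `leaderOps`
interface, the `fakeOps` test double, and the retry count are all
hypothetical and do not reflect the actual server code.

```go
package main

import (
	"errors"
	"fmt"
)

// leaderOps is a hypothetical abstraction over the two actions
// described above.
type leaderOps interface {
	establish() error // run the establishLeadership steps
	transfer() error  // hand Raft leadership to another server
}

type outcome string

const (
	established outcome = "established"
	transferred outcome = "transferred"
	gaveUp      outcome = "gave up"
)

// handleLeadership tries to establish leadership. On failure it tries to
// transfer leadership away; if that is not possible it retries, up to
// the given number of attempts.
func handleLeadership(ops leaderOps, attempts int) outcome {
	for i := 0; i < attempts; i++ {
		if err := ops.establish(); err == nil {
			return established
		}
		if err := ops.transfer(); err == nil {
			return transferred
		}
	}
	return gaveUp
}

// fakeOps is a test double: establish fails a fixed number of times.
type fakeOps struct {
	establishFailures int
	canTransfer       bool
}

func (f *fakeOps) establish() error {
	if f.establishFailures > 0 {
		f.establishFailures--
		return errors.New("establish failed")
	}
	return nil
}

func (f *fakeOps) transfer() error {
	if !f.canTransfer {
		return errors.New("no peer available")
	}
	return nil
}

func main() {
	fmt.Println(handleLeadership(&fakeOps{establishFailures: 1, canTransfer: true}, 3))  // transferred
	fmt.Println(handleLeadership(&fakeOps{establishFailures: 1, canTransfer: false}, 3)) // established
}
```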
This is a followup to having tests run in serial in CI.
The e2e package isn't in CI, but let's use the helper anyway
so we can set up semgrep rules covering the entire repository.
This is a followup to running tests in serial in CI.
Since the API package cannot import anything outside of api/,
copy the ci.Parallel function into api/internal/testutil, and
have api tests use that.
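A helper of this kind might look like the sketch below. It is not the
actual `ci.Parallel` source: the `CI` environment variable name is an
assumption, and the `parallelizer` interface (which `*testing.T`
satisfies) is introduced here only so the example runs outside
`go test`.

```go
package main

import (
	"fmt"
	"os"
)

// parallelizer is the one method of *testing.T this helper needs.
type parallelizer interface {
	Parallel()
}

// parallelUnless marks the test as parallel unless serial is true.
func parallelUnless(t parallelizer, serial bool) {
	if !serial {
		t.Parallel()
	}
}

// Parallel runs tests serially when the CI environment variable is set
// (the variable name is an assumption) and in parallel otherwise.
func Parallel(t parallelizer) {
	parallelUnless(t, os.Getenv("CI") != "")
}

// recorder is a stand-in for *testing.T that records the call.
type recorder struct{ called bool }

func (r *recorder) Parallel() { r.called = true }

func main() {
	r := &recorder{}
	parallelUnless(r, true)
	fmt.Println(r.called) // false: serial mode skips t.Parallel()
	parallelUnless(r, false)
	fmt.Println(r.called) // true
}
```

In a real test file the call site would be `Parallel(t)` as the first
statement of each test function.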
The alloc list test with pagination was creating allocs before the
target namespace existed. This works in OSS but fails in ENT because
quotas are checked before the alloc can be created, so the namespace
must exist beforehand.
Clarify the behavior of `restart` inheritance with respect to Connect
sidecar tasks. Remove incorrect language about the scheduler being
involved in restart decisions. Try to make the `delay` mode
documentation more clear, and provide examples of delay vs fail.
The default chroot copies all of /bin, /usr, etc., which can amount
to gigabytes of files not actually needed for running our tests.
Use a smaller chroot in test cases so that CI infra with poor disk
IO has a chance.
The `Job.List` RPC attaches a `JobSummary` to each job stub. We're
using the request namespace and not the job namespace for that query,
which results in a nil `JobSummary` whenever we pass the wildcard
namespace. This is incorrect and causes panics in downstream consumers
like the CLI, which assume the `JobSummary` is non-nil as an unstated
invariant.
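The difference between the two lookups can be shown with a toy
in-memory store. All types and function names below are hypothetical
simplifications, not the real state store API.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration.
type jobKey struct{ Namespace, ID string }
type jobStub struct{ Namespace, ID string }
type jobSummary struct{ Running int }

// summaryByRequestNS reproduces the bug: with a wildcard request
// namespace ("*") no summary is stored under that key, so it returns nil.
func summaryByRequestNS(store map[jobKey]*jobSummary, requestNS string, j jobStub) *jobSummary {
	return store[jobKey{Namespace: requestNS, ID: j.ID}]
}

// summaryByJobNS is the fixed lookup: it always uses the job's own
// namespace, regardless of what namespace the request asked for.
func summaryByJobNS(store map[jobKey]*jobSummary, j jobStub) *jobSummary {
	return store[jobKey{Namespace: j.Namespace, ID: j.ID}]
}

func main() {
	store := map[jobKey]*jobSummary{
		{Namespace: "prod", ID: "web"}: {Running: 3},
	}
	j := jobStub{Namespace: "prod", ID: "web"}

	fmt.Println(summaryByRequestNS(store, "*", j) == nil) // true: the bug
	fmt.Println(summaryByJobNS(store, j).Running)         // 3
}
```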
* chore: upgrade forward compatible packages
* chore: v3.20.2...v3.24.0
* chore: silence string prototype extension deprecation
* refact: don't test clicking disabled button job-list
Recent test-helper upgrades will guard against clicking disabled buttons
as this is not something that real users can do. We need to change our
tests accordingly.
* fix: await async test helper `expectError`
We have to await this async test function otherwise the test's
rendering context will be torn down before we run assertions
against it.
* fix: don't try to click disabled two-step-button
Recent test-helper updates prohibit clicking disabled buttons. We need
to adapt the tests accordingly.
* fix: recommendation-accordion
Use up-to-date semantics for handling list-accordion closing
in recommendation-accordion.
* fixes toggling recommendation-accordion toggle.
* fix: simple-unless linting error application.hbs
There's no reason to use unless here - we can use if instead.
* fix: no-quoteless-attributes recommendation accordion
* fix: no-quoteless-attributes recommendation-chart
* fix: allow `unless` - global-header.hbs
This is a valid use of unless in our opinion.
* fix: allow unless in job-diff
This is not a great use for unless but we don't want to change this
behavior atm.
* fix: no-attrs-in-components list-pager
There is no need to use this.attrs in classic components. When we
convert to Glimmer we will use `@` instead.
* fix: simple-unless job/definition
We can convert to a simple if here.
* fix: allow inline-styles stats-box component
To make linter happy.
* fix: disable no-action and no-invalid-interactive
Will be addressed in follow-up PRs.
* chore: update ember-classic-decorator to latest
* chore: upgrade ember-can to latest
* chore: upgrade ember-composable-helpers to latest
* chore: upgrade ember-concurrency
* fix: recomputation deprecation `Trigger`
Schedule `do` on the actions queue to work around the recomputation
deprecation when triggering Trigger on `did-insert`.
* chore: upgrade ember-cli-string-helpers
* chore: upgrade ember-copy
* chore: upgrade ember-data-model-fragments
* chore: upgrade ember-deprecation-workflow
* chore: upgrade ember-inline-svg
* chore: upgrade ember-modifier
* chore: upgrade ember-truth-helpers
* chore: upgrade ember-moment & ember-cli-moment-shim
* chore: upgrade ember-power-select
* chore: upgrade ember-responsive
* chore: upgrade ember-sinon
* chore: upgrade ember-cli-mirage
For now we will stay on 2.2 - upgrades > 2.3 break the build.
* chore: upgrade 3.24.0 to 3.28.5
* fix: add missing classic decorators on adapters
* fix: missing classic decorators to serializers
* fix: don't reopen Ember.Object anymore
* fix: remove unused useNativeEvents
ember-cli-page-objects doesn't provide this method anymore
* fix: add missing attributeBindings for test-selectors
ember-test-selectors doesn't provide automatic bindings for
data-test-* attributes anymore.
* fix: classic decorator for application serializer test
* fix: remove `removeContext` from tests.
It is unneeded and ember-cli-page-objects doesn't provide
this method anymore.
* fix: remove deprecations `run.*`-invocations
* fix: `collapseWhitespace` in optimize test
* fix: make sure to load async relationship before access
* fix: dependent keys for relationship computeds
We need to add `*.isFulfilled` as dependent keys for computeds that
access async relationships.
* fix: `computed.read`-invocations use `read` instead
* chore: prettify templates
* fix: use map instead of mapBy ember-cli-page-object
Doesn't work with updated ember-cli-page-object anymore.
* fix: remove remaining deprecated `run.*`-calls
* chore: add more deprecations deprecation-workflow
* fix: `implicit-injection`-deprecation
All routes that add watchers will need to inject the store-service
as the store service is internally used in watchers.
* fix: more implicit injection deprecations
* chore: silence implicit-injection deprecation
We can tackle the deprecation when we find the time.
* fix: new linting errors after upgrade
* fix: remove merge conflicts prettierignore
* chore: upgrade to run node 12.22 when building binaries
The Docker DNS configuration options are not compatible with a
group-level network in `bridge` mode. Warn users about this in the
Docker task configuration docs.