This PR fixes a bug where the task group information was not being set
on the serviceHook.AllocInfo struct, which is needed later on for calculating
the CheckID of a nomad service check. The CheckID is calculated independently
from multiple callsites, and the information being passed in must be consistent,
including the group name.
The workload.AllocInfo.Group was not set at this callsite, due to the bug fixed in this PR.
https://github.com/hashicorp/nomad/blob/main/client/serviceregistration/nsd/nsd.go#L114
This PR
- fixes a panic in GetItems when looking up a variable that does not exist.
- deprecates GetItems in favor of GetVariableItems which avoids returning a pointer to a map
- deprecates ErrVariableNotFound in favor of ErrVariablePathNotFound which is an actual error type
- does some minor code cleanup to make linters happier
* taskapi: return Forbidden on bad credentials
Prior to this change a "Server error" would be returned when ACLs are
enabled which did not match when ACLs are disabled.
* e2e: love love love datacenter wildcard default
* e2e: skip windows nodes on linux only test
The Logfs are a bit weird because they're most useful when converted to
Printfs to make debugging the test much faster, but that makes CI noisy.
In a perfect world Go would expose how many tests are being run and we
could stream output live if there's only 1. For now I left these helpful
lines in as basically glorified comments.
Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet
we have provisioned in each AZ. Revise security groups as follows:
* Split out client security groups from servers so that we can't have clients
accidentally accessing serf addresses or other unexpected cross-talk.
* Add new security groups for the secondary subnet that only allows
communication within the security group so we can exercise behaviors with
multiple IPs.
This changeset doesn't include any Nomad configuration changes needed to take
advantage of the extra network interface. I'll include those with testing for
PR #16217.
The RPC TLS enforcement test was frequently failing with broken connections. The
most likely cause was that the tests started to run before the server had
started its RPC server. Wait until it self-elects to ensure that the RPC server
is up. This seems to have corrected the error; I ran this 3 times without a
failure (even accounting for `gotestsum` retries).
Also, fix a minor test bug that didn't impact the test but showed an incorrect
usage for `Status.Ping.`
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.
The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.
This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
The panic bug for upgrades with older servers that shipped in 1.4.0 was fixed in
1.4.1, which makes the versions described in the warning in the upgrade guide
misleading. Clarify the upgrade guide.
* artifact: protect against unbounded artifact decompression
Starting with 1.5.0, set defaut values for artifact decompression limits.
artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation
artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.
* artifact: assert limits cannot be nil in validation
* Warn when Items key isn't directly accessible
Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.
- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number
---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
This PR fixes the CNI plugin fingerprinter to take into account the fact
that the cni_path config can be a multi-path (e.g. `/foo:/bar:/baz`).
Accumulate plugins from each of the possible path elements. If scanning
any of the named directory fails, the fingerprinter fails.
Fixes#16083
No CL/BP - has not shipped yet.
* Convert assets from bindatafs to go embeds
* Add command/asset to "uninteresting" list for missing test check
* Remove generate-examples target
* Update paths in tests
The eval broker's `Cancelable` method used by the cancelable eval reaper mutates
the slice of cancelable evals by removing a batch at a time from the slice. But
this method unsafely uses a read lock despite this mutation. Under normal
workloads this is likely to be safe but when the eval broker is under the heavy
load this feature is intended to fix, we're likely to have a race
condition. Switch this to a write lock, like the other locks that mutate the
eval broker state.
This changeset also adjusts the timeout to allow poorly-sized Actions runners
more time to schedule the appropriate goroutines. The test has also been updated
to use `shoenig/test/wait` so we can have sensible reporting of the results
rather than just a timeout error when things go wrong.
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.
Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
* docs: add dynamic node metadata api docs
Also update all paths in the client API docs to explicitly state the
`/v1/` prefix. We're inconsistent about that, but I think it's better to
display the full path than to only show the fragment. If we ever do a
`/v2/` whether or not we explicitly state `/v1/` in our docs won't be
our greatest concern.
* docs: add task-api docs
In #15901 we introduced pre-forwarding authentication for RPCs so that we can
grab the identity for rate metrics. The `ACL.Bootstrap` RPC is an
unauthenticated endpoint, so any error message from authentication is not
particularly useful. This would be harmless, but if you try to bootstrap with
your `NOMAD_TOKEN` already set (perhaps because you were talking to another
cluster previously from the same shell session), you'll get an authentication
error instead of just having the token be ignored. This is a regression from the
existing behavior, so have this endpoint ignore auth errors the same way we do
for every other unauthenticated endpoint (ex `Status.Peers`)