The RPC TLS enforcement test was frequently failing with broken connections. The
most likely cause was that the tests started to run before the server had
started its RPC server. Wait until it self-elects to ensure that the RPC server
is up. This seems to have corrected the error; I ran this 3 times without a
failure (even accounting for `gotestsum` retries).
Also, fix a minor test bug that didn't impact the test but showed an incorrect
usage for `Status.Ping.`
The `nomad fmt -check` command incorrectly writes to file because we didn't
return before writing the file on a diff. Fix this bug and update the command
internals to differentiate between the write-to-file and write-to-stdout code
paths, which are activated by different combinations of options and flags.
The docstring for the `-list` and `-write` flags is also unclear and can be
easily misread to be the opposite of the actual behavior. Clarify this and fix
up the docs to match.
This changeset also refactors the tests quite a bit so as to make the test
outputs clear when something is incorrect.
The panic bug for upgrades with older servers that shipped in 1.4.0 was fixed in
1.4.1, which makes the versions described in the warning in the upgrade guide
misleading. Clarify the upgrade guide.
* artifact: protect against unbounded artifact decompression
Starting with 1.5.0, set defaut values for artifact decompression limits.
artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation
artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.
* artifact: assert limits cannot be nil in validation
* Warn when Items key isn't directly accessible
Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.
- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number
---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
This PR fixes the CNI plugin fingerprinter to take into account the fact
that the cni_path config can be a multi-path (e.g. `/foo:/bar:/baz`).
Accumulate plugins from each of the possible path elements. If scanning
any of the named directory fails, the fingerprinter fails.
Fixes#16083
No CL/BP - has not shipped yet.
* Convert assets from bindatafs to go embeds
* Add command/asset to "uninteresting" list for missing test check
* Remove generate-examples target
* Update paths in tests
The eval broker's `Cancelable` method used by the cancelable eval reaper mutates
the slice of cancelable evals by removing a batch at a time from the slice. But
this method unsafely uses a read lock despite this mutation. Under normal
workloads this is likely to be safe but when the eval broker is under the heavy
load this feature is intended to fix, we're likely to have a race
condition. Switch this to a write lock, like the other locks that mutate the
eval broker state.
This changeset also adjusts the timeout to allow poorly-sized Actions runners
more time to schedule the appropriate goroutines. The test has also been updated
to use `shoenig/test/wait` so we can have sensible reporting of the results
rather than just a timeout error when things go wrong.
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.
Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
* docs: add dynamic node metadata api docs
Also update all paths in the client API docs to explicitly state the
`/v1/` prefix. We're inconsistent about that, but I think it's better to
display the full path than to only show the fragment. If we ever do a
`/v2/` whether or not we explicitly state `/v1/` in our docs won't be
our greatest concern.
* docs: add task-api docs
In #15901 we introduced pre-forwarding authentication for RPCs so that we can
grab the identity for rate metrics. The `ACL.Bootstrap` RPC is an
unauthenticated endpoint, so any error message from authentication is not
particularly useful. This would be harmless, but if you try to bootstrap with
your `NOMAD_TOKEN` already set (perhaps because you were talking to another
cluster previously from the same shell session), you'll get an authentication
error instead of just having the token be ignored. This is a regression from the
existing behavior, so have this endpoint ignore auth errors the same way we do
for every other unauthenticated endpoint (ex `Status.Peers`)
* users: create cache for user lookups
This PR introduces a global cache for OS user lookups. This should
relieve pressure on the OS domain/directory lookups, which would be
queried more now that Task API exists.
Hits are cached for 1 hour, and misses are cached for 1 minute. These
values are fairly arbitrary - we can tweak them if there is any reason to.
Closes#16010
* users: delete expired negative entry from cache
When an auth method was not supplied and the OIDC type was given
in lower case, the CLI was not matching the default method due to
casing and responded with a confusing user message.
This change fixes the above problem, along with making use of the
santized type easier.