* Convert assets from bindatafs to go embeds
* Add command/asset to "uninteresting" list for missing test check
* Remove generate-examples target
* Update paths in tests
The eval broker's `Cancelable` method used by the cancelable eval reaper mutates
the slice of cancelable evals by removing a batch at a time from the slice. But
this method unsafely uses a read lock despite this mutation. Under normal
workloads this is likely to be safe but when the eval broker is under the heavy
load this feature is intended to fix, we're likely to have a race
condition. Switch this to a write lock, like the other locks that mutate the
eval broker state.
This changeset also adjusts the timeout to allow poorly-sized Actions runners
more time to schedule the appropriate goroutines. The test has also been updated
to use `shoenig/test/wait` so we can have sensible reporting of the results
rather than just a timeout error when things go wrong.
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.
Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
* docs: add dynamic node metadata api docs
Also update all paths in the client API docs to explicitly state the
`/v1/` prefix. We're inconsistent about that, but I think it's better to
display the full path than to only show the fragment. If we ever do a
`/v2/` whether or not we explicitly state `/v1/` in our docs won't be
our greatest concern.
* docs: add task-api docs
In #15901 we introduced pre-forwarding authentication for RPCs so that we can
grab the identity for rate metrics. The `ACL.Bootstrap` RPC is an
unauthenticated endpoint, so any error message from authentication is not
particularly useful. This would be harmless, but if you try to bootstrap with
your `NOMAD_TOKEN` already set (perhaps because you were talking to another
cluster previously from the same shell session), you'll get an authentication
error instead of just having the token be ignored. This is a regression from the
existing behavior, so have this endpoint ignore auth errors the same way we do
for every other unauthenticated endpoint (ex `Status.Peers`)
* users: create cache for user lookups
This PR introduces a global cache for OS user lookups. This should
relieve pressure on the OS domain/directory lookups, which would be
queried more now that Task API exists.
Hits are cached for 1 hour, and misses are cached for 1 minute. These
values are fairly arbitrary - we can tweak them if there is any reason to.
Closes#16010
* users: delete expired negative entry from cache
When an auth method was not supplied and the OIDC type was given
in lower case, the CLI was not matching the default method due to
casing and responded with a confusing user message.
This change fixes the above problem, along with making use of the
santized type easier.
Fixes#14617
Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted.
Includes new UI for editing metadata as well.
---------
Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
In #13374 we updated the commented-out `license_path` in the packaged example
configuration file to match the existing documentation. Although this config
value was commented-out, it was reported that changing the value was
confusing. Update the commented-out line to the previous value and update the
documented examples to match that. This matches most of the examples for
Consul/Vault licensing as well. I've double-checked the tutorials and it looks
like it'd been left on the previous value there, so no additional work to be
done.
* main: remove deprecated uses of rand.Seed
go1.20 deprecates rand.Seed, and seeds the rand package
automatically. Remove cases where we seed the random package,
and cleanup the one case where we intentionally create a
known random source.
* cl: update cl
* mod: update go mod
* docker: disable driver when running as non-root on cgroups v2 hosts
This PR modifies the docker driver to behave like exec when being run
as a non-root user on a host machine with cgroups v2 enabled. Because
of how cpu resources are managed by the Nomad client, the nomad agent
must be run as root to manage docker-created cgroups.
* cl: update cl
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.
This PR contains the core feature and tests but followup work is required for the following TODO items:
- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware
---------
Co-authored-by: Seth Hoenig <shoenig@duck.com>
Many of the functions in the `utils.go` file are specific to a particular
scheduler, and very few of them have guards (or even names) that help avoid
misuse with features specific to a given scheduler type. Move these
functions (and their tests) into files specific to their scheduler type without
any functionality changes to make it clear which bits go with what.
* Demoable state
* Demo mirage color
* Label as a block with foreground and background colours
* Test mock updates
* Go test updated
* Documentation update for label support
Service jobs should have unique allocation Names, derived from the
Job.ID. System jobs do not have unique allocation Names because the index is
intended to indicated the instance out of a desired count size. Because system
jobs do not have an explicit count but the results are based on the targeted
nodes, the index is less informative and this was intentionally omitted from the
original design.
Update docs to make it clear that NOMAD_ALLOC_INDEX is always zero for
system/sysbatch jobs
Validate that `volume.per_alloc` is incompatible with system/sysbatch jobs.
System and sysbatch jobs always have a `NOMAD_ALLOC_INDEX` of 0. So
interpolation via `per_alloc` will not work as soon as there's more than one
allocation placed. Validate against this on job submission.
* largely a doc-ification of this commit message:
d47678074bf8ae9ff2da3c91d0729bf03aee8446
this doesn't spell out all the possible failure modes,
but should be a good starting point for folks.
* connect: add doc link to envoy bootstrap error
* add Unwrap() to RecoverableError
mainly for easier testing