Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.
Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.
Setting the node pool to `default` during job canonicalization would make this
distinction impossible, so this commit allows a job to have an empty node
pool value during registration and sets it to `default` in the admission
controller mutator instead.
In order to guarantee state consistency, the state store validates that
the job's node pool is set and exists before inserting the job.
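A minimal sketch of the mutator idea, assuming the `jobNodePoolMutator` name and
`Mutate` signature (the real hook in the admission pipeline may differ):

```go
package nomad

import "github.com/hashicorp/nomad/nomad/structs"

// jobNodePoolMutator is an assumed name for the admission-controller mutator
// described above; the Mutate signature mirrors the other job mutators.
type jobNodePoolMutator struct{}

func (jobNodePoolMutator) Mutate(job *structs.Job) (*structs.Job, []error, error) {
	// A job may be registered with an empty node pool so that "unset" can be
	// distinguished from an explicit "default"; the value is filled in here
	// rather than during canonicalization.
	if job.NodePool == "" {
		job.NodePool = structs.NodePoolDefault // "default"
	}
	return job, nil, nil
}
```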
Implement scheduler support for node pool:
* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to also include the node pool (a
  sketch follows this list).
* Ensure that changes to a job's node pool are picked up as destructive
allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
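A sketch of the extended ready-node filter; the helper name and exact signature
are illustrative only:

```go
package scheduler

import "github.com/hashicorp/nomad/nomad/structs"

// readyNodesInDCsAndPool keeps only ready nodes that are both in one of the
// job's datacenters and in the job's node pool. The name and signature are
// assumptions for illustration.
func readyNodesInDCsAndPool(nodes []*structs.Node, dcs map[string]bool, pool string) []*structs.Node {
	var out []*structs.Node
	for _, node := range nodes {
		if !node.Ready() || !dcs[node.Datacenter] {
			continue
		}
		// New check: nodes outside the job's node pool are filtered out.
		if node.NodePool != pool {
			continue
		}
		out = append(out, node)
	}
	return out
}
```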
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
support in the public `api` package and documentation.
Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
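In the `api` package this amounts to a new field on the job; a sketch with the
rest of the struct omitted (the pointer type and HCL tag are assumptions for
illustration):

```go
package api

// Job shows only the new field described above; all other fields are omitted.
type Job struct {
	// NodePool selects the node pool the job runs in; empty means "unset".
	NodePool *string `hcl:"node_pool,optional"`
}
```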
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.
This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restarts. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.
This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.
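In sketch form, assuming the fields live on the task's `LogConfig` (other
fields omitted):

```go
package structs

// LogConfig shows the shape of the fix; only the relevant fields are kept.
type LogConfig struct {
	// Deprecated: with Enabled, a job restored from an older Raft snapshot
	// gets the zero value false and silently stops collecting logs.
	Enabled bool

	// Disabled defaults to false, so the zero value of a restored job keeps
	// log collection turned on; it must be set explicitly to opt out.
	Disabled bool
}
```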
Fixes #17076
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io.
Some Nomad users ship application logs out-of-band via syslog. For these users,
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling logmon and pointing the task's stdout/stderr to /dev/null.
This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.
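A minimal sketch of the idea, with an assumed helper name; the real wiring
lives in the client's task runner:

```go
package logmon

import "os"

// stdoutPath is an assumed helper name for illustration. When log collection
// is disabled, the task's stdout (and, analogously, stderr) is pointed at
// /dev/null so logmon and docker_logger never need to run.
func stdoutPath(logsDisabled bool, fifoPath string) string {
	if logsDisabled {
		return os.DevNull // "/dev/null" on Unix
	}
	return fifoPath
}
```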
---
Fixes: #11175
Co-authored-by: Thomas Weber <towe75@googlemail.com>
* api: enable support for setting original source alongside job
This PR adds support for setting job source material along with
the registration of a job.
This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.
The job source (if submitted, and doing so is always optional) is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur the overhead of reading the large
string field throughout normal job operations.
The server config now includes job_max_source_size for configuring
the maximum size the job source may be before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
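As a quick illustration of the read side, a plain HTTP query against the new
endpoint might look like this (the job ID and version are placeholders):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Query the submission for version 0 of the job "example" (placeholders).
	resp, err := http.Get("http://127.0.0.1:4646/v1/job/example/submission?version=0")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response carries the original source material, if any was submitted
	// and it was under job_max_source_size when the job was registered.
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```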
* api: avoid writing var content to disk for parsing
* api: move submission validation into RPC layer
* api: return an error if updating a job submission without namespace or job id
* api: be exact about the job index we associate a submission with (modify)
* api: reword api docs scheduling
* api: prune all but the last 6 job submissions
* api: protect against nil job submission in job validation
* api: set max job source size in test server
* api: fixups from pr
* docs: dispatch_payload docs had some weirdness
Docs said "Examples" when there was only 1 example. Not sure what the
floating "to" in the description was for.
* docs: missing a heading level on jobs api docs
The job evaluate endpoint creates a new evaluation for the job, which is
a write operation. This change modifies the necessary capability from
`read-job` to `submit-job` to better reflect this.
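In sketch form, with assumed helper names, the endpoint's check changes along
these lines:

```go
package nomad

import (
	"github.com/hashicorp/nomad/acl"
	"github.com/hashicorp/nomad/nomad/structs"
)

// checkEvaluateCapability is an illustrative sketch: creating an evaluation
// is a write, so the endpoint now requires submit-job instead of read-job.
func checkEvaluateCapability(aclObj *acl.ACL, namespace string) error {
	if aclObj != nil && !aclObj.AllowNsOp(namespace, acl.NamespaceCapabilitySubmitJob) {
		return structs.ErrPermissionDenied
	}
	return nil
}
```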
* docs: add dynamic node metadata api docs
Also update all paths in the client API docs to explicitly state the
`/v1/` prefix. We're inconsistent about that, but I think it's better to
display the full path than to only show the fragment. If we ever do a
`/v2/`, whether or not we explicitly state `/v1/` in our docs won't be
our greatest concern.
* docs: add task-api docs
Add `identity` jobspec block to expose workload identity tokens to tasks.
---------
Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
The ACL token decoding was not correctly handling time duration
syntax such as "1h", which forced people to use the nanosecond
representation via the HTTP API.
The change adds an unmarshal function which allows this syntax to
be handled correctly, along with the other supported styles.
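A sketch of the approach, using only the standard library; the function name is
illustrative:

```go
package api

import (
	"encoding/json"
	"fmt"
	"time"
)

// unmarshalDuration is an illustrative sketch of the fix: accept both the
// human-friendly syntax ("1h") and the raw nanosecond representation.
func unmarshalDuration(data []byte, d *time.Duration) error {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return err
	}
	switch value := v.(type) {
	case float64:
		// Nanoseconds, the previous behaviour.
		*d = time.Duration(int64(value))
		return nil
	case string:
		// Duration syntax such as "1h" or "30m".
		parsed, err := time.ParseDuration(value)
		if err != nil {
			return err
		}
		*d = parsed
		return nil
	default:
		return fmt.Errorf("unexpected duration type %T", v)
	}
}
```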
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.
The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
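For illustration, a raw HTTP call might look like the following; the
`/v1/evaluations/count` path and the filter expression here are assumptions:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// The path and filter below are assumptions for illustration; the point
	// is that the server returns a count instead of a serialized eval list.
	filter := url.QueryEscape(`Status == "pending"`)
	resp, err := http.Get("http://127.0.0.1:4646/v1/evaluations/count?filter=" + filter)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```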
* Adds meta to job list stub and displays a pack logo on the jobs index
* Changelog
* Modifying struct for optional meta param
* Explicitly ask for meta anytime I look up a job from index or job page
* Test case for the endpoint
* adding meta field to API struct and omitting from response if empty
* passthru method added to api/jobs.list
* Meta param listed in docs for jobs list
* Update api/jobs.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
This change adds ACL role creation and deletion to the event
stream. It is exposed as a single topic with two types; the filter
is primarily the role ID but also includes the role name.
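A minimal consumer sketch using the `api` package; the `api.TopicACLRole`
constant and the filter key shown here are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Subscribe to ACL role events filtered by role ID; the role name can be
	// used as the filter key as well.
	topics := map[api.Topic][]string{
		api.TopicACLRole: {"a1b2c3d4-example-role-id"},
	}

	events, err := client.EventStream().Stream(context.Background(), topics, 0, nil)
	if err != nil {
		panic(err)
	}

	for batch := range events {
		for _, e := range batch.Events {
			// Two event types are published on the topic: role upserts
			// (creation/update) and role deletions.
			fmt.Println(e.Topic, e.Type, e.Key)
		}
	}
}
```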
While conducting this work it was also discovered that the event
stream has its own ACL resolution logic. This did not account for
ACL tokens which included role links, or tokens with expiry times.
ACL role links are now resolved to their policies, and tokens are
checked for expiry correctly.