Workload identities grant implicit access to policies, and operators
will not want to craft separate policies for each invocation of a
periodic or dispatch job. Use the parent job's ID as the JobID claim.
The search RPC used a placeholder policy for searching within the secure
variables context. Now that we have ACL policies built for secure variables, we
can use them for search. Requires a new loose policy for checking if a token has
any secure variables access within a namespace, so that we can filter on
specific paths in the iterator.
Most of our objects use int64 timestamps derived from `UnixNano()` instead of
`time.Time` objects. Switch the keyring metadata to use `UnixNano()` for
consistency across the API.
To discourage accidentally DoS'ing the cluster with secure variables
data, we're providing a very low limit to the maximum size of a given
secure variable. This currently matches the limit for dispatch
payloads.
In future versions, we may increase this limit or make it
configurable, once we have better metrics from real-world operators.
Document the secure variables keyring commands, document the aliased
gossip keyring commands, and note that the old gossip keyring commands
are deprecated.
Return 429 response on HTTP max connection limit. Instead of silently closing
the connection, return a `429 Too Many Requests` HTTP response with a helpful
error message to aid debugging when the connection limit is unintentionally
reached.
Set a 10-millisecond write timeout and rate limiter for connection-limit 429
response to prevent writing the HTTP response from consuming too many server
resources.
Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections
exceeding concurrency limit.
* Support pathLinkedEntities starting with nomad/jobs/ instead of jobs/
* links from jobs/groups/tasks to variables now look for nomad/jobs/ instead of jobs/
* Tests updated to reflect nomad/jobs/ change
* Acceptance test for disallowing nomad/foo/, and hint text updates
* Defensive logic in case path not yet set
* Allow exactly nomad/jobs as a variable path
Tasks are automatically granted access to variables on a path that matches their
workload identity, with a well-known prefix. Change the prefix to `nomad/jobs`
to allow for future prefixes like `nomad/volumes` or `nomad/plugins`. Reserve
the prefix by emitting errors during validation.
When applying a raft log to expire ACL tokens, we need to use a
timestamp provided by the leader so that the result is deterministic
across servers. Use leader's timestamp from RPC call
This PR creates a top-level 'check' page for job-specification docs.
The content for checks is about half the content of the service page, and
is about to increase in size when we add docs about Nomad service checks.
Seemed like a good idea to just split the checks section out into its own
thing (e.g. check_restart is already a topic).
Doing the move first lets us backport this change without adding Nomad service
check stuff yet.
Mostly just a lift-and-shift but with some tweaked examples to de-emphasize
the use of script checks.
The "Secure Nomad with Access Control" guide provides a tutorial for
bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference
guide just for the ACL policy specification.
* docs: tighten up parameterized job metrics docs
* docs: improve alloc status descriptions
Remove `nomad.client.allocations.start` as it doesn't exist.
The test for simulating a key rotation across leader elections was
flaky because we weren't waiting for a leader election and was
checking the server configs rather than raft for which server was
currently the leader. Fixing the flake revealed a bug in the test that
we weren't ensuring the new leader was running its own replication, so
it wouldn't pick up the key material from the previous follower.
When secure variables are updated, we were adding the update to the
existing quota tracking without first checking whether it was an
update to an existing variable. In that case we need to add/subtract
only the difference between the new and existing quota usage.
Related to #13740
- blocked_evals.total_blocked is the number of evals blocked for *any*
reason
- blocked_evals.total_quota_limit is the number of evals blocked by
quota limits, but critically: their resources are *not* counted in the
cpu/memory
The split between OSS/ENT in ACL checks for the Search RPC has a lot
of repeated code that results in merge conflicts. Move most of the
logic into the shared code so that we can call out to thin functions
for ENT checks.
* Vars from job prototype
* singular linked variable from job
* Links from task groups and tasks to their variables incl periodic and parameterized
* Lintfix
* Make sure they can list em before we list em
* Tests from job/group/task to var
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.
As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.
In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.
While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.
To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
Fixes#13505
This fixes#13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports).
As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:
Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserverd_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me.
So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.