Commit graph

2109 commits

Author SHA1 Message Date
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Luiz Aoqui aa1b33d157
node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Luiz Aoqui f755b9469f
core: refactor task validation (#17344)
Move all validations related to task fields to Task.Validate(). Prior to
this, some task validations were being done inside TaskGroup.Validate()
because they required access to some group values.

But similarly to how TaskGroup.Validate() tasks the job as parameter,
it's fair to expect the task to receive its group.
2023-06-01 19:26:42 -04:00
Luiz Aoqui 4be8d7c049
core: fix kill_timeout validation when progress_deadline is 0 (#17342) 2023-06-01 19:01:32 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Seth Hoenig acfdf0f479
compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Phil Renaud 7e56ca62d1
[ui] Adds a "Scheduling" filter to the job.allocations page (#17227)
* Basic filter concept

* Make sure NextAllocation gets sent up with allocation stub
2023-05-18 16:24:41 -04:00
James Rasell 96f7c84e4e
variable: fixup metadata copy comment and remove unrequired type. (#17234) 2023-05-18 13:49:41 +01:00
Luiz Aoqui 389212bfda
node pool: initial base work (#17163)
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.

Upon start the state store is populated with two built-in node pools that cannot be modified nor deleted:

  * `all` is a node pool that always includes all nodes in the cluster.
  * `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
2023-05-15 10:49:08 -04:00
Seth Hoenig 81e36b3650
core: eliminate second index on job_submissions table (#17146)
* core: eliminate second index on job_submissions table

This PR refactors the job_submissions state store code to eliminate the
use of a second index formerly used for purging all versions of a given
job. In practice we ended up with duplicate entries on the table. Instead,
use index prefix scanning on the primary index and tidy up any potential
for creating (or removing) duplicates.

* core: pr comments followup
2023-05-11 09:51:08 -05:00
Tim Gross 9ed75e1f72
client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
  state. The allocrunner may queue-up updates even if we gate sending them. So
  then we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or duplicate
  update.

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in the `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
2023-05-11 09:05:24 -04:00
Tim Gross 17bd930ca9
logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
2023-05-04 16:01:18 -04:00
Michael Schurter 3b3b02b741
dep: update from jwt/v4 to jwt/v5 (#17062)
Their release notes are here: https://github.com/golang-jwt/jwt/releases

Seemed wise to upgrade before we do even more with JWTs. For example
this upgrade *would* have mattered if we already implemented common JWT
claims such as expiration. Since we didn't rely on any claim
verification this upgrade is a noop...

...except for 1 test that called `Claims.Valid()`! Removing that
assertion *seems* scary, but it didn't actually do anything because we
didn't implement any of the standard claims it validated:

https://github.com/golang-jwt/jwt/blob/v4.5.0/map_claims.go#L120-L151

So functionally this major upgrade is a noop.
2023-05-03 11:17:38 -07:00
Luiz Aoqui 7b5a8f1fb0
Revert "hashicorp/go-msgpack v2 (#16810)" (#17047)
This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.
2023-05-01 17:18:34 -04:00
James Rasell ac98c2ed40
vars: ensure struct reciever names are consistent. (#16995) 2023-04-27 13:51:11 +01:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
valodzka 379497a484
fix host port handling for ipv6 (#16723) 2023-04-20 19:53:20 -07:00
James Rasell 367cfa6d93
rpc: use "+" concatination in hot path RPC rate limit metrics. (#16923) 2023-04-18 13:41:34 +01:00
Ian Fijolek 619f49afcf
hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
Tim Gross 62548616d4
client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
Juana De La Cuesta 8302085384
Deployment Status Command Does Not Respect -namespace Wildcard (#16792)
* func: add namespace support for list deployment

* func: add wildcard to namespace filter for deployments

* Update deployment_endpoint.go

* style: use must instead of require or asseert

* style: rename paginator to avoid clash with import

* style: add changelog entry

* fix: add missing parameter for upsert jobs
2023-04-12 11:02:14 +02:00
Seth Hoenig ba728f8f97
api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross 1335543731
ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows:

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
2023-04-07 16:33:45 -04:00
Juana De La Cuesta 9b4871fece
Prevent kill_timeout greater than progress_deadline (#16761)
* func: add validation for kill timeout smaller than progress dealine

* style: add changelog

* style: typo in changelog

* style: remove refactored test

* Update .changelog/16761.txt

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

* Update nomad/structs/structs.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-04-04 18:17:10 +02:00
Juana De La Cuesta 1fc13b83d8
style: update documentation (#16729) 2023-03-31 16:38:16 +02:00
Horacio Monsalvo 20372b1721
connect: add meta on ConsulSidecarService (#16705)
Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>
2023-03-30 16:09:28 -04:00
Piotr Kazmierczak e48c48e89b acl: RPC endpoints for JWT auth (#15918) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak a9230fb0b7 acl: JWT auth method 2023-03-30 09:39:56 +02:00
Michael Schurter f8884d8b52
client/metadata: fix crasher caused by AllowStale = false (#16549)
Fixes #16517

Given a 3 Server cluster with at least 1 Client connected to Follower 1:

If a NodeMeta.{Apply,Read} for the Client request is received by
Follower 1 with `AllowStale = false` the Follower will forward the
request to the Leader.

The Leader, not being connected to the target Client, will forward the
RPC to Follower 1.

Follower 1, seeing AllowStale=false, will forward the request to the
Leader.

The Leader, not being connected to... well hoppefully you get the
picture: an infinite loop occurs.
2023-03-20 16:32:32 -07:00
Seth Hoenig ed7177de76
scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421)
* scheduler: annotate tasksUpdated with reason and purge DeepEquals

* cr: move opaque into helper

* cr: swap affinity/spread hashing for slice equal

* contributing: update checklist-jobspec with notes about struct methods

* cr: add more cases to wait config equal method

* cr: use reflect when comparing envoy config blocks

* cl: add cl
2023-03-14 09:46:00 -05:00
Tim Gross 9dfb51579c
scheduler: refactor system util tests (#16416)
The tests for the system allocs reconciling code path (`diffSystemAllocs`)
include many impossible test environments, such as passing allocs for the wrong
node into the function. This makes the test assertions nonsensible for use in
walking yourself through the correct behavior.

I've pulled this changeset out of PR #16097 so that we can merge these
improvements and revisit the right approach to fix the problem in #16097 with
less urgency now that the PFNR bug fix has been merged. This changeset breaks up
a couple of tests, expands test coverage, and makes test assertions more
clear. It also corrects one bit of production code that behaves fine in
production because of canonicalization, but forces us to remember to set values
in tests to compensate.
2023-03-13 11:59:31 -04:00
Tim Gross 99d46e5a49
scheduling: prevent self-collision in dynamic port network offerings (#16401)
When the scheduler tries to find a placement for a new allocation, it iterates
over a subset of nodes. For each node, we populate a `NetworkIndex` bitmap with
the ports of all existing allocations and any other allocations already proposed
as part of this same evaluation via its `SetAllocs` method. Then we make an
"ask" of the `NetworkIndex` in `AssignPorts` for any ports we need and receive
an "offer" in return. The offer will include both static ports and any dynamic
port assignments.

The `AssignPorts` method was written to support group networks, and it shares
code that selects dynamic ports with the original `AssignTaskNetwork`
code. `AssignTaskNetwork` can request multiple ports from the bitmap at a
time. But `AssignPorts` requests them one at a time and does not account for
possible collisions, and doesn't return an error in that case.

What happens next varies:

1. If the scheduler doesn't place the allocation on that node, the port
   conflict is thrown away and there's no problem.
2. If the node is picked and this is the only allocation (or last allocation),
   the plan applier will reject the plan when it calls `SetAllocs`, as we'd expect.
3. If the node is picked and there are additional allocations in the same eval
   that iterate over the same node, their call to `SetAllocs` will detect the
   impossible state and the node will be rejected. This can have the puzzling
   behavior where a second task group for the job without any networking at all
   can hit a port collision error!

It looks like this bug has existed since we implemented group networks, but
there are several factors that add up to making the issue rare for many users
yet frustratingly frequent for others:

* You're more likely to hit this bug the more tightly packed your range for
  dynamic ports is. With 12000 ports in the range by default, many clusters can
  avoid this for a long time.
* You're more likely to hit case (3) for jobs with lots of allocations or if a
  scheduler has to iterate over a large number of nodes, such as with system jobs,
  jobs with `spread` blocks, or (sometimes) jobs using `unique` constraints.

For unlucky combinations of these factors, it's possible that case (3) happens
repeatedly, preventing scheduling of a given job until a client state
change (ex. restarting the agent so all its allocations are rescheduled
elsewhere) re-opens the range of dynamic ports available.

This changeset:

* Fixes the bug by accounting for collisions in dynamic port selection in
  `AssignPorts`.
* Adds test coverage for `AssignPorts`, expands coverage of this case for the
  deprecated `AssignTaskNetwork`, and tightens the dynamic port range in a
  scheduler test for spread scheduling to more easily detect this kind of problem
  in the future.
* Adds a `String()` method to `Bitmap` so that any future "screaming" log lines
  have a human-readable list of used ports.
2023-03-09 10:09:54 -05:00
Tim Gross a2ceab3d8c
scheduler: correctly detect inplace update with wildcard datacenters (#16362)
Wildcard datacenters introduced a bug where a job with any wildcard datacenters
will always be treated as a destructive update when we check whether a
datacenter has been removed from the jobspec.

Includes updating the helper so that callers don't have to loop over the job's
datacenters.
2023-03-07 10:05:59 -05:00
Luiz Aoqui 2a1a790820
client: don't emit task shutdown delay event if not waiting (#16281) 2023-03-03 18:22:06 -05:00
Michael Schurter bd7b60712e
Accept Workload Identities for Client RPCs (#16254)
This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs.

Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled.

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-02-27 10:17:47 -08:00
James Rasell 8295d0e516
acl: add validation to binding rule selector on upsert. (#16210)
* acl: add validation to binding rule selector on upsert.

* docs: add more information on binding rule selector escaping.
2023-02-17 15:38:55 +01:00
Alessio Perugini 4e9ec24b22
Allow configurable range of Job priorities (#16084) 2023-02-17 09:23:13 -05:00
Seth Hoenig 165791dd89
artifact: protect against unbounded artifact decompression (1.5.0) (#16151)
* artifact: protect against unbounded artifact decompression

Starting with 1.5.0, set defaut values for artifact decompression limits.

artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation

artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.

* artifact: assert limits cannot be nil in validation
2023-02-14 09:28:39 -06:00
Charlie Voiselle d93ba0cf32
Add warnings to var put for non-alphanumeric keys. (#15933)
* Warn when Items key isn't directly accessible

Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.

- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number

---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-02-13 16:14:59 -05:00
Seth Hoenig df3e8f82da
deps: update go-set, go-landlock (#16146)
Made a breaking change in go-set (String() signature), need to update
both these dependencies together and also fix a thing in structs.go
2023-02-13 08:26:30 -06:00
Tim Gross c2bd829fe2
tests: don't mutate global structs in core scheduler tests (#16120)
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.

Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
2023-02-10 09:26:00 -05:00
Seth Hoenig 0e7bf87ee1
deps: upgrade to hashicorp/golang-lru/v2 (#16085) 2023-02-08 15:20:33 -06:00