Commit graph

4374 commits

Author SHA1 Message Date
grembo 7936c1e33f
Add disable_file parameter to job's vault stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
James Rasell 78cdf0d0d8
server: remove unused endpoints struct. (#17665) 2023-06-23 08:20:33 +01:00
Luiz Aoqui 8f05eaaa68
np: check for license on RPC endpoints (#17656) 2023-06-22 12:52:20 -04:00
Tim Gross 11216d09af
client: send node secret with every client-to-server RPC (#16799)
In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the
request came thru a client node first. But this fix broke (knowingly) the
identification of many client-to-server RPCs. These will be now measured as if
they were anonymous. The reason for this is that many client-to-server RPCs do
not send the node secret and instead rely on the protection of mTLS.

This changeset ensures that the node secret is being sent with every
client-to-server RPC request. In a future version of Nomad we can add
enforcement on the server side, but this was left out of this changeset to
reduce risks to the safe upgrade path.

Sending the node secret as an auth token introduces a new problem during initial
introduction of a client. Clients send many RPCs concurrently with
`Node.Register`, but until the node is registered the node secret is unknown to
the server and will be rejected as invalid. This causes permission denied
errors.

To fix that, this changeset introduces a gate on having successfully made a
`Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`,
which we need earlier but which also ignores the error because that handler
doesn't do an authorization check). This ensures that we only send requests with
a node secret already known to the server. This also makes client startup a
little easier to reason about because we know `Node.Register` must succeed
first, and it should make for a good place to hook in future plans for secure
introduction of nodes. The tradeoff is that an existing client that has running
allocs will take slightly longer (a second or two) to transition to ready after
a restart, because the transition in `Node.UpdateStatus` is gated at the server
by first submitting `Node.UpdateAlloc` with client alloc updates.
2023-06-22 11:06:49 -04:00
James Rasell 4e2d019639
variables: remove unused state store functions. (#17660) 2023-06-22 13:54:58 +01:00
James Rasell 71fdd7e891
core: use faster concatenation for alloc name generation. (#17591) 2023-06-22 07:46:28 +01:00
Luiz Aoqui ac08fc751b
node pools: apply node pool scheduler configuration (#17598) 2023-06-21 20:31:50 -04:00
Tim Gross ff9ba8ff73
scheduler: tolerate having only one dynamic port available (#17619)
If the dynamic port range for a node is set so that the min is equal to the max,
there's only one port available and this passes config validation. But the
scheduler panics when it tries to pick a random port. Only add the randomness
when there's more than one to pick from.

Adds a test for the behavior but also adjusts the commentary on a couple of the
existing tests that made it seem like this case was already covered if you
didn't look too closely.

Fixes: #17585
2023-06-20 13:29:25 -04:00
Luiz Aoqui 2f5df1d8a4
test: add MultiregionMinJob mock (#17614) 2023-06-20 10:57:02 -04:00
James Rasell 86e4c6cb9d
state: move variables tests to use must library. (#17609) 2023-06-20 15:46:16 +01:00
James Rasell 68df578c73
state: remove vague scaling event schema todo item. (#17610) 2023-06-20 15:22:11 +01:00
Luiz Aoqui a56b10e857
chore: fix typo and copyright header (#17605) 2023-06-20 10:09:47 -04:00
Luiz Aoqui cfb3bb517f
np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Luiz Aoqui d5aa72190f
node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
Luiz Aoqui bdc7f3305f
rpc: fix log message in Node.UpdateStatus (#17537) 2023-06-14 16:51:46 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initialing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
Tim Gross 952eb2713e
node pools: protect against deleting occupied pools (#17457)
We don't want to delete node pools that have nodes or non-terminal jobs. Add a
check in the `DeleteNodePools` RPC to check locally and in federated regions,
similar to how we check that it's safe to delete namespaces.
2023-06-13 09:57:42 -04:00
Tim Gross e8a361310f
node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
Tim Gross bb7f0edd6a
node pools: prevent panic on upsert during upgrades (#17474)
Whenever we write a Raft log entry for node pools, we need to first make sure
that all servers can safely apply the log without panicking. Gate upsert and
delete RPCs on all servers being upgraded to the minimum version.
2023-06-12 09:01:30 -04:00
Tim Gross e3a37c0b97
replication: fix potential panic during upgrades (#17476)
If the authoritative region has been upgraded to a version of Nomad that has new
replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the
non-authoritative regions will start replicating those objects as soon as their
leader is upgraded. If a server in the non-authoritative region is upgraded and
then becomes the leader before all the other servers in the region have been
upgraded, then it will attempt to write a Raft log entry that the followers
don't understand. The followers will then panic.

Add same the minimum version checks that we do for RPC writes to the leader's
replication loop.
2023-06-12 08:53:56 -04:00
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Tim Gross c0f2295510
node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Luiz Aoqui aa1b33d157
node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Luiz Aoqui 700168e136
node pools: fix node upsert and state mutation tests (#17430) 2023-06-05 14:58:32 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui 3a962d07f8
np: fix node pool search permission check (#17400)
When checking if a token is allowed to query the search endpoints we
need to return an error if the search context includes `node_pool` and
the token doesn't have access to _any_ pool. This prevents returning an
empty list instead of a permission denied error.
2023-06-02 12:22:47 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross 56e9b944e8
node pools: validate pool exists on job registration (#17386)
Add a new job admission hook for node pools that enforces the pool exists on
registration. Also provide the skeleton function we need for Enterprise
enforcement functions we'll implement later.
2023-06-02 09:32:07 -04:00
Luiz Aoqui f755b9469f
core: refactor task validation (#17344)
Move all validations related to task fields to Task.Validate(). Prior to
this, some task validations were being done inside TaskGroup.Validate()
because they required access to some group values.

But similarly to how TaskGroup.Validate() tasks the job as parameter,
it's fair to expect the task to receive its group.
2023-06-01 19:26:42 -04:00
Luiz Aoqui 4be8d7c049
core: fix kill_timeout validation when progress_deadline is 0 (#17342) 2023-06-01 19:01:32 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Seth Hoenig acfdf0f479
compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00
Charlie Voiselle 86e04a4c6c
[core] nil check and error handling for client status in heartbeat responses (#17316)
Add a nil check to constructNodeServerInfoResponse to manage an apparent race between deregister and client heartbeats. Fixes #17310
2023-05-25 16:04:54 -04:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Roberto Hidalgo 2f702a9f11
allow periodic jobs to use workload identity ACL policies (#17018)
When resolving ACL policies, we were not using the parent ID for the policy
lookup for dispatch/periodic jobs, even though the claims were signed for that
parent ID. This prevents all calls to the Task API (and other WI-authenticated
API calls) from a periodically-dispatched job failing with 403.

Fix this by using the parent job ID whenever it's available.
2023-05-22 09:19:16 -04:00
Phil Renaud 7e56ca62d1
[ui] Adds a "Scheduling" filter to the job.allocations page (#17227)
* Basic filter concept

* Make sure NextAllocation gets sent up with allocation stub
2023-05-18 16:24:41 -04:00
James Rasell 96f7c84e4e
variable: fixup metadata copy comment and remove unrequired type. (#17234) 2023-05-18 13:49:41 +01:00
Piotr Kazmierczak fe272c3686
refactor acl.UpsertTokens to avoid unnecessary RPC calls. (#17194)
New RPC endpoints introduced during OIDC and JWT auth perform unnecessary many RPC calls when they upsert generated ACL tokens, as pointed out by @tgross.

This PR moves the common logic from acl.UpsertTokens method into a helper method that contains common logic, and sidesteps authentication, metrics, etc.
2023-05-16 09:31:51 +02:00
Luiz Aoqui 389212bfda
node pool: initial base work (#17163)
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.

Upon start the state store is populated with two built-in node pools that cannot be modified nor deleted:

  * `all` is a node pool that always includes all nodes in the cluster.
  * `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
2023-05-15 10:49:08 -04:00
Seth Hoenig 81e36b3650
core: eliminate second index on job_submissions table (#17146)
* core: eliminate second index on job_submissions table

This PR refactors the job_submissions state store code to eliminate the
use of a second index formerly used for purging all versions of a given
job. In practice we ended up with duplicate entries on the table. Instead,
use index prefix scanning on the primary index and tidy up any potential
for creating (or removing) duplicates.

* core: pr comments followup
2023-05-11 09:51:08 -05:00
Tim Gross 9ed75e1f72
client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
  state. The allocrunner may queue-up updates even if we gate sending them. So
  then we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or duplicate
  update.

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in the `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
2023-05-11 09:05:24 -04:00
Seth Hoenig 74714272cc
api: set the job submission during job reversion (#17097)
* api: set the job submission during job reversion

This PR fixes a bug where the job submission would always be nil when
a job goes through a reversion to a previous version. Basically we need
to detect when this happens, lookup the submission of the job version
being reverted to, and set that as the submission of the new job being
created.

* e2e: add e2e test for job submissions during reversion

This e2e test ensures a reverted job inherits the job submission
associated with the version of the job being reverted to.
2023-05-08 14:18:34 -05:00
Daniel Bennett a7ed6f5c53
full task cleanup when alloc prerun hook fails (#17104)
to avoid leaking task resources (e.g. containers,
iptables) if allocRunner prerun fails during
restore on client restart.

now if prerun fails, TaskRunner.MarkFailedKill()
will only emit an event, mark the task as failed,
and cancel the tr's killCtx, so then ar.runTasks()
-> tr.Run() can take care of the actual cleanup.

removed from (formerly) tr.MarkFailedDead(),
now handled by tr.Run():
 * set task state as dead
 * save task runner local state
 * task stop hooks

also done in tr.Run() now that it's not skipped:
 * handleKill() to kill tasks while respecting
   their shutdown delay, and retrying as needed
   * also includes task preKill hooks
 * clearDriverHandle() to destroy the task
   and associated resources
 * task exited hooks
2023-05-08 13:17:10 -05:00
stswidwinski 9c1c2cb5d2
Correct the status description and modify time of canceled evals. (#17071)
Fix for #17070. Corrected the status description and modify time of evals which are canceled due to another eval having completed in the meantime.
2023-05-08 08:50:36 -04:00
Seth Hoenig fff2eec625
connect: use heuristic to detect sidecar task driver (#17065)
* connect: use heuristic to detect sidecar task driver

This PR adds a heuristic to detect whether to use the podman task driver
for the connect sidecar proxy. The podman driver will be selected if there
is at least one task in the task group configured to use podman, and there
are zero tasks in the group configured to use docker. In all other cases
the task driver defaults to docker.

After this change, we should be able to run typical Connect jobspecs
(e.g. nomad job init [-short] -connect) on Clusters configured with the
podman task driver, without modification to the job files.

Closes #17042

* golf: cleanup driver detection logic
2023-05-05 10:19:30 -05:00
James Rasell 6ec4a69f47
scale: fixed a bug where evals could be created with wrong type. (#17092)
The job scale RPC endpoint hard-coded the eval creation to use the
type of service. This meant scaling events triggered on jobs of
type batch would create evaluations with the wrong type, which
does not seem to cause any problems, just confusion when
correlating the two.
2023-05-05 14:46:10 +01:00