Commit graph

4430 commits

Author SHA1 Message Date
Tim Gross ad7355e58b
CSI: persist previous mounts on client to restore during restart (#17840)
When claiming a CSI volume, we need to ensure the CSI node plugin is running
before we send any CSI RPCs. This extends even to the controller publish RPC
because it requires the storage provider's "external node ID" for the
client. This primarily impacts client restarts but also is a problem if the node
plugin exits (and fingerprints) while the allocation that needs a CSI volume
claim is being placed.

Unfortunately there's no mapping of volume to plugin ID available in the
jobspec, so we don't have enough information to wait on plugins until we either
get the volume from the server or retrieve the plugin ID from data we've
persisted on the client.

If we always require getting the volume from the server before making the claim,
a client restart for disconnected clients will cause all the allocations that
need CSI volumes to fail. Even while connected, checking in with the server to
verify the volume's plugin before trying to make a claim RPC is inherently racy,
so we'll leave that case as-is and it will fail the claim if the node plugin
needed to support a newly-placed allocation is flapping such that the node
fingerprint is changing.

This changeset persists a minimum subset of data about the volume and its plugin
in the client state DB, and retrieves that data during the CSI hook's prerun to
avoid re-claiming and remounting the volume unnecessarily.

This changeset also updates the RPC handler to use the external node ID from the
claim whenever it is available.

Fixes: #13028
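
A minimal sketch of the persistence idea, with illustrative types and keys rather than Nomad's actual client state schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csiMountRecord is a hypothetical stand-in for the minimum subset to
// persist: enough to know which plugin to wait for and how the volume was
// mounted, without a round-trip to the server.
type csiMountRecord struct {
	VolumeID       string `json:"volume_id"`
	PluginID       string `json:"plugin_id"`
	ExternalNodeID string `json:"external_node_id"`
	MountDir       string `json:"mount_dir"`
}

// stateDB stands in for the client's persistent state store.
type stateDB map[string][]byte

func (db stateDB) putCSIMount(allocID string, rec csiMountRecord) error {
	buf, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	db["csi/"+allocID+"/"+rec.VolumeID] = buf
	return nil
}

// getCSIMount is what the CSI hook's prerun would consult on restore to
// avoid re-claiming and remounting the volume.
func (db stateDB) getCSIMount(allocID, volID string) (*csiMountRecord, bool) {
	buf, ok := db["csi/"+allocID+"/"+volID]
	if !ok {
		return nil, false // nothing persisted; claim from scratch
	}
	var rec csiMountRecord
	if err := json.Unmarshal(buf, &rec); err != nil {
		return nil, false
	}
	return &rec, true
}

func main() {
	db := stateDB{}
	_ = db.putCSIMount("alloc1", csiMountRecord{
		VolumeID: "vol0", PluginID: "aws-ebs0",
		ExternalNodeID: "i-0abc", MountDir: "/csi/per-alloc/vol0",
	})
	if rec, ok := db.getCSIMount("alloc1", "vol0"); ok {
		fmt.Printf("restored mount: %+v\n", rec)
	}
}
```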
2023-07-10 13:20:15 -04:00
Tim Gross 5025731ebe
consul: handle "not found" errors from Consul when deleting tokens (#17847)
In Consul 1.15.0, the Delete Token API was changed so as to return an error when
deleting a non-existent ACL token. This means that if Nomad successfully deletes
the token but fails to persist that fact, it will get stuck trying to delete a
non-existent token forever.

Update the token deletion function to ignore "not found" errors and treat them
as successful deletions.

Fixes: #17833
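
A sketch of the tolerant delete using the `github.com/hashicorp/consul/api` client; the matched error text is an assumption, so match whatever substring the Consul version in use actually returns:

```go
package consulcompat

import (
	"strings"

	consulapi "github.com/hashicorp/consul/api"
)

// deleteToken treats Consul's "not found" error as a successful deletion,
// so a retry loop can make progress when the token is already gone.
func deleteToken(client *consulapi.Client, accessorID string) error {
	_, err := client.ACL().TokenDelete(accessorID, nil)
	if err != nil && strings.Contains(strings.ToLower(err.Error()), "not found") {
		return nil // already deleted: don't retry forever
	}
	return err
}
```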
2023-07-07 16:22:13 -04:00
James Rasell 45073e8a05
job: ensure node pool is canonicalized for state restores. (#17765) 2023-06-30 07:37:22 +01:00
nicoche 649831c1d3
deploymentwatcher: fail early whenever possible (#17341)
Given a deployment that has a `progress_deadline`, if a task group runs
out of reschedule attempts, allow it to fail at this time instead of
waiting until the `progress_deadline` is reached.

Fixes: #17260
2023-06-26 14:01:03 -04:00
James Rasell 74ab0badb4
test: add drain config tests. (#17724) 2023-06-26 16:23:13 +01:00
Luiz Aoqui 66962b2b28
np: fix list of jobs for node pool all (#17705)
Unlike nodes, jobs are allowed to be registered in the node pool `all`,
in which case all nodes are used for evaluating placements. When listing
jobs for the `all` node pool only those that are explicitly in this node
pool should be returned.
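
A sketch of that rule with illustrative types; the built-in pool name is matched literally instead of being expanded to every job:

```go
package main

import "fmt"

type job struct {
	Name     string
	NodePool string
}

// jobsInPool returns jobs for a pool. Nodes can never belong to "all", but
// jobs can, so even for "all" only explicit membership counts.
func jobsInPool(jobs []job, pool string) []job {
	var out []job
	for _, j := range jobs {
		if j.NodePool == pool {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	jobs := []job{{"batch", "all"}, {"web", "default"}}
	fmt.Println(jobsInPool(jobs, "all")) // only [{batch all}]
}
```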
2023-06-23 15:47:53 -04:00
grembo 7936c1e33f
Add disable_file parameter to job's vault stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
James Rasell 78cdf0d0d8
server: remove unused endpoints struct. (#17665) 2023-06-23 08:20:33 +01:00
Luiz Aoqui 8f05eaaa68
np: check for license on RPC endpoints (#17656) 2023-06-22 12:52:20 -04:00
Tim Gross 11216d09af
client: send node secret with every client-to-server RPC (#16799)
In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the
request came through a client node first. But this fix knowingly broke the
identification of many client-to-server RPCs. These will now be measured as if
they were anonymous. The reason for this is that many client-to-server RPCs do
not send the node secret and instead rely on the protection of mTLS.

This changeset ensures that the node secret is being sent with every
client-to-server RPC request. In a future version of Nomad we can add
enforcement on the server side, but this was left out of this changeset to
reduce risks to the safe upgrade path.

Sending the node secret as an auth token introduces a new problem during initial
introduction of a client. Clients send many RPCs concurrently with
`Node.Register`, but until the node is registered the node secret is unknown to
the server and will be rejected as invalid. This causes permission denied
errors.

To fix that, this changeset introduces a gate on having successfully made a
`Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`,
which we need earlier but which also ignores the error because that handler
doesn't do an authorization check). This ensures that we only send requests with
a node secret already known to the server. This also makes client startup a
little easier to reason about because we know `Node.Register` must succeed
first, and it should make for a good place to hook in future plans for secure
introduction of nodes. The tradeoff is that an existing client that has running
allocs will take slightly longer (a second or two) to transition to ready after
a restart, because the transition in `Node.UpdateStatus` is gated at the server
by first submitting `Node.UpdateAlloc` with client alloc updates.
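
A minimal sketch of such a gate, with hypothetical names; Nomad's actual client wiring differs:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// rpcGate blocks outgoing RPCs until the first successful Node.Register,
// so every authenticated request carries a secret the server already knows.
type rpcGate struct {
	registeredCh chan struct{}
}

func newRPCGate() *rpcGate { return &rpcGate{registeredCh: make(chan struct{})} }

// markRegistered is called once Node.Register succeeds.
func (g *rpcGate) markRegistered() { close(g.registeredCh) }

// wait holds back all methods except the exempt ones.
func (g *rpcGate) wait(ctx context.Context, method string) error {
	switch method {
	case "Node.Register", "Status.Ping": // needed before registration
		return nil
	}
	select {
	case <-g.registeredCh:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	g := newRPCGate()
	go func() {
		time.Sleep(50 * time.Millisecond) // simulated registration latency
		g.markRegistered()
	}()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(g.wait(ctx, "Node.UpdateStatus")) // <nil> once registered
}
```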
2023-06-22 11:06:49 -04:00
James Rasell 4e2d019639
variables: remove unused state store functions. (#17660) 2023-06-22 13:54:58 +01:00
James Rasell 71fdd7e891
core: use faster concatenation for alloc name generation. (#17591) 2023-06-22 07:46:28 +01:00
Luiz Aoqui ac08fc751b
node pools: apply node pool scheduler configuration (#17598) 2023-06-21 20:31:50 -04:00
Tim Gross ff9ba8ff73
scheduler: tolerate having only one dynamic port available (#17619)
If the dynamic port range for a node is set so that the min is equal to the max,
there's only one port available and this passes config validation. But the
scheduler panics when it tries to pick a random port. Only add the randomness
when there's more than one to pick from.

Adds a test for the behavior but also adjusts the commentary on a couple of the
existing tests that made it seem like this case was already covered if you
didn't look too closely.

Fixes: #17585
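
A sketch of the guard with an illustrative helper; the point is simply to skip the random draw when the range holds a single port:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickDynamicPort chooses a port from the inclusive range [min, max]. With
// min == max there is exactly one choice, so there is nothing to randomize
// and no way to hand rand a degenerate argument.
func pickDynamicPort(min, max int) int {
	if min == max {
		return min
	}
	return min + rand.Intn(max-min+1)
}

func main() {
	fmt.Println(pickDynamicPort(20000, 20000)) // always 20000
	fmt.Println(pickDynamicPort(20000, 32000)) // random port in range
}
```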
2023-06-20 13:29:25 -04:00
Luiz Aoqui 2f5df1d8a4
test: add MultiregionMinJob mock (#17614) 2023-06-20 10:57:02 -04:00
James Rasell 86e4c6cb9d
state: move variables tests to use must library. (#17609) 2023-06-20 15:46:16 +01:00
James Rasell 68df578c73
state: remove vague scaling event schema todo item. (#17610) 2023-06-20 15:22:11 +01:00
Luiz Aoqui a56b10e857
chore: fix typo and copyright header (#17605) 2023-06-20 10:09:47 -04:00
Luiz Aoqui cfb3bb517f
np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Luiz Aoqui d5aa72190f
node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets it to `default` in the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
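
A sketch of that ordering with illustrative types: the empty value survives registration, and only the mutator fills in the default:

```go
package main

import "fmt"

type job struct {
	ID       string
	NodePool string
}

// mutateNodePool runs in the admission chain, after any (Enterprise)
// default-pool logic has had a chance to see that the pool was unset.
func mutateNodePool(j *job) {
	if j.NodePool == "" {
		j.NodePool = "default"
	}
}

func main() {
	j := &job{ID: "web"} // registered with no node pool
	mutateNodePool(j)
	fmt.Println(j.NodePool) // default
}
```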
2023-06-16 16:30:22 -04:00
Luiz Aoqui bdc7f3305f
rpc: fix log message in Node.UpdateStatus (#17537) 2023-06-14 16:51:46 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initializing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
Tim Gross 952eb2713e
node pools: protect against deleting occupied pools (#17457)
We don't want to delete node pools that have nodes or non-terminal jobs. Add a
check to the `DeleteNodePools` RPC that looks both locally and in federated regions,
similar to how we check that it's safe to delete namespaces.
2023-06-13 09:57:42 -04:00
Tim Gross e8a361310f
node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
Tim Gross bb7f0edd6a
node pools: prevent panic on upsert during upgrades (#17474)
Whenever we write a Raft log entry for node pools, we need to first make sure
that all servers can safely apply the log without panicking. Gate upsert and
delete RPCs on all servers being upgraded to the minimum version.
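
A sketch of such a gate using `github.com/hashicorp/go-version`; the constant and helper name are illustrative:

```go
package main

import (
	"fmt"

	version "github.com/hashicorp/go-version"
)

// minNodePoolVersion is an assumed placeholder for the first release whose
// FSM can apply node pool log entries.
var minNodePoolVersion = version.Must(version.NewVersion("1.6.0"))

// serversMeetMinimumVersion gates upsert/delete RPCs: they are rejected
// until every server can safely apply the resulting Raft log entry.
func serversMeetMinimumVersion(servers []*version.Version, min *version.Version) bool {
	for _, v := range servers {
		if v.LessThan(min) {
			return false
		}
	}
	return true
}

func main() {
	servers := []*version.Version{
		version.Must(version.NewVersion("1.6.0")),
		version.Must(version.NewVersion("1.5.6")), // not yet upgraded
	}
	fmt.Println(serversMeetMinimumVersion(servers, minNodePoolVersion)) // false
}
```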
2023-06-12 09:01:30 -04:00
Tim Gross e3a37c0b97
replication: fix potential panic during upgrades (#17476)
If the authoritative region has been upgraded to a version of Nomad that has new
replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the
non-authoritative regions will start replicating those objects as soon as their
leader is upgraded. If a server in the non-authoritative region is upgraded and
then becomes the leader before all the other servers in the region have been
upgraded, then it will attempt to write a Raft log entry that the followers
don't understand. The followers will then panic.

Add the same minimum version checks that we do for RPC writes to the leader's
replication loop.
2023-06-12 08:53:56 -04:00
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Tim Gross c0f2295510
node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Luiz Aoqui aa1b33d157
node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Luiz Aoqui 700168e136
node pools: fix node upsert and state mutation tests (#17430) 2023-06-05 14:58:32 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui 3a962d07f8
np: fix node pool search permission check (#17400)
When checking if a token is allowed to query the search endpoints we
need to return an error if the search context includes `node_pool` and
the token doesn't have access to _any_ pool. This prevents returning an
empty list instead of a permission denied error.
2023-06-02 12:22:47 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross 56e9b944e8
node pools: validate pool exists on job registration (#17386)
Add a new job admission hook for node pools that enforces the pool exists on
registration. Also provide the skeleton function we need for Enterprise
enforcement functions we'll implement later.
2023-06-02 09:32:07 -04:00
Luiz Aoqui f755b9469f
core: refactor task validation (#17344)
Move all validations related to task fields to Task.Validate(). Prior to
this, some task validations were being done inside TaskGroup.Validate()
because they required access to some group values.

But similarly to how TaskGroup.Validate() takes the job as a parameter,
it's fair to expect the task to receive its group.
2023-06-01 19:26:42 -04:00
Luiz Aoqui 4be8d7c049
core: fix kill_timeout validation when progress_deadline is 0 (#17342) 2023-06-01 19:01:32 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Seth Hoenig acfdf0f479
compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00
Charlie Voiselle 86e04a4c6c
[core] nil check and error handling for client status in heartbeat responses (#17316)
Add a nil check to constructNodeServerInfoResponse to manage an apparent race between deregister and client heartbeats. Fixes #17310
2023-05-25 16:04:54 -04:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non-default domain and region names. This changeset updates the
code to support non-default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Roberto Hidalgo 2f702a9f11
allow periodic jobs to use workload identity ACL policies (#17018)
When resolving ACL policies, we were not using the parent ID for the policy
lookup for dispatch/periodic jobs, even though the claims were signed for that
parent ID. As a result, all calls to the Task API (and other WI-authenticated
API calls) from a periodically-dispatched job failed with 403.

Fix this by using the parent job ID whenever it's available.
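
A sketch of the lookup rule; names are illustrative:

```go
package main

import "fmt"

// policyLookupJobID picks the job ID used to resolve workload identity ACL
// policies. Claims for dispatched/periodic children are signed with the
// parent's ID, so prefer the parent whenever it is set.
func policyLookupJobID(jobID, parentID string) string {
	if parentID != "" {
		return parentID
	}
	return jobID
}

func main() {
	// A periodic child resolves policies attached to its parent job.
	fmt.Println(policyLookupJobID("cron/periodic-1690000000", "cron")) // cron
}
```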
2023-05-22 09:19:16 -04:00
Phil Renaud 7e56ca62d1
[ui] Adds a "Scheduling" filter to the job.allocations page (#17227)
* Basic filter concept

* Make sure NextAllocation gets sent up with allocation stub
2023-05-18 16:24:41 -04:00
James Rasell 96f7c84e4e
variable: fixup metadata copy comment and remove unrequired type. (#17234) 2023-05-18 13:49:41 +01:00
Piotr Kazmierczak fe272c3686
refactor acl.UpsertTokens to avoid unnecessary RPC calls. (#17194)
New RPC endpoints introduced during OIDC and JWT auth work perform unnecessarily many RPC calls when they upsert generated ACL tokens, as pointed out by @tgross.

This PR moves the common logic from the acl.UpsertTokens method into a helper that sidesteps authentication, metrics, etc.
2023-05-16 09:31:51 +02:00
Luiz Aoqui 389212bfda
node pool: initial base work (#17163)
Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table.

Upon start the state store is populated with two built-in node pools that cannot be modified nor deleted:

  * `all` is a node pool that always includes all nodes in the cluster.
  * `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.
2023-05-15 10:49:08 -04:00
Seth Hoenig 81e36b3650
core: eliminate second index on job_submissions table (#17146)
* core: eliminate second index on job_submissions table

This PR refactors the job_submissions state store code to eliminate the
use of a second index formerly used for purging all versions of a given
job. In practice we ended up with duplicate entries on the table. Instead,
use index prefix scanning on the primary index and tidy up any potential
for creating (or removing) duplicates.

* core: pr comments followup
2023-05-11 09:51:08 -05:00
Tim Gross 9ed75e1f72
client: de-duplicate alloc updates and gate during restore (#17074)
When client nodes are restarted, all allocations that have been scheduled on the
node have their modify index updated, including terminal allocations. There are
several contributing factors:

* The `allocSync` method that updates the servers isn't gated on first contact
  with the servers. This means that if a server updates the desired state while
  the client is down, the `allocSync` races with the `Node.ClientGetAlloc`
  RPC. This will typically result in the client updating the server with "running"
  and then immediately thereafter "complete".

* The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even
  if it's possible to assert that the server has definitely seen the client
state. The allocrunner may queue up updates even if we gate sending them. So
  then we end up with a race between the allocrunner updating its internal state
  to overwrite the previous update and `allocSync` sending the bogus or duplicate
  update.

This changeset adds tracking of server-acknowledged state to the
allocrunner. This state gets checked in the `allocSync` before adding the update
to the batch, and updated when `Node.UpdateAlloc` returns successfully. To
implement this we need to be able to equality-check the updates against the last
acknowledged state. We also need to add the last acknowledged state to the
client state DB, otherwise we'd drop unacknowledged updates across restarts.

The client restart test has been expanded to cover a variety of allocation
states, including allocs stopped before shutdown, allocs stopped by the server
while the client is down, and allocs that have been completely GC'd on the
server while the client is down. I've also bench tested scenarios where the task
workload is killed while the client is down, resulting in a failed restore.

Fixes #16381
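
A minimal sketch of acknowledged-state tracking, with illustrative types; Nomad keeps this on the allocrunner and persists it in the client state DB:

```go
package main

import (
	"fmt"
	"reflect"
)

// allocUpdate stands in for the client's view of an allocation's state.
type allocUpdate struct {
	ClientStatus string
	TaskStates   map[string]string
}

// tracker remembers the last server-acknowledged update per allocation so
// allocSync can drop updates the server has definitely already seen.
type tracker struct {
	acked map[string]allocUpdate
}

func (t *tracker) needsUpdate(allocID string, u allocUpdate) bool {
	last, ok := t.acked[allocID]
	return !ok || !reflect.DeepEqual(last, u)
}

// acknowledge records state after Node.UpdateAlloc returns successfully.
func (t *tracker) acknowledge(allocID string, u allocUpdate) {
	t.acked[allocID] = u
}

func main() {
	t := &tracker{acked: map[string]allocUpdate{}}
	u := allocUpdate{ClientStatus: "complete"}
	fmt.Println(t.needsUpdate("a1", u)) // true: never acknowledged
	t.acknowledge("a1", u)
	fmt.Println(t.needsUpdate("a1", u)) // false: duplicate, skip it
}
```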
2023-05-11 09:05:24 -04:00
Seth Hoenig 74714272cc
api: set the job submission during job reversion (#17097)
* api: set the job submission during job reversion

This PR fixes a bug where the job submission would always be nil when
a job goes through a reversion to a previous version. Basically we need
to detect when this happens, lookup the submission of the job version
being reverted to, and set that as the submission of the new job being
created.

* e2e: add e2e test for job submissions during reversion

This e2e test ensures a reverted job inherits the job submission
associated with the version of the job being reverted to.
2023-05-08 14:18:34 -05:00
Daniel Bennett a7ed6f5c53
full task cleanup when alloc prerun hook fails (#17104)
to avoid leaking task resources (e.g. containers,
iptables) if allocRunner prerun fails during
restore on client restart.

now if prerun fails, TaskRunner.MarkFailedKill()
will only emit an event, mark the task as failed,
and cancel the tr's killCtx, so then ar.runTasks()
-> tr.Run() can take care of the actual cleanup.

removed from (formerly) tr.MarkFailedDead(),
now handled by tr.Run():
 * set task state as dead
 * save task runner local state
 * task stop hooks

also done in tr.Run() now that it's not skipped:
 * handleKill() to kill tasks while respecting
   their shutdown delay, and retrying as needed
   * also includes task preKill hooks
 * clearDriverHandle() to destroy the task
   and associated resources
 * task exited hooks
2023-05-08 13:17:10 -05:00
stswidwinski 9c1c2cb5d2
Correct the status description and modify time of canceled evals. (#17071)
Fix for #17070. Corrected the status description and modify time of evals which are canceled due to another eval having completed in the meantime.
2023-05-08 08:50:36 -04:00
Seth Hoenig fff2eec625
connect: use heuristic to detect sidecar task driver (#17065)
* connect: use heuristic to detect sidecar task driver

This PR adds a heuristic to detect whether to use the podman task driver
for the connect sidecar proxy. The podman driver will be selected if there
is at least one task in the task group configured to use podman, and there
are zero tasks in the group configured to use docker. In all other cases
the task driver defaults to docker.

After this change, we should be able to run typical Connect jobspecs
(e.g. nomad job init [-short] -connect) on clusters configured with the
podman task driver, without modification to the job files.

Closes #17042

* golf: cleanup driver detection logic
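
A sketch of the heuristic over plain driver names; the actual code naturally works on jobspec structs rather than strings:

```go
package main

import "fmt"

// sidecarTaskDriver picks the Connect sidecar's driver: podman if at least
// one task in the group uses podman and none use docker; docker otherwise.
func sidecarTaskDriver(taskDrivers []string) string {
	podman, docker := 0, 0
	for _, d := range taskDrivers {
		switch d {
		case "podman":
			podman++
		case "docker":
			docker++
		}
	}
	if podman >= 1 && docker == 0 {
		return "podman"
	}
	return "docker"
}

func main() {
	fmt.Println(sidecarTaskDriver([]string{"podman", "raw_exec"})) // podman
	fmt.Println(sidecarTaskDriver([]string{"podman", "docker"}))   // docker
}
```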
2023-05-05 10:19:30 -05:00
James Rasell 6ec4a69f47
scale: fixed a bug where evals could be created with wrong type. (#17092)
The job scale RPC endpoint hard-coded the eval creation to use the
`service` type. This meant scaling events triggered on jobs of
type batch would create evaluations with the wrong type, which
does not seem to cause any problems, just confusion when
correlating the two.
2023-05-05 14:46:10 +01:00
Tim Gross 17bd930ca9
logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
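
A sketch of why flipping the field works: a boolean missing from an old Raft snapshot decodes to Go's zero value, so the safe default must be whatever false means:

```go
package main

import "fmt"

type oldLogConfig struct {
	Enabled bool // zero value false => logs silently off after upgrade
}

type newLogConfig struct {
	Disabled bool // zero value false => logs stay on, the safe default
}

func main() {
	var o oldLogConfig // as if restored from a pre-upgrade snapshot
	var n newLogConfig
	fmt.Println("old: collect logs?", o.Enabled)   // false: the bug
	fmt.Println("new: collect logs?", !n.Disabled) // true: the fix
}
```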
2023-05-04 16:01:18 -04:00
Michael Schurter 3b3b02b741
dep: update from jwt/v4 to jwt/v5 (#17062)
Their release notes are here: https://github.com/golang-jwt/jwt/releases

Seemed wise to upgrade before we do even more with JWTs. For example
this upgrade *would* have mattered if we already implemented common JWT
claims such as expiration. Since we didn't rely on any claim
verification this upgrade is a noop...

...except for 1 test that called `Claims.Valid()`! Removing that
assertion *seems* scary, but it didn't actually do anything because we
didn't implement any of the standard claims it validated:

https://github.com/golang-jwt/jwt/blob/v4.5.0/map_claims.go#L120-L151

So functionally this major upgrade is a noop.
2023-05-03 11:17:38 -07:00
Luiz Aoqui 7b5a8f1fb0
Revert "hashicorp/go-msgpack v2 (#16810)" (#17047)
This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.
2023-05-01 17:18:34 -04:00
Michael Schurter d3b0bbc088
deps: update go-bexpr from 0.1.11 to 0.1.12 (#16991)
Pulls in https://github.com/hashicorp/go-bexpr/pull/38

Fixes #16758
2023-04-27 09:01:42 -07:00
James Rasell ac98c2ed40
vars: ensure struct reciever names are consistent. (#16995) 2023-04-27 13:51:11 +01:00
James Rasell 4d2c1403c2
scale: do not allow scaling of jobs with type system. (#16969) 2023-04-25 15:47:44 +01:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
valodzka 379497a484
fix host port handling for ipv6 (#16723) 2023-04-20 19:53:20 -07:00
James Rasell 367cfa6d93
rpc: use "+" concatination in hot path RPC rate limit metrics. (#16923) 2023-04-18 13:41:34 +01:00
Ian Fijolek 619f49afcf
hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate further to assess their impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's a no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
Tim Gross 62548616d4
client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
Tim Gross 5a9abdc469
drain: use client status to determine drain is complete (#14348)
If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`,
the node drain is marked as complete prematurely, even though drain monitoring
will continue to report allocation migrations. This impacts the UI or API
clients that monitor node draining to shut down nodes.

This changeset updates the behavior to wait until the client status of all
drained allocs are terminal before marking the node as done draining.
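
A sketch of the completion rule with illustrative types; the terminal statuses mirror Nomad's client statuses:

```go
package main

import "fmt"

type alloc struct {
	ClientStatus string
}

func terminal(status string) bool {
	switch status {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// drainComplete requires every drained allocation to reach a terminal
// *client* status, not merely a terminal desired status, before the node
// is marked done draining.
func drainComplete(allocs []alloc) bool {
	for _, a := range allocs {
		if !terminal(a.ClientStatus) {
			return false
		}
	}
	return true
}

func main() {
	// Still shutting down due to kill_timeout/shutdown_delay: not done.
	fmt.Println(drainComplete([]alloc{{ClientStatus: "running"}}))  // false
	fmt.Println(drainComplete([]alloc{{ClientStatus: "complete"}})) // true
}
```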
2023-04-13 08:55:28 -04:00
James Rasell b7a41fe48d
core: ensure all Server receiver names are consistent. (#16859) 2023-04-12 14:03:07 +01:00
Juana De La Cuesta 8302085384
Deployment Status Command Does Not Respect -namespace Wildcard (#16792)
* func: add namespace support for list deployment

* func: add wildcard to namespace filter for deployments

* Update deployment_endpoint.go

* style: use must instead of require or assert

* style: rename paginator to avoid clash with import

* style: add changelog entry

* fix: add missing parameter for upsert jobs
2023-04-12 11:02:14 +02:00
Tim Gross a9a350cfdb
drainer: fix codec race condition in integration test (#16845)
msgpackrpc codec handles are specific to a connection and cannot be shared
between goroutines; this can cause corrupted decoding. Fix the drainer
integration test so that we create separate codecs for the goroutines that the
test helper spins up to simulate client updates.

This changeset also refactors the drainer integration test to bring it up to
current idioms and library usages, make assertions more clear, and reduce
duplication.
2023-04-11 14:31:13 -04:00
Seth Hoenig ba728f8f97
api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional) is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
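
As a sketch, the query side of the endpoint can be exercised like this; the address, job ID, and version are assumptions for the example:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchJobSubmission calls /v1/job/<id>/submission?version=<version> and
// returns the raw response body (the originally submitted source).
func fetchJobSubmission(addr, jobID string, version int) (string, error) {
	url := fmt.Sprintf("%s/v1/job/%s/submission?version=%d", addr, jobID, version)
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	src, err := fetchJobSubmission("http://127.0.0.1:4646", "example", 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(src)
}
```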

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
Piotr Kazmierczak dea8b1a093
acl: bump JWT auth gate to 1.5.4 (#16838) 2023-04-11 10:07:45 +02:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross 1335543731
ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows (see the sketch after this list):

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
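
A sketch of the combined check with illustrative types:

```go
package main

import "fmt"

// ephemeralDisk mirrors the two jobspec fields in question.
type ephemeralDisk struct {
	Sticky  bool
	Migrate bool
}

// keepData: wherever the scheduler or client previously tested Sticky
// alone, test Sticky || Migrate so that migrate implies sticky.
func (d ephemeralDisk) keepData() bool {
	return d.Sticky || d.Migrate
}

func main() {
	d := ephemeralDisk{Migrate: true}
	fmt.Println(d.keepData()) // true even though Sticky was left false
}
```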
2023-04-07 16:33:45 -04:00
hc-github-team-nomad-core 3578078caf Prepare for next release 2023-04-05 12:31:42 -04:00
hc-github-team-nomad-core b64ee2726d Generate files for 1.5.3 release 2023-04-05 12:31:30 -04:00
Tim Gross 8278f23042 acl: fix ACL bypass for anon requests that pass thru client HTTP
Requests without an ACL token that pass through the client's HTTP API are treated
as though they come from the client itself. This allows bypass of ACLs on RPC
requests where ACL permissions are checked (like `Job.Register`). Invalid tokens
are correctly rejected.

Fix the bypass by only setting a client ID on the identity if we have a valid node secret.

Note that this changeset will break rate metrics for RPCs sent by clients
without a client secret such as `Node.GetClientAllocs`; these requests will be
recorded as anonymous.

Future work should:
* Ensure the node secret is sent with all client-driven RPCs except
  `Node.Register` which is TOFU.
* Create a new `acl.ACL` object from client requests so that we
  can enforce ACLs for all endpoints in a uniform way that's less error-prone.
2023-04-05 12:17:51 -04:00
Juana De La Cuesta 9b4871fece
Prevent kill_timeout greater than progress_deadline (#16761)
* func: add validation for kill timeout smaller than progress deadline

* style: add changelog

* style: typo in changelog

* style: remove refactored test

* Update .changelog/16761.txt

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

* Update nomad/structs/structs.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-04-04 18:17:10 +02:00
Juana De La Cuesta 1fc13b83d8
style: update documentation (#16729) 2023-03-31 16:38:16 +02:00
Daniel Bennett c42950e342
ent: move all license info into LicenseConfig{} (#16738)
and add new TestConfigForServer() to get a
valid nomad.Config to use in tests

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-03-30 16:15:05 -05:00
Horacio Monsalvo 20372b1721
connect: add meta on ConsulSidecarService (#16705)
Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>
2023-03-30 16:09:28 -04:00
Piotr Kazmierczak 1a5eba24a6 acl: set minACLJWTAuthMethodVersion to 1.5.3 and adjust code comment 2023-03-30 15:30:42 +02:00
Piotr Kazmierczak d98c8f6759 acl: rebased on main and changed the gate to 1.5.3-dev 2023-03-30 09:40:12 +02:00
Piotr Kazmierczak 16b6bd9ff2 acl: fix canonicalization of JWT auth method mock (#16531) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak 2b353902a1 acl: HTTP endpoints for JWT auth (#16519) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak e48c48e89b acl: RPC endpoints for JWT auth (#15918) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak a9230fb0b7 acl: JWT auth method 2023-03-30 09:39:56 +02:00
Juana De La Cuesta 320884b8ee
Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true (#16583)
* Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true
Fixes #11052
When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children.

* style: refactor force run function

* fix: remove defer and inline unlock for speed optimization

* Update nomad/leader.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

* Update nomad/leader_test.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>

* style: refactor tests to use must

* fix: move back from defer to calling unlock before returning.

createEval can't be called while holding the lock

* style: refactor test to use must

* added new entry to changelog and update comments

---------

Co-authored-by: James Rasell <jrasell@hashicorp.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-03-27 17:25:05 +02:00
Juana De La Cuesta 21b675244e
style: rename ForceRun to ForceEval, for clarity (#16617) 2023-03-27 15:38:48 +02:00
Tim Gross 977c88dcea
drainer: test refactoring to clarify behavior around delete/down nodes (#16612)
This changeset refactors the tests of the draining node watcher so that we don't
mock the node watcher's `Remove` and `Update` methods for its own tests. Instead
we'll mock the node watcher's dependencies (the job watcher and deadline
notifier) and now unit tests can cover the real code. This allows us to remove a
bunch of TODOs in `watch_nodes.go` around testing and clarify some important
behaviors:

* Nodes that are down or disconnected will still be watched until the scheduler
  decides what to do with their allocations. This will drive the job watcher but
  not the node watcher, and that lets the node watcher gracefully handle cases
  where a heartbeat fails but the node heartbeats again before its allocs can be
  evicted.

* Stop watching nodes that have been deleted. The blocking query for nodes set
  the maximum index to the highest index of a node it found, rather than the
  index of the nodes table. This misses updates to the index from deleting
nodes. This was done as a performance optimization to avoid excessive
unblocking, but because the query is over all nodes anyway there's no
  optimization to be had here. Remove the optimization so we can detect deleted
  nodes without having to wait for an update to an unrelated node.
2023-03-23 14:07:09 -04:00
Michael Schurter f8884d8b52
client/metadata: fix crasher caused by AllowStale = false (#16549)
Fixes #16517

Given a 3 Server cluster with at least 1 Client connected to Follower 1:

If a NodeMeta.{Apply,Read} for the Client request is received by
Follower 1 with `AllowStale = false` the Follower will forward the
request to the Leader.

The Leader, not being connected to the target Client, will forward the
RPC to Follower 1.

Follower 1, seeing AllowStale=false, will forward the request to the
Leader.

The Leader, not being connected to... well, hopefully you get the
picture: an infinite loop occurs.
2023-03-20 16:32:32 -07:00
Seth Hoenig d6dcc53c0a
tls enforcement flaky tests (#16543)
* tests: add WaitForLeaders helpers using must/wait timings

* tests: start servers for mtls tests together

Fixes #16253 (hopefully)
2023-03-17 14:11:13 -05:00
Piotr Kazmierczak 14927e93bc
acl: fix canonicalization of OIDC auth method mock (#16534) 2023-03-17 15:37:54 +01:00
Seth Hoenig ed7177de76
scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421)
* scheduler: annotate tasksUpdated with reason and purge DeepEquals

* cr: move opaque into helper

* cr: swap affinity/spread hashing for slice equal

* contributing: update checklist-jobspec with notes about struct methods

* cr: add more cases to wait config equal method

* cr: use reflect when comparing envoy config blocks

* cl: add cl
2023-03-14 09:46:00 -05:00
Luiz Aoqui adf147cb36
acl: update job eval requirement to submit-job (#16463)
The job evaluate endpoint creates a new evaluation for the job which is
a write operation. This change modifies the necessary capability from
`read-job` to `submit-job` to better reflect this.
2023-03-13 17:13:54 -04:00
Tim Gross 9dfb51579c
scheduler: refactor system util tests (#16416)
The tests for the system allocs reconciling code path (`diffSystemAllocs`)
include many impossible test environments, such as passing allocs for the wrong
node into the function. This makes the test assertions unhelpful for walking
yourself through the correct behavior.

I've pulled this changeset out of PR #16097 so that we can merge these
improvements and revisit the right approach to fix the problem in #16097 with
less urgency now that the PFNR bug fix has been merged. This changeset breaks up
a couple of tests, expands test coverage, and makes test assertions more
clear. It also corrects one bit of production code that behaves fine in
production because of canonicalization, but forces us to remember to set values
in tests to compensate.
2023-03-13 11:59:31 -04:00
Seth Hoenig 630bd8eb68
scheduler: add simple benchmark for tasksUpdated (#16422)
In preparation for some refactoring to tasksUpdated, add a benchmark to the
old code so it's easy to compare with the changes, making sure nothing goes
off the rails for performance.
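
A minimal sketch of the benchmark's shape with stand-in types; the real one targets the scheduler's `tasksUpdated`:

```go
package scheduler

import (
	"reflect"
	"testing"
)

// taskGroup and tasksUpdated are stand-ins so the benchmark's shape is
// clear without pulling in Nomad's structs.
type taskGroup struct{ Tasks []string }

func tasksUpdated(a, b *taskGroup) bool { return !reflect.DeepEqual(a, b) }

// Run with: go test -bench=TasksUpdated -benchmem
func BenchmarkTasksUpdated(b *testing.B) {
	g1 := &taskGroup{Tasks: []string{"web", "connect-proxy"}}
	g2 := &taskGroup{Tasks: []string{"web", "connect-proxy"}}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = tasksUpdated(g1, g2)
	}
}
```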
2023-03-13 10:44:14 -05:00
hc-github-team-nomad-core 2d1a4d90e9 Prepare for next release 2023-03-13 11:13:27 -04:00
hc-github-team-nomad-core 35167e692a Generate files for 1.5.1 release 2023-03-13 11:13:27 -04:00