Commit Graph

3595 Commits

Author SHA1 Message Date
Nando 66809615f4 volume-status : show namespace the volume belongs to (#17911)
* volume-status : show namespace the volume belongs to

Re-apply changes reverted by 950235df48869e0f3f1dc8950dc430394ababa85
2023-08-17 11:13:42 -04:00
Tim Gross 0a19fe3b60 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:39:09 -04:00
Seth Hoenig a45b689d8e update go1.21 (#18184)
* build: update to go1.21

* go: eliminate helpers in favor of min/max

* build: run go mod tidy

* build: swap depguard for semgrep

* command: fixup broken tls error check on go1.21
2023-08-15 14:40:33 +02:00
Esteban Barrios 9f19d7c373 config: add configurable content security policy (#18085) 2023-08-14 14:25:21 -04:00
hc-github-team-nomad-core f812bccb4e
Backport of Tuning job versions retention. #17635 into release/1.6.x (#18169)
This pull request was automerged via backport-assistant
2023-08-07 13:48:09 -05:00
hc-github-team-nomad-core d3529d7be6
Backport of CLI: make snapshot name requiered in creating volume snapshots into release/1.6.x (#18152)
This pull request was automerged via backport-assistant
2023-08-04 04:36:50 -05:00
hc-github-team-nomad-core 3b076edf11
Backport of cli: search all namespaces for node volumes into release/1.6.x (#18119)
This pull request was automerged via backport-assistant
2023-08-01 08:56:34 -05:00
Tim Gross 9fe88ebefe cli: support wildcard namespace in alloc subcommands (#18095)
The alloc exec and filesystem/logs commands allow passing the `-job` flag to
select a random allocation. If the namespace for the command is set to `*`, the
RPC handler doesn't handle this correctly as it's expecting to query for a
specific job. Most commands handle this ambiguity by first verifying that only a
single object of the type in question exists (ex. a single node or job).

Update these commands so that when the `-job` flag is set we first verify
there's a single job that matches. This also allows us to extend the
functionality to allow for the `-job` flag to support prefix matching.

Fixes: #12097
2023-07-31 13:15:49 -04:00
hc-github-team-nomad-core 2ed92e0c6c
Backport of feature: Add new field render_templates on restart block into release/1.6.x (#18094)
This pull request was automerged via backport-assistant
2023-07-28 13:54:00 -05:00
hc-github-team-nomad-core 34ac0e5aad
cli: add help message for `-consul-namespace` (#18081) (#18091)
Add missing help entry for the `-consul-namespace` flag in `nomad job
run`.
2023-07-28 10:34:44 -04:00
James Rasell b8cb1e79a3
chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968) (#18074)
Co-authored-by: Ville Vesilehto <ville@vesilehto.fi>
2023-07-26 16:38:39 +01:00
hc-github-team-nomad-core cf18df8eb4
backport of commit 14102979762cc48183cd70dc91e26c08f630ab9d (#18067)
This pull request was automerged via backport-assistant
2023-07-26 08:30:35 -05:00
hc-github-team-nomad-core b4c4dcb818
backport of commit b7d14f133c69a64e39c40417705d29b6f2b96f60 (#18065)
This pull request was automerged via backport-assistant
2023-07-26 08:23:49 -05:00
James Rasell 40549e1132
check in stderrFrame is nil before logging stderrFrame.Data (#17815) (#18041)
Co-authored-by: Kevin Mulvey <kmulvey@linux.com>
2023-07-24 10:32:10 +01:00
hc-github-team-nomad-core 88ea0c3cc2 Generate files for 1.6.1 release 2023-07-21 13:49:42 +00:00
hc-github-team-nomad-core 3011314f23
Backport of volume-status : show namespace the volume belongs to into release/1.6.x (#17997)
This pull request was automerged via backport-assistant
2023-07-19 15:37:18 -05:00
hc-github-team-nomad-core 609a97cfab Generate files for 1.6.0 release 2023-07-18 18:51:11 +00:00
hc-github-team-nomad-core 1a1e1d5d4d Generate files for 1.6.0-rc.1 release 2023-07-11 15:19:54 +00:00
hc-github-team-nomad-core 0951fe1c50
backport of commit 0a5e90120b18ff450457463d6bcee68ec6804bb0 (#17900)
This pull request was automerged via backport-assistant
2023-07-11 10:00:05 -05:00
Lance Haig 0455389534
Add the ability to customise the details of the CA (#17309)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-07-11 08:53:09 +01:00
hc-github-team-nomad-core 5c703a49b1 Generate files for 1.6.0-beta.1 release 2023-06-28 11:06:20 -04:00
Tim Gross 926b3030d7
cli: fix broken `node pool jobs` test (#17715)
In #17705 we fixed a bug in the treatment of the "all" node pool for the `node
pool jobs` command but missed a test in the CLI.
2023-06-23 14:10:45 -07:00
grembo 7936c1e33f
Add `disable_file` parameter to job's `vault` stanza (#13343)
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using 
`image` filesystem isolation. As a result, more powerful tokens can be used 
in a job definition, allowing it to use template stanzas to issue all kinds of 
secrets (database secrets, Vault tokens with very specific policies, etc.), 
without sharing that issuing power with the task itself.

This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.

If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
2023-06-23 15:15:04 -04:00
Phil Renaud 16886bf6bf
Moves to the current LTS release of Node for our build and release workflows (#17639) 2023-06-21 15:17:24 -04:00
Luiz Aoqui cfb3bb517f
np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Luiz Aoqui d07f9ae2fe
cli: prevent panic if job node pool is nil (#17571)
If the `nomad` CLI is used to access a cluster running a version that
does not include node pools the command will `nil` panic when trying to
resolve the job's node pool.
2023-06-16 17:08:36 -04:00
Luiz Aoqui d5aa72190f
node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
Tim Gross 5b9322c70a
docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross 5f509b8ce0
cli: fix missing `-quiet` flag for `var init` (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross 736ad3ed32
docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Tim Gross c1a01697c8
node pools: implement `node pool init` command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initialing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
stswidwinski 9a58474400
conf: Add preemption_config to the server extra HCL keys which should be removed (#17481)
Add preemption_config to the set of keys which should be pruned from the server
config as described in #17480.
2023-06-13 10:48:19 +02:00
Tim Gross e8a361310f
node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
Phil Renaud 944f30674d
[ui] Parallelize ember tests (#17442)
* Exam to parallelize tests

* Logging to try to solve test flakiness

* Logging in another failure

* Hardening for one test and snapshot for another

* Explicitly set the first one as the servicedAlloc instead of randomly picking

* A wild CircleCI test failure appears

* de-log
2023-06-07 17:01:35 -04:00
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui 5878113c41
node pool: implement `nomad node pool nodes` CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross 06fc284644
node pools: implement CLI for `node pool jobs` command (#17432) 2023-06-06 15:02:26 -04:00
Tim Gross c0f2295510
node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Dao Thanh Tung 7c7f2d00bb
Add check for missing `path` in client `host_volume` config (#17393) 2023-06-05 19:31:19 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui b770f2b1ef
node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross 4f14fa0518
node pools: add `node_pool` field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Luiz Aoqui 6236cb8f82
cli: output errors when monitoring deployment (#17348) 2023-05-30 11:12:12 -04:00
Luiz Aoqui e236d6dedd
cli: fix panic on job restart (#17346)
When monitoring the replacement allocation, if the
`Allocations().Info()` request fails, the `alloc` variable is `nil`, so
it should not be read.
2023-05-30 11:08:49 -04:00
Luiz Aoqui bb2395031b
client: fix Consul version finterprint (#17349)
Consul v1.13.8 was released with a breaking change in the /v1/agent/self
endpoint version where a line break was being returned.

This caused the Nomad finterprint to fail because `NewVersion` errors on
parse.

This commit removes any extra space from the Consul version returned by
the API.
2023-05-30 11:07:57 -04:00
Seth Hoenig acfdf0f479
compliance: add headers with fixed copywrite tool (#17353)
Closes #17117
2023-05-30 09:20:32 -05:00