Commit graph

850 commits

Author SHA1 Message Date
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
KamilCuk cc64281445
Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui b770f2b1ef
node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Luiz Aoqui 45b0391378
np: implement ACL for node pools (#17365) 2023-06-01 13:03:20 -04:00
Seth Hoenig e04d8cf77b
docs: fixup example of readiness check (#17296)
A "readiness" check implies a failing healthcheck will not cause the
deployment of a service to stop - i.e. it is only used as a liveness
probe in the context of service discoverability.

Fix our docs example to reflect that a readiness check is created by
setting on_update to "ignore" (as opposed to "ignore_warnings").
2023-05-23 15:29:10 -05:00
Tim Gross b9ca3bc9b1
build: remove 386 builds for Nomad 1.6.0 (#17239)
The 32-bit Intel builds (aka "386") are not tested and likely have bugs
involving platform-sized integers when operated at any non-trivial scale. Remove
these builds from the upcoming Nomad 1.6.0 and provide recommendations in the
upgrade notes for those users who might have hobbyist boards running 32-bit
ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi).

DO NOT BACKPORT TO 1.5.x OR EARLIER!
2023-05-22 13:27:17 -04:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Tim Gross 9838349c23
document which fields can be updated by volume register (#17249)
The `volume register` command can update a small subset of the volume's fields
in-place, with some restrictions depending on whether the volume is currently in
use. Document these in the `volume register` command docs and the volume
specification docs.

Fixes: #17247
2023-05-22 09:15:25 -04:00
Tim Gross 4881f2451a
docs: describe the default Workload Identity ACL policy (#17245)
Workload Identities have an implicit default policy. This policy can't currently
be described via HCL because it includes task interpolation for Variables and
access to the Services API (which doesn't exist as its own ACL
capbility). Describe this in our WI documentation.

Fixes: #16277
2023-05-19 11:38:05 -04:00
Mike Nomitch 6df2160e69
docs: add documentation on ephemeral disk and logs (#15829) 2023-05-17 16:58:11 -04:00
Roman Zipp edf83f432a
docs: remove unneeded brackets from job specification template docs (#17219) 2023-05-17 16:45:00 -04:00
Tim Gross 6814e8e6d9
drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
Mark Lewis 1729c955d2
Update delete.mdx (#17184)
Fix typo
2023-05-15 13:31:52 +01:00
Tim Gross ba269eaf3f
docs: add note to upgrade guide about yanked version (#17115)
Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad
1.5.5. Unfortunately we can't yank the release but we should leave a note in the
upgrade guide telling users to avoid it.
2023-05-08 13:28:45 -04:00
Tim Gross 5f3ff346ea
post release 1.5.5 (#17098)
* changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9,
  and 1.3.14
* note on deprecation of `logs.enabled` field
2023-05-05 11:46:08 -04:00
Tim Gross 17bd930ca9
logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
2023-05-04 16:01:18 -04:00
Seth Hoenig 4347c1d705
docs: move CNI reference plugins installation to CNI overview page (#17068)
* docs: move CNI reference plugins installation to CNI overview page

This PR moves the instruction steps for install the CNI reference plugins
from the Consul Mesh integration page to the general Networking CNI page.

These plugins are required for bridge networking, not just Consul Mesh,
so it makes sense to have them on the general CNI page.

Closes #17038

* docs: fix a link to post install steps
2023-05-04 11:32:06 -05:00
James Rasell 50414bba12
docs: update artifact jobspec sshkey example path. (#17077) 2023-05-04 14:29:36 +01:00
Seth Hoenig e8d53ea30b
connect: use explicit docker.io prefix in default envoy image names (#17045)
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io
2023-05-02 09:27:48 -05:00
Seth Hoenig 5744b2cd4f
docs: add more notes about artifact breaking changes in 1.5.0 (#17005)
* changelog: note artifact breaking changes for 1.5.0

* docs: add note about environment variables to artifact job spec docs

* Update website/content/docs/job-specification/artifact.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-27 11:41:18 -05:00
James Rasell fddef4c6e1
docs: use appropriate file extension for autoscaler agent config. (#16993) 2023-04-27 15:00:28 +01:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Tim Gross b5a54b3b5f
docs: fix keyring path in install docs (#16946) 2023-04-20 16:20:39 -04:00
Luiz Aoqui b0fe69fded
docs: add missing field Capabilities to Namespace API (#16931) 2023-04-19 08:14:36 -07:00
Luiz Aoqui c7387dbd3a
docs: add missing API field JobACL and fix workload identity headers (#16930) 2023-04-19 08:12:58 -07:00
Chris van Meer d2f1766f3a
Updates to the UI block (#16328)
1. On the Consul address, following the recommendation for the HTTPS
   API on port 8501.
2. Add the hint to use HEX values for the colors.
2023-04-18 18:28:17 -07:00
Tim Gross 04e049caed
license: show Terminated field in license get command (#16892) 2023-04-17 09:01:43 -04:00
Tim Gross 62548616d4
client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
Michael Schurter 79c521e570
docs: add node meta command docs (#16828)
* docs: add node meta command docs

Fixes #16758

* it helps if you actually add the files to git

* fix typos and examples vs usage
2023-04-12 15:29:33 -07:00
Tim Gross 4df2d9bda8
E2E: clarify drain -deadline and -force flag behaviors (#16868)
The `-deadline` and `-force` flag for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.

This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
2023-04-12 15:27:24 -04:00
Tim Gross 657ae6f7d2
docs: document signal handling (#16835)
Expand documentation about Nomad's signal handling behaviors, including removing
incorrect information about graceful client shutdowns.
2023-04-11 16:26:39 -04:00
Seth Hoenig ba728f8f97
api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
Tim Gross 1335543731
ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows:

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
2023-04-07 16:33:45 -04:00
Tim Gross e117ff3877
docs: remove reference to vSphere from CSI concepts docs (#16765)
The vSphere plugin is exclusive to k8s because it relies on k8s-APIs (and
crashes without them being present). Upstream unfortunately will not support
Nomad, so we shouldn't refer to it in our concept docs here.
2023-04-05 15:20:24 -04:00
James Rasell cb6ba80f0f
cli: stream both stdout and stderr when following an alloc. (#16556)
This update changes the behaviour when following logs from an
allocation, so that both stdout and stderr files streamed when the
operator supplies the follow flag. The previous behaviour is held
when all other flags and situations are provided.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-04 10:42:27 +01:00
Tim Gross 0c582a2c94
docs: fix use of gpg to avoid teeing binary to terminal (#16767) 2023-04-03 10:54:21 -04:00
Tim Gross ffd5435ceb
docs: fix install instructions for apt (#16764)
The workflow described in the docs for apt installation is deprecated. Update to
match the workflow described in the Tutorials and official packaging guide.
2023-04-03 10:06:59 -04:00
Daniel Bennett c9adc22eec
Update enterprise licensing documentation (#16615)
updated various docs for new expiration behavior
and new command `nomad license inspect` to validate pre-upgrade
2023-03-30 16:40:19 -05:00
Horacio Monsalvo 20372b1721
connect: add meta on ConsulSidecarService (#16705)
Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>
2023-03-30 16:09:28 -04:00
Piotr Kazmierczak acfc266c30 acl: JWT changelog entry and typo fix 2023-03-30 09:40:11 +02:00
Piotr Kazmierczak 4609119fb5 acl: JWT auth CLI (#16532) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak a9230fb0b7 acl: JWT auth method 2023-03-30 09:39:56 +02:00
Max Fröhlich ba590b081e
docs: mention Nomad Admission Control Proxy (#16702) 2023-03-28 15:18:26 -04:00
Tim Gross f22ff2b847
docs: clarify capabilities options for docker driver (#16693)
The `docker` driver cannot expand capabilities beyond the default set when the
task is a non-root user. Clarify this in the documentation of `allow_caps` and
update the `cap_add` and `cap_drop` to match the `exec` driver, which has more
clear language overall.
2023-03-28 13:32:08 -04:00
Tim Gross 78acc75b57
docs: add notes about keyring to snapshot restore (#16663)
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
2023-03-28 08:31:01 -04:00