Commit graph

4439 commits

Author SHA1 Message Date
Tim Gross b9ca3bc9b1
build: remove 386 builds for Nomad 1.6.0 (#17239)
The 32-bit Intel builds (aka "386") are not tested and likely have bugs
involving platform-sized integers when operated at any non-trivial scale. Remove
these builds from the upcoming Nomad 1.6.0 and provide recommendations in the
upgrade notes for those users who might have hobbyist boards running 32-bit
ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi).

DO NOT BACKPORT TO 1.5.x OR EARLIER!
2023-05-22 13:27:17 -04:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non-default domain and region names. This changeset updates the
code to support non-default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Tim Gross 9838349c23
document which fields can be updated by volume register (#17249)
The `volume register` command can update a small subset of the volume's fields
in-place, with some restrictions depending on whether the volume is currently in
use. Document these in the `volume register` command docs and the volume
specification docs.

Fixes: #17247
2023-05-22 09:15:25 -04:00
Tim Gross 4881f2451a
docs: describe the default Workload Identity ACL policy (#17245)
Workload Identities have an implicit default policy. This policy can't currently
be described via HCL because it includes task interpolation for Variables and
access to the Services API (which doesn't exist as its own ACL
capability). Describe this in our WI documentation.

Fixes: #16277
2023-05-19 11:38:05 -04:00
Mike Nomitch 6df2160e69
docs: add documentation on ephemeral disk and logs (#15829) 2023-05-17 16:58:11 -04:00
Roman Zipp edf83f432a
docs: remove unneeded brackets from job specification template docs (#17219) 2023-05-17 16:45:00 -04:00
dependabot[bot] 7a92c7b5ac
build(deps-dev): bump prettier from 2.2.1 to 2.8.8 in /website (#16965)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-16 12:12:53 -05:00
Tim Gross 6814e8e6d9
drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
dependabot[bot] d980e0a815
build(deps-dev): bump @hashicorp/platform-content-conformance (#17030)
Bumps @hashicorp/platform-content-conformance from 0.0.10 to 0.0.11.

---
updated-dependencies:
- dependency-name: "@hashicorp/platform-content-conformance"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 11:28:03 -04:00
dependabot[bot] f49eb3278b
build(deps-dev): bump next from 12.3.1 to 13.4.2 in /website (#17177)
Bumps [next](https://github.com/vercel/next.js) from 12.3.1 to 13.4.2.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v12.3.1...v13.4.2)

---
updated-dependencies:
- dependency-name: next
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 10:46:40 -04:00
Mark Lewis 1729c955d2
Update delete.mdx (#17184)
Fix typo
2023-05-15 13:31:52 +01:00
Tim Gross ba269eaf3f
docs: add note to upgrade guide about yanked version (#17115)
Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad
1.5.5. Unfortunately we can't yank the release but we should leave a note in the
upgrade guide telling users to avoid it.
2023-05-08 13:28:45 -04:00
Tim Gross 5f3ff346ea
post release 1.5.5 (#17098)
* changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9,
  and 1.3.14
* note on deprecation of `logs.enabled` field
2023-05-05 11:46:08 -04:00
Tim Gross 17bd930ca9
logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.
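
The zero-value problem described above can be illustrated with a minimal Go
sketch. The type and method names here (`LogConfigOld`, `LogConfigNew`,
`Collects`) are hypothetical and are not Nomad's actual structs; the sketch only
shows why a `Disabled` field is safe when a struct is restored without the field
set.

```go
package main

import "fmt"

// LogConfigOld is a hypothetical stand-in for the old shape: the zero value
// of Enabled is false, so a restored struct with no value silently turns
// log collection off.
type LogConfigOld struct {
	Enabled bool
}

// LogConfigNew is a hypothetical stand-in for the new shape: the zero value
// of Disabled is false, which means "collect logs".
type LogConfigNew struct {
	Disabled bool
}

// Collects reports whether logs would be collected under the old field.
func (c LogConfigOld) Collects() bool { return c.Enabled }

// Collects reports whether logs would be collected under the new field.
func (c LogConfigNew) Collects() bool { return !c.Disabled }

func main() {
	// Both values simulate a struct loaded from storage with the field unset.
	var restoredOld LogConfigOld
	var restoredNew LogConfigNew

	fmt.Println("old field collects logs:", restoredOld.Collects())
	fmt.Println("new field collects logs:", restoredNew.Collects())
}
```

With the inverted field, no default-setting guard has to run for the restored
value to behave correctly.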

Fixes #17076
2023-05-04 16:01:18 -04:00
Seth Hoenig 4347c1d705
docs: move CNI reference plugins installation to CNI overview page (#17068)
* docs: move CNI reference plugins installation to CNI overview page

This PR moves the instruction steps for installing the CNI reference plugins
from the Consul Mesh integration page to the general Networking CNI page.

These plugins are required for bridge networking, not just Consul Mesh,
so it makes sense to have them on the general CNI page.

Closes #17038

* docs: fix a link to post install steps
2023-05-04 11:32:06 -05:00
James Rasell 50414bba12
docs: update artifact jobspec sshkey example path. (#17077) 2023-05-04 14:29:36 +01:00
Seth Hoenig e8d53ea30b
connect: use explicit docker.io prefix in default envoy image names (#17045)
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io
2023-05-02 09:27:48 -05:00
Seth Hoenig 5744b2cd4f
docs: add more notes about artifact breaking changes in 1.5.0 (#17005)
* changelog: note artifact breaking changes for 1.5.0

* docs: add note about environment variables to artifact job spec docs

* Update website/content/docs/job-specification/artifact.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-27 11:41:18 -05:00
James Rasell fddef4c6e1
docs: use appropriate file extension for autoscaler agent config. (#16993) 2023-04-27 15:00:28 +01:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Tim Gross b5a54b3b5f
docs: fix keyring path in install docs (#16946) 2023-04-20 16:20:39 -04:00
Luiz Aoqui b0fe69fded
docs: add missing field Capabilities to Namespace API (#16931) 2023-04-19 08:14:36 -07:00
Luiz Aoqui c7387dbd3a
docs: add missing API field JobACL and fix workload identity headers (#16930) 2023-04-19 08:12:58 -07:00
Chris van Meer d2f1766f3a
Updates to the UI block (#16328)
1. On the Consul address, following the recommendation for the HTTPS
   API on port 8501.
2. Add the hint to use HEX values for the colors.
2023-04-18 18:28:17 -07:00
Tim Gross 04e049caed
license: show Terminated field in license get command (#16892) 2023-04-17 09:01:43 -04:00
Tim Gross 62548616d4
client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
Michael Schurter 79c521e570
docs: add node meta command docs (#16828)
* docs: add node meta command docs

Fixes #16758

* it helps if you actually add the files to git

* fix typos and examples vs usage
2023-04-12 15:29:33 -07:00
Tim Gross 4df2d9bda8
E2E: clarify drain -deadline and -force flag behaviors (#16868)
The `-deadline` and `-force` flags for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.

This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
2023-04-12 15:27:24 -04:00
Tim Gross 657ae6f7d2
docs: document signal handling (#16835)
Expand documentation about Nomad's signal handling behaviors, including removing
incorrect information about graceful client shutdowns.
2023-04-11 16:26:39 -04:00
Seth Hoenig ba728f8f97
api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
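
The size limit and pruning rules described in this commit can be sketched
roughly as follows. This is an illustration only: `Submission`, `acceptSource`,
and `pruneSubmissions` are hypothetical names, the limit is assumed to be
inclusive and measured in bytes, and the slice is assumed to be ordered oldest
first. None of this is taken from Nomad's implementation.

```go
package main

import "fmt"

// Submission is a hypothetical stand-in for a stored job submission.
type Submission struct {
	Version uint64
	Source  string
}

// acceptSource returns the source to store and whether it was kept.
// A limit of 0 drops all source material; otherwise sources larger than
// maxSize bytes are dropped (inclusive limit is an assumption).
func acceptSource(source string, maxSize int) (string, bool) {
	if maxSize <= 0 || len(source) > maxSize {
		return "", false
	}
	return source, true
}

// pruneSubmissions keeps only the newest `keep` entries, assuming subs is
// ordered from oldest to newest.
func pruneSubmissions(subs []Submission, keep int) []Submission {
	if len(subs) <= keep {
		return subs
	}
	return subs[len(subs)-keep:]
}

func main() {
	_, ok := acceptSource(`job "web" {}`, 0)
	fmt.Println("stored with limit 0:", ok)

	var subs []Submission
	for v := uint64(0); v < 8; v++ {
		subs = append(subs, Submission{Version: v})
	}
	kept := pruneSubmissions(subs, 6)
	fmt.Println("kept:", len(kept), "oldest kept version:", kept[0].Version)
}
```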
2023-04-11 08:45:08 -05:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross 1335543731
ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows:

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
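
The resulting rules can be summarized in a small Go sketch. The type and method
names (`EphemeralDisk`, `PrefersPreviousNode`, `NeedsMigrationToken`) are
hypothetical and chosen for illustration; they only encode the two conditions
stated above, `migrate || sticky` for placement and `migrate` for the token.

```go
package main

import "fmt"

// EphemeralDisk is a hypothetical mirror of the two jobspec fields involved.
type EphemeralDisk struct {
	Sticky  bool
	Migrate bool
}

// PrefersPreviousNode encodes "migrate implies sticky": either flag asks for
// sticky placement behavior.
func (d EphemeralDisk) PrefersPreviousNode() bool { return d.Migrate || d.Sticky }

// NeedsMigrationToken encodes the server-side rule: a token is provided
// whenever migrate is set, regardless of sticky.
func (d EphemeralDisk) NeedsMigrationToken() bool { return d.Migrate }

func main() {
	cases := []EphemeralDisk{
		{Sticky: false, Migrate: false},
		{Sticky: true, Migrate: false},
		{Sticky: false, Migrate: true},
		{Sticky: true, Migrate: true},
	}
	for _, d := range cases {
		fmt.Printf("sticky=%t migrate=%t -> sticky placement=%t token=%t\n",
			d.Sticky, d.Migrate, d.PrefersPreviousNode(), d.NeedsMigrationToken())
	}
}
```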
2023-04-07 16:33:45 -04:00
Tim Gross e117ff3877
docs: remove reference to vSphere from CSI concepts docs (#16765)
The vSphere plugin is exclusive to k8s because it relies on k8s-APIs (and
crashes without them being present). Upstream unfortunately will not support
Nomad, so we shouldn't refer to it in our concept docs here.
2023-04-05 15:20:24 -04:00
James Rasell cb6ba80f0f
cli: stream both stdout and stderr when following an alloc. (#16556)
This update changes the behaviour when following logs from an
allocation, so that both the stdout and stderr files are streamed when the
operator supplies the follow flag. The previous behaviour is retained
for all other flags and situations.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-04 10:42:27 +01:00
Tim Gross 0c582a2c94
docs: fix use of gpg to avoid teeing binary to terminal (#16767) 2023-04-03 10:54:21 -04:00
Tim Gross ffd5435ceb
docs: fix install instructions for apt (#16764)
The workflow described in the docs for apt installation is deprecated. Update to
match the workflow described in the Tutorials and official packaging guide.
2023-04-03 10:06:59 -04:00
Daniel Bennett c9adc22eec
Update enterprise licensing documentation (#16615)
updated various docs for new expiration behavior
and new command `nomad license inspect` to validate pre-upgrade
2023-03-30 16:40:19 -05:00
Horacio Monsalvo 20372b1721
connect: add meta on ConsulSidecarService (#16705)
Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>
2023-03-30 16:09:28 -04:00
Piotr Kazmierczak acfc266c30 acl: JWT changelog entry and typo fix 2023-03-30 09:40:11 +02:00
Piotr Kazmierczak 4609119fb5 acl: JWT auth CLI (#16532) 2023-03-30 09:39:56 +02:00
Piotr Kazmierczak a9230fb0b7 acl: JWT auth method 2023-03-30 09:39:56 +02:00
Max Fröhlich ba590b081e
docs: mention Nomad Admission Control Proxy (#16702) 2023-03-28 15:18:26 -04:00
Tim Gross f22ff2b847
docs: clarify capabilities options for docker driver (#16693)
The `docker` driver cannot expand capabilities beyond the default set when the
task is a non-root user. Clarify this in the documentation of `allow_caps` and
update the `cap_add` and `cap_drop` to match the `exec` driver, which has more
clear language overall.
2023-03-28 13:32:08 -04:00
Tim Gross 78acc75b57
docs: add notes about keyring to snapshot restore (#16663)
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
2023-03-28 08:31:01 -04:00
Tim Gross a953456460
docs: fix template retry attempts default documentation (#16667)
The configuration docs for `client.template.vault_retry`, `consul_retry`, and
`nomad_retry` incorrectly document the default number of attempts to be
unlimited (0). When we added these config blocks, we defaulted the fields to
`nil` for backwards compatibility, which causes them to fall back to the default
consul-template configuration values.
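
The difference between an unset field and an explicit 0 can be shown with a
short Go sketch. `RetryConfig`, `effectiveAttempts`, and the value of
`fallbackAttempts` are hypothetical placeholders for illustration, not the
actual Nomad or consul-template definitions.

```go
package main

import "fmt"

// fallbackAttempts is a placeholder for whatever the upstream library uses
// as its default; the number here is an assumption for illustration.
const fallbackAttempts = 12

// RetryConfig is a hypothetical config block where a nil pointer means
// "not set by the operator".
type RetryConfig struct {
	Attempts *int
}

// effectiveAttempts falls back to the upstream default only when the field
// was left unset; an explicit value, including 0, is passed through.
func effectiveAttempts(c RetryConfig) int {
	if c.Attempts == nil {
		return fallbackAttempts
	}
	return *c.Attempts
}

func main() {
	zero := 0
	fmt.Println("unset:", effectiveAttempts(RetryConfig{}))
	fmt.Println("explicit 0:", effectiveAttempts(RetryConfig{Attempts: &zero}))
}
```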
2023-03-28 08:27:06 -04:00
James Rasell a53f9a4094
docs: fix-up legacy link in client config page. (#16678) 2023-03-28 09:32:34 +01:00
Tobias Birkefeld 581eba9f41
docs: fix link of Read Stats API (#16673)
The former link results in a 404. Update the link to the correct developer docs.
2023-03-28 08:49:44 +01:00
ron-savoia 743414739d
docs: added section of needed ACL rules for Nomad UI (#16494) 2023-03-24 08:57:16 -04:00
Luiz Aoqui e5d31bca61
cli: job restart command (#16278)
Implement the new `nomad job restart` command that allows operators to
restart allocation tasks or reschedule the entire allocation.

Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
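
A minimal sketch of the batching idea, in Go. The `batches` helper and the
allocation IDs are hypothetical and do not reflect the command's actual code;
the pause between batches is reduced to a comment.

```go
package main

import "fmt"

// batches splits ids into consecutive groups of at most size elements.
// A size below 1 is treated as 1 so the loop always makes progress.
func batches(ids []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	allocs := []string{"a1", "a2", "a3", "a4", "a5"} // made-up allocation IDs
	for i, batch := range batches(allocs, 2) {
		fmt.Printf("batch %d: restarting %v\n", i+1, batch)
		// Between batches, a real tool would wait for a fixed duration or
		// for operator confirmation before continuing.
	}
}
```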

This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.

Co-authored-by: Shishir Mahajan <smahajan@roblox.com>
2023-03-23 18:28:26 -04:00
James Rasell 7dd1484757
docs: detail support for Nomad checks in service block. (#16598) 2023-03-22 09:27:58 +01:00