Commit graph

899 commits

Author SHA1 Message Date
Seth Hoenig b5427a9f3b
Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd
docs: update check documentation with NSD specifics
2022-08-23 09:23:53 -05:00
Seth Hoenig fb82f11e70
docs: fix checks doc typo
Co-authored-by: Piotr Kazmierczak <phk@mm.st>
2022-08-23 09:23:36 -05:00
Tim Gross bf57d76ec7
allow ACL policies to be associated with workload identity (#14140)
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.

This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
2022-08-22 16:41:21 -04:00
Luiz Aoqui dbffdca92e
template: use pointer values for gid and uid (#14203)
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.

In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
2022-08-22 16:25:49 -04:00
Seth Hoenig ea6d010790 docs: update check documentation with NSD specifics
This PR updates the checks documentation to mention support for checks
when using the Nomad service provider. There are limitations of NSD
compared to Consul, and those configuration options are now noted as
being Consul-only.
2022-08-22 10:50:26 -05:00
Phil Renaud cbd4deedf8
[ui] general keyboard navigation: 1.3.4 release (#14138)
* Initialized keyboard service

Neat but funky: dynamic subnav traversal

👻

generalized traverseSubnav method

Shift as special modifier key

Nice little demo panel

Keyboard shortcuts keycard

Some animation styles on keyboard shortcuts

Handle situations where a link is deeply nested from its parent menu item

Keyboard service cleanup

helper-based initializer and teardown for new contextual commands

Keyboard shortcuts modal component added and demo-ghost removed

Removed j and k from subnav traversal

Register and unregister methods for subnav plus new subnavs for volumes and volume

register main nav method

Generalizing the register nav method

12762 table keynav (#12975)

* Experimental feature: shortcut visual hints

* Long way around to a custom modifier for keyboard shortcuts

* dynamic table and list iterative shortcuts

* Progress with regular old tether

* Delogging

* Table Keynav tether fix, server and client navs, and fix to shiftless on modified arrow keys

Go to Optimize keyboard link and storage key changed to g r

parameterized jobs keyboard nav

Dynamic numeric keynav for multiple tables (#13482)

* Multiple tables init

* URL-bind enumerable keyboard commands and add to more taskRow and allocationRows

* Type safety and lint fixes

* Consolidated push to keyCommands

* Default value when removing keyCommands

* Remove the URL-based removal method and perform a recompute on any add

Get tests passing in Keynav: remove math helpers and a few other defensive moves (#13761)

* Remove ember math helpers

* Test fixes for jobparts/body

* Kill an unneeded integration helper test

* delog

* Trying if disabling percy lets this finish

* Okay so its not percy; try parallelism in circle

* Percyless yet again

* Trying a different angle to not have percy

* Upgrade percy to 1.6.1

[ui] Keyboard nav: "u" key to go up a level (#13754)

* U to go up a level

* Mislabelled my conditional

* Custom lint ignore rule

* Custom lint ignore rule, this time with commas

* Since we're getting rid of ember math helpers elsewhere, do the math ourselves here

Replace ArrowLeft etc. with an ascii arrow (#13776)

* Replace ArrowLeft etc. with an ascii arrow

* non-mutative helper cleanup

Keyboard Nav: let users rebind their shortcuts (#13781)

* click-outside and shortcuts enabled/disabled toggle

* Trap focus when modal open

* Enabled/disabled saved to localStorage

* Autofocus edit button on variable index

* Modal overflow styles

* Functional rebind

* Saving rebinds to localStorage for all majors

* Started on defaultCommandBindings

* Modal header style and cancel rebind on escape

* keyboardable keybindings w buttons instead of spans

* recording and defaultvalues

* Enter short-circuits rebind

* Only some commands are rebindable, and dont show dupes

* No unused get import

* More visually distinct header on modal

* Disallowed keys for rebind, showing buffer as you type, and moving dedupe to modal logic

willDestroy hook to prevent tests from doubling/tripling up addEventListener on kb events

remove unused tests

Keyboard Navigation acceptance tests (#13893)

* Acceptance tests for keyboard modal

* a11y audit fix and localStorage clear

* Bind/rebind/localStorage tests

* Keyboard tests for dynamic nav and tables

* Rebinder and assert expectation

* Second percy snapshot showing hints no longer relevant

Weird issue where linktos with query props specifically from the task-groups page would fail to route / hit undefined.shouldSuperCede errors

Adds the concept of exclusivity to a keycommand, removing peers that also share its label

Lintfix

Changelog and PR feedback

Changelog and PR feedback

Fix to rebinding in firefox by blurring the now-disabled button on rebind (#14053)

* Secure Variables shortcuts removed

* Variable index route autofocus removed

* Updated changelog entry

* Updated changelog entry

* Keynav docs (#14148)

* Section added to the API Docs UI page

* Added a note about disabling

* Prev and Next order

* Remove dev log and unneeded comments
2022-08-17 12:59:33 -04:00
Kerim Satirli 614171610f
adds link for Nomad-Pack GitHub action (#14118) 2022-08-16 08:34:26 +02:00
Tim Gross a4e89d72a8
secure vars: filter by path in List RPCs (#14036)
The List RPCs only checked the ACL for the Prefix argument of the request. Add
an ACL filter to the paginator for the List RPC.

Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a
"deny" capability so that operators can deny specific paths or prefixes below an
allowed path.
2022-08-15 11:38:20 -04:00
Mike Nomitch ce310b350d
Add notes about DAS being prometheus only (#14040) 2022-08-15 10:17:31 +02:00
Seth Hoenig eb966a4ce8
Merge pull request #14086 from Morantron/patch-1
Fix typo in /tools/autoscaling
2022-08-12 09:07:12 -05:00
Seth Hoenig 394aebfbd9
Merge pull request #14088 from hashicorp/b-plan-vault-token
cli: support vault token in plan command
2022-08-12 09:05:34 -05:00
Seth Hoenig 1224fdf60d
Merge pull request #14089 from hashicorp/f-docker-disable-healthchecks
docker: configuration for disable docker healthcheck
2022-08-12 09:00:31 -05:00
James Rasell f6a5961a20
docs: correctly state RPC port is used by servers and clients. (#14091) 2022-08-12 10:14:14 +02:00
Seth Hoenig dc761aa7ec docker: create a docker task config setting for disable built-in healthcheck
This PR adds a docker driver task configuration setting for turning off
built-in HEALTHCHECK of a container.

References)
https://docs.docker.com/engine/reference/builder/#healthcheck
https://github.com/docker/engine-api/blob/master/types/container/config.go#L16

Closes #5310
Closes #14068
2022-08-11 10:33:48 -05:00
Seth Hoenig ba5c45ab93 cli: respect vault token in plan command
This PR fixes a regression where the 'job plan' command would not respect
a Vault token if set via --vault-token or $VAULT_TOKEN.

Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062

Fixes https://github.com/hashicorp/nomad/issues/13939
2022-08-11 08:54:08 -05:00
Morantron 741170160f
Update index.mdx 2022-08-11 09:03:52 +02:00
Seth Hoenig 3d925a78e5
Merge pull request #14065 from hashicorp/b-fwd-vtoken-validation
cli: forward request for job validation to nomad leader
2022-08-10 15:14:01 -05:00
Seth Hoenig 3aaaedf52e cli: forward request for job validation to nomad leader
This PR changes the behavior of 'nomad job validate' to forward the
request to the nomad leader, rather than responding from any server.

This is because we need the leader when validating Vault tokens, since
the leader is the only server with an active vault client.
2022-08-10 14:34:04 -05:00
dgotlieb 7fbc8baaeb
doc typo fix
docker and podman don't suck 🤣
2022-08-10 15:04:07 +03:00
Charlie Voiselle 9a19279f59
Sweep of docs for repeated words; minor edits (#14032) 2022-08-05 16:45:30 -04:00
Luiz Aoqui 9affe31a0f
qemu: reduce monitor socket path (#13971)
The QEMU driver can take an optional `graceful_shutdown` configuration
which will create a Unix socket to send ACPI shutdown signal to the VM.

Unix sockets have a hard length limit and the driver implementation
assumed that QEMU versions 2.10.1 were able to handle longer paths. This
is not correct, the linked QEMU fix only changed the behaviour from
silently truncating longer socket paths to throwing an error.

By validating the socket path before starting the QEMU machine we can
provide users a more actionable and meaningful error message, and by
using a shorter socket file name we leave a bit more room for
user-defined values in the path, such as the task name.

The maximum length allowed is also platform-dependant, so validation
needs to be different for each OS.
2022-08-04 12:10:35 -04:00
Luiz Aoqui e3d78c343c
template: set default UID/GID to -1 (#13998)
UID/GID 0 is usually reserved for the root user/group. While Nomad
clients are expected to run as root it may not always be the case.

Setting these values as -1 if not defined will fallback to the pervious
behaviour of not attempting to set file ownership and use whatever
UID/GID the Nomad agent is running as. It will also keep backwards
compatibility, which is specially important for platforms where this
feature is not supported, like Windows.
2022-08-04 11:26:08 -04:00
Luiz Aoqui 8f05a55def
docs: remove link to HCL2 timestamp function (#13999)
The `timestamp` HCL2 function was never part of the set of supported
functions.
2022-08-04 10:07:51 -04:00
Derek Strickland 77df9c133b
Add Nomad RetryConfig to agent template config (#13907)
* add Nomad RetryConfig to agent template config
2022-08-03 16:56:30 -04:00
Piotr Kazmierczak 530280505f
client: enable specifying user/group permissions in the template stanza (#13755)
* Adds Uid/Gid parameters to template.

* Updated diff_test

* fixed order

* update jobspec and api

* removed obsolete code

* helper functions for jobspec parse test

* updated documentation

* adjusted API jobs test.

* propagate uid/gid setting to job_endpoint

* adjusted job_endpoint tests

* making uid/gid into pointers

* refactor

* updated documentation

* updated documentation

* Update client/allocrunner/taskrunner/template/template_test.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* Update website/content/api-docs/json-jobs.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

* propagating documentation change from Luiz

* formatting

* changelog entry

* changed changelog entry

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-08-02 22:15:38 +02:00
Tim Gross e025afdf87
docs: concepts for secure variables and workload identity (#13764)
Includes concept docs for secure variables, concept docs for workload
identity, and an operations docs for keyring management.
2022-08-02 10:06:26 -04:00
Eric Weber cbce13c1ac
Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919)
* Allow specification of CSI staging and publishing directory path
* Add website documentation for stage_publish_dir
* Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir
* Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir
2022-08-02 09:42:44 -04:00
Tim Gross e5ac6464f6
secure vars: enforce ENT quotas (OSS work) (#13951)
Move the secure variables quota enforcement calls into the state store to ensure
quota checks are atomic with quota updates (in the same transaction).

Switch to a machine-size int instead of a uint64 for quota tracking. The
ENT-side quota spec is described as int, and negative values have a meaning as
"not permitted at all". Using the same type for tracking will make it easier to
the math around checks, and uint64 is infeasibly large anyways.

Add secure vars to quota HTTP API and CLI outputs and API docs.
2022-08-02 09:32:09 -04:00
Tim Gross f14fafe914
docs: fix path for quota/usage API (#13952) 2022-08-02 08:46:45 -04:00
asymmetric b89718d70e
Update filesystem.mdx (#13738)
fix alloc working directory path
2022-07-25 10:25:48 -04:00
Scott Holodak 12ef89a61a
docs: fix placement for scaling and csi_plugin (#13892) 2022-07-25 10:06:59 -04:00
Charlie Voiselle 456ad33b7c
Fix link (#13881) 2022-07-22 12:27:45 -04:00
Michael Schurter 0d1c9a53a4
docs: clarify submit-job allows stopping (#13871) 2022-07-21 10:18:57 -07:00
Tim Gross 96aea74b4b
docs: keyring commands (#13690)
Document the secure variables keyring commands, document the aliased
gossip keyring commands, and note that the old gossip keyring commands
are deprecated.
2022-07-20 14:14:10 -04:00
Tim Gross 49ad3dc3ba
docs: document secure variables server config options (#13695) 2022-07-20 14:13:39 -04:00
Will Jordan 5354409b1a
Return 429 response on HTTP max connection limit (#13621)
Return 429 response on HTTP max connection limit. Instead of silently closing
the connection, return a `429 Too Many Requests` HTTP response with a helpful
error message to aid debugging when the connection limit is unintentionally
reached.

Set a 10-millisecond write timeout and rate limiter for connection-limit 429
response to prevent writing the HTTP response from consuming too many server
resources.

Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections
exceeding concurrency limit.
2022-07-20 14:12:21 -04:00
Luiz Aoqui 3dc701a8d0
docs: update Autoscaler AWS plugin with new ws_credential_provider config (#13779) 2022-07-19 10:27:55 -04:00
Niklas Hambüchen 422c83e97a
docs: job-specification: Explain that priority has no effect on run order (#13835)
Makes the issues from #9845 and #12792 less surprising to the user.
2022-07-19 08:55:29 -04:00
Andy Assareh e49c021792
word typo digestible (#13772) 2022-07-19 09:00:52 +02:00
Seth Hoenig 4dea14267d
Merge pull request #13813 from hashicorp/docs-move-checks
docs: move checks into own page
2022-07-18 12:27:43 -05:00
Seth Hoenig 4459312541 docs: move checks into own page
This PR creates a top-level 'check' page for job-specification docs.

The content for checks is about half the content of the service page, and
is about to increase in size when we add docs about Nomad service checks.
Seemed like a good idea to just split the checks section out into its own
thing (e.g. check_restart is already a topic).

Doing the move first lets us backport this change without adding Nomad service
check stuff yet.

Mostly just a lift-and-shift but with some tweaked examples to de-emphasize
the use of script checks.
2022-07-18 09:34:55 -05:00
Tim Gross 1e8978ca04
docs: ACL policy spec reference (#13787)
The "Secure Nomad with Access Control" guide provides a tutorial for
bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference
guide just for the ACL policy specification.
2022-07-18 09:35:28 -04:00
Luiz Aoqui 730f869b6b
docs: update Podman docs to v0.4.0 (#13783) 2022-07-15 18:01:35 -04:00
Michael Schurter e97548b5f8
Improve metrics reference documentation (#13769)
* docs: tighten up parameterized job metrics docs

* docs: improve alloc status descriptions

Remove `nomad.client.allocations.start` as it doesn't exist.
2022-07-15 14:22:39 -07:00
Michael Schurter 5414f49821
docs: clarify blocked_evals metrics (#13751)
Related to #13740

- blocked_evals.total_blocked is the number of evals blocked for *any*
  reason
- blocked_evals.total_quota_limit is the number of evals blocked by
  quota limits, but critically: their resources are *not* counted in the
  cpu/memory
2022-07-14 11:32:33 -07:00
Seth Hoenig 3a32220b3b
Merge pull request #13716 from hashicorp/docs-update-consul-warning
docs: remove consul 1.12.0 warning
2022-07-14 08:45:56 -05:00
Luiz Aoqui b656981cf0
Track plan rejection history and automatically mark clients as ineligible (#13421)
Plan rejections occur when the scheduler work and the leader plan
applier disagree on the feasibility of a plan. This may happen for valid
reasons: since Nomad does parallel scheduling, it is expected that
different workers will have a different state when computing placements.

As the final plan reaches the leader plan applier, it may no longer be
valid due to a concurrent scheduling taking up intended resources. In
these situations the plan applier will notify the worker that the plan
was rejected and that they should refresh their state before trying
again.

In some rare and unexpected circumstances it has been observed that
workers will repeatedly submit the same plan, even if they are always
rejected.

While the root cause is still unknown this mitigation has been put in
place. The plan applier will now track the history of plan rejections
per client and include in the plan result a list of node IDs that should
be set as ineligible if the number of rejections in a given time window
crosses a certain threshold. The window size and threshold value can be
adjusted in the server configuration.

To avoid marking several nodes as ineligible at one, the operation is rate
limited to 5 nodes every 30min, with an initial burst of 10 operations.
2022-07-12 18:40:20 -04:00
Michael Schurter 3e50f72fad
core: merge reserved_ports into host_networks (#13651)
Fixes #13505

This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports).

As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:

Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserverd_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me.
So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.
2022-07-12 14:40:25 -07:00
Seth Hoenig a9fa48f3db docs: remove consul 1.12.0 warning 2022-07-12 09:53:17 -05:00
Tim Gross fc4cd53cfb
docs: rename Internals to Concepts (#13696) 2022-07-11 16:55:33 -04:00
Tim Gross d49ff0175c
docs: move operator subcommands under their own trees (#13677)
The sidebar navigation tree for the `operator` sub-sub commands is
getting cluttered and we have a new set of commands coming to support
secure variables keyring as well. Move these all under their own
subtrees.
2022-07-11 14:00:24 -04:00
Seth Hoenig ed2f2b1a75
docs: move upgrade docs for max_client_timeout
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-07-07 16:46:26 -05:00
Seth Hoenig 905e673553 docs: upgrade guide for client max_kill_timeout 2022-07-07 15:27:40 -05:00
Luiz Aoqui 03433dd8af
cli: improve output of eval commands (#13581)
Use the same output format when listing multiple evals in the `eval
list` command and when `eval status <prefix>` matches more than one
eval.

Include the eval namespace in all output formats and always include the
job ID in `eval status` since, even `node-update` evals are related to a
job.

Add Node ID to the evals table output to help differentiate
`node-update` evals.

Co-authored-by: James Rasell <jrasell@hashicorp.com>
2022-07-07 13:13:34 -04:00
Ted Behling 6a032a54d2
driver/docker: Don't pull InfraImage if it exists (#13265)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2022-07-07 17:44:06 +02:00
Seth Hoenig b9fe6c8d2c docs: fixup from cr comments 2022-07-07 08:37:10 -05:00
Seth Hoenig 1c31ef285e docs: add docs for simple load balancing nomad services
This PR adds a section to template docs for simple load balancing with nomad servicse.
2022-07-06 17:34:30 -05:00
James Rasell 0c0b028a59
core: allow deleting of evaluations (#13492)
* core: add eval delete RPC and core functionality.

* agent: add eval delete HTTP endpoint.

* api: add eval delete API functionality.

* cli: add eval delete command.

* docs: add eval delete website documentation.
2022-07-06 16:30:11 +02:00
James Rasell 181b247384
core: allow pausing and un-pausing of leader broker routine (#13045)
* core: allow pause/un-pause of eval broker on region leader.

* agent: add ability to pause eval broker via scheduler config.

* cli: add operator scheduler commands to interact with config.

* api: add ability to pause eval broker via scheduler config

* e2e: add operator scheduler test for eval broker pause.

* docs: include new opertor scheduler CLI and pause eval API info.
2022-07-06 16:13:48 +02:00
Michelle Noorali f227855de1
doc: explain permissions for Vault sys/capabilties-self 2022-07-06 10:01:30 -04:00
Yann Coleu fe64f8cdd7
docs: typo on command word (#13582) 2022-07-05 16:24:25 -04:00
Steven Collins ab97650098
docs: Add 'serial' attribute to usb driver (#13547) 2022-07-05 16:23:04 -04:00
Seth Hoenig 97726c2fd8
Merge pull request #12862 from hashicorp/f-choose-services
api: enable selecting subset of services using rendezvous hashing
2022-06-30 15:17:40 -05:00
Derek Strickland 47e3b28dba
docs: update task leader to explain shutdown sequence. (#13498)
* docs: update task leader to explain shutdown sequence.
2022-06-29 05:13:45 -04:00
James Rasell d21e4abe3f
docs: fixup HCL2 index collection function documentation. (#13511) 2022-06-28 18:27:38 +02:00
Andrew 3a87406f2f
Fix typo in Docker docs (#13497) 2022-06-28 11:05:50 +02:00
Seth Hoenig 9467bc9eb3 api: enable selecting subset of services using rendezvous hashing
This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint.

The value of 'choose' is in the form '<number>|<key>', number is the number
of desired services and key is a value unique but consistent to the requester
(e.g. allocID).

Folks aren't really expected to use this API directly, but rather through consul-template
which will soon be getting a new helper function making use of this query parameter.

Example,

curl 'localhost:4646/v1/service/redis?choose=2|abc123'

Note: consul-templte v0.29.1 includes the necessary nomadServices functionality.
2022-06-25 10:37:37 -05:00
Seth Hoenig 91e08d5e23 core: remove support for raft protocol version 2
This PR checks server config for raft_protocol, which must now
be set to 3 or unset (0). When unset, version 3 is used as the
default.
2022-06-23 14:37:50 +00:00
Michael Schurter 7b7c72b21d
docs: clarify total_escaped is just an optimization (#13460) 2022-06-22 11:39:56 -07:00
Elijah Voigt 665b198968
Lob.com uses Nomad too! (#13295)
Lob.com has been ramping up our use of Nomad for ~6 months.
Now that we've started blogging about it we'd love to be on the _official_ list.
2022-06-21 09:10:08 -04:00
Derek Strickland a15cef689d
Improve Autoscaler overview (#13396)
Improve Autoscaler overview documentation.
2022-06-17 05:15:22 -04:00
Nick Wales 3a8c8250f4
Merge pull request #13401 from nickwales/tls_typo
Updates TLS documentation
2022-06-16 12:34:59 -05:00
Arthur Leclerc d98a9b1d72
docs: Fix typo (#13389) 2022-06-16 13:24:18 -04:00
Nick Wales c964ae0135
Updates TLS documentation 2022-06-16 12:15:40 -05:00
James Hu 7e3d21646d
Fix spelling error (#13397) 2022-06-16 12:41:49 -04:00
Luiz Aoqui 6598567725
docs: create volume spec page (#13353)
In addition to jobs, there are other objects in Nomad that have a
specific format and can be provided to commands and API endpoints.

This commit creates a new menu section to hold the specification for
volumes and update the command pages to point to the new centralized
definition.

Redirecting the previous entries is not possible with `redirect.js`
because they are done server-side and URL fragments are not accessible
to detect a match. So we provide hidden anchors with a link to the new
page to guide users towards the new documentation.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-06-14 14:08:25 -04:00
Grant Griffiths 99896da443
CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340)
Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>
2022-06-14 10:04:16 -04:00
Michael Schurter f41ea0e5dc
docs: explain behavior of system gc command (#13342) 2022-06-13 09:54:23 +02:00
Derek Strickland 5ebd06a8f9
template: improve default language for max_stale and wait (#13334)
* template: improve default language for max_stale and wait

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-06-10 14:34:25 -04:00
Daniel Rossbach 8c52c03c8c
qemu driver: Add option to configure drive_interface (#11864) 2022-06-10 10:03:51 -04:00
Raffaele Di Fazio 66938e0ef0
Update supplement.mdx with the right GitHub spelling (#13326) 2022-06-10 11:46:19 +02:00
phreakocious 94a78597d2
Add guest_agent config option for QEMU driver (#12800)
Add boolean 'guest_agent' config option for QEMU driver, which will
create the socket file for the QEMU Guest Agent in the task dir when
enabled.
2022-06-09 09:21:38 -04:00
Derek Strickland 34dea90d7a
docker: update images to reference hashicorpdev Docker organization (#12903)
docker: update images to reference hashicorpdev dockerhub organization
generate job_init.bindata_assetfs.go

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-06-08 15:06:00 -04:00
Derek Strickland 13ea5ae87a
consul-template: Add fault tolerant defaults (#13041)
consul-template: Add fault tolerant defaults

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-06-08 14:08:25 -04:00
Shantanu Gadgil 43d8baace0
heartbeat_grace is a server parameter (#13288)
`heartbeat_grace` is a `server` parameter, not a `client` parameter.
2022-06-08 10:49:23 -04:00
Kevin Schoonover 544c276128
parse ACL token from authorization header (#12534) 2022-06-06 15:51:02 -04:00
Conor Evans 86116a7607
add filebase64 function (#11791)
Signed-off-by: Conor Evans <coevans@tcd.ie>
2022-06-06 11:58:17 -04:00
dgotlieb 116d78a89c
docs: update warning for gateway listener docs for non-tcp protos 2022-06-06 10:53:01 -04:00
Radek Simko af4f976516
docs/job-spec: Fix formatting in network page (#13228) 2022-06-06 10:14:12 -04:00
Radek Simko 498819f1a8
docs/docker: fix broken link to bridge mode (#13221) 2022-06-06 09:59:36 -04:00
Radek Simko 14227b8297
docs: link to client reqs section for added clarity (#13215) 2022-06-06 09:56:29 -04:00
Lance Haig 4bf27d743d
Allow Operator Generated bootstrap token (#12520) 2022-06-03 07:37:24 -04:00
Huan Wang 7d15157635
adding support for customized ingress tls (#13184) 2022-06-02 18:43:58 -04:00
Shantanu Gadgil 6cb8c95534
fingerprint kernel architecture name (#13182) 2022-06-02 15:51:00 -04:00
Seth Hoenig 0399b7e4c5
Merge pull request #12951 from jorgemarey/f-srv-tagged-addresses
Allow setting tagged addresses on services
2022-06-01 10:51:49 -05:00
Anthony d5e084b297
docs: added note about vault -period flag (#13185) 2022-05-31 14:26:03 -07:00
Seth Hoenig 54efec5dfe docs: add docs and tests for tagged_addresses 2022-05-31 13:02:48 -05:00
pabloyoyoista d0ff73ddbe
docs: add podman ulimit option (#13180) 2022-05-31 11:16:46 -04:00
James Rasell 59a4a19a4f
docs: add allocation and job services API endpoint docs. (#13174) 2022-05-30 16:15:09 +02:00
Waquid Valiya Peedikakkal c06edaacc8
docs: add nomad-pipeline to community tools page (#13172) 2022-05-30 09:05:38 +02:00
Luiz Aoqui bb7b44a9ae
docs: add wander to the community tools page (#13165) 2022-05-27 11:53:01 -04:00
Toyam Cox 06b934ccf6
docs: make the example for 'load' work (#13102) 2022-05-27 08:48:58 -04:00
Seth Hoenig c2ba1e2e29
Merge pull request #13125 from hashicorp/b-connect-upstream-namespace
connect: enable setting connect upstream destination namespace
2022-05-26 10:29:11 -05:00
Seth Hoenig 4631045d83 connect: enable setting connect upstream destination namespace 2022-05-26 09:39:36 -05:00
Amier Chery 05274c9c9f
Merge pull request #13083 from josegonzalez/patch-1
Update service.check.task definition to match code
2022-05-26 10:38:49 -04:00
Michael Schurter 2965dc6a1a
artifact: fix numerous go-getter security issues
Fix numerous go-getter security issues:

- Add timeouts to http, git, and hg operations to prevent DoS
- Add size limit to http to prevent resource exhaustion
- Disable following symlinks in both artifacts and `job run`
- Stop performing initial HEAD request to avoid file corruption on
  retries and DoS opportunities.

**Approach**

Since Nomad has no ability to differentiate a DoS-via-large-artifact vs
a legitimate workload, all of the new limits are configurable at the
client agent level.

The max size of HTTP downloads is also exposed as a node attribute so
that if some workloads have large artifacts they can specify a high
limit in their jobspecs.

In the future all of this plumbing could be extended to enable/disable
specific getters or artifact downloading entirely on a per-node basis.
2022-05-24 16:29:39 -04:00
PinkLolicorn 83dd9e801e
docs: mount_flags takes a slice of strings (#13087)
The description of `mount_flags` provides incorrect example
of the accepted value format.

This fixes the issue by changing the example from a string
`ro,noatime` to a slice of strings `["ro", "noatime"]`.
2022-05-20 09:16:17 -04:00
Jose Diaz-Gonzalez fa1077fbcd
docs: correct where task cannot be defined 2022-05-19 21:24:58 -04:00
Jose Diaz-Gonzalez ea01fe398f
Update service.check.task definition to match code
Nomad errors out when attempting to specify a task for a service that uses consul connect but does not have script or gRPC checks. See 304d0cf595/nomad/structs/structs.go (L6643) for details.
2022-05-19 20:54:49 -04:00
Seth Hoenig fc58f4972c cli: correctly use and validate job with vault token set
This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN',
'-vault-namespace' if set.
2022-05-19 12:13:34 -05:00
Seth Hoenig 65f7abf2f4 cli: update default redis and use nomad service discovery
Closes #12927
Closes #12958

This PR updates the version of redis used in our examples from 3.2 to 7.
The old version is very not supported anymore, and we should be setting
a good example by using a supported version.

The long-form example job is now fixed so that the service stanza uses
nomad as the service discovery provider, and so now the job runs without
a requirement of having Consul running and configured.
2022-05-17 10:24:19 -05:00
Luiz Aoqui fea13f39b3
docs: add Consul 1.12.0 upgrade notice 2022-05-16 18:44:26 -04:00
Karan Sharma e0be868b79
docs: Fix typo in sidecar_service (#13021) 2022-05-16 09:35:42 +02:00
Tim Gross 6e5d6eb3b5
docs: note that already-dispatched jobs cannot be updated (#12973) 2022-05-12 16:18:42 -04:00
Michael Schurter 5a43d3c675
docs: add sysbatch to scheduling internals (#12954) 2022-05-11 17:06:17 -07:00
Chetan Sarva 14752cd2c0
docs: add version note to nomad services template (#12910) 2022-05-06 17:39:27 +02:00
Tim Gross 26b9f88ef3
docs: add missing set_contains_any constraint docs (#12886)
This constraint and affinity was added in 0.9.x but was only
documented for affinities. Close that documentation gap.
2022-05-05 11:11:05 -04:00
Tim Gross 45b238ec82
CSI: node drain should end once only plugins remain (#12846)
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remaining, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
2022-05-03 10:20:22 -04:00
Seth Hoenig b8d807c320
Merge pull request #12840 from hashicorp/docs-nvidia-updates
docs: update nvidia driver documentation
2022-05-02 10:07:02 -05:00
Seth Hoenig 684abb9e28 docs: update nvidia driver documentation
notably:
- name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu'
- link to Nvidia docs for installing the container runtime toolkit
- list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit
2022-05-02 09:11:05 -05:00
Matus Goljer a741cc76b5
nomad can also install autocomplete for fish shell (#12834) 2022-05-02 09:26:55 -04:00
Tim Gross d06ad50538
docs: clarify capacity_min/max for volumes (#12825)
The capacity fields for `create volume` set bounds on the resulting
size of the volume, but the ultimate size of the volume will be
determined by the storage provider (between the min and max). Clarify
this in the documentation and provide a suggestion for how to set a
exact size.
2022-04-29 13:38:30 -04:00
Derek Strickland 584bf0162f
docs: Add known limitations callouts to Max Client Disconnect section (#12801)
* docs: Add known limitations callouts to Max Client Disconnect section
2022-04-28 16:17:14 -04:00
Tim Gross c763c4cb96
remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
Michael Schurter 1256c8ef66
docs: update json jobs docs (#12766)
* docs: update json jobs docs

Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you
want to acknowledge that sometimes our JSON job representations have a
Job top-level wrapper and sometimes do not.

The 2½ formats are:
```
 1.   HCL JSON
 2.   Input API JSON (top-level Job field)
 2.5. Output API JSON (lacks top-level Job field)
```

`#2` is what our docs consider our API JSON. `#2.5` seems to be an
accident of history we can't fix with breaking API compatibility.

`#1` is an even more interesting accident of history: the `jobspec2`
package automatically detects if the input to Parse is JSON and switches
to a JSON parser. This behavior is undocumented, the format is
unspecified, and there is no official HashiCorp tooling to produce this
JSON from HCL. The plot thickens when you discover popular third party
tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to
produce JSON that `nomad run` accepts!

Since we have no telemetry around whether or not anyone passes HCL JSON
to `nomad run`, and people don't file bugs around features that Just
Work, I'm choosing to leave that code path in place and *acknowledged
but not suggested* in documentation.

See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive
discussion of what officially supporting HCL JSON in Nomad would look
like.

(I also added some of the missing fields to the (Input API flavor) JSON
Job documentation, but it still needs a lot of work to be
comprehensive.)

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-22 15:57:27 -07:00
Luiz Aoqui a8cc633156
vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading itself as another Vault
entity.
2022-04-22 10:46:34 -04:00
Seth Hoenig c4aab10e53 services: cr followup 2022-04-22 09:14:29 -05:00
Seth Hoenig 3fcac242c6 services: enable setting arbitrary address value in service registrations
This PR introduces the `address` field in the `service` block so that Nomad
or Consul services can be registered with a custom `.Address.` to advertise.

The address can be an IP address or domain name. If the `address` field is
set, the `service.address_mode` must be set in `auto` mode.
2022-04-22 09:14:29 -05:00
James Rasell b5d10bcece
docs: add upgrade note for Consul implicit constraint. (#12749) 2022-04-22 15:53:27 +02:00
James Rasell 046831466c
cli: add pagination flags to service info command. (#12730) 2022-04-22 10:32:40 +02:00
Michael Schurter 5db3a671db
cli: add -json flag to support job commands (#12591)
* cli: add -json flag to support job commands

While the CLI has always supported running JSON jobs, its support has
been via HCLv2's JSON parsing. I have no idea what format it expects the
job to be in, but it's absolutely not in the same format as the API
expects.

So I ignored that and added a new -json flag to explicitly support *API*
style JSON jobspecs.

The jobspecs can even have the wrapping {"Job": {...}} envelope or not!

* docs: fix example for `nomad job validate`

We haven't been able to validate inside driver config stanzas ever since
the move to task driver plugins. 😭
2022-04-21 13:20:36 -07:00
Tim Gross f4287c870d
cli: detect directory when applying namespace spec file (#12738)
The new `namespace apply` feature that allows for passing a namespace
specification file detects the difference between an empty namespace
and a namespace specification by checking if the file exists. For most
cases, the file will have an extension like `.hcl` and so there's
little danger that a user will apply a file spec when they intended to
apply a file name.

But because directory names typically don't include an extension,
you're much more likely to collide when trying to `namespace apply` by
name only, and then you get a confusing error message of the form:

   Failed to read file: read $namespace: is a directory

Detect the case where the namespace name collides with a directory in
the current working directory, and skip trying to load the directory.
2022-04-21 14:53:45 -04:00
James Rasell 716b8e658b
api: Add support for filtering and pagination to the node list endpoint (#12727) 2022-04-21 17:04:33 +02:00
Tim Gross 79a9d788d2
docs: fix broken link from template to client config (#12733) 2022-04-21 11:04:04 -04:00
James Rasell c4195c452a
docs: update HCL2 dynamic example to use block with label. (#12715) 2022-04-21 10:18:04 +02:00
James Rasell 42068f8823
client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603) 2022-04-20 10:30:48 +02:00
Seth Hoenig df587d8263 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Shishir f5121d261e
Add os to NodeListStub struct. (#12497)
* Add os to NodeListStub struct.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add test: os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-04-15 17:22:45 -07:00
Michael Schurter 70a04dd106
docs: add plan for node rejected details and more (#12564)
- Moved federation docs to the bottom since *everyone* is potentially
  affected by the other sections on the page, but only users of
  federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
  easy to diagnose and removing affected nodes is a fairly reliable
  workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
  that?!
- Reinforce the importance of monitoring basic system resources
2022-04-14 16:09:33 -07:00
Seth Hoenig a1c4f16cf1 connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
James Rasell 4cdc46ae75
service discovery: add pagination and filtering support to info requests (#12552)
* services: add pagination and filter support to info RPC.
* cli: add filter flag to service info command.
* docs: add pagination and filter details to services info API.
* paginator: minor updates to comment and func signature.
2022-04-13 07:41:44 +02:00
Karan Sharma 37c907a8d2
feat: add nomctx and nomad-events-sink (#12542) 2022-04-11 14:47:03 -04:00
Seth Hoenig a75bc27601 docs: fixup title formatting in upgrade guide 2022-04-08 11:50:54 -05:00
Luiz Aoqui 0190f378a7
docs: fix upgrade specific broken link and conflict tag (#12521) 2022-04-08 12:36:47 -04:00
James Rasell 6ac5fd9768
docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Seth Hoenig e7aa81d3cb docs: tweak hcl2 validation example 2022-04-08 08:43:42 -05:00
Thomas Wunderlich 3f6465f078
Add custom variable validation to docs
Custom variable validation is a useful feature that is supported by
Nomad and not just Terraform. As such it should be documented on the
input variable page.
I've cribbed the content from the terraform docs so this should be
consistent across projects
2022-04-07 19:06:06 -04:00
Jasmine Dahilig 386f2fac3a
docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505) 2022-04-07 15:12:41 -07:00
Tim Gross 09b5e8d388
Fix flaky operator debug test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Seth Hoenig 0870aa31dc client: set environment variable indicating set of reserved cpu cores
This PR injects the 'NOMAD_CPU_CORES' environment variable into
tasks that have been allocated reserved cpu cores. The value uses
normal cpuset notation, as found in cpuset.cpu cgroup interface files.

Note this value is not necessiarly the same as the content of the actual
cpuset.cpus interface file, which will also include shared cpu cores when
using cgroups v2. This variable is a workaround for users who used to be
able to read the reserved cgroup cpuset file, but lose the information
about distinct reserved cores when using cgroups v2.

Side discussion in: https://github.com/hashicorp/nomad/issues/12374
2022-04-07 09:09:35 -05:00
Jasmine Dahilig f67b108f9f
docs: update vault-token note in job run command #8040 (#12385) 2022-04-06 10:01:38 -07:00
James Rasell 7096fecd10
website: add initial website docs for Nomad service discovery. (#12456) 2022-04-06 18:51:14 +02:00
Derek Strickland 0ab89b1728
Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling
disconnected clients: Feature branch merge
2022-04-06 10:11:57 -04:00
Mike Nomitch 7405ebbad1
Add max client disconnect docs (#12467)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-04-06 08:54:14 -04:00
Seth Hoenig 2e2ff3f75e
Merge pull request #12419 from hashicorp/exec-cleanup
raw_exec: make raw exec driver work with cgroups v2
2022-04-05 16:42:01 -05:00
Tim Gross 5b9772e68f
docs: updates for CSI plugin improvements for 1.3.0 (#12466) 2022-04-05 17:13:51 -04:00
Derek Strickland 8e9f8be511 MaxClientDisconnect Jobspec checklist (#12177)
* api: Add struct, conversion function, and tests
* TaskGroup: Add field, validation, and tests
* diff: Add diff handler and test
* docs: Update docs
2022-04-05 17:12:23 -04:00
Derek Strickland d7f44448e1 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Shishir a6801f73d1
cli: add -quiet to nomad node status command. (#12426) 2022-04-05 15:53:43 -04:00
Luiz Aoqui ab7eb5de6e
Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an indentity to a token, allowing better control
and management of Vault clients since all tokens with the same indentity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
Grant Griffiths 18a0a2c9a4
CSI: Add secrets flag support for delete volume (#11245) 2022-04-05 08:59:11 -04:00
Seth Hoenig 52aaf86f52 raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Danish Prakash e7e8ce212e
command/operator_debug: add pprof interval (#11938) 2022-04-04 15:24:12 -04:00
Seth Hoenig f9b0ffafde
Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo
docs: fix typo in system batch description
2022-04-01 09:58:06 -05:00
Seth Hoenig e9eacb1153 docs: fix typo in system batch description 2022-04-01 09:46:03 -05:00
Bryce Kalow 9b0d77ae78
website: redirect /api to api-docs and update internal links (#12410) 2022-03-31 11:33:27 -05:00
Tim Gross 8dccc43c2f
docs: remove deprecated client options parameters docs (#12416)
The client configuration options for drivers have been deprecated
since 0.9. We haven't torn them out completely but because they're
deprecated it's been hard to guarantee correct behavior. Remove the
documentation so that users aren't misled about their viability.
2022-03-31 11:45:51 -04:00
Michael Schurter cae69ba8ce
Merge pull request #12312 from hashicorp/f-writeToFile
template: disallow `writeToFile` by default
2022-03-29 13:41:59 -07:00
Tim Gross 03c1904112
csi: allow namespace field to be passed in volume spec (#12400)
Use the volume spec's `namespace` field to override the value of the
`-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.
2022-03-29 14:46:39 -04:00
Michael Schurter 33fe04ff6a
template: fix comments and docs
Review notes from @lgfa29

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-03-29 09:25:23 -07:00
Michael Schurter 7a28fcb8af template: disallow writeToFile by default
Resolves #12095 by WONTFIXing it.

This approach disables `writeToFile` as it allows arbitrary host
filesystem writes and is only a small quality of life improvement over
multiple `template` stanzas.

This approach has the significant downside of leaving people who have
altered their `template.function_denylist` *still vulnerable!* I added
an upgrade note, but we should have implemented the denylist as a
`map[string]bool` so that new funcs could be denied without overriding
custom configurations.

This PR also includes a bug fix that broke enabling all consul-template
funcs. We repeatedly failed to differentiate between a nil (unset)
denylist and an empty (allow all) one.
2022-03-28 17:05:42 -07:00
Shishir afcce3eea5
Display OS name in nomad node status command. (#12388)
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-03-28 09:28:14 -04:00
Hunter Morris dcaf99dcc1
client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371) 2022-03-25 11:50:52 -04:00
Luiz Aoqui 848a3b271f
docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378) 2022-03-25 10:11:46 -04:00
dgotlieb f53f61c6ce
Add grpc and http2 listeners to gateway docs (#12367)
Stating at Nomad version 1.2.0 `grpc` and `http2` [protocols are supported](https://github.com/hashicorp/nomad/pull/11187)
2022-03-24 17:09:19 -04:00
Seth Hoenig 987dda3092
Merge pull request #12274 from hashicorp/f-cgroupsv2
client: enable cpuset support for cgroups.v2
2022-03-24 14:22:54 -05:00
Seth Hoenig 113b7eb727 client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
Tim Gross ff1bed38cd
csi: add -secret and -parameter flag to volume snapshot create (#12360)
Pass-through the `-secret` and `-parameter` flags to allow setting
parameters for the snapshot and overriding the secrets we've stored on
the CSI volume in the state store.
2022-03-24 10:29:50 -04:00
Seth Hoenig 2e5c6de820 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Tim Gross 60cfeacd76
drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Luiz Aoqui 68e5b58007
cli: display Raft version in server members (#12317)
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
2022-03-17 14:15:10 -04:00
Luiz Aoqui 15089f055f
api: add related evals to eval details (#12305)
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
2022-03-17 13:56:14 -04:00
Luiz Aoqui 8db12c2a17
server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Tim Gross 3bf948dc00
docs: clarify restart inheritance and add examples (#12275)
Clarify the behavior of `restart` inheritance with respect to Connect
sidecar tasks. Remove incorrect language about the scheduler being
involved in restart decisions. Try to make the `delay` mode
documentation more clear, and provide examples of delay vs fail.
2022-03-14 15:49:08 -04:00
Luiz Aoqui 9b393d0535
docs: initial docs for the new API features (#12094) 2022-03-14 10:58:42 -04:00
Luiz Aoqui 2876739a51
api: apply consistent behaviour of the reverse query parameter (#12244) 2022-03-11 19:44:52 -05:00
Luiz Aoqui a42e64c039
docs: add namespace param to job parse API (#12258) 2022-03-10 16:35:07 -05:00
Tim Gross 5ae30849a9
docs: add note about docker DNS config when using bridge mode (#12229)
The Docker DNS configuration options are not compatible with a
group-level network in `bridge` mode. Warn users about this in the
Docker task configuration docs.
2022-03-08 11:59:20 -05:00
Merlin Scholz 68457be72c
docs: elaborate on networking issues with firewalld (#12214) 2022-03-08 09:49:29 -05:00
Mike Nomitch 3955dd36d7
Merge pull request #12192 from hashicorp/website/add-new-tools
Add openapi and caravan to tools page
2022-03-07 11:21:24 -08:00
Ignacio Torres Masdeu 2793054147
docs: fix examples for set_contains_all and set_contains_any (#12093) 2022-03-07 13:55:57 -05:00
Michael Schurter 7bb8de68e5
Merge pull request #12138 from jorgemarey/f-ns-meta
Add metadata to namespaces
2022-03-07 10:19:33 -08:00
Tim Gross b94837a2b8
csi: add pagination args to volume snapshot list (#12193)
The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.
2022-03-07 12:19:28 -05:00
Tim Gross 09a7612150
csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Michael Schurter 69913d6ac5 docs: add meta to namespace docs 2022-03-04 14:18:57 -08:00
Mike Nomitch 32bc5638a0
Updated OpenAPI info on tools page
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-03-04 12:54:08 -08:00
Mike Nomitch 0129f7f1a5 Add openapi and caravan to tools page 2022-03-04 09:56:21 -06:00
James Rasell 6aa741dd16
docs: add note regarding HCLv2 func and interpolation. 2022-03-04 12:06:25 +01:00
Michael Schurter 0f6923c750
Merge pull request #10808 from hashicorp/f-curl
cli: add operator api command
2022-03-02 10:12:16 -08:00
Michael Schurter a8833b7d86 docs: add op api examples 2022-03-01 17:15:26 -08:00
Michael Schurter 72134ef5a7 docs: add op api examples 2022-03-01 17:12:58 -08:00
Michael Schurter fcf4515875 docs: add op api options 2022-03-01 16:43:53 -08:00
Ashlee M Boyer c3691a44df
docs: Fixing path for autoscaling/agent/source nav item (#12166) 2022-03-01 17:24:12 -05:00
Tim Gross f2a4ad0949
CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Tim Gross c90e674918
CSI: use HTTP headers for passing CSI secrets (#12144) 2022-03-01 08:47:01 -05:00
Tim Gross ca06f6153a
docs: clarify that plugin commands are for CSI only (#12151) 2022-03-01 07:57:41 -05:00
Jorge Marey a466f01120 Add metadata to namespaces 2022-02-27 09:09:10 +01:00
Seth Hoenig 5269b2e02f docs: clairfy advertise.rpc effect
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.

The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.

Closes https://github.com/hashicorp/nomad/issues/11075
2022-02-25 16:02:29 -06:00
Michael Schurter bb3daac628 rename nomad curl to nomad operator api 2022-02-24 15:52:54 -08:00
Michael Schurter 141db0c562 cli: add curl command
Just a hackweek project at this point.
2022-02-24 15:52:54 -08:00
Luiz Aoqui 61d79e75b0
docs: add docs for the autoscaler on_error and on_check_error configuration (#12083) 2022-02-24 12:12:29 -05:00
Sander Mol 42b338308f
add go-sockaddr templating support to nomad consul address (#12084) 2022-02-24 09:34:54 -05:00
Florian Apolloner 3bced8f558
namespaces: allow enabling/disabling allowed drivers per namespace 2022-02-24 09:27:32 -05:00
Seth Hoenig 8e6d97744b docs: emphasize snapshot before upgrading 2022-02-24 08:22:41 -06:00
Seth Hoenig de95998faa core: switch to go.etc.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etc.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Tim Gross 246db87a74
CSI: allow for concurrent plugin allocations (#12078)
The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
2022-02-23 15:23:07 -05:00
Charlie Voiselle 01f6e57602
Fixed scheduler config examples (#12049) 2022-02-23 12:58:29 -05:00
Mike Nomitch f3d1cf4dbd
Merge pull request #12065 from hashicorp/docs-add-form-link
Adding link to interview form
2022-02-22 11:05:20 -08:00
Luiz Aoqui 02ee075506
docs: update link to mount in Docker task driver (#12101) 2022-02-22 13:39:49 -05:00
Michael Schurter 7494a0c4fd core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without increment
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either deadcode or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Adrián López b1565c7bf4
Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977) 2022-02-18 17:29:19 -05:00
James Rasell f2d73442e8
docs: add autoscaler hcloud target plugin link. (#12087) 2022-02-18 17:28:38 -05:00
Luiz Aoqui 110dbeeb9d
Add go-bexpr filters to evals and deployment list endpoints (#12034) 2022-02-16 11:40:30 -05:00
Tiernan c30b4617aa
interpolate network.dns block on client (#12021) 2022-02-16 08:39:44 -05:00
Seth Hoenig 40c714a681 api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Mike Nomitch 8377f5cfe3 Adding link to interview form 2022-02-14 12:38:26 -08:00
James Rasell 926458c5b2
Merge pull request #12053 from marcaurele/fix-typo
doc(typo): technical typo in advertised example
2022-02-11 14:27:12 +01:00
Luiz Aoqui d976e4a19b
docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) 2022-02-10 15:35:16 -05:00
Marc-Aurèle Brothier fb80dc57a1
small typo in advertised example 2022-02-10 13:53:05 +01:00
Tim Gross 59c8558969
docs and changelog for nomad config validate (#12031) 2022-02-09 10:20:45 -05:00
Tim Gross 7ad15b2b42
raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
2022-02-03 15:03:12 -05:00
René Moser 05db861938
api-docs: add SysBatchSchedulerEnabled docs (#11973) 2022-02-02 16:54:47 -05:00
James Rasell a7f569d0e1
docs: add cores to client reserved config block. 2022-01-26 15:56:16 +01:00
Dan Norris 160682cf2b
docs: Update volume create/register mount options to use []string example (#11912)
The examples for `nomad volume create` and `nomad volume register` are
not setting `mount_flags` using an array of strings.

This fixes the issue by changing the example to be `mount_flags =
["noatime"]`.
2022-01-24 11:34:21 -05:00
Luiz Aoqui 626e633b41
docs: add nomad.plan.node_rejected metric (#11860) 2022-01-18 13:47:20 -05:00
Dave May 330d24a873
cli: Add event stream capture to nomad operator debug (#11865) 2022-01-17 21:35:51 -05:00
Luiz Aoqui ed9f277925
docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840) 2022-01-17 11:09:08 -05:00
Luiz Aoqui f981a1ed7e
docs: add HashiBox to the list of community tools (#11861) 2022-01-17 11:08:41 -05:00
James Rasell 82b168bf34
Merge pull request #11403 from hashicorp/f-gh-11059
agent/docs: add better clarification when top-level data dir needs setting
2022-01-13 16:41:35 +01:00
Luiz Aoqui 7e6acf0e68
docs: fix autoscaling Datadog site configuration (#11824) 2022-01-12 21:06:30 -05:00
Derek Strickland 0a8e03f0f7
Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Tim Gross fa64822e49
docs: note that clients need to have ACLs enabled (#11799)
Client endpoints such as `alloc exec` are enforced on the client if
the API client or CLI has "line of sight" to the client. This is
already in the Learn guide but having it in the ACL configuration docs
would be helpful.
2022-01-07 16:18:41 -05:00
Tim Gross 32f150d469
docs: new scheduler metrics (#11790)
* Fixed name of `nomad.scheduler.allocs.reschedule` metric
* Added new metrics to metrics reference documentation
* Expanded definitions of "waiting" metrics
* Changelog entry for #10236 and #10237
2022-01-07 09:51:15 -05:00
Charlie Voiselle 98a240cd99
Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
James Rasell 1f4e100edc
Merge pull request #11762 from hashicorp/b-gh-11681
docs: add 1.2.0 HCLv2 strict parsing upgrade note.
2022-01-04 09:30:09 +01:00
Tim Gross 6b1b3e7ef8
docs: fix attribute name for java version detection (#11764) 2022-01-03 16:50:25 -05:00
James Rasell 117c79117e
docs: add 1.2.0 HCLv2 strict parsing upgrade note. 2022-01-03 15:41:18 +00:00
Tim Gross 2806dc2bd7
docs/tests for multiple HTTP address config (#11760) 2022-01-03 10:17:13 -05:00
Kevin Schoonover 5d9a506bc0
agent: support multiple http address in addresses.http (#11582) 2022-01-03 09:33:53 -05:00
Tim Gross 395628efe1
api: paginate deployment list and accept wildcard namespace (#11743)
Add `per_page` and `next_token` handling to `Deployment.List` RPC, and
allow the use of a wildcard namespace for namespace filtering.
2022-01-03 08:36:02 -05:00
Shishir 65eab35412
Add support for setting pids_limit in docker plugin config. (#11526) 2021-12-21 13:31:34 -05:00
Tim Gross b0c3b99b03
scheduler: fix quadratic performance with spread blocks (#11712)
When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout to evaluate (60s). Typical
evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.
2021-12-21 10:10:01 -05:00
Andy Assareh 8ba4e063e2
Mesh Gateway doc enhancements (#11354)
* Mesh Gateway doc enhancements

1. I believe this line should be corrected to add mesh as one of the choices
2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.
2021-12-20 17:10:44 -05:00
Guilherme ae05515b50
Fix 'check calculations' link (#11420) 2021-12-20 17:09:15 -05:00
Tim Gross e046bb31e9
api: respect wildcard in evaluations list API (#11710) 2021-12-20 12:23:50 -05:00
Luiz Aoqui a46d799f2a
docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689) 2021-12-16 14:32:20 -05:00
Luiz Aoqui 4b39494cd1
docs: add more references and examples to the template block (#11691) 2021-12-16 14:14:01 -05:00
Tim Gross f2615992a4
cli: unhide advanced operator raft debugging commands (#11682)
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
2021-12-16 10:32:11 -05:00
Tim Gross 536e3c5282
nomad eval list command (#11675)
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.

Also includes note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
2021-12-15 11:58:38 -05:00
Noel Quiles 235a778a56
website: Copy updates (#11677) 2021-12-14 16:35:21 -05:00
Tim Gross a0cf5db797
provide -no-shutdown-delay flag for job/alloc stop (#11596)
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.

Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
2021-12-13 14:54:53 -05:00
Tim Gross 624ecab901
evaluations list pagination and filtering (#11648)
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.

Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime callers can get the `X-Nomad-NextToken`.

Add pagination to the `Eval.List` RPC by checking for pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.

Add filtering by job ID or evaluation status over the results we get
out of the state store.

Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
2021-12-10 13:43:03 -05:00
Kevin Wang 3e6757f211
feat(website): extract /plugins /tools docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00
Lukas W 0e5958d671
CLI: Return non-zero exit code when deployment fails in nomad run (#11550)
* Exit non-zero from run command if deployment fails
* Fix typo in deployment monitor introduced in 0edda11
2021-12-09 09:09:28 -05:00
Tim Gross 348f482c94
docs: improve docs for troubleshooting and monitoring scheduler (#11623)
This changeset adds more specific recommendations as to what metrics
to monitor, and what resources should be examined during incident
response.

It also renames the "Telemetry" section to "Monitoring Nomad" to
surface the material better and distinguish it from the "Metric
Reference".

Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
2021-12-07 15:52:13 -05:00
James Rasell d44e5620dd
docs: add license expiry metric to metrics website doc. 2021-12-07 10:31:51 +00:00
Shantanu Gadgil 0838678609
mention sysbatch in addition to batch (#11587) 2021-12-06 19:12:03 -05:00
Tim Gross 03e697a69d
scheduler: config option to reject job registration (#11610)
During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
2021-12-06 15:20:34 -05:00
Tim Gross 39acac33a0
ui: change Consul/Vault base URL field name (#11589)
Give ourselves some room for extension in the UI configuration block
by naming the field `ui_url`, which will let us have an `api_url`.
Fix the template path to ensure we're getting the right value from the
API.
2021-11-30 13:20:29 -05:00
James Rasell e34bb8ab1d
Merge pull request #11577 from hashicorp/b-gh-11576
docs: add deprecation note to old style network task env vars.
2021-11-30 12:15:31 +01:00
Tim Gross ba038a1ebc
docs: mount_flags takes a slice of strings (#11583)
The `mount_flags` option takes a slice of strings, not a
comma-separated string like the flags passed to `mount(8)`.
2021-11-29 10:07:34 -05:00
James Rasell 0260cc6306
docs: add deprecation note to old style network task env vars. 2021-11-25 12:58:32 +01:00
Luiz Aoqui 0b82d62bc6
docs: document new Prometheus configuration for the Autoscaler APM plugin (#11562) 2021-11-24 17:37:35 -05:00
Luiz Aoqui 0859eac724
docs: add CLI and config docs for the Autoscaler policy source config (#11559) 2021-11-24 16:17:37 -05:00
Luiz Aoqui fa23106612
docs: add upgrade guide notes for Nomad 1.2.2 (#11567) 2021-11-24 14:24:20 -05:00
Tim Gross fcb96de9a7
config: UI configuration block with Vault/Consul links (#11555)
Add `ui` block to agent configuration to enable/disable the web UI and
provide the web UI with links to Vault/Consul.
2021-11-24 11:20:02 -05:00
James Rasell 6dddf9a1fb
Merge pull request #11535 from hashicorp/docs-vault-token
docs: clarify vault.token only required on servers
2021-11-23 09:26:06 +01:00
James Rasell 751c8217d1
core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now all for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
Luiz Aoqui d3c1a03edd Version 1.2.1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJhl94SAAoJELC0QQl2hbZ2pqoP/R7HyOxvealo5MBJcG4mGiWT
 Hsu9VXpYKDWn0GSXd3JmqYWH7tIwFMXispZ7pMlDLieypW3UpMYIbIquaePxOaRL
 yhlc0CLT7JDsFPx8Puv1fgKXaS3EfFyJlYx437bhCQ+K0k2+1n3EOhrzU/DQ4j8V
 D5qxlkZh6IK6brIJ54NivGzTxtzGGvIGXCrDPolX3cwoBtyO/pbecfEkRlN2xwxl
 P68l52+Jit3lK2Cljh4Kr1qFj8voHPjYUTXGas8ZkIVrx9l4fb6CHib2y3hy4bRR
 qwXT4keWc8bxtLQ7vtetGBAXp4UKJigziE4imhHAttBN9th2/Oy0qSQCNX3xELJC
 Jwgc+N+ON63QI2sP/8FWvmeUrJpASRITYl/Gr8uOR6n1PacrBhFT9OV4VMkte1ua
 jS/WF/7k21NZYqZca+thvN12wmw/gSEAEeCHH5kR3vPLeV6FdanhKLjufMNuMShc
 UKJCEZw1/Lyux1XkLqMPoZ4DCak8/HskupQoLNsekF1Uki8ObU4as7GERedxqkj6
 i2+1QIQMqvviskOwT0QOWm4RFXjRQsIK8uUfXzHHWDMzDhvnGjB0eWVMLAj4/rTe
 46yUP4kdarFkxwkDmLEyoogdD35wC4Xc8Y8IynzUTN77pOWID5QEyFZVaaBB4NR3
 wNowUJGrNkxEYXwGSkjh
 =Zuw2
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEElFaq1Z5DKdB91i+lKfRZwNnLtXMFAmGbu3sACgkQKfRZwNnL
 tXMx4BAAksQ07tSoOku8zDwx2JpoiNApoYhMLlfJ4S3Mw+RYtbayAMRyA08GG56I
 U85XJB/Z2CzliYL/Nya1e3z6Gyn92V0iD9u7N1xEAPt8PdyiXqIBZn1rWoiCcnMO
 C3f2aRGhLZMVOZG0v7fgbh1PkhJt4MLcRQE9nn5ojPvFzW9bL0Iz7lc9IxHQtaU0
 rANDcXdj3IhiOdEgjtO++Qhdeu3t2SBhT2xFnlJ3gXC2q/aY1a2C7BYdlSxtw0JU
 nKpxvBTsB7rINGcYxhXZlckui5YLL4BX11XqsYhUTMC+33vxE5HNty1ANc1+SNyO
 0iHp0yc5J6MCLuiZ/2sBek2tC+KHCufb+qEIqPmBpcWPJRT8HjginLxj/HyL2TQc
 pLF9XxhYKvv0sm3Zr3Ima5kqWgayph3XhQ73hKs9f7SLfErr6qr4XaI8egZA4OTG
 0QGmY/61UlAdsz5tUvIGRWYD5rqXyXIYnUprldPSQdeZ0o2GjX7T0GZ934O5uHfE
 Ne73GafGn8JaGxH9+AEHMJAVpkrzWR1wrExL3kGJ8NF40HlsYofIuhTkZqMKX3EH
 7KfefSJW1NQAGeAEwjtvzhmUiM0cVoCWGd4COxX1G3oJ0o8gZ3RklDEA4Pa9C0rO
 pBW/KIckPpGieGvPaA3mqmXDjx6oOaxPi9wd5TniBHh43pgrASo=
 =KVce
 -----END PGP SIGNATURE-----

Merge tag 'v1.2.1' into merge-release-1.2.1-branch

Version 1.2.1
2021-11-22 10:47:04 -05:00
Tim Gross fc1d4814d9
qemu: add args_allowlist to sandbox VM command line inputs
The QEMU driver allows arbitrary command line options, but many of
these options give access to host resources that operators may not
want to expose such as devices. Add an optional allowlist to the
plugin configuration so that operators can limit the resources for
QEMU.
2021-11-19 11:11:52 -05:00
James Rasell 88cc158ae1
docs: add global query param to API job deregister endpoint. 2021-11-19 13:45:24 +01:00
Michael Schurter cfe4922213 docs: clarify vault.token only required on servers
While it *is* clarified toward the bottom of this page, I've seen people
go to great lengths to configure tokens for clients anyway, so I think
it's worth noting on the parameter's docs as well.
2021-11-18 16:34:59 -08:00
Luiz Aoqui 12feb598af
docs: add note about the Nomad APM autoscaling plugin and scaling cluster to zero (#11494) 2021-11-16 11:58:26 -05:00
Luiz Aoqui 9a09fe160c
docs: remove mutual-exclusion between node class and datacenter in scaling policies (#11499) 2021-11-16 11:58:14 -05:00
kfenech1 26a0158ead
docs: nomad.client.unallocated.memory is in Megabytes not bytes (#11468) 2021-11-08 11:05:11 -05:00
Alessandro De Blasis 07c670fdc0
cli: show host_network in nomad status (#11432)
Enhance the CLI in order to return the host network in two flavors 
(default, verbose) of the `node status` command.

Fixes: #11223.
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2021-11-05 09:02:46 -04:00
James Rasell 503f201415
Merge pull request #11444 from hashicorp/b-update-apidocs-alloclist-sample-resp
docs: update API alloc list sample response to be current.
2021-11-05 08:09:23 +01:00
Florian Apolloner ef88795af3
Added a -hcl2-strict flag to allow for lenient hcl variable parsing. (#11284)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2021-11-04 16:33:09 +01:00
James Rasell 992abe6597
Merge pull request #11333 from hashicorp/assareh-patch-1
exactly one of ingress, terminating, or mesh must be configured
2021-11-04 11:13:04 +01:00
James Rasell 01ecb5b9ce
docs: update API alloc list same response to be current. 2021-11-04 10:22:21 +01:00
Michael Schurter ef3fc79225
Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir
client: never embed alloc_dir in chroot
2021-11-03 16:48:09 -07:00
Luiz Aoqui 4fb5b8b6e7
docs: update podman driver documentation (#11300) 2021-11-03 11:07:44 -04:00
Luiz Aoqui 5d204c8ced
Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799)" (#11433) 2021-11-02 17:42:52 -04:00
James Rasell 163f2eadd0
Merge pull request #11425 from hashicorp/b-add-timeout-consul-docs
docs: document Consul timeout config parameter.
2021-11-02 15:28:34 +01:00
James Rasell c071efbd6b
Merge pull request #11411 from hashicorp/f-gh-11406
cli: add json and template flag opts to acl bootstrap command.
2021-11-02 09:48:25 +01:00
James Rasell 9d0fe24e25
docs: document Consul timeout config parameter. 2021-11-02 08:28:45 +01:00
James Rasell 46564ac579
docs: update acl bootstrap command to show json and template opts. 2021-10-29 09:01:58 +02:00
Pavel Alimpiev 068066cb0e
Fix typo in documentation 2021-10-29 03:31:53 +03:00
James Rasell d6388db576
docs: clarify server data_dir config needs top-level data_dir cfg. 2021-10-28 13:07:37 +02:00
Dave May 509c74ce19
debug: update default node-id and docs (#11398)
* debug: default node-id to all
* debug: align cli help and website documentation
2021-10-27 13:43:56 -04:00
Mike Nomitch 569a55675b
Replaces accidental use of Vault with Nomad (#11355) 2021-10-27 08:35:31 -07:00
Luiz Aoqui ecc7a288ec
docs: add note and example of storing nomad job plan index to disk (#11377) 2021-10-26 20:25:22 -04:00
Charlie Voiselle 7d02c8b605
DOCS: Update Consul Connect to Consul service mesh (#11362)
* Update Consul Connect to Consul service mesh
* Apply suggestions from code review
2021-10-26 15:10:21 -04:00
Luiz Aoqui 3c22fc79a5
add dispatch idempotency token support in the CLI (#10930) 2021-10-22 12:39:05 -04:00
Luiz Aoqui 6853bf9632
cli: allow setting namespace and region in the nomad ui command (#11364) 2021-10-21 16:24:39 -04:00
James Rasell 6011411111
Merge pull request #11339 from hashicorp/b-website-fixup-interpolation-formatting
website: fixup link formatting within interpolation doc.
2021-10-21 09:15:36 +02:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
James Rasell 2f5f6e0fdd
website: fixup link formatting within interpolation doc. 2021-10-18 12:21:05 +02:00
Andy Assareh 8c638217ac
exactly one of ingress, terminating, or mesh must be configured
i believe mesh should be included in this statement was omitted.
2021-10-15 14:15:02 -07:00
Shishir Mahajan d4daef7ebf Add support for --init to docker driver.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-15 12:53:25 -07:00
Luiz Aoqui f1fb0987ab
docs: update Nvidia device plugin as external (#11313) 2021-10-14 12:22:31 -04:00
Charlie Voiselle cb8e52b5df
Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter 59fda1894e
Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Jorge Marey 2af0422bca
Add os-nova nomad autoscaler repo link (#11277) 2021-10-12 17:04:58 -04:00
Dave May 76b05f3cd2
cli: Add nomad job allocs command (#11242) 2021-10-12 16:30:36 -04:00
Matt Mukerjee b56432e645
Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessary long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Amit Shuster 188be1b5df
Lightrun Integration - External task driver (#11203) 2021-10-06 15:34:34 -04:00
Florian Apolloner 0fa60dae9d
Added support for -force-color to the CLI. (#10975) 2021-10-06 10:02:42 -04:00
Yan 6ff0b6debc
add -show-url option for ui command (#11213) 2021-10-05 20:08:42 -04:00
Luiz Aoqui 63d1ac8939
docs: document that network mode is only supported on Linux (#11192)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-10-01 23:17:20 -04:00
Michael Schurter 5530392256 docs: add new client.{min,max}_dynamic_port params 2021-09-30 17:10:28 -07:00
Tim Gross 6800485dcb devices: externalize nvidia device driver 2021-09-29 13:43:37 -07:00
Luiz Aoqui a7872f0ba5
docs: add Nomad version requirement note for sysbatch (#11231) 2021-09-29 15:14:51 -04:00
jmwilkinson d88b224248
Update filesystem.mdx (#11182)
* Update filesystem.mdx

Update summary of alloc directory to include information on access differences between task drivers and filesystem isolation modes.

Co-authored-by: Tim Gross <tim@0x74696d.com>
2021-09-27 16:36:04 -07:00
James Rasell 8e4cc1b88b
Merge pull request #11224 from hashicorp/b-docs-node-eval-apidocs
docs: fix API docs node evaluate example call.
2021-09-24 15:18:49 +02:00
James Rasell 10f0fc3cc5
docs: fix API docs node evaluate example call. 2021-09-24 10:28:22 +01:00
Charlie Voiselle e707012136
Clarify that reservation example
The current wording can lead someone to believe that you can use percentage values.
2021-09-22 18:30:39 -04:00
Michael Schurter 0745fdbcf6
Merge pull request #11215 from hashicorp/b-license-env-deny
client: add NOMAD_LICENSE to default env deny list
2021-09-21 16:53:26 -07:00
Luiz Aoqui 8d19831247
docs: add some extra documentation around client host environment variables (#11208)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-09-21 17:23:30 -04:00
Michael Schurter 4ad0c258b9 client: add NOMAD_LICENSE to default env deny list
By default we should not expose the NOMAD_LICENSE environment variable
to tasks.

Also refactor where the DefaultEnvDenyList lives so we don't have to
maintain 2 copies of it. Since client/config is the most obvious
location, keep a reference there to its unfortunate home buried deep
in command/agent/host. Since the agent uses this list as well for the
/agent/host endpoint the list must be accessible from both command/agent
and client.
2021-09-21 13:51:17 -07:00
Michael Schurter aa241fb87f docs: add upgrade guide entry for audit log naming 2021-09-16 16:19:52 -07:00
James Rasell b5039c96a4
docs: add network.hostname job specification website entry. 2021-09-15 11:43:47 +02:00
Joel Watson 7e100cc682
Merge pull request #11145 from hashicorp/watsonian/gpu-update
docs: Update Nvidia GPU installation instructions
2021-09-09 10:19:18 -05:00
Andy Assareh 40790017fd
typo - capability (#11152) 2021-09-08 14:34:02 -07:00
Joel Watson 4d0fde00f5 docs: Update Nvidia GPU installation instructions 2021-09-07 15:26:32 -05:00
Forest Anderson 3d68bf81d6
Change dashboard port to http (#11129) 2021-09-03 20:34:40 -04:00
Andy Assareh 60df2a2d0f
suggest changing port number to nomad default (#11140)
i found this confusing since 8300 is associated with consul. suggest using more nomad ports
2021-09-03 20:15:32 -04:00
Isabel Suchanek ab51050ce8
events: fix wildcard namespace handling (#10935)
This fixes a bug in the event stream API where it currently interprets
namespace=* as an actual namespace, not a wildcard. When Nomad parses
incoming requests, it sets namespace to default if not specified, which
means the request namespace will never be an empty string, which is what
the event subscription was checking for. This changes the conditional
logic to check for a wildcard namespace instead of an empty one.

It also updates some event tests to include the default namespace in the
subscription to match current behavior.

Fixes #10903
2021-09-02 09:36:55 -07:00
Luiz Aoqui f09d5ebcd6
Document Docker extra_hosts behaviour post v1.1.3 (#11079)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2021-09-01 12:41:06 -04:00
Michael Lange c186628975
Merge pull request #11101 from hashicorp/d/event-stream-ndjson
Mention the ndjson standard format the event stream uses
2021-08-31 11:55:00 -07:00
Derek Strickland a705e84e77
Add firewall statement to requirements (#11106)
This PR adds a sentence about configuring your firewall to allow required Nomad ports. This is being added to help search discoverability.

This closes issue #11076
2021-08-31 10:29:33 -04:00
Michael Lange 1340c82144
Mention the ndjson standard format the event stream uses
Knowing this upfront is important when looking for common libraries to help consume events.
2021-08-30 11:53:38 -07:00
Mahmood Ali 483d30f578
release 1.1.4 (#11088) 2021-08-30 11:43:05 -04:00
James Rasell 4dd5c47a47
Merge pull request #11091 from hashicorp/consolidate-cni-plugins-to-1.0.0
cni: consolidate cni plugins within test install and docs to use v1.0.0
2021-08-30 09:39:39 +02:00
Mahmood Ali 53f11e0080
docs: note env and meta map assignment syntax (#11095) 2021-08-29 14:35:09 -04:00
James Rasell ec221ab792
docs: update website to detail cni plugins v1.0.0 2021-08-27 11:15:25 +02:00
Luiz Aoqui be2389c8ad
Update scaling and policy blocks documentation (#11071)
* website: update `scaling` and `policy` blocks documentation

* website: hclfmt examples in scaling block docs
2021-08-25 09:10:18 -04:00
James Rasell 75be5acb08
Merge pull request #11042 from hashicorp/docs-remove-ingress-host-port-callout
docs: Remove note on ingress gateway hosts field needing a port number
2021-08-25 12:29:59 +02:00
Luiz Aoqui 104d29e808
Don't timestamp active log file (#11070)
* don't timestamp active log file

* website: update log_file default value

* changelog: add entry for #11070

* website: add upgrade instructions for log_file in v1.14 and v1.2.0
2021-08-23 11:27:34 -04:00
Zachary Shilton f87ae568d9
Upgrade global styles (#10936)
* website: upgrade global-styles packages

* website: upgrade community page

* website: hide alert-banner on mobile

* website: upgrade g-container to g-grid-container

* website: update /security to use markdown-page

* website: fix unsupported prop

* website: fix incorrect github link in security page

* website: bump to latest patched dependencies
2021-08-20 11:53:12 -04:00