Commit Graph

4127 Commits

Author SHA1 Message Date
Luiz Aoqui 5e642a4742
add Nomad v1.3.0-beta.1 download box (#12517) 2022-04-08 12:04:14 -04:00
James Rasell 6ac5fd9768
docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Seth Hoenig e7aa81d3cb docs: tweak hcl2 validation example 2022-04-08 08:43:42 -05:00
Thomas Wunderlich 3f6465f078
Add custom variable validation to docs
Custom variable validation is a useful feature that is supported by
Nomad and not just Terraform. As such it should be documented on the
input variable page.
I've cribbed the content from the terraform docs so this should be
consistent across projects
2022-04-07 19:06:06 -04:00
Jasmine Dahilig 386f2fac3a
docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505) 2022-04-07 15:12:41 -07:00
Tim Gross 09b5e8d388
Fix flaky `operator debug` test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Seth Hoenig 0870aa31dc client: set environment variable indicating set of reserved cpu cores
This PR injects the 'NOMAD_CPU_CORES' environment variable into
tasks that have been allocated reserved cpu cores. The value uses
normal cpuset notation, as found in cpuset.cpu cgroup interface files.

Note this value is not necessiarly the same as the content of the actual
cpuset.cpus interface file, which will also include shared cpu cores when
using cgroups v2. This variable is a workaround for users who used to be
able to read the reserved cgroup cpuset file, but lose the information
about distinct reserved cores when using cgroups v2.

Side discussion in: https://github.com/hashicorp/nomad/issues/12374
2022-04-07 09:09:35 -05:00
Jasmine Dahilig f67b108f9f
docs: update vault-token note in job run command #8040 (#12385) 2022-04-06 10:01:38 -07:00
James Rasell 7096fecd10
website: add initial website docs for Nomad service discovery. (#12456) 2022-04-06 18:51:14 +02:00
Derek Strickland 0ab89b1728
Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling
disconnected clients: Feature branch merge
2022-04-06 10:11:57 -04:00
Mike Nomitch 7405ebbad1
Add max client disconnect docs (#12467)
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-04-06 08:54:14 -04:00
Seth Hoenig 2e2ff3f75e
Merge pull request #12419 from hashicorp/exec-cleanup
raw_exec: make raw exec driver work with cgroups v2
2022-04-05 16:42:01 -05:00
Tim Gross 5b9772e68f
docs: updates for CSI plugin improvements for 1.3.0 (#12466) 2022-04-05 17:13:51 -04:00
Derek Strickland 8e9f8be511 `MaxClientDisconnect` Jobspec checklist (#12177)
* api: Add struct, conversion function, and tests
* TaskGroup: Add field, validation, and tests
* diff: Add diff handler and test
* docs: Update docs
2022-04-05 17:12:23 -04:00
Derek Strickland d7f44448e1 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Shishir a6801f73d1
cli: add -quiet to nomad node status command. (#12426) 2022-04-05 15:53:43 -04:00
Luiz Aoqui ab7eb5de6e
Support Vault entity aliases (#12449)
Move some common Vault API data struct decoding out of the Vault client
so it can be reused in other situations.

Make Vault job validation its own function so it's easier to expand it.

Rename the `Job.VaultPolicies` method to just `Job.Vault` since it
returns the full Vault block, not just their policies.

Set `ChangeMode` on `Vault.Canonicalize`.

Add some missing tests.

Allows specifying an entity alias that will be used by Nomad when
deriving the task Vault token.

An entity alias assigns an indentity to a token, allowing better control
and management of Vault clients since all tokens with the same indentity
alias will now be considered the same client. This helps track Nomad
activity in Vault's audit logs and better control over Vault billing.

Add support for a new Nomad server configuration to define a default
entity alias to be used when deriving Vault tokens. This default value
will be used if the task doesn't have an entity alias defined.
2022-04-05 14:18:10 -04:00
Grant Griffiths 18a0a2c9a4
CSI: Add secrets flag support for delete volume (#11245) 2022-04-05 08:59:11 -04:00
Seth Hoenig 52aaf86f52 raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Danish Prakash e7e8ce212e
command/operator_debug: add pprof interval (#11938) 2022-04-04 15:24:12 -04:00
Seth Hoenig f9b0ffafde
Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo
docs: fix typo in system batch description
2022-04-01 09:58:06 -05:00
Seth Hoenig e9eacb1153 docs: fix typo in system batch description 2022-04-01 09:46:03 -05:00
Bryce Kalow 9b0d77ae78
website: redirect /api to api-docs and update internal links (#12410) 2022-03-31 11:33:27 -05:00
Tim Gross 8dccc43c2f
docs: remove deprecated client options parameters docs (#12416)
The client configuration options for drivers have been deprecated
since 0.9. We haven't torn them out completely but because they're
deprecated it's been hard to guarantee correct behavior. Remove the
documentation so that users aren't misled about their viability.
2022-03-31 11:45:51 -04:00
Michael Schurter cae69ba8ce
Merge pull request #12312 from hashicorp/f-writeToFile
template: disallow `writeToFile` by default
2022-03-29 13:41:59 -07:00
Tim Gross 03c1904112
csi: allow `namespace` field to be passed in volume spec (#12400)
Use the volume spec's `namespace` field to override the value of the
`-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.
2022-03-29 14:46:39 -04:00
Michael Schurter 33fe04ff6a
template: fix comments and docs
Review notes from @lgfa29

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-03-29 09:25:23 -07:00
Michael Schurter 7a28fcb8af template: disallow `writeToFile` by default
Resolves #12095 by WONTFIXing it.

This approach disables `writeToFile` as it allows arbitrary host
filesystem writes and is only a small quality of life improvement over
multiple `template` stanzas.

This approach has the significant downside of leaving people who have
altered their `template.function_denylist` *still vulnerable!* I added
an upgrade note, but we should have implemented the denylist as a
`map[string]bool` so that new funcs could be denied without overriding
custom configurations.

This PR also includes a bug fix that broke enabling all consul-template
funcs. We repeatedly failed to differentiate between a nil (unset)
denylist and an empty (allow all) one.
2022-03-28 17:05:42 -07:00
Shishir afcce3eea5
Display OS name in nomad node status command. (#12388)
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-03-28 09:28:14 -04:00
Hunter Morris dcaf99dcc1
client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371) 2022-03-25 11:50:52 -04:00
Luiz Aoqui 848a3b271f
docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378) 2022-03-25 10:11:46 -04:00
dgotlieb f53f61c6ce
Add grpc and http2 listeners to gateway docs (#12367)
Stating at Nomad version 1.2.0 `grpc` and `http2` [protocols are supported](https://github.com/hashicorp/nomad/pull/11187)
2022-03-24 17:09:19 -04:00
Seth Hoenig 987dda3092
Merge pull request #12274 from hashicorp/f-cgroupsv2
client: enable cpuset support for cgroups.v2
2022-03-24 14:22:54 -05:00
Seth Hoenig 113b7eb727 client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
Tim Gross ff1bed38cd
csi: add `-secret` and `-parameter` flag to `volume snapshot create` (#12360)
Pass-through the `-secret` and `-parameter` flags to allow setting
parameters for the snapshot and overriding the secrets we've stored on
the CSI volume in the state store.
2022-03-24 10:29:50 -04:00
Seth Hoenig 2e5c6de820 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Tim Gross 60cfeacd76
drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Luiz Aoqui 68e5b58007
cli: display Raft version in `server members` (#12317)
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
2022-03-17 14:15:10 -04:00
Luiz Aoqui 15089f055f
api: add related evals to eval details (#12305)
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
2022-03-17 13:56:14 -04:00
Luiz Aoqui 8db12c2a17
server: transfer leadership in case of error (#12293)
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
2022-03-17 11:10:57 -04:00
Tim Gross 3bf948dc00
docs: clarify `restart` inheritance and add examples (#12275)
Clarify the behavior of `restart` inheritance with respect to Connect
sidecar tasks. Remove incorrect language about the scheduler being
involved in restart decisions. Try to make the `delay` mode
documentation more clear, and provide examples of delay vs fail.
2022-03-14 15:49:08 -04:00
Luiz Aoqui 9b393d0535
docs: initial docs for the new API features (#12094) 2022-03-14 10:58:42 -04:00
Luiz Aoqui 2876739a51
api: apply consistent behaviour of the reverse query parameter (#12244) 2022-03-11 19:44:52 -05:00
Luiz Aoqui a42e64c039
docs: add namespace param to job parse API (#12258) 2022-03-10 16:35:07 -05:00
Tim Gross 5ae30849a9
docs: add note about docker DNS config when using bridge mode (#12229)
The Docker DNS configuration options are not compatible with a
group-level network in `bridge` mode. Warn users about this in the
Docker task configuration docs.
2022-03-08 11:59:20 -05:00
Merlin Scholz 68457be72c
docs: elaborate on networking issues with firewalld (#12214) 2022-03-08 09:49:29 -05:00
Mike Nomitch 3955dd36d7
Merge pull request #12192 from hashicorp/website/add-new-tools
Add openapi and caravan to tools page
2022-03-07 11:21:24 -08:00
Ignacio Torres Masdeu 2793054147
docs: fix examples for set_contains_all and set_contains_any (#12093) 2022-03-07 13:55:57 -05:00
Michael Schurter 7bb8de68e5
Merge pull request #12138 from jorgemarey/f-ns-meta
Add metadata to namespaces
2022-03-07 10:19:33 -08:00
Tim Gross b94837a2b8
csi: add pagination args to `volume snapshot list` (#12193)
The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.
2022-03-07 12:19:28 -05:00
Tim Gross 09a7612150
csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Michael Schurter 69913d6ac5 docs: add meta to namespace docs 2022-03-04 14:18:57 -08:00
Mike Nomitch 32bc5638a0
Updated OpenAPI info on tools page
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-03-04 12:54:08 -08:00
Mike Nomitch 0129f7f1a5 Add openapi and caravan to tools page 2022-03-04 09:56:21 -06:00
James Rasell 6aa741dd16
docs: add note regarding HCLv2 func and interpolation. 2022-03-04 12:06:25 +01:00
Michael Schurter 0f6923c750
Merge pull request #10808 from hashicorp/f-curl
cli: add operator api command
2022-03-02 10:12:16 -08:00
Michael Schurter a8833b7d86 docs: add op api examples 2022-03-01 17:15:26 -08:00
Michael Schurter 72134ef5a7 docs: add op api examples 2022-03-01 17:12:58 -08:00
Michael Schurter fcf4515875 docs: add op api options 2022-03-01 16:43:53 -08:00
Ashlee M Boyer c3691a44df
docs: Fixing path for autoscaling/agent/source nav item (#12166) 2022-03-01 17:24:12 -05:00
Tim Gross f2a4ad0949
CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Tim Gross c90e674918
CSI: use HTTP headers for passing CSI secrets (#12144) 2022-03-01 08:47:01 -05:00
Tim Gross ca06f6153a
docs: clarify that plugin commands are for CSI only (#12151) 2022-03-01 07:57:41 -05:00
Kevin Wang 166011237b
fix(website): hide version select on `/plugins` & `/tools` (#12145)
* fix(website/plugins): display version select

* fix: hide version select on `/tools` + `/plugins`
2022-02-28 12:44:08 -05:00
Jorge Marey a466f01120 Add metadata to namespaces 2022-02-27 09:09:10 +01:00
Michael Schurter aeff156177 docs: fix nav for op api 2022-02-25 16:21:14 -08:00
Seth Hoenig 5269b2e02f docs: clairfy advertise.rpc effect
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.

The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.

Closes https://github.com/hashicorp/nomad/issues/11075
2022-02-25 16:02:29 -06:00
Michael Schurter bb3daac628 rename `nomad curl` to `nomad operator api` 2022-02-24 15:52:54 -08:00
Michael Schurter 141db0c562 cli: add curl command
Just a hackweek project at this point.
2022-02-24 15:52:54 -08:00
Zachary Shilton 81521ca248
chore: bump docs-page for code-block fix (#12117)
* chore: bump to latest docs-page

* fix: bump to react-consent-manager patch

* chore: bump to consent-manager with events dep

* chore: bump to stable consent-manager release
2022-02-24 15:34:54 -05:00
Luiz Aoqui 61d79e75b0
docs: add docs for the autoscaler `on_error` and `on_check_error` configuration (#12083) 2022-02-24 12:12:29 -05:00
Sander Mol 42b338308f
add go-sockaddr templating support to nomad consul address (#12084) 2022-02-24 09:34:54 -05:00
Florian Apolloner 3bced8f558
namespaces: allow enabling/disabling allowed drivers per namespace 2022-02-24 09:27:32 -05:00
Seth Hoenig 8e6d97744b docs: emphasize snapshot before upgrading 2022-02-24 08:22:41 -06:00
Seth Hoenig de95998faa core: switch to go.etc.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etc.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Tim Gross 246db87a74
CSI: allow for concurrent plugin allocations (#12078)
The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
2022-02-23 15:23:07 -05:00
Charlie Voiselle 01f6e57602
Fixed scheduler config examples (#12049) 2022-02-23 12:58:29 -05:00
Mike Nomitch f3d1cf4dbd
Merge pull request #12065 from hashicorp/docs-add-form-link
Adding link to interview form
2022-02-22 11:05:20 -08:00
Luiz Aoqui 02ee075506
docs: update link to `mount` in Docker task driver (#12101) 2022-02-22 13:39:49 -05:00
Michael Schurter 7494a0c4fd core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without increment
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either deadcode or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Adrián López b1565c7bf4
Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977) 2022-02-18 17:29:19 -05:00
James Rasell f2d73442e8
docs: add autoscaler hcloud target plugin link. (#12087) 2022-02-18 17:28:38 -05:00
Luiz Aoqui 110dbeeb9d
Add `go-bexpr` filters to evals and deployment list endpoints (#12034) 2022-02-16 11:40:30 -05:00
Tiernan c30b4617aa
interpolate network.dns block on client (#12021) 2022-02-16 08:39:44 -05:00
Seth Hoenig 40c714a681 api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Mike Nomitch 8377f5cfe3 Adding link to interview form 2022-02-14 12:38:26 -08:00
James Rasell 926458c5b2
Merge pull request #12053 from marcaurele/fix-typo
doc(typo): technical typo in advertised example
2022-02-11 14:27:12 +01:00
Luiz Aoqui d976e4a19b
docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) 2022-02-10 15:35:16 -05:00
Luiz Aoqui 1d5b96bdf7
update download to Nomad v1.2.6 (#12042) 2022-02-10 15:33:28 -05:00
Marc-Aurèle Brothier fb80dc57a1
small typo in advertised example 2022-02-10 13:53:05 +01:00
Tim Gross 59c8558969
docs and changelog for `nomad config validate` (#12031) 2022-02-09 10:20:45 -05:00
Dylan Staley fdf67e6bb5
Merge pull request #11936 from hashicorp/ds.ie11-warning
website: display warning in IE 11
2022-02-07 13:59:41 -08:00
Dylan Staley e135369549 feat: display warning in IE 11 2022-02-04 14:25:52 -08:00
Tim Gross 7ad15b2b42
raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
2022-02-03 15:03:12 -05:00
René Moser 05db861938
api-docs: add SysBatchSchedulerEnabled docs (#11973) 2022-02-02 16:54:47 -05:00
Tim Gross 95f26b307d
update download to Nomad v1.2.5 (#11969) 2022-02-01 11:04:06 -05:00
Noel Quiles 9dcb7306da
website: Add Demandbase tag to consent manager (#11941)
* chore: Add Demandbase tag to consent manager

* fix: Add services to manager options
2022-01-28 14:37:35 -05:00
James Rasell a7f569d0e1
docs: add `cores` to client reserved config block. 2022-01-26 15:56:16 +01:00
Dan Norris 160682cf2b
docs: Update volume create/register mount options to use []string example (#11912)
The examples for `nomad volume create` and `nomad volume register` are
not setting `mount_flags` using an array of strings.

This fixes the issue by changing the example to be `mount_flags =
["noatime"]`.
2022-01-24 11:34:21 -05:00
Luiz Aoqui b7dbae650a
update download to Nomad v1.2.4 (#11880) 2022-01-19 11:10:24 -05:00
Luiz Aoqui 626e633b41
docs: add `nomad.plan.node_rejected` metric (#11860) 2022-01-18 13:47:20 -05:00
Dave May 330d24a873
cli: Add event stream capture to nomad operator debug (#11865) 2022-01-17 21:35:51 -05:00
Luiz Aoqui ed9f277925
docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840) 2022-01-17 11:09:08 -05:00
Luiz Aoqui f981a1ed7e
docs: add HashiBox to the list of community tools (#11861) 2022-01-17 11:08:41 -05:00
James Rasell 82b168bf34
Merge pull request #11403 from hashicorp/f-gh-11059
agent/docs: add better clarification when top-level data dir needs setting
2022-01-13 16:41:35 +01:00
Luiz Aoqui 7e6acf0e68
docs: fix autoscaling Datadog site configuration (#11824) 2022-01-12 21:06:30 -05:00
sara-gawlinski 37a5642f5d
Update alert-banner (#11817)
Updating banner for edge survey
2022-01-12 11:28:17 -05:00
Derek Strickland 0a8e03f0f7
Expose Consul template configuration parameters (#11606)
This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Tim Gross fa64822e49
docs: note that clients need to have ACLs enabled (#11799)
Client endpoints such as `alloc exec` are enforced on the client if
the API client or CLI has "line of sight" to the client. This is
already in the Learn guide but having it in the ACL configuration docs
would be helpful.
2022-01-07 16:18:41 -05:00
Tim Gross 32f150d469
docs: new scheduler metrics (#11790)
* Fixed name of `nomad.scheduler.allocs.reschedule` metric
* Added new metrics to metrics reference documentation
* Expanded definitions of "waiting" metrics
* Changelog entry for #10236 and #10237
2022-01-07 09:51:15 -05:00
Charlie Voiselle 98a240cd99
Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
James Rasell 1f4e100edc
Merge pull request #11762 from hashicorp/b-gh-11681
docs: add 1.2.0 HCLv2 strict parsing upgrade note.
2022-01-04 09:30:09 +01:00
Tim Gross 6b1b3e7ef8
docs: fix attribute name for java version detection (#11764) 2022-01-03 16:50:25 -05:00
James Rasell 117c79117e
docs: add 1.2.0 HCLv2 strict parsing upgrade note. 2022-01-03 15:41:18 +00:00
Tim Gross 2806dc2bd7
docs/tests for multiple HTTP address config (#11760) 2022-01-03 10:17:13 -05:00
Kevin Schoonover 5d9a506bc0
agent: support multiple http address in addresses.http (#11582) 2022-01-03 09:33:53 -05:00
Tim Gross 395628efe1
api: paginate deployment list and accept wildcard namespace (#11743)
Add `per_page` and `next_token` handling to `Deployment.List` RPC, and
allow the use of a wildcard namespace for namespace filtering.
2022-01-03 08:36:02 -05:00
Jeff Escalante 60e7a186e7
add enterprise downloads page (#11750) 2021-12-25 14:42:12 -05:00
Noel Quiles e748508e67
website: Upgrade deps (#11709)
* Update @hashicorp/react-subnav

* Update <Subnav /> & <ProductDownloadsPage />
2021-12-23 16:18:57 -05:00
Alex Carpenter d1b577330a
Merge pull request #11669 from hashicorp/ac.home-redirect
fix: redirects website `/home` to `/`
2021-12-22 09:43:05 -05:00
Shishir 65eab35412
Add support for setting pids_limit in docker plugin config. (#11526) 2021-12-21 13:31:34 -05:00
Tim Gross b0c3b99b03
scheduler: fix quadratic performance with spread blocks (#11712)
When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout to evaluate (60s). Typical
evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.
2021-12-21 10:10:01 -05:00
Andy Assareh 8ba4e063e2
Mesh Gateway doc enhancements (#11354)
* Mesh Gateway doc enhancements

1. I believe this line should be corrected to add mesh as one of the choices
2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.
2021-12-20 17:10:44 -05:00
Guilherme ae05515b50
Fix 'check calculations' link (#11420) 2021-12-20 17:09:15 -05:00
Tim Gross e046bb31e9
api: respect wildcard in evaluations list API (#11710) 2021-12-20 12:23:50 -05:00
Luiz Aoqui a46d799f2a
docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689) 2021-12-16 14:32:20 -05:00
Luiz Aoqui 4b39494cd1
docs: add more references and examples to the `template` block (#11691) 2021-12-16 14:14:01 -05:00
Noel Quiles 3759dd09f1
website: Disable alert banner (#11688) 2021-12-16 13:43:47 -05:00
Tim Gross f2615992a4
cli: unhide advanced operator raft debugging commands (#11682)
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
2021-12-16 10:32:11 -05:00
Tim Gross 536e3c5282
`nomad eval list` command (#11675)
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.

Also includes note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
2021-12-15 11:58:38 -05:00
Noel Quiles 235a778a56
website: Copy updates (#11677) 2021-12-14 16:35:21 -05:00
Noel Quiles 2cd9fc5825
website: Update website Docker image (#11667) 2021-12-13 16:40:46 -05:00
Kevin Wang a62362966c
feat: versioned docs (#11407) 2021-12-13 16:21:57 -05:00
Tim Gross a0cf5db797
provide `-no-shutdown-delay` flag for job/alloc stop (#11596)
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.

Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
2021-12-13 14:54:53 -05:00
Alex Carpenter 37cb8ffc1a fix: redirects /home to / 2021-12-13 14:28:56 -05:00
Tim Gross 2557a4932f
update download to Nomad v1.2.3 (#11664) 2021-12-13 09:56:12 -05:00
Tim Gross 624ecab901
evaluations list pagination and filtering (#11648)
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.

Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime callers can get the `X-Nomad-NextToken`.

Add pagination to the `Eval.List` RPC by checking for pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.

Add filtering by job ID or evaluation status over the results we get
out of the state store.

Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
2021-12-10 13:43:03 -05:00
Kevin Wang 3e6757f211
feat(website): extract `/plugins` `/tools` docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00
Brandon Romano ff3da9f0a4
Update the banner (#11656) 2021-12-09 12:08:58 -05:00
Lukas W 0e5958d671
CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550)
* Exit non-zero from run command if deployment fails
* Fix typo in deployment monitor introduced in 0edda11
2021-12-09 09:09:28 -05:00
Tim Gross 348f482c94
docs: improve docs for troubleshooting and monitoring scheduler (#11623)
This changeset adds more specific recommendations as to what metrics
to monitor, and what resources should be examined during incident
response.

It also renames the "Telemetry" section to "Monitoring Nomad" to
surface the material better and distinguish it from the "Metric
Reference".

Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
2021-12-07 15:52:13 -05:00
Noel Quiles 7bdbf9b027
website: Upgrade <HashiStackMenu /> to latest (#11615)
* Update @hashicorp/react-hashi-stack-menu

* Upgrade to latest

* One last upgrade
2021-12-07 15:25:28 -05:00
James Rasell d44e5620dd
docs: add license expiry metric to metrics website doc. 2021-12-07 10:31:51 +00:00
Shantanu Gadgil 0838678609
mention `sysbatch` in addition to `batch` (#11587) 2021-12-06 19:12:03 -05:00
Tim Gross 03e697a69d
scheduler: config option to reject job registration (#11610)
During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
2021-12-06 15:20:34 -05:00
Zachary Shilton a16f383d82
website: bump deps to fix print styles (#11365)
* website: bump deps to fix print styles

* website: fix up print styles

* fix: hashi-stack-menu print selector
2021-12-03 10:14:21 -05:00
Tim Gross 39acac33a0
ui: change Consul/Vault base URL field name (#11589)
Give ourselves some room for extension in the UI configuration block
by naming the field `ui_url`, which will let us have an `api_url`.
Fix the template path to ensure we're getting the right value from the
API.
2021-11-30 13:20:29 -05:00
James Rasell e34bb8ab1d
Merge pull request #11577 from hashicorp/b-gh-11576
docs: add deprecation note to old style network task env vars.
2021-11-30 12:15:31 +01:00
Brandon Romano cd043ca699 Updates use cases 2021-11-29 09:16:17 -08:00
Tim Gross ba038a1ebc
docs: `mount_flags` takes a slice of strings (#11583)
The `mount_flags` option takes a slice of strings, not a
comma-separated string like the flags passed to `mount(8)`.
2021-11-29 10:07:34 -05:00