Commit Graph

4069 Commits

Author SHA1 Message Date
Seth Hoenig fc58f4972c cli: correctly use and validate job with vault token set
This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN',
'-vault-namespace' if set.
2022-05-19 12:13:34 -05:00
Seth Hoenig 65f7abf2f4 cli: update default redis and use nomad service discovery
Closes #12927
Closes #12958

This PR updates the version of redis used in our examples from 3.2 to 7.
The old version has long been unsupported, and we should be setting
a good example by using a supported version.

The long-form example job is now fixed so that the service stanza uses
nomad as the service discovery provider, and so now the job runs without
a requirement of having Consul running and configured.
2022-05-17 10:24:19 -05:00
Luiz Aoqui fea13f39b3
docs: add Consul 1.12.0 upgrade notice 2022-05-16 18:44:26 -04:00
Karan Sharma e0be868b79
docs: Fix typo in sidecar_service (#13021) 2022-05-16 09:35:42 +02:00
dependabot[bot] 4ae15399bd
build(deps): bump cross-fetch from 3.1.4 to 3.1.5 in /website (#12818)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-13 09:31:49 -05:00
Michael Schurter 7f8cf9e2dc
docs: link s/port-plan-failure to more helpful doc (#12968)
The shortlink /s/port-plan-failure is logged when a plan for a node is
rejected to help users debug and mitigate repeated `plan for node
rejected` failures.

The current link to #9506 is... less than useful. It is not clear to
users what steps they should take to either fix their cluster or
contribute to the issue.

While .../monitoring-nomad#progess isn't as comprehensive as it could
be, it's a much more gentle introduction to the class of bug than the
original issue.
2022-05-12 13:59:17 -07:00
Tim Gross 6e5d6eb3b5
docs: note that already-dispatched jobs cannot be updated (#12973) 2022-05-12 16:18:42 -04:00
Tim Gross ae2d7d6727
docs: remove beta tag for CSI from sidebar (#12970) 2022-05-12 14:12:40 -04:00
Michael Schurter 5a43d3c675
docs: add `sysbatch` to scheduling internals (#12954) 2022-05-11 17:06:17 -07:00
Chetan Sarva 14752cd2c0
docs: add version note to nomad services template (#12910) 2022-05-06 17:39:27 +02:00
Tim Gross 26b9f88ef3
docs: add missing `set_contains_any` constraint docs (#12886)
This constraint and affinity were added in 0.9.x but were only
documented for affinities. Close that documentation gap.
2022-05-05 11:11:05 -04:00
Bryce Kalow e9319abc78
website: remove source code and related CI jobs (#12596)
* remove website source code and related circle jobs

* remove data files

* updates platform-cli

* update local instructions

* updates package-lock
2022-05-05 09:53:22 -05:00
Tim Gross 45b238ec82
CSI: node drain should end once only plugins remain (#12846)
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remain, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
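A simplified Go sketch of the completion condition this fix describes: the drain is done once every remaining allocation is a system job or a CSI plugin. The `alloc` type and `drainComplete` function are hypothetical stand-ins, not Nomad's drainer code.

```go
package main

import "fmt"

// alloc is a hypothetical, simplified stand-in for an allocation on a
// draining node; it is not Nomad's real allocation type.
type alloc struct {
	JobType  string // e.g. "service", "batch", "system"
	IsPlugin bool   // true if the alloc runs a CSI plugin task
}

// drainComplete reports whether the drain can be marked complete: every
// remaining allocation is either a system job or a CSI plugin, both of
// which are expected to stay until the drain finishes.
func drainComplete(remaining []alloc) bool {
	for _, a := range remaining {
		if a.JobType != "system" && !a.IsPlugin {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(drainComplete([]alloc{{JobType: "system"}, {JobType: "service", IsPlugin: true}})) // true
	fmt.Println(drainComplete([]alloc{{JobType: "service"}}))                                          // false
}
```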
2022-05-03 10:20:22 -04:00
Alex Carpenter d59b517ab2
[WIP] feat: homepage and use case pages redesign (#11873)
* feat: connect homepage and use case pages

* fix: internalLink usage

* fix: query name

* chore: add homepage patterns

* chore: remove offerings

* chore: add intro features

* chore: bump subnav

* chore: updating patterns

* chore: add use case to the subnav

* chore: cleanup unused import

* chore: remove subnav border
2022-05-03 09:06:00 -04:00
Seth Hoenig b8d807c320
Merge pull request #12840 from hashicorp/docs-nvidia-updates
docs: update nvidia driver documentation
2022-05-02 10:07:02 -05:00
Seth Hoenig 684abb9e28 docs: update nvidia driver documentation
notably:
- name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu'
- link to Nvidia docs for installing the container runtime toolkit
- list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit
2022-05-02 09:11:05 -05:00
Matus Goljer a741cc76b5
nomad can also install autocomplete for fish shell (#12834) 2022-05-02 09:26:55 -04:00
Tim Gross d06ad50538
docs: clarify `capacity_min/max` for volumes (#12825)
The capacity fields for `create volume` set bounds on the resulting
size of the volume, but the ultimate size of the volume will be
determined by the storage provider (between the min and max). Clarify
this in the documentation and provide a suggestion for how to set an
exact size.
2022-04-29 13:38:30 -04:00
Derek Strickland 584bf0162f
docs: Add known limitations callouts to Max Client Disconnect section (#12801)
* docs: Add known limitations callouts to Max Client Disconnect section
2022-04-28 16:17:14 -04:00
Tim Gross c763c4cb96
remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
Michael Schurter 1256c8ef66
docs: update json jobs docs (#12766)
* docs: update json jobs docs

Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you
want to acknowledge that sometimes our JSON job representations have a
Job top-level wrapper and sometimes do not.

The 2½ formats are:
```
 1.   HCL JSON
 2.   Input API JSON (top-level Job field)
 2.5. Output API JSON (lacks top-level Job field)
```

`#2` is what our docs consider our API JSON. `#2.5` seems to be an
accident of history we can't fix without breaking API compatibility.

`#1` is an even more interesting accident of history: the `jobspec2`
package automatically detects if the input to Parse is JSON and switches
to a JSON parser. This behavior is undocumented, the format is
unspecified, and there is no official HashiCorp tooling to produce this
JSON from HCL. The plot thickens when you discover popular third party
tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to
produce JSON that `nomad run` accepts!

Since we have no telemetry around whether or not anyone passes HCL JSON
to `nomad run`, and people don't file bugs around features that Just
Work, I'm choosing to leave that code path in place and *acknowledged
but not suggested* in documentation.

See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive
discussion of what officially supporting HCL JSON in Nomad would look
like.

(I also added some of the missing fields to the (Input API flavor) JSON
Job documentation, but it still needs a lot of work to be
comprehensive.)

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-22 15:57:27 -07:00
Luiz Aoqui a8cc633156
vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading as another Vault
entity.
2022-04-22 10:46:34 -04:00
Seth Hoenig c4aab10e53 services: cr followup 2022-04-22 09:14:29 -05:00
Seth Hoenig 3fcac242c6 services: enable setting arbitrary address value in service registrations
This PR introduces the `address` field in the `service` block so that Nomad
or Consul services can be registered with a custom `.Address` to advertise.

The address can be an IP address or domain name. If the `address` field is
set, `service.address_mode` must be set to `auto`.
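A minimal Go sketch of the validation rule above. `validateServiceAddress` is hypothetical, and treating an empty mode as `auto` assumes `auto` is the default `address_mode`.

```go
package main

import "fmt"

// validateServiceAddress is a hypothetical check mirroring the rule in this
// commit: a custom service address is only accepted with address_mode "auto".
// An empty mode is treated as "auto" on the assumption that it is the default.
func validateServiceAddress(address, addressMode string) error {
	mode := addressMode
	if mode == "" {
		mode = "auto"
	}
	if address != "" && mode != "auto" {
		return fmt.Errorf("service address requires address_mode %q, got %q", "auto", mode)
	}
	return nil
}

func main() {
	fmt.Println(validateServiceAddress("example.com", "auto")) // <nil>
	fmt.Println(validateServiceAddress("10.0.0.5", "host"))
}
```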
2022-04-22 09:14:29 -05:00
James Rasell b5d10bcece
docs: add upgrade note for Consul implicit constraint. (#12749) 2022-04-22 15:53:27 +02:00
James Rasell 046831466c
cli: add pagination flags to service info command. (#12730) 2022-04-22 10:32:40 +02:00
Michael Schurter 5db3a671db
cli: add -json flag to support job commands (#12591)
* cli: add -json flag to support job commands

While the CLI has always supported running JSON jobs, its support has
been via HCLv2's JSON parsing. I have no idea what format it expects the
job to be in, but it's absolutely not in the same format as the API
expects.

So I ignored that and added a new -json flag to explicitly support *API*
style JSON jobspecs.

The jobspecs can even have the wrapping {"Job": {...}} envelope or not!

* docs: fix example for `nomad job validate`

We haven't been able to validate inside driver config stanzas ever since
the move to task driver plugins. 😭
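A Go sketch of accepting API-style JSON with or without the {"Job": {...}} envelope. The trimmed `job` type and `parseAPIJob` are hypothetical; Nomad's real job type has many more fields, and this is not the CLI's actual parsing code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// job is a hypothetical, trimmed-down job type with just enough fields
// for the example.
type job struct {
	ID   string `json:"ID"`
	Name string `json:"Name"`
}

// parseAPIJob accepts API-style JSON either wrapped in a {"Job": {...}}
// envelope or bare, and returns the decoded job.
func parseAPIJob(data []byte) (*job, error) {
	var wrapper struct {
		Job *job `json:"Job"`
	}
	if err := json.Unmarshal(data, &wrapper); err != nil {
		return nil, err
	}
	if wrapper.Job != nil {
		return wrapper.Job, nil // envelope present
	}
	// No envelope: decode the document itself as the job.
	var j job
	if err := json.Unmarshal(data, &j); err != nil {
		return nil, err
	}
	return &j, nil
}

func main() {
	wrapped, _ := parseAPIJob([]byte(`{"Job": {"ID": "example"}}`))
	bare, _ := parseAPIJob([]byte(`{"ID": "example"}`))
	fmt.Println(wrapped.ID, bare.ID) // example example
}
```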
2022-04-21 13:20:36 -07:00
Tim Gross f4287c870d
cli: detect directory when applying namespace spec file (#12738)
The new `namespace apply` feature that allows for passing a namespace
specification file detects the difference between an empty namespace
and a namespace specification by checking if the file exists. For most
cases, the file will have an extension like `.hcl` and so there's
little danger that a user will apply a file spec when they intended to
apply a namespace name.

But because directory names typically don't include an extension,
you're much more likely to collide when trying to `namespace apply` by
name only, and then you get a confusing error message of the form:

   Failed to read file: read $namespace: is a directory

Detect the case where the namespace name collides with a directory in
the current working directory, and skip trying to load the directory.
2022-04-21 14:53:45 -04:00
James Rasell 716b8e658b
api: Add support for filtering and pagination to the node list endpoint (#12727) 2022-04-21 17:04:33 +02:00
Tim Gross 79a9d788d2
docs: fix broken link from `template` to client config (#12733) 2022-04-21 11:04:04 -04:00
James Rasell c4195c452a
docs: update HCL2 dynamic example to use block with label. (#12715) 2022-04-21 10:18:04 +02:00
James Rasell 42068f8823
client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603) 2022-04-20 10:30:48 +02:00
Seth Hoenig df587d8263 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes to the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Shishir f5121d261e
Add os to NodeListStub struct. (#12497)
* Add os to NodeListStub struct.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add test: os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-04-15 17:22:45 -07:00
Kevin Wang c74c06746b
chore: redirects (#12560)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-04-15 13:13:40 -04:00
Michael Schurter 70a04dd106
docs: add plan for node rejected details and more (#12564)
- Moved federation docs to the bottom since *everyone* is potentially
  affected by the other sections on the page, but only users of
  federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
  easy to diagnose and removing affected nodes is a fairly reliable
  workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
  that?!
- Reinforce the importance of monitoring basic system resources.
2022-04-14 16:09:33 -07:00
Seth Hoenig a1c4f16cf1 connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
James Rasell 4cdc46ae75
service discovery: add pagination and filtering support to info requests (#12552)
* services: add pagination and filter support to info RPC.
* cli: add filter flag to service info command.
* docs: add pagination and filter details to services info API.
* paginator: minor updates to comment and func signature.
2022-04-13 07:41:44 +02:00
Tim Gross 4078e6ea0e
scripts: fix interpreter for bash (#12549)
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line to be portable between various Linuxes and macOS without
complaint from posix shell users.
2022-04-12 10:08:21 -04:00
Karan Sharma 37c907a8d2
feat: add nomctx and nomad-events-sink (#12542) 2022-04-11 14:47:03 -04:00
Seth Hoenig a75bc27601 docs: fixup title formatting in upgrade guide 2022-04-08 11:50:54 -05:00
Luiz Aoqui 0190f378a7
docs: fix upgrade specific broken link and conflict tag (#12521) 2022-04-08 12:36:47 -04:00
Luiz Aoqui 5e642a4742
add Nomad v1.3.0-beta.1 download box (#12517) 2022-04-08 12:04:14 -04:00
James Rasell 6ac5fd9768
docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Seth Hoenig e7aa81d3cb docs: tweak hcl2 validation example 2022-04-08 08:43:42 -05:00
Thomas Wunderlich 3f6465f078
Add custom variable validation to docs
Custom variable validation is a useful feature that is supported by
Nomad and not just Terraform. As such it should be documented on the
input variable page.
I've cribbed the content from the Terraform docs, so this should be
consistent across projects.
2022-04-07 19:06:06 -04:00
Jasmine Dahilig 386f2fac3a
docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505) 2022-04-07 15:12:41 -07:00
Tim Gross 09b5e8d388
Fix flaky `operator debug` test (#12501)
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.

* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
2022-04-07 15:00:07 -04:00
Seth Hoenig 0870aa31dc client: set environment variable indicating set of reserved cpu cores
This PR injects the 'NOMAD_CPU_CORES' environment variable into
tasks that have been allocated reserved cpu cores. The value uses
normal cpuset notation, as found in cpuset.cpus cgroup interface files.

Note this value is not necessarily the same as the content of the actual
cpuset.cpus interface file, which will also include shared cpu cores when
using cgroups v2. This variable is a workaround for users who used to be
able to read the reserved cgroup cpuset file, but lose the information
about distinct reserved cores when using cgroups v2.
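For reference, a Go sketch of cpuset list notation: consecutive core IDs collapse into ranges and runs are joined with commas. `formatCpuset` illustrates the notation only and is not Nomad's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatCpuset renders core IDs in cpuset list notation, collapsing
// consecutive runs into ranges (e.g. cores 0,1,2,3,8 become "0-3,8").
func formatCpuset(cores []int) string {
	if len(cores) == 0 {
		return ""
	}
	sorted := append([]int(nil), cores...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	var parts []string
	start, prev := sorted[0], sorted[0]
	flush := func() {
		if start == prev {
			parts = append(parts, strconv.Itoa(start))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", start, prev))
		}
	}
	for _, c := range sorted[1:] {
		if c == prev {
			continue // skip duplicate IDs
		}
		if c == prev+1 {
			prev = c // extend the current run
			continue
		}
		flush()
		start, prev = c, c
	}
	flush()
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatCpuset([]int{8, 0, 2, 1, 3})) // 0-3,8
}
```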

Side discussion in: https://github.com/hashicorp/nomad/issues/12374
2022-04-07 09:09:35 -05:00
Jasmine Dahilig f67b108f9f
docs: update vault-token note in job run command #8040 (#12385) 2022-04-06 10:01:38 -07:00