open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	45b238ec82	CSI: node drain should end once only plugins remain (#12846 ) In #12324 we made it so that plugins wait until the node drain is complete, as we do for system jobs. But we neglected to mark the node drain as complete once only plugins (or system jobs) remain, which means that the node drain is left in a draining state until the `deadline` time expires. This was incorrectly documented as expected behavior in #12324.	2022-05-03 10:20:22 -04:00
Seth Hoenig	b8d807c320	Merge pull request #12840 from hashicorp/docs-nvidia-updates docs: update nvidia driver documentation	2022-05-02 10:07:02 -05:00
Seth Hoenig	684abb9e28	docs: update nvidia driver documentation notably: - name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu' - link to Nvidia docs for installing the container runtime toolkit - list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit	2022-05-02 09:11:05 -05:00
Matus Goljer	a741cc76b5	nomad can also install autocomplete for fish shell (#12834 )	2022-05-02 09:26:55 -04:00
Tim Gross	d06ad50538	docs: clarify `capacity_min/max` for volumes (#12825 ) The capacity fields for `create volume` set bounds on the resulting size of the volume, but the ultimate size of the volume will be determined by the storage provider (between the min and max). Clarify this in the documentation and provide a suggestion for how to set an exact size.	2022-04-29 13:38:30 -04:00
Derek Strickland	584bf0162f	docs: Add known limitations callouts to Max Client Disconnect section (#12801 )	2022-04-28 16:17:14 -04:00
Tim Gross	c763c4cb96	remove pre-0.9 driver code and related E2E test (#12791 ) This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.	2022-04-27 09:53:37 -04:00
Michael Schurter	1256c8ef66	docs: update json jobs docs (#12766 ) * docs: update json jobs docs Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you want to acknowledge that sometimes our JSON job representations have a Job top-level wrapper and sometimes do not. The 2½ formats are: ``` 1. HCL JSON 2. Input API JSON (top-level Job field) 2.5. Output API JSON (lacks top-level Job field) ``` `#2` is what our docs consider our API JSON. `#2.5` seems to be an accident of history we can't fix without breaking API compatibility. `#1` is an even more interesting accident of history: the `jobspec2` package automatically detects if the input to Parse is JSON and switches to a JSON parser. This behavior is undocumented, the format is unspecified, and there is no official HashiCorp tooling to produce this JSON from HCL. The plot thickens when you discover popular third party tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to produce JSON that `nomad run` accepts! Since we have no telemetry around whether or not anyone passes HCL JSON to `nomad run`, and people don't file bugs around features that Just Work, I'm choosing to leave that code path in place and acknowledged but not suggested in documentation. See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive discussion of what officially supporting HCL JSON in Nomad would look like. (I also added some of the missing fields to the (Input API flavor) JSON Job documentation, but it still needs a lot of work to be comprehensive.) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-22 15:57:27 -07:00
Luiz Aoqui	a8cc633156	vault: revert support for entity aliases (#12723 ) After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading as another Vault entity.	2022-04-22 10:46:34 -04:00
Seth Hoenig	c4aab10e53	services: cr followup	2022-04-22 09:14:29 -05:00
Seth Hoenig	3fcac242c6	services: enable setting arbitrary address value in service registrations This PR introduces the `address` field in the `service` block so that Nomad or Consul services can be registered with a custom `.Address.` to advertise. The address can be an IP address or domain name. If the `address` field is set, the `service.address_mode` must be set in `auto` mode.	2022-04-22 09:14:29 -05:00
James Rasell	b5d10bcece	docs: add upgrade note for Consul implicit constraint. (#12749 )	2022-04-22 15:53:27 +02:00
James Rasell	046831466c	cli: add pagination flags to service info command. (#12730 )	2022-04-22 10:32:40 +02:00
Michael Schurter	5db3a671db	cli: add -json flag to support job commands (#12591 ) * cli: add -json flag to support job commands While the CLI has always supported running JSON jobs, its support has been via HCLv2's JSON parsing. I have no idea what format it expects the job to be in, but it's absolutely not in the same format as the API expects. So I ignored that and added a new -json flag to explicitly support API style JSON jobspecs. The jobspecs can even have the wrapping {"Job": {...}} envelope or not! * docs: fix example for `nomad job validate` We haven't been able to validate inside driver config stanzas ever since the move to task driver plugins. 😭	2022-04-21 13:20:36 -07:00
Tim Gross	f4287c870d	cli: detect directory when applying namespace spec file (#12738 ) The new `namespace apply` feature that allows for passing a namespace specification file detects the difference between an empty namespace and a namespace specification by checking if the file exists. For most cases, the file will have an extension like `.hcl` and so there's little danger that a user will apply a file spec when they intended to apply a file name. But because directory names typically don't include an extension, you're much more likely to collide when trying to `namespace apply` by name only, and then you get a confusing error message of the form: Failed to read file: read $namespace: is a directory Detect the case where the namespace name collides with a directory in the current working directory, and skip trying to load the directory.	2022-04-21 14:53:45 -04:00
James Rasell	716b8e658b	api: Add support for filtering and pagination to the node list endpoint (#12727 )	2022-04-21 17:04:33 +02:00
Tim Gross	79a9d788d2	docs: fix broken link from `template` to client config (#12733 )	2022-04-21 11:04:04 -04:00
James Rasell	c4195c452a	docs: update HCL2 dynamic example to use block with label. (#12715 )	2022-04-21 10:18:04 +02:00
James Rasell	42068f8823	client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603 )	2022-04-20 10:30:48 +02:00
Seth Hoenig	df587d8263	docs: update documentation with connect acls changes This PR updates the changelog, adds notes to the 1.3 upgrade guide, and updates the connect integration docs with documentation about the new requirement on Consul ACL policies of Consul agent default anonymous ACL tokens.	2022-04-18 08:22:33 -05:00
Shishir	f5121d261e	Add os to NodeListStub struct. (#12497 ) * Add os to NodeListStub struct. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add test: os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-04-15 17:22:45 -07:00
Michael Schurter	70a04dd106	docs: add plan for node rejected details and more (#12564 ) - Moved federation docs to the bottom since everyone is potentially affected by the other sections on the page, but only users of federation are affected by it. - Added section on the plan for node rejected bug since it is fairly easy to diagnose and removing affected nodes is a fairly reliable workaround. - Mention 5s cliff for wait_for_index. - Remove the lie that we do not have job status metrics! How old was that?! - Reinforce the importance of monitoring basic system resources	2022-04-14 16:09:33 -07:00
Seth Hoenig	a1c4f16cf1	connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs This PR expands on the work done in #12543 to - prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags - merge into pre-existing envoy_stats_tags fields - update the upgrade guide docs - update changelog	2022-04-14 12:52:52 -05:00
James Rasell	4cdc46ae75	service discovery: add pagination and filtering support to info requests (#12552 ) * services: add pagination and filter support to info RPC. * cli: add filter flag to service info command. * docs: add pagination and filter details to services info API. * paginator: minor updates to comment and func signature.	2022-04-13 07:41:44 +02:00
Karan Sharma	37c907a8d2	feat: add nomctx and nomad-events-sink (#12542 )	2022-04-11 14:47:03 -04:00
Seth Hoenig	a75bc27601	docs: fixup title formatting in upgrade guide	2022-04-08 11:50:54 -05:00
Luiz Aoqui	0190f378a7	docs: fix upgrade specific broken link and conflict tag (#12521 )	2022-04-08 12:36:47 -04:00
James Rasell	6ac5fd9768	docs: add nomad services template jobspec example. (#12514 )	2022-04-08 17:29:19 +02:00
Seth Hoenig	e7aa81d3cb	docs: tweak hcl2 validation example	2022-04-08 08:43:42 -05:00
Thomas Wunderlich	3f6465f078	Add custom variable validation to docs Custom variable validation is a useful feature that is supported by Nomad and not just Terraform. As such it should be documented on the input variable page. I've cribbed the content from the terraform docs so this should be consistent across projects	2022-04-07 19:06:06 -04:00
Jasmine Dahilig	386f2fac3a	docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505 )	2022-04-07 15:12:41 -07:00
Tim Gross	09b5e8d388	Fix flaky `operator debug` test (#12501 ) We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes: * Make first pprof collection synchronous to preserve the existing behavior for the common case where the pprof interval matches the duration. * Clamp `operator debug` pprof timing to that of the command. The `pprof-duration` should be no more than `duration` and the `pprof-interval` should be no more than `pprof-duration`. Clamp the values rather than throwing errors, which could change the commands that existing users might already have in debugging scripts * Testing: remove test parallelism The `operator debug` tests that stand up servers can't be run in parallel, because we don't have a way of canceling the API calls for pprof. The agent will still be running the last pprof when we exit, and that breaks the next test that talks to that same agent. (Because you can only run one pprof at a time on any process!) We could split off each subtest into its own server, but this test suite is already very slow. In future work we should fix this "for real" by making the API call cancelable. * Testing: assert against unexpected errors in `operator debug` tests. If we assert there are no unexpected error outputs, it's easier for the developer to debug when something is going wrong with the tests because the error output will be presented as a failing test, rather than just a failing exit code check. Or worse, no failing exit code check! This also forces us to be explicit about which tests will return 0 exit codes but still emit (presumably ignorable) error outputs. Additional minor bug fixes (mostly in tests) and test refactorings: * Fix text alignment on pprof Duration in `operator debug` output * Remove "done" channel from `operator debug` event stream test. The goroutine we're blocking for here already tells us it's done by sending a value, so block on that instead of an extraneous channel * Event stream test timer should start at current time, not zero * Remove noise from `operator debug` test log output. The `t.Logf` calls already are picked out from the rest of the test output by being prefixed with the filename. * Remove explicit pprof args so we use the defaults clamped from duration/interval	2022-04-07 15:00:07 -04:00
Seth Hoenig	0870aa31dc	client: set environment variable indicating set of reserved cpu cores This PR injects the 'NOMAD_CPU_CORES' environment variable into tasks that have been allocated reserved cpu cores. The value uses normal cpuset notation, as found in cpuset.cpu cgroup interface files. Note this value is not necessarily the same as the content of the actual cpuset.cpus interface file, which will also include shared cpu cores when using cgroups v2. This variable is a workaround for users who used to be able to read the reserved cgroup cpuset file, but lose the information about distinct reserved cores when using cgroups v2. Side discussion in: https://github.com/hashicorp/nomad/issues/12374	2022-04-07 09:09:35 -05:00
Jasmine Dahilig	f67b108f9f	docs: update vault-token note in job run command #8040 (#12385 )	2022-04-06 10:01:38 -07:00
James Rasell	7096fecd10	website: add initial website docs for Nomad service discovery. (#12456 )	2022-04-06 18:51:14 +02:00
Derek Strickland	0ab89b1728	Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling disconnected clients: Feature branch merge	2022-04-06 10:11:57 -04:00
Mike Nomitch	7405ebbad1	Add max client disconnect docs (#12467 ) Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-04-06 08:54:14 -04:00
Seth Hoenig	2e2ff3f75e	Merge pull request #12419 from hashicorp/exec-cleanup raw_exec: make raw exec driver work with cgroups v2	2022-04-05 16:42:01 -05:00
Tim Gross	5b9772e68f	docs: updates for CSI plugin improvements for 1.3.0 (#12466 )	2022-04-05 17:13:51 -04:00
Derek Strickland	8e9f8be511	`MaxClientDisconnect` Jobspec checklist (#12177 ) * api: Add struct, conversion function, and tests * TaskGroup: Add field, validation, and tests * diff: Add diff handler and test * docs: Update docs	2022-04-05 17:12:23 -04:00
Derek Strickland	d7f44448e1	disconnected clients: Observability plumbing (#12141 ) * Add disconnects/reconnect to log output and emit reschedule metrics * TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics	2022-04-05 17:12:23 -04:00
Shishir	a6801f73d1	cli: add -quiet to nomad node status command. (#12426 )	2022-04-05 15:53:43 -04:00
Luiz Aoqui	ab7eb5de6e	Support Vault entity aliases (#12449 ) Move some common Vault API data struct decoding out of the Vault client so it can be reused in other situations. Make Vault job validation its own function so it's easier to expand it. Rename the `Job.VaultPolicies` method to just `Job.Vault` since it returns the full Vault block, not just their policies. Set `ChangeMode` on `Vault.Canonicalize`. Add some missing tests. Allows specifying an entity alias that will be used by Nomad when deriving the task Vault token. An entity alias assigns an identity to a token, allowing better control and management of Vault clients since all tokens with the same identity alias will now be considered the same client. This helps track Nomad activity in Vault's audit logs and better control over Vault billing. Add support for a new Nomad server configuration to define a default entity alias to be used when deriving Vault tokens. This default value will be used if the task doesn't have an entity alias defined.	2022-04-05 14:18:10 -04:00
Grant Griffiths	18a0a2c9a4	CSI: Add secrets flag support for delete volume (#11245 )	2022-04-05 08:59:11 -04:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Danish Prakash	e7e8ce212e	command/operator_debug: add pprof interval (#11938 )	2022-04-04 15:24:12 -04:00
Seth Hoenig	f9b0ffafde	Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo docs: fix typo in system batch description	2022-04-01 09:58:06 -05:00
Seth Hoenig	e9eacb1153	docs: fix typo in system batch description	2022-04-01 09:46:03 -05:00
Bryce Kalow	9b0d77ae78	website: redirect /api to api-docs and update internal links (#12410 )	2022-03-31 11:33:27 -05:00
Tim Gross	8dccc43c2f	docs: remove deprecated client options parameters docs (#12416 ) The client configuration options for drivers have been deprecated since 0.9. We haven't torn them out completely but because they're deprecated it's been hard to guarantee correct behavior. Remove the documentation so that users aren't misled about their viability.	2022-03-31 11:45:51 -04:00

482 Commits