open-nomad

Author	SHA1	Message	Date
Michael Schurter	fcf4515875	docs: add op api options	2022-03-01 16:43:53 -08:00
Michael Schurter	bb3daac628	rename `nomad curl` to `nomad operator api`	2022-02-24 15:52:54 -08:00
Michael Schurter	141db0c562	cli: add curl command Just a hackweek project at this point.	2022-02-24 15:52:54 -08:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084 )	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	8e6d97744b	docs: emphasize snapshot before upgrading	2022-02-24 08:22:41 -06:00
Seth Hoenig	de95998faa	core: switch to go.etc.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etc.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Tim Gross	246db87a74	CSI: allow for concurrent plugin allocations (#12078 ) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
Mike Nomitch	f3d1cf4dbd	Merge pull request #12065 from hashicorp/docs-add-form-link Adding link to interview form	2022-02-22 11:05:20 -08:00
Luiz Aoqui	02ee075506	docs: update link to `mount` in Docker task driver (#12101 )	2022-02-22 13:39:49 -05:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without increment `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either deadcode or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Luiz Aoqui	110dbeeb9d	Add `go-bexpr` filters to evals and deployment list endpoints (#12034 )	2022-02-16 11:40:30 -05:00
Tiernan	c30b4617aa	interpolate network.dns block on client (#12021 )	2022-02-16 08:39:44 -05:00
Mike Nomitch	8377f5cfe3	Adding link to interview form	2022-02-14 12:38:26 -08:00
James Rasell	926458c5b2	Merge pull request #12053 from marcaurele/fix-typo doc(typo): technical typo in advertised example	2022-02-11 14:27:12 +01:00
Luiz Aoqui	d976e4a19b	docs: add upgrade note and ACL requirements for the job submit endpoint (#12046 )	2022-02-10 15:35:16 -05:00
Marc-Aurèle Brothier	fb80dc57a1	small typo in advertised example	2022-02-10 13:53:05 +01:00
Tim Gross	59c8558969	docs and changelog for `nomad config validate` (#12031 )	2022-02-09 10:20:45 -05:00
Tim Gross	7ad15b2b42	raft: default to protocol v3 (#11572 ) Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.	2022-02-03 15:03:12 -05:00
James Rasell	a7f569d0e1	docs: add `cores` to client reserved config block.	2022-01-26 15:56:16 +01:00
Dan Norris	160682cf2b	docs: Update volume create/register mount options to use []string example (#11912 ) The examples for `nomad volume create` and `nomad volume register` are not setting `mount_flags` using an array of strings. This fixes the issue by changing the example to be `mount_flags = ["noatime"]`.	2022-01-24 11:34:21 -05:00
Luiz Aoqui	626e633b41	docs: add `nomad.plan.node_rejected` metric (#11860 )	2022-01-18 13:47:20 -05:00
Dave May	330d24a873	cli: Add event stream capture to nomad operator debug (#11865 )	2022-01-17 21:35:51 -05:00
Luiz Aoqui	ed9f277925	docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840 )	2022-01-17 11:09:08 -05:00
James Rasell	82b168bf34	Merge pull request #11403 from hashicorp/f-gh-11059 agent/docs: add better clarification when top-level data dir needs setting	2022-01-13 16:41:35 +01:00
Derek Strickland	0a8e03f0f7	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
Tim Gross	fa64822e49	docs: note that clients need to have ACLs enabled (#11799 ) Client endpoints such as `alloc exec` are enforced on the client if the API client or CLI has "line of sight" to the client. This is already in the Learn guide but having it in the ACL configuration docs would be helpful.	2022-01-07 16:18:41 -05:00
Tim Gross	32f150d469	docs: new scheduler metrics (#11790 ) * Fixed name of `nomad.scheduler.allocs.reschedule` metric * Added new metrics to metrics reference documentation * Expanded definitions of "waiting" metrics * Changelog entry for #10236 and #10237	2022-01-07 09:51:15 -05:00
James Rasell	1f4e100edc	Merge pull request #11762 from hashicorp/b-gh-11681 docs: add 1.2.0 HCLv2 strict parsing upgrade note.	2022-01-04 09:30:09 +01:00
Tim Gross	6b1b3e7ef8	docs: fix attribute name for java version detection (#11764 )	2022-01-03 16:50:25 -05:00
James Rasell	117c79117e	docs: add 1.2.0 HCLv2 strict parsing upgrade note.	2022-01-03 15:41:18 +00:00
Tim Gross	2806dc2bd7	docs/tests for multiple HTTP address config (#11760 )	2022-01-03 10:17:13 -05:00
Kevin Schoonover	5d9a506bc0	agent: support multiple http address in addresses.http (#11582 )	2022-01-03 09:33:53 -05:00
Shishir	65eab35412	Add support for setting pids_limit in docker plugin config. (#11526 )	2021-12-21 13:31:34 -05:00
Tim Gross	b0c3b99b03	scheduler: fix quadratic performance with spread blocks (#11712 ) When the scheduler picks a node for each evaluation, the `LimitIterator` provides at most 2 eligible nodes for the `MaxScoreIterator` to choose from. This keeps scheduling fast while producing acceptable results because the results are binpacked. Jobs with a `spread` block (or node affinity) remove this limit in order to produce correct spread scoring. This means that every allocation within a job with a `spread` block is evaluated against _all_ eligible nodes. Operators of large clusters have reported that jobs with `spread` blocks that are eligible on a large number of nodes can take longer than the nack timeout to evaluate (60s). Typical evaluations are processed in milliseconds. In practice, it's not necessary to evaluate every eligible node for every allocation on large clusters, because the `RandomIterator` at the base of the scheduler stack produces enough variation in each pass that the likelihood of an uneven spread is negligible. Note that feasibility is checked before the limit, so this only impacts the number of _eligible_ nodes available for scoring, not the total number of nodes. This changeset sets the iterator limit for "large" `spread` block and node affinity jobs to be equal to the number of desired allocations. This brings an example problematic job evaluation down from ~3min to ~10s. The included tests ensure that we have acceptable spread results across a variety of large cluster topologies.	2021-12-21 10:10:01 -05:00
Andy Assareh	8ba4e063e2	Mesh Gateway doc enhancements (#11354 ) * Mesh Gateway doc enhancements 1. I believe this line should be corrected to add mesh as one of the choices 2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.	2021-12-20 17:10:44 -05:00
Luiz Aoqui	a46d799f2a	docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689 )	2021-12-16 14:32:20 -05:00
Luiz Aoqui	4b39494cd1	docs: add more references and examples to the `template` block (#11691 )	2021-12-16 14:14:01 -05:00
Tim Gross	f2615992a4	cli: unhide advanced operator raft debugging commands (#11682 ) The `nomad operator raft` and `nomad operator snapshot state` subcommands for inspecting on-disk raft state were hidden and undocumented. Expose and document these so that advanced operators have support for these tools.	2021-12-16 10:32:11 -05:00
Tim Gross	536e3c5282	`nomad eval list` command (#11675 ) Use the new filtering and pagination capabilities of the `Eval.List` RPC to provide filtering and pagination at the command line. Also includes note that `nomad eval status -json` is deprecated and will be replaced with a single evaluation view in a future version of Nomad.	2021-12-15 11:58:38 -05:00
Noel Quiles	235a778a56	website: Copy updates (#11677 )	2021-12-14 16:35:21 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Kevin Wang	3e6757f211	feat(website): extract `/plugins` `/tools` docs (#11584 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>	2021-12-09 14:25:18 -05:00
Lukas W	0e5958d671	CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550 ) * Exit non-zero from run command if deployment fails * Fix typo in deployment monitor introduced in 0edda11	2021-12-09 09:09:28 -05:00
Tim Gross	348f482c94	docs: improve docs for troubleshooting and monitoring scheduler (#11623 ) This changeset adds more specific recommendations as to what metrics to monitor, and what resources should be examined during incident response. It also renames the "Telemetry" section to "Monitoring Nomad" to surface the material better and distinguish it from the "Metric Reference". Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2021-12-07 15:52:13 -05:00
James Rasell	d44e5620dd	docs: add license expiry metric to metrics website doc.	2021-12-07 10:31:51 +00:00
Shantanu Gadgil	0838678609	mention `sysbatch` in addition to `batch` (#11587 )	2021-12-06 19:12:03 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Tim Gross	39acac33a0	ui: change Consul/Vault base URL field name (#11589 ) Give ourselves some room for extension in the UI configuration block by naming the field `ui_url`, which will let us have an `api_url`. Fix the template path to ensure we're getting the right value from the API.	2021-11-30 13:20:29 -05:00
Tim Gross	ba038a1ebc	docs: `mount_flags` takes a slice of strings (#11583 ) The `mount_flags` option takes a slice of strings, not a comma-separated string like the flags passed to `mount(8)`.	2021-11-29 10:07:34 -05:00

1 2 3 4 5 ...

328 commits