Commit graph

872 commits

Author SHA1 Message Date
VishnuJin 67efb19e94
fingerprint: added windows os.build attribute to host fingerprint (#17576) 2023-06-21 10:53:50 -04:00
Luiz Aoqui cfb3bb517f
np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Bruce Lok 72e92bc17f
fix typo peers.json (#17538) 2023-06-19 07:56:51 +01:00
Luiz Aoqui d5aa72190f
node pools: namespace integration (#17562)
Add structs and fields to support the Nomad Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets to `default` at the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
Tim Gross 3da948d0c8
node pools: support node.pool constraint in scheduler (#17548)
Although most of the time jobs will be assigned to a single node pool, users may
want to set the node pool to "all" and then constraint to a subset of node
pools. Add support for setting a contraint like `${node.pool}`.
2023-06-16 13:31:46 -04:00
Tim Gross f411f0c0fb
docs: node pool specification (#17553) 2023-06-16 10:37:47 -04:00
Tim Gross df366df1cd
docs: fix broken link in variables spec page (#17554) 2023-06-15 15:57:00 -04:00
Tim Gross 524183e2b1
docs: add missing client.allocs metrics (#17540)
The docs were missing counter metrics emitted by the task runner around task
state changes.
2023-06-15 09:18:11 -04:00
Tim Gross 5b9322c70a
docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross dc9fae34ca
node pools: add pool as label on client metrics (#17528)
This changeset adds the node pool as a label anywhere we're already emitting
labels with additional information such as node class or ID about the client.
2023-06-14 15:58:38 -04:00
Tim Gross 5f509b8ce0
cli: fix missing -quiet flag for var init (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross 736ad3ed32
docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools, but we didn't want to include in the node pool PRs so they can get
backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Tim Gross c1a01697c8
node pools: implement node pool init command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initialing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
Piotr Kazmierczak 57dad0ca07
docs: corrections and additional information for OIDC-related concepts (#17470) 2023-06-09 16:50:22 +02:00
Piotr Kazmierczak 0a4052ece5
docs: add missing login API endpoint documentation (#17467) 2023-06-09 15:59:01 +02:00
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui 5878113c41
node pool: implement nomad node pool nodes CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross 06fc284644
node pools: implement CLI for node pool jobs command (#17432) 2023-06-06 15:02:26 -04:00
Tim Gross c0f2295510
node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Luiz Aoqui aa1b33d157
node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
KamilCuk cc64281445
Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui b770f2b1ef
node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add node_pool field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00
Luiz Aoqui c61e75f302
node pools: add CRUD API (#17384) 2023-06-01 15:55:49 -04:00
Luiz Aoqui 45b0391378
np: implement ACL for node pools (#17365) 2023-06-01 13:03:20 -04:00
Seth Hoenig e04d8cf77b
docs: fixup example of readiness check (#17296)
A "readiness" check implies a failing healthcheck will not cause the
deployment of a service to stop - i.e. it is only used as a liveness
probe in the context of service discoverability.

Fix our docs example to reflect that a readiness check is created by
setting on_update to "ignore" (as opposed to "ignore_warnings").
2023-05-23 15:29:10 -05:00
Tim Gross b9ca3bc9b1
build: remove 386 builds for Nomad 1.6.0 (#17239)
The 32-bit Intel builds (aka "386") are not tested and likely have bugs
involving platform-sized integers when operated at any non-trivial scale. Remove
these builds from the upcoming Nomad 1.6.0 and provide recommendations in the
upgrade notes for those users who might have hobbyist boards running 32-bit
ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi).

DO NOT BACKPORT TO 1.5.x OR EARLIER!
2023-05-22 13:27:17 -04:00
Lance Haig 568da5918b
cli: tls certs not created with correct SANs (#16959)
The `nomad tls cert` command did not create certificates with the correct SANs for
them to work with non default domain and region names. This changset updates the
code to support non default domains and regions in the certificates.
2023-05-22 09:31:56 -04:00
Tim Gross 9838349c23
document which fields can be updated by volume register (#17249)
The `volume register` command can update a small subset of the volume's fields
in-place, with some restrictions depending on whether the volume is currently in
use. Document these in the `volume register` command docs and the volume
specification docs.

Fixes: #17247
2023-05-22 09:15:25 -04:00
Tim Gross 4881f2451a
docs: describe the default Workload Identity ACL policy (#17245)
Workload Identities have an implicit default policy. This policy can't currently
be described via HCL because it includes task interpolation for Variables and
access to the Services API (which doesn't exist as its own ACL
capbility). Describe this in our WI documentation.

Fixes: #16277
2023-05-19 11:38:05 -04:00
Mike Nomitch 6df2160e69
docs: add documentation on ephemeral disk and logs (#15829) 2023-05-17 16:58:11 -04:00
Roman Zipp edf83f432a
docs: remove unneeded brackets from job specification template docs (#17219) 2023-05-17 16:45:00 -04:00
Tim Gross 6814e8e6d9
drivers: make internal DisableLogCollection capability public (#17196)
The `DisableLogCollection` capability was introduced as an experimental
interface for the Docker driver in 0.10.4. The interface has been stable and
allowing third-party task drivers the same capability would be useful for those
drivers that don't need the additional overhead of logmon.

This PR only makes the capability public. It doesn't yet add it to the
configuration options for the other internal drivers.

Fixes: #14636 #15686
2023-05-16 09:16:03 -04:00
Mark Lewis 1729c955d2
Update delete.mdx (#17184)
Fix typo
2023-05-15 13:31:52 +01:00
Tim Gross ba269eaf3f
docs: add note to upgrade guide about yanked version (#17115)
Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad
1.5.5. Unfortunately we can't yank the release but we should leave a note in the
upgrade guide telling users to avoid it.
2023-05-08 13:28:45 -04:00
Tim Gross 5f3ff346ea
post release 1.5.5 (#17098)
* changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9,
  and 1.3.14
* note on deprecation of `logs.enabled` field
2023-05-05 11:46:08 -04:00
Tim Gross 17bd930ca9
logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087)
When the server restarts for the upgrade, it loads the `structs.Job` from the
Raft snapshot/logs. The jobspec has long since been parsed, so none of the
guards around the default value are in play. The empty field value for `Enabled`
is the zero value, which is false.

This doesn't impact any running allocation because we don't replace running
allocations when either the client or server restart. But as soon as any
allocation gets rescheduled (ex. you drain all your clients during upgrades),
it'll be using the `structs.Job` that the server has, which has `Enabled =
false`, and logs will not be collected.

This changeset fixes the bug by adding a new field `Disabled` which defaults to
false (so that the zero value works), and deprecates the old field.

Fixes #17076
2023-05-04 16:01:18 -04:00
Seth Hoenig 4347c1d705
docs: move CNI reference plugins installation to CNI overview page (#17068)
* docs: move CNI reference plugins installation to CNI overview page

This PR moves the instruction steps for install the CNI reference plugins
from the Consul Mesh integration page to the general Networking CNI page.

These plugins are required for bridge networking, not just Consul Mesh,
so it makes sense to have them on the general CNI page.

Closes #17038

* docs: fix a link to post install steps
2023-05-04 11:32:06 -05:00
James Rasell 50414bba12
docs: update artifact jobspec sshkey example path. (#17077) 2023-05-04 14:29:36 +01:00
Seth Hoenig e8d53ea30b
connect: use explicit docker.io prefix in default envoy image names (#17045)
This PR modifies references to the envoyproxy/envoy docker image to
explicitly include the docker.io prefix. This does not affect existing
users, but makes things easier for Podman users, who otherwise need to
specify the full name because Podman does not default to docker.io
2023-05-02 09:27:48 -05:00
Seth Hoenig 5744b2cd4f
docs: add more notes about artifact breaking changes in 1.5.0 (#17005)
* changelog: note artifact breaking changes for 1.5.0

* docs: add note about environment variables to artifact job spec docs

* Update website/content/docs/job-specification/artifact.mdx

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-27 11:41:18 -05:00
James Rasell fddef4c6e1
docs: use appropriate file extension for autoscaler agent config. (#16993) 2023-04-27 15:00:28 +01:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
Tim Gross b5a54b3b5f
docs: fix keyring path in install docs (#16946) 2023-04-20 16:20:39 -04:00