Commit Graph

3312 Commits

Author SHA1 Message Date
Tim Gross 09a7612150
csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Michael Schurter 5bf877ecf2 cli: namespace meta should be formatted consistently 2022-03-04 14:13:48 -08:00
Michael Schurter 6841385b73 cli: namespace tests should be run on oss 2022-03-04 14:13:48 -08:00
Michael Schurter e8a5258ad4 cli: namespace apply should autocomplete hcl files 2022-03-04 14:13:33 -08:00
Tim Gross b776c1c196
csi: fix prefix queries for plugin list RPC (#12194)
The `CSIPlugin.List` RPC was intended to accept a prefix to filter the
list of plugins being listed. This was accidentally being done
in the state store instead, which contributed to incorrect filtering
behavior for plugins in the `volume plugin status` command.

Move the prefix matching into the RPC so that it calls the
prefix-matching method in the state store if we're looking for a
prefix.

Update the `plugin status` command to accept a prefix for the plugin
ID argument so that it matches the expected behavior of other commands.
2022-03-04 16:44:09 -05:00
Tim Gross 3247e422d1
csi: add missing fields to HTTP API response (#12178)
The HTTP endpoint for CSI manually serializes the internal struct to
the API struct for purposes of redaction (see also #10470). Add fields
that were missing from this serialization so they don't show up as
always empty in the API response.
2022-03-03 15:15:28 -05:00
James Rasell 8ce6684955
http: add alloc service registration agent HTTP endpoint. 2022-03-03 12:13:32 +01:00
James Rasell 81fe915e6c
http: add job service registration agent HTTP endpoint. 2022-03-03 12:13:13 +01:00
James Rasell 60cc73fe5d
http: add agent service registration HTTP endpoint. 2022-03-03 12:13:00 +01:00
Michael Schurter 0f6923c750
Merge pull request #10808 from hashicorp/f-curl
cli: add operator api command
2022-03-02 10:12:16 -08:00
Michael Schurter 0bb9f06637 cli: fix op api method handling 2022-03-01 16:44:15 -08:00
Tim Gross f65c804544
csi: subcommand for volume snapshot (#12152) 2022-03-01 13:30:30 -05:00
Tim Gross f2a4ad0949
CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
Tim Gross c90e674918
CSI: use HTTP headers for passing CSI secrets (#12144) 2022-03-01 08:47:01 -05:00
Tim Gross a499401b34
csi: fix redaction of `volume status` mount flags (#12150)
The `volume status` command and associated API redact the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
2022-03-01 08:34:03 -05:00
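The redaction described in a499401b34 can be pictured roughly as follows. This is a minimal sketch rather than Nomad's actual code: the `mountOptions` type, the `redactMountOptions` function, and the `[REDACTED]` placeholder string are all assumptions made for illustration.

```go
package main

import "fmt"

// mountOptions is a hypothetical stand-in for a volume's mount options.
type mountOptions struct {
	FSType     string
	MountFlags []string // may contain sensitive values such as credentials
}

// redactMountOptions returns a copy that keeps the non-sensitive FSType and
// replaces any mount flags with a single placeholder, so operators can still
// see that flags were set. The placeholder string is an assumption.
func redactMountOptions(in *mountOptions) *mountOptions {
	if in == nil {
		return nil
	}
	out := &mountOptions{FSType: in.FSType}
	if len(in.MountFlags) > 0 {
		out.MountFlags = []string{"[REDACTED]"}
	}
	return out
}

func main() {
	opts := &mountOptions{FSType: "ext4", MountFlags: []string{"password=hunter2"}}
	fmt.Printf("%+v\n", *redactMountOptions(opts))
}
```

Returning a copy instead of mutating the input keeps the unredacted value available to internal callers that still need it.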
Tim Gross 99d03cdc6c
CSI: sort capabilities in `plugin status` (#12154)
Also fix `LIST_SNAPSHOTS` capability name
2022-03-01 07:59:31 -05:00
Tim Gross 02ae95ab22
csi: respect -verbose flag for allocs in volume status (#12153) 2022-03-01 07:57:29 -05:00
Jorge Marey a466f01120 Add metadata to namespaces 2022-02-27 09:09:10 +01:00
Michael Schurter cbf6ba843d
cli: fix op api typos
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-02-25 16:31:56 -08:00
Michael Schurter 4550c5fb80 cli: only return 1 on errors from op api
We don't want people to expect stable error codes for errors, and I
don't think these were useful for scripts anyway.
2022-02-25 16:23:31 -08:00
Michael Schurter a42d832f98 cli: add tests and minor fixes for op api
Trimmed spaces around header values.

Fixed method getting forced to GET.
2022-02-24 17:06:07 -08:00
Michael Schurter 238a732098 cli: add filter support 2022-02-24 15:52:54 -08:00
Michael Schurter bb3daac628 rename `nomad curl` to `nomad operator api` 2022-02-24 15:52:54 -08:00
Michael Schurter 141db0c562 cli: add curl command
Just a hackweek project at this point.
2022-02-24 15:52:54 -08:00
Tim Gross 31ee2a3c67
CSI: ensure all fields are mapped from structs to api response (#12124)
In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.
2022-02-24 14:17:15 -05:00
Tim Gross 13ea2c7fb3
CSI: display plugin capabilities in verbose status (#12116)
The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.

Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.
2022-02-24 13:51:38 -05:00
Sander Mol 42b338308f
add go-sockaddr templating support to nomad consul address (#12084) 2022-02-24 09:34:54 -05:00
Florian Apolloner 3bced8f558
namespaces: allow enabling/disabling allowed drivers per namespace 2022-02-24 09:27:32 -05:00
Seth Hoenig a0350b0608 command: switch from raft-boltdb to raft-boltdb/v2 2022-02-23 14:43:59 -06:00
Seth Hoenig de95998faa core: switch to go.etcd.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etcd.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Michael Schurter 7494a0c4fd core: remove all traces of unused protocol version
Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without incrementing
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the *only* protocol version relevant to Nomad developers and
   operators. The other protocol versions are either dead code or have
   never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
   don't think these configuration parameters and variables are the best
   choice. If we come to that point we should choose a versioning scheme
   based on the use case and modern best practices -- not this 6+ year
   old dead code.
2022-02-18 16:12:36 -08:00
Luiz Aoqui de91954582
initial base work for implementing sorting and filter across API endpoints (#12076) 2022-02-16 14:34:36 -05:00
Luiz Aoqui 110dbeeb9d
Add `go-bexpr` filters to evals and deployment list endpoints (#12034) 2022-02-16 11:40:30 -05:00
Seth Hoenig ac3cd73d00
Merge pull request #12054 from hashicorp/b-creation-indexes
api: return sorted results in certain list endpoints
2022-02-15 15:08:38 -06:00
Seth Hoenig 40c714a681 api: return sorted results in certain list endpoints
These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List
2022-02-15 13:48:28 -06:00
Alex Holyoake 3071c7d91b
config: merge ReservableCores in clientConfig (#12044) 2022-02-15 08:36:37 -05:00
Tim Gross 2f79a260fe
csi: volume cli prefix matching should accept exact match (#12051)
The `volume detach`, `volume deregister`, and `volume status` commands
accept a prefix argument for the volume ID. Update the behavior on
exact matches so that if there is more than one volume that matches
the prefix, we only return an error if none of the volume IDs is
an exact match. Otherwise we won't be able to use these commands
at all on those volumes. This also makes the behavior of these commands
consistent with `job stop`.
2022-02-11 08:53:03 -05:00
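The exact-match rule from 2f79a260fe can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: `resolveByPrefix` and its error messages are hypothetical, and the real commands look volumes up through the API rather than in a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveByPrefix is a hypothetical helper. An exact ID match always wins;
// otherwise the prefix must identify exactly one volume.
func resolveByPrefix(prefix string, ids []string) (string, error) {
	var matches []string
	for _, id := range ids {
		if id == prefix {
			// Exact match: usable even if other IDs share the prefix.
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volume found with prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matched multiple volumes: %s",
			prefix, strings.Join(matches, ", "))
	}
}

func main() {
	ids := []string{"vol", "vol-data", "vol-logs"}

	id, err := resolveByPrefix("vol", ids) // exact match despite shared prefix
	fmt.Println(id, err)

	_, err = resolveByPrefix("vol-", ids) // ambiguous, no exact match
	fmt.Println(err)
}
```

Without the early return on an exact match, a volume named `vol` could never be addressed while `vol-data` exists.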
Luiz Aoqui 3bf6036487 Version 1.2.6
Merge tag 'v1.2.6' into merge-release-1.2.6-branch

Version 1.2.6
2022-02-10 14:55:34 -05:00
Luiz Aoqui 15f9d54dea
api: prevent excessive CPU load on job parse
Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.
2022-02-09 19:51:47 -05:00
Thomas Lefebvre 3b57f3af9d
Add config command and config validate subcommand to nomad CLI (#9198) 2022-02-08 16:52:35 -05:00
Tim Gross 7ad15b2b42
raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version
3. Set the default raft protocol to 3, and improve the upgrade
documentation.
2022-02-03 15:03:12 -05:00
Seth Hoenig db2347a86c cleanup: prevent leaks from time.After
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.

Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Derek Strickland 460416e787 Update IsEmpty to check for pre-1.2.4 fields (#11930) 2022-01-28 14:41:49 -05:00
Derek Strickland b3c8ab9be7
Update IsEmpty to check for pre-1.2.4 fields (#11930) 2022-01-26 11:31:37 -05:00
Tim Gross 1dad0e597e
fix integer bounds checks (#11815)
* driver: fix integer conversion error

The shared executor incorrectly parsed the user's group into int32 and
then cast to uint32 without bounds checking. This is harmless because
an out-of-bounds gid will throw an error later, but it triggers
security and code quality scans. Parse directly to uint32 so that we
get correct error handling.

* helper: fix integer conversion error

The autopilot flags helper incorrectly parses a uint64 to a uint, which
has a machine-specific size. Although we don't have 32-bit builds, this
sets off security and code quality scans. Parse to the machine-sized
uint.

* driver: restrict bounds of port map

The plugin server doesn't constrain the maximum integer for port
maps. This could result in a user-visible misconfiguration, but it
also triggers security and code quality scans. Restrict the bounds
before casting to int32 and return an error.

* cpuset: restrict upper bounds of cpuset values

Our cpuset configuration expects values in the range of uint16 to
match the expectations set by the kernel, but we don't constrain the
values before downcasting. An underflow could lead to allocations
failing on the client rather than being caught earlier. This also makes
security and code quality scanners happy.

* http: fix integer downcast for per_page parameter

The parser for the `per_page` query parameter downcasts to int32
without bounds checking. This could result in underflow and
nonsensical paging, but there's no server-side consequences for
this. Fixing this will silence some security and code quality scanners
though.
2022-01-25 11:16:48 -05:00
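Two of the fixes in 1dad0e597e follow a common shape: parse at the target width, or check bounds before narrowing. The sketch below illustrates that shape under stated assumptions. `parseGID` and `toPort` are hypothetical names, and the 0 to 65535 port range is an assumption rather than the bound used in the commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseGID parses directly at 32 bits, so out-of-range input is rejected by
// strconv instead of being silently wrapped by a later cast.
func parseGID(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid gid %q: %w", s, err)
	}
	return uint32(v), nil // safe: ParseUint enforced the 32-bit bound
}

// toPort checks bounds before narrowing to int32. The accepted range is an
// assumption made for this sketch.
func toPort(p int) (int32, error) {
	if p < 0 || p > 65535 {
		return 0, fmt.Errorf("port %d out of range", p)
	}
	return int32(p), nil
}

func main() {
	fmt.Println(parseGID("1000"))
	fmt.Println(parseGID("4294967296")) // one past the uint32 maximum
	fmt.Println(toPort(8080))
	fmt.Println(toPort(70000))
}
```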
Seth Hoenig 0030424384
Merge pull request #11889 from hashicorp/build-update-circle
build: upgrade circleci configuration
2022-01-24 10:18:21 -06:00
Seth Hoenig 2f0cfb5740 build: upgrade and speedup circleci configuration
This PR upgrades our CI images and fixes some affected tests.

- upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02)

- eliminate go-machine-recent-image (no longer necessary)

- manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174)

- fix tcp dial error check (message seems to be OS specific)

- spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2)

- use safe MkdirTemp for generating tmpfiles

NOT applied: (too flakey)

- eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting)

- upgrade resource type for all images to large (2C -> 4C)
2022-01-24 08:28:14 -06:00
Seth Hoenig f2a71fd0d9 deps: pty has new home
github.com/kr/pty was moved to github.com/creack/pty

Swap this dependency so we can upgrade to the latest version
and no longer need a replace directive.
2022-01-19 12:33:05 -06:00
Seth Hoenig 2a5f7c0386 deps: swap gzip handler for gorilla
This has been pinned since the Go modules migration, because the
nytimes gzip handler was modified in version v1.1.0 in a way that
is no longer compatible.

Pretty sure it is this commit: c551b6c3b4

Instead use handler.CompressHandler from gorilla, which is a web toolkit we already
make use of for other things.
2022-01-19 11:52:19 -06:00
Nomad Release bot de3070d49a Generate files for 1.2.4 release 2022-01-18 23:43:00 +00:00
Dave May 330d24a873
cli: Add event stream capture to nomad operator debug (#11865) 2022-01-17 21:35:51 -05:00
Michael Schurter 99c863f909
cli: improve debug error messages (#11507)
Improves `nomad debug` error messages when contacting agents that do not
have /v1/agent/host endpoints (the endpoint was added in v0.12.0)

Part of #9568 and manually tested against Nomad v0.8.7.

Hopefully isRedirectError can be reused for more cases listed in #9568
2022-01-17 11:15:17 -05:00
Tim Gross 33f7c6cba4
csi: when warning for multiple prefix matches, use full ID (#11853)
When the `volume deregister` or `volume detach` commands get an ID
prefix that matches multiple volumes, show the full length of the
volume IDs in the list of volumes shown so that the user can select
the correct one.
2022-01-14 12:25:48 -05:00
Tim Gross 9c4864badd
freebsd: build fix for ARM7 32-bit (#11854)
The size of `stat_t` fields is architecture dependent, which was
reportedly causing a build failure on FreeBSD ARM7 32-bit
systems. This changeset matches the behavior we have on Linux.
2022-01-14 12:25:32 -05:00
James Rasell 82b168bf34
Merge pull request #11403 from hashicorp/f-gh-11059
agent/docs: add better clarification when top-level data dir needs setting
2022-01-13 16:41:35 +01:00
Luiz Aoqui d48e50da9a
Fix log level parsing from lines that include a timestamp (#11838) 2022-01-13 09:56:35 -05:00
Michael Schurter e6eff95769 agent: validate reserved_ports are valid
Goal is to fix at least one of the causes that can cause a node to be
ineligible to receive work:
https://github.com/hashicorp/nomad/issues/9506#issuecomment-1002880600
2022-01-12 14:21:47 -08:00
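Validation of a reserved port specification, as in e6eff95769, could be sketched like this. The function names are hypothetical and the comma-separated list with `start-end` ranges is the format assumed here; the agent's actual validation may differ in detail.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePort accepts only values that fit in 16 bits (0 through 65535).
func parsePort(s string) (int, error) {
	v, err := strconv.ParseUint(strings.TrimSpace(s), 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return int(v), nil
}

// parseReservedPorts expands a spec such as "22,80,8000-8002" into individual
// ports and returns an error for malformed entries or reversed ranges.
func parseReservedPorts(spec string) ([]int, error) {
	var ports []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		first, last, isRange := strings.Cut(part, "-")
		start, err := parsePort(first)
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = parsePort(last); err != nil {
				return nil, err
			}
			if end < start {
				return nil, fmt.Errorf("invalid range %q: end is before start", part)
			}
		}
		for p := start; p <= end; p++ {
			ports = append(ports, p)
		}
	}
	return ports, nil
}

func main() {
	fmt.Println(parseReservedPorts("22,80,8000-8002"))
	fmt.Println(parseReservedPorts("70000"))
}
```

Rejecting a bad specification at agent startup surfaces the misconfiguration immediately instead of leaving a node that silently cannot receive work.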
Seth Hoenig 8c97ffd68e cleanup: stop referencing deprecated HeaderMap field
Remove reference to the deprecated ResponseRecorder.HeaderMap field,
instead calling .Response.Header() to get the same data.

closes #10520
2022-01-12 10:32:54 -06:00
Derek Strickland 0a8e03f0f7
Expose Consul template configuration parameters (#11606)
This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.

- `wait`

It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza.

- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry` 
- `wait` 

Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure.

- `wait_bounds`

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-01-10 10:19:07 -05:00
Charlie Voiselle 98a240cd99
Make number of scheduler workers reloadable (#11593)
## Development Environment Changes
* Added stringer to build deps

## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API

## New Internals
* (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume()
* Update shutdown to use context
* Add mutex for contended server data
    - `workerLock` for the `workers` slice
    - `workerConfigLock` for the `Server.Config.NumSchedulers` and
      `Server.Config.EnabledSchedulers` values

## Other
* Adding docs for scheduler worker api
* Add changelog message

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-01-06 11:56:13 -05:00
Tim Gross 2806dc2bd7
docs/tests for multiple HTTP address config (#11760) 2022-01-03 10:17:13 -05:00
Kevin Schoonover 5d9a506bc0
agent: support multiple http address in addresses.http (#11582) 2022-01-03 09:33:53 -05:00
James Rasell 45f4689f9c
chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
Tim Gross c7cc3cf4dc
cli: stream raft logs to operator raft logs subcommand (#11684)
The `nomad operator raft logs` command uses a raft helper that reads
in the logs from raft and serializes them to JSON. The previous
implementation returned the slice of all logs and then serializes the
entire object. Update the helper to stream the log entries and then
serialize them as newline-delimited JSON.
2021-12-16 13:38:58 -05:00
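The change in c7cc3cf4dc, from serializing one large slice to streaming newline-delimited JSON, can be illustrated with the standard library alone. The `logEntry` type and the helper names below are assumptions for the sketch; the real helper reads entries from the raft log store.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// logEntry is a hypothetical, simplified raft log entry.
type logEntry struct {
	Index uint64 `json:"index"`
	Type  string `json:"type"`
}

// streamLogs writes each entry as soon as it arrives. json.Encoder.Encode
// appends a newline after every value, which yields newline-delimited JSON.
func streamLogs(w io.Writer, entries <-chan logEntry) error {
	enc := json.NewEncoder(w)
	for e := range entries {
		if err := enc.Encode(e); err != nil {
			return err
		}
	}
	return nil
}

// produce stands in for reading entries from disk one at a time.
func produce(entries []logEntry) <-chan logEntry {
	ch := make(chan logEntry)
	go func() {
		defer close(ch)
		for _, e := range entries {
			ch <- e
		}
	}()
	return ch
}

// renderNDJSON collects the stream into a string, for demonstration only.
func renderNDJSON(entries []logEntry) (string, error) {
	var sb strings.Builder
	err := streamLogs(&sb, produce(entries))
	return sb.String(), err
}

func main() {
	entries := []logEntry{{Index: 1, Type: "noop"}, {Index: 2, Type: "barrier"}}
	if err := streamLogs(os.Stdout, produce(entries)); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Because only one entry is held in memory at a time, the output size no longer bounds the memory the command needs.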
Tim Gross f2615992a4
cli: unhide advanced operator raft debugging commands (#11682)
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
2021-12-16 10:32:11 -05:00
Tim Gross 536e3c5282
`nomad eval list` command (#11675)
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.

Also includes note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
2021-12-15 11:58:38 -05:00
Tim Gross f8a133a810
cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678)
When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. The query parameter for these
API calls was not being set by the command.

Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.
2021-12-15 10:44:03 -05:00
Tim Gross a0cf5db797
provide `-no-shutdown-delay` flag for job/alloc stop (#11596)
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.

Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.

Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
2021-12-13 14:54:53 -05:00
Tim Gross 624ecab901
evaluations list pagination and filtering (#11648)
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.

Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime callers can get the `X-Nomad-NextToken`.

Add pagination to the `Eval.List` RPC by checking for pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.

Add filtering by job ID or evaluation status over the results we get
out of the state store.

Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
2021-12-10 13:43:03 -05:00
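The token-based paging described in 624ecab901 can be sketched over a sorted list of IDs. Everything here is illustrative: `listPage` and the `page` type are hypothetical, and the sketch assumes the token is the last ID returned, following the commit's description of resuming from the last ID seen.

```go
package main

import "fmt"

// page is a hypothetical response: one page of IDs plus the token the caller
// passes back to fetch the next page. An empty token means no more results.
type page struct {
	IDs       []string
	NextToken string
}

// listPage returns up to perPage IDs that sort after nextToken. The input
// must be sorted, and results are only consistent while it stays unchanged
// between calls. A perPage of zero or less disables the limit.
func listPage(sortedIDs []string, nextToken string, perPage int) page {
	var out page
	for _, id := range sortedIDs {
		if nextToken != "" && id <= nextToken {
			continue // already returned on an earlier page
		}
		if perPage > 0 && len(out.IDs) == perPage {
			// More results remain: resume after the last ID we returned.
			out.NextToken = out.IDs[len(out.IDs)-1]
			break
		}
		out.IDs = append(out.IDs, id)
	}
	return out
}

func main() {
	ids := []string{"a1", "b2", "c3", "d4", "e5"}
	token := ""
	for {
		p := listPage(ids, token, 2)
		fmt.Println(p.IDs, "next:", p.NextToken)
		if p.NextToken == "" {
			break
		}
		token = p.NextToken
	}
}
```

Keying the token on an ID rather than an offset means a caller resumes at the right place even if earlier entries have been removed in the meantime.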
Lukas W 0e5958d671
CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550)
* Exit non-zero from run command if deployment fails
* Fix typo in deployment monitor introduced in 0edda11
2021-12-09 09:09:28 -05:00
Vyacheslav Morov 6a244f18ad
cli: Add var args to plan output. (#11631) 2021-12-07 10:43:52 -05:00
Tim Gross 03e697a69d
scheduler: config option to reject job registration (#11610)
During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
2021-12-06 15:20:34 -05:00
Derek Strickland fb6dbffa59
Override TLS flags individually for meta commands (#11592)
* Override TLS flags individually for meta commands

* Update command/meta.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-12-01 12:07:48 -05:00
Tim Gross 7770eda3f1
config: fix test-only failures in UI handler setup (#11571)
The `TestHTTPServer_Limits_Error` test never starts the agent so it
had an incomplete configuration, which caused panics in the test. Fix
the configuration.

The PR #11555 had a branch name like `f-ui-*` which caused CI to skip
the unit tests over the HTTP handler setup, so this wasn't caught in
PR review.
2021-11-24 16:19:04 -05:00
Tim Gross fcb96de9a7
config: UI configuration block with Vault/Consul links (#11555)
Add `ui` block to agent configuration to enable/disable the web UI and
provide the web UI with links to Vault/Consul.
2021-11-24 11:20:02 -05:00
James Rasell 751c8217d1
core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and overrides the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now allow for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function in line with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
Tim Gross e729133134
api: return 404 for alloc FS list/stat endpoints (#11482)
* api: return 404 for alloc FS list/stat endpoints

If the alloc filesystem doesn't have a file requested by the List
Files or Stat File API, we currently return an HTTP 500 error with the
expected "file not found" error message. Return an HTTP 404 error
instead.

* update FS Handler

Previously the FS handler would interpret a 500 status as a 404
in the adapter layer by checking if the response body contained
the text  or if the response status
was 500 and then throw an error code for 404.

Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>
2021-11-17 11:15:07 -05:00
Luiz Aoqui 610a8a05e6
Merge release 1.2.0 rc1 branch (#11486) 2021-11-09 17:55:13 -05:00
Dave May 3c04d7927b
cli: refactor operator debug capture (#11466)
* debug: refactor Consul API collection
* debug: refactor Vault API collection
* debug: cleanup test timing
* debug: extend test to multiregion
* debug: save cmdline flags in bundle
* debug: add cli version to output
* Add changelog entry
2021-11-05 19:43:10 -04:00
Alessandro De Blasis 07c670fdc0
cli: show `host_network` in `nomad status` (#11432)
Enhance the CLI in order to return the host network in two flavors 
(default, verbose) of the `node status` command.

Fixes: #11223.
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2021-11-05 09:02:46 -04:00
Florian Apolloner ef88795af3
Added a `-hcl2-strict` flag to allow for lenient hcl variable parsing. (#11284)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2021-11-04 16:33:09 +01:00
James Rasell 674761436e
Merge pull request #11165 from hashicorp/b-gh-11149
jobspec2: ensure consistent error handling between var-file & var.
2021-11-04 16:24:00 +01:00
Mahmood Ali 4fc6e50782
Raft Debugging Improvements (#11414) 2021-11-04 10:16:12 -04:00
Michael Schurter ef3fc79225
Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir
client: never embed alloc_dir in chroot
2021-11-03 16:48:09 -07:00
Luiz Aoqui 5d204c8ced
Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799)" (#11433) 2021-11-02 17:42:52 -04:00
James Rasell c071efbd6b
Merge pull request #11411 from hashicorp/f-gh-11406
cli: add json and template flag opts to acl bootstrap command.
2021-11-02 09:48:25 +01:00
Charlie Voiselle 29e7d46dd9
Making RPC Upgrade mode reloadable. (#11144)
- Making RPC Upgrade mode reloadable.
- Add suggestions from code review
- remove spurious comment
- switch to require(t,...) form for test.
- Add to changelog
2021-11-01 16:30:53 -04:00
James Rasell 6c9e6e6f20
cli: add json and template flag opts to acl bootstrap command. 2021-10-29 09:00:50 +02:00
James Rasell 4c92a77aac
agent: clarify error info when data dir needs setting. 2021-10-28 15:05:56 +02:00
Dave May 509c74ce19
debug: update default node-id and docs (#11398)
* debug: default node-id to all
* debug: align cli help and website documentation
2021-10-27 13:43:56 -04:00
Mahmood Ali cdddd64a42
logging: Log the cause behind agent startup failure (#11353)
Log the failure error when the agent fails to start. Previously, the
agent startup failure error would be emitted to the command UI but not
logged. So it doesn't get emitted to syslog or `log_file` if they are
set, and it makes debugging much harder. Also, logging the error again
before exit makes the error more visible: previously, the operator
needed to scroll to the top to find the error.

On a sample failure, the output will look like:
```
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from sample-configs/config-bad
==> Starting Nomad agent...
==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system
    2021-10-20T14:38:51.179-0400 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system"
```

This change adds the final `ERROR` message. It's easy to miss the `==>
Error starting agent` above.
2021-10-27 10:41:17 -07:00
Luiz Aoqui b463715a98
prevent active log from being overwritten when agent starts (#11386) 2021-10-26 20:57:07 -04:00
Luiz Aoqui 979faf41e5
fix test names (#11374) 2021-10-22 15:43:55 -04:00
Luiz Aoqui 3c22fc79a5
add dispatch idempotency token support in the CLI (#10930) 2021-10-22 12:39:05 -04:00
Luiz Aoqui 6853bf9632
cli: allow setting namespace and region in the `nomad ui` command (#11364) 2021-10-21 16:24:39 -04:00
Shishir Mahajan dd93f72920 Code cleanup: Remove extra if clause.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-19 16:52:11 -07:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.
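
As a rough illustration of the skip described above, the check below decides whether a path is the alloc dir or lies beneath it. This is a minimal sketch, not the code from this change: `shouldSkip` and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldSkip reports whether path is allocDir itself or lives beneath it.
// Hypothetical helper for illustration only; not a Nomad function.
func shouldSkip(path, allocDir string) bool {
	rel, err := filepath.Rel(filepath.Clean(allocDir), filepath.Clean(path))
	if err != nil {
		return false
	}
	if rel == "." {
		return true
	}
	// Anything that has to climb out of allocDir is outside of it.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	allocDir := "/tmp/nomad-badagent/alloc"
	for _, p := range []string{
		"/tmp/nomad-badagent/alloc",
		"/tmp/nomad-badagent/alloc/abc/task",
		"/tmp/nomad-badagent/client",
	} {
		fmt.Printf("%-40s skip=%v\n", p, shouldSkip(p, allocDir))
	}
}
```

A copy routine that consults such a check before descending into a directory cannot re-enter the tree it is writing into, which is what breaks the recursion.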

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Luiz Aoqui 130970e12e
Merge missing commits from 1.2.0-beta1 release branch (#11319) 2021-10-14 16:10:05 -04:00
Luiz Aoqui 9d48daed8c
fix `nomad job allocs` command name (#11314) 2021-10-14 12:44:59 -04:00
Charlie Voiselle cb8e52b5df
Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter 59fda1894e
Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Michael Schurter e14cd34392 client: improve errors & tests for dynamic ports 2021-10-13 16:25:25 -07:00
Dave May c37a6ed583
cli: rename paths in debug bundle for clarity (#11307)
* Rename folders to reflect purpose
* Improve captured files test coverage
* Rename CSI plugins output file
* Add changelog entry
* fix test and make changelog message more explicit

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2021-10-13 18:00:55 -04:00
Mahmood Ali fa4df28fcd
tests: ensure that tests restore env-var values (#11309)
Fix a test corruption issue, where a test accidentally unsets
the `NOMAD_LICENSE` environment variable, that's relied on by some
tests.

As a habit, tests should always restore the environment variable value
on test completion. Golang 1.17 introduced
[`t.Setenv`](https://pkg.go.dev/testing#T.Setenv) to address this issue.
However, as 1.0.x and 1.1.x branches target golang 1.15 and 1.16, I
opted to use a helper function to ease backports.
2021-10-13 17:26:56 -04:00
Dave May 305e8e98bf
cli: Improved autocomplete support for job dispatch and operator debug (#11270)
* Add autocomplete to nomad job dispatch
* Add autocomplete to nomad operator debug
* Update incorrect comment
* Update test to verify autocomplete
* Add changelog
* Apply lint suggestions
* Create dynamic slices instead of specific length
* Align style across predictors
2021-10-12 20:01:54 -04:00
Dave May 2d14c54fa0
debug: Improve namespace and region support (#11269)
* Include region and namespace in CLI output
* Add region and prefix matching for server members
* Add namespace and region API outputs to cluster metadata folder
* Add region awareness to WaitForClient helper function
* Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice
* Refactor test client agent generation
* Add tests for region
* Add changelog
2021-10-12 16:58:41 -04:00
Dave May 76b05f3cd2
cli: Add nomad job allocs command (#11242) 2021-10-12 16:30:36 -04:00
Luiz Aoqui 3e0bad5a41
wrap `log` messages with `hclog` (#11291) 2021-10-12 14:38:44 -04:00
Aleksandr Zagaevskiy d92666e6a7 fixup! Support configurable dynamic port range 2021-10-11 14:13:59 +03:00
Matt Mukerjee b56432e645
Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessarily long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Shantanu Gadgil 0ce156123d
auth_soft_fail needed for public images when agent is configured with auth (#11190) 2021-10-06 15:30:23 -04:00
Florian Apolloner 0fa60dae9d
Added support for `-force-color` to the CLI. (#10975) 2021-10-06 10:02:42 -04:00
Yan 6ff0b6debc
add `-show-url` option for `ui` command (#11213) 2021-10-05 20:08:42 -04:00
Luiz Aoqui 0a62bdc3c5
fix panic when Connect mesh gateway doesn't have a proxy block (#11257)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-10-04 15:52:07 -04:00
Mahmood Ali 4d90afb425 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Michael Schurter c6e72b6818 client: output reserved ports with min/max ports
Also add a little more min/max port testing and add the consts back that
had been removed: but unexported and as defaults.
2021-09-30 17:05:46 -07:00
Michael Schurter 4ad0c258b9 client: add NOMAD_LICENSE to default env deny list
By default we should not expose the NOMAD_LICENSE environment variable
to tasks.

Also refactor where the DefaultEnvDenyList lives so we don't have to
maintain 2 copies of it. Since client/config is the most obvious
location, keep a reference there to its unfortunate home buried deep
in command/agent/host. Since the agent uses this list as well for the
/agent/host endpoint the list must be accessible from both command/agent
and client.
2021-09-21 13:51:17 -07:00
Florian Apolloner 7805b8edf4
Fixed usage of NOMAD_CLI_NO_COLOR env variable. (#11168) 2021-09-17 20:37:05 -04:00
James Rasell 0e926ef3fd
allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not currently implement it.
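
The request-struct pattern described above can be sketched as follows. All type names, the return values, and the fallback behavior here are simplified assumptions for illustration; they are not the real `DriverNetworkManager` signature.

```go
package main

import "fmt"

// networkCreateRequest carries optional settings for network creation.
// New fields can be added later without changing the method signature.
// Illustrative type; simplified compared to the real interface.
type networkCreateRequest struct {
	Hostname string
}

// networkManager is a simplified stand-in for the driver interface.
type networkManager interface {
	CreateNetwork(allocID string, req *networkCreateRequest) (string, error)
}

type fakeManager struct{}

// CreateNetwork returns the hostname it would apply to the namespace.
// Falling back to the alloc ID is an assumption made for this sketch.
func (fakeManager) CreateNetwork(allocID string, req *networkCreateRequest) (string, error) {
	if allocID == "" {
		return "", fmt.Errorf("alloc ID required")
	}
	hostname := allocID
	if req != nil && req.Hostname != "" {
		hostname = req.Hostname
	}
	return hostname, nil
}

func main() {
	var m networkManager = fakeManager{}
	h, err := m.CreateNetwork("alloc-1", &networkCreateRequest{Hostname: "web-01"})
	fmt.Println(h, err)
}
```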
2021-09-16 08:13:09 +02:00
Aleksandr Zagaevskiy ebb87e65fe Support configurable dynamic port range 2021-09-10 11:52:47 +03:00
James Rasell 257d63eec9
jobspec2: ensure consistent error handling between var-file & var. 2021-09-09 11:18:11 +02:00
James Rasell 04a15b5c16
Merge pull request #11105 from hashicorp/f-add-staticcheck-ci
ci: add staticcheck with ST1020 and update golangci-lint
2021-09-09 09:42:12 +02:00
Luiz Aoqui 4dd8b6b571
cli: include all possible scores in alloc status metric table (#11128) 2021-09-08 17:30:11 -04:00
James Rasell d4a333e9b5
lint: mark false positive or fix gocritic append lint errors. 2021-09-06 10:49:44 +02:00
James Rasell b6813f1221
chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
Luiz Aoqui 104d29e808
Don't timestamp active log file (#11070)
* don't timestamp active log file

* website: update log_file default value

* changelog: add entry for #11070

* website: add upgrade instructions for log_file in v1.14 and v1.2.0
2021-08-23 11:27:34 -04:00
Mahmood Ali c37339a8c8
Merge pull request #9160 from hashicorp/f-sysbatch
core: implement system batch scheduler
2021-08-16 09:30:24 -04:00
Michael Schurter a7aae6fa0c
Merge pull request #10848 from ggriffiths/listsnapshot_secrets
CSI Listsnapshot secrets support
2021-08-10 15:59:33 -07:00
Seth Hoenig 3371214431 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is still
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
James Rasell 78a489418d
cli: fix minor format error within `-ca-cert` help text. 2021-08-03 16:05:06 +02:00
Mahmood Ali 0bc12fba7c
Only initialize task.VolumeMounts when not-nil (#10990)
1.1.3 had a bug where task.VolumeMounts will be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning.

The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally.
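
The effect of the dropped `len` check can be illustrated with a small conversion function. This is a sketch with hypothetical, simplified types; it is not the conversion code from the linked PR.

```go
package main

import "fmt"

// Simplified stand-ins for the API and internal structs (hypothetical).
type apiVolumeMount struct{ Volume, Destination string }
type volumeMount struct{ Volume, Destination string }

// convertMounts returns nil, not an empty slice, when there is nothing
// to convert, so later nil comparisons behave the same as before.
func convertMounts(in []*apiVolumeMount) []*volumeMount {
	if len(in) == 0 {
		return nil
	}
	var out []*volumeMount
	for _, m := range in {
		if m == nil {
			continue // an empty block may decode to a nil entry
		}
		out = append(out, &volumeMount{Volume: m.Volume, Destination: m.Destination})
	}
	return out
}

func main() {
	fmt.Println(convertMounts(nil) == nil)
	fmt.Println(convertMounts([]*apiVolumeMount{}) == nil)
	fmt.Println(len(convertMounts([]*apiVolumeMount{{Volume: "data", Destination: "/srv"}})))
}
```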

Fixes #10981
2021-08-02 13:08:10 -04:00
Nomad Release bot b5dff8be42 Generate files for 1.1.3 release 2021-07-29 03:43:03 +00:00
Grant Griffiths fecbbaee22 CSI ListSnapshots secrets implementation
Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>
2021-07-28 11:30:29 -07:00
Mahmood Ali d97927ebcf
cli: Use glint to determine if os.Stdout is tty (#10926)
Use glint to determine if os.Stdout is a terminal.

glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](b492b545f6/renderer_term.go (L39-L46)). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it.

By using glint to perform the check, we eliminate the risk of mis-judgement.
2021-07-23 11:27:47 -04:00
Luiz Aoqui 484037aff1
fix `nomad alloc signal` help message (#10917) 2021-07-21 11:02:44 -04:00
Kent 'picat' Gruber decd59dbd1
Merge pull request #10886 from hashicorp/cli-handle-successful-deployment
Handle successful/canceled/blocked deployments in CLI output
2021-07-16 12:27:22 -04:00
Kent 'picat' Gruber b85b56624b Handle `DeploymentStatusFailed` unless `hasAutoRevert` 2021-07-15 17:06:13 -04:00
Mahmood Ali 996ea1fa46
Merge pull request #10875 from hashicorp/b-namespace-flag-override
cli: `-namespace` should override job namespace
2021-07-14 17:28:36 -04:00
Kent 'picat' Gruber 15342d0f6a Handle successful/canceled/blocked deployments in CLI output
Otherwise the spinner would just end, which felt a bit awkward.

I wanted to see a "✓" to know that everything was ok, and a "!" (maybe something else?) if something went wrong.
2021-07-09 19:27:55 -04:00
Seth Hoenig 7c3db812fd consul/connect: remove sidecar proxy before removing parent service
This PR will have Nomad de-register a sidecar proxy service before
attempting to de-register the parent service. Otherwise, Consul will
emit a warning and an error.

Fixes #10845
2021-07-08 13:30:19 -05:00
Seth Hoenig 2607853a26
Merge pull request #10872 from hashicorp/b-cc-regex-checkids
consul/connect: Avoid assumption of parent service when filtering connect proxies
2021-07-08 13:29:40 -05:00
Seth Hoenig 284cd214ec consul/connect: improve regex from CR suggestions 2021-07-08 13:05:05 -05:00
Tim Gross a3bc87a2eb cli: `-namespace` should override job namespace
When a jobspec doesn't include a namespace, we provide it with the default
namespace, but this ends up overriding the explicit `-namespace` flag. This
changeset uses the same logic as region parsing to create an order of
precedence: the query string parameter (the `-namespace` flag) overrides the
API request body which overrides the jobspec.
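
The order of precedence above can be sketched as a first-non-empty lookup. The function name and the final fallback to `"default"` are assumptions for illustration, not the code in this changeset.

```go
package main

import "fmt"

// resolveNamespace returns the first non-empty namespace in precedence
// order: query string parameter, then request body, then jobspec.
// Hypothetical helper; falling back to "default" is an assumption.
func resolveNamespace(queryParam, requestBody, jobspec string) string {
	for _, ns := range []string{queryParam, requestBody, jobspec} {
		if ns != "" {
			return ns
		}
	}
	return "default"
}

func main() {
	fmt.Println(resolveNamespace("prod", "staging", "dev"))
	fmt.Println(resolveNamespace("", "staging", "dev"))
	fmt.Println(resolveNamespace("", "", "dev"))
	fmt.Println(resolveNamespace("", "", ""))
}
```

The key point is that the default is applied only after all three sources have been consulted, so it can no longer mask an explicit flag.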
2021-07-08 13:17:27 -04:00
Seth Hoenig 868b246128 consul/connect: Avoid assumption of parent service when filtering connect proxies
This PR uses regex-based matching for sidecar proxy services and checks when syncing
with Consul. Previously we would check if the parent of the sidecar was still being
tracked in Nomad. This is a false invariant - one which we must not depend on when we
make #10845 work.
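
A minimal sketch of regex-based matching follows. The pattern is an assumption chosen for illustration (IDs that start with `_nomad-` and end with `-sidecar-proxy`); the expression used in the actual change may differ, and the sample IDs are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// sidecarRe is an illustrative pattern, assuming sidecar proxy service
// IDs start with "_nomad-" and end with "-sidecar-proxy".
var sidecarRe = regexp.MustCompile(`^_nomad-.+-sidecar-proxy$`)

// isSidecarProxy decides from the ID alone, without looking up whether
// a parent service is still tracked.
func isSidecarProxy(id string) bool {
	return sidecarRe.MatchString(id)
}

func main() {
	for _, id := range []string{
		"_nomad-task-1234-group-api-web-9001-sidecar-proxy",
		"_nomad-task-1234-group-api-web-9001",
		"unrelated-sidecar-proxy",
	} {
		fmt.Printf("%-55s %v\n", id, isSidecarProxy(id))
	}
}
```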

Fixes #10843
2021-07-08 09:43:41 -05:00
Mahmood Ali 1f34f2197b
Merge pull request #10806 from hashicorp/munda/idempotent-job-dispatch
Enforce idempotency of dispatched jobs using token on dispatch request
2021-07-08 10:23:31 -04:00
Tim Gross 8f25a9d7cd
cni: respect default `cni_config_dir` and `cni_path` (#10870)
The default agent configuration values were not set, which meant they were not
being set in the client configuration and this results in fingerprints failing
unless the values were set explicitly.
2021-07-08 09:56:57 -04:00
Tim Gross e88e1e5001
testing: prevent panic when `job status` output changes (#10869)
The `command/TestJobStatusCommand_Run` test assumes that it gets back running
allocations and will panic the test runner rather than failing.
2021-07-08 09:25:44 -04:00
Alex Munda 02c1a4d912
Set/parse idempotency_token query param 2021-07-07 16:26:55 -05:00
Seth Hoenig a57b066402
Merge pull request #10865 from hashicorp/b-deregister-noops
consul: avoid extra sync operations when no action required
2021-07-07 13:42:46 -05:00
Isabel Suchanek 13db600665
cli: add -task flag to alloc signal, restart (#10859)
Alloc exec only works when task is passed as a flag and not an arg.
Alloc logs currently accepts either, but alloc signal and restart only
accept task as an arg. This adds -task as a flag to the other alloc
commands to make the cli UX consistent. If task is passed as a flag and
an arg, it ignores the arg.
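
The flag-over-arg behavior can be sketched with the standard `flag` package. `parseTask` is a hypothetical helper and the argument layout (alloc ID first, optional task second) is an assumption; Nomad's CLI code is organized differently.

```go
package main

import (
	"flag"
	"fmt"
)

// parseTask returns the alloc ID and task name from CLI arguments.
// A -task flag wins; a positional task is used only when no flag is set.
// Hypothetical helper for illustration.
func parseTask(args []string) (allocID, task string, err error) {
	fs := flag.NewFlagSet("alloc", flag.ContinueOnError)
	fs.StringVar(&task, "task", "", "task name")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	rest := fs.Args()
	if len(rest) == 0 {
		return "", "", fmt.Errorf("alloc ID required")
	}
	allocID = rest[0]
	if task == "" && len(rest) > 1 {
		task = rest[1]
	}
	return allocID, task, nil
}

func main() {
	fmt.Println(parseTask([]string{"-task", "web", "abc123", "ignored"}))
	fmt.Println(parseTask([]string{"abc123", "db"}))
}
```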
2021-07-07 09:58:16 -07:00
Seth Hoenig 56a6a1b1df consul: avoid extra sync operations when no action required
This PR makes it so the Consul sync logic will ignore operations that
do not specify an action to take (i.e. [de-]register [services|checks]).

Ideally such noops would be discarded at the callsites (i.e. users
of [Create|Update|Remove]Workload), but we can also be defensive
at the commit point.
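
A defensive check at the commit point might look like the sketch below. The struct layout, field names, and `commit` function are simplified assumptions and not the real sync client.

```go
package main

import "fmt"

// operations groups pending Consul changes (simplified, hypothetical).
type operations struct {
	regServices   []string
	regChecks     []string
	deregServices []string
	deregChecks   []string
}

// empty reports whether the operations contain nothing to do.
func (o *operations) empty() bool {
	if o == nil {
		return true
	}
	return len(o.regServices)+len(o.regChecks)+
		len(o.deregServices)+len(o.deregChecks) == 0
}

// commit queues ops for syncing unless they are a noop.
// It reports whether anything was queued.
func commit(queue chan<- *operations, ops *operations) bool {
	if ops.empty() {
		return false
	}
	queue <- ops
	return true
}

func main() {
	queue := make(chan *operations, 1)
	fmt.Println(commit(queue, &operations{}))
	fmt.Println(commit(queue, &operations{regServices: []string{"web"}}))
}
```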

Also adds 2 trace logging statements which are helpful for diagnosing
sync operations with Consul - when they happen and why.

Fixes #10797
2021-07-07 11:24:56 -05:00
Tim Gross 69a7c9db7e
csi: account for nil volume_mount in API-to-structs conversion (#10855)
Fix a nil pointer in the API struct to `nomad/structs` conversion when a
`volume_mount` block is empty.
2021-07-07 08:06:39 -04:00
James Rasell 7a89dfe0cb
cli: fixed system commands so they correctly use passed flags. 2021-06-28 10:57:50 +02:00
Tim Gross 38e83f5ddc
csi: fix CLI panic when formatting volume status with -verbose flag (#10818)
When the `-verbose` flag is passed to the `nomad volume status` command, we
hit a code path where the rows of text to be formatted were not initialized
correctly, resulting in a panic in the CLI.
2021-06-25 16:17:37 -04:00
Dave May 1e51d00d98
Add remaining pprof profiles to nomad operator debug (#10748)
* Add remaining pprof profiles to debug dump
* Refactor pprof profile capture
* Add WaitForFilesUntil and WaitForResultUntil utility functions
* Add CHANGELOG entry
2021-06-21 14:22:49 -04:00
Russell Rollins 76446ba512
Adds error handling for client error in getRandomJobAlloc. (#10787) 2021-06-18 16:26:43 -04:00
Seth Hoenig 0d9208f1a0 consul: set task name only for group service checks
This PR fixes a bug introduced in a refactoring

https://github.com/hashicorp/nomad/pull/10764/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ec

where task level service checks would inherit the task name
field, when they shouldn't.

Fixes #10781
2021-06-18 12:16:27 -05:00
Seth Hoenig 532b898b07 consul/connect: in-place update service definition when connect upstreams are modified
This PR fixes a bug where modifying the upstreams of a Connect sidecar proxy
would not result in Consul applying the changes, unless an additional change to
the job would trigger a task replacement (thus replacing the service definition).

The fix is to check if upstreams have been modified between Nomad's view of the
sidecar service definition, and the service definition for the sidecar that is
actually registered in Consul.
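
The comparison described above can be sketched as an order-insensitive diff of two upstream lists. The `upstream` type is a simplified, hypothetical stand-in; the real definitions carry more fields.

```go
package main

import "fmt"

// upstream is a simplified stand-in for a Connect upstream definition.
type upstream struct {
	DestinationName string
	LocalBindPort   int
	Datacenter      string
}

// upstreamsDiffer reports whether the two lists contain different
// upstreams, ignoring order.
func upstreamsDiffer(nomad, consul []upstream) bool {
	if len(nomad) != len(consul) {
		return true
	}
	seen := make(map[upstream]int, len(nomad))
	for _, u := range nomad {
		seen[u]++
	}
	for _, u := range consul {
		if seen[u] == 0 {
			return true
		}
		seen[u]--
	}
	return false
}

func main() {
	a := []upstream{{"db", 5432, ""}, {"cache", 6379, ""}}
	b := []upstream{{"cache", 6379, ""}, {"db", 5432, ""}}
	c := []upstream{{"cache", 6379, ""}, {"db", 5433, ""}}
	fmt.Println(upstreamsDiffer(a, b), upstreamsDiffer(a, c))
}
```

When the lists differ, the service definition would be re-registered in place instead of waiting for a task replacement.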

Fixes #8754
2021-06-16 16:48:26 -05:00
Seth Hoenig d75669da4a consul: make failures_before_critical and success_before_passing work with group services
This PR fixes some job submission plumbing to make sure the Consul Check parameters
- failures_before_critical
- success_before_passing

work with group-level services. They already work with task-level services.
2021-06-15 11:20:40 -05:00
Isabel Suchanek e3cde4f4b3
cli: check deployment exists before monitoring (#10757)
System and batch jobs don't create deployments, which means nomad tries
to monitor a non-existent deployment when it runs a job and outputs an
error message. This adds a check to make sure a deployment exists before
monitoring. Also fixes some formatting.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-06-14 16:42:38 -07:00
Luiz Aoqui 98e0e952a6
fix agent-info help message formatting (#10747) 2021-06-11 15:39:28 -04:00
James Rasell 939b23936a
Merge pull request #10744 from hashicorp/b-remove-duplicate-imports
chore: remove duplicate import statements
2021-06-11 16:42:34 +02:00
James Rasell 492e308846
tests: remove duplicate import statements. 2021-06-11 09:39:22 +02:00
Nomad Release bot 7cc7389afd Generate files for 1.1.1 release 2021-06-10 08:04:25 -04:00
Mahmood Ali aa77c2731b tests: use standard library testing.TB
Glint pulled in an updated version of mitchellh/go-testing-interface
which broke some existing tests because the update added a Parallel()
method to testing.T. This switches to the standard library testing.TB
which doesn't have a Parallel() method.
2021-06-09 16:18:45 -07:00
Isabel Suchanek dfaef2468c cli: add monitor flag to deployment status
Adding '-verbose' will print out the allocation information for the
deployment. This also changes the job run command so that it now blocks
until deployment is complete and adds timestamps to the output so that
it's more in line with the output of node drain.

This uses glint to print in place when running in a tty. Because glint
doesn't yet support cmd/powershell, Windows workflows use a different
library to print in place, which results in slightly different
formatting: 1) different margins, and 2) no spinner indicating
deployment in progress.
2021-06-09 16:18:45 -07:00
Seth Hoenig dbdc479970 consul: move consul acl tests into ent files
(cherry-pick ent back to oss)

This PR moves a lot of Consul ACL token validation tests into ent files,
so that we can verify correct behavior difference between OSS and ENT
Nomad versions.
2021-06-09 08:38:42 -05:00
Seth Hoenig d656777dd7
Merge pull request #10720 from hashicorp/f-cns-acl-check
consul: correctly check consul acl token namespace when using consul oss
2021-06-08 15:43:42 -05:00
Seth Hoenig 87be8c4c4b consul: correctly check consul acl token namespace when using consul oss
This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship
check when using Consul OSS (or Consul ENT without namespace support).

Nomad v1.1.0 introduced a regression where Nomad would fail the validation
when submitting Connect jobs and allow_unauthenticated set to true, with
Consul OSS - because it would do the namespace check against the Consul ACL
token assuming the "default" namespace, which does not work because Consul OSS
does not have namespaces.

Instead of making the bad assumption, expand the namespace check to handle
each special case explicitly.

Fixes #10718
2021-06-08 13:55:57 -05:00
Seth Hoenig c13bf8b917
Merge pull request #10715 from hashicorp/f-cns-attrs
consul: probe consul namespace feature before using namespace api
2021-06-07 16:11:17 -05:00
Seth Hoenig 209e2d6d81 consul: pr cleanup namespace probe function signatures 2021-06-07 15:41:01 -05:00
Seth Hoenig 519429a2de consul: probe consul namespace feature before using namespace api
This PR changes Nomad's wrapper around the Consul NamespaceAPI so that
it will detect if the Consul Namespaces feature is enabled before making
a request to the Namespaces API. Namespaces are not enabled in Consul OSS,
and require a suitable license to be used with Consul ENT.

Previously Nomad would check for a 404 status code when making a request
to the Namespaces API to "detect" if Consul OSS was being used. This does
not work for Consul ENT with Namespaces disabled, which returns a 500.

Now we avoid requesting the namespace API altogether if Consul is detected
to be the OSS sku, or if the Namespaces feature is not licensed. Since
Consul can be upgraded from OSS to ENT, or a new license applied, we cache
the value for 1 minute, refreshing on demand if expired.

Fixes https://github.com/hashicorp/nomad-enterprise/issues/575

Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688.
This turns out not to be possible due to a chicken-egg situation between
bootstrapping the agent and setting up the consul client. Also fun: the
Consul fingerprinter creates its own Consul client, because there is
[currently] no way to pass the agent's client through the fingerprint factory.
2021-06-07 12:19:25 -05:00
James Rasell 888371a012
cmd: validate the type flag when querying plugin status. 2021-06-07 13:53:28 +02:00
Jasmine Dahilig ca4be6857e
deployment query rate limit (#10706) 2021-06-04 12:38:46 -07:00
Seth Hoenig d026ff1f66 consul/connect: add support for connect mesh gateways
This PR implements first-class support for Nomad running Consul
Connect Mesh Gateways. Mesh gateways enable services in the Connect
mesh to make cross-DC connections via gateways, where each datacenter
may not have full node interconnectivity.

Consul docs with more information:
https://www.consul.io/docs/connect/gateways/mesh-gateway

The following group level service block can be used to establish
a Connect mesh gateway.

service {
  connect {
    gateway {
      mesh {
        // no configuration
      }
    }
  }
}

Services can make use of a mesh gateway by configuring so in their
upstream blocks, e.g.

service {
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "<service>"
          local_bind_port  = <port>
          datacenter       = "<datacenter>"
          mesh_gateway {
            mode = "<mode>"
          }
        }
      }
    }
  }
}

Typical use of a mesh gateway is to create a bridge between datacenters.
A mesh gateway should then be configured with a service port that is
mapped from a host_network configured on a WAN interface in Nomad agent
config, e.g.

client {
  host_network "public" {
    interface = "eth1"
  }
}

Create a port mapping in the group.network block for use by the mesh
gateway service from the public host_network, e.g.

network {
  mode = "bridge"
  port "mesh_wan" {
    host_network = "public"
  }
}

Use this port label for the service.port of the mesh gateway, e.g.

service {
  name = "mesh-gateway"
  port = "mesh_wan"
  connect {
    gateway {
      mesh {}
    }
  }
}

Currently Envoy is the only supported gateway implementation in Consul.
By default Nomad client will run the latest official Envoy docker image
supported by the local Consul agent. The Envoy task can be customized
by setting `meta.connect.gateway_image` in agent config or by setting
the `connect.sidecar_task` block.

Gateways require Consul 1.8.0+, enforced by the Nomad scheduler.

Closes #9446
2021-06-04 08:24:49 -05:00
Grant Griffiths 3f41150fbb CSI snapshot list: do not shorten snapshot ID
Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>
2021-05-27 13:28:18 -04:00
Mahmood Ali 0f5539c382 exec: http: close websocket connection gracefully
In this loop, we ought to close the websocket connection gracefully when
the StreamErrWrapper reaches EOF.

Previously, it's possible that that we drop the last few events or skip sending
Previously, it's possible that we drop the last few events or skip sending
before the loop here completes processing streaming events, the loop exits
prematurely without propagating the last few events.

Instead here, the loop continues until we hit `httpPipe` EOF (through
`decoder.Decode`), to ensure we process the events to completion.
2021-05-24 13:37:23 -04:00
Luiz Aoqui c1ef539fa3
Display confirmation message on 'nomad volume delete' and 'nomad volume deregister' 2021-05-24 12:02:55 -04:00
Tim Gross 82fe7300e5
cli: improve wildcard namespace prefix matches (#10648)
When a wildcard namespace is used for `nomad job` commands that support prefix
matching, avoid asking the user for input if a prefix is an unambiguous exact
match so that the behavior is similar to the commands using a specific or
unset namespace.
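
One way to express the selection rule is sketched below. `jobStub` and `pickJob` are hypothetical, and treating a single prefix match as unambiguous is an assumption about the surrounding behavior.

```go
package main

import "fmt"

// jobStub is a hypothetical, reduced view of a job list entry.
type jobStub struct {
	Namespace string
	ID        string
}

// pickJob returns a job without prompting when the choice is
// unambiguous: either one match overall, or exactly one exact ID match.
func pickJob(prefix string, matches []jobStub) (jobStub, bool) {
	if len(matches) == 1 {
		return matches[0], true
	}
	var exact []jobStub
	for _, m := range matches {
		if m.ID == prefix {
			exact = append(exact, m)
		}
	}
	if len(exact) == 1 {
		return exact[0], true
	}
	return jobStub{}, false // caller has to ask the user
}

func main() {
	matches := []jobStub{{"default", "web"}, {"prod", "web-canary"}}
	fmt.Println(pickJob("web", matches))
	fmt.Println(pickJob("we", matches))
}
```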
2021-05-24 11:38:05 -04:00
Tim Gross 084a46e0e5
agent: surface websocket errors in logs
The websocket interface used for `alloc exec` has to silently drop client send
errors because otherwise those errors would interleave with the streamed
output. But we may be able to surface errors that cause terminated websockets
a little better in the HTTP server logs.
2021-05-24 09:46:45 -04:00
Mahmood Ali b518454bf8
cli: Handle nil MemoryMaxMB (#10620)
Handle when MemoryMaxMB is nil, as expected when a new 1.1.0 is hitting
a pre-1.1.0 Server.
2021-05-19 16:56:06 -04:00
Nomad Release bot 5be44af07d Generate files for 1.1.0-rc1 release 2021-05-12 22:43:48 +00:00
Chris Baker 263ddd567c
Node Drain Metadata (#10250) 2021-05-07 13:58:40 -04:00
Mahmood Ali 102763c979
Support disabling TCP checks for connect sidecar services 2021-05-07 12:10:26 -04:00
Nick Ethier 2978c430e5 command: show number of reserved cores on alloc status output 2021-05-05 08:11:41 -04:00
Mahmood Ali 4b95f6ef42
api: actually set MemoryOversubscriptionEnabled (#10493) 2021-05-02 22:53:53 -04:00
Mahmood Ali 98a9a9052f
Port OSS changes for Enterprise Quota accounting (#10481) 2021-04-30 09:48:03 -04:00
Tim Gross 7fdfbfc0f0 license: remove "Terminates At" from license get command
The `Terminates At` field can't be removed from the struct for backwards
compatibility reasons, but there's no purpose to it anymore so we shouldn't be
showing it to end users of the command.
2021-04-28 12:00:30 -04:00
Tim Gross 4f9c5c4bac license: update 'license get' command 2021-04-28 12:00:30 -04:00
Seth Hoenig d54a606819
Merge pull request #10439 from hashicorp/pick-ent-acls-changes
e2e: add e2e tests for consul namespaces on ent with acls
2021-04-28 08:30:08 -06:00
Tim Gross 79f81d617e licensing: remove raft storage and sync
This changeset is the OSS portion of the work to remove the raft storage and
sync for Nomad Enterprise.
2021-04-28 10:28:23 -04:00
Seth Hoenig 09cd01a5f3 e2e: add e2e tests for consul namespaces on ent with acls
This PR adds e2e tests for Consul Namespaces for Nomad Enterprise
with Consul ACLs enabled.

Needed to add support for Consul ACL tokens with `namespace` and
`namespace_prefix` blocks, which Nomad parses and validates before
tossing the token. These bits will need to be picked back to OSS.
2021-04-27 14:45:54 -06:00
Mahmood Ali ed4aad458c
api: Ignore User provided ParentID (#10424)
ParentID is an internal field that Nomad sets for dispatched or parameterized jobs. Job submitters should not be able to set it directly, as that messes up children tracking.

Fixes #10422. It specifically stops the scheduler from honoring the ParentID. The reason for the failure, and why the scheduler didn't schedule that job once it was created, is very interesting and requires follow up with a more technical issue.
2021-04-23 16:22:17 -04:00
Charlie Voiselle ef8ca60693
Enable go-sockaddr templating for `network-interface` (#10404)
Add templating to `network-interface` option.
This PR also adds a fast-fail in the case where an invalid interface is set or produced by the template.

* add tests and check for valid interface
* Add documentation
* Incorporate suggestions from code review

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2021-04-20 13:55:10 -04:00
Seth Hoenig 4e6dbaaec1
Merge pull request #10184 from hashicorp/f-fuzzy-search
api: implement fuzzy search API
2021-04-20 09:06:40 -06:00
Seth Hoenig 509490e5d2 e2e: consul namespace tests from nomad ent
(cherry-picked from ent without _ent things)

This is part 2/4 of e2e tests for Consul Namespaces. Took a
first pass at what the parameterized tests can look like, but
only on the ENT side for this PR. Will continue to refactor
in the next PRs.

Also fixes 2 bugs:
 - Config Entries registered by Nomad Server on job registration
   were not getting Namespace set
 - Group level script checks were not getting Namespace set

Those changes will need to be copied back to Nomad OSS.

Nomad OSS + no ACLs (previously, needs refactor)
Nomad ENT + no ACLs (this)
Nomad OSS + ACLs (todo)
Nomad ENT + ACLs (todo)
2021-04-19 15:35:31 -06:00
Mahmood Ali d880ba9c62 cli: filename arg for `volume init` and `quota init` 2021-04-18 14:14:05 -04:00
Seth Hoenig 1ee8d5ffc5 api: implement fuzzy search API
This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy
searching objects in Nomad. The fuzzy search endpoint routes requests
to the Nomad Server leader, which implements the Search.FuzzySearch RPC
method.

Requests to the fuzzy search API are based on the api.FuzzySearchRequest
object, e.g.

{
  "Text": "ed",
  "Context": "all"
}

Responses from the fuzzy search API are based on the api.FuzzySearchResponse
object, e.g.

{
  "Index": 27,
  "KnownLeader": true,
  "LastContact": 0,
  "Matches": {
    "tasks": [
      {
        "ID": "redis",
        "Scope": [
          "default",
          "example",
          "cache"
        ]
      }
    ],
    "evals": [],
    "deployment": [],
    "volumes": [],
    "scaling_policy": [],
    "images": [
      {
        "ID": "redis:3.2",
        "Scope": [
          "default",
          "example",
          "cache",
          "redis"
        ]
      }
    ]
  },
  "Truncations": {
    "volumes": false,
    "scaling_policy": false,
    "evals": false,
    "deployment": false
  }
}

The API is tunable using the new server.search stanza, e.g.

server {
  search {
    fuzzy_enabled   = true
    limit_query     = 200
    limit_results   = 1000
    min_term_length = 5
  }
}

These values can be increased or decreased, so as to provide more
search results or to reduce load on the Nomad Server. The fuzzy search
API can be disabled entirely by setting `fuzzy_enabled` to `false`.
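
A request like the one shown above could be built as in the following sketch. The local `fuzzySearchRequest` struct mirrors only the two fields from the example body, the agent address is a placeholder, and the use of POST is an assumption; the request is constructed but not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// fuzzySearchRequest mirrors the two fields shown in the example body.
// Local stand-in for illustration, not the api package type.
type fuzzySearchRequest struct {
	Text    string
	Context string
}

// newFuzzyRequest builds (but does not send) a fuzzy search request.
// addr is a placeholder agent address supplied by the caller.
func newFuzzyRequest(addr, text, context string) (*http.Request, []byte, error) {
	body, err := json.Marshal(fuzzySearchRequest{Text: text, Context: context})
	if err != nil {
		return nil, nil, err
	}
	req, err := http.NewRequest(http.MethodPost, addr+"/v1/search/fuzzy", bytes.NewReader(body))
	if err != nil {
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, body, nil
}

func main() {
	req, body, err := newFuzzyRequest("http://127.0.0.1:4646", "ed", "all")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(string(body))
}
```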
2021-04-16 16:36:07 -06:00
Nick Ethier 339c671e29 agent: add test for reserved core config mapping 2021-04-13 13:28:15 -04:00
Nick Ethier edc0da9040 client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms 2021-04-13 13:28:15 -04:00