open-nomad

Commit Graph

Author	SHA1	Message	Date
Jorge Marey	a466f01120	Add metadata to namespaces	2022-02-27 09:09:10 +01:00
Michael Schurter	cbf6ba843d	cli: fix op api typos Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-02-25 16:31:56 -08:00
Michael Schurter	4550c5fb80	cli: only return 1 on errors from op api We don't want people to expect stable error codes for errors, and I don't think these were useful for scripts anyway.	2022-02-25 16:23:31 -08:00
Michael Schurter	a42d832f98	cli: add tests and minor fixes for op api Trimmed spaces around header values. Fixed method getting forced to GET.	2022-02-24 17:06:07 -08:00
Michael Schurter	238a732098	cli: add filter support	2022-02-24 15:52:54 -08:00
Michael Schurter	bb3daac628	rename `nomad curl` to `nomad operator api`	2022-02-24 15:52:54 -08:00
Michael Schurter	141db0c562	cli: add curl command Just a hackweek project at this point.	2022-02-24 15:52:54 -08:00
Tim Gross	31ee2a3c67	CSI: ensure all fields are mapped from structs to api response (#12124 ) In PR #12108 we added missing fields to the plugin response, but we didn't include the manual serialization steps that we need until issue #10470 is resolved.	2022-02-24 14:17:15 -05:00
Tim Gross	13ea2c7fb3	CSI: display plugin capabilities in verbose status (#12116 ) The behaviors of CSI plugins are governed by their capabilities as defined by the CSI specification. When debugging plugin issues, it's useful to know which behaviors are expected so they can be matched against RPC calls made to the plugin allocations. Expose the plugin capabilities as named in the CSI spec in the `nomad plugin status -verbose` output.	2022-02-24 13:51:38 -05:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084 )	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	a0350b0608	command: switch from raft-boltdb to raft-boltdb/v2	2022-02-23 14:43:59 -06:00
Seth Hoenig	de95998faa	core: switch to go.etcd.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either dead code or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Luiz Aoqui	de91954582	initial base work for implementing sorting and filter across API endpoints (#12076 )	2022-02-16 14:34:36 -05:00
Luiz Aoqui	110dbeeb9d	Add `go-bexpr` filters to evals and deployment list endpoints (#12034 )	2022-02-16 11:40:30 -05:00
Seth Hoenig	ac3cd73d00	Merge pull request #12054 from hashicorp/b-creation-indexes api: return sorted results in certain list endpoints	2022-02-15 15:08:38 -06:00
Seth Hoenig	40c714a681	api: return sorted results in certain list endpoints These API endpoints now return results in chronological order. They can return results in reverse chronological order by setting the query parameter ascending=true. - Eval.List - Deployment.List	2022-02-15 13:48:28 -06:00
Alex Holyoake	3071c7d91b	config: merge ReservableCores in clientConfig (#12044 )	2022-02-15 08:36:37 -05:00
Tim Gross	2f79a260fe	csi: volume cli prefix matching should accept exact match (#12051 ) The `volume detach`, `volume deregister`, and `volume status` commands accept a prefix argument for the volume ID. Update the behavior on exact matches so that if there is more than one volume that matches the prefix, we should only return an error if one of the volume IDs is not an exact match. Otherwise we won't be able to use these commands at all on those volumes. This also makes the behavior of these commands consistent with `job stop`.	2022-02-11 08:53:03 -05:00
Luiz Aoqui	3bf6036487	Merge tag 'v1.2.6' into merge-release-1.2.6-branch Version 1.2.6	2022-02-10 14:55:34 -05:00
Luiz Aoqui	15f9d54dea	api: prevent excessive CPU load on job parse Add new namespace ACL requirement for the /v1/jobs/parse endpoint and return early if HCLv2 parsing fails. The endpoint now requires the new `parse-job` ACL capability or `submit-job`.	2022-02-09 19:51:47 -05:00
Thomas Lefebvre	3b57f3af9d	Add config command and config validate subcommand to nomad CLI (#9198 )	2022-02-08 16:52:35 -05:00
Tim Gross	7ad15b2b42	raft: default to protocol v3 (#11572 ) Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.	2022-02-03 15:03:12 -05:00
Seth Hoenig	db2347a86c	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Derek Strickland	460416e787	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-28 14:41:49 -05:00
Derek Strickland	b3c8ab9be7	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-26 11:31:37 -05:00
Tim Gross	1dad0e597e	fix integer bounds checks (#11815 ) * driver: fix integer conversion error The shared executor incorrectly parsed the user's group into int32 and then cast to uint32 without bounds checking. This is harmless because an out-of-bounds gid will throw an error later, but it triggers security and code quality scans. Parse directly to uint32 so that we get correct error handling. * helper: fix integer conversion error The autopilot flags helper incorrectly parses a uint64 to a uint, which has a machine-specific size. Although we don't have 32-bit builds, this sets off security and code quality scans. Parse to the machine sized uint. * driver: restrict bounds of port map The plugin server doesn't constrain the maximum integer for port maps. This could result in a user-visible misconfiguration, but it also triggers security and code quality scans. Restrict the bounds before casting to int32 and return an error. * cpuset: restrict upper bounds of cpuset values Our cpuset configuration expects values in the range of uint16 to match the expectations set by the kernel, but we don't constrain the values before downcasting. An underflow could lead to allocations failing on the client rather than being caught earlier. This also makes security and code quality scanners happy. * http: fix integer downcast for per_page parameter The parser for the `per_page` query parameter downcasts to int32 without bounds checking. This could result in underflow and nonsensical paging, but there are no server-side consequences for this. Fixing this will silence some security and code quality scanners though.	2022-01-25 11:16:48 -05:00
Seth Hoenig	0030424384	Merge pull request #11889 from hashicorp/build-update-circle build: upgrade circleci configuration	2022-01-24 10:18:21 -06:00
Seth Hoenig	2f0cfb5740	build: upgrade and speedup circleci configuration This PR upgrades our CI images and fixes some affected tests. - upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02) - eliminate go-machine-recent-image (no longer necessary) - manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174) - fix tcp dial error check (message seems to be OS specific) - spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2) - use safe MkdirTemp for generating tmpfiles NOT applied: (too flaky) - eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting) - upgrade resource type for all images to large (2C -> 4C)	2022-01-24 08:28:14 -06:00
Seth Hoenig	f2a71fd0d9	deps: pty has new home github.com/kr/pty was moved to github.com/creack/pty Swap this dependency so we can upgrade to the latest version and no longer need a replace directive.	2022-01-19 12:33:05 -06:00
Seth Hoenig	2a5f7c0386	deps: swap gzip handler for gorilla This has been pinned since the Go modules migration, because the nytimes gzip handler was modified in version v1.1.0 in a way that is no longer compatible. Pretty sure it is this commit: `c551b6c3b4` Instead use handler.CompressHandler from gorilla, which is a web toolkit we already make use of for other things.	2022-01-19 11:52:19 -06:00
Nomad Release bot	de3070d49a	Generate files for 1.2.4 release	2022-01-18 23:43:00 +00:00
Dave May	330d24a873	cli: Add event stream capture to nomad operator debug (#11865 )	2022-01-17 21:35:51 -05:00
Michael Schurter	99c863f909	cli: improve debug error messages (#11507 ) Improves `nomad debug` error messages when contacting agents that do not have /v1/agent/host endpoints (the endpoint was added in v0.12.0) Part of #9568 and manually tested against Nomad v0.8.7. Hopefully isRedirectError can be reused for more cases listed in #9568	2022-01-17 11:15:17 -05:00
Tim Gross	33f7c6cba4	csi: when warning for multiple prefix matches, use full ID (#11853 ) When the `volume deregister` or `volume detach` commands get an ID prefix that matches multiple volumes, show the full length of the volume IDs in the list of volumes shown so that the user can select the correct one.	2022-01-14 12:25:48 -05:00
Tim Gross	9c4864badd	freebsd: build fix for ARM7 32-bit (#11854 ) The size of `stat_t` fields is architecture dependent, which was reportedly causing a build failure on FreeBSD ARM7 32-bit systems. This changeset matches the behavior we have on Linux.	2022-01-14 12:25:32 -05:00
James Rasell	82b168bf34	Merge pull request #11403 from hashicorp/f-gh-11059 agent/docs: add better clarification when top-level data dir needs setting	2022-01-13 16:41:35 +01:00
Luiz Aoqui	d48e50da9a	Fix log level parsing from lines that include a timestamp (#11838 )	2022-01-13 09:56:35 -05:00
Michael Schurter	e6eff95769	agent: validate reserved_ports are valid Goal is to fix at least one of the causes that can cause a node to be ineligible to receive work: https://github.com/hashicorp/nomad/issues/9506#issuecomment-1002880600	2022-01-12 14:21:47 -08:00
Seth Hoenig	8c97ffd68e	cleanup: stop referencing deprecated HeaderMap field Remove reference to the deprecated ResponseRecorder.HeaderMap field, instead calling .Response.Header() to get the same data. closes #10520	2022-01-12 10:32:54 -06:00
Derek Strickland	0a8e03f0f7	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
Charlie Voiselle	98a240cd99	Make number of scheduler workers reloadable (#11593 ) ## Development Environment Changes * Added stringer to build deps ## New HTTP APIs * Added scheduler worker config API * Added scheduler worker info API ## New Internals * (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume() * Update shutdown to use context * Add mutex for contended server data - `workerLock` for the `workers` slice - `workerConfigLock` for the `Server.Config.NumSchedulers` and `Server.Config.EnabledSchedulers` values ## Other * Adding docs for scheduler worker api * Add changelog message Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-01-06 11:56:13 -05:00
Tim Gross	2806dc2bd7	docs/tests for multiple HTTP address config (#11760 )	2022-01-03 10:17:13 -05:00
Kevin Schoonover	5d9a506bc0	agent: support multiple http address in addresses.http (#11582 )	2022-01-03 09:33:53 -05:00
James Rasell	45f4689f9c	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	c7cc3cf4dc	cli: stream raft logs to operator raft logs subcommand (#11684 ) The `nomad operator raft logs` command uses a raft helper that reads in the logs from raft and serializes them to JSON. The previous implementation returned the slice of all logs and then serializes the entire object. Update the helper to stream the log entries and then serialize them as newline-delimited JSON.	2021-12-16 13:38:58 -05:00
Tim Gross	f2615992a4	cli: unhide advanced operator raft debugging commands (#11682 ) The `nomad operator raft` and `nomad operator snapshot state` subcommands for inspecting on-disk raft state were hidden and undocumented. Expose and document these so that advanced operators have support for these tools.	2021-12-16 10:32:11 -05:00
Tim Gross	536e3c5282	`nomad eval list` command (#11675 ) Use the new filtering and pagination capabilities of the `Eval.List` RPC to provide filtering and pagination at the command line. Also includes note that `nomad eval status -json` is deprecated and will be replaced with a single evaluation view in a future version of Nomad.	2021-12-15 11:58:38 -05:00
Tim Gross	f8a133a810	cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678 ) When a cluster doesn't have a leader, the `nomad operator debug` command can safely use stale queries to gracefully degrade the consistency of almost all its queries. The query parameter for these API calls was not being set by the command. Some `api` package queries do not include `QueryOptions` because they target a specific agent, but they can potentially be forwarded to other agents. If there is no leader, these forwarded queries will fail. Provide methods to call these APIs with `QueryOptions`.	2021-12-15 10:44:03 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Tim Gross	624ecab901	evaluations list pagination and filtering (#11648 ) API queries can request pagination using the `NextToken` and `PerPage` fields of `QueryOptions`, when supported by the underlying API. Add a `NextToken` field to the `structs.QueryMeta` so that we have a common field across RPCs to tell the caller where to resume paging from on their next API call. Include this field on the `api.QueryMeta` as well so that it's available for future versions of List HTTP APIs that wrap the response with `QueryMeta` rather than returning a simple list of structs. In the meantime callers can get the `X-Nomad-NextToken`. Add pagination to the `Eval.List` RPC by checking for pagination token and page size in `QueryOptions`. This will allow resuming from the last ID seen so long as the query parameters and the state store itself are unchanged between requests. Add filtering by job ID or evaluation status over the results we get out of the state store. Parse the query parameters of the `Eval.List` API into the arguments expected for filtering in the RPC call.	2021-12-10 13:43:03 -05:00
Lukas W	0e5958d671	CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550 ) * Exit non-zero from run command if deployment fails * Fix typo in deployment monitor introduced in 0edda11	2021-12-09 09:09:28 -05:00
Vyacheslav Morov	6a244f18ad	cli: Add var args to plan output. (#11631 )	2021-12-07 10:43:52 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Derek Strickland	fb6dbffa59	Override TLS flags individually for meta commands (#11592 ) * Override TLS flags individually for meta commands * Update command/meta.go Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-12-01 12:07:48 -05:00
Tim Gross	7770eda3f1	config: fix test-only failures in UI handler setup (#11571 ) The `TestHTTPServer_Limits_Error` test never starts the agent so it had an incomplete configuration, which caused panics in the test. Fix the configuration. The PR #11555 had a branch name like `f-ui-*` which caused CI to skip the unit tests over the HTTP handler setup, so this wasn't caught in PR review.	2021-11-24 16:19:04 -05:00
Tim Gross	fcb96de9a7	config: UI configuration block with Vault/Consul links (#11555 ) Add `ui` block to agent configuration to enable/disable the web UI and provide the web UI with links to Vault/Consul.	2021-11-24 11:20:02 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler removing the need to pass invalid requests to the server. The register and deregister opts functions now allow for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function in line with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Tim Gross	e729133134	api: return 404 for alloc FS list/stat endpoints (#11482 ) * api: return 404 for alloc FS list/stat endpoints If the alloc filesystem doesn't have a file requested by the List Files or Stat File API, we currently return an HTTP 500 error with the expected "file not found" error message. Return an HTTP 404 error instead. * update FS Handler Previously the FS handler would interpret a 500 status as a 404 in the adapter layer by checking if the response body contained the text or if the response status was 500 and then throw an error code for 404. Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>	2021-11-17 11:15:07 -05:00
Luiz Aoqui	610a8a05e6	Merge release 1.2.0 rc1 branch (#11486 )	2021-11-09 17:55:13 -05:00
Dave May	3c04d7927b	cli: refactor operator debug capture (#11466 ) * debug: refactor Consul API collection * debug: refactor Vault API collection * debug: cleanup test timing * debug: extend test to multiregion * debug: save cmdline flags in bundle * debug: add cli version to output * Add changelog entry	2021-11-05 19:43:10 -04:00
Alessandro De Blasis	07c670fdc0	cli: show `host_network` in `nomad status` (#11432 ) Enhance the CLI in order to return the host network in two flavors (default, verbose) of the `node status` command. Fixes: #11223. Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2021-11-05 09:02:46 -04:00
Florian Apolloner	ef88795af3	Added a `-hcl2-strict` flag to allow for lenient hcl variable parsing. (#11284 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2021-11-04 16:33:09 +01:00
James Rasell	674761436e	Merge pull request #11165 from hashicorp/b-gh-11149 jobspec2: ensure consistent error handling between var-file & var.	2021-11-04 16:24:00 +01:00
Mahmood Ali	4fc6e50782	Raft Debugging Improvements (#11414 )	2021-11-04 10:16:12 -04:00
Michael Schurter	ef3fc79225	Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir client: never embed alloc_dir in chroot	2021-11-03 16:48:09 -07:00
Luiz Aoqui	5d204c8ced	Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )" (#11433 )	2021-11-02 17:42:52 -04:00
James Rasell	c071efbd6b	Merge pull request #11411 from hashicorp/f-gh-11406 cli: add json and template flag opts to acl bootstrap command.	2021-11-02 09:48:25 +01:00
Charlie Voiselle	29e7d46dd9	Making RPC Upgrade mode reloadable. (#11144 ) - Making RPC Upgrade mode reloadable. - Add suggestions from code review - remove spurious comment - switch to require(t,...) form for test. - Add to changelog	2021-11-01 16:30:53 -04:00
James Rasell	6c9e6e6f20	cli: add json and template flag opts to acl bootstrap command.	2021-10-29 09:00:50 +02:00
James Rasell	4c92a77aac	agent: clarify error info when data dir needs setting.	2021-10-28 15:05:56 +02:00
Dave May	509c74ce19	debug: update default node-id and docs (#11398 ) * debug: default node-id to all * debug: align cli help and website documentation	2021-10-27 13:43:56 -04:00
Mahmood Ali	cdddd64a42	logging: Log the cause behind agent startup failure (#11353 ) Log the failure error when the agent fails to start. Previously, the agent startup failure error would be emitted to the command UI but not logged. So it doesn't get emitted to syslog or `log_file` if they are set, and it makes debugging much harder. Also, logging the error again before exit makes the error more visible: previously, the operator needed to scroll to the top to find the error. On a sample failure, the output will look like: ``` ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation. ==> Loaded configuration from sample-configs/config-bad ==> Starting Nomad agent... ==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system 2021-10-20T14:38:51.179-0400 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up 
server node ID failed: mkdir /path-without-permission: read-only file system" ``` This change adds the final `ERROR` message. It's easy to miss the `==> Error starting agent` above.	2021-10-27 10:41:17 -07:00
Luiz Aoqui	b463715a98	prevent active log from being overwritten when agent starts (#11386 )	2021-10-26 20:57:07 -04:00
Luiz Aoqui	979faf41e5	fix test names (#11374 )	2021-10-22 15:43:55 -04:00
Luiz Aoqui	3c22fc79a5	add dispatch idempotency token support in the CLI (#10930 )	2021-10-22 12:39:05 -04:00
Luiz Aoqui	6853bf9632	cli: allow setting namespace and region in the `nomad ui` command (#11364 )	2021-10-21 16:24:39 -04:00
Shishir Mahajan	dd93f72920	Code cleanup: Remove extra if clause. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2021-10-19 16:52:11 -07:00
Michael Schurter	10c3bad652	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. 
Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Luiz Aoqui	130970e12e	Merge missing commits from 1.2.0-beta1 release branch (#11319 )	2021-10-14 16:10:05 -04:00
Luiz Aoqui	9d48daed8c	fix `nomad job allocs` command name (#11314 )	2021-10-14 12:44:59 -04:00
Charlie Voiselle	cb8e52b5df	Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )	2021-10-13 21:23:13 -04:00
Michael Schurter	59fda1894e	Merge pull request #11167 from a-zagaevskiy/master Support configurable dynamic port range	2021-10-13 16:47:38 -07:00
Michael Schurter	e14cd34392	client: improve errors & tests for dynamic ports	2021-10-13 16:25:25 -07:00
Dave May	c37a6ed583	cli: rename paths in debug bundle for clarity (#11307 ) * Rename folders to reflect purpose * Improve captured files test coverage * Rename CSI plugins output file * Add changelog entry * fix test and make changelog message more explicit Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-10-13 18:00:55 -04:00
Mahmood Ali	fa4df28fcd	tests: ensure that tests restore env-var values (#11309 ) Fix a test corruption issue, where a test accidentally unsets the `NOMAD_LICENSE` environment variable, that's relied on by some tests. As a habit, tests should always restore the environment variable value on test completion. Golang 1.17 introduced [`t.Setenv`](https://pkg.go.dev/testing#T.Setenv) to address this issue. However, as 1.0.x and 1.1.x branches target golang 1.15 and 1.16, I opted to use a helper function to ease backports.	2021-10-13 17:26:56 -04:00
Dave May	305e8e98bf	cli: Improved autocomplete support for job dispatch and operator debug (#11270 ) * Add autocomplete to nomad job dispatch * Add autocomplete to nomad operator debug * Update incorrect comment * Update test to verify autocomplete * Add changelog * Apply lint suggestions * Create dynamic slices instead of specific length * Align style across predictors	2021-10-12 20:01:54 -04:00
Dave May	2d14c54fa0	debug: Improve namespace and region support (#11269 ) * Include region and namespace in CLI output * Add region and prefix matching for server members * Add namespace and region API outputs to cluster metadata folder * Add region awareness to WaitForClient helper function * Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice * Refactor test client agent generation * Add tests for region * Add changelog	2021-10-12 16:58:41 -04:00
Dave May	76b05f3cd2	cli: Add nomad job allocs command (#11242 )	2021-10-12 16:30:36 -04:00
Luiz Aoqui	3e0bad5a41	wrap `log` messages with `hclog` (#11291 )	2021-10-12 14:38:44 -04:00
Aleksandr Zagaevskiy	d92666e6a7	fixup! Support configurable dynamic port range	2021-10-11 14:13:59 +03:00
Matt Mukerjee	b56432e645	Add FailoverHeartbeatTTL to config (#11127 ) FailoverHeartbeatTTL is the amount of time to wait after a server leader failure before considering reallocating client tasks. This TTL should be fairly long as the new server leader needs to rebuild the entire heartbeat map for the cluster. In deployments with a small number of machines, the default TTL (5m) may be unnecessarily long. Let's allow operators to configure this value in their config files.	2021-10-06 18:48:12 -04:00
Shantanu Gadgil	0ce156123d	auth_soft_fail needed for public images when agent is configured with auth (#11190 )	2021-10-06 15:30:23 -04:00
Florian Apolloner	0fa60dae9d	Added support for `-force-color` to the CLI. (#10975 )	2021-10-06 10:02:42 -04:00
Yan	6ff0b6debc	add `-show-url` option for `ui` command (#11213 )	2021-10-05 20:08:42 -04:00
Luiz Aoqui	0a62bdc3c5	fix panic when Connect mesh gateway doesn't have a proxy block (#11257 ) Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2021-10-04 15:52:07 -04:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
Michael Schurter	c6e72b6818	client: output reserved ports with min/max ports Also add a little more min/max port testing and add the consts back that had been removed: but unexported and as defaults.	2021-09-30 17:05:46 -07:00
Michael Schurter	4ad0c258b9	client: add NOMAD_LICENSE to default env deny list By default we should not expose the NOMAD_LICENSE environment variable to tasks. Also refactor where the DefaultEnvDenyList lives so we don't have to maintain 2 copies of it. Since client/config is the most obvious location, keep a reference there to its unfortunate home buried deep in command/agent/host. Since the agent uses this list as well for the /agent/host endpoint the list must be accessible from both command/agent and client.	2021-09-21 13:51:17 -07:00
Florian Apolloner	7805b8edf4	Fixed usage of NOMAD_CLI_NO_COLOR env variable. (#11168 )	2021-09-17 20:37:05 -04:00
James Rasell	0e926ef3fd	allow configuration of Docker hostnames in bridge mode (#11173 ) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not currently implement it.	2021-09-16 08:13:09 +02:00
Aleksandr Zagaevskiy	ebb87e65fe	Support configurable dynamic port range	2021-09-10 11:52:47 +03:00
James Rasell	257d63eec9	jobspec2: ensure consistent error handling between var-file & var.	2021-09-09 11:18:11 +02:00
James Rasell	04a15b5c16	Merge pull request #11105 from hashicorp/f-add-staticcheck-ci ci: add staticcheck with ST1020 and update golangci-lint	2021-09-09 09:42:12 +02:00
Luiz Aoqui	4dd8b6b571	cli: include all possible scores in alloc status metric table (#11128 )	2021-09-08 17:30:11 -04:00
James Rasell	d4a333e9b5	lint: mark false positive or fix gocritic append lint errors.	2021-09-06 10:49:44 +02:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
Luiz Aoqui	104d29e808	Don't timestamp active log file (#11070 ) * don't timestamp active log file * website: update log_file default value * changelog: add entry for #11070 * website: add upgrade instructions for log_file in v1.14 and v1.2.0	2021-08-23 11:27:34 -04:00
Mahmood Ali	c37339a8c8	Merge pull request #9160 from hashicorp/f-sysbatch core: implement system batch scheduler	2021-08-16 09:30:24 -04:00
Michael Schurter	a7aae6fa0c	Merge pull request #10848 from ggriffiths/listsnapshot_secrets CSI Listsnapshot secrets support	2021-08-10 15:59:33 -07:00
Seth Hoenig	3371214431	core: implement system batch scheduler This PR implements a new "System Batch" scheduler type. Jobs can make use of this new scheduler by setting their type to 'sysbatch'. Like the name implies, sysbatch can be thought of as a hybrid between system and batch jobs - it is for running short lived jobs intended to run on every compatible node in the cluster. As with batch jobs, sysbatch jobs can also be periodic and/or parameterized dispatch jobs. A sysbatch job is considered complete when it has been run on all compatible nodes until reaching a terminal state (success or failed on retries). Feasibility and preemption are governed the same as with system jobs. In this PR, the update stanza is not yet supported. The update stanza is still limited in functionality for the underlying system scheduler, and is not useful yet for sysbatch jobs. Further work in #4740 will improve support for the update stanza and deployments. Closes #2527	2021-08-03 10:30:47 -04:00
James Rasell	78a489418d	cli: fix minor format error within `-ca-cert` help text.	2021-08-03 16:05:06 +02:00
Mahmood Ali	0bc12fba7c	Only initialize task.VolumeMounts when not-nil (#10990 ) 1.1.3 had a bug where task.VolumeMounts will be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning. The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally. Fixes #10981	2021-08-02 13:08:10 -04:00
Nomad Release bot	b5dff8be42	Generate files for 1.1.3 release	2021-07-29 03:43:03 +00:00
Grant Griffiths	fecbbaee22	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
Mahmood Ali	d97927ebcf	cli: Use glint to determine if os.Stdout is tty (#10926 ) Use glint to determine if os.Stdout is a terminal. glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](`b492b545f6/renderer_term.go (L39-L46)`). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it. By using glint to perform the check, we eliminate the risk of mis-judgement.	2021-07-23 11:27:47 -04:00
Luiz Aoqui	484037aff1	fix `nomad alloc signal` help message (#10917 )	2021-07-21 11:02:44 -04:00
Kent 'picat' Gruber	decd59dbd1	Merge pull request #10886 from hashicorp/cli-handle-successful-deployment Handle successful/canceled/blocked deployments in CLI output	2021-07-16 12:27:22 -04:00
Kent 'picat' Gruber	b85b56624b	Handle `DeploymentStatusFailed` unless `hasAutoRevert`	2021-07-15 17:06:13 -04:00
Mahmood Ali	996ea1fa46	Merge pull request #10875 from hashicorp/b-namespace-flag-override cli: `-namespace` should override job namespace	2021-07-14 17:28:36 -04:00
Kent 'picat' Gruber	15342d0f6a	Handle successful/canceled/blocked deployments in CLI output Otherwise the spinner would just end, which felt a bit awkward. I wanted to see a "✓" to know that everything was ok, and a "!" (maybe something else?) if something went wrong.	2021-07-09 19:27:55 -04:00
Seth Hoenig	7c3db812fd	consul/connect: remove sidecar proxy before removing parent service This PR will have Nomad de-register a sidecar proxy service before attempting to de-register the parent service. Otherwise, Consul will emit a warning and an error. Fixes #10845	2021-07-08 13:30:19 -05:00
Seth Hoenig	2607853a26	Merge pull request #10872 from hashicorp/b-cc-regex-checkids consul/connect: Avoid assumption of parent service when filtering connect proxies	2021-07-08 13:29:40 -05:00
Seth Hoenig	284cd214ec	consul/connect: improve regex from CR suggestions	2021-07-08 13:05:05 -05:00
Tim Gross	a3bc87a2eb	cli: `-namespace` should override job namespace When a jobspec doesn't include a namespace, we provide it with the default namespace, but this ends up overriding the explicit `-namespace` flag. This changeset uses the same logic as region parsing to create an order of precedence: the query string parameter (the `-namespace` flag) overrides the API request body which overrides the jobspec.	2021-07-08 13:17:27 -04:00
Seth Hoenig	868b246128	consul/connect: Avoid assumption of parent service when filtering connect proxies This PR uses regex-based matching for sidecar proxy services and checks when syncing with Consul. Previously we would check if the parent of the sidecar was still being tracked in Nomad. This is a false invariant - one which we must not depend on when we make #10845 work. Fixes #10843	2021-07-08 09:43:41 -05:00
Mahmood Ali	1f34f2197b	Merge pull request #10806 from hashicorp/munda/idempotent-job-dispatch Enforce idempotency of dispatched jobs using token on dispatch request	2021-07-08 10:23:31 -04:00
Tim Gross	8f25a9d7cd	cni: respect default `cni_config_dir` and `cni_path` (#10870 ) The default agent configuration values were not set, which meant they were not being set in the client configuration and this results in fingerprints failing unless the values were set explicitly.	2021-07-08 09:56:57 -04:00
Tim Gross	e88e1e5001	testing: prevent panic when `job status` output changes (#10869 ) The `command/TestJobStatusCommand_Run` test assumes that it gets back running allocations and will panic the test runner rather than failing.	2021-07-08 09:25:44 -04:00
Alex Munda	02c1a4d912	Set/parse idempotency_token query param	2021-07-07 16:26:55 -05:00
Seth Hoenig	a57b066402	Merge pull request #10865 from hashicorp/b-deregister-noops consul: avoid extra sync operations when no action required	2021-07-07 13:42:46 -05:00
Isabel Suchanek	13db600665	cli: add -task flag to alloc signal, restart (#10859 ) Alloc exec only works when task is passed as a flag and not an arg. Alloc logs currently accepts either, but alloc signal and restart only accept task as an arg. This adds -task as a flag to the other alloc commands to make the cli UX consistent. If task is passed as a flag and an arg, it ignores the arg.	2021-07-07 09:58:16 -07:00
Seth Hoenig	56a6a1b1df	consul: avoid extra sync operations when no action required This PR makes it so the Consul sync logic will ignore operations that do not specify an action to take (i.e. [de-]register [services|checks]). Ideally such noops would be discarded at the callsites (i.e. users of [Create|Update|Remove]Workload), but we can also be defensive at the commit point. Also adds 2 trace logging statements which are helpful for diagnosing sync operations with Consul - when they happen and why. Fixes #10797	2021-07-07 11:24:56 -05:00
Tim Gross	69a7c9db7e	csi: account for nil volume_mount in API-to-structs conversion (#10855 ) Fix a nil pointer in the API struct to `nomad/structs` conversion when a `volume_mount` block is empty.	2021-07-07 08:06:39 -04:00
James Rasell	7a89dfe0cb	cli: fixed system commands so they correctly use passed flags.	2021-06-28 10:57:50 +02:00
Tim Gross	38e83f5ddc	csi: fix CLI panic when formatting volume status with -verbose flag (#10818 ) When the `-verbose` flag is passed to the `nomad volume status` command, we hit a code path where the rows of text to be formatted were not initialized correctly, resulting in a panic in the CLI.	2021-06-25 16:17:37 -04:00
Dave May	1e51d00d98	Add remaining pprof profiles to nomad operator debug (#10748 ) * Add remaining pprof profiles to debug dump * Refactor pprof profile capture * Add WaitForFilesUntil and WaitForResultUntil utility functions * Add CHANGELOG entry	2021-06-21 14:22:49 -04:00
Russell Rollins	76446ba512	Adds error handling for client error in getRandomJobAlloc. (#10787 )	2021-06-18 16:26:43 -04:00
Seth Hoenig	0d9208f1a0	consul: set task name only for group service checks This PR fixes a bug introduced in a refactoring https://github.com/hashicorp/nomad/pull/10764/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ec where task level service checks would inherit the task name field, when they shouldn't. Fixes #10781	2021-06-18 12:16:27 -05:00
Seth Hoenig	532b898b07	consul/connect: in-place update service definition when connect upstreams are modified This PR fixes a bug where modifying the upstreams of a Connect sidecar proxy would not result in Consul applying the changes, unless an additional change to the job would trigger a task replacement (thus replacing the service definition). The fix is to check if upstreams have been modified between Nomad's view of the sidecar service definition, and the service definition for the sidecar that is actually registered in Consul. Fixes #8754	2021-06-16 16:48:26 -05:00
Seth Hoenig	d75669da4a	consul: make failures_before_critical and success_before_passing work with group services This PR fixes some job submission plumbing to make sure the Consul Check parameters - failures_before_critical - success_before_passing work with group-level services. They already work with task-level services.	2021-06-15 11:20:40 -05:00
Isabel Suchanek	e3cde4f4b3	cli: check deployment exists before monitoring (#10757 ) System and batch jobs don't create deployments, which means nomad tries to monitor a non-existent deployment when it runs a job and outputs an error message. This adds a check to make sure a deployment exists before monitoring. Also fixes some formatting. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-06-14 16:42:38 -07:00
Luiz Aoqui	98e0e952a6	fix agent-info help message formatting (#10747 )	2021-06-11 15:39:28 -04:00
James Rasell	939b23936a	Merge pull request #10744 from hashicorp/b-remove-duplicate-imports chore: remove duplicate import statements	2021-06-11 16:42:34 +02:00
James Rasell	492e308846	tests: remove duplicate import statements.	2021-06-11 09:39:22 +02:00
Nomad Release bot	7cc7389afd	Generate files for 1.1.1 release	2021-06-10 08:04:25 -04:00
Mahmood Ali	aa77c2731b	tests: use standard library testing.TB Glint pulled in an updated version of mitchellh/go-testing-interface which broke some existing tests because the update added a Parallel() method to testing.T. This switches to the standard library testing.TB which doesn't have a Parallel() method.	2021-06-09 16:18:45 -07:00
Isabel Suchanek	dfaef2468c	cli: add monitor flag to deployment status Adding '-verbose' will print out the allocation information for the deployment. This also changes the job run command so that it now blocks until deployment is complete and adds timestamps to the output so that it's more in line with the output of node drain. This uses glint to print in place when running in a tty. Because glint doesn't yet support cmd/powershell, Windows workflows use a different library to print in place, which results in slightly different formatting: 1) different margins, and 2) no spinner indicating deployment in progress.	2021-06-09 16:18:45 -07:00
Seth Hoenig	dbdc479970	consul: move consul acl tests into ent files (cherry-pick ent back to oss) This PR moves a lot of Consul ACL token validation tests into ent files, so that we can verify correct behavior difference between OSS and ENT Nomad versions.	2021-06-09 08:38:42 -05:00
Seth Hoenig	d656777dd7	Merge pull request #10720 from hashicorp/f-cns-acl-check consul: correctly check consul acl token namespace when using consul oss	2021-06-08 15:43:42 -05:00
Seth Hoenig	87be8c4c4b	consul: correctly check consul acl token namespace when using consul oss This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship check when using Consul OSS (or Consul ENT without namespace support). Nomad v1.1.0 introduced a regression where Nomad would fail the validation when submitting Connect jobs and allow_unauthenticated set to true, with Consul OSS - because it would do the namespace check against the Consul ACL token assuming the "default" namespace, which does not work because Consul OSS does not have namespaces. Instead of making the bad assumption, expand the namespace check to handle each special case explicitly. Fixes #10718	2021-06-08 13:55:57 -05:00
Seth Hoenig	c13bf8b917	Merge pull request #10715 from hashicorp/f-cns-attrs consul: probe consul namespace feature before using namespace api	2021-06-07 16:11:17 -05:00
Seth Hoenig	209e2d6d81	consul: pr cleanup namespace probe function signatures	2021-06-07 15:41:01 -05:00
Seth Hoenig	519429a2de	consul: probe consul namespace feature before using namespace api This PR changes Nomad's wrapper around the Consul NamespaceAPI so that it will detect if the Consul Namespaces feature is enabled before making a request to the Namespaces API. Namespaces are not enabled in Consul OSS, and require a suitable license to be used with Consul ENT. Previously Nomad would check for a 404 status code when making a request to the Namespaces API to "detect" if Consul OSS was being used. This does not work for Consul ENT with Namespaces disabled, which returns a 500. Now we avoid requesting the namespace API altogether if Consul is detected to be the OSS sku, or if the Namespaces feature is not licensed. Since Consul can be upgraded from OSS to ENT, or a new license applied, we cache the value for 1 minute, refreshing on demand if expired. Fixes https://github.com/hashicorp/nomad-enterprise/issues/575 Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688. This turns out not to be possible due to a chicken-egg situation between bootstrapping the agent and setting up the consul client. Also fun: the Consul fingerprinter creates its own Consul client, because there is [currently] no way to pass the agent's client through the fingerprint factory.	2021-06-07 12:19:25 -05:00
James Rasell	888371a012	cmd: validate the type flag when querying plugin status.	2021-06-07 13:53:28 +02:00
Jasmine Dahilig	ca4be6857e	deployment query rate limit (#10706 )	2021-06-04 12:38:46 -07:00
Seth Hoenig	d026ff1f66	consul/connect: add support for connect mesh gateways This PR implements first-class support for Nomad running Consul Connect Mesh Gateways. Mesh gateways enable services in the Connect mesh to make cross-DC connections via gateways, where each datacenter may not have full node interconnectivity. Consul docs with more information: https://www.consul.io/docs/connect/gateways/mesh-gateway The following group level service block can be used to establish a Connect mesh gateway. service { connect { gateway { mesh { // no configuration } } } } Services can make use of a mesh gateway by configuring so in their upstream blocks, e.g. service { connect { sidecar_service { proxy { upstreams { destination_name = "<service>" local_bind_port = <port> datacenter = "<datacenter>" mesh_gateway { mode = "<mode>" } } } } } } Typical use of a mesh gateway is to create a bridge between datacenters. A mesh gateway should then be configured with a service port that is mapped from a host_network configured on a WAN interface in Nomad agent config, e.g. client { host_network "public" { interface = "eth1" } } Create a port mapping in the group.network block for use by the mesh gateway service from the public host_network, e.g. network { mode = "bridge" port "mesh_wan" { host_network = "public" } } Use this port label for the service.port of the mesh gateway, e.g. service { name = "mesh-gateway" port = "mesh_wan" connect { gateway { mesh {} } } } Currently Envoy is the only supported gateway implementation in Consul. By default Nomad client will run the latest official Envoy docker image supported by the local Consul agent. The Envoy task can be customized by setting `meta.connect.gateway_image` in agent config or by setting the `connect.sidecar_task` block. Gateways require Consul 1.8.0+, enforced by the Nomad scheduler. Closes #9446	2021-06-04 08:24:49 -05:00
Grant Griffiths	3f41150fbb	CSI snapshot list: do not shorten snapshot ID Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-05-27 13:28:18 -04:00
Mahmood Ali	0f5539c382	exec: http: close websocket connection gracefully In this loop, we ought to close the websocket connection gracefully when the StreamErrWrapper reaches EOF. Previously, it's possible that we drop the last few events or skip sending the websocket closure. If `handler(handlerPipe)` returns and `cancel` is called, before the loop here completes processing streaming events, the loop exits prematurely without propagating the last few events. Instead here, the loop continues until we hit `httpPipe` EOF (through `decoder.Decode`), to ensure we process the events to completion.	2021-05-24 13:37:23 -04:00
Luiz Aoqui	c1ef539fa3	Display confirmation message on 'nomad volume delete' and 'nomad volume deregister'	2021-05-24 12:02:55 -04:00
Tim Gross	82fe7300e5	cli: improve wildcard namespace prefix matches (#10648 ) When a wildcard namespace is used for `nomad job` commands that support prefix matching, avoid asking the user for input if a prefix is an unambiguous exact match so that the behavior is similar to the commands using a specific or unset namespace.	2021-05-24 11:38:05 -04:00
Tim Gross	084a46e0e5	agent: surface websocket errors in logs The websocket interface used for `alloc exec` has to silently drop client send errors because otherwise those errors would interleave with the streamed output. But we may be able to surface errors that cause terminated websockets a little better in the HTTP server logs.	2021-05-24 09:46:45 -04:00
Mahmood Ali	b518454bf8	cli: Handle nil MemoryMaxMB (#10620 ) Handle when MemoryMaxMB is nil, as expected when a new 1.1.0 is hitting a pre-1.1.0 Server.	2021-05-19 16:56:06 -04:00
Nomad Release bot	5be44af07d	Generate files for 1.1.0-rc1 release	2021-05-12 22:43:48 +00:00
Chris Baker	263ddd567c	Node Drain Metadata (#10250 )	2021-05-07 13:58:40 -04:00
Mahmood Ali	102763c979	Support disabling TCP checks for connect sidecar services	2021-05-07 12:10:26 -04:00
Nick Ethier	2978c430e5	command: show number of reserved cores on alloc status output	2021-05-05 08:11:41 -04:00
Mahmood Ali	4b95f6ef42	api: actually set MemoryOversubscriptionEnabled (#10493 )	2021-05-02 22:53:53 -04:00
Mahmood Ali	98a9a9052f	Port OSS changes for Enterprise Quota accounting (#10481 )	2021-04-30 09:48:03 -04:00
Tim Gross	7fdfbfc0f0	license: remove "Terminates At" from license get command The `Terminates At` field can't be removed from the struct for backwards compatibility reasons, but there's no purpose to it anymore so we shouldn't be showing it to end users of the command.	2021-04-28 12:00:30 -04:00
Tim Gross	4f9c5c4bac	license: update 'license get' command	2021-04-28 12:00:30 -04:00
Seth Hoenig	d54a606819	Merge pull request #10439 from hashicorp/pick-ent-acls-changes e2e: add e2e tests for consul namespaces on ent with acls	2021-04-28 08:30:08 -06:00
Tim Gross	79f81d617e	licensing: remove raft storage and sync This changeset is the OSS portion of the work to remove the raft storage and sync for Nomad Enterprise.	2021-04-28 10:28:23 -04:00
Seth Hoenig	09cd01a5f3	e2e: add e2e tests for consul namespaces on ent with acls This PR adds e2e tests for Consul Namespaces for Nomad Enterprise with Consul ACLs enabled. Needed to add support for Consul ACL tokens with `namespace` and `namespace_prefix` blocks, which Nomad parses and validates before tossing the token. These bits will need to be picked back to OSS.	2021-04-27 14:45:54 -06:00
Mahmood Ali	ed4aad458c	api: Ignore User provided ParentID (#10424 ) ParentID is an internal field that Nomad sets for dispatched or parameterized jobs. Job submitters should not be able to set it directly, as that messes up children tracking. Fixes #10422 . It specifically stops the scheduler from honoring the ParentID. The reason for the failure, and why the scheduler didn't schedule that job once it was created, is very interesting and requires follow up with a more technical issue.	2021-04-23 16:22:17 -04:00
Charlie Voiselle	ef8ca60693	Enable go-sockaddr templating for `network-interface` (#10404 ) Add templating to `network-interface` option. This PR also adds a fast-fail in the case where an invalid interface is set or produced by the template * add tests and check for valid interface * Add documentation * Incorporate suggestions from code review Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-04-20 13:55:10 -04:00
Seth Hoenig	4e6dbaaec1	Merge pull request #10184 from hashicorp/f-fuzzy-search api: implement fuzzy search API	2021-04-20 09:06:40 -06:00
Seth Hoenig	509490e5d2	e2e: consul namespace tests from nomad ent (cherry-picked from ent without _ent things) This is part 2/4 of e2e tests for Consul Namespaces. Took a first pass at what the parameterized tests can look like, but only on the ENT side for this PR. Will continue to refactor in the next PRs. Also fixes 2 bugs: - Config Entries registered by Nomad Server on job registration were not getting Namespace set - Group level script checks were not getting Namespace set Those changes will need to be copied back to Nomad OSS. Nomad OSS + no ACLs (previously, needs refactor) Nomad ENT + no ACLs (this) Nomad OSS + ACLs (todo) Nomad ENT + ACLs (todo)	2021-04-19 15:35:31 -06:00
Mahmood Ali	d880ba9c62	cli: filename arg for `volume init` and `quota init`	2021-04-18 14:14:05 -04:00
Seth Hoenig	1ee8d5ffc5	api: implement fuzzy search API This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy searching objects in Nomad. The fuzzy search endpoint routes requests to the Nomad Server leader, which implements the Search.FuzzySearch RPC method. Requests to the fuzzy search API are based on the api.FuzzySearchRequest object, e.g. { "Text": "ed", "Context": "all" } Responses from the fuzzy search API are based on the api.FuzzySearchResponse object, e.g. { "Index": 27, "KnownLeader": true, "LastContact": 0, "Matches": { "tasks": [ { "ID": "redis", "Scope": [ "default", "example", "cache" ] } ], "evals": [], "deployment": [], "volumes": [], "scaling_policy": [], "images": [ { "ID": "redis:3.2", "Scope": [ "default", "example", "cache", "redis" ] } ] }, "Truncations": { "volumes": false, "scaling_policy": false, "evals": false, "deployment": false } } The API is tunable using the new server.search stanza, e.g. server { search { fuzzy_enabled = true limit_query = 200 limit_results = 1000 min_term_length = 5 } } These values can be increased or decreased, so as to provide more search results or to reduce load on the Nomad Server. The fuzzy search API can be disabled entirely by setting `fuzzy_enabled` to `false`.	2021-04-16 16:36:07 -06:00
Nick Ethier	339c671e29	agent: add test for reserved core config mapping	2021-04-13 13:28:15 -04:00
Nick Ethier	edc0da9040	client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms	2021-04-13 13:28:15 -04:00
Nick Ethier	bed4e92b61	fingerprint: implement client fingerprinting of reservable cores on Linux systems this is derived from the configured cpuset cgroup parent (defaults to /nomad) for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores	2021-04-13 13:28:15 -04:00
Mahmood Ali	6bd2600cd0	Merge pull request #10370 from alrs/command-agent-errs command/agent: fix dropped test errors	2021-04-13 11:40:12 -04:00
Nick Spain	653d84ef68	Add a 'body' field to the check stanza Consul allows specifying the HTTP body to send in a health check. Nomad uses Consul for health checking so this just plumbs the value through to where the Consul API is called. There is no validation that `body` is not used with an incompatible check method like GET.	2021-04-13 09:15:35 -04:00
Lars Lehtonen	d2e7f31906	command/agent: fix dropped test errors	2021-04-13 01:51:24 -07:00
Tim Gross	4fc27df695	cli: add help for 'ui -authenticate' flag	2021-04-12 13:56:55 -04:00
Tim Gross	cba09a5bcf	CSI: listing from plugins can return EOF The AWS EBS CSI plugin was observed to return a EOF when we get to the end of the paging for `ListSnapshots`, counter to specification. Handle this case gracefully, including for `ListVolumes` (which EBS doesn't support but has similar semantics). Also fixes a timestamp formatting bug on `ListSnapshots`	2021-04-08 13:32:19 -04:00
Tim Gross	0892d34ff9	CSI: capability block is required for volume registration	2021-04-08 13:02:24 -04:00
Tim Gross	7d16e49a14	CSI: fix wrong output struct for snapshot list endpoint	2021-04-07 12:00:33 -04:00
Tim Gross	d2d12b201c	CSI: fix URL for volume snapshot list	2021-04-07 12:00:33 -04:00
Tim Gross	e4f34a96e3	CSI: deletes with API don't have request body Our API client `delete` method doesn't include a request body, but accepts an interface for the response. We were accidentally putting the request body into the response, which doesn't get picked up in unit tests because we're not reading the (always empty) response body anyways.	2021-04-07 12:00:33 -04:00
Tim Gross	35ee06137e	CSI: fix index error on formatting function for volume snapshots	2021-04-07 12:00:33 -04:00
Tim Gross	34a7b9da5c	CSI: fix wrong RPC name on ListSnapshots	2021-04-07 12:00:33 -04:00
Tim Gross	8af5bd1ad4	CSI: fix decoding error on snapshot create Consumers of the CSI HTTP API are expecting a response object and not a slice of snapshots. Fix the return value.	2021-04-07 12:00:33 -04:00
Tim Gross	69363705a8	CSI: fix HTTP routing for external volume list The HTTP router did not correctly route `/v1/volumes/external` without being explicitly added to the top-level router. Break this out into its own request handler.	2021-04-07 12:00:22 -04:00
Tim Gross	2e8dc1dee2	CSI: fix early return on error from list external volumes command If a plugin returns an error, we should continue at the outer scope to query the next plugin, otherwise we just retry the plugin we got an error on (potentially infinitely if it's an invalid request like an unsupported plugin).	2021-04-07 12:00:22 -04:00
Tim Gross	70f5363a89	docs: update CSI create/register fields Add new `access_mode`/`attachment_mode` fields. Make it more clear which set of fields belong to create vs register. Update the example spec that's generated by `volume init`.	2021-04-07 11:24:09 -04:00
Tim Gross	276633673d	CSI: use AccessMode/AttachmentMode from CSIVolumeClaim Registration of Nomad volumes previously allowed for a single volume capability (access mode + attachment mode pair). The recent `volume create` command requires that we pass a list of requested capabilities, but the existing workflow for claiming volumes and attaching them on the client assumed that the volume's single capability was correct and unchanging. Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to set the initial claim value, and add backwards compatibility logic to handle the existing volumes that already have claims without these fields.	2021-04-07 11:24:09 -04:00

3295 Commits