Commit Graph

3088 Commits

Author SHA1 Message Date
Tim Gross 7770eda3f1
config: fix test-only failures in UI handler setup (#11571)
The `TestHTTPServer_Limits_Error` test never starts the agent so it
had an incomplete configuration, which caused panics in the test. Fix
the configuration.

The PR #11555 had a branch name like `f-ui-*` which caused CI to skip
the unit tests over the HTTP handler setup, so this wasn't caught in
PR review.
2021-11-24 16:19:04 -05:00
Tim Gross fcb96de9a7
config: UI configuration block with Vault/Consul links (#11555)
Add `ui` block to agent configuration to enable/disable the web UI and
provide the web UI with links to Vault/Consul.
2021-11-24 11:20:02 -05:00
James Rasell 751c8217d1
core: allow setting and propagation of eval priority on job de/registration (#11532)
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.

In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.

The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.

Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.

The register and deregister opts functions now all for setting
the eval priority on requests.

The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
2021-11-23 09:23:31 +01:00
Tim Gross e729133134
api: return 404 for alloc FS list/stat endpoints (#11482)
* api: return 404 for alloc FS list/stat endpoints

If the alloc filesystem doesn't have a file requested by the List
Files or Stat File API, we currently return a HTTP 500 error with the
expected "file not found" error message. Return a HTTP 404 error
instead.

* update FS Handler

Previously the FS handler would interpret a 500 status as a 404
in the adapter layer by checking if the response body contained
the text  or is the response status
was 500 and then throw an error code for 404.

Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>
2021-11-17 11:15:07 -05:00
Luiz Aoqui 610a8a05e6
Merge release 1.2.0 rc1 branch (#11486) 2021-11-09 17:55:13 -05:00
Dave May 3c04d7927b
cli: refactor operator debug capture (#11466)
* debug: refactor Consul API collection
* debug: refactor Vault API collection
* debug: cleanup test timing
* debug: extend test to multiregion
* debug: save cmdline flags in bundle
* debug: add cli version to output
* Add changelog entry
2021-11-05 19:43:10 -04:00
Alessandro De Blasis 07c670fdc0
cli: show `host_network` in `nomad status` (#11432)
Enhance the CLI in order to return the host network in two flavors 
(default, verbose) of the `node status` command.

Fixes: #11223.
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2021-11-05 09:02:46 -04:00
Florian Apolloner ef88795af3
Added a `-hcl2-strict` flag to allow for lenient hcl variable parsing. (#11284)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2021-11-04 16:33:09 +01:00
James Rasell 674761436e
Merge pull request #11165 from hashicorp/b-gh-11149
jobspec2: ensure consistent error handling between var-file & var.
2021-11-04 16:24:00 +01:00
Mahmood Ali 4fc6e50782
Raft Debugging Improvements (#11414) 2021-11-04 10:16:12 -04:00
Michael Schurter ef3fc79225
Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir
client: never embed alloc_dir in chroot
2021-11-03 16:48:09 -07:00
Luiz Aoqui 5d204c8ced
Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799)" (#11433) 2021-11-02 17:42:52 -04:00
James Rasell c071efbd6b
Merge pull request #11411 from hashicorp/f-gh-11406
cli: add json and template flag opts to acl bootstrap command.
2021-11-02 09:48:25 +01:00
Charlie Voiselle 29e7d46dd9
Making RPC Upgrade mode reloadable. (#11144)
- Making RPC Upgrade mode reloadable.
- Add suggestions from code review
- remove spurious comment
- switch to require(t,...) form for test.
- Add to changelog
2021-11-01 16:30:53 -04:00
James Rasell 6c9e6e6f20
cli: add json and template flag opts to acl boostrap command. 2021-10-29 09:00:50 +02:00
Dave May 509c74ce19
debug: update default node-id and docs (#11398)
* debug: default node-id to all
* debug: align cli help and website documentation
2021-10-27 13:43:56 -04:00
Mahmood Ali cdddd64a42
logging: Log the cause behind agent startup failure (#11353)
Log the failure error when the agent fails to start. Previously, the
agent startup failure error would be emitted to the command UI but not
logged. So it doesn't get emitted to syslog or `log_file` if they are
set, and it makes debugging much harder. Also, logging the error again
before exit makes the error more visible: previously, the operator
needed to scroll to the top to find the error.

On a sample failure, the output will look like:
```
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from sample-configs/config-bad
==> Starting Nomad agent...
==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system
    2021-10-20T14:38:51.179-0400 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system"
```

This change adds the final `ERROR` message. It's easy to miss the `==>
Error starting agent` above.
2021-10-27 10:41:17 -07:00
Luiz Aoqui b463715a98
prevent active log from being overwritten when agent starts (#11386) 2021-10-26 20:57:07 -04:00
Luiz Aoqui 979faf41e5
fix test names (#11374) 2021-10-22 15:43:55 -04:00
Luiz Aoqui 3c22fc79a5
add dispatch idempotency token support in the CLI (#10930) 2021-10-22 12:39:05 -04:00
Luiz Aoqui 6853bf9632
cli: allow setting namespace and region in the `nomad ui` command (#11364) 2021-10-21 16:24:39 -04:00
Shishir Mahajan dd93f72920 Code cleanup: Remove extra if clause.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-19 16:52:11 -07:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Luiz Aoqui 130970e12e
Merge missing commits from 1.2.0-beta1 release branch (#11319) 2021-10-14 16:10:05 -04:00
Luiz Aoqui 9d48daed8c
fix `nomad job allocs` command name (#11314) 2021-10-14 12:44:59 -04:00
Charlie Voiselle cb8e52b5df
Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799) 2021-10-13 21:23:13 -04:00
Michael Schurter 59fda1894e
Merge pull request #11167 from a-zagaevskiy/master
Support configurable dynamic port range
2021-10-13 16:47:38 -07:00
Michael Schurter e14cd34392 client: improve errors & tests for dynamic ports 2021-10-13 16:25:25 -07:00
Dave May c37a6ed583
cli: rename paths in debug bundle for clarity (#11307)
* Rename folders to reflect purpose
* Improve captured files test coverage
* Rename CSI plugins output file
* Add changelog entry
* fix test and make changelog message more explicit

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2021-10-13 18:00:55 -04:00
Mahmood Ali fa4df28fcd
tests: ensure that tests restore env-var values (#11309)
Fix a test corruption issue, where a test accidentally unsets
the `NOMAD_LICENSE` environment variable, that's relied on by some
tests.

As a habit, tests should always restore the environment variable value
on test completion. Golang 1.17 introduced
[`t.Setenv`](https://pkg.go.dev/testing#T.Setenv) to address this issue.
However, as 1.0.x and 1.1.x branches target golang 1.15 and 1.16, I
opted to use a helper function to ease backports.
2021-10-13 17:26:56 -04:00
Dave May 305e8e98bf
cli: Improved autocomplete support for job dispatch and operator debug (#11270)
* Add autocomplete to nomad job dispatch
* Add autocomplete to nomad operator debug
* Update incorrect comment
* Update test to verify autocomplete
* Add changelog
* Apply lint suggestions
* Create dynamic slices instead of specific length
* Align style across predictors
2021-10-12 20:01:54 -04:00
Dave May 2d14c54fa0
debug: Improve namespace and region support (#11269)
* Include region and namespace in CLI output
* Add region and prefix matching for server members
* Add namespace and region API outputs to cluster metadata folder
* Add region awareness to WaitForClient helper function
* Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice
* Refactor test client agent generation
* Add tests for region
* Add changelog
2021-10-12 16:58:41 -04:00
Dave May 76b05f3cd2
cli: Add nomad job allocs command (#11242) 2021-10-12 16:30:36 -04:00
Luiz Aoqui 3e0bad5a41
wrap `log` messages with `hclog` (#11291) 2021-10-12 14:38:44 -04:00
Aleksandr Zagaevskiy d92666e6a7 fixup! Support configurable dynamic port range 2021-10-11 14:13:59 +03:00
Matt Mukerjee b56432e645
Add FailoverHeartbeatTTL to config (#11127)
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessary long. Let's allow operators to configure this value in their
config files.
2021-10-06 18:48:12 -04:00
Shantanu Gadgil 0ce156123d
auth_soft_fail needed for public images when agent is configured with auth (#11190) 2021-10-06 15:30:23 -04:00
Florian Apolloner 0fa60dae9d
Added support for `-force-color` to the CLI. (#10975) 2021-10-06 10:02:42 -04:00
Yan 6ff0b6debc
add `-show-url` option for `ui` command (#11213) 2021-10-05 20:08:42 -04:00
Luiz Aoqui 0a62bdc3c5
fix panic when Connect mesh gateway doesn't have a proxy block (#11257)
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-10-04 15:52:07 -04:00
Mahmood Ali 4d90afb425 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
Michael Schurter c6e72b6818 client: output reserved ports with min/max ports
Also add a little more min/max port testing and add the consts back that
had been removed: but unexported and as defaults.
2021-09-30 17:05:46 -07:00
Michael Schurter 4ad0c258b9 client: add NOMAD_LICENSE to default env deny list
By default we should not expose the NOMAD_LICENSE environment variable
to tasks.

Also refactor where the DefaultEnvDenyList lives so we don't have to
maintain 2 copies of it. Since client/config is the most obvious
location, keep a reference there to its unfortunate home buried deep
in command/agent/host. Since the agent uses this list as well for the
/agent/host endpoint the list must be accessible from both command/agent
and client.
2021-09-21 13:51:17 -07:00
Florian Apolloner 7805b8edf4
Fixed usage of NOMAD_CLI_NO_COLOR env variable. (#11168) 2021-09-17 20:37:05 -04:00
James Rasell 0e926ef3fd
allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
Aleksandr Zagaevskiy ebb87e65fe Support configurable dynamic port range 2021-09-10 11:52:47 +03:00
James Rasell 257d63eec9
jobspec2: ensure consistent error handling between var-file & var. 2021-09-09 11:18:11 +02:00
James Rasell 04a15b5c16
Merge pull request #11105 from hashicorp/f-add-staticcheck-ci
ci: add staticcheck with ST1020 and update golangci-lint
2021-09-09 09:42:12 +02:00
Luiz Aoqui 4dd8b6b571
cli: include all possible scores in alloc status metric table (#11128) 2021-09-08 17:30:11 -04:00
James Rasell d4a333e9b5
lint: mark false positive or fix gocritic append lint errors. 2021-09-06 10:49:44 +02:00