During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
The `consul.client_auto_join` configuration block tells the Nomad
client whether to use Consul service discovery to find Nomad
servers. By default it is set to `true`, but contrary to the
documentation it was only respected during the initial client
registration. If a client missed a heartbeat, failed a
`Node.UpdateStatus` RPC, or if there was no Nomad leader, the client
would fallback to Consul even if `client_auto_join` was set to
`false`. This changeset returns early from the client's trigger for
Consul discovery if the `client_auto_join` field is set to `false`.
In the system scheduler, if a subset of clients are filtered by class,
we hit a code path where the `AllocMetric` has been copied, but the
`Copy` method does not instantiate the various maps. This leads to an
assignment to a nil map. This changeset ensures that the maps are
non-nil before continuing.
The `Copy` method relies on functions in the `helper` package that all
return nil slices or maps when passed zero-length inputs. This
changeset to fix the panic bug intentionally defers updating those
functions because it'll have potential impact on memory usage. See
https://github.com/hashicorp/nomad/issues/11564 for more details.
In the system scheduler, if a subset of clients are filtered by class,
we hit a code path where the `AllocMetric` has been copied, but the
`Copy` method does not instantiate the various maps. This leads to an
assignment to a nil map. This changeset ensures that the maps are
non-nil before continuing.
The `Copy` method relies on functions in the `helper` package that all
return nil slices or maps when passed zero-length inputs. This
changeset to fix the panic bug intentionally defers updating those
functions because it'll have potential impact on memory usage. See
https://github.com/hashicorp/nomad/issues/11564 for more details.
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set which includes eval priority. This
param is optional and override the use of the job priority to set
the eval priority.
In order to ensure all evaluations as a result of the request use
the same eval priority, the priority is shared to the
allocReconciler and deploymentWatcher. This creates a new
distinction between eval priority and job priority.
The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.
Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.
The register and deregister opts functions now all for setting
the eval priority on requests.
The change includes a small change to the DeregisterOpts function
which handles nil opts. This brings the function inline with the
RegisterOpts.
The QEMU driver allows arbitrary command line options, but many of
these options give access to host resources that operators may not
want to expose such as devices. Add an optional allowlist to the
plugin configuration so that operators can limit the resources for
QEMU.
* api: return 404 for alloc FS list/stat endpoints
If the alloc filesystem doesn't have a file requested by the List
Files or Stat File API, we currently return a HTTP 500 error with the
expected "file not found" error message. Return a HTTP 404 error
instead.
* update FS Handler
Previously the FS handler would interpret a 500 status as a 404
in the adapter layer by checking if the response body contained
the text or is the response status
was 500 and then throw an error code for 404.
Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>
go-getter 1.5.9 includes a patch in 1.5.6 that automatically unpacks
uncompressed tar archives. Previously Nomad only unpacked compressed
archives, but documented that it unpacked all archives.
* debug: refactor Consul API collection
* debug: refactor Vault API collection
* debug: cleanup test timing
* debug: extend test to multiregion
* debug: save cmdline flags in bundle
* debug: add cli version to output
* Add changelog entry
Enhance the CLI in order to return the host network in two flavors
(default, verbose) of the `node status` command.
Fixes: #11223.
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
- Making RPC Upgrade mode reloadable.
- Add suggestions from code review
- remove spurious comment
- switch to require(t,...) form for test.
- Add to changelog
As we have continued to see reports of #9506 we need to elevate this log
line as it is the only way to detect when plans are being *erroneously*
rejected.
Users who see this log line repeatedly should drain and restart the node
in the log line. This seems to workaorund the issue.
Please post any details on #9506!
Log the failure error when the agent fails to start. Previously, the
agent startup failure error would be emitted to the command UI but not
logged. So it doesn't get emitted to syslog or `log_file` if they are
set, and it makes debugging much harder. Also, logging the error again
before exit makes the error more visible: previously, the operator
needed to scroll to the top to find the error.
On a sample failure, the output will look like:
```
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from sample-configs/config-bad
==> Starting Nomad agent...
==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system
2021-10-20T14:38:51.179-0400 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins
2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system"
```
This change adds the final `ERROR` message. It's easy to miss the `==>
Error starting agent` above.
The system scheduler should leave allocs on draining nodes as-is, but
stop node stop allocs on nodes that are no longer part of the job
datacenters.
Previously, the scheduler did not make the distinction and left system
job allocs intact if they are already running.
I've added a failing test first, which you can see in https://app.circleci.com/jobs/github/hashicorp/nomad/179661 .
Fixes https://github.com/hashicorp/nomad/issues/11373
Pick up https://github.com/golang/snappy/pull/56 to handle arm64 architectures to fix panics. tldr; Golang 1.16 changed `memmove` implementation for arm64 requiring additional cpu registers that snappy wasn't preserving in its assembly implementation.
Other projects have experienced this issue as well, searching for `encode_arm64.s:666` on your favorite search engine will reveal some. Vault updated the dependency earlier this August: https://github.com/hashicorp/vault/pull/12371 .
I believe this issue affects Nomad 1.2.x and 1.1.x. Nomad 1.0.x use Golang 1.15 and isn't affected. However, backporting the change to 1.0.x should be harmless.
Fixed https://github.com/hashicorp/nomad/issues/11385 .
* Include region and namespace in CLI output
* Add region and prefix matching for server members
* Add namespace and region API outputs to cluster metadata folder
* Add region awareness to WaitForClient helper function
* Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice
* Refactor test client agent generation
* Add tests for region
* Add changelog
We see this error all the time
```
no handler registered for event
event.Message=, event.Annotations=, event.Timestamp=0001-01-01T00:00:00Z, event.TaskName=, event.AllocID=, event.TaskID=,
```
So we're handling an even with all default fields. I noted that this can
happen if only err is set as in
```
func (d *driverPluginClient) handleTaskEvents(reqCtx context.Context, ch chan *TaskEvent, stream proto.Driver_TaskEventsClient) {
defer close(ch)
for {
ev, err := stream.Recv()
if err != nil {
if err != io.EOF {
ch <- &TaskEvent{
Err: grpcutils.HandleReqCtxGrpcErr(err, reqCtx, d.doneCtx),
}
}
```
In this case Err fails to be serialized by the logger, see this test
```
ev := &drivers.TaskEvent{
Err: fmt.Errorf("errz"),
}
i.logger.Warn("ben test", "event", ev)
i.logger.Warn("ben test2", "event err str", ev.Err.Error())
i.logger.Warn("ben test3", "event err", ev.Err)
ev.Err = nil
i.logger.Warn("ben test4", "nil error", ev.Err)
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.643900Z","driver":"mock_driver","event":{"TaskID":"","TaskName":"","AllocID":"","Timestamp":"0001-01-01T00:00:00Z","Message":"","Annotations":null,"Err":{}}}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test2","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644226Z","driver":"mock_driver","event err str":"errz"}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test3","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644240Z","driver":"mock_driver","event err":"errz"}
2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test4","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644252Z","driver":"mock_driver","nil error":null}
```
Note in the first example err is set to an empty object and the error is
lost.
What we want is the last two examples which call out the err field
explicitly so we can see what it is in this case
FailoverHeartbeatTTL is the amount of time to wait after a server leader failure
before considering reallocating client tasks. This TTL should be fairly long as
the new server leader needs to rebuild the entire heartbeat map for the
cluster. In deployments with a small number of machines, the default TTL (5m)
may be unnecessary long. Let's allow operators to configure this value in their
config files.
Suppress stats streaming error log messages when task finishes.
Streaming errors are expected when a task finishes and they aren't
actionable to users.
Also, note that the task runner Stats hook retries collecting stats
after a delay. If the connection terminates prematurely, it will be
retried, and closing the stats stream is not very disruptive.
Ideally, executor terminates cleanly when task exits, but that's a more
substantial change that may require changing the executor/drivers interface.
Fixes#10814
By default we should not expose the NOMAD_LICENSE environment variable
to tasks.
Also refactor where the DefaultEnvDenyList lives so we don't have to
maintain 2 copies of it. Since client/config is the most obvious
location, keep a reference there to its unfortunate home buried deep
in command/agent/host. Since the agent uses this list as well for the
/agent/host endpoint the list must be accessible from both command/agent
and client.
While I don't think this fully encompasses the changes, other bits
like marking sysbatch as dead immediately are new so haven't changed
from a previous release.
This fixes a bug in the event stream API where it currently interprets
namespace=* as an actual namespace, not a wildcard. When Nomad parses
incoming requests, it sets namespace to default if not specified, which
means the request namespace will never be an empty string, which is what
the event subscription was checking for. This changes the conditional
logic to check for a wildcard namespace instead of an empty one.
It also updates some event tests to include the default namespace in the
subscription to match current behavior.
Fixes#10903
When mTLS is enabled, only nomad servers of the region should access the
Raft RPC layer. Clients and servers in other regions should only use the
Nomad RPC endpoints.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
* don't timestamp active log file
* website: update log_file default value
* changelog: add entry for #11070
* website: add upgrade instructions for log_file in v1.14 and v1.2.0
When a node becomes ready, create an eval for all system jobs across
namespaces.
The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.
This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.
Fixes https://github.com/hashicorp/nomad/issues/8448
In a multi-task-group job, treat 0 canary groups as auto-promote.
This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Speed up client startup, by retrying more until the servers are known.
Currently, if client fingerprinting is fast and finishes before the
client connect to a server, node registration may be delayed by 15
seconds or so!
Ideally, we'd wait until the client discovers the servers and then retry
immediately, but that requires significant code changes.
Here, we simply retry the node registration request every second. That's
basically the equivalent of check if the client discovered servers every
second. Should be a cheap operation.
When testing this change on my local computer and where both servers and
clients are co-located, the time from startup till node registration
dropped from 34 seconds to 8 seconds!
When creating a TCP proxy bridge for Connect tasks, we are at the
mercy of either end for managing the connection state. For long
lived gRPC connections the proxy could reasonably expect to stay
open until the context was cancelled. For the HTTP connections used
by connect native tasks, we experience connection disconnects.
The proxy gets recreated as needed on follow up requests, however
we also emit a WARN log when the connection is broken. This PR
lowers the WARN to a TRACE, because these disconnects are to be
expected.
Ideally we would be able to proxy at the HTTP layer, however Consul
or the connect native task could be configured to expect mTLS, preventing
Nomad from MiTM the requests.
We also can't mange the proxy lifecycle more intelligently, because
we have no control over the HTTP client or server and how they wish
to manage connection state.
What we have now works, it's just noisy.
Fixes#10933
* api: revert to defaulting to http/1
PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.
Given that the switch is incidental, and not requested by users.
Furthermore, api consumers can opt-in to forcing http/2 by setting
custom http clients.
Fixes#10922
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.
If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.
Fixed#10943 .
Use glint to determine if os.Stdout is a terminal.
glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](b492b545f6/renderer_term.go (L39-L46)). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it.
By using golint to perform the check, we eliminate the risk of mis-judgement.
When the client launches, use a consistent read to fetch its own allocs,
but allow stale read afterwards as long as reads don't revert into older
state.
This change addresses an edge case affecting restarting client. When a
client restarts, it may fetch a stale data concerning its allocs: allocs
that have completed prior to the client shutdown may still have "run/running"
desired/client status, and have the client attempt to re-run again.
An alternative approach is to track the indices such that the client
set MinQueryIndex on the maximum index the client ever saw, or compare
received allocs against locally restored client state. Garbage
collection complicates this approach (local knowledge is not complete),
and the approach still risks starting "dead" allocations (e.g. the
allocation may have been placed when client just restarted and have
already been reschuled by the time the client started. This approach
here is effective against all kinds of stalness problems with small
overhead.
Basically the same as #10896 but with the Affinity struct.
Since we use reflect.DeepEquals for job comparison, there is
risk of false positives for changes due to a job struct with
memoized vs non-memoized strings.
Closes#10897
This PR fixes a bug where the underlying Envoy process of a Connect gateway
would consume a full core of CPU if there is more than one sidecar or gateway
in a group. The utilization was being caused by Consul injecting an envoy_ready_listener
on 127.0.0.1:8443, of which only one of the Envoys would be able to bind to.
The others would spin in a hot loop trying to bind the listener.
As a workaround, we now specify -address during the Envoy bootstrap config
step, which is how Consul maps this ready listener. Because there is already
the envoy_admin_listener, and we need to continue supporting running gateways
in host networking mode, and in those case we want to use the same port
value coming from the service.port field, we now bind the admin listener to
the 127.0.0.2 loop-back interface, and the ready listener takes 127.0.0.1.
This shouldn't make a difference in the 99.999% use case where envoy is
being run in its official docker container. Advanced users can reference
${NOMAD_ENVOY_ADMIN_ADDR_<service>} (as they 'ought to) if needed,
as well as the new variable ${NOMAD_ENVOY_READY_ADDR_<service>} for the
envoy_ready_listener.
Adds missing interpolation step to the `meta` blocks when building the task
environment. Also fixes incorrect parameter order in the test assertion and
adds diagnostics to the test.
This PR will have Nomad de-register a sidecar proxy service before
attempting to de-register the parent service. Otherwise, Consul will
emit a warning and an error.
Fixes#10845
When a jobspec doesn't include a namespace, we provide it with the default
namespace, but this ends up overriding the explicit `-namespace` flag. This
changeset uses the same logic as region parsing to create an order of
precedence: the query string parameter (the `-namespace` flag) overrides the
API request body which overrides the jobspec.
This PR uses regex-based matching for sidecar proxy services and checks when syncing
with Consul. Previously we would check if the parent of the sidecar was still being
tracked in Nomad. This is a false invariant - one which we must not depend when we
make #10845 work.
Fixes#10843
The default agent configuration values were not set, which meant they were not
being set in the client configuration and this results in fingerprints failing
unless the values were set explicitly.
When a task group with `service` block(s) is validated, we validate that there
are no duplicates, but this validation doesn't have access to the task environment
because it hasn't been created yet. Services and checks with interpolation can
be flagged incorrectly as conflicting. Name conflicts in services are not
actually an error in Consul and users have reported wanting to use the same
service name for task groups differentiated by tags.
Alloc exec only works when task is passed as a flag and not an arg.
Alloc logs currently accepts either, but alloc signal and restart only
accept task as an arg. This adds -task as a flag to the other alloc
commands to make the cli UX consistent. If task is passed as a flag and
an arg, it ignores the arg.
This PR makes it so the Consul sync logic will ignore operations that
do not specify an action to take (i.e. [de-]register [services|checks]).
Ideally such noops would be discarded at the callsites (i.e. users
of [Create|Update|Remove]Workload], but we can also be defensive
at the commit point.
Also adds 2 trace logging statements which are helpful for diagnosing
sync operations with Consul - when they happen and why.
Fixes#10797
Updates to the datacenter field should be destructive for any allocation that
is on a node no longer in the list of datacenters, but inplace for any
allocation on a node that is still in the list. Add a check for this change to
the system and generic schedulers after we've checked the task definition for
updates and obtained the node for each current allocation.
There are bits of logic in callers of RemoveWorkload on group/task
cleanup hooks which call RemoveWorkload with the "Canary" version
of the workload, in case the alloc is marked as a Canary. This logic
triggers an extra sync with Consul, and also doesn't do the intended
behavior - for which no special casing is necessary anyway. When the
workload is marked for removal, all associated services and checks
will be removed regardless of the Canary status, because the service
and check IDs do not incorporate the canary-ness in the first place.
The only place where canary-ness matters is when updating a workload,
where we need to compute the hash of the services and checks to determine
whether they have been modified, the Canary flag of which is a part of
that.
Fixes#10842
Adopts [`go-changelog`](https://github.com/hashicorp/go-changelog) for managing Nomad's changelog. `go-changelog` is becoming the HashiCorp defacto standard tool for managing changelog, e.g. [Consul](https://github.com/hashicorp/consul/pull/8387), [Vault](https://github.com/hashicorp/vault/pull/10363), [Waypoint](https://github.com/hashicorp/waypoint/pull/1179). [Consul](https://github.com/hashicorp/consul/pull/8387) seems to be the first product to adopt it, and its PR has the most context - though I've updated `.changelog/README.md` with the relevant info here.
## Changes to developers workflow
When opening PRs, developers should add a changelog entry in `.changelog/<PR#>.txt`. Check [`.changelog/README.md`](https://github.com/hashicorp/nomad/blob/docs-adopt-gochangelog/.changelog/README.md#developer-guide).
For the WIP release, entries can be amended even after the PR merged, and new files may be added post-hoc (e.g. during transition period, missed accidentally, community PRs, etc).
### Transitioning
Pending PRs can start including the changelog entry files immediately.
For 1.1.3/1.0.9 cycle, the release coordinator should create the entries for any PR that gets merged without a changelog entry file. They should also move any 1.1.3 entry in CHANGELOG.md to a changelog entry file, as this PR done for GH-10818.
## Changes to release process
Before cutting a release, release coordinator should update the changelog by inserting the output of `make changelog` to CHANGELOG.md with appropriate headers. See [`.changelog/README.md`](https://github.com/hashicorp/nomad/blob/docs-adopt-gochangelog/.changelog/README.md#how-to-generate-changelog-entries-for-release) for more details.
## Details
go-changelog is a basic templating engine for maintaining changelog in HashiCorp environment.
It expects the changelog entries as files indexed by their PR number. The CLI generates the changelog section for a release by comparing two git references (e.g. `HEAD` and the latest release, e.g. `v1.1.2`), and still requires manual process for updating CHANGELOG.md and final formatting.
The approach has many nice advantages:
* Avoids changelog related merge conflicts: Each PR touches different file!
* Copes with amendments and post-PR updates: Just add or update a changelog entry file using the original PR numbers.
* Addresses the release backporting scenario: Cherry-picking PRs will cherry-pick the relevant changelog entry automatically!
* Only relies on data available through `git` - no reliance on GitHub metadata or require GitHub credentials
The approach has few downsides though:
* CHANGELOG.md going stale during development and must be updated manually before cutting the release
* Repository watchers can no longer glance at the CHANGELOG.md to see upcoming changes
* We can periodically update the file, but `go-changelog` tool does not aid with that
* `go-changelog` tool does not offer good error reporting. If an entry is has an invalid tag (e.g. uses `release-note:bugfix` instead of `release-note:bug`), the entry will be dropped silently
* We should update go-changelog to warn against unexpected entry tags
* TODO: Meanwhile, PR reviewers and release coordinators should watch out
## Potential follow ups
We should follow up with CI checks to ensure PR changes include a warning. I've opted not to include that now. We still make many non-changelog-worth PRs for website/docs, for large features that get merged in multiple small PRs. I did not want to include a check that fails often.
Also, we should follow up to have `go-changelog` emit better warnings on unexpected tag.
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.