When mTLS is enabled, only nomad servers of the region should access the
Raft RPC layer. Clients and servers in other regions should only use the
Nomad RPC endpoints.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>
* don't timestamp active log file
* website: update log_file default value
* changelog: add entry for #11070
* website: add upgrade instructions for log_file in v1.14 and v1.2.0
When a node becomes ready, create an eval for all system jobs across
namespaces.
The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.
This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.
Fixes https://github.com/hashicorp/nomad/issues/8448
In a multi-task-group job, treat 0 canary groups as auto-promote.
This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Speed up client startup, by retrying more until the servers are known.
Currently, if client fingerprinting is fast and finishes before the
client connect to a server, node registration may be delayed by 15
seconds or so!
Ideally, we'd wait until the client discovers the servers and then retry
immediately, but that requires significant code changes.
Here, we simply retry the node registration request every second. That's
basically the equivalent of check if the client discovered servers every
second. Should be a cheap operation.
When testing this change on my local computer and where both servers and
clients are co-located, the time from startup till node registration
dropped from 34 seconds to 8 seconds!
When creating a TCP proxy bridge for Connect tasks, we are at the
mercy of either end for managing the connection state. For long
lived gRPC connections the proxy could reasonably expect to stay
open until the context was cancelled. For the HTTP connections used
by connect native tasks, we experience connection disconnects.
The proxy gets recreated as needed on follow up requests, however
we also emit a WARN log when the connection is broken. This PR
lowers the WARN to a TRACE, because these disconnects are to be
expected.
Ideally we would be able to proxy at the HTTP layer, however Consul
or the connect native task could be configured to expect mTLS, preventing
Nomad from MiTM the requests.
We also can't mange the proxy lifecycle more intelligently, because
we have no control over the HTTP client or server and how they wish
to manage connection state.
What we have now works, it's just noisy.
Fixes#10933
* api: revert to defaulting to http/1
PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.
Given that the switch is incidental, and not requested by users.
Furthermore, api consumers can opt-in to forcing http/2 by setting
custom http clients.
Fixes#10922
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.
If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.
Fixed#10943 .
Use glint to determine if os.Stdout is a terminal.
glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](b492b545f6/renderer_term.go (L39-L46)). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it.
By using golint to perform the check, we eliminate the risk of mis-judgement.
When the client launches, use a consistent read to fetch its own allocs,
but allow stale read afterwards as long as reads don't revert into older
state.
This change addresses an edge case affecting restarting client. When a
client restarts, it may fetch a stale data concerning its allocs: allocs
that have completed prior to the client shutdown may still have "run/running"
desired/client status, and have the client attempt to re-run again.
An alternative approach is to track the indices such that the client
set MinQueryIndex on the maximum index the client ever saw, or compare
received allocs against locally restored client state. Garbage
collection complicates this approach (local knowledge is not complete),
and the approach still risks starting "dead" allocations (e.g. the
allocation may have been placed when client just restarted and have
already been reschuled by the time the client started. This approach
here is effective against all kinds of stalness problems with small
overhead.
Basically the same as #10896 but with the Affinity struct.
Since we use reflect.DeepEquals for job comparison, there is
risk of false positives for changes due to a job struct with
memoized vs non-memoized strings.
Closes#10897
This PR fixes a bug where the underlying Envoy process of a Connect gateway
would consume a full core of CPU if there is more than one sidecar or gateway
in a group. The utilization was being caused by Consul injecting an envoy_ready_listener
on 127.0.0.1:8443, of which only one of the Envoys would be able to bind to.
The others would spin in a hot loop trying to bind the listener.
As a workaround, we now specify -address during the Envoy bootstrap config
step, which is how Consul maps this ready listener. Because there is already
the envoy_admin_listener, and we need to continue supporting running gateways
in host networking mode, and in those case we want to use the same port
value coming from the service.port field, we now bind the admin listener to
the 127.0.0.2 loop-back interface, and the ready listener takes 127.0.0.1.
This shouldn't make a difference in the 99.999% use case where envoy is
being run in its official docker container. Advanced users can reference
${NOMAD_ENVOY_ADMIN_ADDR_<service>} (as they 'ought to) if needed,
as well as the new variable ${NOMAD_ENVOY_READY_ADDR_<service>} for the
envoy_ready_listener.
Adds missing interpolation step to the `meta` blocks when building the task
environment. Also fixes incorrect parameter order in the test assertion and
adds diagnostics to the test.
This PR will have Nomad de-register a sidecar proxy service before
attempting to de-register the parent service. Otherwise, Consul will
emit a warning and an error.
Fixes#10845
When a jobspec doesn't include a namespace, we provide it with the default
namespace, but this ends up overriding the explicit `-namespace` flag. This
changeset uses the same logic as region parsing to create an order of
precedence: the query string parameter (the `-namespace` flag) overrides the
API request body which overrides the jobspec.
This PR uses regex-based matching for sidecar proxy services and checks when syncing
with Consul. Previously we would check if the parent of the sidecar was still being
tracked in Nomad. This is a false invariant - one which we must not depend when we
make #10845 work.
Fixes#10843
The default agent configuration values were not set, which meant they were not
being set in the client configuration and this results in fingerprints failing
unless the values were set explicitly.
When a task group with `service` block(s) is validated, we validate that there
are no duplicates, but this validation doesn't have access to the task environment
because it hasn't been created yet. Services and checks with interpolation can
be flagged incorrectly as conflicting. Name conflicts in services are not
actually an error in Consul and users have reported wanting to use the same
service name for task groups differentiated by tags.
Alloc exec only works when task is passed as a flag and not an arg.
Alloc logs currently accepts either, but alloc signal and restart only
accept task as an arg. This adds -task as a flag to the other alloc
commands to make the cli UX consistent. If task is passed as a flag and
an arg, it ignores the arg.
This PR makes it so the Consul sync logic will ignore operations that
do not specify an action to take (i.e. [de-]register [services|checks]).
Ideally such noops would be discarded at the callsites (i.e. users
of [Create|Update|Remove]Workload], but we can also be defensive
at the commit point.
Also adds 2 trace logging statements which are helpful for diagnosing
sync operations with Consul - when they happen and why.
Fixes#10797
Updates to the datacenter field should be destructive for any allocation that
is on a node no longer in the list of datacenters, but inplace for any
allocation on a node that is still in the list. Add a check for this change to
the system and generic schedulers after we've checked the task definition for
updates and obtained the node for each current allocation.
There are bits of logic in callers of RemoveWorkload on group/task
cleanup hooks which call RemoveWorkload with the "Canary" version
of the workload, in case the alloc is marked as a Canary. This logic
triggers an extra sync with Consul, and also doesn't do the intended
behavior - for which no special casing is necessary anyway. When the
workload is marked for removal, all associated services and checks
will be removed regardless of the Canary status, because the service
and check IDs do not incorporate the canary-ness in the first place.
The only place where canary-ness matters is when updating a workload,
where we need to compute the hash of the services and checks to determine
whether they have been modified, the Canary flag of which is a part of
that.
Fixes#10842
Adopts [`go-changelog`](https://github.com/hashicorp/go-changelog) for managing Nomad's changelog. `go-changelog` is becoming the HashiCorp defacto standard tool for managing changelog, e.g. [Consul](https://github.com/hashicorp/consul/pull/8387), [Vault](https://github.com/hashicorp/vault/pull/10363), [Waypoint](https://github.com/hashicorp/waypoint/pull/1179). [Consul](https://github.com/hashicorp/consul/pull/8387) seems to be the first product to adopt it, and its PR has the most context - though I've updated `.changelog/README.md` with the relevant info here.
## Changes to developers workflow
When opening PRs, developers should add a changelog entry in `.changelog/<PR#>.txt`. Check [`.changelog/README.md`](https://github.com/hashicorp/nomad/blob/docs-adopt-gochangelog/.changelog/README.md#developer-guide).
For the WIP release, entries can be amended even after the PR merged, and new files may be added post-hoc (e.g. during transition period, missed accidentally, community PRs, etc).
### Transitioning
Pending PRs can start including the changelog entry files immediately.
For 1.1.3/1.0.9 cycle, the release coordinator should create the entries for any PR that gets merged without a changelog entry file. They should also move any 1.1.3 entry in CHANGELOG.md to a changelog entry file, as this PR done for GH-10818.
## Changes to release process
Before cutting a release, release coordinator should update the changelog by inserting the output of `make changelog` to CHANGELOG.md with appropriate headers. See [`.changelog/README.md`](https://github.com/hashicorp/nomad/blob/docs-adopt-gochangelog/.changelog/README.md#how-to-generate-changelog-entries-for-release) for more details.
## Details
go-changelog is a basic templating engine for maintaining changelog in HashiCorp environment.
It expects the changelog entries as files indexed by their PR number. The CLI generates the changelog section for a release by comparing two git references (e.g. `HEAD` and the latest release, e.g. `v1.1.2`), and still requires manual process for updating CHANGELOG.md and final formatting.
The approach has many nice advantages:
* Avoids changelog related merge conflicts: Each PR touches different file!
* Copes with amendments and post-PR updates: Just add or update a changelog entry file using the original PR numbers.
* Addresses the release backporting scenario: Cherry-picking PRs will cherry-pick the relevant changelog entry automatically!
* Only relies on data available through `git` - no reliance on GitHub metadata or require GitHub credentials
The approach has few downsides though:
* CHANGELOG.md going stale during development and must be updated manually before cutting the release
* Repository watchers can no longer glance at the CHANGELOG.md to see upcoming changes
* We can periodically update the file, but `go-changelog` tool does not aid with that
* `go-changelog` tool does not offer good error reporting. If an entry is has an invalid tag (e.g. uses `release-note:bugfix` instead of `release-note:bug`), the entry will be dropped silently
* We should update go-changelog to warn against unexpected entry tags
* TODO: Meanwhile, PR reviewers and release coordinators should watch out
## Potential follow ups
We should follow up with CI checks to ensure PR changes include a warning. I've opted not to include that now. We still make many non-changelog-worth PRs for website/docs, for large features that get merged in multiple small PRs. I did not want to include a check that fails often.
Also, we should follow up to have `go-changelog` emit better warnings on unexpected tag.
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.