When creating a TCP proxy bridge for Connect tasks, we are at the
mercy of either end for managing the connection state. For long
lived gRPC connections the proxy could reasonably expect to stay
open until the context was cancelled. For the HTTP connections used
by connect native tasks, we experience connection disconnects.
The proxy gets recreated as needed on follow up requests, however
we also emit a WARN log when the connection is broken. This PR
lowers the WARN to a TRACE, because these disconnects are to be
expected.
Ideally we would be able to proxy at the HTTP layer, however Consul
or the connect native task could be configured to expect mTLS, preventing
Nomad from MiTM the requests.
We also can't mange the proxy lifecycle more intelligently, because
we have no control over the HTTP client or server and how they wish
to manage connection state.
What we have now works, it's just noisy.
Fixes#10933
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.
Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.
As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).
Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.
Closes#2527
As we moved to using `-detach` for registering jobs, we should wait
until allocs and deployments are created before asserting their
properties.
Fixing `TestNodeDrainIgnoreSystem` and `TestRescheduleProgressDeadlineFail` tests as they seem particularly flaky, failing 9 and 7 times (respectively) in the last two weeks.
* api: revert to defaulting to http/1
PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.
Given that the switch is incidental, and not requested by users.
Furthermore, api consumers can opt-in to forcing http/2 by setting
custom http clients.
Fixes#10922
Noticed that the private Enterprise repository dependencies drifted a bit. Here, we update the OSS to the dependencies used by Enterprise.
We should update all dependencies as a matter of hygiene, but that's an issue for another time.
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.
If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.
Fixed#10943 .
Namespaces are set-up in Nomad to be an object that has an id property.
However, namespaces actually don't have that shape. Our search was expecting
a namespace object, but we actually don't have a namespace assigned to jobs
in our config and namespace is set to null. Normally, these namespaces would
be set to default, but that would require us to refactor our Mirage config
if we wanted to assert that namespaces are 'default' and not null. So this is
a bandaid solution.
Support the new post-1.0.0 job spec fields in the HCLv1 parser.
The HCLv1 parser is still the default (or only!) parser in many downstream tools, e.g. [Levant](e48c439f14/template/render.go (L13-L30)), and [terraform-provider-nomad](bce32a7831/nomad/resource_job.go (L735-L743)) .
While we initially intended to deprecate HCLv1 parser in 1.0.0, we never communicated that publicly. We did not fully anticipate the public usage of `jobspec` package (we assumed it's a private package), or the friction that HCLv2 introduced in some cases (e.g. https://github.com/hashicorp/nomad/issues/10777, https://github.com/hashicorp/nomad/issues/9838).
So moving forward we intend to ensure that new job spec fields are honored in both the HCLv1 and HCLv2 parser until we solidify the migration path and communicate it properly.
Use glint to determine if os.Stdout is a terminal.
glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](b492b545f6/renderer_term.go (L39-L46)). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it.
By using golint to perform the check, we eliminate the risk of mis-judgement.