This PR adds new probes for detecting these new Consul related attributes:
Consul namespaces are a Consul enterprise feature that may be disabled depending
on the enterprise license associated with the Consul servers. Having this attribute
available will enable Nomad to properly decide whether to query the Consul Namespace
API.
Consul connect must be explicitly enabled before Connect APIs will work. Currently
Nomad only checks for a minimum Consul version. Having this attribute available will
enable Nomad to properly schedule Connect tasks only on nodes with a Consul agent that
has Connect enabled.
Consul connect requires the grpc port to be explicitly set before Connect APIs will work.
Currently Nomad only checks for a minimal Consul version. Having this attribute available
will enable Nomad to schedule Connect tasks only on nodes with a Consul agent that has
the grpc listener enabled.
This PR refactors the ConsulFingerprint implementation, breaking individual attributes
into individual functions to make testing them easier. This is in preparation for
additional extractors about to be added. Behavior should be otherwise unchanged.
It adds the attribute consul.sku, which can be used to differentiate between Consul
OSS vs Consul ENT.
The reconciler has some complicated behavior when there are already running
allocations from a previous version of the job that we want to keep, as
happens during a rollback. Document this behavior with a test.
The plans generated by the scheduler produce high-level output of counts on each
evaluation, but when debugging scheduler issues it'd be nice to have a more
detailed view of the resulting plan. Emitting this log at trace minimizes the
overhead, and producing it in the plan applyer makes it easier to find as it
will always be on the leader.
Arguments to our logger's various write methods are evaluated eagerly, so
method calls in log parameters will always be called, regardless of log
level. Move some logger messages to the logger's `Fmt` method so that
`GoString` is evaluated lazily instead.
Ease spinning up a cluster, where binaries are fetched from arbitrary
urls. These could be CircleCI `build-binaries` job artifacts, or
presigned S3 urls.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
refactor the api handling of `nomad exec`, and ensure that we process
all received events before handling websocket closing.
The exit code should be the last message received, and we ought to
ignore any websocket close error we receive afterwards.
Previously, we used two channels: one for websocket frames and another
for handling errors. This raised the possibility that we processed the
error before processing the frames, resulting into an "unexpected EOF"
error.
In this loop, we ought to close the websocket connection gracefully when
the StreamErrWrapper reaches EOF.
Previously, it's possible that that we drop the last few events or skip sending
the websocket closure. If `handler(handlerPipe)` returns and `cancel` is called,
before the loop here completes processing streaming events, the loop exits
prematurely without propagating the last few events.
Instead here, the loop continues until we hit `httpPipe` EOF (through
`decoder.Decode`), to ensure we process the events to completion.
When a wildcard namespace is used for `nomad job` commands that support prefix
matching, avoid asking the user for input if a prefix is an unambiguous exact
match so that the behavior is similar to the commands using a specific or
unset namespace.
Adds clarification to `nomad volume create` commands around how the `volume`
block in the jobspec overrides this behavior. Adds missing section to `nomad
volume register` and to example volume spec for both commands.
Include the VolumeCapability.MountVolume data in
ControllerPublishVolume, CreateVolume, and ValidateVolumeCapabilities
RPCs sent to the CSI controller. The previous behavior was to only
include the MountVolume capability in the NodeStageVolume request, which
on some CSI implementations would be rejected since the Volume was not
originally provisioned with the specific mount capabilities requested.
The websocket interface used for `alloc exec` has to silently drop client send
errors because otherwise those errors would interleave with the streamed
output. But we may be able to surface errors that cause terminated websockets
a little better in the HTTP server logs.
seems when this PR was raised, the Nomad CI provider was having
availability issues meaning the test suite was not correctly run,
thus allowing broken tests into main. The PR itself exercised test
code which had not been hit before.
The particular problem is when identifying whether the event
received is a heartbeat; this was performed using standard Golang
conditionals. Unfortunately the operator == is not defined on byte
arrays, resulting in the check always returning false. To overcome
this issue the code now uses the bytes.Equal function to correctly
compare the data.
Move the words being defined in the /docs/internal/architecture page to be
small headers so that they can be linked to with anchors from Learn guides and
other documentation location.