Running `make check` on macOS flags some code as dead because it is only used
under the Linux build tag. Move this code into appropriately tagged code
files.
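A minimal sketch of the pattern (the file, package, and function names here are hypothetical, not the actual code that moved): Linux-only helpers go into a file guarded by a build constraint, so macOS builds never compile them.

```go
//go:build linux

// Hypothetical file: fingerprint_linux.go, compiled only on Linux, so
// `make check` on macOS no longer reports its contents as dead code.
package fingerprint

// linuxOnlyHelper stands in for code that is only used on Linux.
func linuxOnlyHelper() error {
	return nil
}
```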
The ec2info tool was never intuitive to run: you had to set the AWS
environment variables, cd into tools/ec2info, and know how to invoke
the command.
This PR makes it so we can run ec2info just by running
`make ec2info`
The command now also checks that the AWS environment variables are
set, and provides a useful error if they are not.
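A minimal sketch of that environment check, with hypothetical function and variable names (the real tools/ec2info command may differ):

```go
package main

import (
	"fmt"
	"os"
)

// checkAWSCredentials returns a useful error when the AWS environment
// variables needed to query the EC2 API are not set.
func checkAWSCredentials() error {
	for _, name := range []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"} {
		if os.Getenv(name) == "" {
			return fmt.Errorf("%s must be set to query the EC2 API", name)
		}
	}
	return nil
}
```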
The NSD check tests were racy: the check may not have been triggered
by the time it was queried. This change wraps the check query so the
tests can account for that delay.
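A minimal sketch of such a wrapper, with the query passed in as a function because the real helper names are not shown here:

```go
package checks_test

import (
	"testing"
	"time"
)

// waitForCheckStatus polls the query function until the check has been
// triggered and reports the expected status, or fails the test after a
// deadline. This removes the race between check triggering and querying.
func waitForCheckStatus(t *testing.T, want string, query func() (string, error)) {
	t.Helper()
	deadline := time.Now().Add(10 * time.Second)
	for {
		status, err := query()
		if err == nil && status == want {
			return
		}
		if time.Now().After(deadline) {
			t.Fatalf("check never reached %q (last %q, err: %v)", want, status, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```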
This removes the current ACL expiration GC section in order to get
the tests passing and allow more time to investigate the test. I
have full confidence the feature is working as expected and have
tested extensively locally.
A Nomad user reported problems with CSI volumes associated with failed
allocations, where the Nomad server did not send a controller unpublish RPC.
The controller unpublish is skipped if other non-terminal allocations on the
same node claim the volume. The check had a bug where the allocation belonging
to the claim being freed was incorrectly included in the check. During a normal
allocation stop (for a job stop or a new version of the job), that allocation is
already terminal. But allocations that fail are not yet marked terminal at the
point in time when the client sends the unpublish RPC to the server.
For CSI plugins that support controller attach/detach, this means that the
controller will not be able to detach the volume from the allocation's host and
the replacement claim will fail until a GC is run. This changeset fixes the
conditional so that the claim's own allocation is not included, and makes the
logic easier to read. A test case covering this path is included.
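A minimal sketch of the corrected condition, with a hypothetical function name and parameters (the real server RPC handler differs):

```go
package csi

import "github.com/hashicorp/nomad/nomad/structs"

// hasOtherNonTerminalClaims reports whether any allocation other than the
// one whose claim is being freed is still non-terminal on the same node.
func hasOtherNonTerminalClaims(claimAllocID, nodeID string, allocs []*structs.Allocation) bool {
	for _, alloc := range allocs {
		if alloc == nil || alloc.ID == claimAllocID {
			// Exclude the claim's own allocation: a failed alloc may not be
			// marked terminal yet when the client sends the unpublish RPC.
			continue
		}
		if alloc.NodeID == nodeID && !alloc.TerminalStatus() {
			// Another live alloc on this node still claims the volume, so
			// controller unpublish should be skipped.
			return true
		}
	}
	return false
}
```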
Also includes two minor extra bugfixes:
* Entities we get from the state store should always be copied before
being altered. Ensure that we copy the volume in the top-level unpublish
workflow before handing off to the steps (see the sketch after this list).
* The list stub object for volumes in `nomad/structs` did not match the stub
object in `api`. The `api` package also did not include the current
readers/writers fields that are expected by the UI. True up the two objects and
add the previously undocumented fields to the docs.
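A minimal sketch of the copy-before-write rule from the first bullet, using hypothetical type and method names (the real state store API and unpublish workflow differ):

```go
package csi

import "fmt"

// Volume and VolumeGetter are hypothetical stand-ins for the state store
// types involved in the unpublish workflow.
type Volume struct{ ID string }

func (v *Volume) Copy() *Volume { cp := *v; return &cp }

type VolumeGetter interface {
	VolumeByID(namespace, id string) (*Volume, error)
}

// volumeForUnpublish fetches a volume and returns a copy, so the unpublish
// steps never mutate the state store's own object.
func volumeForUnpublish(state VolumeGetter, namespace, volID string) (*Volume, error) {
	vol, err := state.VolumeByID(namespace, volID)
	if err != nil {
		return nil, err
	}
	if vol == nil {
		return nil, fmt.Errorf("volume not found: %s", volID)
	}
	return vol.Copy(), nil
}
```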
When querying the checks for an allocation, the request must be
forwarded to the agent that is running the allocation. If the
initial request is made to a server agent, it can be forwarded
directly to the client agent running the allocation. If the
request is made to a client agent that is not running the alloc, it
needs to be forwarded to a server and then on to the correct
client.
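A minimal sketch of that routing decision, with hypothetical names (the real agent HTTP handler differs):

```go
package agent

// allocChecksRoute describes where a checks request for an allocation goes,
// given which kind of agent received it.
func allocChecksRoute(isServer, runsAlloc bool) string {
	switch {
	case isServer:
		// A server forwards directly to the client agent running the alloc.
		return "forward to client"
	case runsAlloc:
		// The local client runs the alloc and can answer from local state.
		return "serve locally"
	default:
		// Any other client forwards to a server, which then reaches the
		// correct client.
		return "forward to server, then client"
	}
}
```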
The map of in-flight RPCs gets cleared by a goroutine in the test without first
locking it to make sure that it's not being accessed concurrently by the stats
fetcher itself. This can cause a panic in tests.
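A minimal sketch of the fix, with hypothetical type and field names (the real stats fetcher differs):

```go
package stats

import "sync"

type statsFetcher struct {
	mu       sync.Mutex
	inflight map[string]struct{}
}

// clearInflight resets the in-flight map while holding the lock, so a
// concurrent fetch cannot observe the map mid-mutation and panic.
func (f *statsFetcher) clearInflight() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.inflight = make(map[string]struct{})
}
```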
This PR modifies RandomStagger to protect against negative input
values. If the given interval is negative, the returned value can be
an absurdly large duration, because the negative interval wraps around
when treated as unsigned. Instead, treat negative inputs like zero,
returning zero.
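A minimal sketch of the guard (the real helper may differ in detail):

```go
package helpers

import (
	"math/rand"
	"time"
)

// RandomStagger returns a uniformly random duration in [0, intv). Negative
// (and zero) intervals are treated as zero instead of wrapping around to a
// huge value when converted to an unsigned integer.
func RandomStagger(intv time.Duration) time.Duration {
	if intv <= 0 {
		return 0
	}
	return time.Duration(uint64(rand.Int63()) % uint64(intv))
}
```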
* Bones of a new flyout section
* Basic sidebar behaviour and style edits
* Concept of a refID for service fragments to disambiguate task and group
* A11y audit etc
* Moves health check aggregation to serviceFragment model and retains history
* Has to be a getter
* flyout populated
* Sidebar styling
* Sidebar table and details added
* Mirage fixture
* Active status and table styles
* Unit test mock updated
* Acceptance tests for alloc services table and flyout
* Chart styles closer to mock
* Without a paused test
* Consul and Nomad icons in services table
* Alloc services test updates in light of new column changes
* without using an inherited scenario
* Added to subnav and basic table implemented
* Existing services become service fragments, and services tab aggregated beneath job route
* Index page within jobs/job/services
* Watchable services
* Lintfixes
* Links to clients and individual services set up
* Child service route
* Keyboard shortcuts on service page
* Model that shows consul services as well, plus level and provider cols
* lintfix
* Level as query param
* Watch job for service name changes too
* Group level service fixtures established
* Progress at task level and job-linked services
* Task and group services on update
* Fixture side-effect cleanup
* Basic acceptance tests for job services
* Testmodel cleanup
* Disabled mirage logging
* New cluster type specifically for services
* Without explicit job-model binding
* Trying to isolate a tostring error
* Account for new tab in keyboardnav
* More test isolation attempts
* Remove skipped tests and link task to parent group by id
ui: add service health viz to table (#14369)
* ui: add service-status-bar
* test: service-status-bar
* refact: update component api for new data struct
* ui: format service health struct
* ui: add service health viz to table
* temp: add placeholder to remind conditional watcher
* test: write tests for transformation algorithm
* refact: update transformation algo
* ui: conditionally long poll checks endpoint
* refact: add conditional logic for nomad provider
refact: update service-fragment model to include owner info
ui: differentiate between task and group-level in derived state comp
test: add test to document behavior
refact: update tests for api change
refact: update integration test for API change
chore: remove unused vars
chore: elvis operator to protect mirage
refact: create refId instead of internalModel
refact: update algo
refact: update conditional template logic
refact: update test for api change
chore: can't use if and not in hbs conditional
* Lintfix
* Testfixes
* Placeholder mirage route
Includes:
* Remove leader upgrade raft version test, as older versions of raft are now
incompatible with our autopilot library.
* Remove attempt to assert initial non-voter status on the `PromoteNonVoter`
test, as this happens too quickly to reliably detect.
* Unskip some previously-skipped tests which we should make stable.
* Remove the `consul/sdk` retry helper for these tests; this uses panic recovery
in a kind of clever/gross way to reduce LoC, but it seems to introduce some
timing issues in the process.
* Add more test step logging and reduce logging noise from the scheduler
goroutines to make it easier to debug failing tests.
* Be more consistent about using the `waitForStableLeadership` helper so that we
can assert the cluster is fully stable and not just that we've added peers.
When configuring Consul Service Mesh, it's sometimes necessary to
provide dynamic values that are only known to Nomad at runtime. By
interpolating configuration values (in addition to configuration keys),
users are able to pass these dynamic values to Consul from their Nomad
jobs.
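A minimal sketch of interpolating both the keys and the values of a config map, with a hypothetical replaceEnv helper standing in for Nomad's task environment replacement (the real client code differs):

```go
package interp

// interpolateConfig applies environment replacement to map keys (as before)
// and now also to string values, so runtime-only values reach Consul.
func interpolateConfig(replaceEnv func(string) string, cfg map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(cfg))
	for k, v := range cfg {
		if s, ok := v.(string); ok {
			v = replaceEnv(s)
		}
		out[replaceEnv(k)] = v
	}
	return out
}
```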
Nomad's original autopilot was importing from a private package in Consul. It
has been moved out to a shared library. Switch Nomad to use this library so that
we can eliminate the import of Consul, which is necessary to build Nomad ENT
with the current version of the Consul SDK. This also will let us pick up
autopilot improvements shared with Consul more easily.
These options are mutually exclusive but, since `-hcl2-strict` defaults
to `true`, users had to explicitly set it to `false` when using `-hcl1`.
Also return `255` when job plan fails validation, as this is the expected
exit code in this situation.
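A minimal sketch of the intended behaviour, with hypothetical names and the parse/validate step passed in as a function (the real `nomad job plan` command differs):

```go
package command

import (
	"fmt"
	"os"
)

// planExitCode shows the two behaviours: -hcl1 implies -hcl2-strict=false,
// and a job that fails validation exits with 255.
func planExitCode(hcl1, hcl2Strict bool, parse func(hcl1, hcl2Strict bool) error) int {
	if hcl1 {
		// The flags are mutually exclusive, so selecting HCL1 simply
		// disables strict HCL2 parsing instead of requiring users to also
		// pass -hcl2-strict=false.
		hcl2Strict = false
	}
	if err := parse(hcl1, hcl2Strict); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return 255 // expected exit code when job plan fails validation
	}
	return 0 // the normal plan exit codes are elided in this sketch
}
```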
Nomad is generally compliant with the CSI specification for Container
Orchestrators (CO), except for unimplemented features. However, some storage
vendors have built CSI plugins that are not compliant with the specification or
which expect that they're only deployed on Kubernetes. Nomad cannot vouch for
the compatibility of any particular plugin, so clarify this in the docs.
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>