Commit Graph

24748 Commits

Author SHA1 Message Date
Luiz Aoqui 4b33bbd9c4
ci: run 'make check' as reusable workflow (#17600)
Some of the paths ignored by `test-core.yaml` need to be checked by
`make check`. The `checks.yaml` workflow runs on these paths and can also
be used as a reusable workflow.
2023-06-20 08:17:13 +01:00
hashicorp-copywrite[bot] d797da4a3c
[COMPLIANCE] Add Copyright and License Headers (#17596)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-19 12:23:28 -04:00
Phil Renaud 8e41380f72
[ui, deployments] Promote Canary and Unhealthy Allocations in the deployment status panel (#17547)
* A wild health status appears

* autoPromote notification conditions

* Legend fixes etc

* Acceptance tests for new canary alerts
2023-06-19 12:06:18 -04:00
Luiz Aoqui cfb3bb517f
np: scheduler configuration updates (#17575)
* jobspec: rename node pool scheduler_configuration

In HCL specifications we usually call configuration blocks `config`
instead of `configuration`.

* np: add memory oversubscription config

* np: make scheduler config ENT
2023-06-19 11:41:46 -04:00
Dao Thanh Tung b666857076
terraform: fix syntax in Azure example due to deprecated tf resource arguments (#17497) 2023-06-19 11:26:14 +02:00
dependabot[bot] 0c63019c92
build(deps): bump github.com/stretchr/testify from 1.8.2 to 1.8.4 (#17584) 2023-06-19 08:21:45 +01:00
Bruce Lok 72e92bc17f
fix typo peers.json (#17538) 2023-06-19 07:56:51 +01:00
Michael Lange 00e04a4b54
Merge pull request #17573 from hashicorp/f/legacy-openssl
UI Dev Tools: Use the legacy openssl provider for backcompat
2023-06-17 10:24:53 -07:00
Michael Lange 3ba7f4dae3 Use the legacy openssl provider for backcompat
Node v18 uses a newer version of openssl than webpack 4 is compatible
with. This is the quickest fix.

The ideal fix would be to upgrade webpack to v5 but the state of Ember,
Storybook, and generally just JS dep management makes this not an
option.
2023-06-16 17:58:40 -07:00
Luiz Aoqui d07f9ae2fe
cli: prevent panic if job node pool is nil (#17571)
If the `nomad` CLI is used to access a cluster running a version that
does not include node pools, the command will `nil` panic when trying to
resolve the job's node pool.
2023-06-16 17:08:36 -04:00
Luiz Aoqui d5aa72190f
node pools: namespace integration (#17562)
Add structs and fields to support the Node Pools Governance Enterprise
feature of controlling node pool access via namespaces.

Nomad Enterprise allows users to specify a default node pool to be used
by jobs that don't specify one. In order to accomplish this, it's
necessary to distinguish between a job that explicitly uses the
`default` node pool and one that did not specify any.

If the `default` node pool is set during job canonicalization, it's
impossible to do this, so this commit allows a job to have an empty node
pool value during registration but sets it to `default` in the admission
controller mutator.

In order to guarantee state consistency the state store validates that
the job node pool is set and exists before inserting it.
2023-06-16 16:30:22 -04:00
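The defaulting rule in the commit above (leave the pool empty at registration, fill it in at admission, validate in the state store) can be sketched as follows. This is a minimal illustration, not Nomad's code: `Job`, `mutateNodePool`, `upsertJob`, and the `namespaceDefault` parameter are hypothetical names, and the namespace default stands in for the Enterprise feature described above.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultNodePool is the name of the built-in pool.
const defaultNodePool = "default"

// Job is a hypothetical, trimmed-down job carrying only its node pool.
type Job struct {
	ID       string
	NodePool string // empty means the user did not specify a pool
}

// mutateNodePool is a hypothetical admission mutator. It only fills in the
// pool when the job left it empty, so an explicit "default" is preserved.
func mutateNodePool(job *Job, namespaceDefault string) {
	if job.NodePool != "" {
		return // explicitly chosen by the user, including "default"
	}
	if namespaceDefault != "" {
		job.NodePool = namespaceDefault
		return
	}
	job.NodePool = defaultNodePool
}

// upsertJob is a hypothetical state-store insert that rejects jobs whose
// node pool is unset or does not exist.
func upsertJob(pools map[string]bool, job *Job) error {
	if job.NodePool == "" {
		return errors.New("job node pool must be set")
	}
	if !pools[job.NodePool] {
		return fmt.Errorf("node pool %q does not exist", job.NodePool)
	}
	return nil
}

func main() {
	pools := map[string]bool{"default": true, "gpu": true}

	implicit := &Job{ID: "web"}
	mutateNodePool(implicit, "gpu")
	fmt.Println(implicit.NodePool, upsertJob(pools, implicit)) // gpu <nil>

	explicit := &Job{ID: "batch", NodePool: "default"}
	mutateNodePool(explicit, "gpu")
	fmt.Println(explicit.NodePool, upsertJob(pools, explicit)) // default <nil>
}
```

Deferring the default to admission is what lets the mutator tell "no pool given" apart from "the `default` pool given".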
Tim Gross 3da948d0c8
node pools: support `node.pool` constraint in scheduler (#17548)
Although most of the time jobs will be assigned to a single node pool, users may
want to set the node pool to "all" and then constrain to a subset of node
pools. Add support for setting a constraint like `${node.pool}`.
2023-06-16 13:31:46 -04:00
Seth Hoenig 320bac0ac4
e2e: modernize podman test suite (#17564)
Use the new style of e2e test for the podman suite, which consists of a
single test case that had been skipped. Turn the case back on; we will
add more tests in the near future.
2023-06-16 10:36:17 -05:00
Tim Gross f411f0c0fb
docs: node pool specification (#17553) 2023-06-16 10:37:47 -04:00
Seth Hoenig cafaf2e2ee
e2e: cleanup podman installation in jammy image (#17558)
* e2e: cleanup podman installation in jammy image

The original steps were copied over from the bionic image and do a lot
of hoop jumping we no longer need.

For the moment just hard-code installing the v0.4.2 version of the driver,
but I may follow up and modify hc-install to support installing @latest
like go itself.

* use releases for hc-install
2023-06-15 18:17:31 -05:00
Michael Lange cac5160aa8
Merge pull request #17516 from hashicorp/f/fix-storybook
Fix Storybook
2023-06-15 14:31:40 -07:00
Seth Hoenig c7b44a57a2
e2e: purge bionic packer image scripts (#17559)
Bionic is dead, long live the Jammy!
2023-06-15 15:15:01 -05:00
Tim Gross df366df1cd
docs: fix broken link in variables spec page (#17554) 2023-06-15 15:57:00 -04:00
Michael Lange 65270115bf Error free Storybook build 2023-06-15 12:43:15 -07:00
Michael Lange 9635bec8bb Free your mind of the babel and the packed web
ember-cli-storybook and storybook itself have progressed to the point
where the DIY configs aren't necessary. It's all swept under the
`framework: '@storybook/ember'` config in main.js. Yay!
2023-06-15 12:40:03 -07:00
Michael Lange 2b5c4d982c Tidy up Storybook related packages
It's unfortunate having to point to a hash for ember-cli-storybook, but
there hasn't been a release since the environment PR merged. At least
this is better than pointing at a fork?
2023-06-15 12:40:03 -07:00
Michael Lange 915ff5b19a Minimum viable fix for Storybook
Stories that used named blocks wouldn't render the named blocks.
Evidently this was due to using a customized template renderer that
became incompatible when Ember was upgraded.
2023-06-15 12:40:03 -07:00
Phil Renaud 7cfe2d09e0
Deployment history timeline styling (#17524) 2023-06-15 14:42:32 -04:00
Patric Stout 4767d44b94
Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535)
* Fix DevicesSets being removed when cpusets are reloaded with cgroup v2

This meant that if any allocation was created or removed, all
active DevicesSets were removed from all cgroups of all tasks.

This was most noticeable with "exec" and "raw_exec", as it meant
they no longer had access to /dev files.

* e2e: add test for verifying cgroups do not interfere with access to devices

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-15 09:39:36 -05:00
dependabot[bot] 2856967dda
build(deps-dev): bump webpack from 5.69.1 to 5.86.0 in /ui (#17488)
Bumps [webpack](https://github.com/webpack/webpack) from 5.69.1 to 5.86.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](https://github.com/webpack/webpack/compare/v5.69.1...v5.86.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-15 10:39:06 -04:00
Seth Hoenig 3e7007b2a3
tests: set timeout on test-ui (#17549)
This seems to either finish in about 20 minutes or run for 6+ hours until hitting
a default timeout. Set the timeout to 30 minutes so we aren't wasting time
and runners.
2023-06-15 09:38:50 -05:00
Tim Gross 524183e2b1
docs: add missing `client.allocs` metrics (#17540)
The docs were missing counter metrics emitted by the task runner around task
state changes.
2023-06-15 09:18:11 -04:00
Luiz Aoqui bdc7f3305f
rpc: fix log message in Node.UpdateStatus (#17537) 2023-06-14 16:51:46 -04:00
Tim Gross 5b9322c70a
docs: clarify node pool apply/delete behavior (#17529) 2023-06-14 15:58:53 -04:00
Tim Gross dc9fae34ca
node pools: add pool as label on client metrics (#17528)
This changeset adds the node pool as a label anywhere we're already emitting
labels with additional information such as node class or ID about the client.
2023-06-14 15:58:38 -04:00
Tim Gross 5f509b8ce0
cli: fix missing `-quiet` flag for `var init` (#17526)
The `var init` command was intended to have support for a `-quiet` flag but it
was not documented and never parsed.
2023-06-14 14:52:46 -04:00
Tim Gross 736ad3ed32
docs: note namespace apply/delete behaviors, fix metric (#17527)
This changeset includes some fixes to documentation discovered while working on
node pools that we didn't want to include in the node pool PRs, so that they can
be backported easily:

* namespace apply/delete commands are forwarded to the authoritative region
* deleting a namespace requires that there are no non-terminal jobs in any of the
  federated regions
* fixed a typo in the name of the `nomad.client.allocated.disk` metric
2023-06-14 14:52:06 -04:00
Phil Renaud 7400c37b89
[ui] Job status panel: tooltips on individual allocs (#17514)
* Tooltip on individual allocs in the panel

* Isolate allocation cells to their own component

* Tipsy trigger

* Aria label for failed-or-lost tooltips

* Buildfix

* Try adding percy exec back to exam run
2023-06-14 12:45:36 -04:00
Luiz Aoqui ec80d051d8
client: fix panic on alloc stop in non-Linux environments (#17515)
Provide a no-op implementation of the drivers.DriverNetworkManager
interface to be used by systems that don't support network isolation and
prevent panics where a network manager is expected.
2023-06-14 10:22:38 -04:00
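The fix above is an instance of the null-object pattern: rather than leaving the network manager `nil` on platforms without network isolation, a do-nothing implementation is supplied. The sketch below uses a simplified, hypothetical `networkManager` interface and `networkSpec` type; the real `drivers.DriverNetworkManager` methods take different parameters.

```go
package main

import "fmt"

// networkSpec is a hypothetical stand-in for the driver's isolation spec.
type networkSpec struct {
	Path string
}

// networkManager is a simplified, hypothetical version of the interface.
type networkManager interface {
	CreateNetwork(allocID string) (*networkSpec, bool, error)
	DestroyNetwork(allocID string, spec *networkSpec) error
}

// noopNetworkManager satisfies the interface on platforms that cannot
// isolate networks, so callers never hold a nil manager.
type noopNetworkManager struct{}

func (noopNetworkManager) CreateNetwork(allocID string) (*networkSpec, bool, error) {
	return nil, false, nil // nothing created, nothing to report
}

func (noopNetworkManager) DestroyNetwork(allocID string, spec *networkSpec) error {
	return nil // nothing to tear down
}

// Compile-time check that the no-op type implements the interface.
var _ networkManager = noopNetworkManager{}

// stopAlloc calls DestroyNetwork unconditionally; with the no-op manager this
// is safe, whereas calling a method on a nil interface value would panic.
func stopAlloc(nm networkManager, allocID string) error {
	return nm.DestroyNetwork(allocID, nil)
}

func main() {
	var nm networkManager = noopNetworkManager{}
	fmt.Println(stopAlloc(nm, "alloc-1")) // <nil>
}
```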
James Rasell 70fc0fd701
build: add agent bindata file to copywrite ignore list. (#17507) 2023-06-14 11:13:59 +01:00
Tim Gross c1a01697c8
node pools: implement `node pool init` command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region, we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initializing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
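The registration rule in the commit above can be sketched as a single branch on whether the local region is authoritative. `Node`, `Region`, and `registerNode` are hypothetical types for illustration only; the real logic lives in Nomad's node register RPC and state store.

```go
package main

import "fmt"

// Hypothetical status values used by this sketch.
const (
	statusInitializing = "initializing"
	statusReady        = "ready"
)

// Node is a hypothetical, trimmed-down client node.
type Node struct {
	ID       string
	NodePool string
	Status   string
}

// Region is a hypothetical view of one region's known pools.
type Region struct {
	Authoritative bool
	Pools         map[string]bool
}

// registerNode reports whether it created the node's pool. Only the
// authoritative region may create a missing pool; elsewhere the node is
// registered but held in "initializing" until the pool is replicated.
func registerNode(r *Region, n *Node) (poolCreated bool) {
	if r.Pools[n.NodePool] {
		return false // pool already known
	}
	if r.Authoritative {
		r.Pools[n.NodePool] = true // safe: this region is the replication source
		return true
	}
	n.Status = statusInitializing
	return false
}

func main() {
	auth := &Region{Authoritative: true, Pools: map[string]bool{"default": true}}
	other := &Region{Pools: map[string]bool{"default": true}}

	n1 := &Node{ID: "n1", NodePool: "gpu", Status: statusReady}
	fmt.Println(registerNode(auth, n1), n1.Status) // true ready

	n2 := &Node{ID: "n2", NodePool: "gpu", Status: statusReady}
	fmt.Println(registerNode(other, n2), n2.Status) // false initializing
}
```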
Tim Gross 952eb2713e
node pools: protect against deleting occupied pools (#17457)
We don't want to delete node pools that have nodes or non-terminal jobs. Add a
check in the `DeleteNodePools` RPC to check locally and in federated regions,
similar to how we check that it's safe to delete namespaces.
2023-06-13 09:57:42 -04:00
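The deletion check described above amounts to refusing the delete if any region, local or federated, still reports nodes or non-terminal jobs in the pool. This is a hedged sketch: `regionUsage` and `checkPoolDeletable` are hypothetical, and how each region's counts are gathered over RPC is left out.

```go
package main

import "fmt"

// regionUsage is a hypothetical summary of one region's use of a pool.
type regionUsage struct {
	Region          string
	Nodes           int
	NonTerminalJobs int
}

// checkPoolDeletable returns an error naming the first region in which the
// pool is still occupied, or nil if it is safe to delete everywhere.
func checkPoolDeletable(pool string, usage []regionUsage) error {
	for _, u := range usage {
		if u.Nodes > 0 {
			return fmt.Errorf("node pool %q has %d node(s) in region %q", pool, u.Nodes, u.Region)
		}
		if u.NonTerminalJobs > 0 {
			return fmt.Errorf("node pool %q has %d non-terminal job(s) in region %q", pool, u.NonTerminalJobs, u.Region)
		}
	}
	return nil
}

func main() {
	usage := []regionUsage{
		{Region: "global"},
		{Region: "eu", NonTerminalJobs: 2},
	}
	// node pool "gpu" has 2 non-terminal job(s) in region "eu"
	fmt.Println(checkPoolDeletable("gpu", usage))
}
```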
stswidwinski 9a58474400
conf: Add preemption_config to the server extra HCL keys which should be removed (#17481)
Add preemption_config to the set of keys which should be pruned from the server
config as described in #17480.
2023-06-13 10:48:19 +02:00
Daniel Bennett fa8b102092
ci: remove circleci (#17502)
all of our workflows are in GitHub Actions now 🎉
2023-06-12 16:28:19 -05:00
Tim Gross e8a361310f
node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
dependabot[bot] d45bb4bab9
build(deps): bump github.com/hashicorp/go-plugin from 1.4.9 to 1.4.10 (#17486) 2023-06-12 14:22:33 +01:00
Tim Gross bb7f0edd6a
node pools: prevent panic on upsert during upgrades (#17474)
Whenever we write a Raft log entry for node pools, we need to first make sure
that all servers can safely apply the log without panicking. Gate upsert and
delete RPCs on all servers being upgraded to the minimum version.
2023-06-12 09:01:30 -04:00
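The gate described in the commit above can be sketched as "every server must be at or above a minimum version before the write is allowed". The helpers below (`parseVersion`, `atLeast`, `serversMeetMinVersion`) are hypothetical and accept only plain `X.Y.Z` versions; Nomad's own check uses its version library and server membership data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.6.0" into [1 6 0]. Pre-release suffixes are
// rejected in this sketch for simplicity.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version a >= version b, component by component.
func atLeast(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// serversMeetMinVersion allows a write that introduces a new Raft log
// type only once every server can apply it.
func serversMeetMinVersion(serverVersions []string, min string) (bool, error) {
	minV, err := parseVersion(min)
	if err != nil {
		return false, err
	}
	for _, sv := range serverVersions {
		v, err := parseVersion(sv)
		if err != nil {
			return false, err
		}
		if !atLeast(v, minV) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, _ := serversMeetMinVersion([]string{"1.6.0", "1.6.0", "1.5.6"}, "1.6.0")
	fmt.Println(ok) // false: one server still runs 1.5.6
}
```

Comparing numerically per component matters: a string comparison would wrongly rank `1.10.0` below `1.9.9`.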
Tim Gross e3a37c0b97
replication: fix potential panic during upgrades (#17476)
If the authoritative region has been upgraded to a version of Nomad that has new
replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the
non-authoritative regions will start replicating those objects as soon as their
leader is upgraded. If a server in the non-authoritative region is upgraded and
then becomes the leader before all the other servers in the region have been
upgraded, then it will attempt to write a Raft log entry that the followers
don't understand. The followers will then panic.

Add the same minimum version checks that we do for RPC writes to the leader's
replication loop.
2023-06-12 08:53:56 -04:00
dependabot[bot] 8bd3bdab42
build(deps): bump github.com/shoenig/go-m1cpu from 0.1.5 to 0.1.6 (#17487) 2023-06-12 12:08:16 +01:00
dependabot[bot] c1f5ffb3bc
build(deps): bump github.com/fatih/color from 1.13.0 to 1.15.0 (#17485) 2023-06-12 10:44:18 +01:00
Phil Renaud 6a9df6e3ab
[ui] Don't show a service as healthy when its parent alloc is not running (#17465)
* Fix: don't show a service as healthy when its parent alloc is not running

* Test for Health Unknown
2023-06-09 15:43:11 -04:00
Piotr Kazmierczak 57dad0ca07
docs: corrections and additional information for OIDC-related concepts (#17470) 2023-06-09 16:50:22 +02:00
Piotr Kazmierczak 0a4052ece5
docs: add missing login API endpoint documentation (#17467) 2023-06-09 15:59:01 +02:00
Seth Hoenig 557a6b4a5e
docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, and the node then comes back up. See the issue below
for full repro conditions.

Previously, the DestroyNetwork PostRun hook depended on the
NetworkIsolationSpec field not being nil, which is only the case
if the client stays alive all the way from network creation to network
teardown. If the node was rebooted, that state was lost and we could not
find the pause container to remove. Now, we find the pause container
manually by scanning the containers and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
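The lookup described in the commit above, finding the pause container by scanning containers rather than trusting in-memory state, can be sketched as follows. The `container` type, its `Pause` flag, and the `allocIDLabel` key are all hypothetical; the actual label names and Docker client calls used by the driver may differ.

```go
package main

import "fmt"

// allocIDLabel is a hypothetical label key; the real key may differ.
const allocIDLabel = "example.alloc_id"

// container is a hypothetical, trimmed-down view of a container listing.
type container struct {
	ID     string
	Labels map[string]string
	Pause  bool // hypothetical marker for the network pause container
}

// findPauseContainer scans every container instead of relying on state kept
// in memory, so it still works after the client process has restarted.
func findPauseContainer(all []container, allocID string) (string, bool) {
	for _, c := range all {
		if c.Pause && c.Labels[allocIDLabel] == allocID {
			return c.ID, true
		}
	}
	return "", false
}

func main() {
	containers := []container{
		{ID: "c1", Labels: map[string]string{allocIDLabel: "a-1"}},
		{ID: "c2", Labels: map[string]string{allocIDLabel: "a-1"}, Pause: true},
		{ID: "c3", Labels: map[string]string{allocIDLabel: "a-2"}, Pause: true},
	}
	id, ok := findPauseContainer(containers, "a-1")
	fmt.Println(id, ok) // c2 true
}
```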