Commit Graph

24714 Commits

Author SHA1 Message Date
James Rasell 70fc0fd701
build: add agent bindata file to copywrite ignore list. (#17507) 2023-06-14 11:13:59 +01:00
Tim Gross c1a01697c8
node pools: implement `node pool init` command (#17479)
Implement a `nomad node pool init` command that generates an example spec file
in either HCL or JSON format.
2023-06-13 14:51:29 -04:00
Luiz Aoqui bc17cffaef
node pool: node pool upsert on multiregion node register (#17503)
When registering a node with a new node pool in a non-authoritative
region we can't create the node pool because this new pool will not be
replicated to other regions.

This commit modifies the node registration logic to only allow automatic
node pool creation in the authoritative region.

In non-authoritative regions, the client is registered, but the node
pool is not created. The client is kept in the `initialing` status until
its node pool is created in the authoritative region and replicated to
the client's region.
2023-06-13 11:28:28 -04:00
Tim Gross 952eb2713e
node pools: protect against deleting occupied pools (#17457)
We don't want to delete node pools that have nodes or non-terminal jobs. Add a
check in the `DeleteNodePools` RPC to check locally and in federated regions,
similar to how we check that it's safe to delete namespaces.
2023-06-13 09:57:42 -04:00
stswidwinski 9a58474400
conf: Add preemption_config to the server extra HCL keys which should be removed (#17481)
Add preemption_config to the set of keys which should be pruned from the server
config as described in #17480.
2023-06-13 10:48:19 +02:00
Daniel Bennett fa8b102092
ci: remove circleci (#17502)
all of our workflows are in GitHub Actions now 🎉
2023-06-12 16:28:19 -05:00
Tim Gross e8a361310f
node pools: replicate from authoritative region (#17456)
Upserts and deletes of node pools are forwarded to the authoritative region,
just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools
from the authoritative region.
2023-06-12 13:24:24 -04:00
dependabot[bot] d45bb4bab9
build(deps): bump github.com/hashicorp/go-plugin from 1.4.9 to 1.4.10 (#17486) 2023-06-12 14:22:33 +01:00
Tim Gross bb7f0edd6a
node pools: prevent panic on upsert during upgrades (#17474)
Whenever we write a Raft log entry for node pools, we need to first make sure
that all servers can safely apply the log without panicking. Gate upsert and
delete RPCs on all servers being upgraded to the minimum version.
2023-06-12 09:01:30 -04:00
Tim Gross e3a37c0b97
replication: fix potential panic during upgrades (#17476)
If the authoritative region has been upgraded to a version of Nomad that has new
replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the
non-authoritative regions will start replicating those objects as soon as their
leader is upgraded. If a server in the non-authoritative region is upgraded and
then becomes the leader before all the other servers in the region have been
upgraded, then it will attempt to write a Raft log entry that the followers
don't understand. The followers will then panic.

Add same the minimum version checks that we do for RPC writes to the leader's
replication loop.
2023-06-12 08:53:56 -04:00
dependabot[bot] 8bd3bdab42
build(deps): bump github.com/shoenig/go-m1cpu from 0.1.5 to 0.1.6 (#17487) 2023-06-12 12:08:16 +01:00
dependabot[bot] c1f5ffb3bc
build(deps): bump github.com/fatih/color from 1.13.0 to 1.15.0 (#17485) 2023-06-12 10:44:18 +01:00
Phil Renaud 6a9df6e3ab
[ui] Don't show a service as healthy when its parent alloc is not running (#17465)
* Fix: dont show a service as healthy when its parent alloc is not running

* Test for Health Unknown
2023-06-09 15:43:11 -04:00
Piotr Kazmierczak 57dad0ca07
docs: corrections and additional information for OIDC-related concepts (#17470) 2023-06-09 16:50:22 +02:00
Piotr Kazmierczak 0a4052ece5
docs: add missing login API endpoint documentation (#17467) 2023-06-09 15:59:01 +02:00
Seth Hoenig 557a6b4a5e
docker: stop network pause container of lost alloc after node restart (#17455)
This PR fixes a bug where the docker network pause container would not be
stopped and removed in the case where a node is restarted, the alloc is
moved to another node, the node comes back up. See the issue below for
full repro conditions.

Basically in the DestroyNetwork PostRun hook we would depend on the
NetworkIsolationSpec field not being nil - which is only the case
if the Client stays alive all the way from network creation to network
teardown. If the node is rebooted we lose that state and previously
would not be able to find the pause container to remove. Now, we manually
find the pause container by scanning them and looking for the associated
allocID.

Fixes #17299
2023-06-09 08:46:29 -05:00
Phil Renaud 944f30674d
[ui] Parallelize ember tests (#17442)
* Exam to parallelize tests

* Logging to try to solve test flakiness

* Logging in another failure

* Hardening for one test and snapshot for another

* Explicitly set the first one as the servicedAlloc instead of randomly picking

* A wild CircleCI test failure appears

* de-log
2023-06-07 17:01:35 -04:00
Seth Hoenig 134e70cbab
client: fix client panic during drain cause by shutdown (#17450)
During shutdown of a client with drain_on_shutdown there is a race between
the Client ending the cgroup and the task's cpuset manager cleaning up
the cgroup. During the path traversal, skip anything we cannot read, which
avoids the nil DirEntry we try to dereference now.
2023-06-07 15:12:44 -05:00
Tim Gross 64a4c6204a
build: update to go1.20.5 (#17451)
Go released a security update to fix build-time code injection and execution via
CGO. This doesn't impact already-released versions of Nomad, just the build
toolchain, so we won't be releasing a Nomad security update to go with it.
2023-06-07 11:44:59 -04:00
Tim Gross fbaf4c8b69
node pools: implement support in scheduler (#17443)
Implement scheduler support for node pool:

* When a scheduler is invoked, we get a set of the ready nodes in the DCs that
  are allowed for that job. Extend the filter to include the node pool.
* Ensure that changes to a job's node pool are picked up as destructive
  allocation updates.
* Add `NodesInPool` as a metric to all reporting done by the scheduler.
* Add the node-in-pool the filter to the `Node.Register` RPC so that we don't
  generate spurious evals for nodes in the wrong pool.
2023-06-07 10:39:03 -04:00
Luiz Aoqui 5878113c41
node pool: implement `nomad node pool nodes` CLI (#17444) 2023-06-07 10:37:27 -04:00
Tim Gross 06fc284644
node pools: implement CLI for `node pool jobs` command (#17432) 2023-06-06 15:02:26 -04:00
Tim Gross c0f2295510
node pools: implement HTTP API to list jobs in pool (#17431)
Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including
the `api` package for the public API and documentation.

Update the `NodePool.ListJobs` RPC to fix the missing handling of the special
"all" pool.
2023-06-06 11:40:13 -04:00
Luiz Aoqui 2420c93179
node pools: list nodes in pool (#17413) 2023-06-06 10:43:43 -04:00
Jerome Eteve c26f01eefd
client checks kernel module in /sys/module for WSL2 bridge networking (#17306) 2023-06-06 10:26:50 -04:00
Luiz Aoqui aa1b33d157
node pools: add event stream support (#17412) 2023-06-06 10:14:47 -04:00
Dao Thanh Tung 7c7f2d00bb
Add check for missing `path` in client `host_volume` config (#17393) 2023-06-05 19:31:19 -04:00
Tim Gross 2d16ec6c6f
node pools: implement RPC to list jobs in a given node pool (#17396)
Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on
the existing `Job.List` RPC.
2023-06-05 15:36:52 -04:00
Seth Hoenig d1d4d22f8e
test: ensure cpuset cgroup is setup before fingerprinting (#17428)
This PR fixes a racey test where we need to ensure the cpuset cgroup
is setup before trying to fingerprint it.
2023-06-05 14:15:00 -05:00
Luiz Aoqui 700168e136
node pools: fix node upsert and state mutation tests (#17430) 2023-06-05 14:58:32 -04:00
Phil Renaud f348121ec7
[ui] Remove Ember Assets Github Actions workflow (#17426)
* Remove Ember Assets gha workflow

* PR write added to permissions
2023-06-05 13:52:20 -04:00
hashicorp-copywrite[bot] 0f4532f138
[COMPLIANCE] Add Copyright and License Headers (#17429)
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
2023-06-05 13:23:59 -04:00
dependabot[bot] 2f4fe019db
build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7 (#16228)
* build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7

Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.3.6 to 1.3.7.
- [Release notes](https://github.com/etcd-io/bbolt/releases)
- [Commits](https://github.com/etcd-io/bbolt/compare/v1.3.6...v1.3.7)

---
updated-dependencies:
- dependency-name: go.etcd.io/bbolt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: update cl for bbolt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 10:19:14 -05:00
dependabot[bot] b83a26a8d8
build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1 (#16227)
Bumps [github.com/dustin/go-humanize](https://github.com/dustin/go-humanize) from 1.0.0 to 1.0.1.
- [Release notes](https://github.com/dustin/go-humanize/releases)
- [Commits](https://github.com/dustin/go-humanize/compare/v1.0.0...v1.0.1)

---
updated-dependencies:
- dependency-name: github.com/dustin/go-humanize
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 10:17:04 -05:00
dependabot[bot] c585cc68db
build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0 (#17421)
* build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0

Bumps [github.com/hashicorp/raft](https://github.com/hashicorp/raft) from 1.3.11 to 1.5.0.
- [Release notes](https://github.com/hashicorp/raft/releases)
- [Changelog](https://github.com/hashicorp/raft/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/raft/compare/v1.3.11...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/raft
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* cl: add cl for raft 1.5.0

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-06-05 09:03:02 -05:00
dependabot[bot] ff4c2e2ea0
build(deps): bump google.golang.org/protobuf from 1.28.1 to 1.30.0 (#17420)
Bumps google.golang.org/protobuf from 1.28.1 to 1.30.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 08:57:33 -05:00
KamilCuk cc64281445
Add group_add docker option (#17313) 2023-06-02 20:26:01 -04:00
dependabot[bot] fd52020560
build(deps): bump github.com/shirou/gopsutil/v3 from 3.23.1 to 3.23.4 (#17338)
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.23.1 to 3.23.4.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.23.1...v3.23.4)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-02 19:30:59 -04:00
Luiz Aoqui 6039c18ab6
node pools: register a node in a node pool (#17405) 2023-06-02 17:50:50 -04:00
Luiz Aoqui b770f2b1ef
node pools: implement CLI (#17388) 2023-06-02 15:49:57 -04:00
hc-github-team-es-release-engineering 6758379e48
ci: finish migration from CCI to GHA (#17103)
namely, these workflows:
  test-e2e, test-ui, and test-windows

extra-curricularly, as part of the overall
migration effort company-wide, this also includes
some standardization such as:
 * explicit permissions:read on various workflows
 * pinned action version shas (per https://github.com/hashicorp/security-public-tsccr)
 * actionlint, which among other things runs
   shellcheck on GHA run steps

Co-authored-by: emilymianeil <eneil@hashicorp.com>
Co-authored-by: Daniel Kimsey <daniel.kimsey@hashicorp.com>
2023-06-02 14:35:55 -05:00
Daniel Bennett f7e316e9cd
tests: enable newer windows (#17401)
* "allow" (don't try to drop) linux capabilities
  in the docker test driver harness (see #15181)
* refactor to allow different busybox images
  since windows containers need to be the same
  version as the underlying OS, and we're
  moving from 2016 to 2019
* one docker test was flaky from apparently
  being a bit slower on windows, so add Wait()
2023-06-02 11:38:38 -05:00
Luiz Aoqui 3a962d07f8
np: fix node pool search permission check (#17400)
When checking if a token is allowed to query the search endpoints we
need to return an error if the search context includes `node_pool` and
the token doesn't have access to _any_ pool. This prevents returning an
empty list instead of a permission denied error.
2023-06-02 12:22:47 -04:00
Phil Renaud 03dc959c2e
UI GHA test changes re-implemented (#17399) 2023-06-02 11:59:08 -04:00
Samantha b92a782b6e
check: Add support for Consul field tls_server_name (#17334) 2023-06-02 10:19:12 -04:00
Tim Gross 56e9b944e8
node pools: validate pool exists on job registration (#17386)
Add a new job admission hook for node pools that enforces the pool exists on
registration. Also provide the skeleton function we need for Enterprise
enforcement functions we'll implement later.
2023-06-02 09:32:07 -04:00
Luiz Aoqui f755b9469f
core: refactor task validation (#17344)
Move all validations related to task fields to Task.Validate(). Prior to
this, some task validations were being done inside TaskGroup.Validate()
because they required access to some group values.

But similarly to how TaskGroup.Validate() tasks the job as parameter,
it's fair to expect the task to receive its group.
2023-06-01 19:26:42 -04:00
Luiz Aoqui 4be8d7c049
core: fix kill_timeout validation when progress_deadline is 0 (#17342) 2023-06-01 19:01:32 -04:00
Luiz Aoqui 9bb57c08e3
node pool: add search support (#17385) 2023-06-01 17:48:14 -04:00
Tim Gross 4f14fa0518
node pools: add `node_pool` field to job spec (#17379)
This changeset only adds the `node_pool` field to the jobspec, and ensures that
it gets picked up correctly as a change. Without the rest of the implementation
landed yet, the field will be ignored.
2023-06-01 16:08:55 -04:00