open-nomad

Author	SHA1	Message	Date
Tim Gross	5b9322c70a	docs: clarify node pool apply/delete behavior (#17529 )	2023-06-14 15:58:53 -04:00
Tim Gross	dc9fae34ca	node pools: add pool as label on client metrics (#17528 ) This changeset adds the node pool as a label anywhere we're already emitting labels with additional information such as node class or ID about the client.	2023-06-14 15:58:38 -04:00
Tim Gross	5f509b8ce0	cli: fix missing `-quiet` flag for `var init` (#17526 ) The `var init` command was intended to have support for a `-quiet` flag but it was not documented and never parsed.	2023-06-14 14:52:46 -04:00
Tim Gross	736ad3ed32	docs: note namespace apply/delete behaviors, fix metric (#17527 ) This changeset includes some fixes to documentation discovered while working on node pools, but we didn't want to include in the node pool PRs so they can get backported easily: * namespace apply/delete commands are forwarded to the authoritative region * deleting a namespace requires there are no non-terminal jobs in any of the federated regions * fixed a typo in the name of the `nomad.client.allocated.disk` metric	2023-06-14 14:52:06 -04:00
Phil Renaud	7400c37b89	[ui] Job status panel: tooltips on individual allocs (#17514 ) * Tooltip on individual allocs in the panel * Isolate allocation cells to their own component * Tipsy trigger * Aria label for failed-or-lost tooltips * Buildfix * Try adding percy exec back to exam run	2023-06-14 12:45:36 -04:00
Luiz Aoqui	ec80d051d8	client: fix panic on alloc stop in non-Linux environments (#17515 ) Provide a no-op implementation of the drivers.DriverNetworkManager interface to be used by systems that don't support network isolation and prevent panics where a network manager is expected.	2023-06-14 10:22:38 -04:00
James Rasell	70fc0fd701	build: add agent bindata file to copywrite ignore list. (#17507 )	2023-06-14 11:13:59 +01:00
Tim Gross	c1a01697c8	node pools: implement `node pool init` command (#17479 ) Implement a `nomad node pool init` command that generates an example spec file in either HCL or JSON format.	2023-06-13 14:51:29 -04:00
Luiz Aoqui	bc17cffaef	node pool: node pool upsert on multiregion node register (#17503 ) When registering a node with a new node pool in a non-authoritative region we can't create the node pool because this new pool will not be replicated to other regions. This commit modifies the node registration logic to only allow automatic node pool creation in the authoritative region. In non-authoritative regions, the client is registered, but the node pool is not created. The client is kept in the `initializing` status until its node pool is created in the authoritative region and replicated to the client's region.	2023-06-13 11:28:28 -04:00
Tim Gross	952eb2713e	node pools: protect against deleting occupied pools (#17457 ) We don't want to delete node pools that have nodes or non-terminal jobs. Add a check in the `DeleteNodePools` RPC to check locally and in federated regions, similar to how we check that it's safe to delete namespaces.	2023-06-13 09:57:42 -04:00
stswidwinski	9a58474400	conf: Add preemption_config to the server extra HCL keys which should be removed (#17481 ) Add preemption_config to the set of keys which should be pruned from the server config as described in #17480.	2023-06-13 10:48:19 +02:00
Daniel Bennett	fa8b102092	ci: remove circleci (#17502 ) all of our workflows are in GitHub Actions now 🎉	2023-06-12 16:28:19 -05:00
Tim Gross	e8a361310f	node pools: replicate from authoritative region (#17456 ) Upserts and deletes of node pools are forwarded to the authoritative region, just like we do for namespaces, quotas, ACL policies, etc. Replicate node pools from the authoritative region.	2023-06-12 13:24:24 -04:00
dependabot[bot]	d45bb4bab9	build(deps): bump github.com/hashicorp/go-plugin from 1.4.9 to 1.4.10 (#17486 )	2023-06-12 14:22:33 +01:00
Tim Gross	bb7f0edd6a	node pools: prevent panic on upsert during upgrades (#17474 ) Whenever we write a Raft log entry for node pools, we need to first make sure that all servers can safely apply the log without panicking. Gate upsert and delete RPCs on all servers being upgraded to the minimum version.	2023-06-12 09:01:30 -04:00
Tim Gross	e3a37c0b97	replication: fix potential panic during upgrades (#17476 ) If the authoritative region has been upgraded to a version of Nomad that has new replicated objects (such as ACL Auth Methods, ACL Binding Rules, etc.), the non-authoritative regions will start replicating those objects as soon as their leader is upgraded. If a server in the non-authoritative region is upgraded and then becomes the leader before all the other servers in the region have been upgraded, then it will attempt to write a Raft log entry that the followers don't understand. The followers will then panic. Add the same minimum version checks that we do for RPC writes to the leader's replication loop.	2023-06-12 08:53:56 -04:00
dependabot[bot]	8bd3bdab42	build(deps): bump github.com/shoenig/go-m1cpu from 0.1.5 to 0.1.6 (#17487 )	2023-06-12 12:08:16 +01:00
dependabot[bot]	c1f5ffb3bc	build(deps): bump github.com/fatih/color from 1.13.0 to 1.15.0 (#17485 )	2023-06-12 10:44:18 +01:00
Phil Renaud	6a9df6e3ab	[ui] Don't show a service as healthy when its parent alloc is not running (#17465 ) * Fix: dont show a service as healthy when its parent alloc is not running * Test for Health Unknown	2023-06-09 15:43:11 -04:00
Piotr Kazmierczak	57dad0ca07	docs: corrections and additional information for OIDC-related concepts (#17470 )	2023-06-09 16:50:22 +02:00
Piotr Kazmierczak	0a4052ece5	docs: add missing login API endpoint documentation (#17467 )	2023-06-09 15:59:01 +02:00
Seth Hoenig	557a6b4a5e	docker: stop network pause container of lost alloc after node restart (#17455 ) This PR fixes a bug where the docker network pause container would not be stopped and removed in the case where a node is restarted, the alloc is moved to another node, and the node comes back up. See the issue below for full repro conditions. Basically in the DestroyNetwork PostRun hook we would depend on the NetworkIsolationSpec field not being nil, which is only the case if the Client stays alive all the way from network creation to network teardown. If the node is rebooted we lose that state and previously would not be able to find the pause container to remove. Now, we manually find the pause container by scanning them and looking for the associated allocID. Fixes #17299	2023-06-09 08:46:29 -05:00
Phil Renaud	944f30674d	[ui] Parallelize ember tests (#17442 ) * Exam to parallelize tests * Logging to try to solve test flakiness * Logging in another failure * Hardening for one test and snapshot for another * Explicitly set the first one as the servicedAlloc instead of randomly picking * A wild CircleCI test failure appears * de-log	2023-06-07 17:01:35 -04:00
Seth Hoenig	134e70cbab	client: fix client panic during drain caused by shutdown (#17450 ) During shutdown of a client with drain_on_shutdown there is a race between the Client ending the cgroup and the task's cpuset manager cleaning up the cgroup. During the path traversal, skip anything we cannot read, which avoids the nil DirEntry we try to dereference now.	2023-06-07 15:12:44 -05:00
Tim Gross	64a4c6204a	build: update to go1.20.5 (#17451 ) Go released a security update to fix build-time code injection and execution via CGO. This doesn't impact already-released versions of Nomad, just the build toolchain, so we won't be releasing a Nomad security update to go with it.	2023-06-07 11:44:59 -04:00
Tim Gross	fbaf4c8b69	node pools: implement support in scheduler (#17443 ) Implement scheduler support for node pool: * When a scheduler is invoked, we get a set of the ready nodes in the DCs that are allowed for that job. Extend the filter to include the node pool. * Ensure that changes to a job's node pool are picked up as destructive allocation updates. * Add `NodesInPool` as a metric to all reporting done by the scheduler. * Add the node-in-pool filter to the `Node.Register` RPC so that we don't generate spurious evals for nodes in the wrong pool.	2023-06-07 10:39:03 -04:00
Luiz Aoqui	5878113c41	node pool: implement `nomad node pool nodes` CLI (#17444 )	2023-06-07 10:37:27 -04:00
Tim Gross	06fc284644	node pools: implement CLI for `node pool jobs` command (#17432 )	2023-06-06 15:02:26 -04:00
Tim Gross	c0f2295510	node pools: implement HTTP API to list jobs in pool (#17431 ) Implements the HTTP API associated with the `NodePool.ListJobs` RPC, including the `api` package for the public API and documentation. Update the `NodePool.ListJobs` RPC to fix the missing handling of the special "all" pool.	2023-06-06 11:40:13 -04:00
Luiz Aoqui	2420c93179	node pools: list nodes in pool (#17413 )	2023-06-06 10:43:43 -04:00
Jerome Eteve	c26f01eefd	client checks kernel module in /sys/module for WSL2 bridge networking (#17306 )	2023-06-06 10:26:50 -04:00
Luiz Aoqui	aa1b33d157	node pools: add event stream support (#17412 )	2023-06-06 10:14:47 -04:00
Dao Thanh Tung	7c7f2d00bb	Add check for missing `path` in client `host_volume` config (#17393 )	2023-06-05 19:31:19 -04:00
Tim Gross	2d16ec6c6f	node pools: implement RPC to list jobs in a given node pool (#17396 ) Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on the existing `Job.List` RPC.	2023-06-05 15:36:52 -04:00
Seth Hoenig	d1d4d22f8e	test: ensure cpuset cgroup is set up before fingerprinting (#17428 ) This PR fixes a racy test where we need to ensure the cpuset cgroup is set up before trying to fingerprint it.	2023-06-05 14:15:00 -05:00
Luiz Aoqui	700168e136	node pools: fix node upsert and state mutation tests (#17430 )	2023-06-05 14:58:32 -04:00
Phil Renaud	f348121ec7	[ui] Remove Ember Assets Github Actions workflow (#17426 ) * Remove Ember Assets gha workflow * PR write added to permissions	2023-06-05 13:52:20 -04:00
hashicorp-copywrite[bot]	0f4532f138	[COMPLIANCE] Add Copyright and License Headers (#17429 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-05 13:23:59 -04:00
dependabot[bot]	2f4fe019db	build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7 (#16228 ) * build(deps): bump go.etcd.io/bbolt from 1.3.6 to 1.3.7 Bumps [go.etcd.io/bbolt](https://github.com/etcd-io/bbolt) from 1.3.6 to 1.3.7. - [Release notes](https://github.com/etcd-io/bbolt/releases) - [Commits](https://github.com/etcd-io/bbolt/compare/v1.3.6...v1.3.7) --- updated-dependencies: - dependency-name: go.etcd.io/bbolt dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * cl: update cl for bbolt --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-05 10:19:14 -05:00
dependabot[bot]	b83a26a8d8	build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1 (#16227 ) Bumps [github.com/dustin/go-humanize](https://github.com/dustin/go-humanize) from 1.0.0 to 1.0.1. - [Release notes](https://github.com/dustin/go-humanize/releases) - [Commits](https://github.com/dustin/go-humanize/compare/v1.0.0...v1.0.1) --- updated-dependencies: - dependency-name: github.com/dustin/go-humanize dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-05 10:17:04 -05:00
dependabot[bot]	c585cc68db	build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0 (#17421 ) * build(deps): bump github.com/hashicorp/raft from 1.3.11 to 1.5.0 Bumps [github.com/hashicorp/raft](https://github.com/hashicorp/raft) from 1.3.11 to 1.5.0. - [Release notes](https://github.com/hashicorp/raft/releases) - [Changelog](https://github.com/hashicorp/raft/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/raft/compare/v1.3.11...v1.5.0) --- updated-dependencies: - dependency-name: github.com/hashicorp/raft dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * cl: add cl for raft 1.5.0 --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-05 09:03:02 -05:00
dependabot[bot]	ff4c2e2ea0	build(deps): bump google.golang.org/protobuf from 1.28.1 to 1.30.0 (#17420 ) Bumps google.golang.org/protobuf from 1.28.1 to 1.30.0. --- updated-dependencies: - dependency-name: google.golang.org/protobuf dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-05 08:57:33 -05:00
KamilCuk	cc64281445	Add group_add docker option (#17313 )	2023-06-02 20:26:01 -04:00
dependabot[bot]	fd52020560	build(deps): bump github.com/shirou/gopsutil/v3 from 3.23.1 to 3.23.4 (#17338 ) Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.23.1 to 3.23.4. - [Release notes](https://github.com/shirou/gopsutil/releases) - [Commits](https://github.com/shirou/gopsutil/compare/v3.23.1...v3.23.4) --- updated-dependencies: - dependency-name: github.com/shirou/gopsutil/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-02 19:30:59 -04:00
Luiz Aoqui	6039c18ab6	node pools: register a node in a node pool (#17405 )	2023-06-02 17:50:50 -04:00
Luiz Aoqui	b770f2b1ef	node pools: implement CLI (#17388 )	2023-06-02 15:49:57 -04:00
hc-github-team-es-release-engineering	6758379e48	ci: finish migration from CCI to GHA (#17103 ) namely, these workflows: test-e2e, test-ui, and test-windows extra-curricularly, as part of the overall migration effort company-wide, this also includes some standardization such as: * explicit permissions:read on various workflows * pinned action version shas (per https://github.com/hashicorp/security-public-tsccr) * actionlint, which among other things runs shellcheck on GHA run steps Co-authored-by: emilymianeil <eneil@hashicorp.com> Co-authored-by: Daniel Kimsey <daniel.kimsey@hashicorp.com>	2023-06-02 14:35:55 -05:00
Daniel Bennett	f7e316e9cd	tests: enable newer windows (#17401 ) * "allow" (don't try to drop) linux capabilities in the docker test driver harness (see #15181) * refactor to allow different busybox images since windows containers need to be the same version as the underlying OS, and we're moving from 2016 to 2019 * one docker test was flaky from apparently being a bit slower on windows, so add Wait()	2023-06-02 11:38:38 -05:00
Luiz Aoqui	3a962d07f8	np: fix node pool search permission check (#17400 ) When checking if a token is allowed to query the search endpoints we need to return an error if the search context includes `node_pool` and the token doesn't have access to _any_ pool. This prevents returning an empty list instead of a permission denied error.	2023-06-02 12:22:47 -04:00
Phil Renaud	03dc959c2e	UI GHA test changes re-implemented (#17399 )	2023-06-02 11:59:08 -04:00

24720 commits