open-nomad

Commit Graph

Author	SHA1	Message	Date
Piotr Kazmierczak	a9230fb0b7	acl: JWT auth method	2023-03-30 09:39:56 +02:00
Tim Gross	76284a09a0	docker: move pause container recovery to after `SetConfig` (#16713 ) When we added recovery of pause containers in #16352 we called the recovery function from the plugin factory function. But in our plugin setup protocol, a plugin isn't ready for use until we call `SetConfig`. This meant that recovering pause containers was always done with the default config. Setting up the Docker client only happens once, so setting the wrong config in the recovery function also means that all other Docker API calls will use the default config. Move the `recoveryPauseContainers` call into the `SetConfig`. Fix the error handling so that we return any error but also don't log when the context is canceled, which happens twice during normal startup as we fingerprint the driver.	2023-03-29 16:20:37 -04:00
dependabot[bot]	afa9608475	build(deps): bump github.com/opencontainers/runc from 1.1.4 to 1.1.5 (#16712 ) * build(deps): bump github.com/opencontainers/runc from 1.1.4 to 1.1.5 Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.4 to 1.1.5. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/v1.1.5/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.1.4...v1.1.5) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * changelog entry --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-03-29 15:05:05 -04:00
Juana De La Cuesta	dd770027df	fix: clean the output writer to avoid duplicates when testing for json output (#16619 )	2023-03-29 12:05:23 +02:00
Max Fröhlich	ba590b081e	docs: mention Nomad Admission Control Proxy (#16702 )	2023-03-28 15:18:26 -04:00
Tim Gross	f22ff2b847	docs: clarify capabilities options for `docker` driver (#16693 ) The `docker` driver cannot expand capabilities beyond the default set when the task is a non-root user. Clarify this in the documentation of `allow_caps` and update the `cap_add` and `cap_drop` to match the `exec` driver, which has more clear language overall.	2023-03-28 13:32:08 -04:00
Elvis Pranskevichus	11a9bb6ce7	drivers/exec: Fix handling of capabilities for unprivileged tasks (#16643 ) Currently, the `exec` driver is only setting the Bounding set, which is not sufficient to actually enable the requisite capabilities for the task process. In order for the capabilities to survive `execve` performed by libcontainer, the `Permitted`, `Inheritable`, and `Ambient` sets must also be set. Per CAPABILITIES (7): > Ambient: This is a set of capabilities that are preserved across an > execve(2) of a program that is not privileged. The ambient capability > set obeys the invariant that no capability can ever be ambient if it > is not both permitted and inheritable.	2023-03-28 12:12:55 -04:00
James Rasell	17fd1a2e35	dev: make cni, consul, dev, docker, and vault scripts Lima compat. (#16689 )	2023-03-28 16:21:14 +01:00
Seth Hoenig	87f4b71df0	client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips (#16672 ) * client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips This PR adds detection of asymmetric core types (Power & Efficiency) (P/E) when running on M1/M2 Apple Silicon CPUs. This functionality is provided by shoenig/go-m1cpu which makes use of the Apple IOKit framework to read undocumented registers containing CPU performance data. Currently working on getting that functionality merged upstream into gopsutil, but gopsutil would still not support detecting P vs E cores like this PR does. Also refactors the CPUFingerprinter code to handle the mixed core types, now setting power vs efficiency cpu attributes. For now the scheduler is still unaware of mixed core types - on Apple platforms tasks cannot reserve cores anyway so it doesn't matter, but at least now the total CPU shares available will be correct. Future work should include adding support for detecting P/E cores on the latest and upcoming Intel chips, where computation of total cpu shares is currently incorrect. For that, we should also include updating the scheduler to be core-type aware, so that tasks of resources.cores on Linux platforms can be assigned the correct number of CPU shares for the core type(s) they have been assigned. node attributes before: cpu.arch = arm64; cpu.modelname = Apple M2 Pro; cpu.numcores = 12; cpu.reservablecores = 0; cpu.totalcompute = 1000. node attributes after: cpu.arch = arm64; cpu.frequency.efficiency = 2424; cpu.frequency.power = 3504; cpu.modelname = Apple M2 Pro; cpu.numcores.efficiency = 4; cpu.numcores.power = 8; cpu.reservablecores = 0; cpu.totalcompute = 37728. * fingerprint/cpu: follow up cr items	2023-03-28 08:27:58 -05:00
James Rasell	a18e480a57	dev: modify Go install to support arch64 and non-vagrant machines. (#16651 )	2023-03-28 14:18:48 +01:00
Tim Gross	78acc75b57	docs: add notes about keyring to snapshot restore (#16663 ) When cluster administrators restore from Raft snapshot, they also need to ensure the keyring is in place. For on-prem users doing in-place upgrades this is less of a concern but for typical cloud workflows where the whole host is replaced, it's an important warning (at least until #14852 has been implemented).	2023-03-28 08:31:01 -04:00
Tim Gross	a953456460	docs: fix template retry attempts default documentation (#16667 ) The configuration docs for `client.template.vault_retry`, `consul_retry`, and `nomad_retry` incorrectly document the default number of attempts to be unlimited (0). When we added these config blocks, we defaulted the fields to `nil` for backwards compatibility, which causes them to fall back to the default consul-template configuration values.	2023-03-28 08:27:06 -04:00
James Rasell	a53f9a4094	docs: fix-up legacy link in client config page. (#16678 )	2023-03-28 09:32:34 +01:00
Tobias Birkefeld	581eba9f41	docs: fix link of Read Stats API (#16673 ) The former link results in a 404. Update the link to the correct developer docs.	2023-03-28 08:49:44 +01:00
James Rasell	28c142c1a6	dev: account for non-vagrant machines on Linux config priv. (#16657 )	2023-03-27 17:13:18 +01:00
Juana De La Cuesta	320884b8ee	Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true (#16583 ) * Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children. * style: refactor force run function * fix: remove defer and inline unlock for speed optimization * Update nomad/leader.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * style: refactor tests to use must * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * fix: move back from defer to calling unlock before returning. createEval can't be called with the lock on * style: refactor test to use must * added new entry to changelog and update comments --------- Co-authored-by: James Rasell <jrasell@hashicorp.com> Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-27 17:25:05 +02:00
Juana De La Cuesta	21b675244e	style: rename ForceRun to ForceEval, for clarity (#16617 )	2023-03-27 15:38:48 +02:00
Luiz Aoqui	8070882c4b	scheduler: fix reconciliation of reconnecting allocs (#16609 ) When a disconnected client reconnects the `allocReconciler` must find the allocations that were created to replace the original disconnected allocations. This process was being done in only a subset of non-terminal untainted allocations, meaning that, if the replacement allocations were not in this state the reconciler didn't stop them, leaving the job in an inconsistent state. This inconsistency is only solved in a future job evaluation, but at that point the allocation is considered reconnected and so the specific reconnection logic was not applied, leading to unexpected outcomes. This commit fixes the problem by running reconnecting allocation reconciliation logic earlier into the process, leaving the rest of the reconciler oblivious of reconnecting allocations. It also uses the full set of allocations to search for replacements, stopping them even if they are not in the `untainted` set. The `SystemScheduler` is not affected by this bug because disconnected clients don't trigger replacements: every eligible client is already running an allocation.	2023-03-24 19:38:31 -04:00
ron-savoia	743414739d	docs: added section of needed ACL rules for Nomad UI (#16494 )	2023-03-24 08:57:16 -04:00
Luiz Aoqui	e5d31bca61	cli: job restart command (#16278 ) Implement the new `nomad job restart` command that allows operators to restart allocation tasks or reschedule the entire allocation. Restarts can be batched to target multiple allocations in parallel. Between each batch the command can stop and hold for a predefined time or until the user confirms that the process should proceed. This implements the "Stateless Restarts" alternative from the original RFC (https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180). The original concept is still worth implementing, as it allows this functionality to be exposed over an API that can be consumed by the Nomad UI and other clients. But the implementation turned out to be more complex than we initially expected so we thought it would be better to release a stateless CLI-based implementation first to gather feedback and validate the restart behaviour. Co-authored-by: Shishir Mahajan <smahajan@roblox.com>	2023-03-23 18:28:26 -04:00
Luiz Aoqui	4ccd999304	ci: send notification when prepare is complete (#16627 )	2023-03-23 17:34:45 -04:00
Tim Gross	977c88dcea	drainer: test refactoring to clarify behavior around delete/down nodes (#16612 ) This changeset refactors the tests of the draining node watcher so that we don't mock the node watcher's `Remove` and `Update` methods for its own tests. Instead we'll mock the node watcher's dependencies (the job watcher and deadline notifier) and now unit tests can cover the real code. This allows us to remove a bunch of TODOs in `watch_nodes.go` around testing and clarify some important behaviors: * Nodes that are down or disconnected will still be watched until the scheduler decides what to do with their allocations. This will drive the job watcher but not the node watcher, and that lets the node watcher gracefully handle cases where a heartbeat fails but the node heartbeats again before its allocs can be evicted. * Stop watching nodes that have been deleted. The blocking query for nodes set the maximum index to the highest index of a node it found, rather than the index of the nodes table. This misses updates to the index from deleting nodes. This was done as a performance optimization to avoid excessive unblocking, but because the query is over all nodes anyways there's no optimization to be had here. Remove the optimization so we can detect deleted nodes without having to wait for an update to an unrelated node.	2023-03-23 14:07:09 -04:00
Michael Schurter	5e6799164f	Post 1.5.2 release (#16614 ) * Generate files for 1.5.2 release * Prepare for next release * add 1.4.7 and 1.3.12 to the changelog --------- Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2023-03-22 14:23:38 -07:00
Phil Renaud	11de45d17b	[ui] Copyable server and client attribute values (#16548 ) * Copyable server and client attribute values * Changelog	2023-03-22 15:05:01 -04:00
Juana De La Cuesta	5892839c83	Fix broken test for quotas CLI (#16610 ) * fix: fix broken test * fix: fix broken test for quota status	2023-03-22 19:07:37 +01:00
James Rasell	7dd1484757	docs: detail support for Nomad checks in service block. (#16598 )	2023-03-22 09:27:58 +01:00
Michael Schurter	d2aa8fcdc7	taskapi: use HasSuffix to detect errors from rpcs (#16594 ) Matches the "normal" HTTP error detection logic in the same file.	2023-03-21 14:38:07 -07:00
Michael Schurter	4678dc7b4d	e2e: sleep to ensure logs are picked up (#16596 ) :(	2023-03-21 14:10:50 -07:00
Tim Gross	ad774ccfa1	E2E: fix events tests (#16595 ) In #12916 we updated the events test as part of a larger set of changes around mapstructure serialization fixes. But the changes to the jobs we're deploying in the tests had invalid task configs so they never result in good deployments and the test will always fail. Make the before/after jobs identical (except for the version bump) and make them valid. Also wait for allocations for the 2nd job run to appear before checking the deployment list, so that we don't race with the scheduler.	2023-03-21 14:01:40 -07:00
Michael Schurter	15fe2ade18	Windows fixes for e2e tests (#16592 ) * e2e: skip task api test when windows too old * e2e: don't run proxy on windows	2023-03-21 13:55:32 -07:00
Suselz	b3d2ec7634	Update csi_plugin.mdx (#16584 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-21 16:16:18 +01:00
Tim Gross	1763622dfd	contrib: architecture guide to the drainer (#16569 ) The drainer component is fairly complex. As part of upcoming work to fix some of the drainer's rough edges, document the drainer's architecture from a Nomad developer perspective.	2023-03-21 09:17:24 -04:00
Luiz Aoqui	518fd610b3	changelog: update #16427 to improvement (#16565 ) The security fix in Go 1.20.2 does not apply to Nomad.	2023-03-20 21:24:53 -04:00
Michael Schurter	f8884d8b52	client/metadata: fix crasher caused by AllowStale = false (#16549 ) Fixes #16517 Given a 3 Server cluster with at least 1 Client connected to Follower 1: If a NodeMeta.{Apply,Read} for the Client request is received by Follower 1 with `AllowStale = false` the Follower will forward the request to the Leader. The Leader, not being connected to the target Client, will forward the RPC to Follower 1. Follower 1, seeing AllowStale=false, will forward the request to the Leader. The Leader, not being connected to... well hopefully you get the picture: an infinite loop occurs.	2023-03-20 16:32:32 -07:00
Tim Gross	d1b35c6bd0	contrib: mock driver (#16573 )	2023-03-20 16:35:32 -04:00
James Rasell	2f4680680f	dev: remove use of cfssl and use Nomad CLI for TLS certs. (#16145 )	2023-03-20 17:06:15 +01:00
James Rasell	4825b40e9a	docs: remove Java and Scala SDKs from supported list. (#16555 )	2023-03-20 15:35:02 +01:00
Phil Renaud	ccce4b68f2	[ui] Perform common job tasks with keyboard shortcuts (#16378 ) * Throw your mouse into traffic * Add node metadata with a shortcut * Re-labelled * Adds a toast notification to job start/stop on keyboard shortcut * Typo fix	2023-03-20 09:24:39 -04:00
Juana De La Cuesta	47be374bbd	Add `-json` flag to `quota inspect` command (#16478 ) * Added and flag to command * cli[style]: small refactor to avoid confusion with tmpl variable * Update inspect.mdx * cli: add changelog entry * Update .changelog/16478.txt Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update command/quota_inspect.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-20 10:40:51 +01:00
Juana De La Cuesta	ed44f50091	cli: add `-json` and `-t` flags to `quota status` command (#16485 ) * cli: add json and t flags to quota status command * cli: add entry to changelog * Update command/quota_status.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-20 10:39:56 +01:00
Juana De La Cuesta	eeb3766575	cli: Add `-json` and `-t` flags to `server members` command (#16444 ) * cli: Add `-json` and `-t` flags to server members * Update website/content/docs/commands/server/members.mdx Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update website/content/docs/commands/server/members.mdx Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * cli: update the server members tests to use must * cli: add flags addition to changelog --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-20 10:39:24 +01:00
Adam Pugh	e4e53872be	Spelling update (#16553 ) updated propogating to propagating	2023-03-20 09:24:41 +01:00
Seth Hoenig	d6dcc53c0a	tls enforcement flaky tests (#16543 ) * tests: add WaitForLeaders helpers using must/wait timings * tests: start servers for mtls tests together Fixes #16253 (hopefully)	2023-03-17 14:11:13 -05:00
Piotr Kazmierczak	0a2b425eb5	cli: nomad login command should not require a -type flag and should respect default auth method (#16504 ) nomad login command does not need to know ACL Auth Method's type, since all method names are unique. Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-17 19:14:28 +01:00
Seth Hoenig	07543f8bdf	nsd: always set deregister flag after deregistration of group (#16289 ) * services: always set deregister flag after deregistration of group This PR fixes a bug where the group service hook's deregister flag was not set in some cases, causing the hook to attempt deregistrations twice during job updates (alloc replacement). In the tests ... we used to assert on the wrong behavior (remove twice) which has now been corrected to assert we remove only once. This bug was "silent" in the Consul provider world because the error logs for double deregistration only show up in Consul logs; with the Nomad provider the error logs are in the Nomad agent logs. * services: cleanup group service hook tests	2023-03-17 09:44:21 -05:00
Piotr Kazmierczak	14927e93bc	acl: fix canonicalization of OIDC auth method mock (#16534 )	2023-03-17 15:37:54 +01:00
James Rasell	4a5d7d3793	docs: add binding-rule selector escape example on Windows PS (#16273 )	2023-03-17 15:13:35 +01:00
Michael Schurter	a875bad6e5	Enable ACLs on E2E test clients (#16530 ) * e2e: uniformly enable acls across all agents * docs: clarify that acls should be set everywhere	2023-03-16 14:22:41 -07:00
Tim Gross	ec47b245d0	client: don't use `Status` RPC for Consul discovery (#16490 ) In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s.	2023-03-16 15:38:33 -04:00
Seth Hoenig	5b1970468e	artifact: git needs more files for private repositories (#16508 ) * landlock: git needs more files for private repositories This PR fixes artifact downloading so that git may work when cloning from private repositories. It needs - file read on /etc/passwd - dir read on /root/.ssh - file write on /root/.ssh/known_hosts Add these rules to the landlock rules for the artifact sandbox. * cr: use nonexistent instead of devnull Co-authored-by: Michael Schurter <mschurter@hashicorp.com> * cr: use go-homedir for looking up home directory * pr: pull go-homedir into explicit require * cr: fixup homedir tests in homeless root cases * cl: fix root test for real --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-03-16 12:22:25 -05:00

24460 Commits

All Branches