open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	c3d9c598f5	Merge release 1.5.3 files	2023-04-05 12:32:00 -04:00
hc-github-team-nomad-core	3578078caf	Prepare for next release	2023-04-05 12:31:42 -04:00
hc-github-team-nomad-core	b64ee2726d	Generate files for 1.5.3 release	2023-04-05 12:31:30 -04:00
Tim Gross	66a01bb35a	upgrade go to 1.20.3	2023-04-05 12:18:19 -04:00
Tim Gross	8278f23042	acl: fix ACL bypass for anon requests that pass thru client HTTP Requests without an ACL token that pass thru the client's HTTP API are treated as though they come from the client itself. This allows bypass of ACLs on RPC requests where ACL permissions are checked (like `Job.Register`). Invalid tokens are correctly rejected. Fix the bypass by only setting a client ID on the identity if we have a valid node secret. Note that this changeset will break rate metrics for RPCs sent by clients without a client secret such as `Node.GetClientAllocs`; these requests will be recorded as anonymous. Future work should: * Ensure the node secret is sent with all client-driven RPCs except `Node.Register` which is TOFU. * Create a new `acl.ACL` object from client requests so that we can enforce ACLs for all endpoints in a uniform way that's less error-prone.~	2023-04-05 12:17:51 -04:00
Juana De La Cuesta	9b4871fece	Prevent kill_timeout greater than progress_deadline (#16761 ) * func: add validation for kill timeout smaller than progress deadline * style: add changelog * style: typo in changelog * style: remove refactored test * Update .changelog/16761.txt Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/structs/structs.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-04-04 18:17:10 +02:00
Seth Hoenig	15a2d912b3	cleanup: use jobID name rather than jobName in job endpoints (#16777 ) These endpoints all refer to JobID by the time you get to the RPC request layer, but the HTTP handler functions call the field JobName, which is a different field (... though often with the same value).	2023-04-04 09:11:58 -05:00
James Rasell	bcfb4ea1f2	cli: fix up failing quota inspect enterprise test. (#16781 )	2023-04-04 15:02:40 +01:00
James Rasell	cb6ba80f0f	cli: stream both stdout and stderr when following an alloc. (#16556 ) This update changes the behaviour when following logs from an allocation, so that both stdout and stderr files are streamed when the operator supplies the follow flag. The previous behaviour is held when all other flags and situations are provided. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-04 10:42:27 +01:00
Mike Nomitch	b5a1051fe6	Merge pull request #16575 from hashicorp/docs-add-roadmap-project Adds public roadmap project to readme	2023-04-03 08:21:13 -07:00
Tim Gross	118b703164	CSI: set mounts in alloc hook resources atomically (#16722 ) The allocrunner has a facility for passing data written by allocrunner hooks to taskrunner hooks. Currently the only consumers of this facility are the allocrunner CSI hook (which writes data) and the taskrunner volume hook (which reads that same data). The allocrunner hook for CSI volumes doesn't set the alloc hook resources atomically. Instead, it gets the current resources and then writes a new version back. Because the CSI hook is currently the only writer and all readers happen long afterwards, this should be safe but #16623 shows there's some sequence of events during restore where this breaks down. Refactor hook resources so that hook data is accessed via setters and getters that hold the mutex.	2023-04-03 11:03:36 -04:00
Tim Gross	0c582a2c94	docs: fix use of gpg to avoid teeing binary to terminal (#16767 )	2023-04-03 10:54:21 -04:00
Tim Gross	ffd5435ceb	docs: fix install instructions for apt (#16764 ) The workflow described in the docs for apt installation is deprecated. Update to match the workflow described in the Tutorials and official packaging guide.	2023-04-03 10:06:59 -04:00
Georgy Buranov	ca80546ef7	take maximum processor Mhz (#16740 ) * take maximum processor Mhz * remove break * cl: add cl for 16740 --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-03-31 11:25:32 -05:00
Juana De La Cuesta	89baa13b14	Update quota name on failing test for quota status (#16662 ) * fix: update quota name on test * Update quota_status_test.go * Update quota_status_test.go * fix: simplify template call for quota status	2023-03-31 18:07:21 +02:00
Juana De La Cuesta	1fc13b83d8	style: update documentation (#16729 )	2023-03-31 16:38:16 +02:00
Daniel Bennett	c9adc22eec	Update enterprise licensing documentation (#16615 ) updated various docs for new expiration behavior and new command `nomad license inspect` to validate pre-upgrade	2023-03-30 16:40:19 -05:00
Daniel Bennett	c42950e342	ent: move all license info into LicenseConfig{} (#16738 ) and add new TestConfigForServer() to get a valid nomad.Config to use in tests Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-03-30 16:15:05 -05:00
Horacio Monsalvo	20372b1721	connect: add meta on ConsulSidecarService (#16705 ) Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>	2023-03-30 16:09:28 -04:00
Luiz Aoqui	fa4ee68c6a	ci: use `BACKPORT_MERGE_COMMIT` option (#16730 ) Instead of attempting to pick each individual commit in a PR using `BACKPORT_MERGE_COMMIT` only picks the commit that was merged into `main`. This reduces the amount of work done during a backport, generating cleaner merges and avoiding potential issues on specific commits. With this setting PRs that are not squashed will fail to backport and must be handled manually, but those are considered exceptions.	2023-03-30 11:49:46 -04:00
Piotr Kazmierczak	1470d2ff62	Merge pull request #15897 from hashicorp/f-sso-jwt-auth-method acl: JWT as SSO auth method	2023-03-30 17:07:50 +02:00
Piotr Kazmierczak	1a5eba24a6	acl: set minACLJWTAuthMethodVersion to 1.5.3 and adjust code comment	2023-03-30 15:30:42 +02:00
Phil Renaud	e9a114e249	[ui] Web sign-in with JWT (#16625 ) * Bones of JWT detection * JWT to token pipeline complete * Some live-demo fixes for template language * findSelf and loginJWT funcs made async * Acceptance tests and mirage mocks for JWT login * [ui] Allow for multiple JWT auth methods in the UI (#16665) * Split selectable jwt methods * repositions the dropdown to be next to the input field	2023-03-30 09:40:12 +02:00
Piotr Kazmierczak	d98c8f6759	acl: rebased on main and changed the gate to 1.5.3-dev	2023-03-30 09:40:12 +02:00
Piotr Kazmierczak	acfc266c30	acl: JWT changelog entry and typo fix	2023-03-30 09:40:11 +02:00
Piotr Kazmierczak	4609119fb5	acl: JWT auth CLI (#16532 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	16b6bd9ff2	acl: fix canonicalization of JWT auth method mock (#16531 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	2b353902a1	acl: HTTP endpoints for JWT auth (#16519 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	e48c48e89b	acl: RPC endpoints for JWT auth (#15918 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	a9230fb0b7	acl: JWT auth method	2023-03-30 09:39:56 +02:00
Tim Gross	76284a09a0	docker: move pause container recovery to after `SetConfig` (#16713 ) When we added recovery of pause containers in #16352 we called the recovery function from the plugin factory function. But in our plugin setup protocol, a plugin isn't ready for use until we call `SetConfig`. This meant that recovering pause containers was always done with the default config. Setting up the Docker client only happens once, so setting the wrong config in the recovery function also means that all other Docker API calls will use the default config. Move the `recoveryPauseContainers` call into the `SetConfig`. Fix the error handling so that we return any error but also don't log when the context is canceled, which happens twice during normal startup as we fingerprint the driver.	2023-03-29 16:20:37 -04:00
dependabot[bot]	afa9608475	build(deps): bump github.com/opencontainers/runc from 1.1.4 to 1.1.5 (#16712 ) * build(deps): bump github.com/opencontainers/runc from 1.1.4 to 1.1.5 Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.4 to 1.1.5. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/v1.1.5/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.1.4...v1.1.5) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * changelog entry --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-03-29 15:05:05 -04:00
Juana De La Cuesta	dd770027df	fix: clean the output writer to avoid duplicates when testing for json output (#16619 )	2023-03-29 12:05:23 +02:00
Max Fröhlich	ba590b081e	docs: mention Nomad Admission Control Proxy (#16702 )	2023-03-28 15:18:26 -04:00
Tim Gross	f22ff2b847	docs: clarify capabilities options for `docker` driver (#16693 ) The `docker` driver cannot expand capabilities beyond the default set when the task is a non-root user. Clarify this in the documentation of `allow_caps` and update the `cap_add` and `cap_drop` to match the `exec` driver, which has more clear language overall.	2023-03-28 13:32:08 -04:00
Elvis Pranskevichus	11a9bb6ce7	drivers/exec: Fix handling of capabilities for unprivileged tasks (#16643 ) Currently, the `exec` driver is only setting the Bounding set, which is not sufficient to actually enable the requisite capabilities for the task process. In order for the capabilities to survive `execve` performed by libcontainer, the `Permitted`, `Inheritable`, and `Ambient` sets must also be set. Per CAPABILITIES (7): > Ambient: This is a set of capabilities that are preserved across an > execve(2) of a program that is not privileged. The ambient capability > set obeys the invariant that no capability can ever be ambient if it > is not both permitted and inheritable.	2023-03-28 12:12:55 -04:00
James Rasell	17fd1a2e35	dev: make cni, consul, dev, docker, and vault scripts Lima compat. (#16689 )	2023-03-28 16:21:14 +01:00
Seth Hoenig	87f4b71df0	client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips (#16672 ) * client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips This PR adds detection of asymetric core types (Power & Efficiency) (P/E) when running on M1/M2 Apple Silicon CPUs. This functionality is provided by shoenig/go-m1cpu which makes use of the Apple IOKit framework to read undocumented registers containing CPU performance data. Currently working on getting that functionality merged upstream into gopsutil, but gopsutil would still not support detecting P vs E cores like this PR does. Also refactors the CPUFingerprinter code to handle the mixed core types, now setting power vs efficiency cpu attributes. For now the scheduler is still unaware of mixed core types - on Apple platforms tasks cannot reserve cores anyway so it doesn't matter, but at least now the total CPU shares available will be correct. Future work should include adding support for detecting P/E cores on the latest and upcoming Intel chips, where computation of total cpu shares is currently incorrect. For that, we should also include updating the scheduler to be core-type aware, so that tasks of resources.cores on Linux platforms can be assigned the correct number of CPU shares for the core type(s) they have been assigned. node attributes before cpu.arch = arm64 cpu.modelname = Apple M2 Pro cpu.numcores = 12 cpu.reservablecores = 0 cpu.totalcompute = 1000 node attributes after cpu.arch = arm64 cpu.frequency.efficiency = 2424 cpu.frequency.power = 3504 cpu.modelname = Apple M2 Pro cpu.numcores.efficiency = 4 cpu.numcores.power = 8 cpu.reservablecores = 0 cpu.totalcompute = 37728 * fingerprint/cpu: follow up cr items	2023-03-28 08:27:58 -05:00
James Rasell	a18e480a57	dev: modify Go install to support arch64 and non-vagrant machines. (#16651 )	2023-03-28 14:18:48 +01:00
Tim Gross	78acc75b57	docs: add notes about keyring to snapshot restore (#16663 ) When cluster administrators restore from Raft snapshot, they also need to ensure the keyring is in place. For on-prem users doing in-place upgrades this is less of a concern but for typical cloud workflows where the whole host is replaced, it's an important warning (at least until #14852 has been implemented).	2023-03-28 08:31:01 -04:00
Tim Gross	a953456460	docs: fix template retry attempts default documentation (#16667 ) The configuration docs for `client.template.vault_retry`, `consul_retry`, and `nomad_retry` incorrectly document the default number of attempts to be unlimited (0). When we added these config blocks, we defaulted the fields to `nil` for backwards compatibility, which causes them to fall back to the default consul-template configuration values.	2023-03-28 08:27:06 -04:00
James Rasell	a53f9a4094	docs: fix-up legacy link in client config page. (#16678 )	2023-03-28 09:32:34 +01:00
Tobias Birkefeld	581eba9f41	docs: fix link of Read Stats API (#16673 ) The former link results in a 404. Update the link to the correct developer docs.	2023-03-28 08:49:44 +01:00
James Rasell	28c142c1a6	dev: account for non-vagrant machines on Linux config priv. (#16657 )	2023-03-27 17:13:18 +01:00
Juana De La Cuesta	320884b8ee	Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true (#16583 ) * Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre. * Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children. * style: refactor force run function * fix: remove defer and inline unlock for speed optimization * Update nomad/leader.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * style: refactor tests to use must * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * fix: move back from defer to calling unlock before returning. 
createEval cant be called with the lock on * style: refactor test to use must * added new entry to changelog and update comments --------- Co-authored-by: James Rasell <jrasell@hashicorp.com> Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-27 17:25:05 +02:00
Juana De La Cuesta	21b675244e	style: rename ForceRun to ForceEval, for clarity (#16617 )	2023-03-27 15:38:48 +02:00
Luiz Aoqui	8070882c4b	scheduler: fix reconciliation of reconnecting allocs (#16609 ) When a disconnect client reconnects the `allocReconciler` must find the allocations that were created to replace the original disconnected allocations. This process was being done in only a subset of non-terminal untainted allocations, meaning that, if the replacement allocations were not in this state the reconciler didn't stop them, leaving the job in an inconsistent state. This inconsistency is only solved in a future job evaluation, but at that point the allocation is considered reconnected and so the specific reconnection logic was not applied, leading to unexpected outcomes. This commit fixes the problem by running reconnecting allocation reconciliation logic earlier into the process, leaving the rest of the reconciler oblivious of reconnecting allocations. It also uses the full set of allocations to search for replacements, stopping them even if they are not in the `untainted` set. The system `SystemScheduler` is not affected by this bug because disconnected clients don't trigger replacements: every eligible client is already running an allocation.	2023-03-24 19:38:31 -04:00
ron-savoia	743414739d	docs: added section of needed ACL rules for Nomad UI (#16494 )	2023-03-24 08:57:16 -04:00
Luiz Aoqui	e5d31bca61	cli: job restart command (#16278 ) Implement the new `nomad job restart` command that allows operators to restart allocation tasks or reschedule the entire allocation. Restarts can be batched to target multiple allocations in parallel. Between each batch the command can stop and hold for a predefined time or until the user confirms that the process should proceed. This implements the "Stateless Restarts" alternative from the original RFC (https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180). The original concept is still worth implementing, as it allows this functionality to be exposed over an API that can be consumed by the Nomad UI and other clients. But the implementation turned out to be more complex than we initially expected so we thought it would be better to release a stateless CLI-based implementation first to gather feedback and validate the restart behaviour. Co-authored-by: Shishir Mahajan <smahajan@roblox.com>	2023-03-23 18:28:26 -04:00
Luiz Aoqui	4ccd999304	ci: send notification when prepare is complete (#16627 )	2023-03-23 17:34:45 -04:00
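
Note on 87f4b71df0: the `cpu.totalcompute = 37728` value in the "node attributes after" listing is consistent with summing core count times frequency for each core type (8 power cores at 3504 MHz plus 4 efficiency cores at 2424 MHz). The Go sketch below only reproduces that arithmetic. The `coreClass` type and the `totalCompute` function are hypothetical names chosen for illustration and are not taken from the Nomad source.

```go
package main

import "fmt"

// coreClass is a hypothetical type for this sketch; it is not a Nomad type.
type coreClass struct {
	name  string // e.g. "power" or "efficiency"
	count int    // number of cores of this class
	mhz   int    // frequency of this class in MHz
}

// totalCompute sums count*mhz over all core classes.
func totalCompute(classes []coreClass) int {
	total := 0
	for _, c := range classes {
		total += c.count * c.mhz
	}
	return total
}

func main() {
	// Values copied from the "node attributes after" listing in the commit message.
	m2Pro := []coreClass{
		{name: "power", count: 8, mhz: 3504},
		{name: "efficiency", count: 4, mhz: 2424},
	}
	fmt.Println(totalCompute(m2Pro)) // prints 37728
}
```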
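
Note on 118b703164: the commit message describes replacing a get-then-write-back pattern with setters and getters that hold the mutex. A minimal sketch of that general pattern in Go follows. The `hookResources` type, its `mounts` field, and the method names are assumptions made for illustration; the actual Nomad types and signatures may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// hookResources is a hypothetical stand-in for data shared between hooks.
type hookResources struct {
	mu     sync.RWMutex
	mounts map[string]string // volume name -> mount path (illustrative only)
}

// copyMounts returns an independent copy so callers never share the map.
func copyMounts(m map[string]string) map[string]string {
	cp := make(map[string]string, len(m))
	for k, v := range m {
		cp[k] = v
	}
	return cp
}

// SetMounts replaces the stored mounts while holding the write lock.
func (h *hookResources) SetMounts(m map[string]string) {
	cp := copyMounts(m)
	h.mu.Lock()
	defer h.mu.Unlock()
	h.mounts = cp
}

// GetMounts returns a copy of the stored mounts while holding the read lock.
func (h *hookResources) GetMounts() map[string]string {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return copyMounts(h.mounts)
}

func main() {
	var res hookResources
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // writer, standing in for the hook that publishes mounts
		defer wg.Done()
		res.SetMounts(map[string]string{"data": "/csi/data"})
	}()
	wg.Wait()
	fmt.Println(res.GetMounts()["data"]) // reader sees the published value
}
```

Because both accessors copy the map, a reader cannot mutate the shared state after the lock is released, which is the property a plain get-then-write-back sequence does not guarantee.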

24690 Commits
