Requests without an ACL token that pass thru the client's HTTP API are treated
as though they come from the client itself. This allows bypass of ACLs on RPC
requests where ACL permissions are checked (like `Job.Register`). Invalid tokens
are correctly rejected.
Fix the bypass by only setting a client ID on the identity if we have a valid node secret.
Note that this changeset will break rate metrics for RPCs sent by clients
without a client secret such as `Node.GetClientAllocs`; these requests will be
recorded as anonymous.
Future work should:
* Ensure the node secret is sent with all client-driven RPCs except
`Node.Register` which is TOFU.
* Create a new `acl.ACL` object from client requests so that we
can enforce ACLs for all endpoints in a uniform way that's less error-prone.~
These endpoints all refer to JobID by the time you get to the RPC request
layer, but the HTTP handler functions call the field JobName, which is a different
field (... though often with the same value).
This update changes the behaviour when following logs from an
allocation, so that both stdout and stderr files streamed when the
operator supplies the follow flag. The previous behaviour is held
when all other flags and situations are provided.
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
The allocrunner has a facility for passing data written by allocrunner hooks to
taskrunner hooks. Currently the only consumers of this facility are the
allocrunner CSI hook (which writes data) and the taskrunner volume hook (which
reads that same data).
The allocrunner hook for CSI volumes doesn't set the alloc hook resources
atomically. Instead, it gets the current resources and then writes a new version
back. Because the CSI hook is currently the only writer and all readers happen
long afterwards, this should be safe but #16623 shows there's some sequence of
events during restore where this breaks down.
Refactor hook resources so that hook data is accessed via setters and getters
that hold the mutex.
The workflow described in the docs for apt installation is deprecated. Update to
match the workflow described in the Tutorials and official packaging guide.
Instead of attempting to pick each individual commit in a PR using
`BACKPORT_MERGE_COMMIT` only picks the commit that was merged into
`main`.
This reduces the amount of work done during a backport, generating
cleaner merges and avoiding potential issues on specific commits.
With this setting PRs that are not squashed will fail to backport and
must be handled manually, but those are considered exceptions.
* Bones of JWT detection
* JWT to token pipeline complete
* Some live-demo fixes for template language
* findSelf and loginJWT funcs made async
* Acceptance tests and mirage mocks for JWT login
* [ui] Allow for multiple JWT auth methods in the UI (#16665)
* Split selectable jwt methods
* repositions the dropdown to be next to the input field
When we added recovery of pause containers in #16352 we called the recovery
function from the plugin factory function. But in our plugin setup protocol, a
plugin isn't ready for use until we call `SetConfig`. This meant that
recovering pause containers was always done with the default
config. Setting up the Docker client only happens once, so setting the wrong
config in the recovery function also means that all other Docker API calls will
use the default config.
Move the `recoveryPauseContainers` call into the `SetConfig`. Fix the error
handling so that we return any error but also don't log when the context is
canceled, which happens twice during normal startup as we fingerprint the
driver.
The `docker` driver cannot expand capabilities beyond the default set when the
task is a non-root user. Clarify this in the documentation of `allow_caps` and
update the `cap_add` and `cap_drop` to match the `exec` driver, which has more
clear language overall.
Currently, the `exec` driver is only setting the Bounding set, which is
not sufficient to actually enable the requisite capabilities for the
task process. In order for the capabilities to survive `execve`
performed by libcontainer, the `Permitted`, `Inheritable`, and `Ambient`
sets must also be set.
Per CAPABILITIES (7):
> Ambient: This is a set of capabilities that are preserved across an
> execve(2) of a program that is not privileged. The ambient capability
> set obeys the invariant that no capability can ever be ambient if it
> is not both permitted and inheritable.
* client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips
This PR adds detection of asymetric core types (Power & Efficiency) (P/E)
when running on M1/M2 Apple Silicon CPUs. This functionality is provided
by shoenig/go-m1cpu which makes use of the Apple IOKit framework to read
undocumented registers containing CPU performance data. Currently working
on getting that functionality merged upstream into gopsutil, but gopsutil
would still not support detecting P vs E cores like this PR does.
Also refactors the CPUFingerprinter code to handle the mixed core
types, now setting power vs efficiency cpu attributes.
For now the scheduler is still unaware of mixed core types - on Apple
platforms tasks cannot reserve cores anyway so it doesn't matter, but
at least now the total CPU shares available will be correct.
Future work should include adding support for detecting P/E cores on
the latest and upcoming Intel chips, where computation of total cpu shares
is currently incorrect. For that, we should also include updating the
scheduler to be core-type aware, so that tasks of resources.cores on Linux
platforms can be assigned the correct number of CPU shares for the core
type(s) they have been assigned.
node attributes before
cpu.arch = arm64
cpu.modelname = Apple M2 Pro
cpu.numcores = 12
cpu.reservablecores = 0
cpu.totalcompute = 1000
node attributes after
cpu.arch = arm64
cpu.frequency.efficiency = 2424
cpu.frequency.power = 3504
cpu.modelname = Apple M2 Pro
cpu.numcores.efficiency = 4
cpu.numcores.power = 8
cpu.reservablecores = 0
cpu.totalcompute = 37728
* fingerprint/cpu: follow up cr items
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
The configuration docs for `client.template.vault_retry`, `consul_retry`, and
`nomad_retry` incorrectly document the default number of attempts to be
unlimited (0). When we added these config blocks, we defaulted the fields to
`nil` for backwards compatibility, which causes them to fall back to the default
consul-template configuration values.
* Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true
Fixes#11052
When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre.
* Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true
Fixes#11052
When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children.
* style: refactor force run function
* fix: remove defer and inline unlock for speed optimization
* Update nomad/leader.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* style: refactor tests to use must
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Update nomad/leader_test.go
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* fix: move back from defer to calling unlock before returning.
createEval cant be called with the lock on
* style: refactor test to use must
* added new entry to changelog and update comments
---------
Co-authored-by: James Rasell <jrasell@hashicorp.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
When a disconnect client reconnects the `allocReconciler` must find the
allocations that were created to replace the original disconnected
allocations.
This process was being done in only a subset of non-terminal untainted
allocations, meaning that, if the replacement allocations were not in
this state the reconciler didn't stop them, leaving the job in an
inconsistent state.
This inconsistency is only solved in a future job evaluation, but at
that point the allocation is considered reconnected and so the specific
reconnection logic was not applied, leading to unexpected outcomes.
This commit fixes the problem by running reconnecting allocation
reconciliation logic earlier into the process, leaving the rest of the
reconciler oblivious of reconnecting allocations.
It also uses the full set of allocations to search for replacements,
stopping them even if they are not in the `untainted` set.
The system `SystemScheduler` is not affected by this bug because
disconnected clients don't trigger replacements: every eligible client
is already running an allocation.
Implement the new `nomad job restart` command that allows operators to
restart allocations tasks or reschedule then entire allocation.
Restarts can be batched to target multiple allocations in parallel.
Between each batch the command can stop and hold for a predefined time
or until the user confirms that the process should proceed.
This implements the "Stateless Restarts" alternative from the original
RFC
(https://gist.github.com/schmichael/e0b8b2ec1eb146301175fd87ddd46180).
The original concept is still worth implementing, as it allows this
functionality to be exposed over an API that can be consumed by the
Nomad UI and other clients. But the implementation turned out to be more
complex than we initially expected so we thought it would be better to
release a stateless CLI-based implementation first to gather feedback
and validate the restart behaviour.
Co-authored-by: Shishir Mahajan <smahajan@roblox.com>