Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one. If the existing one is being
deleted while nomad first attempts to create the container, the
createContainer will fail with `container already exists`, but we get
nil container reference from the `containerByName` lookup, and cause a
crash.
I'm not certain how we get into the state, except for being very
unlucky. I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker didn't free up resources
yet to create a new container with the same name immediately yet).
If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
This page has not been updated (yet) to reflect that support for all 3 job types (service, batch, system) which shipped in 0.9.2.
The current page implies that preemption is only available for system jobs.
This is early preparation for Nomad 0.12, where we plan to move Preemption from Enterprise feature suite to OSS for all.
The BinPackIter accounted for node reservations twice when scoring nodes
which could bias scores toward nodes with reservations.
Pseudo-code for previous algorithm:
```
proposed = reservedResources + sum(allocsResources)
available = nodeResources - reservedResources
score = 1 - (proposed / available)
```
The node's reserved resources are added to the total resources used by
allocations, and then the node's reserved resources are later
substracted from the node's overall resources.
The new algorithm is:
```
proposed = sum(allocResources)
available = nodeResources - reservedResources
score = 1 - (proposed / available)
```
The node's reserved resources are no longer added to the total resources
used by allocations.
My guess as to how this bug happened is that the resource utilization
variable (`util`) is calculated and returned by the `AllocsFit` function
which needs to take reserved resources into account as a basic
feasibility check.
To avoid re-calculating alloc resource usage (because there may be a
large number of allocs), we reused `util` in the `ScoreFit` function.
`ScoreFit` properly accounts for reserved resources by subtracting them
from the node's overall resources. However since `util` _also_ took
reserved resources into account the score would be incorrect.
Prior to the fix the added test output:
```
Node: reserved Score: 1.0000
Node: reserved2 Score: 1.0000
Node: no-reserved Score: 0.9741
```
The scores being 1.0 for *both* nodes with reserved resources is a good
hint something is wrong as they should receive different scores. Upon
further inspection the double accounting of reserved resources caused
their scores to be >1.0 and clamped.
After the fix the added test outputs:
```
Node: no-reserved Score: 0.9741
Node: reserved Score: 0.9480
Node: reserved2 Score: 0.8717
```
The field names within the structs representing the Connect proxy
definition were not the same (nomad/structs/ vs api/), causing the
values to be lost in translation for the 'nomad job inspect' command.
Since the field names already shipped in v0.11.0 we cannot simply
fix the names. Instead, use the json struct tag on the structs/ structs
to remap the name to match the publicly expose api/ package on json
encoding.
This means existing jobs from v0.11.0 will continue to work, and
the JSON API for job submission will remain backwards compatible.
Before, the proxy stanza did not parse non-object fields
`local_service_port` and `local_service_address` from the
connect `proxy` stanza. This change fixes that.
* documents the scaling block in the JSON Job docs
resolves#7656
* add task-specific restart to JSON Job docs
companion to #7603
* [docs] improved and corrected scaling docs
* Update website/pages/api-docs/json-jobs.mdx
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* update `restart` documentation
#7288 added support for task-specific `restart` policy. this PR updates the docs to reflect that.
* added an explicit example of task-specific restart policy
* Update website/pages/docs/job-specification/restart.mdx
Shutdown http server last, after nomad client/server components
terminate.
Before this change, if the agent is taking an unexpectedly long time to
shutdown, the operator cannot query the http server directly: they
cannot access agent specific http endpoints and need to query another
agent about the troublesome agent.
Unexpectedly long shutdown can happen in normal cases, e.g. a client
might hung is if one of the allocs it is running has a long
shutdown_delay.
Here, we switch to ensuring that the http server is shutdown last.
I believe this doesn't require extra care in agent shutting down logic
while operators may be able to submit write http requests. We already
need to cope with operators submiting these http requests to another
agent or by servers updating the client allocations.
Follow-up for a method missed in the refactor for #7688. The
`volAndPluginLookup` method is only ever called from the server's `CSI`
RPC and never the `ClientCSI` RPC, so move it into that scope.
The current design of `ClientCSI` RPC requires that callers in the
server know about the free-standing `nodeForControllerPlugin`
function. This makes it difficult to send `ClientCSI` RPC messages
from subpackages of `nomad` and adds a bunch of boilerplate to every
server-side caller of a controller RPC.
This changeset makes it so that the `ClientCSI` RPCs will populate and
validate the controller's client node ID if it hasn't been passed by
the caller, centralizing the logic of picking and validating
controller targets into the `nomad.ClientCSI` struct.
task shutdown_delay will currently only run if there are registered
services for the task. This implementation detail isn't explicity stated
anywhere and is defined outside of the service stanza.
This change moves shutdown_delay to be evaluated after prekill hooks are
run, outside of any task runner hooks.
just use time.sleep
Version 10 fixes an issue where if lint-staged fails while linting
a partially staged file, all unstaged changes will be removed from
the working tree. Now when this happens, unstaged changes will be
in the stash.
Before, the submitted jobspec for sidecar_task would pass
through 2 key validation steps - once for the subset specific
to connect sidecar task definitions, and once again for the set
of normal task definition where the task would actually get
unmarshalled.
The valid keys for the normal task definition did not include
"name", which is supposed to be configurable for the sidecar
task. To fix this, just eliminate the double validation step,
and instead pass-in the correct set of keys to validate against
to the one generic task parser.
Fixes#7680
Before, if the sidecar_service stanza of a connect enabled service
was missing, the job submission would cause a panic in the nomad
agent. Since the panic was happening in the API handler the agent
itself continued running, but this change will the condition more
gracefully.
By fixing the `Copy` method, the API handler now returns the proper
error.
$ nomad job run foo.nomad
Error submitting job: Unexpected response code: 500 (1 error occurred:
* Task group api validation failed: 2 errors occurred:
* Missing tasks for task group
* Task group service validation failed: 1 error occurred:
* Service[0] count-api validation failed: 1 error occurred:
* Consul Connect must be native or use a sidecar service