3371214431
This PR implements a new "System Batch" scheduler type. Jobs can make use of this new scheduler by setting their type to 'sysbatch'. Like the name implies, sysbatch can be thought of as a hybrid between system and batch jobs - it is for running short lived jobs intended to run on every compatible node in the cluster. As with batch jobs, sysbatch jobs can also be periodic and/or parameterized dispatch jobs. A sysbatch job is considered complete when it has been run on all compatible nodes until reaching a terminal state (success or failed on retries). Feasibility and preemption are governed the same as with system jobs. In this PR, the update stanza is not yet supported. The update stanza is sill limited in functionality for the underlying system scheduler, and is not useful yet for sysbatch jobs. Further work in #4740 will improve support for the update stanza and deployments. Closes #2527
129 lines
4 KiB
Plaintext
129 lines
4 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: reschedule Stanza - Job Specification
|
|
description: >-
|
|
The "reschedule" stanza specifies the group's rescheduling strategy upon
|
|
|
|
allocation failures. Nomad will only attempt to reschedule failed allocations
|
|
on
|
|
|
|
to another node only after any local
|
|
[restarts](docs/job-specification/restart)
|
|
|
|
have been exceeded.
|
|
---
|
|
|
|
# `reschedule` Stanza
|
|
|
|
<Placement
|
|
groups={[
|
|
['job', 'reschedule'],
|
|
['job', 'group', 'reschedule'],
|
|
]}
|
|
/>
|
|
|
|
The `reschedule` stanza specifies the group's rescheduling strategy. If specified at the job
|
|
level, the configuration will apply to all groups within the job. If the
|
|
reschedule stanza is present on both the job and the group, they are merged with
|
|
the group stanza taking the highest precedence and then the job.
|
|
|
|
Nomad will attempt to schedule the task on another node if any of its allocation
|
|
statuses become "failed". It prefers to create a replacement allocation on a node
|
|
that hasn't previously been used.
|
|
|
|
```hcl
|
|
job "docs" {
|
|
group "example" {
|
|
reschedule {
|
|
attempts = 15
|
|
interval = "1h"
|
|
delay = "30s"
|
|
delay_function = "exponential"
|
|
max_delay = "120s"
|
|
unlimited = false
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
~> The reschedule stanza does not apply to `system` or `sysbatch` jobs because
|
|
they run on every node.
|
|
|
|
## `reschedule` Parameters
|
|
|
|
- `attempts` `(int: <varies>)` - Specifies the number of reschedule attempts
|
|
allowed in the configured interval. Defaults vary by job type, see below
|
|
for more information.
|
|
|
|
- `interval` `(string: <varies>)` - Specifies the sliding window which begins
|
|
when the first reschedule attempt starts and ensures that only `attempts`
|
|
number of reschedule happen within it. If more than `attempts` number of
|
|
failures happen with this interval, Nomad will not reschedule any more.
|
|
|
|
- `delay` `(string: <varies>)` - Specifies the duration to wait before attempting
|
|
to reschedule a failed task. This is specified using a label suffix like "30s" or "1h".
|
|
Delay cannot be less than 5 seconds.
|
|
|
|
- `delay_function` `(string: <varies>)` - Specifies the function that is used to
|
|
calculate subsequent reschedule delays. The initial delay is specified by the delay parameter.
|
|
`delay_function` has three possible values which are described below.
|
|
|
|
- `constant` - The delay between reschedule attempts stays constant at the `delay` value.
|
|
- `exponential` - The delay between reschedule attempts doubles.
|
|
- `fibonacci` - The delay between reschedule attempts is calculated by adding the two most recent
|
|
delays applied. For example if `delay` is set to 5 seconds, the next five reschedule attempts will be
|
|
delayed by 5 seconds, 5 seconds, 10 seconds, 15 seconds, and 25 seconds respectively.
|
|
|
|
- `max_delay` `(string: <varies>)` - is an upper bound on the delay beyond which it will not increase. This parameter
|
|
is used when `delay_function` is `exponential` or `fibonacci`, and is ignored when `constant` delay is used.
|
|
|
|
- `unlimited` `(boolean:<varies>)` - `unlimited` enables unlimited reschedule attempts. If this is set to true
|
|
the `attempts` and `interval` fields are not used.
|
|
|
|
Information about reschedule attempts are displayed in the CLI and API for
|
|
allocations. Rescheduling is enabled by default for service and batch jobs
|
|
with the options shown below.
|
|
|
|
### `reschedule` Parameter Defaults
|
|
|
|
The values for the `reschedule` parameters vary by job type. Below are the
|
|
defaults by job type:
|
|
|
|
- The default batch reschedule policy is:
|
|
|
|
```hcl
|
|
reschedule {
|
|
attempts = 1
|
|
interval = "24h"
|
|
unlimited = false
|
|
delay = "5s"
|
|
delay_function = "constant"
|
|
}
|
|
```
|
|
|
|
- The default service reschedule policy is:
|
|
|
|
```hcl
|
|
reschedule {
|
|
delay = "30s"
|
|
delay_function = "exponential"
|
|
max_delay = "1h"
|
|
unlimited = true
|
|
}
|
|
```
|
|
|
|
### Disabling rescheduling
|
|
|
|
To disable rescheduling, set the `attempts` parameter to zero and `unlimited` to false.
|
|
|
|
```hcl
|
|
job "docs" {
|
|
group "example" {
|
|
reschedule {
|
|
attempts = 0
|
|
unlimited = false
|
|
}
|
|
}
|
|
}
|
|
```
|