---
layout: docs
page_title: 'Commands: eval delete'
description: |
The eval delete command is used to delete evaluations.
---
# Command: eval delete
The `eval delete` command is used to delete evaluations. It should be used
cautiously and only in outage situations where there is a large backlog of
evaluations not being processed. During most normal and outage scenarios,
Nomad's reconciliation and state management will handle evaluations as needed.
The eval broker is expected to be paused prior to running this command and
un-paused after. The broker's current state can be checked with the
[`operator scheduler get-config`][scheduler_get_config] command, and it can be
paused and un-paused with the
[`operator scheduler set-config`][scheduler_set_config] command.
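
The pause, delete, un-pause sequence described above can be sketched as a small shell function. This is a minimal illustration, not an official procedure: the function name and the `NOMAD_CMD` override variable are hypothetical, and it assumes the `-pause-eval-broker` flag of `operator scheduler set-config` is available in your Nomad version.

```shell
# Sketch: pause the eval broker, delete evaluations by filter, then un-pause.
# NOMAD_CMD is a hypothetical override for the binary to run (default: nomad).
delete_evals_with_broker_paused() {
  local filter="$1"
  local nomad_cmd="${NOMAD_CMD:-nomad}"
  local rc=0

  # Pause the eval broker before deleting; stop if this fails.
  "$nomad_cmd" operator scheduler set-config -pause-eval-broker=true || return 1

  # Delete the matching evaluations and remember the exit status.
  "$nomad_cmd" eval delete -filter="$filter" -yes || rc=$?

  # Un-pause the broker even if the delete failed.
  "$nomad_cmd" operator scheduler set-config -pause-eval-broker=false || return 1

  return "$rc"
}
```

It could be called as `delete_evals_with_broker_paused 'Status == "pending"'`. Un-pausing regardless of the delete's exit status is a deliberate choice here, so that a failed delete does not leave the broker paused.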
## Usage
```plaintext
nomad eval delete [options] [args]
```
It takes an optional argument which is the ID of the evaluation to delete. If
the evaluation ID is omitted, this command will use the filter flag to identify
and delete a set of evaluations.
When ACLs are enabled, this command requires a `management` token.
## General Options
@include 'general_options.mdx'
## Delete Options
- `-filter`: Specifies an expression used to filter evaluations for
deletion. When using this flag, it is advisable to ensure the syntax is
correct using the `eval list` command first. Note that deleting evals by filter
is imprecise: for sets of evals larger than a single raft log batch, evals can
be inserted behind the cursor and therefore be missed.
- `-yes`: Bypass the confirmation prompt if an evaluation ID was not provided.
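
The advice to check a filter with `eval list` before deleting can be sketched as follows. The function name and the `NOMAD_CMD` override variable are hypothetical, used only for illustration. The sketch assumes `eval list` accepts the same `-filter` expression as `eval delete`.

```shell
# Sketch: list the evaluations matching a filter, then delete them.
# NOMAD_CMD is a hypothetical override for the binary to run (default: nomad).
preview_then_delete() {
  local filter="$1"
  local nomad_cmd="${NOMAD_CMD:-nomad}"

  # List first: if the filter is rejected, stop before anything is deleted.
  "$nomad_cmd" eval list -filter="$filter" || return 1

  # No -yes here, so eval delete still asks for confirmation.
  "$nomad_cmd" eval delete -filter="$filter"
}
```

Leaving out `-yes` keeps the confirmation prompt as a second check after the listing has been reviewed.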
## Examples
Delete an evaluation using its ID:
```shell-session
$ nomad eval delete 9ecffbba-73be-d909-5d7e-ac2694c10e0c
Successfully deleted 1 evaluation
```
Delete all evaluations with status `pending` for the `example` job:
```shell-session
$ nomad eval delete -filter='Status == "pending" and JobID == "example"'
Are you sure you want to delete 3 evals? [y/N] y
Successfully deleted 3 evaluations
```
Delete all evaluations for the `system` and `service` schedulers, skipping all
prompts:
```shell-session
$ nomad eval delete -filter='Scheduler == "system" or Scheduler == "service"' -yes
Successfully deleted 23 evaluations
```
[scheduler_get_config]: /docs/commands/operator/scheduler/get-config
[scheduler_set_config]: /docs/commands/operator/scheduler/set-config