open-nomad/website/pages/api-docs/operator.mdx
2020-10-02 13:31:40 -04:00

646 lines
19 KiB
Plaintext

---
layout: api
page_title: Operator - HTTP API
sidebar_title: Operator
description: |-
The /operator endpoints provides cluster-level tools for Nomad operators, such
as interacting with the Raft subsystem.
---
# /v1/operator
The `/operator` endpoint provides cluster-level tools for Nomad operators, such
as interacting with the Raft subsystem.
~> Use this interface with extreme caution, as improper use could lead to a
Nomad outage and even loss of data.
See the [Outage Recovery](https://learn.hashicorp.com/tutorials/nomad/outage-recovery) guide for some examples of how
these capabilities are used. For a CLI to perform these operations manually,
please see the documentation for the
[`nomad operator`](/docs/commands/operator) command.
## Read Raft Configuration
This endpoint queries the status of a client node registered with Nomad.
| Method | Path | Produces |
| ------ | --------------------------------- | ------------------ |
| `GET` | `/v1/operator/raft/configuration` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ------------ |
| `NO` | `management` |
### Parameters
- `stale` - Specifies if the cluster should respond without an active leader.
This is specified as a query string parameter.
### Sample Request
```shell-session
$ curl \
https://localhost:4646/v1/operator/raft/configuration
```
### Sample Response
```json
{
"Index": 1,
"Servers": [
{
"Address": "127.0.0.1:4647",
"ID": "127.0.0.1:4647",
"Leader": true,
"Node": "bacon-mac.global",
"RaftProtocol": 2,
"Voter": true
}
]
}
```
#### Field Reference
- `Index` `(int)` - The `Index` value is the Raft corresponding to this
configuration. The latest configuration may not yet be committed if changes
are in flight.
- `Servers` `(array: Server)` - The returned `Servers` array has information
about the servers in the Raft peer configuration.
- `ID` `(string)` - The ID of the server. This is the same as the `Address`
but may be upgraded to a GUID in a future version of Nomad.
- `Node` `(string)` - The node name of the server, as known to Nomad, or
`"(unknown)"` if the node is stale and not known.
- `Address` `(string)` - The `ip:port` for the server.
- `Leader` `(bool)` - is either "true" or "false" depending on the server's
role in the Raft configuration.
- `Voter` `(bool)` - is "true" or "false", indicating if the server has a vote
in the Raft configuration. Future versions of Nomad may add support for
non-voting servers.
## Remove Raft Peer
This endpoint removes a Nomad server with given address from the Raft
configuration. The return code signifies success or failure.
| Method | Path | Produces |
| -------- | ------------------------ | ------------------ |
| `DELETE` | `/v1/operator/raft/peer` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ------------ |
| `NO` | `management` |
### Parameters
- `address` `(string: <optional>)` - Specifies the server to remove as
`ip:port`. This cannot be provided along with the `id` parameter.
- `id` `(string: <optional>)` - Specifies the server to remove as
`id`. This cannot be provided along with the `address` parameter.
### Sample Request
```shell-session
$ curl \
--request DELETE \
https://localhost:4646/v1/operator/raft/peer?address=1.2.3.4
```
## Read Autopilot Configuration
This endpoint retrieves its latest Autopilot configuration.
| Method | Path | Produces |
| ------ | -------------------------------------- | ------------------ |
| `GET` | `/v1/operator/autopilot/configuration` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | --------------- |
| `NO` | `operator:read` |
### Sample Request
```shell-session
$ curl \
https://localhost:4646/v1/operator/autopilot/configuration
```
### Sample Response
```json
{
"CleanupDeadServers": true,
"LastContactThreshold": "200ms",
"MaxTrailingLogs": 250,
"ServerStabilizationTime": "10s",
"EnableRedundancyZones": false,
"DisableUpgradeMigration": false,
"EnableCustomUpgrades": false,
"CreateIndex": 4,
"ModifyIndex": 4
}
```
For more information about the Autopilot configuration options, see the
[agent configuration section](/docs/configuration/autopilot).
## Update Autopilot Configuration
This endpoint updates the Autopilot configuration of the cluster.
| Method | Path | Produces |
| ------ | -------------------------------------- | ------------------ |
| `PUT` | `/v1/operator/autopilot/configuration` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ---------------- |
| `NO` | `operator:write` |
### Parameters
- `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
only happen if the given index matches the `ModifyIndex` of the configuration
at the time of writing.
### Sample Payload
```json
{
"CleanupDeadServers": true,
"LastContactThreshold": "200ms",
"MaxTrailingLogs": 250,
"ServerStabilizationTime": "10s",
"EnableRedundancyZones": false,
"DisableUpgradeMigration": false,
"EnableCustomUpgrades": false,
"CreateIndex": 4,
"ModifyIndex": 4
}
```
- `CleanupDeadServers` `(bool: true)` - Specifies automatic removal of dead
server nodes periodically and whenever a new server is added to the cluster.
- `LastContactThreshold` `(string: "200ms")` - Specifies the maximum amount of
time a server can go without contact from the leader before being considered
unhealthy. Must be a duration value such as `10s`.
- `MaxTrailingLogs` `(int: 250)` specifies the maximum number of log entries
that a server can trail the leader by before being considered unhealthy.
- `ServerStabilizationTime` `(string: "10s")` - Specifies the minimum amount of
time a server must be stable in the 'healthy' state before being added to the
cluster. Only takes effect if all servers are running Raft protocol version 3
or higher. Must be a duration value such as `30s`.
- `EnableRedundancyZones` `(bool: false)` - (Enterprise-only) Specifies whether
to enable redundancy zones.
- `DisableUpgradeMigration` `(bool: false)` - (Enterprise-only) Disables Autopilot's
upgrade migration strategy in Nomad Enterprise of waiting until enough
newer-versioned servers have been added to the cluster before promoting any of
them to voters.
- `EnableCustomUpgrades` `(bool: false)` - (Enterprise-only) Specifies whether to
enable using custom upgrade versions when performing migrations.
## Read Health
This endpoint queries the health of the autopilot status.
| Method | Path | Produces |
| ------ | ------------------------------- | ------------------ |
| `GET` | `/v1/operator/autopilot/health` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | --------------- |
| `NO` | `operator:read` |
### Sample Request
```shell-session
$ curl \
https://localhost:4646/v1/operator/autopilot/health
```
### Sample response
```json
{
"Healthy": true,
"FailureTolerance": 0,
"Servers": [
{
"ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
"Name": "node1",
"Address": "127.0.0.1:8300",
"SerfStatus": "alive",
"Version": "0.8.0",
"Leader": true,
"LastContact": "0s",
"LastTerm": 2,
"LastIndex": 46,
"Healthy": true,
"Voter": true,
"StableSince": "2017-03-06T22:07:51Z"
},
{
"ID": "e36ee410-cc3c-0a0c-c724-63817ab30303",
"Name": "node2",
"Address": "127.0.0.1:8205",
"SerfStatus": "alive",
"Version": "0.8.0",
"Leader": false,
"LastContact": "27.291304ms",
"LastTerm": 2,
"LastIndex": 46,
"Healthy": true,
"Voter": false,
"StableSince": "2017-03-06T22:18:26Z"
}
]
}
```
- `Healthy` is whether all the servers are currently healthy.
- `FailureTolerance` is the number of redundant healthy servers that could be
fail without causing an outage (this would be 2 in a healthy cluster of 5
servers).
- `Servers` holds detailed health information on each server:
- `ID` is the Raft ID of the server.
- `Name` is the node name of the server.
- `Address` is the address of the server.
- `SerfStatus` is the SerfHealth check status for the server.
- `Version` is the Nomad version of the server.
- `Leader` is whether this server is currently the leader.
- `LastContact` is the time elapsed since this server's last contact with the leader.
- `LastTerm` is the server's last known Raft leader term.
- `LastIndex` is the index of the server's last committed Raft log entry.
- `Healthy` is whether the server is healthy according to the current Autopilot configuration.
- `Voter` is whether the server is a voting member of the Raft cluster.
- `StableSince` is the time this server has been in its current `Healthy` state.
The HTTP status code will indicate the health of the cluster. If `Healthy` is true, then a
status of 200 will be returned. If `Healthy` is false, then a status of 429 will be returned.
## Read Scheduler Configuration
This endpoint retrieves the latest Scheduler configuration. This API was introduced in
Nomad 0.9 and currently supports enabling/disabling preemption. More options may be added in
the future.
| Method | Path | Produces |
| ------ | -------------------------------------- | ------------------ |
| `GET` | `/v1/operator/scheduler/configuration` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | --------------- |
| `NO` | `operator:read` |
### Sample Request
```shell-session
$ curl \
https://localhost:4646/v1/operator/scheduler/configuration
```
### Sample Response
```json
{
"Index": 5,
"KnownLeader": true,
"LastContact": 0,
"SchedulerConfig": {
"CreateIndex": 5,
"ModifyIndex": 5,
"SchedulerAlgorithm": "spread",
"PreemptionConfig": {
"SystemSchedulerEnabled": true,
"BatchSchedulerEnabled": false,
"ServiceSchedulerEnabled": false
}
}
}
```
#### Field Reference
- `Index` `(int)` - The `Index` value is the Raft commit index corresponding to this
configuration.
- `SchedulerConfig` `(SchedulerConfig)` - The returned `SchedulerConfig` object has configuration
settings mentioned below.
- `SchedulerAlgorithm` `(string: "binpack")` - Specifies whether scheduler binpacks or spreads allocations on available nodes.
- `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for various schedulers.
- `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for system jobs is enabled. Note that
this defaults to true.
- `BatchSchedulerEnabled` `(bool: false)` - Specifies whether preemption for batch jobs is enabled. Note that
this defaults to false and must be explicitly enabled.
- `ServiceSchedulerEnabled` `(bool: false)` - Specifies whether preemption for service jobs is enabled. Note that
this defaults to false and must be explicitly enabled.
- `CreateIndex` - The Raft index at which the config was created.
- `ModifyIndex` - The Raft index at which the config was modified.
## Update Scheduler Configuration
This endpoint updates the scheduler configuration of the cluster.
| Method | Path | Produces |
| ------------- | -------------------------------------- | ------------------ |
| `PUT`, `POST` | `/v1/operator/scheduler/configuration` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ---------------- |
| `NO` | `operator:write` |
### Bootstrap Configuration Element
The [`default_scheduler_config`][] attribute of the server stanza will provide a
starting value for this configuration. Once bootstrapped, the value in the
server state is authoritative.
### Parameters
- `cas` `(int: 0)` - Specifies to use a Check-And-Set operation. The update will
only happen if the given index matches the `ModifyIndex` of the configuration
at the time of writing.
### Sample Payload
```json
{
"SchedulerAlgorithm": "spread",
"PreemptionConfig": {
"SystemSchedulerEnabled": true,
"BatchSchedulerEnabled": false,
"ServiceSchedulerEnabled": true
}
}
```
- `SchedulerAlgorithm` `(string: "binpack")` - Specifies whether scheduler
binpacks or spreads allocations on available nodes. Possible values are
`"binpack"` and `"spread"`
- `PreemptionConfig` `(PreemptionConfig)` - Options to enable preemption for
various schedulers.
- `SystemSchedulerEnabled` `(bool: true)` - Specifies whether preemption for
system jobs is enabled. Note that if this is set to true, then system jobs
can preempt any other jobs.
- `BatchSchedulerEnabled` `(bool: false)` - Specifies
whether preemption for batch jobs is enabled. Note that if this is set to
true, then batch jobs can preempt any other jobs.
- `ServiceSchedulerEnabled` `(bool: false)` - Specifies
whether preemption for service jobs is enabled. Note that if this is set to
true, then service jobs can preempt any other jobs.
### Sample Response
```json
{
"Updated": false,
"Index": 15
}
```
- `Updated` - Indicates that the configuration was updated when a `cas` value is
provided. For non-CAS requests, this field will be false even though the
update is applied.
- `Index` - Current Raft index when the request was received.
## Get Nomad Enterprise License Info
This endpoint gets information about the current license.
| Method | Path | Produces |
| ------ | ---------------------- | ------------------ |
| `GET` | `/v1/operator/license` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | --------------- |
| `NO` | `operator:read` |
### Sample Request
```shell-session
$ curl \
https://localhost:4646/v1/operator/license
```
### Sample Response
```json
{
"KnownLeader": false,
"LastContact": 0,
"LastIndex": 0,
"License": {
"CustomerID": "temporary license customer",
"ExpirationTime": "2020-06-01T14:50:16.581304556-04:00",
"Features": [
"Automated Upgrades",
"Enhanced Read Scalability",
"Redundancy Zones",
"Namespaces",
"Resource Quotas",
"Preemption",
"Audit Logging",
"Setinel Policies"
],
"Flags": {
"modules": ["governance-policy"]
},
"InstallationID": "*",
"IssueTime": "2020-06-01T08:50:16.581304556-04:00",
"LicenseID": "temporary-license",
"Modules": ["governance-policy"],
"Product": "nomad",
"StartTime": "2020-06-01T08:50:16.581304556-04:00",
"TerminationTime": "2020-06-01T14:50:16.581304556-04:00"
},
"RequestTime": 0
}
```
## Updating the Nomad Enterprise License
This endpoint updates the Nomad license.
| Method | Path | Produces |
| ------ | ---------------------- | ------------------ |
| `PUT` | `/v1/operator/license` | `application/json` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ---------------- |
| `NO` | `operator:write` |
### Sample Payload
The payload is the raw license blob.
### Sample Request
```shell-session
$ curl \
--request PUT \
--data @nomad.license \
https://localhost:4646/v1/operator/license
```
### Sample Response
```json
{
"Index": 15
}
```
[`default_scheduler_config`]: /docs/configuration/server#default_scheduler_config
## Generate Snapshot
This endpoint generates and returns an atomic, point-in-time snapshot of the
Nomad server state for disaster recovery. Snapshots include all state managed by Nomad's
Raft [consensus protocol](/docs/internals/consensus).
Snapshots are exposed as gzipped tar archives which internally contain the Raft
metadata required to restore, as well as a binary serialized version of the
Nomad server state. The contents are covered internally by SHA-256 hashes.
These hashes are verified during snapshot restore operations. The structure of
the archive is internal to Nomad and not intended to be used other than for
restore operations. The archives are not designed to be modified before a
restore.
| Method | Path | Produces |
| :----- | :---------------------- | ------------------------ |
| `GET` | `/v1/operator/snapshot` | `200 application/x-gzip` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ------------ |
| `NO` | `management` |
### Parameters
- `stale` - Specifies if the cluster should respond without an active leader.
This is specified as a query string parameter.
### Sample Request
```shell-session
$ curl \
-o snapshot.tgz \
http://127.0.0.1:4646/v1/operator/snapshot
```
The above example results in a tarball named `snapshot.tgz` in the current working directory.
## Restore Snapshot
This endpoint restores a point-in-time snapshot of the Nomad server state.
Restores involve a potentially dangerous low-level Raft operation that is not
designed to handle server failures during a restore. This operation is primarily
intended to be used when recovering from a disaster, restoring into a fresh
cluster of Nomad servers.
The body of the request should be a snapshot archive returned from a previous
call to the `GET` method.
| Method | Path | Produces |
| :----- | :---------------------- | ----------------------------- |
| `PUT` | `/v1/operator/snapshot` | `200 text/plain (empty body)` |
The table below shows this endpoint's support for
[blocking queries](/api-docs#blocking-queries) and
[required ACLs](/api-docs#acls).
| Blocking Queries | ACL Required |
| ---------------- | ------------ |
| `NO` | `management` |
### Sample Request
```shell-session
$ curl \
--request PUT \
--data-binary @snapshot.tgz \
http://127.0.0.1:4646/v1/operator/snapshot
```
~> Some tools default to www/encoded uploads. Nomad expects the snapshot to be
in pure binary form.