Diagnose docs + changelog (#12159)
* save * diagnose docs * changelog * changelog formatting
This commit is contained in:
parent
f7ecb978a6
commit
fff7dc7a40
|
@ -0,0 +1,3 @@
|
||||||
|
```release-note:feature
|
||||||
|
operator diagnose: a new vault operator command to detect common issues with vault server setups.
|
||||||
|
```
|
|
@ -0,0 +1,228 @@
|
||||||
|
---
|
||||||
|
layout: docs
|
||||||
|
page_title: operator diagnose - Command
|
||||||
|
description: |-
|
||||||
|
"vault operator diagnose" is a new operator-centric command, focused on providing a clear description
|
||||||
|
of what is working in Vault, and what is not working. The command focuses on why Vault cannot serve requests,
|
||||||
|
but will also warn on configurations or statuses that it deems to be unsafe in some way.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
# operator diagnose
|
||||||
|
|
||||||
|
The operator diagnose command should be used primarily when vault is down or
|
||||||
|
partially inoperational. The command can be used safely regardless of the state
|
||||||
|
vault is in, but may return meaningless results for some of the test cases if the
|
||||||
|
vault server is already running.
|
||||||
|
|
||||||
|
Note: if you run the diagnose command proactively, either before a server
|
||||||
|
starts or while a server is operational, please consult the documentation
|
||||||
|
on the individual checks below to see which checks are returning false error
|
||||||
|
messages or warnings.
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
The following flags are available in addition to the [standard set of
|
||||||
|
flags](/docs/commands) included on all commands.
|
||||||
|
|
||||||
|
### Output Options
|
||||||
|
|
||||||
|
- `-format` `(string: "table")` - Print the output in the given format. Valid
|
||||||
|
formats are "table", "json", or "yaml". This can also be specified via the
|
||||||
|
`VAULT_FORMAT` environment variable.
|
||||||
|
|
||||||
|
#### Output Layout
|
||||||
|
|
||||||
|
The operator diagnose command will output a set of lines in the CLI.
|
||||||
|
Each line will begin with a prefix in parenthesis. These are:.
|
||||||
|
|
||||||
|
- `[ success ]` - Denotes that the check was successful.
|
||||||
|
- `[ warning ]` - Denotes that the check has passed, but that there may be potential
|
||||||
|
issues to look into that may relate to the issues vault is experiencing. Diagnose warns
|
||||||
|
frequently. These warnings are meant to serve as starting points in the debugging process.
|
||||||
|
- `[ failure ]` - Denotes that the check has failed. Failures are critical issues in the eyes
|
||||||
|
of the diagnose command.
|
||||||
|
|
||||||
|
In addition to these prefixed lines, there may be output lines that are not prefixed, but are
|
||||||
|
color-coded purple. These are advice lines from Diagnose, and are meant to offer general guidance
|
||||||
|
on how to go about fixing potential warnings or failures that may arise.
|
||||||
|
|
||||||
|
Warn or fail prefixes in nested checks will bubble up to the parent if the prefix superceeds the
|
||||||
|
parent prefix. Fail superceeds warn, and warn superceeds ok. For example, if the TLS checks under
|
||||||
|
the Storage check fails, the `[ failure ]` prefix will bubble up to the Storage check.
|
||||||
|
|
||||||
|
### Command Options
|
||||||
|
|
||||||
|
- `-config` `(string; "")` - The path to the vault configuration file used by
|
||||||
|
the vault server on startup.
|
||||||
|
|
||||||
|
### Diagnose Checks
|
||||||
|
|
||||||
|
The following section details the various checks that Diagnose runs. Check names in documentation
|
||||||
|
will be separated by slashes to denote that they are nested, when applicable. For example, a check
|
||||||
|
documented as `A / B` will show up as `B` in the `operator diagnose` output, and will be nested
|
||||||
|
(indented) under `A`.
|
||||||
|
|
||||||
|
#### Vault Diagnose
|
||||||
|
|
||||||
|
`Vault Diagnose` is the top level check that contains the rest of the checks. It will report the status
|
||||||
|
of the check
|
||||||
|
|
||||||
|
#### Check Operating System / Check Open File Limit
|
||||||
|
|
||||||
|
`Check Open File Limit` verifies that the open file limit value is set high enough for vault
|
||||||
|
to run effectively. We recommend setting these limits to at least 1024768.
|
||||||
|
|
||||||
|
This check will be skipped on openbsd, arm, and windows.
|
||||||
|
|
||||||
|
#### Check Operating System / Check Disk Usage
|
||||||
|
|
||||||
|
`Check Disk Usage` will report disk usage for each partition. For each partition on a prod host,
|
||||||
|
we recommend having at least 5% of the partition free to use, and at least 1 GB of space.
|
||||||
|
|
||||||
|
This check will be skipped on openbsd and arm.
|
||||||
|
|
||||||
|
#### Parse Configuration
|
||||||
|
|
||||||
|
`Parse Configuration` will check the vault server config file for syntax errors. It will check
|
||||||
|
for extra values in the configuration file, repeated stanzas, and stanzas that do not belong
|
||||||
|
in the configuration file (for example a "tcpp" listener as opposed to a tcp listener).
|
||||||
|
|
||||||
|
Currently, the `storage` stanza is not checked.
|
||||||
|
|
||||||
|
#### Check Storage / Create Storage Backend
|
||||||
|
|
||||||
|
`Create Storage Backend` ensures that the storage stanza configured in the vault server config
|
||||||
|
has enough information to create a storage object internally. Common errors will have to do
|
||||||
|
with misconfigured fields in the storage stanza.
|
||||||
|
|
||||||
|
#### Check Storage / Check Consul TLS
|
||||||
|
|
||||||
|
`Check Consul TLS` verifies TLS information included in the storage stanza if the storage type
|
||||||
|
is consul. If a certificate chain is provided, Diagnose parses the root, intermediate, and leaf
|
||||||
|
certificates, and checks each one for correctness.
|
||||||
|
|
||||||
|
#### Check Storage / Check Consul Direct Storage Access
|
||||||
|
|
||||||
|
`Check Consul Direct Storage Access` is a consul-specific check that ensures Vault is not accessing
|
||||||
|
the consul server directly, but rather through a local agent.
|
||||||
|
|
||||||
|
#### Check Storage / Check Raft Folder Permissions
|
||||||
|
|
||||||
|
`Check Raft Folder Permissions` computes the permissions on the raft folder, checks that a boltDB file
|
||||||
|
has been initialized within the folder previously, and ensures that the folder is not too permissive, but
|
||||||
|
at the same time has enough permissions to be used. The raft folder should not have `other` permissions, but
|
||||||
|
should have `group rw` or `owner rw`, depending on different setups. This check also warns if it detects a
|
||||||
|
symlink being used.
|
||||||
|
|
||||||
|
Note that this check will warn that a raft file has not been created if diagnose is run without any
|
||||||
|
pre-existing server runs.
|
||||||
|
|
||||||
|
This check will be skipped on windows.
|
||||||
|
|
||||||
|
#### Check Storage / Check Raft Folder Ownership
|
||||||
|
|
||||||
|
`Check Raft Folder Ownership` ensures that vault does not need to run as root to access the boltDB folder.
|
||||||
|
|
||||||
|
Note that this check will warn that a raft file has not been created if diagnose is run without any
|
||||||
|
pre-existing server runs.
|
||||||
|
|
||||||
|
This check will be skipped on windows.
|
||||||
|
|
||||||
|
#### Check Storage / Check For Raft Quorum
|
||||||
|
|
||||||
|
`Check For Raft Quorum` uses the FSM to ensure that there were an odd number of voters in the raft quorum when
|
||||||
|
vault was last running.
|
||||||
|
|
||||||
|
Note that this check will warn that there are 0 voters if diagnose is run without any pre-existing server runs.
|
||||||
|
|
||||||
|
#### Check Storage / Check Storage Access
|
||||||
|
|
||||||
|
`Check Storage Access` will try to write a dud value, named `diagnose/latency/<uuid>`, to storage.
|
||||||
|
Ensure that there is no important data at this location before running diagnose, as this check
|
||||||
|
will overwrite that data. This check will then try to list and read the value it wrote to ensure
|
||||||
|
the name and value is as expected.
|
||||||
|
|
||||||
|
`Check Storage Access` will warn if any operation takes longer than 100ms, and error out if the
|
||||||
|
entire check takes longer than 30s.
|
||||||
|
|
||||||
|
#### Check Service Discovery / Check Consul Service Discovery TLS
|
||||||
|
|
||||||
|
`Check Consul Service Discovery TLS` verifies TLS information included in the service discovery
|
||||||
|
stanza if the storage type is consul. If a certificate chain is provided, Diagnose parses
|
||||||
|
the root, intermediate, and leaf certificates, and checks each one for correctness.
|
||||||
|
|
||||||
|
#### Check Service Discovery / Check Consul Direct Service Discovery
|
||||||
|
|
||||||
|
`Check Consul Direct Service Discovery` is a consul-specific check that ensures Vault
|
||||||
|
is not accessing the consul server directly, but rather through a local agent.
|
||||||
|
|
||||||
|
#### Create Vault Server Configuration Seals
|
||||||
|
|
||||||
|
`Create Vault Server Configuration Seals` creates seals from the vault configuration
|
||||||
|
stanza and verifies they can be initialized and finalized.
|
||||||
|
|
||||||
|
#### Check Transit Seal TLS
|
||||||
|
|
||||||
|
`Check Transit Seal TLS` checks the TLS client certificate, key, and CA certificate
|
||||||
|
provided in a transit seal stanza (if one exists) for correctness.
|
||||||
|
|
||||||
|
#### Create Core Configuration / Initialize Randomness for Core
|
||||||
|
|
||||||
|
`Initialize Randomness for Core` ensures that vault has access to the randReader that
|
||||||
|
the vault core uses.
|
||||||
|
|
||||||
|
#### HA Storage
|
||||||
|
|
||||||
|
This check and any nested checks will be the same as the `Check Storage` checks.
|
||||||
|
The only difference is that the checks here will be run on whatever is specified in the
|
||||||
|
`ha_storage` section of the vault configuration, as opposed to the `storage` section.
|
||||||
|
|
||||||
|
#### Determine Redirect Address
|
||||||
|
|
||||||
|
Ensures that one of the `VAULT_API_ADDR`, `VAULT_REDIRECT_ADDR`, or `VAULT_ADVERTISE_ADDR`
|
||||||
|
environment variables are set, or that the redirect address is specified in the vault
|
||||||
|
configuration.
|
||||||
|
|
||||||
|
#### Check Cluster Address
|
||||||
|
|
||||||
|
Parses the cluster address from the `VAULT_CLUSTER_ADDR` environment variable, or from the
|
||||||
|
redirect address or cluster address specified in the vault configuration, and checks that
|
||||||
|
the address is of the form `host:port`.
|
||||||
|
|
||||||
|
#### Check Core Creation
|
||||||
|
|
||||||
|
`Check Core Creation` verifies the logical configuration checks that vault does when it
|
||||||
|
creates a core object. These are runtime checks, meaning any errors thrown by this diagnose
|
||||||
|
test will also be thrown by the vault server itself when it is run.
|
||||||
|
|
||||||
|
#### Check For Autoloaded License
|
||||||
|
|
||||||
|
`Check For Autoloaded License` is an enterprise diagnose check, which verifies that vault
|
||||||
|
has access to a valid autoloaded license that will not expire in the next 30 days.
|
||||||
|
|
||||||
|
#### Start Listeners / Check Listener TLS
|
||||||
|
|
||||||
|
`Check Listener TLS` verifies the server certificate file and key are valid and matching.
|
||||||
|
It also checks the client CA file, if one is provided, for a valid certificate, and performs
|
||||||
|
the standard runtime listener checks on the listener configuration stanza, such as verifying
|
||||||
|
that the minimum and maximum TLS versions are within the bounds of what vault supports.
|
||||||
|
|
||||||
|
Like all the other Diagnose TLS checks, it will warn if any of the certificates provided are
|
||||||
|
set to expire within the next month.
|
||||||
|
|
||||||
|
#### Start Listeners / Create Listeners
|
||||||
|
|
||||||
|
`Create Listeners` uses the listener configuration to initialize the listeners, erroring with
|
||||||
|
a server error if anything goes wrong.
|
||||||
|
|
||||||
|
#### Check Autounseal Encryption
|
||||||
|
|
||||||
|
`Check Autounseal Encryption` will initialize the barrier using the seal stanza, if the seal
|
||||||
|
type is not a shamir seal, and use it to encrypt and decrypt a dud value.
|
||||||
|
|
||||||
|
#### Check Server Before Runtime
|
||||||
|
|
||||||
|
`Check Server Before Runtime` achieves parity with the server run command, running through
|
||||||
|
the runtime code checks before the server is initialized to ensure that nothing fails.
|
||||||
|
This check will never fail without another diagnose check failing.
|
Loading…
Reference in New Issue