Diagnose docs + changelog (#12159)
* save
* diagnose docs
* changelog
* changelog formatting

parent f7ecb978a6
commit fff7dc7a40

@ -0,0 +1,3 @@
```release-note:feature
operator diagnose: a new vault operator command to detect common issues with Vault server setups.
```

@ -0,0 +1,228 @@
---
layout: docs
page_title: operator diagnose - Command
description: |-
  "vault operator diagnose" is a new operator-centric command, focused on providing a clear description
  of what is working in Vault, and what is not working. The command focuses on why Vault cannot serve requests,
  but will also warn on configurations or statuses that it deems to be unsafe in some way.

---

# operator diagnose

The operator diagnose command should be used primarily when Vault is down or
partially inoperable. The command can be used safely regardless of the state
Vault is in, but may return meaningless results for some of the checks if the
Vault server is already running.

Note: if you run the diagnose command proactively, either before a server
starts or while a server is operational, please consult the documentation
on the individual checks below to see which checks may return false errors
or warnings.

## Usage

The following flags are available in addition to the [standard set of
flags](/docs/commands) included on all commands.

### Output Options

- `-format` `(string: "table")` - Print the output in the given format. Valid
  formats are "table", "json", or "yaml". This can also be specified via the
  `VAULT_FORMAT` environment variable.

#### Output Layout

The operator diagnose command will output a set of lines in the CLI.
Each line will begin with a prefix in square brackets. These are:

- `[ success ]` - Denotes that the check was successful.
- `[ warning ]` - Denotes that the check has passed, but that there may be potential
  issues to look into that may relate to the issues Vault is experiencing. Diagnose warns
  frequently. These warnings are meant to serve as starting points in the debugging process.
- `[ failure ]` - Denotes that the check has failed. Failures are critical issues in the eyes
  of the diagnose command.

In addition to these prefixed lines, there may be output lines that are not prefixed, but are
color-coded purple. These are advice lines from Diagnose, and are meant to offer general guidance
on how to go about fixing potential warnings or failures that may arise.

Warning or failure prefixes in nested checks will bubble up to the parent if the prefix supersedes the
parent prefix. Failure supersedes warning, and warning supersedes success. For example, if the TLS checks
under the Storage check fail, the `[ failure ]` prefix will bubble up to the Storage check.

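The bubbling rule above can be sketched in a few lines of Python. This is only an illustration of the described ordering (failure over warning over success); the `Check` class and `effective_status` function are hypothetical names and are not part of Vault.

```python
from dataclasses import dataclass, field
from typing import List

# Severity order taken from the text: failure > warning > success.
SEVERITY = {"success": 0, "warning": 1, "failure": 2}


@dataclass
class Check:
    """Hypothetical stand-in for one Diagnose check and its nested checks."""
    name: str
    status: str = "success"
    children: List["Check"] = field(default_factory=list)


def effective_status(check: Check) -> str:
    """Return the most severe status among a check and all of its descendants."""
    worst = check.status
    for child in check.children:
        child_status = effective_status(child)
        if SEVERITY[child_status] > SEVERITY[worst]:
            worst = child_status
    return worst


# A failing TLS check nested under the storage check marks the parent as failed.
storage = Check("Check Storage", children=[
    Check("Create Storage Backend"),
    Check("Check Consul TLS", status="failure"),
])
print(f"[ {effective_status(storage)} ] {storage.name}")
```

The recursion means a warning or failure several levels deep still reaches the top-level check.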
### Command Options

- `-config` `(string: "")` - The path to the Vault configuration file used by
  the Vault server on startup.

### Diagnose Checks

The following section details the various checks that Diagnose runs. Check names in this documentation
are separated by slashes to denote nesting, when applicable. For example, a check
documented as `A / B` will show up as `B` in the `operator diagnose` output, and will be nested
(indented) under `A`.

#### Vault Diagnose

`Vault Diagnose` is the top-level check that contains the rest of the checks. It will report the
overall status of the checks nested under it.

#### Check Operating System / Check Open File Limit

`Check Open File Limit` verifies that the open file limit value is set high enough for Vault
to run effectively. We recommend setting these limits to at least 1024768.

This check will be skipped on OpenBSD, ARM, and Windows.

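To compare a host against the recommendation above before running Diagnose, a rough Python sketch follows. It uses the standard `resource` module (Unix only) and the 1024768 figure quoted in the text. Reporting a shortfall as a warning is an assumption made for illustration, and `open_file_limit_ok` is a hypothetical helper, not Vault code.

```python
import resource  # Unix-only standard library module

RECOMMENDED_OPEN_FILES = 1024768  # value quoted in the text above


def open_file_limit_ok(soft_limit: int, recommended: int = RECOMMENDED_OPEN_FILES) -> bool:
    """Return True when the soft limit is unlimited or meets the recommendation."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended


# Read the current process limits for open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Assumption: a low limit is reported as a warning rather than a failure.
status = "success" if open_file_limit_ok(soft) else "warning"
print(f"[ {status} ] open file limit: soft={soft} hard={hard}")
```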
#### Check Operating System / Check Disk Usage

`Check Disk Usage` will report disk usage for each partition. For each partition on a production host,
we recommend having at least 5% of the partition free to use, and at least 1 GB of free space.

This check will be skipped on OpenBSD and ARM.

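The two thresholds above can be checked by hand with Python's `shutil.disk_usage`. This is a minimal sketch, assuming 1 GB is read as 2**30 bytes and that both conditions must hold. It inspects only the root partition, whereas Diagnose reports on every partition; `disk_usage_ok` is a hypothetical helper.

```python
import os
import shutil

MIN_FREE_FRACTION = 0.05       # at least 5% of the partition free
MIN_FREE_BYTES = 1 * 1024**3   # at least 1 GB, read here as 2**30 bytes (assumption)


def disk_usage_ok(total: int, free: int) -> bool:
    """Return True when both the percentage and the absolute threshold are met."""
    if total <= 0:
        return False
    return free / total >= MIN_FREE_FRACTION and free >= MIN_FREE_BYTES


# Only the root of the current drive is inspected in this sketch.
root = os.path.abspath(os.sep)
usage = shutil.disk_usage(root)
verdict = "meets" if disk_usage_ok(usage.total, usage.free) else "is below"
print(f"{root}: {usage.free} of {usage.total} bytes free, {verdict} the recommendation")
```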
#### Parse Configuration

`Parse Configuration` will check the Vault server config file for syntax errors. It will check
for extra values in the configuration file, repeated stanzas, and stanzas that do not belong
in the configuration file (for example a "tcpp" listener as opposed to a tcp listener).

Currently, the `storage` stanza is not checked.

#### Check Storage / Create Storage Backend

`Create Storage Backend` ensures that the storage stanza configured in the Vault server config
has enough information to create a storage object internally. Common errors involve
misconfigured fields in the storage stanza.

#### Check Storage / Check Consul TLS

`Check Consul TLS` verifies TLS information included in the storage stanza if the storage type
is Consul. If a certificate chain is provided, Diagnose parses the root, intermediate, and leaf
certificates, and checks each one for correctness.

#### Check Storage / Check Consul Direct Storage Access

`Check Consul Direct Storage Access` is a Consul-specific check that ensures Vault is not accessing
the Consul server directly, but rather through a local agent.

#### Check Storage / Check Raft Folder Permissions

`Check Raft Folder Permissions` computes the permissions on the Raft folder, checks that a BoltDB file
has previously been initialized within the folder, and ensures that the folder is not too permissive while
still having enough permissions to be used. The Raft folder should not have `other` permissions, but
should have `group rw` or `owner rw`, depending on the setup. This check also warns if it detects a
symlink being used.

Note that this check will warn that a Raft file has not been created if Diagnose is run before the
server has run at least once.

This check will be skipped on Windows.

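The permission rules above can be approximated with Python's `os` and `stat` modules. The sketch below is an illustration under stated assumptions: it covers only the mode bits and the symlink warning, it omits the BoltDB file check, and `raft_folder_findings` and `check_raft_folder` are hypothetical helper names.

```python
import os
import stat
import tempfile


def raft_folder_findings(mode: int) -> list:
    """Return findings for a directory mode, following the rules in the text."""
    findings = []
    if mode & stat.S_IRWXO:
        findings.append("folder grants permissions to other users")
    owner_rw = bool(mode & stat.S_IRUSR) and bool(mode & stat.S_IWUSR)
    group_rw = bool(mode & stat.S_IRGRP) and bool(mode & stat.S_IWGRP)
    if not (owner_rw or group_rw):
        findings.append("neither owner nor group has read/write access")
    return findings


def check_raft_folder(path: str) -> list:
    """Combine the symlink warning with the mode-bit findings for a path."""
    findings = []
    if os.path.islink(path):
        findings.append("path is a symlink")
    findings.extend(raft_folder_findings(stat.S_IMODE(os.stat(path).st_mode)))
    return findings


# Demonstrate on a throwaway directory restricted to its owner.
with tempfile.TemporaryDirectory() as folder:
    os.chmod(folder, 0o700)
    print(check_raft_folder(folder))
```

An empty list means the mode bits satisfy both rules: nothing granted to `other`, and read/write available to the owner or the group.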
#### Check Storage / Check Raft Folder Ownership

`Check Raft Folder Ownership` ensures that Vault does not need to run as root to access the BoltDB folder.

Note that this check will warn that a Raft file has not been created if Diagnose is run before the
server has run at least once.

This check will be skipped on Windows.

#### Check Storage / Check For Raft Quorum

`Check For Raft Quorum` uses the FSM to ensure that there was an odd number of voters in the Raft quorum when
Vault was last running.

Note that this check will warn that there are 0 voters if Diagnose is run before the server has run at least once.

#### Check Storage / Check Storage Access

`Check Storage Access` will try to write a dud value, named `diagnose/latency/<uuid>`, to storage.
Ensure that there is no important data at this location before running Diagnose, as this check
will overwrite that data. This check will then try to list and read the value it wrote to ensure
the name and value are as expected.

`Check Storage Access` will warn if any operation takes longer than 100ms, and error out if the
entire check takes longer than 30s.

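The write, list, and read sequence and its two thresholds can be modeled as follows. This is a sketch against a hypothetical in-memory backend, not Vault's storage interface. The `InMemoryStorage` class, the `check_storage_access` function, and the placeholder value `b"diagnose"` are all assumptions made for illustration.

```python
import time
import uuid

WARN_SECONDS = 0.1   # warn when a single operation exceeds 100ms
FAIL_SECONDS = 30.0  # fail when the whole check exceeds 30s


class InMemoryStorage:
    """Hypothetical key/value backend, standing in for real Vault storage."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def list(self, prefix):
        return [key for key in self._data if key.startswith(prefix)]

    def get(self, key):
        return self._data.get(key)


def check_storage_access(storage, clock=time.monotonic):
    """Write, list, and read one value; return (status, warnings)."""
    prefix = "diagnose/latency/"
    key = prefix + str(uuid.uuid4())
    value = b"diagnose"  # placeholder payload (assumption)
    warnings = []
    started = clock()

    def timed(label, operation):
        # Time a single storage operation and record a warning if it is slow.
        began = clock()
        result = operation()
        if clock() - began > WARN_SECONDS:
            warnings.append(f"{label} took longer than 100ms")
        return result

    timed("write", lambda: storage.put(key, value))
    listed = timed("list", lambda: storage.list(prefix))
    read_back = timed("read", lambda: storage.get(key))

    # The name and value must match what was written.
    if key not in listed or read_back != value:
        return "failure", warnings
    if clock() - started > FAIL_SECONDS:
        return "failure", warnings
    return ("warning" if warnings else "success"), warnings


status, warnings = check_storage_access(InMemoryStorage())
print(f"[ {status} ] Check Storage Access", warnings)
```

Passing the clock as a parameter keeps the timing logic testable without real delays.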
#### Check Service Discovery / Check Consul Service Discovery TLS

`Check Consul Service Discovery TLS` verifies TLS information included in the service discovery
stanza if the storage type is Consul. If a certificate chain is provided, Diagnose parses
the root, intermediate, and leaf certificates, and checks each one for correctness.

#### Check Service Discovery / Check Consul Direct Service Discovery

`Check Consul Direct Service Discovery` is a Consul-specific check that ensures Vault
is not accessing the Consul server directly, but rather through a local agent.

#### Create Vault Server Configuration Seals

`Create Vault Server Configuration Seals` creates seals from the Vault configuration
stanza and verifies they can be initialized and finalized.

#### Check Transit Seal TLS

`Check Transit Seal TLS` checks the TLS client certificate, key, and CA certificate
provided in a transit seal stanza (if one exists) for correctness.

#### Create Core Configuration / Initialize Randomness for Core

`Initialize Randomness for Core` ensures that Vault has access to the randReader that
the Vault core uses.

#### HA Storage

This check and any nested checks will be the same as the `Check Storage` checks.
The only difference is that the checks here will be run on whatever is specified in the
`ha_storage` section of the Vault configuration, as opposed to the `storage` section.

#### Determine Redirect Address

Ensures that one of the `VAULT_API_ADDR`, `VAULT_REDIRECT_ADDR`, or `VAULT_ADVERTISE_ADDR`
environment variables is set, or that the redirect address is specified in the Vault
configuration.

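A quick way to see which source would supply the redirect address is sketched below. The lookup order among the three environment variables, and the choice to consult the environment before the configuration, are assumptions made for illustration. `determine_redirect_address` is a hypothetical helper.

```python
import os

# Environment variables named in the text; the lookup order is an assumption.
REDIRECT_ENV_VARS = ("VAULT_API_ADDR", "VAULT_REDIRECT_ADDR", "VAULT_ADVERTISE_ADDR")


def determine_redirect_address(env, configured_addr=""):
    """Return the first redirect address found, or None if nothing is set."""
    for name in REDIRECT_ENV_VARS:
        value = env.get(name, "")
        if value:
            return value
    # Fall back to the address from the configuration file, if any.
    return configured_addr or None


addr = determine_redirect_address(os.environ)
print("redirect address:", addr if addr else "not set")
```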
#### Check Cluster Address

Parses the cluster address from the `VAULT_CLUSTER_ADDR` environment variable, or from the
redirect address or cluster address specified in the Vault configuration, and checks that
the address is of the form `host:port`.

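The `host:port` requirement can be illustrated with a small validator. This sketch is not Vault's parser: tolerating a leading scheme and a trailing path, requiring brackets around IPv6 literals, and bounding the port to 1-65535 are all assumptions, and the function names are hypothetical.

```python
def split_host_port(address: str):
    """Return (host, port) for 'host:port'; raise ValueError otherwise."""
    # Assumption: tolerate a leading scheme and trailing path, e.g. https://host:8201/
    remainder = address.split("://", 1)[-1].split("/", 1)[0]
    host, sep, port = remainder.rpartition(":")
    if not sep or not host:
        raise ValueError(f"{address!r} is not of the form host:port")
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # bracketed IPv6 literal
    elif ":" in host:
        raise ValueError(f"{address!r}: IPv6 hosts must be wrapped in brackets")
    if not (port.isascii() and port.isdigit()) or not 0 < int(port) <= 65535:
        raise ValueError(f"{address!r} has an invalid port")
    return host, int(port)


def is_host_port(address: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        split_host_port(address)
    except ValueError:
        return False
    return True


for candidate in ("127.0.0.1:8201", "https://vault.example.com:8201", "vault.example.com"):
    print(candidate, "->", "ok" if is_host_port(candidate) else "not host:port")
```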
#### Check Core Creation

`Check Core Creation` verifies the logical configuration checks that Vault does when it
creates a core object. These are runtime checks, meaning any errors thrown by this diagnose
test will also be thrown by the Vault server itself when it is run.

#### Check For Autoloaded License

`Check For Autoloaded License` is an enterprise diagnose check, which verifies that Vault
has access to a valid autoloaded license that will not expire in the next 30 days.

#### Start Listeners / Check Listener TLS

`Check Listener TLS` verifies the server certificate file and key are valid and matching.
It also checks the client CA file, if one is provided, for a valid certificate, and performs
the standard runtime listener checks on the listener configuration stanza, such as verifying
that the minimum and maximum TLS versions are within the bounds of what Vault supports.

Like all the other Diagnose TLS checks, it will warn if any of the certificates provided are
set to expire within the next month.

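The expiry warning described above comes down to a date comparison, sketched here in Python. Reading "the next month" as 30 days and treating an already expired certificate as a failure are assumptions made for illustration. `expiry_status` is a hypothetical helper, and the sketch takes the expiry timestamp as given rather than parsing a certificate.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "within the next month" is taken as 30 days.
EXPIRY_WARNING_WINDOW = timedelta(days=30)


def expiry_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate by its expiry time relative to now."""
    if not_after <= now:
        return "failure"  # assumption: an expired certificate fails the check
    if not_after - now <= EXPIRY_WARNING_WINDOW:
        return "warning"
    return "success"


now = datetime(2021, 7, 1, tzinfo=timezone.utc)
for label, not_after in [
    ("expires in 10 days", now + timedelta(days=10)),
    ("expires in 90 days", now + timedelta(days=90)),
]:
    print(f"[ {expiry_status(not_after, now)} ] {label}")
```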
#### Start Listeners / Create Listeners

`Create Listeners` uses the listener configuration to initialize the listeners, erroring with
a server error if anything goes wrong.

#### Check Autounseal Encryption

`Check Autounseal Encryption` will initialize the barrier using the seal stanza, if the seal
type is not a Shamir seal, and use it to encrypt and decrypt a dud value.

#### Check Server Before Runtime

`Check Server Before Runtime` achieves parity with the server run command, running through
the runtime code checks before the server is initialized to ensure that nothing fails.
This check will never fail without another diagnose check failing.