---
layout: docs
page_title: Transform - Secrets Engines - Tokenization
description: >-
  More information on the Tokenization transform.
---

# Tokenization Transform

Not to be confused with Vault tokens, Tokenization exchanges a sensitive value
for an unrelated value called a _token_. The original sensitive value cannot be
recovered from a token alone; tokens are irreversible. Unlike format preserving
encryption, tokenization is stateful: to decode the original value, the token
must be submitted to Vault, where the value is retrieved from a cryptographic
mapping in storage.

## Operation

On encode, Vault generates a random, signed token and stores a mapping from a
version of that token to encrypted versions of the plaintext and metadata. It
also stores a fingerprint of the original plaintext, which backs the
`tokenized` endpoint used to query whether a plaintext exists in the system.
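
The encode flow above can be sketched as a toy, in-memory model. This is not
Vault's actual construction: the `ToyTokenizer` class, the HMAC-SHA256
derivations, and the HMAC-based keystream are assumptions chosen so the sketch
runs with only the Python standard library, and token signing is omitted. It
shows the pieces the text names: a random token, a mapping keyed by a derived
version of the token to encrypted plaintext, and a plaintext fingerprint that
supports a `tokenized`-style lookup.

```python
import hashlib
import hmac
import secrets


class ToyTokenizer:
    """Toy in-memory tokenization mapping (illustrative, not Vault's scheme)."""

    def __init__(self, transform_key: bytes) -> None:
        self._key = transform_key
        self._mapping = {}          # derived token -> ciphertext
        self._fingerprints = set()  # keyed fingerprints of plaintexts

    def _derive(self, label: bytes, data: bytes) -> bytes:
        # Keyed one-way derivation: storage never holds the raw token or plaintext.
        return hmac.new(self._key, label + data, hashlib.sha256).digest()

    def _keystream(self, token: bytes, length: int) -> bytes:
        # Depends on both the transform key and the token, so the stored
        # ciphertext cannot be decrypted without the token.
        out = b""
        counter = 0
        while len(out) < length:
            block = counter.to_bytes(4, "big")
            out += hmac.new(self._key, b"ks" + token + block, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def encode(self, plaintext: str) -> str:
        token = secrets.token_bytes(16)  # random, unrelated to the plaintext
        data = plaintext.encode()
        stream = self._keystream(token, len(data))
        self._mapping[self._derive(b"map", token)] = bytes(a ^ b for a, b in zip(data, stream))
        self._fingerprints.add(self._derive(b"fp", data))
        return token.hex()

    def decode(self, token_hex: str) -> str:
        token = bytes.fromhex(token_hex)
        ciphertext = self._mapping[self._derive(b"map", token)]  # KeyError if unknown
        stream = self._keystream(token, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode()

    def tokenized(self, plaintext: str) -> bool:
        # Answers "has this plaintext been tokenized?" without decoding anything.
        return self._derive(b"fp", plaintext.encode()) in self._fingerprints
```

Encoding the same value twice yields two different tokens, both of which decode
back to that value, while `tokenized` reports whether the value was ever
encoded.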

Depending on the mapping mode, the plaintext may be decoded only with possession
of the distributed token, or may also be recoverable through the export
operation. See [Security Considerations](#security-considerations) for more.

## Performance Considerations

### Builtin (Internal) Store

As tokenization is stateful, the encode operation necessarily writes values to
storage. By default, that storage is the Vault backend store itself. This
differs from some secret engines in that the encode and decode operations each
require a storage access. Other engines use storage for configuration but can
process operations largely without accessing any storage.

Since encode writes to storage, and writes must be performed on primary nodes,
the scalability of the encode operation is limited by the performance of the
primary and its storage subsystem. All other operations can be performed on
secondaries.

Finally, due to replication, writes to the primary may take some time to reach
secondaries, so read operations such as decode or metadata retrieval may not
succeed on the secondaries until replication catches up. In other words,
tokenization is eventually consistent.
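
A client that reads from secondaries has to tolerate this lag. A minimal sketch,
assuming a hypothetical `decode` callable that raises `KeyError` while a token
has not yet replicated (a real client would map whatever error the secondary
returns onto this condition), retries with exponential backoff. The
`LaggingSecondary` class is hypothetical and exists only to simulate
replication delay.

```python
import time


def decode_with_retry(decode, token, attempts=5, base_delay=0.05):
    """Call decode(token), retrying with exponential backoff while the token
    is not yet visible on this node (signalled here by KeyError)."""
    for attempt in range(attempts):
        try:
            return decode(token)
        except KeyError:
            if attempt == attempts - 1:
                raise  # replication never caught up; surface the error
            time.sleep(base_delay * (2 ** attempt))


class LaggingSecondary:
    """Hypothetical secondary that only sees a write after `lag` failed reads."""

    def __init__(self, token, value, lag):
        self._token, self._value, self._remaining = token, value, lag

    def decode(self, token):
        if self._remaining > 0:
            self._remaining -= 1
            raise KeyError(token)  # write has not replicated yet
        if token != self._token:
            raise KeyError(token)
        return self._value
```

An alternative design is to fall back to the primary after a bounded number of
retries, trading extra primary load for lower read latency.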

### External Storage

With external storage, all nodes (except DR secondaries) can participate in all
operations, but the external storage must be monitored and scaled for the level
of traffic experienced. The storage schema is simple, however, so well-known
database scaling approaches should be effective.

## Security Considerations

The goal of Tokenization is to let end users' devices store the token rather
than their sensitive values (such as credit card numbers) and still participate
in transactions where the token is a stand-in for the sensitive value. For this
reason, the token Vault generates is completely unrelated to the sensitive value
and cannot be reversed into it.

Furthermore, the Tokenization transform is designed to resist a number of
attacks on the values produced during encode. In particular, it is designed so
that attackers cannot recover plaintext even if they steal the tokenization
values from Vault itself. In the default mapping mode, even stealing the
underlying transform key does not allow them to recover the plaintext without
also possessing the encoded token. An attacker must obtain all values in the
construct: the stored tokenization values, the transform key, and the token.

In the `exportable` mapping mode, however, the plaintext values are encrypted
in a way that can be decrypted within Vault. If the attacker possesses the
transform key and the tokenization mapping values, the plaintext can be
recovered. This mode is available for cases where operators prioritize the
ability to export all of the plaintext values in an emergency via the
`export-decoded` operation.

### Metadata

Since tokenization isn't format preserving and requires storage, one can
associate arbitrary metadata with a token. Metadata is considered less
sensitive than the original plaintext value. Because metadata has its own
retrieval endpoint, operators can configure policies that allow access to the
metadata of a token but not to its decoded value, enabling workflows that
operate only on the metadata.
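
In Vault this separation is expressed through policies on the corresponding
endpoints. The toy below only models the idea: decode and metadata reads are
gated independently, so a caller can hold one permission without the other.
`ToyTokenStore`, `Forbidden`, and the capability labels `decode` and
`metadata` are hypothetical and are not Vault policy syntax.

```python
class Forbidden(Exception):
    """Raised when the caller lacks the capability for an operation."""


class ToyTokenStore:
    """Toy store whose decode and metadata reads are authorized separately."""

    def __init__(self):
        self._records = {}  # token -> (plaintext, metadata)

    def put(self, token, plaintext, metadata):
        self._records[token] = (plaintext, metadata)

    def decode(self, token, capabilities):
        if "decode" not in capabilities:
            raise Forbidden("decode not permitted")
        return self._records[token][0]

    def metadata(self, token, capabilities):
        if "metadata" not in capabilities:
            raise Forbidden("metadata not permitted")
        return self._records[token][1]
```

For example, an analytics job granted only `{"metadata"}` can read a token's
metadata, while the same job calling `decode` raises `Forbidden`.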

## TTLs and Tidying

By default, tokens are long lived, and the storage for them is maintained
indefinitely. Where there is a concept of time-to-live, it is strongly
encouraged that tokens be generated with a TTL. For example, as credit cards
have an expiration date, it is recommended that a credit card primary account
number (PAN) be tokenized with a TTL that corresponds to the time after which
the PAN is invalid.
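
Computing such a TTL from a card's expiration date is simple date arithmetic. A
minimal sketch, assuming the card stays valid through the last second of its
expiration month in UTC; the function name `pan_ttl_seconds` is hypothetical.

```python
import calendar
from datetime import datetime, timezone


def pan_ttl_seconds(exp_month: int, exp_year: int, now: datetime) -> int:
    """Seconds from `now` until the end of the card's expiration month.

    Assumes validity through 23:59:59 UTC on the month's last day; returns 0
    if the card has already expired.
    """
    last_day = calendar.monthrange(exp_year, exp_month)[1]
    expires = datetime(exp_year, exp_month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return max(0, int((expires - now).total_seconds()))
```

For a card expiring 12/2030, `pan_ttl_seconds(12, 2030, datetime.now(timezone.utc))`
gives the number of seconds to request as the token's TTL at encode time.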

This allows such values to be _tidied_ and removed from storage once expired.
Tokens themselves encode the expiration time, so decode and other operations
can immediately reject an expired token.
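
Both behaviors can be modeled with a token that carries its own expiry. In this
toy the token format `<random>.<expires_at>` is an assumption and is not Vault's
format; it is also unsigned, whereas a real token must be authenticated (Vault's
tokens are signed) so the expiry cannot be altered. Times are plain integer
seconds to keep the sketch deterministic.

```python
import secrets


class ExpiringTokenStore:
    """Toy store whose tokens embed their expiration time (hypothetical format)."""

    def __init__(self):
        self._values = {}  # token -> plaintext

    def encode(self, plaintext: str, ttl: int, now: int) -> str:
        token = f"{secrets.token_hex(8)}.{now + ttl}"  # expiry embedded in token
        self._values[token] = plaintext
        return token

    @staticmethod
    def _expires_at(token: str) -> int:
        return int(token.rsplit(".", 1)[1])

    def decode(self, token: str, now: int) -> str:
        # Reject expired tokens from the token alone, before touching storage.
        if now >= self._expires_at(token):
            raise ValueError("token expired")
        return self._values[token]

    def tidy(self, now: int) -> int:
        """Remove expired entries from storage; return how many were removed."""
        expired = [t for t in self._values if now >= self._expires_at(t)]
        for t in expired:
            del self._values[t]
        return len(expired)
```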

## Storage

### External SQL Stores

Currently, the PostgreSQL and MySQL relational databases are supported as
external storage backends for tokenization. The
[Schema Endpoint](../../../api-docs/secret/transform#create-update-store-schema)
may be used to initialize and upgrade the necessary database tables. Vault uses
a schema versioning table to determine whether it needs to create or modify the
tables when using that endpoint. If you modify those tables yourself, the
automatic schema management may fall out of sync and fail in the future.
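
The schema versioning table pattern can be sketched with SQLite from the Python
standard library. The `tokens` table, its columns, and the `MIGRATIONS` list are
hypothetical and do not reflect Vault's actual tokenization schema; the sketch
only shows how a recorded version decides which changes still need to run.

```python
import sqlite3

# Hypothetical, ordered migrations; Vault's real tokenization schema differs.
MIGRATIONS = [
    "CREATE TABLE tokens (token_hash BLOB PRIMARY KEY, ciphertext BLOB NOT NULL)",
    "ALTER TABLE tokens ADD COLUMN expiration INTEGER",
]


def schema_version(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT version FROM schema_version").fetchone()[0]


def upgrade_schema(conn: sqlite3.Connection) -> int:
    """Apply migrations newer than the recorded version; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    if conn.execute("SELECT COUNT(*) FROM schema_version").fetchone()[0] == 0:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
    for version in range(schema_version(conn) + 1, len(MIGRATIONS) + 1):
        conn.execute(MIGRATIONS[version - 1])
        conn.execute("UPDATE schema_version SET version = ?", (version,))
    conn.commit()
    return schema_version(conn)
```

Running `upgrade_schema` twice is harmless because the second run finds nothing
newer to apply. If the `tokens` table were created or altered by hand without
updating `schema_version`, the next call would try to re-run a migration and
fail, which is the out-of-sync failure described above.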

External stores are often preferred because they can achieve much higher
performance at scale, especially when used with batch operations.
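
Batching amortizes per-request overhead across many values. A minimal sketch
that groups values into JSON request bodies; the `batch_input`, `value`, and
`transformation` field names are assumed to mirror the transform API's batch
parameters (verify against the API docs for your Vault version), and the
transformation name and batch size are hypothetical values to tune.

```python
import json


def batch_payloads(values, transformation, batch_size=100):
    """Yield JSON request bodies that each carry up to batch_size values.

    Field names are assumed to follow the transform API's batch_input
    parameter; batch_size is an arbitrary starting point, not a recommendation.
    """
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        yield json.dumps({
            "batch_input": [
                {"value": v, "transformation": transformation} for v in chunk
            ]
        })
```

Each body would then be sent as one encode request, so 250 values become three
requests instead of 250.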

### Snapshot/Restore

Snapshot allows one to iteratively retrieve the tokenization state for backup or
migration purposes. The resulting data can be fed to the restore endpoint of the
same or a different tokenization store. Note that the state is only usable by
the tokenization transform that created it, as the state is encrypted via keys
in that configured transform.
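
Iterative retrieval is a paging loop: request a page, restore it, and continue
until no continuation remains. In this sketch `ToySnapshotStore`, its
`snapshot`/`restore` methods, the integer-offset continuation, and the page size
are all hypothetical stand-ins for the real endpoints. Entries are copied as
opaque encrypted state and never decrypted, consistent with state being usable
only by the transform that created it.

```python
class ToySnapshotStore:
    """In-memory stand-in for a tokenization store (hypothetical interface)."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})  # token hash -> opaque encrypted state

    def snapshot(self, limit, continuation=0):
        """Return (page, next_continuation); next_continuation is None when done."""
        keys = sorted(self.entries)
        page = [(k, self.entries[k]) for k in keys[continuation:continuation + limit]]
        end = continuation + limit
        return page, (end if end < len(keys) else None)

    def restore(self, page):
        self.entries.update(page)


def copy_state(source, target, limit=2):
    """Snapshot `source` page by page, restoring each page into `target`."""
    continuation, pages = 0, 0
    while continuation is not None:
        page, continuation = source.snapshot(limit, continuation)
        target.restore(page)
        pages += 1
    return pages
```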

### Export Decoded

For stores configured with the `exportable` mapping mode, the export decoded
endpoint allows operators to retrieve the _decoded_ contents of tokenization
state, which includes tokens and their decoded, sensitive values. The
`exportable` mode is only recommended if this use case is required, as values
stored in the default mode cannot be decoded by attackers even if they gain
access to Vault's storage and keys.

### Migration

Tokenization stores are configured separately from the tokenization transform,
and a transform can point to multiple stores. The primary use case for this
one-to-many relationship is to facilitate migration between two tokenization
stores.

When multiple stores are configured, Vault writes new tokenization state to all
configured stores and reads from each store in the order they were configured.
Thus, one can use multiple configured stores along with the snapshot/restore
functionality to perform a zero-downtime migration to a new store:

1. Configure the new tokenization store in the API.
1. Modify the existing tokenization transform to use both the existing and new
   stores.
1. Snapshot the old store.
1. Restore the snapshot to the new store.
1. Perform any desired validations.
1. Modify the tokenization transform to use only the new store.
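
The write-to-all, read-in-order behavior and the migration steps above can be
modeled with plain dictionaries standing in for stores. This is a toy under
that assumption: `MultiStoreTransform` and `migrate` are hypothetical names, and
in Vault each step is an API call rather than a Python assignment.

```python
class MultiStoreTransform:
    """Toy transform that writes to every configured store and reads in order."""

    def __init__(self, stores):
        self.stores = list(stores)  # each store: dict of token -> state

    def write(self, token, state):
        for store in self.stores:
            store[token] = state

    def read(self, token):
        for store in self.stores:  # configured order
            if token in store:
                return store[token]
        raise KeyError(token)


def migrate(transform, old_store, new_store):
    """Walk through the migration steps listed above (toy model)."""
    # Steps 1-2: configure the new store and use both; new writes reach both.
    transform.stores = [old_store, new_store]
    # Steps 3-4: snapshot the old store and restore it into the new one,
    # keeping any entry already written there by the dual writes.
    for token, state in dict(old_store).items():
        new_store.setdefault(token, state)
    # Step 5: validate that everything in the old store exists in the new one.
    missing = [t for t in old_store if t not in new_store]
    if missing:
        raise RuntimeError(f"not migrated: {missing}")
    # Step 6: use only the new store.
    transform.stores = [new_store]
```

Because writes made between steps 2 and 6 land in both stores, no state is lost
while the snapshot is being copied.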

## Key Management

Tokenization supports key rotation. Keys are tied to transforms, so key names
are the same as the name of the corresponding tokenization transform. Keys can
be rotated to a new version, with backward compatibility for decoding.
Encoding is always performed with the newest key version. Key versions can be
tidied as well. For more information, see the
[transform API docs](../../../api-docs/secret/transform).
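
The rotation rules can be summarized as a versioned keyring: encode always uses
the newest version, decode can use any version still retained, and tidying
drops old versions. `ToyKeyring` and its methods are hypothetical, and the
assumption that a token records which key version encoded it is made only so
decode can find the right key; Vault's actual key storage and token format are
not shown.

```python
import secrets


class ToyKeyring:
    """Versioned keys for a single tokenization transform (toy model)."""

    def __init__(self, transform_name):
        self.name = transform_name  # key name matches the transform name
        self._versions = {1: secrets.token_bytes(32)}

    @property
    def latest(self):
        return max(self._versions)

    def rotate(self):
        """Add a new key version; it becomes the encode key."""
        self._versions[self.latest + 1] = secrets.token_bytes(32)
        return self.latest

    def encode_key(self):
        # Encoding always uses the newest version.
        return self.latest, self._versions[self.latest]

    def decode_key(self, version):
        # Older versions remain usable for decoding until tidied.
        try:
            return self._versions[version]
        except KeyError:
            raise KeyError(f"key version {version} of {self.name!r} is unavailable") from None

    def tidy(self, min_version):
        """Drop versions older than min_version, never removing the newest."""
        min_version = min(min_version, self.latest)
        for v in [v for v in self._versions if v < min_version]:
            del self._versions[v]
```

After a rotation, tokens recorded with version 1 still decode until
`tidy(2)` removes that version; from then on only tokens encoded with
version 2 or later can be decoded.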

## Learn

Refer to [Tokenize Data with Transform Secrets
Engine](https://learn.hashicorp.com/tutorials/vault/tokenization) for a
step-by-step tutorial.