---
layout: docs
page_title: Transform - Secrets Engines - Tokenization
sidebar_title: Tokenization Transform <sup>ENTERPRISE</sup>
description: >-
  More information on the Tokenization transform.
---
# Tokenization Transform
Not to be confused with Vault tokens, [Tokenization](transform/tokenization) exchanges a
sensitive value for an unrelated value called a *token*. The original sensitive
value cannot be recovered from a token alone; tokens are irreversible. Unlike
format preserving encryption, tokenization is stateful: to
decode the original value, the token must be submitted to Vault where it is
retrieved from a cryptographic mapping in storage.
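
Concretely, assuming the Transform secrets engine is mounted at `transform/` and a role with a tokenization transformation has already been configured (the role name `mobile-pay` below is hypothetical), a round trip might look like:

```shell
# Encode: Vault returns a random token unrelated to the input
$ vault write transform/encode/mobile-pay value=1111-2222-3333-4444

# Decode: only Vault can map the token back to the original value
$ vault write transform/decode/mobile-pay value=<token-from-encode>
```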
## Operation
On encode, Vault generates a random, signed token and stores a mapping of a
version of that token to encrypted versions of the plaintext and metadata, as
well as a fingerprint of the original plaintext, which facilitates the `tokenized`
endpoint that lets one query whether a plaintext exists in the system.
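
For example, that check might look like the following (role name hypothetical):

```shell
# Ask whether this plaintext has previously been tokenized
$ vault write transform/tokenized/mobile-pay value=1111-2222-3333-4444
```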
Depending on the mapping mode, the plaintext may be decoded only with possession
of the distributed token, or may be recoverable in the export operation. See
[Security Considerations](#security-considerations) for more.
## Performance Considerations
### Builtin (Internal) Store
As tokenization is stateful, the encode operation necessarily writes values to
storage. By default, that storage is the Vault backend store itself. This
differs from some secret engines in that the encode and decode operations require
a storage access per operation. Other engines use storage for configuration
but can process operations largely without accessing any storage.
Since encode operations involve writes to storage, they must be performed on
primary nodes, and the scalability of encoding is therefore limited by the
performance of the primary and its storage subsystem. All other operations can
be performed on secondaries.
Finally, due to replication, writes to the primary may take some time to reach
secondaries, so other read operations like decode or metadata may not succeed on
the secondaries until this happens. In other words, tokenization is eventually
consistent.
### External Storage
With external storage, all nodes (except DR secondaries) can participate in all
operations, but one must take care
to monitor and scale the external storage for the level of traffic experienced.
The storage schema is simple, however, and well-known scaling approaches should be effective.
## Security Considerations
The goal of Tokenization is to let end users' devices store the token rather than
their sensitive values (such as credit card numbers) and still participate in
transactions where the token is a stand-in for the sensitive value. For this reason,
the token Vault generates is completely unrelated (and therefore irreversible) to the
sensitive value.
Furthermore, the Tokenization transform is designed to resist a number of attacks
on the values produced during encode. In particular, it is designed so that
attackers cannot recover plaintext even if they steal the tokenization values
from Vault itself. In the default mapping mode,
even stealing the underlying transform key does not allow them to recover
the plaintext without also possessing the encoded token. An attacker must have
obtained access to all values in the construction.
In the `exportable` mapping mode, however, the plaintext values are encrypted
in a way that can be decrypted within Vault. If an attacker possesses the
transform key and the tokenization mapping values, the plaintext can be
recovered. This mode is available for the case where operators prioritize the
ability to export all of the plaintext values in an emergency, via the export
operation.
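
The mapping mode is chosen when the tokenization transformation is created; a sketch, with hypothetical transformation and role names:

```shell
# Default mode: decoding requires the distributed token
$ vault write transform/transformations/tokenization/credit-card \
    allowed_roles=mobile-pay \
    mapping_mode=default

# Exportable mode: permits emergency export of all plaintexts
$ vault write transform/transformations/tokenization/credit-card \
    allowed_roles=mobile-pay \
    mapping_mode=exportable
```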
### Metadata
Since tokenization isn't format preserving and requires storage, one can associate
arbitrary metadata with a token. Metadata is considered less sensitive than the
original plaintext value. As it has its own retrieval endpoint, operators can
configure policies that may allow access to the metadata of a token but not
its decoded value, enabling workflows that operate only on the metadata.
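
For example (role name and metadata keys hypothetical):

```shell
# Attach metadata when encoding
$ vault write transform/encode/mobile-pay value=1111-2222-3333-4444 \
    metadata="type=visa" metadata="expiration=2024-10"

# Retrieve only the metadata; a policy can grant this path
# without granting the decode path
$ vault write transform/metadata/mobile-pay value=<token>
```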
## TTLs and Tidying
By default, tokens are long lived, and the storage for them will be maintained
indefinitely. Where there is a concept of time-to-live, it is strongly encouraged
that the tokens be generated with a TTL. For example, as credit cards
have an expiration date, it is recommended that tokenizing a credit card
primary account number (PAN) be done with a TTL that corresponds to the time
after which the PAN is invalid.
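
A TTL can be supplied at encode time; for example (role name hypothetical):

```shell
# Tokenize a PAN with a TTL matching the card's remaining validity
$ vault write transform/encode/mobile-pay value=1111-2222-3333-4444 ttl=8760h
```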
This allows such values to be *tidied* and removed from storage once expired.
Tokens themselves encode the expiration time, so Vault can immediately reject
decode and other operations when presented with an expired token.
## External Storage
### SQL Stores
Currently only PostgreSQL is supported as an external storage backend for tokenization.
The [Schema Endpoint](../../../api-docs/secret/transform#create-update-store-schema) may be used to initialize and upgrade the necessary
database tables. Vault uses a schema versioning table to determine if it needs
to create or modify the tables when using that endpoint. If you make changes to
those tables yourself, the automatic schema management may become out of sync
and may fail in the future.
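
A sketch of registering a PostgreSQL store and initializing its schema (the store name and connection details below are hypothetical):

```shell
# Register the external PostgreSQL store with the transform engine
$ vault write transform/stores/postgres \
    type=sql driver=postgres \
    supported_transformations=tokenization \
    connection_string="postgresql://{{username}}:{{password}}@localhost:5432/tokens" \
    username=vault_user password=<password>

# Create or upgrade the tables; Vault tracks the schema version internally
$ vault write transform/stores/postgres/schema \
    transformation_type=tokenization \
    username=admin password=<password>
```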
## Learn
Refer to [Tokenize Data with Transform Secrets
Engine](https://learn.hashicorp.com/tutorials/vault/tokenization) for a
step-by-step tutorial.