---
layout: docs
page_title: Transform - Secrets Engines - Tokenization
sidebar_title: Tokenization Transform <sup>ENTERPRISE</sup>
description: >-
  More information on the Tokenization transform.
---
# Tokenization Transform
Not to be confused with Vault tokens, [Tokenization](transform/tokenization) exchanges a
sensitive value for an unrelated value called a _token_. The original sensitive
value cannot be recovered from a token alone; tokens are irreversible. Unlike
format preserving encryption, tokenization is stateful: to decode the original
value, the token must be submitted to Vault, where the plaintext is retrieved
from a cryptographic mapping in storage.
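
As a minimal sketch of this round trip, assuming the Transform secrets engine is
mounted at `transform/` and a tokenization transformation is already attached to a
hypothetical role named `mobile-pay`:

```shell-session
# Encode (tokenize) a sensitive value; the unrelated token is returned
# in the response's encoded_value field.
$ vault write transform/encode/mobile-pay value="4111-1111-1111-1111"

# Decode the token; Vault looks the original value up in its stored
# mapping and returns it as decoded_value.
$ vault write transform/decode/mobile-pay value="<token returned by encode>"
```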
## Operation
On encode, Vault generates a random, signed token and stores a mapping from a
version of that token to encrypted copies of the plaintext and metadata, along
with a fingerprint of the original plaintext. The fingerprint backs the `tokenized`
endpoint, which lets one query whether a given plaintext exists in the system.
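
For illustration, the fingerprint lookup can be exercised as follows (a hedged
sketch; the `tokenized` path shape and the `mobile-pay` role are assumptions
based on the endpoint named above):

```shell-session
# Ask whether this plaintext has already been tokenized under the role.
# The response indicates existence only; it does not return the token.
$ vault write transform/tokenized/mobile-pay value="4111-1111-1111-1111"
```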
Depending on the mapping mode, the plaintext may be decoded only with possession
of the distributed token, or may also be recoverable via the export operation. See
[Security Considerations](#security-considerations) for more.
## Performance Considerations
### Builtin (Internal) Store
As tokenization is stateful, the encode operation necessarily writes values to
storage. By default, that storage is the Vault backend store itself. This
differs from some secrets engines in that both encode and decode require a
storage access per operation, whereas other engines use storage for configuration
but can process operations largely without accessing any storage.
Since encode involves writes to storage and therefore must be performed on
primary nodes, the scalability of the encode operation is limited by the
performance of the primary and its storage subsystem. All other operations can
be performed on secondaries.
Finally, due to replication, writes to the primary may take some time to reach
secondaries, so read operations like decode or metadata retrieval may not succeed
on a secondary until the write has replicated. In other words, tokenization is
eventually consistent.
### External Storage
With external storage, all nodes (except DR secondaries) can participate in all
operations, but one must take care to monitor and scale the external store for
the level of traffic it receives. The storage schema is simple, however, and
well-known scaling approaches should be effective.
## Security Considerations
The goal of Tokenization is to let end users' devices store the token rather than
their sensitive values (such as credit card numbers) and still participate in
transactions where the token is a stand-in for the sensitive value. For this reason
the token Vault generates is completely unrelated to (and irreversible from) the
sensitive value.
Furthermore, the Tokenization transform is designed to resist a number of attacks
on the values produced during encode. In particular, it is designed so that
attackers cannot recover plaintext even if they steal the tokenization values
from Vault itself. In the default mapping mode,
even stealing the underlying transform key does not allow them to recover
the plaintext without also possessing the encoded token. An attacker would need
access to every value in the construct.
In the `exportable` mapping mode, however, the plaintext values are encrypted
in a way that can be decrypted within Vault. If an attacker possesses the
transform key and the tokenization mapping values, the plaintext can be
recovered. This mode is available for the case where operators prioritize the
ability to export all of the plaintext values in an emergency, via the export
operation.
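
The mapping mode is chosen when the tokenization transformation is created. The
following is a hedged sketch; the transformation name, role name, and TTL are
placeholders, and the exact parameters are described in the transform API docs:

```shell-session
# Create a tokenization transformation whose plaintext remains recoverable
# via the export operation (the trade-off discussed above).
$ vault write transform/transformations/tokenization/credit-card \
    allowed_roles=mobile-pay \
    mapping_mode=exportable \
    max_ttl=8760h

# Omitting mapping_mode (or setting mapping_mode=default) keeps the default,
# non-exportable behavior.
```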
### Metadata
Since tokenization isn't format preserving and requires storage, one can associate
arbitrary metadata with a token. Metadata is considered less sensitive than the
original plaintext value. As it has its own retrieval endpoint, operators can
configure policies that allow access to the metadata of a token but not to
its decoded value, enabling workflows that operate only on the metadata.
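
For example (a sketch under the assumption that the tokenization encode endpoint
accepts a `metadata` string and that metadata is read back via a `metadata`
endpoint analogous to `decode`; the role name is hypothetical):

```shell-session
# Encode a value and attach non-sensitive metadata to the resulting token.
$ vault write transform/encode/mobile-pay \
    value="4111-1111-1111-1111" \
    metadata="last_four=1111 expiration=12/25"

# Retrieve only the metadata for a token; a policy can grant this path
# while denying transform/decode/mobile-pay.
$ vault write transform/metadata/mobile-pay value="<token>"
```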
## TTLs and Tidying
By default, tokens are long lived, and the storage for them will be maintained
indefinitely. Where the underlying value has a natural lifetime, it is strongly
encouraged that tokens be generated with a TTL. For example, as credit cards
have an expiration date, it is recommended that tokenizing a credit card
primary account number (PAN) be done with a TTL that corresponds to the time
after which the PAN is invalid.
This allows such values to be _tidied_ and removed from storage once expired.
Tokens themselves encode the expiration time, so decode and other operations
can be rejected immediately when presented with an expired token.
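
As a hedged sketch, assuming the tokenization encode endpoint accepts a `ttl`
parameter and reusing the hypothetical role from earlier examples:

```shell-session
# Tokenize a PAN with a TTL matching the card's remaining validity, so the
# stored mapping can be tidied after the card expires.
$ vault write transform/encode/mobile-pay \
    value="4111-1111-1111-1111" \
    ttl=17520h
```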
### Key Management
Tokenization supports key rotation. Keys are tied to transforms, so key
names are the same as the name of the corresponding tokenization transform.
Keys can be rotated to a new version, with backward compatibility for
decoding. Encoding is always performed with the newest key version. Key versions
can be tidied as well. For more information, see the [transform API docs](../../../api-docs/secret/transform).
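
A minimal sketch of rotation, assuming a tokenization transform named
`credit-card`; the exact key endpoints are documented in the API docs linked above:

```shell-session
# Rotate the key backing the credit-card tokenization transform to a new
# version; existing tokens continue to decode with older key versions.
$ vault write -f transform/tokenization/keys/credit-card/rotate
```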
## External Storage
### SQL Stores
Currently only PostgreSQL is supported as an external storage backend for tokenization.
The [Schema Endpoint](../../../api-docs/secret/transform#create-update-store-schema) may be used to initialize and upgrade the necessary
database tables. Vault uses a schema versioning table to determine whether it needs
to create or modify the tables when using that endpoint. If you make changes to
those tables yourself, the automatic schema management may fall out of sync
and fail in the future.
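
A hedged sketch of wiring up a PostgreSQL store follows; the store name,
connection string, and credentials are placeholders, and the exact parameters
are described in the API docs linked above:

```shell-session
# Register an external SQL store for tokenization state.
$ vault write transform/stores/postgres-tokens \
    type=sql \
    driver=postgres \
    connection_string="postgresql://{{username}}:{{password}}@postgres.internal:5432/tokens" \
    username=vault \
    password=<password> \
    supported_transformations=tokenization

# Create (or upgrade) the tables the store needs, using credentials with
# sufficient privileges to run DDL statements.
$ vault write transform/stores/postgres-tokens/schema \
    transformation_type=tokenization \
    username=vault_admin \
    password=<password>
```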
## Learn
Refer to [Tokenize Data with Transform Secrets
Engine](https://learn.hashicorp.com/tutorials/vault/tokenization) for a
step-by-step tutorial.