mirror of https://github.com/facebook/rocksdb.git
d13825e586
Summary: This PR does not affect write-committed. Add a member, `rollback_deletion_type_callback` to TransactionDBOptions so that a write-prepared transaction, when rolling back, can call this callback to decide if a `Delete` or `SingleDelete` should be used to cancel a prior `Put` written to the database during prepare phase. The purpose of this PR is to prevent mixing `Delete` and `SingleDelete` for the same key, causing undefined behaviors. Without this PR, the following can happen: ``` // The application always issues SingleDelete when deleting keys. txn1->Put('a'); txn1->Prepare(); // writes to memtable and potentially gets flushed/compacted to Lmax txn1->Rollback(); // inserts DELETE('a') txn2->Put('a'); txn2->Commit(); // writes to memtable and potentially gets flushed/compacted ``` In the database, we may have ``` L0: [PUT('a', s=100)] L1: [DELETE('a', s=90)] Lmax: [PUT('a', s=0)] ``` If a compaction compacts L0 and L1, then we have ``` L1: [PUT('a', s=100)] Lmax: [PUT('a', s=0)] ``` If a future transaction issues a SingleDelete, we have ``` L0: [SD('a', s=110)] L1: [PUT('a', s=100)] Lmax: [PUT('a', s=0)] ``` Then, a compaction including L0, L1 and Lmax leads to ``` Lmax: [PUT('a', s=0)] ``` which is incorrect. Similar bugs reported and addressed in https://github.com/cockroachdb/pebble/issues/1255. Based on our team's current priority, we have decided to take this approach for now. We may come back and revisit in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9873 Test Plan: make check Reviewed By: ltamasi Differential Revision: D35762170 Pulled By: riversand963 fbshipit-source-id: b28d56eefc786b53c9844b9ef4a7807acdd82c8d |
||
---|---|---|
.. | ||
agg_merge | ||
backup | ||
blob_db | ||
cassandra | ||
checkpoint | ||
compaction_filters | ||
convenience | ||
leveldb_options | ||
memory | ||
merge_operators | ||
option_change_migration | ||
options | ||
persistent_cache | ||
simulator_cache | ||
table_properties_collectors | ||
trace | ||
transactions | ||
ttl | ||
write_batch_with_index | ||
cache_dump_load.cc | ||
cache_dump_load_impl.cc | ||
cache_dump_load_impl.h | ||
compaction_filters.cc | ||
counted_fs.cc | ||
counted_fs.h | ||
debug.cc | ||
env_mirror.cc | ||
env_mirror_test.cc | ||
env_timed.cc | ||
env_timed.h | ||
env_timed_test.cc | ||
fault_injection_env.cc | ||
fault_injection_env.h | ||
fault_injection_fs.cc | ||
fault_injection_fs.h | ||
fault_injection_secondary_cache.cc | ||
fault_injection_secondary_cache.h | ||
memory_allocators.h | ||
merge_operators.cc | ||
merge_operators.h | ||
object_registry.cc | ||
object_registry_test.cc | ||
util_merge_operators_test.cc | ||
wal_filter.cc |