rocksdb/utilities/transactions
Yu Zhang 9c94559de7 Optimize compaction for standalone range deletion files (#13078)
Summary:
This PR adds optimizations for compacting standalone range deletion files. A standalone range deletion file is one that contains just a single range deletion. Currently, such a file is used in bulk loading to atomically delete the old version of all data with one big range deletion while adding the new version of the data. The PR includes these changes:

1) When a standalone range deletion file is ingested via bulk loading, it's marked for compaction.
2) When picking input files during compaction picking, we attempt to pick a standalone range deletion file only when the oldest snapshot is at or above the file's seqno. To do this, the `PickCompaction` API is updated to take existing snapshots as an input. This is only done for the universal compaction + UDT disabled combination; for all other cases, we skip querying for existing snapshots and do not pass them.
3) At `Compaction` construction time, the input files are filtered to determine whether any of them can be skipped by the compaction iterator. For example, if all the data of a file is deleted by a standalone range tombstone, and the oldest snapshot is at or above that range tombstone, the file is filtered out.
4) Every time a snapshot is released, we check whether any column family has standalone range deletion files that have become eligible for compaction and, if so, schedule a compaction for it.
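
The filtering rule in 3) can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `FileInfo`, `StandaloneRangeTombstone`, and `FilterInputFiles` are hypothetical, simplified names, keys are compared as plain strings rather than through a user comparator, and "no snapshot" is assumed to be passed as the maximum `uint64_t` value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT RocksDB's real types.
struct FileInfo {
  std::string smallest_key;  // inclusive
  std::string largest_key;   // inclusive
  uint64_t largest_seqno;    // newest sequence number stored in the file
};

struct StandaloneRangeTombstone {
  std::string start_key;  // inclusive
  std::string end_key;    // exclusive
  uint64_t seqno;         // sequence number of the range deletion
};

// Returns the input files that still have to be read by the compaction.
// A file is dropped only when all three conditions hold:
//   * the oldest snapshot is at or above the tombstone's seqno, so no
//     snapshot can still observe the deleted data,
//   * the file's whole key range lies inside [start_key, end_key),
//   * every entry in the file is older than the tombstone.
std::vector<FileInfo> FilterInputFiles(const std::vector<FileInfo>& inputs,
                                       const StandaloneRangeTombstone& t,
                                       uint64_t oldest_snapshot_seqno) {
  assert(t.start_key < t.end_key);
  const bool visible_to_all = oldest_snapshot_seqno >= t.seqno;
  std::vector<FileInfo> kept;
  for (const FileInfo& f : inputs) {
    const bool covered =
        t.start_key <= f.smallest_key && f.largest_key < t.end_key;
    const bool older = f.largest_seqno < t.seqno;
    if (visible_to_all && covered && older) {
      continue;  // fully shadowed by the tombstone: skip this file
    }
    kept.push_back(f);
  }
  return kept;
}
```

With a tombstone over `["a", "g")` at seqno 10, files that lie entirely inside that range and hold only older entries are dropped once the oldest snapshot reaches seqno 10. While an older snapshot is still live, every file is kept, which is why releasing a snapshot (step 4) can make a file newly eligible.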

Potential future improvements:
- Add some dedicated statistics for the filtered files.
- Extend this input filtering to L0 file compactions, where a newer L0 file could shadow an older L0 file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13078

Test Plan: Added unit tests and stress tested a few rounds

Reviewed By: cbi42

Differential Revision: D64879415

Pulled By: jowlyzhang

fbshipit-source-id: 02b8683fddbe11f093bcaa0a38406deb39f44d9e
2024-10-25 09:32:14 -07:00
lock Fix deprecated use of 0/NULL in internal_repo_rocksdb/repo/util/xxhash.h + 5 2024-04-01 21:20:51 -07:00
optimistic_transaction.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
optimistic_transaction.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
optimistic_transaction_db_impl.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
optimistic_transaction_db_impl.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
optimistic_transaction_test.cc Add GetEntityForUpdate to optimistic and WriteCommitted pessimistic transactions (#12668) 2024-05-20 10:43:05 -07:00
pessimistic_transaction.cc Add a TransactionOptions to enable tracking timestamp size info inside WriteBatch (#12864) 2024-08-05 13:06:45 -07:00
pessimistic_transaction.h Add an option to toggle timestamp based validation for the whole DB (#12857) 2024-07-29 13:54:37 -07:00
pessimistic_transaction_db.cc Make transaction name conflict check more robust (#12895) 2024-07-30 12:31:02 -07:00
pessimistic_transaction_db.h Make transaction name conflict check more robust (#12895) 2024-07-30 12:31:02 -07:00
snapshot_checker.cc Optimize compaction for standalone range deletion files (#13078) 2024-10-25 09:32:14 -07:00
timestamped_snapshot_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
transaction_base.cc Fix bug for recovering a prepared but not committed txn (#12856) 2024-07-11 16:25:35 -07:00
transaction_base.h Add GetEntityForUpdate to optimistic and WriteCommitted pessimistic transactions (#12668) 2024-05-20 10:43:05 -07:00
transaction_db_mutex_impl.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
transaction_db_mutex_impl.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_test.cc Fix rebuilding transactions containing PutEntity (#12681) 2024-05-21 17:22:20 -07:00
transaction_test.h Rename, deprecate LogFile and VectorLogPtr (#12695) 2024-05-28 09:24:49 -07:00
transaction_util.cc Add an option to toggle timestamp based validation for the whole DB (#12857) 2024-07-29 13:54:37 -07:00
transaction_util.h Add an option to toggle timestamp based validation for the whole DB (#12857) 2024-07-29 13:54:37 -07:00
write_committed_transaction_ts_test.cc Add a TransactionOptions to enable tracking timestamp size info inside WriteBatch (#12864) 2024-08-05 13:06:45 -07:00
write_prepared_transaction_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
write_prepared_txn.cc Add an option to toggle timestamp based validation for the whole DB (#12857) 2024-07-29 13:54:37 -07:00
write_prepared_txn.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
write_prepared_txn_db.cc Add public API WriteWithCallback to support custom callbacks (#12603) 2024-05-31 19:30:19 -07:00
write_prepared_txn_db.h Properly disable MultiCFIterator in WritePrepared/UnPreparedTxnDBs (#12883) 2024-07-24 16:50:12 -07:00
write_unprepared_transaction_test.cc Refactor WriteUnpreparedStressTest to be a unit test (#11424) 2023-05-22 12:31:52 -07:00
write_unprepared_txn.cc Do not add unprep_seqs when WriteImpl() fails in unprepared txn (#12927) 2024-08-15 09:16:29 -07:00
write_unprepared_txn.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
write_unprepared_txn_db.cc Add public API WriteWithCallback to support custom callbacks (#12603) 2024-05-31 19:30:19 -07:00
write_unprepared_txn_db.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00