rocksdb/db
Levi Tamasi e367bc7f4b Clean up blob files based on the linked SST set (#7001)
Summary:
The earlier `VersionBuilder` code only cleaned up blob files that were
marked as entirely consisting of garbage using `VersionEdits` with
`BlobFileGarbage`. This covers the cases when table files go through
regular compaction, where we iterate through the KVs and thus have an
opportunity to calculate the amount of garbage (that is, most cases).
However, it does not help when table files are simply dropped (e.g. deletion
compactions or the `DeleteFile` API). To deal with such cases, the patch
adds logic that cleans up all blob files at the head of the list until the first
one with linked SSTs is found. (As an example, let's assume we have blob files
with numbers 1..10, and the first one with any linked SSTs is number 8.
This means that SSTs in the `Version` only rely on blob files with numbers >= 8,
and thus 1..7 are no longer needed.)
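
To make the head-trimming rule concrete, here is a minimal sketch of the idea described above. It is not the actual `VersionBuilder` code: `BlobFileInfo`, `BlobFiles`, `DropUnlinkedHead`, and `MakeExample` are hypothetical names invented for illustration, and the sketch assumes blob files are kept in a container ordered by file number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>

// Hypothetical stand-in for per-blob-file metadata; not the RocksDB type.
struct BlobFileInfo {
  // Numbers of the table files (SSTs) that reference this blob file.
  std::set<uint64_t> linked_ssts;
};

// Blob files keyed by file number, so iteration is in ascending order.
using BlobFiles = std::map<uint64_t, BlobFileInfo>;

// Removes every blob file that precedes the first one with linked SSTs.
// Returns the number of blob files removed.
std::size_t DropUnlinkedHead(BlobFiles& blob_files) {
  auto it = blob_files.begin();
  while (it != blob_files.end() && it->second.linked_ssts.empty()) {
    ++it;
  }
  const auto removed =
      static_cast<std::size_t>(std::distance(blob_files.begin(), it));
  blob_files.erase(blob_files.begin(), it);
  // Postcondition: either nothing is left, or the head has linked SSTs.
  assert(blob_files.empty() || !blob_files.begin()->second.linked_ssts.empty());
  return removed;
}

// Builds the example from the summary: blob files 1..10, where number 8 is
// the first one with a linked SST (the SST number 100 is arbitrary).
BlobFiles MakeExample() {
  BlobFiles files;
  for (uint64_t n = 1; n <= 10; ++n) {
    files[n];  // default-constructs an entry with no linked SSTs
  }
  files[8].linked_ssts.insert(100);
  return files;
}
```

Calling `DropUnlinkedHead` on the result of `MakeExample()` removes blob files 1..7 and leaves 8..10. Files 9 and 10 are kept in this sketch even though they have no linked SSTs, because only the head of the list is trimmed.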

The code change itself is pretty small; however, changing the logic like this
necessitated changes to some tests that have been added recently (namely
to the ones that use blob files in isolation, i.e. without any table files referring
to them). Some of these cases were fixed by bypassing `VersionBuilder` altogether
in order to keep the tests simple (which also brings them closer to true unit tests),
while the `VersionBuilder` unit tests were fixed by adding dummy table
files to the test cases as needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7001

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D22119474

Pulled By: ltamasi

fbshipit-source-id: c6547141355667d4291d9661d6518eb741e7b54a
2020-06-30 15:31:21 -07:00
blob Move kNoExpiration to blob_db.h (#7018) 2020-06-23 13:45:06 -07:00
compaction Compaction filter support for BlobDB (#6850) 2020-06-29 17:32:14 -07:00
db_impl Divide WriteCallbackTest.WriteWithCallbackTest (#7037) 2020-06-30 12:31:30 -07:00
arena_wrapped_db_iter.cc Fix a bug that causes iterator to return wrong result in a rare data race (#6973) 2020-06-18 10:16:38 -07:00
arena_wrapped_db_iter.h Iterator with timestamp (#6255) 2020-03-06 16:24:27 -08:00
builder.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
builder.h Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
c.cc Expose KeyMayExist in the C API (#7021) 2020-06-29 12:21:53 -07:00
c_test.c Expose KeyMayExist in the C API (#7021) 2020-06-29 12:21:53 -07:00
column_family.cc Attempt to recover from db with missing table files (#6334) 2020-03-20 19:30:48 -07:00
column_family.h Attempt to recover from db with missing table files (#6334) 2020-03-20 19:30:48 -07:00
column_family_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
compact_files_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
compacted_db_impl.cc return timestamp from get (#6409) 2020-03-02 16:01:00 -08:00
compacted_db_impl.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
comparator_db_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
convenience.cc sst_dump to reduce number of file reads (#6836) 2020-05-12 18:23:33 -07:00
corruption_test.cc Check iterator status BlockBasedTableReader::VerifyChecksumInBlocks() (#6909) 2020-06-05 11:08:25 -07:00
cuckoo_table_db_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_basic_test.cc Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
db_block_cache_test.cc Fix potential overflow of unsigned type in for loop (#6902) 2020-06-02 15:05:07 -07:00
db_bloom_filter_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_compaction_filter_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_compaction_test.cc Remove an assertion in FlushAfterIntraL0CompactionCheckConsistencyFail (#7003) 2020-06-19 16:58:29 -07:00
db_dynamic_level_test.cc C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
db_encryption_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_filesnapshot.cc First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
db_flush_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_info_dumper.cc Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_io_failure_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_iter.cc Add timestamp to delete (#6253) 2020-05-28 10:40:03 -07:00
db_iter.h make iterator return versions between timestamp bounds (#6544) 2020-04-10 09:51:58 -07:00
db_iter_stress_test.cc Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
db_iter_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_iterator_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_log_iter_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_logical_block_size_cache_test.cc Get block size only in direct IO mode (#6522) 2020-03-20 15:26:10 -07:00
db_memtable_test.cc Remove racially charged terms "whitelist" and "blacklist" (#7008) 2020-06-19 15:27:32 -07:00
db_merge_operand_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_merge_operator_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_options_test.cc Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
db_properties_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_range_del_test.cc Fix potential overflow of unsigned type in for loop (#6902) 2020-06-02 15:05:07 -07:00
db_sst_test.cc Add logs and stats in DeleteScheduler (#6927) 2020-06-05 09:43:04 -07:00
db_statistics_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_table_properties_test.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
db_tailing_iter_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_test.cc Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
db_test2.cc Move away from direct TmpDir() call in some tests (#7030) 2020-06-25 12:09:57 -07:00
db_test_util.cc Fix failure to write output in SpecialEnv::GetCurrentTime (#6803) 2020-05-05 13:11:29 -07:00
db_test_util.h Disable fsync in some tests to speed them up (#7036) 2020-06-29 16:56:59 -07:00
db_universal_compaction_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_wal_test.cc Add OptionTypeInfo::Enum and related methods (#6423) 2020-05-05 15:04:04 -07:00
db_with_timestamp_basic_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
db_with_timestamp_compaction_test.cc Compaction with timestamp: input boundaries (#6645) 2020-04-10 16:05:49 -07:00
db_write_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
dbformat.cc Add timestamp to delete (#6253) 2020-05-28 10:40:03 -07:00
dbformat.h Add timestamp to delete (#6253) 2020-05-28 10:40:03 -07:00
dbformat_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
deletefile_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
error_handler.cc First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
error_handler.h Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) 2020-03-27 16:04:43 -07:00
error_handler_fs_test.cc Revamp cache_bench to resemble a real workload (#6629) 2020-04-03 10:26:49 -07:00
event_helpers.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
event_helpers.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
experimental.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
external_sst_file_basic_test.cc Ingest SST files with checksum information (#6891) 2020-06-11 14:27:36 -07:00
external_sst_file_ingestion_job.cc Ingest SST files with checksum information (#6891) 2020-06-11 14:27:36 -07:00
external_sst_file_ingestion_job.h Ingest SST files with checksum information (#6891) 2020-06-11 14:27:36 -07:00
external_sst_file_test.cc Disable fsync in some tests to speed them up (#7036) 2020-06-29 16:56:59 -07:00
fault_injection_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
file_indexer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
filename_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
flush_job.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
flush_job.h Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
flush_job_test.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
flush_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
flush_scheduler.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
forward_iterator.cc make L0 index/filter pinned memory usage predictable (#6911) 2020-06-09 16:51:23 -07:00
forward_iterator.h Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) 2020-04-15 17:40:44 -07:00
forward_iterator_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
import_column_family_job.cc Fix potential size_t overflow in import_column_family (#6762) 2020-04-30 08:40:42 -07:00
import_column_family_job.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
import_column_family_test.cc Use a per-thread path for the export directory in import_column_family_test (#6962) 2020-06-10 14:04:07 -07:00
internal_stats.cc First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
internal_stats.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
job_context.h Expose the set of live blob files from Version/VersionSet (#6785) 2020-05-04 15:08:13 -07:00
listener_test.cc Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
log_format.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
log_reader.cc Fail point-in-time WAL recovery upon IOError reading WAL (#6963) 2020-06-11 18:42:10 -07:00
log_reader.h Fix tabs and lint-ignores (#6734) 2020-04-20 11:39:31 -07:00
log_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
log_writer.cc Fail recovery when MANIFEST record checksum mismatch (#6996) 2020-06-18 10:09:12 -07:00
log_writer.h Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) 2020-03-27 16:04:43 -07:00
logs_with_prep_tracker.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
logs_with_prep_tracker.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
lookup_key.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
malloc_stats.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
malloc_stats.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
manual_compaction_test.cc Skip high levels with no key falling in the range in CompactRange (#6482) 2020-03-04 20:15:25 -08:00
memtable.cc Add unity build to CircleCI (#7026) 2020-06-26 11:14:08 -07:00
memtable.h return timestamp from get (#6409) 2020-03-02 16:01:00 -08:00
memtable_list.cc Fix data race to VersionSet::io_status_ (#7034) 2020-06-27 08:57:31 -07:00
memtable_list.h Fix some defects reported by Coverity Scan (#6933) 2020-06-04 15:46:27 -07:00
memtable_list_test.cc Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487) 2020-03-27 16:04:43 -07:00
merge_context.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_helper.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_helper.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_helper_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_operator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
obsolete_files_test.cc Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
options_file_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
perf_context_test.cc C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
pinned_iterators_manager.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
plain_table_db_test.cc Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
pre_release_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
prefix_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_del_aggregator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_del_aggregator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_del_aggregator_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_del_aggregator_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_tombstone_fragmenter.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_tombstone_fragmenter.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_tombstone_fragmenter_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
read_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
repair.cc Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
repair_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_checker.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_cache.cc Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
table_cache.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
table_properties_collector.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_properties_collector.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_properties_collector_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
transaction_log_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
transaction_log_impl.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trim_history_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trim_history_scheduler.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
version_builder.cc Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
version_builder.h make L0 index/filter pinned memory usage predictable (#6911) 2020-06-09 16:51:23 -07:00
version_builder_test.cc Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
version_edit.cc Remove unnecessary inclusion of version_edit.h in env (#6952) 2020-06-07 21:56:55 -07:00
version_edit.h Remove unnecessary inclusion of version_edit.h in env (#6952) 2020-06-07 21:56:55 -07:00
version_edit_handler.cc Fail recovery when MANIFEST record checksum mismatch (#6996) 2020-06-18 10:09:12 -07:00
version_edit_handler.h Fail recovery when MANIFEST record checksum mismatch (#6996) 2020-06-18 10:09:12 -07:00
version_edit_test.cc Revert "Added the safe-to-ignore tag to version_edit (#6530)" (#6569) 2020-03-23 10:27:47 -07:00
version_set.cc Fix data race to VersionSet::io_status_ (#7034) 2020-06-27 08:57:31 -07:00
version_set.h Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
version_set_test.cc Clean up blob files based on the linked SST set (#7001) 2020-06-30 15:31:21 -07:00
wal_manager.cc Fix FilterBench when RTTI=0 (#6732) 2020-04-29 13:09:23 -07:00
wal_manager.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
wal_manager_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_batch.cc Add timestamp to delete (#6253) 2020-05-28 10:40:03 -07:00
write_batch_base.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_batch_internal.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_batch_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_callback_test.cc Divide WriteCallbackTest.WriteWithCallbackTest (#7037) 2020-06-30 12:31:30 -07:00
write_controller.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_controller.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_controller_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_thread.cc fix some spelling typos (#6464) 2020-02-28 14:14:03 -08:00
write_thread.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00