rocksdb/db/blob
Gang Liao e6432dfd4c Make it possible to enable blob files starting from a certain LSM tree level (#10077)
Summary:
Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.

In order to achieve this, we would like to do the following:
- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077

Reviewed By: ltamasi

Differential Revision: D36884156

Pulled By: gangliao

fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
2022-06-02 20:04:33 -07:00
..
blob_constants.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_counting_iterator.h Log the amount of blob garbage generated by compactions in the MANIFEST (#8450) 2021-06-24 16:11:56 -07:00
blob_counting_iterator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
blob_fetcher.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_fetcher.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_file_addition.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
blob_file_addition.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_addition_test.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
blob_file_builder.cc Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
blob_file_builder.h Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
blob_file_builder_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
blob_file_cache.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
blob_file_cache.h Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
blob_file_cache_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
blob_file_completion_callback.h Fix LITE mode builds on MacOs (#8981) 2021-10-04 05:30:26 -07:00
blob_file_garbage.cc Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_garbage.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_garbage_test.cc Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_meta.cc Add BlobMetaData retrieval methods (#8273) 2021-06-28 08:13:29 -07:00
blob_file_meta.h Add BlobMetaData retrieval methods (#8273) 2021-06-28 08:13:29 -07:00
blob_file_reader.cc Add rate limiter priority to ReadOptions (#9424) 2022-02-16 23:18:14 -08:00
blob_file_reader.h Add rate limiter priority to ReadOptions (#9424) 2022-02-16 23:18:14 -08:00
blob_file_reader_test.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_garbage_meter.cc Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_garbage_meter.h Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_garbage_meter_test.cc Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_index.h Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
blob_log_format.cc Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
blob_log_format.h Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_log_sequential_reader.cc Add rate limiter priority to ReadOptions (#9424) 2022-02-16 23:18:14 -08:00
blob_log_sequential_reader.h Fix a issue with initializing blob header buffer (#8537) 2021-08-02 17:15:06 -07:00
blob_log_writer.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
blob_log_writer.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_blob_basic_test.cc Propagate errors from UpdateBoundaries (#9851) 2022-04-15 20:25:48 -07:00
db_blob_compaction_test.cc Make it possible to enable blob files starting from a certain LSM tree level (#10077) 2022-06-02 20:04:33 -07:00
db_blob_corruption_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
db_blob_index_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
prefetch_buffer_collection.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
prefetch_buffer_collection.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00