rocksdb/util
Andrew Kryczka 843d2e3137 Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.

During compaction, the dictionary is computed by randomly sampling the first
output file in each subcompaction. The intervals to sample are pre-computed
by assuming the output file will have the maximum allowable length. If the
file turns out smaller, some of the pre-computed intervals fall beyond
end-of-file; those samples are skipped and the dictionary ends up a bit
smaller. Once the dictionary has been generated from the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
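
The sampling scheme described above can be sketched roughly as follows. This
is an illustration only, not the code from this commit: the function names
PlanSampleOffsets and BuildDictionary, the fixed sample length, and the use of
std::mt19937_64 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: pick sample start offsets before any data is written,
// assuming the output file will reach max_file_size bytes. Offsets are
// aligned to sample_bytes so that samples never overlap.
std::vector<uint64_t> PlanSampleOffsets(uint64_t max_file_size,
                                        uint64_t dict_bytes,
                                        uint64_t sample_bytes,
                                        uint64_t seed) {
  assert(sample_bytes > 0);
  const uint64_t num_slots = max_file_size / sample_bytes;
  if (num_slots == 0) return {};
  // Enough samples to fill the dictionary, capped by the slots available.
  const uint64_t want = std::min(dict_bytes / sample_bytes, num_slots);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<uint64_t> pick(0, num_slots - 1);
  std::set<uint64_t> slots;  // distinct slot indices, kept in sorted order
  while (slots.size() < want) slots.insert(pick(rng));
  std::vector<uint64_t> offsets;
  for (uint64_t slot : slots) offsets.push_back(slot * sample_bytes);
  return offsets;
}

// Hypothetical helper: concatenate the planned samples into a dictionary.
// Samples that would run past the real end of the file are skipped, so a
// short file yields a smaller dictionary.
std::string BuildDictionary(const std::string& file_data,
                            const std::vector<uint64_t>& offsets,
                            uint64_t sample_bytes) {
  std::string dict;
  for (uint64_t off : offsets) {
    if (off + sample_bytes > file_data.size()) continue;  // beyond EOF
    dict.append(file_data, off, sample_bytes);
  }
  return dict;
}
```

With a 1024-byte planned size, a 128-byte dictionary budget, and 16-byte
samples, a full-length file yields a 128-byte dictionary, while a 512-byte
file yields only the samples whose offsets landed in its first half.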

On the read path, the dictionary is fetched from the metablock, if it exists,
and loaded into the compression library before reading each block.

Test Plan: new unit test

Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong

Reviewed By: sdong

Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52287
2016-04-27 17:36:03 -07:00
aligned_buffer.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
allocator.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
arena.cc Fixes warnings and ensure correct int behavior on 32-bit platforms. 2016-03-16 22:57:57 +01:00
arena.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
arena_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
autovector.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
autovector_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
bloom.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
bloom_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
build_version.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
cache.cc Change default number of cache shard bit to be 6 and max_file_opening_threads to be 16. 2016-04-07 13:55:10 -07:00
cache_bench.cc Cache to have an option to fail Cache::Insert() when full 2016-03-10 17:35:19 -08:00
cache_test.cc Cache to have an option to fail Cache::Insert() when full 2016-03-10 17:35:19 -08:00
channel.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
coding.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
coding.h Fixed compile warnings in posix_logger.h and coding.h 2016-03-31 16:01:47 -07:00
coding_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
compaction_job_stats_impl.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
comparator.cc Improve BytewiseComparatorImpl::FindShortestSeparator 2016-04-25 23:02:14 -07:00
compression.h Shared dictionary compression using reference block 2016-04-27 17:36:03 -07:00
concurrent_arena.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
concurrent_arena.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
crc32c.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
crc32c.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
crc32c_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
delete_scheduler.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
delete_scheduler.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
delete_scheduler_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
dynamic_bloom.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
dynamic_bloom.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
dynamic_bloom_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
env.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
env_hdfs.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
env_posix.cc Remove the SyncPoint usage in the destructor of PosixEnv 2016-02-17 23:32:14 -08:00
env_test.cc Alpine Linux Build (#990) 2016-04-22 16:49:12 -07:00
event_logger.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
event_logger.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
event_logger_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
file_reader_writer.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
file_reader_writer.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
file_reader_writer_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
file_util.cc Forge current file for checkpoint 2016-03-17 10:07:21 -07:00
file_util.h Forge current file for checkpoint 2016-03-17 10:07:21 -07:00
filelock_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
filter_policy.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
hash.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
hash.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
heap.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
heap_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
histogram.cc Fix FB internal CI build failure 2016-03-15 11:38:15 -07:00
histogram.h Histogram Concurrency Improvement and Time-Windowing Support 2016-03-11 16:54:25 -08:00
histogram_test.cc Fix Build Error 2016-03-11 22:56:25 -08:00
histogram_windowing.cc Fix in HistogramWindowingImpl 2016-03-17 14:28:41 -07:00
histogram_windowing.h Histogram Concurrency Improvement and Time-Windowing Support 2016-03-11 16:54:25 -08:00
instrumented_mutex.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
instrumented_mutex.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
io_posix.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
io_posix.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
iostats_context.cc IOStatsContext::ToString() add option to exclude zero counters 2016-02-23 10:26:24 -08:00
iostats_context_imp.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
iostats_context_test.cc IOStatsContext::ToString() add option to exclude zero counters 2016-02-23 10:26:24 -08:00
kv_map.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
log_buffer.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
log_buffer.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
log_write_bench.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
logging.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
logging.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
memenv.cc Refactor to support file_reader_writer on Windows. 2015-09-11 09:57:02 -07:00
memenv_test.cc [directory includes cleanup] Finish removing util->db dependencies 2016-01-26 10:49:24 -08:00
mock_env.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
mock_env.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
mock_env_test.cc [directory includes cleanup] Finish removing util->db dependencies 2016-01-26 10:49:24 -08:00
murmurhash.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
murmurhash.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
mutable_cf_options.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
mutable_cf_options.h Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too. 2016-04-15 10:22:18 -07:00
mutexlock.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
options.cc Shared dictionary compression using reference block 2016-04-27 17:36:03 -07:00
options_builder.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
options_helper.cc Shared dictionary compression using reference block 2016-04-27 17:36:03 -07:00
options_helper.h Print memory allocation counters 2016-04-27 16:23:33 -07:00
options_parser.cc Add parsing of missing DB options 2016-03-02 10:34:14 -08:00
options_parser.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
options_sanity_check.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
options_sanity_check.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
options_settable_test.cc Print memory allocation counters 2016-04-27 16:23:33 -07:00
options_test.cc Shared dictionary compression using reference block 2016-04-27 17:36:03 -07:00
perf_context.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
perf_context_imp.h fix ios build error 2016-02-17 20:22:40 +08:00
perf_level.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
perf_level_imp.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
perf_step_timer.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
posix_logger.h Fixed compile warnings in posix_logger.h and coding.h 2016-03-31 16:01:47 -07:00
random.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
random.h Fixes warnings and ensure correct int behavior on 32-bit platforms. 2016-03-16 22:57:57 +01:00
rate_limiter.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
rate_limiter.h Add a minimum value for the refill bytes per period value 2016-04-13 09:01:42 -07:00
rate_limiter_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
slice.cc to/from hex refactor 2016-03-30 14:36:48 -07:00
slice_transform_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
sst_file_manager_impl.cc Fix SstFileManager uninitialized data member 2016-02-18 11:25:19 -08:00
sst_file_manager_impl.h Introduce SstFileManager::SetMaxAllowedSpaceUsage() to cap disk space usage 2016-02-17 15:20:23 -08:00
statistics.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
statistics.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
status.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
status_message.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
stderr_logger.h Stderr info logger 2016-04-01 11:06:06 -07:00
stop_watch.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
string_util.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
string_util.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
sync_point.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
sync_point.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
testharness.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
testharness.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
testutil.cc Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too. 2016-04-15 10:22:18 -07:00
testutil.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_list_test.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_local.cc Fixed a dependency issue of ThreadLocalPtr 2016-02-10 16:56:01 -08:00
thread_local.h Fixed a dependency issue of ThreadLocalPtr 2016-02-10 16:56:01 -08:00
thread_local_test.cc Fix LITE build thread_local_test 2016-02-19 13:57:18 -08:00
thread_operation.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_posix.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_posix.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_impl.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_updater.cc Use pure if-then check instead of assert in EraseColumnFamilyInfo 2016-03-04 16:03:31 -08:00
thread_status_updater.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_updater_debug.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_util.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_util.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
thread_status_util_debug.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
transaction_test_util.cc Add multithreaded transaction test 2016-03-11 15:16:52 -08:00
transaction_test_util.h Fix AppVeyor build error 2016-03-15 10:57:33 -07:00
xfunc.cc Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
xfunc.h Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
xxhash.cc Enable MS Warning C4804 : unsafe use of type 'bool' in operation 2015-11-18 16:23:19 -08:00
xxhash.h Prevent xxhash symbols from polluting global namespace 2015-03-12 12:07:10 -07:00