rocksdb/table
Peter Dillinger 003e72b201 Use size_t for filter APIs, protect against overflow (#7726)
Summary:
Deprecate CalculateNumEntry and replace with
ApproximateNumEntries (better name) using size_t instead of int and
uint32_t, to minimize confusing casts and bad overflow behavior
(possible though probably not realistic). Bloom sizes are now explicitly
capped at max size supported by implementations: just under 4GiB for
fv=5 Bloom, and just under 512MiB for fv<5 Legacy Bloom. This
hardening could help to set up for fuzzing.

Also, since RocksDB only uses this information as an approximation
for trying to hit certain sizes for partitioned filters, it's more important
that the function be reasonably fast than for it to be completely
accurate. It's hard enough to be 100% accurate for Ribbon (currently
reversing CalculateSpace) that adding optimize_filters_for_memory
into the mix is just not worth trying to be 100% accurate for num
entries for bytes.

Also:
- Cleaned up filter_policy.h to remove MSVC warning handling and
potentially unsafe use of exception for "not implemented"
- Correct the number of entries limit beyond which current Ribbon
implementation falls back on Bloom instead.
- Consistently use "num_entries" rather than "num_entry"
- Remove LegacyBloomBitsBuilder::CalculateNumEntry as it's essentially
obsolete from general implementation
BuiltinFilterBitsBuilder::CalculateNumEntries.
- Fix filter_bench to skip some tests that don't make sense when only
one or a small number of filters has been generated.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7726

Test Plan:
expanded existing unit tests for CalculateSpace /
ApproximateNumEntries. Also manually used filter_bench to verify Legacy and
fv=5 Bloom size caps work (much too expensive for unit test). Note that
the actual bits per key is below requested due to space cap.

    $ ./filter_bench -impl=0 -bits_per_key=20 -average_keys_per_filter=256000000 -vary_key_count_ratio=0 -m_keys_total_max=256 -allow_bad_fp_rate
    ...
    Total size (MB): 511.992
    Bits/key stored: 16.777
    ...
    $ ./filter_bench -impl=2 -bits_per_key=20 -average_keys_per_filter=2000000000 -vary_key_count_ratio=0 -m_keys_total_max=2000
    ...
    Total size (MB): 4096
    Bits/key stored: 17.1799
    ...
    $

Reviewed By: jay-zhuang

Differential Revision: D25239800

Pulled By: pdillinger

fbshipit-source-id: f94e6d065efd31e05ec630ae1a82e6400d8390c4
2020-12-11 22:18:12 -08:00
..
adaptive Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
block_based Use size_t for filter APIs, protect against overflow (#7726) 2020-12-11 22:18:12 -08:00
cuckoo Create a Customizable class to load classes and configurations (#6590) 2020-11-11 15:10:41 -08:00
plain Create a Customizable class to load classes and configurations (#6590) 2020-11-11 15:10:41 -08:00
block_fetcher.cc Skip unnecessary allocation for mmap reads under 5000 bytes (#7043) 2020-06-30 15:40:40 -07:00
block_fetcher.h Reduce memory copies when fetching and uncompressing blocks from SST files (#6689) 2020-04-24 15:32:56 -07:00
block_fetcher_test.cc Add Status check enforcement for unit tests (#7464) 2020-09-30 12:48:05 -07:00
cleanable_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
format.cc Fix initialization order of DBOptions and kHostnameForDbHostId (#7702) 2020-11-19 22:39:40 -08:00
format.h Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
get_context.cc Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
get_context.h Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
internal_iterator.h Clean up InternalIterator upper bound logic a little bit (#7200) 2020-08-05 10:44:57 -07:00
iter_heap.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
iterator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
iterator_wrapper.h Redesign block cache pinning API (#7520) 2020-10-11 14:58:24 -07:00
merger_test.cc More Makefile Cleanup (#7097) 2020-07-09 14:35:17 -07:00
merging_iterator.cc Add some simulator cache and block tracer tests to ASSERT_STATUS_CHECKED (#7305) 2020-08-24 16:43:31 -07:00
merging_iterator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
meta_blocks.cc Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
meta_blocks.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
mock_table.cc Add full_history_ts_low_ to CompactionJob (#7657) 2020-11-12 11:43:24 -08:00
mock_table.h Add full_history_ts_low_ to CompactionJob (#7657) 2020-11-12 11:43:24 -08:00
multiget_context.h Fix MultiGet unable to query timestamp data issue (#7589) 2020-11-03 09:45:41 -08:00
persistent_cache_helper.cc Add more tests to ASSERT_STATUS_CHECKED (#7367) 2020-09-16 15:48:07 -07:00
persistent_cache_helper.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
persistent_cache_options.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
scoped_arena_iterator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
sst_file_dumper.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
sst_file_dumper.h Add sst_file_dumper status check (#7315) 2020-09-04 19:26:42 -07:00
sst_file_reader.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
sst_file_reader_test.cc Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
sst_file_writer.cc enable Status check assertions for sst_file_reader_test (#7448) 2020-09-28 15:13:15 -07:00
sst_file_writer_collectors.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_builder.h Store DB identity and DB session ID in SST files (#6983) 2020-06-17 10:57:40 -07:00
table_factory.cc Create a Customizable class to load classes and configurations (#6590) 2020-11-11 15:10:41 -08:00
table_properties.cc Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
table_properties_internal.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_reader.h dedup ReadOptions in iterator hierarchy (#7210) 2020-08-03 15:23:04 -07:00
table_reader_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_reader_caller.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
table_test.cc Return Status from MemTable mutation functions (#7656) 2020-11-23 16:29:04 -08:00
two_level_iterator.cc Redesign block cache pinning API (#7520) 2020-10-11 14:58:24 -07:00
two_level_iterator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00