87b82f28a1
Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad |
||
---|---|---|
.circleci | ||
.github/workflows | ||
buckifier | ||
build_tools | ||
cache | ||
cmake | ||
coverage | ||
db | ||
db_stress_tool | ||
docs | ||
env | ||
examples | ||
file | ||
fuzz | ||
include/rocksdb | ||
java | ||
logging | ||
memory | ||
memtable | ||
microbench | ||
monitoring | ||
options | ||
plugin | ||
port | ||
table | ||
test_util | ||
third-party | ||
tools | ||
trace_replay | ||
util | ||
utilities | ||
.clang-format | ||
.gitignore | ||
.lgtm.yml | ||
.watchmanconfig | ||
AUTHORS | ||
CMakeLists.txt | ||
CODE_OF_CONDUCT.md | ||
CONTRIBUTING.md | ||
COPYING | ||
DEFAULT_OPTIONS_HISTORY.md | ||
DUMP_FORMAT.md | ||
HISTORY.md | ||
INSTALL.md | ||
LANGUAGE-BINDINGS.md | ||
LICENSE.Apache | ||
LICENSE.leveldb | ||
Makefile | ||
PLUGINS.md | ||
README.md | ||
ROCKSDB_LITE.md | ||
TARGETS | ||
USERS.md | ||
Vagrantfile | ||
WINDOWS_PORT.md | ||
common.mk | ||
crash_test.mk | ||
issue_template.md | ||
rocksdb.pc.in | ||
src.mk | ||
thirdparty.inc |
README.md
RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)
This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples
See the github wiki for more explanation.
The public interface is in include/
. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.
Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.
License
RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.