mirror of https://github.com/facebook/rocksdb.git
synced 2024-11-26 16:30:56 +00:00

Commit 321dfdc3ae
Summary: The leveldb API is enhanced to support different compression algorithms at different levels.

This adds the option min_level_to_compress to db_bench, which specifies the minimum level for which compression should be done when compression is enabled. This can be used to disable compression for levels 0 and 1, which are likely to suffer from stalls because of the CPU load from memtable flushes and (L0,L1) compaction. Level 0 is special because it gets frequent memtable flushes. Level 1 is special because it frequently gets all:all file compactions between it and level 0. All other levels could be treated the same. For any level N where N > 1, the rate of sequential IO for that level should be the same. The last level is the exception because it might not be full and because files from it are not read to compact with the next larger level.

The same amount of time will be spent doing compaction at any level N, excluding N=0, N=1 and the last level. By this standard, all of those levels should use the same compression. The difference is that the loss (using more disk space) from a faster compression algorithm is less significant for N=2 than for N=3. So we might be willing to trade disk space for faster write rates with no compression for L0 and L1, snappy for L2, and zlib for L3. Using a faster compression algorithm for the mid levels also lets us reclaim some CPU without much additional disk space overhead.

Also note that little is to be gained by compressing levels 0 and 1. For a 4-level tree they account for 10% of the data. For a 5-level tree they account for 1% of the data.

With compression enabled:
* memtable flush rate is ~18MB/second
* (L0,L1) compaction rate is ~30MB/second

With compression enabled but min_level_to_compress=2:
* memtable flush rate is ~320MB/second
* (L0,L1) compaction rate is ~560MB/second

This takes practically the same code from https://reviews.facebook.net/D6225 but makes the leveldb API more general purpose with a few additional lines of code.
Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6261
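The per-level idea in the summary can be pictured as a table that maps each level number to a compression algorithm. The sketch below is illustrative only, assuming hypothetical names: the Compression enum and the functions PerLevelCompression and TieredCompression are not the leveldb/RocksDB API added by this change. Applying zlib to every level past L3 is also an assumption, since the summary only names L3.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a compression type enum (not the real API).
enum class Compression { kNone, kSnappy, kZlib };

// Expands a min_level_to_compress-style threshold into a per-level table:
// levels below the threshold stay uncompressed, and every level at or
// above it uses the configured algorithm.
std::vector<Compression> PerLevelCompression(int num_levels,
                                             int min_level_to_compress,
                                             Compression compressed) {
  assert(num_levels > 0);
  std::vector<Compression> per_level(num_levels, Compression::kNone);
  for (int level = 0; level < num_levels; ++level) {
    if (level >= min_level_to_compress) {
      per_level[level] = compressed;
    }
  }
  return per_level;
}

// Tiered scheme suggested in the summary: no compression for L0 and L1,
// a fast codec (snappy) for L2, and a denser codec (zlib) for L3.
// Assumption: levels beyond L3 also use zlib.
std::vector<Compression> TieredCompression(int num_levels) {
  assert(num_levels > 0);
  std::vector<Compression> per_level(num_levels);
  for (int level = 0; level < num_levels; ++level) {
    if (level <= 1) {
      per_level[level] = Compression::kNone;
    } else if (level == 2) {
      per_level[level] = Compression::kSnappy;
    } else {
      per_level[level] = Compression::kZlib;
    }
  }
  return per_level;
}
```

For example, PerLevelCompression(5, 2, Compression::kSnappy) leaves levels 0 and 1 as kNone and sets levels 2 through 4 to kSnappy, which corresponds to the min_level_to_compress=2 configuration measured above.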
db
doc
hdfs
helpers/memenv
include/leveldb
java
port
scribe
snappy
table
thrift
tools
util
.arcconfig
.gitignore
AUTHORS
build_detect_platform
build_detect_version
fbcode.gcc471.sh
LICENSE
Makefile
NEWS
README
README.fb
TODO
leveldb: A key-value store
Authors: Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

The code under this directory implements a system for maintaining a persistent key/value store.

See doc/index.html for more explanation.
See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Guide to header files:

include/db.h
    Main interface to the DB: Start here

include/options.h
    Control over the behavior of an entire database, and also control over the behavior of individual reads and writes.

include/comparator.h
    Abstraction for user-specified comparison function. If you want just bytewise comparison of keys, you can use the default comparator, but clients can write their own comparator implementations if they want custom ordering (e.g. to handle different character encodings, etc.)

include/iterator.h
    Interface for iterating over data. You can get an iterator from a DB object.

include/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/slice.h
    A simple module for maintaining a pointer and a length into some other byte array.

include/status.h
    Status is returned from many of the public interfaces and is used to report success and various kinds of errors.

include/env.h
    Abstraction of the OS environment. A posix implementation of this interface is in util/env_posix.cc

include/table.h
include/table_builder.h
    Lower-level modules that most clients probably won't use directly
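As a rough illustration of what a user-specified comparison function means, the sketch below defines a minimal comparator interface with two orderings: a bytewise one like the default, and a custom reverse ordering. It is not the leveldb API. KeyComparator, BytewiseKeyComparator, ReverseBytewiseKeyComparator and SortKeys are hypothetical names, std::string_view stands in for Slice, and the real interface in include/comparator.h declares more methods than the Compare and Name shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical minimal comparator interface (not leveldb::Comparator).
class KeyComparator {
 public:
  virtual ~KeyComparator() = default;
  // Returns <0, 0 or >0 when a is less than, equal to or greater than b.
  virtual int Compare(std::string_view a, std::string_view b) const = 0;
  // Name that identifies this ordering.
  virtual const char* Name() const = 0;
};

// Bytewise ordering: string_view::compare compares chars as unsigned
// bytes, so this is a plain lexicographic byte comparison.
class BytewiseKeyComparator : public KeyComparator {
 public:
  int Compare(std::string_view a, std::string_view b) const override {
    return a.compare(b);
  }
  const char* Name() const override { return "sketch.Bytewise"; }
};

// Custom ordering: the bytewise order reversed.
class ReverseBytewiseKeyComparator : public KeyComparator {
 public:
  int Compare(std::string_view a, std::string_view b) const override {
    return b.compare(a);
  }
  const char* Name() const override { return "sketch.ReverseBytewise"; }
};

// Returns the keys sorted in the order defined by cmp.
std::vector<std::string> SortKeys(std::vector<std::string> keys,
                                  const KeyComparator& cmp) {
  std::sort(keys.begin(), keys.end(),
            [&cmp](const std::string& a, const std::string& b) {
              return cmp.Compare(a, b) < 0;
            });
  return keys;
}
```

Sorting {"banana", "apple", "cherry"} with the bytewise comparator yields apple, banana, cherry; the reverse comparator yields cherry, banana, apple. Since a store keeps keys in comparator order, the same ordering would presumably have to be used every time a given database is opened, which is one reason a comparator carries a name.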