mirror of
https://github.com/facebook/rocksdb.git
synced 2024-11-26 07:30:54 +00:00
Update documentation
Summary: Added more options for compaction settings + thread pools. Please check if thread pool description is correct.
Test Plan: -
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14043
parent
9df2b217e9
commit
c3dda7276c
131
doc/index.html
@@ -387,7 +387,8 @@ of point reads of small values may wish to switch to a smaller block
size if performance measurements indicate an improvement. There isn't
much benefit in using blocks smaller than one kilobyte, or larger than
a few megabytes. Also note that compression will be more effective
with larger block sizes.
with larger block sizes. To change block size parameter, use
<code>Options::block_size</code>.
<p>
<h2>Write buffer</h2>
<p>
@@ -434,7 +435,7 @@ filesystem and each file stores a sequence of compressed blocks. If
used uncompressed block contents. If <code>options.block_cache_compressed</code>
is non-NULL, it is used to cache frequently used compressed blocks. Compressed
cache is an alternative to OS cache, which also caches compressed blocks. If
compressed cache is used, you should disable OS cache by setting
compressed cache is used, the OS cache will be disabled automatically by setting
<code>options.allow_os_buffer</code> to false.
<p>
<pre>
@@ -588,7 +589,7 @@ Here we give overview of the options that impact behavior of Compactions:
<ul>
<p>
<li><code>Options::compaction_style</code> - RocksDB currently supports two
compaction algorithms - Compaction style and Level style. This option switches
compaction algorithms - Universal style and Level style. This option switches
between the two. Can be kCompactionStyleUniversal or kCompactionStyleLevel.
If this is kCompactionStyleUniversal, then you can configure universal style
parameters with <code>Options::compaction_options_universal</code>.
@@ -608,16 +609,126 @@ key-value during background compaction.
</ul>
<p>
Other options impacting performance of compactions and when they get triggered
are: <code>access_hint_on_compaction_start</code>,
<code>level0_file_num_compaction_trigger</code>,
<code>max_mem_compaction_level</code>, <code>target_file_size_base</code>,
<code>target_file_size_multiplier</code>,
<code>expanded_compaction_factor</code>, <code>source_compaction_factor</code>,
<code>max_grandparent_overlap_factor</code>,
<code>disable_seek_compaction</code>, <code>max_background_compactions</code>.
are:
<ul>
<p>
<li> <code>Options::access_hint_on_compaction_start</code> - Specify the file access
pattern once a compaction is started. It will be applied to all input files of a compaction. Default: NORMAL
<p>
<li> <code>Options::level0_file_num_compaction_trigger</code> - Number of files to trigger level-0 compaction.
A negative value means that level-0 compaction will not be triggered by number of files at all.
<p>
<li> <code>Options::max_mem_compaction_level</code> - Maximum level to which a new compacted memtable is pushed if it
does not create overlap. We try to push to level 2 to avoid the relatively expensive level 0=>1 compactions and to avoid some
expensive manifest file operations. We do not push all the way to the largest level since that can generate a lot of wasted disk
space if the same key space is being repeatedly overwritten.
<p>
<li> <code>Options::target_file_size_base</code> and <code>Options::target_file_size_multiplier</code> -
Target file size for compaction. target_file_size_base is per-file size for level-1.
Target file size for level L can be calculated by target_file_size_base * (target_file_size_multiplier ^ (L-1)).
For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will
be 2MB, each file on level-2 will be 20MB, and each file on level-3 will be 200MB. Default target_file_size_base is 2MB
and default target_file_size_multiplier is 1.
<p>
<li> <code>Options::expanded_compaction_factor</code> - Maximum number of bytes in all compacted files. We avoid expanding
the lower level file set of a compaction if it would make the total compaction cover more than
(expanded_compaction_factor * targetFileSizeLevel()) many bytes.
<p>
<li> <code>Options::source_compaction_factor</code> - Maximum number of bytes in all source files to be compacted in a
single compaction run. We avoid picking too many files in the source level so that the total source bytes
for compaction do not exceed (source_compaction_factor * targetFileSizeLevel()) many bytes.
Default: 1, i.e. pick maxfilesize amount of data as the source of a compaction.
<p>
<li> <code>Options::max_grandparent_overlap_factor</code> - Control maximum bytes of overlaps in grandparent (i.e., level+2) before we
stop building a single file in a level->level+1 compaction.
<p>
<li> <code>Options::disable_seek_compaction</code> - Disable compaction triggered by seek.
With bloomfilter and fast storage, a miss on one level is very cheap if the file handle is cached in table cache
(which is true if max_open_files is large).
<p>
<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs, submitted to
the default LOW priority thread pool
</ul>
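<p>
The target file size formula in the list above can be checked with a small standalone calculation. This is only a sketch: <code>TargetFileSizeForLevel</code> is a hypothetical helper written for illustration, not a RocksDB function, and it assumes levels are numbered from 1 as in the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of RocksDB): computes
// target_file_size_base * (target_file_size_multiplier ^ (level - 1))
// for level >= 1, as described for the two options above.
std::uint64_t TargetFileSizeForLevel(std::uint64_t target_file_size_base,
                                     std::uint64_t target_file_size_multiplier,
                                     int level) {
  assert(level >= 1);  // the formula is stated for level-1 and deeper
  std::uint64_t size = target_file_size_base;
  for (int i = 1; i < level; ++i) {
    size *= target_file_size_multiplier;  // one multiplication per extra level
  }
  return size;
}
```

With a base of 2MB (2097152 bytes) and a multiplier of 10, the helper gives 2MB for level-1, 20MB for level-2 and 200MB for level-3, matching the example in the text. With the default multiplier of 1, every level gets the base size.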

<p>
You can learn more about all of those options in <code>rocksdb/options.h</code>

<h2> Universal style compaction specific settings</h2>
<p>
If you're using Universal style compaction, there is an object <code>CompactionOptionsUniversal</code>
that holds all the different options for that compaction. The exact definition is in
<code>rocksdb/universal_compaction.h</code> and you can set it in <code>Options::compaction_options_universal</code>.
Here we give a short overview of the options in <code>CompactionOptionsUniversal</code>:
<ul>
<p>
<li> <code>CompactionOptionsUniversal::size_ratio</code> - Percentage flexibility while comparing file size. If the candidate file(s)
size is 1% smaller than the next file's size, then include next file into
this candidate set. Default: 1
<p>
<li> <code>CompactionOptionsUniversal::min_merge_width</code> - The minimum number of files in a single compaction run. Default: 2
<p>
<li> <code>CompactionOptionsUniversal::max_merge_width</code> - The maximum number of files in a single compaction run. Default: UINT_MAX
<p>
<li> <code>CompactionOptionsUniversal::max_size_amplification_percent</code> - The size amplification is defined as the amount (in percentage) of
additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that
contains 100 bytes of user-data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has
a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding
the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require up to
300 bytes of storage.
<p>
<li> <code>CompactionOptionsUniversal::compression_size_percent</code> - If this option is set to be -1 (the default value), all the output files
will follow compression type specified. If this option is not negative, we will try to make sure compressed
size is just above this value. In normal cases, at least this percentage
of data will be compressed.
When we are compacting to a new file, here is the criterion for whether
it needs to be compressed: assume the list of files, sorted
by generation time, is [ A1...An B1...Bm C1...Ct ],
where A1 is the newest and Ct is the oldest, and we are going to compact
B1...Bm. We calculate the total size of all the files as total_size, as
well as the total size of C1...Ct as total_C. The compaction output file
will be compressed iff total_C / total_size < this percentage
<p>
<li> <code>CompactionOptionsUniversal::stop_style</code> - The algorithm used to stop picking files into a single compaction run.
Can be kCompactionStopStyleSimilarSize (pick files of similar size) or kCompactionStopStyleTotalSize (total size of picked files > next file).
Default: kCompactionStopStyleTotalSize
</ul>
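<p>
Two of the rules in the list above, the size amplification heuristic and the <code>compression_size_percent</code> criterion, can be written out as plain arithmetic. The sketch below is an illustration under stated assumptions, not RocksDB code: both function names are hypothetical, file sizes are passed in directly, and the size amplification function takes the earliest file as the base that the newer files are measured against, which is one reading of the heuristic described.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a RocksDB API). `file_sizes` is ordered newest
// first, so the earliest (oldest) file is the last entry. Assumption: all
// files except the earliest count as overhead relative to the earliest file.
std::uint64_t EstimatedSizeAmplificationPercent(
    const std::vector<std::uint64_t>& file_sizes) {
  if (file_sizes.empty() || file_sizes.back() == 0) return 0;
  std::uint64_t newer_bytes = 0;
  for (std::size_t i = 0; i + 1 < file_sizes.size(); ++i) {
    newer_bytes += file_sizes[i];
  }
  return newer_bytes * 100 / file_sizes.back();
}

// Hypothetical helper (not a RocksDB API). Returns true when the compaction
// output should use the configured compression type: always for a negative
// setting, otherwise iff total_C / total_size < compression_size_percent.
bool OutputFollowsCompressionType(int compression_size_percent,
                                  std::uint64_t total_c,
                                  std::uint64_t total_size) {
  assert(total_c <= total_size);  // C1...Ct are a subset of all files
  if (compression_size_percent < 0) return true;
  // Cross-multiplied form of total_c / total_size < percent / 100,
  // which avoids dividing by a zero total_size.
  return total_c * 100 <
         static_cast<std::uint64_t>(compression_size_percent) * total_size;
}
```

For example, files of 150 and 50 bytes sitting in front of an earliest file of 100 bytes give an estimate of 200, which is the default <code>max_size_amplification_percent</code>. With <code>compression_size_percent</code> set to 50, an output is compressed when the older files C1...Ct hold 40 of 100 total bytes, and left uncompressed when they hold 60.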

<h1>Thread pools</h1>
<p>
A thread pool is associated with the Env environment object. The client has to create a thread pool by setting the number of background
threads using the method <code>Env::SetBackgroundThreads()</code> defined in <code>rocksdb/env.h</code>.
We use the thread pool for compactions and memtable flushes.
Since memtable flushes are in the critical code path (stalling memtable flush can stall writes, increasing p99), we suggest
having two thread pools - with priorities HIGH and LOW. Memtable flushes can be set up to be scheduled on HIGH thread pool.
There are two options available for configuration of background compactions and flushes:
<ul>
<p>
<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs,
submitted to the default LOW priority thread pool
<p>
<li> <code>Options::max_background_flushes</code> - Maximum number of concurrent background memtable flush jobs, submitted to
the HIGH priority thread pool. By default, all background jobs (major compaction and memtable flush) go
to the LOW priority pool. If this option is set to a positive number, memtable flush jobs will be submitted to the HIGH priority pool.
It is important when the same Env is shared by multiple db instances. Without a separate pool, long running major compaction jobs could
potentially block memtable flush jobs of other db instances, leading to unnecessary Put stalls.
</ul>
<p>
<pre>
#include &lt;cassert&gt;
#include "rocksdb/env.h"
#include "rocksdb/db.h"

// Size the LOW (compaction) and HIGH (flush) pools on the shared Env.
auto env = rocksdb::Env::Default();
env->SetBackgroundThreads(2, rocksdb::Env::LOW);
env->SetBackgroundThreads(1, rocksdb::Env::HIGH);
rocksdb::DB* db;
rocksdb::Options options;
options.env = env;
// Match the job limits to the pool sizes set above.
options.max_background_compactions = 2;
options.max_background_flushes = 1;
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
assert(status.ok());
...
</pre>
<h1>Approximate Sizes</h1>
<p>
The <code>GetApproximateSizes</code> method can be used to get the approximate