rocksdb/db
Igor Canadi fdb6be4e24 Rewritten system for scheduling background work
Summary:
When scaling to a higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which looped over all column families while holding a mutex. This patch addresses the issue.

The approach is similar to our earlier efforts: instead of a pull model, where we do something for every column family, we use a push model. When we detect that a column family is ready to be flushed or compacted, we add it to flush_queue_ or compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction().
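
The push model described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: `ColumnFamilySketch`, `BackgroundWorkQueue`, `MarkReadyForFlush`, `PopNextFlush`, and the `pending_flush` flag are all hypothetical names, and only the `flush_queue_` member name is taken from the summary. A compaction queue would presumably follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a column family; not RocksDB's real type.
struct ColumnFamilySketch {
  explicit ColumnFamilySketch(std::string n) : name(std::move(n)) {}
  std::string name;
  bool pending_flush = false;  // true while queued for flush
};

// Hypothetical queue illustrating the push model: producers enqueue a
// column family when it becomes ready, so the scheduler never has to
// scan every column family while holding the mutex.
class BackgroundWorkQueue {
 public:
  // Called where a column family is detected as ready to flush.
  // The flag ensures each column family is queued at most once.
  void MarkReadyForFlush(ColumnFamilySketch* cf) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!cf->pending_flush) {
      cf->pending_flush = true;
      flush_queue_.push_back(cf);
    }
  }

  // Called by the scheduler. The work done under the mutex depends on
  // the queue, not on the total number of column families.
  ColumnFamilySketch* PopNextFlush() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (flush_queue_.empty()) {
      return nullptr;
    }
    ColumnFamilySketch* cf = flush_queue_.front();
    flush_queue_.pop_front();
    assert(cf->pending_flush);
    cf->pending_flush = false;
    return cf;
  }

  std::size_t PendingFlushes() {
    std::lock_guard<std::mutex> lock(mutex_);
    return flush_queue_.size();
  }

 private:
  std::mutex mutex_;
  std::deque<ColumnFamilySketch*> flush_queue_;
};
```

In this sketch, marking the same column family twice leaves a single queue entry, and popping from an empty queue returns nullptr, so a scheduler loop can stop as soon as no work is pending.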

Here are the performance results:

Command:

    ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333

Before the patch:

     fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s

After the patch:

      fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s

The next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. A fix for that is coming in the next patch, but when I removed that code, here's what I got:

      fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s

Test Plan:
make check

Two stress tests:

A large number of compactions and flushes:

    ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000

With max_background_flushes=0, to verify that this case also works correctly:

    ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000

Reviewers: ljin, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30123
2014-12-19 20:38:12 +01:00
builder.cc Turn on -Wshadow 2014-10-31 11:59:54 -07:00
builder.h introduce ImmutableOptions 2014-09-04 16:18:36 -07:00
c.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
c_test.c Add test for upper bounds on iterators using C interface. 2014-11-25 23:07:40 +00:00
column_family.cc Rewritten system for scheduling background work 2014-12-19 20:38:12 +01:00
column_family.h Rewritten system for scheduling background work 2014-12-19 20:38:12 +01:00
column_family_test.cc Improve scalability of DB::GetSnapshot() 2014-12-11 13:27:57 -08:00
compaction.cc Move the file copy out of the mutex. 2014-12-16 16:57:22 -08:00
compaction.h Move the file copy out of the mutex. 2014-12-16 16:57:22 -08:00
compaction_job.cc Fixed a bug which could hide non-ok status in CompactionJob::Run() 2014-11-16 21:52:23 -08:00
compaction_job.h Optimize usage of Status in CompactionJob 2014-11-10 11:57:58 -08:00
compaction_job_test.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
compaction_picker.cc Rewritten system for scheduling background work 2014-12-19 20:38:12 +01:00
compaction_picker.h RocksDB: Allow Level-Style Compaction to Place Files in Different Paths 2014-12-15 21:48:16 -08:00
compaction_picker_test.cc RocksDB: Allow Level-Style Compaction to Place Files in Different Paths 2014-12-15 21:48:16 -08:00
comparator_db_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
corruption_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
cuckoo_table_db_test.cc remove unreliable test in db/cuckoo_table_db_test.cc 2014-11-24 15:18:09 -08:00
db_bench.cc Fix problem with create_if_missing option when wal_dir is used 2014-12-08 12:53:24 -08:00
db_filesnapshot.cc Moved checkpoint to utilities 2014-11-20 15:54:47 -08:00
db_impl.cc Rewritten system for scheduling background work 2014-12-19 20:38:12 +01:00
db_impl.h Rewritten system for scheduling background work 2014-12-19 20:38:12 +01:00
db_impl_debug.cc remove all remaining references to cfd->options() 2014-11-18 10:20:10 -08:00
db_impl_readonly.cc Block ReadOnlyDB in ROCKSDB_LITE 2014-11-26 11:37:59 -08:00
db_impl_readonly.h Block ReadOnlyDB in ROCKSDB_LITE 2014-11-26 11:37:59 -08:00
db_iter.cc Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld() 2014-12-04 11:11:11 -08:00
db_iter.h reduce references to cfd->options() in DBImpl 2014-09-08 15:04:34 -07:00
db_iter_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
db_test.cc RocksDB: Allow Level-Style Compaction to Place Files in Different Paths 2014-12-15 21:48:16 -08:00
dbformat.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
dbformat.h Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
dbformat_test.cc Use IterKey instead of string in Block::Iter to reduce malloc 2014-07-23 12:31:11 -07:00
deletefile_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
file_indexer.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
file_indexer.h Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
file_indexer_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
filename.cc CompactFiles, EventListener and GetDatabaseMetaData 2014-11-07 14:45:18 -08:00
filename.h CompactFiles, EventListener and GetDatabaseMetaData 2014-11-07 14:45:18 -08:00
filename_test.cc Support purging logs from separate log directory 2014-08-14 13:22:50 -07:00
flush_job.cc RocksDB: Allow Level-Style Compaction to Place Files in Different Paths 2014-12-15 21:48:16 -08:00
flush_job.h CompactFiles, EventListener and GetDatabaseMetaData 2014-11-07 14:45:18 -08:00
flush_job_test.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
flush_scheduler.cc Don't return (or dereference) dangling pointer 2014-10-02 14:33:16 -07:00
flush_scheduler.h Push model for flushing memtables 2014-09-10 18:46:09 -07:00
forward_iterator.cc remove all remaining references to cfd->options() 2014-11-18 10:20:10 -08:00
forward_iterator.h Make ForwardIterator::status() more efficient 2014-11-10 15:44:20 -08:00
internal_stats.cc Add DBProperty to return number of snapshots and time for oldest snapshot 2014-12-05 17:07:49 -08:00
internal_stats.h Add DBProperty to return number of snapshots and time for oldest snapshot 2014-12-05 17:07:49 -08:00
job_context.h Explicitly clean JobContext 2014-11-14 15:43:10 -08:00
listener_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
log_and_apply_bench.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
log_format.h Some minor refactoring on the code 2014-01-02 16:32:31 -08:00
log_reader.cc Fix iOS compile with -Wshorten-64-to-32 2014-11-13 14:39:30 -05:00
log_reader.h Fix UnmarkEOF for partial blocks 2014-01-27 14:49:10 -08:00
log_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
log_writer.cc Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
log_writer.h Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
memtable.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable_allocator.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable_allocator.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable_list.cc Redesign pending_outputs_ 2014-11-07 11:50:34 -08:00
memtable_list.h Redesign pending_outputs_ 2014-11-07 11:50:34 -08:00
merge_context.h Enhance partial merge to support multiple arguments 2014-03-24 17:57:13 -07:00
merge_helper.cc Turn -Wshadow back on 2014-11-06 11:14:28 -08:00
merge_helper.h Fixed the crash when merge_operator is not properly set after reopen. 2014-07-30 17:24:36 -07:00
merge_operator.cc Some small cleaning up to make some compiling environment happy 2014-03-26 18:11:41 -07:00
merge_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
perf_context_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
plain_table_db_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
prefix_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
repair.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
skiplist.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
skiplist_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
snapshot.h Add DBProperty to return number of snapshots and time for oldest snapshot 2014-12-05 17:07:49 -08:00
table_cache.cc TableMock + framework for mock classes 2014-10-28 17:52:32 -07:00
table_cache.h use GetContext to replace callback function pointer 2014-09-29 11:09:09 -07:00
table_properties_collector.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
table_properties_collector.h TablePropertiesCollectorFactory 2014-05-13 12:30:55 -07:00
table_properties_collector_test.cc fix asan check 2014-09-05 09:53:04 -07:00
transaction_log_impl.cc Turn -Wshadow back on 2014-11-06 11:14:28 -08:00
transaction_log_impl.h Turn -Wshadow back on 2014-11-06 11:14:28 -08:00
version_builder.cc Add an assert and avoid std::sort(autovector) to investigate an ASAN issue 2014-12-12 12:44:00 -08:00
version_builder.h Move VersionBuilder logic to a separate .cc file 2014-10-31 16:34:38 -07:00
version_builder_test.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
version_edit.cc Turn on -Wshadow 2014-10-31 11:59:54 -07:00
version_edit.h Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
version_edit_test.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
version_set.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
version_set.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
version_set_test.cc rename FileLevel to LevelFilesBrief / unfriend CompactedDBImpl 2014-10-28 10:03:13 -07:00
wal_manager.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
wal_manager.h Fix -Wnon-virtual-dtor errors 2014-11-10 17:39:38 -05:00
wal_manager_test.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
write_batch.cc Remove the use of exception in WriteBatch::Handler 2014-12-04 12:01:55 -08:00
write_batch_internal.h remove all remaining references to cfd->options() 2014-11-18 10:20:10 -08:00
write_batch_test.cc Remove the use of exception in WriteBatch::Handler 2014-12-04 12:01:55 -08:00
write_controller.cc Push- instead of pull-model for managing Write stalls 2014-09-08 11:20:25 -07:00
write_controller.h Fix #284 2014-09-13 14:14:10 -07:00
write_controller_test.cc Push- instead of pull-model for managing Write stalls 2014-09-08 11:20:25 -07:00
write_thread.cc WriteThread 2014-09-12 16:23:58 -07:00
write_thread.h WriteThread 2014-09-12 16:23:58 -07:00
writebuffer.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00