rocksdb/db
Nathan Bronson b7198c3afe reduce db mutex contention for write batch groups
Summary:
This diff allows a Writer to join the next write batch group
without acquiring any locks. Waiting is performed via a per-Writer mutex,
so all of the non-leader writers never need to acquire the db mutex.
It is now possible to join a write batch group after the leader has been
chosen but before the batch has been constructed. This diff doesn't
increase parallelism, but reduces synchronization overheads.
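
The joining scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code from this diff: `Writer`, `WriteQueue`, `Join`, `CommitGroup`, and `RunDemo` are hypothetical names, the "write" is just an append to an in-memory vector, and the default sequentially consistent atomics are used for simplicity. A writer links itself into the queue with a compare-and-swap; if the queue was empty it becomes the leader, otherwise it sleeps on its own mutex until the leader either completes its write or promotes it to leader.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

enum class State { kWaiting, kLeader, kDone };

// One pending write (hypothetical type, for illustration only).
struct Writer {
  int value = 0;
  Writer* older = nullptr;        // next-older writer in the queue
  State state = State::kWaiting;  // guarded by mu
  std::mutex mu;                  // per-Writer mutex: only this writer waits on it
  std::condition_variable cv;
};

class WriteQueue {
 public:
  // Lock-free enqueue. Returns true if the caller must act as leader.
  bool Join(Writer* w) {
    Writer* prev = newest_.load();
    do {
      w->older = prev;
    } while (!newest_.compare_exchange_weak(prev, w));
    if (prev == nullptr) return true;  // queue was empty: we lead
    std::unique_lock<std::mutex> lock(w->mu);
    w->cv.wait(lock, [w] { return w->state != State::kWaiting; });
    return w->state == State::kLeader;
  }

  // Leader only: apply every write queued so far, then hand off leadership.
  void CommitGroup(Writer* leader) {
    Writer* last = newest_.load();
    assert(last != nullptr);     // the leader itself is always queued
    std::vector<Writer*> group;  // collected newest first
    for (Writer* p = last; p != leader; p = p->older) group.push_back(p);
    group.push_back(leader);
    for (auto it = group.rbegin(); it != group.rend(); ++it) {
      applied_.push_back((*it)->value);  // stand-in for the real batch write
    }
    Writer* head = last;
    if (!newest_.compare_exchange_strong(head, nullptr)) {
      // Writers joined after `last`; promote the oldest of them.
      Writer* next = head;
      while (next->older != last) next = next->older;
      next->older = nullptr;
      Wake(next, State::kLeader);
    }
    for (Writer* f : group) {
      if (f != leader) Wake(f, State::kDone);
    }
  }

  std::vector<int> applied() const { return applied_; }

 private:
  static void Wake(Writer* w, State s) {
    std::lock_guard<std::mutex> lock(w->mu);
    w->state = s;
    w->cv.notify_one();
  }

  std::atomic<Writer*> newest_{nullptr};
  std::vector<int> applied_;  // touched only by the current leader
};

// Runs n concurrent writers and returns the applied values, sorted.
std::vector<int> RunDemo(int n) {
  WriteQueue queue;
  std::vector<Writer> writers(n);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    writers[i].value = i;
    Writer* w = &writers[i];
    threads.emplace_back([&queue, w] {
      if (queue.Join(w)) queue.CommitGroup(w);
    });
  }
  for (auto& t : threads) t.join();
  std::vector<int> out = queue.applied();
  std::sort(out.begin(), out.end());
  return out;
}
```

Calling `RunDemo(8)` from a `main` should return each of the eight values exactly once; how the writes are split into groups varies from run to run. Note that no thread in this sketch ever takes a shared lock: followers touch only their own mutex, and the leader touches a follower's mutex only to wake it.
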

For some CPU-bound workloads (no WAL, RAM-sized working set) this can
substantially reduce contention on the db mutex in a multi-threaded
environment.  With T=8 N=500000 in a CPU-bound scenario (see the test
plan) this is good for a 33% perf win.  Not all scenarios see such a
win, but none show a loss.  This code is slightly faster even for the
single-threaded case (about 2% for the CPU-bound scenario below).

Test Plan:
1. unit tests
2. COMPILE_WITH_TSAN=1 make check
3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000

Reviewers: sdong, igor, rven, ljin, yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43887
2015-08-14 10:55:43 -07:00
builder.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
builder.h Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
c.cc Don't let flushes preempt compactions 2015-07-17 12:02:52 -07:00
c_test.c Deprecate CompactionFilterV2 2015-07-17 18:59:11 +02:00
column_family.cc Don't let flushes preempt compactions 2015-07-17 12:02:52 -07:00
column_family.h Fail DB::Open() when the requested compression is not available 2015-06-18 14:55:05 -07:00
column_family_test.cc Removing duplicate code 2015-08-05 07:33:27 -07:00
compact_files_test.cc Improved FileExists API 2015-07-20 17:20:40 -07:00
compacted_db_impl.cc Remove db_impl_readonly dependency on utilities 2015-07-14 11:32:54 -07:00
compacted_db_impl.h Remove db_impl_readonly dependency on utilities 2015-07-14 11:32:54 -07:00
compaction.cc Fix when output level is 0 of universal compaction with trivial move 2015-07-27 14:25:57 -07:00
compaction.h Parallelize L0-L1 Compaction: Restructure Compaction Job 2015-08-03 11:32:14 -07:00
compaction_job.cc Add options.compaction_measure_io_stats to print write I/O stats in compactions 2015-08-13 16:52:26 -07:00
compaction_job.h Add options.compaction_measure_io_stats to print write I/O stats in compactions 2015-08-13 16:52:26 -07:00
compaction_job_stats_test.cc Add options.compaction_measure_io_stats to print write I/O stats in compactions 2015-08-13 16:52:26 -07:00
compaction_job_test.cc Add options.compaction_measure_io_stats to print write I/O stats in compactions 2015-08-13 16:52:26 -07:00
compaction_picker.cc Fix CompactFiles by adding all necessary files 2015-08-03 15:53:22 -07:00
compaction_picker.h Fix when output level is 0 of universal compaction with trivial move 2015-07-27 14:25:57 -07:00
compaction_picker_test.cc Make compaction_picker_test runnable in ROCKSDB_LITE 2015-07-20 10:46:09 -07:00
comparator_db_test.cc Make "make all" work for CYGWIN 2015-06-09 16:36:07 -07:00
convenience.cc move convenience.h out of utilities 2015-07-15 14:51:51 -07:00
corruption_test.cc Don't let flushes preempt compactions 2015-07-17 12:02:52 -07:00
cuckoo_table_db_test.cc Block cuckoo table tests in ROCKSDB_LITE 2015-07-20 10:50:46 -07:00
db_bench.cc Add options.compaction_measure_io_stats to print write I/O stats in compactions 2015-08-13 16:52:26 -07:00
db_compaction_filter_test.cc Don't let flushes preempt compactions 2015-07-17 12:02:52 -07:00
db_compaction_test.cc Update Tests To Enable Subcompactions 2015-08-04 22:19:07 -07:00
db_dynamic_level_test.cc Move DynamicLevel related db-tests to db_dynamic_level_test.cc 2015-07-13 19:00:30 -07:00
db_filesnapshot.cc Add wal files to Checkpoint for multiple column families. 2015-06-19 16:08:31 -07:00
db_impl.cc reduce db mutex contention for write batch groups 2015-08-14 10:55:43 -07:00
db_impl.h Pessimistic Transactions 2015-08-11 17:52:23 -07:00
db_impl_debug.cc reduce db mutex contention for write batch groups 2015-08-14 10:55:43 -07:00
db_impl_experimental.cc Clean up InstallSuperVersion 2015-06-17 12:37:59 -07:00
db_impl_readonly.cc Remove db_impl_readonly dependency on utilities 2015-07-14 11:32:54 -07:00
db_impl_readonly.h Use CompactRangeOptions for CompactRange 2015-06-17 14:36:14 -07:00
db_inplace_update_test.cc Move in-place-update related tests from db_test.cc to db_inplace_update_test.cc 2015-07-20 16:05:28 -07:00
db_iter.cc Removing duplicate code in db_bench/db_stress, fixing typos 2015-08-11 11:46:15 -07:00
db_iter.h reduce references to cfd->options() in DBImpl 2014-09-08 15:04:34 -07:00
db_iter_test.cc Add test case to reproduce the mispositioned iterator in a rare data race 2015-08-12 10:50:52 -07:00
db_log_iter_test.cc Move TransactionLogIterator related tests from db_test.cc to db_log_iter_test.cc 2015-07-14 16:08:21 -07:00
db_tailing_iter_test.cc Move TailingIterator tests from db_test.cc to db_test_tailing_iterator.cc 2015-07-14 16:41:08 -07:00
db_test.cc Parallelize LoadTableHandlers 2015-08-11 12:19:56 -07:00
db_universal_compaction_test.cc Fix when output level is 0 of universal compaction with trivial move 2015-07-27 14:25:57 -07:00
db_wal_test.cc Add two unit tests for SyncWAL() 2015-08-05 14:27:02 -07:00
dbformat.cc Replace %llu with format macros in ParsedInternalKey::DebugString() 2015-06-17 20:44:26 -07:00
dbformat.h Avoid manipulating const char* arrays 2015-07-14 00:21:41 -07:00
dbformat_test.cc Avoid manipulating const char* arrays 2015-07-14 00:21:41 -07:00
deletefile_test.cc Improved FileExists API 2015-07-20 17:20:40 -07:00
event_helpers.cc Add EventListener::OnTableFileDeletion() 2015-06-03 19:57:01 -07:00
event_helpers.h Add EventListener::OnTableFileDeletion() 2015-06-03 19:57:01 -07:00
experimental.cc Implement DB::PromoteL0 method 2015-04-23 12:10:36 -07:00
fault_injection_test.cc [wal changes 3/3] method in DB to sync WAL without blocking writers 2015-08-05 06:06:39 -07:00
file_indexer.cc Fix possible SIGSEGV in CompactRange (github issue #596) 2015-04-29 10:52:31 -07:00
file_indexer.h Fix public API dependency on internal codes and dependency on MAX_INT32 2015-07-11 10:32:11 -07:00
file_indexer_test.cc Fix possible SIGSEGV in CompactRange (github issue #596) 2015-04-29 10:52:31 -07:00
filename.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
filename.h Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
filename_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
flush_job.cc Don't let flushes preempt compactions 2015-07-17 12:02:52 -07:00
flush_job.h simple ManagedSnapshot wrapper 2015-08-06 17:59:05 -07:00
flush_job_test.cc Better CompactionJob testing 2015-08-07 21:59:51 -07:00
flush_scheduler.cc Don't return (or dereference) dangling pointer 2014-10-02 14:33:16 -07:00
flush_scheduler.h Fix data race #1 2015-01-26 11:48:07 -08:00
forward_iterator.cc fixed leaking log::Writers 2015-07-07 12:10:10 -07:00
forward_iterator.h rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
internal_stats.cc Removing duplicate code in db_bench/db_stress, fixing typos 2015-08-11 11:46:15 -07:00
internal_stats.h Report live data size estimate 2015-07-21 21:33:20 -07:00
job_context.h fixed leaking log::Writers 2015-07-07 12:10:10 -07:00
listener_test.cc Deprecate WriteOptions::timeout_hint_us 2015-07-14 09:35:48 +02:00
log_format.h Some minor refactoring on the code 2014-01-02 16:32:31 -08:00
log_reader.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
log_reader.h Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
log_test.cc Removing duplicate code 2015-08-05 07:33:27 -07:00
log_writer.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
log_writer.h Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
managed_iterator.cc Windows Port from Microsoft 2015-07-01 16:13:56 -07:00
managed_iterator.h Fixed xfunc related compile errors in ROCKSDB_LITE 2015-04-09 21:05:18 -07:00
memtable.cc Allow GetApproximateSize() to include mem table size if it is skip list memtable 2015-06-16 18:13:23 -07:00
memtable.h Allow GetApproximateSize() to include mem table size if it is skip list memtable 2015-06-16 18:13:23 -07:00
memtable_allocator.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable_allocator.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
memtable_list.cc Allow GetApproximateSize() to include mem table size if it is skip list memtable 2015-06-16 18:13:23 -07:00
memtable_list.h Allow GetApproximateSize() to include mem table size if it is skip list memtable 2015-06-16 18:13:23 -07:00
memtable_list_test.cc Removing duplicate code 2015-08-05 07:33:27 -07:00
memtablerep_bench.cc Windows Port from Microsoft 2015-07-01 16:13:56 -07:00
merge_context.h API to fetch from both a WriteBatchWithIndex and the db 2015-05-11 14:51:51 -07:00
merge_helper.cc Further cleanup of CompactionJob and MergeHelper 2015-07-28 19:21:55 -07:00
merge_helper.h Further cleanup of CompactionJob and MergeHelper 2015-07-28 19:21:55 -07:00
merge_helper_test.cc Further cleanup of CompactionJob and MergeHelper 2015-07-28 19:21:55 -07:00
merge_operator.cc Call merge operators with empty values 2015-06-26 11:35:46 -07:00
merge_test.cc Make merge_test runnable in ROCKSDB_LITE 2015-07-20 11:17:52 -07:00
perf_context_test.cc Makefile minor cleanup 2015-03-30 16:05:35 -04:00
plain_table_db_test.cc Block plain_table_db_test in ROCKSDB_LITE 2015-07-20 11:12:02 -07:00
prefix_test.cc Skip unsupported tests in ROCKSDB_LITE 2015-07-20 11:24:54 -07:00
repair.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
skiplist.h reduce comparisons by skiplist 2015-08-11 11:25:22 -07:00
skiplist_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
slice.cc Create an abstract interface for write batches 2015-03-17 19:23:08 -07:00
snapshot_impl.cc simple ManagedSnapshot wrapper 2015-08-06 17:59:05 -07:00
snapshot_impl.h simple ManagedSnapshot wrapper 2015-08-06 17:59:05 -07:00
table_cache.cc Add statistic histogram "rocksdb.sst.read.micros" 2015-08-05 13:02:33 -07:00
table_cache.h Add statistic histogram "rocksdb.sst.read.micros" 2015-08-05 13:02:33 -07:00
table_properties_collector.cc Add a new callback to TablePropertiesCollector to let users know whether an entry is an add, delete, or merge 2015-04-06 10:27:21 -07:00
table_properties_collector.h Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files 2015-06-05 20:18:21 -07:00
table_properties_collector_test.cc Removing duplicate code 2015-08-05 07:33:27 -07:00
transaction_log_impl.cc Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
transaction_log_impl.h Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env 2015-07-17 16:58:18 -07:00
version_builder.cc Parallelize LoadTableHandlers 2015-08-11 12:19:56 -07:00
version_builder.h Parallelize LoadTableHandlers 2015-08-11 12:19:56 -07:00
version_builder_test.cc Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files 2015-06-05 20:18:21 -07:00
version_edit.cc Added JSON manifest dump option to ldb command 2015-07-17 10:07:40 -07:00
version_edit.h Added JSON manifest dump option to ldb command 2015-07-17 10:07:40 -07:00
version_edit_test.cc Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files 2015-06-05 20:18:21 -07:00
version_set.cc Parallelize LoadTableHandlers 2015-08-11 12:19:56 -07:00
version_set.h Add DBOptions::skip_stats_update_on_db_open 2015-08-04 13:48:16 -07:00
version_set_test.cc Report live data size estimate 2015-07-21 21:33:20 -07:00
wal_manager.cc Improved FileExists API 2015-07-20 17:20:40 -07:00
wal_manager.h Fix -Wnon-virtual-dtor errors 2014-11-10 17:39:38 -05:00
wal_manager_test.cc Skip unsupported tests in ROCKSDB_LITE 2015-07-20 11:24:54 -07:00
write_batch.cc simple ManagedSnapshot wrapper 2015-08-06 17:59:05 -07:00
write_batch_base.cc WriteBatch.Merge w/ SliceParts support 2015-05-29 04:30:03 -07:00
write_batch_internal.h WriteBatch Save Points 2015-07-29 16:54:23 -07:00
write_batch_test.cc WriteBatch Save Points 2015-07-29 16:54:23 -07:00
write_callback.h Optimistic Transactions 2015-05-29 14:36:35 -07:00
write_callback_test.cc Fix compile for write_callback_test in ROCKSDB_LITE 2015-07-20 10:54:15 -07:00
write_controller.cc Slow down writes by bytes written 2015-06-11 20:42:18 -07:00
write_controller.h Slow down writes by bytes written 2015-06-11 20:42:18 -07:00
write_controller_test.cc Slow down writes by bytes written 2015-06-11 20:42:18 -07:00
write_thread.cc reduce db mutex contention for write batch groups 2015-08-14 10:55:43 -07:00
write_thread.h reduce db mutex contention for write batch groups 2015-08-14 10:55:43 -07:00
writebuffer.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00