rocksdb/db/db_impl
Rulin Huang 20777b96cb Optimizations in notify-one (#12545)
Summary:
We tested on icelake server (vcpu=160). The default configuration is allow_concurrent_memtable_write=1, thread number =activate core number. With our optimizations, the improvement can reach up to 184% in fillseq case. op/s is as the performance indicator in db_bench, and the following are performance improvements in some cases in db_bench.
| case name          | optimized/original  |
|-------------------:|--------------------:|
| fillrandom         | 182%                |
| fillseq            | 184%                |
| fillsync           | 136%                |
| overwrite          | 179%                |
| randomreplacekeys  | 180%                |
| randomtransaction  | 161%                |
| updaterandom       | 163%                |
| xorupdaterandom    | 165%                |

With analysis, we find that although the process of writing memtable is processed in parallel, the process of waking up the writers is not processed in parallel, which means that only one writers is responsible for the sequential waking up other writers. The following is our method to optimize this process.

Assume that there are currently n threads in total, we parallelize SetState in LaunchParallelMemTableWriters. To wake up each writer to write its own memtable, the leader writer first wakes up the (n^0.5-1) caller writers, and then those callers and the leader will wake up n/x separately to write to the memtable. This reduces the number for the leader's to SetState n-1 writers to 2*(n^0.5) writers in turn.

A reproduction script:
./db_bench --benchmarks="fillrandom"  --threads ${number of all activate vcpu}  --seed 1708494134896523  --duration 60

![image](https://github.com/facebook/rocksdb/assets/22110918/c5eca02f-93b3-4434-bba2-5155fc892a97)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12545

Reviewed By: ajkr

Differential Revision: D57422827

Pulled By: cbi42

fbshipit-source-id: 94127937c0c61e4241720bd902c82c607b7b2431
2024-05-30 09:10:44 -07:00
..
compacted_db_impl.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
compacted_db_impl.h Deprecate some variants of Get and MultiGet (#12327) 2024-02-16 09:21:06 -08:00
db_impl.cc Flush WAL upon DB close (#12684) 2024-05-22 11:08:16 -07:00
db_impl.h Rename, deprecate LogFile and VectorLogPtr (#12695) 2024-05-28 09:24:49 -07:00
db_impl_compaction_flush.cc Support read timestamp in ldb (#12641) 2024-05-13 15:43:12 -07:00
db_impl_debug.cc
db_impl_experimental.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
db_impl_files.cc Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
db_impl_follower.cc Add LDB command and option for follower instances (#12682) 2024-05-28 23:21:32 -07:00
db_impl_follower.h Implement obsolete file deletion (GC) in follower (#12657) 2024-05-17 19:13:33 -07:00
db_impl_open.cc Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
db_impl_readonly.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
db_impl_readonly.h
db_impl_secondary.cc Implement obsolete file deletion (GC) in follower (#12657) 2024-05-17 19:13:33 -07:00
db_impl_secondary.h Basic RocksDB follower implementation (#12540) 2024-04-19 19:13:31 -07:00
db_impl_write.cc Optimizations in notify-one (#12545) 2024-05-30 09:10:44 -07:00