rocksdb/db/db_impl
Peter Dillinger 546e213c4f Fix DelayWrite() calls for two_write_queues (#11130)
Summary:
PR https://github.com/facebook/rocksdb/issues/11020 fixed a case where it was easy to deadlock the DB with LockWAL() but introduced a bug showing up as a rare assertion failure in the stress test. Specifically, the failing assertion is `assert(w->state == STATE_INIT)` in `WriteThread::LinkOne()`, called from `BeginWriteStall()`, `DelayWrite()`, `WriteImplWALOnly()`. I haven't been able to generate a unit test that reproduces this failure, but I believe the root cause is that DelayWrite() was never meant to be re-entrant, only called from the DB's write_thread_ leader. https://github.com/facebook/rocksdb/issues/11020 introduced a call to DelayWrite() from the nonmem_write_thread_ group leader.

This fix is to make DelayWrite() apply to the specific write queue that it is being called from (inject a dummy write stall entry to the head of the appropriate write queue). WriteController is re-entrant, based on polling and state changes signalled with bg_cv_, so it can manage stalling two queues. The only anticipated complication (called out by Andrew in the previous PR) is that we don't want timed write delays being injected in parallel for the two queues, because that diminishes the intended throttling effect. Thus, we only allow timed delays for the primary write queue.
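
A minimal sketch of the per-queue policy described above, for illustration only. All names here (`WriteQueue`, `StallState`, `StallPlan`, `PlanDelay`) are hypothetical and are not the RocksDB API; the sketch only models the decision "stall the caller's own queue, and allow a timed delay only on the primary queue".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model types; not taken from the RocksDB sources.
enum class WriteQueue { kPrimary, kNonMem };

struct StallState {
  bool stopped;           // the controller has fully stopped writes
  uint64_t delay_micros;  // timed delay requested by the controller (0 = none)
};

struct StallPlan {
  WriteQueue queue_to_block;  // queue that receives the dummy stall entry
  uint64_t sleep_micros;      // timed delay actually applied
  bool wait_for_signal;       // wait until the stop condition is lifted
};

// Decide how a group leader on `caller` should stall.
StallPlan PlanDelay(WriteQueue caller, const StallState& state) {
  // The stall entry always goes on the queue the call came from.
  StallPlan plan{caller, 0, state.stopped};
  // Timed delays are applied on the primary queue only, so the two
  // queues never sleep in parallel and dilute the throttling.
  if (caller == WriteQueue::kPrimary) {
    plan.sleep_micros = state.delay_micros;
  }
  assert(plan.sleep_micros == 0 || caller == WriteQueue::kPrimary);
  return plan;
}
```

In this model a stop condition is honored on either queue, while a requested delay is silently dropped for the secondary queue.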

HISTORY not updated because this is intended for the same release where the bug was introduced.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11130

Test Plan:
Although I was not able to reproduce the assertion failure, I was able to reproduce a distinct flaw with what I believe is the same root cause: a kind of deadlock if both write queues need to wake up from stopped writes. Only one will be waiting on bg_cv_ (the other waiting in `LinkOne()` for the write queue to open up), so a single SignalAll() will only unblock one of the queues, with the other re-instating the stop until another signal on bg_cv_. A simple unit test is added for this case.
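
A rough model of the intended post-fix behavior, under the assumption that each queue's leader waits on the shared condition variable itself: a single notify-all then wakes both. `StopGate`, `WaitUntilResumed`, and `ResumeBothQueues` are hypothetical names, and `cv` merely stands in for bg_cv_; this is not the unit test added by the PR.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the shared stop state guarded by the DB mutex.
struct StopGate {
  std::mutex mu;
  std::condition_variable cv;  // plays the role of bg_cv_ in this model
  bool stopped = true;
  int resumed = 0;  // number of queue leaders that woke up
};

// Each queue leader blocks here until the stop is lifted.
void WaitUntilResumed(StopGate& gate) {
  std::unique_lock<std::mutex> lock(gate.mu);
  gate.cv.wait(lock, [&gate] { return !gate.stopped; });
  assert(!gate.stopped);
  ++gate.resumed;
}

// Start one waiter per write queue, lift the stop once, and count wake-ups.
int ResumeBothQueues() {
  StopGate gate;
  std::thread primary([&gate] { WaitUntilResumed(gate); });
  std::thread nonmem([&gate] { WaitUntilResumed(gate); });
  {
    std::lock_guard<std::mutex> lock(gate.mu);
    gate.stopped = false;
  }
  gate.cv.notify_all();  // a single signal reaches both waiters
  primary.join();
  nonmem.join();
  return gate.resumed;
}
```

Because both waiters check the predicate under the same mutex, the result does not depend on whether a thread starts waiting before or after the stop is lifted.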

Will also run crash_test_with_multiops_wc_txn for a while looking for issues.

Reviewed By: ajkr

Differential Revision: D42749330

Pulled By: pdillinger

fbshipit-source-id: 4317dd899a93d57c26fd5af7143038f82d4d4d1b
2023-01-25 14:18:27 -08:00
..
compacted_db_impl.cc Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
compacted_db_impl.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl.cc Fix data race on ColumnFamilyData::flush_reason by letting FlushRequest/Job owns flush_reason instead of CFD (#11111) 2023-01-24 09:54:04 -08:00
db_impl.h Fix DelayWrite() calls for two_write_queues (#11130) 2023-01-25 14:18:27 -08:00
db_impl_compaction_flush.cc Fix data race on ColumnFamilyData::flush_reason by letting FlushRequest/Job owns flush_reason instead of CFD (#11111) 2023-01-24 09:54:04 -08:00
db_impl_debug.cc Add an unittest for Periodic compaction conflict with ongoing compaction (#10908) 2022-12-12 10:37:55 -08:00
db_impl_experimental.cc Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
db_impl_files.cc Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest (#10892) 2022-11-29 14:14:43 -08:00
db_impl_open.cc Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
db_impl_readonly.cc Skip swaths of range tombstone covered keys in merging iterator (2022 edition) (#10449) 2022-09-02 09:51:19 -07:00
db_impl_readonly.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_secondary.cc Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_secondary.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_write.cc Fix DelayWrite() calls for two_write_queues (#11130) 2023-01-25 14:18:27 -08:00