rocksdb/utilities/transactions
Yanqin Jin 0547cecb81 Reduce access to atomic variables in a test (#10909)
Summary:
With TSAN build on CircleCI (see mini-tsan in .circleci/config).
Sometimes `SeqAdvanceConcurrentTest.SeqAdvanceConcurrent` will get stuck when an experimental feature called
"unordered write" is enabled. Stack trace will be the following
```
Thread 7 (Thread 0x7f2284a1c700 (LWP 481523) "write_prepared_"):
#0  0x00000000004fa3f5 in __tsan_atomic64_load () at ./db/merge_context.h:15
https://github.com/facebook/rocksdb/issues/1  0x00000000005e5942 in std::__atomic_base<unsigned long>::load (this=0x7b74000012f8, __m=std::memory_order_seq_cst) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:481
https://github.com/facebook/rocksdb/issues/2  std::__atomic_base<unsigned long>::operator unsigned long (this=0x7b74000012f8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:341
https://github.com/facebook/rocksdb/issues/3  0x00000000005bf001 in rocksdb::SeqAdvanceConcurrentTest_SeqAdvanceConcurrent_Test::TestBody()::$_9::operator()(void*) const (this=0x7b14000085e8) at utilities/transactions/write_prepared_transaction_test.cc:1702

Thread 6 (Thread 0x7f228421b700 (LWP 481521) "write_prepared_"):
#0  0x000000000052178c in __tsan::MetaMap::GetAndLock(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) () at ./db/merge_context.h:15
https://github.com/facebook/rocksdb/issues/1  0x00000000004fa48e in __tsan_atomic64_load () at ./db/merge_context.h:15
https://github.com/facebook/rocksdb/issues/2  0x00000000005e5942 in std::__atomic_base<unsigned long>::load (this=0x7b74000012f8, __m=std::memory_order_seq_cst) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:481
https://github.com/facebook/rocksdb/issues/3  std::__atomic_base<unsigned long>::operator unsigned long (this=0x7b74000012f8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:341
https://github.com/facebook/rocksdb/issues/4  0x00000000005bf001 in rocksdb::SeqAdvanceConcurrentTest_SeqAdvanceConcurrent_Test::TestBody()::$_9::operator()(void*) const (this=0x7b14000085e8) at utilities/transactions/write_prepared_transaction_test.cc:1702
```

This is problematic and suspicious. Two threads will get stuck in the same place trying to load from an atomic variable.
https://github.com/facebook/rocksdb/blob/7.8.fb/utilities/transactions/write_prepared_transaction_test.cc#L1694:L1707. Not sure why two threads can reach the same point.

The stack trace shows that there may be a deadlock, since the two threads are on the same write thread (one is doing Prepare, while the other is trying to commit).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10909

Test Plan:
On CircleCI mini-tsan, apply a patch first so that we have a higher chance of hitting the same problematic situation,
```
 diff --git a/utilities/transactions/write_prepared_transaction_test.cc b/utilities/transactions/write_prepared_transaction_test.cc
index 4bc1f3744..bd5dc4924 100644
 --- a/utilities/transactions/write_prepared_transaction_test.cc
+++ b/utilities/transactions/write_prepared_transaction_test.cc
@@ -1714,13 +1714,13 @@ TEST_P(SeqAdvanceConcurrentTest, SeqAdvanceConcurrent) {
       size_t d = (n % base[bi + 1]) / base[bi];
       switch (d) {
         case 0:
-          threads.emplace_back(txn_t0, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 1:
-          threads.emplace_back(txn_t1, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 2:
-          threads.emplace_back(txn_t2, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 3:
           threads.emplace_back(txn_t3, bi);
```
then build and run tests
```
COMPILE_WITH_TSAN=1 CC=clang-13 CXX=clang++-13 ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 check
gtest-parallel -r 100 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SeqAdvanceConcurrentTest.SeqAdvanceConcurrent/19
```
In the above, `SeqAdvanceConcurrent/19`. The tests 10 to 19 correspond to unordered write in which Prepare() and Commit() can both enter the same write thread.
Before this PR, there is a high chance of hitting the deadlock. With this PR, no deadlock has been encountered so far.

Reviewed By: ltamasi

Differential Revision: D40869387

Pulled By: riversand963

fbshipit-source-id: 81e82a70c263e4f3417597a201b081ee54f1deab
2022-11-02 14:54:58 -07:00
..
lock Pass const LockInfo& to AcquireLocked() and AcquireWithTimeout (#10874) 2022-10-28 14:05:12 -07:00
optimistic_transaction.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
optimistic_transaction.h Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
optimistic_transaction_db_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
optimistic_transaction_db_impl.h Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
optimistic_transaction_test.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
pessimistic_transaction.cc Reset pessimistic transaction's read/commit timestamps during Initialize() (#10677) 2022-09-14 18:28:21 -07:00
pessimistic_transaction.h Snapshots with user-specified timestamps (#9879) 2022-06-10 16:07:03 -07:00
pessimistic_transaction_db.cc Correctly implement Create-/DropColumnFamilies for PessimisticTransactionDB (#10332) 2022-07-22 08:31:22 -07:00
pessimistic_transaction_db.h Correctly implement Create-/DropColumnFamilies for PessimisticTransactionDB (#10332) 2022-07-22 08:31:22 -07:00
snapshot_checker.cc Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958) 2022-05-17 09:39:22 -07:00
timestamped_snapshot_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
transaction_base.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
transaction_base.h Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
transaction_db_mutex_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
transaction_db_mutex_impl.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
transaction_test.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
transaction_test.h Snapshots with user-specified timestamps (#9879) 2022-06-10 16:07:03 -07:00
transaction_util.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
transaction_util.h Update TransactionUtil::CheckKeyForConflict to also use timestamps (#9162) 2021-11-15 12:52:18 -08:00
write_committed_transaction_ts_test.cc Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00
write_prepared_transaction_test.cc Reduce access to atomic variables in a test (#10909) 2022-11-02 14:54:58 -07:00
write_prepared_txn.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
write_prepared_txn.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_prepared_txn_db.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
write_prepared_txn_db.h Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
write_unprepared_transaction_test.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
write_unprepared_txn.cc Add WriteOptions::protection_bytes_per_key (#10037) 2022-06-16 23:10:07 -07:00
write_unprepared_txn.h Replace tracked_keys with a new LockTracker interface in TransactionDB (#7013) 2020-08-06 12:38:00 -07:00
write_unprepared_txn_db.cc Run clang-format on utilities/transactions (#10871) 2022-10-25 14:15:22 -07:00
write_unprepared_txn_db.h WriteUnPrepared: Pass in correct subbatch count during rollback (#6463) 2020-02-28 11:19:32 -08:00