rocksdb/db/db_impl
Yanqin Jin bbdaf63d0f Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9686

According to https://www.cplusplus.com/reference/deque/deque/back/,
"
The container is accessed (neither the const nor the non-const versions modify the container).
The last element is potentially accessed or modified by the caller. Concurrently accessing or modifying other elements is safe.
"

Also according to https://www.cplusplus.com/reference/deque/deque/pop_front/,
"
The container is modified.
The first element is modified. Concurrently accessing or modifying other elements is safe (although see iterator validity above).
"
In RocksDB, we never pop the last element of `DBImpl::alive_log_files_`. We have been
exploiting this fact and the above two properties when ensuring correctness when
`DBImpl::alive_log_files_` may be accessed concurrently. Specifically, it can be accessed
in the write path when db mutex is released. Sometimes, the log_mute_ is held. It can also be accessed in `FindObsoleteFiles()`
when db mutex is always held. It can also be accessed
during recovery when db mutex is also held.
Given the fact that we never pop the last element of alive_log_files_, we currently do not
acquire additional locks when accessing it in `WriteToWAL()` as follows
```
alive_log_files_.back().AddSize(log_entry.size());
```

This is problematic.

Check source code of deque.h
```
  back() _GLIBCXX_NOEXCEPT
  {
__glibcxx_requires_nonempty();
...
  }

  pop_front() _GLIBCXX_NOEXCEPT
  {
...
  if (this->_M_impl._M_start._M_cur
      != this->_M_impl._M_start._M_last - 1)
    {
      ...
      ++this->_M_impl._M_start._M_cur;
    }
  ...
  }
```

`back()` will actually call `__glibcxx_requires_nonempty()` first.
If `__glibcxx_requires_nonempty()` is enabled and not an empty macro,
it will call `empty()`
```
bool empty() {
return this->_M_impl._M_finish == this->_M_impl._M_start;
}
```
You can see that it will access `this->_M_impl._M_start`, racing with `pop_front()`.
Therefore, TSAN will actually catch the bug in this case.

To be able to use TSAN on our library and unit tests, we should always coordinate
concurrent accesses to STL containers properly.

We need to pass information about db mutex and log mutex into `WriteToWAL()`, otherwise
it's impossible to know which mutex to acquire inside the function.

To fix this, we can catch the tail of `alive_log_files_` by reference, so that we do not have to call `back()` in `WriteToWAL()`.

Reviewed By: pdillinger

Differential Revision: D34780309

fbshipit-source-id: 1def9821f0c437f2736c6a26445d75890377889b
2022-03-14 18:49:55 -07:00
..
compacted_db_impl.cc Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
compacted_db_impl.h Move compacted_db_impl.[c|h] to db/db_impl (#8082) 2021-03-23 13:49:26 -07:00
db_impl.cc Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
db_impl.h Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686) 2022-03-14 18:49:55 -07:00
db_impl_compaction_flush.cc DisableManualCompaction may fail to cancel an unscheduled task (#9659) 2022-03-12 20:07:04 -08:00
db_impl_debug.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
db_impl_experimental.cc Get DBTest passing Assert Status Checked (#7737) 2021-12-09 11:00:17 -08:00
db_impl_files.cc Fix a silent data loss for write-committed txn (#9571) 2022-02-16 23:08:58 -08:00
db_impl_open.cc Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686) 2022-03-14 18:49:55 -07:00
db_impl_readonly.cc Fix PinSelf() read-after-free in DB::GetMergeOperands() (#9507) 2022-02-15 12:25:18 -08:00
db_impl_readonly.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_secondary.cc Fix PinSelf() read-after-free in DB::GetMergeOperands() (#9507) 2022-02-15 12:25:18 -08:00
db_impl_secondary.h Add commit marker with timestamp (#9266) 2021-12-10 11:05:35 -08:00
db_impl_write.cc Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686) 2022-03-14 18:49:55 -07:00