rocksdb/db/compaction
Yanqin Jin 0bd4dcde6b CompactionIterator sees consistent view of which keys are committed (#9830)
Summary:
**This PR does not affect the functionality of `DB` and write-committed transactions.**

`CompactionIterator` uses `KeyCommitted(seq)` to determine if a key in the database is committed.
As the name 'write-committed' implies, if write-committed policy is used, a key exists in the database only if
it is committed. In fact, the implementation of `KeyCommitted()` is as follows:

```
inline bool KeyCommitted(SequenceNumber seq) {
  // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
  return snapshot_checker_ == nullptr ||
         snapshot_checker_->CheckInSnapshot(seq, kMaxSequence) == SnapshotCheckerResult::kInSnapshot;
}
```

With that being said, we focus on write-prepared/write-unprepared transactions.

A few notes:
- A key can exist in the db even if it's uncommitted. Therefore, we rely on `snapshot_checker_` to determine data visibility. We also require that all writes go through transaction API instead of the raw `WriteBatch` + `Write`, thus at most one uncommitted version of one user key can exist in the database.
- `CompactionIterator` outputs a key as long as the key is uncommitted.

Due to the above reasons, it is possible that `CompactionIterator` decides to output an uncommitted key without
doing further checks on the key (`NextFromInput()`). By the time the key is being prepared for output, the key becomes
committed because the `snapshot_checker_(seq, kMaxSequence)` becomes true in the implementation of `KeyCommitted()`.
Then `CompactionIterator` will try to zero its sequence number and hit assertion error if the key is a tombstone.

To fix this issue, we should make the `CompactionIterator` see a consistent view of the input keys. Note that
for write-prepared/write-unprepared, the background flush/compaction jobs already take a "job snapshot" before starting
processing keys. The job snapshot is released only after the entire flush/compaction finishes. We can use this snapshot
to determine whether a key is committed or not with minor change to `KeyCommitted()`.

```
inline bool KeyCommitted(SequenceNumber sequence) {
  // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
  return snapshot_checker_ == nullptr ||
         snapshot_checker_->CheckInSnapshot(sequence, job_snapshot_) ==
             SnapshotCheckerResult::kInSnapshot;
}
```

As a result, whether a key is committed or not will remain a constant throughout compaction, causing no trouble
for `CompactionIterator`s assertions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9830

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D35561162

Pulled By: riversand963

fbshipit-source-id: 0e00d200c195240341cfe6d34cbc86798b315b9f
2022-04-14 11:11:04 -07:00
..
clipping_iterator.h Add a clipping internal iterator (#8327) 2021-06-09 15:41:16 -07:00
clipping_iterator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
compaction.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_iteration_stats.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
compaction_iterator.cc CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_iterator.h CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_iterator_test.cc CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_job.cc CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_job.h CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_job_stats_test.cc Make MemTableRepFactory into a Customizable class (#8419) 2021-09-08 07:46:44 -07:00
compaction_job_test.cc CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
compaction_picker.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_fifo.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_fifo.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_level.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_level.h
compaction_picker_test.cc Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_universal.cc DisableManualCompaction may fail to cancel an unscheduled task (#9659) 2022-03-12 20:07:04 -08:00
compaction_picker_universal.h
compaction_service_test.cc Support canceling running RemoteCompaction on remote side (#9725) 2022-04-13 13:28:09 -07:00
file_pri.h Try to start TTL earlier with kMinOverlappingRatio is used (#8749) 2021-11-01 14:36:31 -07:00
sst_partitioner.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00