Commit graph

220 commits

Author SHA1 Message Date
Yi Wu 5d4fddfa52 WritePrepared: Fix visible key compacted out by compaction (#4883)
Summary:
With WritePrepared transaction, flush/compaction can contain uncommitted keys, and those keys can get committed during compaction. If a snapshot is taken before the key is committed, it should not see the key. On the other hand, compaction grab the list of snapshots at its beginning, and only consider those snapshots to dedup keys. Consider the case:
```
seq = 1: put "foo" = "bar"
seq = 2: transaction T: delete "foo", prepare
seq = 3: compaction start
seq = 4: take snapshot S
seq = 5: transaction T: commit.
...
seq = N: compaction iterator reached key "foo".
```
When compaction start, the list of snapshot is empty. Compaction doesn't take snapshot S into account. When it reached "foo", transaction T is committed. Compaction may think the value "foo=bar" is not visible by any snapshot (which is wrong), and compact the value out.

The fix is to explicitly take a snapshot before compaction grabbing the list of snapshots. Compaction will then has to keep keys visible to this snapshot.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4883

Differential Revision: D13668775

Pulled By: maysamyabandeh

fbshipit-source-id: 1cab9615f94b7d3e8522cc3d44c3a14c7d4720e4
2019-01-15 21:34:38 -08:00
Maysam Yabandeh cad99a6031 WritePrepared: snapshot should be larger than max_evicted_seq_ (#4886)
Summary:
The AdvanceMaxEvictedSeq algorithm assumes that new snapshots always have sequence number larger than the last max_evicted_seq_. To enforce this assumption we make two changes:
i) max is not advanced beyond the last published seq, with the exception that the evicted commit entry itself is not published yet, which is quite rare.
ii) When obtaining the snapshot if the max_evicted_seq_ is not published yet, commit a dummy entry so that it waits for it to be published and also increased the latest published seq by one above the max.
To test these non-realistic corner cases we create a commit cache with size 1 so that every single commit results into eviction.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4886

Differential Revision: D13685270

Pulled By: maysamyabandeh

fbshipit-source-id: 5461bc09c2a9b75798bfcb9853a256c81cdac0b0
2019-01-15 18:11:52 -08:00
Yi Wu d50c10ed37 WritePrepared: Fix SmallestUnCommittedSeq() doesn't check delayed_prepared (#4867)
Summary:
When prepared_txns_ heap is empty, SmallestUnCommittedSeq() should check delayed_prepared_ set as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4867

Differential Revision: D13632134

Pulled By: maysamyabandeh

fbshipit-source-id: b0423bb0a58dc95f1e636d5ed3f6e619df801fb7
2019-01-15 09:17:53 -08:00
Maysam Yabandeh 856ac24484 WritePrepared: fix race condition on GetSnapshotListFromDB (#4872)
Summary:
Fixes a typo that made mutex_ to remain unlocked when GetSnapshotListFromDB called from WritePreparedTxnDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4872

Differential Revision: D13640381

Pulled By: maysamyabandeh

fbshipit-source-id: 50f6600568f9092b4b43115f6ebd96e6c7388ad7
2019-01-11 13:46:23 -08:00
Zhongyi Xie 6a4ec41fed add assert to silence clang warning (#4871)
Summary:
currently clang analyze fails with the following warning:
> utilities/transactions/write_prepared_transaction_test.cc:1451:5: warning: Forming reference to null pointer
    ASSERT_GT(wp_db->max_evicted_seq_, 0);  // max after recovery
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4871

Differential Revision: D13638053

Pulled By: miasantreble

fbshipit-source-id: b192b0c13c411c58defc9e280b34cdfcab3fa8e3
2019-01-11 12:17:34 -08:00
Maysam Yabandeh f3a99e8a4d WritePrepared: Report released snapshots in IsInSnapshot (#4856)
Summary:
Previously IsInSnapshot assumed that the snapshot is valid at the time that the function is called. However there are cases where that might not be valid. Example is background compactions where the compaction algorithm operates with a list of snapshots some of which might be released by the time they are being passed to IsInSnapshot. The patch make two changes to enable the caller to tell difference: i) any live snapshot below max is added to max_committed_seq_, which allows IsInSnapshot to confidently tell whether the passed snapshot is invalid if it below max, ii) extends IsInSnapshot API with a "released" variable that is set true when IsInSnapshot find no such snapshot below max and also find no other way to give a certain return value. In such cases the return value is true but the caller should also check the "released" boolean after the call.
In short here is the changes in the API:
i) If the snapshot is valid, no change is required.
ii) If the snapshot might be invalid, a reference to "released" boolean must be passed to IsInSnapshot.
ii-a) If snapshot is above max, IsInSnapshot can figure the return valid using the commit cache.
ii-b) otherwise if snapshot is in old_commit_map_, IsInSnapshot can use that to tell if value was visible to the snapshot.
ii-c) otherwise it sets "released" to true and returns true as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4856

Differential Revision: D13599847

Pulled By: maysamyabandeh

fbshipit-source-id: 1752be28667f886a1efec8cae5714b9b7a8f1e0f
2019-01-08 14:47:29 -08:00
Maysam Yabandeh cd227d74ba WritePrepared: improve IsInSnapshotEmptyMapTest (#4853)
Summary:
IsInSnapshotEmptyMapTest tests that IsInSnapshot returns correct value for existing data after a recovery, where max is not zero and yet commit cache is empty. The existing test was preliminary which is improved in this patch. It also increases the db sequence after recovery so that there the snapshot immediately taken after recovery would have a sequence number different than that of max_evicted_seq. This simplifies the logic in IsInSnapshot by not having to consider the special case that an old snapshot might be equal to max_evicted_seq and yet not present in old_commit_map.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4853

Differential Revision: D13595223

Pulled By: maysamyabandeh

fbshipit-source-id: 77c12ca8a3f61a47479a93bef2038ff502dc3322
2019-01-08 11:27:11 -08:00
Maysam Yabandeh 0ed98bf89e WritePrepared: fix snapshot sequence in rollback (#4851)
Summary:
The rollback algorithm in WritePrepared transactions requires reading the values before the transaction start. Currently it uses the prepare_seq -1 as the snapshot sequence number for the read. This is not correct since the passed sequence number must be for a valid snapshot. The patch fixes it by passing kMaxSequenceNumber instead. This is fine since all the writes done by the aborted transaction will be skipped during the read anyway.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4851

Differential Revision: D13592773

Pulled By: maysamyabandeh

fbshipit-source-id: ff1bf92ea9909d4cccb173bdff49febc0e9eb7a2
2019-01-07 14:57:03 -08:00
Faustin Lammler 7d65bd5ce4 Fix spelling errors (#4827)
Summary:
Hi, Lintian, the Debian package checker complains about spelling error (spelling-error-in-binary).

See https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/98380
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4827

Differential Revision: D13566362

Pulled By: riversand963

fbshipit-source-id: cd4e9212133c73b0591030de6cdedaa47575968d
2019-01-02 11:17:57 -08:00
DorianZheng 2670fe8c73 Get CompactionJobInfo from CompactFiles
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716

Differential Revision: D13207677

Pulled By: ajkr

fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b
2018-12-13 14:21:24 -08:00
Maysam Yabandeh b878f93c70 Extend Transaction::GetForUpdate with do_validate (#4680)
Summary:
Transaction::GetForUpdate is extended with a do_validate parameter with default value of true. If false it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with default value of false. It true it indicates that call is assumed to be after a ::GetForUpdate(do_validate=false).
The Java APIs are accordingly updated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680

Differential Revision: D13068508

Pulled By: maysamyabandeh

fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
2018-12-06 17:49:00 -08:00
Zhongyi Xie 2f1ca4e838 Revert "BaseDeltaIterator: always check valid() before accessing key(… (#4744)
Summary:
…) (#4702)"

This reverts commit 3a18bb3e15.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4744

Differential Revision: D13311869

Pulled By: miasantreble

fbshipit-source-id: 6300b12cc34828d8b9274e907a3aef1506d5d553
2018-12-03 23:38:27 -08:00
Zhongyi Xie 3a18bb3e15 BaseDeltaIterator: always check valid() before accessing key() (#4702)
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702

Differential Revision: D13146643

Pulled By: miasantreble

fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
2018-11-30 15:35:13 -08:00
Maysam Yabandeh f1b0841f06 WritePrepared: followup fix for snapshot double release issue (#4734)
Summary:
The fix in #4727 for double snapshot release was incomplete since it does not properly remove the duplicate entires in the snapshot list after finding that a snapshot is still valid. The patch does that and also improves the unit test to show the issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4734

Differential Revision: D13266260

Pulled By: maysamyabandeh

fbshipit-source-id: 351e2c40cca45a87b757774c11af74182314911e
2018-11-29 21:01:57 -08:00
Maysam Yabandeh 1a5a93ff74 WritePrepared: Fix double snapshot release issue (#4727)
Summary:
Currently the garbage collection of items in old_commit_map_ was done upon ::ReleaseSnapshot. The assumption behind this method was that the sequence number of snapshots are unique, which is incorrect. In the very rare cases that two consecutive snapshot have the same sequence number this could lead the release of the first snapshot affect the old_commit_map_ that is necessary to service the reads of the second snapshot. The bug would be triggered only if i) two snapshot have the same seq, ii) both of them are very old (older than the last ~4m transactions), and iii) there is commit entry overlapping with the snapshot seq number.
It is fixed by doing the cleanup of old_commit_map_ in UpdateSnapshot: the new list of snapshots are compared with the old one and the missing sequence numbers are concluded released. If two snapshots have the same seq number, after the release of one of them, the seq number still appears in the snapshot least and thus not cleaned up prematurely.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4727

Differential Revision: D13246495

Pulled By: maysamyabandeh

fbshipit-source-id: 93b87a5042afd8060889df245526d3f5d29de9fe
2018-11-28 19:03:31 -08:00
Zhongyi Xie a21cb22ee3 Revert "apply ReadOptions.iterate_upper_bound to transaction iterator… (#4705)
Summary:
… (#4656)"

This reverts commit b76398a82b.

Will add test coverage for iterate_upper_bound before re-commit b76398
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4705

Differential Revision: D13148592

Pulled By: miasantreble

fbshipit-source-id: 4d1ce0bfd9f7a5359a7688bd780eb06a66f45b1f
2018-11-24 10:46:28 -08:00
Zhongyi Xie b76398a82b apply ReadOptions.iterate_upper_bound to transaction iterator (#4656)
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656

Differential Revision: D13039257

Pulled By: miasantreble

fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
2018-11-13 15:44:15 -08:00
Sagar Vemuri dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Siying Dong 566fc8b994 Black list some valgrind tests (#4642)
Summary:
valgrind tests with 1 thread run too long. To make it shorter, black list some long tests. These are already blacklisted in parallel valgrind tests, but they are not in non-parallel mode
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4642

Differential Revision: D12945237

Pulled By: siying

fbshipit-source-id: 04cf977d435996480fe87aa09f14b17975b74f7d
2018-11-06 14:22:36 -08:00
Maysam Yabandeh 2b5b7bc795 WritePrepared: Fix bug in searching in non-cached snapshots (#4639)
Summary:
When evicting an entry form the commit_cache, it is verified against the list of old snapshots to see if it overlaps with any. The list of old snapshots is split into two lists: an efficient concurrent cache and an slow vector protected by a lock. The patch fixes a bug that would stop the search in the cache if it finds any and yet would not include the larger snapshots in the slower list.
An extra info log entry is also removed. The condition to trigger that although very rare is still feasible and should not spam the LOG when that happens.
Fixes #4621
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4639

Differential Revision: D12934989

Pulled By: maysamyabandeh

fbshipit-source-id: 4e0fe8147ba292b554ae78e94c21c2ef31e03e2d
2018-11-05 23:03:50 -08:00
Simon Grätzer ad21b1af52 Set WriteCommitted txn id to commit sequence number (#4565)
Summary:
SetId and GetId are the experimental API that so far being used in WritePrepared and WriteUnPrepared transactions, where the id is assigned at the prepare time. The patch extends the API to WriteCommitted transactions, by setting the id at commit time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4565

Differential Revision: D10557862

Pulled By: maysamyabandeh

fbshipit-source-id: 2b27a140682b6185a4988fa88f8152628e0d67af
2018-10-24 12:21:38 -07:00
jsteemann d1c0d3f358 Small issues (#4564)
Summary:
Couple of very minor improvements (typos in comments, full qualification of class name, reordering members of a struct to make it smaller)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4564

Differential Revision: D10510183

Pulled By: maysamyabandeh

fbshipit-source-id: c7ddf9bfbf2db08cd31896c3fd93789d3fa68c8b
2018-10-23 10:35:57 -07:00
Maysam Yabandeh 3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
cngzhnp 64324e329e Support pragma once in all header files and cleanup some warnings (#4339)
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.

Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339

Differential Revision: D9654990

Pulled By: ajkr

fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
2018-09-05 18:13:31 -07:00
Mikhail Antonov 927f274939 Avoiding write stall caused by manual flushes (#4297)
Summary:
Basically at the moment it seems it's possible to cause write stall by calling flush (either manually vis DB::Flush(), or from Backup Engine directly calling FlushMemTable() while background flush may be already happening.

One of the ways to fix it is that in DBImpl::CompactRange() we already check for possible stall and delay flush if needed before we actually proceed to call FlushMemTable(). We can simply move this delay logic to separate method and call it from FlushMemTable.

This is draft patch, for first look; need to check tests/update SyncPoints and most certainly would need to add allow_write_stall method to FlushOptions().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297

Differential Revision: D9420705

Pulled By: mikhail-antonov

fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc
2018-08-29 12:12:55 -07:00
Yi Wu 4f12d49daf Suppress clang analyzer error (#4299)
Summary:
Suppress multiple clang-analyzer error. All of them are clang false-positive.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4299

Differential Revision: D9430740

Pulled By: yiwu-arbug

fbshipit-source-id: fbdd575bdc214d124826d61d35a117995c509279
2018-08-21 16:43:05 -07:00
jsteemann 90f744941d adds missing PopSavePoint method to Transaction (#4256)
Summary:
Transaction has had methods to deal with SavePoints already, but
was missing the PopSavePoint method provided by WriteBatch and
WriteBatchWithIndex.
This PR adds PopSavePoint to Transaction as well. Having the method
on Transaction-level too is useful for applications that repeatedly
execute a sequence of operations that normally succeed, but infrequently
need to get rolled back. Using SavePoints here is sensible, but as
operations normally succeed the application may pile up a lot of
useless SavePoints inside a Transaction, leading to slightly increased
memory usage for managing the unneeded SavePoints.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256

Differential Revision: D9326932

Pulled By: yiwu-arbug

fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6
2018-08-17 11:57:30 -07:00
Siying Dong 9c0c8f5ff6 GetAllKeyVersions() to take an extra argument of max_num_ikeys. (#4271)
Summary:
Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271

Differential Revision: D9369149

Pulled By: siying

fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
2018-08-16 15:57:08 -07:00
Maysam Yabandeh caf0f53a74 Index value delta encoding (#3983)
Summary:
Given that index value is a BlockHandle, which is basically an <offset, size> pair we can apply delta encoding on the values. The first value at each index restart interval encoded the full BlockHandle but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the detail of the encoding. This reduces the index size which helps using the  block cache more efficiently. The feature is enabled with using format_version 4.

The feature comes with a bit of cpu overhead which should be paid back by the higher cache hits due to smaller index block size.
Results with sysbench read-only using 4k blocks and using 16 index restart interval:
Format 2:
19585   rocksdb read-only range=100
Format 3:
19569   rocksdb read-only range=100
Format 4:
19352   rocksdb read-only range=100
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3983

Differential Revision: D8361343

Pulled By: maysamyabandeh

fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651
2018-08-09 16:58:40 -07:00
Manuel Ung ea212e5316 WriteUnPrepared: Implement unprepared batches for transactions (#4104)
Summary:
This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.

Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.

A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104

Differential Revision: D8785717

Pulled By: lth

fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
2018-07-24 00:13:18 -07:00
Maysam Yabandeh 8581a93a6b Per-thread unique test db names (#4135)
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135

Differential Revision: D8846653

Pulled By: maysamyabandeh

fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
2018-07-13 17:27:39 -07:00
Fosco Marotto 8527012bb6 Converted db/merge_test.cc to use gtest (#4114)
Summary:
Picked up a task to convert this to use the gtest framework.  It can't be this simple, can it?

It works, but should all the std::cout be removed?

```
[$] ~/git/rocksdb [gft !]: ./merge_test
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from MergeTest
[ RUN      ] MergeTest.MergeDbTest
Test read-modify-write counters...
a: 3
1
2
a: 3
b: 1225
3
Compaction started ...
Compaction ended
a: 3
b: 1225
Test merge-based counters...
a: 3
1
2
a: 3
b: 1225
3
Test merge in memtable...
a: 3
1
2
a: 3
b: 1225
3
Test Partial-Merge
Test merge-operator not set after reopen
[       OK ] MergeTest.MergeDbTest (93 ms)
[ RUN      ] MergeTest.MergeDbTtlTest
Opening database with TTL
Test read-modify-write counters...
a: 3
1
2
a: 3
b: 1225
3
Compaction started ...
Compaction ended
a: 3
b: 1225
Test merge-based counters...
a: 3
1
2
a: 3
b: 1225
3
Test merge in memtable...
Opening database with TTL
a: 3
1
2
a: 3
b: 1225
3
Test Partial-Merge
Opening database with TTL
Opening database with TTL
Opening database with TTL
Opening database with TTL
Test merge-operator not set after reopen
[       OK ] MergeTest.MergeDbTtlTest (97 ms)
[----------] 2 tests from MergeTest (190 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (190 ms total)
[  PASSED  ] 2 tests.
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4114

Differential Revision: D8822886

Pulled By: gfosco

fbshipit-source-id: c299d008e883c3bb911d2b357a2e9e4423f8e91a
2018-07-13 14:13:07 -07:00
Maysam Yabandeh 537a233941 Exclude StackableDB from transaction stress tests (#4132)
Summary:
The transactions are currently tested with and without using StackableDB. This is mostly to check that the code path is consistent with stackable db as well. Slow, stress tests however do not benefit from being run again with StackableDB. The patch excludes StackableDB from such tests.
On a single core it reduced the runtime of transaction_test from 199s to 135s.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132

Differential Revision: D8841655

Pulled By: maysamyabandeh

fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
2018-07-13 13:59:11 -07:00
Manuel Ung b9846370e9 WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078)
Summary:
This adds support for recovering WriteUnprepared transactions through the following changes:
- The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
- `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
- `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.

Commit/Rollback of live transactions is still unimplemented and will come later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078

Differential Revision: D8703382

Pulled By: lth

fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
2018-07-06 17:59:13 -07:00
Daniel Black 36fa49ceb5 transaction_test: -Wunused-variable with clang-7 (#4074)
Summary:
clang version 7.0.0- (trunk)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

clang++-7  -DROCKSDB_USE_RTTI -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wno-unused-parameter -Werror -I. -I./include -std=c++11  -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX  -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=google -DZLIB -DBZIP2 -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -Wshorten-64-to-32 -march=native  -DHAVE_SSE42 -DROCKSDB_SUPPORT_THREAD_LOCAL  -isystem ./third-party/gtest-1.7.0/fused-src -DTRAVIS -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -c utilities/transactions/transaction_test.cc -o utilities/transactions/transaction_test.o
utilities/transactions/transaction_test.cc:2282:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable]
  TransactionOptions txn_options;
                     ^
utilities/transactions/transaction_test.cc:2822:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable]
  TransactionOptions txn_options;
                     ^
utilities/transactions/transaction_test.cc:2928:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable]
  TransactionOptions txn_options;
                     ^
utilities/transactions/transaction_test.cc:3109:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable]
  TransactionOptions txn_options;
                     ^
utilities/transactions/transaction_test.cc:4364:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable]
  TransactionOptions txn_options;
                     ^
Closes https://github.com/facebook/rocksdb/pull/4074

Differential Revision: D8698051

Pulled By: ajkr

fbshipit-source-id: 6255618eefdd189962fbea1b02cf1eb5ae501274
2018-06-29 11:43:36 -07:00
Manuel Ung 8ad63a4b86 WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069)
Summary:
This adds a new WAL marker of type kTypeBeginUnprepareXID.

Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy.

Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not.
Closes https://github.com/facebook/rocksdb/pull/4069

Differential Revision: D8675099

Pulled By: lth

fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749
2018-06-28 18:58:29 -07:00
chouxi 818c84e116 Store timestamp in deadlock detection (#4060)
Summary:
- Summary
    Add timestamp into the DeadlockInfo to store the timestamp when deadlock detected on the rocksdb side.

- Testplan:
    `make check -j64`
Closes https://github.com/facebook/rocksdb/pull/4060

Differential Revision: D8655380

Pulled By: chouxi

fbshipit-source-id: f58e1aa5e09eb1d1eed0a181d4e2304aaf01efe8
2018-06-27 12:27:58 -07:00
Manuel Ung a16e00b7b9 WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955)
Summary:
This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data.

There are the places where reads had to be modified.
- Get and Seek/Next was just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key was visible.
- Prev did not need need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key.
- Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case.
Closes https://github.com/facebook/rocksdb/pull/3955

Differential Revision: D8286688

Pulled By: lth

fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe
2018-06-27 12:23:07 -07:00
Manuel Ung ab2254bedf Fix clang analyze
Summary:
This fixes the errors as reported here:
https://github.com/facebook/rocksdb/pull/3941#issuecomment-394424043
Closes https://github.com/facebook/rocksdb/pull/3950

Differential Revision: D8263086

Pulled By: lth

fbshipit-source-id: 5e148d489cab2153e5846d16979a0a1f2d677d57
2018-06-04 14:44:23 -07:00
Manuel Ung 01e3c30def Extend existing unit tests to run with WriteUnprepared as well
Summary:
As titled.

I have not extended the Compatibility tests because the new WAL markers are still unimplemented.
Closes https://github.com/facebook/rocksdb/pull/3941

Differential Revision: D8238394

Pulled By: lth

fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d
2018-06-01 14:58:41 -07:00
Manuel Ung aaac6cd16f Add write unprepared classes by inheriting from write prepared
Summary: Closes https://github.com/facebook/rocksdb/pull/3907

Differential Revision: D8218325

Pulled By: lth

fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b
2018-05-31 10:47:42 -07:00
Siying Dong 4dd80debd0 Remove tests from ROCKSDB_VALGRIND_RUN
Summary:
In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind.
Closes https://github.com/facebook/rocksdb/pull/3924

Differential Revision: D8210184

Pulled By: siying

fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a
2018-05-30 16:15:16 -07:00
Maysam Yabandeh 66c7aa32fb Clarify the ownership of root db after TransactionDB::Open
Summary:
The patch clarifies the ownership of the root db after TransactionDB::Open. If it is a success the ownership if with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. If it is failure, the temporarily created root db will also be deleted properly.
The patch also includes lots of useful formatting changes.

Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built.
Closes https://github.com/facebook/rocksdb/pull/3806

Differential Revision: D7878010

Pulled By: maysamyabandeh

fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2
2018-05-11 15:14:03 -07:00
Siying Dong d59549298f Skip deleted WALs during recovery
Summary:
This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.

Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)

This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
Closes https://github.com/facebook/rocksdb/pull/3765

Differential Revision: D7747618

Pulled By: siying

fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
2018-05-03 15:43:09 -07:00
Maysam Yabandeh cfb86659bf WritePrepared Txn: enable rollback in stress test
Summary:
Rollback was disabled in stress test since there was a concurrency issue in WritePrepared rollback algorithm. The issue is fixed by caching the column family handles in WritePrepared to skip getting them from the db when needed for rollback.

Tested by running transaction stress test under tsan.
Closes https://github.com/facebook/rocksdb/pull/3785

Differential Revision: D7793727

Pulled By: maysamyabandeh

fbshipit-source-id: d81ab6fda0e53186ca69944cfe0712ce4869451e
2018-05-02 18:13:05 -07:00
Maysam Yabandeh 5bed8a0065 WritePrepared Txn: split SeqAdvanceConcurrentTest
Summary:
The tsan flavor of SeqAdvanceConcurrentTest times out in our test infra. The patch splits it into 10 tests.
On my vm before:
[       OK ] WritePreparedTransactionTest/WritePreparedTransactionTest.SeqAdvanceConcurrentTest/0 (5194 ms)
after:
[       OK ] OneWriteQueue/SeqAdvanceConcurrentTest.SeqAdvanceConcurrentTest/0 (1906 ms)
Closes https://github.com/facebook/rocksdb/pull/3799

Differential Revision: D7854515

Pulled By: maysamyabandeh

fbshipit-source-id: 4fbac42a1f974326cbc237f8cb9d6232d379c431
2018-05-02 18:13:05 -07:00
Maysam Yabandeh bb2a2ec731 WritePrepared Txn: rollback via commit
Summary:
Currently WritePrepared rolls back a transaction with prepare sequence number prepare_seq by i) write a single rollback batch with rollback_seq, ii) add <rollback_seq, rollback_seq> to commit cache, iii) remove prepare_seq from PrepareHeap.
This is correct assuming that there is no snapshot taken when a transaction is rolled back. This is the case the way MySQL does rollback which is after recovery. Otherwise if max_evicted_seq advances the prepare_seq, the live snapshot might assume data as committed since it does not find them in CommitCache.
The change is to simply add <prepare_seq. rollback_seq> to commit cache before removing prepare_seq from PrepareHeap. In this way if max_evicted_seq advances prpeare_seq, the existing mechanism that we have to check evicted entries against live snapshots will make sure that the live snapshot will not see the data of rolled back transaction.
Closes https://github.com/facebook/rocksdb/pull/3745

Differential Revision: D7696193

Pulled By: maysamyabandeh

fbshipit-source-id: c9a2d46341ddc03554dded1303520a1cab74ef9c
2018-04-20 15:28:19 -07:00
Maysam Yabandeh c3d1e36cce WritePrepared Txn: enable TryAgain for duplicates at the end of the batch
Summary:
The WriteBatch::Iterate will try with a larger sequence number if the memtable reports a duplicate. This status is specified with TryAgain status. So far the assumption was that the last entry in the batch will never return TryAgain, which is correct when WAL is created via WritePrepared since it always appends a batch separator if a natural one does not exist. However when reading a WAL generated by WriteCommitted this batch separator might  not exist. Although WritePrepared is not supposed to be able to read the WAL generated by WriteCommitted we should avoid confusing scenarios in which the behavior becomes unpredictable. The path fixes that by allowing TryAgain even for the last entry of the write batch.
Closes https://github.com/facebook/rocksdb/pull/3747

Differential Revision: D7708391

Pulled By: maysamyabandeh

fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d
2018-04-20 15:28:19 -07:00
Zhongyi Xie 954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
David Lai 3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00