Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346
Differential Revision: D9759149
Pulled By: maysamyabandeh
fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.
Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336
Differential Revision: D9600080
Pulled By: ajkr
fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are the bottom most level AND
2. All other occurrences of the key are in the same snapshot range as the delete
I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far)
make check -j64
db_stress
db_stress --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesnt break existing tests */
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289
Differential Revision: D9491165
Pulled By: shrikanthshankar
fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.
This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307
Differential Revision: D9485201
Pulled By: ajkr
fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039
Differential Revision: D8670178
Pulled By: riversand963
fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
Summary:
In the OnTableFileCreation() listener, assert on various TableProperties
only when file size > 0 bytes. The listener can get called even for 0
byte SSTs which have been deleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4273
Differential Revision: D9322738
Pulled By: anand1976
fbshipit-source-id: 17cdfb3d0da946b9a158d7328e5db1c87973956b
Summary:
Due to 4ea56b1bd0, we should also remove the
assersion in stress test. This removal can be temporary, and we can add it back
once we figure out the reason for the 0-byte SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268
Differential Revision: D9297186
Pulled By: riversand963
fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
Summary:
Although delete scheduler implementation allows for the interface not to be supported, the delete_scheduler_test does not allow for that.
Address compiler warnings
Make sst_dump_test use test directory structure as the current execution directory may not be writiable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4221
Differential Revision: D9210152
Pulled By: siying
fbshipit-source-id: 381a74511e969ecb8089d5c4b4df87dc30c8df63
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234
Differential Revision: D9187207
Pulled By: ajkr
fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
Summary:
Making generation of column families and keys virtual function so that
subclasses of StressTest can override them to provide custom parameter
generation for more flexibility. This will be useful for future tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046
Differential Revision: D9073382
Pulled By: riversand963
fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665
Summary:
RocksDB used to store global_seqno in external SST files written by
SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
`global_seqno`. Since random write is not supported in some non-POSIX compliant
file systems, external SST file ingestion is not supported on these file
systems. To address this limitation, we no longer update `global_seqno` during
file ingestion. Later RocksDB uses the MANIFEST and other information in table
properties to deduce global seqno for externally-ingested SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172
Differential Revision: D8961465
Pulled By: riversand963
fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
Summary:
If crash happen after a hard link established, Recover function may reuse the file number that has already assigned to the internal file, and this will overwrite the external file. To protect the external file, we have to make sure the file number will never being reused.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4099
Differential Revision: D9034092
Pulled By: riversand963
fbshipit-source-id: 3f1a737440b86aa2ef01673e5013aacbb7c33e28
Summary:
db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162
Differential Revision: D8953507
Pulled By: siying
fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
Summary:
- Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of checkpoint. But it wasn't cleaned up in the edge case where the process crashed while staging checkpoint files.
- Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up.
- Use `DestroyDB` for cleaning checkpoint directory since a checkpoint is a DB.
Closes https://github.com/facebook/rocksdb/pull/4035
Reviewed By: yiwu-arbug
Differential Revision: D8580223
Pulled By: ajkr
fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599
Summary:
Once per `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file is named containing the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`.
We can't enable it by default in crash test because `nooverwritepercent` and `test_batches_snapshot` both must be zero for the DB's whole lifetime. Perhaps we should setup a separate test with that config as range deletion also requires it.
Closes https://github.com/facebook/rocksdb/pull/4018
Differential Revision: D8507698
Pulled By: ajkr
fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f
Summary:
Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow.
Closes https://github.com/facebook/rocksdb/pull/4005
Differential Revision: D8472275
Pulled By: ajkr
fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5
Summary:
db_stress initialization randomly chooses a set of keys to not overwrite. It was doing it separately for each column family. That caused 30+ second initialization times for the non-simple crash tests, which have 10 CFs. This PR:
- reuses the same set of randomly chosen no-overwrite keys across all CFs
- logs a couple more timestamps so we can more easily see initialization time
Closes https://github.com/facebook/rocksdb/pull/3990
Differential Revision: D8393821
Pulled By: ajkr
fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34
Summary:
We use `db_stress.cc` intensively to test and verify the behavior of RocksDB. Sometimes we need to add new tests for recently added features. Original `StressTest` class provides many general functionality that can be leveraged by other tests. Therefore, in this refactoring PR, I try to identify the general operations as well as operations that future tests most likely want to customize. Future tests can inherit `StressTest` and overriding the virtual functions to test custom logic.
Closes https://github.com/facebook/rocksdb/pull/3902
Differential Revision: D8284607
Pulled By: riversand963
fbshipit-source-id: 019302d04665a2b18334b6d05d04a477168c8ea4
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954
Differential Revision: D8272041
Pulled By: miasantreble
fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
Summary:
format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3.
Closes https://github.com/facebook/rocksdb/pull/3942
Differential Revision: D8238413
Pulled By: maysamyabandeh
fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035
Summary:
Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):
- It's a DB option so SetDBOptions should've been called instead
- It's not a dynamic option so even SetDBOptions would fail
- It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.
In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
Closes https://github.com/facebook/rocksdb/pull/3939
Differential Revision: D8238120
Pulled By: ajkr
fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
Summary:
Catch up with Posix features
NewWritableRWFile must fail when file does not exists
Implement Env::Truncate()
Adjust Env options optimization functions
Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857
Differential Revision: D8053610
Pulled By: ajkr
fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
Summary:
I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them.
Closes https://github.com/facebook/rocksdb/pull/3878
Differential Revision: D8097346
Pulled By: ajkr
fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1
Summary:
In the past, the default value of max_manifest_file_size is uint64_t::MAX,
allowing a long running RocksDB process to grow its MANIFEST file to take up
the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.
siying miasantreble Please let me know whether this looks good to you. Thanks!
Closes https://github.com/facebook/rocksdb/pull/3867
Differential Revision: D8051524
Pulled By: riversand963
fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
Summary:
I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
```
$ make db_stress -j4
...
tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
^~~~~
./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
^
1 error generated.
make: *** [tools/db_stress.o] Error 1
make: *** Waiting for unfinished jobs....
```
Closes https://github.com/facebook/rocksdb/pull/3839
Differential Revision: D7979236
Pulled By: sagar0
fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
Summary:
In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
Closes https://github.com/facebook/rocksdb/pull/3804
Differential Revision: D7874694
Pulled By: ajkr
fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
Summary:
Returning bytes_read causes the caller to call GetLastError()
to report failure but the lasterror may be overwritten by then
so we lose the error code.
Fix up CMake file to include xpress source code only when needed.
Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795
Differential Revision: D7832935
Pulled By: anand1976
fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e
This PR includes the contents of the original commit plus two bug fixes, which are:
- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793
Differential Revision: D7811671
Pulled By: ajkr
fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786
Differential Revision: D7794516
Pulled By: ajkr
fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
Summary:
- When options file is provided to db_stress, take supported options from the file instead of from flags
- Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
- Allow options filename to be passed via `db_crashtest.py`
Closes https://github.com/facebook/rocksdb/pull/3768
Differential Revision: D7755331
Pulled By: ajkr
fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.
In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.
For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.
`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629
Differential Revision: D7463144
Pulled By: ajkr
fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
Summary:
db_stress was already capable running transactions by setting use_txn. Running it under stress showed a couple of problems fixed in this patch.
- The uncommitted transaction must be either rolled back or commit after recovery.
- Current implementation of WritePrepared transaction cannot handle cf drop before crash. Clarified that in the comments and added safety checks. When running with use_txn, clear_column_family_one_in must be set to 0.
Closes https://github.com/facebook/rocksdb/pull/3733
Differential Revision: D7654419
Pulled By: maysamyabandeh
fbshipit-source-id: a024bad80a9dc99677398c00d29ff17d4436b7f3
Summary:
fix a bug in `db_stress` where an int array was incorrectly deallocated using delete instead of delete[]
Closes https://github.com/facebook/rocksdb/pull/3725
Differential Revision: D7634749
Pulled By: miasantreble
fbshipit-source-id: 489b776f5f4c03de1824edac5495787ec19cc910
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718
Reviewed By: maysamyabandeh
Differential Revision: D7621192
Pulled By: miasantreble
fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
Summary:
…verwrite_keys. Also changed each no_overwrite_key set to an unordered set, otherwise Knuth shuffle only gets you 2x time improvement, because insertion (and subsequent internal sorting) into an ordered set is the bottleneck.
With this change, each iteration of permutation construction and prefix selection takes around 40 secs, as opposed to 360 secs previously. However, this still means that with the default 10 CF per blackbox test case, the test is going to time out given the default interval of 200 secs.
Also, there is currently an assertion error affecting all blackbox tests in db_crashtest.py; this assertion error will be fixed in a future PR.
Closes https://github.com/facebook/rocksdb/pull/3699
Differential Revision: D7624616
Pulled By: amytai
fbshipit-source-id: ea64fbe83407ff96c1c0ecabbc6c830576939393
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662
Differential Revision: D7426121
Pulled By: Dayvedde
fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
Summary:
In this change, an option to set different paths for different column families is added.
This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path.
To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.
Changes :
1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions. This member is used to identify the path information whenever files are accessed.
2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
4) Unit tests are added appropriately.
Closes https://github.com/facebook/rocksdb/pull/3102
Differential Revision: D6951697
Pulled By: ajkr
fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
Summary:
The zeroed entries were not removed from prepared_section_completed_ map. This patch adds a unit test to show the problem and fixes that by refactoring the code. The new code is more efficient since i) it uses two separate mutex to avoid contention between commit and prepare threads, ii) it uses a sorted vector for maintaining uniq log entires with prepare which avoids a very large heap with many duplicate entries.
Closes https://github.com/facebook/rocksdb/pull/3545
Differential Revision: D7106071
Pulled By: maysamyabandeh
fbshipit-source-id: b3ae17cb6cd37ef10b6b35e0086c15c758768a48
Summary:
We added a new verification that ensures a value that snapshot reads when is released is the same as when it was created. This test however fails when the cf is dropped in between. The patch skips the tests if that was the case.
Closes https://github.com/facebook/rocksdb/pull/3279
Differential Revision: D6581584
Pulled By: maysamyabandeh
fbshipit-source-id: afe37d371c0f91818d2e279b3949b810e112e8eb
Summary:
Add "--use_txn" option to use transactional API in db_stress, default being WRITE_PREPARED policy, which is the main intention of modifying db_stress. It also extend the existing snapshots to verify that before releasing a snapshot a read from it returns the same value as before.
Closes https://github.com/facebook/rocksdb/pull/3243
Differential Revision: D6556912
Pulled By: maysamyabandeh
fbshipit-source-id: 1ae31465be362d44bd06e635e2e9e49a1da11268
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212
Differential Revision: D6456973
Pulled By: ajkr
fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
Summary:
- Release all snapshots before crashing and reopening the DB. Without this, we may attempt to release snapshots from an old DB using a new DB. That tripped an assertion.
- Release multiple snapshots in the same operation if needed. Without this, we would sometimes leak snapshots.
Closes https://github.com/facebook/rocksdb/pull/3098
Differential Revision: D6194923
Pulled By: ajkr
fbshipit-source-id: b9c89bcca7ebcbb6c7802c616f9d1175a005aadf
Summary:
Add options to `db_stress` (correctness testing tool) to randomly acquire snapshot and release it after some period of time. It's useful for correctness testing of #3009, as well as other parts of compaction that behave differently depending on which snapshots are held.
Closes https://github.com/facebook/rocksdb/pull/3038
Differential Revision: D6086501
Pulled By: ajkr
fbshipit-source-id: 3ec0d8666c78ac507f1f808887c4ff759ba9b865
Summary:
store a zero as the checksum when disabled since it's easier to keep block trailer a fixed length.
Closes https://github.com/facebook/rocksdb/pull/2781
Differential Revision: D5694702
Pulled By: ajkr
fbshipit-source-id: 69cea9da415778ba2b600dfd9d0dfc8cb5188ecd
Summary:
Clang complain about an cast from uint64_t to int32 in db_stress. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2755
Differential Revision: D5655947
Pulled By: yiwu-arbug
fbshipit-source-id: cfac10e796e0adfef4727090b50975b0d6e2c9be
Summary:
fix some things that made this command hard to use from CLI:
- use default values for `target_file_size_base` and `max_bytes_for_level_base`. previously we were using small values for these but default value of `write_buffer_size`, which led to enormous number of L1 files.
- failure message for `value_size_mult` too big. previously there was just an assert, so in non-debug mode it'd overrun the value buffer and crash mysteriously.
- only print verification success if there's no failure. before it'd print both in the failure case.
- support `memtable_prefix_bloom_size_ratio`
- support `num_bottom_pri_threads` (universal compaction)
Closes https://github.com/facebook/rocksdb/pull/2741
Differential Revision: D5629495
Pulled By: ajkr
fbshipit-source-id: ddad97d6d4ba0884e7c0f933b0a359712514fc1d
Summary:
Support a window of `active_width` keys that rolls through `[0, max_key)` over the duration of the test. Operations only affect keys inside the window. This gives us the ability to detect L0->L0 deletion bug (#2722).
Closes https://github.com/facebook/rocksdb/pull/2739
Differential Revision: D5628555
Pulled By: ajkr
fbshipit-source-id: 9cb2d8f4ab1a7c73f7797b8e19f7094970ea8749
Summary:
Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed.
Closes https://github.com/facebook/rocksdb/pull/2433
Differential Revision: D5216946
Pulled By: ajkr
fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e
Summary:
Release builds are failing on Linux with the error:
```
tools/db_stress.cc: In function ‘int main(int, char**)’:
tools/db_stress.cc:2365:12: error: ‘rocksdb::SyncPoint’ has not been declared
rocksdb::SyncPoint::GetInstance()->SetCallBack(
^
tools/db_stress.cc:2370:12: error: ‘rocksdb::SyncPoint’ has not been declared
rocksdb::SyncPoint::GetInstance()->SetCallBack(
^
tools/db_stress.cc:2375:12: error: ‘rocksdb::SyncPoint’ has not been declared
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
^
make[1]: *** [tools/db_stress.o] Error 1
make[1]: Leaving directory `/data/sandcastle/boxes/trunk-git-rocksdb-public'
make: *** [release] Error 2
```
Closes https://github.com/facebook/rocksdb/pull/2355
Differential Revision: D5113552
Pulled By: sagar0
fbshipit-source-id: 351df707277787da5633ba4a40e52edc7c895dc4
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337
Differential Revision: D5100261
Pulled By: lightmark
fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117
Differential Revision: D4860912
Pulled By: lightmark
fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
add direct_io and compaction_readahead_size in db_stress
test direct_io under db_stress with compaction_readahead_size enabled to capture bugs found in production.
`./db_stress --allow_concurrent_memtable_write=0 --use_direct_reads --use_direct_writes --compaction_readahead_size=4096`
Closes https://github.com/facebook/rocksdb/pull/1906
Differential Revision: D4604514
Pulled By: IslamAbdelRahman
fbshipit-source-id: ebbf0ee
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859
Differential Revision: D4541292
Pulled By: sagar0
fbshipit-source-id: 5b3a6ca
Summary:
made db_stress capable of adding range deletions to its db and verifying their correctness. i'll make db_crashtest.py use this option later once the collapsing optimization (https://github.com/facebook/rocksdb/pull/1614) is committed because currently it slows down the test too much.
Closes https://github.com/facebook/rocksdb/pull/1625
Differential Revision: D4293939
Pulled By: ajkr
fbshipit-source-id: d3beb3a
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.
Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D63141
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
Test Plan: Add two new unit tests. Run all existing tests, including jtest.
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59829
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.
Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.
Reviewers: kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61647
Summary:
db_stress test is now failing because of this scenario
- run db_stress with merge_operator enabled (now we have a db with merge operands)
- run db_stress with merge_operator disabled (now when we fail to open the db)
the solution is to pass the merge_operator to the DB even if we are not going to do any merge operations
Test Plan: Check the failure
Reviewers: andrewkr, yiwu, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61311
Summary: OnTableFileCreated() now is also called when the file creaion fails. In that case, we shouldn't assert the file size is not 0.
Test Plan: Run crash test
Reviewers: yiwu, andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61137
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, yhchiang, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60513
Summary: filter_deltes is not a frequently used feature. Remove it.
Test Plan: Run all test suites.
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59427
Summary:
crash_test grep for 'fail' string in the output and if found it consider that we failed.
Update the output to use something else
Test Plan: make crash_test (still running)
Reviewers: yhchiang, sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57381
Summary:
As db_stress with CompactFiles possibly catches a previous bug currently,
temporarily disable CompactFiles in db_stress in its default setting
to allows new bug to be detected while investigating the bug in CompactFiles.
Test Plan: crash test
Reviewers: sdong, kradhakrishnan, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57333
Comparable with Snappy on comp ratio.
Implemented using Windows API, does not require external package.
Avaiable since Windows 8 and server 2012.
Use -DXPRESS=1 with CMake to enable.
Summary:
Enable testing CompactFiles in db_stress by adding flag test_compact_files
to db_stress.
Test Plan:
./db_stress --test_compact_files=1 --compaction_style=0 --allow_concurrent_memtable_write=false --ops_per_thread=100000
./db_stress --test_compact_files=1 --compaction_style=1 --allow_concurrent_memtable_write=false --ops_per_thread=100000
Sample output (note that it's normal to have some CompactFiles() failed):
Stress Test : 491.891 micros/op 65054 ops/sec
: Wrote 21.98 MB (0.45 MB/sec) (45% of 3200352 ops)
: Wrote 1440728 times
: Deleted 441616 times
: Single deleted 38181 times
: 319251 read and 19025 found the key
: Prefix scanned 640520 times
: Iterator size sum is 9691415
: Iterated 319704 times
: Got errors 0 times
: 1323 CompactFiles() succeed
: 32 CompactFiles() failed
2016/04/11-15:50:58 Verification successful
Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56565
Summary:
Expose the inverse of ToString(hex=true) on Slice: Slice::DecodeHex
Refactor the other implementation of to/from hex in ldb_cmd.h to use the Slice
version
(Difference between the 2 is whether 0x is expected/produced in front of the hex
string or not)
Eliminated support for invalid odd length hex string - this is now invalid
instead of having 1/2 byte set
Added (inverse of HexToString) test for LDBCommand::StringToHex which also
indirectly tests Slice::ToString(true)
After moving the original implementation from ldb_cmd.h, updated it to much simpler/efficient version
(originally/inspired from https://github.com/facebook/wdt/blob/master/util/EncryptionUtils.cpp#L140-L169 )
Test Plan: run tests
Reviewers: uddipta, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56121
Summary: Default crash test uses prefix hash memtable, which is not compatible to concurrent memtable. Allow prefix test run with skip list and use skip list memtable when concurrent insert is used.
Test Plan: Run "python -u tools/db_crashtest.py whitebox" and watch sometimes skip list is used.
Reviewers: anthony, yhchiang, kradhakrishnan, andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53907
Summary: Add an option of --allow_concurrent_memtable_write in stress test and cover it in crash test
Test Plan: Run crash test and make sure three combinations of the two options show up randomly.
Reviewers: IslamAbdelRahman, yhchiang, andrewkr, anthony, kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53811
Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.
Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
--kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync
Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D48735
Summary:
Fixed the following tsan warning in db_stress.cc
WARNING: ThreadSanitizer: data race (pid=3163194)
Read of size 8 at 0x7fd1797cb518 by thread T32:
#0 VerifyDb tools/db_stress.cc:1731 (db_stress+0x000000040674)
#1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
#2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)
Previous write of size 8 at 0x7fd1797cb518 by thread T31:
#0 VerifyDb tools/db_stress.cc:1726 (db_stress+0x00000004072a)
#1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
#2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)
The cause is that in VerifyDb(), the static local const variable long max_key
can be read and written at the same time. This patch fixed it by making it
non-static.
Test Plan: db_stress
Reviewers: igor, sdong, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47703
Summary:
This patch fixes#7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).
In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.
Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.
Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
this patch supported this so it could be resurrected if needed)
Test Plan: make all check
Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
Reviewed By: igor
Subscribers: maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43179
Summary: Add ZSTD compression type. The same way as adding LZ4.
Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.
Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
Reviewed By: igor
Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45747