rocksdb

Commit Graph

Author	SHA1	Message	Date
Cheng Chang	5f478b9f75	Remove outdated comment (#6379 ) Summary: Since the logic for handling IDENTITY file is now inside `NewDB`, the comment above `NewDB` is no longer relevant. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6379 Test Plan: not needed Differential Revision: D19795440 Pulled By: cheng-chang fbshipit-source-id: 0b1cca87ac6d92474701c46aa4c8d4d708bfa19b	2020-02-07 13:18:43 -08:00
Levi Tamasi	1b4be4cac9	BlobDB: ignore trivially moved files when updating the SST<->blob file mapping (#6381 ) Summary: BlobDB keeps track of the mapping between SSTs and blob files using the `OnFlushCompleted` and `OnCompactionCompleted` callbacks of the `EventListener` interface: upon receiving a flush notification, a link is added between the newly flushed SST and the corresponding blob file; for compactions, links are removed for the inputs and added for the outputs. The earlier code performed this link deletion and addition even for trivially moved files; the new code walks through the two lists together (in a fashion that's similar to merge sort) and skips such files. This should mitigate https://github.com/facebook/rocksdb/issues/6338, wherein an assertion is triggered with the earlier code when a compaction notification for a trivial move precedes the flush notification for the moved SST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6381 Test Plan: make check Differential Revision: D19773729 Pulled By: ltamasi fbshipit-source-id: ae0f273ded061110dd9334e8fb99b0d7786650b0	2020-02-07 12:50:57 -08:00
Cheng Chang	107a7ca930	Remove inappropriate comments (#6371 ) Summary: The comments are for iterators, not Cleanable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6371 Test Plan: no need Differential Revision: D19727527 Pulled By: cheng-chang fbshipit-source-id: c74aeffa27ea0ce15a36ff6f9694826712cd1c70	2020-02-07 12:35:24 -08:00
Cheng Chang	0a74e1b958	Add status checks during DB::Open (#6380 ) Summary: Several statuses were not checked during DB::Open. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6380 Test Plan: make check Differential Revision: D19780237 Pulled By: cheng-chang fbshipit-source-id: c8d189d20344bd1607890dd1449345bda2ef96b9	2020-02-07 12:32:09 -08:00
Yanqin Jin	f361cedf06	Atomic flush rollback once on failure (#6385 ) Summary: Before this fix, the atomic flush code path could hit an assertion failure in a specific failure case. If all flush jobs within an atomic flush succeed (they do not write to MANIFEST), but batch writing version edits to MANIFEST fails, then `cfd->imm()->RollbackMemTableFlush()` will be called twice, and the second invocation hits the assertion failure `assert(m->flush_in_progress_)` because the first invocation has already reset the variable `flush_in_progress_` to false. Test plan (dev server): ``` ./db_flush_test --gtest_filter=DBAtomicFlushTest/DBAtomicFlushTest.RollbackAfterFailToInstallResults make check ``` Both must succeed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6385 Differential Revision: D19782943 Pulled By: riversand963 fbshipit-source-id: 84e1592625e729d1b70fdc8479959387a74cb121	2020-02-07 10:52:10 -08:00
atul	c6f75516b7	Fixing the documentation of the function (#4803 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6354 Differential Revision: D19725459 Pulled By: riversand963 fbshipit-source-id: fded24576251bfa4b289399f0909f1fe43426e28	2020-02-06 10:30:44 -08:00
Cheng Chang	f5f79f01a2	Be able to read compatible leveldb sst files (#6370 ) Summary: In `DBSSTTest.SSTsWithLdbSuffixHandling`, some sst files are renamed to ldb files; the original intention of the test is to verify that the ldb files can be loaded along with the sst files. The original test checks this with `ASSERT_NE("NOT_FOUND", Get(Key(k)))`, but the problem is that `Get(Key(k))` returns an IO error due to path not found instead of NOT_FOUND, so the success of ASSERT_NE does not mean the key can be retrieved. This PR updates the test to make sure Get(Key(k)) returns the original value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6370 Test Plan: make db_sst_test && ./db_sst_test Differential Revision: D19726278 Pulled By: cheng-chang fbshipit-source-id: 993127f56457b315e669af4eeb92d6f956b7a4b7	2020-02-06 10:15:44 -08:00
sdong	24c9dce825	Remove include math.h (#6373 ) Summary: We see some odd errors complaining about math. However, math.h does not seem to be needed. Remove the include of math.h. Just removing it from db_bench doesn't seem to break anything. Replacing sqrt with std::sqrt seems to work for histogram.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6373 Test Plan: Watch the Travis and AppVeyor runs. Differential Revision: D19730068 fbshipit-source-id: d3ad41defcdd9f51c2da1a3673fb258f5dfacf47	2020-02-05 21:00:49 -08:00
Mike Kolupaev	1ed7d9b1b5	Avoid lots of calls to Env::GetFileSize() in SstFileManagerImpl when opening DB (#6363 ) Summary: Before this PR, it called GetFileSize() once for each sst file in the DB. This can take a long time if there are tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than the local filesystem. This PR makes DB::Open() use sst file sizes that are already known from the manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against the manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6363 Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s. Differential Revision: D19702509 Pulled By: al13n321 fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73	2020-02-04 13:41:53 -08:00
sdong	3a073234da	Consolidate ReadFileToString() (#6366 ) Summary: It's a minor refactoring. We have two ReadFileToString() functions, and they are very similar. Make the one with an Env argument call the one with an FS argument instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6366 Test Plan: Run all existing tests Differential Revision: D19712332 fbshipit-source-id: 5ae6fabf6355938690d95cda52afd1f39e0a7823	2020-02-04 11:39:23 -08:00
sdong	69c8614815	Avoid to get manifest file size when recovering from it. (#6369 ) Summary: Right now RocksDB gets manifest file size before recovering from it. The information is available in LogReader. Use it instead to prevent one file system call. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6369 Test Plan: Run all existing tests Differential Revision: D19714872 fbshipit-source-id: 0144be324d403c99e3da875ea2feccc8f64e883d	2020-02-04 11:39:23 -08:00
Mike Kolupaev	637e64b9ac	Add an option to prevent DB::Open() from querying sizes of all sst files (#6353 ) Summary: When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if the filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may corrupt contents as well as sizes, and rocksdb doesn't check contents on open. If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of the local filesystem, it can be really slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for the parent directory to check that all the expected sst files are at least present, but doesn't check their sizes. We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353 Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble. Differential Revision: D19656425 Pulled By: al13n321 fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82	2020-02-04 01:27:26 -08:00
anand76	7330ec0ff1	Fix a test failure in error_handler_test (#6367 ) Summary: Fix an intermittent failure in DBErrorHandlingTest.CompactionManifestWriteError due to a race between background error recovery and the main test thread calling TEST_WaitForCompact(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6367 Test Plan: Run the test using gtest_parallel Differential Revision: D19713802 Pulled By: anand1976 fbshipit-source-id: 29e35dc26e0984fe8334c083e059f4fa1f335d68	2020-02-03 18:16:52 -08:00
sdong	f195d8d523	Use ReadFileToString() to get content from IDENTITY file (#6365 ) Summary: Right now, when reading the IDENTITY file, we use logic very similar to ReadFileToString(), except that it does an extra file size check, which may be expensive in some file systems. There is no reason to duplicate the logic. Use ReadFileToString() instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6365 Test Plan: Run all existing tests. Differential Revision: D19709399 fbshipit-source-id: 3bac31f3b2471f98a0d2694278b41e9cd34040fe	2020-02-03 17:40:49 -08:00
sdong	36c504be17	Avoid create directory for every column families (#6358 ) Summary: A relatively recent regression causes create and open directory to be called on the DB directory once for every CF, unless the CF has a private directory. This doesn't scale well with a large number of column families. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6358 Test Plan: Run all existing tests and see them pass. strace with db_bench --num_column_families and observe that it doesn't open the directory once per column family. Differential Revision: D19675141 fbshipit-source-id: da01d9216f1dae3f03d4064fbd88ce71245bd9be	2020-02-03 14:13:39 -08:00
Huisheng Liu	eb4d6af5ae	Error handler test fix (#6266 ) Summary: MultiDBCompactionError fails when it verifies the number of files on level 0 and level 1 without waiting for compaction to finish. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6266 Differential Revision: D19701639 Pulled By: riversand963 fbshipit-source-id: e96d511bcde705075f073e0b550cebcd2ecfccdc	2020-02-03 13:32:53 -08:00
Adam Retter	7242dae7fe	Improve RocksJava Comparator (#6252 ) Summary: This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy. NOTE: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into. Previously, when implementing a comparator in Java, the developer had a choice of subclassing either `DirectComparator` or `Comparator`, which would use direct and non-direct byte-buffers respectively (via `DirectSlice` and `Slice`). In this redesign we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`. In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`. --- [JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java. With these changes, when reusing buffers and guarding access to them via mutexes, the slowdown is approximately the same. However, these changes offer a new facility to not reuse buffers (and so avoid the mutexes), which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces the slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425). These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical. --- These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x. ``` ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252 Differential Revision: D19331064 Pulled By: pdillinger fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738	2020-02-03 12:30:13 -08:00
sdong	800d24ddc5	Fix DBTest2.ChangePrefixExtractor LITE build (#6356 ) Summary: DBTest2.ChangePrefixExtractor fails in the LITE build because the LITE build doesn't support adaptive build. Fix it by removing the stats check and only checking correctness. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6356 Test Plan: Run the test with both LITE and non-LITE builds. Differential Revision: D19669537 fbshipit-source-id: 6d7dd6c8a79f18e80ca1636864b9c71922030d8e	2020-01-31 15:44:14 -08:00
Maysam Yabandeh	01ab882ba3	Fix release warning for unused bg_canceled Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6357 Differential Revision: D19670931 Pulled By: maysamyabandeh fbshipit-source-id: d528c4c7f9450f1f38b9d2a36e0d5d0865b39be9	2020-01-31 15:09:10 -08:00
sdong	ec496347bc	Add a unit test for prefix extractor changes (#6323 ) Summary: Add a unit test for prefix extractor change, including a check that fails due to a bug. Also comment out the partitioned filter case which will fail the test too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6323 Test Plan: Run the test and it passes (and fails if the SeekForPrev() part is uncommented) Differential Revision: D19509744 fbshipit-source-id: 678202ca97b5503e9de73b54b90de9e5ba822b72	2020-01-31 11:02:03 -08:00
Maysam Yabandeh	2243030bc5	Cancel bg jobs before deleting WritePrepared DB in stress tests (#6355 ) Summary: Background jobs in WritePrepared DB might access the db via a snapshot checker callback. The stress tests therefore should cancel background jobs before deleting the db in ::Reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6355 Differential Revision: D19664132 Pulled By: maysamyabandeh fbshipit-source-id: 6060a830e8aad0015c10448286ad37c8a346ac01	2020-01-31 10:29:11 -08:00
Maysam Yabandeh	3316d29221	Disable recycle_log_file_num when it is incompatible with recovery mode (#6351 ) Summary: Non-zero recycle_log_file_num is incompatible with kPointInTimeRecovery and kAbsoluteConsistency recovery modes. Currently SanitizeOptions changes the recovery mode to kTolerateCorruptedTailRecords, whereas to resolve this option conflict it makes more sense to compromise recycle_log_file_num, which is a performance feature, instead of wal_recovery_mode, which is a safety feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6351 Differential Revision: D19648931 Pulled By: maysamyabandeh fbshipit-source-id: dd0bf78349edc007518a00c4d63931fd69294ad7	2020-01-31 07:28:30 -08:00
Yanqin Jin	f2fbc5d668	Shorten certain test names to avoid infra failure (#6352 ) Summary: Unit test names, together with other components, are used to create log files during some internal testing. Overly long names cause infra failure due to file names being too long. Look for internal tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6352 Differential Revision: D19649307 Pulled By: riversand963 fbshipit-source-id: 6f29de096e33c0eaa87d9c8702f810eda50059e7	2020-01-30 23:10:24 -08:00
Burton Li	c9a5e48762	fix build warnnings on MSVC (#6309 ) Summary: Fix build warnings on MSVC. siying Pull Request resolved: https://github.com/facebook/rocksdb/pull/6309 Differential Revision: D19455012 Pulled By: ltamasi fbshipit-source-id: 940739f2c92de60e47cc2bed8dd7f921459545a9	2020-01-30 16:07:26 -08:00
Peter Dillinger	90c71aa5d9	Don't download from (unreliable) maven.org (#6348 ) Summary: I set up a mirror of our Java deps on github so we can download them through github URLs rather than maven.org, which is proving terribly unreliable from Travis builds. Also sanitized calls to curl, so they are easier to read and appropriately fail on download failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348 Test Plan: CI Differential Revision: D19633621 Pulled By: pdillinger fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3	2020-01-30 11:02:08 -08:00
anand76	fb05b5a652	Force a new manifest file if append to current one fails (#6331 ) Summary: Fix for issue https://github.com/facebook/rocksdb/issues/6316. When an append/sync of the manifest file fails due to an IO error such as NoSpace, we don't always put the DB in read-only mode. This is true for flush and compactions, as well as foreground operations such as column family add/drop, CompactFiles etc. Subsequent changes to the DB will be recorded in the same manifest file, which would have a corrupted record in the middle due to the previous failure. On the next DB::Open(), it will fail to process the full manifest and data will be lost. To fix this, we reset VersionSet::descriptor_log_ on append/sync failure, which will force a new manifest file to be written on the next append. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6331 Test Plan: Add new unit tests in error_handler_test.cc Differential Revision: D19632951 Pulled By: anand1976 fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3	2020-01-30 10:56:29 -08:00
Levi Tamasi	9e3ace42a4	Add statistics for BlobDB GC (#6296 ) Summary: The patch adds statistics support to the new BlobDB garbage collection implementation; namely, it adds support for the following (pre-existing) tickers: `BLOB_DB_GC_NUM_FILES`: the number of blob files obsoleted by the GC logic. `BLOB_DB_GC_NUM_NEW_FILES`: the number of new blob files generated by the GC logic. `BLOB_DB_GC_FAILURES`: the number of failed GC passes (where a GC pass is equivalent to a (sub)compaction). `BLOB_DB_GC_NUM_KEYS_RELOCATED`: the number of blobs relocated to new blob files by the GC logic. `BLOB_DB_GC_BYTES_RELOCATED`: the total size of blobs relocated to new blob files. The tickers `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`, `BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and `BLOB_DB_GC_MICROS` are not relevant for the new GC logic, and are thus marked deprecated. The patch also adds a couple of log messages that log the number and total size of blobs encountered and relocated during a GC pass, as well as the number of blob files created and obsoleted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6296 Test Plan: Extended unit tests and used the BlobDB mode of `db_bench`. Differential Revision: D19402513 Pulled By: ltamasi fbshipit-source-id: d53d2bfbf4928a1db1e9346c67ebb9007b8932ec	2020-01-29 16:46:16 -08:00
sdong	71874c5aaf	Fix LITE build with DBTest2.AutoPrefixMode1 (#6346 ) Summary: DBTest2.AutoPrefixMode1 doesn't pass because auto prefix mode is not supported there. Fix it by disabling the test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6346 Test Plan: Run DBTest2.AutoPrefixMode1 in lite mode Differential Revision: D19627486 fbshipit-source-id: fbde75260aeecb7e6fc406e09c19a71a95aa5f08	2020-01-29 16:43:42 -08:00
Peter Dillinger	23dcf2759d	Upload DB dir for all crash tests (#6344 ) Summary: It is difficult to root-cause crash test failures without archiving the db dir. Now all crash test configurations should save the db dir. Also exit with an error code on a bad command. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6344 Test Plan: Hmm, how about this: for TARGET in stress_crash asan_crash ubsan_crash tsan_crash; do EMAIL=email ONCALL=oncall TRIGGER=all SUBSCRIBER=sub build_tools/rocksdb-lego-determinator $TARGET > tmp && node -c tmp && grep -q Upload tmp || echo Bad; done Differential Revision: D19625605 Pulled By: pdillinger fbshipit-source-id: cb84aa93ee80b4534f4c61b90f0e0f99a41155d5	2020-01-29 15:59:07 -08:00
sdong	02ac6c9a3c	Fix db_bloom_filter_test clang LITE build (#6340 ) Summary: db_bloom_filter_test breaks with the clang LITE build with the following message: db/db_bloom_filter_test.cc:23:29: error: unused variable 'kPlainTable' [-Werror,-Wunused-const-variable] static constexpr PseudoMode kPlainTable = -1; ^ Fix it by moving the declaration out of the LITE build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6340 Test Plan: USE_CLANG=1 LITE=1 make db_bloom_filter_test and without LITE=1 Differential Revision: D19609834 fbshipit-source-id: 0e88f5c6759238a94f9880d84c785ac18e7cdd7e	2020-01-29 12:57:48 -08:00
Maysam Yabandeh	2f973ca96e	Double Crash in kPointInTimeRecovery with TransactionDB (#6313 ) Summary: In WritePrepared there could be gaps in sequence numbers. This breaks the trick we use in kPointInTimeRecovery, which assumes the first seq in the log right after the corrupted log is one larger than the last seq we read from the logs. To let this trick keep working, we add a dummy entry with the expected sequence to the first log right after recovery. Also, in WriteCommitted, if the log right after the corrupted log is empty, it has no sequence number to let the sequential trick work, so it is treated as unexpected behavior. This is, however, expected to happen if we close the db after recovering from a corruption and before writing anything new to it. To remedy that, we apply the same technique by writing a dummy entry to the log that is created after the corrupted log. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6313 Differential Revision: D19458291 Pulled By: maysamyabandeh fbshipit-source-id: 09bc49e574690085df45b034ca863ff315937e2d	2020-01-29 11:40:55 -08:00
Adam Retter	a07a9dc904	Reduce the need to re-download dependencies (#6318 ) Summary: Both changes are related to RocksJava: 1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds. 2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318 Differential Revision: D19608742 Pulled By: pdillinger fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7	2020-01-29 08:01:56 -08:00
sdong	8f2bee6747	Add ReadOptions.auto_prefix_mode (#6314 ) Summary: Add a new option ReadOptions.auto_prefix_mode. When set to true, the iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on iterator upper bounds. Also fix two previous bugs when handling prefix extractor changes: (1) the reverse iterator should not rely on the upper bound to determine the prefix; fix it by skipping the prefix check. (2) The block-based filter is not handled properly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314 Test Plan: (1) add a unit test; (2) add the check to the stress test and run it to see whether it can pass at least one run. Differential Revision: D19458717 fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce	2020-01-28 14:44:05 -08:00
Siying Dong	431fb6c0ba	Add Google Group to Issue Template Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6339 Differential Revision: D19608457 fbshipit-source-id: 2adea28b1bd20b85ccafca1aa567030115220ea6	2020-01-28 14:40:37 -08:00
Sagar Vemuri	4f6c86226c	Use the same oldest ancestor time in table properties and manifest Summary: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions passed 96 / 100 times. With the fix: all runs (tried 100, 1000, 10000) succeed. ``` $ TEST_TMPDIR=/dev/shm ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.LevelTtlCascadingCompactions --repeat=1000 [1000/1000] DBCompactionTest.LevelTtlCascadingCompactions (1895 ms) ``` Test Plan: Build: ``` COMPILE_WITH_TSAN=1 make db_compaction_test -j100 ``` Without the fix: a few runs out of 100 fail: ``` $ TEST_TMPDIR=/dev/shm KEEP_DB=1 ~/gtest-parallel/gtest-parallel ./db_compaction_test --gtest_filter=DBCompactionTest.LevelTtlCascadingCompactions --repeat=100 ... ... Note: Google Test filter = DBCompactionTest.LevelTtlCascadingCompactions [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBCompactionTest [ RUN ] DBCompactionTest.LevelTtlCascadingCompactions db/db_compaction_test.cc:3687: Failure Expected equality of these values: oldest_time Which is: 1580155869 level_to_files[6][0].oldest_ancester_time Which is: 1580155870 DB is still at /dev/shm//db_compaction_test_6337001442947696266 [ FAILED ] DBCompactionTest.LevelTtlCascadingCompactions (1432 ms) [----------] 1 test from DBCompactionTest (1432 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (1433 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] DBCompactionTest.LevelTtlCascadingCompactions 1 FAILED TEST [80/100] DBCompactionTest.LevelTtlCascadingCompactions returned/aborted with exit code 1 (1489 ms) [100/100] DBCompactionTest.LevelTtlCascadingCompactions (1522 ms) FAILED TESTS (4/100): 1419 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try #90) 1434 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try #84) 1457 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try #82) 1489 ms: ./db_compaction_test DBCompactionTest.LevelTtlCascadingCompactions (try #74) ``` Differential Revision: D19587040 Pulled By: sagar0 fbshipit-source-id: 11191ae9940837643bff47ebe18b299b4be3d950	2020-01-27 19:58:53 -08:00
sdong	7aa66c704f	Move HISTORY.md entry of hash index fix from 6.7 to unreleased (#6337 ) Summary: Commits related to hash index fix have been reverted in 6.7.fb branch. Update HISTORY.md to keep it in sync. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6337 Differential Revision: D19593717 fbshipit-source-id: 466178dc6205c9e41ccced41bf281a0952bdc2ca	2020-01-27 17:45:42 -08:00
Andrew Kryczka	5b33cfa1e3	fix `WriteBufferManager` flush log message (#6335 ) Summary: It chooses the oldest memtable, not the largest one. This is an important difference for users whose CFs receive non-uniform write rates. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6335 Differential Revision: D19588865 Pulled By: maysamyabandeh fbshipit-source-id: 62ad4325b0182f5f27858584cd73fd5978fb2cec	2020-01-27 15:49:22 -08:00
sdong	f10f135938	Fix regression bug of hash index with iterator total order seek (#6328 ) Summary: https://github.com/facebook/rocksdb/pull/6028 introduces a bug for hash index in SST files. If a table reader is created when total order seek is used, prefix_extractor might be passed into the table reader as null. Later, when prefix seek is used with the same table reader, the hash index is checked but the prefix extractor is null, and the program would crash. Fix the issue by changing http://github.com/facebook/rocksdb/pull/6028 so that prefix_extractor is preserved and ReadOptions.total_order_seek is checked. Also, a null pointer check is added so that a bug like this won't cause a segfault in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6328 Test Plan: Add a unit test that would fail without the fix. The stress test that reproduces the crash passes. Differential Revision: D19586751 fbshipit-source-id: 8de77690167ddf5a77a01e167cf89430b1bfba42	2020-01-27 15:44:54 -08:00
Peter Dillinger	986df37135	Clean up PartitionedFilterBlockBuilder (#6299 ) Summary: Remove the redundant PartitionedFilterBlockBuilder::num_added_ and ::NumAdded since the parent class, FullFilterBlockBuilder, already provides them. Also rename filters_in_partition_ and filters_per_partition_ to keys_added_to_partition_ and keys_per_partition_ to improve readability. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6299 Test Plan: make check Differential Revision: D19413278 Pulled By: pdillinger fbshipit-source-id: 04926ee7874477d659cb2b6ae03f2d995fb747e5	2020-01-27 13:15:14 -08:00
Fosco Marotto	bd698e4f55	Update version for next release, 6.7.0 (#6320 ) Summary: Adjusted history for 6.6.1 and 6.6.2, switched master version to 6.7.0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6320 Differential Revision: D19499272 Pulled By: gfosco fbshipit-source-id: 2bafb2456951f231e411e9c03aaa4c044f497684	2020-01-24 15:36:32 -08:00
Maysam Yabandeh	c4bc30e12d	Implement PinnableSlice::remove_prefix (#6330 ) Summary: The function was left unimplemented. Although we currently don't have a use for it, it was declared with an assert(0) to prevent mistakenly using the remove_prefix of the parent class. The function body with only assert(0), however, causes issues with some compilers' warning levels. The patch implements the function to avoid the warning. It also piggybacks fixes for some minor code warnings about unnecessary semicolons after function definitions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6330 Differential Revision: D19559062 Pulled By: maysamyabandeh fbshipit-source-id: 3a022484f688c9abd4556e5412bcc2628ab96a00	2020-01-24 13:04:53 -08:00
Levi Tamasi	f34782a67d	Fix the "records dropped" statistics (#6325 ) Summary: The earlier code used two conflicting definitions for the number of input records going into a compaction, one based on the `rocksdb.num.entries` table property and one based on `CompactionIterationStats`. The first one is correct and in line with how output records are counted, while the second one incorrectly ignores input records in various cases when the `CompactionIterator` advances or reseeks the input iterator (this can happen, amongst other cases, when dealing with `SingleDelete`s, regular `Delete`s, `Merge`s, and compaction filters). This can result in the code undercounting the input records and computing an incorrect value for "records dropped" during the compaction. The patch fixes this by switching over to the correct (table property based) input record count for "records dropped". Pull Request resolved: https://github.com/facebook/rocksdb/pull/6325 Test Plan: Tested using `make check` and `db_bench`. Differential Revision: D19525491 Pulled By: ltamasi fbshipit-source-id: 4340b0b2f41546db8e356db70ca02199e48fa636	2020-01-23 15:27:22 -08:00
anand76	0672a6db64	Fix queue manipulation in WriteThread::BeginWriteStall() (#6322 ) Summary: When there is a write stall, the active write group leader calls `BeginWriteStall()` to walk the queue of writers and remove any with the `no_slowdown` option set. There was a bug in the code which updated the back pointer but not the forward pointer (`link_newer`), corrupting the list and causing some threads to wait forever. This PR fixes it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6322 Test Plan: Add a unit test in db_write_test Differential Revision: D19538313 Pulled By: anand1976 fbshipit-source-id: 6fbed819e594913f435886606f5d36f74f235c3a	2020-01-23 14:01:28 -08:00
Maysam Yabandeh	967a2d953f	Revert "crash_test to enable block-based table hash index (#6310 )" (#6327 ) Summary: This reverts commit `8e309b35bb`. The stress tests are failing. Revert it until we figure out the root cause. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327 Differential Revision: D19537657 Pulled By: maysamyabandeh fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a	2020-01-23 09:09:17 -08:00
Maysam Yabandeh	cb1142e00d	Set index_block_restart_interval of kHashSearch to 1 in stress test (#6324 ) Summary: kHashSearch is incompatible with index_block_restart_interval values larger than 1. Setting it to 1 in stress tests avoids confusion about the test parameters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6324 Differential Revision: D19525669 Pulled By: maysamyabandeh fbshipit-source-id: fbf3a797e0ebcebb4d32eba3728cf3583906fc8a	2020-01-22 16:33:21 -08:00
matthewvon	e6e8b9e871	Correct pragma once problem with Bazel on Windows (#6321 ) Summary: This is a simple edit to make two #include file paths within range_del_aggregator.{h,cc} consistent with everywhere else. The impact of this inconsistency is that it actually breaks a Bazel-based build on the Windows platform. The same pragma once failure occurs with both Windows Visual C++ 2019 and clang for Windows 9.0. Bazel's "sandboxing" of the builds causes both compilers to not properly recognize "rocksdb/types.h" and "include/rocksdb/types.h" as the same file (also comparator.h). My guess is that the mixing of backslashes and forward slashes within path names is the underlying issue. But everything builds fine once the include paths in these two source files are consistent with the rest of the repository. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6321 Differential Revision: D19506585 Pulled By: ltamasi fbshipit-source-id: 294c346607edc433ab99eaabc9c880ee7426817a	2020-01-21 16:12:43 -08:00
Levi Tamasi	d305f13e21	Make DBCompactionTest.SkipStatsUpdateTest more robust (#6306 ) Summary: Currently, this test case tries to infer whether `VersionStorageInfo::UpdateAccumulatedStats` was called during open by checking the number of files opened against an arbitrary threshold (10). This makes the test brittle and results in sporadic failures. The patch changes the test case to use sync points to directly test whether `UpdateAccumulatedStats` was called. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6306 Test Plan: `make check` Differential Revision: D19439544 Pulled By: ltamasi fbshipit-source-id: ceb7adf578222636a0f51740872d0278cd1a914f	2020-01-21 12:55:55 -08:00
sdong	8e309b35bb	crash_test to enable block-based table hash index (#6310 ) Summary: Block-based table hash index has been disabled in the crash test due to bugs. We fixed a bug and are re-enabling it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310 Test Plan: Finished one round of "crash_test_with_atomic_flush" test successfully while exclusively running hash index. Another run also ran for several hours without failure. Differential Revision: D19455856 fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274	2020-01-21 12:27:30 -08:00
Peter Dillinger	8aa99fc71e	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317 ) Summary: With many millions of keys, the old Bloom filter implementation for the block-based table (format_version <= 4) would have an excessive FP rate due to the limitations of feeding the Bloom filter with a 32-bit hash. This change computes an estimated inflated FP rate due to this effect and warns in the log whenever an SST filter is constructed (almost certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate due to this effect. The detailed condition is only checked if 3 million keys or more have been added to a filter, as this should be a lower bound for common bits/key settings (< 20). Recommended remedies include smaller SST file size, using format_version >= 5 (for new Bloom filter), or using partitioned filters. This does not change behavior other than generating warnings for some constructed filters using the old implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317 Test Plan: Example with warning, 15M keys @ 15 bits / key: (working_mem_size_mb is just to stop after building one filter if it's large) $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 | grep 'FP rate' [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters. Predicted FP rate %: 0.766702 Average FP rate %: 0.66846 Example without warning (150K keys): $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 | grep 'FP rate' Predicted FP rate %: 0.422857 Average FP rate %: 0.379301 $ With more samples at 15 bits/key: 150K keys -> no warning; actual: 0.379% FP rate (baseline) 1M keys -> no warning; actual: 0.396% FP rate, 1.045x 9M keys -> no warning; actual: 0.563% FP rate, 1.485x 10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x 15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x 25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x At 10 bits/key: 150K keys -> no warning; actual: 1.17% FP rate (baseline) 1M keys -> no warning; actual: 1.16% FP rate 10M keys -> no warning; actual: 1.32% FP rate, 1.13x 25M keys -> no warning; actual: 1.63% FP rate, 1.39x 35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x At 5 bits/key: 150K keys -> no warning; actual: 9.32% FP rate (baseline) 25M keys -> no warning; actual: 9.62% FP rate, 1.03x 200M keys -> no warning; actual: 12.2% FP rate, 1.31x 250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x 300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x The reason for the modest inaccuracy at low bits/key is that the assumed independence between a collision among the 32-bit hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key). Differential Revision: D19471715 Pulled By: pdillinger fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68	2020-01-20 21:31:47 -08:00
Peter Dillinger	4b86fe1123	Log warning for high bits/key in legacy Bloom filter (#6312 ) Summary: Help users that would benefit most from the new Bloom filter implementation by logging a warning that recommends using format_version >= 5. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6312 Test Plan: $ (for BPK in 10 13 14 19 20 50; do ./filter_bench -quick -impl=0 -bits_per_key=$BPK -m_queries=1 2>&1; done) | grep 'its/key' Bits/key actual: 10.0647 Bits/key actual: 13.0593 [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (14) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5. Bits/key actual: 14.0581 [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (19) bits/key. Significant filter space and/or accuracy improvement is available with format_verion>=5. Bits/key actual: 19.0542 [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (20) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5. Bits/key actual: 20.0584 [WARN] [/block_based/filter_policy.cc:546] Using legacy Bloom filter with high (50) bits/key. Dramatic filter space and/or accuracy improvement is available with format_verion>=5. Bits/key actual: 50.0577 Differential Revision: D19457191 Pulled By: pdillinger fbshipit-source-id: 073d94cde5c70e03a160f953e1100c15ea83eda4	2020-01-17 19:37:35 -08:00

8716 Commits