rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-29 18:33:58 +00:00

Author	SHA1	Message	Date
Sahir Hoda	d65d7d657d	Expose JemallocNodumpAllocator to C API (#8178 ) Summary: Add new C APIs to create the JemallocNodumpAllocator and set it on a Cache object. `make test` passes with and without `DISABLE_JEMALLOC=1`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8178 Reviewed By: jay-zhuang Differential Revision: D27944631 Pulled By: ajkr fbshipit-source-id: 2531729aa285a8985c58f22f093c4d53029c4a7b	2021-04-22 22:22:34 -07:00
mrambacher	01e460d538	Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176 ) Summary: This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code. With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively. readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release). There are still fields in the ImmutableCFOptions that are not CF options but DB options. Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions). But that will be part of a future PR to minimize changes and disruptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176 Reviewed By: pdillinger Differential Revision: D27954339 Pulled By: mrambacher fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad	2021-04-22 20:43:54 -07:00
Jay Zhuang	f0fca2b1d5	Add internal compaction API for Secondary instance (#8171 ) Summary: Add compaction API for secondary instance, which compact the files to a secondary DB path without installing to the LSM tree. The API will be used to remote compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8171 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D27694545 Pulled By: jay-zhuang fbshipit-source-id: 8ff3ec1bffdb2e1becee994918850c8902caf731	2021-04-22 13:02:28 -07:00
Zhichao Cao	09a9ec3ac0	Fix the false positive alert of CF consistency check in WAL recovery (#8207 ) Summary: In current RocksDB, in recover the information form WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it will report a false positive alert on "SST file is ahead of WALs" when one of the CF current log number is greater than the corrupted WAL number (CF contains the data beyond the corrupted WAl) due to a new column family creation during flush. In this case, a new WAL is created (it is empty) during a flush. Also, due to some reason (e.g., storage issue or crash happens before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data, therefore, it does not have the consistency issue. Fix: when checking cfd->GetLogNumber() > corrupted_wal_number also check cfd->GetLiveSstFilesSize() > 0. So the CFs with no SST file data will skip the check here. Note potential ignored inconsistency caused due to fix: empty CF can also be caused by write+delete. In this case, after flush, there is no SST files being generated. However, this CF still have the log in the WAL. When the WAL is corrupted, the DB might be inconsistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207 Test Plan: added unit test, make crash_test Reviewed By: riversand963 Differential Revision: D27898839 Pulled By: zhichao-cao fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e	2021-04-22 10:28:37 -07:00
Yanqin Jin	314352761f	Ignore comparator name mismatch in ldb manifest dump (#8216 ) Summary: RocksDB allows user-specified custom comparators which may not be known to `ldb`, a built-in tool for checking/mutating the database. Therefore, column family comparator names mismatch encountered during manifest dump should not prevent the dumping from proceeding. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8216 Test Plan: ``` make check ``` Also manually do the following ``` KEEP_DB=1 ./db_with_timestamp_basic_test ./ldb --db=<db> manifest_dump --verbose ``` The ldb should succeed and print something like: ``` ... --------------- Column family "default" (ID 0) -------------- log number: 6 comparator: <TestComparator>, but the comparator object is not available. ... ``` Reviewed By: ltamasi Differential Revision: D27927581 Pulled By: riversand963 fbshipit-source-id: f610b2c842187d17f575362070209ee6b74ec6d4	2021-04-21 20:43:10 -07:00
sdong	4985cea141	Add comment to DisableManualCompaction() (#8186 ) Summary: Add comment to DisableManualCompaction() which was missing. Also explictly return from DBImpl::CompactRange() to avoid memtable flush when manual compaction is disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8186 Test Plan: Run existing unit tests. Reviewed By: jay-zhuang Differential Revision: D27744517 fbshipit-source-id: 449548a48905903b888dc9612bd17480f6596a71	2021-04-21 15:23:46 -07:00
Akanksha Mahajan	596e9008e4	Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898 ) Summary: When WriteBufferManager is shared across DBs and column families to maintain memory usage under a limit, OOMs have been observed when flush cannot finish but writes continuously insert to memtables. In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write, this change will stall incoming writers until flush is completed and memory_usage drops. Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_ (memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true. DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall() (which adds dummy stall object to the writer's queue). Then DB is blocked on a state State::Blocked (current write doesn't go through). WBStallInterface object maintained by every DB instance is added to the queue of WriteBufferManager. If multiple DBs tries to write during this stall, they will also be blocked when check WriteBufferManager::ShouldStall() returns true. End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush to complete and avoid continous stalling if memory usage remains close to buffer_size. WriterBufferManager::EndWriteStall() is called, which removes all instances from its queue and signal them to continue. Their state is changed to State::Running and they are unblocked. DBImpl then signal all incoming writers of that DB to continue by calling write_thread_.EndWriteStall() (which removes dummy stall object from the queue). DB instance creates WBMStallInterface which is an interface to block and signal DBs during stall. When DB needs to be blocked or signalled by WriteBufferManager, state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898 Test Plan: Added a new test db/db_write_buffer_manager_test.cc Reviewed By: anand1976 Differential Revision: D26093227 Pulled By: akankshamahajan15 fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae	2021-04-21 13:54:02 -07:00
Andrew Kryczka	905dd17b35	Fix seqno in ingested file boundary key metadata (#8209 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/6245. Adapted from https://github.com/facebook/rocksdb/issues/8201 and https://github.com/facebook/rocksdb/issues/8205. Previously we were writing the ingested file's smallest/largest internal keys with sequence number zero, or `kMaxSequenceNumber` in case of range tombstone. The former (sequence number zero) is incorrect and can lead to files being incorrectly ordered. The fix in this PR is to overwrite boundary keys that have sequence number zero with the ingested file's assigned sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8209 Test Plan: repro unit test Reviewed By: riversand963 Differential Revision: D27885678 Pulled By: ajkr fbshipit-source-id: 4a9f2c6efdfff81c3a9923e915ea88b250ee7b6a	2021-04-20 14:00:21 -07:00
Jay Zhuang	a89740fbc6	Fix unittest no space issue (#8204 ) Summary: Unittest reports no space from time to time, which can be reproduced on a small memory machine with SHM. It's caused by large WAL files generated during the test, which is preallocated, but didn't truncate during close(). Adding the missing APIs to set preallocation. It added arm test as nightly build, as the test runs more than 1 hour. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8204 Test Plan: test on small memory arm machine Reviewed By: mrambacher Differential Revision: D27873145 Pulled By: jay-zhuang fbshipit-source-id: f797c429d6bc13cbcc673bc03fcc72adda55f506	2021-04-20 08:42:28 -07:00
Yanqin Jin	a376c22066	Handle rename() failure in non-local FS (#8192 ) Summary: In a distributed environment, a file `rename()` operation can succeed on server (remote) side, but the client can somehow return non-ok status to RocksDB. Possible reasons include network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a new MANIFEST. We currently always delete the new MANIFEST if an error occurs. This is problematic in distributed world. If the server-side successfully updates the CURRENT file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail. As a fix, we can track the execution result of IO operations on the new MANIFEST. - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original MANIFEST. Therefore, it is safe to remove the new MANIFEST. - If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the new MANIFEST.) Therefore, we keep the new MANIFEST. - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT. - If process reopens the db immediately after the failure, then the CURRENT file can point to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can succeed and ignore the other. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D27804648 Pulled By: riversand963 fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4	2021-04-19 18:11:13 -07:00
Levi Tamasi	0c6e4674a6	Fix a data race related to DB properties (#8206 ) Summary: Historically, the DB properties `rocksdb.cur-size-active-mem-table`, `rocksdb.cur-size-all-mem-tables`, and `rocksdb.size-all-mem-tables` called the method `MemTable::ApproximateMemoryUsage` for mutable memtables, which is not safe without synchronization. This resulted in data races with memtable inserts. The patch changes the code handling these properties to use `MemTable::ApproximateMemoryUsageFast` instead, which returns a cached value backed by an atomic variable. Two test cases had to be updated for this change. `MemoryTest.MemTableAndTableReadersTotal` was fixed by increasing the value size used so each value ends up in its own memtable, which was the original intention (note: the test has been broken in the sense that the test code didn't consider that memtable sizes below 64 KB get increased to 64 KB by `SanitizeOptions`, and has been passing only by accident). `DBTest.MemoryUsageWithMaxWriteBufferSizeToMaintain` relies on completely up-to-date values and thus was changed to use `ApproximateMemoryUsage` directly instead of going through the DB properties. Note: this should be safe in this case since there's only a single thread involved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8206 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D27866811 Pulled By: ltamasi fbshipit-source-id: 7bd754d0565e0a65f1f7f0e78ffc093beef79394	2021-04-19 16:38:02 -07:00
Yanqin Jin	b0e20194ea	Handle blob files when options.best_efforts_recovery is true (#8180 ) Summary: If `options.best_efforts_recovery == true`, RocksDB currently tolerates missing table files and recovers to the latest version without missing table files (not considering WAL). It is necessary to handle blob files as well to make the feature more complete. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8180 Test Plan: make check Reviewed By: ltamasi Differential Revision: D27840556 Pulled By: riversand963 fbshipit-source-id: 041685d0dc2e7779ac4f0374c07a8a327704aa5e	2021-04-19 11:56:14 -07:00
mrambacher	4c41e51c07	Add Blob Options to C API (#8148 ) Summary: Added the Blob option settings from the AdvancedColmnFamilyOptions to the C API. There are no tests for getting/setting options in the C API currently, hence no specific test plans. Should there be a some? Pull Request resolved: https://github.com/facebook/rocksdb/pull/8148 Reviewed By: ltamasi Differential Revision: D27568495 Pulled By: mrambacher fbshipit-source-id: 3a52b784467ea2c4bc58be5f75c5d41f0a5c55d6	2021-04-16 05:56:00 -07:00
Akanksha Mahajan	00803d619c	Fix flaky failure in DBSSTest.DBWithSstFileManagerForBlobFilesWithGC (#8196 ) Summary: Updated the test to wait until all trash files are deleted by SSTFileManager in the background. Since deletion runs in background so number of files deleted might not always be as expected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8196 Reviewed By: jay-zhuang Differential Revision: D27812273 Pulled By: akankshamahajan15 fbshipit-source-id: d3ace1db34f91254b52fa455e09844d02801f58e	2021-04-15 20:18:57 -07:00
Akanksha Mahajan	83031e7343	Fix for LITE mode failure on MacOS (#8189 ) Summary: Fix for failure to build in LITE mode on MacOs from BlobFileCompletionCallback unused private fields. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8189 Reviewed By: jay-zhuang Differential Revision: D27768341 Pulled By: akankshamahajan15 fbshipit-source-id: 14d31d7a9b52d308d9f9f27feff1977c5550622f	2021-04-15 09:45:02 -07:00
Akanksha Mahajan	296b47db25	Extend file_checksum_dump ldb command and DB::GetLiveFilesChecksumInfo to blob files (#8179 ) Summary: Extend the DB::GetLiveFilesChecksumInfo API to blob files. This API is also used by the file_checksum_dump ldb command to dump checksum of SST files which now also dumps blob files checksum. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8179 Test Plan: Add new unit test Reviewed By: zhichao-cao Differential Revision: D27714965 Pulled By: akankshamahajan15 fbshipit-source-id: d8b7343ea845a64c83800336d88cced7152a8c92	2021-04-15 09:38:13 -07:00
Yanqin Jin	b1f62be10e	Use the right level (L0) for files written during WAL recovery (#8187 ) Summary: As the name of `DBImpl::WriteLevel0TableForRecovery` suggests, the resulting table file should be placed on L0. However, the argument `level` passed to `BuildTable()` is -1. We need to correct this since the level information will be useful to determine file placement. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8187 Test Plan: make check Reviewed By: ltamasi Differential Revision: D27748570 Pulled By: riversand963 fbshipit-source-id: e1cd23128a8de31f14b1edc2ea92754c154e4f10	2021-04-14 23:40:22 -07:00
Justin Chapman	d89483098f	Assert unlimited max_open_files for FIFO compaction. (#8172 ) Summary: Resolves https://github.com/facebook/rocksdb/issues/8014 - Add an assertion on `DB::Open` to ensure `db_options.max_open_files` is unlimited if FIFO Compaction is being used. - This is to align with what the docs mention and to prevent premature data deletion. - Update tests to work with this assertion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8172 Test Plan: ```bash $ make check -j$(nproc) Generated TARGETS Summary: - 6 libs - 0 binarys - 180 tests ``` Reviewed By: ajkr Differential Revision: D27768792 Pulled By: thejchap fbshipit-source-id: cf6350535e3a3577fec72bcba75b3c094dc7a6f3	2021-04-14 12:05:47 -07:00
Sahir Hoda	139778dfb3	Expose Cache::DisownData in C API (#8160 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8160 Reviewed By: riversand963 Differential Revision: D27672474 Pulled By: ajkr fbshipit-source-id: fdbbc3398f0b1d4cef6b68636e5caf369c34b3a7	2021-04-09 10:39:11 -07:00
Giuseppe Ottaviano	48cd7a3aae	Fix flush reason attribution (#8150 ) Summary: Current flush reason attribution is misleading or incorrect (depending on what the original intention was): - Flush due to WAL reaching its maximum size is attributed to `kWriteBufferManager` - Flushes due to full write buffer and write buffer manager are not distinguishable, both are attributed to `kWriteBufferFull` This changes the first to a new flush reason `kWALFull`, and splits the second between `kWriteBufferManager` and `kWriteBufferFull`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8150 Reviewed By: zhichao-cao Differential Revision: D27569645 Pulled By: ot fbshipit-source-id: 7e3c8ca186a6e71976e6b8e937297eebd4b769cc	2021-04-07 23:18:37 -07:00
Peter Dillinger	a4e82a3cca	Fix read-only DB writing to filesystem with write_dbid_to_manifest (#8164 ) Summary: Fixing another crash test failure in the case of write_dbid_to_manifest=true and reading a backup as read-only DB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8164 Test Plan: enhanced unit test for backup as read-only DB, ran blackbox_crash_test more with elevated backup_one_in Reviewed By: zhichao-cao Differential Revision: D27622237 Pulled By: pdillinger fbshipit-source-id: 680d0f99ddb465a601737f2e3f2c80efd47384fb	2021-04-07 10:26:47 -07:00
Peter Dillinger	879357fdb0	Make backups openable as read-only DBs (#8142 ) Summary: A current limitation of backups is that you don't know the exact database state of when the backup was taken. With this new feature, you can at least inspect the backup's DB state without restoring it by opening it as a read-only DB. Rather than add something like OpenAsReadOnlyDB to the BackupEngine API, which would inhibit opening stackable DB implementations read-only (if/when their APIs support it), we instead provide a DB name and Env that can be used to open as a read-only DB. Possible follow-up work: * Add a version of GetBackupInfo for a single backup. * Let CreateNewBackup return the BackupID of the newly-created backup. Implementation details: Refactored ChrootFileSystem to split off new base class RemapFileSystem, which allows more general remapping of files. We use this base class to implement BackupEngineImpl::RemapSharedFileSystem. To minimize API impact, I decided to just add these fields `name_for_open` and `env_for_open` to those set by GetBackupInfo when include_file_details=true. Creating the RemapSharedFileSystem adds a bit to the memory consumption, perhaps unnecessarily in some cases, but this has been mitigated by (a) only initialize the RemapSharedFileSystem lazily when GetBackupInfo with include_file_details=true is called, and (b) using the existing `shared_ptr<FileInfo>` objects to hold most of the mapping data. To enhance API safety, RemapSharedFileSystem is wrapped by new ReadOnlyFileSystem which rejects any attempts to write. This uncovered a couple of places in which DB::OpenForReadOnly would write to the filesystem, so I fixed these. Added a release note because this affects logging. Additional minor refactoring in backupable_db.cc to support the new functionality. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8142 Test Plan: new test (run with ASAN and UBSAN), added to stress test and ran it for a while with amplified backup_one_in Reviewed By: ajkr Differential Revision: D27535408 Pulled By: pdillinger fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148	2021-04-06 14:37:53 -07:00
Yanqin Jin	09528f9fa1	Fix a bug for SeekForPrev with partitioned filter and prefix (#8137 ) Summary: According to https://github.com/facebook/rocksdb/issues/5907, each filter partition "should include the bloom of the prefix of the last key in the previous partition" so that SeekForPrev() in prefix mode can return correct result. The prefix of the last key in the previous partition does not necessarily have the same prefix as the first key in the current partition. Regardless of the first key in current partition, the prefix of the last key in the previous partition should be added. The existing code, however, does not follow this. Furthermore, there is another issue: when finishing current filter partition, `FullFilterBlockBuilder::AddPrefix()` is called for the first key in next filter partition, which effectively overwrites `last_prefix_str_` prematurely. Consequently, when the filter block builder proceeds to the next partition, `last_prefix_str_` will be the prefix of its first key, leaving no way of adding the bloom of the prefix of the last key of the previous partition. Prefix extractor is FixedLength.2. ``` [ filter part 1 ] [ filter part 2 ] abc d ``` When SeekForPrev("abcd"), checking the filter partition will land on filter part 2 because "abcd" > "abc" but smaller than "d". If the filter in filter part 2 happens to return false for the test for "ab", then SeekForPrev("abcd") will build incorrect iterator tree in non-total-order mode. Also fix a unit test which starts to fail following this PR. `InDomain` should not fail due to assertion error when checking on an arbitrary key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8137 Test Plan: ``` make check ``` Without this fix, the following command will fail pretty soon. ``` ./db_stress --acquire_snapshot_one_in=10000 --avoid_flush_during_recovery=0 \ --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 \ --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=17 \ --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 \ --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 \ --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 \ --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 \ --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 \ --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox \ --db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --enable_blob_files=0 \ --enable_compaction_filter=0 --enable_pipelined_write=1 --file_checksum_impl=big --flush_one_in=1000000 \ --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 \ --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --index_type=2 --ingest_external_file_one_in=0 \ --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True \ --log2_keys_per_lock=10 --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 \ --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000000 --max_key_len=3 \ --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 \ --max_write_buffer_size_to_maintain=8388608 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False \ --nooverwritepercent=0 --open_files=500000 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=0 --pause_background_one_in=1000000 \ --periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --read_fault_one_in=0 --read_only=0 \ --readpercent=45 --recycle_log_file_num=0 --reopen=20 --secondary_catch_up_one_in=0 \ --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 \ --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False \ --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 \ --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_blob_db=0 --use_block_based_filter=0 \ --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 \ --use_multiget=0 --use_ribbon_filter=0 --use_txn=0 --user_timestamp_size=8 --verify_checksum=1 \ --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 \ --write_dbid_to_manifest=1 --writepercent=35 ``` Reviewed By: pdillinger Differential Revision: D27553054 Pulled By: riversand963 fbshipit-source-id: 60e391e4a2d8d98a9a3172ec5d6176b90ec3de98	2021-04-06 12:14:08 -07:00
Yanqin Jin	dd3fbbbf95	Use separate db dir for different tests hoping to remove flakiness (#8147 ) Summary: DBWALTestWithParam relies on `SstFileManager` to have the expected behavior. However, if this test shares db directories with other DBSSTTest, then the SstFileManager may see non-empty data, thus will change its behavior to be different from expectation, introducing flakiness. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8147 Test Plan: make check Reviewed By: jay-zhuang Differential Revision: D27553362 Pulled By: riversand963 fbshipit-source-id: a2d86343e8e2220bc553b6695ce87dd21a97ddec	2021-04-03 11:48:56 -07:00
Peter Dillinger	0fccc6225e	Fix db_test2 parallelism (#8145 ) Summary: With thread/process-specific dirs. (Errors seen in FB infra.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8145 Test Plan: see in FB infra tests Reviewed By: riversand963 Differential Revision: D27542355 Pulled By: pdillinger fbshipit-source-id: b3c8e66f91a6a6b3a775f6fc0c3cf71e63c29ade	2021-04-02 13:38:04 -07:00
Akanksha Mahajan	689b13e639	Add request_id in IODebugContext. (#8045 ) Summary: Add request_id in IODebugContext which will be populated by underlying FileSystem for IOTracing purposes. Update IOTracer to trace request_id in the tracing records. Provided API IODebugContext::SetRequestId which will set the request_id and enable tracing for request_id. The API hides the implementation and underlying file system needs to call this API directly. Update DB::StartIOTrace API and remove redundant Env* from the argument as its not used and DB already has Env that is passed down to IOTracer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8045 Test Plan: Update unit test. Differential Revision: D26899871 Pulled By: akankshamahajan15 fbshipit-source-id: 56adef52ee5af0fb3060b607c3af1ec01635fa2b	2021-04-01 13:14:51 -07:00
rockeet	5025c7ec09	version_set_test.cc: remove a redundent obj copy (#7880 ) Summary: Remove redundant obj copy Pull Request resolved: https://github.com/facebook/rocksdb/pull/7880 Reviewed By: akankshamahajan15 Differential Revision: D26921119 Pulled By: riversand963 fbshipit-source-id: f227da688b067870a069e728a67799a8a95fee99	2021-04-01 11:28:54 -07:00
Andrew Kryczka	c43a37a922	Fix compression dictionary sampling with dedicated range tombstone SSTs (#8141 ) Summary: Return early in case there are zero data blocks when `BlockBasedTableBuilder::EnterUnbuffered()` is called. This crash can only be triggered by applying dictionary compression to SST files that contain only range tombstones. It cannot be triggered by a low buffer limit alone since we only consider entering unbuffered mode after buffering a data block causing the limit to be breached, or `Finish()`ing the file. It also cannot be triggered by a totally empty file because those go through `Abandon()` rather than `Finish()` so unbuffered mode is never entered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8141 Test Plan: added a unit test that repro'd the "Floating point exception" Reviewed By: riversand963 Differential Revision: D27495640 Pulled By: ajkr fbshipit-source-id: a463cfba476919dc5c5c380800a75a86c31ffa23	2021-04-01 05:08:17 -07:00
darionyaphet	a3a943bf63	Merge checks into one (#8138 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8138 Reviewed By: zhichao-cao Differential Revision: D27475616 Pulled By: riversand963 fbshipit-source-id: d2815eed578a90c53d6a4e0dc4aaa232516eb4f8	2021-03-31 19:13:10 -07:00
Andrew Kryczka	1ba2b8a568	Add sample_for_compression results to table properties (#8139 ) Summary: Added `TableProperties::{fast,slow}_compression_estimated_data_size`. These properties are present in block-based tables when `ColumnFamilyOptions::sample_for_compression > 0` and the necessary compression library is supported when the file is generated. They contain estimates of what `TableProperties::data_size` would be if the "fast"/"slow" compression library had been used instead. One limitation is we do not record exactly which "fast" (ZSTD or Zlib) or "slow" (LZ4 or Snappy) compression library produced the result. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139 Test Plan: - new unit test - ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression Reviewed By: riversand963 Differential Revision: D27454338 Pulled By: ajkr fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4	2021-03-31 18:21:50 -07:00
Zhichao Cao	335c5a6be5	Fix error_handler_fs_test failure due to statistics (#8136 ) Summary: Fix error_handler_fs_test failure due to statistics, it will fails due to multi-thread running and resume is different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8136 Test Plan: make check Reviewed By: akankshamahajan15 Differential Revision: D27448828 Pulled By: zhichao-cao fbshipit-source-id: b94255c45e9e66e93334b5ca2e4e1bfcba23fc20	2021-03-30 21:44:44 -07:00
sherriiiliu	e6534900bd	Fix possible hang issue in ~DBImpl() when flush is scheduled in LOW pool (#8125 ) Summary: In DBImpl::CloseHelper, we wait for bg_compaction_scheduled_ and bg_flush_scheduled_ to drop to 0. Unschedule is called prior to cancel any unscheduled flushes/compactions. It is assumed that anything in the high priority is a flush, and anything in the low priority pool is a compaction. This assumption, however, is broken when the high-pri pool is full. As a result, bg_compaction_scheduled_ can go < 0 and bg_flush_scheduled_ will remain > 0 and DB can be in hang state. The fix is, we decrement the `bg_{flush,compaction,bottom_compaction}_scheduled_` inside the `Unschedule{Flush,Compaction,BottomCompaction}Callback()`s. DB `mutex_` will make the counts atomic in `Unschedule`. Related discussion: https://github.com/facebook/rocksdb/issues/7928 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8125 Test Plan: Added new test case which hangs without the fix. Reviewed By: jay-zhuang Differential Revision: D27390043 Pulled By: ajkr fbshipit-source-id: 78a367fba9a59ac5607ad24bd1c46dc16d5ec110	2021-03-30 18:35:20 -07:00
Jay Zhuang	a037bb35e9	Compaction should not move data to up level (#8116 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8116 Reviewed By: ajkr, mrambacher Differential Revision: D27353828 Pulled By: jay-zhuang fbshipit-source-id: 42703fb01b04d92cc097d7979e64798448852e88	2021-03-29 17:10:42 -07:00
anand76	7d7f14480e	Always truncate the latest WAL file on DB Open (#8122 ) Summary: Currently, we only truncate the latest alive WAL files when the DB is opened. If the latest WAL file is empty or was flushed during Open, its not truncated since the file will be deleted later on in the Open path. However, before deletion, a new WAL file is created, and if the process crash loops between the new WAL file creation and deletion of the old WAL file, the preallocated space will keep accumulating and eventually use up all disk space. To prevent this, always truncate the latest WAL file, even if its empty or the data was flushed. Tests: Add unit tests to db_wal_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/8122 Reviewed By: riversand963 Differential Revision: D27366132 Pulled By: anand1976 fbshipit-source-id: f923cc03ef033ccb32b140d36c6a63a8152f0e8e	2021-03-28 10:00:08 -07:00
anand76	c5f52714fb	Use malloc in rocksdb_transaction_get_snapshot (#8114 ) Summary: The snapshot structure returned by rocksdb_transaction_get_snapshot is supposed to be freed by calling rocksdb_free(), so allocate using malloc rather than new. Fixes https://github.com/facebook/rocksdb/issues/6112 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8114 Reviewed By: akankshamahajan15 Differential Revision: D27362923 Pulled By: anand1976 fbshipit-source-id: e93a8b1ffe26dafbe22529907f72b796ae971214	2021-03-26 15:51:34 -07:00
Zhichao Cao	7f27767efa	Remove disabled tests (#8123 ) Summary: Remove disabled tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/8123 Test Plan: make check Reviewed By: ltamasi Differential Revision: D27367066 Pulled By: zhichao-cao fbshipit-source-id: 71fa1d492d9b0144decff0a1d0e0ef25c0ecc4ba	2021-03-26 12:49:00 -07:00
Levi Tamasi	303cb23a0f	Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112 ) Summary: The patch adds a resource management/RAII class called `ThreadGuard`, which can be used to ensure that the managed thread is joined when the `ThreadGuard` is destroyed, regardless of whether it is due to the object going out of scope, an early return, an exception etc. This is important because if an `std::thread` object is destroyed without having been joined (or detached) first, the process is aborted (via `std::terminate`). For now, `ThreadGuard` is only used in the test case `ExternalSSTFileTest.PickedLevelBug`; however, it could come in handy elsewhere in the codebase as well (both in test code and "real" code). Case in point: in the `PickedLevelBug` test case, with the earlier code we could end up in the above situation when the following assertion (which is before the threads are joined) is triggered: ``` ASSERT_FALSE(bg_compact_started.load()); ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8112 Test Plan: ``` make check gtest-parallel --repeat=10000 ./external_sst_file_test --gtest_filter="*PickedLevelBug" ``` Reviewed By: riversand963 Differential Revision: D27343185 Pulled By: ltamasi fbshipit-source-id: 2a8c3aa68bc78cc03ec0dbae909fb25c2cd15c69	2021-03-25 22:08:58 -07:00
Zhichao Cao	af80a78ba4	Fix flush no wal IO error bug (#8107 ) Summary: There is bug in the current code base introduced in https://github.com/facebook/rocksdb/issues/8049 , we still set the SST file write IO Error only case as hard error. Fix it by removing the logic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8107 Test Plan: make check, error_handler_fs_test Reviewed By: anand1976 Differential Revision: D27321422 Pulled By: zhichao-cao fbshipit-source-id: c014afc1553ca66b655e3bbf9d0bf6eb417ccf94	2021-03-25 21:42:50 -07:00
storagezhang	711881bc25	Fix some typos in comments (#8066 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8066 Reviewed By: jay-zhuang Differential Revision: D27280799 Pulled By: mrambacher fbshipit-source-id: 68f91f5af4ffe0a84be581961bf9366887f47702	2021-03-25 21:18:08 -07:00
Andrew Kryczka	c20a7cd6c7	Apply `sample_for_compression` to all block-based tables (#8105 ) Summary: Previously it only applied to block-based tables generated by flush. This restriction was undocumented and blocked a new use case. Now compression sampling applies to all block-based tables we generate when it is enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8105 Test Plan: new unit test Reviewed By: riversand963 Differential Revision: D27317275 Pulled By: ajkr fbshipit-source-id: cd9fcc5178d6515e8cb59c6facb5ac01893cb5b0	2021-03-25 15:00:45 -07:00
Jay Zhuang	45c65d6dcf	Use thread-safe `strerror_r()` to get error message (#8087 ) Summary: `strerror()` is not thread-safe, using `strerror_r()` instead. The API could be different on the different platforms, used the code from `0deef031cb/folly/String.cpp (L457)` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8087 Reviewed By: mrambacher Differential Revision: D27267151 Pulled By: jay-zhuang fbshipit-source-id: 4b8856d1ec069d5f239b764750682c56e5be9ddb	2021-03-24 23:07:27 -07:00
Connor	f06b761185	Fix unexpected compaction error for compact files (#8024 ) Summary: Summary: When doing CompactFiles on the files of multiple levels(num_level > 2) with L0 is included, the compaction would fail like this. ![image](https://user-images.githubusercontent.com/13497871/109975371-8b601280-7d35-11eb-830f-f732dc1f9246.png) The reason is that in `VerifyCompactionFileConsistency` it checks the levels between the L0 and base level should be empty, but it regards the compaction triggered by `CompactFiles` as an L0 -> base level compaction wrongly. The condition is committed several years ago, whereas it isn't correct anymore. ```c++ if (vstorage->compaction_style_ == kCompactionStyleLevel && c->start_level() == 0 && c->num_input_levels() > 2U) ``` So this PR just deletes the incorrect check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8024 Test Plan: make check Reviewed By: jay-zhuang Differential Revision: D26907060 Pulled By: ajkr fbshipit-source-id: 538cef32faf464cd422e3f8de236ea3e58880c2b	2021-03-24 21:18:03 -07:00
Akanksha Mahajan	41e554da2b	Fix Race condition in db_sst_test (#8092 ) Summary: Fix race condition in DBSSTTest.DBWithMaxSpaceAllowedWithBlobFiles where background flush thread updates delete_blob_file but in test thread Flush() already completes after getting bg_error and delete_blob_file remains false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8092 Test Plan: Ran ASAN job few times on CircleCI Reviewed By: riversand963 Differential Revision: D27275815 Pulled By: akankshamahajan15 fbshipit-source-id: 2939ad1671403881573bbe07c71aa474c5019130	2021-03-23 17:38:52 -07:00
Yanqin Jin	9f7c02dad5	Move compacted_db_impl.[c\|h] to db/db_impl (#8082 ) Summary: As title. All core db implementations should stay in db_impl. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8082 Test Plan: make check Reviewed By: ajkr Differential Revision: D27211442 Pulled By: riversand963 fbshipit-source-id: e0953fde75064740e899aaff7989ff033b7f5232	2021-03-23 13:49:26 -07:00
storagezhang	c8b0842bcd	Remove unused variable (#8067 ) Summary: Remove unused variable `Slice blob_to_write` in `db/blob/blob_file_cache_test.cc`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8067 Reviewed By: zhichao-cao Differential Revision: D27107693 Pulled By: riversand963 fbshipit-source-id: 9bfd4d296a6a1714ad5c1fa5bb231a0c52dbd56d	2021-03-19 12:13:59 -07:00
storagezhang	d9be6556aa	Include C++ standard library headers instead of C compatibility headers (#8068 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8068 Reviewed By: zhichao-cao Differential Revision: D27147685 Pulled By: riversand963 fbshipit-source-id: 5428b1c0142ecae17c977fba31a6d49b52983d1c	2021-03-19 12:09:47 -07:00
storagezhang	c706324208	Add default in switch (#8065 ) Summary: switch may not cover all branch in `db/c.cc`: ```c++ void rocksdb_options_set_access_hint_on_compaction_start( rocksdb_options_t* opt, int v) { switch(v) { case 0: opt->rep.access_hint_on_compaction_start = ROCKSDB_NAMESPACE::Options::NONE; break; case 1: opt->rep.access_hint_on_compaction_start = ROCKSDB_NAMESPACE::Options::NORMAL; break; case 2: opt->rep.access_hint_on_compaction_start = ROCKSDB_NAMESPACE::Options::SEQUENTIAL; break; case 3: opt->rep.access_hint_on_compaction_start = ROCKSDB_NAMESPACE::Options::WILLNEED; break; } } ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8065 Reviewed By: riversand963 Differential Revision: D27102892 Pulled By: zhichao-cao fbshipit-source-id: ad1d20d192712878e61597311ba75b55df0066d7	2021-03-19 11:57:52 -07:00
Zhichao Cao	dd0447ae2c	Add new Append API with DataVerificationInfo to Env WritableFile (#8071 ) Summary: Add the new Append and PositionedAppend API to env WritableFile. User is able to benefit from the write checksum handoff API when using the legacy Env classes. FileSystem already implemented the checksum handoff API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8071 Test Plan: make check, added new unit test. Reviewed By: anand1976 Differential Revision: D27177043 Pulled By: zhichao-cao fbshipit-source-id: 430c8331fc81099fa6d00f4fff703b68b9e8080e	2021-03-19 11:44:13 -07:00
Yanqin Jin	7ee41a5d25	Fix a test failure when built with ASSERT_STATUS_CHECKED=1 (#8075 ) Summary: As title. Test plan ASSERT_STATUS_CHECKED=1 make -j20 backupable_db_test error_handler_fs_test ./backupable_db_test ./error_handler_fs_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/8075 Reviewed By: zhichao-cao Differential Revision: D27173832 Pulled By: riversand963 fbshipit-source-id: 37dac50f7c89127804ff2572abddd4174642de30	2021-03-18 21:52:48 -07:00
Zhichao Cao	c810947184	Separate handling of WAL Sync io error with SST flush io error (#8049 ) Summary: In previous codebase, if WAL is used, all the retryable IO Error will be treated as hard error. So write is stalled. In this PR, the retryable IO error from WAL sync is separated from SST file flush io error. If WAL Sync is ok and retryable IO Error only happens during SST flush, the error is mapped to soft error. So user can continue insert to Memtable and append to WAL. Resolve the bug that if WAL sync fails, the memtable status does not roll back due to calling PickMemtable early than calling and checking SyncClosedLog. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8049 Test Plan: added new unit test, make check Reviewed By: anand1976 Differential Revision: D26965529 Pulled By: zhichao-cao fbshipit-source-id: f5fecb66602212523c92ee49d7edcb6065982410	2021-03-18 14:33:16 -07:00

1 2 3 4 5 ...

4423 commits