rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-30 04:41:49 +00:00

Author	SHA1	Message	Date
krad	6199cba998	Fix race in unit test. Summary: Avoid falling victim to race condition. Test Plan: Run the unit test Reviewers: sdong igor CC: leveldb@ Task ID: #7312624 Blame Rev:	2015-06-29 11:40:21 -07:00
Venkatesh Radhakrishnan	c9cd404bcd	Make flush check for shutdown Summary: Fixes task 7156865 where a compaction causes a hang in flush memtable if CancelAllBackgroundWork was called prior to it. Stack trace is in : https://phabricator.fb.com/P19848829 We end up waiting for a flush which will never happen because there are no background threads. Test Plan: PreShutdownFlush Reviewers: sdong, igor Reviewed By: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40617	2015-06-25 14:43:25 -07:00
Poornima Chozhiyath Raman	4fb09c6871	Updating SeekToLast with upper bound Summary: #7124486: RocksDB's Iterator.SeekToLast should seek to the last key before iterate_upper_bound if present Test Plan: ./db_iter_test run successfully with the new testcase Reviewers: rven, yhchiang, igor, anthony, kradhakrishnan, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40425	2015-06-25 09:44:30 -07:00
Islam AbdelRahman	674b1181cf	Bottommost level compaction option Summary: Replace force_bottommost_level_compaction in CompactRangeOption with an option that allows the user to (always skip, always compact, compact if compaction filter is present) the bottommost level for level-based compaction. Test Plan: make check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40527	2015-06-23 13:32:40 -07:00
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	2015-06-23 10:25:45 -07:00
krad	de85e4cadf	Introduce WAL recovery consistency levels Summary: The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL. 1. RecoverAfterRestart (kTolerateCorruptedTailRecords) This mimics the current recovery mode. 2. RecoverAfterCleanShutdown (kAbsoluteConsistency) This is ideal for unit tests and cases where the store is shut down cleanly. We tolerate no corruption or incomplete writes. 3. RecoverPointInTime (kPointInTimeRecovery) This is ideal when using devices with controller cache or file systems which can lose data on restart. We recover up to the point where there is no corruption or incomplete write. 4. RecoverAfterDisaster (kSkipAnyCorruptRecord) This is the ideal mode to recover data. We tolerate corruption and incomplete writes, and we hop over those sections that we cannot make sense of, salvaging as many records as possible. Test Plan: (1) Run added unit test to cover all levels. (2) Run make check. Reviewers: leveldb, sdong, igor Subscribers: yoshinorim, dhruba Differential Revision: https://reviews.facebook.net/D38487	2015-06-22 15:28:12 -07:00
Venkatesh Radhakrishnan	04251e1e3a	Add wal files to Checkpoint for multiple column families. Summary: When there are multiple column families, the flush in GetLiveFiles is not atomic, so that there are entries in the wal files which are needed to get a consistent RocksDB. We now add the log files to the checkpoint. Test Plan: CheckpointCF - This test forces more data to be written to the other column families after the flush of the first column family but before the second. Reviewers: igor, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40323	2015-06-19 16:08:31 -07:00
Igor Canadi	bf03f59c11	Disable CompressLevelCompaction() if Zlib is not supported Summary: CompressLevelCompaction() depends on Zlib. We should skip it when zlib is not present. Test Plan: `make check` without zlib Reviewers: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40401	2015-06-18 18:46:26 -07:00
Igor Canadi	760e9a94de	Fail DB::Open() when the requested compression is not available Summary: Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users. This patch fails DB::Open() if we don't support the compression that is specified in the options. Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails. Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39687	2015-06-18 14:55:05 -07:00
Islam AbdelRahman	4eabbdb7ec	Skip bottommost level compaction if possible Summary: This is https://reviews.facebook.net/D39999 but after introducing an option to force compaction of the bottommost level. Changes in this patch: - Introduce force_bottommost_level_compaction to CompactRangeOptions that forces compacting the bottommost level during compaction - Skip bottommost level compaction if we don't have a compaction filter and the force_bottommost_level_compaction option is not set. Although tests pass on my machine, I suspect that there may be some tests that I am not aware of that should use force_bottommost_level_compaction to pass in a deterministic way Test Plan: make check adding new tests Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40059	2015-06-18 11:03:31 -07:00
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff updates DB::CompactRange to use CompactRangeOptions instead of using multiple parameters. Old CompactRange is still available but deprecated Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	2015-06-17 14:36:14 -07:00
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count, while finding the entry, how many Next() calls are issued at each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	2015-06-16 18:13:23 -07:00
Islam AbdelRahman	cccd2199a6	Revert skip bottommost compaction Summary: Reverting this diff https://reviews.facebook.net/D39999 Will add an option to force bottom most level compaction and then re submit it Test Plan: make check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40041	2015-06-12 10:43:33 -07:00
Islam AbdelRahman	20f2b54252	Skip bottom most level compaction if no compaction filter Summary: If we don't have a compaction filter then we can skip compacting the bottom most level Test Plan: make check added unit tests Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39999	2015-06-12 09:56:08 -07:00
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work. hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	2015-06-11 20:42:18 -07:00
Igor Canadi	a84df655f3	Don't let two L0->L1 compactions run in parallel Summary: With experimental feature SuggestCompactRange() we don't restrict running two L0->L1 compactions in parallel. This diff fixes this. Test Plan: added a unit test to reproduce the failure. fixed the unit test Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39981	2015-06-11 15:42:16 -07:00
Islam AbdelRahman	73faa3d41d	Handling edge cases for ReFitLevel Summary: Right now the level we pass to ReFitLevel is the maximum level with files (before compaction), there are multiple cases where this maximum level has changed after compaction - all files were in L0 (now maximum level is L1) - using kCompactionStyleUniversal (now maximum level is the last level) - level_compaction_dynamic_level_bytes ?? We can handle each of these cases individually, but I felt it's safer to calculate max_level_with_files again if we want to do a ReFitLevel Test Plan: adding some tests make -j64 check Reviewers: igor, sdong Reviewed By: sdong Subscribers: ott, dhruba Differential Revision: https://reviews.facebook.net/D39663	2015-06-11 14:15:52 -07:00
sdong	e409d3d745	Make "make all" work for CYGWIN Summary: Some test and benchmark codes don't build for CYGWIN. Fix it. Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes. Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan Reviewed By: igor, kradhakrishnan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39711	2015-06-09 16:36:07 -07:00
Venkatesh Radhakrishnan	406a5682eb	Fix hang when closing a DB after doing loads with WAL disabled. Summary: There is a hang during DB close in the following scenario: a) a load with WAL disabled was done, b) CancelAllBackgroundWork was called, c) DB Close was called. This was because we wait for a flush, but we cannot do a background flush because we have called CancelAllBackgroundWork, which marks the DB as shutting down. Test Plan: Added DBTest FlushOnDestroy Reviewers: sdong Reviewed By: sdong Subscribers: yoshinorim, hermanlee4, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39747	2015-06-09 10:39:49 -07:00
sdong	6df589b446	Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files Summary: It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file. It can be used to allow users to suggest DB to clear up delete tombstones faster. Test Plan: Add a unit test. Reviewers: igor, yhchiang, kradhakrishnan, rven Reviewed By: rven Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39585	2015-06-05 20:18:21 -07:00
Yueh-Hsuan Chiang	7322c74012	Revert incorrect commit Summary: Revert incorrect commit Test Plan: db_test Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39651	2015-06-05 11:23:09 -07:00
Yueh-Hsuan Chiang	7647df8f9e	Fixed the tsan failure in util/compaction_job_stats_impl.cc Summary: The type of smallest_output_key_prefix and largest_output_key_prefix have been changed to std::string in https://reviews.facebook.net/D39537. As a result, we shouldn't do smallest_output_key_prefix[0] = 0 in the initialization. Test Plan: compile db_test with tsan enabled and repeat DBTest.CompactionDeletionTrigger test to verify the tsan issue is gone. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39645	2015-06-05 11:05:35 -07:00
Islam AbdelRahman	3ce3bb3da2	Allowing L0 -> L1 trivial move on sorted data Summary: This diff updates the logic of how we do trivial move. Now trivial move can run on any number of files in input level as long as they are not overlapping. The conditions for trivial move have been updated. Introduced conditions: - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual) - Input level files cannot be overlapping Removed conditions: - Trivial move only runs when the compaction is not manual - Input level can contain only 1 file More context on what tests failed because of Trivial move ``` DBTest.CompactionsGenerateMultipleFiles This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1 ``` ``` DBTest.NoSpaceCompactRange This test expects compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation because trivial move does not need any extra space, and did not check for that ``` ``` DBTest.DropWrites Similar to DBTest.NoSpaceCompactRange ``` ``` DBTest.DeleteObsoleteFilesPendingOutputs This test expects that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3 ``` ``` CuckooTableDBTest.CompactionIntoMultipleFiles Same as DBTest.CompactionsGenerateMultipleFiles ``` This diff is based on a work by @sdong https://reviews.facebook.net/D34149 Test Plan: make -j64 check Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: yhchiang, ott, march, dhruba, sdong Differential Revision: https://reviews.facebook.net/D34797	2015-06-04 16:51:25 -07:00
Yueh-Hsuan Chiang	0b3172d071	Add EventListener::OnTableFileDeletion() Summary: Add EventListener::OnTableFileDeletion(), which will be called when a table file is deleted. Test Plan: Extend three existing tests in db_test to verify the deleted files. Reviewers: rven, anthony, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38931	2015-06-03 19:57:01 -07:00
sdong	3af668ed17	Fix DBTest.MigrateToDynamicLevelMaxBytesBase slowness with valgrind Summary: DBTest.MigrateToDynamicLevelMaxBytesBase with valgrind test is extremely slow. Work it around by not having both threads running everything non-stop. Test Plan: Run the test with valgrind which used to take too long to finish and see it finish in reasonable time. Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39477	2015-06-03 12:08:37 -07:00
Yueh-Hsuan Chiang	898e803fc5	Add a stats counter for DB_WRITE back which was mistakenly removed. Summary: Add a stats counter for DB_WRITE back which was mistakenly removed. Test Plan: augment GroupCommitTest Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39399	2015-06-02 12:35:12 -07:00
sdong	4266d4fd90	Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange() Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true. Test Plan: Add a unit test for it. Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39099	2015-06-01 18:21:14 -07:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang	e2c1d4b57f	[Public API Change] Make DB::GetDbIdentity() be const function. Summary: Make DB::GetDbIdentity() be const function. Test Plan: make db_test Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38745	2015-05-21 11:01:48 -07:00
Igor Canadi	b0fdda4ff0	Allow flushes to run in parallel with manual compaction Summary: As title. I spent some time thinking about it and I don't think there should be any issue with running manual compaction and flushes in parallel Test Plan: make check works Reviewers: rven, yhchiang, sdong Reviewed By: yhchiang, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38355	2015-05-18 15:34:33 -07:00
sdong	fb5bdbf987	DBTest.DynamicLevelMaxBytesCompactRange: make sure L0 is not empty before running compact range Summary: DBTest.DynamicLevelMaxBytesCompactRange needs to make sure L0 is not empty to properly cover the code paths we want to cover. However, current codes have a bug that might leave the condition not held. Improve the test to ensure it. Test Plan: Run the test in an environment that used to fail. Also run it many times. Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38631	2015-05-18 11:49:45 -07:00
sdong	6fa7085121	CompactRange skips levels 1 to base_level -1 for dynamic level base size Summary: CompactRange() now is much more expensive for dynamic level base size as it goes through all the levels. Skip those unused levels between level 0 and base level. Test Plan: Run all unit tests Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37125	2015-05-18 10:54:11 -07:00
sdong	5aad881298	DBTest.DynamicLevelMaxBytesBase2: remove an unnecessary check Summary: DBTest.DynamicLevelMaxBytesBase2 has a check that is not necessary and may fail. Remove it, and add two unrelated checks. Test Plan: Run the test Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38457	2015-05-14 09:22:43 -07:00
sdong	ec43a8b9fb	Universal Compaction with multiple levels won't allocate up to output size Summary: Universal compactions with multiple levels should use file preallocation size based on file size if output level is not level 0 Test Plan: Run all tests. Reviewers: igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38439	2015-05-13 14:15:46 -07:00
agiardullo	711465ccec	API to fetch from both a WriteBatchWithIndex and the db Summary: Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch. (This is needed for transactions). I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep). Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc. WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)). Test Plan: Added new unit tests. Reviewers: rven, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38037	2015-05-11 14:51:51 -07:00
Igor Canadi	3996fff8a1	Fix clang build - add override Summary: In new clang we need to add override to every overridden function Test Plan: none Reviewers: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38259	2015-05-09 11:04:14 -07:00
Igor Canadi	d978139063	SuggestCompactRange() is manual compaction Summary: When reporting compaction that was started because of SuggestCompactRange() we should treat it as manual compaction. Test Plan: none Reviewers: yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38139	2015-05-08 19:37:02 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes that have been read so far. 6. the number of bytes that have been written so far. Flush operation properties will be done in a separate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00
Venkatesh Radhakrishnan	d2346c2cf0	Fix hang with large write batches and column families. Summary: This diff fixes a hang reported by a Github user. https://www.facebook.com/l.php?u=https%3A%2F%2Fgithub.com%2Ffacebook%2Frocksdb%2Fissues%2F595%23issuecomment-96983273&h=9AQFYOWlo Multiple large write batches with column families cause a hang. The issue was caused by not doing flushes/compaction when the write controller was stopped. Test Plan: Create a DBTest from the user's test case Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37929	2015-05-01 15:41:50 -07:00
Igor Canadi	7246ad34d0	Don't compact bottommost level in SuggestCompactRange Summary: Before the fix we also marked the bottommost level for compaction. This is wrong because then RocksDB has N+1 levels instead of N as before the compaction. Test Plan: SuggestCompactRangeTest in db_test Reviewers: yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37869	2015-04-29 13:35:48 -07:00
Igor Canadi	7f47ba0e26	Fix possible SIGSEGV in CompactRange (github issue #596 ) Summary: For very detailed explanation of what's happening read this: https://github.com/facebook/rocksdb/issues/596 Test Plan: make check + new unit test Reviewers: yhchiang, anthony, rven Reviewed By: rven Subscribers: adamretter, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37779	2015-04-29 10:52:31 -07:00
clark.kang	6ede020dc4	fix typos	2015-04-25 18:14:27 +09:00
sdong	d01bbb53ae	Fix CompactRange for universal compaction with num_levels > 1 Summary: CompactRange for universal compaction with num_levels > 1 seems to have a bug. The unit test also has a bug so it doesn't capture the problem. Fix it. Revert the compact range to the logic equivalent to num_levels=1. Always compact all files together. It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that options.write_buffer_size = 100 << 10 and options.write_buffer_size = 100 << 10 are not used in later test scenarios. So write_buffer_size of 4MB was used. The compaction trigger condition is not anymore obvious as expected. Test Plan: Run the new test and all test suites Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37551	2015-04-23 19:12:31 -07:00
Giuseppe Ottaviano	2dc421df48	Implement DB::PromoteL0 method Summary: This diff implements a new `DB` method `PromoteL0` which moves all files in L0 to a given level skipping compaction, provided that the files have disjoint ranges and all levels up to the target level are empty. This method provides finer-grain control for trivial compactions, and it is useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change the semantics of an existing operation, which can impact existing code. PromoteL0 is designed to work well in combination with the proposed `GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure by populating one level at a time. Such fine-grained control can be very useful for static or mostly-static databases. Test Plan: `make check` Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37107	2015-04-23 12:10:36 -07:00
sdong	397b6588bd	options.paranoid_file_checks to read all rows after writing to a file. Summary: To further distinguish whether corruption cases were caused by storage media or by in-memory state when writing it, add a paranoid check after writing the file to iterate all the rows. Test Plan: Add a new unit test for it Reviewers: rven, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37335	2015-04-23 11:34:35 -07:00
Venkatesh Radhakrishnan	618d07b068	Making PreShutdown tests more reliable. Summary: A couple of times on Travis, we have had the thread status say that there were no compactions done and since we assert for it, the test failed. We now fix this by waiting till compaction started. Test Plan: run DBTEST::PreShutdown d=/tmp/j; rm -rf $d; seq 200 \| parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_test --gtest_filter=DBTest.PreShutdown* >& '$d'/log-{}' Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37545	2015-04-23 08:35:02 -07:00
Igor Canadi	6059bdf86a	Add experimental API MarkForCompaction() Summary: Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys). All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too many Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart. MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction. Test Plan: added a new unit test Reviewers: yhchiang, rven, MarkCallaghan, sdong Reviewed By: sdong Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37083	2015-04-17 16:44:45 -07:00
sdong	debaf85ef5	Bug of trivial move of dynamic level Summary: D36669 introduces a bug that trivially moved data is not going to the specific level but the next level, which will incorrectly be level 1 for level 0 compaction if base level is not level 1. Fixing it by respecting the output level Test Plan: Run all tests Reviewers: MarkCallaghan, rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37119	2015-04-14 21:42:08 -07:00
sdong	12d7d3d28d	Fix and Improve DBTest.DynamicLevelCompressionPerLevel2 Summary: Recent change of DBTest.DynamicLevelCompressionPerLevel2 has a bug that the second sync point is not enabled. Fix it. Also add an assert for that. Also, flush compression is not tracked in the test. Add it. Test Plan: Build everything Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37101	2015-04-14 21:42:08 -07:00

538 commits