Commit graph

769 commits

Author SHA1 Message Date
Igor Canadi 8a9fca2619 Better error handling in BackupEngine
Summary:
Couple of changes here:
* NewBackupEngine() and NewReadOnlyBackupEngine() are now removed. They were deprecated since RocksDB 3.8. Changing these to new functions should be pretty straight-forward. As a followup, I'll fix all fbcode callsights
* Instead of initializing backup engine in the constructor, we initialize it in a separate function now. That way, we can catch all errors and return appropriate status code.
* We catch all errors during initializations and return them to the client properly.
* Added new tests to backupable_db_test, to make sure that we can't open BackupEngine when there are Env errors.
* Transitioned backupable_db_test to use BackupEngine rather than BackupableDB. From the two available APIs, judging by the current use-cases, it looks like BackupEngine API won. It's much more flexible since it doesn't require StackableDB.

Test Plan: Added a new unit test to backupable_db_test

Reviewers: yhchiang, sdong, AaronFeldman

Reviewed By: AaronFeldman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41925
2015-07-14 20:51:36 +02:00
Igor Canadi a9c5109515 Deprecate purge_redundant_kvs_while_flush
Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.

Test Plan: none

Reviewers: sdong, yhchiang, anthony, dhruba

Reviewed By: dhruba

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42063
2015-07-14 13:07:02 +02:00
Igor Canadi 5aea98ddd8 Deprecate WriteOptions::timeout_hint_us
Summary:
In one of our recent meetings, we discussed deprecating features that are not being actively used. One of those features, at least within Facebook, is timeout_hint. The feature is really nicely implemented, but if nobody needs it, we should remove it from our code-base (until we get a valid use-case). Some arguments:
* Less code == better icache hit rate, smaller builds, simpler code
* The motivation for adding timeout_hint_us was to work-around RocksDB's stall issue. However, we're currently addressing the stall issue itself (see @sdong's recent work on stall write_rate), so we should never see sharp lock-ups in the future.
* Nobody is using the feature within Facebook's code-base. Googling for `timeout_hint_us` also doesn't yield any users.

Test Plan: make check

Reviewers: anthony, kradhakrishnan, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41937
2015-07-14 09:35:48 +02:00
Ari Ekmekji 8bca83e5dd Add tombstone information in CompactionJobStats
Summary:
Added new statistics in CompactionJobStats to keep track of
deletion entries and the expiration of those entries. Updated these
fields in compaction_job.cc as compaction took place and wrote a new
test in compaction_job_stats_test.cc to verify accuracy.

Test Plan:
Wrote new test DeletionStatsTest in
compaction_job_stats_test.cc to verify

Reviewers: sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41355
2015-07-13 15:51:38 -07:00
sdong f9728640f3 "make format" against last 10 commits
Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more.

Test Plan: Build it.

Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41961
2015-07-13 13:50:18 -07:00
sdong 76d3cd3286 Fix public API dependency on internal codes and dependency on MAX_INT32
Summary:
Public API depends on port/port.h which is wrong. Fix it.
Also with gcc 4.8.1 build was broken as MAX_INT32 was not recognized. Fix it by using ::max in linux.

Test Plan: Build it and try to build an external project on top of it.

Reviewers: anthony, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41745
2015-07-11 10:32:11 -07:00
sdong a6e38fd170 Fix a uncleaned counter in PerfContext::Reset()
Summary:
new_table_iterator_nanos is not cleaned in PerfContext::Reset() while new_table_block_iter_nanos is cleaned twice. Fix it.
Also fix a comment.

Test Plan: Build and db_bench with --perf_context to see the value shown.

Reviewers: kradhakrishnan, anthony, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41721
2015-07-10 16:27:32 -07:00
Siying Dong e41cbd9c2f Merge pull request #646 from yuslepukhin/ms_win_port
Windows Port from Microsoft
2015-07-10 15:53:39 -07:00
Dmitri Smirnov 296de4ae6d Address review comments
Rule of five: add destructor
 Add a note to COMMIT.md for 3rd party json.
2015-07-10 15:21:04 -07:00
sdong 041b6f95a2 perf_context: report time spent on reading index and bloom blocks
Summary: Add a perf context counter to help users figure out time spent on reading indexes and bloom filter blocks.

Test Plan: Will write a unit test

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41433
2015-07-10 14:45:42 -07:00
Dmitri Smirnov c903ccc4c2 Merge from github/master 2015-07-09 18:01:08 -07:00
unknown 5c79132335 Revert the changes related to Options, as requested to seperate them into
a different patch.
2015-07-09 11:31:42 -07:00
Dmitri Smirnov ef4b87f1b2 Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00
Poornima Chozhiyath Raman c0b23dd5b0 Enabling trivial move in universal compaction
Summary: This change enables trivial move if all the input files are non onverlapping while doing Universal Compaction.

Test Plan: ./compaction_picker_test and db_test ran successfully with the new testcases.

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40875
2015-07-07 14:18:55 -07:00
Dmitri Smirnov d2f0912bd3 Merge the latest changes from github/master 2015-07-02 17:23:41 -07:00
Ari Ekmekji 35cd75c379 Introduce InfoLogLevel::HEADER_LEVEL
Summary:
 Introduced a new category in the enum InfoLogLevel in env.h.
 Modifed Log() in env.cc to use the Header()
 when the InfoLogLevel == HEADER_LEVEL.
 Updated tests in auto_roll_logger_test to ensure
 the header is handled properly in these cases.

Test Plan: Augment existing tests in auto_roll_logger_test

Reviewers: igor, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41067
2015-07-02 17:14:39 -07:00
Dmitri Smirnov feb99c31a4 Merge remote-tracking branch 'origin' into ms_win_port 2015-07-02 13:14:18 -07:00
agiardullo 4159f5b87b Prepare 3.12
Summary: About to cut release

Test Plan: none

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41061
2015-07-02 12:20:36 -07:00
Aaron Feldman a69bc91e37 Multithreaded backup and restore in BackupEngineImpl
Summary:
Add a new field: BackupableDBOptions.max_background_copies.
CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
If there is a backup rate limit, then max_background_copies must be 1.
Update backupable_db_test.cc to test multi-threaded backup and restore.
Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.

Test Plan:
Run ./backupable_db_test
Run valgrind ./backupable_db_test
Run with TSAN and ASAN

Reviewers: yhchiang, rven, anthony, sdong, igor

Reviewed By: igor

Subscribers: yhchiang, anthony, sdong, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40725
2015-07-02 11:35:51 -07:00
Dmitri Smirnov 9dbde7277c Merge remote-tracking branch 'origin' into ms_win_port 2015-07-02 11:34:22 -07:00
Dmitri Smirnov ca2fe2c1b6 Address GCC compilation issues
invalid suffix on literal
 no return statement in function returning non-void CuckooStep::operator=
 extra qualification ‘rocksdb::spatial::Variant::
 dereferencing type-punned pointer will break strict-aliasing rules
2015-07-01 17:04:08 -07:00
Dmitri Smirnov 18285c1e2f Windows Port from Microsoft
Summary: Make RocksDb build and run on Windows to be functionally
 complete and performant. All existing test cases run with no
 regressions. Performance numbers are in the pull-request.

 Test plan: make all of the existing unit tests pass, obtain perf numbers.

 Co-authored-by: Praveen Rao praveensinghrao@outlook.com
 Co-authored-by: Sherlock Huang baihan.huang@gmail.com
 Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
 Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
2015-07-01 16:13:56 -07:00
Igor Canadi 4cbc4e6f88 Call merge operators with empty values
Summary: It's not really nice to call user's API with garbage data in new_value. This diff makes sure that new_value is empty before calling the merge operator.

Test Plan: Added assert to Merge operator in merge_test

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D40773
2015-06-26 11:35:46 -07:00
Yueh-Hsuan Chiang 48da7a9cad Improve the comment for BYTES_READ in statistics.
Summary:
BYTES_READ only count the number of logical bytes read from
the DB::Get() function.  It neither includes all logical bytes read
nor indicates IO read bytes.

This patch improves the comment for BYTES_READ.

Test Plan: Only change comment.

Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D40599
2015-06-24 15:00:51 -07:00
Islam AbdelRahman 674b1181cf Bottommost level compaction option
Summary: Replace force_bottommost_level_compaction in CompactRangeOption with an option that allow the user to (always skip, always compact, compact if compaction filter is present) the bottommost level for level based compaction.

Test Plan: make check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D40527
2015-06-23 13:32:40 -07:00
Giuseppe Ottaviano 782a1590f9 Implement a table-level row cache
Summary:
Implementation of a table-level row cache.
It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache.

Supports snapshots and merge operations.

Test Plan: Ran `make valgrind_check commit-prereq`

Reviewers: igor, philipp, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39849
2015-06-23 10:25:45 -07:00
krad de85e4cadf Introduce WAL recovery consistency levels
Summary:
The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL.

1. RecoverAfterRestart (kTolerateCorruptedTailRecords)

This mocks the current recovery mode.

2. RecoverAfterCleanShutdown (kAbsoluteConsistency)

This is ideal for unit test and cases where the store is shutdown cleanly. We tolerate no corruption or incomplete writes.

3. RecoverPointInTime (kPointInTimeRecovery)

This is ideal when using devices with controller cache or file systems which can loose data on restart. We recover upto the point were is no corruption or incomplete write.

4. RecoverAfterDisaster (kSkipAnyCorruptRecord)

This is ideal mode to recover data. We tolerate corruption and incomplete writes, and we hop over those sections that we cannot make sense of salvaging as many records as possible.

Test Plan:
(1) Run added unit test to cover all levels.
(2) Run make check.

Reviewers: leveldb, sdong, igor

Subscribers: yoshinorim, dhruba

Differential Revision: https://reviews.facebook.net/D38487
2015-06-22 15:28:12 -07:00
krad 7015fd81c4 Add read_nanos to IOStatsContext.
Summary: MyRocks need a mechanism to track read outliers. We need to expose this
stat.

Test Plan: None

Reviewers: sdong

CC: leveldb

Task ID: #7152512

Blame Rev:
2015-06-22 11:09:35 -07:00
Yueh-Hsuan Chiang eade498bda Block utilities/write_batch_with_index in ROCKSDB_LITE
Summary:
Block utilities/write_batch_with_index in ROCKSDB_LITE as we
don't include anly utilities in ROCKSDB_LITE

Test Plan: write_batch_with_index_test

Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D40347
2015-06-18 15:54:05 -07:00
Igor Canadi 760e9a94de Fail DB::Open() when the requested compression is not available
Summary:
Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users.

This patch fails DB::Open() if we don't support the compression that is specified in the options.

Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails.

Reviewers: sdong, MarkCallaghan, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39687
2015-06-18 14:55:05 -07:00
Aaron Feldman 69bb210d58 Add Cache.GetPinnedUsageUsage()
Summary:
  Add the funcion Cache.GetPinnedUsage() to return the memory size of entries
  that are in use by the system (that is, all the entries not in the LRU list).

Test Plan:
  Run ./cache_test and examine PinnedUsageTest.

Reviewers: tnovak, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D40305
2015-06-18 13:56:31 -07:00
Islam AbdelRahman 4eabbdb7ec Skip bottommost level compaction if possible
Summary:
This is https://reviews.facebook.net/D39999 but after introducing an option to force compaction the bottom most level

Changes in this patch
- Introduce force_bottommost_level_compaction to CompactRangeOptions that force compacting bottommost level during compaction
- Skip bottommost level compaction if we dont have a compaction filter and force_bottommost_level_compaction options is not set

Although tests pass on my machine but I suspect that there maybe some tests that I am not aware of that  should use force_bottommost_level_compaction to pass in a deterministic way

Test Plan:
make check
adding new tests

Reviewers: igor, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D40059
2015-06-18 11:03:31 -07:00
Igor Canadi 4b8bb62f0a Don't dump DBOptions for each column family
Summary: Currently we dump DBOptions for each column family options we dump. This leads to duplicate lines in our LOG file. This diff fixes that.

Test Plan: Check out the LOG

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: IslamAbdelRahman, yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39729
2015-06-18 10:15:54 -07:00
Islam AbdelRahman 12e030a992 Use CompactRangeOptions for CompactRange
Summary:
This diff update DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters
Old CompactRange is still available but deprecated

Test Plan:
make all check
make rocksdbjava
USE_CLANG=1 make all
OPT=-DROCKSDB_LITE make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D40209
2015-06-17 14:36:14 -07:00
Holodov Alexander 84a9c6a53a add comment 2015-05-16 15:29:39 +04:00
Holodov Alexander eeb44366ba C api: human-readable statistics 2015-05-16 12:34:28 +04:00
sdong 40f562e747 Allow GetApproximateSize() to include mem table size if it is skip list memtable
Summary:
Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables.
To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>.

Test Plan: Add a test case

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40119
2015-06-16 18:13:23 -07:00
sdong 7842920be5 Slow down writes by bytes written
Summary:
We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch.

The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work

hard_rate_limit is deprecated.

options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up.

Test Plan: Add new unit tests in db_test

Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor

Reviewed By: igor

Subscribers: ikabiljo, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36351
2015-06-11 20:42:18 -07:00
Islam AbdelRahman d6ce0f7c61 Add largest sequence to FlushJobInfo
Summary:
Adding largest sequence number to FlushJobInfo
and passing flushed file metadata to NotifyOnFlushCompleted which include alot of other values that we may want to expose in FlushJobInfo

Test Plan: make check

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39927
2015-06-11 15:22:22 -07:00
Yueh-Hsuan Chiang 3eddd1abe9 Add Env::GetThreadID(), which returns the ID of the current thread.
Summary:
Add Env::GetThreadID(), which returns the ID of the current thread.

In addition, make GetThreadList() and InfoLog use same unique ID for the same thread.

Test Plan:
db_test
listener_test

Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39735
2015-06-11 14:18:02 -07:00
Igor Canadi 821cff114e Re-generate WriteEntry on WBWIIterator::Entry()
Summary:
[This is the resubmit of D39813. Tests were failing, so I reverted the diff. I found the bug and I'm now resubmitting]

If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).

Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39897
2015-06-10 12:57:38 -07:00
Igor Canadi 75222d130e Revert "Fix compile"
This reverts commit 51440f83ec.

Revert "Re-generate WriteEntry on WBWIIterator::Entry()"

This reverts commit 4949ef08db.
2015-06-10 11:05:27 -07:00
Igor Canadi 47f1e72125 Merge pull request #630 from rdallman/c-wb-logdata
C: add WriteBatch.PutLogData support
2015-06-10 10:50:28 -07:00
Igor Canadi 4949ef08db Re-generate WriteEntry on WBWIIterator::Entry()
Summary: If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).

Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39813
2015-06-10 10:35:19 -07:00
Reed Allman 735df66552 C: add WriteBatch.PutLogData support 2015-06-10 00:12:33 -07:00
Islam AbdelRahman 643bbbf081 Use nullptr for default compaction_filter_factory
Summary:
Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2
The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI

Test Plan: make check

Reviewers: yoshinorim, ott, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39693
2015-06-08 16:34:26 -07:00
Igor Canadi 133130a4f5 Merge pull request #625 from rdallman/c-slice-parts-support
C: add support for WriteBatch SliceParts params
2015-06-08 13:14:44 -04:00
Igor Canadi de4d172d0f Merge pull request #622 from rdallman/c-multiget
C: add MultiGet support
2015-06-08 13:13:54 -04:00
sdong 6df589b446 Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files
Summary:
It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file.
It can be used to allow users to suggest DB to clear up delete tombstones faster.

Test Plan: Add a unit test.

Reviewers: igor, yhchiang, kradhakrishnan, rven

Reviewed By: rven

Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39585
2015-06-05 20:18:21 -07:00
Yueh-Hsuan Chiang 2e764f06ea [API Change] Improve EventListener::OnFlushCompleted interface
Summary:
EventListener::OnFlushCompleted() now passes a structure instead
of a list of parameters.  This minimizes the API change in the
future.

Test Plan:
listener_test
compact_files_test
example/compact_files_example

Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor

Reviewed By: rven, igor

Subscribers: IslamAbdelRahman, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39543
2015-06-05 12:28:51 -07:00