Commit graph

313 commits

Author SHA1 Message Date
Siying Dong cd7c4143d7 Improve Write Stalling System
Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system.
Closes https://github.com/facebook/rocksdb/pull/1562

Differential Revision: D4218519

Pulled By: siying

fbshipit-source-id: 95e4088
2016-11-23 09:24:15 -08:00
Islam AbdelRahman c1038d2837 Release RocksDB 5.0
Summary:
Update HISTORY.md and version.h
Closes https://github.com/facebook/rocksdb/pull/1536

Differential Revision: D4202987

Pulled By: IslamAbdelRahman

fbshipit-source-id: 94985e3
2016-11-17 18:39:15 -08:00
Yi Wu 36e4762ce0 Remove Ticker::SEQUENCE_NUMBER
Summary:
Remove the ticker count because:
* Having to reset the ticker count in WriteImpl is ineffiecent;
* It doesn't make sense to have it as a ticker count if multiple db
  instance share a statistics object.
Closes https://github.com/facebook/rocksdb/pull/1531

Differential Revision: D4194442

Pulled By: yiwu-arbug

fbshipit-source-id: e2110a9
2016-11-16 22:39:09 -08:00
Andrew Kryczka 0765babe15 Remove LATEST_BACKUP file
Summary:
This has been unused since D42069 but kept around for backward
compatibility. I think it is unlikely anyone will use a much older version of
RocksDB for restore than they use for backup, so I propose removing it. It is
also causing recurring confusion, e.g., https://www.facebook.com/groups/rocksdb.dev/permalink/980454015386446/

Ported from https://reviews.facebook.net/D60735
Closes https://github.com/facebook/rocksdb/pull/1529

Differential Revision: D4194199

Pulled By: ajkr

fbshipit-source-id: 82f9bf4
2016-11-16 17:24:15 -08:00
Yueh-Hsuan Chiang 647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua.  With this ability, users can
dynamically change compaction filter logic without requiring building
the rocksdb binary and restarting the database.

To compile, WITH_LUA_PATH must be specified to the base directory
of lua.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Siying Dong 972e3ff295 Enable allow_concurrent_memtable_write and enable_write_thread_adaptive_yield by default
Summary: Closes https://github.com/facebook/rocksdb/pull/1496

Differential Revision: D4168080

Pulled By: siying

fbshipit-source-id: 056ae62
2016-11-16 09:39:09 -08:00
Andrew Kryczka 489d142808 DeleteRange interface
Summary:
Expose DeleteRange() interface since we think the implementation is functionally correct now.
Closes https://github.com/facebook/rocksdb/pull/1503

Differential Revision: D4171921

Pulled By: ajkr

fbshipit-source-id: 5e21c98
2016-11-15 15:24:16 -08:00
Artemiy Kolesnikov 91300d01f6 Dynamic max_total_wal_size option
Summary: Closes https://github.com/facebook/rocksdb/pull/1509

Differential Revision: D4176426

Pulled By: yiwu-arbug

fbshipit-source-id: b57689d
2016-11-14 22:54:17 -08:00
Yi Wu 1ea79a78c9 Optimize sequential insert into memtable - Part 1: Interface
Summary:
Currently our skip-list have an optimization to speedup sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering wihtin each stream.

This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow underlying
  implementation to return a hint of insert position, which can be later
  pass back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
  via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419

Differential Revision: D4079367

Pulled By: yiwu-arbug

fbshipit-source-id: 3555326
2016-11-13 19:09:18 -08:00
Lijun Tang adb665e0bf Allowed delayed_write_rate option to be dynamically set.
Summary: Closes https://github.com/facebook/rocksdb/pull/1488

Differential Revision: D4157784

Pulled By: siying

fbshipit-source-id: f150081
2016-11-12 15:54:11 -08:00
Yi Wu 437942e481 Add avoid_flush_during_shutdown DB option
Summary:
Add avoid_flush_during_shutdown DB option.
Closes https://github.com/facebook/rocksdb/pull/1451

Differential Revision: D4108643

Pulled By: yiwu-arbug

fbshipit-source-id: abdaf4d
2016-11-02 15:39:18 -07:00
Benoit Girard 2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Yueh-Hsuan Chiang ab53998372 Bump RocksDB version to 4.13 (#1405)
Summary:
Bump RocksDB version to 4.13

Test Plan:
unit tests

Reviewers: sdong, IslamAbdelRahman, andrewkr, lightmark

Subscribers: leveldb
2016-10-18 15:39:10 -07:00
Aaron Gao d88dff4ef2 add seeforprev in history
Summary: update new feature in history and avoid breaking mongorocks

Test Plan: make check

Reviewers: sdong, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64611
2016-10-17 15:34:13 -07:00
Yi Wu e29d3b67c2 Make max_background_compactions and base_background_compactions dynamic changeable
Summary:
Add DB::SetDBOptions to dynamic change max_background_compactions and base_background_compactions.
I'll add more dynamic changeable options soon.

Test Plan: unit test.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64749
2016-10-14 12:25:39 -07:00
Yi Wu d6ae6dec69 Add Statistics::getAndResetTickerCount().
Summary: A convience method to atomically get and reset ticker count. I'm wanting to use it to have a thin wrapper to the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303).

Test Plan:
test in LogDevice shadow cluster.
https://fburl.com/461868822

Reviewers: andrewkr, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64869
2016-10-11 10:54:11 -07:00
Yi Wu 17f76fc564 DB::GetOptions() reflect dynamic changed options
Summary: DB::GetOptions() reflect dynamic changed options.

Test Plan: See the new unit test.

Reviewers: yhchiang, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63903
2016-09-14 22:10:28 -07:00
Islam AbdelRahman eb1d4d53c8 Release RocksDB 4.12
Summary: Release 4.12

Test Plan: none

Reviewers: andrewkr, yiwu, lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63885
2016-09-12 12:34:08 -07:00
Andrew Kryczka de28a25533 Update HISTORY.md for thread-local stats
Summary: as titled

Test Plan: doitlive

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63777
2016-09-08 14:02:19 -07:00
sdong 607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285) but later its reference in /include/rocksdb/table.h is removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
sdong 32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Justin T. Gibbs 23a057007c Document memtable flush behavior in CancelAllBackgroundWork()
Summary:
Update History.md to reflect recent change that ensures unpersisted data
is flushed even if clients call CancelAllBackgroundWork() directly.

Test Plan: Review rendering of markdown.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62703
2016-08-26 10:14:40 -07:00
Yi Wu 4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
sdong 638c49f24f Change HISTORY.md for release 4.11
Summary:
Need to change HISTORY.md for 4.11.
4.10 was not updated either. Update it together.

Test Plan: Not needed.

Reviewers: kradhakrishnan, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61665
2016-08-10 10:44:24 -07:00
Wanning Jiang c38b075e7e Update HISTORY.md 2016-08-09 21:12:45 -07:00
Wanning Jiang 8f399e3fee Update HISTORY.md 2016-08-09 20:57:40 -07:00
sdong 7c4615cf1f A utility function to help users migrate DB after options change
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.

Test Plan: Add unit tests for it.

Reviewers: andrewkr, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60783
2016-08-05 15:39:55 -07:00
sdong e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
sdong 32df9733d1 Add options.write_buffer_manager: control total memtable size across DB instances
Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.

Test Plan: Add a new unit test.

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59925
2016-07-05 18:11:25 -07:00
Lucas Qiu 38fae9e65d fix typos in HISTORY.md (#1192) 2016-07-04 22:58:35 -07:00
Islam AbdelRahman 9b5adea97f Add More Logging to track total_log_size
Summary: We saw instances where total_log_size is off the real value, but I'm not able to reproduce it. Add more logging to help debugging when it happens again.

Test Plan: Run the unit test and see the logging.

Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60081
2016-07-02 01:56:57 -07:00
Andrew Kryczka 56ac686292 Detect column family from properties [CF + RepairDB part 2/3]
Summary:
This diff uses the CF ID and CF name properties in the SST file
to associate recovered data with the proper column family. Depends on D59775.

- In ScanTable(), create column families in VersionSet each time a new one is discovered (via reading SST file properties)
- In ConvertLogToTable(), dump an SST file for every column family with data in the WAL
- In AddTables(), make a VersionEdit per-column family that adds all of that CF's tables

Test Plan:
  $ ./repair_test

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59781
2016-06-24 13:12:13 -07:00
omegaga c4e19b77e8 Add a read option to enable background purge when cleaning up iterators
Summary:
Add a read option `background_purge_on_iterator_cleanup` to avoid deleting files in foreground when destroying iterators.
Instead, a job is scheduled in high priority queue and would be executed in a separate background thread.

Test Plan: Add a variant of PurgeObsoleteFileTest. Turn on background purge option in the new test, and use sleeping task to ensure files are deleted in background.

Reviewers: IslamAbdelRahman, sdong

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59499
2016-06-21 18:41:23 -07:00
sdong 7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
sdong 0babce57f7 Move away from enum char value -1
Summary: char is not signed in some platforms. Having negative values confuse those compilers.

Test Plan: Run all existing tests.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59619
2016-06-14 17:07:34 -07:00
Yi Wu bc8af90e8c add option to not flush memtable on open()
Summary:
Add option to not flush memtable on open()
In case the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST.
Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush final memtable.
If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encounter corruption. Instead, check if seq id of next log file is last_log_sequence + 1. In that case we continue recovery.

Test Plan: See unit test.

Reviewers: dhruba, horuff, sdong

Reviewed By: sdong

Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57813
2016-06-13 11:34:16 -07:00
sdong 20699df843 memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio

Test Plan: Run all existing tests

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59199
2016-06-10 12:12:10 -07:00
Aaron Gao 53a4bd8a69 duplicate line
Summary: duplicated line in history.md

Test Plan: n/a

Reviewers: yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59433
2016-06-09 17:30:52 -07:00
Aaron Gao 3e86869616 release 4.9 update version and history
Summary: update version and history

Test Plan: N/A

Reviewers: yiwu, sdong, IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59421
2016-06-09 17:26:17 -07:00
sdong b2973eaaeb Remove options builder
Summary: AFIK, options builder is not used by anyone. Remove it.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, andrewkr, igor

Reviewed By: igor

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59319
2016-06-09 16:56:17 -07:00
sdong 1d725ca51d Deprecate BlockBasedTableOptions.hash_index_allow_collision=false.
Summary: Deprecate this one option and delete code and tests that are now superfluous.

Test Plan: all tests pass

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: msalib, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55317
2016-05-20 17:52:27 -07:00
Evan Shaw 7383b64b3a Fix formatting of HISTORY.md (#1126)
A couple "New Features" headers needed a blank line before them.
2016-05-19 10:56:12 -07:00
Islam AbdelRahman 4b31723433 Add bottommost_compression option
Summary:
Add a new option that can be used to set a specific compression algorithm for bottommost level.
This option will only affect levels larger than base level.

I have also updated CompactionJobInfo to include the compression algorithm used in compaction

Test Plan:
added new unittest
existing unittests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: lightmark, andrewkr, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D57669
2016-05-09 15:57:19 -07:00
Yi Wu 24a24f013d Enable configurable readahead for iterators
Summary:
Add an option `iterator_readahead_size` to `ReadOptions` to enable
configurable readahead for iterators similar to the corresponding
option for compaction.

Test Plan:
```
make commit_prereq
```

Reviewers: kumar.rangarajan, ott, igor, sdong

Reviewed By: sdong

Subscribers: yiwu, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55419
2016-05-04 15:25:58 -07:00
Yi Wu 1b166928c7 Release RocksDB 4.8.0
Summary: Release RocksDB 4.8.0

Test Plan: N/A

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57525
2016-05-02 14:38:04 -07:00
Yi Wu a92049e3e7 Added EventListener::OnTableFileCreationStarted() callback
Summary: Added EventListener::OnTableFileCreationStarted. EventListener::OnTableFileCreated will be called on failure case. User can check creation status via TableFileCreationInfo::status.

Test Plan: unit test.

Reviewers: dhruba, yhchiang, ott, sdong

Reviewed By: sdong

Subscribers: sdong, kradhakrishnan, IslamAbdelRahman, andrewkr, yhchiang, leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D56337
2016-04-29 11:35:00 -07:00
Andrew Kryczka 843d2e3137 Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.

During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.

On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.

Test Plan: new unit test

Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong

Reviewed By: sdong

Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52287
2016-04-27 17:36:03 -07:00
Andrew Kryczka 7a6045a3c7 fix typo in HISTORY.md 2016-04-20 18:53:37 -07:00
Andrew Kryczka 73a847ef89 Add per-level compression ratio property
Summary:
This is needed so we can measure compression ratio improvements
achieved by D52287.

The property compares raw data size against the total file size for a given
level. If the level is empty it should return 0.0.

Test Plan: new unit test

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56967
2016-04-20 18:46:54 -07:00
Andrew Kryczka 40b840f294 Delete deprecated *BackupableDB interface for backups
Summary:
This interface is redundant and has been deprecated for a while.
It's also unused internally. Let's delete it.

I moved the comments to the corresponding functions in BackupEngine/
BackupEngineReadOnly. This caused the diff tool to not work cleanly.

Test Plan:
unit tests

  $ ./backupable_db_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56331
2016-04-18 09:04:14 -07:00
sdong 4b6833aec1 Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too.
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.

Test Plan: Try db_bench and see the stats are printed out.

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56769
2016-04-15 10:22:18 -07:00
Islam AbdelRahman 1aeca97337 Release RocksDB 4.7
Summary: Bump the version and update HISTORY.md

Test Plan: none

Reviewers: yhchiang, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56469
2016-04-13 15:03:08 -07:00
sdong 2feafa3db9 Change some RocksDB default options
Summary: Change some RocksDB default options to make it more friendly to server workloads.

Test Plan: Run all existing tests

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55941
2016-03-31 17:12:18 -07:00
Andrew Kryczka 0267655dad Update change log for 4.6 release
Summary: as titled

Test Plan: N/A

Reviewers: sdong, kradhakrishnan, anthony, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55323
2016-03-12 13:51:57 -08:00
Islam AbdelRahman 580fede347 Aggregate hot Iterator counters in LocalStatistics (DBIter::Next perf regression)
Summary:
This patch bump the counters in the frequent code path DBIter::Next() / DBIter::Prev() in a local data members and send them to Statistics when the iterator is destroyed
A better solution will be to have thread_local implementation for Statistics

New performance
```
readseq      :       0.035 micros/op 28597881 ops/sec; 3163.7 MB/s
     1,851,568,819      stalled-cycles-frontend   #   31.29% frontend cycles idle    [49.86%]
       884,929,823      stalled-cycles-backend    #   14.95% backend  cycles idle    [50.21%]
readreverse  :       0.071 micros/op 14077393 ops/sec; 1557.3 MB/s
     3,239,575,993      stalled-cycles-frontend   #   27.36% frontend cycles idle    [49.96%]
     1,558,253,983      stalled-cycles-backend    #   13.16% backend  cycles idle    [50.14%]

```

Existing performance

```
readreverse  :       0.174 micros/op 5732342 ops/sec;  634.1 MB/s
    20,570,209,389      stalled-cycles-frontend   #   70.71% frontend cycles idle    [50.01%]
    18,422,816,837      stalled-cycles-backend    #   63.33% backend  cycles idle    [50.04%]

readseq      :       0.119 micros/op 8400537 ops/sec;  929.3 MB/s
    15,634,225,844      stalled-cycles-frontend   #   79.07% frontend cycles idle    [49.96%]
    14,227,427,453      stalled-cycles-backend    #   71.95% backend  cycles idle    [50.09%]
```

Test Plan: unit tests

Reviewers: yhchiang, sdong, igor

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55107
2016-03-11 19:01:12 -08:00
Yi Wu f71fc77b7c Cache to have an option to fail Cache::Insert() when full
Summary:
Cache to have an option to fail Cache::Insert() when full. Update call sites to check status and handle error.

I totally have no idea what's correct behavior of all the call sites when they encounter error. Please let me know if you see something wrong or more unit test is needed.

Test Plan: make check -j32, see tests pass.

Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54705
2016-03-10 17:35:19 -08:00
sdong 12fd9b1868 Change BlockBasedTableOptions.format_version default to 2
Summary: BlockBasedTableOptions.format_version = 2 uses better encoding format. Now it's the time to make it default.

Test Plan: Run all existing tests.

Reviewers: igor, yhchiang, anthony, kradhakrishnan, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54879
2016-03-01 14:28:52 -08:00
sdong 1f5954147b Introduce Iterator::GetProperty() and replace Iterator::IsKeyPinned()
Summary:
Add Iterator::GetProperty(), a way for users to communicate with iterator, and turn Iterator::IsKeyPinned() with it.
As a follow-up, I'll ask a property as the version number attached to the iterator

Test Plan: Rerun existing tests and add a negative test case.

Reviewers: yhchiang, andrewkr, kradhakrishnan, anthony, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54783
2016-02-29 14:01:31 -08:00
sdong 92a9ccf1a6 Add a new compaction priority that picks file whose overlapping ratio is smallest
Summary:
Add a new compaction priority as following:
For every file, we calculate total size of files overalapping with the file in the next level, over the file's size itself. The file with smallest ratio will be picked first.
My "db_bench --fillrandom" shows about 5% less compaction than kOldestSmallestSeqFirst if --hard_pending_compaction_bytes_limit value to keep LSM tree in shape. If not limiting hard_pending_compaction_bytes_limit, improvement is only 1% or 2%.

Test Plan: Add a unit test

Reviewers: andrewkr, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54075
2016-02-11 15:59:19 -08:00
Yueh-Hsuan Chiang 4a8cbf4e31 Allows Get and MultiGet to read directly from SST files.
Summary:
Add kSstFileTier to ReadTier, which allows Get and MultiGet to
read only directly from SST files and skip mem-tables.

    kSstFileTier = 0x2      // data in SST files.
                          // Note that this ReadTier currently only supports
                          // Get and MultiGet and does not support iterators.

Test Plan: add new test in db_test.

Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: igor, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53511
2016-02-09 11:20:22 -08:00
sdong 73a9b0f4ba Update version to 4.5
Summary: Time to cut branch for release 4.5. Change the versions.

Test Plan: Not needed

Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, andrewkr, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53883
2016-02-05 13:05:54 -08:00
Islam AbdelRahman d6c838f1e1 Add SstFileManager (component tracking all SST file in DBs and control the deletion rate)
Summary:
Add a new class SstFileTracker that will be notified whenever a DB add/delete/move and sst file, it will also replace DeleteScheduler
SstFileTracker can be used later to abort writes when we exceed a specific size

Test Plan: unit tests

Reviewers: rven, anthony, yhchiang, sdong

Reviewed By: sdong

Subscribers: igor, lovro, march, dhruba

Differential Revision: https://reviews.facebook.net/D50469
2016-01-28 18:35:01 -08:00
sdong d20915d52a Disable stats about mutex duration by default
Summary: Measuring mutex duration will measure time inside DB mutex, which breaks our best practice. Add a stat level in Statistics class. By default, disable to measure the mutex operations.

Test Plan: Add a unit test to make sure it is off by default.

Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, andrewkr, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53367
2016-01-26 14:56:55 -08:00
sdong 38e1d7fea3 ldb to support --column_family option
Summary:
Add an option --column_family option, so that users can query or update specific column family.
Also add an create column family parameter to make unit test easier.
Still need to add unit tests.

Test Plan: Will add a test case in ldb python test.

Reviewers: yhchiang, rven, andrewkr, IslamAbdelRahman, kradhakrishnan, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53265
2016-01-25 14:58:18 -08:00
sdong fb9811ee9b Add a perf context level that doesn't measure time for mutex operations
Summary: Timing mutex operations can impact scalability of the system. Add a new perf context level that can measure time counters except for mutex.

Test Plan: Add a new unit test case to make sure it is not set.

Reviewers: IslamAbdelRahman, rven, kradhakrishnan, yhchiang, anthony

Reviewed By: anthony

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53199
2016-01-25 14:45:03 -08:00
agiardullo f7ebc2f346 Update HISTORY.mc for 4.4.0
Summary: Prepare to cut release

Test Plan: no code change.

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52827
2016-01-14 16:57:06 -08:00
Gunnar Kudrjavets b1a3b4c0d0 Make ldb automagically determine the file type and use the correct dumping function
Summary:
This set of changes implements the following design: `ldb` will utilize `--path` parameter which can be used to specify a file name. Tool will then apply some heuristic to determine how to output the data properly. The design decision is not to probe the file content, but use file names to determine what dumping function to call.

Usage examples:

Understands that path points to a manifest file and dumps it.
`./ldb --path=/tmp/test_db/MANIFEST-000023 dump`

Understands that path points to a WAL file and dumps it.
`./ldb --path=/tmp/test_db/000024.log dump --header`

Understands that path points to a SST file and dumps it.
`./ldb --path=/tmp/test_db/000007.sst dump`

Figures out that none of the supported file types are applicable and outputs
an appropriate error message.
`./ldb --path=/tmp/cron.log dump`

Test Plan:
Basics:

git diff
make clean
make -j 32 commit-prereq
arc lint

More specific testing (done as part of commit-prereq, but can be iterated separately when making isolated changes):

make clean
make ldb
python tools/ldb_test.py
make rocksdb_dump
make rocksdb_undump
sh tools/rocksdb_dump_test.sh

Reviewers: rven, IslamAbdelRahman, yhchiang, kradhakrishnan, anthony, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52269
2016-01-06 14:19:08 -08:00
sdong 15b8902264 Change default options.delayed_write_rate
Summary: We now have a mechanism to further slowdown writes. Double default options.delayed_write_rate to try to keep the default behavior closer to it used to be.

Test Plan: Run all tests.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yhchiang, kradhakrishnan, rven, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52281
2015-12-23 14:51:55 -08:00
sdong b9f77ba12b When slowdown is triggered, reduce the write rate
Summary: It's usually hard for users to set a value of options.delayed_write_rate. With this diff, after slowdown condition triggers, we greedily reduce write rate if estimated pending compaction bytes increase. If estimated compaction pending bytes drop, we increase the write rate.

Test Plan:
Add a unit test
Test with db_bench setting:
TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -num=10000000 --soft_pending_compaction_bytes_limit=1000000000 --hard_pending_compaction_bytes_limit=3000000000 --delayed_write_rate=100000000

and make sure without the commit, write stop will happen, but with the commit, it will not happen.

Reviewers: igor, anthony, rven, yhchiang, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52131
2015-12-23 11:33:15 -08:00
Islam AbdelRahman d005c66faf Report compaction reason in CompactionListener
Summary:
Add CompactionReason to CompactionJobInfo
This will allow users to understand why compaction started which will help options tuning

Test Plan:
added new tests
make check -j64

Reviewers: yhchiang, anthony, kradhakrishnan, sdong, rven

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D51975
2015-12-22 11:37:19 -08:00
sdong d72b31774e Slowdown when writing to the last write buffer
Summary: Now if inserting to mem table is much faster than writing to files, there is no mechanism users can rely on to avoid stopping for reaching options.max_write_buffer_number. With the commit, if there are more than four maximum write buffers configured, we slow down to the rate of options.delayed_write_rate while we reach the last one.

Test Plan:
1. Add a new unit test.
2. Run db_bench with

./db_bench --benchmarks=fillrandom --num=10000000 --max_background_flushes=6 --batch_size=32 -max_write_buffer_number=4 --delayed_write_rate=500000 --statistics

based on hard drive and see stopping is avoided with the commit.

Reviewers: yhchiang, IslamAbdelRahman, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52047
2015-12-17 10:49:08 -08:00
charsyam c30b499541 fix typos in comments 2015-12-11 01:54:48 +09:00
sdong 56e77f0967 Deprecate options.soft_rate_limit and add options.soft_pending_compaction_bytes_limit
Summary: Deprecate options.soft_rate_limit, which is hard to tune, with options.soft_pending_compaction_bytes_limit, which would trigger the slowdown if estimated pending compaction bytes exceeds the threshold. The hope is to make it more striaght-forward to tune.

Test Plan: Modify DBTest.SoftLimit to cover options.soft_pending_compaction_bytes_limit instead; run all unit tests.

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, igor, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51117
2015-12-09 18:22:45 -08:00
sdong d6e1035a1f A new compaction picking priority that optimizes for write amplification for random updates.
Summary: Introduce a compaction picking priority that picks files who contains the oldest rows to compact. This is a mode that slightly improves write amplification for random update cases.

Test Plan: Add a unit test and run it in valgrind too.

Reviewers: yhchiang, anthony, IslamAbdelRahman, rven, kradhakrishnan, MarkCallaghan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51459
2015-12-09 18:13:03 -08:00
krad 188170fb4e Updating HISTORY.md
Summary: Added 4.3.0 version

Test Plan:

Reviewers:

CC: leveldb@

Task ID: #9298965

Blame Rev:
2015-12-08 17:46:43 -08:00
sdong db320b1b82 DB to only flush the column family with the largest memtable while option.db_write_buffer_size is hit
Summary: When option.db_write_buffer_size is hit, we currently flush all column families. Move to flush the column family with the largest active memt table instead. In this way, we can avoid too many small files in some cases.

Test Plan: Modify test DBTest.SharedWriteBuffer to work with the updated behavior

Reviewers: kradhakrishnan, yhchiang, rven, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: march, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51291
2015-11-30 13:36:57 -08:00
Venkatesh Radhakrishnan 81be49c755 Have a way for compaction filter to ignore snapshots
Summary:
Provide an API for compaction filter to specify that it needs
to be applied even if there are snapshots.

Test Plan: DBTestCompactionFilter.CompactionFilterIgnoreSnapshot

Reviewers: yhchiang, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51087
2015-11-20 15:57:26 -08:00
sdong 9bc9c93bd4 Move to version 4.3
Summary: RocksDB 4.2 is already cut. Move to 4.3

Test Plan: Not needed

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D50799
2015-11-16 14:29:08 -08:00
Yueh-Hsuan Chiang d781da8164 Add CheckOptionsCompatibility() API to options_util
Summary:
Add CheckOptionsCompatibility() API to options_util that returns
Status::OK if the input DBOptions and ColumnFamilyDescriptors
are compatible with the latest options stored in the specified DB path.

Test Plan: Added tests in options_util_test

Reviewers: igor, anthony, IslamAbdelRahman, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50649
2015-11-12 16:52:51 -08:00
Yueh-Hsuan Chiang e11f676e34 Add OptionsUtil::LoadOptionsFromFile() API
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file.  Note that most pointer-typed options such as
merge_operator will not be constructed.

With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:

  DBOptions db_options;
  std::vector<std::string> cf_names;
  std::vector<ColumnFamilyOptions> cf_opts;

  // Load primitive-typed options from an existing DB
  OptionsUtil::LoadLatestOptionsFromDB(
      dbname, &db_options, &cf_names, &cf_opts);

  // Initialize necessary pointer-typed options
  cf_opts[0].merge_operator.reset(new MyMergeOperator());
  ...

  // Construct the vector of ColumnFamilyDescriptor
  std::vector<ColumnFamilyDescriptor> cf_descs;
  for (size_t i = 0; i < cf_opts.size(); ++i) {
    cf_descs.emplace_back(cf_names[i], cf_opts[i]);
  }

  // Open the DB
  DB* db = nullptr;
  std::vector<ColumnFamilyHandle*> cf_handles;
  auto s = DB::Open(db_options, dbname, cf_descs,
                    &handles, &db);

Test Plan:
Augment existing tests in column_family_test
options_test
db_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49095
2015-11-12 06:52:43 -08:00
Yueh-Hsuan Chiang e114f0abb8 Enable RocksDB to persist Options file.
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.

In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.

  // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
  // / SetOptions will fail if options file is not detected or properly
  // persisted.
  //
  // DEFAULT: false
  bool fail_if_missing_options_file;

Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.

Test Plan:
Add options_file_test.

options_test
column_family_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48285
2015-11-10 22:58:01 -08:00
Yueh-Hsuan Chiang db3f5e4948 Update HISTORY.md
Summary:
Move some of the items in public API changes section to to new
features section.

Test Plan: no code change.

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50229
2015-11-04 23:52:09 -08:00
Yueh-Hsuan Chiang 7d7ee2b654 Add Memory Insight support to utilities
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.

The API also take care of the case where Cache could be shared
across multiple column families / multiple db instances.

Currently, it reports memory usage of memtable, table-readers
and cache.

Test Plan: utilities/memory/memory_test.cc

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49257
2015-11-03 17:52:17 -08:00
Yueh-Hsuan Chiang 3ecbab0040 Add GetAggregatedIntProperty(): returns the aggregated value from all CFs
Summary:
This patch adds GetAggregatedIntProperty() that returns the aggregated
value from all CFs

Test Plan: Added a test in db_test

Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven

Reviewed By: rven

Subscribers: rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49497
2015-11-03 15:54:18 -08:00
Islam AbdelRahman 2872e0c8c2 Clean and expose CreateLoggerFromOptions
Summary:
CreateLoggerFromOptions have some parameters like  db_log_dir and env, these parameters are redundant since they already exist in DBOptions

this patch remove the redundant parameters and expose CreateLoggerFromOptions to users

Test Plan: make check

Reviewers: igor, anthony, yhchiang, rven, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D49713
2015-10-29 18:07:37 -07:00
sdong 666fb5df43 Remove DefaultCompactionFilterFactory.
Summary: DefaultCompactionFilterFactory is not used anymore after recent changes. Remove it.

Test Plan: Just run existing tests.

Reviewers: rven, igor, yhchiang, kradhakrishnan, anthony, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48699
2015-10-14 11:02:30 -07:00
sdong 776bd8d5eb Pass column family ID to table property collector
Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.

Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48411
2015-10-09 14:36:51 -07:00
sdong b77eb16aba New Manifest format to allow customized fields in NewFile.
Summary: With this commit, we add a new format in manifest when adding a new file. Now path ID and need-compaction hint are first two customized fields.

Test Plan: Add a test case in version_edit_test to verify the encoding and decoding logic. Add a unit test in db_test to verify need compaction is persistent after DB restarting.

Reviewers: kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, rven, igor

Reviewed By: igor

Subscribers: javigon, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48123
2015-10-08 15:51:45 -07:00
sdong 000836a880 CompactionFilter::Context to contain column family ID
Summary: Add the column family ID to compaction filter context, so it is easier for compaction filter to apply different logic for different column families.

Test Plan: Add a unit test to verify the column family ID passed is correct.

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48357
2015-10-08 11:27:38 -07:00
Yueh-Hsuan Chiang 3a0bf873b5 Change RocksDB version to 4.1
Summary: Change RocksDB version to 4.1

Test Plan: no code change.

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48387
2015-10-08 11:15:18 -07:00
Igor Canadi 9803e0d813 compaction_filter.h cleanup
Summary:
Two changes:
1. remove *V2 filter stuff. we deprecated that a while ago
2. clarify what happens when user sets max_subcompactions to bigger than 1

Test Plan: none

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47871
2015-10-08 09:32:50 -07:00
Igor Canadi d80ce7f99a Compaction filter on merge operands
Summary:
Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit.

The behavior in this diff is a bit simpler than in D42555:
* only merge operators are passed through FilterMergeValue(). If fitler function returns true, the merge operator is ignored
* compaction filter is *not* called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in previous diff)

Do we also need a compaction filter to get called on merge results?

Test Plan: make && make check

Reviewers: lovro, tnovak, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: noetzli, kolmike, leveldb, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D47847
2015-10-07 09:30:03 -07:00
dyniusz 0267502655 Support for LevelDB SST with .ldb suffix
Summary:
	Handle SST files with both ".sst" and ".ldb" suffix.
	This enables user to migrate from leveldb to rocksdb.

Test Plan:
        Added unit test with DB operating on SSTs with names schema.
        See db/dc_test.cc:SSTsWithLdbSuffixHandling for details

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48003
2015-10-06 17:46:22 -07:00
Islam AbdelRahman f03b5c987b Add experimental DB::AddFile() to plug sst files into empty DB
Summary:
This is an initial version of bulk load feature

This diff allow us to create sst files, and then bulk load them later, right now the restrictions for loading an sst file are
(1) Memtables are empty
(2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
(3) Added sst files values are not overlapping

Test Plan: unit testing

Reviewers: igor, ott, sdong

Reviewed By: sdong

Subscribers: leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D39081
2015-09-23 12:42:43 -07:00
Andres Noetzli 014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
Islam AbdelRahman f3f2032c46 Release RocksDB 4.0.0
Summary: Release RocksDB 4.0.0

Test Plan: no test

Reviewers: sdong, yhchiang, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46515
2015-09-09 16:01:03 -07:00
agiardullo 25dbc579f5 Update HISTORY file for transactions
Summary: Move line about pessimistic transactions to the correct place in the history file and no longer refer to it as 'experimental'

Test Plan: n/a

Reviewers: igor, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46449
2015-09-09 13:27:19 -07:00
Andres Noetzli 6bdc484fd8 Added Equal method to Comparator interface
Summary:
In some cases, equality comparisons can be done more efficiently than three-way
comparisons. There are quite a few places in the code where we only care about
equality. This patch adds an Equal() method that defaults to using the
Compare() method.

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, igor, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46233
2015-09-08 15:30:49 -07:00
Igor Canadi b42cd6bed5 Remove the need for LATEST_BACKUP in BackupEngine
Summary:
In the first implementation of BackupEngine, LATEST_BACKUP was the commit point. The backup became committed after the write to LATEST_BACKUP completed.

However, we can avoid the need for LATEST_BACKUP. Instead of write to LATEST_BACKUP, the commit point can be the rename from `meta/<backup_id>.tmp` to `meta/<backup_id>`. Once we see that there exists a file `meta/<backup_id>` (without tmp), we can assume that backup is valid.

In this diff, we still write out the file LATEST_BACKUP. We need to do this so that we can maintain backward compatibility. However, the new version doesn't depend on this file anymore. We get the latest backup by `ls`-ing `meta` directory.

This diff depends on D41925

Test Plan: Adjusted backupable_db_test to this new behavior

Reviewers: benj, yhchiang, sdong, AaronFeldman

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42069
2015-09-02 11:49:49 -07:00