Commit graph

357 commits

Author SHA1 Message Date
Andrew Kryczka 3fa9a39c68 Add GetAllKeyVersions API
Summary:
- Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex
- Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type.
- Migrated the "ldb idump" subcommand to use this function
- The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(.
Closes https://github.com/facebook/rocksdb/pull/2232

Differential Revision: D4976007

Pulled By: ajkr

fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e
2017-05-12 15:54:06 -07:00
Yi Wu 2cd00773c7 Add bulk create/drop column family API
Summary:
Adding DB::CreateColumnFamilie() and DB::DropColumnFamilies() to bulk create/drop column families. This is to address the problem creating/dropping 1k column families takes minutes. The bottleneck is we persist options files for every single column family create/drop, and it parses the persisted options file for verification, which take a lot CPU time.

The new APIs simply create/drop column families individually, and persist options file once at the end. This improves create 1k column families to within ~0.1s. Further improvement can be merge manifest write to one IO.
Closes https://github.com/facebook/rocksdb/pull/2248

Differential Revision: D5001578

Pulled By: yiwu-arbug

fbshipit-source-id: d4e00bda671451e0b314c13e12ad194b1704aa03
2017-05-07 23:20:46 -07:00
Siying Dong 264d3f540c Allow IntraL0 compaction in FIFO Compaction
Summary:
Allow an option for users to do some compaction in FIFO compaction, to pay some write amplification for fewer number of files.
Closes https://github.com/facebook/rocksdb/pull/2163

Differential Revision: D4895953

Pulled By: siying

fbshipit-source-id: a1ab608dd0627211f3e1f588a2e97159646e1231
2017-05-04 18:16:13 -07:00
Leonidas Galanis a45e98a5b5 max_open_files dynamic set, follow up
Summary:
Followup to make 0x40000 a TableCache constant that indicates infinite capacity
Closes https://github.com/facebook/rocksdb/pull/2247

Differential Revision: D5001349

Pulled By: lgalanis

fbshipit-source-id: ce7bd2e54b0975bb9f8680fdaa0f8bb0e7ae81a2
2017-05-04 10:42:45 -07:00
Leonidas Galanis e7ae4a3a02 Max open files mutable
Summary:
Makes max_open_files db option dynamically set-able by SetDBOptions. During the call of SetDBOptions we call SetCapacity on the table cache, which is a LRUCache.
Closes https://github.com/facebook/rocksdb/pull/2185

Differential Revision: D4979189

Pulled By: yiwu-arbug

fbshipit-source-id: ca7e8dc5e3619c79434f579be4847c0f7e56afda
2017-05-03 21:13:14 -07:00
siddontang b551104e04 support PopSavePoint for WriteBatch
Summary:
Try to fix https://github.com/facebook/rocksdb/issues/1969
Closes https://github.com/facebook/rocksdb/pull/2170

Differential Revision: D4907333

Pulled By: yiwu-arbug

fbshipit-source-id: 417b420ff668e6c2fd0dad42a94c57385012edc5
2017-05-03 10:57:45 -07:00
Dmitri Smirnov cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Andrew Kryczka efc361ef7d Add user stats Reset API
Summary:
It resets all the ticker and histogram stats to zero. Needed to change the locking a bit since Reset() is the only operation that manipulates multiple tickers/histograms together, and that operation should be seen as atomic by other operations that access tickers/histograms.
Closes https://github.com/facebook/rocksdb/pull/2213

Differential Revision: D4952232

Pulled By: ajkr

fbshipit-source-id: c0475c3e4c7b940120d53891b69c3091149a0679
2017-04-26 15:57:01 -07:00
Siying Dong 97005dbd5d tools/check_format_compatible.sh to cover option file loading too
Summary:
tools/check_format_compatible.sh will check a newer version of RocksDB can open option files generated by older version releases. In order to achieve that, a new parameter "--try_load_options" is added to ldb. With this parameter set, if option file exists, we load the option file and use it to open the DB. With this opiton set, we can validate option loading logic.
Closes https://github.com/facebook/rocksdb/pull/2178

Differential Revision: D4914989

Pulled By: siying

fbshipit-source-id: db114f7724fcb41e5e9483116d84d7c4b8389ca4
2017-04-20 10:26:37 -07:00
Siying Dong c49d704656 Add DB:ResetStats()
Summary:
Add a function to allow users to reset internal stats without restarting the DB.
Closes https://github.com/facebook/rocksdb/pull/2167

Differential Revision: D4907939

Pulled By: siying

fbshipit-source-id: ab2dd85b88aabe9380da7485320a1d460d3e1f68
2017-04-18 16:56:48 -07:00
Yi Wu 0fcdccc33e Blob storage helper methods
Summary:
Split out interfaces needed for blob storage from #1560, including
* CompactionEventListener and OnFlushBegin listener interfaces.
* Blob filename support.
Closes https://github.com/facebook/rocksdb/pull/2169

Differential Revision: D4905463

Pulled By: yiwu-arbug

fbshipit-source-id: 564e73448f1b7a367e5e46216a521e57ea9011b5
2017-04-18 12:42:38 -07:00
Aaron Gao 44fa8ece9b change use_direct_writes to use_direct_io_for_flush_and_compaction
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117

Differential Revision: D4860912

Pulled By: lightmark

fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
2017-04-13 16:12:04 -07:00
Sagar Vemuri 415be221cb RocksDB Release 5.4 : Update HISTORY.md and build version.
Summary: Closes https://github.com/facebook/rocksdb/pull/2142

Reviewed By: siying

Differential Revision: D4874696

Pulled By: sagar0

fbshipit-source-id: 03e6e21735bb74e5a37cc913aabb2c250af558cc
2017-04-12 17:36:27 -07:00
Aaron Gao 02799ad77a Revert "delete fallocate with punch_hole"
Summary:
This reverts commit 0fd574926c.
It breaks tmpfs on kernel 4.0 or earlier. We will wait for the fix before remove this part
Closes https://github.com/facebook/rocksdb/pull/2096

Differential Revision: D4839661

Pulled By: lightmark

fbshipit-source-id: 574a51f
2017-04-05 16:10:09 -07:00
Maysam Yabandeh e2a7b202c3 Release note for partition filters
Summary:
I tagged it experimental since the configuring the filter partitions by size will not be ready for this release.
Closes https://github.com/facebook/rocksdb/pull/2049

Differential Revision: D4831575

Pulled By: maysamyabandeh

fbshipit-source-id: fcc2d25
2017-04-05 12:24:22 -07:00
Andrew Kryczka d659faad54 Level-based L0->L0 compaction
Summary:
Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.

- L0->L0 is triggered when base level is unavailable due to pending compactions
- L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
- Subcompactions are disabled for L0->L0 since we want to output one file.
- Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
Closes https://github.com/facebook/rocksdb/pull/2027

Differential Revision: D4760318

Pulled By: ajkr

fbshipit-source-id: 9d07183
2017-04-04 18:09:11 -07:00
Maysam Yabandeh a1c469d719 Add release notes for PinnableSlice
Summary: Closes https://github.com/facebook/rocksdb/pull/2037

Differential Revision: D4822085

Pulled By: maysamyabandeh

fbshipit-source-id: 9d5a986
2017-04-03 15:09:19 -07:00
Sagar Vemuri c6d04f2ecf Option to fail a request as incomplete when skipping too many internal keys
Summary:
Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to be skipped. Adding an option, max_skippable_internal_keys -- which could be used to set a threshold for the maximum number of keys that can be skipped, will help to address these cases where it is much better to fail a request (as incomplete) than to wait for a considerable time for the request to complete.

This feature -- to fail an iterator seek request as incomplete, is disabled by default when max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.

This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
Closes https://github.com/facebook/rocksdb/pull/2000

Differential Revision: D4753223

Pulled By: sagar0

fbshipit-source-id: 1c973f7
2017-03-30 12:09:21 -07:00
Aaron Gao 0fd574926c delete fallocate with punch_hole
Summary:
As discuss in this thread:
https://www.facebook.com/groups/rocksdb.dev/permalink/1218043868294125/

We remove fallocate with FALLOC_FL_PUNCH_HOLE because the recent bug on xfs in kernel 4.x+ that align file size to page size even with FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE.
Closes https://github.com/facebook/rocksdb/pull/2038

Differential Revision: D4779974

Pulled By: siying

fbshipit-source-id: 5f54625
2017-03-28 15:54:12 -07:00
Siying Dong 909028e214 HISTORY.md for log_size_for_flush in CreateCheckpoint()
Summary: Closes https://github.com/facebook/rocksdb/pull/2021

Differential Revision: D4755324

Pulled By: siying

fbshipit-source-id: c8d7955
2017-03-22 11:24:12 -07:00
Raza Hussain 6908e24b56 dynamic setting of stats_dump_period_sec through SetDBOption()
Summary:
Resolved the following issue: https://github.com/facebook/rocksdb/issues/1930
Closes https://github.com/facebook/rocksdb/pull/2004

Differential Revision: D4736764

Pulled By: yiwu-arbug

fbshipit-source-id: 64fe0b7
2017-03-20 22:54:13 -07:00
Aaron Gao e5bd8def1e update history.md for fixing the bug that skips keys
Summary: Closes https://github.com/facebook/rocksdb/pull/1986

Differential Revision: D4699152

Pulled By: siying

fbshipit-source-id: b18c32c
2017-03-13 11:39:13 -07:00
Andrew Kryczka fe1835617a release 5.3
Summary: Closes https://github.com/facebook/rocksdb/pull/1971

Differential Revision: D4683851

Pulled By: ajkr

fbshipit-source-id: 967116e
2017-03-09 12:39:10 -08:00
Aaron Gao f7997f1341 add direct I/O to version notes 5.2.0
Summary:
let users know this feature is ready
Closes https://github.com/facebook/rocksdb/pull/1915

Differential Revision: D4610842

Pulled By: lightmark

fbshipit-source-id: d102772
2017-02-23 22:09:12 -08:00
Siying Dong 8efb5ffa2a [rocksdb][PR] Remove option min_partial_merge_operands and verify_checksums_in_comp…
Summary:
…action

 The two options, min_partial_merge_operands and verify_checksums_in_compaction, are not seldom used. Remove them to reduce the total number of options. Also remove them from Java and C interface.
Closes https://github.com/facebook/rocksdb/pull/1902

Differential Revision: D4601219

Pulled By: siying

fbshipit-source-id: aad4cb2
2017-02-23 15:09:12 -08:00
Islam AbdelRahman 1560b2f5f0 Temporarly return deprecated functions to fix MongoRocks build
Summary:
MongoRocks is still using some deprecated functions, return them temporarily
Closes https://github.com/facebook/rocksdb/pull/1892

Differential Revision: D4592451

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5e6be3e
2017-02-21 12:54:11 -08:00
Yi Wu 381fd32247 Remove timeout_hint_us from WriteOptions
Summary:
The option has been deprecated for two years and has no effect. Removing.
Closes https://github.com/facebook/rocksdb/pull/1866

Differential Revision: D4555203

Pulled By: yiwu-arbug

fbshipit-source-id: c48f627
2017-02-17 15:24:17 -08:00
Islam AbdelRahman 7ab0051835 Remove deprecated DB::AddFile and DB::CompactRange
Summary:
Remove functions that we deprecated long time ago in db.h
Closes https://github.com/facebook/rocksdb/pull/1878

Differential Revision: D4576521

Pulled By: IslamAbdelRahman

fbshipit-source-id: dfddad1
2017-02-17 10:54:13 -08:00
Sagar Vemuri eb912a927e Remove disableDataSync option
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Siying Dong 9afa20cf2f Increase build version and HISTORY.md for releasing 5.2
Summary:
Also clean up HISTORY.md a little bit.
Closes https://github.com/facebook/rocksdb/pull/1854

Differential Revision: D4539556

Pulled By: siying

fbshipit-source-id: 567391e
2017-02-10 12:09:15 -08:00
Maysam Yabandeh ac2a77a746 Announce the experimetnal two-level index feature in HISTORY.md
Summary:
Announce the experimetnal two-level index feature in HISTORY.md. Also updated the default for index_per_partition to 1024.
Closes https://github.com/facebook/rocksdb/pull/1855

Differential Revision: D4530102

Pulled By: maysamyabandeh

fbshipit-source-id: b0fc6ff
2017-02-08 14:24:10 -08:00
Vitaliy Liptchinsky 1aaa898cf1 Adding GetApproximateMemTableStats method
Summary:
Added method that returns approx num of entries as well as size for memtables.
Closes https://github.com/facebook/rocksdb/pull/1841

Differential Revision: D4511990

Pulled By: VitaliyLi

fbshipit-source-id: 9a4576e
2017-02-06 14:54:16 -08:00
Siying Dong 4a3e7d320c Change the default of delayed slowdown value to 16MB/s
Summary:
Change the default of delayed slowdown value to 16MB/s and further increase the L0 stop condition to 36 files.
Closes https://github.com/facebook/rocksdb/pull/1821

Differential Revision: D4489229

Pulled By: siying

fbshipit-source-id: 1003981
2017-02-01 20:39:17 -08:00
Siying Dong 2d75cd40d3 NewLRUCache() to pick number of shard bits based on capacity if not given
Summary:
If the users use the NewLRUCache() without passing in the number of shard bits, instead of using hard-coded 6, we'll determine it based on capacity.
Closes https://github.com/facebook/rocksdb/pull/1584

Differential Revision: D4242517

Pulled By: siying

fbshipit-source-id: 86b0f18
2017-01-27 06:39:12 -08:00
Vitaliy Liptchinsky e840213d6e Change DB::GetApproximateSizes for more flexibility needed for MyRocks
Summary:
Added an option to GetApproximateSizes to exclude file stats, as MyRocks has those counted exactly and we need only stats from memtables.
Closes https://github.com/facebook/rocksdb/pull/1787

Differential Revision: D4441111

Pulled By: IslamAbdelRahman

fbshipit-source-id: c11f4c3
2017-01-20 09:39:11 -08:00
Siying Dong 17a4b75cc3 Always fsync the file after file copying
Summary:
File copying happens when creating checkpoints and bulkloading files from different FS partition. We should fsync the files when copying them to guarantee durability. A side effect will be that the dirty pages in file system buffers won't grow too large.
Closes https://github.com/facebook/rocksdb/pull/1728

Differential Revision: D4371083

Pulled By: siying

fbshipit-source-id: 579e14c
2016-12-28 19:09:16 -08:00
Siying Dong 906523d98a Add description to the 2PC checkpooint bug in HISTORY.md
Summary: Closes https://github.com/facebook/rocksdb/pull/1729

Differential Revision: D4371674

Pulled By: siying

fbshipit-source-id: 907e373
2016-12-28 15:54:14 -08:00
Andrew Kryczka 243975d5da More accurate error status for BackupEngine::Open
Summary:
Some users are assuming NotFound means the backup does not
exist at the provided path, which is a reasonable assumption. We need to
stop returning NotFound for system errors.

Depends on #1644
Closes https://github.com/facebook/rocksdb/pull/1645

Differential Revision: D4312233

Pulled By: ajkr

fbshipit-source-id: 5343c10
2016-12-12 13:24:21 -08:00
Islam AbdelRahman ed8fbdb560 Add EventListener::OnExternalFileIngested() event
Summary:
Add EventListener::OnExternalFileIngested() to allow user to subscribe to external file ingestion events
Closes https://github.com/facebook/rocksdb/pull/1623

Differential Revision: D4285844

Pulled By: IslamAbdelRahman

fbshipit-source-id: 0b95a88
2016-12-06 14:09:17 -08:00
Yi Wu 0b0f235724 Mention IngestExternalFile changes in HISTORY.md
Summary:
I hit the land button too fast and didn't include the line.
Closes https://github.com/facebook/rocksdb/pull/1622

Differential Revision: D4281316

Pulled By: yiwu-arbug

fbshipit-source-id: c7b38e0
2016-12-05 16:09:11 -08:00
Yi Wu 23db48e8d8 Update HISTORY.md for 5.0 branch
Summary:
These changes are included in the new branch-cut.
Closes https://github.com/facebook/rocksdb/pull/1621

Differential Revision: D4281015

Pulled By: yiwu-arbug

fbshipit-source-id: d88858b
2016-12-05 15:39:11 -08:00
Anton Safonov 9053fe2a5c Made delete_obsolete_files_period_micros option dynamic
Summary:
Made delete_obsolete_files_period_micros option dynamic. It can be updating using DB::SetDBOptions().
Closes https://github.com/facebook/rocksdb/pull/1595

Differential Revision: D4246569

Pulled By: tonek

fbshipit-source-id: d23f560
2016-12-05 14:24:16 -08:00
Igor Canadi 3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Mike Kolupaev 247d0979aa Support for range skips in compaction filter
Summary:
This adds the ability for compaction filter to say "drop this key-value, and also drop everything up to key x". This will cause the compaction to seek input iterator to x, without reading the data. This can make compaction much faster when large consecutive chunks of data are filtered out. See the changes in include/rocksdb/compaction_filter.h for the new API.

Along the way this diff also adds ability for compaction filter changing merge operands, similar to how it can change values; we're not going to use this feature, it just seemed easier and cleaner to implement it than to document that it's not implemented :)

The diff is not as big as it may seem, about half of the lines are a test.
Closes https://github.com/facebook/rocksdb/pull/1599

Differential Revision: D4252092

Pulled By: al13n321

fbshipit-source-id: 41e1e48
2016-12-01 07:09:15 -08:00
Siying Dong cd7c4143d7 Improve Write Stalling System
Summary:
Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before.

Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system.
Closes https://github.com/facebook/rocksdb/pull/1562

Differential Revision: D4218519

Pulled By: siying

fbshipit-source-id: 95e4088
2016-11-23 09:24:15 -08:00
Islam AbdelRahman c1038d2837 Release RocksDB 5.0
Summary:
Update HISTORY.md and version.h
Closes https://github.com/facebook/rocksdb/pull/1536

Differential Revision: D4202987

Pulled By: IslamAbdelRahman

fbshipit-source-id: 94985e3
2016-11-17 18:39:15 -08:00
Yi Wu 36e4762ce0 Remove Ticker::SEQUENCE_NUMBER
Summary:
Remove the ticker count because:
* Having to reset the ticker count in WriteImpl is ineffiecent;
* It doesn't make sense to have it as a ticker count if multiple db
  instance share a statistics object.
Closes https://github.com/facebook/rocksdb/pull/1531

Differential Revision: D4194442

Pulled By: yiwu-arbug

fbshipit-source-id: e2110a9
2016-11-16 22:39:09 -08:00
Andrew Kryczka 0765babe15 Remove LATEST_BACKUP file
Summary:
This has been unused since D42069 but kept around for backward
compatibility. I think it is unlikely anyone will use a much older version of
RocksDB for restore than they use for backup, so I propose removing it. It is
also causing recurring confusion, e.g., https://www.facebook.com/groups/rocksdb.dev/permalink/980454015386446/

Ported from https://reviews.facebook.net/D60735
Closes https://github.com/facebook/rocksdb/pull/1529

Differential Revision: D4194199

Pulled By: ajkr

fbshipit-source-id: 82f9bf4
2016-11-16 17:24:15 -08:00
Yueh-Hsuan Chiang 647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua.  With this ability, users can
dynamically change compaction filter logic without requiring building
the rocksdb binary and restarting the database.

To compile, WITH_LUA_PATH must be specified to the base directory
of lua.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Siying Dong 972e3ff295 Enable allow_concurrent_memtable_write and enable_write_thread_adaptive_yield by default
Summary: Closes https://github.com/facebook/rocksdb/pull/1496

Differential Revision: D4168080

Pulled By: siying

fbshipit-source-id: 056ae62
2016-11-16 09:39:09 -08:00
Andrew Kryczka 489d142808 DeleteRange interface
Summary:
Expose DeleteRange() interface since we think the implementation is functionally correct now.
Closes https://github.com/facebook/rocksdb/pull/1503

Differential Revision: D4171921

Pulled By: ajkr

fbshipit-source-id: 5e21c98
2016-11-15 15:24:16 -08:00
Artemiy Kolesnikov 91300d01f6 Dynamic max_total_wal_size option
Summary: Closes https://github.com/facebook/rocksdb/pull/1509

Differential Revision: D4176426

Pulled By: yiwu-arbug

fbshipit-source-id: b57689d
2016-11-14 22:54:17 -08:00
Yi Wu 1ea79a78c9 Optimize sequential insert into memtable - Part 1: Interface
Summary:
Currently our skip-list have an optimization to speedup sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering wihtin each stream.

This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow underlying
  implementation to return a hint of insert position, which can be later
  pass back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
  via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419

Differential Revision: D4079367

Pulled By: yiwu-arbug

fbshipit-source-id: 3555326
2016-11-13 19:09:18 -08:00
Lijun Tang adb665e0bf Allowed delayed_write_rate option to be dynamically set.
Summary: Closes https://github.com/facebook/rocksdb/pull/1488

Differential Revision: D4157784

Pulled By: siying

fbshipit-source-id: f150081
2016-11-12 15:54:11 -08:00
Yi Wu 437942e481 Add avoid_flush_during_shutdown DB option
Summary:
Add avoid_flush_during_shutdown DB option.
Closes https://github.com/facebook/rocksdb/pull/1451

Differential Revision: D4108643

Pulled By: yiwu-arbug

fbshipit-source-id: abdaf4d
2016-11-02 15:39:18 -07:00
Benoit Girard 2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Yueh-Hsuan Chiang ab53998372 Bump RocksDB version to 4.13 (#1405)
Summary:
Bump RocksDB version to 4.13

Test Plan:
unit tests

Reviewers: sdong, IslamAbdelRahman, andrewkr, lightmark

Subscribers: leveldb
2016-10-18 15:39:10 -07:00
Aaron Gao d88dff4ef2 add seeforprev in history
Summary: update new feature in history and avoid breaking mongorocks

Test Plan: make check

Reviewers: sdong, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64611
2016-10-17 15:34:13 -07:00
Yi Wu e29d3b67c2 Make max_background_compactions and base_background_compactions dynamic changeable
Summary:
Add DB::SetDBOptions to dynamic change max_background_compactions and base_background_compactions.
I'll add more dynamic changeable options soon.

Test Plan: unit test.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64749
2016-10-14 12:25:39 -07:00
Yi Wu d6ae6dec69 Add Statistics::getAndResetTickerCount().
Summary: A convience method to atomically get and reset ticker count. I'm wanting to use it to have a thin wrapper to the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303).

Test Plan:
test in LogDevice shadow cluster.
https://fburl.com/461868822

Reviewers: andrewkr, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64869
2016-10-11 10:54:11 -07:00
Yi Wu 17f76fc564 DB::GetOptions() reflect dynamic changed options
Summary: DB::GetOptions() reflect dynamic changed options.

Test Plan: See the new unit test.

Reviewers: yhchiang, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63903
2016-09-14 22:10:28 -07:00
Islam AbdelRahman eb1d4d53c8 Release RocksDB 4.12
Summary: Release 4.12

Test Plan: none

Reviewers: andrewkr, yiwu, lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63885
2016-09-12 12:34:08 -07:00
Andrew Kryczka de28a25533 Update HISTORY.md for thread-local stats
Summary: as titled

Test Plan: doitlive

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63777
2016-09-08 14:02:19 -07:00
sdong 607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285) but later its reference in /include/rocksdb/table.h is removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
sdong 32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Justin T. Gibbs 23a057007c Document memtable flush behavior in CancelAllBackgroundWork()
Summary:
Update History.md to reflect recent change that ensures unpersisted data
is flushed even if clients call CancelAllBackgroundWork() directly.

Test Plan: Review rendering of markdown.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62703
2016-08-26 10:14:40 -07:00
Yi Wu 4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
sdong 638c49f24f Change HISTORY.md for release 4.11
Summary:
Need to change HISTORY.md for 4.11.
4.10 was not updated either. Update it together.

Test Plan: Not needed.

Reviewers: kradhakrishnan, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61665
2016-08-10 10:44:24 -07:00
Wanning Jiang c38b075e7e Update HISTORY.md 2016-08-09 21:12:45 -07:00
Wanning Jiang 8f399e3fee Update HISTORY.md 2016-08-09 20:57:40 -07:00
sdong 7c4615cf1f A utility function to help users migrate DB after options change
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.

Test Plan: Add unit tests for it.

Reviewers: andrewkr, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60783
2016-08-05 15:39:55 -07:00
sdong e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
sdong 32df9733d1 Add options.write_buffer_manager: control total memtable size across DB instances
Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.

Test Plan: Add a new unit test.

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59925
2016-07-05 18:11:25 -07:00
Lucas Qiu 38fae9e65d fix typos in HISTORY.md (#1192) 2016-07-04 22:58:35 -07:00
Islam AbdelRahman 9b5adea97f Add More Logging to track total_log_size
Summary: We saw instances where total_log_size is off the real value, but I'm not able to reproduce it. Add more logging to help debugging when it happens again.

Test Plan: Run the unit test and see the logging.

Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60081
2016-07-02 01:56:57 -07:00
Andrew Kryczka 56ac686292 Detect column family from properties [CF + RepairDB part 2/3]
Summary:
This diff uses the CF ID and CF name properties in the SST file
to associate recovered data with the proper column family. Depends on D59775.

- In ScanTable(), create column families in VersionSet each time a new one is discovered (via reading SST file properties)
- In ConvertLogToTable(), dump an SST file for every column family with data in the WAL
- In AddTables(), make a VersionEdit per-column family that adds all of that CF's tables

Test Plan:
  $ ./repair_test

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59781
2016-06-24 13:12:13 -07:00
omegaga c4e19b77e8 Add a read option to enable background purge when cleaning up iterators
Summary:
Add a read option `background_purge_on_iterator_cleanup` to avoid deleting files in foreground when destroying iterators.
Instead, a job is scheduled in high priority queue and would be executed in a separate background thread.

Test Plan: Add a variant of PurgeObsoleteFileTest. Turn on background purge option in the new test, and use sleeping task to ensure files are deleted in background.

Reviewers: IslamAbdelRahman, sdong

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59499
2016-06-21 18:41:23 -07:00
sdong 7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
sdong 0babce57f7 Move away from enum char value -1
Summary: char is not signed in some platforms. Having negative values confuse those compilers.

Test Plan: Run all existing tests.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59619
2016-06-14 17:07:34 -07:00
Yi Wu bc8af90e8c add option to not flush memtable on open()
Summary:
Add option to not flush memtable on open()
In case the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST.
Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush final memtable.
If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encounter corruption. Instead, check if seq id of next log file is last_log_sequence + 1. In that case we continue recovery.

Test Plan: See unit test.

Reviewers: dhruba, horuff, sdong

Reviewed By: sdong

Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57813
2016-06-13 11:34:16 -07:00
sdong 20699df843 memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio

Test Plan: Run all existing tests

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59199
2016-06-10 12:12:10 -07:00
Aaron Gao 53a4bd8a69 duplicate line
Summary: duplicated line in history.md

Test Plan: n/a

Reviewers: yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59433
2016-06-09 17:30:52 -07:00
Aaron Gao 3e86869616 release 4.9 update version and history
Summary: update version and history

Test Plan: N/A

Reviewers: yiwu, sdong, IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59421
2016-06-09 17:26:17 -07:00
sdong b2973eaaeb Remove options builder
Summary: AFIK, options builder is not used by anyone. Remove it.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, andrewkr, igor

Reviewed By: igor

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59319
2016-06-09 16:56:17 -07:00
sdong 1d725ca51d Deprecate BlockBasedTableOptions.hash_index_allow_collision=false.
Summary: Deprecate this one option and delete code and tests that are now superfluous.

Test Plan: all tests pass

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: msalib, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55317
2016-05-20 17:52:27 -07:00
Evan Shaw 7383b64b3a Fix formatting of HISTORY.md (#1126)
A couple "New Features" headers needed a blank line before them.
2016-05-19 10:56:12 -07:00
Islam AbdelRahman 4b31723433 Add bottommost_compression option
Summary:
Add a new option that can be used to set a specific compression algorithm for bottommost level.
This option will only affect levels larger than base level.

I have also updated CompactionJobInfo to include the compression algorithm used in compaction

Test Plan:
added new unittest
existing unittests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: lightmark, andrewkr, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D57669
2016-05-09 15:57:19 -07:00
Yi Wu 24a24f013d Enable configurable readahead for iterators
Summary:
Add an option `iterator_readahead_size` to `ReadOptions` to enable
configurable readahead for iterators similar to the corresponding
option for compaction.

Test Plan:
```
make commit_prereq
```

Reviewers: kumar.rangarajan, ott, igor, sdong

Reviewed By: sdong

Subscribers: yiwu, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55419
2016-05-04 15:25:58 -07:00
Yi Wu 1b166928c7 Release RocksDB 4.8.0
Summary: Release RocksDB 4.8.0

Test Plan: N/A

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57525
2016-05-02 14:38:04 -07:00
Yi Wu a92049e3e7 Added EventListener::OnTableFileCreationStarted() callback
Summary: Added EventListener::OnTableFileCreationStarted. EventListener::OnTableFileCreated will be called on failure case. User can check creation status via TableFileCreationInfo::status.

Test Plan: unit test.

Reviewers: dhruba, yhchiang, ott, sdong

Reviewed By: sdong

Subscribers: sdong, kradhakrishnan, IslamAbdelRahman, andrewkr, yhchiang, leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D56337
2016-04-29 11:35:00 -07:00
Andrew Kryczka 843d2e3137 Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.

During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.

On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.

Test Plan: new unit test

Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong

Reviewed By: sdong

Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52287
2016-04-27 17:36:03 -07:00
Andrew Kryczka 7a6045a3c7 fix typo in HISTORY.md 2016-04-20 18:53:37 -07:00
Andrew Kryczka 73a847ef89 Add per-level compression ratio property
Summary:
This is needed so we can measure compression ratio improvements
achieved by D52287.

The property compares raw data size against the total file size for a given
level. If the level is empty it should return 0.0.

Test Plan: new unit test

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56967
2016-04-20 18:46:54 -07:00
Andrew Kryczka 40b840f294 Delete deprecated *BackupableDB interface for backups
Summary:
This interface is redundant and has been deprecated for a while.
It's also unused internally. Let's delete it.

I moved the comments to the corresponding functions in BackupEngine/
BackupEngineReadOnly. This caused the diff tool to not work cleanly.

Test Plan:
unit tests

  $ ./backupable_db_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56331
2016-04-18 09:04:14 -07:00
sdong 4b6833aec1 Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too.
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.

Test Plan: Try db_bench and see the stats are printed out.

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56769
2016-04-15 10:22:18 -07:00
Islam AbdelRahman 1aeca97337 Release RocksDB 4.7
Summary: Bump the version and update HISTORY.md

Test Plan: none

Reviewers: yhchiang, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56469
2016-04-13 15:03:08 -07:00
sdong 2feafa3db9 Change some RocksDB default options
Summary: Change some RocksDB default options to make it more friendly to server workloads.

Test Plan: Run all existing tests

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55941
2016-03-31 17:12:18 -07:00
Andrew Kryczka 0267655dad Update change log for 4.6 release
Summary: as titled

Test Plan: N/A

Reviewers: sdong, kradhakrishnan, anthony, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55323
2016-03-12 13:51:57 -08:00
Islam AbdelRahman 580fede347 Aggregate hot Iterator counters in LocalStatistics (DBIter::Next perf regression)
Summary:
This patch bump the counters in the frequent code path DBIter::Next() / DBIter::Prev() in a local data members and send them to Statistics when the iterator is destroyed
A better solution will be to have thread_local implementation for Statistics

New performance
```
readseq      :       0.035 micros/op 28597881 ops/sec; 3163.7 MB/s
     1,851,568,819      stalled-cycles-frontend   #   31.29% frontend cycles idle    [49.86%]
       884,929,823      stalled-cycles-backend    #   14.95% backend  cycles idle    [50.21%]
readreverse  :       0.071 micros/op 14077393 ops/sec; 1557.3 MB/s
     3,239,575,993      stalled-cycles-frontend   #   27.36% frontend cycles idle    [49.96%]
     1,558,253,983      stalled-cycles-backend    #   13.16% backend  cycles idle    [50.14%]

```

Existing performance

```
readreverse  :       0.174 micros/op 5732342 ops/sec;  634.1 MB/s
    20,570,209,389      stalled-cycles-frontend   #   70.71% frontend cycles idle    [50.01%]
    18,422,816,837      stalled-cycles-backend    #   63.33% backend  cycles idle    [50.04%]

readseq      :       0.119 micros/op 8400537 ops/sec;  929.3 MB/s
    15,634,225,844      stalled-cycles-frontend   #   79.07% frontend cycles idle    [49.96%]
    14,227,427,453      stalled-cycles-backend    #   71.95% backend  cycles idle    [50.09%]
```

Test Plan: unit tests

Reviewers: yhchiang, sdong, igor

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55107
2016-03-11 19:01:12 -08:00