Commit Graph

6839 Commits

Author SHA1 Message Date
Huachao Huang ab43ff58b5 Delete files in multiple ranges at once
Summary:
Using `DeleteFilesInRange` to delete files in a lot of ranges can be slow, because
`VersionSet::LogAndApply` is expensive.

This PR adds a new `DeleteFilesInRange` function to delete files in multiple
ranges at once.

Close https://github.com/facebook/rocksdb/issues/2951
Closes https://github.com/facebook/rocksdb/pull/3431

Differential Revision: D6849228

Pulled By: ajkr

fbshipit-source-id: daeedcabd8def4b1d9ee95a58266dee77b5d68cb
2018-01-30 13:56:39 -08:00
Fosco Marotto 77dc069eb9 Change size_t cast in table_test
Summary:
Fixes this build error on master (macOS):

```
table/table_test.cc:972:27: error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to
      'unsigned int' [-Werror,-Wshorten-64-to-32]
```
Closes https://github.com/facebook/rocksdb/pull/3434

Reviewed By: maysamyabandeh

Differential Revision: D6840354

Pulled By: gfosco

fbshipit-source-id: fffac6aefbbdd134ce1299453c5590aa855a5fc8
2018-01-30 11:12:51 -08:00
Andrew Kryczka f3fe6f883b fix for checkpoint directory with trailing slash(es)
Summary:
previously if `checkpoint_dir` contained a trailing slash, we'd attempt to create the `.tmp` directory under `checkpoint_dir` due to simply concatenating `checkpoint_dir + ".tmp"`. This failed because `checkpoint_dir` hadn't been created yet and our directory creation is non-recursive. This PR fixes the issue by always creating the `.tmp` directory in the same parent as `checkpoint_dir` by stripping trailing slashes before concatenating.
Closes https://github.com/facebook/rocksdb/pull/3275

Differential Revision: D6574952

Pulled By: ajkr

fbshipit-source-id: a6daa6777a901eac2460cd0140c9515f7241aefc
2018-01-29 21:11:42 -08:00
Yi Wu 4bdf06e78f Fix DBFlushTest::ManualFlushWithMinWriteBufferNumberToMerge dead lock
Summary:
In the test, there can be a dead lock between background flush thread and foreground main thread as following:
* background flush thread:
  - holding db mutex, while
  - waiting on "DBImpl::FlushMemTableToOutputFile:BeforeInstallSV" sync point.
* foreground thread:
  - waiting for db mutex to write "key2"

Fixing by let background flush thread wait without holding db mutex.
Closes https://github.com/facebook/rocksdb/pull/3436

Differential Revision: D6841334

Pulled By: yiwu-arbug

fbshipit-source-id: b020768ac94e166e40953c5d09e505515a5f244d
2018-01-29 18:56:47 -08:00
Maysam Yabandeh 3073b1c573 Split SnapshotConcurrentAccessTest into 20 sub tests
Summary:
SnapshotConcurrentAccessTest sometimes times out when running on the test infra. This patch splits the test into smaller sub-tests to avoid the timeout. It also benefits from lower run-time of each sub-test and increases the coverage of the test. The overall run-time of each final sub-test is at most half of the original test so we should no longer see a timeout.
Closes https://github.com/facebook/rocksdb/pull/3435

Differential Revision: D6839427

Pulled By: maysamyabandeh

fbshipit-source-id: d53fdb157109e2438ca7fe447d0cf4b71f304bd8
2018-01-29 17:12:55 -08:00
Sagar Vemuri e6605e5302 Tests for dynamic universal compaction options
Summary:
Added a test for three dynamic universal compaction options, in the realm of read amplification:
- size_ratio
- min_merge_width
- max_merge_width

Also updated DynamicUniversalCompactionSizeAmplification by adding a check on compaction reason.
Found a bug in compaction reason setting while working on this PR, and fixed in #3412 .

TODO for later: Still to add tests for these options: compression_size_percent, stop_style and trivial_move.
Closes https://github.com/facebook/rocksdb/pull/3419

Differential Revision: D6822217

Pulled By: sagar0

fbshipit-source-id: 074573fca6389053cbac229891a0163f38bb56c4
2018-01-29 16:42:45 -08:00
Zhongyi Xie 3fe0937180 Use block cache to track memory usage when ReadOptions.fill_cache=false
Summary:
ReadOptions.fill_cache is set in compaction inputs and can be set by users in their queries too. It tells RocksDB not to put a data block used to block cache.

The memory used by the data block is, however, not trackable by users.

To make the system more manageable, we can cost the block to block cache while using it, and then release it after using.
Closes https://github.com/facebook/rocksdb/pull/3333

Differential Revision: D6670230

Pulled By: miasantreble

fbshipit-source-id: ab848d3ed286bd081a13ee1903de357b56cbc308
2018-01-29 14:43:10 -08:00
Siying Dong e2d4b0efb1 db_bench: sanity check CuckooTable with mmap_read option
Summary:
This is to avoid run time error. Fail the db_bench immediately if cuckoo table is used but mmap_read is not specified.
Closes https://github.com/facebook/rocksdb/pull/3420

Differential Revision: D6838284

Pulled By: siying

fbshipit-source-id: 20893fa28d40fadc31e4ff154bed02f5a1bad341
2018-01-29 14:27:32 -08:00
Mark Isaacson b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Andrew Kryczka 9f7ccc8445 fix db_bench filluniquerandom key count assertion
Summary:
It failed every time. I guess people usually ran with assertions disabled.
Closes https://github.com/facebook/rocksdb/pull/3422

Differential Revision: D6822984

Pulled By: ajkr

fbshipit-source-id: 2e90db75618b26ac1c46ddfa9e03c095c7bf16e3
2018-01-29 11:43:21 -08:00
Mamy Ratsimbazafy 3f666f79af Add Nim to the list of language bindings
Summary: Closes https://github.com/facebook/rocksdb/pull/3428

Differential Revision: D6834061

Pulled By: maysamyabandeh

fbshipit-source-id: edca5b5b8330e0fee646c7434b9631da76670240
2018-01-29 09:57:46 -08:00
Ben Darnell 65cd6cd4b6 Rewrite comments on use_fsync option
Summary:
This replaces a vague warning about the mostly-obsolete ext3 filesystem with
a more detailed note about a historical bug in the still-relevant ext4.

Fixes #3410
Closes https://github.com/facebook/rocksdb/pull/3421

Differential Revision: D6834881

Pulled By: siying

fbshipit-source-id: 7771ef5c89a54c0ac17821680779c48178d0b400
2018-01-29 09:57:46 -08:00
Maysam Yabandeh 46acdc9883 Split HarnessTest_Randomized to avoid timeout
Summary:
Split HarnessTest_Randomized to two tests
Closes https://github.com/facebook/rocksdb/pull/3424

Differential Revision: D6826006

Pulled By: maysamyabandeh

fbshipit-source-id: 59c9a11c7da092206effce6e4fa3792f9c66bef2
2018-01-29 07:41:44 -08:00
Yi Wu 439855a774 StackableDB optionally take shared ownership of the underlying DB
Summary:
Allow StackableDB optionally takes a shared_ptr on construction and thus hold shared ownership of the underlying DB.
Closes https://github.com/facebook/rocksdb/pull/3423

Differential Revision: D6824163

Pulled By: yiwu-arbug

fbshipit-source-id: dbdc30c42e007533a987ef413785e192340f03eb
2018-01-26 15:28:44 -08:00
Maysam Yabandeh 4927b4e662 Rounddown in FilePrefetchBuffer::Prefetch
Summary:
FilePrefetchBuffer::Prefetch is currently rounds the offset up which does not fit its new use cases in prefetching index/filter blocks, as it would skips over some the offsets that were requested to be prefetched. This patch rounds down instead.

Fixes #3180
Closes https://github.com/facebook/rocksdb/pull/3413

Differential Revision: D6816392

Pulled By: maysamyabandeh

fbshipit-source-id: 3aaeaf59c55d72b61dacfae6d4a8e65eccb3c553
2018-01-26 12:57:25 -08:00
Sagar Vemuri 7fcc1d0ddf Incorrect Universal Compaction reason
Summary:
While writing tests for dynamic Universal Compaction options, I found that the compaction reasons we set for size-ratio based and sorted-run based universal compactions are swapped with each other. Fixed it.
Closes https://github.com/facebook/rocksdb/pull/3412

Differential Revision: D6820540

Pulled By: sagar0

fbshipit-source-id: 270a188968ba25b2c96a8339904416c4c87ff5b3
2018-01-26 11:12:40 -08:00
Andrew Kryczka 0e6e405fec db_bench support for memtable in-place update
Summary: Closes https://github.com/facebook/rocksdb/pull/3416

Differential Revision: D6820606

Pulled By: ajkr

fbshipit-source-id: 5035ffb33ade8d50520cafeb685ee8c8fcf1cca8
2018-01-26 10:57:49 -08:00
Sagar Vemuri d938226af4 Improve performance of long range scans with readahead
Summary:
This change improves the performance of iterators doing long range scans (e.g. big/full table scans in MyRocks) by using readahead and prefetching additional data on each disk IO. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.

Constraints:
- The prefetched data is stored by the OS in page cache. So this currently works only for non direct-reads use-cases i.e applications which use page cache. (Direct-I/O support will be enabled in a later PR).
- This gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).

Thanks to siying for the original idea and implementation.

**Benchmarks:**
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```

Page cache was cleared before each experiment with the command:
```
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
```
```
Before:
seekrandom   :   34020.945 micros/op 29 ops/sec;   32.5 MB/s (1636 of 1999 found)
With this change:
seekrandom   :    8726.912 micros/op 114 ops/sec;  126.8 MB/s (5702 of 6999 found)
```
~3.9X performance improvement.

Also verified with strace and gdb that the readahead size is increasing as expected.
```
strace -e readahead -f -T -t -p <db_bench process pid>
```
Closes https://github.com/facebook/rocksdb/pull/3282

Differential Revision: D6586477

Pulled By: sagar0

fbshipit-source-id: 8a118a0ed4594fbb7f5b1cafb242d7a4033cb58c
2018-01-25 21:41:53 -08:00
Ben Darnell 65d431639b Update comments about default WALRecoveryMode
Summary:
The default changed in 6a14f7a976 but this comment was not updated.
Closes https://github.com/facebook/rocksdb/pull/3409

Differential Revision: D6808264

Pulled By: maysamyabandeh

fbshipit-source-id: 0d7e2a054eb181e9a144fcb783cf0b2c77219bc0
2018-01-25 18:12:08 -08:00
Siying Dong 1039133f2d BlockBasedTable::NewDataBlockIterator to always return BlockIter
Summary:
This is a pre-cleaning up before a major block based table iterator refactoring. BlockBasedTable::NewDataBlockIterator() will always return BlockIter. This simplifies the logic and code and enable further refactoring and optimization.
Closes https://github.com/facebook/rocksdb/pull/3398

Differential Revision: D6780165

Pulled By: siying

fbshipit-source-id: 273f7dc896724f682c0118fb69a359d9cc4418b4
2018-01-25 14:57:18 -08:00
Yi Wu c7226428dd WritePrepared Txn: Fix DBIterator and add test
Summary:
In DBIter, Prev() calls FindValueForCurrentKey() to search the current value backward. If it finds that there are too many stale value being skipped, it falls back to FindValueForCurrentKeyUsingSeek(), seeking directly to the key with snapshot sequence. After introducing read_callback, however, the key it seeks to might not be visible, according to read_callback. It thus needs to keep searching forward until the first visible value.
Closes https://github.com/facebook/rocksdb/pull/3382

Differential Revision: D6756148

Pulled By: yiwu-arbug

fbshipit-source-id: 064e39b1eec5e083af1c10142600f26d1d2697be
2018-01-23 16:57:11 -08:00
Kamalalochana Subbaiah d6fdd59c63 CMake changes for CRC32 Optimization on PowerPC
Summary: Closes https://github.com/facebook/rocksdb/pull/2869

Differential Revision: D6791359

Pulled By: ajkr

fbshipit-source-id: fdd38df603d84bbcce8d85dd1729d5caa256e6be
2018-01-23 16:57:11 -08:00
Yi Wu 35d8e65a04 Make Iterator::SeekForPrev pure virtual
Summary:
To prevent user who implement the Iterator interface fail to implement SeekForPrev by mistake.
Closes https://github.com/facebook/rocksdb/pull/3402

Differential Revision: D6790681

Pulled By: yiwu-arbug

fbshipit-source-id: bd75b8ced30208982e0a1414d34384d93496827a
2018-01-23 16:12:19 -08:00
Yi Wu d46e832e94 Assert last reference before destroy ColumnFamilyData
Summary:
In ColumnFamilySet destructor, assert it hold the last reference to cfd before destroy them.

Closes #3112
Closes https://github.com/facebook/rocksdb/pull/3397

Differential Revision: D6777967

Pulled By: yiwu-arbug

fbshipit-source-id: 60b19070e0c194b3b6146699140c1d68777866cb
2018-01-23 15:12:28 -08:00
Yi Wu edc258127e DB::DumpSupportInfo should log all supported compression types
Summary:
DB::DumpSupportInfo should log all supported compression types.
Closes #3146
Closes https://github.com/facebook/rocksdb/pull/3396

Differential Revision: D6777019

Pulled By: yiwu-arbug

fbshipit-source-id: 5b17f1ffb2d71224e52f7d9c045434746c789fb0
2018-01-23 14:44:12 -08:00
Nathan VanBenschoten ec0167eecb Fix WriteBatch rep_ format for RangeDeletion records
Summary:
This is a small amount of general cleanup I made while experimenting with https://github.com/facebook/rocksdb/issues/3391.
Closes https://github.com/facebook/rocksdb/pull/3392

Differential Revision: D6788365

Pulled By: yiwu-arbug

fbshipit-source-id: 2716e5aabd5424a4dfdaa954361a62c8eb721ae2
2018-01-23 12:57:32 -08:00
Sagar Vemuri 0ea7170d7d Remove old misleading comments
Summary:
FIFO and Universal compaction options were recently made dynamic, but I forgot to update these comments. These would mislead anyone who is reading the code.
Closes https://github.com/facebook/rocksdb/pull/3399

Differential Revision: D6786358

Pulled By: sagar0

fbshipit-source-id: 57cfc412f63deaee29bbd82b863304821d60057d
2018-01-23 10:27:41 -08:00
Yi Wu 7e3d3326ce Blob DB: dump blob_db_options.min_blob_size
Summary:
min_blob_size was missing from BlobDBOptions::Dump.
Closes https://github.com/facebook/rocksdb/pull/3400

Differential Revision: D6781525

Pulled By: yiwu-arbug

fbshipit-source-id: 40d9b391578d7f8c91bd89f4ce2eda5064864c25
2018-01-22 22:41:27 -08:00
Siying Dong 7291a3f813 Improve fallocate size in compaction output
Summary:
Now in leveled compaction, we allocate solely based on output target file size. If the total input size is smaller than the number, we should use the total input size instead. Also, cap the allocate size to 1GB.
Closes https://github.com/facebook/rocksdb/pull/3385

Differential Revision: D6762363

Pulled By: siying

fbshipit-source-id: e30906f6e9bff3ec847d2166e44cb49c92f98a13
2018-01-22 16:43:46 -08:00
Islam AbdelRahman c615689bb5 Support skipping bloom filters for SstFileWriter
Summary:
Add an option for SstFileWriter to skip building bloom filters
Closes https://github.com/facebook/rocksdb/pull/3360

Differential Revision: D6709120

Pulled By: IslamAbdelRahman

fbshipit-source-id: 964d4bce38822a048691792f447bcfbb4b6bd809
2018-01-22 14:42:18 -08:00
Bernard Spil 6f5ba0bf5b Fix building on FreeBSD
Summary:
FreeBSD uses jemalloc as the base malloc implementation.
The patch has been functional on FreeBSD as of the MariaDB 10.2 port.
Closes https://github.com/facebook/rocksdb/pull/3386

Differential Revision: D6765742

Pulled By: yiwu-arbug

fbshipit-source-id: d55dbc082eecf640ef3df9a21f26064ebe6587e8
2018-01-19 17:12:43 -08:00
Yi Wu f2f034ef3b Blob DB: fix crash when DB full but no candidate file to evict
Summary:
When blob_files is empty, std::min_element will return blobfiles.end(), which cannot be dereference. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3387

Differential Revision: D6764927

Pulled By: yiwu-arbug

fbshipit-source-id: 86f78700132be95760d35ac63480dfd3a8bbe17a
2018-01-19 16:26:50 -08:00
Yi Wu 5568aec421 Fix DBTest::SoftLimit TSAN failure
Summary:
Fix data race found by TSAN around WriteStallListener: https://gist.github.com/yiwu-arbug/027d2448b903648f2f0f40b05258d80f
Closes https://github.com/facebook/rocksdb/pull/3384

Differential Revision: D6762167

Pulled By: yiwu-arbug

fbshipit-source-id: cd3a5c9f806de390bd1af6077ea6dbbc8bcaec09
2018-01-19 12:57:15 -08:00
Siying Dong 47ad6b81ff Add 5.10.fb to tools/check_format_compatible.sh
Summary: Closes https://github.com/facebook/rocksdb/pull/3383

Differential Revision: D6762375

Pulled By: siying

fbshipit-source-id: dc1e0dc9718ffb59ffe42e2a2c844b67f935a5fb
2018-01-19 12:42:07 -08:00
Yi Wu f1cb83fcf4 Fix Flush() keep waiting after flush finish
Summary:
Flush() call could be waiting indefinitely if min_write_buffer_number_to_merge is used. Consider the sequence:
1. User call Flush() with flush_options.wait = true
2. The manual flush started in the background
3. New memtable become immutable because of writes. The new memtable will not trigger flush if min_write_buffer_number_to_merge is not reached.
4. The manual flush finish.

Because of the new memtable created at step 3 not being flush, previous logic of WaitForFlushMemTable() keep waiting, despite the memtables it intent to flush has been flushed.

Here instead of checking if there are any more memtables to flush, WaitForFlushMemTable() also check the id of the earliest memtable. If the id is larger than that of latest memtable at the time flush was initiated, it means all the memtable at the time of flush start has all been flush.
Closes https://github.com/facebook/rocksdb/pull/3378

Differential Revision: D6746789

Pulled By: yiwu-arbug

fbshipit-source-id: 35e698f71c7f90b06337a93e6825f4ea3b619bfa
2018-01-18 17:45:16 -08:00
topilski b9873162f0 Fixed get version on windows, moved throwing exceptions into cc file.
Summary:
Fixes for msys2 and mingw, hide exceptions into cpp  file.
Closes https://github.com/facebook/rocksdb/pull/3377

Differential Revision: D6746707

Pulled By: yiwu-arbug

fbshipit-source-id: 456b38df80bc48b8386a2cf87f669b5a4f9999a4
2018-01-18 14:56:56 -08:00
jonasf 4decff6fa8 Add possibility to change ttl on open DB
Summary:
We have seen cases where it could be good to change TTL on already open DB.
Change ttl in TtlCompactionFilterFactory on open db.
Next time a filter is created, it will filter accroding to the set TTL.

Is this something that could be useful for others?
Any downsides?
Closes https://github.com/facebook/rocksdb/pull/3292

Differential Revision: D6731993

Pulled By: miasantreble

fbshipit-source-id: 73b94d69237b11e8730734389052429d621a6b1e
2018-01-18 10:42:15 -08:00
Andrew Kryczka 46e599fc6b fix live WALs purged while file deletions disabled
Summary:
When calling `DisableFileDeletions` followed by `GetSortedWalFiles`, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However, `GetSortedWalFiles` didn't omit files already planned for deletion via `PurgeObsoleteFiles`, so the guarantee could be broken.

We fix it by making `GetSortedWalFiles` wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met since `PurgeObsoleteFiles` is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met, `GetSortedWalFiles` simply returns the content of DB and archive directories, which nobody can delete (except for deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled.
Closes https://github.com/facebook/rocksdb/pull/3341

Differential Revision: D6681131

Pulled By: ajkr

fbshipit-source-id: 90b1e2f2362ea9ef715623841c0826611a817634
2018-01-17 17:42:04 -08:00
Andrew Kryczka 266d85fbec fix DBTest.AutomaticConflictsWithManualCompaction
Summary:
After af92d4ad11, only exclusive manual compaction can have conflict. dc360df81e updated the conflict-checking test case accordingly. But we missed the point that exclusive manual compaction can only conflict with automatic compactions scheduled after it, since it waits on pending automatic compactions before it begins running.

This PR updates the test case to ensure the automatic compactions are scheduled after the manual compaction starts but before it finishes, thus ensuring a conflict. I also cleaned up the test case to use less space as I saw it cause out-of-space error on travis.
Closes https://github.com/facebook/rocksdb/pull/3375

Differential Revision: D6735162

Pulled By: ajkr

fbshipit-source-id: 020530a4e150a4786792dce7cec5d66b420cb884
2018-01-16 23:12:00 -08:00
Yi Wu dc360df81e Fix multiple build failures
Summary:
* Fix DBTest.CompactRangeWithEmptyBottomLevel lite build failure
* Fix DBTest.AutomaticConflictsWithManualCompaction failure introduce by #3366
* Fix BlockBasedTableTest::IndexUncompressed should be disabled if snappy is disabled
* Fix ASAN failure with DBBasicTest::DBClose test
Closes https://github.com/facebook/rocksdb/pull/3373

Differential Revision: D6732313

Pulled By: yiwu-arbug

fbshipit-source-id: 1eb9b9d9a8d795f56188fa9770db9353f6fdedc5
2018-01-16 17:30:39 -08:00
Bartek Wrona bf6f03f3cd Issue #3370 Broken CMakeLists.txt
Summary:
Issue #3370 Simple fixes to make RocksDB project working also as a submodule of other bigger one.
Closes https://github.com/facebook/rocksdb/pull/3372

Differential Revision: D6729595

Pulled By: ajkr

fbshipit-source-id: eee2589e7a7c4322873dff8510eebd050301c54c
2018-01-16 14:26:50 -08:00
Sunguck Lee af92d4ad11 Avoid too frequent MaybeScheduleFlushOrCompaction() call
Summary:
If there's manual compaction in the queue, then "HaveManualCompaction(compaction_queue_.front())" will return true, and this cause too frequent MaybeScheduleFlushOrCompaction().

https://github.com/facebook/rocksdb/issues/3198
Closes https://github.com/facebook/rocksdb/pull/3366

Differential Revision: D6729575

Pulled By: ajkr

fbshipit-source-id: 96da04f8fd33297b1ccaec3badd9090403da29b0
2018-01-16 13:12:12 -08:00
Anand Ananthabhotla d0f1b49ab6 Add a Close() method to DB to return status when closing a db
Summary:
Currently, the only way to close an open DB is to destroy the DB
object. There is no way for the caller to know the status. In one
instance, the destructor encountered an error due to failure to
close a log file on HDFS. In order to prevent silent failures, we add
DB::Close() that calls CloseImpl() which must be implemented by its
descendants.
The main failure point in the destructor is closing the log file. This
patch also adds a Close() entry point to Logger in order to get status.
When DBOptions::info_log is allocated and owned by the DBImpl, it is
explicitly closed by DBImpl::CloseImpl().
Closes https://github.com/facebook/rocksdb/pull/3348

Differential Revision: D6698158

Pulled By: anand1976

fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d
2018-01-16 11:08:57 -08:00
Sagar Vemuri 68829ed89c Revert Snappy version upgrade
Summary:
Java static builds are again broken, this time due Snappy version upgrade introduced in 90c1d81975 (#3331).

This is due to two reasons:
1. The new Snappy packages should now be downloaded from https://github.com/google/snappy/archive/<pkg.tar.gz> instead of https://github.com/google/snappy/releases/download/<pkg.tar.gz> which we are using now.
1. In addition to the the above URL change, Snappy changed its build from using autotools to CMake based : e69d9f8806/README.md (L65-L72)

So more changes are needed if we are going to upgrade to 1.1.7. Hence reverting the version upgrade until we figure them out.
Closes https://github.com/facebook/rocksdb/pull/3363

Differential Revision: D6716983

Pulled By: sagar0

fbshipit-source-id: f451a1bc5eb0bb090f4da07bc3e5ba72cf89aefa
2018-01-12 23:41:43 -08:00
Andrew Kryczka 43549c7d59 Prevent unnecessary calls to PurgeObsoleteFiles
Summary:
Split `JobContext::HaveSomethingToDelete` into two functions: itself and `JobContext::HaveSomethingToClean`. Now we won't call `DBImpl::PurgeObsoleteFiles` in cases where we really just need to call `JobContext::Clean`. The change is needed because I want to track pending calls to `PurgeObsoleteFiles` for a bug fix, which is much simpler if we only call it after `FindObsoleteFiles` finds files to delete.
Closes https://github.com/facebook/rocksdb/pull/3350

Differential Revision: D6690609

Pulled By: ajkr

fbshipit-source-id: 61502e7469288afe16a663a1b7df345baeaf246f
2018-01-12 13:27:08 -08:00
Andrew Kryczka ba295cda29 replace DBTest.HugeNumbersOfLevel with a more targeted test case
Summary:
This test often causes out-of-space error when run on travis. We don't want such stress tests in our unit test suite.

The bug in #596, which this test intends to expose, can be repro'd as long as the bottommost level(s) are empty when CompactRange is called. I rewrote the test to cover this simple case without writing a lot of data.
Closes https://github.com/facebook/rocksdb/pull/3362

Differential Revision: D6710417

Pulled By: ajkr

fbshipit-source-id: 9a1ec85e738c813ac2fee29f1d5302065ecb54c5
2018-01-12 11:12:09 -08:00
Sagar Vemuri e446d14093 Fix PowerPC dynamic java build
Summary:
Java build on PPC64le has been broken since a few months, due to #2716. Fixing it with the least amount of changes.
(We should cleanup a little around this code when time permits).

This should fix the build failures seen in http://140.211.168.68:8080/job/Rocksdb/ .
Closes https://github.com/facebook/rocksdb/pull/3359

Differential Revision: D6712938

Pulled By: sagar0

fbshipit-source-id: 3046e8f072180693de2af4762934ec1ace309ca4
2018-01-12 10:57:14 -08:00
Andrew Kryczka 6d7e3b9faf fix Gemfile.lock nokogiri dependencies
Summary:
I installed the ruby dependencies and ran `bundle update nokogiri`. It depends on a newer version of "mini_portile2" which I missed in 9c2f64e148. Now `bundle install` works again.
Closes https://github.com/facebook/rocksdb/pull/3361

Differential Revision: D6710164

Pulled By: ajkr

fbshipit-source-id: 9a08d6cc6400ef495b715b3d68b04ce3f3367031
2018-01-11 20:11:32 -08:00
Peter (Stig) Edwards 45828c7215 Consider an increase to buffer size when reading option file, from 4K to 8K.
Summary:
Hello and thank you for RocksDB,

While looking into the buffered io used when an `OPTIONS` file is read I noticed the `OPTIONS` files produced by RocksDB 5.8.8 (and head of master) were just over 4096 bytes in size, resulting in the version of glibc I am using (glibc-2.17-196.el7) (on the filesystem used) being passed a 4K buffer for the `fread_unlocked` call and 2 system call reads using a 4096 buffer being used to read the contents of the `OPTIONS` file.

  If the buffer size is increased to 8192 then 1 system call read is used to read the contents.

  As I think the buffer size is just used for reading `OPTIONS` files, and I thought it likely that `OPTIONS` files have increased in size (as more options are added), I thought I would suggest an increase.

[  If the comments from the top of the `OPTIONS` file are removed, and white space from the start of lines is removed then the size can be reduced to be under 4K, but as more options are added the size seems likely to grow again. ]

Create a new database:

```
> ./ldb --create_if_missing --db=/tmp/rdb_tmp put 1 1
OK
```

The OPTIONS file is 4252 bytes:

```
> stat /tmp/rdb_tmp/OPTIONS* | head -n 2
  File: ‘/tmp/rdb_tmp/OPTIONS-000005’
  Size: 4252            Blocks: 16         IO Block: 4096   regular file
```

Before, the 4096 byte buffer is used from 2 system read calls:

```
> strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
    grep -A 1 'RocksDB option file'
read(3, "# This is a RocksDB option file."..., 4096) = 4096
read(3, "e\n  metadata_block_size=4096\n  c"..., 4096) = 156
```

ltrace shows 4096 passed to fread_unlocked

```
> ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
    grep -C 3 'RocksDB option file'
[pid 51013] fread_unlocked(0x7ffd5fbf2d50, 1, 4096, 0x7fd2e084e780 <unfinished ...>
[pid 51013] fstat@SYS(3, 0x7ffd5fbf28f0)         = 0
[pid 51013] mmap@SYS(nil, 4096, 3, 34, -1, 0)    = 0x7fd2e318c000
[pid 51013] read@SYS(3, "# This is a RocksDB option file."..., 4096) = 4096
[pid 51013] <... fread_unlocked resumed> )       = 4096
...
```

After, the 8192 byte buffer is used from 1 system read call:

```
> strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -A 1 'RocksDB option file'
read(3, "# This is a RocksDB option file."..., 8192) = 4252
read(3, "", 4096)                       = 0
```

ltrace shows 8192 passed to fread_unlocked

```
> ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -C 3 'RocksDB option file'
[pid 146611] fread_unlocked(0x7ffcfba382f0, 1, 8192, 0x7fc4e844e780 <unfinished ...>
[pid 146611] fstat@SYS(3, 0x7ffcfba380f0)        = 0
[pid 146611] mmap@SYS(nil, 4096, 3, 34, -1, 0)   = 0x7fc4eaee0000
[pid 146611] read@SYS(3, "# This is a RocksDB option file."..., 8192) = 4252
[pid 146611] read@SYS(3, "", 4096)               = 0
[pid 146611] <... fread_unlocked resumed> )      = 4252
[pid 146611] feof(0x7fc4e844e780)                = 1
```
Closes https://github.com/facebook/rocksdb/pull/3294

Differential Revision: D6653684

Pulled By: ajkr

fbshipit-source-id: 222f25f5442fefe1dcec18c700bd9e235bb63491
2018-01-11 18:57:41 -08:00
Changli Gao 0a7ba0e548 Fix memleak when DB::DeleteFile()
Summary:
Because the corresponding read_first_record_cache_ item wasn't
erased, memory leaked.
Closes https://github.com/facebook/rocksdb/pull/1712

Differential Revision: D4363654

Pulled By: ajkr

fbshipit-source-id: 7da1adcfc8c380e4ffe05b8769fc2221ad17a225
2018-01-11 18:57:33 -08:00