Commit graph

2767 commits

Author SHA1 Message Date
Andrew Kryczka a99fb9928f fix column_family_test asan
Summary:
stop calling Close() at the end of tests holding a compaction pressure token since it causes the write controller to be deleted while it's still needed. these calls were pointless anyways since Close() is already called in the test's destructor.
Closes https://github.com/facebook/rocksdb/pull/2367

Differential Revision: D5125906

Pulled By: ajkr

fbshipit-source-id: 6cad8673e5546a82ff602ac0ba59cc3f68dbde46
2017-05-24 16:41:51 -07:00
Andrew Kryczka bb01c1880c Introduce max_background_jobs mutable option
Summary:
- `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
- `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
- `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
- The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
Closes https://github.com/facebook/rocksdb/pull/2205

Differential Revision: D4937461

Pulled By: ajkr

fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
2017-05-24 11:29:08 -07:00
Siying Dong 41cbb72749 options.delayed_write_rate use the rate of rate_limiter by default.
Summary:
It's hard for RocksDB to come up with a good default of delayed write rate. Use rate given by rate limiter if it is availalbe. This provides the I/O order of magnitude.
Closes https://github.com/facebook/rocksdb/pull/2357

Differential Revision: D5115324

Pulled By: siying

fbshipit-source-id: 341065ad2211c981fc804011c0f0e59a50c7e754
2017-05-24 09:58:24 -07:00
Igor Canadi 52d9e5f7b6 Fix column family seconds_up accounting
Summary:
`cf_stats_snapshot_.seconds_up` appears to be never updated, unlike `db_stats_snapshot_.seconds_up`, which is updated here: https://github.com/facebook/rocksdb/blob/master/db/internal_stats.cc#L883

This leads to wrong information in the log, for example:
```
** Compaction Stats [default] **

....

Uptime(secs): 85591.2 total, 85591.2 interval
```

Even though DB's interval is correctly logged as 60 seconds:
```
** DB Stats **
Uptime(secs): 85591.2 total, 637.8 interval
```
Closes https://github.com/facebook/rocksdb/pull/2338

Differential Revision: D5114131

Pulled By: sagar0

fbshipit-source-id: 85243a38213236ccbb601a7f7aaa8865eaa8083c
2017-05-23 17:14:04 -07:00
Sagar Vemuri 7d8207f1f2 Fix errors in clang-analyzer builds
Summary:
Fix build error in db_iter.cc when running clang-analyzer.
```
  CC       db/db_iter.o
db/db_iter.cc:938:21: error: no matching constructor for initialization of 'rocksdb::ParsedInternalKey'
  ParsedInternalKey ikey(Slice(), 0, 0);
                    ^    ~~~~~~~~~~~~~
./db/dbformat.h:84:3: note: candidate constructor not viable: no known conversion from 'int' to 'rocksdb::ValueType' for 3rd argument
  ParsedInternalKey(const Slice& u, const SequenceNumber& seq, ValueType t)
  ^
./db/dbformat.h:78:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct ParsedInternalKey {
       ^
./db/dbformat.h:78:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
./db/dbformat.h:83:3: note: candidate constructor not viable: requires 0 arguments, but 3 were provided
  ParsedInternalKey() { }  // Intentionally left uninitialized (for speed)
  ^
1 error generated.
```
Closes https://github.com/facebook/rocksdb/pull/2354

Differential Revision: D5115751

Pulled By: sagar0

fbshipit-source-id: b0e386d4e935e4725b07761c3ca5f7a8cbde3692
2017-05-23 15:11:42 -07:00
Andrew Kryczka 6cc9aef162 New API for background work in single thread pool
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.

Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204

Differential Revision: D4936256

Pulled By: ajkr

fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
2017-05-23 11:12:27 -07:00
Yi Wu 9d0a07ed52 Fix rocksdb.estimate-num-keys DB property underflow
Summary:
rocksdb.estimate-num-keys is compute from `estimate_num_keys - 2 * estimate_num_deletes`. If  `2 * estimate_num_deletes > estimate_num_keys` it will underflow. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2348

Differential Revision: D5109272

Pulled By: yiwu-arbug

fbshipit-source-id: e1bfb91346a59b7282a282b615002507e9d7c246
2017-05-23 10:42:59 -07:00
Aaron Gao 3e86c0f07c disable direct reads for log and manifest and add direct io to tests
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337

Differential Revision: D5100261

Pulled By: lightmark

fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
2017-05-22 18:41:28 -07:00
Sagar Vemuri 228f49d20a Fix data races caught by tsan
Summary:
This fixes the tsan build failures in:
- write_callback_test
- persistent_cache_test.*
Closes https://github.com/facebook/rocksdb/pull/2339

Differential Revision: D5101190

Pulled By: sagar0

fbshipit-source-id: 537e19ed05272b1f34cfbf793aa822b2264a1643
2017-05-22 10:27:23 -07:00
Yi Wu 07bdcb91fe New WriteImpl to pipeline WAL/memtable write
Summary:
PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allow to write at the same time. This thread will do both WAL and memtable writes for all write threads in the write group. Pending writers wait in queue until the current writer finishes. In the pipeline write approach, two queue is maintained: one WAL writer queue and one memtable writer queue. All writers (regardless of whether they need to write WAL) will still need to first join the WAL writer queue, and after the house keeping work and WAL writing, they will need to join memtable writer queue if needed. The benefit of this approach is that
1. Writers without memtable writes (e.g. the prepare phase of two phase commit) can exit write thread once WAL write is finish. They don't need to wait for memtable writes in case of group commit.
2. Pending writers only need to wait for previous WAL writer finish to be able to join the write thread, instead of wait also for previous memtable writes.

Merging #2056 and #2058 into this PR.
Closes https://github.com/facebook/rocksdb/pull/2286

Differential Revision: D5054606

Pulled By: yiwu-arbug

fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d
2017-05-19 14:26:42 -07:00
Yi Wu d746aead1a Suppress clang-analyzer false positive
Summary:
Fixing two types of clang-analyzer false positives:
* db is deleted and then reopen, and clang-analyzer thinks we are reusing the pointer after it has been deleted. Adding asserts to hint clang-analyzer the pointer is recreated.
* ParsedInternalKey is (intentionally) uninitialized. Initialize the struct only when clang-analyzer is running.
Closes https://github.com/facebook/rocksdb/pull/2334

Differential Revision: D5093801

Pulled By: yiwu-arbug

fbshipit-source-id: f51355382098eb3da5ab9f64e094c6d03e6bdf7d
2017-05-19 10:56:28 -07:00
Siying Dong 217b866f47 column_family_test: EnvCounter::num_new_writable_file_ to be atomic
Summary:
TSAN shows warning of data race of EnvCounter::num_new_writable_file_. Make it atomic.
Closes https://github.com/facebook/rocksdb/pull/2331

Differential Revision: D5089215

Pulled By: siying

fbshipit-source-id: 15f6dcfb770a3310cbb6337c22482c8b330daffc
2017-05-18 13:56:12 -07:00
yizhu.sun f5ba131bf8 Fixed some spelling mistakes
Summary: Closes https://github.com/facebook/rocksdb/pull/2314

Differential Revision: D5079601

Pulled By: sagar0

fbshipit-source-id: ae5696fd735718f544435c64c3179c49b8c04349
2017-05-17 23:12:36 -07:00
hyunwoo 0ebdd70579 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2312

Differential Revision: D5079631

Pulled By: sagar0

fbshipit-source-id: e4c8d1d89b244ee69e9dea1dd013227cc5241026
2017-05-17 16:41:49 -07:00
Mikhail Antonov ba685a472a Support ingest_behind for IngestExternalFile
Summary:
First cut for early review; there are few conceptual points to answer and some code structure issues.

For conceptual points -

 - restriction-wise, we're going to disallow ingest_behind if (use_seqno_zero_out=true || disable_auto_compaction=false), the user is responsible to properly open and close DB with required params
 - we wanted to ingest into reserved bottom most level. Should we fail fast if bottom level isn't empty, or should we attempt to ingest if file fits there key-ranges-wise?
 - Modifying AssignLevelForIngestedFile seems the place we we'd handle that.

On code structure - going to refactor GenerateAndAddExternalFile call in the test class to allow passing instance of IngestionOptions, that's just going to incur lots of changes at callsites.
Closes https://github.com/facebook/rocksdb/pull/2144

Differential Revision: D4873732

Pulled By: lightmark

fbshipit-source-id: 81cb698106b68ef8797f564453651d50900e153a
2017-05-17 11:42:42 -07:00
boolean5 cb9392a094 add Transactions and Checkpoint to C API
Summary:
I've added functions to the C API to support Transactions as requested in #1637 and to support Checkpoint.

I have also added the corresponding tests to c_test.c

For now, the following is omitted:

1. Optimistic Transactions
2. The column family variation of functions
Closes https://github.com/facebook/rocksdb/pull/2236

Differential Revision: D4989510

Pulled By: yiwu-arbug

fbshipit-source-id: 518cb39f76d5e9ec9690d633fcdc014b98958071
2017-05-16 22:59:43 -07:00
hyunwoo f720796e24 fixed typo
Summary:
fixed exisitng -> existing
Closes https://github.com/facebook/rocksdb/pull/2305

Differential Revision: D5070169

Pulled By: yiwu-arbug

fbshipit-source-id: 8c8450acf50757b767cf78b78314018395738d96
2017-05-16 11:07:58 -07:00
siddontang 1ca723dbd1 C API: support pinnable get
Summary: Closes https://github.com/facebook/rocksdb/pull/2254

Differential Revision: D5053590

Pulled By: yiwu-arbug

fbshipit-source-id: 2f365a031b3a2947b4fba21d26d4f8f52af9b9f0
2017-05-16 11:07:58 -07:00
赵星宇 4f9e69ccf4 fix log err
Summary: Closes https://github.com/facebook/rocksdb/pull/2206

Differential Revision: D5054222

Pulled By: yiwu-arbug

fbshipit-source-id: d8742bda1bf3e76d7b68eeb86df4608031b5cbc8
2017-05-15 16:15:38 -07:00
Yi Wu 3907c94ffb Fix ColumnFamilyTest:BulkAddDrop
Summary:
Fix ColumnFamilyTest:BulkAddDrop not deleted CF handles at the end, causing ASAN failure.
Closes https://github.com/facebook/rocksdb/pull/2275

Differential Revision: D5040724

Pulled By: yiwu-arbug

fbshipit-source-id: 86cd4070c944d01173a3cc36462bb800698af192
2017-05-10 23:05:44 -07:00
Anirban Rahut d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Aaron Gao 492fc49a86 fix readampbitmap tests
Summary:
fix test failure of ReadAmpBitmap and ReadAmpBitmapLiveInCacheAfterDBClose.
test ReadAmpBitmapLiveInCacheAfterDBClose individually and make check
Closes https://github.com/facebook/rocksdb/pull/2271

Differential Revision: D5038133

Pulled By: lightmark

fbshipit-source-id: 803cd6f45ccfdd14a9d9473c8af311033e164be8
2017-05-10 12:29:23 -07:00
Andrew Kryczka be421b0b16 portable sched_getcpu calls
Summary:
- added a feature test in build_detect_platform to check whether sched_getcpu() is available. glibc offers it only on some platforms (e.g., linux but not mac); this way should be easier than maintaining a list of platforms on which it's available.
- refactored PhysicalCoreID() to be simpler / less repetitive. ordered the conditional compilation clauses from most-to-least preferred
Closes https://github.com/facebook/rocksdb/pull/2272

Differential Revision: D5038093

Pulled By: ajkr

fbshipit-source-id: 81d7db3cc620250de220bdeb3194b2b3d7673de7
2017-05-10 12:29:23 -07:00
Aaron Gao 259a00eaca unbiase readamp bitmap
Summary:
Consider BlockReadAmpBitmap with bytes_per_bit = 32. Suppose bytes [a, b) were used, while bytes [a-32, a)
 and [b+1, b+33) weren't used; more formally, the union of ranges passed to BlockReadAmpBitmap::Mark() contains [a, b) and doesn't intersect with [a-32, a) and [b+1, b+33). Then bits [floor(a/32), ceil(b/32)] will be set, and so the number of useful bytes will be estimated as (ceil(b/32) - floor(a/32)) * 32, which is on average equal to b-a+31.

An extreme example: if we use 1 byte from each block, it'll be counted as 32 bytes from each block.

It's easy to remove this bias by slightly changing the semantics of the bitmap. Currently each bit represents a byte range [i*32, (i+1)*32).

This diff makes each bit represent a single byte: i*32 + X, where X is a random number in [0, 31] generated when bitmap is created. So, e.g., if you read a single byte at random, with probability 31/32 it won't be counted at all, and with probability 1/32 it will be counted as 32 bytes; so, on average it's counted as 1 byte.

*But there is one exception: the last bit will always set with the old way.*

(*) - assuming read_amp_bytes_per_bit = 32.
Closes https://github.com/facebook/rocksdb/pull/2259

Differential Revision: D5035652

Pulled By: lightmark

fbshipit-source-id: bd98b1b9b49fbe61f9e3781d07f624e3cbd92356
2017-05-10 01:49:52 -07:00
Islam AbdelRahman 4897eb250b dont skip IO for filter blocks
Summary:
Based on my experience with linkbench, We should not skip loading bloom filter blocks when they are not available in block cache when using Iterator::Seek

Actually I am not sure why this behavior existed in the first place
Closes https://github.com/facebook/rocksdb/pull/2255

Differential Revision: D5010721

Pulled By: maysamyabandeh

fbshipit-source-id: 0af545a06ac4baeecb248706ec34d009c2480ca4
2017-05-09 09:52:02 -07:00
Changjian Gao 3f73d54bbd Add C API to set max_file_opening_threads option
Summary:
Add `rocksdb_options_set_max_file_opening_threads()` API
Closes https://github.com/facebook/rocksdb/pull/2184

Differential Revision: D4923090

Pulled By: lightmark

fbshipit-source-id: c4ddce17733d999d426d02f7202b33a46ed6faed
2017-05-08 22:49:32 -07:00
Yi Wu 2cd00773c7 Add bulk create/drop column family API
Summary:
Adding DB::CreateColumnFamilie() and DB::DropColumnFamilies() to bulk create/drop column families. This is to address the problem creating/dropping 1k column families takes minutes. The bottleneck is we persist options files for every single column family create/drop, and it parses the persisted options file for verification, which take a lot CPU time.

The new APIs simply create/drop column families individually, and persist options file once at the end. This improves create 1k column families to within ~0.1s. Further improvement can be merge manifest write to one IO.
Closes https://github.com/facebook/rocksdb/pull/2248

Differential Revision: D5001578

Pulled By: yiwu-arbug

fbshipit-source-id: d4e00bda671451e0b314c13e12ad194b1704aa03
2017-05-07 23:20:46 -07:00
Tamir Duberstein fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Aaron Gao a30a696034 do not read next datablock if upperbound is reached
Summary:
Now if we have iterate_upper_bound set, we continue read until get a key >= upper_bound. For a lot of cases that neighboring data blocks have a user key gap between them, our index key will be a user key in the middle to get a shorter size. For example, if we have blocks:
[a b c d][f g h]
Then the index key for the first block will be 'e'.
then if upper bound is any key between 'd' and 'e', for example, d1, d2, ..., d99999999999, we don't have to read the second block and also know that we have done our iteration by reaching the last key that smaller the upper bound already.

This diff can reduce RA in most cases.
Closes https://github.com/facebook/rocksdb/pull/2239

Differential Revision: D4990693

Pulled By: lightmark

fbshipit-source-id: ab30ea2e3c6edf3fddd5efed3c34fcf7739827ff
2017-05-05 23:20:01 -07:00
Aaron Gao 2d42cf5ea9 Roundup read bytes in ReadaheadRandomAccessFile
Summary:
Fix alignment in ReadaheadRandomAccessFile
Closes https://github.com/facebook/rocksdb/pull/2253

Differential Revision: D5012336

Pulled By: lightmark

fbshipit-source-id: 10d2c829520cb787227ef653ef63d5d701725778
2017-05-05 12:14:14 -07:00
Siying Dong 264d3f540c Allow IntraL0 compaction in FIFO Compaction
Summary:
Allow an option for users to do some compaction in FIFO compaction, to pay some write amplification for fewer number of files.
Closes https://github.com/facebook/rocksdb/pull/2163

Differential Revision: D4895953

Pulled By: siying

fbshipit-source-id: a1ab608dd0627211f3e1f588a2e97159646e1231
2017-05-04 18:16:13 -07:00
Andrew Kryczka 8c3a180e83 Set lower-bound on dynamic level sizes
Summary:
Changed dynamic leveling to stop setting the base level's size bound below `max_bytes_for_level_base`.

Behavior for config where `max_bytes_for_level_base == level0_file_num_compaction_trigger * write_buffer_size` and same amount of data in L0 and base-level:

- Before #2027, compaction scoring would favor base-level due to dividing by size smaller than `max_bytes_for_level_base`.
- After #2027, L0 and Lbase get equal scores. The disadvantage is L0 is often compacted before reaching the num files trigger since `write_buffer_size` can be bigger than the dynamically chosen base-level size. This increases write-amp.
- After this diff, L0 and Lbase still get equal scores. Now it takes `level0_file_num_compaction_trigger` files of size `write_buffer_size` to trigger L0 compaction by size, fixing the write-amp problem above.
Closes https://github.com/facebook/rocksdb/pull/2123

Differential Revision: D4861570

Pulled By: ajkr

fbshipit-source-id: 467ddef56ed1f647c14d86bb018bcb044c39b964
2017-05-04 18:16:12 -07:00
Andrew Kryczka 7c1c8ce5ac Avoid calling fallocate with UINT64_MAX
Summary:
When user doesn't set a limit on compaction output file size, let's use the sum of the input files' sizes. This will avoid passing UINT64_MAX as fallocate()'s length. Reported in #2249.

Test setup:
- command: `TEST_TMPDIR=/data/rocksdb-test/ strace -e fallocate ./db_compaction_test --gtest_filter=DBCompactionTest.ManualCompactionUnknownOutputSize`
- filesystem: xfs

before this diff:
`fallocate(10, 01, 0, 1844674407370955160) = -1 ENOSPC (No space left on device)`

after this diff:
`fallocate(10, 01, 0, 1977)              = 0`
Closes https://github.com/facebook/rocksdb/pull/2252

Differential Revision: D5007275

Pulled By: ajkr

fbshipit-source-id: 4491404a6ae8a41328aede2e2d6f4d9ac3e38880
2017-05-04 17:43:22 -07:00
Leonidas Galanis a45e98a5b5 max_open_files dynamic set, follow up
Summary:
Followup to make 0x40000 a TableCache constant that indicates infinite capacity
Closes https://github.com/facebook/rocksdb/pull/2247

Differential Revision: D5001349

Pulled By: lgalanis

fbshipit-source-id: ce7bd2e54b0975bb9f8680fdaa0f8bb0e7ae81a2
2017-05-04 10:42:45 -07:00
Leonidas Galanis e7ae4a3a02 Max open files mutable
Summary:
Makes max_open_files db option dynamically set-able by SetDBOptions. During the call of SetDBOptions we call SetCapacity on the table cache, which is a LRUCache.
Closes https://github.com/facebook/rocksdb/pull/2185

Differential Revision: D4979189

Pulled By: yiwu-arbug

fbshipit-source-id: ca7e8dc5e3619c79434f579be4847c0f7e56afda
2017-05-03 21:13:14 -07:00
siddontang b551104e04 support PopSavePoint for WriteBatch
Summary:
Try to fix https://github.com/facebook/rocksdb/issues/1969
Closes https://github.com/facebook/rocksdb/pull/2170

Differential Revision: D4907333

Pulled By: yiwu-arbug

fbshipit-source-id: 417b420ff668e6c2fd0dad42a94c57385012edc5
2017-05-03 10:57:45 -07:00
Siying Dong af6fe69e4c Fix an issue of manual / auto compaction data race
Summary:
A data race between a manual and an auto compaction can cause a scheduled automatic compaction to be cancelled and never rescheduled again. This may cause a condition of hanging forever. Fix this by always making sure the cancelled compaction is put back to the compaction queue.
Closes https://github.com/facebook/rocksdb/pull/2238

Differential Revision: D4984591

Pulled By: siying

fbshipit-source-id: 3ab153886403c7b991896dcb2158b96cac12f227
2017-05-02 15:11:59 -07:00
Siying Dong aeaba07b2a Remove an assert that causes TSAN failure.
Summary:
ColumnFamilyData::ConstructNewMemtable is called out of DB mutex, and it asserts current_ is not empty, but current_ should only be accessed inside DB mutex. Remove this assert to make TSAN happy.
Closes https://github.com/facebook/rocksdb/pull/2235

Differential Revision: D4978531

Pulled By: siying

fbshipit-source-id: 423685a7dae88ed3faaa9e1b9ccb3427ac704a4b
2017-05-01 16:35:15 -07:00
Siying Dong d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Aaron Gao 0ca3ead0cb add GetRootDB() in DeleteFilesInRange
Summary:
In case users cast a subclass of db* into dbimpl*
Closes https://github.com/facebook/rocksdb/pull/2222

Differential Revision: D4964486

Pulled By: lightmark

fbshipit-source-id: 0ccdc08ee8e7a193dfbbe0218c3cbfd795662ca1
2017-04-27 14:33:17 -07:00
Dmitri Smirnov cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Siying Dong e15382c09c Disable two flaky tests
Summary: Closes https://github.com/facebook/rocksdb/pull/2217

Differential Revision: D4959351

Pulled By: siying

fbshipit-source-id: ce7c3a430bae0d15e06b3d5c958ebce969d08564
2017-04-26 17:13:46 -07:00
Andrew Kryczka efc361ef7d Add user stats Reset API
Summary:
It resets all the ticker and histogram stats to zero. Needed to change the locking a bit since Reset() is the only operation that manipulates multiple tickers/histograms together, and that operation should be seen as atomic by other operations that access tickers/histograms.
Closes https://github.com/facebook/rocksdb/pull/2213

Differential Revision: D4952232

Pulled By: ajkr

fbshipit-source-id: c0475c3e4c7b940120d53891b69c3091149a0679
2017-04-26 15:57:01 -07:00
Andrew Kryczka f6a27d0bce Extract statistics tests into separate file
Summary:
I'm going to add more DB tests for statistics as currently we have very few. I started a file dedicated to this purpose and moved the existing stats-specific tests there.
Closes https://github.com/facebook/rocksdb/pull/2211

Differential Revision: D4951558

Pulled By: ajkr

fbshipit-source-id: 05d11c35079c40ecabdfd2cf5556ccb761f694a4
2017-04-26 14:47:23 -07:00
Aaron Gao 7eddecce12 support bulk loading with universal compaction
Summary:
Support buck load with universal compaction.
More test cases to be added.
Closes https://github.com/facebook/rocksdb/pull/2202

Differential Revision: D4935360

Pulled By: lightmark

fbshipit-source-id: cc3ca1b6f42faa503207dab1408d6bcf393ee5b5
2017-04-26 13:41:32 -07:00
Aaron Gao 72c21fb3f2 call GetRootDB() before cast to DBImpl* in CancelAllBackgroundWork
Summary:
User could call this with wrapper class of DB or DBImpl
Closes https://github.com/facebook/rocksdb/pull/2200

Differential Revision: D4935530

Pulled By: lightmark

fbshipit-source-id: df9cb61d67d0f3bbcf62f714d77523a459a92883
2017-04-24 13:47:17 -07:00
Tomas Kolda 04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Aaron Gao cb885bccfe set compaction_iterator earliest_snapshot to max if no snapshot
Summary:
It is a potential bug that will be triggered if we ingest files before inserting the first key into an empty db.
0 is a special value reserved to indicate the concept of non-existence. But not good for seqno in this case because 0 is a valid seqno for ingestion(bulk loading)
Closes https://github.com/facebook/rocksdb/pull/2183

Differential Revision: D4919827

Pulled By: lightmark

fbshipit-source-id: 237eea40f88bd6487b66806109d90065dc02c362
2017-04-20 19:56:59 -07:00
Andrew Kryczka 1dd7760513 Change L0 compaction score using level size
Summary:
The goal is to avoid the problem of small number of L0 files triggering compaction to base level (which increased write-amp), while still allowing L0 compaction-by-size (so intra-L0 compactions cause score to increase).
Closes https://github.com/facebook/rocksdb/pull/2172

Differential Revision: D4908552

Pulled By: ajkr

fbshipit-source-id: 4b170142b2b368e24bd7948b2a6f24c69fabf73d
2017-04-19 12:00:01 -07:00
Yi Wu 966ebb02f5 Hide event listeners from lite build
Summary:
Fixing lite build failure introduce by #2169.
Closes https://github.com/facebook/rocksdb/pull/2174

Reviewed By: sagar0

Differential Revision: D4910619

Pulled By: yiwu-arbug

fbshipit-source-id: 5213b7b7431cc258688793c8c28153025588d8d9
2017-04-18 17:26:19 -07:00