Commit graph

1366 commits

Author SHA1 Message Date
Andrew Kryczka e2c6c06366 add TimedEnv
Summary:
I've needed Env timing measurements a few times now, so I finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073

Differential Revision: D4811231

Pulled By: ajkr

fbshipit-source-id: 218a249
2017-04-04 11:24:12 -07:00
Andrew Kryczka 4e0065015d make all DB::Get overloads virtual
Summary:
Some fbcode services override it, so we need to keep it virtual.
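A minimal sketch of why a convenience overload must stay virtual. It uses a hypothetical `Store` interface, not the real `DB` class. The overloads are given distinct names only so that C++ name hiding does not obscure the point. If the forwarding method is non-virtual, a subclass's replacement is silently bypassed when the object is called through a base-class pointer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an interface with a core lookup plus a
// convenience wrapper around it.
struct Store {
  virtual ~Store() = default;
  // Core lookup that implementations provide.
  virtual std::string GetPinned(const std::string& key) { return "base:" + key; }
  // Non-virtual wrapper: always forwards to the core lookup.
  std::string GetNonVirtual(const std::string& key) { return GetPinned(key); }
  // The same wrapper, but virtual so subclasses can replace it.
  virtual std::string GetVirtual(const std::string& key) { return GetPinned(key); }
};

// A service that customizes the wrapper methods.
struct CustomStore : Store {
  // Only hides Store::GetNonVirtual; calls through Store* never reach it.
  std::string GetNonVirtual(const std::string& key) { return "custom:" + key; }
  // Genuinely overrides Store::GetVirtual.
  std::string GetVirtual(const std::string& key) override { return "custom:" + key; }
};
```

Calling through a `Store*` that points to a `CustomStore` yields `"base:k"` from `GetNonVirtual` but `"custom:k"` from `GetVirtual`.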

original change: #1756
Closes https://github.com/facebook/rocksdb/pull/2065

Differential Revision: D4808123

Pulled By: ajkr

fbshipit-source-id: 5eaeea7
2017-03-30 23:39:14 -07:00
Orgad Shaneh 6401a8b76b Fix build with MinGW
Summary:
There are still many warnings (most of them about invalid printf formats
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Sagar Vemuri c6d04f2ecf Option to fail a request as incomplete when skipping too many internal keys
Summary:
Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to skip. This change adds an option, max_skippable_internal_keys, that sets a threshold on the maximum number of internal keys that can be skipped. It addresses cases where failing a request (as incomplete) is much better than waiting a considerable time for it to complete.

This feature (failing an iterator seek request as incomplete) is disabled by default, with max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.
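A minimal sketch of the skip-limit idea. It uses a hypothetical `InternalEntry` vector in place of real internal keys and a hypothetical `NextVisible` helper. Whether RocksDB trips on `>` or `>=` the threshold is an assumption here. The sketch only shows that 0 disables the check and that a positive limit turns a long scan into an "incomplete" result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical internal entry: hidden entries (e.g. deleted or overwritten
// versions) must be skipped by the iterator.
struct InternalEntry {
  std::string user_key;
  bool hidden;
};

enum class SeekStatus { kOk, kNotFound, kIncomplete };

// Advances `pos` to the next visible entry. If max_skippable > 0 and more than
// max_skippable hidden entries are skipped, gives up with kIncomplete.
// max_skippable == 0 disables the check (the default described above).
SeekStatus NextVisible(const std::vector<InternalEntry>& entries, size_t& pos,
                       uint64_t max_skippable) {
  uint64_t skipped = 0;
  while (pos < entries.size()) {
    if (!entries[pos].hidden) return SeekStatus::kOk;
    ++skipped;
    ++pos;
    if (max_skippable > 0 && skipped > max_skippable) {
      return SeekStatus::kIncomplete;  // fail fast instead of scanning on
    }
  }
  return SeekStatus::kNotFound;
}
```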

This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
Closes https://github.com/facebook/rocksdb/pull/2000

Differential Revision: D4753223

Pulled By: sagar0

fbshipit-source-id: 1c973f7
2017-03-30 12:09:21 -07:00
Siying Dong 67d7623794 Expose the stalling information through DB::GetProperty()
Summary:
Add two DB properties, rocksdb.actual_delayed_write_rate and rocksdb.is_write_stopped, so that users can tell whether current writes are being throttled.
Closes https://github.com/facebook/rocksdb/pull/2043

Differential Revision: D4782975

Pulled By: siying

fbshipit-source-id: 6b2f5cf
2017-03-29 11:54:20 -07:00
Maysam Yabandeh e7731d119a Configure index partition size
Summary:
Allow the users to specify the target index partition size.

With this patch an index partition is cut before its estimated in-memory size goes above the configured value for metadata_block_size. The filter partitions are still cut right after an index partition is cut.
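A minimal sketch of the cutting rule, using a hypothetical `PartitionIndex` helper over per-entry size estimates rather than real index blocks. A new partition starts whenever adding the next entry would push the current partition's estimated size above `metadata_block_size`. A single oversized entry still gets a partition of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups index entries (given by their estimated in-memory sizes) into
// partitions, cutting before a partition would exceed metadata_block_size.
std::vector<std::vector<size_t>> PartitionIndex(const std::vector<size_t>& entry_sizes,
                                                size_t metadata_block_size) {
  std::vector<std::vector<size_t>> partitions;
  std::vector<size_t> current;
  size_t current_size = 0;
  for (size_t sz : entry_sizes) {
    if (!current.empty() && current_size + sz > metadata_block_size) {
      partitions.push_back(current);  // cut the partition here
      current.clear();
      current_size = 0;
    }
    current.push_back(sz);
    current_size += sz;
  }
  if (!current.empty()) partitions.push_back(current);
  return partitions;
}
```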
Closes https://github.com/facebook/rocksdb/pull/2041

Differential Revision: D4780216

Pulled By: maysamyabandeh

fbshipit-source-id: 95a0831
2017-03-28 12:09:12 -07:00
Shu Zhang 8dee8cad9e Enable fifo compaction benchmark to db_bench
Summary:
Added fifo benchmark to db_bench.
One thing I am not sure about is that I am using CompactRange() instead of CompactFiles() (this may cause performance skew because CompactRange() does not run in the current thread?). With CompactFiles(), FIFO compaction doesn't work as expected for some reason. More insight is welcome. I guess FIFO compaction doesn't work with file names? igorcanadi

test cmd:
```
./db_bench --compaction_style=2 --benchmarks=fillseqdeterministic --disable_auto_compactions --num_levels=1 --fifo_compaction_max_table_files_size_mb=10
```

output:
```
---------------------- DB 0 LSM ---------------------
Level[0]: /000014.sst(size: 4211014 bytes)
fillseqdeterministic :       4.731 micros/op 211381 ops/sec;   23.4 MB/s
```
Closes https://github.com/facebook/rocksdb/pull/1734

Differential Revision: D4774964

Pulled By: siying

fbshipit-source-id: 9d08df6
2017-03-24 17:09:15 -07:00
Warren Falk 41ccae6d26 Add C API functions (and tests) for WriteBatchWithIndex
Summary:
I've added functions to the C API to support WriteBatchWithIndex as requested in #1833.

I've also added unit tests to c_test.

I've implemented the WriteBatchWithIndex variation of every function available for regular WriteBatch, and added additional functions unique to WriteBatchWithIndex.

For now, the following is omitted:
  1. The ability to create WriteBatchWithIndex's custom batch-only iterator as I'm not sure what its purpose is.  It should be possible to add later if anyone wants it.
  2. The ability to create the batch with a fallback comparator, since it appears to be unnecessary.  I believe the column family comparator will be used for this, meaning those using a custom comparator can just use the column family variations.
Closes https://github.com/facebook/rocksdb/pull/1985

Differential Revision: D4760039

Pulled By: siying

fbshipit-source-id: 393227e
2017-03-23 15:54:13 -07:00
Siying Dong 9ef3627fd3 Allow checkpointing without flushing
Summary:
Add a parameter to Checkpoint::CreateCheckpoint() so that flush can be skipped if total log file size is within a threshold.
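A minimal sketch of the flush decision. The helper name `CheckpointNeedsFlush`, the parameter name `log_size_for_flush`, and the convention that 0 means "always flush" are assumptions made for illustration. When the flush is skipped, the live WAL files have to be kept with the checkpoint so that unflushed data can be recovered.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Decides whether a checkpoint must flush memtables first (assumed semantics):
// a threshold of 0 always flushes; otherwise the flush is skipped while the
// total size of live WAL files stays below the threshold.
bool CheckpointNeedsFlush(const std::vector<uint64_t>& wal_file_sizes,
                          uint64_t log_size_for_flush) {
  if (log_size_for_flush == 0) return true;
  uint64_t total = std::accumulate(wal_file_sizes.begin(), wal_file_sizes.end(),
                                   uint64_t{0});
  return total >= log_size_for_flush;
}
```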
Closes https://github.com/facebook/rocksdb/pull/1993

Differential Revision: D4719842

Pulled By: siying

fbshipit-source-id: 4f9d9e1
2017-03-21 18:09:13 -07:00
Siying Dong 17866ecc3a Allow Users to change customized ldb tools' header in help printing
Summary: Closes https://github.com/facebook/rocksdb/pull/2018

Differential Revision: D4748448

Pulled By: siying

fbshipit-source-id: a54c2f9
2017-03-21 17:39:12 -07:00
Maysam Yabandeh 21d8c31796 remove LIKELY from public headers
Summary:
Fixes #2008
Closes https://github.com/facebook/rocksdb/pull/2009

Differential Revision: D4739885

Pulled By: maysamyabandeh

fbshipit-source-id: 36fba5d
2017-03-20 12:54:14 -07:00
Siying Dong dbae438a0e Replace "DEPRECATED" comment with "not supported"
Summary:
"DEPRECATED" is ambiguous. Make it clear that options marked "not supported" won't take effect.
Closes https://github.com/facebook/rocksdb/pull/1995

Differential Revision: D4724241

Pulled By: siying

fbshipit-source-id: 1e812b8
2017-03-16 13:54:11 -07:00
Islam AbdelRahman 995618a821 Support SstFileManager::SetDeleteRateBytesPerSecond()
Summary:
Update the DeleteScheduler component to support changing the delete rate at runtime by introducing
SstFileManager::SetDeleteRateBytesPerSecond()
Closes https://github.com/facebook/rocksdb/pull/1994

Differential Revision: D4719906

Pulled By: IslamAbdelRahman

fbshipit-source-id: e6b8d9e
2017-03-16 12:09:15 -07:00
slyrz 5fa927aa97 Add Xpress and ZSTD CompressionType values to C header
Summary:
This PR copies the missing CompressionType values
```cpp
enum CompressionType : unsigned char {
  ...
  kXpressCompression = 0x6,
  kZSTD = 0x7,
  ...
};
```
into the C header.
Closes https://github.com/facebook/rocksdb/pull/1989

Differential Revision: D4706265

Pulled By: siying

fbshipit-source-id: e65e62a
2017-03-14 12:09:21 -07:00
Maysam Yabandeh 11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incurs an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying its content) and releases it after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be larger if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have left that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00
Sagar Vemuri 1ffbdfd9a7 Add a new SstFileWriter constructor without explicit comparator
Summary:
The comparator param in SstFileWriter constructor is redundant as it already exists as a field in options. So the current SstFileWriter constructor should be deprecated in favor of a new one which does not take a comparator.
Note that the jni/java apis have not been touched yet.
Closes https://github.com/facebook/rocksdb/pull/1978

Differential Revision: D4685629

Pulled By: sagar0

fbshipit-source-id: 372ce96
2017-03-13 11:39:13 -07:00
Andrew Kryczka fe1835617a release 5.3
Summary: Closes https://github.com/facebook/rocksdb/pull/1971

Differential Revision: D4683851

Pulled By: ajkr

fbshipit-source-id: 967116e
2017-03-09 12:39:10 -08:00
Andrew Kryczka 5b11124e39 add max to histogram stats
Summary:
Domas enlightened me about p100 (i.e., max) stats. Let's add them to our histograms.
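p100 is simply the maximum. Bucketed percentiles are only estimates, but the maximum can be tracked exactly with a running value. The following is a minimal sketch using a hypothetical `HistogramSummary` class, not RocksDB's histogram implementation, and it leaves out percentile buckets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Running summary tracking min, max (p100) and mean of recorded values.
class HistogramSummary {
 public:
  void Add(uint64_t value) {
    min_ = std::min(min_, value);
    max_ = std::max(max_, value);
    sum_ += value;
    ++count_;
  }
  uint64_t Max() const { return count_ == 0 ? 0 : max_; }
  uint64_t Min() const { return count_ == 0 ? 0 : min_; }
  double Average() const {
    return count_ == 0 ? 0.0 : static_cast<double>(sum_) / count_;
  }

 private:
  uint64_t min_ = std::numeric_limits<uint64_t>::max();
  uint64_t max_ = 0;
  uint64_t sum_ = 0;
  uint64_t count_ = 0;
};
```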
Closes https://github.com/facebook/rocksdb/pull/1968

Differential Revision: D4678716

Pulled By: ajkr

fbshipit-source-id: 65e7118
2017-03-08 22:24:15 -08:00
Maysam Yabandeh 54b434110e Builders for partition filter
Summary:
This is the second split of this pull request: https://github.com/facebook/rocksdb/pull/1891 which includes only the builder part. The testing will be included in the third split, where the reader is also included.
Closes https://github.com/facebook/rocksdb/pull/1952

Differential Revision: D4660272

Pulled By: maysamyabandeh

fbshipit-source-id: 36b3cf0
2017-03-07 13:54:12 -08:00
Sagar Vemuri 97edc72d39 Add a memtable-only iterator
Summary:
This PR is to support a way to iterate over all the keys that are just in memtables.
Closes https://github.com/facebook/rocksdb/pull/1953

Differential Revision: D4663500

Pulled By: sagar0

fbshipit-source-id: 144e177
2017-03-07 11:54:10 -08:00
Andrew Kryczka 7c80a6d7d1 Statistic for how often rate limiter is drained
Summary:
This is the metric I plan to use for adaptive rate limiting. The statistics are updated only if the rate limiter is drained by flush or compaction. I believe (but am not certain) that this is the normal case.

The Statistics object is passed in RateLimiter::Request() to avoid requiring changes to client code, which would've been necessary if we passed it in the RateLimiter constructor.
Closes https://github.com/facebook/rocksdb/pull/1946

Differential Revision: D4646489

Pulled By: ajkr

fbshipit-source-id: d8e0161
2017-03-02 17:54:15 -08:00
Andrew Kryczka 0ad5af42d0 Clarify VerifyBackup behavior
Summary:
It's non-obvious to users that using the same backup engine for creating/verifying provides better results than using separate backup engines, so add a comment in header.
Closes https://github.com/facebook/rocksdb/pull/1942

Differential Revision: D4637865

Pulled By: ajkr

fbshipit-source-id: e6efe24
2017-03-02 17:24:11 -08:00
Islam AbdelRahman f89b3893c0 Remove skip_table_builder_flush and default it to true
Summary:
This option needs to be enabled for Direct IO,
and I cannot think of a reason to disable it.

Remove it and default it to true.
Closes https://github.com/facebook/rocksdb/pull/1944

Differential Revision: D4641088

Pulled By: IslamAbdelRahman

fbshipit-source-id: d7085b9
2017-03-02 16:54:10 -08:00
Aaron Gao e877afa08b Remove bulk loading and auto_roll_logger in rocksdb_lite
Summary:
Shrink the size of the lite build.
Closes https://github.com/facebook/rocksdb/pull/1929

Differential Revision: D4622059

Pulled By: siying

fbshipit-source-id: 050b796
2017-02-28 11:09:11 -08:00
Islam AbdelRahman 08864df212 Move advanced column family options to advanced_options.h
Summary:
To make our options simpler, we should keep options.h as simple as possible and move more advanced/less common options to advanced_options.h

I started with ColumnFamilyOptions and also did some re-ordering

I have moved all ColumnFamilyOptions to advanced_options.h and only left these options in options.h

```
const Comparator* comparator = BytewiseComparator();
std::shared_ptr<MergeOperator> merge_operator = nullptr;
const CompactionFilter* compaction_filter = nullptr;
std::shared_ptr<CompactionFilterFactory> compaction_filter_factory = nullptr;
size_t write_buffer_size = 64 << 20;
CompressionType compression;
int level0_file_num_compaction_trigger = 4;
bool disable_auto_compactions = false;
```
Please feel free to comment on specific options if you think they should be advanced or should not be
Closes https://github.com/facebook/rocksdb/pull/1847

Differential Revision: D4519996

Pulled By: IslamAbdelRahman

fbshipit-source-id: abebd9a
2017-02-27 17:54:14 -08:00
Siying Dong 8efb5ffa2a [rocksdb][PR] Remove option min_partial_merge_operands and verify_checksums_in_compaction
Summary:
The two options, min_partial_merge_operands and verify_checksums_in_compaction, are seldom used. Remove them to reduce the total number of options. Also remove them from the Java and C interfaces.
Closes https://github.com/facebook/rocksdb/pull/1902

Differential Revision: D4601219

Pulled By: siying

fbshipit-source-id: aad4cb2
2017-02-23 15:09:12 -08:00
Giuseppe Ottaviano 4d7c06cedf Make WriteBatchWithIndex movable
Summary:
`WriteBatchWithIndex` has an incorrect implicitly-generated move constructor (it will copy the pointer causing a double-free on destruction). Just switch to `unique_ptr` so we get correct move semantics for free.
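A minimal sketch of the `unique_ptr` fix, using a hypothetical `Batch` class with a private `Rep` rather than the real `WriteBatchWithIndex`. A raw owning pointer member gets copied bit-for-bit by a defaulted move, so the object ends up deleted twice. A `std::unique_ptr` member makes the defaulted move transfer ownership and leave the source null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batch class holding its state behind an owning pointer.
class Batch {
 public:
  Batch() : rep_(std::make_unique<Rep>()) {}
  // Defaulted moves are correct because unique_ptr transfers ownership.
  // Copies are implicitly deleted, since unique_ptr is move-only.
  Batch(Batch&&) = default;
  Batch& operator=(Batch&&) = default;

  void Put(const std::string& key) { rep_->keys.push_back(key); }
  size_t Count() const { return rep_ ? rep_->keys.size() : 0; }
  bool MovedFrom() const { return rep_ == nullptr; }

 private:
  struct Rep {
    std::vector<std::string> keys;
  };
  std::unique_ptr<Rep> rep_;  // freed exactly once, by whoever owns it last
};
```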
Closes https://github.com/facebook/rocksdb/pull/1899

Differential Revision: D4598896

Pulled By: ajkr

fbshipit-source-id: 2373d47
2017-02-22 17:54:11 -08:00
Islam AbdelRahman 1560b2f5f0 Temporarily return deprecated functions to fix MongoRocks build
Summary:
MongoRocks is still using some deprecated functions, return them temporarily
Closes https://github.com/facebook/rocksdb/pull/1892

Differential Revision: D4592451

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5e6be3e
2017-02-21 12:54:11 -08:00
Yi Wu 381fd32247 Remove timeout_hint_us from WriteOptions
Summary:
The option has been deprecated for two years and has no effect. Removing.
Closes https://github.com/facebook/rocksdb/pull/1866

Differential Revision: D4555203

Pulled By: yiwu-arbug

fbshipit-source-id: c48f627
2017-02-17 15:24:17 -08:00
Marcin Dlugajczyk a618a16f44 New subcode for IOError to detect the ESTALE errno
Summary:
I'd like to propose a patch that exposes a new IOError type with subcode kStaleFile, so that callers can detect when an ESTALE error is returned. This allows RocksDB consumers to handle this error separately from other IOErrors.

I've also added a missing string representation for the kDeadlock subcode. I believe calling ToString() on a Status object with that subcode would result in an out-of-bounds access in the msgs array.
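The missing kDeadlock string shows how a subcode-indexed message table can fall out of sync with its enum. The sketch below uses a `static_assert` to turn a forgotten entry into a compile error instead of an out-of-bounds read. The enum, the table, and the message strings are hypothetical and only illustrate the pattern. They are not copied from `Status`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical status subcodes; kMaxSubCode must stay last.
enum SubCode : unsigned char {
  kNone = 0,
  kMutexTimeout,
  kLockTimeout,
  kLockLimit,
  kNoSpace,
  kDeadlock,
  kStaleFile,
  kMaxSubCode
};

// One message per subcode, in enum order.
static const char* const kSubCodeMsgs[] = {
    "",                             // kNone
    "Timeout acquiring mutex",      // kMutexTimeout
    "Timeout waiting to lock key",  // kLockTimeout
    "Lock limit reached",           // kLockLimit
    "No space left on device",      // kNoSpace
    "Deadlock",                     // kDeadlock
    "Stale file handle",            // kStaleFile
};
// Fails to compile if a subcode is added without a matching message.
static_assert(sizeof(kSubCodeMsgs) / sizeof(kSubCodeMsgs[0]) == kMaxSubCode,
              "every SubCode needs a message");

std::string SubCodeMessage(SubCode code) {
  assert(code < kMaxSubCode);
  return kSubCodeMsgs[code];
}
```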

Please let me know if you have any questions or would like me to make any changes to this pull request.
Closes https://github.com/facebook/rocksdb/pull/1748

Differential Revision: D4387675

Pulled By: IslamAbdelRahman

fbshipit-source-id: 67feb13
2017-02-17 10:54:13 -08:00
Islam AbdelRahman 7ab0051835 Remove deprecated DB::AddFile and DB::CompactRange
Summary:
Remove functions in db.h that we deprecated a long time ago.
Closes https://github.com/facebook/rocksdb/pull/1878

Differential Revision: D4576521

Pulled By: IslamAbdelRahman

fbshipit-source-id: dfddad1
2017-02-17 10:54:13 -08:00
Aaron Gao db2b4eb50e avoid direct io in rocksdb_lite
Summary:
fix lite bugs
disable direct io in lite mode
Closes https://github.com/facebook/rocksdb/pull/1870

Differential Revision: D4559866

Pulled By: yiwu-arbug

fbshipit-source-id: 3761c51
2017-02-16 10:39:13 -08:00
Xiaofei Du 7106a994fe Use monotonic time points in write_controller.cc and rate_limiter.cc
Summary:
NowMicros() provides non-monotonic time. When the wall clock is
synchronized or changed, the non-monotonic time points distort the write rate
controllers. This patch changes write_controller.cc and rate_limiter.cc to use
monotonic time points.
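A minimal sketch of the idea in standard C++ rather than RocksDB's `Env` API. Elapsed intervals are measured with `std::chrono::steady_clock`, which never goes backwards, instead of wall-clock time, which can jump when NTP or an operator adjusts it. `ElapsedTimer` and `BytesAllowed` are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Measures elapsed time on a monotonic clock, so clock adjustments cannot
// make intervals negative or suddenly huge.
class ElapsedTimer {
 public:
  ElapsedTimer() : start_(std::chrono::steady_clock::now()) {}
  uint64_t ElapsedMicros() const {
    auto d = std::chrono::steady_clock::now() - start_;
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(d).count());
  }

 private:
  std::chrono::steady_clock::time_point start_;
};

// Bytes a rate controller may release over a measured interval.
uint64_t BytesAllowed(uint64_t rate_bytes_per_sec, uint64_t elapsed_micros) {
  return rate_bytes_per_sec * elapsed_micros / 1000000;
}
```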
Closes https://github.com/facebook/rocksdb/pull/1865

Differential Revision: D4561732

Pulled By: siying

fbshipit-source-id: 95ece62
2017-02-14 18:24:24 -08:00
Sagar Vemuri eb912a927e Remove disableDataSync option
Summary:
Remove the disableDataSync option and the similarly named disable_data_sync option.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Siying Dong 9afa20cf2f Increase build version and HISTORY.md for releasing 5.2
Summary:
Also clean up HISTORY.md a little bit.
Closes https://github.com/facebook/rocksdb/pull/1854

Differential Revision: D4539556

Pulled By: siying

fbshipit-source-id: 567391e
2017-02-10 12:09:15 -08:00
Maysam Yabandeh ac2a77a746 Announce the experimental two-level index feature in HISTORY.md
Summary:
Announce the experimental two-level index feature in HISTORY.md. Also updated the default for index_per_partition to 1024.
Closes https://github.com/facebook/rocksdb/pull/1855

Differential Revision: D4530102

Pulled By: maysamyabandeh

fbshipit-source-id: b0fc6ff
2017-02-08 14:24:10 -08:00
Andrew Kryczka 186c7eedb0 Remove incorrect statistics warning
Summary:
Statistics are always updated concurrently, regardless of whether a single instance or multiple instances are used. Remove the warning, since it unnecessarily discourages people from sharing stats objects across RocksDB instances.

Replaces #1819, which was having some problems with our internal tools.
Closes https://github.com/facebook/rocksdb/pull/1826

Differential Revision: D4495970

Pulled By: ajkr

fbshipit-source-id: 6601f61
2017-02-07 11:39:12 -08:00
Maysam Yabandeh 69d5262c81 Two-level Indexes
Summary:
Partition Index blocks and use a Partition-index as a 2nd level index.

The two-level index can be used by setting
BlockBasedTableOptions::kTwoLevelIndexSearch as the index type and
configuring BlockBasedTableOptions::index_per_partition
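A minimal sketch of a two-level lookup. It uses a hypothetical in-memory `TwoLevelIndex`, not the on-disk block format. The top level holds the last key of each partition. A lookup binary-searches that small top level first, and then searches only the one partition it selects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Second-level entry: last key in a data block and that block's id.
struct IndexEntry {
  std::string last_key;
  int block_id;
};

struct TwoLevelIndex {
  std::vector<std::string> partition_last_keys;     // top-level index
  std::vector<std::vector<IndexEntry>> partitions;  // one per top-level entry

  // Returns the id of the first data block whose last key >= key, or -1.
  int FindBlock(const std::string& key) const {
    auto top = std::lower_bound(partition_last_keys.begin(),
                                partition_last_keys.end(), key);
    if (top == partition_last_keys.end()) return -1;
    const auto& part =
        partitions[static_cast<size_t>(top - partition_last_keys.begin())];
    auto it = std::lower_bound(part.begin(), part.end(), key,
                               [](const IndexEntry& e, const std::string& k) {
                                 return e.last_key < k;
                               });
    return it == part.end() ? -1 : it->block_id;
  }
};
```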

t15539501
Closes https://github.com/facebook/rocksdb/pull/1814

Differential Revision: D4473535

Pulled By: maysamyabandeh

fbshipit-source-id: bffb87e
2017-02-06 16:39:12 -08:00
Dmitri Smirnov 0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef, which is a replacement for std::thread. On POSIX, Thread defaults to std::thread as before.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- There should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Vitaliy Liptchinsky 1aaa898cf1 Adding GetApproximateMemTableStats method
Summary:
Added method that returns approx num of entries as well as size for memtables.
Closes https://github.com/facebook/rocksdb/pull/1841

Differential Revision: D4511990

Pulled By: VitaliyLi

fbshipit-source-id: 9a4576e
2017-02-06 14:54:16 -08:00
Siying Dong 4a3e7d320c Change the default of delayed slowdown value to 16MB/s
Summary:
Change the default of delayed slowdown value to 16MB/s and further increase the L0 stop condition to 36 files.
Closes https://github.com/facebook/rocksdb/pull/1821

Differential Revision: D4489229

Pulled By: siying

fbshipit-source-id: 1003981
2017-02-01 20:39:17 -08:00
Islam AbdelRahman 8b369ae5bd Cleaner default options using C++11 in-class init
Summary:
C++11 in-class initialization is cleaner and makes the defaults more explicit and more visible to our users.
Use it for ColumnFamilyOptions and DBOptions
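A minimal sketch of C++11 default member initializers on a hypothetical `ExampleOptions` struct. The default values are taken from the ColumnFamilyOptions listing later in this log. Each default sits next to its field, so no constructor has to repeat it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical options struct using in-class (default member) initializers.
struct ExampleOptions {
  size_t write_buffer_size = 64 << 20;  // 64 MB
  int level0_file_num_compaction_trigger = 4;
  bool disable_auto_compactions = false;
};
```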
Closes https://github.com/facebook/rocksdb/pull/1822

Differential Revision: D4490473

Pulled By: IslamAbdelRahman

fbshipit-source-id: c493a87
2017-01-31 18:09:15 -08:00
oranagra b96372dead improving the C wrapper
Summary:
- rocksdb_property_int (so that we don't have to parse strings)
- and rocksdb_set_options (to allow controlling options via strings)
- a few other missing options exposed
- a documentation comment fix
Closes https://github.com/facebook/rocksdb/pull/1793

Differential Revision: D4456569

Pulled By: yiwu-arbug

fbshipit-source-id: 9f1fac1
2017-01-27 17:39:16 -08:00
Siying Dong 2d75cd40d3 NewLRUCache() to pick number of shard bits based on capacity if not given
Summary:
If the users use the NewLRUCache() without passing in the number of shard bits, instead of using hard-coded 6, we'll determine it based on capacity.
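A minimal sketch of deriving shard bits from capacity, using a hypothetical `DefaultShardBits` helper. The 512 KB minimum shard size and the cap of 6 bits (the old hard-coded value) are assumptions for illustration. The result is floor(log2(capacity / min_shard_size)), clamped to that cap.

```cpp
#include <cassert>
#include <cstddef>

// Assumed constants: each shard should hold at least 512 KB, and never use
// more than 6 shard bits (64 shards).
int DefaultShardBits(size_t capacity) {
  const size_t kMinShardSize = 512 * 1024;
  const int kMaxShardBits = 6;
  size_t num_shards = capacity / kMinShardSize;
  int bits = 0;
  // Computes floor(log2(num_shards)), stopping early at the cap.
  while ((num_shards >>= 1) != 0) {
    if (++bits >= kMaxShardBits) return bits;
  }
  return bits;
}
```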
Closes https://github.com/facebook/rocksdb/pull/1584

Differential Revision: D4242517

Pulled By: siying

fbshipit-source-id: 86b0f18
2017-01-27 06:39:12 -08:00
Andrew Kryczka 17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
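A minimal sketch of a templated, regex-matched registry in the spirit described above. `Registry`, `Register`, and `NewObject` are hypothetical names, not the API added by this commit. Factories are registered under a pattern, and the first pattern that `std::regex_match`es the whole URI selects the implementation.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic registry keyed by regex patterns over a text URI.
template <typename T>
class Registry {
 public:
  using Factory = std::function<std::unique_ptr<T>(const std::string& uri)>;

  void Register(const std::string& pattern, Factory factory) {
    entries_.emplace_back(std::regex(pattern), std::move(factory));
  }

  // Builds an object with the first factory whose pattern matches the whole
  // URI; returns nullptr if none matches.
  std::unique_ptr<T> NewObject(const std::string& uri) const {
    for (const auto& entry : entries_) {
      if (std::regex_match(uri, entry.first)) return entry.second(uri);
    }
    return nullptr;
  }

 private:
  std::vector<std::pair<std::regex, Factory>> entries_;
};
```

Matching the full string, rather than just a prefix, lets a pattern such as `mem://.*` express the old prefix behavior while also allowing stricter patterns.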
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Hyeonseok Oh f2b4939da4 fixed typo
Summary:
I fixed exisit -> exist
Closes https://github.com/facebook/rocksdb/pull/1799

Differential Revision: D4451466

Pulled By: yiwu-arbug

fbshipit-source-id: b447c3a
2017-01-23 12:54:13 -08:00
Jay Lee 537da370da c: allow set savepoint to writebatch
Summary:
Allow setting a SavePoint on a WriteBatch through the C ABI.
Closes https://github.com/facebook/rocksdb/pull/1698

Differential Revision: D4378556

Pulled By: yiwu-arbug

fbshipit-source-id: afca746
2017-01-20 13:24:13 -08:00
Vitaliy Liptchinsky e840213d6e Change DB::GetApproximateSizes for more flexibility needed for MyRocks
Summary:
Added an option to GetApproximateSizes to exclude file stats, as MyRocks has those counted exactly and we need only stats from memtables.
Closes https://github.com/facebook/rocksdb/pull/1787

Differential Revision: D4441111

Pulled By: IslamAbdelRahman

fbshipit-source-id: c11f4c3
2017-01-20 09:39:11 -08:00
Aaron Gao 3e6899d116 change UseDirectIO() to use_direct_io()
Summary:
also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path.
Closes https://github.com/facebook/rocksdb/pull/1770

Differential Revision: D4416435

Pulled By: lightmark

fbshipit-source-id: 4143c53
2017-01-13 12:09:15 -08:00
Aaron Gao dc2584eea0 direct reads refactor
Summary:
Direct IO reads refactoring:
remove unnecessary classes and unify interfaces
tested with db_bench

More changes are needed for options and for turning direct IO ON/OFF for different files.
Since direct IO is disabled by default, this should be fine for now.
Closes https://github.com/facebook/rocksdb/pull/1636

Differential Revision: D4307189

Pulled By: lightmark

fbshipit-source-id: 6991e22
2017-01-11 16:54:12 -08:00
Mike Kolupaev d18dd2c41f Abort compactions more reliably when closing DB
Summary:
DB shutdown aborts running compactions by setting an atomic shutting_down=true that CompactionJob periodically checks. Without this PR it checks it before processing every _output_ value. If compaction filter filters everything out, the compaction is uninterruptible. This PR adds checks for shutting_down on every _input_ value (in CompactionIterator and MergeHelper).
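A minimal sketch of why the check belongs on every input. `RunCompaction` is a hypothetical loop, not `CompactionIterator`. If the flag were checked only when a value is written, a filter that drops everything would never reach a check. Here the flag is read once per input value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Processes inputs until done or until shutting_down is set. Returns the
// number of inputs consumed before stopping.
size_t RunCompaction(const std::vector<std::string>& inputs,
                     const std::function<bool(const std::string&)>& filter_keeps,
                     const std::atomic<bool>& shutting_down,
                     std::vector<std::string>& outputs) {
  size_t processed = 0;
  for (const auto& value : inputs) {
    // Per-input check: still reached even if no output is ever produced.
    if (shutting_down.load(std::memory_order_relaxed)) break;
    ++processed;
    if (filter_keeps(value)) outputs.push_back(value);
  }
  return processed;
}
```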

There's also some minor code cleanup along the way.
Closes https://github.com/facebook/rocksdb/pull/1639

Differential Revision: D4306571

Pulled By: yiwu-arbug

fbshipit-source-id: f050890
2017-01-11 15:09:21 -08:00
Andrew Kryczka fe395fb63d Allow incrementing refcount on cache handles
Summary:
Previously the only way to increment a handle's refcount was to invoke Lookup(), which (1) did hash table lookup to get cache handle, (2) incremented that handle's refcount. For a future DeleteRange optimization, I added a function, Ref(), for when the caller already has a cache handle and only needs to do (2).
Closes https://github.com/facebook/rocksdb/pull/1761

Differential Revision: D4397114

Pulled By: ajkr

fbshipit-source-id: 9addbe5
2017-01-10 16:54:20 -08:00
Maysam Yabandeh d0ba8ec8f9 Revert "PinnableSlice"
Summary:
This reverts commit 54d94e9c2c.

The pull request was landed by mistake.
Closes https://github.com/facebook/rocksdb/pull/1755

Differential Revision: D4391678

Pulled By: maysamyabandeh

fbshipit-source-id: 36d5149
2017-01-08 14:24:12 -08:00
Maysam Yabandeh 54d94e9c2c PinnableSlice
Summary:
Currently the point lookup values are copied to a string provided by the user.
This incurs an extra memcpy cost. This patch allows doing point lookup
via a PinnableSlice which pins the source memory location (instead of
copying its content) and releases it after the content is consumed
by the user. The old API of Get(string) is translated to the new API
underneath.

 Here is the summary for improvements:
 1. value 100 byte: 1.8%  regular, 1.2% merge values
 2. value 1k   byte: 11.5% regular, 7.5% merge values
 3. value 10k byte: 26% regular,    29.9% merge values

 The improvement for merge could be larger if we extend this approach to
 pin the merge output and delay the full merge operation until the user
 actually needs it. We have left that for future work.

PS:
Sometimes we observe a small decrease in performance when switching from
t5452014 to this patch but with the old Get(string) API. The difference
is small and could be noise. More importantly, it is safely
cancelled
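A minimal sketch of the pin-or-copy idea, using a hypothetical `PinnedValue` class rather than RocksDB's `PinnableSlice`. The zero-copy path points into memory owned elsewhere and records a release callback. The fallback path copies the bytes into an owned buffer. The callback runs when the value is reset or destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

class PinnedValue {
 public:
  PinnedValue() = default;
  PinnedValue(const PinnedValue&) = delete;
  PinnedValue& operator=(const PinnedValue&) = delete;
  ~PinnedValue() { Reset(); }

  // Zero-copy path: reference external memory and remember how to unpin it.
  void PinSlice(const char* data, size_t size, std::function<void()> release) {
    Reset();
    data_ = data;
    size_ = size;
    release_ = std::move(release);
  }

  // Copy path: used when the source memory cannot be kept alive.
  void PinSelf(const std::string& value) {
    Reset();
    buf_ = value;
    data_ = buf_.data();
    size_ = buf_.size();
  }

  std::string ToString() const { return data_ ? std::string(data_, size_) : std::string(); }
  bool IsPinned() const { return static_cast<bool>(release_); }

  // Runs the release callback (if any) and empties the value.
  void Reset() {
    if (release_) {
      release_();
      release_ = nullptr;
    }
    buf_.clear();
    data_ = nullptr;
    size_ = 0;
  }

 private:
  const char* data_ = nullptr;
  size_t size_ = 0;
  std::string buf_;
  std::function<void()> release_;
};
```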
Closes https://github.com/facebook/rocksdb/pull/1732

Differential Revision: D4374613

Pulled By: maysamyabandeh

fbshipit-source-id: a077f1a
2017-01-08 13:54:13 -08:00
Islam AbdelRahman ac73d7558b Add GetSupportedCompressions() convenience function
Summary:
This function will return a list of supported compression types in RocksDB
This is needed for MyRocks https://github.com/facebook/mysql-5.6/pull/446
Closes https://github.com/facebook/rocksdb/pull/1747

Differential Revision: D4385921

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2f5b59f
2017-01-06 11:24:14 -08:00
siddontang 653ac1f9c6 C API: support total_order_mode
Summary: Closes https://github.com/facebook/rocksdb/pull/1687

Differential Revision: D4349210

Pulled By: IslamAbdelRahman

fbshipit-source-id: 32d0fbd
2017-01-03 18:39:14 -08:00
Adam Retter 85ac1a320a Fix rocksdb::Status::getState
Summary:
This fixes the Java API for Status#getState use in Native code and also simplifies the implementation of rocksdb::Status::getState.
Closes https://github.com/facebook/rocksdb/issues/1688
Closes https://github.com/facebook/rocksdb/pull/1714

Differential Revision: D4364181

Pulled By: yiwu-arbug

fbshipit-source-id: 8e073b4
2017-01-03 18:39:14 -08:00
Maysam Yabandeh 0712d541d1 Delegate Cleanables
Summary:
Cleanable objects perform their registered cleanups when
they are destructed. However, we would rather delay this cleanup in some cases, such as when
we are gathering the merge operands. The current approach is to create the
Cleanable object on the heap (instead of on the stack) and delay deleting it.

By allowing Cleanables to delegate their cleanups to another cleanable
object, we can delay the cleanup without having to create the
cleanable object on the heap and keep it around. This patch applies this
technique to the cleanups of BlockIter and shows improved performance
for some in-memory benchmarks:
+1.8% for merge workload, +6.4% for non-merge workload when the merge
operator is specified.
https://our.intern.facebook.com/intern/tasks?t=15168163
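A minimal sketch of cleanup delegation, using a hypothetical `Cleanable` class modeled on the description above rather than RocksDB's. A short-lived stack object hands its pending cleanups to a longer-lived owner. The pinned resources are then released only when the owner is destroyed, and no heap allocation is needed to keep them alive.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

class Cleanable {
 public:
  Cleanable() = default;
  Cleanable(const Cleanable&) = delete;
  Cleanable& operator=(const Cleanable&) = delete;
  ~Cleanable() { DoCleanup(); }

  void RegisterCleanup(std::function<void()> fn) { cleanups_.push_back(std::move(fn)); }

  // Moves all pending cleanups to `other`; this object is left with none.
  void DelegateCleanupsTo(Cleanable* other) {
    for (auto& fn : cleanups_) other->cleanups_.push_back(std::move(fn));
    cleanups_.clear();
  }

 private:
  void DoCleanup() {
    for (auto& fn : cleanups_) fn();
    cleanups_.clear();
  }
  std::vector<std::function<void()>> cleanups_;
};
```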

Non-merge benchmark:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench --benchmarks=fillrandom
--num=1000000 -value_size=100 -compression_type=none

Reading random with no merge operator specified:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench
--benchmarks="read
Closes https://github.com/facebook/rocksdb/pull/1711

Differential Revision: D4361163

Pulled By: maysamyabandeh

fbshipit-source-id: 9801e07
2016-12-29 15:54:19 -08:00
Islam AbdelRahman d58ef52ba6 Allow SstFileWriter to Fadvise the file away from page cache
Summary:
Add `fadvise_trigger` option to `SstFileWriter`

If fadvise_trigger is passed with a non-zero value, SstFileWriter will invalidate the os page cache every `fadvise_trigger` bytes for the sst file
Closes https://github.com/facebook/rocksdb/pull/1731

Differential Revision: D4371246

Pulled By: IslamAbdelRahman

fbshipit-source-id: 91caff1
2016-12-29 15:09:19 -08:00
Siying Dong 438f22bc56 Fix bug of Checkpoint loses recent transactions with 2PC
Summary:
If 2PC is enabled, checkpoint may not copy previous log files that contain uncommitted prepare records. In this diff we keep those files.
Closes https://github.com/facebook/rocksdb/pull/1724

Differential Revision: D4368319

Pulled By: siying

fbshipit-source-id: cc2c746
2016-12-28 12:24:16 -08:00
Sage Weil 4e07b08eff include/rocksdb/utilities/env_librados: fix typo
Summary:
Broken by 972f96b3fb

Signed-off-by: Sage Weil <sage@redhat.com>
Closes https://github.com/facebook/rocksdb/pull/1719

Differential Revision: D4366123

Pulled By: IslamAbdelRahman

fbshipit-source-id: a11e535
2016-12-23 19:09:14 -08:00
Yi Wu ab48c165a9 Print cache options to info log
Summary:
Improve cache options logging to info log.
Also print the value of
cache_index_and_filter_blocks_with_high_priority.
Closes https://github.com/facebook/rocksdb/pull/1709

Differential Revision: D4358776

Pulled By: yiwu-arbug

fbshipit-source-id: 8f030a0
2016-12-22 14:54:19 -08:00
Aaron Gao 972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
```
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Yi Wu 5d1457dbbf Dump persistent cache options
Summary:
Dump persistent cache options
Closes https://github.com/facebook/rocksdb/pull/1679

Differential Revision: D4337019

Pulled By: yiwu-arbug

fbshipit-source-id: 3812f8a
2016-12-19 14:09:12 -08:00
Ding Ma 1a136c1f13 Expose file size
Summary:
Add a new function to SstFileWriter that tells the user how big their file currently is.
Closes https://github.com/facebook/rocksdb/pull/1686

Differential Revision: D4338868

Pulled By: mdyuki1016

fbshipit-source-id: c1ee16a
2016-12-16 18:39:12 -08:00
Daniel Black 816c1e30ca gcc-7 requires include <functional> for std::function
Summary:
Fixes compile error:

In file included from ./util/statistics.h:17:0,
                 from ./util/stop_watch.h:8,
                 from ./util/perf_step_timer.h:9,
                 from ./util/iostats_context_imp.h:8,
                 from ./util/posix_logger.h:27,
                 from ./port/util_logger.h:18,
                 from ./db/auto_roll_logger.h:15,
                 from db/auto_roll_logger.cc:6:
./util/thread_local.h:65:16: error: 'function' in namespace 'std' does not name a template type
   typedef std::function<void(void*, void*)> FoldFunc;
Closes https://github.com/facebook/rocksdb/pull/1656

Differential Revision: D4318702

Pulled By: yiwu-arbug

fbshipit-source-id: 8c5d17a
2016-12-16 11:24:18 -08:00
siddontang 8f5d24ae68 C API: support get usage and pinned_usage for cache
Summary: Closes https://github.com/facebook/rocksdb/pull/1671

Differential Revision: D4327453

Pulled By: yiwu-arbug

fbshipit-source-id: bcdbc65
2016-12-15 17:24:17 -08:00
Daniel Black bfbcec2339 Gcc 7 error expansion to defined
Summary:
Sorry if these gcc-7/clang-4 cleanups are getting tedious.
Closes https://github.com/facebook/rocksdb/pull/1658

Differential Revision: D4318792

Pulled By: yiwu-arbug

fbshipit-source-id: 8e85891
2016-12-13 18:39:14 -08:00
siddontang b57dd9262a C API: support writebatch delete range
Summary:
It seems that WriteBatch delete range works now, so I added a C API for later use.

By the way, can we use this feature in production now?
Closes https://github.com/facebook/rocksdb/pull/1647

Differential Revision: D4314534

Pulled By: ajkr

fbshipit-source-id: e835165
2016-12-13 11:24:18 -08:00
Andrew Kryczka f0c509e2c8 Return finer-granularity status from Env::GetChildren*
Summary:
It'd be nice to use the error status type to distinguish
between user error and system error. For example, GetChildren can fail
listing a backup directory's contents either because a bad path was provided
(user error) or because an operation failed, e.g., a remote storage service
call failed (system error). In the former case, we want to continue and treat
the backup directory as empty; in the latter case, we want to immediately
propagate the error to the caller.

This diff uses NotFound to indicate user error and IOError to indicate
system error. Previously IOError indicated both.
Closes https://github.com/facebook/rocksdb/pull/1644

Differential Revision: D4312157

Pulled By: ajkr

fbshipit-source-id: 51b4f24
2016-12-12 12:54:13 -08:00
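The error-classification rule described in the commit above (NotFound means "treat the backup directory as empty", IOError means "propagate immediately") can be sketched on the caller side as follows. The `Code` enum, the `Result` struct and the `get_children` callback are hypothetical stand-ins, not RocksDB's `Status` or `Env::GetChildren` signatures.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical status codes; not RocksDB's Status class.
enum class Code { kOk, kNotFound, kIOError };

struct Result {
  Code code;
  std::vector<std::string> children;
};

// Lists backup files. A missing directory (user error, NotFound) is treated
// as an empty backup directory; any system error (IOError) is returned to
// the caller unchanged.
Result ListBackupFiles(
    const std::function<Result(const std::string&)>& get_children,
    const std::string& dir) {
  Result r = get_children(dir);
  if (r.code == Code::kNotFound) {
    return Result{Code::kOk, {}};  // bad path: continue with no backups
  }
  return r;  // kOk passes through, kIOError is propagated
}
```

The point of splitting the codes is that the caller no longer has to guess: only the NotFound branch is swallowed.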
zhangjinpeng1987 45c7ce1377 CompactRangeOptions C API
Summary:
Add C API for CompactRangeOptions.
Closes https://github.com/facebook/rocksdb/pull/1596

Differential Revision: D4252339

Pulled By: yiwu-arbug

fbshipit-source-id: f768f93
2016-12-07 17:54:14 -08:00
Islam AbdelRahman ed8fbdb560 Add EventListener::OnExternalFileIngested() event
Summary:
Add EventListener::OnExternalFileIngested() to allow users to subscribe to external file ingestion events
Closes https://github.com/facebook/rocksdb/pull/1623

Differential Revision: D4285844

Pulled By: IslamAbdelRahman

fbshipit-source-id: 0b95a88
2016-12-06 14:09:17 -08:00
Manuel Ung 2005c88a75 Implement non-exclusive locks
Summary:
This is an implementation of non-exclusive locks for pessimistic transactions. It is relatively simple and does not prevent starvation (i.e., it's possible that a request for exclusive access will never be granted if there are always threads holding shared access). It is done by changing `KeyLockInfo` to hold a set of transaction ids, instead of just one, and adding a flag specifying whether this lock is currently held with exclusive access or not.

Some implementation notes:
- Some lock diagnostic functions had to be updated to return a set of transaction ids for a given lock, e.g., `GetWaitingTxn` and `GetLockStatusData`.
- Deadlock detection is a bit more complicated since a transaction can now wait on multiple other transactions. A BFS is done in this case, and deadlock detection depth is now just a limit on the number of transactions we visit.
- Expirable transactions do not work efficiently with shared locks at the moment, but that's okay for now.
Closes https://github.com/facebook/rocksdb/pull/1573

Differential Revision: D4239097

Pulled By: lth

fbshipit-source-id: da7c074
2016-12-05 17:39:17 -08:00
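The "set of holders plus an exclusive flag" idea from the commit above can be modeled roughly like this. `KeyLockInfoSketch` is a hypothetical, single-threaded illustration; it is not the layout or locking code of RocksDB's `KeyLockInfo`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using TxnId = uint64_t;

// Per-key lock that is either held exclusively by one transaction or shared
// by several. Like the commit describes, there is no fairness: an exclusive
// request can starve while shared holders keep arriving.
struct KeyLockInfoSketch {
  std::set<TxnId> holders;
  bool exclusive = false;

  // Returns true if `txn` now holds the lock in (at least) the requested mode.
  bool TryAcquire(TxnId txn, bool want_exclusive) {
    if (holders.empty()) {
      holders.insert(txn);
      exclusive = want_exclusive;
      return true;
    }
    if (!exclusive && !want_exclusive) {  // shared + shared: compatible
      holders.insert(txn);
      return true;
    }
    // Re-entrant request, or upgrade by the sole holder.
    if (holders.size() == 1 && holders.count(txn) == 1) {
      exclusive = exclusive || want_exclusive;
      return true;
    }
    return false;  // conflict: the caller would wait on every id in `holders`
  }

  void Release(TxnId txn) {
    holders.erase(txn);
    if (holders.empty()) exclusive = false;
  }
};
```

Because a blocked request may now wait on several holders at once, the wait-for graph gains multiple outgoing edges per waiter, which is why the commit switches deadlock detection to a BFS.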
Islam AbdelRahman 67f37cf198 Allow user to specify a CF for SST files generated by SstFileWriter
Summary:
Allow the user to explicitly specify that the file generated by SstFileWriter will be ingested into a specific CF.
This allows us to persist the CF id in the generated file.
Closes https://github.com/facebook/rocksdb/pull/1615

Differential Revision: D4270422

Pulled By: IslamAbdelRahman

fbshipit-source-id: 7fb954e
2016-12-05 14:24:16 -08:00
Igor Canadi 3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Mike Kolupaev 247d0979aa Support for range skips in compaction filter
Summary:
This adds the ability for compaction filter to say "drop this key-value, and also drop everything up to key x". This will cause the compaction to seek input iterator to x, without reading the data. This can make compaction much faster when large consecutive chunks of data are filtered out. See the changes in include/rocksdb/compaction_filter.h for the new API.

Along the way this diff also adds ability for compaction filter changing merge operands, similar to how it can change values; we're not going to use this feature, it just seemed easier and cleaner to implement it than to document that it's not implemented :)

The diff is not as big as it may seem, about half of the lines are a test.
Closes https://github.com/facebook/rocksdb/pull/1599

Differential Revision: D4252092

Pulled By: al13n321

fbshipit-source-id: 41e1e48
2016-12-01 07:09:15 -08:00
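The "drop this key-value and everything up to key x" behavior described above can be modeled with an ordered map standing in for the compaction input. The `Decision` enum, `Filter` type and `Compact` function are hypothetical; the real API is in include/rocksdb/compaction_filter.h.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical filter decisions modeled on the idea in the commit message.
enum class Decision { kKeep, kRemove, kRemoveAndSkipUntil };

using Filter = std::function<Decision(const std::string& key,
                                      const std::string& value,
                                      std::string* skip_until)>;

// Copies surviving entries from `input`. When the filter asks to skip, the
// input iterator is repositioned with a seek (lower_bound) instead of
// visiting every key in between; `visited` counts keys the filter saw.
std::map<std::string, std::string> Compact(
    const std::map<std::string, std::string>& input, const Filter& filter,
    int* visited) {
  std::map<std::string, std::string> output;
  *visited = 0;
  auto it = input.begin();
  while (it != input.end()) {
    ++*visited;
    std::string skip_until;
    Decision d = filter(it->first, it->second, &skip_until);
    if (d == Decision::kKeep) {
      output.insert(*it);
      ++it;
    } else if (d == Decision::kRemoveAndSkipUntil && skip_until > it->first) {
      it = input.lower_bound(skip_until);  // drop [key, skip_until) unread
    } else {
      ++it;  // plain removal, or a skip target that would not make progress
    }
  }
  return output;
}
```

With keys a..j and a filter that returns a skip from "b" until "h", only five keys are visited and a, h, i, j survive; the keys in between are never read.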
Panagiotis Ktistakis 96fcefbf1d c api: expose option for dynamic level size target
Summary: Closes https://github.com/facebook/rocksdb/pull/1587

Differential Revision: D4245923

Pulled By: yiwu-arbug

fbshipit-source-id: 6ee7291
2016-11-30 11:24:14 -08:00
zhangjinpeng1987 00197cff39 Add C API to set base_background_compactions
Summary:
Add C API to set base_background_compactions
Closes https://github.com/facebook/rocksdb/pull/1571

Differential Revision: D4245709

Pulled By: yiwu-arbug

fbshipit-source-id: 792c6b8
2016-11-30 11:09:13 -08:00
Andrew Kryczka 7ffb10fc1a DeleteRange compaction statistics
Summary:
- "rocksdb.compaction.key.drop.range_del" - number of keys dropped during compaction due to a range tombstone covering them
- "rocksdb.compaction.range_del.drop.obsolete" - number of range tombstones dropped due to compaction to bottom level and no snapshot saving them
- s/CompactionIteratorStats/CompactionIterationStats/g since this class is no longer specific to CompactionIterator -- it's also updated for range tombstone iteration during compaction
- Move the above class into a separate .h file to avoid circular dependency.
Closes https://github.com/facebook/rocksdb/pull/1520

Differential Revision: D4187179

Pulled By: ajkr

fbshipit-source-id: 10c2103
2016-11-28 11:54:12 -08:00
Mike Kolupaev 236d4c67e9 Less linear search in DBIter::Seek() when keys are overwritten a lot
Summary:
In one deployment we saw high latencies (presumably from slow iterator operations) and a lot of CPU time reported by perf with this stack:

```
  rocksdb::MergingIterator::Next
  rocksdb::DBIter::FindNextUserEntryInternal
  rocksdb::DBIter::Seek
```

I think what's happening is:
1. we create a snapshot iterator,
2. we do lots of Put()s for the same key x; this creates lots of entries in memtable,
3. we seek the iterator to a key slightly smaller than x,
4. the seek walks over lots of entries in memtable for key x, skipping them because of high sequence numbers.

CC IslamAbdelRahman
Closes https://github.com/facebook/rocksdb/pull/1413

Differential Revision: D4083879

Pulled By: IslamAbdelRahman

fbshipit-source-id: a83ddae
2016-11-28 10:24:11 -08:00
Yi Wu dfb6fe6755 Unified InlineSkipList::Insert algorithm with hinting
Summary:
This PR is based on nbronson's diff with small
modifications to wire it up with the existing interface. Compared to the
previous version, this approach works better for inserting keys in
decreasing order or updating the same key, and imposes fewer restrictions
on the prefix extractor.

---- Summary from original diff ----

This diff introduces a single InlineSkipList::Insert that unifies
the existing sequential insert optimization (prev_), concurrent insertion,
and insertion using externally-managed insertion point hints.

There's a deep symmetry between insertion hints (cursors) and the
concurrent algorithm.  In both cases we have partial information from
the recent past that is likely but not certain to be accurate.  This diff
introduces the struct InlineSkipList::Splice, which encodes predecessor
and successor information in the same form that was previously only used
within a single call to InsertConcurrently.  Splice holds information
about an insertion point that can be used to levera
Closes https://github.com/facebook/rocksdb/pull/1561

Differential Revision: D4217283

Pulled By: yiwu-arbug

fbshipit-source-id: 33ee437
2016-11-22 14:09:13 -08:00
Maysam Yabandeh 182b940e70 Add WriteOptions.no_slowdown
Summary:
If the WriteOptions.no_slowdown flag is set AND we need to wait or sleep for
the write request, then fail immediately with Status::Incomplete().
Closes https://github.com/facebook/rocksdb/pull/1527

Differential Revision: D4191405

Pulled By: maysamyabandeh

fbshipit-source-id: 7f3ce3f
2016-11-21 18:09:13 -08:00
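The fail-fast rule above can be sketched as a single decision point in a write path. `WriteOptionsSketch`, `WriteResult` and `Write` are hypothetical stand-ins; in RocksDB itself the flag is `WriteOptions::no_slowdown` and the failure is reported as `Status::Incomplete()`, as the commit states.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-ins; not RocksDB's Status or WriteOptions.
enum class WriteResult { kOk, kIncomplete };

struct WriteOptionsSketch {
  bool no_slowdown = false;
};

// If the engine would have to delay this write (e.g. during a write stall)
// and the caller set no_slowdown, fail immediately instead of sleeping.
WriteResult Write(const WriteOptionsSketch& opts, bool needs_delay,
                  std::chrono::microseconds delay) {
  if (needs_delay) {
    if (opts.no_slowdown) {
      return WriteResult::kIncomplete;  // caller may retry or shed load
    }
    std::this_thread::sleep_for(delay);
  }
  // ... the actual write would happen here ...
  return WriteResult::kOk;
}
```

This lets latency-sensitive callers turn a stall into an error they can handle, rather than an unbounded wait.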
Karthikeyan Radhakrishnan 4118e13330 Persistent Cache: Expose stats to user via public API
Summary:
Exposing persistent cache stats (counters) to the user via public API.
Closes https://github.com/facebook/rocksdb/pull/1485

Differential Revision: D4155274

Pulled By: siying

fbshipit-source-id: 30a9f50
2016-11-21 17:39:13 -08:00
Maysam Yabandeh 9d60151b04 Implement PositionedAppend for PosixWritableFile
Summary:
This patch clarifies the contract of PositionedAppend with some unit
tests and also implements it for PosixWritableFile. (Tasks: 14524071)
Closes https://github.com/facebook/rocksdb/pull/1514

Differential Revision: D4204907

Pulled By: maysamyabandeh

fbshipit-source-id: 06eabd2
2016-11-18 17:24:13 -08:00
Islam AbdelRahman c1038d2837 Release RocksDB 5.0
Summary:
Update HISTORY.md and version.h
Closes https://github.com/facebook/rocksdb/pull/1536

Differential Revision: D4202987

Pulled By: IslamAbdelRahman

fbshipit-source-id: 94985e3
2016-11-17 18:39:15 -08:00
Islam AbdelRahman f39452e81f Fix heap use after free ASAN/Valgrind
Summary:
Don't use c_str() of a temporary std::string in RocksLuaCompactionFilter::Name()
Closes https://github.com/facebook/rocksdb/pull/1535

Differential Revision: D4199094

Pulled By: IslamAbdelRahman

fbshipit-source-id: e56ce62
2016-11-17 12:24:12 -08:00
Yi Wu 36e4762ce0 Remove Ticker::SEQUENCE_NUMBER
Summary:
Remove the ticker count because:
* Having to reset the ticker count in WriteImpl is inefficient;
* It doesn't make sense to have it as a ticker count if multiple DB
  instances share a statistics object.
Closes https://github.com/facebook/rocksdb/pull/1531

Differential Revision: D4194442

Pulled By: yiwu-arbug

fbshipit-source-id: e2110a9
2016-11-16 22:39:09 -08:00
Yueh-Hsuan Chiang 647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write a CompactionFilter in Lua. With this ability, users can
dynamically change compaction filter logic without rebuilding
the rocksdb binary or restarting the database.

To compile, WITH_LUA_PATH must be set to the base directory
of Lua.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Siying Dong 972e3ff295 Enable allow_concurrent_memtable_write and enable_write_thread_adaptive_yield by default
Summary: Closes https://github.com/facebook/rocksdb/pull/1496

Differential Revision: D4168080

Pulled By: siying

fbshipit-source-id: 056ae62
2016-11-16 09:39:09 -08:00
Andrew Kryczka 489d142808 DeleteRange interface
Summary:
Expose DeleteRange() interface since we think the implementation is functionally correct now.
Closes https://github.com/facebook/rocksdb/pull/1503

Differential Revision: D4171921

Pulled By: ajkr

fbshipit-source-id: 5e21c98
2016-11-15 15:24:16 -08:00
Yi Wu 1ea79a78c9 Optimize sequential insert into memtable - Part 1: Interface
Summary:
Currently our skip list has an optimization to speed up sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering within each stream.

This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow the underlying
  implementation to return a hint of the insert position, which can later be
  passed back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
  via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419

Differential Revision: D4079367

Pulled By: yiwu-arbug

fbshipit-source-id: 3555326
2016-11-13 19:09:18 -08:00
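The "map from prefix to hint" scheme above can be illustrated with `std::set::emplace_hint` standing in for the skip list: C++11 guarantees amortized constant time when the element is inserted right before the hint. `HintedTable` and its fixed-length prefix rule are hypothetical replacements for the memtable and `memtable_insert_prefix_extractor`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <unordered_map>

// Per-prefix insert hints over an ordered set; not RocksDB's memtable code.
// The "prefix" is simply the first `prefix_len` bytes of the key.
class HintedTable {
 public:
  explicit HintedTable(std::size_t prefix_len) : prefix_len_(prefix_len) {}

  void Insert(const std::string& key) {
    std::string prefix = key.substr(0, prefix_len_);
    auto hint_it = hints_.find(prefix);
    std::set<std::string>::iterator pos;
    if (hint_it == hints_.end()) {
      pos = keys_.insert(key).first;  // no hint yet: ordinary O(log n) insert
    } else {
      // For an increasing stream, the new key belongs right after the
      // stream's previous insert, i.e. just before that element's successor.
      pos = keys_.emplace_hint(std::next(hint_it->second), key);
    }
    hints_[prefix] = pos;  // std::set iterators stay valid across inserts
  }

  const std::set<std::string>& keys() const { return keys_; }

 private:
  std::size_t prefix_len_;
  std::set<std::string> keys_;
  std::unordered_map<std::string, std::set<std::string>::iterator> hints_;
};
```

A wrong hint (for example an out-of-order key within a stream) still yields a correct result; it only costs a regular search, which matches the "tolerate small reordering" goal.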
Maysam Yabandeh 361010d447 Exporting compaction stats in the form of a map
Summary:
Currently the compaction stats are printed to stdout. We want to export the compaction stats in a map format so that upper-layer apps (e.g., MySQL) can present
the stats in whatever format they require.
Closes https://github.com/facebook/rocksdb/pull/1477

Differential Revision: D4149836

Pulled By: maysamyabandeh

fbshipit-source-id: b3df19f
2016-11-11 20:54:14 -08:00
Jay Lee a7875272d7 c: support seek_for_prev
Summary:
Support seek_for_prev in the C ABI.
Closes https://github.com/facebook/rocksdb/pull/1457

Differential Revision: D4135360

Pulled By: lightmark

fbshipit-source-id: 61256b0
2016-11-08 12:54:13 -08:00
Andrew Kryczka f998c9790f DeleteRange Get support
Summary:
During Get()/MultiGet(), build up a RangeDelAggregator with range
tombstones as we search through live memtable, immutable memtables, and
SST files. This aggregator is then used by memtable.cc's SaveValue() and
GetContext::SaveValue() to check whether keys are covered.

Added tests for Get on memtables/files; end-to-end tests are mainly in https://reviews.facebook.net/D64761
Closes https://github.com/facebook/rocksdb/pull/1456

Differential Revision: D4111271

Pulled By: ajkr

fbshipit-source-id: 6e388d4
2016-11-03 18:54:20 -07:00
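The coverage check that the aggregator performs during Get() can be reduced to a simple predicate. `RangeTombstone` and `IsCovered` are hypothetical names, and the linear scan is only for clarity; it is not how RocksDB's RangeDelAggregator is organized.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct RangeTombstone {
  std::string start;  // inclusive
  std::string end;    // exclusive
  uint64_t seq;       // sequence number of the DeleteRange call
};

// A point entry (user_key, seq) is hidden if a tombstone visible at the read
// sequence number (t.seq <= read_seq) was written after the entry
// (t.seq > seq) and its range contains the key. Whether the point entry
// itself is visible at read_seq is assumed to be checked separately.
bool IsCovered(const std::vector<RangeTombstone>& tombstones,
               const std::string& user_key, uint64_t seq, uint64_t read_seq) {
  for (const RangeTombstone& t : tombstones) {
    if (t.seq <= read_seq && t.seq > seq && t.start <= user_key &&
        user_key < t.end) {
      return true;
    }
  }
  return false;
}
```

Collecting tombstones from the live memtable, immutable memtables and SST files before calling this check is what lets a newer range deletion in one place hide an older value found somewhere else.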
zhangjinpeng1987 879f366366 Add C api for RateLimiter
Summary:
Add C api for RateLimiter.
Closes https://github.com/facebook/rocksdb/pull/1455

Differential Revision: D4116362

Pulled By: yiwu-arbug

fbshipit-source-id: cb05a8d
2016-11-03 11:09:17 -07:00
Yi Wu 437942e481 Add avoid_flush_during_shutdown DB option
Summary:
Add avoid_flush_during_shutdown DB option.
Closes https://github.com/facebook/rocksdb/pull/1451

Differential Revision: D4108643

Pulled By: yiwu-arbug

fbshipit-source-id: abdaf4d
2016-11-02 15:39:18 -07:00
Benoit Girard 2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Jay Lee 16fb04434f expose IngestExternalFile to c abi
Summary:
IngestExternalFile is very useful when doing bulk loads. This PR exposes this API to C so that many bindings can benefit from it too.
Closes https://github.com/facebook/rocksdb/pull/1454

Differential Revision: D4113420

Pulled By: yiwu-arbug

fbshipit-source-id: 307c6ae
2016-11-01 17:09:39 -07:00
Kien-hung Li eeb27e1bbd Add handy option to turn on direct I/O in db_bench (#1424) 2016-10-28 10:36:05 -07:00
Jan Doms c6168d13ab removed some declarations from c.h which resulted in undefined symbols (#1407) 2016-10-28 10:33:49 -07:00
Reid Horuff 4dfaa6610a Make IsDeadlockDetect() virtual member of Transaction
Summary: Make `IsDeadlockDetect()` virtual member of base class `Transaction` for ease of use in MyRocks

Test Plan: compiles. compiles into MyRocks call-site.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65385
2016-10-21 14:47:59 -07:00
Islam AbdelRahman 869ae5d786 Support IngestExternalFile (remove AddFile restrictions)
Summary:
Changes in the diff

API changes:
- Introduce IngestExternalFile to replace AddFile (I think this makes the API clearer)
- Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file)
- Deprecate AddFile() API

Logic changes:
- If our file overlaps with the memtable, we will flush the memtable
- We will find the first level in the LSM tree whose keys overlap with our file's key range
- We will find the lowest level in the LSM tree above the level found in the previous step that our file can fit in, and ingest our file into it
- We will assign a global sequence number to our new file
- Remove AddFile restrictions by using global sequence numbers

Other changes:
- Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob

Test Plan:
unit tests (still need to add more)
addfile_stress (https://reviews.facebook.net/D65037)

Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong

Reviewed By: sdong

Subscribers: jkedgar, hcz, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65061
2016-10-20 17:05:32 -07:00
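The level-selection steps listed in the commit above can be approximated as follows. This is a simplified sketch under stated assumptions: the memtable has already been flushed (so its data shows up in L0), each level is just a list of file key ranges, and concerns such as ongoing compactions are ignored. `KeyRange` and `PickIngestionLevel` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct KeyRange {
  std::string smallest;
  std::string largest;  // inclusive
};

bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

// levels[0] is L0 and levels.back() is the bottommost level. Stop at the
// first level whose files overlap the new file and use the level just above
// it; if nothing overlaps, use the bottommost level.
int PickIngestionLevel(const std::vector<std::vector<KeyRange>>& levels,
                       const KeyRange& file) {
  const int num_levels = static_cast<int>(levels.size());
  for (int level = 0; level < num_levels; ++level) {
    for (const KeyRange& existing : levels[level]) {
      if (Overlaps(existing, file)) {
        // L0 files may overlap each other, so an overlap there means L0.
        return level == 0 ? 0 : level - 1;
      }
    }
  }
  return num_levels - 1;
}
```

Every level above the first overlapping one is overlap-free by construction, so the level directly above it is the lowest level the file "can fit in" under these simplified rules.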
Manuel Ung 4edd39fda2 Implement deadlock detection
Summary: Implement deadlock detection. This is done by maintaining a TxnID -> TxnID map which represents the edges in the wait for graph (this is named `wait_txn_map_`).

Test Plan: transaction_test

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64491
2016-10-19 19:45:57 -07:00
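Deadlock detection over a wait-for graph, as described above, amounts to asking whether a new edge would close a cycle. The sketch below uses a hypothetical `WaitForGraph` alias and `WouldDeadlock` function; it stores a vector of edges per waiter so that it also covers the later shared-lock case, and it caps the search by the number of visited transactions, as the non-exclusive-locks commit describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using TxnId = uint64_t;

// waiter -> transactions it is blocked on.
using WaitForGraph = std::unordered_map<TxnId, std::vector<TxnId>>;

// Returns true if making `waiter` wait on `holders` would create a cycle,
// i.e. some holder is (transitively) already waiting on `waiter`. Gives up
// after visiting `max_visits` transactions and reports "no deadlock".
bool WouldDeadlock(const WaitForGraph& graph, TxnId waiter,
                   const std::vector<TxnId>& holders, std::size_t max_visits) {
  std::deque<TxnId> queue(holders.begin(), holders.end());
  std::unordered_set<TxnId> seen(holders.begin(), holders.end());
  std::size_t visits = 0;
  while (!queue.empty() && visits < max_visits) {
    TxnId txn = queue.front();
    queue.pop_front();
    ++visits;
    if (txn == waiter) return true;
    auto it = graph.find(txn);
    if (it == graph.end()) continue;  // txn is not waiting on anyone
    for (TxnId next : it->second) {
      if (seen.insert(next).second) queue.push_back(next);
    }
  }
  return false;
}
```

For example, if 2 waits on 3 and 3 waits on 1, then letting 1 wait on 2 is reported as a deadlock, unless the visit limit is reached first.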
Islam AbdelRahman b88f8e87c5 Support SST files with Global sequence numbers [reland]
Summary:
reland https://reviews.facebook.net/D62523

- Update SstFileWriter to include a property for a global sequence number in the SST file `rocksdb.external_sst_file.global_seqno`
- Update TableProperties to be aware of the offset of each property in the file
- Update BlockBasedTableReader and Block to be able to honor the sequence number in `rocksdb.external_sst_file.global_seqno` property and use it to overwrite all sequence numbers in the file

Something worth mentioning is that we don't update the seqno in the index block or when doing a binary search. The reason is that an SST file with a global seqno is guaranteed to contain each user_key only once, and each key has seqno=0 encoded in it. This means that the key is greater than any other internal key for the same user_key with seqno > 0, so we can keep the current logic for these blocks.

Test Plan: unit tests

Reviewers: sdong, yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65211
2016-10-18 16:59:37 -07:00
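The "overwrite all sequence numbers" step above can be pictured with an internal-key layout of user key plus an 8-byte little-endian trailer packing `(seq << 8) | type`. Treat that layout, and the helper names `MakeInternalKey`, `DecodeTrailer` and `ApplyGlobalSeqno`, as assumptions of this sketch rather than a description of RocksDB's reader code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends an 8-byte little-endian trailer holding (seq << 8) | type.
std::string MakeInternalKey(const std::string& user_key, uint64_t seq,
                            uint8_t type) {
  uint64_t packed = (seq << 8) | type;
  std::string key = user_key;
  for (int i = 0; i < 8; ++i) {
    key.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return key;
}

// Reads the trailer back; `ikey` must be at least 8 bytes long.
uint64_t DecodeTrailer(const std::string& ikey) {
  uint64_t packed = 0;
  for (int i = 0; i < 8; ++i) {
    unsigned char byte = static_cast<unsigned char>(ikey[ikey.size() - 8 + i]);
    packed |= static_cast<uint64_t>(byte) << (8 * i);
  }
  return packed;
}

// A key read from an ingested file stores seqno 0; the reader substitutes
// the file's global seqno while keeping the value type.
std::string ApplyGlobalSeqno(const std::string& ikey, uint64_t global_seqno) {
  uint8_t type = static_cast<uint8_t>(DecodeTrailer(ikey) & 0xff);
  return MakeInternalKey(ikey.substr(0, ikey.size() - 8), global_seqno, type);
}
```

Since the file on disk keeps seqno 0, only the single `global_seqno` property has to be written, which is why TableProperties needs to know the offset of each property.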
Yueh-Hsuan Chiang ab53998372 Bump RocksDB version to 4.13 (#1405)
Summary:
Bump RocksDB version to 4.13

Test Plan:
unit tests

Reviewers: sdong, IslamAbdelRahman, andrewkr, lightmark

Subscribers: leveldb
2016-10-18 15:39:10 -07:00
Aaron Gao d88dff4ef2 add seeforprev in history
Summary: update new feature in history and avoid breaking mongorocks

Test Plan: make check

Reviewers: sdong, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64611
2016-10-17 15:34:13 -07:00
Yi Wu e29d3b67c2 Make max_background_compactions and base_background_compactions dynamically changeable
Summary:
Add DB::SetDBOptions to dynamically change max_background_compactions and base_background_compactions.
I'll add more dynamically changeable options soon.

Test Plan: unit test.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64749
2016-10-14 12:25:39 -07:00
Dmitri Smirnov b9311aa65c Implement WinRandomRW file and improve code reuse (#1388) 2016-10-13 16:36:34 -07:00
Reid Horuff 02b3e3985c Make txn->GetState() const
Summary: makes Transaction::GetState() a const function.

Test Plan: compiles.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64929
2016-10-11 15:48:50 -07:00
Yi Wu 991b585ee0 More block cache tickers
Summary: Adding several missing block cache tickers.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, yhchiang, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64881
2016-10-11 11:59:05 -07:00
Yi Wu d6ae6dec69 Add Statistics::getAndResetTickerCount().
Summary: A convenience method to atomically get and reset a ticker count. I want to use it in a thin wrapper around the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303).

Test Plan:
test in LogDevice shadow cluster.
https://fburl.com/461868822

Reviewers: andrewkr, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64869
2016-10-11 10:54:11 -07:00
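Why "get and reset" needs to be one atomic operation, as in `Statistics::getAndResetTickerCount()` above, can be shown with a tiny ticker table. `TickerTable` is a hypothetical illustration, not RocksDB's Statistics implementation: a separate `load()` followed by `store(0)` could lose increments that land in between, while `exchange(0)` cannot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

class TickerTable {
 public:
  static constexpr std::size_t kNumTickers = 4;

  void Record(std::size_t ticker, uint64_t count = 1) {
    tickers_[ticker].fetch_add(count, std::memory_order_relaxed);
  }
  uint64_t Get(std::size_t ticker) const {
    return tickers_[ticker].load(std::memory_order_relaxed);
  }
  // Returns the current value and sets it to zero in one atomic step.
  uint64_t GetAndReset(std::size_t ticker) {
    return tickers_[ticker].exchange(0, std::memory_order_relaxed);
  }

 private:
  std::array<std::atomic<uint64_t>, kNumTickers> tickers_{};  // zero-initialized
};
```

An exporter that periodically calls GetAndReset() gets per-interval deltas without double counting or dropping updates.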
Islam AbdelRahman 2ad68b971a Support running consistency checks in release mode
Summary:
We always run consistency checks when compiling in debug mode.
Allow users to set Options::force_consistency_checks to true to run such checks even when compiling in release mode.

Test Plan:
make check -j64
make release

Reviewers: lightmark, sdong, yiwu

Reviewed By: yiwu

Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba

Differential Revision: https://reviews.facebook.net/D64701
2016-10-07 17:21:45 -07:00
Islam AbdelRahman d062328977 Revert "Support SST files with Global sequence numbers"
This reverts commit ab01da5437.
2016-10-07 14:05:12 -07:00
Reid Horuff 37737c3a6b Expose Transaction State Publicly
Summary:
This exposes a transaction's state through a public API rather than through a public member variable. I also do some name refactoring.
ExecutionStatus => TransactionState
exec_status_ => trx_state_

Test Plan: It compiles and transaction_test passes.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, mung, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D64689
2016-10-07 11:58:53 -07:00
Reid Horuff 2c1f95291d Add facility to write only a portion of WriteBatch to WAL
Summary:
When constructing a write batch, a client may now call MarkWalTerminationPoint() on that batch. Batch operations added after this call will not be written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement.

```
RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off
INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628

RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off
INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054
```

Test Plan: Two unit tests added.

Reviewers: sdong, yiwu, IslamAbdelRahman

Reviewed By: yiwu

Subscribers: hermanlee4, dhruba, andrewkr

Differential Revision: https://reviews.facebook.net/D64599
2016-10-07 11:32:10 -07:00
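The split between WAL and memtable introduced by `MarkWalTerminationPoint()` above can be modeled with a toy batch. `ToyBatch` is hypothetical and only illustrates the rule: operations recorded before the marker go to both the WAL and the memtable, later ones go to the memtable only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

class ToyBatch {
 public:
  void Put(std::string key, std::string value) {
    ops_.emplace_back(std::move(key), std::move(value));
  }
  // Everything added after this call is memtable-only.
  void MarkWalTerminationPoint() { wal_end_ = ops_.size(); }

  void Apply(std::vector<KV>* wal, std::vector<KV>* memtable) const {
    for (std::size_t i = 0; i < ops_.size(); ++i) {
      if (i < wal_end_) wal->push_back(ops_[i]);
      memtable->push_back(ops_[i]);
    }
  }

 private:
  std::vector<KV> ops_;
  std::size_t wal_end_ = static_cast<std::size_t>(-1);  // no marker: log all
};
```

In the 2PC use case, the data already logged during prepare can sit after the marker, so the commit step does not have to write it to the WAL a second time.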
Sage Weil 4985f60fc8 env_mirror: fix a few leaks (#1363)
* env_mirror: fix leak from LockFile

Signed-off-by: Sage Weil <sage@redhat.com>

* env_mirror: instruct EnvMirror whether mirrored Envs should be destroyed

The lifecycle rules for Env are frustrating and undocumented.  Notably,
Env::Default() should *not* be freed, but any Env instances we created
should be.

Explicitly instruct EnvMirror whether to clean up child Env instances.
Default to false so that we do not affect existing callers.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-10-06 10:43:05 -07:00
Igor Mihalik 5aded67ddb update of c.h (#1371)
Added the rocksdb_options_set_memtable_prefix_bloom_size_ratio function, which was implemented in c.cc but not exported via c.h.
2016-10-06 10:37:19 -07:00
Islam AbdelRahman ab01da5437 Support SST files with Global sequence numbers
Summary:
- Update SstFileWriter to include a property for a global sequence number in the SST file `rocksdb.external_sst_file.global_seqno`
- Update TableProperties to be aware of the offset of each property in the file
- Update BlockBasedTableReader and Block to be able to honor the sequence number in `rocksdb.external_sst_file.global_seqno` property and use it to overwrite all sequence numbers in the file

Something worth mentioning is that we don't update the seqno in the index block or when doing a binary search. The reason is that an SST file with a global seqno is guaranteed to contain each user_key only once, and each key has seqno=0 encoded in it. This means that the key is greater than any other internal key for the same user_key with seqno > 0, so we can keep the current logic for these blocks.

Test Plan: unit tests

Reviewers: andrewkr, yhchiang, yiwu, sdong

Reviewed By: sdong

Subscribers: hcz, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62523
2016-10-03 16:12:39 -07:00
krad e91b4d0cf6 Add factory method for creating persistent cache that is accessible from public
Summary:
Currently there is no mechanism to create persistent cache from
headers. Adding a simple factory method to create a simple persistent cache with
default or NVM optimized settings.

Note: any ideas on how to test this factory are appreciated.

Test Plan: None

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64527
2016-10-03 10:55:46 -07:00
Manuel Ung be1f1092c9 Expose transaction id, lock state information and transaction wait information
Summary:
This diff does 3 things:

Expose TransactionID so that we can identify transactions when we retrieve locking and lock wait information. This is exposed as `Transaction::GetID`.

Expose lock state information by locking all stripes in all column families and copying their contents to a data structure. This is exposed as `TransactionDB::GetLockStatusData`.

Adds support for tracking the transaction and the key being waited on, and exposes this as `Transaction::GetWaitingTxn`.

Test Plan: unit tests

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: vasilep, hermanlee4, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64413
2016-09-30 11:41:21 -07:00
Aaron Gao f517d9dd09 Add SeekForPrev() to Iterator
Summary:
Add new Iterator API, `SeekForPrev`: find the last key that is <= the target key.
- Supports prefix_extractor
- Supports prefix_same_as_start
- Supports upper_bound
- Not supported in iterators without Prev()

Also add tests in db_iter_test and db_iterator_test

Pass all tests
Cheers!

Test Plan: make all check -j64

Reviewers: andrewkr, yiwu, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64149
2016-09-27 18:20:57 -07:00
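The positioning rule of `SeekForPrev` above ("the last key that is <= the target key") can be modeled on an ordered `std::map`. The free function `SeekForPrev` below is a hypothetical model of the semantics, not the RocksDB iterator implementation; with the real API the call is `iter->SeekForPrev(target)`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns a pointer to the last key <= target, or nullptr if every key is
// greater than target (the iterator would then be invalid).
const std::string* SeekForPrev(const std::map<std::string, std::string>& m,
                               const std::string& target) {
  auto it = m.upper_bound(target);  // first key > target
  if (it == m.begin()) return nullptr;
  --it;  // step back to the last key <= target
  return &it->first;
}
```

With keys b, d, f: a target of "d" lands on "d", "e" lands on "d", "z" lands on "f", and "a" finds nothing. This is the mirror image of Seek(), which finds the first key >= target.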
Yi Wu 9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is only a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
Aaron Gao 0a1bd9c509 add cfh deletion started listener
Summary: Add a ColumnFamilyHandleDeletionStarted listener, which is called when a user deletes a column family handle.

Test Plan: ./listener_test

Reviewers: yiwu, IslamAbdelRahman, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60717
2016-09-22 11:56:18 -07:00
Giuseppe Ottaviano d45eb6c6d2 Fix typo (#1349) 2016-09-21 20:06:56 -07:00
Yi Wu 17f76fc564 DB::GetOptions() reflect dynamic changed options
Summary: DB::GetOptions() reflect dynamic changed options.

Test Plan: See the new unit test.

Reviewers: yhchiang, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63903
2016-09-14 22:10:28 -07:00
Islam AbdelRahman ba65c816bb Support POSIX RandomRWFile
Summary:
Add Env::RandomRWFile in env.h and implement it for POSIX
RandomRWFile is a file that allows us to read from / write to random offsets in the file

I will implement it for other Envs later after finishing the whole task for AddFile()

Test Plan: unit tests

Reviewers: andrewkr, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62433
2016-09-13 12:08:22 -07:00
zhangjinpeng1987 b06b191362 add C api for set wal_recovery_mode (#1327)
* add C api for set wal recovery mode

* add test
2016-09-09 10:11:30 -07:00
ammongit 0fcb6dbed7 Remove extraneous function prototypes from c.h (#1326)
* Fix function prototypes from upstream commit 32149059.

* Fix removed function.

* Re-add removed function.
2016-09-08 11:31:06 -07:00
Edouard A 66a91e2607 Add NoSpace subcode to IOError (#1320)
Add a sub code to distinguish "out of space" errors from regular I/O errors
2016-09-07 12:37:45 -07:00
wenduo 0e2da497c6 fix typo in option.h's comment (#1321) 2016-09-07 01:36:16 -07:00
sdong 607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add support for ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reasons.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285), but its reference in /include/rocksdb/table.h was later removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
John Alexander 4fd08f4b8b Ensure Correct Behavior of StatsLevel kExceptDetailedTimers and kExceptTimeForMutex (#1308)
* Fix StatsLevel so that kExceptTimeForMutex leaves compression stats enabled and kExceptDetailedTimers disables mutex lock stats. Also change default stats level to kExceptDetailedTimers (disabling both compression and mutex timing).

* Changed order of StatsLevel enum to simplify logic for determining what stats to record.
2016-09-01 19:57:55 -07:00
sdong 32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce the number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Aaron Gao 4590b53a4b add stats to Cache::LookUp()
Summary: Mainly for SimCache stats: in most cases it is hard to pass Statistics* to the SimCache constructor.

Test Plan: make all check

Reviewers: andrewkr, sdong, yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62193
2016-09-01 13:50:39 -07:00
Sherlock 85bb308258 Expose Utility function StringToMap() (#1306) 2016-09-01 13:47:21 -07:00
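StringToMap() turns an options string of the form "key1=value1;key2=value2" into a key/value map. The sketch below is a hypothetical, simplified parser (`SimpleStringToMap` is not the RocksDB function); unlike a full options parser, it does not handle nested "{...}" values:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

namespace {
// Strip leading/trailing spaces and tabs.
std::string Trim(const std::string& s) {
  const size_t begin = s.find_first_not_of(" \t");
  if (begin == std::string::npos) return "";
  const size_t end = s.find_last_not_of(" \t");
  return s.substr(begin, end - begin + 1);
}
}  // namespace

// Hypothetical simplified parser for "k1=v1;k2=v2"; returns false on a
// segment without '=' or with an empty key.
bool SimpleStringToMap(const std::string& opts,
                       std::unordered_map<std::string, std::string>* out) {
  out->clear();
  size_t pos = 0;
  while (pos < opts.size()) {
    size_t semi = opts.find(';', pos);
    if (semi == std::string::npos) semi = opts.size();
    const std::string item = Trim(opts.substr(pos, semi - pos));
    pos = semi + 1;
    if (item.empty()) continue;  // tolerate a trailing ';'
    const size_t eq = item.find('=');
    if (eq == std::string::npos) return false;
    const std::string key = Trim(item.substr(0, eq));
    if (key.empty()) return false;
    (*out)[key] = Trim(item.substr(eq + 1));
  }
  return true;
}
```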
Islam AbdelRahman b49b92cf28 Introduce Read amplification bitmap (read amp statistics)
Summary:
Add a ReadOptions::read_amp_bytes_per_bit option, which allows us to create a bitmap for every data block we read.
The bitmap will contain (block_size / read_amp_bytes_per_bit) bits.

We will use this bitmap to mark which bytes of the block have been used, so we can calculate the read amplification.
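The bookkeeping can be sketched as follows. This is a hypothetical class, not the RocksDB implementation. It assumes each bit covers `bytes_per_bit` consecutive bytes of one block and defines read amplification as bytes loaded divided by estimated useful bytes. The commit sizes the bitmap at block_size / read_amp_bytes_per_bit bits; the sketch rounds up so a partial trailing chunk still gets a bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block bitmap: bit i stands for bytes
// [i * bytes_per_bit, (i + 1) * bytes_per_bit) of one data block.
class ReadAmpBitmapSketch {
 public:
  ReadAmpBitmapSketch(size_t block_size, size_t bytes_per_bit)
      : block_size_(block_size), bytes_per_bit_(bytes_per_bit) {
    assert(bytes_per_bit > 0);
    bits_.assign((block_size + bytes_per_bit - 1) / bytes_per_bit, false);
  }

  // Record that a read touched bytes [start, end) of the block.
  void Mark(size_t start, size_t end) {
    if (start >= end || bits_.empty()) return;
    const size_t first = start / bytes_per_bit_;
    const size_t last = std::min((end - 1) / bytes_per_bit_, bits_.size() - 1);
    for (size_t i = first; i <= last; ++i) bits_[i] = true;
  }

  // Each set bit counts as a full chunk, so this is an estimate.
  size_t EstimatedUsefulBytes() const {
    return static_cast<size_t>(std::count(bits_.begin(), bits_.end(), true)) *
           bytes_per_bit_;
  }

  // Bytes loaded / bytes estimated useful; 0 if nothing was used yet.
  double ReadAmplification() const {
    const size_t useful = EstimatedUsefulBytes();
    return useful == 0 ? 0.0 : static_cast<double>(block_size_) / useful;
  }

 private:
  size_t block_size_;
  size_t bytes_per_bit_;
  std::vector<bool> bits_;
};
```

A smaller `bytes_per_bit` gives a more precise estimate at the cost of a larger bitmap per block.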

Test Plan: added new tests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: yiwu, leveldb, march, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58707
2016-08-26 18:55:58 -07:00
Aaron Gao c7004840d2 store prefix_extractor_name in table
Summary:
Store the prefix extractor name in SST files; if the DB is opened with a prefix extractor of a different name, the prefix bloom filter is skipped when reading the file.
Also add unit tests for that.
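The read-side check reduces to a name comparison. The reasoning: a bloom filter built from prefixes produced by one extractor, probed with prefixes from a different extractor, can wrongly report "not present". That is the failure the "before change" test output shows (`db_iter->Valid()` was false). A minimal sketch; `TablePropsSketch`, `CanUsePrefixBloom`, and the extractor names in the test are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the name recorded in an SST file at write time.
struct TablePropsSketch {
  std::string prefix_extractor_name;  // empty if the file recorded none
};

// Use the file's prefix bloom only when the extractor that built it matches
// the one the reader will use to compute lookup prefixes.
bool CanUsePrefixBloom(const TablePropsSketch& props,
                       const std::string* current_extractor_name) {
  if (current_extractor_name == nullptr) return false;  // none configured
  return props.prefix_extractor_name == *current_extractor_name;
}
```

When the check fails, the reader still returns correct results; it just loses the bloom filter's ability to skip the file.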

Test Plan:
before change:
```
Note: Google Test filter = BlockBasedTableTest.SkipPrefixBloomFilter
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from BlockBasedTableTest
[ RUN      ] BlockBasedTableTest.SkipPrefixBloomFilter
table/table_test.cc:1421: Failure
Value of: db_iter->Valid()
Actual: false
Expected: true
[  FAILED  ] BlockBasedTableTest.SkipPrefixBloomFilter (1 ms)
[----------] 1 test from BlockBasedTableTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] BlockBasedTableTest.SkipPrefixBloomFilter

1 FAILED TEST
```
after:
```
Note: Google Test filter = BlockBasedTableTest.SkipPrefixBloomFilter
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from BlockBasedTableTest
[ RUN      ] BlockBasedTableTest.SkipPrefixBloomFilter
[       OK ] BlockBasedTableTest.SkipPrefixBloomFilter (0 ms)
[----------] 1 test from BlockBasedTableTest (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (0 ms total)
[  PASSED  ] 1 test.
```

Reviewers: sdong, andrewkr, yiwu, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61215
2016-08-26 11:46:32 -07:00
Aaron Gao 4ad928e170 add comment to SimCache to estimate actual capacity
Summary: as title

Test Plan: make all check

Reviewers: yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62493
2016-08-26 11:36:14 -07:00
Islam AbdelRahman e9b2af87f8 Expose ThreadPool under include/rocksdb/threadpool.h
Summary:
This diff splits ThreadPool into:
- ThreadPool (abstract interface exposed in include/rocksdb/threadpool.h)
- ThreadPoolImpl (actual implementation in util/threadpool_imp.h)

This allows us to expose ThreadPool to the user so that it can be used as an option later.
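The split follows a common C++ pattern: callers depend only on an abstract class, and the concrete pool lives behind it. The sketch below uses a hypothetical two-method interface. `ThreadPoolSketch`, `Submit`, and `WaitAndJoin` are illustrative and are not the methods declared in include/rocksdb/threadpool.h.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Public, abstract interface: all a user of the option would see.
class ThreadPoolSketch {
 public:
  virtual ~ThreadPoolSketch() = default;
  virtual void Submit(std::function<void()> job) = 0;
  // Runs every queued job to completion, then stops the workers.
  virtual void WaitAndJoin() = 0;
};

// Private implementation: worker threads draining a shared queue.
class ThreadPoolSketchImpl : public ThreadPoolSketch {
 public:
  explicit ThreadPoolSketchImpl(size_t num_threads) {
    for (size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }
  ~ThreadPoolSketchImpl() override { WaitAndJoin(); }

  void Submit(std::function<void()> job) override {
    {
      std::lock_guard<std::mutex> lock(mu_);
      jobs_.push(std::move(job));
    }
    cv_.notify_one();
  }

  void WaitAndJoin() override {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) {
      if (t.joinable()) t.join();
    }
    workers_.clear();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
        if (jobs_.empty()) return;  // stopping and nothing left to run
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

Because callers hold only a `ThreadPoolSketch&`, the implementation can change without touching the public header.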

Test Plan: existing unit tests

Reviewers: andrewkr, yiwu, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62085
2016-08-26 10:41:35 -07:00
Yi Wu 4a16c32ece Option to cache index/filter blocks with priority
Summary:
Add an option to the block-based table to insert index/filter blocks into the block cache with high priority. Combined with an LRUCache configured with high_pri_pool_ratio, this reserves space for index/filter blocks, making them less likely to be evicted.

Depends on D61977.

Test Plan: See unit test.

Reviewers: lightmark, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, march, leveldb

Differential Revision: https://reviews.facebook.net/D62241
2016-08-23 13:44:13 -07:00
Adam Retter ffdf6eee19 Add Status to RocksDBException so that meaningful function result Status from the C++ API isn't lost (#1273) 2016-08-22 11:01:42 -07:00
Yi Wu 72f8cc703c LRU cache mid-point insertion
Summary:
Add mid-point insertion functionality to the LRU cache. Callers of `Cache::Insert()` can set an additional parameter to give a cache entry higher priority. The LRU cache will reserve at most `capacity * high_pri_pool_pct` bytes for high-pri cache entries. If `high_pri_pool_pct` is zero, the cache degenerates to a normal LRU cache.

Context: If index and filter blocks are put into the RocksDB block cache, they can be swapped out too early. We want to add an option to RocksDB to reserve some capacity in the block cache just for index/filter blocks, to mitigate the issue.

In later diffs I'll update the block-based table reader to use the interface to cache index/filter blocks at high priority, expose the option in `DBOptions`, and make it dynamically changeable.
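The policy can be sketched with two LRU lists: a high-pri section at the hot end and a low-pri section behind it. High-pri entries go to the head of the list. Low-pri entries go to the head of the low-pri section (the "mid-point"). When the high-pri section exceeds its quota, its oldest entries are demoted to the mid-point rather than evicted, and eviction always starts at the low-pri tail. This is a simplified, single-threaded, unsharded sketch, not the actual LRUCache code. `MidPointLRUSketch` is hypothetical, and its hit handling (refreshing an entry within its current section) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical LRU cache with a high-priority pool; lists run newest -> oldest.
class MidPointLRUSketch {
 public:
  // high_pri_pool_pct is a fraction in [0, 1]; 0 degenerates to plain LRU.
  MidPointLRUSketch(size_t capacity, double high_pri_pool_pct)
      : capacity_(capacity),
        high_pri_capacity_(static_cast<size_t>(capacity * high_pri_pool_pct)) {}

  void Insert(const std::string& key, size_t charge, bool high_pri) {
    Erase(key);
    const bool to_high = high_pri && high_pri_capacity_ > 0;
    List& list = to_high ? high_ : low_;
    list.push_front(Entry{key, charge});  // low-pri: head of low section
    index_[key] = Loc{to_high, list.begin()};
    (to_high ? high_usage_ : low_usage_) += charge;
    Rebalance();
  }

  // A hit refreshes the entry within its current section (sketch assumption).
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    List& list = it->second.in_high ? high_ : low_;
    list.splice(list.begin(), list, it->second.pos);
    return true;
  }

  bool Contains(const std::string& key) const { return index_.count(key) > 0; }

 private:
  struct Entry {
    std::string key;
    size_t charge;
  };
  using List = std::list<Entry>;
  struct Loc {
    bool in_high;
    List::iterator pos;
  };

  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    (it->second.in_high ? high_usage_ : low_usage_) -= it->second.pos->charge;
    (it->second.in_high ? high_ : low_).erase(it->second.pos);
    index_.erase(it);
  }

  void Rebalance() {
    // Overflow from the high-pri pool is demoted to the mid-point.
    while (high_usage_ > high_pri_capacity_ && !high_.empty()) {
      auto oldest = std::prev(high_.end());
      high_usage_ -= oldest->charge;
      low_usage_ += oldest->charge;
      index_[oldest->key].in_high = false;
      low_.splice(low_.begin(), high_, oldest);
    }
    // Evict from the cold end: low-pri tail first, then high-pri tail.
    while (high_usage_ + low_usage_ > capacity_) {
      const bool from_low = !low_.empty();
      List& victims = from_low ? low_ : high_;
      (from_low ? low_usage_ : high_usage_) -= victims.back().charge;
      index_.erase(victims.back().key);
      victims.pop_back();
    }
  }

  size_t capacity_;
  size_t high_pri_capacity_;
  size_t high_usage_ = 0;
  size_t low_usage_ = 0;
  List high_;
  List low_;
  std::unordered_map<std::string, Loc> index_;
};
```

A stream of low-pri inserts then cycles through the low-pri section while a high-pri entry within its quota survives.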

Test Plan: unit test.

Reviewers: IslamAbdelRahman, sdong, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, march, leveldb

Differential Revision: https://reviews.facebook.net/D61977
2016-08-19 16:43:31 -07:00
Wanning Jiang 78837f5d61 TableBuilder / TableReader support for range deletion
Summary:
1. Range Deletion Tombstone structure
2. Modify Add() in table_builder to make it usable for adding range del tombstones
3. Expose NewTombstoneIterator() API in table_reader
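A range deletion tombstone can be pictured as a [start, end) user-key range tagged with a sequence number. It hides a point entry only if the key falls inside the range and the entry is older than the tombstone. A minimal sketch under those assumptions; the struct and `Covers` helper are hypothetical, and plain bytewise string comparison stands in for the configured comparator:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical range tombstone: deletes keys in [start_key, end_key).
struct RangeTombstoneSketch {
  std::string start_key;  // inclusive
  std::string end_key;    // exclusive
  uint64_t seq;           // sequence number of the range deletion
};

// True if the tombstone deletes the point entry (key, entry_seq):
// the key is in range and the entry was written before the tombstone.
bool Covers(const RangeTombstoneSketch& t, const std::string& key,
            uint64_t entry_seq) {
  return t.start_key <= key && key < t.end_key && entry_seq < t.seq;
}
```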

Test Plan: table_test.cc (BlockBasedTableBuilder::Add() now only accepts InternalKey, so table_test now passes only InternalKey to BlockBasedTableBuilder. Also test writing/reading range deletion tombstones in table_test.)

Reviewers: sdong, IslamAbdelRahman, lightmark, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61473
2016-08-19 15:10:31 -07:00
Yi Wu 4cc37f59e5 Introduce ClockCache
Summary:
A clock-based cache implementation that aims for better concurrency than
the default LRU cache. See inline comments for implementation details.
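The core CLOCK ("second chance") idea can be sketched in a few lines. Slots form a ring and a hit only sets a reference bit. To make room, a hand sweeps the ring, clearing set bits and evicting the first unreferenced entry. This is a textbook, single-threaded sketch with one slot per entry and hypothetical names. It does not reproduce RocksDB's ClockCache, whose concurrency design is described in its inline comments. Clearing the bit on insertion is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed-size CLOCK cache mapping string keys to int values.
class ClockCacheSketch {
 public:
  explicit ClockCacheSketch(size_t num_slots) : slots_(num_slots) {
    assert(num_slots > 0);
  }

  void Insert(const std::string& key, int value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      slots_[it->second].value = value;
      slots_[it->second].referenced = true;
      return;
    }
    const size_t victim = FindVictim();
    Slot& s = slots_[victim];
    if (s.occupied) index_.erase(s.key);
    s.occupied = true;
    s.referenced = false;  // must be used again to earn a second chance
    s.key = key;
    s.value = value;
    index_[key] = victim;
  }

  // A hit only sets a bit: no list reordering, unlike LRU.
  bool Lookup(const std::string& key, int* value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    Slot& s = slots_[it->second];
    s.referenced = true;
    *value = s.value;
    return true;
  }

 private:
  struct Slot {
    bool occupied = false;
    bool referenced = false;
    std::string key;
    int value = 0;
  };

  // Free or unreferenced slots are taken; referenced ones get their bit
  // cleared (the "second chance") and the hand moves on.
  size_t FindVictim() {
    for (;;) {
      const size_t current = hand_;
      Slot& s = slots_[current];
      hand_ = (hand_ + 1) % slots_.size();
      if (!s.occupied || !s.referenced) return current;
      s.referenced = false;
    }
  }

  std::vector<Slot> slots_;
  std::unordered_map<std::string, size_t> index_;
  size_t hand_ = 0;
};
```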

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Add some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
Jay 49d88be021 c abi: allow compaction filter ignore snapshot (#1268)
close #1262
2016-08-17 18:48:43 -07:00
Anirban Rahut 2fc2fd92a9 Single Delete Mismatch and Fallthrough statistics
Summary:
Added 2 statistics to the compaction job statistics to
identify whether single deletes do not meet a matching key
(fallthrough) or meet a merge, delete, or
another single delete (i.e., not the expected case of a put).
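The classification can be sketched as follows. Take all entries for one user key, newest first, as compaction encounters them. A SingleDelete with no older entry counts as a fallthrough. A SingleDelete whose next older entry is anything other than a Put counts as a mismatch. The enum, struct, and function names are hypothetical, and the sketch ignores snapshots and compaction boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class OpType { kPut, kDelete, kSingleDelete, kMerge };

// Hypothetical counters mirroring the two statistics described above.
struct SingleDeleteStatsSketch {
  size_t fallthrough = 0;  // SingleDelete with no older entry for the key
  size_t mismatch = 0;     // SingleDelete followed by Merge/Delete/SingleDelete
};

// `versions` holds every entry for ONE user key, newest first.
void CountSingleDeleteAnomalies(const std::vector<OpType>& versions,
                                SingleDeleteStatsSketch* stats) {
  for (size_t i = 0; i < versions.size(); ++i) {
    if (versions[i] != OpType::kSingleDelete) continue;
    if (i + 1 == versions.size()) {
      ++stats->fallthrough;  // nothing older to cancel out
    } else if (versions[i + 1] != OpType::kPut) {
      ++stats->mismatch;  // expected a Put directly underneath
    }
  }
}
```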

Test Plan: Tested the statistics using write_stress and compaction_job_stats_test

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61749
2016-08-16 08:21:43 -07:00
Andrew Kryczka 3771e37970 WriteBatch support for range deletion
Summary:
Add an API to WriteBatch to store range deletions in its buffer,
which are later added to the memtable. In the WriteBatch buffer, a range
deletion is encoded as "<optype><CF ID (optional)><begin key><end key>".

With this diff, the range tombstones are stored inline with the data in
the memtable. It's useful for now because the test cases rely on the
data being accessible via memtable. My next step is to store range
tombstones in a separate area in the memtable.
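The record layout "<optype><CF ID (optional)><begin key><end key>" can be sketched as a small encoder/decoder. The sketch assumes the CF ID is a varint32 that is omitted (with a different tag) for the default column family 0, and that both keys are varint32 length-prefixed. The tag values and all function names are placeholders, not necessarily the ones WriteBatch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Placeholder tag values for this sketch only.
constexpr char kTagRangeDeletion = 0x0F;
constexpr char kTagCFRangeDeletion = 0x0E;

void PutVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

bool GetVarint32(const std::string& src, size_t* pos, uint32_t* v) {
  uint32_t result = 0;
  for (int shift = 0; shift <= 28 && *pos < src.size(); shift += 7) {
    const uint8_t byte = static_cast<uint8_t>(src[(*pos)++]);
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *v = result;
      return true;
    }
  }
  return false;
}

void PutLengthPrefixed(std::string* dst, const std::string& s) {
  PutVarint32(dst, static_cast<uint32_t>(s.size()));
  dst->append(s);
}

bool GetLengthPrefixed(const std::string& src, size_t* pos, std::string* out) {
  uint32_t len = 0;
  if (!GetVarint32(src, pos, &len) || src.size() - *pos < len) return false;
  out->assign(src, *pos, len);
  *pos += len;
  return true;
}

// Appends "<optype><CF ID (optional)><begin key><end key>".
void AppendRangeDeletion(std::string* rep, uint32_t cf_id,
                         const std::string& begin, const std::string& end) {
  if (cf_id == 0) {
    rep->push_back(kTagRangeDeletion);
  } else {
    rep->push_back(kTagCFRangeDeletion);
    PutVarint32(rep, cf_id);
  }
  PutLengthPrefixed(rep, begin);
  PutLengthPrefixed(rep, end);
}

// Decodes one record starting at *pos and advances *pos past it.
bool DecodeRangeDeletion(const std::string& rep, size_t* pos, uint32_t* cf_id,
                         std::string* begin, std::string* end) {
  if (*pos >= rep.size()) return false;
  const char tag = rep[(*pos)++];
  *cf_id = 0;
  if (tag == kTagCFRangeDeletion) {
    if (!GetVarint32(rep, pos, cf_id)) return false;
  } else if (tag != kTagRangeDeletion) {
    return false;
  }
  return GetLengthPrefixed(rep, pos, begin) && GetLengthPrefixed(rep, pos, end);
}
```

Length-prefixing lets keys contain arbitrary bytes, and the optional CF ID keeps default-column-family records one varint shorter.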

Test Plan: unit tests

Reviewers: IslamAbdelRahman, sdong, wanning

Reviewed By: wanning

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61401
2016-08-16 08:16:04 -07:00
Edouard A a117891b4b Fixed typo (#1279)
Users shouldn't reply on -> users shouldn't rely on
2016-08-15 09:21:01 -07:00
Aaron Gao e408e98c8c add Name() to Cache
Summary: Preparation for detecting the Cache type. If it is a SimCache, we may then trigger a command such as "setSimCapacity()" via setOptions().
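Detecting the cache type through a virtual Name() can be sketched as below. A caller holding only a base-class pointer checks the name before attempting a type-specific operation. The class names, the returned name strings, and `SetSimCapacity`/`TrySetSimCapacity` are hypothetical illustrations loosely based on the "setSimCapacity()" example in the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal cache hierarchy: each implementation reports a name.
class CacheSketch {
 public:
  virtual ~CacheSketch() = default;
  virtual const char* Name() const = 0;
};

class LRUCacheSketch : public CacheSketch {
 public:
  const char* Name() const override { return "LRUCache"; }
};

class SimCacheSketch : public CacheSketch {
 public:
  const char* Name() const override { return "SimCache"; }
  void SetSimCapacity(size_t capacity) { sim_capacity_ = capacity; }
  size_t sim_capacity() const { return sim_capacity_; }

 private:
  size_t sim_capacity_ = 0;
};

// Only a cache that reports itself as a SimCache accepts the command.
bool TrySetSimCapacity(CacheSketch* cache, size_t capacity) {
  if (std::strcmp(cache->Name(), "SimCache") != 0) return false;
  static_cast<SimCacheSketch*>(cache)->SetSimCapacity(capacity);
  return true;
}
```

The static_cast is safe only if every implementation reports a unique name; a name check avoids relying on RTTI.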

Test Plan: make all check

Reviewers: yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61953
2016-08-12 14:16:57 -07:00