Commit graph

690 commits

Author SHA1 Message Date
sdong ec43a8b9fb Universal Compaction with multiple levels won't allocate up to output size
Summary: Universal compactions with multiple levels should use file preallocation size based on file size if output level is not level 0

Test Plan: Run all tests.

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D38439
2015-05-13 14:15:46 -07:00
Yueh-Hsuan Chiang 714fcc067d Make ThreadStatus::InterpretOperationProperties take const uint64_t*
Summary: Make ThreadStatus::InterpretOperationProperties take const uint64_t*

Test Plan:
make
make OPT=-DROCKSDB_LITE shared_lib

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38445
2015-05-13 12:26:07 -07:00
Yueh-Hsuan Chiang 14431e971d Fixed a bug in EventListener::OnCompactionCompleted().
Summary:
Fixed a bug in EventListener::OnCompactionCompleted() that returns
incorrect list of input / output file names.

Test Plan: Extend existing test in listener_test.cc

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38349
2015-05-12 16:10:23 -07:00
agiardullo 711465ccec API to fetch from both a WriteBatchWithIndex and the db
Summary:
Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch.  (This is needed for transactions).

I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep).

Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc.  WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)).

Test Plan: Added new unit tests.

Reviewers: rven, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38037
2015-05-11 14:51:51 -07:00
Igor Canadi 962f8ba332 Bump to 3.11
Summary: as title

Test Plan: none

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38175
2015-05-07 13:27:14 -07:00
Yueh-Hsuan Chiang 77a5a543a5 Allow GetThreadList() to report basic compaction operation properties.
Summary:
Now we're able to show more details about a compaction in
GetThreadList() :)

This patch allows GetThreadList() to report basic compaction
operation properties.  Basic compaction properties include:
    1. job id
    2. compaction input / output level
    3. compaction property flags (is_manual, is_deletion, .. etc)
    4. total input bytes
    5. the number of bytes has been read currently.
    6. the number of bytes has been written currently.

Flush operation properties will be done in a seperate diff.

Test Plan:
/db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1

Sample output of tracking same job:

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     31.357 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     59.440 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction    226.375 ms                        CompactionJob::Install              BaseInputLevel 1 | BytesRead 3958013 | BytesWritten 3621940 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37653
2015-05-06 22:51:06 -07:00
Laurent Demailly df4130ad85 fix crashes in stats and compaction filter for db_ttl_impl
Summary: fix crashes in stats and compaction filter for db_ttl_impl

Test Plan:
Ran build with lots of debugging
https://reviews.facebook.net/differential/diff/194175/

Reviewers: yhchiang, igor, rven

Reviewed By: igor

Subscribers: rven, dhruba

Differential Revision: https://reviews.facebook.net/D38001
2015-05-05 16:54:47 -07:00
Igor Canadi fd96b55402 Making GetOptions() comment better (#597) 2015-04-30 09:29:51 -07:00
Aashish Pant 794ccfde89 Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity

Test Plan: Created unit tests in cache_test.cc

Reviewers: sdong, rven, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37527
2015-04-24 14:12:58 -07:00
sdong 98a44559d5 Build for CYGWIN
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.

Test Plan: Build it and run some unit tests in CYGWIN

Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37605
2015-04-23 21:33:44 -07:00
Giuseppe Ottaviano 2dc421df48 Implement DB::PromoteL0 method
Summary:
This diff implements a new `DB` method `PromoteL0` which moves all files in L0
to a given level skipping compaction, provided that the files have disjoint
ranges and all levels up to the target level are empty.

This method provides finer-grain control for trivial compactions, and it is
useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
the semantics of an existing operation, which can impact existing code.

PromoteL0 is designed to work well in combination with the proposed
`GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure
by populating one level at a time. Such fine-grained control can be very useful
for static or mostly-static databases.

Test Plan: `make check`

Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37107
2015-04-23 12:10:36 -07:00
sdong 397b6588bd options.paranoid_file_checks to read all rows after writing to a file.
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.

Test Plan: Add a new unit test for it

Reviewers: rven, igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37335
2015-04-23 11:34:35 -07:00
Igor Canadi 6059bdf86a Add experimental API MarkForCompaction()
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).

All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.

MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.

Test Plan: added a new unit test

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37083
2015-04-17 16:44:45 -07:00
agiardullo 84c5bd7eb9 Add thread-safety documentation to MemTable and related classes
Summary: Other than making some class members private, this is a documentation-only change

Test Plan: unit tests

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36567
2015-04-08 21:10:35 -07:00
Yoshinori Matsunobu 824e646341 Adding another NewFlashcacheAwareEnv function to support pre-opened fd
Summary:
There are some cases when flachcache file descriptor was
already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an
error at open() because fd was already assigned. This diff adds another
function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd.

Test Plan: Tested with MyRocks using this function, then worked

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, MarkCallaghan, rven

Differential Revision: https://reviews.facebook.net/D36447
2015-04-06 16:50:36 -07:00
Igor Canadi 5e067a7b19 Clean up compression logging
Summary: Now we add warnings when user configures compression and the compression is not supported.

Test Plan:
Configured compression to non-supported values. Observed messages in my log:

    2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.

    2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.

Reviewers: rven, sdong, yhchiang

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35979
2015-04-06 12:50:44 -07:00
sdong 953a885ebf A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.

Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties

Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.

Reviewers: yhchiang, igor.sugak, rven, igor

Reviewed By: rven, igor

Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35373
2015-04-06 10:27:21 -07:00
Igor Canadi d61cb0b9de db_bench can now disable flashcache for background threads
Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.

Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35391
2015-03-30 09:51:11 -07:00
Herman Lee e018892bb6 Formalize the DB properties string definitions.
Summary:
Assign the string properties to const string variables under the
DB::Properties namespace. This helps catch typos during compilation and
also consolidates the property definition in one place.

Test Plan: Run rocksdb unit tests

Reviewers: sdong, yoshinorim, igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35991
2015-03-27 14:50:20 -07:00
Anurag Indu 3d1a924ff3 Adding stats for the merge and filter operation
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.

Test Plan: WIP

Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34377
2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang 248c063ba1 Report elapsed time in micros in ThreadStatus instead of start time.
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10

Sample Output:
            ThreadID ThreadType                    cfName    Operation  ElapsedTime                                         Stage        State
     140667724562496   High Pri column_family_name_000002        Flush   772.419 ms                    FlushJob::WriteLevel0Table
     140667728756800   High Pri                   default        Flush   617.845 ms                    FlushJob::WriteLevel0Table
     140667732951104   High Pri column_family_name_000005        Flush   772.078 ms                    FlushJob::WriteLevel0Table
     140667875557440    Low Pri column_family_name_000008   Compaction  1409.216 ms                        CompactionJob::Install
     140667737145408    Low Pri
     140667749728320    Low Pri
     140667816837184    Low Pri column_family_name_000007   Compaction  1071.815 ms      CompactionJob::ProcessKeyValueCompaction
     140667787477056    Low Pri column_family_name_000009   Compaction   772.516 ms      CompactionJob::ProcessKeyValueCompaction
     140667741339712    Low Pri
     140667758116928    Low Pri column_family_name_000004   Compaction   620.739 ms      CompactionJob::ProcessKeyValueCompaction
     140667753922624    Low Pri
     140667842003008    Low Pri column_family_name_000006   Compaction  1260.079 ms      CompactionJob::ProcessKeyValueCompaction
     140667745534016    Low Pri

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35769
2015-03-24 11:32:25 -07:00
Igor Canadi 315abac945 Undeprecate GetLiveFiles()
Summary: There is no alternative to GetLiveFiles() function

Test Plan: none

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35805
2015-03-24 09:42:38 -07:00
sdong 0831a35994 Add a DB Property For Number of Deletions in Memtables
Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes.

Test Plan: Add test cases.

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D35247
2015-03-18 17:03:59 -07:00
Igor Canadi 51301b869f Enable dynamic changing of rate limiter's bytes_per_second
Summary: This feature is going to be useful for mongodb+rocksdb. I'll expose it through mongo's API.

Test Plan: added new unit test. also will run TSAN on the new unit test

Reviewers: meyering, sdong

Reviewed By: meyering, sdong

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35307
2015-03-18 15:35:55 -07:00
agiardullo 81345b90f9 Create an abstract interface for write batches
Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class.  This makes it easier to write code that is agnostic toward the implementation of the particular write batch.  In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.

Test Plan: modified existing WriteBatchWithIndex tests to test new functions.  Running all tests.

Reviewers: igor, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34017
2015-03-17 19:23:08 -07:00
Igor Canadi c88ff4ca76 Deprecate removeScanCountLimit in NewLRUCache
Summary: It is no longer used by the implementation, so we should also remove it from the public API.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34971
2015-03-17 15:04:37 -07:00
Venkatesh Radhakrishnan d4d42c02ea Fixed clang build in env.h
Summary: Mark function as override.

Test Plan: USE_CLANG=1 make -j32 check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35163
2015-03-16 19:51:25 -07:00
Venkatesh Radhakrishnan b2b3086524 Speed up rocksDB close call.
Summary:
On RocksDB, when there are multiple instances doing
flushes/compactions in the background, the close call takes a long time
because the flushes/compactions need to complete before the database can
shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up.

Test Plan: DB Test added.

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33741
2015-03-16 18:49:14 -07:00
Yueh-Hsuan Chiang c594b0e89d Allow GetThreadList() to report operation stage.
Summary: Allow GetThreadList() to report operation stage.

Test Plan:
  ./thread_list_test
  ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
    --max_background_compactions=10 --max_background_flushes=3 \
    --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
    --num_column_families=10

  export ROCKSDB_TESTS=ThreadStatus
  ./db_test

Sample output
          ThreadID ThreadType                    cfName    Operation        OP_StartTime    ElapsedTime                                         Stage        State
   140116265861184    Low Pri
   140116270055488    Low Pri
   140116274249792   High Pri column_family_name_000005        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116400078912    Low Pri column_family_name_000004   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116358135872    Low Pri column_family_name_000006   Compaction 2015/03/10-14:58:10           1 us     CompactionJob::FinishCompactionOutputFile
   140116341358656    Low Pri
   140116295221312   High Pri                   default        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116324581440    Low Pri column_family_name_000009   Compaction 2015/03/10-14:58:11           0 us      CompactionJob::ProcessKeyValueCompaction
   140116278444096    Low Pri
   140116299415616    Low Pri column_family_name_000008   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116291027008   High Pri column_family_name_000001        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116286832704    Low Pri column_family_name_000002   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116282638400    Low Pri

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34683
2015-03-13 10:45:40 -07:00
sdong e9de8b65a6 Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.

Test Plan: run all tests

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34431
2015-03-11 13:14:52 -07:00
Venkatesh Radhakrishnan 284be570c8 Provide a mechanism to inform Rocksdb that it is shutting down
Summary:
Provide an API which enables users to infor Rocksdb that it is
shutting down.

Test Plan: db_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34617
2015-03-11 10:31:02 -07:00
Yueh-Hsuan Chiang 89597bb66b Allow GetThreadList() to report the start time of the current operation.
Summary: Allow GetThreadList() to report the start time of the current operation.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
  --max_background_compactions=10 --max_background_flushes=3 \
  --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
  --num_column_families=10

Sample output:
          ThreadID ThreadType                    cfName    Operation        OP_StartTime         State
   140338840797248   High Pri column_family_name_000003        Flush 2015/03/09-17:49:59
   140338844991552   High Pri column_family_name_000004        Flush 2015/03/09-17:49:59
   140338849185856    Low Pri
   140338983403584    Low Pri
   140339008569408    Low Pri
   140338861768768    Low Pri
   140338924683328    Low Pri
   140338899517504    Low Pri
   140338853380160    Low Pri
   140338882740288    Low Pri
   140338865963072   High Pri column_family_name_000006        Flush 2015/03/09-17:49:59
   140338954043456    Low Pri
   140338857574464    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: lgalanis, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34689
2015-03-10 14:51:28 -07:00
Yueh-Hsuan Chiang dc4532c497 Add --thread_status_per_interval to db_bench
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10

Sample output:
          ThreadID ThreadType                         dbName                     cfName       Operation           State
   140281571770432    Low Pri
   140281575964736   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000001           Flush
   140281710182464    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000008      Compaction
   140281638879296    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000007      Compaction
   140281592741952    Low Pri
   140281580159040   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000002           Flush
   140281676628032    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000006      Compaction
   140281584353344    Low Pri
   140281622102080    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000009      Compaction
   140281605324864    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000004      Compaction
   140281601130560   High Pri  /tmp/rocksdbtest-5297/dbbench                    default           Flush
   140281596936256    Low Pri
   140281588547648    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34515
2015-03-06 11:22:06 -08:00
Igor Canadi db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
Igor Canadi 3cf7f353d9 Instrument memtable seeks
Summary: As title

Test Plan: compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34191
2015-02-27 17:06:06 -08:00
Igor Canadi b9ff6b050d Fix a bug in ReadOnlyBackupEngine
Summary:
This diff fixes a bug introduced by D28521. Read-only backup engine can delete a backup that is later than the latest -- we never check the condition.

I also added a bunch of logging that will help with debugging cases like this in the future.

See more discussion at t6218248.

Test Plan: Added a unit test that was failing before the change. Also, see new LOG file contents: https://phabricator.fb.com/P19738984

Reviewers: benj, sanketh, sumeet, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33897
2015-02-27 14:03:56 -08:00
Sameet Agarwal e7c434c364 Add columnfamily option optimize_filters_for_hits to optimize for key hits only
Summary:
    Summary:
    Added a new option to ColumnFamllyOptions  - optimize_filters_for_hits. This option can be used in the case where most
    accesses to the store are key hits and we dont need to optimize performance for key misses.
    This is useful when you have a very large database and most of your lookups succeed.  The option allows the store to
     not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
     space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
     for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.

    This option is only provided for BlockBasedTable. We skip the filters when we are compacting

Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
 2. Added another unit test to db_test which specifically tests that filters are being skipped

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33717
2015-02-26 16:25:56 -08:00
stash93 03bbf718cb Return fbson
Summary: mac compile is fixed in fbson, so it can be returned back from 7ce1b2c

Test Plan:
make all check
make valgrind_check

Reviewers: golovachalexander, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33855
2015-02-27 00:04:14 +03:00
Igor Sugak 62247ffa3b rocksdb: Add missing override
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.

Prerequisites: bear and clang 3.5 build with extra tools

```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```

Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```

Reviewers: igor, kradhakrishnan, rven, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34077
2015-02-26 11:28:41 -08:00
Jim Meyering c37937a9ce maint: remove extraneous "const" attribute from return type
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.

Test Plan:
  Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.

Reviewers: ljin, sdong, igor.sugak, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33759
2015-02-20 11:07:07 -08:00
Igor Canadi 96ab15d306 GetOptionsFromString + fixes to block_based_table_options
Summary:
In mongo, we currently have a single column family and I'd like to support setting rocksdb::Options from string. This diff provides an option to GetOptionsFromString()

There's one more problem. Currently GetColumnFamilyOptionsFromString() overwrites block_based options. In mongo I set default values for block_cache and some other values of BlockBasedTableOptions and I don't want them reset to default with GetOptionsFromString().

Test Plan: added unit test

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33729
2015-02-19 19:11:14 -08:00
sdong d45a6a4002 Add rocksdb.num-live-versions: number of live versions
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.

Test Plan: make all check

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33327
2015-02-19 13:10:37 -08:00
Igor Canadi b8ac71ba18 Revert "Fbson to Json"
This reverts commit 7ce1b2c19c.
2015-02-18 13:14:53 -08:00
stash93 7ce1b2c19c Fbson to Json
Summary: Replaced rapidjson with fbson

Test Plan:
make all check
make valgrind_check

Reviewers: golovachalexander, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32733
2015-02-18 23:58:56 +03:00
Venkatesh Radhakrishnan 7d817268b9 Managed iterator
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.

Test Plan: Managed* tests in db_test and XF tests for managed iterator

Reviewers: igor, yhchiang, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31401
2015-02-18 11:49:31 -08:00
sdong 68af7811ea Remember whole key/prefix filtering on/off in SST file
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.

Test Plan: Add a unit test for it

Reviewers: yhchiang, igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32889
2015-02-11 11:20:04 -08:00
sdong 6d6305dd7d Perf Context to report DB mutex waiting time
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex

Test Plan: Add a test and run it.

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33177
2015-02-09 17:55:12 -08:00
Grace Law 1851f977c2 Added RocksDB stats GET_HIT_L0 and GET_HIT_L1
Summary:
  - In statistics.h , added tickers.
  - In version_set.cc,
  -- Added a getter method for hit_file_level_ in the class FilePicker
  -- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively.

Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818
Personal fork: 0c3f2e3600

Test Plan:
In terminal,
```
make -j32 db_test
ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test
```

Or to use debugger,
```
make -j32 db_test
export ROCKSDB_TESTS=L0L1L2AndUpHitCounter
gdb db_test
```

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32205
2015-02-09 14:53:58 -08:00
fyrz cfe8837e43 Switch logv with loglevel to virtual 2015-02-09 20:59:29 +01:00
Igor Canadi ee4aa9a0ee Merge pull request #481 from mkevac/backupable
Allow creating and restoring backups from C
2015-02-09 09:51:19 -08:00