Commit graph

32 commits

Author SHA1 Message Date
Liangjun Feng 9aa011fa36 Optimize GetRange Function
Summary: Optimize GetRange Function by checking the level of the files

Test Plan: pass make all check

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37977
2015-05-05 09:57:47 -07:00
Igor Canadi 6059bdf86a Add experimental API MarkForCompaction()
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).

All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.

MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.

Test Plan: added a new unit test

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37083
2015-04-17 16:44:45 -07:00
Igor Canadi b5400f90fe Kill dead code
Summary: this is not used anywhere

Test Plan: compiles

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37053
2015-04-17 12:07:47 -07:00
Igor Canadi 47b8743984 Make Compaction class easier to use
Summary:
The goal of this diff is to make Compaction class easier to use. This should also make new compaction algorithms easier to write (like CompactFiles from @yhchiang and dynamic leveled and multi-leveled universal from @sdong).

Here are couple of things demonstrating that Compaction class is hard to use:
1. we have two constructors of Compaction class
2. there's this thing called grandparents_, but it appears to only be setup for leveled compaction and not compactfiles
3. it's easy to introduce a subtle and dangerous bug like this: D36225
4. SetupBottomMostLevel() is hard to understand and it shouldn't be. See this comment: afbafeaeae/db/compaction.cc (L236-L241). It also made it harder for @yhchiang to write CompactFiles, as evidenced by this: afbafeaeae/db/compaction_picker.cc (L204-L210)

The problem is that we create Compaction object, which holds a lot of state, and then pass it around to some functions. After those functions are done mutating, then we call couple of functions on Compaction object, like SetupBottommostLevel() and MarkFilesBeingCompacted(). It is very hard to see what's happening with all that Compaction's state while it's travelling across different functions. If you're writing a new PickCompaction() function you need to try really hard to understand what are all the functions you need to run on Compaction object and what state you need to setup.

My proposed solution is to make important parts of Compaction immutable after construction. PickCompaction() should calculate compaction inputs and then pass them onto Compaction object once they are finalized. That makes it easy to create a new compaction -- just provide all the parameters to the constructor and you're done. No need to call confusing functions after you created your object.

This diff doesn't fully achieve that goal, but it comes pretty close. Here are some of the changes:
* have one Compaction constructor instead of two.
* inputs_ is constant after construction
* MarkFilesBeingCompacted() is now private to Compaction class and automatically called on construction/destruction.
* SetupBottommostLevel() is gone. Compaction figures it out on its own based on the input.
* CompactionPicker's functions are not passing around Compaction object anymore. They are only passing around the state that they need.

Test Plan:
make check
make asan_check
make valgrind_check

Reviewers: rven, anthony, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: sdong, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36687
2015-04-10 15:01:54 -07:00
Venkatesh Radhakrishnan afbafeaeae Disallow trivial move if compression level is different
Summary:
Check compression level of start_level with output_compression
before allowing trivial move

Test Plan: New DBTest CompressLevelCompactionThirdPath added

Reviewers: igor, yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36213
2015-04-02 11:06:30 -07:00
sdong b23bbaa82a Universal Compactions with Small Files
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.

If options.num_levels=1, it behaves all the same as now.

Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.

Reviewers: rven, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: meyering, leveldb, MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D34539
2015-03-30 15:12:02 -07:00
Igor Canadi 52e0f3353f Clean up compactions_in_progress_
Summary: Suprisingly, the only way we use this vector is to keep track of level0 compactions. Thus, I simplified it.

Test Plan: make check

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35313
2015-03-18 18:25:15 -07:00
Igor Canadi db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
Igor Sugak 62247ffa3b rocksdb: Add missing override
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.

Prerequisites: bear and clang 3.5 build with extra tools

```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```

Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```

Reviewers: igor, kradhakrishnan, rven, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34077
2015-02-26 11:28:41 -08:00
Igor Canadi e39f4f6cf9 Fix data race #3
Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted)

Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none.

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32283
2015-02-04 16:04:51 -08:00
Venkatesh Radhakrishnan 153f4f0719 RocksDB: Allow Level-Style Compaction to Place Files in Different Paths
Summary:
Allow Level-style compaction to place files in different paths
This diff provides the code for task 4854591. We now support level-compaction
to place files in different paths by specifying  them in db_paths  along with
the minimum level for files to store in that path.

Test Plan: ManualLevelCompactionOutputPathId in db_test.cc

Reviewers: yhchiang, MarkCallaghan, dhruba, yoshinorim, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29799
2014-12-15 21:48:16 -08:00
Yueh-Hsuan Chiang bcf9086899 Block Universal and FIFO compactions in ROCKSDB_LITE
Summary: Block Universal and FIFO compactions in ROCKSDB_LITE

Test Plan:
make shared_lib -j32
make OPT=-DROCKSDB_LITE shared_lib

Reviewers: ljin, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29589
2014-11-26 15:45:11 -08:00
Yueh-Hsuan Chiang 4161de92a3 Fix SIGSEGV
Summary: As a short-term fix, let's go back to previous way of calculating NeedsCompaction(). SIGSEGV happens because NeedsCompaction() can happen before super_version (and thus MutableCFOptions) is initialized.

Test Plan: make check

Reviewers: ljin, sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28875
2014-11-13 15:21:04 -08:00
Igor Canadi 772bc97f13 No CompactFiles in ROCKSDB_LITE
Summary: It adds lots of code.

Test Plan: compile for iOS, compile for mac. works.

Reviewers: rven, sdong, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28857
2014-11-13 16:45:33 -05:00
Yueh-Hsuan Chiang 1d1a64f58a Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
Summary:
Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
to allow different compaction strategy to have their own way to
determine whether doing compaction is necessary.

When compaction style is set to kCompactionStyleNone, then
NeedsCompaction() will always return false.

Test Plan:
export ROCKSDB_TESTS=Compact
./db_test

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28719
2014-11-13 13:41:43 -08:00
Yueh-Hsuan Chiang 28c82ff1b3 CompactFiles, EventListener and GetDatabaseMetaData
Summary:
This diff adds three sets of APIs to RocksDB.

= GetColumnFamilyMetaData =
* This APIs allow users to obtain the current state of a RocksDB instance on one column family.
* See GetColumnFamilyMetaData in include/rocksdb/db.h

= EventListener =
* A virtual class that allows users to implement a set of
  call-back functions which will be called when specific
  events of a RocksDB instance happens.
* To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners

= CompactFiles =
* CompactFiles API inputs a set of file numbers and an output level, and RocksDB
  will try to compact those files into the specified level.

= Example =
* Example code can be found in example/compact_files_example.cc, which implements
  a simple external compactor using EventListener, GetColumnFamilyMetaData, and
  CompactFiles API.

Test Plan:
listener_test
compactor_test
example/compact_files_example
export ROCKSDB_TESTS=CompactFiles
db_test
export ROCKSDB_TESTS=MetaData
db_test

Reviewers: ljin, igor, rven, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D24705
2014-11-07 14:45:18 -08:00
sdong 76d1c28e82 Make CompactionPicker more easily tested
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.

It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.

Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.

Test Plan:
Pass all unit tests and compaction_picker_test

Reviewers: yhchiang, rven, igor, ljin

Reviewed By: ljin

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D27639
2014-10-29 15:16:53 -07:00
Yueh-Hsuan Chiang fcac705f95 Fixed compile warning on Mac caused by unused variables.
Summary:
Fixed compile warning caused by unused variables.

./db/compaction_picker.h:118:7: error: private field 'max_grandparent_overlap_factor_' is not used [-Werror,-Wunused-private-field]
  int max_grandparent_overlap_factor_;
      ^
./db/compaction_picker.h:119:7: error: private field 'expanded_compaction_factor_' is not used [-Werror,-Wunused-private-field]
  int expanded_compaction_factor_;
      ^
2 errors generated.

Test Plan:
make db_test
2014-10-02 01:03:08 -07:00
Lei Jin 5ec53f3edf make compaction related options changeable
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.

Test Plan: make all check

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23349
2014-10-01 16:19:16 -07:00
miguelportilla 93e6b5e9d9 Changes to support unity build:
* Script for building the unity.cc file via Makefile
* Unity executable Makefile target for testing builds
* Source code changes to fix compilation of unity build
2014-08-11 13:22:47 -04:00
sdong f6b7e1ed1a Allow user to specify DB path of output file of manual compaction
Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to.

Test Plan: add a unit test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D20085
2014-07-21 19:06:00 -07:00
Yueh-Hsuan Chiang 052ddbe0e2 Add MaxInputLevel() to CompactionPicker
Summary:
Having if-then branch for different compaction strategies is considered
hacky and make CompactionPicker less pluggable.  This diff removes two
of such if-then branches in version_set.cc by adding MaxInputLevel() to
CompactionPicker.

    // Given the current number of levels, returns the lowest allowed level
    // for compaction input.
    virtual int MaxInputLevel(int current_num_levels) const;

Test Plan:
make db_test
export ROCKSDB_TESTS=Compaction
./db_test

Reviewers: igor, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19971
2014-07-17 18:01:04 -07:00
sdong 0abaed2e08 Support multiple DB directories in universal compaction style
Summary:
This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file.
Level-style stays the same.

Test Plan: Add new unit tests

Reviewers: ljin, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, igor, leveldb

Differential Revision: https://reviews.facebook.net/D19869
2014-07-15 12:06:28 -07:00
Igor Canadi 6de6a06631 FIFO compaction style
Summary:
Introducing new compaction style -- FIFO.

FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.

FIFO compaction style is suited for storing high-frequency event logs.

Test Plan: Added a unit test

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

Subscribers: alberts, leveldb

Differential Revision: https://reviews.facebook.net/D18765
2014-05-21 11:43:35 -07:00
Igor Canadi fb2346fc1f [CF] Code cleanup part 1
Summary:
I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.

Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.

This diff also fixes race condition when dropping column family.

Test Plan: Ran db_stress with variety of parameters

Reviewers: dhruba, haobo

Differential Revision: https://reviews.facebook.net/D16833
2014-03-12 09:56:53 -07:00
Igor Canadi 80a207fc90 Merge branch 'master' into columnfamilies
Conflicts:
	db/compaction_picker.cc
	db/compaction_picker.h
	db/db_impl.cc
	db/version_set.cc
	db/version_set.h
	include/rocksdb/options.h
	util/options.cc
2014-03-05 16:59:22 -08:00
sdong ecb1ffa2a8 Buffer info logs when picking compactions and write them out after releasing the mutex
Summary: Now while the background thread is picking compactions, it writes out multiple info_logs, especially for universal compaction, which introduces a chance of waiting log writing in mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex.

Test Plan:
make all check
check the log lines while running some tests that trigger compactions.

Reviewers: haobo, igor, dhruba

Reviewed By: dhruba

CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D16515
2014-03-05 15:36:32 -08:00
Igor Canadi 73f62255c1 [CF] Split SanitizeOptions into two
Summary:
There are three SanitizeOption-s now : one for DBOptions, one for ColumnFamilyOptions and one for Options (which just calls the other two)

I have also reshuffled some options -- table_cache options and info_log should live in DBOptions, for example.

Test Plan: make check doesn't complain

Reviewers: dhruba, haobo, kailiu, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15873
2014-02-04 17:26:51 -08:00
Igor Canadi f7489123e2 Move compaction picker and internal key comparator to ColumnFamilyData
Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData

Test Plan: make check

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15801
2014-01-31 16:06:55 -08:00
Igor Canadi 677fee27c6 Make VersionSet::ReduceNumberOfLevels() static
Summary:
A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method.

Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file.

Test Plan: reduce_levels_test

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15303
2014-01-24 14:57:04 -08:00
Igor Canadi 0f4a75b710 Fix SIGSEGV in compaction picker
Summary:
The SIGSEGV was introduced by https://reviews.facebook.net/D15171

I also fixed ExpandWhileOverlapping() which returned the failure by setting it's own stack variable to nullptr (!). This bug is present in 2.6 release, so I guess ExpandWhileOverlapping never fails :)

Test Plan: `make check`. Also MarkCallaghan confirmed it fixed the SIGSEGV he reported.

Reviewers: MarkCallaghan, kailiu, sdong, dhruba, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15261
2014-01-17 12:02:03 -08:00
Igor Canadi c699c84af4 CompactionPicker
Summary:
This is a big one. This diff moves all the code related to picking compactions from VersionSet to new class CompactionPicker. Column families' compactions will be completely separate processes, so we need to have multiple CompactionPickers.

To make this easier to review, most of the code change is just copy/paste. There is also a small change not to use VersionSet::current_, but rather to take `Version* version` as a parameter. Most of the other code is exactly the same.

In future diffs, I will also make some improvements to CompactionPickers. I think the most important part will be encapsulating it better. Currently Version, VersionSet, Compaction and CompactionPicker are all friend classes, which makes it harder to change the implementation.

This diff depends on D15171, D15183, D15189 and D15201

Test Plan: `make check`

Reviewers: kailiu, sdong, dhruba, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15207
2014-01-16 13:03:52 -08:00