Commit graph

92 commits

Author SHA1 Message Date
Mayank Agarwal 2a986919d6 Make rocksdb-deletes faster using bloom filter
Summary:
Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
1. Put of delete type
2. Space in the db,and
3. Compaction time

Test Plan:
make all check;
will run db_stress and db_bench and enhance unit-test once the basic design gets approved

Reviewers: dhruba, haobo, vamsi

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11607
2013-07-11 12:11:11 -07:00
Xing Jin 8a5341ec7d Newbie code question
Summary:
This diff is more about my question when reading compaction codes,
instead of a normal diff. I don't quite understand the logic here.

Test Plan: I didn't do any test. If this is a bug, I will continue doing some test.

Reviewers: haobo, dhruba, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11661
2013-07-11 09:03:40 -07:00
Abhishek Kona 00124683de [rocksdb] do not trim range for level0 in manual compaction
Summary:
https://code.google.com/p/leveldb/issues/detail?can=1&q=178&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary&id=178

Ported the solution as is to RocksDB.

Test Plan: moved the unit test as manual_compaction_test

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11331
2013-06-17 13:58:17 -07:00
Haobo Xu bdf1085944 [RocksDB] cleanup EnvOptions
Summary:
This diff simplifies EnvOptions by treating it as POD, similar to Options.
- virtual functions are removed and member fields are accessed directly.
- StorageOptions is removed.
- Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
- Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite

Test Plan: make check; db_stress

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11175
2013-06-12 11:17:19 -07:00
Dhruba Borthakur e673d5d26d Do not submit multiple simultaneous seek-compaction requests.
Summary:
The code was such that if multi-threaded-compactions as well
as seek compaction are enabled then it submits multiple
compaction request for the same range of keys. This causes
extraneous sst-files to accumulate at various levels.

Test Plan:
I am not able to write a very good unit test for this one
but can easily reproduce this bug with 'dbstress' with the
following options.

batch=1;maxk=100000000;ops=100000000;ro=0;fm=2;bpl=10485760;of=500000; wbn=3; mbc=20; mb=2097152; wbs=4194304; dds=1; sync=0;  t=32; bs=16384; cs=1048576; of=500000; ./db_stress --disable_seek_compaction=0 --mmap_read=0 --threads=$t --block_size=$bs --cache_size=$cs --open_files=$of --verify_checksum=1 --db=/data/mysql/leveldb/dbstress.dir --sync=$sync --disable_wal=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --target_file_size_multiplier=$fm --max_write_buffer_number=$wbn --max_background_compactions=$mbc --max_bytes_for_level_base=$bpl --reopen=$ro --ops_per_thread=$ops --max_key=$maxk --test_batches_snapshots=$batch

Reviewers: leveldb, emayanke

Reviewed By: emayanke

Differential Revision: https://reviews.facebook.net/D11055
2013-06-10 15:49:19 -07:00
Abhishek Kona d91b42ee27 [Rocksdb] Measure all FSYNC/SYNC times
Summary: Add stop watches around all sync calls.

Test Plan: db_bench check if respective histograms are printed

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11073
2013-06-05 11:06:21 -07:00
Haobo Xu ab8d2f6ab2 [RocksDB] [Performance] Allow different posix advice to be applied to the same table file
Summary:
Current posix advice implementation ties up the access pattern hint with the creation of a file.
It is not possible to apply different advice for different access (random get vs compaction read),
without keeping two open files for the same table. This patch extended the RandomeAccessFile interface
to accept new access hint at anytime. Particularly, we are able to set different access hint on the same
table file based on when/how the file is used.
Two options are added to set the access hint, after the file is first opened and after the file is being
compacted.

Test Plan: make check; db_stress; db_bench

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D10905
2013-05-30 19:08:44 -07:00
Dhruba Borthakur d1aaaf718c Ability to set different size fanout multipliers for every level.
Summary:
There is an existing field Options.max_bytes_for_level_multiplier that
sets the multiplier for the size of each level in the database.

This patch introduces the ability to set different multipliers
for every level in the database. The size of a level is determined
by using both max_bytes_for_level_multiplier as well as the
per-level fanout.

size of level[i] = size of level[i-1] * max_bytes_for_level_multiplier
                   * fanout[i-1]

The default value of fanout is 1, so that it is backward compatible.

Test Plan: make check

Reviewers: haobo, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10863
2013-05-21 13:50:20 -07:00
Dhruba Borthakur a8d3aa2c26 Assertion failure for L0-L1 compactions.
Summary:
For level-0 compactions, we try to find if can include more L0 files
in the same compaction run. This causes the 'smallest' and 'largest'
key to get extended to a larger range. But the suceeding call to
ParentRangeInCompaction() was still using the earlier
values of 'smallest' and 'largest',

Because of this bug, a file in L1 can be part of two concurrent
compactions: one L0-L1 compaction and the other L1-L2 compaction.

This should not cause any data loss, but will cause an assertion
failure with debug builds.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D10677
2013-05-08 17:10:11 -07:00
Haobo Xu 05e8854085 [Rocksdb] Support Merge operation in rocksdb
Summary:
This diff introduces a new Merge operation into rocksdb.
The purpose of this review is mostly getting feedback from the team (everyone please) on the design.

Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
include/leveldb/db.h
include/leveldb/merge_operator.h
include/leveldb/options.h
include/leveldb/write_batch.h

Please go over local/my_test.cc carefully, as it is a concerete use case.

Please also review the impelmentation files to see if the straw man implementation makes sense.

Note that, the diff does pass all make check and truly supports forward iterator over db and a version
of Get that's based on iterator.

Future work:
- Integration with compaction
- A raw Get implementation

I am working on a wiki that explains the design and implementation choices, but coding comes
just naturally and I think it might be a good idea to share the code earlier. The code is
heavily commented.

Test Plan: run all local tests

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb, zshao, sheki, emayanke, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D9651
2013-05-03 16:59:02 -07:00
Dhruba Borthakur 9b81d3c406 Simplified level_ptrs by using a std:vector
Summary: Simplified level_ptrs by using a std:vector

Test Plan: make check

Reviewers: sheki, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10245
2013-04-15 13:52:51 -07:00
Haobo Xu 013e9ebbf1 [RocksDB] [Performance] Speed up FindObsoleteFiles
Summary:
FindObsoleteFiles was slow, holding the single big lock, resulted in bad p99 behavior.
Didn't profile anything, but several things could be improved:
1. VersionSet::AddLiveFiles works with std::set, which is by itself slow (a tree).
   You also don't know how many dynamic allocations occur just for building up this tree.
   switched to std::vector, also added logic to pre-calculate total size and do just one allocation
2. Don't see why env_->GetChildren() needs to be mutex proteced, moved to PurgeObsoleteFiles where
   mutex could be unlocked.
3. switched std::set to std:unordered_set, the conversion from vector is also inside PurgeObsoleteFiles
I have a feeling this should pretty much fix it.

Test Plan: make check;  db_stress

Reviewers: dhruba, heyongqiang, MarkCallaghan

Reviewed By: dhruba

CC: leveldb, zshao

Differential Revision: https://reviews.facebook.net/D10197
2013-04-12 11:29:27 -07:00
Dhruba Borthakur ad96563b79 Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database.
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.

The default setting remains the same (and is backward compatible):
 1. use bufferedio
 2. do not use mmaps for reads
 3. use mmap for writes
 4. use readaheads for reads needed for compaction

I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.

Test Plan: make check

Reviewers: sheki, heyongqiang, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9429
2013-03-20 23:14:03 -07:00
Mayank Agarwal 487168cdcf Fixed sign-comparison in rocksdb code-base and fixed Makefile
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base

Test Plan: make

Reviewers: chip, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9531
2013-03-19 14:35:23 -07:00
Abhishek Kona 7b9db9c98e DO not report level size as zero when there are no files in L0
Summary:
Instead of checking for number of files in L0. Check for number of files in the requested level.

Bug introduced in D4929 (diff trying to do too many things).

Test Plan: db_test.

Reviewers: dhruba, MarkCallaghan

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9483
2013-03-18 12:04:38 -07:00
Dhruba Borthakur ebf16f57c9 Prevent segfault because SizeUnderCompaction was called without any locks.
Summary:
SizeBeingCompacted was called without any lock protection. This causes
crashes, especially when running db_bench with value_size=128K.
The fix is to compute SizeUnderCompaction while holding the mutex and
passing in these values into the call to Finalize.

(gdb) where
#4  leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827
#5  0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420
#6  0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016
#7  0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473
#8  0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757
#9  0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268
#10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170
#11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941
#12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874
#13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301
#14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Test Plan:
make check

I am running db_bench with a value size of 128K to see if the segfault is fixed.

Reviewers: MarkCallaghan, sheki, emayanke

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9279
2013-03-11 14:09:01 -07:00
Dhruba Borthakur 6d812b6afb A mechanism to detect manifest file write errors and put db in readonly mode.
Summary:
If there is an error while writing an edit to the manifest file, the manifest
file is closed and reopened to check if the edit made it in. However, if the
re-opening of the manifest is unsuccessful and options.paranoid_checks is set
t true, then the db refuses to accept new puts, effectively putting the db
in readonly mode.

In a future diff, I would like to make the default value of paranoid_check
to true.

Test Plan: make check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9201
2013-03-07 09:45:49 -08:00
Mark Callaghan 993543d1be Add rate_delay_limit_milliseconds
Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.

From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
                      because compaction for compressed levels is much
                      slower. When Lx is uncompressed and Lx+1 is compressed
                      then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                      compaction process is the first to be slowed by
                      compression.
* without compression -> at level 1

Task ID: #1832108

Blame Rev:

Test Plan:
run with real data, added test

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9045
2013-03-04 07:41:15 -08:00
Abhishek Kona c41f1e995c Codemod NULL to nullptr
Summary:
scripted NULL to nullptr in
* include/leveldb/
* db/
* table/
* util/

Test Plan: make all check

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9003
2013-02-28 18:04:58 -08:00
Dhruba Borthakur fd367e677e Fix unit test failure in db_filename.cc
Summary:
    c_test: db/filename.cc:74: std::string leveldb::DescriptorFileName(const string&,....

Test Plan:
  this is a failure in a unit test

Differential Revision: https://reviews.facebook.net/D8667
2013-02-18 21:53:56 -08:00
Chip Turner 2fdf91a4f8 Fix a number of object lifetime/ownership issues
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.

Test Plan: db_stress, make check

Reviewers: dhruba

Reviewed By: dhruba

CC: zshao, leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D8043
2013-01-23 16:54:11 -08:00
Abhishek Kona 7d5a4383bb rollover manifest file.
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?

Test Plan: make all check. Inputs on a new unit test?

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7701
2013-01-16 12:09:44 -08:00
Chip Turner a2dcd79c1e Add optional clang compile mode
Summary:
clang is an alternate compiler based on llvm.  It produces
nicer error messages and finds some bugs that gcc doesn't, such as the
size_t change in this file (which caused some write return values to be
misinterpreted!)

Clang isn't the default; to try it, do "USE_CLANG=1 make" or "export
USE_CLANG=1" then make as normal

Test Plan: "make check" and "USE_CLANG=1 make check"

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7899
2013-01-15 18:48:37 -08:00
Chip Turner 9bbcab57a9 Fix broken build
Summary: Mis-merged from HEAD, had a duplicate declaration.

Test Plan: make -j32 OPT=-g

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7911
2013-01-15 14:05:49 -08:00
Kosie van der Merwe 28fe86c48a Fixed bug with seek compactions on Level 0
Summary: Due to how the code handled compactions in Level 0 in `PickCompaction()` it could be the case that two compactions on level 0 ran that produced tables in level 1 that overlap. However, this case seems like it would only occur on a seek compaction which is unlikely on level 0. Furthermore, level 0 and level 1 had to have a certain arrangement of files.

Test Plan:
make check

Reviewers: dhruba, vamsi

Reviewed By: dhruba

CC: leveldb, sheki

Differential Revision: https://reviews.facebook.net/D7923
2013-01-15 12:43:09 -08:00
Chip Turner c0cb289d57 Various build cleanups/improvements
Summary:
Specific changes:

1) Turn on -Werror so all warnings are errors
2) Fix some warnings the above now complains about
3) Add proper dependency support so changing a .h file forces a .c file
to rebuild
4) Automatically use fbcode gcc on any internal machine rather than
whatever system compiler is laying around
5) Fix jemalloc to once again be used in the builds (seemed like it
wasn't being?)
6) Fix issue where 'git' would fail in build_detect_version because of
LD_LIBRARY_PATH being set in the third-party build system

Test Plan:
make, make check, make clean, touch a header file, make sure
rebuild is expected

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7887
2013-01-14 18:40:22 -08:00
Mark Callaghan 2ba125faf6 fix warning for unused variable
Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7857
2013-01-11 15:00:47 -08:00
Abhishek Kona 85ad13be1a Port fix for Leveldb manifest writing bug from Open-Source
Summary:
Pretty much a blind copy of the patch in open source.
Hope to get this in before we make a release

Test Plan: make clean check

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7809
2013-01-10 12:06:03 -08:00
Dhruba Borthakur d7d43ae21a ExtendOverlappingInputs too slow for large databases.
Summary:
There was a bug in the ExtendOverlappingInputs method so that
the terminating condition for the backward search was incorrect.

Test Plan: make clean check

Reviewers: sheki, emayanke, MarkCallaghan

Reviewed By: MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7725
2013-01-02 13:19:06 -08:00
Zheng Shao c28097538a manifest_dump: Add --hex=1 option
Summary: Without this option, manifest_dump does not print binary keys for files in a human-readable way.

Test Plan:
./manifest_dump --hex=1 --verbose=0 --file=/data/users/zshao/fdb_comparison/leveldb/fbobj.apprequest-0_0_original/MANIFEST-000002
manifest_file_number 589 next_file_number 590 last_sequence 2311567 log_number 543  prev_log_number 0
--- level 0 --- version# 0 ---
 532:1300357['0000455BABE20000' @ 2183973 : 1 .. 'FFFCA5D7ADE20000' @ 2184254 : 1]
 536:1308170['000198C75CE30000' @ 2203313 : 1 .. 'FFFCF94A79E30000' @ 2206463 : 1]
 542:1321644['0002931AA5E50000' @ 2267055 : 1 .. 'FFF77B31C5E50000' @ 2270754 : 1]
 544:1286390['000410A309E60000' @ 2278592 : 1 .. 'FFFE470A73E60000' @ 2289221 : 1]
 538:1298778['0006BCF4D8E30000' @ 2217050 : 1 .. 'FFFD77DAF7E30000' @ 2220489 : 1]
 540:1282353['00090D5356E40000' @ 2231156 : 1 .. 'FFFFF4625CE40000' @ 2231969 : 1]
--- level 1 --- version# 0 ---
 510:2112325['000007F9C2D40000' @ 1782099 : 1 .. '146F5B67B8D80000' @ 1905458 : 1]
 511:2121742['146F8A3023D60000' @ 1824388 : 1 .. '28BC8FBB9CD40000' @ 1777993 : 1]
 512:801631['28BCD396F1DE0000' @ 2080191 : 1 .. '3082DBE9ADDB0000' @ 1989927 : 1]

Reviewers: dhruba, sheki, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7425
2012-12-16 08:58:28 -08:00
Dhruba Borthakur c847a31727 Print compaction score for every compaction run.
Summary:
A compaction is picked based on its score. It is useful to
print the compaction score in the LOG because it aids in
debugging. If one looks at the logs, one can find out why
a compaction was preferred over another.

Test Plan: make clean check

Differential Revision: https://reviews.facebook.net/D7137
2012-12-04 10:03:47 -08:00
Abhishek Kona d29f181923 Fix all the lint errors.
Summary:
Scripted and removed all trailing spaces and converted all tabs to
spaces.

Also fixed other lint errors.
All lint errors from this point of time should be taken seriously.

Test Plan: make all check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7059
2012-11-28 17:18:41 -08:00
Dhruba Borthakur 2a39699900 Assertion failure while running with unit tests with OPT=-g
Summary:
When we expand the range of keys for a level 0 compaction, we
need to invoke ParentFilesInCompaction() only once for the
entire range of keys that is being compacted. We were invoking
it for each file that was being compacted, but this triggers
an assertion because each file's range were contiguous but
non-overlapping.

I renamed ParentFilesInCompaction to ParentRangeInCompaction
to adequately represent that it is the range-of-keys and
not individual files that we compact in a single compaction run.

Here is the assertion that is fixed by this patch.
db_test: db/version_set.cc:585: void leveldb::Version::ExtendOverlappingInputs(int, const leveldb::Slice&, const leveldb::Slice&, std::vector<leveldb::FileMetaData*, std::allocator<leveldb::FileMetaData*> >*, int): Assertion `user_cmp->Compare(flimit, user_begin) >= 0' failed.

Test Plan: make clean check OPT=-g

Reviewers: sheki

Reviewed By: sheki

CC: MarkCallaghan, emayanke, leveldb

Differential Revision: https://reviews.facebook.net/D6963
2012-11-26 14:00:39 -08:00
Dhruba Borthakur 7632fdb5cb Support taking a configurable number of files from the same level to compact in a single compaction run.
Summary:
The compaction process takes some files from LevelK and
merges it into LevelK+1. The number of files it picks from
LevelK was capped such a way that the total amount of
data picked does not exceed the maxfilesize of that level.
This essentially meant that only one file from LevelK
is picked for a single compaction.

For bulkloads, we would like to take many many file from
LevelK and compact them using a single compaction run.

This patch introduces a option called the 'source_compaction_factor'
(similar to expanded_compaction_factor). It is a multiplier
that is multiplied by the maxfilesize of that level to arrive
at the limit that is used to throttle the number of source
files from LevelK.  For bulk loads, set source_compaction_factor
to a very high number so that multiple files from the same
level are picked for compaction in a single compaction.

The default value of source_compaction_factor is 1, so that
we can keep backward compatibilty with existing compaction semantics.

Test Plan: make clean check

Reviewers: emayanke, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6867
2012-11-21 08:37:03 -08:00
Dhruba Borthakur 3754f2f4ff A major bug that was not considering the compaction score of the n-1 level.
Summary:
The method Finalize() recomputes the compaction score of each
level and then sorts these score from largest to smallest. The
idea is that the level with the largest compaction score will
be a better candidate for compaction.  There are usually very
few levels, and a bubble sort code was used to sort these
compaction scores. There existed a bug in the sorting code that
skipped looking at the score for the n-1 level. This meant that
even if the compaction score of the n-1 level is large, it will
not be picked for compaction.

This patch fixes the bug and also introduces "asserts" in the
code to detect any possible inconsistencies caused by future bugs.

This bug existed in the very first code change that introduced
multi-threaded compaction to the leveldb code. That version of
code was committed on Oct 19th via
1ca0584345

Test Plan: make clean check OPT=-g

Reviewers: emayanke, sheki, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6837
2012-11-20 15:44:21 -08:00
Dhruba Borthakur dde70898a1 Fix asserts
Summary:
make check OPT=-g fails with the following assert.
==== Test DBTest.ApproximateSizes
db_test: db/version_set.cc:765: void leveldb::VersionSet::Builder::CheckConsistencyForDeletes(leveldb::VersionEdit*, int, int): Assertion `found' failed.

The assertion was that file #7 that was being deleted did not
preexists, but actualy it did pre-exist as shown in the manifest
dump shows below. The bug was that we did not check for file
existance at the same level.

*************************Edit[0] = VersionEdit {
  Comparator: leveldb.BytewiseComparator
}

*************************Edit[1] = VersionEdit {
  LogNumber: 8
  PrevLogNumber: 0
  NextFile: 9
  LastSeq: 80
  AddFile: 0 7 8005319 'key000000' @ 1 : 1 .. 'key000079' @ 80 : 1
}

*************************Edit[2] = VersionEdit {
  LogNumber: 8
  PrevLogNumber: 0
  NextFile: 13
  LastSeq: 80
  CompactPointer: 0 'key000079' @ 80 : 1
  DeleteFile: 0 7
  AddFile: 1 9 2101425 'key000000' @ 1 : 1 .. 'key000020' @ 21 : 1
  AddFile: 1 10 2101425 'key000021' @ 22 : 1 .. 'key000041' @ 42 : 1
  AddFile: 1 11 2101425 'key000042' @ 43 : 1 .. 'key000062' @ 63 : 1
  AddFile: 1 12 1701165 'key000063' @ 64 : 1 .. 'key000079' @ 80 : 1
}

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-11-19 14:51:22 -08:00
Dhruba Borthakur 48dafb2c59 Fix compilation error introduced by previous commit
7889e09455

Summary:
Fix compilation error introduced by previous commit
7889e09455

Test Plan:
make clean check
2012-11-19 12:16:45 -08:00
Dhruba Borthakur 7889e09455 Enhance manifest_dump to print each individual edit.
Summary:
The manifest file contains a series of edits. If the verbose
option is switched on, then print each individual edit in the
manifest file. This helps in debugging.

Test Plan: make clean manifest_dump

Reviewers: emayanke, sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6807
2012-11-19 12:04:35 -08:00
Dhruba Borthakur 43d9a8225a Fix asserts so that "make check OPT=-g" works on performance branch
Summary:
Compilation used to fail with the error:
db/version_set.cc:1773: error: ‘number_of_files_to_sort_’ is not a member of ‘leveldb::VersionSet’

I created a new method called CheckConsistencyForDeletes() so that
all the high cost checking is done only when OPT=-g is specified.

I also fixed a bug in PickCompactionBySize that was triggered when
OPT=-g was switched on. The base_index in the compaction record
was not set correctly.

Test Plan: make check OPT=-g

Differential Revision: https://reviews.facebook.net/D6687
2012-11-13 10:40:52 -08:00
Dhruba Borthakur 95dda37858 Move filesize-based-sorting to outside the Mutex
Summary:
When a new version is created, we sort all the files at every
level based on their size. This is necessary because we want
to compact the largest file first. The sorting takes quite a
bit of CPU.

Moved the sorting code to be outside the mutex. Also, the
earlier code was sorting files at all levels but we do not
need to sort the highest-number level because those files
are never the cause of any compaction. To reduce sorting
costs, we sort only the first few files in each level
because it is likely that those are the only files in that
level that will be picked for compaction.

At steady state, I have seen that this patch increase
throughout from 1500 writes/sec to 1700 writes/sec at the
end of a 72 hour run. The cpu saving by not sorting the
last level was not distinctive in this test run because
there were only 100K files in the highest numbered level.
I expect the cpu saving to be significant when the number of
files is much higher.

This is mostly an early preview and not ready for rigorous review.

With this patch, the writs/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs.

Test Plan: make check

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D6411
2012-11-07 15:39:44 -08:00
Dhruba Borthakur 8143062edd Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/version_set.cc
	util/options.cc
2012-11-07 15:11:37 -08:00
Dhruba Borthakur 9b87a2bae8 Avoid doing a exhaustive search when looking for overlapping files.
Summary:
The Version::GetOverlappingInputs() is called multiple times in
the compaction code path. Eack invocation does a binary search
for overlapping files in the specified key range.
This patch remembers the offset of an overlapped file when
GetOverlappingInputs() is called the first time within
a compaction run. Suceeding calls to GetOverlappingInputs()
uses the remembered index to avoid the binary search.

I measured that 1000 iterations of GetOverlappingInputs
takes around 4500 microseconds without this patch. If I use
this patch with the hint on every invocation, then 1000
iterations take about 3900 microsecond.

Test Plan: make check OPT=-g

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6513
2012-11-07 11:47:17 -08:00
Dhruba Borthakur aa42c66814 Fix all warnings generated by -Wall option to the compiler.
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.

Test Plan: make all check

Reviewers: heyongqiang, emayanke, sheki

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6525
2012-11-06 14:07:31 -08:00
Dhruba Borthakur 5f91868cee Merge branch 'master' into performance
Conflicts:
	db/version_set.cc
	util/options.cc
2012-11-05 16:51:55 -08:00
Dhruba Borthakur cb7a00227f The method GetOverlappingInputs should use binary search.
Summary:
The method Version::GetOverlappingInputs used a sequential search
to map a kay-range to a set of files. But the files are arranged
in ascending order of key, so a biary search is more effective.

This patch implements Version::GetOverlappingInputsBinarySearch
that finds one file that corresponds to the specified key range
and then iterates backwards and forwards to find all overlapping
files.

This patch is critical for making compactions efficient, especially
when there are thousands of files in a single level.

I measured that 1000 iterations of TEST_MaxNextLevelOverlappingBytes
takes 16000 microseconds without this patch. With this patch, the
same method takes about 4600 microseconds.

Test Plan: Almost all unit tests in db_test uses this method to lookup keys.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6465
2012-11-05 16:08:01 -08:00
heyongqiang f1a7c735b5 fix complie error
Summary:

as subject

Test Plan:n/a
2012-11-05 10:30:19 -08:00
heyongqiang d55c2ba305 Add a tool to change number of levels
Summary: as subject.

Test Plan: manually test it, will add a testcase

Reviewers: dhruba, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6345
2012-11-05 10:17:39 -08:00
Dhruba Borthakur 53e04311b1 Merge branch 'master' into performance
Conflicts:
	db/db_bench.cc
	util/options.cc
2012-10-29 14:18:00 -07:00
Dhruba Borthakur 5b0fe6c73b Greedy algorithm for picking files to compact.
Summary:
It is best if we pick the largest file to compact in a level.
This reduces the write amplification factor for compactions.
Each level has an auxiliary data structure called files_by_size_
that sorts all files by their size. This data structure is
updated when a new version is created.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6195
2012-10-25 18:27:53 -07:00
Dhruba Borthakur 8fb5f40468 firstIndex fix for multi-threaded compaction code.
Summary:
Prior to multi-threaded compaction, wrap-around would be done by using
current_->files_[level[0]. With this change we should be
using the first file for which f->being_compacted is not true.

1ca0584345 (commitcomment-2041516)

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6165
2012-10-25 08:44:47 -07:00