Commit graph

1752 commits

Author SHA1 Message Date
Igor Canadi 52783c793c declare kInline size in arena.cc 2014-05-14 12:40:49 -07:00
Igor Canadi a490d2d3a0 Revert "Define kInlineSize in .cc file instead of header"
This reverts commit 35a8873aa6.
2014-05-14 12:36:21 -07:00
Igor Canadi 35a8873aa6 Define kInlineSize in .cc file instead of header 2014-05-14 12:29:46 -07:00
Igor Canadi eea73226e9 Improve EnvHdfs
Summary: Copy improvements from fbcode's version of EnvHdfs to our open-source version. Some very important bug fixes in there.

Test Plan: compiles

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18711
2014-05-14 12:14:18 -07:00
Igor Canadi f4574449e9 Clean up compaction logging
Summary: Cleaned up compaction logging a little bit. Now file sizes are easier to read. Also, removed the trailing space.

Test Plan:
verified that i'm happy with logging output:

        files_size[#33(seq=101,sz=98KB,0) #31(seq=81,sz=159KB,0) #26(seq=0,sz=637KB,0)]

Reviewers: sdong, haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18549
2014-05-14 12:13:50 -07:00
sdong 3e4a9ec241 Arena to inline 2KB of data in it.
Summary:
In order to use arena to a use case that the total allocation size might be small (LogBuffer is already such a case), inline 1KB of data in it, so that it can be mostly in stack or inline in another class.

If always inlining 2KB is a concern, I could make it a template to determine what to inline. However, dependents need to changes. Doesn't go with it for now

Test Plan: make all check.

Reviewers: haobo, igor, yhchiang, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18609
2014-05-14 11:49:01 -07:00
Igor Canadi 1ef31b6ca1 Merge pull request #143 from mlin/travis-ci
Travis CI
2014-05-14 10:44:22 -07:00
Yueh-Hsuan Chiang 9a0e3ab7ec Merge branch 'master' of github.com:facebook/rocksdb into show-reads 2014-05-13 16:45:54 -07:00
Yueh-Hsuan Chiang e883407ead [Java] Refined the output of Java DbBenchmark. 2014-05-13 16:44:53 -07:00
sdong 8c2c4602ee FixedPrefixTransform to include prefix length in its name
Summary: As title

Test Plan: make all check.

Reviewers: haobo, igor, yhchiang

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18705
2014-05-13 16:08:21 -07:00
Yueh-Hsuan Chiang e30dec938d [Java] Fixed a bug in Java DB Benchmark where random reads does not consider full key range.
Summary: Fixed a bug in Java DB Benchmark where random reads does not consider full key range.

Test Plan:
make rocksdbjava
make jdb_bench
cd java
jdb_bench.sh --db=/tmp/rocksdb-test --benchmarks=fillseq --use_existing_db=false --num=100000
jdb_bench.sh --db=/tmp/rocksdb-test --benchmarks=readrandom --use_existing_db=true --num=100000 --reads=1000000

Reviewers: haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18693
2014-05-13 13:19:50 -07:00
Yueh-Hsuan Chiang 93f2643656 [Java] Make read benchmarks print out found / not-found information.
Summary: Make read benchmarks print out found / not-found information.

Test Plan:
make rocksdbjava
make jdb_bench
cd java
./jdb_bench.sh

Reviewers: haobo, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18699
2014-05-13 13:11:26 -07:00
Igor Canadi 26f5dd9a5a TablePropertiesCollectorFactory
Summary:
This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options.

Here's description of task #4296714:
       I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB.

       For example, this table has 3155 entries and 1391828 deleted keys.

The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for.

Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API.

In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe.

Test Plan:
Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one).
Also, all other tests

Reviewers: sdong, dhruba, haobo, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18579
2014-05-13 12:30:55 -07:00
Yueh-Hsuan Chiang 2082a7d745 [Java] Temporary set the number of BG threads based on the number of BG compactions.
Summary:
Before the Java binding for Env is ready, Java developers have no way to
control the number of background threads.  This diff provides a temporary
solution where RocksDB.setMaxBackgroundCompactions() will affect the
number of background threads.

Note that once Env is ready.  Changes made in this diff should be reverted.

Test Plan:
make rocksdbjava
make jtest
make jdb_bench
java/jdb_bench.sh

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18681
2014-05-13 12:28:47 -07:00
Yueh-Hsuan Chiang 1c7799d8aa Fixed a file-not-found issue when a log file is moved to archive.
Summary:
Fixed a file-not-found issue when a log file is moved to archive
by doing a missing retry.

Test Plan:
make db_test
export ROCKSDB_TEST=TransactionLogIteratorRace
./db_test

Reviewers: sdong, haobo

Reviewed By: sdong

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D18669
2014-05-12 17:50:21 -07:00
Yueh-Hsuan Chiang d14581f936 [Java] Rename org.rocksdb.Iterator to org.rocksdb.RocksIterator.
Summary:
Renamed Iterator to RocksIterator to avoid potential confliction with
the Java built-in Iterator.

Test Plan:
make rocksdbjava
make jtest

Reviewers: haobo, dhruba, sdong, ankgup87

Reviewed By: sdong

CC: leveldb, rsumbaly, swapnilghike, zzbennett

Differential Revision: https://reviews.facebook.net/D18615
2014-05-12 11:02:25 -07:00
Igor Canadi 3f8da15bf9 Merge pull request #142 from mlin/build-in-paths-containing-spaces
Fix building RocksDB in paths containing spaces
2014-05-12 10:48:23 -07:00
Igor Canadi d08073aafc Merge pull request #141 from dallasmarlow/master
only use MAP_HUGETLB when the kernel supports it
2014-05-11 15:44:32 -07:00
Dallas Marlow 557fbc9b3b arena spacing 2014-05-11 10:22:28 -04:00
Mike Lin 27c05ea2ee Add a minimal .travis.yml for Travis CI. Some ugly hacks needed to get RocksDB building in Travis' OpenVZ Ubuntu 12.04 environment. 2014-05-10 23:33:50 -07:00
Mike Lin 76596b5318 Fix building RocksDB in paths containing spaces -- quote path names in Makefile and build_detect_platform. 2014-05-10 21:01:25 -07:00
Igor Canadi 3edc056f6d comment 2014-05-10 11:25:56 -07:00
Igor Canadi 038a477b53 Make it easier to start using RocksDB
Summary:
This diff is addressing multiple things with a single goal -- to make RocksDB easier to use:
* Add some functions to Options that make RocksDB easier to tune.
* Add example code for both simple RocksDB and RocksDB with Column Families.
* Rewrite our README.md

Regarding Options, I took a stab at something we talked about for a long time:
* https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/

I added functions:
* IncreaseParallelism() -- easy, increases the thread pool and max_background_compactions
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to optimize rocksdb for less stalls with level style compaction. This is very likely not ideal configuration. Feel free to suggest improvements. I used some of Mark's suggestions from here: https://github.com/facebook/rocksdb/issues/54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction.

Test Plan: compiled rocksdb. ran examples.

Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18621
2014-05-10 10:49:33 -07:00
Dallas Marlow 030db3d17e testing 2014-05-09 18:58:39 -04:00
sdong acd17fd002 Remove unused variable in DBIter
Summary: as title

Test Plan: Still compile

Reviewers: haobo, igor, yhchiang

Reviewed By: igor

CC: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18603
2014-05-09 13:20:35 -07:00
Tyler Neely deb89401fd have proprocessor choose correct mmap args 2014-05-09 14:49:25 -04:00
Igor Canadi fec4269966 Fix more gflag namespace issues 2014-05-09 08:41:02 -07:00
Igor Canadi a1068c91a1 Make RocksDB work with newer gflags
Summary:
Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102

Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespace to figure out which one is the correct one.

Test Plan: works in fbcode environemnt. I'll also try in ubutnu with newer gflags

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18537
2014-05-08 17:25:13 -07:00
sdong ddd41146c4 MergingIterator uses autovector instead of vector
Summary:
Use autovector in MergingIterator so that if there are 4 or less child iterators in it, iterator wrappers are inline, which is more likely to be cache friendly.

Based on one test run with a shadow traffic of one product, it reduces CPU of MergingIterator::Seek() by half.

Test Plan: make all check

Reviewers: haobo, yhchiang, igor, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18531
2014-05-08 15:01:20 -07:00
Igor Canadi af7453a673 autovector::resize
Summary: Resize the autovector!

Test Plan: test

Reviewers: sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18543
2014-05-08 13:50:49 -07:00
Igor Canadi 8e37a29bfb Compaction with zero outputs
Summary: We had a hypothesis in https://reviews.facebook.net/D18507 that empty-string internal keys might have been caused by compaction filter deleting all the entries. I added a unit test for that case. Unforutnately, everything works as expected.

Test Plan: this is a test

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18519
2014-05-08 13:48:39 -07:00
sdong 1c6a027d23 HashLinkedList::Iterator: remove an ununsed class variable
Summary: This variable is not used. Remove it.

Test Plan: build.

Reviewers: haobo, igor, yhchiang

Reviewed By: haobo

CC: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18525
2014-05-08 13:28:52 -07:00
dallas marlow f41cde3105 remove anon mmap allocation flag MAP_HUGETLB 2014-05-08 11:45:44 -04:00
Igor Canadi b5616dafd1 Fix iOS compile 2014-05-07 17:48:31 -07:00
Igor Canadi 768d424dd9 [fix] SIGSEGV when VersionEdit in MANIFEST is corrupted
Summary:
This was reported by our customers in task #4295529.

Cause:
* MANIFEST file contains a VersionEdit, which contains file entries whose 'smallest' and 'largest' internal keys are empty. String with zero characters. Root cause of corruption was not investigated. We should report corruption when this happens. However, we currently SIGSEGV.

Here's what happens:
* VersionEdit encodes zero-strings happily and stores them in smallest and largest InternalKeys. InternalKey::Encode() does assert when `rep_.empty()`, but we don't assert in production environemnts. Also, we should never assert as a result of DB corruption.
* As part of our ConsistencyCheck, we call GetLiveFilesMetaData()
* GetLiveFilesMetadata() calls `file->largest.user_key().ToString()`
* user_key() function does: 1. assert(size > 8) (ooops, no assert), 2. returns `Slice(internal_key.data(), internal_key.size() - 8)`
* since `internal_key.size()` is unsigned int, this call translates to `Slice(whatever, 1298471928561892576182756)`. Bazinga.

Fix:
* VersionEdit checks if InternalKey is valid in `VersionEdit::GetInternalKey()`. If it's invalid, returns corruption.

Lessons learned:
* Always keep in mind that even if you `assert()`, production code will continue execution even if assert fails.
* Never `assert` based on DB corruption. Assert only if the code should guarantee that assert can't fail.

Test Plan: dumped offending manifest. Before: assert. Now: corruption

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18507
2014-05-07 16:52:12 -07:00
Igor Canadi 313b2e5da1 Better INSTALL.md and Makefile rules
Summary: We have a lot of problems with gflags. However, when compiling rocksdb static library, we don't need gflags dependency. Reorganize INSTALL.md such that first-time customers don't need any dependency installed to actually build rocksdb static library.

Test Plan: none

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18501
2014-05-07 16:51:30 -07:00
sdong 9efbd85ac9 fsync directory after creating current file in NewDB()
Summary: One of our users reported current file corruption. The machine was rebooted during the time. This is the only think I can think of which could cause current file corruption. Just add this paranoid check.

Test Plan: make all check

Reviewers: haobo, igor

Reviewed By: haobo

CC: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18495
2014-05-06 17:51:33 -07:00
sdong 3a171dcb51 Pass logger to memtable rep and TLB page allocation error logged to info logs
Summary:
TLB page allocation errors are now logged to info logs, instead of stderr.
In order to do that, mem table rep's factory functions take a info logger now.

Test Plan: make all check

Reviewers: haobo, igor, yhchiang

Reviewed By: yhchiang

CC: leveldb, yhchiang, dhruba

Differential Revision: https://reviews.facebook.net/D18471
2014-05-05 16:43:37 -07:00
Igor Canadi 044af85847 Update HISTORY.md -- release RocksDB 3.0 2014-05-05 14:44:57 -07:00
Igor Canadi 7984b9bbb0 BackupableDBTest thread-safe
Summary: We need to lock accesses to some TestEnv variables. Otherwise we get failures like http://ci-builds.fb.com/job/rocksdb_asan_check/657/console

Test Plan: make check

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18489
2014-05-05 14:30:24 -07:00
Igor Canadi 15c3991933 Add comment about ValueType 2014-05-05 12:57:47 -07:00
Igor Canadi 9e7d00d9e7 Make rocksdb work with all versions of lz4
Summary:
There are some projects in fbcode that define lz4 dependency on r108. We, however, defined dependency on r117. That produced some interesting issues and our build system was not happy.

This diff makes rocksdb work with both r108 and r117. Hopefully this will fix our problems.

Test Plan: compiled rocksdb with both r108 and r117 lz4

Reviewers: dhruba, sdong, haobo

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18465
2014-05-05 11:35:40 -07:00
Igor Canadi d2569fea47 log_and_apply_bench on a new benchmark framework
Summary:
db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works.

I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions.

Test Plan: no

Reviewers: dhruba, haobo, sdong, ljin, yhchiang

Reviewed By: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18261
2014-05-05 11:11:48 -07:00
sdong 9b17558311 PlainTableFactory::PlainTableFactory() to have huge TLB turned off by default
Summary: PlainTableFactory::PlainTableFactory() now has Huge TLB page feature turned on by default. Although it is not a public API (which we always turn the feature off now), our unit tests, like db_test sometimes uses it directly, which causes wrong coverage of codes. This patch fix it to allow unit tests to run with the correct setting

Test Plan: Run db_test and make sure this feature is not on any more.

Reviewers: igor, haobo

Reviewed By: igor

CC: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18483
2014-05-05 11:05:54 -07:00
Igor Canadi 6785a52b1b Temporary remove perror() calls before we can log from inside of arena 2014-05-05 07:13:48 -07:00
sdong 4a7c747064 Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB""
And make the default 0 for hash linked list memtable

This reverts commit d69dc64be7.
2014-05-04 13:56:29 -07:00
Yueh-Hsuan Chiang d56959a5fc [Java] Use environmental variable JAVA_HOME in Makefile for RocksJava. 2014-05-04 13:23:21 -07:00
Igor Canadi db1854d78c Declare all DB methods virtual so that StackableDB can override them 2014-05-04 11:39:49 -07:00
Igor Canadi d69dc64be7 Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"
This reverts commit 7dafa3a1d7.
2014-05-04 08:37:09 -07:00
Benjamin Renard 41e5cf2392 Add share_files_with_cheksum option to BackupEngine
Summary: added a new option to BackupEngine: if share_files_with_checksum is set to true, sst files are stored in shared_checksum/ and are identified by the triple (file name, checksum, file size) instead of just the file name. This option is targeted at distributed databases that want to backup their primary replica.

Test Plan: unit tests and tested backup and restore on a distributed rocksdb

Reviewers: igor

Reviewed By: igor

Differential Revision: https://reviews.facebook.net/D18393
2014-05-02 17:08:55 -07:00