Commit Graph

135 Commits

Author SHA1 Message Date
Igor Canadi b947fdc89d Column family support for DB::OpenForReadOnly()
Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though)

Test Plan: added a unit test in column_family_test

Reviewers: haobo, sdong, ljin, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17565
2014-04-09 09:56:17 -07:00
Igor Canadi 76642b81f4 Increase done even if progress_reports is false 2014-03-20 16:52:59 -07:00
Igor Canadi 159928dfa5 Added flag progress_reports in db_stress 2014-03-19 09:58:41 -07:00
Igor Canadi 64904b39a0 Merge branch 'master' into columnfamilies
Conflicts:
	utilities/backupable/backupable_db.cc
2014-03-17 17:57:14 -07:00
Igor Canadi 5601bc4619 Check starts_with(prefix) in MultiPrefixIterate
Summary: We switched to prefix_seek method of seeking. This means that anytime we check Valid(), we also need to check starts_with(prefix)

Test Plan: ran db_stress

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16953
2014-03-17 17:02:34 -07:00
Igor Canadi e0c1211555 Merge branch 'master' into columnfamilies
Conflicts:
	db/version_set.cc
	tools/db_stress.cc
2014-03-17 12:21:05 -07:00
Igor Canadi 9b8a2b52d4 No prefix iterator in db_stress
Summary: We're trying to deprecate prefix iterators, so no need to test them in db_stress

Test Plan: ran it

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16917
2014-03-17 10:02:46 -07:00
Igor Canadi 928ee23567 Change WriteBatch interface 2014-03-14 13:40:06 -07:00
Igor Canadi e1f56e12cf Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	tools/db_stress.cc
2014-03-13 13:21:20 -07:00
Igor Canadi 04a1035efe Revert "DB stress with normal skip list"
This reverts commit 86926d8c6a.
2014-03-12 22:21:13 -07:00
Lei Jin 02a2cb139b fix VerifyDb in StressTest
Summary:
this should fix the hash_skip_list issue, but I still see seqno
assertion failure in the last run. Will continue investigating and
address that in a different diff

Test Plan: make whitebox_crash_test

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16851
2014-03-12 22:20:22 -07:00
Igor Canadi 86926d8c6a DB stress with normal skip list
Summary:
Hash skip list has issues, causing db_stress to fail badly.

For now, switching back to skip_list by default before we figure out root cause.

Test Plan: db_stress is happy(ier)

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16845
2014-03-12 17:35:27 -07:00
Igor Canadi dff9214165 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	tools/db_stress.cc
2014-03-12 10:17:41 -07:00
Igor Canadi 5ba028c179 DBStress cleanup
Summary:
*) fixed the comment
*) constant 1 was not casted to 64-bit, which (I think) might cause overflow if we shift it too much
*) default prefix size to be 7, like it was before

Test Plan: compiled

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16827
2014-03-12 09:31:06 -07:00
Lei Jin 86ba3e24e3 make assert based on FLAGS_prefix_size
Summary: as title

Test Plan: running python tools/db_crashtest.py

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16803
2014-03-11 16:33:42 -07:00
Lei Jin 02dab3be11 fix db_stress test
Summary: Fix the db_stress test, let is run with HashSkipList for real

Test Plan:
python tools/db_crashtest.py
python tools/db_crashtest2.py

Reviewers: igor, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16773
2014-03-11 13:44:33 -07:00
Igor Canadi 56ca8338e5 initialize static const outside of class 2014-03-11 13:08:48 -07:00
Igor Canadi 457c78eb89 [CF] db_stress for column families
Summary:
I had this diff for a while to test column families implementation. Last night, I ran it sucessfully for 10 hours with the command:

   time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1  --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress

It is ready to be committed :)

Test Plan: Ran it for 10 hours

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16797
2014-03-11 12:06:12 -07:00
Lei Jin 8d007b4aaf Consolidate SliceTransform object ownership
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.

Test Plan: db_bench && make asan_check

Reviewers: haobo, igor, sdong

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D16587
2014-03-10 12:56:46 -07:00
Albert Strasheim df2f92214a Support for LZ4 compression. 2014-02-08 14:15:51 -08:00
Siying Dong 8477255da3 Moving Some includes from options.h to forward declaration
Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.

Test Plan: make all check

Reviewers: kailiu, haobo, igor, dhruba

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15411
2014-01-24 17:16:22 -08:00
Igor Canadi e832e72b31 Revert "Moving to glibc-fb"
This reverts commit d24961b65e.

For some reason, glibc2.17-fb breaks gflags. Reverting for now
2014-01-24 11:50:38 -08:00
Igor Canadi d24961b65e Moving to glibc-fb
Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.

This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.

Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15405
2014-01-24 10:24:08 -08:00
Igor Canadi 83681bf9ef Statistics code cleanup
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.

Test Plan: make check

Reviewers: haobo, kailiu, sdong, dhruba, tnovak

Reviewed By: tnovak

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15099
2014-01-17 12:46:06 -08:00
Igor Canadi eb12e47e0e Killing Transform Rep
Summary:
Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.

This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release.

I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions.

Test Plan: make check

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14397
2013-12-03 12:42:15 -08:00
Igor Canadi 469a9f32a7 Fix two nasty use-after-free-bugs
Summary:
These bugs were caught by ASAN crash test.
1. The first one, in table/filter_block.cc is very nasty. We first reference entries_ and store the reference to Slice prev. Then, we call entries_.append(), which can change the reference. The Slice prev now points to junk.
2. The second one is a bug in a test, so it's not very serious. Once we set read_opts.prefix, we never clear it, so some other function might still reference it.

Test Plan: asan crash test now runs more than 5 mins. Before, it failed immediately. I will run the full one, but the full one takes quite some time (5 hours)

Reviewers: dhruba, haobo, kailiu

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14223
2013-11-19 21:01:48 -08:00
kailiu 97d8e573a6 make util/env_posix.cc work under mac
Summary: This diff invoves some more complicated issues in the posix environment.

Test Plan: works under mac os. will need to verify dev box.

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14061
2013-11-16 23:44:39 -08:00
Kai Liu 88ba331c1a Add the index/filter block cache
Summary: This diff leverage the existing block cache and extend it to cache index/filter block.

Test Plan:
Added new tests in db_test and table_test

The correctness is checked by:

1. make check
2. make valgrind_check

Performance is test by:

1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggests no significant difference between them. For the two key operatons `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance).
2. db_stress.

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb, haobo, xjin

Differential Revision: https://reviews.facebook.net/D13167
2013-11-12 22:46:51 -08:00
Dhruba Borthakur b4ad5e89ae Implement a compressed block cache.
Summary:
Rocksdb can now support a uncompressed block cache, or a compressed
block cache or both. Lookups first look for a block in the
uncompressed cache, if it is not found only then it is looked up
in the compressed cache. If it is found in the compressed cache,
then it is uncompressed and inserted into the uncompressed cache.

It is possible that the same block resides in the compressed cache
as well as the uncompressed cache at the same time. Both caches
have their own individual LRU policy.

Test Plan: Unit test case attached.

Reviewers: kailiu, sdong, haobo, leveldb

Reviewed By: haobo

CC: xjin, haobo

Differential Revision: https://reviews.facebook.net/D12675
2013-11-01 14:31:35 -07:00
Slobodan Predolac e44976b199 Conversion of db_bench, db_stress and db_repl_stress to use gflags
Summary: Converted db_stress, db_repl_stress and db_bench to use gflags

Test Plan: I tested by printing out all the flags from old and new versions. Tried defaults, + various combinations with "interesting flags". Also, tested by running db_crashtest.py and db_crashtest2.py.

Reviewers: emayanke, dhruba, haobo, kailiu, sdong

Reviewed By: emayanke

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D13581
2013-10-24 07:43:14 -07:00
Dhruba Borthakur 9cd221094c Add appropriate LICENSE and Copyright message.
Summary:
Add appropriate LICENSE and Copyright message.

Test Plan:
make check

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-10-16 17:48:41 -07:00
Dhruba Borthakur 4463b11cad Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix.
Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix.

Test Plan: make check

Reviewers: emayanke, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13311
2013-10-06 00:14:26 -07:00
Dhruba Borthakur a143ef9b38 Change namespace from leveldb to rocksdb
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.

Test Plan: compile rocksdb

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13287
2013-10-04 11:59:26 -07:00
Mayank Agarwal 6b34021fc2 Triggering verify for gets also
Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half.

Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13227
2013-10-02 11:22:17 -07:00
Natalie Hildebrandt 7edb92b843 Phase 2 of iterator stress test
Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.

Test Plan:
./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000

To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:

    // Should abort with "Unexpected value found"
    shared.Delete(start);

    // Should abort with "Value not found"
    WriteOptions write_opts;
    db_->Delete(write_opts, Key(start));

    // Should succeed
    WriteOptions write_opts;
    shared.Delete(start);
     db_->Delete(write_opts, Key(start));

    // Should abort with "Value not found"
    WriteOptions write_opts;
    db_->Delete(write_opts, Key(start + (end-start)/2));

    // Should abort with "Value not found"
    db_->Delete(write_opts, Key(end-1));

    // Should abort with "Unexpected value"
    shared.Delete(end-1);

    // Should abort with "Unexpected value"
    shared.Delete(start + (end-start)/2);

    // Should abort with "Value not found"
    db_->Delete(write_opts, Key(start));
    shared.Delete(start);
    db_->Delete(write_opts, Key(end-1));
    db_->Delete(write_opts, Key(end-2));

To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.

Reviewers: emayanke

Reviewed By: emayanke

CC: dhruba, xjin

Differential Revision: https://reviews.facebook.net/D13071
2013-09-30 16:48:00 -07:00
Natalie Hildebrandt 433541823c Phase 1 of an iterator stress test
Summary:
Added MultiIterate() which does a seek and some Next/Prev
calls.  Iterator status is checked only, no data integrity check

Test Plan:
make db_stress
./db_stress --iterpercent=<nonzero value> --readpercent=, etc.

Reviewers: emayanke, dhruba, xjin

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12915
2013-09-19 16:47:24 -07:00
Dhruba Borthakur 4012ca1c7b Added a parameter to limit the maximum space amplification for universal compaction.
Summary:
Added a new field called max_size_amplification_ratio in the
CompactionOptionsUniversal structure. This determines the maximum
percentage overhead of space amplification.

The size amplification is defined to be the ratio between the size of
the oldest file to the sum of the sizes of all other files. If the
size amplification exceeds the specified value, then min_merge_width
and max_merge_width are ignored and a full compaction of all files is done.
A value of 10 means that the size a database that stores 100 bytes
of user data could occupy 110 bytes of physical storage.

Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.

Reviewers: haobo, emayanke, xjin

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12825
2013-09-13 16:27:18 -07:00
Dhruba Borthakur 1186192ed1 Replace include/leveldb with include/rocksdb.
Summary: Replace include/leveldb with include/rocksdb.

Test Plan:
make clean; make check
make clean; make release

Differential Revision: https://reviews.facebook.net/D12489
2013-08-23 10:51:00 -07:00
Jim Paton 74781a0c49 Add three new MemTableRep's
Summary:
This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.

UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.

VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.

PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.

I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).

Test Plan:
make -j32 check
./db_stress --memtablerep=vector
./db_stress --memtablerep=unsorted
./db_stress --memtablerep=prefixhash --prefix_size=10

Reviewers: dhruba, haobo, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12117
2013-08-22 23:10:02 -07:00
Deon Nicholas b87dcae1a3 Made merge_oprator a shared_ptr; and added TTL unit tests
Test Plan:
- make all check;
- make release;
- make stringappend_test; ./stringappend_test

Reviewers: haobo, emayanke

Reviewed By: haobo

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D12381
2013-08-20 13:35:28 -07:00
Deon Nicholas ad48c3c262 Benchmarking for Merge Operator
Summary:
Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
so that the tester can easily benchmark different merge operators. Currently supports
the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.

Test Plan:
	1. make db_bench
	2. Test the PutOperator (simulating Put) as follows:
./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put
--threads=2

3. Test the UInt64AddOperator (simulating numeric addition) similarly:
./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom
--merge_operator=uint64add --threads=2

Reviewers: haobo, dhruba, zshao, MarkCallaghan

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11535
2013-08-15 17:13:07 -07:00
Xing Jin 0a5afd1afc Minor fix to current codes
Summary:
Minor fix to current codes, including: coding style, output format,
comments. No major logic change. There are only 2 real changes, please see my inline comments.

Test Plan: make all check

Reviewers: haobo, dhruba, emayanke

Differential Revision: https://reviews.facebook.net/D12297
2013-08-14 23:03:57 -07:00
Tyler Harter 7612d496ff Add prefix scans to db_stress (and bug fix in prefix scan)
Summary: Added support for prefix scans.

Test Plan: ./db_stress --max_key=4096 --ops_per_thread=10000

Reviewers: dhruba, vamsi

Reviewed By: vamsi

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12267
2013-08-14 16:58:36 -07:00
Dhruba Borthakur f5fa26b6a9 Merge branch 'performance' of github.com:facebook/rocksdb into performance
Conflicts:
	db/builder.cc
	db/db_impl.cc
	db/version_set.cc
	include/leveldb/statistics.h
2013-08-07 11:58:06 -07:00
Mayank Agarwal 1d7b4765c3 Expose base db object from ttl wrapper
Summary: rocksdb replicaiton will need this when writing value+TS from master to slave 'as is'

Test Plan: make

Reviewers: dhruba, vamsi, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11919
2013-08-05 18:44:14 -07:00
Dhruba Borthakur 711a30cb30 Merge branch 'master' into performance
Conflicts:
	include/leveldb/options.h
	include/leveldb/statistics.h
	util/options.cc
2013-08-02 10:22:08 -07:00
Mayank Agarwal 59d0b02f8b Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache
Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test

Test Plan: make all check;db_stress for 1 hour

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11853
2013-08-01 09:07:46 -07:00
Mayank Agarwal bf66c10b13 Use KeyMayExist for WriteBatch-Deletes
Summary:
Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
Added code to skip getting Table from disk if not already present in table_cache.
Some renaming of variables.
Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
Changed KeyMayExist to not be pure virtual and provided a default implementation.
Expanded unit-tests in db_test to check appropriately.
Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.

Test Plan: db_stress;make check

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D11745
2013-07-23 13:36:50 -07:00
Dhruba Borthakur 4a745a5666 Merge branch 'master' into performance
Conflicts:
	db/version_set.cc
	include/leveldb/options.h
	util/options.cc
2013-07-17 15:05:57 -07:00
Mayank Agarwal 2a986919d6 Make rocksdb-deletes faster using bloom filter
Summary:
Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
1. Put of delete type
2. Space in the db,and
3. Compaction time

Test Plan:
make all check;
will run db_stress and db_bench and enhance unit-test once the basic design gets approved

Reviewers: dhruba, haobo, vamsi

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11607
2013-07-11 12:11:11 -07:00
Mayank Agarwal 821889e207 Print complete statistics in db_stress
Summary: db_stress should alos print complete statistics like db_bench. Needed this when I wanted to measure number of delete-IOs dropped due to CheckKeyMayExist to be introduced to rocksdb codebase later- to make deltes in rocksdb faster

Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1

Reviewers: sheki, dhruba, vamsi, haobo

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D11655
2013-07-10 18:07:13 -07:00
Dhruba Borthakur 116ec527f2 Renamed 'hybrid_compaction' tp be "Universal Compaction'.
Summary:
All the universal compaction parameters are encapsulated in
a new file universal_compaction.h

Test Plan:
make check
2013-07-03 15:47:53 -07:00
Dhruba Borthakur 47c4191fe8 Reduce write amplification by merging files in L0 back into L0
Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions.  Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().

All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted.  Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.

The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.

The above logic should reduce write amplification to a
large extent... will publish numbers shortly.

Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).

Differential Revision: https://reviews.facebook.net/D11289
2013-06-30 20:07:04 -07:00
Haobo Xu 96be2c4ee0 [RocksDB] Add mmap_read option for db_stress
Summary: as title, also removed an incorrect assertion

Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0

Reviewers: dhruba, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11367
2013-06-19 10:28:32 -07:00
Dhruba Borthakur 836534debd Enhance dbstress to allow specifying compaction trigger for L0.
Summary:
Rocksdb allos specifying the number of files in L0 that triggers
compactions. Expose this api as a command line parameter for
running db_stress.

Test Plan: Run test

Reviewers: sheki, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11343
2013-06-17 14:15:09 -07:00
Haobo Xu bdf1085944 [RocksDB] cleanup EnvOptions
Summary:
This diff simplifies EnvOptions by treating it as POD, similar to Options.
- virtual functions are removed and member fields are accessed directly.
- StorageOptions is removed.
- Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
- Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite

Test Plan: make check; db_stress

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11175
2013-06-12 11:17:19 -07:00
Haobo Xu c2e2460f8a [RocksDB] Expose DBStatistics
Summary: Make Statistics usable by client

Test Plan: make check; db_bench

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D10899
2013-05-23 11:49:38 -07:00
Vamsi Ponnekanti 760dd4750f [Kill randomly at various points in source code for testing]
Summary:
This is initial version. A few ways in which this could
be extended in the future are:
(a) Killing from more places in source code
(b) Hashing stack and using that hash in determining whether to crash.
    This is to avoid crashing more often at source lines that are executed
    more often.
(c) Raising exceptions or returning errors instead of killing

Test Plan:
This whole thing is for testing.

Here is part of output:

python2.7 tools/db_crashtest2.py -d 600
Running db_stress

db_stress retncode -15 output LevelDB version     : 1.5
Number of threads   : 32
Ops per thread      : 10000000
Read percentage     : 50
Write-buffer-size   : 4194304
Delete percentage   : 30
Max key             : 1000
Ratio #ops/#keys    : 320000
Num times DB reopens: 0
Batches/snapshots   : 1
Purge redundant %   : 50
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
No lock creation because test_batches_snapshots set
2013/04/26-17:55:17  Starting database operations
Created bg thread 0x7fc1f07ff700
... finished 60000 ops
Running db_stress

db_stress retncode -15 output LevelDB version     : 1.5
Number of threads   : 32
Ops per thread      : 10000000
Read percentage     : 50
Write-buffer-size   : 4194304
Delete percentage   : 30
Max key             : 1000
Ratio #ops/#keys    : 320000
Num times DB reopens: 0
Batches/snapshots   : 1
Purge redundant %   : 50
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Created bg thread 0x7ff0137ff700
No lock creation because test_batches_snapshots set
2013/04/26-17:56:15  Starting database operations
... finished 90000 ops

Revert Plan: OK

Task ID: #2252691

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb, haobo

Differential Revision: https://reviews.facebook.net/D10581
2013-05-21 18:21:49 -07:00
Mayank Agarwal 3827403c51 Check to db_stress to not allow disable_wal and reopens set together
Summary: db can't reopen safely with disable_wal set!

Test Plan: make db_stress; run db_stress with disable_wal and reopens set and see error

Reviewers: dhruba, vamsi

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10857
2013-05-21 11:49:29 -07:00
Mayank Agarwal 15ccd10c7f A nit to db_stress to terminate generated value at proper length
Summary: Will help while debugging if the generated value is truncated at proper length.

Test Plan: make db_stress;/db_stress --max_key=10000 --db=/tmp/mcr --threads=1 --ops_per_thread=10000

Reviewers: dhruba, vamsi

Reviewed By: vamsi

Differential Revision: https://reviews.facebook.net/D10845
2013-05-20 18:13:32 -07:00
Mayank Agarwal d786b25e2d Timestamp and TTL Wrapper for rocksdb
Summary:
When opened with DBTimestamp::Open call, timestamps are prepended to and stripped from the value during subsequent Put and Get calls respectively. The Timestamp is used to discard values in Get and custom compaction filter which have exceeded their TTL which is specified during Open.
Have made a temporary change to Makefile to let us test with the temporary file TestTime.cc. Have also changed the private members of db_impl.h to protected to let them be inherited by the new class DBTimestamp

Test Plan: make db_timestamp; TestTime.cc(will not check it in) shows how to use the apis currently, but I will write unit-tests shortly

Reviewers: dhruba, vamsi, haobo, sheki, heyongqiang, vkrest

Reviewed By: vamsi

CC: zshao, xjin, vkrest, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D10311
2013-05-02 16:34:42 -07:00
Mayank Agarwal 9b3134f5ca Make provision for db_stress to work with a pre-existing dir
Summary: The crash_test depends on db_stress to work with pre-existing dir

Test Plan: make db_stress; Run db_stress with 'destroy_db_initially=0'

Reviewers: vamsi, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10041
2013-04-08 13:12:27 -07:00
Mayank Agarwal 26f68d3939 db_stress #reopens should be less than ops_per_thread
Summary: For sanity w.r.t. the way we split up the reopens equally among the ops/thread

Test Plan: make db_stress; db_stress --ops_per_thread=10 --reopens=10 => error

Reviewers: vamsi, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10023
2013-04-08 12:09:17 -07:00
Mayank Agarwal e937d47180 Python script to periodically run and kill the db_stress test
Summary: The script runs and kills the stress test periodically. Default values have been used in the script now. Should I make this a part of the Makefile or automated rocksdb build? The values can be easily changed in the script right now, but should I add some support for variable values or input to the script? I believe the script achieves its objective of unsafe crashes and reopening to expect sanity in the database.

Test Plan: python tools/db_crashtest.py

Reviewers: dhruba, vamsi, MarkCallaghan

Reviewed By: vamsi

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9369
2013-04-01 12:18:46 -07:00
Abhishek Kona 63f216ee0a memory manage statistics
Summary:
Earlier Statistics object was a raw pointer. This meant the user had to clear up
the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted.
Now Using a shared_ptr to manage this.

Want this in before the next release.

Test Plan: make all check.

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9735
2013-03-27 11:27:39 -07:00
Vamsi Ponnekanti 8ade935971 [Report the #gets and #founds in db_stress]
Summary:
Also added some comments and fixed some bugs in
stats reporting. Now the stats seem to match what is expected.

Test Plan:
[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --test_batches_snapshots=1 --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version     : 1.5
Number of threads   : 1
Ops per thread      : 1000
Read percentage     : 10
Delete percentage   : 30
Max key             : 320
Ratio #ops/#keys    : 3
Num times DB reopens: 10
Batches/snapshots   : 1
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
No lock creation because test_batches_snapshots set
2013/03/04-15:58:56  Starting database operations
2013/03/04-15:58:56  Reopening database for the 1th time
2013/03/04-15:58:56  Reopening database for the 2th time
2013/03/04-15:58:56  Reopening database for the 3th time
2013/03/04-15:58:56  Reopening database for the 4th time
Created bg thread 0x7f4542bff700
2013/03/04-15:58:56  Reopening database for the 5th time
2013/03/04-15:58:56  Reopening database for the 6th time
2013/03/04-15:58:56  Reopening database for the 7th time
2013/03/04-15:58:57  Reopening database for the 8th time
2013/03/04-15:58:57  Reopening database for the 9th time
2013/03/04-15:58:57  Reopening database for the 10th time
2013/03/04-15:58:57  Reopening database for the 11th time
2013/03/04-15:58:57  Limited verification already done during gets
Stress Test : 1811.551 micros/op 552 ops/sec
            : Wrote 0.10 MB (0.05 MB/sec) (598% of 1011 ops)
            : Wrote 6050 times
            : Deleted 3050 times
            : 500/900 gets found the key
            : Got errors 0 times

[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version     : 1.5
Number of threads   : 1
Ops per thread      : 1000
Read percentage     : 10
Delete percentage   : 30
Max key             : 320
Ratio #ops/#keys    : 3
Num times DB reopens: 10
Batches/snapshots   : 0
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Creating 80 locks
2013/03/04-15:58:17  Starting database operations
2013/03/04-15:58:17  Reopening database for the 1th time
2013/03/04-15:58:17  Reopening database for the 2th time
2013/03/04-15:58:17  Reopening database for the 3th time
2013/03/04-15:58:17  Reopening database for the 4th time
Created bg thread 0x7fc0f5bff700
2013/03/04-15:58:17  Reopening database for the 5th time
2013/03/04-15:58:17  Reopening database for the 6th time
2013/03/04-15:58:18  Reopening database for the 7th time
2013/03/04-15:58:18  Reopening database for the 8th time
2013/03/04-15:58:18  Reopening database for the 9th time
2013/03/04-15:58:18  Reopening database for the 10th time
2013/03/04-15:58:18  Reopening database for the 11th time
2013/03/04-15:58:18  Starting verification
Stress Test : 1836.258 micros/op 544 ops/sec
            : Wrote 0.01 MB (0.01 MB/sec) (59% of 1011 ops)
            : Wrote 605 times
            : Deleted 305 times
            : 50/90 gets found the key
            : Got errors 0 times
2013/03/04-15:58:18  Verification successful

Revert Plan: OK

Task ID: #

Reviewers: emayanke, dhruba

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9081
2013-03-10 21:57:00 -07:00
amayank 3b6653b1f8 Make db_stress Not purge redundant keys on some opens
Summary: In light of the new option introduced by commit 806e264350 where the database has an option to compact before flushing to disk, we want the stress test to test both sides of the option. Have made it to 'deterministically' and configurably change that option for reopens.

Test Plan: make db_stress; ./db_stress with some differnet options

Reviewers: dhruba, vamsi

Reviewed By: dhruba

CC: leveldb, sheki

Differential Revision: https://reviews.facebook.net/D9165
2013-03-08 04:55:07 -08:00
Abhishek Kona a9866b721b Refactor statistics. Remove individual functions like incNumFileOpens
Summary:
Use only the counter mechanism. Do away with
incNumFileOpens, incNumFileClose, incNumFileErrors
s/NULL/nullptr/g in db/table_cache.cc

Test Plan: make clean check

Reviewers: dhruba, heyongqiang, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8841
2013-02-25 13:58:34 -08:00
Vamsi Ponnekanti 465b9103f8 [Add a second kind of verification to db_stress
Summary:
Currently the test tracks all writes in memory and
uses it for verification at the end. This has 4 problems:
(a) It needs mutex for each write to ensure in-memory update
and leveldb update are done atomically. This slows down the
benchmark.
(b) Verification phase at the end is time consuming as well
(c) Does not test batch writes or snapshots
(d) We cannot kill the test and restart multiple times in a
loop because in-memory state will be lost.

I am adding a FLAGS_multi that does MultiGet/MultiPut/MultiDelete
instead of get/put/delete to get/put/delete a group of related
keys with same values atomically. Every get retrieves the group
of keys and checks that their values are same. This does not have
the above problems but the downside is that it does less amount
of validation than the other approach.

Test Plan:
This whole this is a test! Here is a small run. I am doing larger run now.

[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=10000 --multi=1 --ops_per_key=25
LevelDB version     : 1.5
Number of threads   : 32
Ops per thread      : 10000
Read percentage     : 10
Delete percentage   : 30
Max key             : 2147483648
Num times DB reopens: 10
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Creating 536870912 locks
2013/02/20-16:59:32  Starting database operations
Created bg thread 0x7f9ebcfff700
2013/02/20-16:59:37  Reopening database for the 1th time
2013/02/20-16:59:46  Reopening database for the 2th time
2013/02/20-16:59:57  Reopening database for the 3th time
2013/02/20-17:00:11  Reopening database for the 4th time
2013/02/20-17:00:25  Reopening database for the 5th time
2013/02/20-17:00:36  Reopening database for the 6th time
2013/02/20-17:00:47  Reopening database for the 7th time
2013/02/20-17:00:59  Reopening database for the 8th time
2013/02/20-17:01:10  Reopening database for the 9th time
2013/02/20-17:01:20  Reopening database for the 10th time
2013/02/20-17:01:31  Reopening database for the 11th time
2013/02/20-17:01:31  Starting verification
Stress Test : 109.125 micros/op 22191 ops/sec
            : Wrote 0.00 MB (0.23 MB/sec) (59% of 32 ops)
            : Deleted 10 times
2013/02/20-17:01:31  Verification successful

Revert Plan: OK

Task ID: #

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8733
2013-02-22 12:20:11 -08:00
amayank 1052ea236f Exploring the rocksdb stress test
Summary:
Fixed a bug in the stress-test where the correct size was not being
passed to GenerateValue. This bug was there since the beginning but assertions
were switched on in our code-base only recently.
Added comments on the top detailing how the stress test works and how to
quicken/slow it down after investigation.

Test Plan: make all check. ./db_stress

Reviewers: dhruba, asad

Reviewed By: dhruba

CC: vamsi, sheki, heyongqiang, zshao

Differential Revision: https://reviews.facebook.net/D8727
2013-02-21 11:27:28 -08:00
Abhishek Kona fe10200ddc Introduce histogram in statistics.h
Summary:
* Introduce is histogram in statistics.h
* stop watch to measure time.
* introduce two timers as a poc.
Replaced NULL with nullptr to fight some lint errors
Should be useful for google.

Test Plan:
ran db_bench and check stats.
make all check

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8637
2013-02-20 10:43:32 -08:00
Chip Turner 2fdf91a4f8 Fix a number of object lifetime/ownership issues
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.

Test Plan: db_stress, make check

Reviewers: dhruba

Reviewed By: dhruba

CC: zshao, leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D8043
2013-01-23 16:54:11 -08:00
Kosie van der Merwe 3c3df7402f Fixed issues Valgrind found.
Summary:
Found issues with `db_test` and `db_stress` when running valgrind.

`DBImpl` had an issue where if an compaction failed then it will use the uninitialised file size of an output file is used. This manifested as the final call to output to the log in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards).

Test Plan:
Ran `valgrind --track_origins=yes ./db_test` and `valgrind ./db_stress` to see if issues disappeared.

Ran `make check` to see if there were no regressions.

Reviewers: vamsi, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8001
2013-01-17 10:04:45 -08:00
Abhishek Kona 7d5a4383bb rollover manifest file.
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?

Test Plan: make all check. Inputs on a new unit test?

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7701
2013-01-16 12:09:44 -08:00
Dhruba Borthakur 62e7583f94 enhance dbstress to simulate hard crash
Summary:
dbstress has an option to reopen the database. Make it such that the
previous handle is not closed before we reopen, this simulates a
situation similar to a process crash.

Added new api to DMImpl to remove the lock file.

Test Plan: run db_stress

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6777
2012-11-18 23:16:17 -08:00
Dhruba Borthakur c3392c9fe0 The db_stress test should also test multi-threaded compaction.
Summary:
Create more than one background compaction thread if specified.
This code peice is similar to what exists in db_bench.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6753
2012-11-14 22:01:39 -08:00
Dhruba Borthakur 6c5a4d646a Merge branch 'master' into performance
Conflicts:
	db/db_impl.h
2012-11-14 21:39:52 -08:00
amayank e626261742 Introducing "database reopens" into the stress test. Database will reopen after a specified number of iterations (configurable) of each thread when they will wait for the databse to reopen.
Summary: FLAGS_reopen (configurable) specifies the number of times the databse is to be reopened. FLAGS_ops_per_thread is divided into points based on that reopen field. At these points all threads come together to wait for the databse to reopen. Each thread "votes" for the database to reopen and when all have voted, the database reopens.

Test Plan: make all;./db_stress

Reviewers: dhruba, MarkCallaghan, sheki, asad, heyongqiang

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6627
2012-11-12 12:26:32 -08:00
heyongqiang 20d18a89a3 disable size compaction in ldb reduce_levels and added compression and file size parameter to it
Summary:
disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction,

added --compression=none|snappy|zlib|bzip2 and --file_size= per-file size to ldb reduce_levels command

Test Plan: run ldb

Reviewers: dhruba, MarkCallaghan

Reviewed By: dhruba

CC: sheki, emayanke

Differential Revision: https://reviews.facebook.net/D6597
2012-11-09 10:14:47 -08:00
amayank 9e97bfdcde Introducing deletes for stress test
Summary: Stress test modified to do deletes and later verify them

Test Plan: running the test: db_stress

Reviewers: dhruba, heyongqiang, asad, sheki, MarkCallaghan

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6567
2012-11-08 16:55:18 -08:00
Dhruba Borthakur 8143062edd Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/version_set.cc
	util/options.cc
2012-11-07 15:11:37 -08:00
Dhruba Borthakur aa42c66814 Fix all warnings generated by -Wall option to the compiler.
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.

Test Plan: make all check

Reviewers: heyongqiang, emayanke, sheki

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6525
2012-11-06 14:07:31 -08:00
Dhruba Borthakur 1ca0584345 This is the mega-patch multi-threaded compaction
published in https://reviews.facebook.net/D5997.

Summary:
This patch allows compaction to occur in multiple background threads
concurrently.

If a manual compaction is issued, the system falls back to a
single-compaction-thread model. This is done to ensure correctess
and simplicity of code. When the manual compaction is finished,
the system resumes its concurrent-compaction mode automatically.

The updates to the manifest are done via group-commit approach.

Test Plan: run db_bench
2012-10-19 14:00:53 -07:00
Dhruba Borthakur 5dc784c233 Fix compilation problem with db_stress when using C11 compiler.
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-10-12 17:00:25 -07:00
Asad K Awan 24f7983b1f [tools] Add a tool to stress test concurrent writing to levelDB
Summary:
Created a tool that runs multiple threads that concurrently read and write to levelDB.
All writes to the DB are stored in an in-memory hashtable and verified at the end of the
test. All writes for a given key are serialzied.

Test Plan:
 - Verified by writing only a few keys and logging all writes and verifying that values read and written are correct.
 - Verified correctness of value generator.
 - Ran with various parameters of number of keys, locks, and threads.

Reviewers: dhruba, MarkCallaghan, heyongqiang

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5829
2012-10-10 12:12:55 -07:00