Commit graph

755 commits

Author SHA1 Message Date
Dhruba Borthakur 554c06dd18 Reduce write amplification by merging files in L0 back into L0
Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions.  Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().

All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted.  Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.

The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.

The above logic should reduce write amplification to a
large extent... will publish numbers shortly.

Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).

Differential Revision: https://reviews.facebook.net/D11289
2013-06-30 09:11:02 -07:00
Haobo Xu 71e0f695c1 [RocksDB] Expose count for WriteBatch
Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence number check.

Test Plan: make check;

Reviewers: emayanke, sheki, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11523
2013-06-26 15:13:21 -07:00
Deon Nicholas 34ef873290 Added stringappend_test back into the unit tests.
Summary:
With the Makefile now updated to correctly update all .o files, this
should fix the issues recompiling stringappend_test. This should also fix the
"segmentation-fault" that we were getting earlier. Now, stringappend_test should
be clean, and I have added it back to the unit-tests. Also made some minor updates
to the tests themselves.

Test Plan:
1. make clean; make stringappend_test -j 32	(will test it by itself)
2. make clean; make all check -j 32		(to run all unit tests)
3. make clean; make release			(test in release mode)
4. valgrind ./stringappend_test 		(valgrind tests)

Reviewers: haobo, jpaton, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11505
2013-06-26 11:41:13 -07:00
Deon Nicholas 6894a50aa7 Updated "make clean" to remove all .o files
Summary:
The old Makefile did not remove ALL .o and .d files, but rather only
those that happened to be in the root folder and one-level deep. This was causing
issues when recompiling files in deeper folders. This fix now causes make clean
to find ALL .o and .d files via a unix "find" command, and then remove them.

Test Plan:
make clean;
make all -j 32;

Reviewers: haobo, jpaton, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11493
2013-06-25 11:30:37 -07:00
Mayank Agarwal b858da709a Simplify bucketing logic in ldb-ttl
Summary: [start_time, end_time) is waht I'm following for the buckets and the whole time-range. Also cleaned up some code in db_ttl.* Not correcting the spacing/indenting convention for util/ldb_cmd.cc in this diff.

Test Plan: python ldb_test.py, make ttl_test, Run mcrocksdb-backup tool, Run the ldb tool on 2 mcrocksdb production backups form sigmafio033.prn1

Reviewers: vamsi, haobo

Reviewed By: vamsi

Differential Revision: https://reviews.facebook.net/D11433
2013-06-21 09:49:24 -07:00
Mayank Agarwal 61f1baaedf Introducing timeranged scan, timeranged dump in ldb. Also the ability to count in time-batches during Dump
Summary:
Scan and Dump commands in ldb use iterator. We need to also print timestamp for ttl databases for debugging. For this I create a TtlIterator class pointer in these functions and assign it the value of Iterator pointer which actually points to t TtlIterator object, and access the new function ValueWithTS which can return TS also. Buckets feature for dump command: gives a count of different key-values in the specified time-range distributed across the time-range partitioned according to bucket-size. start_time and end_time are specified in unixtimestamp and bucket in seconds on the user-commandline
Have commented out 3 ines from ldb_test.py so that the test does not break right now. It breaks because timestamp is also printed now and I have to look at wildcards in python to compare properly.

Test Plan: python tools/ldb_test.py

Reviewers: vamsi, dhruba, haobo, sheki

Reviewed By: vamsi

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11403
2013-06-19 18:45:13 -07:00
Haobo Xu 0f78fad9f5 [RocksDB] add back --mmap_read options to crashtest
Summary: As title, now that db_stress supports --map_read properly

Test Plan: make crash_test

Reviewers: vamsi, emayanke, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11391
2013-06-19 16:15:59 -07:00
Haobo Xu 4deaa0d48b [RocksDB] Minor change to statistics.h
Summary: as title, use initialize list so that lines fit in 80 chars.

Test Plan: make check;

Reviewers: sheki, dhruba

Differential Revision: https://reviews.facebook.net/D11385
2013-06-19 12:44:42 -07:00
Haobo Xu 96be2c4ee0 [RocksDB] Add mmap_read option for db_stress
Summary: as title, also removed an incorrect assertion

Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0

Reviewers: dhruba, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11367
2013-06-19 10:28:32 -07:00
Abhishek Kona 5ef6bb8c37 [rocksdb][refactor] statistic printing code to one place
Summary: $title

Test Plan: db_bench --statistics=1

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11373
2013-06-18 20:28:41 -07:00
Jim Paton 09de7a3b6a Fix Zlib_Compress and Zlib_Uncompress
Summary:
Zlib_{Compress,Uncompress} did not handle very small input buffers properly. In addition, they did not call inflate/deflate until Z_STREAM_END was returned; it was possible for them to exit when only Z_OK had returned.

This diff also fixes a bunch of lint errors.

Test Plan: Run make check

Reviewers: dhruba, sheki, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11301
2013-06-18 16:57:42 -07:00
Haobo Xu 3cc1af2062 [RocksDB] Option for incremental sync
Summary: This diff added an option to control the incremenal sync frequency. db_bench has a new flag bytes_per_sync for easy tuning exercise.

Test Plan: make check; db_bench

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11295
2013-06-18 15:00:32 -07:00
Abhishek Kona 79f4fd2b62 [Rocksdb] Simplify Printing code in db_bench
Summary:
simplify the printing code in db_bench
         use TickersMap and HistogramsNameMap introduced in previous diffs.

Test Plan: ./db_bench --statistics=1 and see if all the statistics are printed

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11355
2013-06-18 14:58:00 -07:00
Dhruba Borthakur 6acbe0fc45 Compact multiple memtables before flushing to storage.
Summary:
Merge multiple multiple memtables in memory before writing it
out to a file in L0.

There is a new config parameter min_write_buffer_number_to_merge
that specifies the number of write buffers that should be merged
together to a single file in storage. The system will not flush
wrte buffers to storage unless at least these many buffers have
accumulated in memory.
The default value of this new parameter is 1, which means that
a write buffer will be immediately flushed to disk as soon it is
ready.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D11241
2013-06-18 14:28:04 -07:00
Abhishek Kona f561b3a324 [Rocksdb] Rename one stat key from leveldb to rocksdb 2013-06-17 14:33:05 -07:00
Dhruba Borthakur 836534debd Enhance dbstress to allow specifying compaction trigger for L0.
Summary:
Rocksdb allos specifying the number of files in L0 that triggers
compactions. Expose this api as a command line parameter for
running db_stress.

Test Plan: Run test

Reviewers: sheki, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11343
2013-06-17 14:15:09 -07:00
Abhishek Kona 00124683de [rocksdb] do not trim range for level0 in manual compaction
Summary:
https://code.google.com/p/leveldb/issues/detail?can=1&q=178&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary&id=178

Ported the solution as is to RocksDB.

Test Plan: moved the unit test as manual_compaction_test

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11331
2013-06-17 13:58:17 -07:00
Abhishek Kona 39ee47fbf4 [Rocksdb] Record WriteBlock Times into a histogram
Summary: Add a histogram to track WriteBlock times

Test Plan: db_bench and print

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11319
2013-06-17 10:11:10 -07:00
Deon Nicholas 8926b72751 Minor tweaks to StringAppend MergeOperator.
Summary:
I'm concerned about a random seg-fault that sometimes occurs when
running stringappend_test. I will investigate further. First, I am removing
stringappend_test from the regular release tests, and making some clean-ups
to the code.

Test Plan:
1. make stringappend_test
2. ./stringappend_test

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11313
2013-06-14 16:44:39 -07:00
Abhishek Kona bff718d81c [Rocksdb] Implement filluniquerandom
Summary:
Use a bit set to keep track of which random number is generated.
        Currently only supports single-threaded. All our perf tests are run with threads=1
        Copied over bitset implementation from common/datastructures

Test Plan: printed the generated keys, and verified all keys were present.

Reviewers: MarkCallaghan, haobo, dhruba

Reviewed By: MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11247
2013-06-14 16:17:56 -07:00
Deon Nicholas 2a52e1dcb6 Fix db_bench for release build.
Test Plan: make release

Reviewers: haobo, dhruba, jpaton

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11307
2013-06-14 16:00:47 -07:00
Haobo Xu 1afdf28701 [RocksDB] Compaction Filter Cleanup
Summary: This hopefully gives the right semantics to compaction filter. Will write a small wiki to explain the ideas.

Test Plan: make check; db_stress

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11121
2013-06-14 14:23:08 -07:00
Abhishek Kona 7a5f71d19a [Rocksdb] measure table open io in a histogram
Summary: Table is setup for compaction using Table::SetupForCompaction. So read block calls can be differentiated b/w Gets/Compaction. Use this and measure times.

Test Plan: db_bench --statistics=1

Reviewers: dhruba, haobo

Reviewed By: haobo

CC: leveldb, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D11217
2013-06-13 17:25:09 -07:00
Haobo Xu 0c2a2dd5e8 [RocksDB] Fix build. Removed deprecated option --mmap_read from db_crashtest
Summary: As title

Test Plan: db_crashtest

Reviewers: vamsi, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11271
2013-06-13 13:48:35 -07:00
Haobo Xu 778e179046 [RocksDB] Sync file to disk incrementally
Summary:
During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic.
This diff simply asks the OS to sync data incrementally as they are written, on the background. The hope is that, at the final sync, most of the data are already on disk and we would block less on the sync call. Thus, each compaction runs faster and we could use fewer number of compaction threads to saturate IO.
In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too.

Some quick tests show 10~20% improvement in per thread compaction throughput. Combined with posix advice on compaction read, just 5 threads are enough to almost saturate the udb flash bandwidth for 800 bytes write only benchmark.
What's more promising is that, with saturated IO, iostat shows average wait time is actually smoother and much smaller.
For the write only test 800bytes test:
Before the change:  await  occillate between 10ms and 3ms
After the change: await ranges 1-3ms

Will test against read-modify-write workload too, see if high read latency P99 could be resolved.

Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions.

Test Plan: make check; db_bench; db_stress

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11115
2013-06-12 12:53:59 -07:00
Deon Nicholas 4985a9f73b [Rocksdb] [Multiget] Introduced multiget into db_bench
Summary:
Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n
flags for db_bench. Also updated and tested the ReadRandom() method
to include an option to use multiget. By default,
keys_per_multiget=100.

Preliminary tests imply that multiget is at least 1.25x faster per
key than regular get.

Will continue adding Multiget for ReadMissing, ReadHot,
RandomWithVerify, ReadRandomWriteRandom; soon. Will also think
about ways to better verify benchmarks.

Test Plan:
1. make db_bench
2. ./db_bench --benchmarks=fillrandom
3. ./db_bench --benchmarks=readrandom --use_existing_db=1
	      --use_multiget=1 --threads=4 --keys_per_multiget=100
4. ./db_bench --benchmarks=readrandom --use_existing_db=1
	      --threads=4
5. Verify ops/sec (and 1000000 of 1000000 keys found)

Reviewers: haobo, MarkCallaghan, dhruba

Reviewed By: MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11127
2013-06-12 12:42:21 -07:00
Haobo Xu bdf1085944 [RocksDB] cleanup EnvOptions
Summary:
This diff simplifies EnvOptions by treating it as POD, similar to Options.
- virtual functions are removed and member fields are accessed directly.
- StorageOptions is removed.
- Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
- Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite

Test Plan: make check; db_stress

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11175
2013-06-12 11:17:19 -07:00
Deon Nicholas 5679107b07 Completed the implementation and test cases for Redis API.
Summary:
Completed the implementation for the Redis API for Lists.
The Redis API uses rocksdb as a backend to persistently
store maps from key->list. It supports basic operations
for appending, inserting, pushing, popping, and accessing
a list, given its key.

Test Plan:
  - Compile with: make redis_test
  - Test with: ./redis_test
  - Run all unit tests (for all rocksdb) with: make all check
  - To use an interactive REDIS client use: ./redis_test -m
  - To clean the database before use:       ./redis_test -m -d

Reviewers: haobo, dhruba, zshao

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10833
2013-06-11 11:19:49 -07:00
Dhruba Borthakur e673d5d26d Do not submit multiple simultaneous seek-compaction requests.
Summary:
The code was such that if multi-threaded-compactions as well
as seek compaction are enabled then it submits multiple
compaction request for the same range of keys. This causes
extraneous sst-files to accumulate at various levels.

Test Plan:
I am not able to write a very good unit test for this one
but can easily reproduce this bug with 'dbstress' with the
following options.

batch=1;maxk=100000000;ops=100000000;ro=0;fm=2;bpl=10485760;of=500000; wbn=3; mbc=20; mb=2097152; wbs=4194304; dds=1; sync=0;  t=32; bs=16384; cs=1048576; of=500000; ./db_stress --disable_seek_compaction=0 --mmap_read=0 --threads=$t --block_size=$bs --cache_size=$cs --open_files=$of --verify_checksum=1 --db=/data/mysql/leveldb/dbstress.dir --sync=$sync --disable_wal=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --target_file_size_multiplier=$fm --max_write_buffer_number=$wbn --max_background_compactions=$mbc --max_bytes_for_level_base=$bpl --reopen=$ro --ops_per_thread=$ops --max_key=$maxk --test_batches_snapshots=$batch

Reviewers: leveldb, emayanke

Reviewed By: emayanke

Differential Revision: https://reviews.facebook.net/D11055
2013-06-10 15:49:19 -07:00
Mayank Agarwal 3c35eda9bd Make Write API work for TTL databases
Summary: Added logic to make another WriteBatch with Timestamps during the Write function execution in TTL class. Also expanded the ttl_test to test for it. Have done nothing for Merge for now.

Test Plan: make ttl_test;./ttl_test

Reviewers: haobo, vamsi, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10827
2013-06-10 15:23:44 -07:00
Dhruba Borthakur 1b69f1e584 Fix refering freed memory in earlier commit.
Summary: Fix refering freed memory in earlier commit by https://reviews.facebook.net/D11181

Test Plan: make check

Reviewers: haobo, sheki

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11193
2013-06-10 15:08:13 -07:00
Abhishek Kona 4a8554d5bb [Rocksdb] fix wrong assert
Summary: the assert was wrong in D11145. Broke build

Test Plan: make db_bench run it

Reviewers: dhruba, haobo, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11187
2013-06-10 13:14:14 -07:00
Dhruba Borthakur c5de1b9391 Print name of user comparator in LOG.
Summary:
The current code prints the name of the InternalKeyComparator
in the log file. We would also like to print the name of the
user-specified comparator for easier debugging.

Test Plan: make check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11181
2013-06-10 12:11:55 -07:00
Abhishek Kona a4913c5170 [rocksdb] names for all metrics provided in statistics.h
Summary: Provide a  map of histograms and ticker vs strings. Fb303 libraries can use this to provide the mapping. We will not have to duplicate the code during release.

Test Plan: db_bench with statistics=1

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11145
2013-06-10 11:57:55 -07:00
Mayank Agarwal 184343a061 Max_mem_compaction_level can have maximum value of num_levels-1
Summary:
Without this files could be written out to a level greater than the maximum level possible and is the source of the segfaults that wormhole awas getting. The sequence of steps that was followed:
1. WriteLevel0Table was called when memtable was to be flushed for a file.
2. PickLevelForMemTableOutput was called to determine the level to which this file should be pushed.
3. PickLevelForMemTableOutput returned a wrong result because max_mem_compaction_level was equal to 2 even when num_levels was equal to 0.
The fix to re-initialize max_mem_compaction_level based on num_levels passed seems correct.

Test Plan: make all check; Also made a dummy file to mimic the wormhole-file behaviour which was causing the segfaults and found that the same segfault occurs without this change and not with this.

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11157
2013-06-09 10:38:55 -07:00
Mayank Agarwal 7a6bd8e975 Modifying options to db_stress when it is run with db_crashtest
Summary: These extra options caught some bugs. Will be run via Jenkins now with the crash_test

Test Plan: ./make crashtest

Reviewers: dhruba, vamsi

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11151
2013-06-09 09:58:46 -07:00
Vamsi Ponnekanti 3bb9449906 [Fix whilebox crash test failure]
Summary:
I think the check for "error" that I added had caused
false alarm. Fixed that.

Test Plan:
Revert Plan: OK

Task ID: #

Reviewers: emayanke, dhruba

Reviewed By: emayanke

Differential Revision: https://reviews.facebook.net/D11139
2013-06-07 11:34:46 -07:00
Abhishek Kona e982b5a489 [Rocksdb] measure table open io in a histogram
Summary: as title

Test Plan: db_bench --statistics=1 check for statistic.

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11109
2013-06-07 10:02:28 -07:00
Jim Paton 8ef328ee6a ctags and cscope support to Makefile
Summary: Added a target to Makefile called 'tags' that runs ctags and cscope on all *.cc and *.h file

Test Plan:
Run 'make tags'. Then start vim and do
:set tags=./tags
:cs add cscope.out

These commands should give you no error messages. You should then be able to access cscope db and ctags as normal in vim.

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D11103
2013-06-07 09:13:40 -07:00
Vamsi Ponnekanti 5cf7a00bda [Make most of the changes suggested by Aaron]
Summary: $title

Test Plan:
Revert Plan: OK

Task ID: #

Reviewers: emayanke, akushner

Reviewed By: akushner

Differential Revision: https://reviews.facebook.net/D10923
2013-06-06 17:31:45 -07:00
Deon Nicholas db1f0cddf3 Fixed valgrind errors 2013-06-05 17:25:16 -07:00
Deon Nicholas d8c7c45ea0 Very basic Multiget and simple test cases.
Summary:
Implemented the MultiGet operator which takes in a list of keys
and returns their associated values. Currently uses std::vector as its
container data structure. Otherwise, it works identically to "Get".

Test Plan:
 1. make db_test      ; compile it
 2. ./db_test         ; test it
 3. make all check    ; regress / run all tests
 4. make release      ; (optional) compile with release settings

Reviewers: haobo, MarkCallaghan, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10875
2013-06-05 11:22:38 -07:00
Abhishek Kona d91b42ee27 [Rocksdb] Measure all FSYNC/SYNC times
Summary: Add stop watches around all sync calls.

Test Plan: db_bench check if respective histograms are printed

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11073
2013-06-05 11:06:21 -07:00
Abhishek Kona ee522d0032 [Rocksdb] Log on disable/enable file deletions
Summary: as title

Test Plan: compile

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11085
2013-06-05 10:48:24 -07:00
Haobo Xu 043573b24f [RocksDB] Include 64bit random number generator
Summary: As title.

Test Plan: make check;

Reviewers: chip, MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11061
2013-06-04 13:52:27 -07:00
Mark Callaghan d9f538e1a9 Improve output for GetProperty('leveldb.stats')
Summary:
Display separate values for read, write & total compaction IO.
Display compaction amplification and write amplification.
Add similar values for the period since the last call to GetProperty. Results since the server started
are reported as "cumulative" stats. Results since the last call to GetProperty are reported as
"interval" stats.

Level  Files Size(MB) Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0        7       13        21         0       211         0         0       211     0.0       0.0        10.1        0        0        0        0      113       0.0
  1       79      157        88       993       989       198       795       194     9.0      11.3        11.2      106      405      502       97       14       0.0
  2       19       36         5        63        63        37        27        36     2.4      12.3        12.2       19       14       32       18       12       0.0
>>>>>>>>>>>>>>>>>>>>>>>>> text below has been is new and/or reformatted
Uptime(secs): 122.2 total, 0.9 interval
Compaction IO cumulative (GB): 0.21 new, 1.03 read, 1.23 write, 2.26 read+write
Compaction IO cumulative (MB/sec): 1.7 new, 8.6 read, 10.3 write, 19.0 read+write
Amplification cumulative: 6.0 write, 11.0 compaction
Compaction IO interval (MB): 5.59 new, 0.00 read, 5.59 write, 5.59 read+write
Compaction IO interval (MB/sec): 6.5 new, 0.0 read, 6.5 write, 6.5 read+write
Amplification interval: 1.0 write, 1.0 compaction
>>>>>>>>>>>>>>>>>>>>>>>> text above is new and/or reformatted
Stalls(secs): 90.574 level0_slowdown, 0.000 level0_numfiles, 10.165 memtable_compaction, 0.000 leveln_slowdown

Task ID: #

Blame Rev:

Test Plan:
make check, run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11049
2013-06-03 20:24:49 -07:00
Haobo Xu 2b1fb5b01d [RocksDB] Add score column to leveldb.stats
Summary: Added the 'score' column to the compaction stats output, which shows the level total size devided by level target size. Could be useful when monitoring compaction decisions...

Test Plan: make check; db_bench

Reviewers: dhruba

CC: leveldb, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D11025
2013-06-03 17:38:27 -07:00
Haobo Xu d897d33bf1 [RocksDB] Introduce Fast Mutex option
Summary:
This diff adds an option to specify whether PTHREAD_MUTEX_ADAPTIVE_NP will be enabled for the rocksdb single big kernel lock. db_bench also have this option now.
Quickly tested 8 thread cpu bound 100 byte random read.
No fast mutex: ~750k/s ops
With fast mutex: ~880k/s ops

Test Plan: make check; db_bench; db_stress

Reviewers: dhruba

CC: MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D11031
2013-06-01 23:11:34 -07:00
Haobo Xu ab8d2f6ab2 [RocksDB] [Performance] Allow different posix advice to be applied to the same table file
Summary:
Current posix advice implementation ties up the access pattern hint with the creation of a file.
It is not possible to apply different advice for different access (random get vs compaction read),
without keeping two open files for the same table. This patch extended the RandomeAccessFile interface
to accept new access hint at anytime. Particularly, we are able to set different access hint on the same
table file based on when/how the file is used.
Two options are added to set the access hint, after the file is first opened and after the file is being
compacted.

Test Plan: make check; db_stress; db_bench

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D10905
2013-05-30 19:08:44 -07:00
Haobo Xu 2df65c118c [RocksDB] Dump counters and histogram data periodically with compaction stats
Summary: As title

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10995
2013-05-29 12:00:18 -07:00