Commit graph

973 commits

Author SHA1 Message Date
Igor Canadi 5f4166c90e ReadaheadRandomAccessFile -- userspace readahead
Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.

We add ReadaheadRandomAccessFile layer only when file is read during compactions.

D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.

Test Plan: make check

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45123
2015-08-26 15:25:59 -07:00
Igor Canadi 16ebe3a2a9 Mmap reads should not return error if reading past file
Summary:
Currently, mmap returns IOError when user tries to read data past the end of the file. This diff changes the behavior. Now, we return just the bytes that we can, and report the size we returned via a Slice result. This is consistent with non-mmap behavior and also pread() system call.

This diff is taken out of D45123.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45645
2015-08-26 14:51:38 -07:00
Andres Notzli 09d982f9e0 Fix compact_files_example
Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.

Test Plan: make clean && cd examples && make all && ./compact_files_example

Reviewers: rven, sdong, anthony, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45117
2015-08-25 12:29:44 -07:00
Ari Ekmekji b6def58f73 Changed 'num_subcompactions' to the more accurate 'max_subcompactions'
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.

I also added a --subcompactions option to db_bench

Test Plan: make all   make check

Reviewers: sdong, igor, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45069
2015-08-21 14:25:34 -07:00
sdong 9130873a13 Add options.new_table_reader_for_compaction_inputs
Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise works less predictable. Add options.new_table_reader_for_compaction_inputs to enforce to create a new file descriptor and new table reader for it.

Test Plan: Add the option.

Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang

Reviewed By: igor

Subscribers: igor, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43311
2015-08-21 08:46:29 -07:00
Islam AbdelRahman 3fd70b05b8 Rate limit deletes issued by DestroyDB
Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately

Test Plan: added a unittest

Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: jeanxu2012, dhruba

Differential Revision: https://reviews.facebook.net/D44955
2015-08-19 15:02:17 -07:00
Yueh-Hsuan Chiang 9ec9571593 DBOptions serialization and deserialization
Summary:
This patch implements DBOptions deserialization and improve
the current implementation of DBOptions serialization by
using a static structure that stores the offset of each
DBOptions member variables to perform serialization and
deserialization instead of using tons of if-then-branch
to determine the mapping between string and variables.

Test Plan: Added test in options_test.cc

Reviewers: igor, anthony, sdong, IslamAbdelRahman

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44097
2015-08-18 13:30:18 -07:00
Yueh-Hsuan Chiang b2df20a890 Make HashCuckooRep::ApproximateMemoryUsage() return reasonable estimation.
Summary:
HashCuckooRep::ApproximateMemoryUsage() previously return
std::numeric_limits<size_t>::max() when it cannot accept more
entries.  This patch makes it return a more reasonable estimation.

This change is necessary in order to make GetIntProperty("rocksdb.cur-size-all-mem-tables")
handles HashCuckooRep properly in diff https://reviews.facebook.net/D44229.

Test Plan: db_test

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44241
2015-08-18 13:19:55 -07:00
Ari Ekmekji f0da6977a3 [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State
Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.

Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test

Reviewers: sdong, anthony, igor, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43239
2015-08-18 11:06:23 -07:00
Andres Notzli f32a572099 Simplify querying of merge results
Summary:
While working on supporting mixing merge operators with
single deletes ( https://reviews.facebook.net/D43179 ),
I realized that returning and dealing with merge results
can be made simpler. Submitting this as a separate diff
because it is not directly related to single deletes.

Before, callers of merge helper had to retrieve the merge
result in one of two ways depending on whether the merge
was successful or not (success = result of merge was single
kTypeValue). For successful merges, the caller could query
the resulting key/value pair and for unsuccessful merges,
the result could be retrieved in the form of two deques of
keys and values. However, with single deletes, a successful merge
does not return a single key/value pair (if merge
operands are merged with a single delete, we have to generate
a value and keep the original single delete around to make
sure that we are not accidentially producing a key overwrite).
In addition, the two existing call sites of the merge
helper were taking the same actions independently from whether
the merge was successful or not, so this patch simplifies that.

Test Plan: make clean all check

Reviewers: rven, sdong, yhchiang, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43353
2015-08-17 17:34:38 -07:00
sdong 72613657f0 Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.

Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected

Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44193
2015-08-14 17:32:42 -07:00
sdong 603b6da8b8 Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:

2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}

Add two more counters in iostats_context.

Also add a parameter of db_bench.

Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 16:52:26 -07:00
Islam AbdelRahman 3bd9db420e [Cleanup] Remove RandomRWFile
Summary: RandomRWFile is not used anywhere in out code base, this patch remove RandomRWFile

Test Plan:
make check -j64
USE_CLANG=1 make all -j64
OPT=-DROCKSDB_LITE make release -j64

Reviewers: sdong, yhchiang, anthony, kradhakrishnan, rven, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44091
2015-08-12 10:18:59 -07:00
agiardullo 0db807ec28 Transaction error statuses
Summary:
Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.

https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954

Test Plan: unit tests

Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43323
2015-08-11 17:52:56 -07:00
agiardullo c2f2cb0214 Pessimistic Transactions
Summary:
Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.

MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.

Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.

Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.

Test Plan: Unit tests, db_bench parallel testing.

Reviewers: igor, rven, sdong, yhchiang, yoshinorim

Reviewed By: sdong

Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40869
2015-08-11 17:52:23 -07:00
Yueh-Hsuan Chiang e61fafbe7a Fixed clang-build error in util/thread_local.cc
Summary:
This patch fixes the following clang-build error in util/thread_local.cc by using a cleaner macro blocker:

12:26:31 util/thread_local.cc:157:19: error: declaration shadows a static data member of 'rocksdb::ThreadLocalPtr::StaticMeta' [-Werror,-Wshadow]
12:26:31       ThreadData* tls_ =
12:26:31                   ^
12:26:31 util/thread_local.cc:19:66: note: previous declaration is here
12:26:31 __thread ThreadLocalPtr::ThreadData* ThreadLocalPtr::StaticMeta::tls_ = nullptr;
12:26:31                                                                  ^

Test Plan: db_test

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44043
2015-08-11 13:30:49 -07:00
Islam AbdelRahman cee1e8a080 Parallelize LoadTableHandlers
Summary: Add a new option that all LoadTableHandlers to use multiple threads to load files on DB Open and Recover

Test Plan:
make check -j64
COMPILE_WITH_TSAN=1 make check -j64
DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)

Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43755
2015-08-11 12:19:56 -07:00
Yueh-Hsuan Chiang b47d65b315 Fixed Segmentation Fault in db_stress on OSX.
Summary:
This patch provides a simplier solution to the memory leak
issue identified in patch https://reviews.facebook.net/D43677,
where a static function local variable can be used instead of
using a global static unique_ptr.

Test Plan: run db_stress on mac

Reviewers: igor, sdong, anthony, IslamAbdelRahman, maykov

Reviewed By: maykov

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43995
2015-08-11 10:55:27 -07:00
Yoshinori Matsunobu 2cf0f4f471 Adding wal_recovery_mode log message
Summary:
wal_recovery_mode setting was not written to LOG. This diff
adds the log message

Test Plan: manually checked

Reviewers: kradhakrishnan, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43953
2015-08-10 09:43:30 -07:00
Islam AbdelRahman 22dcaaff30 More accurate time measurement for delete_scheduler_test
Summary: Start measuring time spent before BackgroundEmptyTrash starts

Test Plan: delete_scheduler_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43857
2015-08-07 15:37:56 -07:00
Nate Rosenblum ac04a6cfb8 Fix OSX + Windows build
Commit 257ee89 added a static destruction helper to avoid notional
"leaks" of TLS on main thread exit. This helper fails to compile on
OS X (and presumably Windows, though I haven't checked), which lacks
the __thread storage class StaticMeta::tls_ member.

This patch fixes the builds. Do note that the static cleanup mechanism
may be somewhat brittle and atexit(3) may be a more suitable approach
to releasing the main thread's TLS if it's highly desirable for this
memory to not be reported "reachable" by Valgrind at exit.
2015-08-07 10:47:05 -07:00
Alexey Maykov 257ee895f9 Fixed memory leaks
Summary:
MyRocks valgrind run was showing memory leaks. The fixes are mostly self-explaining.
There is only a single usage of ThreadLocalPtr. Potentially, we may think about replacing this use with thread_local, but it will be a bigger change. Another option to consider is using thread_local instead of __thread in ThreadLocalPtr implementation. This way, tls_ can be stored using std::unique_ptr and no destructor would be required.

Test Plan:
 - make check
 - MyRocks valgrind run doesn't report leaks

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43677
2015-08-06 15:39:12 -07:00
Islam AbdelRahman 40f893f4a9 Fix delete_scheduler_test valgrind error
Summary: Use shared_ptr instead of deleting in destructor

Test Plan: DISABLE_JEMALLOC=1 make delete_scheduler_test -j64 && valgrind --error-exitcode=2 --leak-check=full ./delete_scheduler_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43659
2015-08-06 10:56:00 -07:00
sdong c7742452eb Add Statistics.getHistogramString() to print more detailed outputs of a histogram
Summary:
Provide a way for users to know more detailed ditribution of a histogram metrics. Example outputs:

Manually add statement
  fprintf(stdout, "%s\n", dbstats->getHistogramString(SST_READ_MICROS).c_str());
Will print out something like:

Count: 989151  Average: 1.7659  StdDev: 1.52
Min: 0.0000  Median: 1.2071  Max: 860.0000
Percentiles: P50: 1.21 P75: 1.70 P99: 5.12 P99.9: 13.67 P99.99: 21.70
------------------------------------------------------
[       0,       1 )   390839  39.513%  39.513% ########
[       1,       2 )   500918  50.641%  90.154% ##########
[       2,       3 )    79358   8.023%  98.177% ##
[       3,       4 )     6297   0.637%  98.813%
[       4,       5 )     1712   0.173%  98.986%
[       5,       6 )     1134   0.115%  99.101%
[       6,       7 )     1222   0.124%  99.224%
[       7,       8 )     1529   0.155%  99.379%
[       8,       9 )     1264   0.128%  99.507%
[       9,      10 )      988   0.100%  99.607%
[      10,      12 )     1378   0.139%  99.746%
[      12,      14 )     1828   0.185%  99.931%
[      14,      16 )      410   0.041%  99.972%
[      16,      18 )       72   0.007%  99.980%
[      18,      20 )       67   0.007%  99.986%
[      20,      25 )      106   0.011%  99.997%
[      25,      30 )       24   0.002%  99.999%
[      30,      35 )        1   0.000% 100.000%
[     250,     300 )        2   0.000% 100.000%
[     300,     350 )        1   0.000% 100.000%
[     800,     900 )        1   0.000% 100.000%

Test Plan: Manually add a print in db_bench and make sure it prints out as expected. Will add some codes to cover the function

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43611
2015-08-05 20:05:56 -07:00
Islam AbdelRahman 29b028b0ed Make DeleteScheduler tests more reliable
Summary: Update DeleteScheduler tests so that they verify the used penalties for waiting instead of measuring the time spent which is not reliable

Test Plan:
make -j64 delete_scheduler_test && ./delete_scheduler_test
COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test
COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test

make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43635
2015-08-05 19:16:52 -07:00
sdong 7ccd1c80a7 Add two unit tests for SyncWAL()
Summary:
Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. Another one makes sure SyncWAL() doesn't wait ongoing writes to finish before being executed.

Create a new test file db_wal_test and move two WAL related tests from db_test to here.

Test Plan: Run the new tests

Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43605
2015-08-05 14:27:02 -07:00
sdong 3ae386eafe Add statistic histogram "rocksdb.sst.read.micros"
Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately usually no possible as we usually take table reader from table cache.

Test Plan:
Run db_bench and it shows the stats, like:

rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180

Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43275
2015-08-05 13:02:33 -07:00
Islam AbdelRahman bd2fc5f5fb Fix TSAN for delete_scheduler_test
Summary: Fixing TSAN false positive and relaxing the conditions when we are running under TSAN

Test Plan: COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43593
2015-08-05 11:45:31 -07:00
Andres Notzli c465071029 Removing duplicate code
Summary:
While working on https://reviews.facebook.net/D43179 , I found
duplicate code in the tests. This patch removes it.

Test Plan: make clean all check

Reviewers: igor, sdong, rven, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43263
2015-08-05 07:33:27 -07:00
Mike Kolupaev e06cf1a098 [wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.

Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.

Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.

Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba

Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 06:06:39 -07:00
Ari Ekmekji 5dc3e6881a Update Tests To Enable Subcompactions
Summary:
Updated DBTest DBCompactionTest and CompactionJobStatsTest
to run compaction-related tests once with subcompactions enabled and
once disabled using the TEST_P test type in the Google Test suite.

Test Plan: ./db_test  ./db_compaction-test  ./compaction_job_stats_test

Reviewers: sdong, igor, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43443
2015-08-04 22:19:07 -07:00
Islam AbdelRahman c45a57b41e Support delete rate limiting
Summary:
Introduce DeleteScheduler that allow enforcing a rate limit on file deletion
Instead of deleting files immediately, files are moved to trash directory and deleted in a background thread that apply sleep penalty between deletes if needed.

I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile

Test Plan:
added delete_scheduler_test
existing unit tests

Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D43221
2015-08-04 20:45:27 -07:00
Yueh-Hsuan Chiang 14d0bfa429 Add DBOptions::skip_sats_update_on_db_open
Summary:
UpdateAccumulatedStats() is used to optimize compaction decision
esp. when the number of deletion entries are high, but this function
can slowdown DBOpen esp. in disk environment.

This patch adds DBOptions::skip_sats_update_on_db_open, which skips
UpdateAccumulatedStats() in DB::Open() time when it's set to true.

Test Plan: Add DBCompactionTest.SkipStatsUpdateTest

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: tnovak, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42843
2015-08-04 13:48:16 -07:00
Boyang Zhang d5c0a6da6c Merge branch 'master' of github.com:facebook/rocksdb
Fixed memory leak error
2015-08-04 10:56:49 -07:00
Boyang Zhang 2d41403f45 Made change to fix the memory leak
Summary: So I took a look and I used a pointer to TableBuilder.  Changed it to a unique_ptr.  I think this should work, but I cannot run valgrind correctly on my local machine to test it.

Test Plan: Run valgrind, but it's not working locally.  It says I'm executing an unrecognized instruction.

Reviewers: yhchiang

Subscribers: dhruba, sdong

Differential Revision: https://reviews.facebook.net/D43485
2015-08-04 10:55:42 -07:00
sdong 92f7039eec fix memory corruption issue in sst_dump --show_compression_sizes
Summary: In "sst_dump --show_compression_sizes", a reference of CompressionOptions is kept in TableBuilderOptions, which is destroyed later, causing a memory issue.

Test Plan: Run valgrind against SSTDumpToolTest.CompressedSizes and make sure it is fixed

Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43497
2015-08-04 10:48:44 -07:00
Ari Ekmekji 40c64434d4 Parallelize L0-L1 Compaction: Restructure Compaction Job
Summary:
As of now compactions involving files from Level 0 and Level 1 are single
threaded because the files in L0, although sorted, are not range partitioned like
the other levels. This means that during L0-L1 compaction each file from L1
needs to be merged with potentially all the files from L0.

This attempt to parallelize the L0-L1 compaction assigns a thread and a
corresponding iterator to each L1 file that then considers only the key range
found in that L1 file and only the L0 files that have those keys (and only the
specific portion of those L0 files in which those keys are found). In this way
the overlap is minimized and potentially eliminated between different iterators
focusing on the same files.

The first step is to restructure the compaction logic to break L0-L1 compactions
into multiple, smaller, sequential compactions. Eventually each of these smaller
jobs will be run simultaneously. Areas to pay extra attention to are

  # Correct aggregation of compaction job statistics across multiple threads
  # Proper opening/closing of output files (make sure each thread's is unique)
  # Keys that span multiple L1 files
  # Skewed distributions of keys within L0 files

Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test

Reviewers: igor, noetzli, anthony, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42699
2015-08-03 11:32:14 -07:00
sdong 47316c2d08 dump_manifest supports DB with more number of levels
Summary: Now ldb dump_manifest refuses to work if there are 20 levels. Extend the limit to 64.

Test Plan: Run the tool with 20 number of levels

Reviewers: kradhakrishnan, anthony, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42879
2015-08-03 11:02:09 -07:00
Andres Noetzli 544be638ab Fixing fprintf of non string literal
Summary:
sst_dump_tool contains two instances of `fprintf`s where the `format` argument is not
a string literal. This prevents the code from compiling with some compilers/compiler
options because of the potential security risks associated with printing non-literals.

Test Plan: make all

Reviewers: rven, igor, yhchiang, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43305
2015-07-30 17:46:47 -07:00
Boyang Zhang 05d4265a28 Merge branch 'master' of github.com:facebook/rocksdb 2015-07-29 17:48:52 -07:00
Boyang Zhang 4be6d44167 Compression sizes option for sst_dump_tool
Summary:
Added a new feature to sst_dump_tool.cc to allow a user to see the sizes of the different compression algorithms on an .sst file.

Usage:
./sst_dump --file=<filename> --show_compression_sizes
./sst_dump --file=<filename> --show_compression_sizes --set_block_size=<block_size>

Note: If you do not set a block size, it will default to 16kb

Test Plan: manual test and the write a unit test

Reviewers: IslamAbdelRahman, anthony, yhchiang, rven, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D42963
2015-07-29 17:42:13 -07:00
Andres Notzli e95c59cd2f Count number of corrupt keys during compaction
Summary:
For task #7771355, we would like to log the number of corrupt keys
during a compaction. This patch implements and tests the count
as part of CompactionJobStats.

Test Plan: make && make check

Reviewers: rven, igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42921
2015-07-28 16:41:40 -07:00
Poornima Chozhiyath Raman 1bdfcef7bf Fix when output level is 0 of universal compaction with trivial move
Summary: Fix for universal compaction with trivial move, when the ouput level is 0. The tests where failing. Fixed by allowing normal compaction when output level is 0.

Test Plan: modified test cases run successfully.

Reviewers: sdong, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: anthony, kradhakrishnan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42933
2015-07-27 14:25:57 -07:00
Siying Dong 8279d41972 Merge pull request #667 from yuslepukhin/fix_now_microsec_win
Fix WinEnv::NowMicros
2015-07-23 11:00:43 -07:00
Dmitri Smirnov 555ca3e7b7 Fix WinEnv::NowMicrosec
* std::chrono does not provide enough granularity for microsecs and periodically emits
    duplicates
  * the bug is manifested in log rotation logic where we get duplicate
   log file names and loose previous log content
  * msvc does not imlement COW on std::strings adjusted the test to use
    refs in the loops as auto does not retain ref info
  * adjust auto_log rotation test with Windows specific command to remove
    a folder. The test previously worked because we have unix utils installed
    in house but this may not be the case for everyone.
2015-07-22 14:36:43 -07:00
Mike Kolupaev 3bf9f9a832 cleaned up PosixMmapFile a little
Summary: https://reviews.facebook.net/D42321 has left PosixMmapFile in some weird state. This diff removes pending_sync_ that was now unused, fixes indentation and prevents Fsync() from calling both fsync() and fdatasync().

Test Plan: `make -j check`

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D42885
2015-07-22 12:27:39 -07:00
sdong 85ac65536b Tests to avoid to use TMPDIR directly
Summary: Directly using TMPDIR can cause problems when running tests using parallel option. Fix them.

Test Plan: Run all tests in parallel

Reviewers: kradhakrishnan, yhchiang, IslamAbdelRahman, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42807
2015-07-21 19:07:34 -07:00
Siying Dong 3dbf4ba220 RangeSync not to sync last 1MB of the file
Summary:
From other ones' investigation:

"sync_file_range() behavior highly depends on kernel version and filesystem.

xfs does neighbor page flushing outside of the specified ranges. For example, sync_file_range(fd, 8192, 16384) does not only trigger flushing page #3 to #4, but also flushing many more dirty pages (i.e. up to page#16)... Ranges of the sync_file_range() should be far enough from write() offset (at least 1MB)."

Test Plan: make all check

Reviewers: igor, rven, kradhakrishnan, yhchiang, IslamAbdelRahman, anthony

Reviewed By: anthony

Subscribers: yoshinorim, MarkCallaghan, sumeet, domas, dhruba, leveldb, ljin

Differential Revision: https://reviews.facebook.net/D15807
2015-07-21 16:22:40 -07:00
agiardullo 064294081b Improved FileExists API
Summary: Add new CheckFileExists method.  Considered changing the FileExists api but didn't want to break anyone's builds.

Test Plan: unit tests

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42003
2015-07-20 17:20:40 -07:00
Islam AbdelRahman 59eca2cc99 Make memenv_test runnable in ROCKSDB_LITE
Summary: Make memenv_test runnable in ROCKSDB_LITE

Test Plan: memenv_test

Reviewers: sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D42123
2015-07-20 11:35:10 -07:00