Commit graph

1251 commits

Author SHA1 Message Date
Igor Canadi 6398e6a6a5 Fix DeleteFile() + enable deleting files oldest files in level 0
Summary:
DeleteFile() call was broken for non-default column family. This fixes it. We might need this feature for mongo.

I also introduced a possibility of deleting oldest file in level 0.

Test Plan: added unit test to deletefile_test

Reviewers: ljin, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24909
2014-10-21 11:23:06 -07:00
sdong 5bfb7f5d0b db_bench: seekrandom can specify --seek_nexts to read specific keys after seek.
Summary:
Add a function as tittle.
Also use the same parameter to fillseekseq too.

Test Plan: Run seekrandom using the new parameter

Reviewers: ljin, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: rven, igor, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D25035
2014-10-20 11:55:33 -07:00
Lei Jin ff8f74c204 remove checking lower bound of level size
Summary:
as title

Test Plan:
make db_test
2014-10-17 21:18:51 -07:00
Lei Jin 2dd9bfe3a8 Sanitize block-based table index type and check prefix_extractor
Summary:
Respond to issue reported
https://www.facebook.com/groups/rocksdb.dev/permalink/651090261656158/
Change the Sanitize signature to take both DBOptions and CFOptions

Test Plan: unit test

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D25041
2014-10-17 21:18:36 -07:00
Yueh-Hsuan Chiang 6c66918645 Speed up DB::Open() and Version creation by limiting the number of FileMetaData initialization.
Summary:
This diff speeds up DB::Open() and Version creation by limiting the number of FileMetaData initialization. The behavior of Version::UpdateAccumulatedStats() is changed as follows:

* It only initializes the first 20 uninitialized FileMetaData from file.  This guarantees the size of the latest 20 files will always be compensated when they have any deletion entries.  Previously it may initialize all FileMetaData by loading all files at DB::Open().
* In case none the first 20 files has any data entry, UpdateAccumulatedStats() will initialize the FileMetaData of the oldest file.

Test Plan: db_test

Reviewers: igor, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24255
2014-10-17 14:58:30 -07:00
Yueh-Hsuan Chiang 5db9e76644 Fix Mac compile error: C++11 forbids default arguments for lambda expressions
Summary:
Fix the following Mac compile error.
db/db_test.cc:8686:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
  auto gen_l0_kb = [this](int start, int size, int stride = 1) {
                                                   ^        ~
Test Plan:
db_test
2014-10-17 14:47:26 -07:00
Lei Jin f4363fb81c Fix DynamicMemtableOptions test
Summary: as title

Test Plan: make release

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D25029
2014-10-17 10:09:45 -07:00
Igor Canadi ee80fb4b4a Total memtables size counter
Summary: Added one new counter for GetProperty

Test Plan: Not sure if needs a test case. compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D25023
2014-10-17 09:26:27 -07:00
Lei Jin 274dc81c92 fix build failure
Summary: missed default value during merge

Test Plan: ./db_test

Reviewers: igor, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24975
2014-10-16 17:33:09 -07:00
Lei Jin d6c8dba727 Log MutableCFOptions in SetOptions
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24903
2014-10-16 17:22:28 -07:00
Lei Jin 4d5708aa56 dynamic soft_rate_limit and hard_rate_limit
Summary: as title

Test Plan:
unit test
I am only able to build the test case for hard_rate_limit.
soft_rate_limit is essentially the same thing as hard_rate_limit

Reviewers: igor, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24759
2014-10-16 17:21:31 -07:00
Lei Jin 065a67c4f0 dynamic disable_auto_compactions
Summary: Add more tests as well

Test Plan: unit test

Reviewers: igor, sdong, yhchiang

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24747
2014-10-16 17:14:17 -07:00
Lei Jin dc50a1a593 make max_write_buffer_number dynamic
Summary: as title

Test Plan: unit test

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D24729
2014-10-16 16:57:59 -07:00
Igor Canadi ca250d71a1 Move logging out of mutex
Summary: As title

Test Plan: compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24897
2014-10-15 10:56:50 -07:00
Igor Canadi cc6c883f59 Stop stopping writes on bg_error_
Summary: This might have caused https://github.com/facebook/rocksdb/issues/345. If we're stopping writes and bg_error comes along, we will never unblock the write.

Test Plan: compiles

Reviewers: ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24807
2014-10-13 14:25:55 -07:00
Yueh-Hsuan Chiang 5a76186340 Fixed compile error on Mac: default arguments for lambda expressions
Summary:
Fixed the following compile error on Mac.

db/db_test.cc:8618:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
  auto gen_l0_kb = [this](int start, int size, int stride = 1) {
                                                   ^        ~
1 error generated.

Test Plan:
db_test
2014-10-10 14:10:16 -07:00
sdong b7d3d6ebc5 db_bench: set thread pool size according to max_background_flushes
Summary: option max_background_flushes doesn't make sense if thread pool size is not set accordingly. Set the thread pool size as what we do for max_background_compactions.

Test Plan: Run db_bench with max_background_flushes > 1

Reviewers: yhchiang, igor, rven, ljin

Reviewed By: ljin

Subscribers: MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D24717
2014-10-09 20:38:15 -07:00
Tomislav Novak 88edfd90ae SkipListRep::LookaheadIterator
Summary:
This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an
optimization for the tailing use case which includes many seeks. E.g. consider
the following operations on a skip list iterator:

   Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ...

If `lookahead` is positive, `SkipListRep` will return an iterator which also
keeps track of the previously visited node. Seek() then first does a linear
search starting from that node (up to `lookahead` steps). As in the tailing
example above, this may require fewer than ~log(n) comparisons as with regular
skip list search.

Test Plan:
Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It
first writes N records (with consecutive keys), then measures how much time it
takes to read them by calling `Seek()` and `Next()`.

   $ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \
      -key_size 8 -write_buffer_size $[1024*1024*1024] -value_size 50 \
      -seekseq_next 2 -skip_list_lookahead=0
   [...]
   DB path: [/dev/shm/rocksdbtest/dbbench]
   fillseekseq  :       0.389 micros/op 2569047 ops/sec;

   real    0m21.806s
   user    0m12.106s
   sys     0m9.672s

   $ time ./db_bench [...] -skip_list_lookahead=2
   [...]
   DB path: [/dev/shm/rocksdbtest/dbbench]
   fillseekseq  :       0.153 micros/op 6540684 ops/sec;

   real    0m19.469s
   user    0m10.192s
   sys     0m9.252s

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb, march, lovro

Differential Revision: https://reviews.facebook.net/D23997
2014-10-07 11:48:23 -07:00
Igor Canadi f78b832e5d Log RocksDB version
Summary: This will be much easier than reviewing git sha's we currently have in our LOGs

Test Plan: none

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24591
2014-10-07 10:40:57 -07:00
Lei Jin 25f6a852e4 add db_test for changing memtable size
Summary:
The test only covers changing write_buffer_size. Other changable
parameters such bloom bits/probes are not obvious how to test.
Suggestions are welcome

Test Plan: db_test

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24429
2014-10-07 10:40:45 -07:00
Yueh-Hsuan Chiang 56dfd363fd Fix a check in database shutdown or Column family drop during flush.
Summary:
Fix a check in database shutdown or Column family drop during flush.

Special thanks to Maurice Barnum who spots the problem :)

Test Plan: db_test

Reviewers: ljin, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24273
2014-10-03 00:25:27 -07:00
sdong 8ea232b9e3 Add number of records dropped in compaction summary
Summary:
Add two stats to compaction summary:
1. Total input records from previous level
2. Total number of records dropped after compaction

Test Plan: See outputs of printing when runnning locally

Reviewers: ljin, igor, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24411
2014-10-02 17:54:25 -07:00
sdong f4086a88b4 perf_context.get_from_output_files_time is set for MultiGet() and ReadOnly DB too.
Summary: perf_context.get_from_output_files_time is now only set writable DB's DB::Get(). Extend it to MultiGet() and read only DB.

Test Plan:
make all check
Fix perf_context_test and extend it to cover MultiGet(), as long as read-only DB. Run it and watch the results

Reviewers: ljin, yhchiang, igor

Reviewed By: igor

Subscribers: rven, leveldb

Differential Revision: https://reviews.facebook.net/D24207
2014-10-02 17:02:50 -07:00
Nik Bougalis a213971d8a Don't return (or dereference) dangling pointer 2014-10-02 14:33:16 -07:00
Igor Canadi d6987216c9 Merge pull request #327 from dalgaaf/wip-da-SCA-20141001
Fix some issues from SCA
2014-10-02 10:59:52 -07:00
Yueh-Hsuan Chiang 89833e5a85 Fixed signed-unsigned comparison warning in db_test.cc
Summary:
Fixed signed-unsigned comparison warning in db_test.cc

db/db_test.cc:8606:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<int, unsigned long>' requested here
  ASSERT_EQ(2, metadata.size());
    ^

Test Plan:
make db_test
2014-10-02 01:05:59 -07:00
Yueh-Hsuan Chiang fcac705f95 Fixed compile warning on Mac caused by unused variables.
Summary:
Fixed compile warning caused by unused variables.

./db/compaction_picker.h:118:7: error: private field 'max_grandparent_overlap_factor_' is not used [-Werror,-Wunused-private-field]
  int max_grandparent_overlap_factor_;
      ^
./db/compaction_picker.h:119:7: error: private field 'expanded_compaction_factor_' is not used [-Werror,-Wunused-private-field]
  int expanded_compaction_factor_;
      ^
2 errors generated.

Test Plan:
make db_test
2014-10-02 01:03:08 -07:00
Tomislav Novak 187b29938c ForwardIterator: update prev_key_ only if prefix hasn't changed
Summary:
Since ForwardIterator is on a level below DBIter, the latter may call Next() on
it (e.g. in order to skip deletion markers). Since this also updates
`prev_key_`, it may prevent the Seek() optimization.

For example, assume that there's only one SST file and it contains the following
entries: 0101, 0201 (`ValueType::kTypeDeletion`, i.e. a tombstone record), 0201
(`kTypeValue`), 0202. Memtable is empty. `Seek(0102)` will result in `prev_key_`
being set to `0201` instead of `0102`, since `DBIter::Seek()` will call
`ForwardIterator::Next()` to skip record 0201. Therefore, when `Seek(0102)` is
called again, `NeedToSeekImmutable()` will return true.

This fix relies on `prefix_extractor_` to detect prefix changes. `prev_key_` is
only set to `current_->key()` as long as they have the same prefix.

I also made a small change to `NeedToSeekImmutable()` so it no longer returns
true when the db is empty (i.e. there's nothing but a memtable).

Test Plan:
   $ TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test

Reviewers: sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23823
2014-10-01 17:10:48 -07:00
Lei Jin 5ec53f3edf make compaction related options changeable
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.

Test Plan: make all check

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23349
2014-10-01 16:19:16 -07:00
Danny Al-Gaaf 4a171882d6 db/version_set.cc: remove unnecessary checks
Fix for:

[db/version_set.cc:1219]: (style) Unsigned variable 'last_file'
 can't be negative so it is unnecessary to test it.
[db/version_set.cc:1234]: (style) Unsigned variable 'first_file'
 can't be negative so it is unnecessary to test it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-10-01 11:09:22 +02:00
Danny Al-Gaaf 091153493c db/db_test.cc: remove unused variable
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-10-01 10:49:09 +02:00
Danny Al-Gaaf 5abd8add7d db/deletefile_test.cc: remove unused variable
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-10-01 10:49:08 +02:00
Danny Al-Gaaf d6483af870 db/db_test.cc: reduce scope of some variables
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-10-01 10:49:08 +02:00
Danny Al-Gaaf 44cca0cd8f db/db_iter.cc: remove unused variable
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-10-01 10:49:08 +02:00
Danny Al-Gaaf 8ee75dca2e db/memtable.cc: remove unused variable merge_result
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:33 +02:00
Danny Al-Gaaf 0fd8bbca53 db/db_impl.cc: reduce scope of prefix_initialized
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:33 +02:00
Danny Al-Gaaf 676ff7b1fb compaction_picker.cc: remove check for >=0 for unsigned
Fix for:

[db/compaction_picker.cc:923]: (style) Unsigned variable
 'start_index' can't be negative so it is unnecessary to test it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:33 +02:00
Danny Al-Gaaf 33580fa39a db/db_impl.cc: fix object handling, remove double lines
Fix for:

[db/db_impl.cc:4039]: (error) Instance of 'StopWatch' object is
 destroyed immediately.
[db/db_impl.cc:4042]: (error) Instance of 'StopWatch' object is
 destroyed immediately.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:32 +02:00
Danny Al-Gaaf b8b7117e97 db/version_set.cc: use !empty() instead of 'size() > 0'
Use empty() since it should be prefered as it has, following
the standard, a constant time complexity regardless of the
containter type. The same is not guaranteed for size().

Fix for:
[db/version_set.cc:2250]: (performance) Possible inefficient
 checking for 'column_families_not_found' emptiness.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:31 +02:00
Danny Al-Gaaf 53910ddb15 db_test.cc: pass parameter by reference
Fix for:

[db/db_test.cc:6141]: (performance) Function parameter
 'key' should be passed by reference.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:31 +02:00
Danny Al-Gaaf 68ca534169 corruption_test.cc: pass parameter by reference
Fix for:

[db/corruption_test.cc:134]: (performance) Function parameter
 'fname' should be passed by reference.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 23:30:31 +02:00
Danny Al-Gaaf 7506198da2 cuckoo_table_db_test.cc: add flush after delete
It seems that a FlushMemTable() call is needed in the
Uint64Comparator test after call Delete(). Otherwise the later
via Put() added keys get lost with the next FlushMemTable()
call before the check.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2014-09-30 17:53:49 +02:00
Mark Callaghan 1f963305a8 Print MB per second compaction throughput separately for reads and writes
Summary:
From this line there used to be one column (MB/sec) that includes reads and writes. This change splits it and for real workloads the rd and wr rates might not match when keys are dropped.
2014/09/29-17:31:01.213162 7f929fbff700 (Original Log Time 2014/09/29-17:31:01.180025) [default] compacted to: files[2 5 0 0 0 0 0], MB/sec: 14.0 rd, 14.0 wr, level 1, files in(4, 0) out(5) MB in(8.5, 0.0) out(8.5), read-write-amplify(2.0) write-amplify(1.0) OK

Test Plan:
make check, grepped LOG

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Differential Revision: https://reviews.facebook.net/D24237
2014-09-29 17:51:40 -07:00
mike@arpaia.co f0f7955497 Fixing comile errors on OS X
Summary: Building master on OS X has some compile errors due to implicit type conversions which generate warnings which RocksDB's build settings raise as errors.

Test Plan: It compiles!

Reviewers: ljin, igor

Reviewed By: ljin

Differential Revision: https://reviews.facebook.net/D24135
2014-09-29 16:05:25 -07:00
Mark Callaghan 747523d241 Print per column family metrics in db_bench
Summary: see above

Test Plan:
make check, ran db_bench and looked at output

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Differential Revision: https://reviews.facebook.net/D24189
2014-09-29 15:47:05 -07:00
erik 827e31c746 Make test use a compatible type in the size checks. 2014-09-29 14:52:16 -07:00
Lei Jin fd5d80d55e CompactedDB: log using the correct info_log
Summary:
info_log from supplied Options can be nullptr. Using the one from
db_impl. Also call flush after that since no more loggging will happen
and LOG can contain partial output

Test Plan: verified with db_bench

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24183
2014-09-29 12:45:04 -07:00
Lei Jin 2faf49d5f1 use GetContext to replace callback function pointer
Summary:
Intead of passing callback function pointer and its arg on Table::Get()
interface, passing GetContext. This makes the interface cleaner and
possible better perf. Also adding a fast pass for SaveValue()

Test Plan: make all check

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24057
2014-09-29 11:09:09 -07:00
sdong 389edb6b1b universal compaction picker: use double for potential overflow
Summary: There is a possible overflow case in universal compaction picker. Use double to make the logic straight-forward

Test Plan: make all check

Reviewers: yhchiang, igor, MarkCallaghan, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23817
2014-09-26 16:17:05 -07:00
Lei Jin fbd2dafc9f CompactedDBImpl::MultiGet() for better CuckooTable performance
Summary:
Add the MultiGet API to allow prefetching.
With file size of 1.5G, I configured it to have 0.9 hash ratio that can
fill With 115M keys and result in 2 hash functions, the lookup QPS is
~4.9M/s  vs. 3M/s for Get().
It is tricky to set the parameters right. Since files size is determined
by power-of-two factor, that means # of keys is fixed in each file. With
big file size (thus smaller # of files), we will have more chance to
waste lot of space in the last file - lower space utilization as a
result. Using smaller file size can improve the situation, but that
harms lookup speed.

Test Plan: db_bench

Reviewers: yhchiang, sdong, igor

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23673
2014-09-25 13:34:51 -07:00