Commit graph

1236 commits

Author SHA1 Message Date
Igor Canadi dee91c259d WriteThread
Summary: This diff just moves the write thread control out of the DBImpl. I will need this as I will control column family data concurrency by only accessing some data in the write thread. That way, we won't have to lock our accesses to column family hash table (mappings from IDs to CFDs).

Test Plan: make check

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23301
2014-09-12 16:23:58 -07:00
Igor Canadi 540a257f2c Fix WAL synced
Summary: Uhm...

Test Plan: nope

Reviewers: sdong, yhchiang, tnovak, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23343
2014-09-12 16:15:29 -07:00
Chilledheart 49fe329e5e Fix build issue under macosx 2014-09-13 05:05:22 +08:00
Feng Zhu 0352a9fa91 add_wrapped_bloom_test
Summary:
1. wrap a filter policy like what fbcode/multifeed/rocksdb/MultifeedRocksDbKey.h
   to ensure that rocksdb works fine after filterpolicy interface change

Test Plan: 1. valgrind ./bloom_test

Reviewers: ljin, igor, yhchiang, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23229
2014-09-11 16:33:46 -07:00
Igor Canadi 9c0e66ce98 Don't run background jobs (flush, compactions) when bg_error_ is set
Summary:
If bg_error_ is set, that means that we mark DB read only. However, current behavior still continues the flushes and compactions, even though bg_error_ is set.

On the other hand, if bg_error_ is set, we will return Status::OK() from CompactRange(), although the compaction didn't actually succeed.

This is clearly not desired behavior. I found this when I was debugging t5132159, although I'm pretty sure these aren't related.

Also, when we're shutting down, it's dangerous to exit RunManualCompaction(), since that will destruct ManualCompaction object. Background compaction job might still hold a reference to manual_compaction_ and this will lead to undefined behavior. I changed the behavior so that we only exit RunManualCompaction when manual compaction job is marked done.

Test Plan: make check

Reviewers: sdong, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23223
2014-09-11 16:24:16 -07:00
Igor Canadi a9639bda84 Fix valgrind test
Summary: Get valgrind to stop complaining about uninitialized value

Test Plan: valgrind not complaining anymore

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23289
2014-09-11 15:36:30 -07:00
Igor Canadi d1f24dc7ee Relax FlushSchedule test
Summary: The test makes sure that we don't call flush too often. For that, it's ok to check if we have less than 10 table files. Otherwise, the test is flaky because it's hard to estimate number of entries in the memtable before it gets flushed (any ideas?)

Test Plan: Still works, but hopefully less flaky.

Reviewers: ljin, sdong, yhchiang

Reviewed by: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23241
2014-09-11 11:00:45 -07:00
Igor Canadi 3d9e6f7759 Push model for flushing memtables
Summary:
When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path.

There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.

Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.

Reviewers: yhchiang, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23067
2014-09-10 18:46:09 -07:00
Igor Canadi 059e584dd3 [unit test] CompactRange should fail if we don't have space
Summary:
See t5106397.

Also, few more changes:
1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it.
2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption

Test Plan: make check

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23145
2014-09-10 17:00:00 -07:00
Igor Canadi a52cecb56c Fix Mac compile 2014-09-09 18:42:35 -07:00
Jonah Cohen 092f97e219 Fix comments and typos
Summary: Correct some comments and typos in RocksDB.

Test Plan: Inspection

Reviewers: sdong, igor

Reviewed By: igor

Differential Revision: https://reviews.facebook.net/D23133
2014-09-09 15:20:49 -07:00
Igor Canadi 0a42295a24 Fix SimpleWriteTimeoutTest
Summary:
In column family's SanitizeOptions() [1], we make sure that min_write_buffer_number_to_merge is normal value. However, this test depended on the fact that setting min_write_buffer_number_to_merge to be bigger than max_write_buffer_number will cause a deadlock. I'm not sure how it worked before.

This diff fixes it by scheduling sleeping background task, which will actually block any attempts of flushing.

[1] https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L104

Test Plan: the test works now

Reviewers: yhchiang, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23103
2014-09-09 11:50:05 -07:00
sdong 06d986252a Always pass MergeContext as pointer, not reference
Summary: To follow the coding convention and make sure when passing reference as a parameter it is also const, pass MergeContext as a pointer to mem tables.

Test Plan: make all check

Reviewers: ljin, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yhchiang

Differential Revision: https://reviews.facebook.net/D23085
2014-09-09 11:37:32 -07:00
Stanislau Hlebik d343c3fe46 Improve db recovery
Summary: Avoid creating unnecessary sst files while db opening

Test Plan: make all check

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: zagfox, yhchiang, ljin, leveldb

Differential Revision: https://reviews.facebook.net/D20661
2014-09-09 11:18:50 -07:00
Lei Jin 52311463e9 MemTableOptions
Summary: removed reference to options in WriteBatch and DBImpl::Get()

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23049
2014-09-08 18:46:52 -07:00
Lei Jin 171d4ff4a2 remove TailingIterator reference in db_impl.h
Summary: as title

Test Plan: make release

Reviewers: igor

Differential Revision: https://reviews.facebook.net/D23073
2014-09-08 15:39:53 -07:00
Lei Jin 9b0f7ffa1c rename version_set options_ to db_options_ to avoid confusion
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23007
2014-09-08 15:25:01 -07:00
Igor Canadi 2d57828d0e Check stop level trigger-0 before slowdown level-0 trigger
Summary: ...

Test Plan: Can't repro the test failure, but let's see what jenkins says

Reviewers: zagfox, sdong, ljin

Reviewed By: sdong, ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23061
2014-09-08 15:23:58 -07:00
Lei Jin 659d2d50c3 move compaction_filter to immutable_options
Summary:
all shared_ptrs are in immutable_options now. This will also make
options assignment a little cheaper

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23001
2014-09-08 15:09:25 -07:00
Lei Jin 048560a642 reduce references to cfd->options() in DBImpl
Summary:
I found it is almost impossible to get rid of this function in a single
batch. I will take a step by step approach

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22995
2014-09-08 15:04:34 -07:00
sdong 011241bb99 DB::Flush() Do not wait for background threads when there is nothing in mem table
Summary:
When we have multiple column families, users can issue Flush() on every column families to make sure everything is flushes, even if some of them might be empty. By skipping the waiting for empty cases, it can be greatly speed up.

Still wait for people's comments before writing unit tests for it.

Test Plan: Will write a unit test to make sure it is correct.

Reviewers: ljin, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D22953
2014-09-08 13:40:42 -07:00
Igor Canadi a2bb7c3c33 Push- instead of pull-model for managing Write stalls
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes

The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).

When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.

This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.

Test Plan: make check for now. I'll add some unit tests later. Also, perf test.

Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22791
2014-09-08 11:20:25 -07:00
Feng Zhu 0af157f9bf Implement full filter for block based table.
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.

2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.

3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.

4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.

5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc

Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.

Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh

Reviewers: igor, yhchiang, ljin, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 10:37:05 -07:00
Igor Canadi 9360cc690e Fix valgrind issue 2014-09-08 08:01:25 -07:00
Igor Canadi 9f1c80b556 Drop column family from write thread
Summary: If we drop column family only from (single) write thread, we can be sure that nobody will drop the column family while we're writing (and our mutex is released). This greatly simplifies my patch that's getting rid of MakeRoomForWrite().

Test Plan: make check, but also running stress test

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22965
2014-09-05 15:20:05 -07:00
Igor Canadi 8de151bb99 Add db_bench with lots of column families to regression tests
Summary:
That way we can see when this graph goes up and be happy.

Couple of changes:
1. title
2. fix db_bench to delete column families before deleting the DB. this was asserting when compiled in debug mode
3. don't sync manifest when disableDataSync. We discussed this offline. I can move it to separate diff if you'd like

Test Plan: ran it

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22815
2014-09-05 14:20:18 -07:00
Lei Jin c9e419ccb6 rename options_ to db_options_ in DBImpl to avoid confusion
Summary: as title

Test Plan: make release

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22935
2014-09-05 11:48:17 -07:00
Radheshyam Balasundaram 5cd0576ffe Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method.
Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests.

Test Plan:
make check all
Also ran db_bench to generate multiple files.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22743
2014-09-05 11:18:01 -07:00
Raghav Pisolkar 0fbb3facc0 fixed memory leak in unit test DBIteratorBoundTest
Summary: fixed memory leak in unit test DBIteratorBoundTest

Test Plan: ran valgrind test on my unit test

Reviewers: sdong

Differential Revision: https://reviews.facebook.net/D22911
2014-09-05 10:35:28 -07:00
Lei Jin adcd2532ca fix asan check
Summary:
PlainTable takes reference instead of a copy. Keep a copy in the test
code

Test Plan: make asan_check

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22899
2014-09-05 09:53:04 -07:00
liuhuahang bb6ae0f80c fix more compile warnings
N/A

Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-05 14:14:37 +08:00
Nik Bougalis 4329d74e05 Fix swapped variable names to accurately reflect usage 2014-09-04 20:09:45 -07:00
Stanislau Hlebik 45a5e3ede0 Remove path with arena==nullptr from NewInternalIterator
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator

Test Plan:
make all check
make valgrind_check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22395
2014-09-04 17:40:41 -07:00
Lei Jin 5665e5e285 introduce ImmutableOptions
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.

ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.

I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D22545
2014-09-04 16:18:36 -07:00
Raghav Pisolkar e0b99d4f5d created a new ReadOptions parameter 'iterate_upper_bound' 2014-09-04 11:00:16 -07:00
liuhuahang ef5b384729 fix a few compile warnings
1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]

2, class HistogramImpl has virtual functions and thus should have a virtual destructor

3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define

Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-04 23:06:23 +08:00
Lei Jin 9b58c73c7c call SanitizeDBOptionsByCFOptions() in the right place
Summary: It only covers Open() with default column family right now

Test Plan: make release

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22467
2014-09-02 14:42:23 -07:00
Igor Canadi a84234a61b Ignore missing column families
Summary:
Before this diff, whenever we Write to non-existing column family, Write() would fail.

This diff adds an option to not fail a Write() when WriteBatch points to non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to worry about checking if all indexes are alive on every write. They don't care if they lose writes to dropped index.

Test Plan: added a small unit test

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22143
2014-09-02 13:29:05 -07:00
Igor Canadi 7f19bb93c6 Merge pull request #242 from tdfischer/perf-timer-destructors
Refactor PerfStepTimer to automatically stop on destruct
2014-09-02 13:06:40 -07:00
Feng Zhu 8438a19360 fix dropping column family bug
Summary: 1. db/db_impl.cc:2324 (DBImpl::BackgroundCompaction) should not raise bg_error_ when column family is dropped during compaction.

Test Plan: 1. db_stress

Reviewers: ljin, yhchiang, dhruba, igor, sdong

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22653
2014-09-02 12:25:58 -07:00
Torrie Fischer 6614a48418 Refactor PerfStepTimer to stop on destruct
This eliminates the need to remember to call PERF_TIMER_STOP when a section has
been timed. This allows more useful design with the perf timers and enables
possible return value optimizations. Simplistic example:

class Foo {
  public:
    Foo(int v) : m_v(v);
  private:
    int m_v;
}

Foo makeFrobbedFoo(int *errno)
{
  *errno = 0;
  return Foo();
}

Foo bar(int *errno)
{
  PERF_TIMER_GUARD(some_timer);

  return makeFrobbedFoo(errno);
}

int main(int argc, char[] argv)
{
  Foo f;
  int errno;

  f = bar(&errno);

  if (errno)
    return -1;
  return 0;
}

After bar() is called, perf_context.some_timer would be incremented as if
Stop(&perf_context.some_timer) was called at the end, and the compiler is still
able to produce optimizations on the return value from makeFrobbedFoo() through
to main().
2014-09-02 12:04:22 -07:00
Igor Canadi 990df99a61 Fix ios compile
Summary: We need to set contbuild for this :)

Test Plan: compiles

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22701
2014-09-02 10:50:15 -07:00
Igor Canadi 7dcadb1d37 Don't let flush preempt compaction in certain cases
Summary:
I have an application configured with 16 background threads. Write rates are high. L0->L1 compactions is very slow and it limits the concurrency of the system. While it's happening, other 15 threads are idle. However, when there is a need of a flush, that one thread busy with L0->L1 is doing flush, instead of any other 15 threads that are just sitting there.

This diff prevents that. If there are threads that are idle, we don't let flush preempt compaction.

Test Plan: Will run stress test

Reviewers: ljin, sdong, yhchiang

Reviewed By: sdong, yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D22299
2014-09-02 08:34:54 -07:00
Nik Bougalis f09329cb01 Fix candidate file comparison when using path ids 2014-08-31 00:54:15 -07:00
Tomislav Novak 0f9c43ea36 ForwardIterator: reset incomplete iterators on Seek()
Summary:
When reading from kBlockCacheTier, ForwardIterator's internal child iterators
may end up in the incomplete state (read was unable to complete without doing
disk I/O). `ForwardIterator::status()` will correctly report that; however, the
iterator may be stuck in that state until all sub-iterators are rebuilt:

  * `NeedToSeekImmutable()` may return false even if some sub-iterators are
    incomplete
  * one of the child iterators may be an empty iterator without any state other
    that the kIncomplete status (created using `NewErrorIterator()`); seeking on
    any such iterator has no effect -- we need to construct it again

Akin to rebuilding iterators after a superversion bump, this diff makes forward
iterator reset all incomplete child iterators when `Seek()` or `Next()` are
called.

Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test

Reviewers: igor, sdong, ljin

Reviewed By: ljin

Subscribers: lovro, march, leveldb

Differential Revision: https://reviews.facebook.net/D22575
2014-08-29 16:21:29 -07:00
Lei Jin 722d80c374 reduce recordTick overhead in compaction loop
Summary: It is too expensive to bump ticker to every key/vaue pair

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22527
2014-08-29 09:51:09 -07:00
Igor Canadi 0c26e76b28 Merge pull request #237 from tdfischer/tdfischer/faster-timeout-test
test: db: fix test to have a smaller timeout for when it runs on faster ...
2014-08-28 20:40:10 -04:00
Feng Zhu 1d23b5c470 remove_internal_filter_policy
Summary:
1. remove class InternalFilterPolicy in db/dbformat.h
2. Transformation from internal key to user key is done in filter_block.cc
3. This is a preparation for patch D20979

Test Plan:
make all check
valgrind ./db_test

Reviewers: igor, yhchiang, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22509
2014-08-28 17:06:29 -07:00
Igor Canadi d977e55596 Don't let other compactions run when manual compaction runs
Summary:
Based on discussions from t4982833. This is just a short-term fix, I plan to revamp manual compaction process as part of t4982812.

Also, I think we should schedule automatic compactions at the very end of manual compactions, not when we're done with one level. I made that change as part of this diff. Let me know if you disagree.

Test Plan: make check for now

Reviewers: sdong, tnovak, yhchiang, ljin

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22401
2014-08-28 13:06:28 -04:00
Igor Canadi d5bd6c772b Fix ios compile
Summary: No __thread for ios.

Test Plan: compile works for ios now

Reviewers: ljin, dhruba

Reviewed By: dhruba

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22491
2014-08-28 12:46:05 -04:00
Radheshyam Balasundaram 4142a3e783 Adding a user comparator for comparing Uint64 slices.
Summary:
- New Uint64 comparator
- Modify Reader and Builder to take custom user comparators instead of bytewise comparator
- Modify logic for choosing unused user key in builder
- Modify iterator logic in reader
- test changes

Test Plan:
cuckoo_table_{builder,reader,db}_test
make check all

Reviewers: ljin, sdong

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D22377
2014-08-27 10:39:31 -07:00
Radheshyam Balasundaram b6fd7811eb Don't do memtable lookup in db_impl_readonly if memtables are empty while opening db.
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.

Test Plan:
db_test
make check all

Reviewers: sdong, yhchiang, ljin, igor

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22281
2014-08-26 17:19:03 -07:00
Stanislau Hlebik 9dcb75b6d9 Add is-file-deletions-enabled property
Summary:
Add property 'rocksdb.is-file-deletions-enable'
	 which equals disable_delete_obsole_file_

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22119
2014-08-26 16:26:29 -07:00
Lei Jin 1755581f19 improve OptimizeForPointLookup()
Summary: also fix HISTORY.md

Test Plan: make all check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22437
2014-08-26 14:15:00 -07:00
Lei Jin bda6f3363d fix valgrind error in c_test caused by BlockBasedTableOptions
Summary:
It was creating BlockBasedTableOptions object in a loop without calling
destroy()

Test Plan: valgrind ./c_test --leak-check=full --show-reachable=yes

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22431
2014-08-26 09:57:25 -07:00
Torrie Fischer 0db6b028e7 Update timeout to 50ms instead of 3. 2014-08-26 09:38:45 -07:00
Lei Jin 23861857c4 ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled
Summary: as title

Test Plan: table_test

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22239
2014-08-25 16:14:30 -07:00
Lei Jin a98badff16 print table options
Summary: Add a virtual function in table factory that will print table options

Test Plan: make release

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22149
2014-08-25 14:24:09 -07:00
Lei Jin 384400128f move block based table related options BlockBasedTableOptions
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21915
2014-08-25 14:22:05 -07:00
Igor Canadi 42ea795209 Fix concurrency issue in CompactionPicker
Summary:
I am currently working on a project that uses RocksDB. While debugging some perf issues, I came up across interesting compaction concurrency issue. Namely, I had 15 idle threads and a good comapction to do, but CompactionPicker returned "Compaction nothing to do". Here's how Internal stats looked:

    2014/08/22-08:08:04.551982 7fc7fc3f5700 ------- DUMPING STATS -------
    2014/08/22-08:08:04.552000 7fc7fc3f5700
    ** Compaction Stats [default] **
    Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      L0     7/5        353   1.0      0.0     0.0      0.0       2.3      2.3    0.0   0.0      0.0      9.4        0         0         0         0        247        46    5.359       8.53          1 8526.25
      L1     2/2         86   1.3      2.6     1.9      0.7       2.6      1.9    2.7   1.3     24.3     24.0       39        19        71        52        109        11    9.938       0.00          0    0.00
      L2    26/0        833   1.3      5.7     1.7      4.0       5.2      1.2    6.3   3.0     15.6     14.2       47       112       147        35        373        44    8.468       0.00          0    0.00
      L3    12/0        505   0.1      0.0     0.0      0.0       0.0      0.0    0.0   0.0      0.0      0.0        0         0         0         0          0         0    0.000       0.00          0    0.00
     Sum    47/7       1778   0.0      8.3     3.6      4.6      10.0      5.4    8.1   4.4     11.6     14.1       86       131       218        87        728       101    7.212       8.53          1 8526.25
     Int     0/0          0   0.0      2.4     0.8      1.6       2.7      1.2   11.5   6.1     12.0     13.6       20        43        63        20        203        23    8.845       0.00          0    0.00
    Flush(GB): accumulative 2.266, interval 0.444
    Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 8.526 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard
    Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 1 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

    ** DB Stats **
    Uptime(secs): 336.8 total, 60.4 interval
    Cumulative writes: 61584000 writes, 6480589 batches, 9.5 writes per batch, 1.39 GB user ingest
    Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written
    Interval writes: 11235257 writes, 1175050 batches, 9.6 writes per batch, 259.9 MB user ingest
    Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written

To see what happened, go here: 47b452cfcf/db/compaction_picker.cc (L430)
* The for loop started with level 1, because it has the worst score.
* PickCompactionBySize on L429 returned nullptr because all files were being compacted
* ExpandWhileOverlapping(c) returned true (because that's what it does when it gets nullptr!?)
* for loop break-ed, never trying compactions for level 2 :( :(

This bug was present at least since January. I have no idea how we didn't find this sooner.

Test Plan:
Unit testing compaction picker is hard. I tested this by running my service and observing L0->L1 and L2->L3 compactions in parallel. However, for long-term, I opened the task #4968469. @yhchiang is currently refactoring CompactionPicker, hopefully the new version will be unit-testable ;)

Here's how my compactions look like after the patch:

    2014/08/22-08:50:02.166699 7f3400ffb700 ------- DUMPING STATS -------
    2014/08/22-08:50:02.166722 7f3400ffb700
    ** Compaction Stats [default] **
    Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      L0     8/5        404   1.5      0.0     0.0      0.0       4.3      4.3    0.0   0.0      0.0      9.6        0         0         0         0        463        88    5.260       0.00          0    0.00
      L1     2/2         60   0.9      4.8     3.9      0.8       4.7      3.9    2.4   1.2     23.9     23.6       80        23       131       108        204        19   10.747       0.00          0    0.00
      L2    23/3        697   1.0     11.6     3.5      8.1      10.9      2.8    6.4   3.1     17.7     16.6       95       242       317        75        669        92    7.268       0.00          0    0.00
      L3    58/14      2207   0.3      6.2     1.6      4.6       5.9      1.3    7.4   3.6     14.6     13.9       43       121       159        38        436        36   12.106       0.00          0    0.00
     Sum    91/24      3368   0.0     22.5     9.1     13.5      25.8     12.4   11.2   6.0     13.0     14.9      218       386       607       221       1772       235    7.538       0.00          0    0.00
     Int     0/0          0   0.0      3.2     0.9      2.3       3.6      1.3   15.3   8.0     12.4     13.7       24        66        89        23        266        27    9.838       0.00          0    0.00
    Flush(GB): accumulative 4.336, interval 0.444
    Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard
    Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

    ** DB Stats **
    Uptime(secs): 577.7 total, 60.1 interval
    Cumulative writes: 116960736 writes, 11966220 batches, 9.8 writes per batch, 2.64 GB user ingest
    Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written
    Interval writes: 11643735 writes, 1206136 batches, 9.7 writes per batch, 269.2 MB user ingest
    Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written

Yay for concurrent L0->L1 and L2->L3 compactions!

Reviewers: sdong, yhchiang, ljin

Reviewed By: yhchiang

Subscribers: yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D22305
2014-08-22 11:32:40 -07:00
Yueh-Hsuan Chiang 47b452cfcf Fix the error of c_test.c
Summary:
Fix the error of c_test.c

Test Plan:
make c_test
./c_test
2014-08-20 17:05:29 -07:00
Yueh-Hsuan Chiang 562b7a1f28 Add missing implementaiton of SanitizeDBOptions in simple_table_db_test.cc
Summary:
Add missing implementaiton of SanitizeDBOptions in simple_table_db_test.cc

Test Plan:
make simple_table_db_test.cc
2014-08-20 16:33:25 -07:00
Yueh-Hsuan Chiang 63a2215c63 Improve Options sanitization and add MmapReadRequired() to TableFactory
Summary:
Currently, PlainTable must use mmap_reads.  When PlainTable is used but
allow_mmap_reads is not set, rocksdb will fail in flush.

This diff improve Options sanitization and add MmapReadRequired() to
TableFactory.

Test Plan:
export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest
make db_test -j32
./db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: you, leveldb

Differential Revision: https://reviews.facebook.net/D21939
2014-08-20 15:53:39 -07:00
sdong 10720a5587 Revert the unintended change that DestroyDB() doesn't clean up info logs.
Summary: A previous change triggered a change by mistake: DestroyDB() will keep info logs under DB directory. Revert the unintended change.

Test Plan: Add a unit test case to verify it.

Reviewers: ljin, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22209
2014-08-20 12:07:32 -07:00
Torrie Fischer 7c5173d27f test: db: fix test to have a smaller timeout for when it runs on faster hardware 2014-08-19 13:45:12 -07:00
Radheshyam Balasundaram 162b8151f1 Adding Column Family support in db_bench.
Summary:
Adding num_column_families flag. Adding support for column families in DoWrite and ReadRandom methods.
[Igor, please let me know if this approach sounds good. I shall add it to other methods too.]

Test Plan: Ran fillseq on 1M keys and 10 Column families and ran readrandom.

Reviewers: sdong, yhchiang, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21387
2014-08-18 18:15:01 -07:00
sdong 28b5c76004 WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index
Summary:
Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.

WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.

Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.

I will add more unit tests if people are OK about this API.

Test Plan:
make all check
Add unit tests.

Reviewers: yhchiang, igor, MarkCallaghan, ljin

Reviewed By: ljin

Subscribers: dhruba, leveldb, xjin

Differential Revision: https://reviews.facebook.net/D21381
2014-08-18 16:37:38 -07:00
Radheshyam Balasundaram 36e759d199 Adding Cuckoo Table SST option to db_bench
Summary: Adding flags to use cuckoo table SST in db_bench.cc

Test Plan: Ran benchmark with fillseq and readrandom

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21729
2014-08-18 11:59:38 -07:00
Igor Canadi a6fd14c881 Fix valgrind error in c_test 2014-08-18 11:08:51 -07:00
Igor Canadi c8ecfaedd0 Merge pull request #230 from cockroachdb/spencerkimball/send-user-keys-to-v2-filter
Pass parsed user key to prefix extractor in V2 compaction
2014-08-18 11:09:30 -04:00
Yueh-Hsuan Chiang 570ba5aca8 Avoid retrying to read property block from a table when it does not exist.
Summary:
Avoid retrying to read property block from a table when it does not exist
in updating stats for compensating deletion entries.

In addition, ReadTableProperties() now returns Status::NotFound instead
of Status::Corruption when table properties does not exist in the file.

Test Plan:
make db_test -j32
export ROCKSDB_TESTS=CompactionDeleteionTrigger
./db_test

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21867
2014-08-15 12:17:44 -07:00
sdong 58b0f9d890 Support purging logs from separate log directory
Summary:
1. Support purging info logs from a separate paths from DB path. Refactor the codes of generating info log prefixes so that it can be called when generating new files and scanning log directory.
2. Fix the bug of not scanning multiple DB paths (should only impact multiple DB paths)

Test Plan:
Add unit test for generating and parsing info log files
Add end-to-end test in db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb, igor, dhruba

Differential Revision: https://reviews.facebook.net/D21801
2014-08-14 13:22:50 -07:00
Lei Jin 58c49466d2 Allow env_posix to lower background thread IO priority
Summary: This is a linux-specific system call.

Test Plan: ran db_bench

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: haobo, leveldb

Differential Revision: https://reviews.facebook.net/D21183
2014-08-13 20:49:58 -07:00
Lei Jin 5a5953b388 Add histogram for DB_SEEK
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21717
2014-08-13 15:56:37 -07:00
Feng Zhu 5e642403a9 log db path info before open
Summary: 1. write db MANIFEST, CURRENT, IDENTITY, sst files, log files to log before open

Test Plan: run db and check LOG file

Reviewers: ljin, yhchiang, igor, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21459
2014-08-13 13:45:13 -07:00
Stanislau Hlebik 0c9dc9f8e0 Remove malloc from FormatFileNumber
Summary: Replace unnecessary malloc with stack allocation

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21771
2014-08-13 11:57:40 -07:00
sdong 48081777f3 Revert "Include candidate files under options.db_log_dir in FindObsoleteFiles()"
This reverts commit 54153ab07a.
2014-08-12 18:14:27 -07:00
Yueh-Hsuan Chiang 0138b8eba8 Fixed compile errors (signed / unsigned comparison) in cuckoo_table_db_test on Mac
Summary:
Fixed compile errors (signed / unsigned comparison) in cuckoo_table_db_test on Mac

Test Plan:
make cuckoo_table_db_test
2014-08-12 17:35:09 -07:00
Yueh-Hsuan Chiang 1562653ba0 Fixed a signed-unsigned comparison error in db_test
Summary:
Fixed a signed-unsigned comparison error in db_test

Test Plan:
make db_test
2014-08-12 17:26:47 -07:00
Lei Jin 218857b3f5 remove tailing_iter.h/cc
Summary: as title

Test Plan:
make all check
ran db_bench and saw seek stats at the end

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21651
2014-08-12 17:13:15 -07:00
Lei Jin 5d0074c471 set bytes_per_sync to 1MB if rate limiter is enabled
Summary: as title

Test Plan: make all check

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21201
2014-08-12 16:42:18 -07:00
Spencer Kimball 3fcf7b26b9 Pass parsed user key to prefix extractor in V2 compaction
Previously, the prefix extractor was being supplied with the RocksDB
key instead of a parsed user key. This makes correct interpretation
by calling application fragile or impossible.
2014-08-12 18:48:28 -04:00
Stanislau Hlebik 2fa643466d Add scope guard
Summary: Small change: replace mutex_.Lock/mutex_.Unlock() with scope guard

Test Plan: make all check

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21609
2014-08-12 12:13:13 -07:00
Stanislau Hlebik 06a52bda64 Flush only one column family
Summary:
Currently DBImpl::Flush() triggers flushes in all column families.
Instead we need to trigger just the column family specified.

Test Plan: make all check

Reviewers: igor, ljin, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20841
2014-08-11 22:10:32 -07:00
Radheshyam Balasundaram 9674c11d01 Integrating Cuckoo Hash SST Table format into RocksDB
Summary:
Contains the following changes:
- Implementation of cuckoo_table_factory
- Adding cuckoo table into AdaptiveTableFactory
- Adding cuckoo_table_db_test, similar to lines of plain_table_db_test
- Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key.
- Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access.

Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
cuckoo_table_db_test
make check all
make valgrind_check
make asan_check

Reviewers: sdong, igor, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21219
2014-08-11 20:21:07 -07:00
Siying Dong 18efdba8d5 Merge pull request #228 from miguelportilla/develop
Changes to support unity build:

Script for building the unity.cc file via Makefile
Unity executable Makefile target for testing builds
Source code changes to fix compilation of unity build
2014-08-11 11:10:23 -07:00
Feng Zhu d3f2ec694f check prefix_size when using hash search in db_bench
Summary:
1. Check prefix_size when enable use_hash_search in db_bench
2. Remove include/statistics.h in db_bench

Test Plan: ./db_bench --use_hash_search=1

Reviewers: ljin, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21375
2014-08-11 10:47:52 -07:00
miguelportilla 93e6b5e9d9 Changes to support unity build:
* Script for building the unity.cc file via Makefile
* Unity executable Makefile target for testing builds
* Source code changes to fix compilation of unity build
2014-08-11 13:22:47 -04:00
sdong 54153ab07a Include candidate files under options.db_log_dir in FindObsoleteFiles()
Summary: In FindObsoleteFiles(), we don't scan db_log_dir. Add it.

Test Plan: make all check

Reviewers: ljin, igor, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, yhchiang

Differential Revision: https://reviews.facebook.net/D21429
2014-08-08 17:37:03 -07:00
sdong 4632239d13 Need to schedule compactions when manual compaction finishes
Summary: If there is an outstanding compaction scheduled but at the time a manual compaction is triggered, the manual compaction will preempt. In the end of the manual compaction, we should try to schedule compactions to make sure those preempted ones are not skipped.

Test Plan: make all check

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb, dhruba, igor

Differential Revision: https://reviews.facebook.net/D21321
2014-08-08 12:28:36 -07:00
Igor Canadi 5e0868147d Fix SIGSEGV in travis
Summary:
Travis build was failing a lot. For example see https://travis-ci.org/facebook/rocksdb/builds/31425845

This fixes it.

Also, please don't put any code after SignalAll :)

Test Plan: no more SIGSEGV

Reviewers: yhchiang, sdong, ljin

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D21417
2014-08-08 10:24:00 -07:00
Stanislau Hlebik 7c88249f51 Fix db_test and DBIter
Summary: Fix old issue with DBTest.Randomized with BlockBasedTableWithWholeKeyHashIndex + added printing in DBTest.Randomized.

Test Plan: make all check

Reviewers: zagfox, igor, ljin, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21003
2014-08-08 09:44:14 -07:00
Igor Canadi 894a77abdf Fix leak in c_test 2014-08-07 15:06:52 -07:00
Igor Canadi 323d6e3542 Fix c_test 2014-08-07 14:29:38 -04:00
Igor Canadi f8d6a2981f Merge pull request #224 from cockroachdb/spencerkimball/compaction-filter-v2-c-bindings
Add support for C bindings to the compaction V2 filter mechanism.
2014-08-07 14:10:54 -04:00
sdong 76dcf7eefd Minor: fix a format
Summary: A format fixing

Test Plan: N/A

Reviewers: ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21255
2014-08-06 18:11:33 -07:00
sdong 7abe9655d3 Fix valgrind failure caused by recent checked-in.
Summary: Initialize un-initialized parameters

Test Plan: run the failed test (c_test)

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21249
2014-08-06 17:45:47 -07:00
Spencer Kimball 38e8b727a8 Fix typo, add missing inclusion of state void* in invocation of
create_compaction_filter_v2_.
2014-08-06 18:42:15 -04:00
Spencer Kimball c1f588af71 Add support for C bindings to the compaction V2 filter mechanism.
Test Plan: make c_test && ./c_test

Some fixes after merge.
2014-08-06 15:55:48 -04:00
sdong 1242bfcad7 Add DB property "rocksdb.estimate-table-readers-mem"
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.

Refactor the property codes to allow getting property from a version, with DB mutex not acquired.

Test Plan: Add several checks of this new property in existing codes for various cases.

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D20733
2014-08-06 11:39:46 -07:00