Commit graph

1286 commits

Author SHA1 Message Date
Igor Canadi 9d0577a6be Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/transaction_log_impl.cc
	db/transaction_log_impl.h
	include/rocksdb/options.h
	util/env.cc
	util/options.cc
2014-03-03 18:29:03 -08:00
Igor Canadi 5142b37000 Fix a group commit bug in LogAndApply
Summary:
EncodeTo(&record) does not overwrite, it appends to it.

This means that group commit log and apply will look something like:
record1
record1record2
record1record2record3

I'm surprised this didn't show up in production, but I think the reason is that MANIFEST group commit almost never happens.

This bug turned up in column family work, where opening a database failed with "adding a same column family twice".

Test Plan: Tested the change in column family branch and observed that the problem is gone (with db_stress)

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16461
2014-03-03 17:10:43 -08:00
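The append behavior described in 5142b37000 can be reproduced with a small sketch. The `Edit` type and the two helper functions below are hypothetical stand-ins, not the actual RocksDB classes, and the "fixed" variant is only one plausible way to avoid the problem; the real patch may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an edit object whose EncodeTo() appends
// to the destination string instead of overwriting it.
struct Edit {
  std::string payload;
  void EncodeTo(std::string* dst) const {
    assert(dst != nullptr);
    dst->append(payload);
  }
};

// Buggy pattern: a single string is reused for every edit in the group,
// so each written record also contains all the previous ones.
std::vector<std::string> EncodeAllBuggy(const std::vector<Edit>& edits) {
  std::vector<std::string> written;
  std::string record;
  for (const Edit& e : edits) {
    e.EncodeTo(&record);  // appends to whatever is already in record
    written.push_back(record);
  }
  return written;
}

// One possible fix: start from an empty string for every edit.
std::vector<std::string> EncodeAllFixed(const std::vector<Edit>& edits) {
  std::vector<std::string> written;
  for (const Edit& e : edits) {
    std::string record;
    e.EncodeTo(&record);
    written.push_back(record);
  }
  return written;
}
```

With three edits, the buggy variant yields the record1, record1record2, record1record2record3 sequence from the summary, while the fixed variant yields one record per edit.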
Igor Canadi 97eddef235 Reopen DB in crash test
Summary:
Why don't we automatically reopen the DB when running the crash test (which runs in our nightly build)? If I understand correctly, crashtest manually reopens the DB, but the DB does not check its consistency when you kill the db_stress process and then re-run it.
Does this make sense?

Test Plan: not reall

Reviewers: dhruba, haobo, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16167
2014-03-03 17:10:30 -08:00
Igor Canadi f9b2f0ad79 [CF] Fix CF bugs in WriteBatch
Summary:
This diff fixes two bugs:
* Increase the sequence number even if WriteBatch fails. This is important because WriteBatches in WAL logs have implicitly increasing sequence numbers, even if one update in a write batch fails. This caused some writes to get lost in my CF stress testing.
* Tolerate 'invalid column family' errors on recovery. When a column family is dropped, the WAL logs being processed can contain WriteBatches that still refer to the dropped column family. In the recovery environment, we want to ignore those errors. In the client's Write() code path, however, we want to return the failure to the client if they are trying to add data to an invalid column family.

Test Plan: db_stress's verification works now

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16533
2014-03-03 17:07:46 -08:00
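The two rules from f9b2f0ad79 can be sketched as follows. Everything here (`Update`, `ApplyResult`, `ApplyBatch`, the `in_recovery` flag) is a hypothetical simplification for illustration and not the actual WriteBatch code; in particular, continuing through the batch after a failure is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical update that only records which column family it targets.
struct Update {
  uint32_t cf_id;
};

struct ApplyResult {
  bool ok;                 // false if the client path hit an invalid CF
  uint64_t next_sequence;  // sequence number to use after this batch
  int applied;             // number of updates actually applied
};

// Applies a batch starting at `sequence`. Every update consumes one
// sequence number, whether or not it could be applied.
ApplyResult ApplyBatch(const std::vector<Update>& batch,
                       const std::set<uint32_t>& live_cfs,
                       uint64_t sequence, bool in_recovery) {
  ApplyResult result{true, sequence, 0};
  for (const Update& u : batch) {
    if (live_cfs.count(u.cf_id) != 0) {
      ++result.applied;
    } else if (!in_recovery) {
      // Client Write() path: report the invalid column family.
      result.ok = false;
    }
    // During recovery an unknown column family is silently skipped,
    // because it may simply have been dropped later in the WAL.
    ++result.next_sequence;  // advance even when the update failed
  }
  return result;
}
```

For a three-update batch where the middle update targets a dropped column family, both paths end at sequence + 3; only the client path reports a failure.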
kailiu 1aeafeccac Make the Create() function conform to the convention
Summary:

Moved "Return multiple values" to a more conventional way.
2014-03-01 23:43:03 -08:00
Kai Liu 16d4e45c12 Fix the memory leak in table index
Summary:

BinarySearchIndex didn't use unique_ptr to guard the block object nor
delete it in destructor, leading to valgrind failure for "definite
memory leak".

Test Plan:
re-ran the failed valgrind test cases
2014-03-01 11:50:35 -08:00
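The kind of leak described in 16d4e45c12 can be illustrated with a minimal sketch. `Block`, `LeakyIndex` and `GuardedIndex` are hypothetical names, not the real table index classes; the live-object counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical block type that counts live instances.
struct Block {
  static int live;
  Block() { ++live; }
  ~Block() { --live; }
};
int Block::live = 0;

// Leaky pattern: takes ownership of a raw pointer but neither wraps it
// in a smart pointer nor deletes it in a destructor.
class LeakyIndex {
 public:
  explicit LeakyIndex(Block* block) : block_(block) {}
  const Block* block() const { return block_; }

 private:
  Block* block_;  // never freed
};

// Guarded pattern: std::unique_ptr releases the block automatically
// when the index is destroyed.
class GuardedIndex {
 public:
  explicit GuardedIndex(Block* block) : block_(block) {}
  const Block* block() const { return block_.get(); }

 private:
  std::unique_ptr<Block> block_;
};
```

Destroying a `GuardedIndex` brings `Block::live` back to zero, whereas destroying a `LeakyIndex` would leave the block allocated, which is what valgrind reports as a definite leak.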
Kai Liu ff151132b3 Fix the unit test failure in devbox
Summary:
My last diff was developed on MacOS, but an error occurs in the devserver environment.

I dug into the problem and found that the way we calculate approximate data size is pretty out-of-date. We can use table properties to get more accurate results.

Test Plan: ran ./table_test and passed

Reviewers: igor, dhruba, haobo, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16509
2014-02-28 20:40:05 -08:00
kailiu 74939a9e13 Make the block-based table's index pluggable
Summary:
This patch introduces a new table option that allows us to make
block-based table's index pluggable.

To support that new feature:

* Code has been refactored to be more flexible and to support this option well.
* More documentation is added for the existing obscure functionality.
* Major surgery on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.

The pluggability will mostly affect development of internal modules
and won't change frequently. As a result, I intentionally avoided
heavy-weight patterns (like factory) and tried to keep it simple.

Test Plan: make all check

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16395
2014-02-28 18:19:07 -08:00
kailiu bf86af5174 Remove the terrible hack for flush_block_policy_factory
Summary:
The previous code was too convoluted, and I must have been drunk to let
such code be written without a second thought.

Thanks to the discussion with @sdong, I added the `Options` when
generating the flusher, thus avoiding the tricks.

Just FYI: I resisted adding Options in flush_block_policy.h since I
wanted to avoid cyclic dependencies: FlushBlockPolicy depends on Options
and Options also depends on FlushBlockPolicy... While I appreciate my
effort to prevent it, the old design turned out to create more trouble than
it tried to avoid.

Test Plan: ran ./table_test

Reviewers: sdong

Reviewed By: sdong

CC: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D16503
2014-02-28 16:39:27 -08:00
Igor Canadi 8ea21a778b [CF] Rethink LogAndApply for column families
Summary:
I thought I might get away with as few changes to LogAndApply() as possible. It turns out this is not the case.

This diff introduces different behavior of LogAndApply() for three cases:
1. column family add
2. column family drop
3. no-column family manipulation

(1) and (2) don't support group commit yet.

There were a lot of problems with the old version of LogAndApply, detected by db_stress. The biggest was non-atomicity of manifest writes and metadata changes (i.e. if a column family add is in the manifest, it also has to be in the in-memory data structure).

Test Plan: db_stress

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16491
2014-02-28 14:46:48 -08:00
Igor Canadi 58ca641d53 Make Log::Reader more robust
Summary:
This diff does two things:
(1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
(2) Turn off mmap writes for all writes to log and manifest files

(2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing.

Test Plan:
Added unit tests from LevelDB
Actually recovered a "corrupted" MANIFEST file.

Reviewers: dhruba, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16119
2014-02-28 13:19:47 -08:00
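Point (1) of 58ca641d53 can be sketched with a toy record format. The format below (1-byte payload length, 1-byte additive checksum, then the payload) is an assumption made purely for illustration and is not the real log format; `ToyChecksum`, `AppendRecord` and `ReadRecord` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class ReadStatus { kOk, kEof, kCorruption };

// Toy checksum: sum of payload bytes modulo 256.
unsigned char ToyChecksum(const std::string& payload) {
  unsigned char sum = 0;
  for (char c : payload) {
    sum = static_cast<unsigned char>(sum + static_cast<unsigned char>(c));
  }
  return sum;
}

// Appends one record: [length][checksum][payload].
void AppendRecord(std::string* file, const std::string& payload) {
  assert(payload.size() < 256);
  file->push_back(static_cast<char>(payload.size()));
  file->push_back(static_cast<char>(ToyChecksum(payload)));
  file->append(payload);
}

// Reads the record at *offset. A record cut short by the end of the
// file is reported as end-of-file, not corruption, on the assumption
// that the writer died in the middle of the write.
ReadStatus ReadRecord(const std::string& file, std::size_t* offset,
                      std::string* out) {
  const std::size_t remaining = file.size() - *offset;
  if (remaining < 2) return ReadStatus::kEof;  // no or partial header
  const std::size_t len = static_cast<unsigned char>(file[*offset]);
  const unsigned char expected =
      static_cast<unsigned char>(file[*offset + 1]);
  if (remaining - 2 < len) return ReadStatus::kEof;  // truncated payload
  const std::string payload = file.substr(*offset + 2, len);
  if (ToyChecksum(payload) != expected) return ReadStatus::kCorruption;
  *out = payload;
  *offset += 2 + len;
  return ReadStatus::kOk;
}
```

A checksum mismatch on a complete record is still reported as corruption; only a record that runs past the end of the file is tolerated.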
Igor Canadi 12966ec1bb Fix LogAndApply() group commit 2014-02-28 12:22:45 -08:00
Yueh-Hsuan Chiang a77527f2af Add ReadOptions to TransactionLogIterator.
Summary:
Add an optional input parameter ReadOptions to DB::GetUpdateSince(),
which allows the verification of checksums to be disabled by setting
ReadOptions::verify_checksums to false.

Test Plan: Tests are done off-line and will not be included in the regular unit test.

Reviewers: igor

Reviewed By: igor

CC: leveldb, xjin, dhruba

Differential Revision: https://reviews.facebook.net/D16305
2014-02-28 11:50:36 -08:00
Igor Canadi f6a257b6a1 Set dropped column family before persisting in the manifest 2014-02-28 11:49:32 -08:00
Igor Canadi 670f3ba212 [CF] Small refactor of Recover() and DumpManifest() 2014-02-28 11:25:38 -08:00
Igor Canadi 099ad94306 Set log number for column family 2014-02-28 11:08:24 -08:00
Igor Canadi 510f84b686 [CF] CreateColumnFamily fix
Summary:
This fixes a few bugs with CreateColumnFamily:
* We first have to LogAndApply and then call VersionSet::CreateColumnFamily. Otherwise, WriteSnapshot might be invoked, writing out the column family add inside of LogAndApply, even though it's not really committed
* Fix LogAndApplyHelper() to not apply the log number to column_family_data, which in the case of a column family add is just a dummy (default) column family
* Create SuperVersion when creating a column family

Test Plan: column_family_test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16443
2014-02-28 10:40:52 -08:00
Kai Liu 6ba1084f24 Fix some compilation bugs in different platforms
Summary:

Detected some problems when testing my 3rd party release tool.
2014-02-27 22:20:17 -08:00
Kai Liu 99e4b40a55 Fix the [-Werror=sign-compare] issues
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2014-02-27 22:18:33 -08:00
Igor Canadi 206b38f31c SetLogNumber in CreateColumnFamily 2014-02-27 16:53:45 -08:00
Igor Canadi b41a3bc4da [CF] Change flow of CreateColumnFamily
Summary:
Previously, we first wrote to the manifest and then created internal data structure.
Now, we first create internal data structure. That way, we can write out internal comparator to the manifest

Test Plan: column_family_test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16425
2014-02-27 16:49:49 -08:00
Igor Canadi 492c9f71c6 [CF] Column family support for LDB tool
Summary: Added list_column_family command and also updated dump_manifest

Test Plan: no

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16419
2014-02-27 16:39:23 -08:00
Yueh-Hsuan Chiang 9a7b74954f Refine the checks in InfoLogLevel test.
Summary:
InfoLogLevel test now checks the number of lines of the output log file
instead of the number of bytes in the log file.

This diff fixes the issue that the previous InfoLogLevel test in
auto_roll_logger_test passed in make check but fails when valgrind
is used.

Test Plan: run with make check and valgrind.

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16407
2014-02-27 14:00:10 -08:00
Lei Jin ad0c3747cb cache SuperVersion in thread local storage to avoid mutex lock
Summary: as title

Test Plan:
asan_check
will post results later

Reviewers: haobo, igor, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16257
2014-02-27 11:38:55 -08:00
Igor Canadi 85b1b5e1b9 [CF] WaitForFlush() instead of sleeping
Summary: If we sleep for 300ms the test fails in valgrind because it takes more than 300ms to flush. This way we WaitForFlush() when we're expecting flush, but still sleep and check if the flush happens even though it's not supposed to.

Test Plan: notest

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16401
2014-02-27 10:31:05 -08:00
kailiu e41c060a06 Make sure logger is safely released in InfoLogLevel
Summary: fix the memory leak that was caught by the Jenkins build.

Test Plan: ran the valgrind test locally

Reviewers: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16389
2014-02-26 19:07:57 -08:00
kailiu 444cafc28c Fix inconsistent code format
Summary:
Found some functions that follow camel style. When naming functions, we have two styles:

Trivially expose internal data in readonly mode: `all_lower_case()`
Regular function: `CapitalizeFirstLetter()`

I renamed these functions.

Test Plan: make -j32

Reviewers: haobo, sdong, dhruba, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16383
2014-02-26 18:56:39 -08:00
Igor Canadi 4c42201204 [CF] Test fixes and speedup 2014-02-26 17:34:39 -08:00
Igor Canadi 343c32be7b [CF] DifferentMergeOperators and DifferentCompactionStyles tests
Summary:
Two new column family tests:
* DifferentMergeOperators -- three column families, one without merge operator, one with add operator and one with append operator. verify that operations work as expected.
* DifferentCompactionStyles -- three column families, two with level compactions and one with universal compaction. trigger the compactions and verify they work as expected.

Test Plan: nope

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16377
2014-02-26 16:05:24 -08:00
sdong a04dbf6e49 PlainTable::Next() should pass the error message from ReadKey()
Summary:
PlainTable::Next() should pass the error message from ReadKey(). Currently it returns the wrong error message.
Also improve the status messages returned when a read fails.

Test Plan: make all check

Reviewers: ljin, kailiu, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16365
2014-02-26 15:12:44 -08:00
Yueh-Hsuan Chiang ccaedd16d4 Enable log info with different levels.
Summary:
* Now each Log related function has a variant that takes an additional
  argument indicating its log level, which is one of the following:
 - DEBUG, INFO, WARN, ERROR, FATAL.

* To ensure backward-compatibility, old version Log functions are kept
  unchanged.

* Logger now has a member variable indicating its log level.  Any incoming
  Log request whose log level is lower than Logger's log level will not
  be output.

* The output of the newer version Log will be prefixed by its log level.

Test Plan:
Add a LogType test in auto_roll_logger_test.cc

 = Sample log output =
    2014/02/11-00:03:07.683895 7feded179840 [DEBUG] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683898 7feded179840 [INFO] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683900 7feded179840 [WARN] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683903 7feded179840 [ERROR] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683906 7feded179840 [FATAL] this is the message to be written to the log file!!

Reviewers: dhruba, xjin, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16071
2014-02-26 14:41:28 -08:00
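The behavior listed in ccaedd16d4 (a level-taking variant, an unchanged old-style call, filtering by the logger's level, and a level prefix) can be sketched as below. `SketchLogger`, `LogLevel` and `LevelName` are hypothetical names and this is not the actual Logger interface; whether old-style calls bypass the filter is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Levels in increasing severity, mirroring the list in the summary.
enum class LogLevel { kDebug = 0, kInfo, kWarn, kError, kFatal };

const char* LevelName(LogLevel level) {
  switch (level) {
    case LogLevel::kDebug: return "DEBUG";
    case LogLevel::kInfo:  return "INFO";
    case LogLevel::kWarn:  return "WARN";
    case LogLevel::kError: return "ERROR";
    case LogLevel::kFatal: return "FATAL";
  }
  return "UNKNOWN";
}

// Hypothetical logger that collects lines in memory.
class SketchLogger {
 public:
  explicit SketchLogger(LogLevel min_level) : min_level_(min_level) {}

  // Old-style call: kept unchanged, no level and no prefix.
  void Log(const std::string& message) { lines_.push_back(message); }

  // New variant: dropped when below the logger's level, otherwise
  // written with a "[LEVEL] " prefix.
  void Log(LogLevel level, const std::string& message) {
    if (level < min_level_) return;
    lines_.push_back(std::string("[") + LevelName(level) + "] " + message);
  }

  const std::vector<std::string>& lines() const { return lines_; }

 private:
  LogLevel min_level_;
  std::vector<std::string> lines_;
};
```

With the logger level set to WARN, an INFO message is dropped and an ERROR message is stored with its "[ERROR] " prefix.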
Igor Canadi 3c81546422 [CF] Make LogDeletionTest less flakey
Summary: Retry GetSortedWalFiles() and also wait 20ms before counting number of log files. WaitForFlush() doesn't necessarily wait for logs to be deleted, since logs are deleted outside of the mutex.

Test Plan: column_family_test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16371
2014-02-26 14:41:18 -08:00
Igor Canadi 6e7cae7711 [CF] More tests
Summary: New unit tests for column families

Test Plan: this is a test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16359
2014-02-26 14:16:23 -08:00
Igor Canadi 9bce2b2a84 [CF] Fix lint errors in CF code
Summary: Big CF diff uncovered some lint errors. This diff fixes some of them. Not much to see here

Test Plan: make check

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16347
2014-02-26 10:10:00 -08:00
Igor Canadi 8b7ab9951c [CF] Handle failure in WriteBatch::Handler
Summary:
* Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch
* Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF.

Test Plan: Added test to column_family_test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16323
2014-02-26 10:10:00 -08:00
Igor Canadi 944ff673d6 Merge branch 'master' into columnfamilies 2014-02-26 10:09:52 -08:00
Lei Jin b2795b799e thread local pointer storage
Summary:
This is not a generic thread local implementation in the sense that it
only takes pointer. But it does support multiple instances per thread
and lets user plugin function to perform cleanup when thread exits or an
instance gets destroyed.

Test Plan: unit test for now

Reviewers: haobo, igor, sdong, dhruba

Reviewed By: igor

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D16131
2014-02-25 17:47:37 -08:00
Igor Canadi 8895526308 Merge branch 'master' into columnfamilies 2014-02-25 17:04:48 -08:00
Igor Canadi 5ad7ee03ea [CF] Log deletion in column families
Summary:
* Added unit test that verifies that obsolete files are deleted.
* Advance log number for empty column family when cutting log file.
* MinLogNumber() bug fix! (caught by the new unit test)

Test Plan: unit test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16311
2014-02-25 16:54:41 -08:00
Igor Canadi dc277f0ab7 [CF] Adaptation of GetLiveFiles for CF
Summary: Even if user flushes the memtables before getting live files, we still can't guarantee that new data didn't come in (to already-flushed memtables). If we want backups to provide consistent view of the database, we still need to get WAL files.

Test Plan: backupable_db_test

Reviewers: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16299
2014-02-25 13:21:14 -08:00
Igor Canadi 5a91746277 log file is uint64_t 2014-02-25 12:57:43 -08:00
Igor Canadi 4209516359 Schedule flush when waiting on flush
Summary:
This will also help with avoiding the deadlock. If a flush failed and we're waiting for a memtable to be flushed, we should schedule a new flush and hope the new one succeeds.

If paranoid_checks = false, Wait() will still hang on ENOSPC, but at least it will automatically continue when the space frees up. Current behavior both hangs and deadlocks.

Also, I renamed some 'compaction' to 'flush'. 'compaction' was leveldb way of saying things.

Test Plan: make check

Reviewers: dhruba, haobo, ljin

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16281
2014-02-25 12:04:14 -08:00
Lei Jin dea894ef8d expose wal_dir in db_bench
Summary: as title

Test Plan: ran db_bench

Reviewers: dhruba, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16269
2014-02-25 10:43:46 -08:00
Igor Canadi a8c1f2970d Merge pull request #90 from alberts/morecapi
A few more C API functions.
2014-02-25 10:42:53 -08:00
Albert Strasheim 72aacf6b96 A few more C API functions. 2014-02-25 10:32:28 -08:00
Igor Canadi b69e7d99d5 [CF] Better handling of memtable logs
Summary: DBImpl now keeps a list of alive_log_files_. On every FindObsoleteFiles, it deletes all alive log files whose number is smaller than versions_->MinLogNumber()

Test Plan:
make check passes
no specific unit tests yet, will add

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16293
2014-02-25 09:55:13 -08:00
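The bookkeeping described in b69e7d99d5 can be sketched roughly as follows. `MinLogNumber` and `CollectObsoleteLogs` here are hypothetical free functions, not the DBImpl or VersionSet members; the sketch assumes each column family tracks the oldest log it still needs and that the alive list is kept in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Smallest log number still needed by any column family.
// Assumes the input has at least one column family.
uint64_t MinLogNumber(const std::vector<uint64_t>& per_cf_log_numbers) {
  assert(!per_cf_log_numbers.empty());
  return *std::min_element(per_cf_log_numbers.begin(),
                           per_cf_log_numbers.end());
}

// Removes every log older than min_log_number from the alive list
// (assumed sorted ascending) and returns the removed log numbers,
// which the caller could then delete from disk.
std::vector<uint64_t> CollectObsoleteLogs(std::deque<uint64_t>* alive_logs,
                                          uint64_t min_log_number) {
  assert(alive_logs != nullptr);
  std::vector<uint64_t> obsolete;
  while (!alive_logs->empty() && alive_logs->front() < min_log_number) {
    obsolete.push_back(alive_logs->front());
    alive_logs->pop_front();
  }
  return obsolete;
}
```

For alive logs 3, 5, 8, 9 and column families that still need logs 9, 8 and 12, the minimum is 8, so logs 3 and 5 are collected and 8 and 9 stay alive.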
Igor Canadi d39da4b578 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
2014-02-24 17:09:05 -08:00
Igor Canadi 6ed450a58c DeleteFile should schedule Flush or Compaction
Summary:
More info here: https://github.com/facebook/rocksdb/issues/89
If flush fails because of ENOSPC, we have a deadlock problem. This is a quick fix that will continue the normal operation when user deletes the file and frees up the space on the device.

We need to address the issue more broadly with bg_error_ cleanup.

Test Plan: make check

Reviewers: dhruba, haobo, ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16275
2014-02-24 16:00:13 -08:00
Igor Canadi 2bf1151a25 Fix C API 2014-02-24 15:15:34 -08:00
sdong 01c27be5fb A simple benchmark to measure WAL append latency
Summary: A simple benchmark that simulates WAL append. It can be used to test different platform/file system's performance on WAL.

Test Plan: run it.

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D16239
2014-02-24 14:39:32 -08:00