Commit graph

360 commits

Author SHA1 Message Date
Siying Dong 298ba27ae2 Merge pull request #846 from yuslepukhin/enble_c4244_lossofdata
Enable MS compiler warning c4244.
2015-12-23 22:59:42 -08:00
Siying Dong 7810aa802a Merge pull request #899 from zhipeng-jia/fix_clang_warning
Fix clang warnings
2015-12-23 22:58:52 -08:00
sdong 3280ae9a29 Fix warning in release
Summary: Warning in release build.

Test Plan: Make release and make all

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52305
2015-12-23 22:38:12 -08:00
Zhipeng Jia 73b175a773 Fix clang warnings regarding unnecessary std::move 2015-12-24 04:10:00 +08:00
agiardullo eff309867e Do not use timed_mutex in TransactionDB
Summary: Stopped using std::timed_mutex as it has known issues in older versiong of gcc.  Ran into these problems when testing MongoRocks.

Test Plan: unit tests.  Manual mongo testing on gcc 4.8.

Reviewers: igor, yhchiang, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52197
2015-12-18 17:26:02 -08:00
Dmitri Smirnov b6d19adcf7 Use port size_t formatting 2015-12-15 11:34:22 -08:00
Venkatesh Radhakrishnan 030215bf01 Running manual compactions in parallel with other automatic or manual compactions in restricted cases
Summary:
This diff provides a framework for doing manual
compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.

Test Plan: Rocksdb regression + new tests + valgrind

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47973
2015-12-14 11:20:34 -08:00
Dmitri Smirnov 236fe21c92 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:47:34 -08:00
agiardullo 84f98792d6 Transaction::SetWriteOptions()
Summary: Add support to change write options after creating a transaction.  This is needed for MongoRocks.

Test Plan: added test

Reviewers: sdong, rven, kradhakrishnan, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51867
2015-12-11 16:08:25 -08:00
agiardullo 3bfd3d39a3 Use SST files for Transaction conflict detection
Summary:
Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.

With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from disappearing writes (the TODOs in this test code will be fixed once the other diff is approved and merged).

Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.

Test Plan: unit tests, db bench

Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb, yoshinorim

Differential Revision: https://reviews.facebook.net/D50475
2015-12-11 12:34:11 -08:00
Igor Canadi 64fa43843b Merge pull request #862 from ceph/wip-env
implement EnvMirror
2015-12-10 18:45:07 -08:00
Sage Weil 2074ddd625 env: add EnvMirror
This is an Env implementation that mirrors all storage-related methods on
two different backend Env's and verifies that they return the same
results (return status and read results).  This is useful for implementing
a new Env and verifying its correctness.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-12-10 21:32:45 -05:00
charsyam c30b499541 fix typos in comments 2015-12-11 01:54:48 +09:00
Siying Dong fa3dbf203f Merge pull request #853 from Vaisman/enable_C4267_warning
Enable C4267 warning
2015-12-08 17:59:24 -08:00
agiardullo e5c5f23814 Support marking snapshots for write-conflict checking - Take 2
Summary:
D51183 was reverted due to breaking the LITE build.

This diff is the same as D51183 but with a fix for the LITE BUILD(D51693)

Test Plan: run all unit tests

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51711
2015-12-08 16:47:31 -08:00
sdong 1d63c3d610 Revert "Support marking snapshots for write-conflict checking"
This reverts commit ec704aafdc for it broke RocksDB LITE build.
2015-12-08 09:27:17 -08:00
agiardullo ec704aafdc Support marking snapshots for write-conflict checking
Summary:
D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.

This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.

This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).

Test Plan: no behavior change, ran existing tests

Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51183
2015-12-07 19:40:51 -08:00
Jay Edgar b28b7c6dd9 Added callback notification when a snapshot is created
Summary: When SetSnapshot() is used the caller immediately knows a snapshot has been created, but when SetSnapshotOnNextOperation() is used the caller needs a way to get notified when that snapshot has been generated.  This creates an interface that the client can implement that will be called at the time the snapshot is created.

Test Plan: Added a new SetSnapshotOnNextOperationWithNotification test into the transaction_test.

Reviewers: sdong, anthony

Reviewed By: anthony

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51177
2015-12-04 10:20:36 -08:00
Alex Yang e8180f9901 added public api to schedule flush/compaction, code to prevent race with db::open
Summary:
Fixes T8781168.

Added a new function EnableAutoCompactions in db.h to be publicly
avialable.  This allows compaction to be re-enabled after disabling it via
SetOptions

Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
prevent race condition.

Test Plan:
Ran make all check

verified fix on myrocks side:
was able to reproduce the seg fault with
../tools/mysqltest.sh --mem --force rocksdb.drop_table

method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
assigned in transaction_db_impl.cc:
  DB::Open(db_options, dbname, column_families_copy, handles, &db);
  clock_t goal = (60000 * 10) + clock();
  while (goal > clock());
  ...dbptr(aka rdb) gets assigned below

verified my changes fixed the issue.

Also added unit test 'ToggleAutoCompaction' in transaction_test.cc

Reviewers: hermanlee4, anthony

Reviewed By: anthony

Subscribers: alex, dhruba

Differential Revision: https://reviews.facebook.net/D51147
2015-12-03 22:59:44 -08:00
SherlockNoMad b4efaebff0 Fix ms version Appveyor build error 2015-11-30 11:07:47 -08:00
sdong 6bbfa1874b BackupDB to have a mode to use file size in file name
Summary: Getting file size from all the backup files can take a long time. In some cases, the sizes are available in file names. We allow a mode to get those sizes from file name.

Test Plan:
Make some unit tests in backupable_db_test to run in such a mode.
Make sure RocksDB Lite builds too.

Reviewers: IslamAbdelRahman, rven, yhchiang, kradhakrishnan, anthony, igor

Reviewed By: igor

Subscribers: muthu, asameet, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51243
2015-11-25 11:55:37 -08:00
Vasili Svirski 41b32c6059 Enable C4267 warning
* conversion from 'size_t' to 'type', by add static_cast

Tested:
* by build solution on Windows, Linux locally,
* run tests
* build CI system successful
2015-11-24 16:33:09 +03:00
sdong 6170fec251 Fix build broken by previous commit of "option helper refactor"
Summary:
The commit of option helper refactor broken the build:
(1) a git merge problem
(2) some uncaught compiler warning
Fix it.

Test Plan: Make sure "make all" passes

Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D50943
2015-11-17 16:52:54 -08:00
Islam AbdelRahman a163cc2d5a Lint everything
Summary:
```
arc2 lint --everything
```

run the linter on the whole code repo to fix exisitng lint issues

Test Plan: make check -j64

Reviewers: sdong, rven, anthony, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50769
2015-11-16 12:56:21 -08:00
Yueh-Hsuan Chiang d781da8164 Add CheckOptionsCompatibility() API to options_util
Summary:
Add CheckOptionsCompatibility() API to options_util that returns
Status::OK if the input DBOptions and ColumnFamilyDescriptors
are compatible with the latest options stored in the specified DB path.

Test Plan: Added tests in options_util_test

Reviewers: igor, anthony, IslamAbdelRahman, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50649
2015-11-12 16:52:51 -08:00
Yueh-Hsuan Chiang 5ac16300b0 Fixed valgrind error in options_util_test
Summary:
Fixed valgrind error in options_util_test by deleting the
compaction_filter allocated from RandomInitCFOptions().

Test Plan: valgrind --error-exitcode=2 --leak-check=full ./options_util_test

Reviewers: anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50661
2015-11-12 14:12:27 -08:00
Yueh-Hsuan Chiang e11f676e34 Add OptionsUtil::LoadOptionsFromFile() API
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file.  Note that most pointer-typed options such as
merge_operator will not be constructed.

With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:

  DBOptions db_options;
  std::vector<std::string> cf_names;
  std::vector<ColumnFamilyOptions> cf_opts;

  // Load primitive-typed options from an existing DB
  OptionsUtil::LoadLatestOptionsFromDB(
      dbname, &db_options, &cf_names, &cf_opts);

  // Initialize necessary pointer-typed options
  cf_opts[0].merge_operator.reset(new MyMergeOperator());
  ...

  // Construct the vector of ColumnFamilyDescriptor
  std::vector<ColumnFamilyDescriptor> cf_descs;
  for (size_t i = 0; i < cf_opts.size(); ++i) {
    cf_descs.emplace_back(cf_names[i], cf_opts[i]);
  }

  // Open the DB
  DB* db = nullptr;
  std::vector<ColumnFamilyHandle*> cf_handles;
  auto s = DB::Open(db_options, dbname, cf_descs,
                    &handles, &db);

Test Plan:
Augment existing tests in column_family_test
options_test
db_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49095
2015-11-12 06:52:43 -08:00
Yueh-Hsuan Chiang e114f0abb8 Enable RocksDB to persist Options file.
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.

In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.

  // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
  // / SetOptions will fail if options file is not detected or properly
  // persisted.
  //
  // DEFAULT: false
  bool fail_if_missing_options_file;

Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.

Test Plan:
Add options_file_test.

options_test
column_family_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48285
2015-11-10 22:58:01 -08:00
Satnam Singh c8e01ef982 Delete test iterators
Summary:
Valgrind reports an issue with the test for GeoIterator.
This diff explicitly deletes the two iterators used in this test.

Test Plan: This diff is for a test. The test still passes.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50193
2015-11-05 13:30:51 -08:00
SherlockNoMad 2e45409910 Fix appveyor build failure 2015-11-04 20:10:16 -08:00
Yueh-Hsuan Chiang dba5e00741 Fixed the compile error in RocksDBLite in memory_test.cc
Summary:
Fixed the following compile error in RocksDBLite:

18:00:33   CC       utilities/memory/memory_test.o
18:00:33 utilities/memory/memory_test.cc: In function ‘int main(int, char**)’:
18:00:33 utilities/memory/memory_test.cc:268:66: error: ‘printf’ was not declared in this scope
18:00:33    printf("Skipped in RocksDBLite as utilities are not supported.");
18:00:33                                                                   ^

Test Plan: make OPT=-DROCKSDB_LITE memory_test

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D50145
2015-11-03 18:06:23 -08:00
Yueh-Hsuan Chiang 7d7ee2b654 Add Memory Insight support to utilities
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.

The API also take care of the case where Cache could be shared
across multiple column families / multiple db instances.

Currently, it reports memory usage of memtable, table-readers
and cache.

Test Plan: utilities/memory/memory_test.cc

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49257
2015-11-03 17:52:17 -08:00
Satnam Singh c9aef3c41c Add RocksDb/GeoDb Iterator interface
Summary:
This diff is a first step towards an iterator based interface for the
SearchRadial method which replaces a vector of GeoObjects with an
iterator for GeoObjects. This diff works by just wrapping the iterator
for the encapsulated vector of GeoObjects. A future diff could extend
this approach by defining an interator in terms of the underlying
iteration in SearchRadial which would then remove the need to have
an in-memory representation for all the matching GeoObjects.
Fixes T8421387

Test Plan:
The existing tests have been modified to work with the new
interface.

Reviewers: IslamAbdelRahman, kradhakrishnan, dhruba, igor

Reviewed By: igor

Subscribers: igor, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50031
2015-11-03 15:20:58 -08:00
Islam AbdelRahman 2872e0c8c2 Clean and expose CreateLoggerFromOptions
Summary:
CreateLoggerFromOptions have some parameters like  db_log_dir and env, these parameters are redundant since they already exist in DBOptions

this patch remove the redundant parameters and expose CreateLoggerFromOptions to users

Test Plan: make check

Reviewers: igor, anthony, yhchiang, rven, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D49713
2015-10-29 18:07:37 -07:00
Dmitri Smirnov 3c750b59ae No need to #ifdef test only code on windows 2015-10-22 15:15:37 -07:00
Siying Dong 90228bb088 Merge pull request #771 from maximecaron/patch-1
Fix build error using Visual Studio 12
2015-10-19 15:26:22 -07:00
Jay Edgar 8f143e03fb Add ClearSnapshot()
Summary:
MyRocks needs the ability to clear a snapshot for Read Committed support

Test Plan: transaction_test

Reviewers: anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48861
2015-10-16 11:53:30 -07:00
Maxime Caron 92060b2153 Fix build error using Visual Studio 12 2015-10-14 16:05:55 -07:00
agiardullo def74f8763 Deferred snapshot creation in transactions
Summary: Support for Transaction::CreateSnapshotOnNextOperation().  This is to fix a write-conflict race-condition that Yoshinori was running into when testing MyRocks with LinkBench.

Test Plan: New tests

Reviewers: yhchiang, spetrunia, rven, igor, yoshinorim, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48099
2015-10-09 15:46:16 -07:00
agiardullo c5f3707d42 DisableIndexing() for Transactions
Summary:
MyRocks reported some perfomance issues when inserting many keys into a transaction due to the cost of inserting new keys into WriteBatchWithIndex.  Frequently, they don't even need the keys to be indexed as they don't need to read them back.  DisableIndexing() can be used to avoid the cost of indexing.

I also plan on eventually investigating if we can improve WriteBatchWithIndex performance.  But even if we improved the perf here, it is still beneficial to be able to disable the indexing all together for large transactions.

Test Plan: unit test

Reviewers: igor, rven, yoshinorim, spetrunia, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48471
2015-10-09 15:36:09 -07:00
sdong 776bd8d5eb Pass column family ID to table property collector
Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.

Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48411
2015-10-09 14:36:51 -07:00
agiardullo 03b08ba9a9 Return MergeInProgress when fetching from transactions or WBWI with overwrite_key
Summary:
WriteBatchWithIndex::GetFromBatchAndDB only works correctly for overwrite_key=false.  Transactions use overwrite_key=true (since WriteBatchWithIndex::GetIteratorWithBase only works when overwrite_key=true).  So currently, Transactions could return incorrectly merged results when calling Get/GetForUpdate().

Until a permanent fix can be put in place, Transaction::Get[ForUpdate] and WriteBatchWithIndex::GetFromBatch[AndDB] will now return MergeInProgress if the most recent write to a key in the batch is a Merge.

Test Plan: more tests

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47817
2015-09-30 11:14:42 -07:00
agiardullo 5a51fa907b Fix accidental object copy in transactions
Summary: Should have used auto& instead of auto.  Also needed to change the code a bit due to const correctness.

Test Plan: unit tests

Reviewers: sdong, igor, yoshinorim, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47787
2015-09-29 17:36:56 -07:00
agiardullo afe0dc539b SingleDelete support for Transactions
Summary: Transactional SingleDelete is needed for MyRocks.  Note: This diff requires D47529.

Test Plan: Added some new tests in this diff as well as more tests added in D47529

Reviewers: rven, sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47535
2015-09-28 12:14:26 -07:00
agiardullo 25fd743d75 Fix SingleDelete support in WriteBatchWithIndex
Summary: Fixed some  bugs in using SingleDelete on a WriteBatchWithIndex and added some tests.

Test Plan: new tests

Reviewers: sdong, yhchiang, rven, kradhakrishnan, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47529
2015-09-25 12:23:07 -07:00
Dmitri Smirnov 489a3e95d4 Re-work to support size_t max constant for 32/64-bit. 2015-09-22 10:34:21 -07:00
Dmitri Smirnov 5e8f0a66db Use port::constant for std::muneric_limtis<>::max() 2015-09-21 16:56:47 -07:00
Dmitri Smirnov 2754ec9994 Fix Windows constexpr issue and '#ifdef' column_family_test in Release. 2015-09-21 16:21:01 -07:00
agiardullo c7fba80291 Fix non-deterministic failure in backupable_db_test
Summary: FailOverwritingBackups has unexpected results when auto-compaction runs.

Test Plan: ran test a bunch of times

Reviewers: IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47181
2015-09-17 20:14:51 -07:00
Andres Noetzli 014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00