Commit Graph

159 Commits

Author SHA1 Message Date
Levi Tamasi a59dc843a4 Move blob_index.h to db/ (#5919)
Summary:
Extracted from PR https://github.com/facebook/rocksdb/issues/5903 for technical reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5919

Test Plan: make check

Differential Revision: D17910132

Pulled By: ltamasi

fbshipit-source-id: 6ecbb8d6e84b2a1d1f28575ad48ac3cc65833eb5
2019-10-14 12:54:05 -07:00
sdong b931f84e56 Divide file_reader_writer.h and .cc (#5803)
Summary:
file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803

Test Plan: Build whole project using make and cmake.

Differential Revision: D17374550

fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
2019-09-16 10:33:51 -07:00
Pratik Dhandharia 1b4c104a67 replace some reinterpret_cast with static_cast_with_check (#5740)
Summary:
This PR focuses on replacing some of the reinterpret_cast<DBImpl*> to static_cast_with_check<DBImpl, DB>.

Files impacted:

./db/db_impl/db_impl_compaction_flush.cc
./db/write_batch.cc
./utilities/blob_db/blob_db_impl.cc
./utilities/transactions/pessimistic_transaction_db.cc
./utilities/transactions/transaction_base.cc
./utilities/transactions/write_prepared_txn_db.cc
./utilities/transactions/write_unprepared_txn_db.cc
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5740

Differential Revision: D17055691

Pulled By: pdhandharia

fbshipit-source-id: 0f8034d1b32eade56e37d59c04b7bf236a81d8e8
2019-08-27 10:59:11 -07:00
Levi Tamasi 0a97125ec0 Fix data races in BlobDB (#5698)
Summary:
Some accesses to blob_files_ and open_ttl_files_ in BlobDBImpl, as well
as to expiration_range_ in BlobFile were not properly synchronized.
The patch fixes this and also makes sure the invariant that obsolete_files_
is a subset of blob_files_ holds even when an attempt to delete an obsolete
blob file fails.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5698

Test Plan:
COMPILE_WITH_TSAN=1 make blob_db_test
gtest-parallel --repeat=1000 ./blob_db_test --gtest_filter="*ShutdownWait*"

The test fails with TSAN errors ~20 times out of 1000 without the patch but
completes successfully 1000 out of 1000 times with the fix.

Differential Revision: D16793235

Pulled By: ltamasi

fbshipit-source-id: 8034b987598d4fdc9f15098d4589cc49cde484e9
2019-08-14 16:10:36 -07:00
Vijay Nadimpalli d150e01474 New API to get all merge operands for a Key (#5604)
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
2019-08-06 14:26:44 -07:00
anand76 e0d9d57750 Fix bugs in WAL trash file handling (#5520)
Summary:
1. Cleanup WAL trash files on open
2. Don't apply deletion rate limit if WAL dir is different from db dir
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5520

Test Plan: Add new unit tests and make check

Differential Revision: D16096750

Pulled By: anand1976

fbshipit-source-id: 6f07858ad864b754b711db416f0389c45ede599b
2019-07-06 21:07:32 -07:00
sdong 58c4aee42e TransactionUtil::CheckKey() to skip unnecessary history (#4941)
Summary:
If a memtable definitely covers a key, there isn't a need to check older memtables.
We can skip them by checking the earliest sequence number.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4941

Differential Revision: D13932666

fbshipit-source-id: b9d52f234b8ad9dd3bf6547645cd457175a3ca9b
2019-06-11 11:46:42 -07:00
Siying Dong 000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Vijay Nadimpalli 49c5a12dbe Organizing rocksdb/db directory
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390

Differential Revision: D15579388

Pulled By: vjnadimpalli

fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366
2019-05-31 11:57:01 -07:00
Siying Dong 8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Vijay Nadimpalli 50e470791d Organizing rocksdb/table directory by format
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373

Differential Revision: D15559425

Pulled By: vjnadimpalli

fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
2019-05-30 14:51:11 -07:00
Siying Dong e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Siying Dong 545d206040 Move some file related files outside util/ (#5375)
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375

Differential Revision: D15550935

Pulled By: siying

fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
2019-05-29 20:47:06 -07:00
Adam Simpkins c06c4c01c5 Fix many bugs in log statement arguments (#5089)
Summary:
Annotate all of the logging functions to inform the compiler that these
use printf-style formatting arguments.  This allows the compiler to emit
warnings if the format arguments are incorrect.

This also fixes many problems reported now that format string checking
is enabled.  Many of these are simply mix-ups in the argument type (e.g,
int vs uint64_t), but in several cases the wrong number of arguments
were being passed in which can cause the code to crash.

The primary motivation for this was to fix the log message in
`DBImpl::SwitchMemtable()` which caused a segfault due to an extra %s
format parameter with no argument supplied.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5089

Differential Revision: D14574795

Pulled By: simpkins

fbshipit-source-id: 0921b03f0743652bf4ae21e414ff54b3bb65422a
2019-04-04 12:12:11 -07:00
Siying Dong fe2bd190a5 BlobDB::Open() should put all existing trash files to delete scheduler (#5103)
Summary:
Right now, BlobDB::Open() fails to put all trash files to delete scheduler,
which causes some trash files permanently untracked.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5103

Differential Revision: D14606095

Pulled By: siying

fbshipit-source-id: 41a9437a2948abb235c0ed85f9a04612d0e50183
2019-03-26 10:53:19 -07:00
Shobhit Dayal b45b1cde3e Feature for sampling and reporting compressibility (#4842)
Summary:
This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
1. lz4 or snappy for fast compression
2. zstd or zlib for slow but higher compression.

The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.

The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842

Differential Revision: D13629011

Pulled By: shobhitdayal

fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
2019-03-18 12:15:34 -07:00
Levi Tamasi 79b6ab43ce BlobDB: Remove GC interval option (#5044)
Summary:
Remove BlobDBOptions.garbage_collection_interval_secs for now, since
garbage collection is not yet implemented in BlobDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5044

Differential Revision: D14354046

Pulled By: ltamasi

fbshipit-source-id: 2b966b6d1e088ba9462f3ea73e115013562fbc04
2019-03-07 10:19:05 -08:00
Siying Dong 5e298f865b Add two more StatsLevel (#5027)
Summary:
Statistics cost too much CPU for some use cases. Add two stats levels
so that people can choose to skip two types of expensive stats, timers and
histograms.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027

Differential Revision: D14252765

Pulled By: siying

fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
2019-02-28 10:27:59 -08:00
Siying Dong 06f378d75e When closing BlobDB, should first wait for all background tasks (#5005)
Summary:
When closing a BlobDB, it only waits for background tasks
to finish as the last thing, but the background task may access
some variables that are destroyed. The fix is to introduce a
shutdown function in the timer queue and call the function as
the first thing when destorying BlobDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5005

Differential Revision: D14170342

Pulled By: siying

fbshipit-source-id: 081e6a2d99b9765d5956cf6cdfc290c07270c233
2019-02-21 17:26:01 -08:00
Michael Liu ca89ac2ba9 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14090024

fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
2019-02-14 14:41:36 -08:00
anand76 d0d484b132 Always delete Blob DB files in the background (#4928)
Summary:
Blob DB files are not tracked by the SFM, so they currently don't get
deleted in the background. Force them to be deleted in background so
rate limiting can be applied
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4928

Differential Revision: D13854649

Pulled By: anand1976

fbshipit-source-id: 8031ce66842ff0af440c715d886b377983dad7d8
2019-01-29 15:50:03 -08:00
Siying Dong 08b8cea69f Deleting Blob files also goes through SstFileManager (#4904)
Summary:
Right now, deleting blob files is not rate limited, even if SstFileManger is specified.
On the other hand, rate limiting blob deletion is not supported. With this change, Blob file
deletion will go through SstFileManager too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4904

Differential Revision: D13772545

Pulled By: siying

fbshipit-source-id: bd1b1d0beb26d5167385e00b7ecb8b94b879de84
2019-01-22 17:00:29 -08:00
Sagar Vemuri 8189c184ec Remove unused Blob WAL filter (#4896)
Summary:
Remove unused blob WAL filter so that users are not confused.
I was initially under the impression that we have WAL Filter support in BlobDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4896

Differential Revision: D13725709

Pulled By: sagar0

fbshipit-source-id: f997d7546e138a474036e88b957907cc714327f1
2019-01-22 11:13:49 -08:00
Andrew Kryczka 01013ae766 Digest ZSTD compression dictionary once when writing SST file (#4849)
Summary:
This is essentially a re-submission of #4251 with a few improvements:

- Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
- Eliminated `Init` functions. Instead do all initialization work in constructors.
- Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849

Differential Revision: D13606039

Pulled By: ajkr

fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
2019-01-18 19:12:57 -08:00
Yi Wu c7a45ca91f BlobDB: handle IO error on write (#4580)
Summary:
A fix similar to #4410 but on the write path. On IO error on `SelectBlobFile()` we didn't return error code properly, but simply a nullptr of `BlobFile`. The `AppendBlob()` method didn't have null check for the pointer and caused crash. The fix make sure we properly return error code in this case.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4580

Differential Revision: D10513849

Pulled By: yiwu-arbug

fbshipit-source-id: 80bca920d1d7a3541149de981015ad83e0aa14b5
2018-10-23 15:03:45 -07:00
DorianZheng 27090ae8f6 Fix DBImpl::GetColumnFamilyHandleUnlocked race condition (#4391)
Summary:
- Fix DBImpl API race condition

The timeline of execution flow is as follow:
```
timeline              user_thread1                      user_thread2
t1   |     cfh = GetColumnFamilyHandleUnlocked(0)
t2   |     id1 = cfh->GetID()
t3   |                                                GetColumnFamilyHandleUnlocked(1)
t4   |     id2 = cfh->GetID()
     V
```
The original implementation return a pointer to a stateful variable, so that the return `ColumnFamilyHandle` will be changed when another thread calls `GetColumnFamilyHandleUnlocked` with different `column family id`

- Expose ColumnFamily ID to compaction event listener

- Fix the return status of `DBImpl::GetLatestSequenceForKey`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4391

Differential Revision: D10221243

Pulled By: yiwu-arbug

fbshipit-source-id: dec60ee9ff0c8261a2f2413a8506ec1063991993
2018-10-08 14:24:16 -07:00
Yi Wu 04d373b260 BlobDB: handle IO error on read (#4410)
Summary:
Fix IO error on read not being handle and crashing the DB. With the fix we properly return the error.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4410

Differential Revision: D9979246

Pulled By: yiwu-arbug

fbshipit-source-id: 111a85675067a29c03cb60e9a34103f4ff636694
2018-09-20 16:58:45 -07:00
Yanqin Jin 8959063c9c Store the return value of Fsync for check
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4361

Differential Revision: D9803723

Pulled By: riversand963

fbshipit-source-id: 5a0d4cd3e57fd195571dcd5822895ee00547fa6a
2018-09-14 13:29:56 -07:00
Andrew Kryczka 2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
cngzhnp 64324e329e Support pragma once in all header files and cleanup some warnings (#4339)
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.

Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339

Differential Revision: D9654990

Pulled By: ajkr

fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
2018-09-05 18:13:31 -07:00
Yi Wu 3e801e5ed1 BlobDB: Improve info log (#4324)
Summary:
Improve BlobDB info logs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4324

Differential Revision: D9545074

Pulled By: yiwu-arbug

fbshipit-source-id: 678ab8820a78758fee451be3b123b0680c1081df
2018-08-30 11:57:46 -07:00
Yi Wu 38ad3c9f8a BlobDB: Avoid returning garbage value on key not found (#4321)
Summary:
When reading an expired key using `Get(..., std::string* value)` API, BlobDB first read the index entry and decode expiration from it. In this case, although BlobDB reset the PinnableSlice, the index entry is stored in user provided string `value`. The value will be returned as a garbage value, despite status being NotFound. Fixing it by use a different PinnableSlice to read the index entry.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4321

Differential Revision: D9519042

Pulled By: yiwu-arbug

fbshipit-source-id: f054c951a1fa98265228be94f931904ed7056677
2018-08-27 16:28:39 -07:00
Yi Wu a6d3de4e7a BlobDB: Implement DisableFileDeletions (#4314)
Summary:
`DB::DiableFileDeletions` and `DB::EnableFileDeletions` are used for applications to stop RocksDB background jobs to delete files while they are doing replication. Implement these methods for BlobDB. `DeleteObsolteFiles` now needs to check `disable_file_deletions_` before starting, and will hold `delete_file_mutex_` the whole time while it is running. `DisableFileDeletions` needs to wait on `delete_file_mutex_` for running `DeleteObsolteFiles` job and set `disable_file_deletions_` flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4314

Differential Revision: D9501373

Pulled By: yiwu-arbug

fbshipit-source-id: 81064c1228f1724eff46da22b50ff765b16292cd
2018-08-27 10:58:29 -07:00
Andrew Kryczka 6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file's created `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Yanqin Jin bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Yi Wu 7188bd34f3 BlobDB: Fix expired file not being evicted (#4294)
Summary:
Fix expired file not being evicted from the DB. We have a background task (previously called `CheckSeqFiles` and I rename it to `EvictExpiredFiles`) to scan and remove expired files, but it only close the files, not marking them as expired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4294

Differential Revision: D9415984

Pulled By: yiwu-arbug

fbshipit-source-id: eff7bf0331c52a7ccdb02318602bff7f64f3ef3d
2018-08-20 22:42:33 -07:00
Yi Wu c970358574 BlobDB: Can return expiration together with Get() (#4227)
Summary:
Add API to allow fetching expiration of a key with `Get()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4227

Differential Revision: D9169897

Pulled By: yiwu-arbug

fbshipit-source-id: 2a6f216c493dc75731ddcef1daa689b517fab31b
2018-08-06 17:43:14 -07:00
Yi Wu 4cb7068c1e BlobDB: Fix VisibleToActiveSnapshot() (#4236)
Summary:
There are two issues with `VisibleToActiveSnapshot`:
1. If there are no snapshots, `oldest_snapshot` will be 0 and `VisibleToActiveSnapshot` will always return true. Since the method is used to decide whether it is safe to delete obsolete files, obsolete file won't be able to delete in this case.
2. The `auto` keyword of `auto snapshots = db_impl_->snapshots()` translate to a copy of `const SnapshotList` instead of a reference. Since copy constructor of `SnapshotList` is not defined, using the copy may yield unexpected result.

Issue 2 actually hide issue 1 from being catch by tests. During test `snapshots.empty()` can return false while it should actually be empty, and `snapshots.oldest()` return an invalid address, making `oldest_snapshot` being some random large number.

The issue was originally reported by BlobDB early adopter at Kuaishou.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4236

Differential Revision: D9188706

Pulled By: yiwu-arbug

fbshipit-source-id: a0f2624b927cf9bf28c1bb534784fee5d106f5ea
2018-08-06 16:57:42 -07:00
Yi Wu 140f256da2 BlobDB: Cleanup TTLExtractor interface (#4229)
Summary:
Cleanup TTLExtractor interface. The original purpose of it is to allow our users keep using existing `Write()` interface but allow it to accept TTL via `TTLExtractor`. However the interface is confusing. Will replace it with something like `WriteWithTTL(batch, ttl)` in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4229

Differential Revision: D9174390

Pulled By: yiwu-arbug

fbshipit-source-id: 68201703d784408b851336ab4dd9b84188245b2d
2018-08-06 11:58:05 -07:00
Yi Wu 6d454d7376 BlobDB: is_fifo=true also evict non-TTL blob files (#4049)
Summary:
Previously with is_fifo=true we only evict TTL file. Changing it to also evict non-TTL files from oldest to newest, after exhausted TTL files.
Closes https://github.com/facebook/rocksdb/pull/4049

Differential Revision: D8604597

Pulled By: yiwu-arbug

fbshipit-source-id: bc4209ee27c1528ce4b72833e6f1e1bff80082c1
2018-06-25 22:43:05 -07:00
Yi Wu a71e467381 Blob DB: enable readahead for garbage collection (#3648)
Summary:
Enable readahead for blob DB garbage collection, which should improve GC performance a little bit.
Closes https://github.com/facebook/rocksdb/pull/3648

Differential Revision: D7383791

Pulled By: yiwu-arbug

fbshipit-source-id: 642b3327f7105eca85986d3fb2d8f960a3d83cf1
2018-06-23 23:12:00 -07:00
Zhongyi Xie f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Dmitri Smirnov f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-tuntime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Sergey Elin 3272bc07c6 Fix formatting in log message
Summary:
Add missing space.
Closes https://github.com/facebook/rocksdb/pull/3826

Differential Revision: D7956059

Pulled By: miasantreble

fbshipit-source-id: 3aeba76385f8726399a3086c46de710636a31191
2018-05-11 11:28:54 -07:00
Yi Wu f1b7b790c9 BlobDB: Fix BlobDBImpl::GCFileAndUpdateLSM issues
Summary:
* Fix BlobDBImpl::GCFileAndUpdateLSM doesn't close the new file, and the new file will not be able to be garbage collected later.
* Fix BlobDBImpl::GCFileAndUpdateLSM doesn't copy over metadata from old file to new file.
Closes https://github.com/facebook/rocksdb/pull/3639

Differential Revision: D7355092

Pulled By: yiwu-arbug

fbshipit-source-id: 4fa3594ac5ce376bed1af04a545c532cfc0088c4
2018-03-21 16:30:09 -07:00
Yi Wu b864bc9b5b Blob DB: Improve FIFO eviction
Summary:
Improving blob db FIFO eviction with the following changes,
* Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size.
* FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion.
* FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files.
* If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO.
* Compaction filter also filter those blob indexes where corresponding blob file is gone.
* Add an event listener to listen compaction/flush event and update sst file size.
* Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db.
* More blob db statistics around FIFO.
* Fix some locking issue when accessing a blob file.
Closes https://github.com/facebook/rocksdb/pull/3556

Differential Revision: D7139328

Pulled By: yiwu-arbug

fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
2018-03-06 11:57:42 -08:00
Andrew Kryczka 5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Yi Wu 1209b6db5c Blob DB: remove existing garbage collection implementation
Summary:
Red diff to remove existing implementation of garbage collection. The current approach is reference counting kind of approach and require a lot of effort to get the size counter right on compaction and deletion. I'm going to go with a simple mark-sweep kind of approach and will send another PR for that.

CompactionEventListener was added solely for blob db and it adds complexity and overhead to compaction iterator. Removing it as well.
Closes https://github.com/facebook/rocksdb/pull/3551

Differential Revision: D7130190

Pulled By: yiwu-arbug

fbshipit-source-id: c3a375ad2639a3f6ed179df6eda602372cc5b8df
2018-03-02 12:57:23 -08:00
Igor Sugak aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Yi Wu e62a763752 Blob DB: miscellaneous changes
Summary:
* Expose garbage collection related options
* Minor logging and counter name update
* Remove unused constants.
Closes https://github.com/facebook/rocksdb/pull/3451

Differential Revision: D6867077

Pulled By: yiwu-arbug

fbshipit-source-id: 6c3272a9c9d78b125a0bd6b2e56d00d087cdd6c8
2018-01-31 18:13:23 -08:00
Yi Wu f2f034ef3b Blob DB: fix crash when DB full but no candidate file to evict
Summary:
When blob_files is empty, std::min_element will return blobfiles.end(), which cannot be dereference. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3387

Differential Revision: D6764927

Pulled By: yiwu-arbug

fbshipit-source-id: 86f78700132be95760d35ac63480dfd3a8bbe17a
2018-01-19 16:26:50 -08:00
Yi Wu 30a017feca Blob DB: avoid having a separate read of checksum
Summary:
Previously on a blob db read, we are making a read of the blob value, and then make another read to get CRC checksum. I'm combining the two read into one.

readrandom db_bench with 1G database with base db size of 13M, value size 1k:
`./db_bench --db=/home/yiwu/tmp/db_bench --use_blob_db --value_size=1024 --num=1000000 --benchmarks=readrandom --use_existing_db --cache_size=32000000`
master: throughput 234MB/s, get micros p50 5.984 p95 9.998 p99 20.817 p100 787
this PR: throughput 261MB/s, get micros p50 5.157 p95 9.928 p99 20.724 p100 190
Closes https://github.com/facebook/rocksdb/pull/3301

Differential Revision: D6615950

Pulled By: yiwu-arbug

fbshipit-source-id: 052410c6d8539ec0cc305d53793bbc8f3616baa3
2018-01-05 16:41:58 -08:00
Yi Wu e763e1b623 BlobDB: dump blob db options on open
Summary:
We dump blob db options on blob db open, but it was removed by mistake in #3246. Adding it back.
Closes https://github.com/facebook/rocksdb/pull/3298

Differential Revision: D6607177

Pulled By: yiwu-arbug

fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
2017-12-19 16:57:12 -08:00
Yi Wu 48cf8da2bb BlobDB: update blob_db_options.bytes_per_sync behavior
Summary:
Previously, if blob_db_options.bytes_per_sync, there is a background job to call fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync as env_options_ to blob files so that sync_file_range() will be used instead.
Closes https://github.com/facebook/rocksdb/pull/3297

Differential Revision: D6606994

Pulled By: yiwu-arbug

fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
2017-12-19 16:41:41 -08:00
Yi Wu 237b292515 BlobDB: Remove the need to get sequence number per write
Summary:
Previously we store sequence number range of each blob files, and use the sequence number range to check if the file can be possibly visible by a snapshot. But it adds complexity to the code, since the sequence number is only available after a write. (The current implementation get sequence number by calling GetLatestSequenceNumber(), which is wrong.) With the patch, we are not storing sequence number range, and check if snapshot_sequence < obsolete_sequence to decide if the file is visible by a snapshot (previously we check if first_sequence <= snapshot_sequence < obsolete_sequence).
Closes https://github.com/facebook/rocksdb/pull/3274

Differential Revision: D6571497

Pulled By: yiwu-arbug

fbshipit-source-id: ca06479dc1fcd8782f6525b62b7762cd47d61909
2017-12-15 13:27:30 -08:00
Yi Wu 250a51a3f9 BlobDB: refactor DB open logic
Summary:
Refactor BlobDB open logic. List of changes:

Major:
* On reopen, mark blob files found as immutable, do not use them for writing new keys.
* Not to scan the whole file to find file footer. Instead just seek to the end of the file and try to read footer.

Minor:
* Move most of the real logic from blob_db.cc to blob_db_impl.cc.
* Not to hold shared_ptr of event listeners in global maps in blob_db.cc
* Some changes to BlobFile interface.
* Improve logging and error handling.
Closes https://github.com/facebook/rocksdb/pull/3246

Differential Revision: D6526147

Pulled By: yiwu-arbug

fbshipit-source-id: 9dc4cdd63359a2f9b696af817086949da8d06952
2017-12-11 12:12:38 -08:00
Prashant D 7b57510a17 utilities: Fix coverity issues in blob_db and col_buf_decoder
Summary:
utilities/blob_db/blob_db_impl.cc
265                    : bdb_options_.blob_dir;
   	3. uninit_member: Non-static class member env_ is not initialized in this constructor nor in any functions that it calls.
   	5. uninit_member: Non-static class member ttl_extractor_ is not initialized in this constructor nor in any functions that it calls.
   	7. uninit_member: Non-static class member open_p1_done_ is not initialized in this constructor nor in any functions that it calls.

CID 1418245 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
9. uninit_member: Non-static class member debug_level_ is not initialized in this constructor nor in any functions that it calls.
266}

   	4. past_the_end: Function end creates an iterator.

CID 1418258 (#1 of 1): Using invalid iterator (INVALIDATE_ITERATOR)
5. deref_iterator: Dereferencing iterator file_nums.end() though it is already past the end of its container.

utilities/col_buf_decoder.h:
     nullable_(nullable),
   	2. uninit_member: Non-static class member remain_runs_ is not initialized in this constructor nor in any functions that it calls.
   	4. uninit_member: Non-static class member run_val_ is not initialized in this constructor nor in any functions that it calls.

CID 1396134 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member last_val_ is not initialized in this constructor nor in any functions that it calls.
 46        big_endian_(big_endian) {}
Closes https://github.com/facebook/rocksdb/pull/3134

Differential Revision: D6340607

Pulled By: sagar0

fbshipit-source-id: 25c52566e2ff979fe6c7abb0f40c27fc16597054
2017-11-28 12:27:57 -08:00
Yi Wu 78279350aa Blob DB: Add statistics
Summary:
Adding a list of blob db counters.

Also remove WaStats() which doesn't expose the stats and can be substitute by (BLOB_DB_BYTES_WRITTEN / BLOB_DB_BLOB_FILE_BYTES_WRITTEN).
Closes https://github.com/facebook/rocksdb/pull/3193

Differential Revision: D6394216

Pulled By: yiwu-arbug

fbshipit-source-id: 017508c8ff3fcd7ea7403c64d0f9834b24816803
2017-11-28 11:58:49 -08:00
Yi Wu f0dde49cda Blob DB: Fix GC handling for inlined blob
Summary:
Garbage collection checks if the offset in blob index matches the offset of the blob value in the file. If it is a mismatch, the value is the current version. However it failed to check if the blob index is an inlined type, which don't even have an offset. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3194

Differential Revision: D6394270

Pulled By: yiwu-arbug

fbshipit-source-id: 7c2b9d795f1116f55f4d728086980f9b6e88ea78
2017-11-24 11:56:47 -08:00
Sagar Vemuri 578f36e431 Blob DB: Remove some redundant log lines
Summary:
Saw some redundant log lines when trying to benchmark blob db. So, removed the lines from blob_file.cc, and let the lines in blob_db_impl.cc take the lead.
Closes https://github.com/facebook/rocksdb/pull/3189

Differential Revision: D6381726

Pulled By: sagar0

fbshipit-source-id: 5f0b1e56fe4bc3b715d89ea9b5749bd935cd0606
2017-11-20 21:11:34 -08:00
Yi Wu 42564ada53 Blob DB: not using PinnableSlice move assignment
Summary:
The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky.
Closes https://github.com/facebook/rocksdb/pull/3164

Differential Revision: D6319201

Pulled By: yiwu-arbug

fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48
2017-11-13 18:12:20 -08:00
Dmitri Smirnov f8e2db0717 Fix crashes, address test issues and adjust windows test script
Summary:
Add per-exe execution capability
  Add fix parsing of groups/tests
  Add timer test exclusion

 Fix unit tests
  Ifdef threadpool specific tests that do not pass on Vista threadpool.
  Remove spurious outout from prefix_test so test case listing works
  properly.
  Fix not using standard test directories results in file creation errors
  in sst_dump_test.

  BlobDb fixes:
    In C++ end() iterators can not be dereferenced. They are not valid.
	When deleting blob_db_ set it to nullptr before any other code executes.
	Not fixed:. On Windows you can not delete a file while it is open.
	[ RUN      ] BlobDBTest.ReadWhileGC
	d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
	IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied
	d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
	IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied

  write_batch
    Should not call front() if there is a chance the container is empty
Closes https://github.com/facebook/rocksdb/pull/3152

Differential Revision: D6293274

Pulled By: sagar0

fbshipit-source-id: 318c3717c22087fae13b18715dffb24565dbd956
2017-11-10 10:41:57 -08:00
Yi Wu 5e9e5a4702 Blob DB: Fix race condition between flush and write
Summary:
A race condition will happen when:
* a user thread writes a value, but it hits the write stop condition because there are too many un-flushed memtables, while holding blob_db_impl.write_mutex_.
* Flush is triggered and call flush begin listener and try to acquire blob_db_impl.write_mutex_.

Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3149

Differential Revision: D6279805

Pulled By: yiwu-arbug

fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40
2017-11-08 19:42:22 -08:00
Yi Wu 4f9f124347 Blob DB: use compression in file header instead of global options
Summary:
To fix the issue of failing to decompress existing value after reopen DB with a different compression settings.
Closes https://github.com/facebook/rocksdb/pull/3142

Differential Revision: D6267260

Pulled By: yiwu-arbug

fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f
2017-11-07 17:42:17 -08:00
Yi Wu be410dede8 Fix PinnableSlice move assignment
Summary:
After move assignment, we need to re-initialized the moved PinnableSlice.

Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move.
Closes https://github.com/facebook/rocksdb/pull/3127

Differential Revision: D6238585

Pulled By: yiwu-arbug

fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e
2017-11-03 18:13:21 -07:00
Yi Wu 62578d80c1 Blob DB: Add compaction filter to remove expired blob index entries
Summary:
After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries.
Closes https://github.com/facebook/rocksdb/pull/3090

Differential Revision: D6183812

Pulled By: yiwu-arbug

fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83
2017-11-02 17:27:38 -07:00
Yi Wu 7bfa88037e Blob DB: fix snapshot handling
Summary:
Blob db will keep blob file if data in the file is visible to an active snapshot. Before this patch it checks whether there is an active snapshot has sequence number greater than the earliest sequence in the file. This is problematic since we take snapshot on every read, if it keep having reads, old blob files will not be cleanup. Change to check if there is an active snapshot falls in the range of [earliest_sequence, obsolete_sequence) where obsolete sequence is
1. if data is relocated to another file by garbage collection, it is the latest sequence at the time garbage collection finish
2. otherwise, it is the latest sequence of the file
Closes https://github.com/facebook/rocksdb/pull/3087

Differential Revision: D6182519

Pulled By: yiwu-arbug

fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45
2017-11-02 15:58:27 -07:00
Yi Wu f662f8f0b6 Blob DB: option to enable garbage collection
Summary:
Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file.

Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it.
Closes https://github.com/facebook/rocksdb/pull/3117

Differential Revision: D6224756

Pulled By: yiwu-arbug

fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe
2017-11-02 15:58:27 -07:00
Yi Wu 167ba599ec Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test
Summary:
The test intent to wait until key being overwritten until proceed with garbage collection. It failed to wait for `PutUntil` finally finish. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3116

Differential Revision: D6222833

Pulled By: yiwu-arbug

fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5
2017-11-02 13:27:34 -07:00
Sagar Vemuri 25ac1697b4 Blob DB: Evict oldest blob file when close to blob db size limit
Summary:
Evict oldest blob file and put it in obsolete_files list when close to blob db size limit. The file will be delete when the `DeleteObsoleteFiles` background job runs next time.
For now I set `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option pre-maturely as there are already too many :) .
Closes https://github.com/facebook/rocksdb/pull/3094

Differential Revision: D6187340

Pulled By: sagar0

fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421
2017-11-02 12:11:21 -07:00
Yi Wu f6082d1944 Blob DB: cleanup unused options
Summary:
* cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes.
* cleanup eviction callback. we don't have plan to use it now.
* rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion.
Closes https://github.com/facebook/rocksdb/pull/3088

Differential Revision: D6182598

Pulled By: yiwu-arbug

fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad
2017-10-31 16:42:08 -07:00
Sagar Vemuri f5078dde2d Blob DB: Initialize all fields in Blob Header, Footer and Record structs
Summary:
Fixing un-itializations caught by valgrind.
Closes https://github.com/facebook/rocksdb/pull/3103

Differential Revision: D6200195

Pulled By: sagar0

fbshipit-source-id: bf35a3fb03eb1d308e4c5ce30dee1e345d7b03b3
2017-10-31 16:42:08 -07:00
Yi Wu 3ebb7ba7b9 Blob DB: update blob file format
Summary:
Changing blob file format and some code cleanup around the change. The change with blob log format are:
* Remove timestamp field in blob file header, blob file footer and blob records. The field is not being use and often confuse with expiration field.
* Blob file header now come with column family id, which always equal to default column family id. It leaves room for future support of column family.
* Compression field in blob file header now is a standalone byte (instead of compact encode with flags field)
* Blob file footer now come with its own crc.
* Key length now being uint64_t instead of uint32_t
* Blob CRC now checksum both key and value (instead of value only).
* Some reordering of the fields.

The list of cleanups:
* Better inline comments in blob_log_format.h
* rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively.
* simplify blob_db::Reader
* Move crc checking logic to inside blob_log_format.cc
Closes https://github.com/facebook/rocksdb/pull/3081

Differential Revision: D6171304

Pulled By: yiwu-arbug

fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af
2017-10-27 13:27:12 -07:00
Yi Wu 5a2a6483dc Blob DB: Inline small values in base DB
Summary:
Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values.

Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches.

See blob_index.h for the new blob index format. There are 4 cases when writing a new key:
* small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue)
* small value w/ TTL: put (type, expiration, value) to base db.
* large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db.
* large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db.
Closes https://github.com/facebook/rocksdb/pull/3066

Differential Revision: D6142115

Pulled By: yiwu-arbug

fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0
2017-10-26 12:30:54 -07:00
Sagar Vemuri 96e3a600ba Return write error on reaching blob dir size limit
Summary:
I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.)

Deleting the oldest file when `is_fifo` is true will be handled in a later PR.
Closes https://github.com/facebook/rocksdb/pull/3060

Differential Revision: D6136156

Pulled By: sagar0

fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c
2017-10-25 16:30:37 -07:00
Dmitri Smirnov d2a65c59e1 Fix unused var warnings in Release mode
Summary:
MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var.
Closes https://github.com/facebook/rocksdb/pull/3048

Differential Revision: D6126272

Pulled By: maysamyabandeh

fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f
2017-10-23 14:27:04 -07:00
Yi Wu eaaef91178 Blob DB: Store blob index as kTypeBlobIndex in base db
Summary:
Blob db insert blob index to base db as kTypeBlobIndex type, to tell apart values written by plain rocksdb or blob db. This is to make it possible to migrate from existing rocksdb to blob db.

Also with the patch blob db garbage collection get away from OptimisticTransaction. Instead it use a custom write callback to achieve similar behavior as OptimisticTransaction. This is because we need to pass the is_blob_index flag to DBImpl::Get but OptimisticTransaction don't support it.
Closes https://github.com/facebook/rocksdb/pull/3000

Differential Revision: D6050044

Pulled By: yiwu-arbug

fbshipit-source-id: 61dc72ab9977625e75f78cd968e7d8a3976e3632
2017-10-17 17:28:11 -07:00
Yi Wu 0552029b5c Blob DB: not writing sequence number as blob record footer
Summary:
Previously each time we write a blob we write blog_record_header + key + value + blob_record_footer to blob log. The footer only contains a sequence and a crc for the sequence number. The sequence number was used in garbage collection to verify the value is recent. After #2703 we moved to use optimistic transaction and no longer use sequence number from the footer. Remove the footer altogether.

There's another usage of sequence number and we are keeping it: Each blob log file keep track of sequence number range of keys in it, and use it to check if it is reference by a snapshot, before being deleted.
Closes https://github.com/facebook/rocksdb/pull/3005

Differential Revision: D6057585

Pulled By: yiwu-arbug

fbshipit-source-id: d6da53c457a316e9723f359a1b47facfc3ffe090
2017-10-17 12:13:08 -07:00
Zhongyi Xie e2548366e1 add GetLiveFiles and GetLiveFilesMetaData for BlobDB
Summary: Closes https://github.com/facebook/rocksdb/pull/2976

Differential Revision: D5994759

Pulled By: miasantreble

fbshipit-source-id: 985c31dccb957cb970c302f813cd07a1e8cb6438
2017-10-09 19:56:04 -07:00
Sagar Vemuri da29eba43b Enable WAL for blob index
Summary:
Enabled WAL, during GC, for blob index which is stored on regular RocksDB.
Closes https://github.com/facebook/rocksdb/pull/2975

Differential Revision: D5997384

Pulled By: sagar0

fbshipit-source-id: b76c1487d8b5be0e36c55e8d77ffe3d37d63d85b
2017-10-06 10:59:31 -07:00
Yi Wu dcd36a6aee Make it explicit blob db doesn't support CF
Summary:
Blob db doesn't currently support column families. Return NotSupported status explicitly.
Closes https://github.com/facebook/rocksdb/pull/2825

Differential Revision: D5757438

Pulled By: yiwu-arbug

fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa
2017-09-08 11:11:04 -07:00
Yi Wu 503db684f7 make blob file close synchronous
Summary:
Fixing flaky blob_db_test.

To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test.
Closes https://github.com/facebook/rocksdb/pull/2787

Differential Revision: D5699387

Pulled By: yiwu-arbug

fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a
2017-08-25 10:41:49 -07:00
yiwu-arbug 5b68b114f1 Blob db create a snapshot before every read
Summary:
If GC kicks in between

* A Get() reads index entry from base db.
* The Get() read from a blob file

The GC can delete the corresponding blob file, making the key not found. Fortunately we have existing logic to avoid deleting a blob file if it is referenced by a snapshot. So the fix is to explicitly create a snapshot before reading index entry from base db.
Closes https://github.com/facebook/rocksdb/pull/2754

Differential Revision: D5655956

Pulled By: yiwu-arbug

fbshipit-source-id: e4ccbc51331362542e7343175bbcbdea5830f544
2017-08-20 18:26:19 -07:00
yiwu-arbug 4624ae52c9 GC the oldest file when out of space
Summary:
When out of space, blob db should GC the oldest file. The current implementation GC the newest one instead. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2757

Differential Revision: D5657611

Pulled By: yiwu-arbug

fbshipit-source-id: 56c30a4c52e6ab04551dda8c5c46006d4070b28d
2017-08-20 17:11:06 -07:00
yiwu-arbug 29877ec7b4 Fix blob db crash during calculating write amp
Summary:
On initial call to BlobDBImpl::WaStats() `all_periods_write_` would be empty, so it will crash when we call pop_front() at line 1627. Apparently it is mean to pop only when `all_periods_write_.size() > kWriteAmplificationStatsPeriods`.

The whole write amp calculation doesn't seems to be correct and it is not being exposed. Will work on it later.

Test Plan
Change kWriteAmplificationStatsPeriodMillisecs to 1000 (1 second) and run db_bench --use_blob_db for 5 minutes.
Closes https://github.com/facebook/rocksdb/pull/2751

Differential Revision: D5648269

Pulled By: yiwu-arbug

fbshipit-source-id: b843d9a09bb5f9e1b713d101ec7b87e54b5115a4
2017-08-17 15:01:09 -07:00
Maysam Yabandeh eb6425303e Update WritePrepared with the pseudo code
Summary:
Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides a IsInSnapshot method that could be later called form the read path to decide if a version is in the read snapshot or it should other be skipped.

This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is that to have the API specified so that we can work on related tasks in parallel.
Closes https://github.com/facebook/rocksdb/pull/2713

Differential Revision: D5640021

Pulled By: maysamyabandeh

fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
2017-08-16 16:57:47 -07:00
yiwu-arbug e5a1b727c0 Fix blob DB transaction usage while GC
Summary:
While GC, blob DB use optimistic transaction to delete or replace the index entry in LSM, to guarantee correctness if there's a normal write writing to the same key. However, the previous implementation doesn't call SetSnapshot() nor use GetForUpdate() of transaction API, instead it do its own sequence number checking before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and the GC will delete or relocate the old version of the key by mistake. Update the code to property use GetForUpdate() to check the existing index entry.

After the patch the sequence number store with each blob record is useless, So I'm considering remove the sequence number from blob record, in another patch.
Closes https://github.com/facebook/rocksdb/pull/2703

Differential Revision: D5589178

Pulled By: yiwu-arbug

fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24
2017-08-11 12:43:17 -07:00
Maysam Yabandeh bdc056f8aa Refactor PessimisticTransaction
Summary:
This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and movies the existing impl of db-write-related to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
Closes https://github.com/facebook/rocksdb/pull/2691

Differential Revision: D5569464

Pulled By: maysamyabandeh

fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325
2017-08-07 16:12:29 -07:00
Yi Wu 0d4a2b7330 Avoid blob db call Sync() while writing
Summary:
The FsyncFiles background job call Fsync() periodically for blob files. However it can access WritableFileWriter concurrently with a Put() or Write(). And WritableFileWriter does not support concurrent access. It will lead to WritableFileWriter buffer being flush with same content twice, and blob file end up corrupted. Fixing by simply let FsyncFiles hold write_mutex_.
Closes https://github.com/facebook/rocksdb/pull/2685

Differential Revision: D5561908

Pulled By: yiwu-arbug

fbshipit-source-id: f0bb5bcab0e05694e053b8c49eab43640721e872
2017-08-04 13:12:07 -07:00
Yi Wu 92afe830f9 Update all blob db TTL and timestamps to uint64_t
Summary:
The current blob db implementation use mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency.
Closes https://github.com/facebook/rocksdb/pull/2683

Differential Revision: D5557103

Pulled By: yiwu-arbug

fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42
2017-08-03 17:57:30 -07:00
Yi Wu 0b814ba92d Allow concurrent writes to blob db
Summary:
I'm going with brute-force solution, just letting Put() and Write() holding a mutex before writing. May improve concurrent writing with finer granularity locking later.
Closes https://github.com/facebook/rocksdb/pull/2682

Differential Revision: D5552690

Pulled By: yiwu-arbug

fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c
2017-08-03 15:11:26 -07:00
Yi Wu 2c45ada4c4 Blob DB garbage collection should keep keys with newer version
Summary:
Fix the bug where if blob db garbage collection revmoe keys with newer version. It shouldn't delete the key from base db when sequence number in base db is not equal to the one in blob log.
Closes https://github.com/facebook/rocksdb/pull/2678

Differential Revision: D5549752

Pulled By: yiwu-arbug

fbshipit-source-id: abb8649260963b5c389748023970fd746279d227
2017-08-03 13:12:12 -07:00
Yi Wu 1900771bd2 Dump Blob DB options to info log
Summary:
* Dump blob db options to info log
* Remove BlobDBOptionsImpl to disallow dynamic cast *BlobDBOptions into *BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645
* Change some of the default options
* Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
Closes https://github.com/facebook/rocksdb/pull/2671

Differential Revision: D5529912

Pulled By: yiwu-arbug

fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
2017-08-01 13:01:47 -07:00
Siying Dong 21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Yi Wu 6083bc79f8 Blob DB TTL extractor
Summary:
Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL
extractor can be use to extract TTL from keys insert with Put or
WriteBatch. Change over existing extract_ttl_fn are:
* If value is changed, it will be return via std::string* (rather than Slice*). With Slice* the new value has to be part of the existing value. With std::string* the limitation is removed.
* It can optionally return TTL or expiration.

Other changes in this PR:
* replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests.
* add several TTL tests.
* other minor naming change.
Closes https://github.com/facebook/rocksdb/pull/2659

Differential Revision: D5512627

Pulled By: yiwu-arbug

fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440
2017-07-27 23:26:04 -07:00
Sagar Vemuri 72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao 1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Yi Wu 0302da47a7 Reduce blob db noisy logging
Summary:
Remove some of the per-key logging by blob db to reduce noise.
Closes https://github.com/facebook/rocksdb/pull/2587

Differential Revision: D5429115

Pulled By: yiwu-arbug

fbshipit-source-id: b89328282fb8b3c64923ce48738c16017ce7feaf
2017-07-20 15:02:31 -07:00
Siying Dong 3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00