rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-26 16:30:56 +00:00

Author	SHA1	Message	Date
Zhongyi Xie	d68f9f4580	simplify include directive involving inttypes (#5402 ) Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03	2019-06-06 13:56:07 -07:00
Vijay Nadimpalli	49c5a12dbe	Organizing rocksdb/db directory Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366	2019-05-31 11:57:01 -07:00
Yanqin Jin	83f7a8eed0	Fix compilation error in LITE mode (#5391 ) Summary: Add macro ROCKSDB_LITE to fix compilation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5391 Differential Revision: D15574522 Pulled By: riversand963 fbshipit-source-id: 95aea83c5d9b2bf98a3ba0ef9167b63c9be2988b	2019-05-31 08:32:22 -07:00
Yanqin Jin	b9f5900658	Fix WAL replay by skipping old write batches (#5170 ) Summary: 1. Fix a bug in WAL replay in which write batches with old sequence numbers are mistakenly inserted into memtables. 2. Add support for benchmarking secondary instance to db_bench_tool. With changes made in this PR, we can start benchmarking secondary instance using two processes. It is also possible to vary the frequency at which the secondary instance tries to catch up with the primary. The info log of the secondary can be found in a directory whose path can be specified with '-secondary_path'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5170 Differential Revision: D15564608 Pulled By: riversand963 fbshipit-source-id: ce97688ed3d33f69d3a0b9266ebbbbf887aa0ec8	2019-05-30 19:33:33 -07:00
Siying Dong	8843129ece	Move some memory related files from util/ to memory/ (#5382 ) Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5	2019-05-30 17:44:09 -07:00
Siying Dong	e9e0101ca4	Move test related files under util/ to test_util/ (#5377 ) Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c	2019-05-30 11:25:51 -07:00
Maysam Yabandeh	eab4f49a2c	WritePrepared: skip_concurrency_control option (#5330 ) Summary: This enables the user to set TransactionDBOptions::skip_concurrency_control so the standard `DB::Write(const WriteOptions& opts, WriteBatch* updates)` would skip the concurrency control. This would give higher throughput to the users who know their use case doesn't need concurrency control. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5330 Differential Revision: D15525932 Pulled By: maysamyabandeh fbshipit-source-id: 68421ac1ba34f549a4a8de9ce4c2dccf6fb4b06b	2019-05-28 16:29:45 -07:00
Zhichao Cao	a13026fb2f	Added trace replay fast forward function (#5273 ) Summary: In the current db_bench trace replay, the replay process strictly follows the timestamp to issue the queries. In some cases, user does not care about the time. Therefore, fast forward is needed for users to speed up the replay process. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5273 Differential Revision: D15389232 Pulled By: zhichao-cao fbshipit-source-id: 735d629b9d2a167b05af3e4fa0ddf9d5d0be1806	2019-05-16 20:21:18 -07:00
Maysam Yabandeh	f383641a1d	Unordered Writes (#5218 ) Summary: Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees. The patch is prepared based on an original by siying Existing unit tests are extended to include unordered_write option. Benchmark Results: ``` TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions --unordered_write=1 ``` With WAL - Vanilla RocksDB: 78.6 MB/s - WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x) - unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees) Without WAL - Vanilla RocksDB: 111.3 MB/s - WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x) - unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees) - WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x) Limitations: - The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218 Differential Revision: D15219029 Pulled By: maysamyabandeh fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759	2019-05-13 17:47:21 -07:00
Yi Wu	92c60547fe	db_bench: fix hang on IO error (#5300 ) Summary: db_bench will wait indefinitely if there's background error. Fix by pass `abs_time_us` to cond var. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5300 Differential Revision: D15319945 Pulled By: miasantreble fbshipit-source-id: 0034fb7f6ec7c3303c4ccf26e54c20fbdac8ab44	2019-05-13 11:30:35 -07:00
Sagar Vemuri	3548e4220d	Improve explicit user readahead performance (#5246 ) Summary: Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`. 1. Stop creating new table readers when the user explicitly sets readahead size. 2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases). 3. Add `readahead_size` to db_bench. Benchmarks: https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737 For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%. For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%. Test Plan: Updated `DBIteratorTest.ReadAhead` test to make sure that: - no new table readers are created for iterators on setting ReadOptions.readahead_size - At least "readahead" number of bytes are actually getting read on each iterator read. TODO later: - Use similar logic for compactions as well. - This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246 Differential Revision: D15107946 Pulled By: sagar0 fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea	2019-04-26 21:24:10 -07:00
Adam Retter	990b2f4cb3	Fix compilation on db_bench_tool.cc on Windows (#5227 ) Summary: I needed this change to be able to build the v6.0.1 release on Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227 Differential Revision: D15033815 Pulled By: sagar0 fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911	2019-04-23 11:16:51 -07:00
Zhongyi Xie	baa5302447	Avoid double-compacting data in bottom level in manual compactions (#5138 ) Summary: Depending on the config, manual compaction (leveled compaction style) does following compactions: L0->L1 L1->L2 ... Ln-1 -> Ln Ln -> Ln The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only. Resolves issue https://github.com/facebook/rocksdb/issues/4995 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138 Differential Revision: D14940106 Pulled By: miasantreble fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5	2019-04-16 23:32:20 -07:00
Yi Wu	b70967aac7	db_bench: support seek to non-exist prefix (#5163 ) Summary: Add `--seek_missing_prefix` flag to db_bench to allow benchmarking seeking to non-existing prefix. Usage example: ``` ./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10 ./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true ``` Also adding `--total_order_seek` and `--prefix_same_as_start` flags. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163 Differential Revision: D14935724 Pulled By: riversand963 fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a	2019-04-15 10:54:58 -07:00
Yanqin Jin	3189398c00	Fix bugs detected by clang analyzer (#5185 ) Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3	2019-04-12 10:45:56 -07:00
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 \| 2 \| 4 \| 8 \| 16 \| 32 Random pattern (Stride length 0) 4.158 \| 4.109 \| 4.026 \| 4.05 \| 4.1 \| 4.074 - Get 4.438 \| 4.302 \| 4.165 \| 4.122 \| 4.096 \| 4.075 - MultiGet (no batching) 4.461 \| 4.256 \| 4.277 \| 4.11 \| 4.182 \| 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 \| 3.659 \| 3.248 \| 2.99 \| 2.84 \| 2.753 4.429 \| 3.728 \| 3.406 \| 3.053 \| 2.911 \| 2.781 4.452 \| 3.45 \| 2.833 \| 2.451 \| 2.233 \| 2.135 Good locality (Stride length 256) 4.066 \| 3.786 \| 3.581 \| 3.447 \| 3.415 \| 3.232 4.406 \| 4.005 \| 3.644 \| 3.49 \| 3.381 \| 3.268 4.393 \| 3.649 \| 3.186 \| 2.882 \| 2.676 \| 2.62 Medium locality (Stride length 4096) 4.012 \| 3.922 \| 3.768 \| 3.61 \| 3.582 \| 3.555 4.364 \| 4.057 \| 3.791 \| 3.65 \| 3.57 \| 3.465 4.479 \| 3.758 \| 3.316 \| 3.077 \| 2.959 \| 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	2019-04-11 14:28:26 -07:00
Siying Dong	2b4d5ceb47	Remove some "using std::..." from header files. (#5113 ) Summary: The code convention we are following, Google C++ Style, discourage alias in header files, especially public headers: https://google.github.io/styleguide/cppguide.html#Aliases Remove some of them. Might removed some from .cc files as well to be consistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113 Differential Revision: D14633030 Pulled By: siying fbshipit-source-id: b990edc919d5de60295992284f980195e501d424	2019-03-27 10:28:21 -07:00
Shobhit Dayal	b45b1cde3e	Feature for sampling and reporting compressibility (#4842 ) Summary: This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa	2019-03-18 12:15:34 -07:00
Zhichao Cao	05ebfebc17	Fixed the potential stack overflow of MixGraph in db_bench (#5051 ) Summary: In the MixGraph benchmark of db_bench, The max buffer size used for value of KV-pair might be extremely large (64MB), which might cause function stack overflow in some platforms, reduced to 1MB. Added the finished ops printing in MixGraph benchmark. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5051 Differential Revision: D14379571 Pulled By: zhichao-cao fbshipit-source-id: 24084fbe38f60f2902d9a40f6bc9a25e4e2c9bb9	2019-03-08 14:10:17 -08:00
Andrew Kryczka	18d2e4beb7	Run db_bench on database generated externally (#5017 ) Summary: Added an option, `-use_existing_keys`, which can be set to run benchmarks against an arbitrary existing database. Now users can benchmark against their actual database rather than synthetic data. Before the run begins, it loads all the keys into memory, then uses that set of keys rather than synthesizing new ones in `GenerateKeyFromInt`. This is mainly intended for small-scale DBs where the memory consumption is not a concern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5017 Differential Revision: D14270303 Pulled By: riversand963 fbshipit-source-id: 6328df9dffb5e19170270dd00a69f4bbe424e5ed	2019-03-01 11:19:03 -08:00
Siying Dong	aef763b6d6	Make statistics's stats_level change thread-safe (#5030 ) Summary: Right now, users can change statistics.stats_level while DB is running, but TSAN may report data race. We make stats_level_ to be atomic, and access them using accessors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030 Differential Revision: D14267519 Pulled By: siying fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73	2019-03-01 10:42:09 -08:00
Siying Dong	5e298f865b	Add two more StatsLevel (#5027 ) Summary: Statistics cost too much CPU for some use cases. Add two stats levels so that people can choose to skip two types of expensive stats, timers and histograms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027 Differential Revision: D14252765 Pulled By: siying fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592	2019-02-28 10:27:59 -08:00
Zhongyi Xie	c4f5d0aa15	add GetStatsHistory to retrieve stats snapshots (#4748 ) Summary: This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0. (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748 Differential Revision: D13961471 Pulled By: miasantreble fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04	2019-02-20 15:52:54 -08:00
Zhongyi Xie	ed995c6a69	add whole key bloom filter support in memtables (#4985 ) Summary: MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985 Differential Revision: D14118257 Pulled By: miasantreble fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea	2019-02-19 12:15:39 -08:00
Siying Dong	4db46aa2e6	Fix LITE Build (#4989 ) Summary: LITE mode has EventListener to be an empty class. However in db_bench, it is used. When "override" is added to the functions, the build breaks. Fix it by keeping the listener empty in LITE mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989 Differential Revision: D14108132 Pulled By: siying fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697	2019-02-15 16:13:11 -08:00
Aubin Sanyal	3231a2e581	Deprecate ttl option from CompactionOptionsFIFO (#4965 ) Summary: We introduced ttl option in CompactionOptionsFIFO when ttl-based file deletion (compaction) was supported only as part of FIFO Compaction. But with the extension of ttl semantics even to Level compaction, CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start using ColumnFamilyOptions.ttl for FIFO compaction as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965 Differential Revision: D14072960 Pulled By: sagar0 fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e	2019-02-15 09:51:41 -08:00
Michael Liu	ca89ac2ba9	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Change are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a	2019-02-14 14:41:36 -08:00
Siying Dong	cf3a671733	Remove cuckoo hash memtable (#4953 ) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed	2019-02-07 16:15:27 -08:00
Siying Dong	d9c9f3c809	db_bench: fix "micros/op" reporting (#4949 ) Summary: `4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)` changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why. Considering that this is a counter-intuitive reporting, Reverting the change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949 Differential Revision: D13964684 Pulled By: siying fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc	2019-02-05 17:20:02 -08:00
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	2019-01-29 17:24:00 -08:00
zhichao-cao	e2547103fd	Fix the build error caused by the dynamic array (#4918 ) Summary: In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256. Tested with make check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918 Differential Revision: D13844298 Pulled By: sagar0 fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab	2019-01-28 12:39:57 -08:00
Andrew Kryczka	e242fa4664	Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923 ) Summary: - When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1` - It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090 - I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923 Differential Revision: D13827226 Pulled By: ajkr fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947	2019-01-28 11:26:32 -08:00
Sagar Vemuri	0cead31d10	Fix Clang static analyzer warning in db_bench (#4910 ) Summary: Fixed clang static analyzer warning about division by 0. ``` ar: creating librocksdb_debug.a tools/db_bench_tool.cc:4650:43: warning: Division by zero int pos = static_cast<int>(rand_num % range_); ~~~~~~~~~^~~~~~~~ 1 warning generated. make: *** [analyze] Error 1 ``` This is from the new code I recently merged in `ce8e88d2d7`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910 Differential Revision: D13788037 Pulled By: sagar0 fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea	2019-01-23 13:33:02 -08:00
Zhongyi Xie	cbe0239270	add cast to avoid loss of precision error (#4906 ) Summary: this PR address the following error: > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32] s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size)); Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906 Differential Revision: D13780185 Pulled By: miasantreble fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3	2019-01-22 22:44:17 -08:00
Zhichao Cao	ce8e88d2d7	Generate mixed workload with Get, Put, Seek in db_bench (#4788 ) Summary: Based on the specific workload models (key access distribution, value size distribution, and iterator scan length distribution, the QPS variation), the MixGraph benchmark generate the synthetic workload according to these distributions which can reflect the real-world workload characteristics. After user enable the tracing function, they will get the trace file. By analyzing the trace file with the trace_analyzer tool, user can generate a set of statistic data files including. The _accessed_key_stats.txt, -accessed_value_size_distribution.txt, -iterator_length_distribution.txt, and -qps_stats.txt are mainly used to fit the Matlab model fitting. After that, user can get the parameters of the workload distributions (the modeling details are described: [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer)) The key access distribution follows the The two-term power model. The probability density function is: `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench For the value size distribution and iterator scan length distribution, they both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k(x-theta)/sigma))^{-1-1/k)`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, users can find the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and [Matalb page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html) As for the QPS, it follows the diurnal pattern. So Sine is a good model to fit it. `F(x) = sine_asin(sine_bx + sine_c) + sine_d`. The trace_will tell you the average QPS in the print out resutls, which is sine_d. After user fit the "-qps_stats.txt" to the Matlab model, user can get the sine_a, sine_b, and sine_c. By using the 4 parameters, user can control the QPS variation including the period, average, changes. To use the bench mark, user can indicate the following parameters as examples: ``` -benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788 Differential Revision: D13573940 Pulled By: sagar0 fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222	2019-01-22 10:44:26 -08:00
Andrew Kryczka	01013ae766	Digest ZSTD compression dictionary once when writing SST file (#4849 ) Summary: This is essentially a re-submission of #4251 with a few improvements: - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict` - Eliminated `Init` functions. Instead do all initialization work in constructors. - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849 Differential Revision: D13606039 Pulled By: ajkr fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447	2019-01-18 19:12:57 -08:00
DorianZheng	2670fe8c73	Get `CompactionJobInfo` from CompactFiles Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716 Differential Revision: D13207677 Pulled By: ajkr fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b	2018-12-13 14:21:24 -08:00
Sagar Vemuri	70645355ad	Move FIFOCompactionPicker to a separate file (#4724 ) Summary: Summary: Simplified the code layout by moving FIFOCompactionPicker to a separate file. Why?: While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724 Differential Revision: D13227914 Pulled By: sagar0 fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3	2018-11-29 16:04:52 -08:00
Sagar Vemuri	c94f073e5e	Fix Mac build break in casting (#4722 ) Summary: Mac build is failing with the below error: ``` $ make db_bench -j8 ... ... tools/db_bench_tool.cc:4583:25: error: no matching function for call to 'max' (uint64_t)std::max(0l, seek_pos - FLAGS_max_scan_distance), ^~~~~~~~ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2717:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('long' vs. 'long long') max(const _Tp& __a, const _Tp& __b) ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2727:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'long' max(initializer_list<_Tp> __t, _Compare __comp) ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2709:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided max(const _Tp& __a, const _Tp& __b, _Compare __comp) ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2735:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided max(initializer_list<_Tp> __t) ^ 1 error generated. make: *** [tools/db_bench_tool.o] Error 1 ``` My compiler version: Mac OS X Mojave ``` $ clang++ --version Apple LLVM version 10.0.0 (clang-1000.11.45.5) Target: x86_64-apple-darwin18.2.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4722 Differential Revision: D13220196 Pulled By: sagar0 fbshipit-source-id: 01e5e928288a5613027c83a26ad8aedf04438b14	2018-11-27 13:30:16 -08:00
Po-Chuan Hsieh	60deb4485e	Fix build with ROCKSDB_LITE and -Wunused-private-field (#4715 ) Summary: The error message of databases/rocksdb-lite (FreeBSD port) is as follows: ``` tools/db_bench_tool.cc:1976:16: error: private field 'trace_options_' is not used [-Werror,-Wunused-private-field] TraceOptions trace_options_; ^ ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4715 Differential Revision: D13207902 Pulled By: ajkr fbshipit-source-id: be3c612eba656aeddb77e35e2f201dd25dc92f7e	2018-11-26 21:35:38 -08:00
Abhishek Madan	0ed738fdd0	Add max_scan_distance flag to db_bench (#4660 ) Summary: The new flag makes it possible to constrain iterator traversal by the upper/lower bound the iterator is expected to pass. This allows seekrandom results to be more easily comparable between DBs with and without deletions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4660 Differential Revision: D13053111 Pulled By: abhimadan fbshipit-source-id: 33e250f2e2d210b54c7726399da30a33f723c33c	2018-11-14 10:46:12 -08:00
Sagar Vemuri	dc3528077a	Update all unique/shared_ptr instances to be qualified with namespace std (#4638 ) Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name ".cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8	2018-11-09 11:19:58 -08:00
Zhongyi Xie	fe0d23059d	Fix two contrun job failures (#4587 ) Summary: Currently there are two contrun test failures: * rocksdb-contrun-lite: > tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char)’: tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’ rocksdb::DumpMallocStats(&stats_string); ^ make: * [tools/db_bench_tool.o] Error 1 * rocksdb-contrun-unity: > In file included from unity.cc:44:0: db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’: db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous auto cmp = ParsedInternalKeyComparator(icmp_); This PR will fix them Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587 Differential Revision: D10846554 Pulled By: miasantreble fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93	2018-10-24 20:16:45 -07:00
Yi Wu	0415244bfa	option to print malloc stats at the end of db_bench (#4582 ) Summary: Option to print malloc stats to stdout at the end of db_bench. This is different from `--dump_malloc_stats`, which periodically print the same information to LOG file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4582 Differential Revision: D10520814 Pulled By: yiwu-arbug fbshipit-source-id: beff5e514e414079d31092b630813f82939ffe5c	2018-10-24 11:39:05 -07:00
Simon Grätzer	f959e88048	Fix printf formatting on MacOS (#4533 ) Summary: On MacOS with clang the compilation of _tools/db_bench_tool.cc_ always fails because the format used in a `fprintf` call has the wrong type. This PR should hopefully fix this issue ``` tools/db_bench_tool.cc:4233:61: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long') ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4533 Differential Revision: D10471657 Pulled By: maysamyabandeh fbshipit-source-id: f20f5f3756d3571b586c895c845d0d4d1e34a398	2018-10-19 14:46:09 -07:00
Abhishek Madan	35cd754a6d	Add writes_before_delete_range flag to db_bench (#4538 ) Summary: The new flag allows tombstones to be generated after enough keys have been written to the database, which makes it easier to ensure that tombstones cover a lot of keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4538 Differential Revision: D10455685 Pulled By: abhimadan fbshipit-source-id: f25d5421745a353c830dea12b79784e852056551	2018-10-18 17:19:59 -07:00
Zhongyi Xie	d6ec288703	Add PerfContextByLevel to provide per level perf context information (#4226 ) Summary: Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level. This will be helpful when analyzing point and range query performance as well as tuning bloom filter Also replaced __thread with thread_local keyword for perf_context Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226 Differential Revision: D10369509 Pulled By: miasantreble fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82	2018-10-17 11:19:40 -07:00
Igor Canadi	1cf5deb8fd	Introduce CacheAllocator, a custom allocator for cache blocks (#4437 ) Summary: This is a conceptually simple change, but it touches many files to pass the allocator through function calls. We introduce CacheAllocator, which can be used by clients to configure custom allocator for cache blocks. Our motivation is to hook this up with folly's `JemallocNodumpAllocator` (`f43ce6d686/folly/experimental/JemallocNodumpAllocator.h`), but there are many other possible use cases. Additionally, this commit cleans up memory allocation in `util/compression.h`, making sure that all allocations are wrapped in a unique_ptr as soon as possible. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437 Differential Revision: D10132814 Pulled By: yiwu-arbug fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564	2018-10-02 17:24:58 -07:00
Abhishek Madan	519f8b145f	Generate appropriate number of keys in db_bench (#4404 ) Summary: If range tombstones are generated every few writes, the KeyGenerator's limit is now extended to account for the additional Next() calls. This is primarily important for `filluniquerandom` benchmarks that enforce the call limit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404 Differential Revision: D9949326 Pulled By: abhimadan fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38	2018-09-19 16:28:21 -07:00
Anand Ananthabhotla	a27fce408e	Auto recovery from out of space errors (#4164 ) Summary: This commit implements automatic recovery from a Status::NoSpace() error during background operations such as write callback, flush and compaction. The broad design is as follows - 1. Compaction errors are treated as soft errors and don't put the database in read-only mode. A compaction is delayed until enough free disk space is available to accomodate the compaction outputs, which is estimated based on the input size. This means that users can continue to write, and we rely on the WriteController to delay or stop writes if the compaction debt becomes too high due to persistent low disk space condition 2. Errors during write callback and flush are treated as hard errors, i.e the database is put in read-only mode and goes back to read-write only fater certain recovery actions are taken. 3. Both types of recovery rely on the SstFileManagerImpl to poll for sufficient disk space. We assume that there is a 1-1 mapping between an SFM and the underlying OS storage container. For cases where multiple DBs are hosted on a single storage container, the user is expected to allocate a single SFM instance and use the same one for all the DBs. If no SFM is specified by the user, DBImpl::Open() will allocate one, but this will be one per DB and each DB will recover independently. The recovery implemented by SFM is as follows - a) On the first occurance of an out of space error during compaction, subsequent compactions will be delayed until the disk free space check indicates enough available space. The required space is computed as the sum of input sizes. b) The free space check requirement will be removed once the amount of free space is greater than the size reserved by in progress compactions when the first error occured c) If the out of space error is a hard error, a background thread in SFM will poll for sufficient headroom before triggering the recovery of the database and putting it in write-only mode. The headroom is calculated as the sum of the write_buffer_size of all the DB instances associated with the SFM 4. EventListener callbacks will be called at the start and completion of automatic recovery. Users can disable the auto recov ery in the start callback, and later initiate it manually by calling DB::Resume() Todo: 1. More extensive testing 2. Add disk full condition to db_stress (follow-on PR) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164 Differential Revision: D9846378 Pulled By: anand1976 fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a	2018-09-15 13:43:04 -07:00

1 2 3 4 5

204 commits