rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-30 04:41:49 +00:00

Author	SHA1	Message	Date
Yanqin Jin	210b49cac9	Disable pipelined write in atomic flush stress test (#5266 ) Summary: Since currently pipelined write allows one thread to perform memtable writes while another thread is traversing the `flush_scheduler_`, it will cause an assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests, we temporarily disable pipelined write when atomic flush is enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266 Differential Revision: D15142285 Pulled By: riversand963 fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831	2019-04-30 08:12:42 -07:00
Sagar Vemuri	3548e4220d	Improve explicit user readahead performance (#5246 ) Summary: Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`. 1. Stop creating new table readers when the user explicitly sets readahead size. 2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases). 3. Add `readahead_size` to db_bench. Benchmarks: https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737 For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%. For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%. Test Plan: Updated `DBIteratorTest.ReadAhead` test to make sure that: - no new table readers are created for iterators on setting ReadOptions.readahead_size - At least "readahead" number of bytes are actually getting read on each iterator read. TODO later: - Use similar logic for compactions as well. - This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246 Differential Revision: D15107946 Pulled By: sagar0 fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea	2019-04-26 21:24:10 -07:00
Adam Retter	990b2f4cb3	Fix compilation on db_bench_tool.cc on Windows (#5227 ) Summary: I needed this change to be able to build the v6.0.1 release on Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227 Differential Revision: D15033815 Pulled By: sagar0 fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911	2019-04-23 11:16:51 -07:00
Fosco Marotto	6c2bf9e916	Add copyright headers per FB open-source checkup tool. (#5199 ) Summary: internal task: T35568575 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199 Differential Revision: D14962794 Pulled By: gfosco fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe	2019-04-18 10:55:01 -07:00
Zhongyi Xie	baa5302447	Avoid double-compacting data in bottom level in manual compactions (#5138 ) Summary: Depending on the config, manual compaction (leveled compaction style) does following compactions: L0->L1 L1->L2 ... Ln-1 -> Ln Ln -> Ln The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only. Resolves issue https://github.com/facebook/rocksdb/issues/4995 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138 Differential Revision: D14940106 Pulled By: miasantreble fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5	2019-04-16 23:32:20 -07:00
Yi Wu	b70967aac7	db_bench: support seek to non-exist prefix (#5163 ) Summary: Add `--seek_missing_prefix` flag to db_bench to allow benchmarking seeking to non-existing prefix. Usage example: ``` ./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10 ./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true ``` Also adding `--total_order_seek` and `--prefix_same_as_start` flags. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163 Differential Revision: D14935724 Pulled By: riversand963 fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a	2019-04-15 10:54:58 -07:00
Yanqin Jin	3189398c00	Fix bugs detected by clang analyzer (#5185 ) Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3	2019-04-12 10:45:56 -07:00
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 \| 2 \| 4 \| 8 \| 16 \| 32 Random pattern (Stride length 0) 4.158 \| 4.109 \| 4.026 \| 4.05 \| 4.1 \| 4.074 - Get 4.438 \| 4.302 \| 4.165 \| 4.122 \| 4.096 \| 4.075 - MultiGet (no batching) 4.461 \| 4.256 \| 4.277 \| 4.11 \| 4.182 \| 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 \| 3.659 \| 3.248 \| 2.99 \| 2.84 \| 2.753 4.429 \| 3.728 \| 3.406 \| 3.053 \| 2.911 \| 2.781 4.452 \| 3.45 \| 2.833 \| 2.451 \| 2.233 \| 2.135 Good locality (Stride length 256) 4.066 \| 3.786 \| 3.581 \| 3.447 \| 3.415 \| 3.232 4.406 \| 4.005 \| 3.644 \| 3.49 \| 3.381 \| 3.268 4.393 \| 3.649 \| 3.186 \| 2.882 \| 2.676 \| 2.62 Medium locality (Stride length 4096) 4.012 \| 3.922 \| 3.768 \| 3.61 \| 3.582 \| 3.555 4.364 \| 4.057 \| 3.791 \| 3.65 \| 3.57 \| 3.465 4.479 \| 3.758 \| 3.316 \| 3.077 \| 2.959 \| 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	2019-04-11 14:28:26 -07:00
datonli	f0edf9d575	#5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152 ) Summary: mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152 Differential Revision: D14779409 Pulled By: siying fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb	2019-04-04 11:38:19 -07:00
Yanqin Jin	d77476ef55	Fix db_stress for custom env (#5122 ) Summary: Fix some hdfs-related code so that it can compile and run 'db_stress' Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122 Differential Revision: D14675495 Pulled By: riversand963 fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a	2019-03-28 19:20:27 -07:00
Siying Dong	2b4d5ceb47	Remove some "using std::..." from header files. (#5113 ) Summary: The code convention we are following, Google C++ Style, discourage alias in header files, especially public headers: https://google.github.io/styleguide/cppguide.html#Aliases Remove some of them. Might removed some from .cc files as well to be consistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113 Differential Revision: D14633030 Pulled By: siying fbshipit-source-id: b990edc919d5de60295992284f980195e501d424	2019-03-27 10:28:21 -07:00
Yanqin Jin	9358178edc	Support for single-primary, multi-secondary instances (#4899 ) Summary: This PR allows RocksDB to run in single-primary, multi-secondary process mode. The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary. Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`. This PR has several components: 1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary. 2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue. 3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`. 3.1 Tail the primary's MANIFEST during recovery. 3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`. 3.3 Tailing WAL will be in a future PR. 4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899 Differential Revision: D14510945 Pulled By: riversand963 fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886	2019-03-26 16:45:31 -07:00
Zhongyi Xie	52e6404e0f	ldb command parsing: allow option values to contain equals signs (#5088 ) Summary: Right now ldb command doesn't allow cases where option values contain equals sign. For example, ``` ldb --db=/tmp/test scan --from='q=3' --max_keys=1 ``` after parsing, ldb will have one option 'db', 'max_keys' and one flag 'from'. This PR updates the parsing logic so that it now supports the above mentioned cases Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088 Differential Revision: D14600869 Pulled By: miasantreble fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e	2019-03-25 13:23:11 -07:00
Shobhit Dayal	b45b1cde3e	Feature for sampling and reporting compressibility (#4842 ) Summary: This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa	2019-03-18 12:15:34 -07:00
Andrew Kryczka	2263f86901	exercise WAL recycling in crash test (#5070 ) Summary: Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070 Differential Revision: D14470085 Pulled By: miasantreble fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1	2019-03-15 12:03:26 -07:00
Zhichao Cao	dcde292c3b	Add the -try_process_corrupted_trace option to trace_analyzer (#5067 ) Summary: In the current trace_analyzer implementation, once the trace file has corrupted content, which can be caused by unexpected tracing operations or other reasons, trace_analyzer will print the error and stop analyzing. By adding the -try_process_corrupted_trace option, user can try to process the corrupted trace file and get the analyzing results of the trace records from the beginning to the the first corrupted point in the trace file. Analyzing might fail even this option is enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5067 Differential Revision: D14433037 Pulled By: zhichao-cao fbshipit-source-id: d095233ba371726869af0def0cdee23b69896831	2019-03-14 20:03:01 -07:00
Andrew Kryczka	5a5c0492db	ldb: set `total_order_seek` for scans (#5066 ) Summary: Without `total_order_seek=true`, using this command with `prefix_extractor` set skips over lots of keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5066 Differential Revision: D14425967 Pulled By: sagar0 fbshipit-source-id: f6f142733258d92604f920615be9266e1fe797f8	2019-03-12 13:10:39 -07:00
Zhichao Cao	05ebfebc17	Fixed the potential stack overflow of MixGraph in db_bench (#5051 ) Summary: In the MixGraph benchmark of db_bench, The max buffer size used for value of KV-pair might be extremely large (64MB), which might cause function stack overflow in some platforms, reduced to 1MB. Added the finished ops printing in MixGraph benchmark. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5051 Differential Revision: D14379571 Pulled By: zhichao-cao fbshipit-source-id: 24084fbe38f60f2902d9a40f6bc9a25e4e2c9bb9	2019-03-08 14:10:17 -08:00
Andrew Kryczka	18d2e4beb7	Run db_bench on database generated externally (#5017 ) Summary: Added an option, `-use_existing_keys`, which can be set to run benchmarks against an arbitrary existing database. Now users can benchmark against their actual database rather than synthetic data. Before the run begins, it loads all the keys into memory, then uses that set of keys rather than synthesizing new ones in `GenerateKeyFromInt`. This is mainly intended for small-scale DBs where the memory consumption is not a concern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5017 Differential Revision: D14270303 Pulled By: riversand963 fbshipit-source-id: 6328df9dffb5e19170270dd00a69f4bbe424e5ed	2019-03-01 11:19:03 -08:00
Siying Dong	aef763b6d6	Make statistics's stats_level change thread-safe (#5030 ) Summary: Right now, users can change statistics.stats_level while DB is running, but TSAN may report data race. We make stats_level_ to be atomic, and access them using accessors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030 Differential Revision: D14267519 Pulled By: siying fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73	2019-03-01 10:42:09 -08:00
Maysam Yabandeh	0b80f6b380	WritePrepared: script to analyze stress test failures (#5033 ) Summary: This the hackish script we used to find the root cause of failures in transaction stress tests. It is not well-written and does not require rigorous reviewing but it is better than starting from scratch each time we observe an issue. The stress tests would just say that at which snapshots the sum of all the keys in a set is inconsistent with another set. To help debugging one need to know which key exactly returned inconsistent results. The script looks at the transactions between two conflicting snapshots, and performs thee changes manually to see for which key the read value was inconsistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5033 Differential Revision: D14280362 Pulled By: maysamyabandeh fbshipit-source-id: d5826055c46711460ba81480d96cb5ea082814a5	2019-03-01 09:18:40 -08:00
Siying Dong	5e298f865b	Add two more StatsLevel (#5027 ) Summary: Statistics cost too much CPU for some use cases. Add two stats levels so that people can choose to skip two types of expensive stats, timers and histograms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027 Differential Revision: D14252765 Pulled By: siying fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592	2019-02-28 10:27:59 -08:00
Zhongyi Xie	c4f5d0aa15	add GetStatsHistory to retrieve stats snapshots (#4748 ) Summary: This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0. (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748 Differential Revision: D13961471 Pulled By: miasantreble fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04	2019-02-20 15:52:54 -08:00
Zhongyi Xie	ed995c6a69	add whole key bloom filter support in memtables (#4985 ) Summary: MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985 Differential Revision: D14118257 Pulled By: miasantreble fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea	2019-02-19 12:15:39 -08:00
Siying Dong	4db46aa2e6	Fix LITE Build (#4989 ) Summary: LITE mode has EventListener to be an empty class. However in db_bench, it is used. When "override" is added to the functions, the build breaks. Fix it by keeping the listener empty in LITE mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989 Differential Revision: D14108132 Pulled By: siying fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697	2019-02-15 16:13:11 -08:00
Aubin Sanyal	3231a2e581	Deprecate ttl option from CompactionOptionsFIFO (#4965 ) Summary: We introduced ttl option in CompactionOptionsFIFO when ttl-based file deletion (compaction) was supported only as part of FIFO Compaction. But with the extension of ttl semantics even to Level compaction, CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start using ColumnFamilyOptions.ttl for FIFO compaction as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965 Differential Revision: D14072960 Pulled By: sagar0 fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e	2019-02-15 09:51:41 -08:00
Michael Liu	ca89ac2ba9	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Change are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a	2019-02-14 14:41:36 -08:00
Yanqin Jin	4fc442029a	Avoid using kInAtomicGroup tag for single-cf op (#4981 ) Summary: if an operation just involves a single column family, then we do not have to set the kInAtomicGroup tag when writing to MANIFEST. This change can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize kInAtomicGroup tag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981 Differential Revision: D14072687 Pulled By: riversand963 fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4	2019-02-13 18:33:42 -08:00
Andrew Kryczka	62f70f6d14	Reduce scope of compression dictionary to single SST (#4952 ) Summary: Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio. So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include: - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called. - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up. - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952 Differential Revision: D13967980 Pulled By: ajkr fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f	2019-02-11 19:47:32 -08:00
Andrew Kryczka	1218704b61	Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957 ) Summary: Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957 Differential Revision: D13994302 Pulled By: ajkr fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65	2019-02-11 14:56:39 -08:00
Siying Dong	cf3a671733	Remove cuckoo hash memtable (#4953 ) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed	2019-02-07 16:15:27 -08:00
Siying Dong	d9c9f3c809	db_bench: fix "micros/op" reporting (#4949 ) Summary: `4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)` changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why. Considering that this is a counter-intuitive reporting, Reverting the change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949 Differential Revision: D13964684 Pulled By: siying fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc	2019-02-05 17:20:02 -08:00
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	2019-01-29 17:24:00 -08:00
zhichao-cao	e2547103fd	Fix the build error caused by the dynamic array (#4918 ) Summary: In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256. Tested with make check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918 Differential Revision: D13844298 Pulled By: sagar0 fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab	2019-01-28 12:39:57 -08:00
Andrew Kryczka	e242fa4664	Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923 ) Summary: - When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1` - It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090 - I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923 Differential Revision: D13827226 Pulled By: ajkr fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947	2019-01-28 11:26:32 -08:00
Sagar Vemuri	0cead31d10	Fix Clang static analyzer warning in db_bench (#4910 ) Summary: Fixed clang static analyzer warning about division by 0. ``` ar: creating librocksdb_debug.a tools/db_bench_tool.cc:4650:43: warning: Division by zero int pos = static_cast<int>(rand_num % range_); ~~~~~~~~~^~~~~~~~ 1 warning generated. make: *** [analyze] Error 1 ``` This is from the new code I recently merged in `ce8e88d2d7`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910 Differential Revision: D13788037 Pulled By: sagar0 fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea	2019-01-23 13:33:02 -08:00
Zhongyi Xie	cbe0239270	add cast to avoid loss of precision error (#4906 ) Summary: this PR address the following error: > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32] s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size)); Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906 Differential Revision: D13780185 Pulled By: miasantreble fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3	2019-01-22 22:44:17 -08:00
Zhichao Cao	ce8e88d2d7	Generate mixed workload with Get, Put, Seek in db_bench (#4788 ) Summary: Based on the specific workload models (key access distribution, value size distribution, and iterator scan length distribution, the QPS variation), the MixGraph benchmark generate the synthetic workload according to these distributions which can reflect the real-world workload characteristics. After user enable the tracing function, they will get the trace file. By analyzing the trace file with the trace_analyzer tool, user can generate a set of statistic data files including. The _accessed_key_stats.txt, -accessed_value_size_distribution.txt, -iterator_length_distribution.txt, and -qps_stats.txt are mainly used to fit the Matlab model fitting. After that, user can get the parameters of the workload distributions (the modeling details are described: [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer)) The key access distribution follows the The two-term power model. The probability density function is: `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench For the value size distribution and iterator scan length distribution, they both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k(x-theta)/sigma))^{-1-1/k)`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, users can find the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and [Matalb page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html) As for the QPS, it follows the diurnal pattern. So Sine is a good model to fit it. `F(x) = sine_asin(sine_bx + sine_c) + sine_d`. The trace_will tell you the average QPS in the print out resutls, which is sine_d. After user fit the "-qps_stats.txt" to the Matlab model, user can get the sine_a, sine_b, and sine_c. By using the 4 parameters, user can control the QPS variation including the period, average, changes. To use the bench mark, user can indicate the following parameters as examples: ``` -benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788 Differential Revision: D13573940 Pulled By: sagar0 fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222	2019-01-22 10:44:26 -08:00
Andrew Kryczka	01013ae766	Digest ZSTD compression dictionary once when writing SST file (#4849 ) Summary: This is essentially a re-submission of #4251 with a few improvements: - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict` - Eliminated `Init` functions. Instead do all initialization work in constructors. - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849 Differential Revision: D13606039 Pulled By: ajkr fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447	2019-01-18 19:12:57 -08:00
Siying Dong	4e37251b4d	With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875 ) Summary: LDB is frequently used to exam data copied. wal_dir in option file is not modified and it usually points to the path it copied from. The user experience will be better if when ldb sees wal_dir pointed by the option file doesn't exist, rather than fail, just ignore it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4875 Differential Revision: D13643173 Pulled By: siying fbshipit-source-id: 2e64d4ea2ec49a6794b9a706b7fc1ba901128bb8	2019-01-11 16:48:32 -08:00
Yanqin Jin	ffc9f84649	Free memory after use Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4857 Differential Revision: D13602688 Pulled By: riversand963 fbshipit-source-id: 993419a6afb982a7a701ff71daebebb4b4a6b265	2019-01-08 17:19:09 -08:00
Yanqin Jin	e686caffec	Remove unnecessary assersion in AtomicFlushStressTest::TestCheckpoint (#4846 ) Summary: as titled. We can remove the assersion because we do not perform verification in AtomicFlushStressTest::TestCheckpoint for similar reasons to TestGet, TestPut, etc. Therefore, we override TestCheckpoint in AtomicFlushStressTest so that the assertion `rand_column_families.size() == rand_keys.size()' is removed, and we do not verify the DB in this function. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4846 Differential Revision: D13583377 Pulled By: riversand963 fbshipit-source-id: 03647f3da67e27a397413fd666e3bb43003bf596	2019-01-07 16:47:26 -08:00
Huachao Huang	74f7d7551e	tools: use provided options instead of the default (#4839 ) Summary: The current implementation hardcode the default options in different places, which makes it impossible to support other environments (like encrypted environment). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4839 Differential Revision: D13573578 Pulled By: sagar0 fbshipit-source-id: 76b58b4b758902798d10ff2f52d9f39abff015e7	2019-01-03 11:23:49 -08:00
Yanqin Jin	565b5bdc42	Add support for read-only db chkpt stress (#4690 ) Summary: Updated stress test will support testing of db in read-only mode. The user has to make sure that only read/scan operations are enabled. This PR relies on #4681. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4690 Differential Revision: D13102741 Pulled By: riversand963 fbshipit-source-id: f5a36b34db187fe12dd355f7eda161f99d6c75e4	2019-01-02 17:40:53 -08:00
Andrew Kryczka	ace543a815	fix accounting for range tombstones in TableProperties (#4841 ) Summary: - To be consistent with the accounting of other optypes in `TableProperties`, we should count range tombstones in `TableProperties::num_entries` and `TableProperties::num_deletions`. - Updated assertions in stress test's `OnTableFileCreated` handler to accept files with range tombstones only. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4841 Differential Revision: D13568424 Pulled By: ajkr fbshipit-source-id: 0139d7806494eda20ece67ec460d2458dbbf6026	2019-01-02 15:08:53 -08:00
Andrew Kryczka	68d949b3e3	Enable DeleteRange in stress/crash tests (#4483 ) Summary: Set `delrangepercent=1` when `test_batches_snapshots=false`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483 Differential Revision: D10324361 Pulled By: ajkr fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23	2018-12-18 13:42:49 -08:00
Fosco Marotto	311cd8cf2f	Updated benchmark script (#4134 ) Summary: When producing the updated performance on flash results for the wiki, these are the updates which were made. https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks Pull Request resolved: https://github.com/facebook/rocksdb/pull/4134 Differential Revision: D13491052 Pulled By: gfosco fbshipit-source-id: dcd92f24659e0917cb1ac54a4446aa8e7aac8b0d	2018-12-17 16:34:30 -08:00
Andrew Kryczka	8d2b74d287	Refine db_stress params for atomic flush (#4781 ) Summary: Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781 Differential Revision: D13463211 Pulled By: ajkr fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd	2018-12-13 22:10:38 -08:00
DorianZheng	2670fe8c73	Get `CompactionJobInfo` from CompactFiles Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716 Differential Revision: D13207677 Pulled By: ajkr fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b	2018-12-13 14:21:24 -08:00
Sagar Vemuri	70645355ad	Move FIFOCompactionPicker to a separate file (#4724 ) Summary: Summary: Simplified the code layout by moving FIFOCompactionPicker to a separate file. Why?: While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724 Differential Revision: D13227914 Pulled By: sagar0 fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3	2018-11-29 16:04:52 -08:00

1 2 3 4 5 ...

784 commits