rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-26 16:30:56 +00:00

Author	SHA1	Message	Date
Feng Zhu	8f09d53fd1	remove malloc when create data and index iterator in Get Summary: Define Block::Iter to be an independent class to be used by block_based_table_reader When creating data and index iterator, update an existing iterator rather than new one Thus malloc and free could be reduced Benchmark, Base: commit `76286ee67e` commands: --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 malloc: 3.30% -> 1.42% free: 3.59%->1.61% Test Plan: make all check run db_stress valgrind ./db_test ./table_test Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20655	2014-07-30 16:34:35 -07:00
Radheshyam Balasundaram	91c01485d1	Minor changes to CuckooTableBuilder Summary: - Copy the key and value to in-memory hash table during Add operation. Also modified cuckoo_table_reader_test to use this. - Store only the user_key in in-memory hash table if it is last level file. - Handle Carryover while chosing unused key in Finish() method in case unused key was never found before Finish() call. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test valgrind_check asan_check Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20715	2014-07-28 17:14:25 -07:00
Lei Jin	40fa8a4cd5	make statistics forward-able Summary: Make StatisticsImpl being able to forward stats to provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when user supplies custom statistics implementation. It avoids intrumenting 2 sets of stats collection code. One immediate use case is tuning advisor, which needs to collect some internal stats, users may not be interested. Test Plan: ran db_bench and see stats show up at the end of run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:05:36 -07:00
sdong	4a8f0c957c	Block::Iter::PrefixSeek() to have an extra check to filter out some false matches Summary: In block based table's hash index checking, when looking for a key that doesn't exist, there is a high chance that a false block is returned because of hash bucket conflicts. In this revision, another check is done to filter out some of those cases: comparing previous key of the block boundary to see whether the target block is what we are looking for. In a favored test setting (bloom filter disabled, 8 L0 files), I saw about 80% improvements. In a non-favored test setting (bloom filter enabled, files are all in L1, files are all cached), I see the performance penalty is less than 3%. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: wuj, leveldb, zagfox, yhchiang Differential Revision: https://reviews.facebook.net/D20595	2014-07-25 17:27:57 -07:00
Radheshyam Balasundaram	62f9b071ff	Implementation of CuckooTableReader Summary: Contains: - Implementation of TableReader based on Cuckoo Hashing - Unittests for CuckooTableReader - Performance test for TableReader Test Plan: make cuckoo_table_reader_test ./cuckoo_table_reader_test make valgrind_check make asan_check Reviewers: yhchiang, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20511	2014-07-25 16:37:32 -07:00
Radheshyam Balasundaram	07a7d870b8	Addressing TODOs in CuckooTableBuilder Summary: Contains the following changes in CuckooTableBuilder: - Take an extra parameter in constructor to identify last level file. - Implement a better way to identify if a bucket has been inserted into the tree already during BFS search. - Minor typos Test Plan: make cuckoo_table_builder ./cuckoo_table_builder make valgrind_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20445	2014-07-24 10:07:41 -07:00
Feng Zhu	da9274574f	Use IterKey instead of string in Block::Iter to reduce malloc Summary: Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test Use IterKey in block::Iter to replace std::string to reduce malloc. Evaluate it using perf record. malloc: 4.26% -> 2.91% free: 3.61% -> 3.08% Test Plan: make all check ./valgrind db_test dbformat_test Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20433	2014-07-23 12:31:11 -07:00
Igor Canadi	2d3d63597a	Fix signed-unsigned compare error	2014-07-22 15:35:07 -04:00
Radheshyam Balasundaram	f6272e3055	Fixing memory leaks in cuckoo_table_builder_test Summary: Fixes some memory leaks in cuckoo_builder_test.cc. This also fixed broken valgrind_check tests Test Plan: make valgrind_check ./cuckoo_builder_test Currently running make check all. I shall update once it is done. Reviewers: ljin, sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20385	2014-07-22 09:49:04 -07:00
Yueh-Hsuan Chiang	bbe2e91d00	Fixed a compile error of cuckoo_table_builder. Summary: Fixed the following compile error. ./table/cuckoo_table_builder.h:72:22: error: private field 'key_length_' is not used [-Werror,-Wunused-private-field] const unsigned int key_length_; ^ 1 error generated. Test Plan: make Reviewers: sdong, ljin, radheshyamb, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20349	2014-07-21 15:17:09 -07:00
Radheshyam Balasundaram	cf3da899b0	Adding a new SST table builder based on Cuckoo Hashing Summary: Cuckoo Hashing based SST table builder. Contains: - Cuckoo Hashing logic and file storage logic. - Unit tests for logic Test Plan: make cuckoo_table_builder_test ./cuckoo_table_builder_test make check all Reviewers: yhchiang, igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19545	2014-07-21 13:26:09 -07:00
Stanislau Hlebik	c1a90b0848	Fix db_bench Summary: Adding check for zero size index Test Plan: ./build_tools/regression_build_test.sh Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20259	2014-07-21 10:31:33 -07:00
Chilledheart	54f4e2f188	Fix clang compiler warnings	2014-07-20 22:57:20 +08:00
Stanislau Hlebik	9d70cce047	Adding option to save PlainTable index and bloom filter in SST file. Summary: Adding option to save PlainTable index and bloom filter in SST file. If there is no bloom block and/or index block, PlainTableReader builds new ones. Otherwise PlainTableReader just use these blocks. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19527	2014-07-18 16:58:13 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
Igor Canadi	1614284eff	Fix compressed cache	2014-07-16 06:45:49 -07:00
Stanislau Hlebik	30c81e7717	Removing NewTotalOrderPlainTableFactory Summary: Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect. Total order mode indicator is prefix_extractor == nullptr, but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests in plain_table_db_tests is incorrect. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19587	2014-07-10 11:32:04 -07:00
Lei Jin	8d9a46fcd1	initialize decoded_internal_key_valid Summary: ReadInternalKey() will assign correct value anyway. Initialize it to true to suppress compiler error reported https://github.com/facebook/rocksdb/issues/186 Test Plan: I cannot reproduce it but this is obvious Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19467	2014-07-03 23:13:08 -07:00
Feng Zhu	5656367416	use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ Summary: Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena DynamicBloom: pass arena via constructor, allocate space in SetTotalBits HashSkipListRep: allocate space of buckets_ using arena. do not delete it in deconstructor because arena would take care of it. Several test files are changed. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19335	2014-06-30 15:54:31 -07:00
Igor Canadi	9fe87b17aa	Fix compile	2014-06-20 10:36:48 +02:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index. Test Plan: unit test, also manually veried LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	2014-06-19 16:40:32 -07:00
Lei Jin	1ec2d1c69d	fix make shared_lib compilation error Summary: s/class ParsedInternalKey/struct ParsedInternalKey Test Plan: make shared_lib Reviewers: igor, yhchiang, sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19173	2014-06-19 10:12:26 -07:00
Haobo Xu	167738256f	[RocksDB] Fix unit test Summary: fix a bug in D19047, which caused DBTest.RecoverDuringMemtableCompaction to fail. Test Plan: unit test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19155	2014-06-19 01:37:21 -07:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Haobo Xu	0f0076ed5a	[RocksDB] Reduce memory footprint of the blockbased table hash index. Summary: Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption. Just a quick draft, still testing and refining. Test Plan: unit test and shadow testing Reviewers: dhruba, kailiu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19047	2014-06-18 18:16:07 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Lei Jin	c83b085770	prefetch bloom filter data block for L0 files Summary: as title Test Plan: db_bench the initial result is very promising. I will post results of complete runs Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18867	2014-06-12 10:06:18 -07:00
sdong	88a1691a1e	BlockBasedTable::PrefixMayMatch() to bloom setting to the beginning of the function Summary: In BlockBasedTable::PrefixMayMatch() we calculate prefix even if bloom is not config. Move the check before Test Plan: make all check Reviewers: igor, ljin Reviewed By: ljin Subscribers: wuj, leveldb, haobo, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18993	2014-06-10 11:14:22 -07:00
sdong	80f409ea37	Clean PlainTableReader's variables for better data locality Summary: Clean PlainTableReader's data structures: (1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization) (2) remove some variables only used when initialization from the class (3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly (4) make Options a reference. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18891	2014-06-09 13:53:39 -07:00
Igor Canadi	f43c8262c2	Don't compress block bigger than 2GB Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446. Test Plan: make check doesn't complain :) Reviewers: haobo, ljin, yhchiang, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18975	2014-06-09 12:26:09 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. this should simplify the code and provide better performance. It also enables more optimization since all information are accessiable in one place. Init evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Kai Liu	0b3d03d026	Materialize the hash index Summary: Materialize the hash index to avoid the soaring cpu/flash usage when initializing the database. Test Plan: existing unit tests passed Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18339	2014-05-15 14:09:03 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00
Igor Canadi	fec4269966	Fix more gflag namespace issues	2014-05-09 08:41:02 -07:00
sdong	ddd41146c4	MergingIterator uses autovector instead of vector Summary: Use autovector in MergingIterator so that if there are 4 or less child iterators in it, iterator wrappers are inline, which is more likely to be cache friendly. Based on one test run with a shadow traffic of one product, it reduces CPU of MergingIterator::Seek() by half. Test Plan: make all check Reviewers: haobo, yhchiang, igor, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D18531	2014-05-08 15:01:20 -07:00
Igor Canadi	b5616dafd1	Fix iOS compile	2014-05-07 17:48:31 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
sdong	9b17558311	PlainTableFactory::PlainTableFactory() to have huge TLB turned off by default Summary: PlainTableFactory::PlainTableFactory() now has Huge TLB page feature turned on by default. Although it is not a public API (which we always turn the feature off now), our unit tests, like db_test sometimes uses it directly, which causes wrong coverage of codes. This patch fix it to allow unit tests to run with the correct setting Test Plan: Run db_test and make sure this feature is not on any more. Reviewers: igor, haobo Reviewed By: igor CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18483	2014-05-05 11:05:54 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit `d69dc64be7`.	2014-05-04 13:56:29 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit `7dafa3a1d7`.	2014-05-04 08:37:09 -07:00
Igor Canadi	d28ed6931f	fix release build	2014-05-01 12:42:06 -07:00
Igor Canadi	d29e48bb2e	fix compile warning	2014-05-01 14:12:35 -04:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Igor Canadi	c489499a2b	Fix OSX compile	2014-04-26 17:15:43 -04:00
Lei Jin	ccaca59bee	avoid calling FindFile twice in TwoLevelIterator for PlainTable Summary: this is to reclaim the regression introduced in https://reviews.facebook.net/D17853 Test Plan: make all check Reviewers: igor, haobo, sdong, dhruba, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17985	2014-04-25 12:23:07 -07:00
Lei Jin	d642c60bdc	Check PrefixMayMatch on Seek() Summary: As a follow-up diff for https://reviews.facebook.net/D17805, add optimization to check PrefixMayMatch on Seek() Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17853	2014-04-25 12:22:23 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check` Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Yueh-Hsuan Chiang	af6ad113a8	Fix SIGFAULT when running sst_dump on v2.6 db Summary: Fix the sigfault when running sst_dump on v2.6 db. Test Plan: git checkout `bba6595b1f` make clean make db_bench ./db_bench --db=/tmp/some/db --benchmarks=fillseq arc patch D18039 make clean make sst_dump ./sst_dump --file=/tmp/some/db --command=check Reviewers: igor, haobo, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18039	2014-04-21 17:49:47 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
sdong	27d3bc184e	Use a different approach to make sure BlockBasedTableReader can use hash index on older files Summary: A recent commit `e37dd216f9` makes sure hash index can be used when reading existing files. This patch uses another way to achieve the approach: (1) Currently, always writing kBinarySearch to files, despite of BlockBasedTableOptions.IndexType setting. (2) When reading a file, read out the field, and make sure it is kBinarySearch, while always use index type by users. The reason for doing it is, to reserve kHashSearch property on disk to future. If now we write out binary index for both of kHashSearch and kBinarySearch. We have to use a new flag in the future for hash index on disk, otherwise compatibility would break. Also, we want the real index type and type shown in properties block to be consistent. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, ljin, yhchiang, xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18009	2014-04-18 14:09:21 -07:00
Kai Liu	e37dd216f9	Index type doesn't have to be persisted Summary: With the recent changes, there is no need to check the property block about the index block type. If user want to use it, they don't really need any disk format change; everything happens in the fly. Also another team encountered an error while reading the index type from properties. Test Plan: ran all the tests Reviewers: sdong CC: Task ID: # Blame Rev:	2014-04-17 11:08:12 -07:00
sdong	5cef458a2c	RocksDB 2.8 to be able to read files generated by 2.6 Summary: From 2.6 to 2.7, property block name is renamed from rocksdb.stats to rocksdb.properties. Older properties were not able to be loaded. In 2.8, we seem to have added some logic that uses property block without checking null pointers, which create segment faults. In this patch, we fix it by: (1) try rocksdb.stats if rocksdb.properties is not found (2) add some null checking before consuming rep->table_properties Test Plan: make sure a file generated in 2.7 couldn't be opened now can be opened. Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: ljin, xjin, dhruba, kailiu, leveldb Differential Revision: https://reviews.facebook.net/D17961	2014-04-17 09:51:43 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Kai Liu	e23e73e67c	Use shorten index key for hash-index Summary: I was wrong about the "index builder", right now since we create index by scanning both whole table and index, there is not need to preserve the whole key as the index key. I switch back to original way index which is both space efficient and able to supprot in-fly construction of hash index. IN this patch, I made minimal change since I'm not sure if we still need the "pluggable index builder", under current circumstance it is of no use and kind of over-engineered. But I'm not sure if we can still exploit its usefulness in the future; otherwise I think I can just burn them with great vengeance. Test Plan: unit tests Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17745	2014-04-10 22:45:25 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
sdong	e9ed28f9c8	PlainTableBuilder::Add() to use local char array instead of reused std::string as tmp buffer Summary: Our profile shows that in one of the applications, 5% of the CPU costs of PlainTableBuilder::Add() are spent on std::string stacks. By this simple change, we avoid this global reusable string. Also, we avoid another call of file appending, which probably gives another 2%. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17601	2014-04-09 10:17:32 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Igor Canadi	5b345b76cb	Remove env_ from MergingIterator Summary: env_ is not used. Compiling for iOS complains. Test Plan: compiles now Reviewers: ljin, haobo, sdong, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17589	2014-04-08 13:40:42 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
sdong	5e2db3b434	PlainTableIterator not to store copied key in std::string Summary: Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h. This patch improves iterator performance significantly. Running this benchmark: ./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond The average latency is improved to about 750 nanoseconds from 1100 nanoseconds. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17547	2014-04-07 19:06:09 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	bcd1f15b60	Remove -Wno-unused-const-variable	2014-04-04 16:15:47 -07:00
Igor Canadi	8555ce2dec	Merge branch 'master' into columnfamilies	2014-04-02 10:48:05 -07:00
sdong	d50619a559	PlainTableIterator::Seek() shouldn't check bloom filter in total order mode Summary: In total order mode, iterator's seek() shouldn't check total order. Also some cleaning up about checking null for shared pointers. I don't know the behavior before it. This bug was reported by @igor. Test Plan: test plain_table_db_test Reviewers: ljin, haobo, igor Reviewed By: igor CC: yhchiang, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D17391	2014-04-01 15:05:16 -07:00
Igor Canadi	ddbd1ece88	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc db/internal_stats.cc db/internal_stats.h db/version_edit.cc db/version_edit.h db/version_set.cc include/rocksdb/options.h util/options.cc	2014-03-31 13:39:24 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparision can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Igor Canadi	e20fa3f8a4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/internal_stats.cc db/internal_stats.h db/version_set.cc	2014-03-19 17:22:20 -07:00
Kai Liu	69f6cf431d	Fix two bugs in talbe format Summary: Previous code had two bugs: * didn't initialize the table_magic_number_ explicitly -- as a result a random junk number is stored for table_magic_number_, making HasInitializedMagicNumber() always return true. * if condition is inconrrect in set_table_magic_number(), and the return value is not checked. I replace if-else by a stronger requirement enforced by assert(). Test Plan: Previous sst_dump failed to work. After the fix, things back to normal. Reviewers: yhchiang CC: haobo, sdong, igor, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17055	2014-03-19 16:18:33 -07:00
Igor Canadi	fb2346fc1f	[CF] Code cleanup part 1 Summary: I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup. Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch. This diff also fixes race condition when dropping column family. Test Plan: Ran db_stress with variety of parameters Reviewers: dhruba, haobo Differential Revision: https://reviews.facebook.net/D16833	2014-03-12 09:56:53 -07:00
Igor Canadi	9634ba42ac	Merge branch 'master' into columnfamilies Conflicts: db/compaction_picker.cc db/db_impl.cc db/db_impl.h db/tailing_iter.cc db/version_set.h include/rocksdb/options.h util/options.cc	2014-03-10 17:26:09 -07:00
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	2014-03-10 12:56:46 -07:00
Igor Canadi	1e0d47276c	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h	2014-03-07 16:59:47 -08:00
Kai Liu	566f18e6ad	More precise calculation of sub_index_size Summary: Previous we did rough estimation of subindex size, which in worst case may result in array reallocation. This patch aims to get the exact size and avoid any reallocation. Test Plan: make all check Reviewers: sdong, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16125	2014-03-06 17:30:46 -08:00
Igor Canadi	0738ae6dc9	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc	2014-03-05 12:25:05 -08:00
kailiu	a01bda0997	Fix a buggy assert Summary: The assert was pointless since if if prefix is the same as the whole key, assertion will surely fail. Reason behind is when performing the internal key comparison, if user keys are the same, key with smaller transaction id wins. Test Plan: make -j32 && make check Reviewers: sdong, dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16551	2014-03-04 18:08:23 -08:00
Igor Canadi	fa34697237	Merge branch 'master' into columnfamilies	2014-03-04 09:39:14 -08:00
kailiu	a1d56e73ec	Uncomment the unit tests in table test	2014-03-03 22:19:49 -08:00
kailiu	906f3dca72	Add a hash-index component for block Summary: this is the key component extracted from diff: https://reviews.facebook.net/D14271 I separate it to a dedicated patch to make the review easier. Test Plan: added a unit test and passed it. Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16245	2014-03-03 21:11:49 -08:00
Igor Canadi	9d0577a6be	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/transaction_log_impl.cc db/transaction_log_impl.h include/rocksdb/options.h util/env.cc util/options.cc	2014-03-03 18:29:03 -08:00
sdong	f0ee2356af	Fix issue with iterator operations in this order: Prev(), Seek(), Prev() Summary: Due to a bad merge of D14163 and D14001 before checking in D14001, "direction_ = kForward;" in MergeIterator::Seek() was deleted my mistake (in commit `b135d01e7b` ). It will generate wrong results or assert failure after the sequence of Prev() (or SeekToLast()), Seek() and Prev(). Fix it Test Plan: make all check Reviewers: igor, haobo, dhruba Reviewed By: igor CC: yhchiang, i.am.jin.lei, ljin, leveldb Differential Revision: https://reviews.facebook.net/D16527	2014-03-03 17:50:27 -08:00
kailiu	1aeafeccac	Make the Create() function comform the convention Summary: Moved "Return multiple values" a more conventional way.	2014-03-01 23:43:03 -08:00
Kai Liu	16d4e45c12	Fix the memory leak in table index Summary: BinarySearchIndex didn't use unique_ptr to guard the block object nor delete it in destructor, leading to valgrind failure for "definite memory leak". Test Plan: re-ran the failed valgrind test cases	2014-03-01 11:50:35 -08:00
Kai Liu	ff151132b3	Fix the unit test failure in devbox Summary: My last diff was developed in MacOS but in devserver environment error occurs. I dug into the problem and found the way we calcuate approximate data size is pretty out-of-date. We can use table properties to get more accurate results. Test Plan: ran ./table_test and passed Reviewers: igor, dhruba, haobo, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16509	2014-02-28 20:40:05 -08:00
kailiu	74939a9e13	Make the block-based table's index pluggable Summary: This patch introduced a new table options that allows us to make block-based table's index pluggable. To support that new features: * Code has been refacotred to be more flexible and supports this option well. * More documentation is added for the existing obsecure functionalities. * Big surgeon on DataBlockReader(), where the logic was really convoluted. * Other small code cleanups. The pluggablility will mostly affect development of internal modules and won't change frequently, as a result I intentionally avoid heavy-weight patterns (like factory) and try to make it simple. Test Plan: make all check Reviewers: haobo, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16395	2014-02-28 18:19:07 -08:00
kailiu	bf86af5174	Remove the terrible hack in for flush_block_policy_factory Summary: Previous code is too convoluted and I must be drunk for letting such code to be written without a second thought. Thanks to the discussion with @sdong, I added the `Options` when generating the flusher, thus avoiding the tricks. Just FYI: I resisted to add Options in flush_block_policy.h since I wanted to avoid cyclic dependencies: FlushBlockPolicy dpends on Options and Options also depends FlushBlockPolicy... While I appreciate my effort to prevent it, the old design turns out creating more troubles than it tried to avoid. Test Plan: ran ./table_test Reviewers: sdong Reviewed By: sdong CC: sdong, leveldb Differential Revision: https://reviews.facebook.net/D16503	2014-02-28 16:39:27 -08:00
kailiu	444cafc28c	Fix inconsistent code format Summary: Found some function follows camel style. When naming funciton, we have two styles: Trivially expose internal data in readonly mode: `all_lower_case()` Regular function: `CapitalizeFirstLetter()` I renames these functions. Test Plan: make -j32 Reviewers: haobo, sdong, dhruba, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16383	2014-02-26 18:56:39 -08:00
sdong	a04dbf6e49	PlainTable::Next() should pass the error message from ReadKey() Summary: PlainTable::Next() should pass the error message from ReadKey(). Now it would return a wrong error message. Also improve the messages of status when failing to read Test Plan: make all check Reviewers: ljin, kailiu, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D16365	2014-02-26 15:12:44 -08:00
Igor Canadi	d39da4b578	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc	2014-02-24 17:09:05 -08:00
Kai Liu	2b205b35d8	Disable putting filter block to block cache Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache. Test Plan: Wrote a new unit tests to make sure it works. Reviewers: dhruba, haobo, igor, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16221	2014-02-19 15:38:57 -08:00
Igor Canadi	76c048183c	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc include/rocksdb/db.h	2014-02-14 16:46:03 -08:00
kailiu	63690625cd	Expose the table properties to application Summary: Provide a public API for users to access the table properties for each SSTable. Test Plan: Added a unit tests to test the function correctness under differnet conditions. Reviewers: haobo, dhruba, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16083	2014-02-13 16:28:21 -08:00
Kai Liu	b2e7ee8b41	Followup code refactor on plain table Summary: Fixed most comments in https://reviews.facebook.net/D15429. Still have some remaining comments left. Test Plan: make all check Reviewers: sdong, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15885	2014-02-13 15:27:59 -08:00
Kai Liu	59cffe02c4	Benchmark table reader wiht nanoseconds Summary: nanosecnods gave us better view of the performance, especially when some operations are fast so that micro seconds may only reveal less informative results. Test Plan: sample output: ./table_reader_bench --plain_table --time_unit=nanosecond ======================================================================================================= InMemoryTableSimpleBenchmark: PlainTable num_key1: 4096 num_key2: 512 non_empty ======================================================================================================= Histogram (unit: nanosecond): Count: 6291456 Average: 475.3867 StdDev: 556.05 Min: 135.0000 Median: 400.1817 Max: 33370.0000 Percentiles: P50: 400.18 P75: 530.02 P99: 887.73 P99.9: 8843.26 P99.99: 9941.21 ------------------------------------------------------ [ 120, 140 ) 2 0.000% 0.000% [ 140, 160 ) 452 0.007% 0.007% [ 160, 180 ) 13683 0.217% 0.225% [ 180, 200 ) 54353 0.864% 1.089% [ 200, 250 ) 101004 1.605% 2.694% [ 250, 300 ) 729791 11.600% 14.294% ## [ 300, 350 ) 616070 9.792% 24.086% ## [ 350, 400 ) 1628021 25.877% 49.963% ##### [ 400, 450 ) 647220 10.287% 60.250% ## [ 450, 500 ) 577206 9.174% 69.424% ## [ 500, 600 ) 1168585 18.574% 87.999% #### [ 600, 700 ) 506875 8.057% 96.055% ## [ 700, 800 ) 147878 2.350% 98.406% [ 800, 900 ) 42633 0.678% 99.083% [ 900, 1000 ) 16304 0.259% 99.342% [ 1000, 1200 ) 7811 0.124% 99.466% [ 1200, 1400 ) 1453 0.023% 99.490% [ 1400, 1600 ) 307 0.005% 99.494% [ 1600, 1800 ) 81 0.001% 99.496% [ 1800, 2000 ) 18 0.000% 99.496% [ 2000, 2500 ) 8 0.000% 99.496% [ 2500, 3000 ) 6 0.000% 99.496% [ 3500, 4000 ) 3 0.000% 99.496% [ 4000, 4500 ) 116 0.002% 99.498% [ 4500, 5000 ) 1144 0.018% 99.516% [ 5000, 6000 ) 1087 0.017% 99.534% [ 6000, 7000 ) 2403 0.038% 99.572% [ 7000, 8000 ) 9840 0.156% 99.728% [ 8000, 9000 ) 12820 0.204% 99.932% [ 9000, 10000 ) 3881 0.062% 99.994% [ 10000, 12000 ) 135 0.002% 99.996% [ 12000, 14000 ) 159 0.003% 99.998% [ 14000, 16000 ) 58 0.001% 99.999% [ 16000, 18000 ) 30 0.000% 100.000% [ 18000, 20000 ) 14 0.000% 100.000% [ 20000, 25000 ) 2 0.000% 100.000% [ 25000, 30000 ) 2 0.000% 100.000% [ 30000, 35000 ) 1 0.000% 100.000% Reviewers: haobo, dhruba, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16113	2014-02-13 13:57:36 -08:00

1 2 3 4 5 ...

314 commits