rocksdb

Commit Graph

Author	SHA1	Message	Date
Albert Strasheim	df2f92214a	Support for LZ4 compression.	2014-02-08 14:15:51 -08:00
Igor Canadi	99e61fdd5c	[CF] Separate dumping of DBOptions and ColumnFamilyOptions Summary: When we open a DB, we should dump only DBOptions and then when we create a new column family, we dump ColumnFamilyOptions for each one. Test Plan: make check, confirm contents of the LOG Reviewers: dhruba, haobo, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D16011	2014-02-07 13:52:51 -08:00
Igor Canadi	2b8c44639a	Merge branch 'master' into columnfamilies	2014-02-07 11:42:56 -08:00
Igor Canadi	d53b188228	Fix some errors detected by coverity scan Summary: Nothing major, just an extra return line and posibility of leaking fb in NewRandomRWFile Test Plan: make check Reviewers: kailiu, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15993	2014-02-06 21:59:44 -08:00
Igor Canadi	0143abdbb0	Merge branch 'master' into columnfamilies Conflicts: HISTORY.md db/db_impl.cc db/db_impl.h db/db_iter.cc db/db_test.cc db/dbformat.h db/memtable.cc db/memtable_list.cc db/memtable_list.h db/table_cache.cc db/table_cache.h db/version_edit.h db/version_set.cc db/version_set.h db/write_batch.cc db/write_batch_test.cc include/rocksdb/options.h util/options.cc	2014-02-06 15:58:20 -08:00
Igor Canadi	0b4ccf765c	Flushes should always go to HIGH priority thread pool Summary: This is not column-family related diff. It is in columnfamily branch because the change is significant and we want to push it with next major release (3.0). It removes the leveldb notion of one thread pool and expands it to two thread pools by default (HIGH and LOW). Flush process is removed from compaction process and all flush threads are executed on HIGH thread pool, since we don't want long-running compactions to influence flush latency. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15987	2014-02-06 14:17:13 -08:00
kailiu	84f8185fc0	Merge branch 'master' into performance Conflicts: HISTORY.md db/db_impl.cc db/memtable.cc	2014-02-05 21:21:00 -08:00
Igor Canadi	f276e0e59d	[CF] Options -> DBOptions Summary: Replaced most of occurrences of Options with more specific DBOptions. This brings us very close to supporting different configuration options for each column family. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15933	2014-02-05 14:56:09 -08:00
Igor Canadi	328ac7ee02	Merge branch 'master' into columnfamilies	2014-02-05 12:00:33 -08:00
Igor Canadi	c24d8c4e90	[CF] Rethink table cache Summary: Adapting table cache to column families is interesting. We want table cache to be global LRU, so if some column families are use not as often as others, we want them to be evicted from cache. However, current TableCache object also constructs tables on its own. If table is not found in the cache, TableCache automatically creates new table. We want each column family to be able to specify different table factory. To solve the problem, we still have a single LRU, but we provide the LRUCache object to TableCache on construction. We have one TableCache per column family, but the underyling cache is shared by all TableCache objects. This allows us to have a global LRU, but still be able to support different table factories for different column families. Also, in the future it will also be able to support different directories for different column families. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15915	2014-02-05 11:55:30 -08:00
Albert Strasheim	d411dc5884	crc32c: choose function in static initialization Before: 4.430 micros/op 225732 ops/sec; 881.8 MB/s (4K per op) After: 4.125 micros/op 242425 ops/sec; 947.0 MB/s (4K per op)	2014-02-04 19:13:57 -08:00
Igor Canadi	73f62255c1	[CF] Split SanitizeOptions into two Summary: There are three SanitizeOption-s now : one for DBOptions, one for ColumnFamilyOptions and one for Options (which just calls the other two) I have also reshuffled some options -- table_cache options and info_log should live in DBOptions, for example. Test Plan: make check doesn't complain Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15873	2014-02-04 17:26:51 -08:00
Igor Canadi	2966d764cd	Fix some 32-bit compile errors Summary: RocksDB doesn't compile on 32-bit architecture apparently. This is attempt to fix some of 32-bit errors. They are reported here: https://gist.github.com/paxos/8789697 Test Plan: RocksDB still compiles on 64-bit :) Reviewers: kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15825	2014-02-03 13:48:30 -08:00
Igor Canadi	2a9271b403	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.cc	2014-02-03 13:47:54 -08:00
Igor Canadi	29bacb2eb6	VersionSet cleanup Summary: Removed icmp_ from VersionSet (since it's per-column-family, not per-DB-instance) Unfriended VersionSet and ColumnFamilyData (yay!) Removed VersionSet::NumberLevels() Cleaned up DBImpl Test Plan: make check Reviewers: dhruba, haobo, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15819	2014-02-03 13:10:47 -08:00
Siying Dong	d169b67680	[Performance Branch] PlainTable to encode rows with seqID 0, value type using 1 internal byte. Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15489	2014-02-03 12:19:30 -08:00
Igor Canadi	5c6ef56152	Fix printf format	2014-02-03 10:25:37 -08:00
Kai Liu	87bda51d77	Merge pull request #58 from mlin/no-stdout Eliminate stdout message when launching a posix thread.	2014-02-03 00:38:11 -08:00
kailiu	4f6cb17bdb	First phase API clean up Summary: Addressed all the issues in https://reviews.facebook.net/D15447. Now most table-related modules are hidden from user land. Test Plan: make check Reviewers: sdong, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15525	2014-02-03 00:30:43 -08:00
Igor Canadi	5661ed8b80	Fix reduce_levels_test	2014-01-31 19:44:48 -08:00
Igor Canadi	f7489123e2	Move compaction picker and internal key comparator to ColumnFamilyData Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15801	2014-01-31 16:06:55 -08:00
kailiu	4e0298f23c	Clean up arena API Summary: Easy thing goes first. This patch moves arena to internal dir; based on which, the coming patch will deal with memtable_rep. Test Plan: make check Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15615	2014-01-30 22:10:10 -08:00
Igor Canadi	c37e7de669	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h	2014-01-30 11:47:23 -08:00
Igor Canadi	ac92420fc5	Merge branch 'master' into performance Conflicts: db/db_impl.h	2014-01-30 10:09:23 -08:00
Igor Canadi	fa99d53e55	Change ColumnFamilyData from struct to class Summary: ColumnFamilyData grew a lot, there's much more data that it holds now. It makes more sense to encapsulate it better by making it a class. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15579	2014-01-29 15:18:36 -08:00
Lei Jin	fb84c49a36	convert Tickers back to array with padding and alignment Summary: Pad each Ticker structure to be 64 bytes and make them align on 64 bytes boundary to avoid cache line false sharing issue. Please refer to task 3615553 for more details Test Plan: db_bench LevelDB: version 2.0s Date: Wed Jan 29 12:23:17 2014 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB rocksdb.build.overwrite.qps 49638 rocksdb.build.overwrite.p50_micros 58.73 rocksdb.build.overwrite.p75_micros 210.56 rocksdb.build.overwrite.p99_micros 733.28 rocksdb.build.fillseq.qps 366729 rocksdb.build.fillseq.p50_micros 1.00 rocksdb.build.fillseq.p75_micros 1.00 rocksdb.build.fillseq.p99_micros 2.65 rocksdb.build.readrandom.qps 1152995 rocksdb.build.readrandom.p50_micros 11.27 rocksdb.build.readrandom.p75_micros 15.69 rocksdb.build.readrandom.p99_micros 33.59 rocksdb.build.readrandom_smallblockcache.qps 956047 rocksdb.build.readrandom_smallblockcache.p50_micros 15.23 rocksdb.build.readrandom_smallblockcache.p75_micros 17.31 rocksdb.build.readrandom_smallblockcache.p99_micros 31.49 rocksdb.build.readrandom_memtable_sst.qps 1105183 rocksdb.build.readrandom_memtable_sst.p50_micros 12.04 rocksdb.build.readrandom_memtable_sst.p75_micros 15.78 rocksdb.build.readrandom_memtable_sst.p99_micros 32.49 rocksdb.build.readrandom_fillunique_random.qps 487856 rocksdb.build.readrandom_fillunique_random.p50_micros 29.65 rocksdb.build.readrandom_fillunique_random.p75_micros 40.93 rocksdb.build.readrandom_fillunique_random.p99_micros 78.68 rocksdb.build.memtablefillrandom.qps 91304 rocksdb.build.memtablefillrandom.p50_micros 171.05 rocksdb.build.memtablefillrandom.p75_micros 196.12 rocksdb.build.memtablefillrandom.p99_micros 291.73 rocksdb.build.memtablereadrandom.qps 1340411 rocksdb.build.memtablereadrandom.p50_micros 9.48 rocksdb.build.memtablereadrandom.p75_micros 13.95 rocksdb.build.memtablereadrandom.p99_micros 30.36 rocksdb.build.readwhilewriting.qps 491004 rocksdb.build.readwhilewriting.p50_micros 29.58 rocksdb.build.readwhilewriting.p75_micros 40.34 rocksdb.build.readwhilewriting.p99_micros 76.78 Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15573	2014-01-29 15:08:41 -08:00
kailiu	a5e220f5ef	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_test.cc db/memtable_list.cc db/memtable_list.h table/block_based_table_reader.cc table/table_test.cc util/cache.cc util/coding.cc	2014-01-28 10:35:55 -08:00
Igor Canadi	4bf25357ae	[column families] Removing VersionSet::current() Summary: Instead of VersionSet::current(), DBImpl uses default_cfd_->current directly. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15483	2014-01-27 17:04:46 -08:00
Igor Canadi	eb055609e4	[column families] Move memtable and immutable memtable list to column family data Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family. Test Plan: make check Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15459	2014-01-27 13:37:14 -08:00
Igor Canadi	ae16606f98	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set.h	2014-01-27 11:11:51 -08:00
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	2014-01-27 11:02:21 -08:00
Siying Dong	b20486f294	[Performance Branch] HashLinkList to avoid to convert length prefixed string back to internal keys Summary: Converting from length prefixed buffer back to internal key costs some CPU but it is not necessary. In this patch, internal keys are pass though the functions so that we don't need to convert back to it. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15393	2014-01-27 10:26:14 -08:00
Igor Canadi	5356b2a680	Merge branch 'master' into columnfamilies	2014-01-24 18:34:48 -08:00
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	2014-01-24 17:16:22 -08:00
Igor Canadi	1423e7c9de	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set_reduce_num_levels.cc util/ldb_cmd.cc	2014-01-24 15:03:54 -08:00
Igor Canadi	b13bdfa500	Add a call DisownData() to Cache, which should speed up shutdown Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache. Test Plan: I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368 Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB. Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15069	2014-01-24 14:57:52 -08:00
Igor Canadi	677fee27c6	Make VersionSet::ReduceNumberOfLevels() static Summary: A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method. Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file. Test Plan: reduce_levels_test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15303	2014-01-24 14:57:04 -08:00
Igor Canadi	28d1a0c6f5	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.h db/db_test.cc include/rocksdb/db.h include/utilities/stackable_db.h	2014-01-24 09:27:29 -08:00
kailiu	eda924a03a	Remove an unused `GetLengthPrefixedSlice` Summary: We have 3 versions of GetLengthPrefixedSlice() and one of them is no longer in use. Test Plan: make Reviewers: sdong, igor, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15399	2014-01-23 23:06:52 -08:00
Kai Liu	054c5dda8c	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h util/statistics_imp.h	2014-01-23 16:32:49 -08:00
Igor Canadi	4e91f27c3a	Fix performance regression in statistics Summary: For some reason, D15099 caused a big performance regression: https://fburl.com/16059000 After digging a bit, I figured out that the reason was that std::atomic_uint_fast64_t was allocated in an array. When I switched from an array to vector, the QPS returned to the previous level. I'm not sure why this is happening, but this diff seems to fix the performance regression. Test Plan: I ran the regression script, observed the performance going back to normal Reviewers: tnovak, kailiu, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15375	2014-01-23 16:11:55 -08:00
Kai Liu	bb19b530ca	Aggressively inlining the short functions in coding.cc Summary: This diff takes an even more aggressive way to inline the functions. A decent rule that I followed is "not inline a function if it is more than 10 lines long." Normally optimizing code by inline is ugly and hard to control, but since one of our usecase has significant amount of CPU used in functions from coding.cc, I'd like to try this diff out. Test Plan: 1. the size for some .o file increased a little bit, but most less than 1%. So I think the negative impact of inline is negligible. 2. As the regression test shows (ran for 10 times and I calculated the average number) Metrics Befor After ======================================================================== rocksdb.build.fillseq.qps 426595 444515 (+4.6%) rocksdb.build.memtablefillrandom.qps 121739 123110 rocksdb.build.memtablereadrandom.qps 1285103 1280520 rocksdb.build.overwrite.qps 125816 135570 (+9%) rocksdb.build.readrandom_fillunique_random.qps 285995 296863 rocksdb.build.readrandom_memtable_sst.qps 1027132 1027279 rocksdb.build.readrandom.qps 1041427 1054665 rocksdb.build.readrandom_smallblockcache.qps 1028631 1038433 rocksdb.build.readwhilewriting.qps 918352 914629 Reviewers: haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15291	2014-01-23 16:03:34 -08:00
Igor Canadi	7c5e583a27	ColumnFamilySet Summary: I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner. Let me know if you have any comments. I will commit tomorrow. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15357	2014-01-23 14:03:38 -08:00
Igor Canadi	92a022ad07	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.cc db/version_set.cc	2014-01-22 10:59:07 -08:00
kailiu	4036d58dc9	Fix a Statistics-related unit test faulure Summary: In my MacOS, the member variables are populated with random numbers after initialization. This diff fixes it by fill these arrays with 0. Test Plan: make && ./table_test Reviewers: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15315	2014-01-21 18:02:55 -08:00
Igor Canadi	23f6791c9e	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl_readonly.cc db/db_test.cc db/version_edit.cc db/version_edit.h db/version_set.cc db/version_set.h db/version_set_reduce_num_levels.cc	2014-01-21 17:01:52 -08:00
Kai Liu	ef602f6275	Misc cleanup on performance branch Summary: Did some trivial stuffs: * Add more comments; * fix compiler's warning messages (uninitialized variables). * etc Test Plan: make check	2014-01-17 14:26:29 -08:00
Igor Canadi	83681bf9ef	Statistics code cleanup Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review. Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba, tnovak Reviewed By: tnovak CC: leveldb Differential Revision: https://reviews.facebook.net/D15099	2014-01-17 12:46:06 -08:00
kailiu	1304d8c8ce	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_impl.h db/db_test.cc db/memtable.cc db/memtable.h db/version_edit.h db/version_set.cc include/rocksdb/options.h util/hash_skiplist_rep.cc util/options.cc	2014-01-15 23:12:31 -08:00
kailiu	eae1804f29	Remove the unnecessary use of shared_ptr Summary: shared_ptr is slower than unique_ptr (which literally comes with no performance cost compare with raw pointers). In memtable and memtable rep, we use shared_ptr when we'd actually should use unique_ptr. According to igor's previous work, we are likely to make quite some performance gain from this diff. Test Plan: make check Reviewers: dhruba, igor, sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15213	2014-01-15 18:22:01 -08:00
Igor Canadi	65a8a52b54	Decrease reliance on VersionSet::NumberLevels() Summary: With column families VersionSet will not have a constant number of levels (each CF can have different options), so we'll need to eliminate call to VersionSet::NumberLevels() This diff decreases number of callsites, but we're not there yet. It associates number of levels with Version (each version is associated with single CF) instead of VersionSet. I have also slightly changed how VersionSet keeps track of manifest size. This diff also modifies constructor of Compaction such that it takes input_version and automatically Ref()s it. Before this was done outside of constructor. In next diffs I will continue to decrease number of callsites of VersionSet::NumberLevels() and also references to current_ Test Plan: make check Reviewers: haobo, dhruba, kailiu, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D15171	2014-01-15 16:15:43 -08:00
Kai Liu	cd535c2280	Optimize MayContainHash() Summary: In latest leaf's, MayContainHash() consistently consumes 5%~7% CPU usage. I checked the code and did an experiment with/without inlining this method. In release mode, with `1024 * 1024 * 256` bits and `1024 * 512` entries, both call 2^30 MayContainHash() with distinctive parameters. As the result showed, this patch reduced the running time from 9.127 sec to 7.891 sec. Test Plan: make check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15177	2014-01-14 22:03:57 -08:00
Igor Canadi	d9cd7a063f	Fix CompactRange to apply filter to every key Summary: When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`. This patch fixed the unit test. Test Plan: Added a failing unit test and a fix, so it's not failing anymore. Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D14421	2014-01-14 16:19:09 -08:00
Naman Gupta	8454cfe569	Add read/modify/write functionality to Put() api Summary: The application can set a callback function, which is applied on the previous value. And calculates the new value. This new value can be set, either inplace, if the previous value existed in memtable, and new value is smaller than previous value. Otherwise the new value is added normally. Test Plan: fbmake. Added unit tests. All unit tests pass. Reviewers: dhruba, haobo Reviewed By: haobo CC: sdong, kailiu, xinyaohu, sumeet, leveldb Differential Revision: https://reviews.facebook.net/D14745	2014-01-14 07:55:16 -08:00
Igor Canadi	00065d0d5d	Fix merge test Copy max_successive_merges from options to column family options	2014-01-13 13:27:47 -08:00
Igor Canadi	151f9e144f	Merge branch 'master' into columnfamilies	2014-01-13 09:09:51 -08:00
Schalk-Willem Kruger	a09ee1069d	Improve RocksDB "get" performance by computing merge result in memtable Summary: Added an option (max_successive_merges) that can be used to specify the maximum number of successive merge operations on a key in the memtable. This can be used to improve performance of the "get" operation. If many successive merge operations are performed on a key, the performance of "get" operations on the key deteriorates, as the value has to be computed for each "get" operation by applying all the successive merge operations. FB Task ID: #3428853 Test Plan: make all check db_bench --benchmarks=readrandommergerandom counter_stress_test Reviewers: haobo, vamsi, dhruba, sdong Reviewed By: haobo CC: zshao Differential Revision: https://reviews.facebook.net/D14991	2014-01-10 17:33:56 -08:00
Siying Dong	5b5ab0c1a8	[Performance Branch] Fix memory leak in HashLinkListRep.GetIterator() Summary: Full list constructed for full iterator can be leaked. This was a bug introduced when I copy the full iterator codes from hash skip list to hash link list. This patch fixes it. Test Plan: Run valgrind test against db_test and make sure the memory leak is fixed Reviewers: kailiu, haobo Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15093	2014-01-10 12:12:28 -08:00
Yancey	afdd2d1a46	fix compile warning	2014-01-10 17:56:35 +08:00
Siying Dong	237a3da677	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703 Conflicts: db/db_impl.cc	2014-01-09 17:39:48 -08:00
Siying Dong	424a524ac9	[Performance Branch] A Hashed Linked List Based Mem Table Summary: Implement a mem table, in which keys are hashed based on prefixes. In each bucket, entries are organized in a sorted linked list. It has the same thread safety guarantee as skip list. The motivation is to optimize memory usage for the case that prefix hashing is primary way of seeking to the entry. Compared to hash skip list implementation, this implementation is more memory efficient, but inside each bucket, search is always linear. The target scenario is that there are only very limited number of records in each hash bucket. Test Plan: Add a test case in db_test Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14979	2014-01-09 16:19:11 -08:00
Siying Dong	5575316350	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703	2014-01-08 16:05:36 -08:00
kailiu	12b6d2b839	Separate the aligned and unaligned memory allocation Summary: Use two vectors for different types of memory allocation. Test Plan: run all unit tests. Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15027	2014-01-08 15:11:42 -08:00
Igor Canadi	72918efffe	[column families] Implement DB::OpenWithColumnFamilies() Summary: In addition to implementing OpenWithColumnFamilies, this diff also includes some minor changes: * Changed all column family names from Slice() to std::string. The performance of column family name handling is not critical, and it's more convenient and cleaner to have names as std::strings * Implemented ColumnFamilyOptions(const Options&) and DBOptions(const Options&) * Added ColumnFamilyOptions to VersionSet::ColumnFamilyData. ColumnFamilyOptions are specified on OpenWithColumnFamilies() and CreateColumnFamily() I will keep the diff in the Phabricator for a day or two and will push to the branch then. Feel free to comment even after the diff has been pushed. Test Plan: Added a simple unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15033	2014-01-07 11:05:50 -08:00
Igor Canadi	17a222670b	Merge branch 'master' into performance	2014-01-07 11:04:21 -08:00
Mike Lin	4c75e21c20	Eliminate stdout message when launching a posix thread. This seems out of place as it's the only time RocksDB prints to stdout in the normal course of operations. Thread IDs can still be retrieved from the LOG file: cut -d ' ' -f2 LOG \| sort \| uniq \| egrep -x '[0-9a-f]+'	2014-01-07 10:44:02 -08:00
Igor Canadi	fff5c7e817	Merge branch 'master' into columnfamilies	2014-01-06 13:31:41 -08:00
kailiu	7e70ff63d6	Fix issue #57	2014-01-06 11:11:19 -08:00
Kai Liu	774ed89c24	Replace vector with autovector Summary: this diff only replace the cases when we need to frequently create vector with small amount of entries. This diff doesn't aim to improve performance of a specific area, but more like a small scale test for the autovector and see how it works in real life. Test Plan: make check I also ran the performance tests, however there is no performance gain/loss. All performance numbers are pretty much the same before/after the change. Reviewers: dhruba, haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14985	2014-01-02 16:43:35 -08:00
Igor Canadi	6de1b5b83e	Merge branch 'master' into columnfamilies	2014-01-02 04:18:07 -08:00
kailiu	f1cec73a76	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h	2013-12-27 12:23:17 -08:00
Siying Dong	18df47b79a	Avoid malloc in NotFound key status if no message is given. Summary: In some places we have NotFound status created with empty message, but it doesn't avoid a malloc. With this patch, the malloc is avoided for that case. The motivation of it is that I found in db_bench readrandom test when all keys are not existing, about 4% of the total running time is spent on malloc of Status, plus a similar amount of CPU spent on free of them, which is not necessary. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14691	2013-12-26 16:23:10 -08:00
Kai Liu	b40c052bfa	Fix all the comparison issue in fb dev servers	2013-12-26 16:13:49 -08:00
kailiu	113a08c929	Fix [-Werror=sign-compare] in autovector_test	2013-12-26 15:47:07 -08:00
kailiu	c01676e46d	Implement autovector Summary: A vector that leverages pre-allocated stack-based array to achieve better performance for array with small amount of items. Test Plan: Added tests for both correctness and performance Here is the performance benchmark between vector and autovector Please note that in the test "Creation and Insertion Test", the test case were designed with the motivation described below: * no element inserted: internal array of std::vector may not really get initialize. * one element inserted: internal array of std::vector must have initialized. * kSize elements inserted. This shows the most time we'll spend if we keep everything in stack. * 2 * kSize elements inserted. The internal vector of autovector must have been initialized. Note: kSize is the capacity of autovector ===================================================== Creation and Insertion Test ===================================================== created 100000 vectors: each was inserted with 0 elements total time elapsed: 128000 (ns) created 100000 autovectors: each was inserted with 0 elements total time elapsed: 3641000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 0 elements total time elapsed: 9896000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 1 elements total time elapsed: 11089000 (ns) created 100000 autovectors: each was inserted with 1 elements total time elapsed: 5008000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 1 elements total time elapsed: 24271000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 4 elements total time elapsed: 39369000 (ns) created 100000 autovectors: each was inserted with 4 elements total time elapsed: 10121000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 4 elements total time elapsed: 28473000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 8 elements total time elapsed: 75013000 (ns) created 100000 autovectors: each was inserted with 8 elements total time elapsed: 18237000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 8 elements total time elapsed: 42464000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 16 elements total time elapsed: 102319000 (ns) created 100000 autovectors: each was inserted with 16 elements total time elapsed: 76724000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 16 elements total time elapsed: 68285000 (ns) ----------------------------------- ===================================================== Sequence Access Test ===================================================== performed 100000 sequence access against vector size: 4 total time elapsed: 198000 (ns) performed 100000 sequence access against autovector size: 4 total time elapsed: 306000 (ns) ----------------------------------- performed 100000 sequence access against vector size: 8 total time elapsed: 565000 (ns) performed 100000 sequence access against autovector size: 8 total time elapsed: 512000 (ns) ----------------------------------- performed 100000 sequence access against vector size: 16 total time elapsed: 1076000 (ns) performed 100000 sequence access against autovector size: 16 total time elapsed: 1070000 (ns) ----------------------------------- Reviewers: dhruba, haobo, sdong, chip Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14655	2013-12-26 15:03:47 -08:00
Kai Liu	5643ae1a3f	Merge pull request #32 from jamesgolick/master Only try to use fallocate if it's actually present on the system.	2013-12-26 14:20:22 -08:00
Siying Dong	abaf26266d	[RocksDB] [Performance Branch] Some Changes to PlainTable format Summary: Some changes to PlainTable format: (1) support variable key length (2) use user defined slice transformer to extract prefixes (3) Run some test cases against PlainTable in db_test and table_test Test Plan: test db_test Reviewers: haobo, kailiu CC: dhruba, igor, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D14457	2013-12-20 12:08:35 -08:00
Igor Canadi	9385a5247e	[RocksDB] [Column Family] Interface proposal Summary: <This diff is for Column Family branch> Sharing some of the work I've done so far. This diff compiles and passes the tests. The biggest change is in options.h - I broke down Options into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compresion options, etc.). Note that this does not break backwards compatibility at all. Further, I created DBWithColumnFamily which inherits DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily and it will not break their backwards compatibility. There are few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on] Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families. Please provide feedback. Test Plan: make check works, the code is backward compatible Reviewers: dhruba, haobo, sdong, kailiu, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D14445	2013-12-18 13:08:22 -08:00
kailiu	5f5e5fc2e9	Revert `atomic_size_t usage` Summary: By disassemble the function, we found that the atomic variables do invoke the `lock` that locks the memory bus. As a tradeoff, we protect the GetUsage by mutex and leave usage_ as plain size_t. Test Plan: passed `cache_test` Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14667	2013-12-13 17:03:19 -08:00
Haobo Xu	5090316f0d	[RocksDB] [Performance Branch] Trivia build fix Summary: make release complains signed unsigned comparison. Test Plan: make release Reviewers: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14661	2013-12-13 14:21:59 -08:00
kailiu	b660e2d468	Expose usage info for the cache Summary: This diff will help us to figure out the memory usage for the cache part. Test Plan: added a new memory usage test for cache Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14559	2013-12-13 12:53:45 -08:00
James Golick	c28dd2a891	oops - missed a spot	2013-12-11 11:18:00 -08:00
Igor Canadi	e8d40c31b3	[RocksDB perf] Cache speedup Summary: I have ran a get benchmark where all the data is in the cache and observed that most of the time is spent on waiting for lock in LRUCache. This is an effort to optimize LRUCache. Test Plan: The data was loaded with fillseq. Then, I ran a benchmark: /db_bench --db=/tmp/rocksdb_stat_bench --num=1000000 --benchmarks=readrandom --statistics=1 --use_existing_db=1 --threads=16 --disable_seek_compaction=1 --cache_size=20000000000 --cache_numshardbits=8 --table_cache_numshardbits=8 I ran the benchmark three times. Here are the results: AFTER THE PATCH: 798072, 803998, 811807 BEFORE THE PATCH: 782008, 815593, 763017 Reviewers: dhruba, haobo, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14571	2013-12-11 08:33:29 -08:00
Haobo Xu	3c02c363b3	[RocksDB] [Performance Branch] Added dynamic bloom, to be used for memable non-existing key filtering Summary: as title Test Plan: dynamic_bloom_test Reviewers: dhruba, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14385	2013-12-11 00:15:14 -08:00
James Golick	43c386b72e	only try to use fallocate if it's actually present on the system	2013-12-10 22:34:19 -08:00
kailiu	c79e595471	Make Cache::GetCapacity constant Summary: This will allow us to access constant via `DB::GetOptions().table_cache.GetCapacity()` or `DB::GetOptions().block_cache.GetCapacity()` since GetOptions() is also constant method.	2013-12-10 17:34:35 -08:00
Igor Canadi	19f5463d3f	Don't LogFlush() in foreground threads Summary: So fflush() takes a lock which is heavyweight. I added flush_pending_, but more importantly, I removed LogFlush() from foreground threads. Test Plan: ./db_test Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14535	2013-12-10 10:57:46 -08:00
Igor Canadi	fb9fce4fc3	[RocksDB] BackupableDB Summary: In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you. Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes. There are multiple things you can configure: 1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like. 2. sync - if true, it guarantees backup consistency on machine reboot 3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. IMPORTANT -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files. 4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup. 5. More things you can find in BackupableDBOptions Here is the directory structure I use: backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot 0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files All the files are ref counted and deleted immediatelly when they get out of scope. Some other stuff in this diff: 1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do. 2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB. Test Plan: I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff. Also, `make asan_check` Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14295	2013-12-09 14:06:52 -08:00
Igor Canadi	9644e0e0c7	Print stack trace on assertion failure Summary: This will help me a lot! When we hit an assertion in unittest, we get the whole stack trace now. Also, changed stack trace a bit, we now include actual demangled C++ class::function symbols! Test Plan: Added ASSERT_TRUE(false) to a test, observed a stack trace Reviewers: haobo, dhruba, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14499	2013-12-06 17:11:09 -08:00
kailiu	c7707f24c2	Refine the statistics	2013-12-06 16:51:35 -08:00
kailiu	551e9428ce	Merge branch 'master' into performance	2013-12-06 14:15:42 -08:00
kailiu	e1d92dfd2e	Fix a bunch of mac compilation issues in performance branch	2013-12-04 23:00:33 -08:00
Vamsi Ponnekanti	fa88cbc71e	[Log dumper broken when merge operator is in log] Summary: $title Test Plan: on my dev box Revert Plan: OK Task ID: # Reviewers: emayanke, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14451	2013-12-04 16:22:54 -08:00
Mark Callaghan	97aa401e2f	Add compression options to db_bench Summary: This adds 2 options for compression to db_bench: * universal_compression_size_percent * compression_level - to set zlib compression level It also logs compression_size_percent at startup in LOG Task ID: # Blame Rev: Test Plan: make check, run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14439	2013-12-03 14:28:48 -08:00
Igor Canadi	eb12e47e0e	Killing Transform Rep Summary: Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around. This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release. I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14397	2013-12-03 12:42:15 -08:00
lovro	930cb0b9ee	Clarify CompactionFilter thread safety requirements Summary: Documenting our discussion Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: igor Differential Revision: https://reviews.facebook.net/D14403	2013-12-02 16:41:43 -08:00
lovro	45a2f2d8d3	Fix build without glibc Summary: The preprocessor does not follow normal rules of && evaluation, tries to evaluate __GLIBC_PREREQ(2, 12) even though the defined() check fails. This breaks the build if __GLIBC_PREREQ is absent. Test Plan: Try adding #undef __GLIBC_PREREQ above the offending line, build no longer breaks Reviewed By: igor Blame Rev: `4c81383628`	2013-12-01 11:32:54 -08:00
Kai Liu	1966b63137	Merge branch 'master' into perf	2013-11-27 11:47:40 -08:00
lovro	4c81383628	Set background thread name with pthread_setname_np() Summary: Makes it easier to monitor performance with top Test Plan: ./manual_compaction_test with `top -H` running. Previously was two `manual_compacti`, now one shows `rocksdb:bg0`. Reviewers: igor, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14367	2013-11-27 11:28:06 -08:00
Haobo Xu	4e6463ea44	[RocksDB][Performance Branch] Make height and branching factor configurable for skiplist implementation Summary: As title. Especially, HashSkipListRepFactory will be able to specify a relatively small height, to reduce the memory overhead of one skiplist per bucket. Test Plan: make check and test it on leaf4 Reviewers: dhruba, sdong, kailiu CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14307	2013-11-26 21:59:36 -08:00

1 2 3 4 5 ...

425 Commits