rocksdb

Commit Graph

Author	SHA1	Message	Date
Igor Canadi	f276e0e59d	[CF] Options -> DBOptions Summary: Replaced most of occurrences of Options with more specific DBOptions. This brings us very close to supporting different configuration options for each column family. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15933	2014-02-05 14:56:09 -08:00
Igor Canadi	c24d8c4e90	[CF] Rethink table cache Summary: Adapting table cache to column families is interesting. We want table cache to be global LRU, so if some column families are use not as often as others, we want them to be evicted from cache. However, current TableCache object also constructs tables on its own. If table is not found in the cache, TableCache automatically creates new table. We want each column family to be able to specify different table factory. To solve the problem, we still have a single LRU, but we provide the LRUCache object to TableCache on construction. We have one TableCache per column family, but the underyling cache is shared by all TableCache objects. This allows us to have a global LRU, but still be able to support different table factories for different column families. Also, in the future it will also be able to support different directories for different column families. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15915	2014-02-05 11:55:30 -08:00
Igor Canadi	7b9f134959	[CF] Move InternalStats to ColumnFamilyData Summary: InternalStats is a messy thing, keeping both DB data and column family data. However, it's better off living in ColumnFamilyData than in DBImpl. For now, at least. Test Plan: make check Reviewers: dhruba, kailiu, haobo, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15879	2014-02-04 17:59:50 -08:00
Igor Canadi	73f62255c1	[CF] Split SanitizeOptions into two Summary: There are three SanitizeOption-s now : one for DBOptions, one for ColumnFamilyOptions and one for Options (which just calls the other two) I have also reshuffled some options -- table_cache options and info_log should live in DBOptions, for example. Test Plan: make check doesn't complain Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15873	2014-02-04 17:26:51 -08:00
Igor Canadi	5e2c4fe766	Get rid of DBImpl::user_comparator() Summary: user_comparator() is a Column Family property, not DBImpl Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15855	2014-02-03 17:50:47 -08:00
Igor Canadi	0e22badc08	[column families] Iterator and MultiGet Summary: Support for different column families in Iterator and MultiGet code path. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15849	2014-02-03 17:44:40 -08:00
Igor Canadi	2a9271b403	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.cc	2014-02-03 13:47:54 -08:00
Lei Jin	5b3b6549d6	use super_version in NewIterator() and MultiGet() function Summary: Use super_version insider NewIterator to avoid Ref() each component separately under mutex The new added bench shows NewIterator QPS increases from 515K to 719K No meaningful improvement for multiget I guess due to its relatively small cost comparing to 90 keys fetch in the test. Test Plan: unit test and db_bench Reviewers: igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D15609	2014-02-03 13:13:36 -08:00
Igor Canadi	29bacb2eb6	VersionSet cleanup Summary: Removed icmp_ from VersionSet (since it's per-column-family, not per-DB-instance) Unfriended VersionSet and ColumnFamilyData (yay!) Removed VersionSet::NumberLevels() Cleaned up DBImpl Test Plan: make check Reviewers: dhruba, haobo, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15819	2014-02-03 13:10:47 -08:00
Siying Dong	d169b67680	[Performance Branch] PlainTable to encode rows with seqID 0, value type using 1 internal byte. Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15489	2014-02-03 12:19:30 -08:00
kailiu	4f6cb17bdb	First phase API clean up Summary: Addressed all the issues in https://reviews.facebook.net/D15447. Now most table-related modules are hidden from user land. Test Plan: make check Reviewers: sdong, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15525	2014-02-03 00:30:43 -08:00
Igor Canadi	27a8856c23	Compacting column families Summary: This diff enables non-default column families to get compacted both automatically and also by calling CompactRange() Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15813	2014-01-31 19:54:03 -08:00
Igor Canadi	f7489123e2	Move compaction picker and internal key comparator to ColumnFamilyData Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15801	2014-01-31 16:06:55 -08:00
Igor Canadi	5693db2a02	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc	2014-01-31 14:45:15 -08:00
Igor Canadi	dbbffbd772	Mark the log_number file number used Summary: VersionSet::next_file_number_ is always assumed to be strictly greater than VersionSet::log_number_. In our new recovery code, we artificially set log_number_ to be (log_number + 1), so that once we flush, we don't recover from the same log file again (this is important because of merge operator non-idempotence) When we set VersionSet::log_number_ to (log_number + 1), we also have to mark that file number used, such that next_file_number_ is increased to a legal level. Otherwise, VersionSet might assert. This has not be a problem so far because here's what happens: 1. assume next_file_number is 5, we're recovering log_number 10 2. in DBImpl::Recover() we call MarkFileNumberUsed with 10. This will set VersionSet::next_file_number_ to 11. 3. If there are some updates, we will call WriteTable0ForRecovery(), which will use file number 11 as a new table file and advance VersionSet::next_file_number_ to 12. 4. When we LogAndApply() with log_number 11, assertion is true: assert(11 <= 12); However, this was a lucky occurrence. Even though this diff doesn't cause a bug, I think the issue is important to fix. Test Plan: In column families I have different recovery logic and this code path asserted. When adding MarkFileNumberUsed(log_number + 1) assert is gone. Reviewers: dhruba, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15783	2014-01-31 14:43:16 -08:00
Igor Canadi	3615f534d1	Enable flushing memtables from arbitrary column families Summary: Removed default_cfd_ from all flush code paths. This means we can now flush memtables from arbitrary column families! Test Plan: Added a new unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15789	2014-01-31 14:42:52 -08:00
Siying Dong	56bea9f80d	When using Universal Compaction, Zero out seqID in the last file too Summary: I didn't figure out the reason why the feature of zeroing out earlier sequence ID is disabled in universal compaction. I do see bottommost_level is set correctly. It should simply work if we remove the constraint of universal compaction. Test Plan: make all check Reviewers: haobo, dhruba, kailiu, igor Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15423	2014-01-31 09:59:52 -08:00
Igor Canadi	9ca638a86d	Enable iterating column families with a concurrent writer Summary: Sometimes we iterate through column families, and unlock the mutex in the body of the iteration. While mutex is unlocked, some column family might be created or dropped. We need to be able to continue iterating through column families even though our current column family got dropped. This diff implements circular linked lists that connect all column families. It then uses the link list to enable iterating through linked lists. Even if the column family is dropped, its next_ pointer still can be used to advance to another alive column family. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15603	2014-01-30 16:57:52 -08:00
Igor Canadi	6973bb1722	MakeRoomForWrite() support for column families Summary: Making room for write will be the hardest part of the column family implementation. For now, I just iterate through all column families and run MakeRoomForWrite() for every one. Test Plan: make check does not complain Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15597	2014-01-30 16:12:08 -08:00
Igor Canadi	c37e7de669	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h	2014-01-30 11:47:23 -08:00
Igor Canadi	ac92420fc5	Merge branch 'master' into performance Conflicts: db/db_impl.h	2014-01-30 10:09:23 -08:00
Igor Canadi	3c0dcf0e25	InternalStatistics Summary: In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatisics. Most of it is copy/paste. Apart from cleaning up db_impl.cc, this diff is also necessary for Column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of it in every column family if it's nicely encapsulated in its own class. Test Plan: make check Reviewers: dhruba, kailiu, haobo, sdong, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15273	2014-01-29 20:40:41 -08:00
Lei Jin	d118707f8d	set bg_error_ when background flush goes wrong Summary: as title Test Plan: unit test Reviewers: haobo, igor, sdong, kailiu, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15435	2014-01-29 15:55:58 -08:00
Igor Canadi	fa99d53e55	Change ColumnFamilyData from struct to class Summary: ColumnFamilyData grew a lot, there's much more data that it holds now. It makes more sense to encapsulate it better by making it a class. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15579	2014-01-29 15:18:36 -08:00
Igor Canadi	4662969bf5	PurgeObsoleteFiles in DropColumnFamily Summary: When we drop the column family, we want to delete all the files from that column family. Test Plan: make check Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15561	2014-01-29 11:58:33 -08:00
Igor Canadi	f24a3ee52d	Read from and write to different column families Summary: This one is big. It adds ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from log, which was the hardest part to reason about. We need to make sure to never delete the log file which has unflushed data from any column family. To support that, I added another concept, which is versions_->MinLogNumber() Test Plan: Added a unit test in column_family_test Reviewers: dhruba, haobo, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15537	2014-01-29 11:38:16 -08:00
Igor Canadi	c1071ed95c	Merge branch 'master' into columnfamilies	2014-01-28 16:04:00 -08:00
kailiu	a5e220f5ef	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_test.cc db/memtable_list.cc db/memtable_list.h table/block_based_table_reader.cc table/table_test.cc util/cache.cc util/coding.cc	2014-01-28 10:35:55 -08:00
Igor Canadi	4bf25357ae	[column families] Removing VersionSet::current() Summary: Instead of VersionSet::current(), DBImpl uses default_cfd_->current directly. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15483	2014-01-27 17:04:46 -08:00
Mark Callaghan	90f29ccbef	Update monitoring to include average time per compaction and stall Summary: The new columns are msComp and msStall that provide average time per compaction and stall for that level in milliseconds. Level Files Size(MB) Score Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count msComp msStall Ln-stall Stall-cnt ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0 8 15 1.5 2 0 30 0 0 30 0.0 0.0 15.5 0 0 0 0 16 112 0.2 1.3 7568 1 8 16 1.6 1 26 26 15 11 16 3.5 17.6 18.1 8 6 13 7 3 362 0.0 0.0 0 2 1 2 0.0 0 0 2 0 0 2 0.0 0.0 18.4 0 0 0 0 1 50 0.0 0.0 0 Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15345	2014-01-27 15:06:04 -08:00
Igor Canadi	511b03a5b5	LogAndApply to take ColumnFamilyData Summary: This removes the default implementation of LogAndApply that applied the changed to the default column family by default. It is mostly simple reformatting. Test Plan: make check Reviewers: dhruba, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15465	2014-01-27 13:57:58 -08:00
Igor Canadi	eb055609e4	[column families] Move memtable and immutable memtable list to column family data Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family. Test Plan: make check Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15459	2014-01-27 13:37:14 -08:00
Igor Canadi	ae16606f98	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set.h	2014-01-27 11:11:51 -08:00
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	2014-01-27 11:02:21 -08:00
Igor Canadi	cf783c674a	Merge branch 'master' into columnfamilies Conflicts: db/version_set.h	2014-01-27 10:00:24 -08:00
Igor Canadi	6c2ca1d3e6	Move NeedsCompaction() from VersionSet to Version Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet. Test Plan: make check Reviewers: kailiu, haobo, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15333	2014-01-27 09:59:00 -08:00
Igor Canadi	6a404de4ba	Merge branch 'master' into columnfamilies	2014-01-24 15:54:18 -08:00
Igor Canadi	f653fdcf5a	Fixing iterator cleanup for Tailing iterator Immutable tailing iterator doesn't set CleanupState::mem, so we don't have to unref it.	2014-01-24 15:51:06 -08:00
Igor Canadi	1423e7c9de	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set_reduce_num_levels.cc util/ldb_cmd.cc	2014-01-24 15:03:54 -08:00
Igor Canadi	c583157d49	MemTableListVersion Summary: MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful. This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team. I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet()) Test Plan: `make check` works. I will also do `make valgrind_check` before commit Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15255	2014-01-24 14:52:08 -08:00
Igor Canadi	28d1a0c6f5	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.h db/db_test.cc include/rocksdb/db.h include/utilities/stackable_db.h	2014-01-24 09:27:29 -08:00
Igor Canadi	09489d395f	Fix a bug in DBImpl::CreateColumnFamily	2014-01-24 09:12:00 -08:00
Lei Jin	aba2acb5ec	CompactRange() to return status Summary: as title Test Plan: make all check What else tests shall I cover? Reviewers: igor, haobo CC: Differential Revision: https://reviews.facebook.net/D15339	2014-01-23 16:41:46 -08:00
Kai Liu	054c5dda8c	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h util/statistics_imp.h	2014-01-23 16:32:49 -08:00
Tomislav Novak	81c9cc9b3b	Tailing iterator Summary: This diff implements a special type of iterator that doesn't create a snapshot (can be used to read newly inserted data) and is optimized for doing sequential reads. TailingIterator uses current superversion number to determine whether to invalidate its internal iterators. If the version hasn't changed, it can often avoid doing expensive seeks over immutable structures (sst files and immutable memtables). Test Plan: * new unit tests * running LD with this patch Reviewers: igor, dhruba, haobo, sdong, kailiu Reviewed By: sdong CC: leveldb, lovro, march Differential Revision: https://reviews.facebook.net/D15285	2014-01-23 16:26:08 -08:00
Igor Canadi	7c5e583a27	ColumnFamilySet Summary: I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner. Let me know if you have any comments. I will commit tomorrow. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15357	2014-01-23 14:03:38 -08:00
Igor Canadi	f9a25dda9f	Fix wrong merge	2014-01-22 11:30:19 -08:00
Igor Canadi	92a022ad07	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.cc db/version_set.cc	2014-01-22 10:59:07 -08:00
Igor Canadi	6fe9b57748	Refactor Recover() code Summary: This diff does two things: * Rethinks how we call Recover() with read_only option. Before, we call it with pointer to memtable where we'd like to apply those changes to. This memtable is set in db_impl_readonly.cc and it's actually DBImpl::mem_. Why don't we just apply updates to mem_ right away? It seems more intuitive. * Changes when we apply updates to manifest. Before, the process is to recover all the logs, flush it to sst files and then do one giant commit that atomically adds all recovered sst files and sets the next log number. This works good enough, but causes some small troubles for my column family approach, since I can't have one VersionEdit apply to more than single column family[1]. The change here is to commit the files recovered from logs right away. Here is the state of the world before the change: 1. Recover log 5, add new sst files to edit 2. Recover log 7, add new sst files to edit 3. Recover log 8, add new sst files to edit 4. Commit all added sst files to manifest and mark log files 5, 7 and 8 as recoverd (via SetLogNumber(9) function) After the change, we'll do: 1. Recover log 5, commit the new sst files and set log 5 as recovered 2. Recover log 7, commit the new sst files and set log 7 as recovered 3. Recover log 8, commit the new sst files and set log 8 as recovered The added (small) benefit is that if we fail after (2), the new recovery will only have to recover log 8. In previous case, we'll have to restart the recovery from the beginning. The bigger benefit will be to enable easier integration of multiple column families in Recovery code path. [1] I'm happy to dicuss this decison, but I believe this is the cleanest way to go. It also makes backward compatibility much easier. We don't have a requirement of adding multiple column families atomically. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15237	2014-01-22 10:45:26 -08:00
Igor Canadi	23f6791c9e	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl_readonly.cc db/db_test.cc db/version_edit.cc db/version_edit.h db/version_set.cc db/version_set.h db/version_set_reduce_num_levels.cc	2014-01-21 17:01:52 -08:00
Mark Callaghan	4e8321bfea	Boost access before mutex is unlocked Summary: This moves the use of versions_ to before the mutex is unlocked to avoid a possible race. Task ID: # Blame Rev: Test Plan: make check Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15279	2014-01-17 21:32:23 -08:00
Igor Canadi	83681bf9ef	Statistics code cleanup Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review. Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba, tnovak Reviewed By: tnovak CC: leveldb Differential Revision: https://reviews.facebook.net/D15099	2014-01-17 12:46:06 -08:00
Igor Canadi	8079dd5d24	Merge branch 'master' into performance	2014-01-17 12:20:07 -08:00
Mark Callaghan	439e36db21	Fix SlowdownAmount Summary: This had a few bugs. 1) bottom and top were reversed. top is for the max value but the callers were passing the max value to bottom. The result is that the max sleep is used when n >= bottom. 2) one of the callers passed values with type double and these values are frequently between 1.0 and 2.0 so rounding will do some bad things 3) sometimes the function returned 0 when there should be a stall With this change and one other diff (out for review soon) there are slightly fewer stalls on one workload. With the fix. Stalls(secs): 160.166 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 58.495 leveln_slowdown Stalls(count): 910261 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 54526 leveln_slowdown Without the fix. Stalls(secs): 172.227 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 56.538 leveln_slowdown Stalls(count): 160831 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 52845 leveln_slowdown Task ID: # Blame Rev: Test Plan: run db_bench for --benchmarks=overwrite with IO-bound database Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15243	2014-01-17 10:15:30 -08:00
Mike Lin	b1194f4903	Minor compaction logging improvements 1) make summary less likely to be truncated 2) format human-readable file sizes in summary 3) log the motivation for each universal compaction	2014-01-16 22:54:31 -08:00
kailiu	1304d8c8ce	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_impl.h db/db_test.cc db/memtable.cc db/memtable.h db/version_edit.h db/version_set.cc include/rocksdb/options.h util/hash_skiplist_rep.cc util/options.cc	2014-01-15 23:12:31 -08:00
Igor Canadi	2f4eda7890	Move functions from VersionSet to Version Summary: There were some functions in VersionSet that had no reason to be there instead of Version. Moving them to Version will make column families implementation easier. The functions moved are: * NumLevelBytes * LevelSummary * LevelFileSummary * MaxNextLevelOverlappingBytes * AddLiveFiles (previously AddLiveFilesCurrentVersion()) * NeedSlowdownForNumLevel0Files The diff continues on (and depends on) D15171 Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong, emayanke Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15183	2014-01-15 16:18:04 -08:00
Igor Canadi	65a8a52b54	Decrease reliance on VersionSet::NumberLevels() Summary: With column families VersionSet will not have a constant number of levels (each CF can have different options), so we'll need to eliminate call to VersionSet::NumberLevels() This diff decreases number of callsites, but we're not there yet. It associates number of levels with Version (each version is associated with single CF) instead of VersionSet. I have also slightly changed how VersionSet keeps track of manifest size. This diff also modifies constructor of Compaction such that it takes input_version and automatically Ref()s it. Before this was done outside of constructor. In next diffs I will continue to decrease number of callsites of VersionSet::NumberLevels() and also references to current_ Test Plan: make check Reviewers: haobo, dhruba, kailiu, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D15171	2014-01-15 16:15:43 -08:00
Siying Dong	9b51af5a17	[RocksDB Performance Branch] DBImpl.NewInternalIterator() to reduce works inside mutex Summary: To reduce mutex contention caused by DBImpl.NewInternalIterator(), in this function, move all the iteration creation works out of mutex, only leaving object ref and get. Test Plan: make all check will run db_stress for a while too to make sure no problem. Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D14589 Conflicts: db/db_impl.cc	2014-01-14 17:41:44 -08:00
Igor Canadi	d9cd7a063f	Fix CompactRange to apply filter to every key Summary: When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`. This patch fixed the unit test. Test Plan: Added a failing unit test and a fix, so it's not failing anymore. Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D14421	2014-01-14 16:19:09 -08:00
Igor Canadi	055e6df45b	VersionEdit not to take NumLevels() Summary: I will submit a sequence of diffs that are preparing master branch for column families. There are a lot of implicit assumptions in the code that are making column family implementation hard. If I make the change only in column family branch, it will make merging back to master impossible. Most of the diffs will be simple code refactorings, so I hope we can have fast turnaround time. Feel free to grab me in person to discuss any of them. This diff removes number of level check from VersionEdit. It is used only when VersionEdit is read, not written, but has to be set when it is written. I believe it is a right thing to make VersionEdit dumb and check consistency on the caller side. This will also make it much easier to implement Column Families, since different column families can have different number of levels. Test Plan: make check Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15159	2014-01-14 15:27:09 -08:00
Igor Canadi	7d9f21cf23	BuildBatchGroup -- memcpy outside of lock Summary: When building batch group, don't actually build a new batch since it requires heavy-weight mem copy and malloc. Only store references to the batches and build the batch group without lock held. Test Plan: `make check` I am also planning to run performance tests. The workload that will benefit from this change is readwhilewriting. I will post the results once I have them. Reviewers: dhruba, haobo, kailiu Reviewed By: haobo CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D15063	2014-01-14 14:49:31 -08:00
Naman Gupta	1d9bac4d7f	Use sanitized options while opening db Summary: We use SanitizeOptions() to set appropriate values for some options, based on other options. So we should use the sanitized options by default. Luckily it hasn't caused a bug yet, but can result in a bug in the fugture. Test Plan: make check Reviewers: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14103	2014-01-14 11:46:24 -08:00
Siying Dong	9ea8bf90f1	DB::Put() to estimate write batch data size needed and pre-allocate buffer Summary: In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand. Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced. Test Plan: make all check Reviewers: haobo, kailiu, igor Reviewed By: haobo CC: leveldb, nkg- Differential Revision: https://reviews.facebook.net/D15135	2014-01-14 11:24:43 -08:00
Siying Dong	fbbf0d1456	Pre-calculate whether to slow down for too many level 0 files Summary: Currently in DBImpl::MakeRoomForWrite(), we do "versions_->NumLevelFiles(0) >= options_.level0_slowdown_writes_trigger" to check whether the writer thread needs to slow down. However, versions_->NumLevelFiles(0) is slightly more expensive than we expected. By caching the result of the comparison when installing a new version, we can avoid this function call every time. Test Plan: make all check Manually trigger this behavior by applying universal compaction style and make sure inserts are made slow after there are certain number of files. Reviewers: haobo, kailiu, igor Reviewed By: kailiu CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D15141	2014-01-14 11:23:02 -08:00
Siying Dong	51dd21926c	DB::Put() to estimate write batch data size needed and pre-allocate buffer Summary: In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand. Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced. Test Plan: make all check Reviewers: haobo, kailiu, igor Reviewed By: haobo CC: leveldb, nkg- Differential Revision: https://reviews.facebook.net/D15135	2014-01-14 10:53:16 -08:00
Igor Canadi	151f9e144f	Merge branch 'master' into columnfamilies	2014-01-13 09:09:51 -08:00
Igor Canadi	d076cef347	[column families] Get rid of VersionSet::current_ and keep current Version for each column family Summary: The biggest change here is getting rid of current_ Version and adding a column_family_data->current Version to each column family. I have also fixed some smaller things in VersionSet that made it easier to implement Column family support. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15105	2014-01-13 08:59:15 -08:00
Siying Dong	aa0ef6602d	[Performance Branch] If options.max_open_files set to be -1, cache table readers in FileMetadata for Get() and NewIterator() Summary: In some use cases, table readers for all live files should always be cached. In that case, there will be an opportunity to avoid the table cache look-up while Get() and NewIterator(). We define options.max_open_files = -1 to be the mode that table readers for live files will always be kept. In that mode, table readers are cached in FileMetaData (with a reference count hold in table cache). So that when executing table_cache.Get() and table_cache.newInterator(), LRU cache checking can be by-passed, to reduce latency. Test Plan: add a test case in db_test Reviewers: haobo, kailiu Reviewed By: haobo CC: dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D15039	2014-01-10 15:57:49 -08:00
Siying Dong	237a3da677	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703 Conflicts: db/db_impl.cc	2014-01-09 17:39:48 -08:00
Siying Dong	5575316350	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703	2014-01-08 16:05:36 -08:00
Igor Canadi	19e3ee64ac	Add column family information to WAL Summary: I have added three new value types: * kTypeColumnFamilyDeletion * kTypeColumnFamilyValue * kTypeColumnFamilyMerge which include column family Varint32 before the data (value, deletion and merge). These values are used only in WAL (not in memtables yet). This endeavour required changing some WriteBatch internals. Test Plan: Added a unittest Reviewers: dhruba, haobo, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15045	2014-01-08 12:53:33 -08:00
Mark Callaghan	50994bf699	Don't always compress L0 files written by memtable flush Summary: Code was always compressing L0 files written by a memtable flush when compression was enabled. Now this is done when min_level_to_compress=0 for leveled compaction and when universal_compaction_size_percent=-1 for universal compaction. Task ID: #3416472 Blame Rev: Test Plan: ran db_bench with compression options Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba, igor, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14757	2014-01-07 21:50:26 -08:00
Igor Canadi	72918efffe	[column families] Implement DB::OpenWithColumnFamilies() Summary: In addition to implementing OpenWithColumnFamilies, this diff also includes some minor changes: * Changed all column family names from Slice() to std::string. The performance of column family name handling is not critical, and it's more convenient and cleaner to have names as std::strings * Implemented ColumnFamilyOptions(const Options&) and DBOptions(const Options&) * Added ColumnFamilyOptions to VersionSet::ColumnFamilyData. ColumnFamilyOptions are specified on OpenWithColumnFamilies() and CreateColumnFamily() I will keep the diff in the Phabricator for a day or two and will push to the branch then. Feel free to comment even after the diff has been pushed. Test Plan: Added a simple unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15033	2014-01-07 11:05:50 -08:00
Igor Canadi	d3a2ba9c64	Merge branch 'master' into columnfamilies	2014-01-07 11:05:03 -08:00
Igor Canadi	17a222670b	Merge branch 'master' into performance	2014-01-07 11:04:21 -08:00
Tomislav Novak	9f690ec62c	Fix a deadlock in CompactRange() Summary: The way DBImpl::TEST_CompactRange() throttles down the number of bg compactions can cause it to deadlock when CompactRange() is called concurrently from multiple threads. Imagine a following scenario with only two threads (max_background_compactions is 10 and bg_compaction_scheduled_ is initially 0): 1. Thread #1 increments bg_compaction_scheduled_ (to LargeNumber), sets bg_compaction_scheduled_ to 9 (newvalue), schedules the compaction (bg_compaction_scheduled_ is now 10) and waits for it to complete. 2. Thread #2 calls TEST_CompactRange(), increments bg_compaction_scheduled_ (now LargeNumber + 10) and waits on a cv for bg_compaction_scheduled_ to drop to LargeNumber. 3. BG thread completes the first manual compaction, decrements bg_compaction_scheduled_ and wakes up all threads waiting on bg_cv_. Thread #1 runs, increments bg_compaction_scheduled_ by LargeNumber again (now 2*LargeNumber + 9). Since that's more than LargeNumber + newvalue, thread #2 also goes to sleep (waiting on bg_cv_), without resetting bg_compaction_scheduled_. This diff attempts to address the problem by introducing a new counter bg_manual_only_ (when positive, MaybeScheduleFlushOrCompaction() will only schedule manual compactions). Test Plan: I could pretty much consistently reproduce the deadlock with a program that calls CompactRange(nullptr, nullptr) immediately after Write() from multiple threads. This no longer happens with this patch. Tests (make check) pass. Reviewers: dhruba, igor, sdong, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14799	2014-01-07 10:37:34 -08:00
Igor Canadi	ef6ad1708d	[column families] Support to create and drop column families Summary: This diff provides basic implementations of CreateColumnFamily(), DropColumnFamily() and ListColumnFamilies(). It builds on top of https://reviews.facebook.net/D14733 It also includes a bug fix for DBImplReadOnly, where Get implementation would be redirected to DBImpl instead of DBImplReadOnly. Test Plan: Added unit test Reviewers: dhruba, haobo, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15021	2014-01-03 01:12:16 -08:00
Kai Liu	774ed89c24	Replace vector with autovector Summary: this diff only replace the cases when we need to frequently create vector with small amount of entries. This diff doesn't aim to improve performance of a specific area, but more like a small scale test for the autovector and see how it works in real life. Test Plan: make check I also ran the performance tests, however there is no performance gain/loss. All performance numbers are pretty much the same before/after the change. Reviewers: dhruba, haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14985	2014-01-02 16:43:35 -08:00
kailiu	e72aa37cc5	Merge branch 'master' into performance Conflicts: db/table_cache.cc	2014-01-02 16:34:59 -08:00
kailiu	476416c27c	Some minor refactoring on the code Summary: I made some cleanup while reading the source code in `db`. Most changes are about style, naming or C++ 11 new features. Test Plan: ran `make check` Reviewers: haobo, dhruba, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15009	2014-01-02 16:32:31 -08:00
Igor Canadi	6de1b5b83e	Merge branch 'master' into columnfamilies	2014-01-02 04:18:07 -08:00
Igor Canadi	b60c14f6ee	Support multi-threaded DisableFileDeletions() and EnableFileDeletions() Summary: We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions(). However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now. Test Plan: make check Reviewers: dhruba, haobo, sanketh Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14781	2014-01-02 03:33:42 -08:00
kailiu	f1cec73a76	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h	2013-12-27 12:23:17 -08:00
Igor Canadi	1fdb3f7dc6	[RocksDB] Optimize locking for Get Summary: Instead of locking and saving a DB state, we can cache a DB state and update it only when it changes. This change reduces lock contention and speeds up read operations on the DB. Performance improvements are substantial, although there is some cost in no-read workloads. I ran the regression tests on my devserver and here are the numbers: overwrite 56345 -> 63001 fillseq 193730 -> 185296 readrandom 771301 -> 1219803 (58% improvement!) readrandom_smallblockcache 677609 -> 862850 readrandom_memtable_sst 710440 -> 1109223 readrandom_fillunique_random 221589 -> 247869 memtablefillrandom 105286 -> 92643 memtablereadrandom 763033 -> 1288862 Test Plan: make asan_check I am also running db_stress Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14679	2013-12-20 09:57:58 -08:00
Igor Canadi	9385a5247e	[RocksDB] [Column Family] Interface proposal Summary: <This diff is for Column Family branch> Sharing some of the work I've done so far. This diff compiles and passes the tests. The biggest change is in options.h - I broke down Options into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compresion options, etc.). Note that this does not break backwards compatibility at all. Further, I created DBWithColumnFamily which inherits DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily and it will not break their backwards compatibility. There are few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on] Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families. Please provide feedback. Test Plan: make check works, the code is backward compatible Reviewers: dhruba, haobo, sdong, kailiu, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D14445	2013-12-18 13:08:22 -08:00
Siying Dong	a8b8b11dc4	Get() Does Not Reserve space for to_delete memtables Summary: It seems to be a decision tradeoff in current codes: we make a malloc for every Get() to reduce one malloc for a flush inside mutex. It takes about 5% of CPU time in readrandom tests. We might consider the tradeoff to be the other way around. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14697	2013-12-17 17:16:16 -08:00
Mark Callaghan	e9e6b00d29	Add monitoring for universal compaction and add counters for compaction IO Summary: Adds these counters { WAL_FILE_SYNCED, "rocksdb.wal.synced" } number of writes that request a WAL sync { WAL_FILE_BYTES, "rocksdb.wal.bytes" }, number of bytes written to the WAL { WRITE_DONE_BY_SELF, "rocksdb.write.self" }, number of writes processed by the calling thread { WRITE_DONE_BY_OTHER, "rocksdb.write.other" }, number of writes not processed by the calling thread. Instead these were processed by the current holder of the write lock { WRITE_WITH_WAL, "rocksdb.write.wal" }, number of writes that request WAL logging { COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" }, number of bytes read during compaction { COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" }, number of bytes written during compaction Per-interval stats output was updated with WAL stats and correct stats for universal compaction including a correct value for write-amplification. It now looks like: Compactions Level Files Size(MB) Score Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count Ln-stall Stall-cnt -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 7 464 46.4 281 3411 3875 3411 0 3875 2.1 12.1 13.8 621 0 240 240 628 0.0 0 Uptime(secs): 310.8 total, 2.0 interval Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write Amplification cumulative: 4.1 write, 6.8 compaction Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write Amplification interval: 101.7 write, 102.9 compaction Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown Task ID: #3329644, #3301695 Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14583	2013-12-12 13:27:43 -08:00
Siying Dong	e8ab1934d9	[RocksDB Performance Branch] DBImpl.NewInternalIterator() to reduce works inside mutex Summary: To reduce mutex contention caused by DBImpl.NewInternalIterator(), in this function, move all the iteration creation works out of mutex, only leaving object ref and get. Test Plan: make all check will run db_stress for a while too to make sure no problem. Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D14589	2013-12-12 11:30:00 -08:00
Siying Dong	aaf9c6203c	[RocksDB][Performance Branch]Iterator Cleanup method only tries to find obsolete files if it has the last reference to a version Summary: When deconstructing an iterator, no need to check obsolete file if it doesn't hold last reference of any version. Test Plan: make all check Reviewers: haobo, igor, dhruba, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14595	2013-12-11 13:59:43 -08:00
Siying Dong	a8029fdc75	Introduce MergeContext to Lazily Initialize merge operand list Summary: In get operations, merge_operands is only used in few cases. Lazily initialize it can reduce average latency in some cases Test Plan: make all check Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14415 Conflicts: db/db_impl.cc db/memtable.cc	2013-12-11 11:37:28 -08:00
Siying Dong	0304e3d2ff	When flushing mem tables, create iterators out of mutex Summary: creating new iterators of mem tables can be expensive. Move them out of mutex. DBImpl::WriteLevel0Table()'s mems seems to be a local vector and is only used by flushing. memtables to flush are also immutable, so it should be safe to do so. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: dhruba CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D14577 Conflicts: db/db_impl.cc	2013-12-11 10:02:17 -08:00
Siying Dong	95a411d853	When flushing mem tables, create iterators out of mutex Summary: creating new iterators of mem tables can be expensive. Move them out of mutex. DBImpl::WriteLevel0Table()'s mems seems to be a local vector and is only used by flushing. memtables to flush are also immutable, so it should be safe to do so. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: dhruba CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D14577	2013-12-11 09:57:19 -08:00
kailiu	a82f42b765	rename db/memtablelist.{h,cc}	2013-12-10 19:03:13 -08:00
Igor Canadi	204bb9cffd	Get rid of LogFlush() in InternalIterator	2013-12-10 10:59:00 -08:00
Igor Canadi	19f5463d3f	Don't LogFlush() in foreground threads Summary: So fflush() takes a lock which is heavyweight. I added flush_pending_, but more importantly, I removed LogFlush() from foreground threads. Test Plan: ./db_test Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14535	2013-12-10 10:57:46 -08:00
Igor Canadi	fb9fce4fc3	[RocksDB] BackupableDB Summary: In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you. Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes. There are multiple things you can configure: 1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like. 2. sync - if true, it guarantees backup consistency on machine reboot 3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. IMPORTANT -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files. 4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup. 5. More things you can find in BackupableDBOptions Here is the directory structure I use: backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot 0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files All the files are ref counted and deleted immediatelly when they get out of scope. Some other stuff in this diff: 1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do. 2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB. Test Plan: I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff. Also, `make asan_check` Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14295	2013-12-09 14:06:52 -08:00
kailiu	551e9428ce	Merge branch 'master' into performance	2013-12-06 14:15:42 -08:00
Siying Dong	ef2211a9ca	[RocksDB Performance Branch] Introduce MergeContext to Lazily Initialize merge operand list Summary: In get operations, merge_operands is only used in few cases. Lazily initialize it can reduce average latency in some cases Test Plan: make all check Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14415	2013-12-06 10:28:59 -08:00
Mayank Agarwal	18802689b8	Make an API to get database identity from the IDENTITY file Summary: This would enable rocksdb users to get the db identity without depending on implementation details(storing that in IDENTITY file) Test Plan: db/db_test (has identity checks) Reviewers: dhruba, haobo, igor, kailiu Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14463	2013-12-04 22:39:17 -08:00

1 2 3 4 5 ...

415 Commits