rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-26 16:30:56 +00:00

Author	SHA1	Message	Date
Mayank Agarwal	59d0b02f8b	Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test Test Plan: make all check;db_stress for 1 hour Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11853	2013-08-01 09:07:46 -07:00
Jim Paton	9700677a2b	Slow down writes gradually rather than suddenly Summary: Currently, when a certain number of level0 files (level0_slowdown_writes_trigger) are present, RocksDB will slow down each write by 1ms. There is a second limit of level0 files at which RocksDB will stop writes altogether (level0_stop_writes_trigger). This patch enables the user to supply a third parameter specifying the number of files at which Rocks will start slowing down writes (level0_start_slowdown_writes). When this number is reached, Rocks will slow down writes as a quadratic function of level0_slowdown_writes_trigger - num_level0_files. For some workloads, this improves latency and throughput. I will post some stats momentarily in https://our.intern.facebook.com/intern/tasks/?t=2613384. Test Plan: make -j32 check ./db_stress ./db_bench Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: xjin CC: leveldb, xjin, zshao Differential Revision: https://reviews.facebook.net/D11859	2013-07-31 16:20:48 -07:00
Jim Paton	18afff2e63	Add stall counts to statistics Summary: Previously, statistics are kept on how much time is spent on stalls of different types. This patch adds support for keeping number of stalls of each type. For example, instead of just reporting how many microseconds are spent waiting for memtables to be compacted, it will also report how many times a write stalled for that to occur. Test Plan: make -j32 check ./db_stress # Not really sure what else should be done... Reviewers: dhruba, MarkCallaghan, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11841	2013-07-29 10:34:23 -07:00
Jim Paton	52d7ecfc78	Virtualize SkipList Interface Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations. Test Plan: make clean make -j32 check ./db_stress Reviewers: dhruba, emayanke, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11739	2013-07-23 14:42:27 -07:00
Mayank Agarwal	bf66c10b13	Use KeyMayExist for WriteBatch-Deletes Summary: Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete. Added code to skip getting Table from disk if not already present in table_cache. Some renaming of variables. Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch. Changed KeyMayExist to not be pure virtual and provided a default implementation. Expanded unit-tests in db_test to check appropriately. Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1. Test Plan: db_stress;make check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D11745	2013-07-23 13:36:50 -07:00
Haobo Xu	9ee68871dc	[RocksDB] Enable manual compaction to move files back to an appropriate level. Summary: As title. This diff added an option reduce_level to CompactRange. When set to true, it will try to move the files back to the minimum level sufficient to hold the data set. Note that the default is set to true now, just to excerise it in all existing tests. Will set the default to false before check-in, for backward compatibility. Test Plan: make check; Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D11553	2013-07-19 16:20:36 -07:00
Mayank Agarwal	2a986919d6	Make rocksdb-deletes faster using bloom filter Summary: Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving: 1. Put of delete type 2. Space in the db,and 3. Compaction time Test Plan: make all check; will run db_stress and db_bench and enhance unit-test once the basic design gets approved Reviewers: dhruba, haobo, vamsi Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11607	2013-07-11 12:11:11 -07:00
Abhishek Kona	5ef6bb8c37	[rocksdb][refactor] statistic printing code to one place Summary: $title Test Plan: db_bench --statistics=1 Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11373	2013-06-18 20:28:41 -07:00
Dhruba Borthakur	6acbe0fc45	Compact multiple memtables before flushing to storage. Summary: Merge multiple multiple memtables in memory before writing it out to a file in L0. There is a new config parameter min_write_buffer_number_to_merge that specifies the number of write buffers that should be merged together to a single file in storage. The system will not flush wrte buffers to storage unless at least these many buffers have accumulated in memory. The default value of this new parameter is 1, which means that a write buffer will be immediately flushed to disk as soon it is ready. Test Plan: make check Differential Revision: https://reviews.facebook.net/D11241	2013-06-18 14:28:04 -07:00
Haobo Xu	bdf1085944	[RocksDB] cleanup EnvOptions Summary: This diff simplifies EnvOptions by treating it as POD, similar to Options. - virtual functions are removed and member fields are accessed directly. - StorageOptions is removed. - Options.allow_readahead and Options.allow_readahead_compactions are deprecated. - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11175	2013-06-12 11:17:19 -07:00
Deon Nicholas	d8c7c45ea0	Very basic Multiget and simple test cases. Summary: Implemented the MultiGet operator which takes in a list of keys and returns their associated values. Currently uses std::vector as its container data structure. Otherwise, it works identically to "Get". Test Plan: 1. make db_test ; compile it 2. ./db_test ; test it 3. make all check ; regress / run all tests 4. make release ; (optional) compile with release settings Reviewers: haobo, MarkCallaghan, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10875	2013-06-05 11:22:38 -07:00
Mark Callaghan	d9f538e1a9	Improve output for GetProperty('leveldb.stats') Summary: Display separate values for read, write & total compaction IO. Display compaction amplification and write amplification. Add similar values for the period since the last call to GetProperty. Results since the server started are reported as "cumulative" stats. Results since the last call to GetProperty are reported as "interval" stats. Level Files Size(MB) Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count Ln-stall ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 7 13 21 0 211 0 0 211 0.0 0.0 10.1 0 0 0 0 113 0.0 1 79 157 88 993 989 198 795 194 9.0 11.3 11.2 106 405 502 97 14 0.0 2 19 36 5 63 63 37 27 36 2.4 12.3 12.2 19 14 32 18 12 0.0 >>>>>>>>>>>>>>>>>>>>>>>>> text below has been is new and/or reformatted Uptime(secs): 122.2 total, 0.9 interval Compaction IO cumulative (GB): 0.21 new, 1.03 read, 1.23 write, 2.26 read+write Compaction IO cumulative (MB/sec): 1.7 new, 8.6 read, 10.3 write, 19.0 read+write Amplification cumulative: 6.0 write, 11.0 compaction Compaction IO interval (MB): 5.59 new, 0.00 read, 5.59 write, 5.59 read+write Compaction IO interval (MB/sec): 6.5 new, 0.0 read, 6.5 write, 6.5 read+write Amplification interval: 1.0 write, 1.0 compaction >>>>>>>>>>>>>>>>>>>>>>>> text above is new and/or reformatted Stalls(secs): 90.574 level0_slowdown, 0.000 level0_numfiles, 10.165 memtable_compaction, 0.000 leveln_slowdown Task ID: # Blame Rev: Test Plan: make check, run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11049	2013-06-03 20:24:49 -07:00
Haobo Xu	2df65c118c	[RocksDB] Dump counters and histogram data periodically with compaction stats Summary: As title Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10995	2013-05-29 12:00:18 -07:00
Haobo Xu	ef15b9d178	[RocksDB] Fix MaybeDumpStats Summary: MaybeDumpStats was causing lock problem Test Plan: make check; db_stress Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D10935	2013-05-24 15:43:16 -07:00
Haobo Xu	0e879c93de	[RocksDB] dump leveldb.stats periodically in LOG file. Summary: Added an option stats_dump_period_sec to dump leveldb.stats to LOG periodically for diagnosis. By defauly, it's set to a very big number 3600 (1 hour). Test Plan: make check; Reviewers: dhruba Reviewed By: dhruba CC: leveldb, zshao Differential Revision: https://reviews.facebook.net/D10761	2013-05-23 16:56:59 -07:00
Abhishek Kona	988c20b9f7	[RocksDB] Clear Archive WAL files Summary: WAL files are moved to archive directory and clear only at DB::Open. Can lead to a lot of space consumption in a Database. Added logic to periodically clear Archive Directory too. Test Plan: make all check + add unit test Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D10617	2013-05-06 11:41:01 -07:00
Haobo Xu	05e8854085	[Rocksdb] Support Merge operation in rocksdb Summary: This diff introduces a new Merge operation into rocksdb. The purpose of this review is mostly getting feedback from the team (everyone please) on the design. Please focus on the four files under include/leveldb/, as they spell the client visible interface change. include/leveldb/db.h include/leveldb/merge_operator.h include/leveldb/options.h include/leveldb/write_batch.h Please go over local/my_test.cc carefully, as it is a concerete use case. Please also review the impelmentation files to see if the straw man implementation makes sense. Note that, the diff does pass all make check and truly supports forward iterator over db and a version of Get that's based on iterator. Future work: - Integration with compaction - A raw Get implementation I am working on a wiki that explains the design and implementation choices, but coding comes just naturally and I think it might be a good idea to share the code earlier. The code is heavily commented. Test Plan: run all local tests Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: leveldb, zshao, sheki, emayanke, MarkCallaghan Differential Revision: https://reviews.facebook.net/D9651	2013-05-03 16:59:02 -07:00
Haobo Xu	645ff8f231	Let's get rid of delete as much as possible, here are some examples. Summary: If a class owns an object: - If the object can be null => use a unique_ptr. no delete - If the object can not be null => don't even need new, let alone delete - for runtime sized array => use vector, no delete. Test Plan: make check Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: leveldb, zshao, sheki, emayanke, MarkCallaghan Differential Revision: https://reviews.facebook.net/D9783	2013-03-28 17:31:44 -07:00
Dhruba Borthakur	ad96563b79	Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. Summary: This patch allows an application to specify whether to use bufferedio, reads-via-mmaps and writes-via-mmaps per database. Earlier, there was a global static variable that was used to configure this functionality. The default setting remains the same (and is backward compatible): 1. use bufferedio 2. do not use mmaps for reads 3. use mmap for writes 4. use readaheads for reads needed for compaction I also added a parameter to db_bench to be able to explicitly specify whether to do readaheads for compactions or not. Test Plan: make check Reviewers: sheki, heyongqiang, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9429	2013-03-20 23:14:03 -07:00
Abhishek Kona	02c459805b	Ignore a zero-sized file while looking for a seq-no in GetUpdatesSince Summary: Rocksdb can create 0 sized log files when it is opened and closed without any operations. The GetUpdatesSince fails currently if there is a log file of size zero. This diff fixes this. If there is a log file is 0, it is removed form the probable_file_list Test Plan: unit test Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D9507	2013-03-19 11:00:09 -07:00
Abhishek Kona	d68880a1b9	Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file Summary: Store the last flushed, seq no. in db_impl. Check against it in transaction Log iterator. Do not attempt to read ahead if we do not know if the data is flushed completely. Does not work if flush is disabled. Any ideas on fixing that? * Minor change, iter->Next is called the first time automatically for * the first time. Test Plan: existing test pass. More ideas on testing this? Planning to run some stress test. Reviewers: dhruba, heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D9087	2013-03-06 14:05:53 -08:00
Mark Callaghan	993543d1be	Add rate_delay_limit_milliseconds Summary: This adds the rate_delay_limit_milliseconds option to make the delay configurable in MakeRoomForWrite when the max compaction score is too high. This delay is called the Ln slowdown. This change also counts the Ln slowdown per level to make it possible to see where the stalls occur. From IO-bound performance testing, the Level N stalls occur: * with compression -> at the largest uncompressed level. This makes sense because compaction for compressed levels is much slower. When Lx is uncompressed and Lx+1 is compressed then files pile up at Lx because the (Lx,Lx+1)->Lx+1 compaction process is the first to be slowed by compression. * without compression -> at level 1 Task ID: #1832108 Blame Rev: Test Plan: run with real data, added test Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9045	2013-03-04 07:41:15 -08:00
Abhishek Kona	959337ed5b	Measure compaction time. Summary: just record time consumed in compaction Test Plan: compile Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8781	2013-02-22 11:38:40 -08:00
amayank	b2c50f1c3f	Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value Summary: Changed the Get and Scan options with openForReadOnly mode to have access to the memtable. Changed the visibility of NewInternalIterator in db_impl from private to protected so that the derived class db_impl_read_only can call that in its NewIterator function for the scan case. The previous approach which changed the default for flush_on_destroy_ from false to true caused many problems in the unit tests due to empty sst files that it created. All unit tests pass now. Test Plan: make clean; make all check; ldb put and get and scans Reviewers: dhruba, heyongqiang, sheki Reviewed By: dhruba CC: kosievdmerwe, zshao, dilipj, kailiu Differential Revision: https://reviews.facebook.net/D8697	2013-02-20 10:45:52 -08:00
Chip Turner	0b83a83191	Fix poor error on num_levels mismatch and few other minor improvements Summary: Previously, if you opened a db with num_levels set lower than the database, you received the unhelpful message "Corruption: VersionEdit: new-file entry." Now you get a more verbose message describing the issue. Also, fix handling of compression_levels (both the run-over-the-end issue and the memory management of it). Lastly, unique_ptr'ify a couple of minor calls. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8151	2013-01-25 15:37:26 -08:00
Chip Turner	2fdf91a4f8	Fix a number of object lifetime/ownership issues Summary: Replace manual memory management with std::unique_ptr in a number of places; not exhaustive, but this fixes a few leaks with file handles as well as clarifies semantics of the ownership of file handles with log classes. Test Plan: db_stress, make check Reviewers: dhruba Reviewed By: dhruba CC: zshao, leveldb, heyongqiang Differential Revision: https://reviews.facebook.net/D8043	2013-01-23 16:54:11 -08:00
Abhishek Kona	7d5a4383bb	rollover manifest file. Summary: Check in LogAndApply if the file size is more than the limit set in Options. Things to consider : will this be expensive? Test Plan: make all check. Inputs on a new unit test? Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7701	2013-01-16 12:09:44 -08:00
Dhruba Borthakur	f4c2b7cf97	Enhance ReadOnly mode to process the all committed transactions. Summary: Leveldb has an api OpenForReadOnly() that opens the database in readonly mode. This call had an option to not process the transaction log. This patch removes this option and always processes all transactions that had been committed. It has been done in such a way that it does not create/write to any new files in the process. The invariant of "no-writes" to the leveldb data directory is still true. This enhancement allows multiple threads to open the same database in readonly mode and access all trancations that were committed right upto the OpenForReadOnly call. I changed the public API to match the new semantics because there are no users who are currently using this api. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7479	2012-12-19 16:30:46 -08:00
Dhruba Borthakur	24fc379273	An public api to fetch the latest transaction id. Summary: Implement a interface to retrieve the most current transaction id from the database. Test Plan: Added unit test. Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7269	2012-12-10 16:04:19 -08:00
Abhishek Kona	1c6742e32f	Refactor GetArchivalDirectoryName to filename.h Summary: filename.h has functions to do similar things. Moving code away from db_impl.cc Test Plan: make check Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7251	2012-12-10 10:51:07 -08:00
Abhishek Kona	8055008909	GetUpdatesSince API to enable replication. Summary: How it works: * GetUpdatesSince takes a SequenceNumber. * A LogFile with the first SequenceNumber nearest and lesser than the requested Sequence Number is found. * Seek in the logFile till the requested SeqNumber is found. * Return an iterator which contains logic to return record's one by one. Test Plan: * Test case included to check the good code path. * Will update with more test-cases. * Feedback required on test-cases. Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7119	2012-12-07 11:42:13 -08:00
sheki	d4627e6de4	Move WAL files to archive directory, instead of deleting. Summary: Create a directory "archive" in the DB directory. During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory, instead of deleting. Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move. Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6975	2012-11-28 17:28:08 -08:00
Abhishek Kona	d29f181923	Fix all the lint errors. Summary: Scripted and removed all trailing spaces and converted all tabs to spaces. Also fixed other lint errors. All lint errors from this point of time should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059	2012-11-28 17:18:41 -08:00
Dhruba Borthakur	9a357847eb	Delete non-visible keys during a compaction even in the presense of snapshots. Summary: LevelDB should delete almost-new keys when a long-open snapshot exists. The previous behavior is to keep all versions that were created after the oldest open snapshot. This can lead to database size bloat for high-update workloads when there are long-open snapshots and long-open snapshot will be used for logical backup. By "almost new" I mean that the key was updated more than once after the oldest snapshot. If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if we find two instances of the same key k1 that lie entirely within s1 and s2 (i.e. s1 < k1 < s2), then the earlier version of k1 can be safely deleted because that version is not visible in any snapshot. Test Plan: unit test attached make clean check Differential Revision: https://reviews.facebook.net/D6999	2012-11-28 15:47:40 -08:00
Dhruba Borthakur	62e7583f94	enhance dbstress to simulate hard crash Summary: dbstress has an option to reopen the database. Make it such that the previous handle is not closed before we reopen, this simulates a situation similar to a process crash. Added new api to DMImpl to remove the lock file. Test Plan: run db_stress Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6777	2012-11-18 23:16:17 -08:00
Dhruba Borthakur	6c5a4d646a	Merge branch 'master' into performance Conflicts: db/db_impl.h	2012-11-14 21:39:52 -08:00
Dhruba Borthakur	8143062edd	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/version_set.cc util/options.cc	2012-11-07 15:11:37 -08:00
heyongqiang	3fcf533ed0	Add a readonly db Summary: as subject Test Plan: run db_bench readrandom Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6495	2012-11-07 14:19:48 -08:00
Abhishek Kona	4e413df3d0	Flush Data at object destruction if disableWal is used. Summary: Added a conditional flush in ~DBImpl to flush. There is still a chance of writes not being persisted if there is a crash (not a clean shutdown) before the DBImpl instance is destroyed. Test Plan: modified db_test to meet the new expectations. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6519	2012-11-06 15:04:42 -08:00
Dhruba Borthakur	53e04311b1	Merge branch 'master' into performance Conflicts: db/db_bench.cc util/options.cc	2012-10-29 14:18:00 -07:00
Mark Callaghan	70c42bf05f	Adds DB::GetNextCompaction and then uses that for rate limiting db_bench Summary: Adds a method that returns the score for the next level that most needs compaction. That method is then used by db_bench to rate limit threads. Threads are put to sleep at the end of each stats interval until the score is less than the limit. The limit is set via the --rate_limit=$double option. The specified value must be > 1.0. Also adds the option --stats_per_interval to enable additional metrics reported every stats interval. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6243	2012-10-29 10:17:43 -07:00
Dhruba Borthakur	ea9e087851	Merge branch 'master' into performance Conflicts: db/db_bench.cc db/db_impl.cc db/db_test.cc	2012-10-26 08:57:56 -07:00
Mark Callaghan	e7206f43ee	Improve statistics Summary: This adds more statistics to be reported by GetProperty("leveldb.stats"). The new stats include time spent waiting on stalls in MakeRoomForWrite. This also includes the total amplification rate where that is: (#bytes of sequential IO during compaction) / (#bytes from Put) This also includes a lot more data for the per-level compaction report. * Rn(MB) - MB read from level N during compaction between levels N and N+1 * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1 * Wnew(MB) - new data written to the level during compaction * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB) * Rn - files read from level N during compaction between levels N and N+1 * Rnp1 - files read from level N+1 during compaction between levels N and N+1 * Wnp1 - files written to level N+1 during compaction between levels N and N+1 * NewW - new files written to level N+1 during compaction * Count - number of compactions done for this level This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB) Compactions Level Files Size(MB) Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count ------------------------------------------------------------------------------------------------------------------------------------- 0 3 6 33 0 576 0 0 576 -1.0 0.0 1.3 0 0 0 0 290 1 127 242 351 5316 5314 570 4747 567 17.0 12.1 12.1 287 2399 2685 286 32 2 161 328 54 822 824 326 496 328 4.0 1.9 1.9 160 251 411 160 161 Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out Uptime(secs): 439.8 Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8) Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D6153	2012-10-24 14:21:38 -07:00
Dhruba Borthakur	4c107587ed	Delete files outside the mutex. Summary: The compaction process deletes a large number of files. This takes quite a bit of time and is best done outside the mutex lock. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6123	2012-10-22 11:53:23 -07:00
Dhruba Borthakur	98f23cf04a	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_impl.h	2012-10-21 01:55:19 -07:00
Dhruba Borthakur	64c4b9f0e2	Delete files outside the mutex. Summary: The compaction process deletes a large number of files. This takes quite a bit of time and is best done outside the mutex lock. Test Plan: Reviewers: CC: Task ID: # Blame Rev:	2012-10-21 01:49:48 -07:00
Dhruba Borthakur	1ca0584345	This is the mega-patch multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctess and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via group-commit approach. Test Plan: run db_bench	2012-10-19 14:00:53 -07:00
Dhruba Borthakur	aa73538f2a	The deletion of obsolete files should not occur very frequently. Summary: The method DeleteObsolete files is a very costly methind, especially when the number of files in a system is large. It makes a list of all live-files and then scans the directory to compute the diff. By default, this method is executed after every compaction run. This patch makes it such that DeleteObsolete files is never invoked twice within a configured period. Test Plan: run all unit tests Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6045	2012-10-16 10:26:10 -07:00
Dhruba Borthakur	ae36e509f8	The BackupAPI should also list the length of the manifest file. Summary: The GetLiveFiles() api lists the set of sst files and the current MANIFEST file. But the database continues to append new data to the MANIFEST file even when the application is backing it up to the backup location. This means that the database-version that is stored in the MANIFEST FILE in the backup location does not correspond to the sst files returned by GetLiveFiles. This API adds a new parameter to GetLiveFiles. This new parmeter returns the current size of the MANIFEST file. Test Plan: Unit test attached. Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5631	2012-09-25 03:13:25 -07:00
Dhruba Borthakur	ba55d77b5d	Ability to take a file-lvel snapshot from leveldb. Summary: A set of apis that allows an application to backup data from the leveldb database based on a set of files. Test Plan: unint test attached. more coming soon. Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5439	2012-09-17 09:14:50 -07:00
heyongqiang	dcbd6be340	remove boost Summary: as subject Test Plan: build Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D5469	2012-09-16 19:33:43 -07:00
heyongqiang	0f43aa474e	put log in a seperate dir Summary: added a new option db_log_dir, which points the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the leveldb data dir absolute path. Test Plan: db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5205	2012-09-06 17:52:08 -07:00
heyongqiang	6fee5a74f5	Do not spin in a tight loop attempting compactions if there is a compaction error Summary: as subject. ported the change from google code leveldb 1.5 Test Plan: run db_test Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4839	2012-08-28 11:43:33 -07:00
heyongqiang	d3759ca121	fix db_test error with scribe logger turned on Summary: as subject Test Plan: db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D4929	2012-08-28 11:22:58 -07:00
Dhruba Borthakur	a098207c95	Fixed unit test c_test by initializing logger=NULL. Summary: Fixed unit test c_test by initializing logger=NULL. Removed "atomic" from last_log_ts so that unit tests do not require C11 compiler. Anyway, last_log_ts is mostly used for logging, so it is ok if it is loosely accurate. Test Plan: run c_test Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4803	2012-08-21 17:10:29 -07:00
heyongqiang	6ba1f17789	adding a scribe logger in leveldb to log leveldb deploy stats Summary: as subject. A new log is written to scribe via thrift client when a new db is opened and when there is a compaction. a new option var scribe_log_db_stats is added. Test Plan: manually checked using command "ptail -time 0 leveldb_deploy_stats" Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4659	2012-08-21 11:43:22 -07:00
heyongqiang	20ee76bd34	use ts as suffix for LOG.old files Summary: as subject and only maintain 10 log files. Test Plan: new test in db_test Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4731	2012-08-17 16:22:04 -07:00
heyongqiang	22ee777f68	add flush interface to DB Summary: as subject. The flush will flush everything in the db. Test Plan: new test in db_test.cc Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D4029	2012-07-06 12:11:19 -07:00
heyongqiang	4e4b6812ff	Make some variables configurable for each db instance Summary: Make configurable 'targetFileSize', 'targetFileSizeMultiplier', 'maxBytesForLevelBase', 'maxBytesForLevelMultiplier', 'expandedCompactionFactor', 'maxGrandParentOverlapFactor' Test Plan: N/A Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3801	2012-06-27 14:36:31 -07:00
Sanjay Ghemawat	85584d497e	Added bloom filter support. In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters. Other changes: - Updated version number to 1.4. - Some build tweaks. - C binding for CompactRange. - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom. - Minor .gitignore update.	2012-04-17 08:36:46 -07:00
Sanjay Ghemawat	d79762e273	added group commit; drastically speeds up mult-threaded synchronous write workloads	2012-03-08 16:23:21 -08:00
Hans Wennborg	36a5f8ed7f	A number of fixes: - Replace raw slice comparison with a call to user comparator. Added test for custom comparators. - Fix end of namespace comments. - Fixed bug in picking inputs for a level-0 compaction. When finding overlapping files, the covered range may expand as files are added to the input set. We now correctly expand the range when this happens instead of continuing to use the old range. For example, suppose L0 contains files with the following ranges: F1: a .. d F2: c .. g F3: f .. j and the initial compaction target is F3. We used to search for range f..j which yielded {F2,F3}. However we now expand the range as soon as another file is added. In this case, when F2 is added, we expand the range to c..j and restart the search. That picks up file F1 as well. This change fixes a bug related to deleted keys showing up incorrectly after a compaction as described in Issue 44. (Sync with upstream @25072954)	2011-10-31 17:22:06 +00:00
Gabor Cselle	299ccedfec	A number of bugfixes: - Added DB::CompactRange() method. Changed manual compaction code so it breaks up compactions of big ranges into smaller compactions. Changed the code that pushes the output of memtable compactions to higher levels to obey the grandparent constraint: i.e., we must never have a single file in level L that overlaps too much data in level L+1 (to avoid very expensive L-1 compactions). Added code to pretty-print internal keys. - Fixed bug where we would not detect overlap with files in level-0 because we were incorrectly using binary search on an array of files with overlapping ranges. Added "leveldb.sstables" property that can be used to dump all of the sstables and ranges that make up the db state. - Removing post_write_snapshot support. Email to leveldb mailing list brought up no users, just confusion from one person about what it meant. - Fixing static_cast char to unsigned on BIG_ENDIAN platforms. Fixes Issue 35 and Issue 36. - Comment clarification to address leveldb Issue 37. - Change license in posix_logger.h to match other files. - A build problem where uint32 was used instead of uint32_t. Sync with upstream @24408625	2011-10-05 16:30:28 -07:00
gabor@google.com	7263023651	Bugfixes: for Get(), don't hold mutex while writing log. - Fix bug in Get: when it triggers a compaction, it could sometimes mark the compaction with the wrong level (if there was a gap in the set of levels examined for the Get). - Do not hold mutex while writing to the log file or to the MANIFEST file. Added a new benchmark that runs a writer thread concurrently with reader threads. Percentiles ------------------------------ micros/op: avg median 99 99.9 99.99 99.999 max ------------------------------------------------------ before: 42 38 110 225 32000 42000 48000 after: 24 20 55 65 130 1100 7000 - Fixed race in optimized Get. It should have been using the pinned memtables, not the current memtables. git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529	2011-09-01 19:08:02 +00:00
gabor@google.com	ccf0fcd5c2	A number of smaller fixes and performance improvements: - Implemented Get() directly instead of building on top of a full merging iterator stack. This speeds up the "readrandom" benchmark by up to 15-30%. - Fixed an opensource compilation problem. Added --db=<name> flag to control where the database is placed. - Automatically compact a file when we have done enough overlapping seeks to that file. - Fixed a performance bug where we would read from at least one file in a level even if none of the files overlapped the key being read. - Makefile fix for Mac OSX installations that have XCode 4 without XCode 3. - Unified the two occurrences of binary search in a file-list into one routine. - Found and fixed a bug where we would unnecessarily search the last file when looking for a key larger than all data in the level. - A fix to avoid the need for trivial move compactions and therefore gets rid of two out of five syncs in "fillseq". - Removed the MANIFEST file write when switching to a new memtable/log-file for a 10-20% improvement on fill speed on ext4. - Adding a SNAPPY setting in the Makefile for folks who have Snappy installed. Snappy compresses values and speeds up writes. git-svn-id: https://leveldb.googlecode.com/svn/trunk@32 62dab493-f737-651d-591e-8d6aee1b9529	2011-06-22 02:36:45 +00:00
hans@chromium.org	80e5b0d944	sync with upstream @21706995 Fixed race condition reported by Dave Smit (dizzyd@dizzyd,com) on the leveldb mailing list. We were not signalling waiters after a trivial move from level-0. The result was that in some cases (hard to reproduce), a write would get stuck forever waiting for the number of level-0 files to drop below its hard limit. The new code is simpler: there is just one condition variable instead of two, and the condition variable is signalled after every piece of background work finishes. Also, all compaction work (including for manual compactions) is done in the background thread, and therefore we can remove the "compacting_" variable. git-svn-id: https://leveldb.googlecode.com/svn/trunk@31 62dab493-f737-651d-591e-8d6aee1b9529	2011-06-07 14:40:26 +00:00
dgrogan@chromium.org	740d8b3d00	Update from upstream @21551990 * Patch LevelDB to build for OSX and iOS * Fix race condition in memtable iterator deletion. * Other small fixes. git-svn-id: https://leveldb.googlecode.com/svn/trunk@29 62dab493-f737-651d-591e-8d6aee1b9529	2011-05-28 00:53:58 +00:00
dgrogan@chromium.org	ba6dac0e80	@20776309 * env_chromium.cc should not export symbols. * Fix MSVC warnings. * Removed large value support. * Fix broken reference to documentation file git-svn-id: https://leveldb.googlecode.com/svn/trunk@24 62dab493-f737-651d-591e-8d6aee1b9529	2011-04-20 22:48:11 +00:00
dgrogan@chromium.org	69c6d38342	reverting disastrous MOE commit, returning to r21 git-svn-id: https://leveldb.googlecode.com/svn/trunk@23 62dab493-f737-651d-591e-8d6aee1b9529	2011-04-19 23:11:15 +00:00
dgrogan@chromium.org	b743906eea	Revision created by MOE tool push_codebase. MOE_MIGRATION= git-svn-id: https://leveldb.googlecode.com/svn/trunk@22 62dab493-f737-651d-591e-8d6aee1b9529	2011-04-19 23:01:25 +00:00
dgrogan@chromium.org	b409afe968	chmod a-x git-svn-id: https://leveldb.googlecode.com/svn/trunk@21 62dab493-f737-651d-591e-8d6aee1b9529	2011-04-18 23:15:58 +00:00
dgrogan@chromium.org	f779e7a5d8	@20602303. Default file permission is now 755. git-svn-id: https://leveldb.googlecode.com/svn/trunk@20 62dab493-f737-651d-591e-8d6aee1b9529	2011-04-12 19:38:58 +00:00
jorlow@chromium.org	4671a695fc	Move include files into a leveldb subdir. git-svn-id: https://leveldb.googlecode.com/svn/trunk@18 62dab493-f737-651d-591e-8d6aee1b9529	2011-03-30 18:35:40 +00:00
jorlow@chromium.org	8303bb1b33	Pull from upstream. git-svn-id: https://leveldb.googlecode.com/svn/trunk@14 62dab493-f737-651d-591e-8d6aee1b9529	2011-03-22 23:24:02 +00:00
jorlow@chromium.org	13b72af77b	More changes from upstream. git-svn-id: https://leveldb.googlecode.com/svn/trunk@12 62dab493-f737-651d-591e-8d6aee1b9529	2011-03-22 18:32:49 +00:00
jorlow@chromium.org	f67e15e50f	Initial checkin. git-svn-id: https://leveldb.googlecode.com/svn/trunk@2 62dab493-f737-651d-591e-8d6aee1b9529	2011-03-18 22:37:00 +00:00

1 2 3 4

176 commits