Commit graph

130 commits

Author SHA1 Message Date
Dhruba Borthakur 7730587120 Prevent segfault in OpenCompactionOutputFile
Summary:
The segfault was happening because the program was unable to open a new
sst file (as part of the compaction) because the process ran out of
file descriptors.

The fix is to check the return status of the file creation before taking
any other action.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fabf03f9700 (LWP 29904)]
leveldb::DBImpl::OpenCompactionOutputFile (this=this@entry=0x7fabf9011400, compact=compact@entry=0x7fabf741a2b0) at db/db_impl.cc:1399
1399    db/db_impl.cc: No such file or directory.
(gdb) where

Test Plan: make check

Reviewers: MarkCallaghan, sheki

Reviewed By: MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10101
2013-04-10 09:59:48 -07:00
Abhishek Kona 574b76f710 [RocksDB][Bug] Look at all the files, not just the first file in TransactionLogIter as BatchWrites can leave it in Limbo
Summary:
Transaction Log Iterator did not move to the next file in the series if there was a write batch at the end of the currentFile.
The solution is if the last seq no. of the current file is < RequestedSeqNo. Assume the first seqNo. of the next file has to satisfy the request.

Also major refactoring around the code. Moved opening the logreader to a seperate function, got rid of goto.

Test Plan: added a unit test for it.

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb, emayanke

Differential Revision: https://reviews.facebook.net/D10029
2013-04-08 16:28:09 -07:00
Abhishek Kona ca789a10cc [Rocksdb] Recover last updated sequence number from manifest also.
Summary:
During recovery, last_updated_manifest number was not set if there were no records in the Write-ahead log.
Now check for the recovered manifest also and set last_updated_manifest file to the max value.

Test Plan: unit test

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9891
2013-04-02 17:18:27 -07:00
Haobo Xu 6763110867 [RocksDB] Replace iterator based loop with range based loop for stl containers
Summary:
As title.
Code is shorter and cleaner
See https://our.dev.facebook.com/intern/tasks/?t=2233981

Test Plan: make check

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb, zshao

Differential Revision: https://reviews.facebook.net/D9789
2013-04-02 11:46:45 -07:00
Haobo Xu 645ff8f231 Let's get rid of delete as much as possible, here are some examples.
Summary:
If a class owns an object:
 - If the object can be null => use a unique_ptr. no delete
 - If the object can not be null => don't even need new, let alone delete
 - for runtime sized array => use vector, no delete.

Test Plan: make check

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb, zshao, sheki, emayanke, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D9783
2013-03-28 17:31:44 -07:00
Abhishek Kona 3b51605b8d [RocksDB] Fix binary search while finding probable wal files
Summary:
RocksDB does a binary search to look at the files which might contain the requested sequence number at the call GetUpdatesSince.
There was a bug in the binary search => when the file pointed by the middle index of bsearch was empty/corrupt it needst to resize the vector and update indexes.
This now fixes that.

Test Plan: existing unit tests pass.

Reviewers: heyongqiang, dhruba

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9777
2013-03-28 13:37:15 -07:00
Abhishek Kona 8e9c781ae5 [Rocksdb] Fix Crash on finding a db with no log files. Error out instead
Summary:
If the vector returned by GetUpdatesSince is empty, it is still returned to the
user. This causes it throw an std::range error.
The probable file list is checked and it returns an IOError status instead of OK now.

Test Plan: added a unit test.

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9771
2013-03-28 13:19:07 -07:00
Abhishek Kona 7fdd5f5b33 Use non-mmapd files for Write-Ahead Files
Summary:
Use non mmapd files for Write-Ahead log.
Earlier use of MMaped files. made the log iterator read ahead and miss records.
Now the reader and writer will point to the same physical location.

There is no perf regression :
./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
with This diff :
fillseq      :      10.756 micros/op 185281 ops/sec;   20.5 MB/s
without this dif :
fillseq      :      11.085 micros/op 179676 ops/sec;   19.9 MB/s

Test Plan: unit test included

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9741
2013-03-28 13:13:35 -07:00
Haobo Xu ecd8db0200 [RocksDB] Minimize Mutex protected code section in the critical path
Summary: rocksdb uses a single global lock to protect in memory metadata. We should minimize the mutex protected code section to increase the effective parallelism of the program. See https://our.intern.facebook.com/intern/tasks/?t=2218928

Test Plan:
make check
db_bench

Reviewers: dhruba, heyongqiang

CC: zshao, leveldb

Differential Revision: https://reviews.facebook.net/D9705
2013-03-26 22:42:26 -07:00
Dhruba Borthakur d0798f67f4 Run compactions even if workload is readonly or read-mostly.
Summary:
The events that trigger compaction:
* opening the database
* Get -> only if seek compaction is not disabled and other checks are true
* MakeRoomForWrite -> when memtable is full
* BackgroundCall ->
  If the background thread is about to do a compaction run, it schedules
  a new background task to trigger a possible compaction. This will cause
  additional background threads to find and process other compactions that
  can run concurrently.

Test Plan: ran db_bench with overwrite and readonly alternatively.

Reviewers: sheki, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9579
2013-03-20 23:43:29 -07:00
Dhruba Borthakur ad96563b79 Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database.
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.

The default setting remains the same (and is backward compatible):
 1. use bufferedio
 2. do not use mmaps for reads
 3. use mmap for writes
 4. use readaheads for reads needed for compaction

I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.

Test Plan: make check

Reviewers: sheki, heyongqiang, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9429
2013-03-20 23:14:03 -07:00
Mayank Agarwal 487168cdcf Fixed sign-comparison in rocksdb code-base and fixed Makefile
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base

Test Plan: make

Reviewers: chip, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9531
2013-03-19 14:35:23 -07:00
Mark Callaghan 72d14eafd3 add --benchmarks=levelstats option to db_bench, prevent "nan" in stats output
Summary:
Add --benchmarks=levelstats option to report per-level stats (#files, #bytes)
Change readwhilewriting test to report response time for writes but exclude
them from the stats merged by all threads.
Prevent "NaN" in stats output by preventing division by 0.
Remove "o" file I committed by mistake.

Task ID: #

Blame Rev:

Test Plan:
make check

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9513
2013-03-19 13:14:44 -07:00
Abhishek Kona 02c459805b Ignore a zero-sized file while looking for a seq-no in GetUpdatesSince
Summary:
Rocksdb can create 0 sized log files when it is opened and closed without any operations.
The GetUpdatesSince fails currently if there is a log file of size zero.

This diff fixes this. If there is a log file is 0, it is removed form the probable_file_list

Test Plan: unit test

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9507
2013-03-19 11:00:09 -07:00
Dhruba Borthakur 6d812b6afb A mechanism to detect manifest file write errors and put db in readonly mode.
Summary:
If there is an error while writing an edit to the manifest file, the manifest
file is closed and reopened to check if the edit made it in. However, if the
re-opening of the manifest is unsuccessful and options.paranoid_checks is set
t true, then the db refuses to accept new puts, effectively putting the db
in readonly mode.

In a future diff, I would like to make the default value of paranoid_check
to true.

Test Plan: make check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9201
2013-03-07 09:45:49 -08:00
Abhishek Kona d68880a1b9 Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file
Summary:
Store the last flushed, seq no. in db_impl. Check against it in
transaction Log iterator. Do not attempt to read ahead if we do not know
if the data is flushed completely.
Does not work if flush is disabled. Any ideas on fixing that?
* Minor change, iter->Next is called the first time automatically for
* the first time.

Test Plan:
existing test pass.
More ideas on testing this?
Planning to run some stress test.

Reviewers: dhruba, heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9087
2013-03-06 14:05:53 -08:00
Dhruba Borthakur afed60938f Fox db_stress crash by copying keys before changing sequencenum to zero.
Summary:
The compaction process zeros out sequence numbers if the output is
part of the bottommost level.
The Slice is supposed to refer to an immutable data buffer. The
merger that implements the priority queue while reading kvs as
the input of a compaction run reies on this fact. The bug was that
were updating the sequence number of a record in-place and that was
causing suceeding invocations of the merger to return kvs in
arbitrary order of sequence numbers.
The fix is to copy the key to a local memory buffer before setting
its seqno to 0.

Test Plan:
Set Options.purge_redundant_kvs_while_flush = false and then run
db_stress --ops_per_thread=1000 --max_key=320

Reviewers: emayanke, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9147
2013-03-06 10:52:08 -08:00
Mark Callaghan 993543d1be Add rate_delay_limit_milliseconds
Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.

From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
                      because compaction for compressed levels is much
                      slower. When Lx is uncompressed and Lx+1 is compressed
                      then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                      compaction process is the first to be slowed by
                      compression.
* without compression -> at level 1

Task ID: #1832108

Blame Rev:

Test Plan:
run with real data, added test

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9045
2013-03-04 07:41:15 -08:00
Dhruba Borthakur 806e264350 Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0.
Summary:
Rocks accumulates recent writes and deletes in the in-memory memtable.
When the memtable is full, it writes the contents on the memtable to
a file in L0.

This patch removes redundant records at the time of the flush. If there
are multiple versions of the same key in the memtable, then only the
most recent one is dumped into the output file. The purging of
redundant records occur only if the most recent snapshot is earlier
than the earliest record in the memtable.

Should we switch on this feature by default or should we keep this feature
turned off in the default settings?

Test Plan: Added test case to db_test.cc

Reviewers: sheki, vamsi, emayanke, heyongqiang

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8991
2013-03-04 00:01:47 -08:00
Dhruba Borthakur e45c7a8444 Abilty to support upto a million .sst files in the database
Summary:
There was an artifical limit of 50K files per database. This is
insifficient if the database is 1 TB in size and each file is 2 MB.

Test Plan: make check

Reviewers: sheki, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8919
2013-02-26 16:27:51 -08:00
Abhishek Kona 959337ed5b Measure compaction time.
Summary: just record time consumed in compaction

Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8781
2013-02-22 11:38:40 -08:00
Abhishek Kona ec77366e14 Counters for bytes written and read.
Summary:
* Counters for bytes read and write.
as a part of this diff, I want to=>
* Measure compaction times. @dhruba can you point which function, should
* I time to get Compaction-times. Was looking at CompactRange.

Test Plan: db_test

Reviewers: dhruba, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8763
2013-02-21 16:06:32 -08:00
Abhishek Kona fe10200ddc Introduce histogram in statistics.h
Summary:
* Introduce is histogram in statistics.h
* stop watch to measure time.
* introduce two timers as a poc.
Replaced NULL with nullptr to fight some lint errors
Should be useful for google.

Test Plan:
ran db_bench and check stats.
make all check

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8637
2013-02-20 10:43:32 -08:00
amayank f3901e0647 Revert "Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value"
This reverts commit 4c696ed001.
2013-02-18 22:32:27 -08:00
Dhruba Borthakur 4564915446 Zero out redundant sequence numbers for kvs to increase compression efficiency
Summary:
The sequence numbers in each record eat up plenty of space on storage.
The optimization zeroes out sequence numbers on kvs in the Lmax
layer that are earlier than the earliest snapshot.

Test Plan: Unit test attached.

Differential Revision: https://reviews.facebook.net/D8619
2013-02-18 21:51:15 -08:00
amayank 4c696ed001 Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value
Summary:
flush_on_destroy has a default value of false and the memtable is flushed
in the dbimpl-destructor only when that is set to true. Because we want the memtable to be flushed everytime that
the destructor is called(db is closed) and the cases where we work with the memtable only are very less
it is a good idea to give this a default value of true. Thus the put from ldb
wil have its data flushed to disk in the destructor and the next Get will be able to
read it when opened with OpenForReadOnly. The reason that ldb could read the latest value when
the db was opened in the normal Open mode is that the Get from normal Open first reads
the memtable and directly finds the latest value written there and the Get from OpenForReadOnly
doesn't have access to the memtable (which is correct because all its Put/Modify) are disabled

Test Plan: make all; ldb put and get and scans

Reviewers: dhruba, heyongqiang, sheki

Reviewed By: heyongqiang

CC: kosievdmerwe, zshao, dilipj, kailiu

Differential Revision: https://reviews.facebook.net/D8631
2013-02-15 16:56:06 -08:00
Kai Liu b63aafce42 Allow the logs to be purged by TTL.
Summary:
* Add a SplitByTTLLogger to enable this feature. In this diff I implemented generalized AutoSplitLoggerBase class to simplify the
development of such classes.
* Refactor the existing AutoSplitLogger and fix several bugs.

Test Plan:
* Added a unit tests for different types of "auto splitable" loggers individually.
* Tested the composited logger which allows the log files to be splitted by both TTL and log size.

Reviewers: heyongqiang, dhruba

Reviewed By: heyongqiang

CC: zshao, leveldb

Differential Revision: https://reviews.facebook.net/D8037
2013-02-04 19:42:40 -08:00
Chip Turner 0b83a83191 Fix poor error on num_levels mismatch and few other minor improvements
Summary:
Previously, if you opened a db with num_levels set lower than
the database, you received the unhelpful message "Corruption:
VersionEdit: new-file entry."  Now you get a more verbose message
describing the issue.

Also, fix handling of compression_levels (both the run-over-the-end
issue and the memory management of it).

Lastly, unique_ptr'ify a couple of minor calls.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8151
2013-01-25 15:37:26 -08:00
Chip Turner 772f75b3fb Stop continually re-creating build_version.c
Summary:
We continually rebuilt build_version.c because we put the
current date into it, but that's what __DATE__ already is.  This makes
builds faster.

This also fixes an issue with 'make clean FOO' not working properly.

Also tweak the build rules to be more consistent, always have warnings,
and add a 'make release' rule to handle flags for release builds.

Test Plan: make, make clean

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D8139
2013-01-24 17:51:39 -08:00
Chip Turner 3dafdfb2c4 Use fallocate to prevent excessive allocation of sst files and logs
Summary:
On some filesystems, pre-allocation can be a considerable
amount of space.  xfs in our production environment pre-allocates by
1GB, for instance.  By using fallocate to inform the kernel of our
expected file sizes, we eliminate this wasteage (that isn't recovered
until the file is closed which, in the case of LOG files, can be a
considerable amount of time).

Test Plan:
created an xfs loopback filesystem, mounted with
allocsize=4M, and ran db_stress.  LOG file without this change was 4M,
and with it it was 128k then grew to normal size.

Reviewers: dhruba

Reviewed By: dhruba

CC: adsharma, leveldb

Differential Revision: https://reviews.facebook.net/D7953
2013-01-24 12:25:13 -08:00
Chip Turner 2fdf91a4f8 Fix a number of object lifetime/ownership issues
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.

Test Plan: db_stress, make check

Reviewers: dhruba

Reviewed By: dhruba

CC: zshao, leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D8043
2013-01-23 16:54:11 -08:00
Abhishek Kona 16903c35b0 Add counters to count gets and writes
Summary: Add Tickers to count Write's and Get's

Test Plan: make check

Reviewers: dhruba, chip

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7977
2013-01-17 12:27:56 -08:00
Kosie van der Merwe 3c3df7402f Fixed issues Valgrind found.
Summary:
Found issues with `db_test` and `db_stress` when running valgrind.

`DBImpl` had an issue where if an compaction failed then it will use the uninitialised file size of an output file is used. This manifested as the final call to output to the log in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards).

Test Plan:
Ran `valgrind --track_origins=yes ./db_test` and `valgrind ./db_stress` to see if issues disappeared.

Ran `make check` to see if there were no regressions.

Reviewers: vamsi, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8001
2013-01-17 10:04:45 -08:00
Abhishek Kona 7d5a4383bb rollover manifest file.
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?

Test Plan: make all check. Inputs on a new unit test?

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7701
2013-01-16 12:09:44 -08:00
Chip Turner c0cb289d57 Various build cleanups/improvements
Summary:
Specific changes:

1) Turn on -Werror so all warnings are errors
2) Fix some warnings the above now complains about
3) Add proper dependency support so changing a .h file forces a .c file
to rebuild
4) Automatically use fbcode gcc on any internal machine rather than
whatever system compiler is laying around
5) Fix jemalloc to once again be used in the builds (seemed like it
wasn't being?)
6) Fix issue where 'git' would fail in build_detect_version because of
LD_LIBRARY_PATH being set in the third-party build system

Test Plan:
make, make check, make clean, touch a header file, make sure
rebuild is expected

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7887
2013-01-14 18:40:22 -08:00
Kosie van der Merwe d8371ef1f6 Fixing some issues Valgrind found
Summary: Found some issues running Valgrind on `db_test` (there are still some outstanding ones) and fixed them.

Test Plan:
make check

ran `valgrind ./db_test` and saw that errors no longer occur

Reviewers: dhruba, vamsi, emayanke, sheki

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7803
2013-01-08 12:16:40 -08:00
Kosie van der Merwe d6e873f22f Added clearer error message for failure to create db directory in DBImpl::Recover()
Summary:
Changed CreateDir() to CreateDirIfMissing() so a directory that already exists now causes and error.

Fixed CreateDirIfMissing() and added Env.DirExists()

Test Plan:
make check to test for regessions

Ran the following to test if the error message is not about lock files not existing
./db_bench --db=dir/testdb

After creating a file "testdb", ran the following to see if it failed with sane error message:
./db_bench --db=testdb

Reviewers: dhruba, emayanke, vamsi, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7707
2013-01-07 10:11:18 -08:00
Dhruba Borthakur f4c2b7cf97 Enhance ReadOnly mode to process the all committed transactions.
Summary:
Leveldb has an api OpenForReadOnly() that opens the database
in readonly mode. This call had an option to not process the
transaction log.  This patch removes this option and always
processes all transactions that had been committed. It has
been done in such a way that it does not create/write to
any new files in the process. The invariant of "no-writes"
to the leveldb data directory is still true.

This enhancement allows multiple threads to open the same database
in readonly mode and access all trancations that were committed right
upto the OpenForReadOnly call.

I changed the public API to match the new semantics because
there are no users who are currently using this api.

Test Plan: make clean check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7479
2012-12-19 16:30:46 -08:00
Dhruba Borthakur 3d1e92b05a Enhancements to rocksdb for better support for replication.
Summary:
1. The OpenForReadOnly() call should not lock the db. This is useful
so that multiple processes can open the same database concurrently
for reading.
2. GetUpdatesSince should not error out if the archive directory
does not exist.
3. A new constructor for WriteBatch that can takes a serialized
string as a parameter of the constructor.

Test Plan: make clean check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7449
2012-12-17 11:40:19 -08:00
Kosie van der Merwe 62d48571de Added meta-database support.
Summary:
Added kMetaDatabase for meta-databases in db/filename.h along with supporting
fuctions.
Fixed switch in DBImpl so that it also handles kMetaDatabase.
Fixed DestroyDB() that it can handle destroying meta-databases.

Test Plan: make check

Reviewers: sheki, emayanke, vamsi, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7245
2012-12-17 11:26:59 -08:00
Abhishek Kona 2f0585fb97 Fix a bug. Where DestroyDB deletes a non-existant archive directory.
Summary:
C tests would fail sometimes as DestroyDB would return a Failure Status
message when deleting an archival directory which was not created
(WAL_ttl_seconds = 0).

Fix: Ignore the Status returned on Deleting Archival Directory.

Test Plan: * make check

Reviewers: dhruba, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7395
2012-12-17 10:25:26 -08:00
Abhishek Kona 22c283625b Fix Bug in Binary Search for files containing a seq no. and delete Archived Log Files during Destroy DB.
Summary:
* Fixed implementation bug in Binary_Searvch introduced in https://reviews.facebook.net/D7119
* Binary search is also overflow safe.
* Delete archive log files and archive dir during DestroyDB

Test Plan: make check

Reviewers: dhruba

CC: kosievdmerwe, emayanke

Differential Revision: https://reviews.facebook.net/D7263
2012-12-11 16:15:02 -08:00
Dhruba Borthakur 24fc379273 An public api to fetch the latest transaction id.
Summary:
Implement a interface to retrieve the most current transaction
id from the database.

Test Plan: Added unit test.

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7269
2012-12-10 16:04:19 -08:00
Abhishek Kona 1c6742e32f Refactor GetArchivalDirectoryName to filename.h
Summary:
filename.h has functions to do similar things.
Moving code away from db_impl.cc

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7251
2012-12-10 10:51:07 -08:00
Abhishek Kona 8055008909 GetUpdatesSince API to enable replication.
Summary:
How it works:
* GetUpdatesSince takes a SequenceNumber.
* A LogFile with the first SequenceNumber nearest and lesser than the requested Sequence Number is found.
* Seek in the logFile till the requested SeqNumber is found.
* Return an iterator which contains logic to return record's one by one.

Test Plan:
* Test case included to check the good code path.
* Will update with more test-cases.
* Feedback required on test-cases.

Reviewers: dhruba, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7119
2012-12-07 11:42:13 -08:00
Dhruba Borthakur c847a31727 Print compaction score for every compaction run.
Summary:
A compaction is picked based on its score. It is useful to
print the compaction score in the LOG because it aids in
debugging. If one looks at the logs, one can find out why
a compaction was preferred over another.

Test Plan: make clean check

Differential Revision: https://reviews.facebook.net/D7137
2012-12-04 10:03:47 -08:00
sheki d4627e6de4 Move WAL files to archive directory, instead of deleting.
Summary:
Create a directory "archive" in the DB directory.
During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory,
instead of deleting.

Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move.

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6975
2012-11-28 17:28:08 -08:00
Abhishek Kona d29f181923 Fix all the lint errors.
Summary:
Scripted and removed all trailing spaces and converted all tabs to
spaces.

Also fixed other lint errors.
All lint errors from this point of time should be taken seriously.

Test Plan: make all check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7059
2012-11-28 17:18:41 -08:00
Dhruba Borthakur 9a357847eb Delete non-visible keys during a compaction even in the presense of snapshots.
Summary:
 LevelDB should delete almost-new keys when a long-open snapshot exists.
The previous behavior is to keep all versions that were created after the
oldest open snapshot. This can lead to database size bloat for
high-update workloads when there are long-open snapshots and long-open
snapshot will be used for logical backup. By "almost new" I mean that the
key was updated more than once after the oldest snapshot.

If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if
we find two instances of the same key k1 that lie entirely within s1 and
s2 (i.e. s1 < k1 < s2), then the earlier version
of k1 can be safely deleted because that version is not visible in any snapshot.

Test Plan:
unit test attached
make clean check

Differential Revision: https://reviews.facebook.net/D6999
2012-11-28 15:47:40 -08:00
Dhruba Borthakur 3366eda839 Print out status at the end of a compaction run.
Summary:
Print out status at the end of a compaction run. This helps in
debugging.

Test Plan: make clean check

Reviewers: sheki

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D7035
2012-11-27 22:17:38 -08:00