Commit Graph

1999 Commits

Author SHA1 Message Date
heyongqiang e21ba94a69 Set FD_CLOEXEC after each file open
Summary: as subject. This is causing problem in adsconv. Ideally, this flags should be set in open. But that is only supported in Linux kernel ≥2.6.23 and glibc ≥2.7.

Test Plan:
db_test

run db_test

Reviewers: dhruba, MarkCallaghan, haobo

Reviewed By: dhruba

CC: leveldb, chip

Differential Revision: https://reviews.facebook.net/D10089
2013-04-10 14:44:06 -07:00
Mayank Agarwal adb4e4509b Fixing delete in env_posix.cc
Summary: Was deleting incorrectly. Should delete the whole array.

Test Plan: make;valgrind stops complaining about Mismatched free/delete

Reviewers: dhruba, sheki

Reviewed By: sheki

CC: leveldb, haobo

Differential Revision: https://reviews.facebook.net/D10059
2013-04-09 11:49:35 -07:00
Haobo Xu a77bc3d14c [RocksDB] Fix LRUCache Eviction problem
Summary:
1. The stock LRUCache nukes itself whenever the working set (the total number of entries not released by client at a certain time) is bigger than the cache capacity.
See https://our.dev.facebook.com/intern/tasks/?t=2252281
2. There's a bug in shard calculation leading to segmentation fault when only one shard is needed.

Test Plan: make check

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb, zshao, sheki

Differential Revision: https://reviews.facebook.net/D9927
2013-04-04 11:22:50 -07:00
Haobo Xu d815082159 [RocksDB] env_posix cleanup
Summary:
1. SetBackgroundThreads was not thread safe
2. queue_size_ does not seem necessary
3. moved condition signal after shared state change. Even though the original
   order is in practice ok (because the mutex is still held), it looks fishy
   and non-intuitive.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb, zshao

Differential Revision: https://reviews.facebook.net/D9825
2013-04-02 11:36:51 -07:00
Abhishek Kona 7fdd5f5b33 Use non-mmapd files for Write-Ahead Files
Summary:
Use non mmapd files for Write-Ahead log.
Earlier use of MMaped files. made the log iterator read ahead and miss records.
Now the reader and writer will point to the same physical location.

There is no perf regression :
./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
with This diff :
fillseq      :      10.756 micros/op 185281 ops/sec;   20.5 MB/s
without this dif :
fillseq      :      11.085 micros/op 179676 ops/sec;   19.9 MB/s

Test Plan: unit test included

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9741
2013-03-28 13:13:35 -07:00
Abhishek Kona 63f216ee0a memory manage statistics
Summary:
Earlier Statistics object was a raw pointer. This meant the user had to clear up
the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted.
Now Using a shared_ptr to manage this.

Want this in before the next release.

Test Plan: make all check.

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9735
2013-03-27 11:27:39 -07:00
Simon Marlow a8bf8fe504 Integrate the manifest_dump command with ldb
Summary:
Syntax:

   manifest_dump [--verbose] --num=<manifest_num>

e.g.

$ ./ldb --db=/home/smarlow/tmp/testdb manifest_dump --num=12
manifest_file_number 13 next_file_number 14 last_sequence 3 log_number
11  prev_log_number 0
--- level 0 --- version# 0 ---
 6:116['a1' @ 1 : 1 .. 'a1' @ 1 : 1]
 10:130['a3' @ 2 : 1 .. 'a4' @ 3 : 1]
--- level 1 --- version# 0 ---
--- level 2 --- version# 0 ---
--- level 3 --- version# 0 ---
--- level 4 --- version# 0 ---
--- level 5 --- version# 0 ---
--- level 6 --- version# 0 ---

Test Plan: - Tested on an example DB (see output in summary)

Reviewers: sheki, dhruba

Reviewed By: sheki

CC: leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D9609
2013-03-22 09:17:30 -07:00
Mayank Agarwal 38d54832f7 Initialize variable in constructor for PosixEnv::checkedDiskForMmap_
Summary: This caused compilation problems on some gcc platforms during the third-partyrelease

Test Plan: make

Reviewers: sheki

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9627
2013-03-21 11:26:50 -07:00
Dhruba Borthakur ad96563b79 Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database.
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.

The default setting remains the same (and is backward compatible):
 1. use bufferedio
 2. do not use mmaps for reads
 3. use mmap for writes
 4. use readaheads for reads needed for compaction

I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.

Test Plan: make check

Reviewers: sheki, heyongqiang, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9429
2013-03-20 23:14:03 -07:00
Mayank Agarwal a6f4275403 Removing boost from ldb_cmd.cc
Summary: Getting rid of boost in our github codebase which caused problems on third-party

Test Plan: make ldb; python tools/ldb_test.py

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9543
2013-03-20 11:19:12 -07:00
Mayank Agarwal 48abc06049 Using return value of fwrite in posix_logger.h
Summary: Was causing error(warning) in third-party saying unused result

Test Plan: make

Reviewers: sheki, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9447
2013-03-19 21:33:01 -07:00
Mayank Agarwal 487168cdcf Fixed sign-comparison in rocksdb code-base and fixed Makefile
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base

Test Plan: make

Reviewers: chip, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9531
2013-03-19 14:35:23 -07:00
Mayank Agarwal f04cc368f7 Fixing a careless mistake in ldb
Summary: negation of the condition checked currently had to be checkd actually

Test Plan: make ldb; python ldb_test.py

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9459
2013-03-15 13:59:11 -07:00
Mayank Agarwal a78fb5e8bc Doing away with boost in ldb_cmd.h
Summary: boost functions cause complications while deploying to third-party

Test Plan: make

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9441
2013-03-14 18:16:46 -07:00
Abhishek Kona 1ba5abca97 Use posix_fallocate as default.
Summary:
Ftruncate does not throw an error on disk-full. This causes Sig-bus in
the case where the database tries to issue a Put call on a full-disk.

Use posix_fallocate for allocation instead of truncate.
Add a check to use MMaped files only on ext4, xfs and tempfs, as
posix_fallocate is very slow on ext3 and older.

Test Plan: make all check

Reviewers: dhruba, chip

Reviewed By: dhruba

CC: adsharma, leveldb

Differential Revision: https://reviews.facebook.net/D9291
2013-03-13 13:50:26 -07:00
Mayank Agarwal 5b278b53ae Fix valgrind errors in rocksdb tests: auto_roll_logger_test, reduce_levels_test
Summary: Fix for memory leaks in rocksdb tests. Also modified the variable NUM_FAILED_TESTS to print the actual number of failed tests.

Test Plan: make <test>; valgrind --leak-check=full ./<test>

Reviewers: sheki, dhruba

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9333
2013-03-12 16:03:16 -07:00
Dhruba Borthakur 469724be7f Add appropriate parameters to make bulk-load go faster.
Summary:
1. Create only 2 levels so that manual compactions are fast.
2. Set target file size to a large value

Test Plan: make clean check

Reviewers: kailiu, zshao

Reviewed By: zshao

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9231
2013-03-08 10:52:16 -08:00
Zheng Shao 7b43500794 [RocksDB] Add bulk_load option to Options and ldb
Summary:
Add a shortcut function to make it easier for people
to efficiently bulk_load data into RocksDB.

Test Plan:
Tried ldb with "--bulk_load" and "--bulk_load --compact" and verified the outcome.
Needs to consult the team on how to test this automatically.

Reviewers: sheki, dhruba, emayanke, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8907
2013-03-05 00:34:53 -08:00
Mark Callaghan 993543d1be Add rate_delay_limit_milliseconds
Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.

From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
                      because compaction for compressed levels is much
                      slower. When Lx is uncompressed and Lx+1 is compressed
                      then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                      compaction process is the first to be slowed by
                      compression.
* without compression -> at level 1

Task ID: #1832108

Blame Rev:

Test Plan:
run with real data, added test

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9045
2013-03-04 07:41:15 -08:00
Dhruba Borthakur 806e264350 Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0.
Summary:
Rocks accumulates recent writes and deletes in the in-memory memtable.
When the memtable is full, it writes the contents on the memtable to
a file in L0.

This patch removes redundant records at the time of the flush. If there
are multiple versions of the same key in the memtable, then only the
most recent one is dumped into the output file. The purging of
redundant records occur only if the most recent snapshot is earlier
than the earliest record in the memtable.

Should we switch on this feature by default or should we keep this feature
turned off in the default settings?

Test Plan: Added test case to db_test.cc

Reviewers: sheki, vamsi, emayanke, heyongqiang

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8991
2013-03-04 00:01:47 -08:00
Abhishek Kona c41f1e995c Codemod NULL to nullptr
Summary:
scripted NULL to nullptr in
* include/leveldb/
* db/
* table/
* util/

Test Plan: make all check

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9003
2013-02-28 18:04:58 -08:00
Abhishek Kona 9bf91c74b8 ldb waldump to print the keys along with other stats + NULL to nullptr in ldb_cmd.cc
Summary: LDB tool to print the deleted/put keys in hex in the wal file.

Test Plan: run ldb on a  db to check if output was satisfactory

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8691
2013-02-20 11:01:37 -08:00
amayank b2c50f1c3f Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value
Summary:
Changed the Get and Scan options with openForReadOnly mode to have access to the memtable.
Changed the visibility of NewInternalIterator in db_impl from private to protected so that
the derived class db_impl_read_only can call that in its NewIterator function for the
scan case. The previous approach which changed the default for flush_on_destroy_ from false to true
caused many problems in the unit tests due to empty sst files that it created. All
unit tests pass now.

Test Plan: make clean; make all check; ldb put and get and scans

Reviewers: dhruba, heyongqiang, sheki

Reviewed By: dhruba

CC: kosievdmerwe, zshao, dilipj, kailiu

Differential Revision: https://reviews.facebook.net/D8697
2013-02-20 10:45:52 -08:00
Abhishek Kona fe10200ddc Introduce histogram in statistics.h
Summary:
* Introduce is histogram in statistics.h
* stop watch to measure time.
* introduce two timers as a poc.
Replaced NULL with nullptr to fight some lint errors
Should be useful for google.

Test Plan:
ran db_bench and check stats.
make all check

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8637
2013-02-20 10:43:32 -08:00
Kai Liu 45f0030458 Fix the "IO error" in auto_roll_logger_test
Summary:

I missed InitTestDb() in one of my tess. InitTestDb() initializes the test directory, without which the test will throw IO error.

This problem didn't occur before because I've already run the tests before so the test directory is already there.

Test Plan:

Reviewers: dhruba

CC:

Task ID: #

Blame Rev:
2013-02-19 00:13:22 -08:00
amayank f3901e0647 Revert "Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value"
This reverts commit 4c696ed001.
2013-02-18 22:32:27 -08:00
amayank 4c696ed001 Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value
Summary:
flush_on_destroy has a default value of false and the memtable is flushed
in the dbimpl-destructor only when that is set to true. Because we want the memtable to be flushed everytime that
the destructor is called(db is closed) and the cases where we work with the memtable only are very less
it is a good idea to give this a default value of true. Thus the put from ldb
wil have its data flushed to disk in the destructor and the next Get will be able to
read it when opened with OpenForReadOnly. The reason that ldb could read the latest value when
the db was opened in the normal Open mode is that the Get from normal Open first reads
the memtable and directly finds the latest value written there and the Get from OpenForReadOnly
doesn't have access to the memtable (which is correct because all its Put/Modify) are disabled

Test Plan: make all; ldb put and get and scans

Reviewers: dhruba, heyongqiang, sheki

Reviewed By: heyongqiang

CC: kosievdmerwe, zshao, dilipj, kailiu

Differential Revision: https://reviews.facebook.net/D8631
2013-02-15 16:56:06 -08:00
Kai Liu aaa0cbb97a Fix the warning introduced by auto_roll_logger_test
Summary: Fix the warning [-Werror=format-security] and [-Werror=unused-result].

Test Plan:
enforced the Werror and run make

Task ID: 2101673

Blame Rev:

Reviewers: heyongqiang

Differential Revision: https://reviews.facebook.net/D8553
2013-02-13 15:29:35 -08:00
Chip Turner f02db1c118 Add zlib to our builds and tweak histogram output
Summary:
$SUBJECT -- cosmetic fix for histograms, print P75/P99, and
make sure zlib is enabled for our command line tools.

Test Plan: compile, test db_bench with --compression_type=zlib

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: adsharma, leveldb

Differential Revision: https://reviews.facebook.net/D8445
2013-02-07 15:31:53 -08:00
Kai Liu b63aafce42 Allow the logs to be purged by TTL.
Summary:
* Add a SplitByTTLLogger to enable this feature. In this diff I implemented generalized AutoSplitLoggerBase class to simplify the
development of such classes.
* Refactor the existing AutoSplitLogger and fix several bugs.

Test Plan:
* Added a unit tests for different types of "auto splitable" loggers individually.
* Tested the composited logger which allows the log files to be splitted by both TTL and log size.

Reviewers: heyongqiang, dhruba

Reviewed By: heyongqiang

CC: zshao, leveldb

Differential Revision: https://reviews.facebook.net/D8037
2013-02-04 19:42:40 -08:00
Abhishek Kona 4dc02f7b7a Initialize all doubles to 0 in histogram.cc
Summary:
The existing code did not initialize a few doubles in histogram.cc.
Cropped up when I wrote a unit-test.

Test Plan: make all check

Reviewers: chip

Reviewed By: chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8319
2013-01-31 17:31:43 -08:00
Abhishek Kona 009034cf12 Performant util/histogram.
Summary:
Earlier way to record in histogram=>
Linear search BucketLimit array to find the bucket and increment the
counter
Current way to record in histogram=>
Store a HistMap statically which points the buckets of each value in the
range [kFirstValue, kLastValue);

In the proccess use vectors instead of array's and refactor some code to
HistogramHelper class.

Test Plan:
run db_bench with histogram=1 and see a histogram being
printed.

Reviewers: dhruba, chip, heyongqiang

Reviewed By: chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8265
2013-01-31 16:10:34 -08:00
Kosie van der Merwe 4dcc0c89f4 Fixed cache key for block cache
Summary:
Added function to `RandomAccessFile` to generate an unique ID for that file. Currently only `PosixRandomAccessFile` has this behaviour implemented and only on Linux.

Changed how key is generated in `Table::BlockReader`.

Added tests to check whether the unique ID is stable, unique and not a prefix of another unique ID. Added tests to see that `Table` uses the cache more efficiently.

Test Plan: make check

Reviewers: chip, vamsi, dhruba

Reviewed By: chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8145
2013-01-31 15:20:24 -08:00
Chip Turner 2c3565285e Add OS_LINUX ifdef protections around fallocate parts
Summary: fallocate is linux only, so let's protect it with ifdef's

Test Plan: make

Reviewers: sheki, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8223
2013-01-28 12:03:35 -08:00
Dilip Antony Joseph 11ce6a060e Enhanced ldb to support data access commands
Summary: Added put/get/scan/batchput/delete/approxsize

Test Plan: Added pyunit script to test the newly added commands

Reviewers: chip, leveldb

Reviewed By: chip

CC: zshao, emayanke

Differential Revision: https://reviews.facebook.net/D7947
2013-01-28 11:38:26 -08:00
Chip Turner 0b83a83191 Fix poor error on num_levels mismatch and few other minor improvements
Summary:
Previously, if you opened a db with num_levels set lower than
the database, you received the unhelpful message "Corruption:
VersionEdit: new-file entry."  Now you get a more verbose message
describing the issue.

Also, fix handling of compression_levels (both the run-over-the-end
issue and the memory management of it).

Lastly, unique_ptr'ify a couple of minor calls.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8151
2013-01-25 15:37:26 -08:00
Chip Turner 772f75b3fb Stop continually re-creating build_version.c
Summary:
We continually rebuilt build_version.c because we put the
current date into it, but that's what __DATE__ already is.  This makes
builds faster.

This also fixes an issue with 'make clean FOO' not working properly.

Also tweak the build rules to be more consistent, always have warnings,
and add a 'make release' rule to handle flags for release builds.

Test Plan: make, make clean

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D8139
2013-01-24 17:51:39 -08:00
Chip Turner 3dafdfb2c4 Use fallocate to prevent excessive allocation of sst files and logs
Summary:
On some filesystems, pre-allocation can be a considerable
amount of space.  xfs in our production environment pre-allocates by
1GB, for instance.  By using fallocate to inform the kernel of our
expected file sizes, we eliminate this wasteage (that isn't recovered
until the file is closed which, in the case of LOG files, can be a
considerable amount of time).

Test Plan:
created an xfs loopback filesystem, mounted with
allocsize=4M, and ran db_stress.  LOG file without this change was 4M,
and with it it was 128k then grew to normal size.

Reviewers: dhruba

Reviewed By: dhruba

CC: adsharma, leveldb

Differential Revision: https://reviews.facebook.net/D7953
2013-01-24 12:25:13 -08:00
Chip Turner 2fdf91a4f8 Fix a number of object lifetime/ownership issues
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.

Test Plan: db_stress, make check

Reviewers: dhruba

Reviewed By: dhruba

CC: zshao, leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D8043
2013-01-23 16:54:11 -08:00
Abhishek Kona 7d5a4383bb rollover manifest file.
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?

Test Plan: make all check. Inputs on a new unit test?

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7701
2013-01-16 12:09:44 -08:00
Chip Turner a2dcd79c1e Add optional clang compile mode
Summary:
clang is an alternate compiler based on llvm.  It produces
nicer error messages and finds some bugs that gcc doesn't, such as the
size_t change in this file (which caused some write return values to be
misinterpreted!)

Clang isn't the default; to try it, do "USE_CLANG=1 make" or "export
USE_CLANG=1" then make as normal

Test Plan: "make check" and "USE_CLANG=1 make check"

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7899
2013-01-15 18:48:37 -08:00
Kosie van der Merwe 4d339d7462 Fixed memory leak in ShardedLRUCache
Summary: `~ShardedLRUCache()` was empty despite `init()` allocating memory on the heap. Fixed the leak by freeing memory allocated by `init()`.

Test Plan:
make check

Ran valgrind on db_test before and after patch and saw leaked memory went down

Reviewers: vamsi, dhruba, emayanke, sheki

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7791
2013-01-08 11:24:15 -08:00
Kosie van der Merwe d6e873f22f Added clearer error message for failure to create db directory in DBImpl::Recover()
Summary:
Changed CreateDir() to CreateDirIfMissing() so a directory that already exists now causes and error.

Fixed CreateDirIfMissing() and added Env.DirExists()

Test Plan:
make check to test for regessions

Ran the following to test if the error message is not about lock files not existing
./db_bench --db=dir/testdb

After creating a file "testdb", ran the following to see if it failed with sane error message:
./db_bench --db=testdb

Reviewers: dhruba, emayanke, vamsi, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7707
2013-01-07 10:11:18 -08:00
Dhruba Borthakur 5b05417df3 Complication error when using gcc 4.7.1.
Summary:
There is a compilation error while using gcc 4.7.1.
util/ldb_cmd.cc:381:3: error: ‘leveldb::ReadOptions::ReadOptions’ names the constructor, not the type
util/ldb_cmd.cc:381:37: error: expected ‘;’ before ‘read_options’
util/ldb_cmd.cc:381:49: error: statement cannot resolve address of overloaded function

Test Plan: make clean check

Reviewers: sheki, emayanke, zshao

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7659
2012-12-27 12:38:20 -08:00
Zheng Shao 0f762ac573 ldb: Add command "ldb query" to support random read from the database
Summary: The queries will come from stdin.  One key per line.  The output will be in stdout, in the format of "<key> ==> <value>" if found, or "<key>" if not found.  "--hex" uses HEX-encoded keys and values in both input and output.

Test Plan: ldb query --db=leveldb_db --hex

Reviewers: dhruba, emayanke, sheki

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7617
2012-12-26 20:37:42 -08:00
Zheng Shao dcece4707e ldb: Fix incorrect arg parsing
Summary: We were ignoring additional chars at the end of an arg.  This can create confusion, e.g. --disable_wal=0 will act the same as --disable_wal without any warnings.

Test Plan:
Tried this:
[zshao@dev485 ~/git/rocksdb] ./ldb dump --statsAAA
Failed: Unknown argument:--statsAAA

Reviewers: dhruba, sheki, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7635
2012-12-26 20:09:10 -08:00
Zheng Shao 70f0f50731 ldb: file_size and write_buffer_size options.
Summary:
This allows ldb to control the write_buffer_size (which reflects to L0 file size) and file_size (which reflects to L1 file size).  Since the target_file_size_ratio is 1 by default, all other levels will also have the same file size as L1.

As part of the diff, I also cleaned up some unused code and help messages.

Test Plan: ./ldb load --db=/data/users/zshao/test_leveldb --file_size=64000000 --write_buffer_size=32000000 --create_if_missing --input_hex --disable_wal

Reviewers: dhruba, sheki, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7569
2012-12-21 10:40:38 -08:00
Abhishek Kona 1aae609b92 Use CRC32 ss42 instruction. Load it dynamically.
Test Plan: make all check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb, zshao

Differential Revision: https://reviews.facebook.net/D7503
2012-12-21 10:20:32 -08:00
Zheng Shao be9b862d47 ldb: add "ldb load" command
Summary: This command accepts key-value pairs from stdin with the same format of "ldb dump" command.  This allows us to try out different compression algorithms/block sizes easily.

Test Plan: dump, load, dump, verify the data is the same.

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7443
2012-12-19 01:45:59 -08:00
Zheng Shao 3d9ff0e921 ldb: fix dump command to pad HEX output chars with 0.
Summary: The old code was omitting the 0 if the char is less than 16.

Test Plan:
Tried the following program:
int main() {
  unsigned char c = 1;
  printf("%X\n", c);
  printf("%02X\n", c);
  return 0;
}
The output is:
1
01

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7437
2012-12-16 16:55:38 -08:00
Zheng Shao 7dc8bb710e ldb: support --block_size=<4096|65536|...> and --auto_compaction=<0|1>
Summary: This allows us to use ldb to do more experiments like block_size changes.

Test Plan: run it by hand.

Reviewers: dhruba, sheki, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7431
2012-12-16 09:00:14 -08:00
Dhruba Borthakur 38671c4d54 Fix a race condition while processing tasks by background threads.
Summary:
Suppose you submit 100 background tasks one after another. The first
enqueu task finds that the queue is empty and wakes up one worker thread.
Now suppose that all remaining 99 work items are enqueued, they do not
wake up any worker threads because the queue is already non-empty.
This causes a situation when there are 99 tasks in the task queue but
only one worker thread is processing a task while the remaining
worker threads are waiting.
The fix is to always wakeup one worker thread while enqueuing a task.

I also added a check to count the number of elements in the queue
to help in debugging.

Test Plan: make clean check.

Reviewers: chip

Reviewed By: chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7203
2012-12-09 17:15:27 -08:00
Zheng Shao 768edfaaed ldb: Add compression and bloom filter options.
Summary:
Added the following two options:
[--bloom_bits=<int,e.g.:14>]
[--compression_type=<no|snappy|zlib|bzip2>]

These options will be used when ldb opens the leveldb database.

Test Plan: Tried by hand for both success and failure cases. We do need a test framework.

Reviewers: dhruba, emayanke, sheki

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7197
2012-12-07 14:26:37 -08:00
Kosie van der Merwe f69e9f3e04 Fixed off by 1 in tests.
Summary: Added 1 to indices where I shouldn't have so overrun array.

Test Plan: make check

Reviewers: sheki, emayanke, vamsi, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D7227
2012-12-07 10:48:46 -08:00
Kosie van der Merwe 0eb0c9bb82 Added methods to write small ints to bit streams.
Summary: Added BitStreamPutInt() and BitStreamGetInt() which take a stream of chars and can write integers of arbitrary bit sizes to that stream at arbitrary positions. There are also convenience versions of these functions that take std::strings and leveldb::Slices.

Test Plan: make check

Reviewers: sheki, vamsi, dhruba, emayanke

Reviewed By: vamsi

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7071
2012-12-07 10:42:19 -08:00
sheki d4627e6de4 Move WAL files to archive directory, instead of deleting.
Summary:
Create a directory "archive" in the DB directory.
During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory,
instead of deleting.

Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move.

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6975
2012-11-28 17:28:08 -08:00
Abhishek Kona d29f181923 Fix all the lint errors.
Summary:
Scripted and removed all trailing spaces and converted all tabs to
spaces.

Also fixed other lint errors.
All lint errors from this point of time should be taken seriously.

Test Plan: make all check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D7059
2012-11-28 17:18:41 -08:00
Chip Turner 6caf3b8e4b Fix broken test; some ldb commands can run without a db_
Summary:
It would appear our unit tests make use of code from ldb_cmd,
and don't always require a valid database handle.  D6855 was not aware
db_ could sometimes be NULL for such commands, and so it broke
reduce_levels_test.

This moves the check elsewhere to (at least) fix the 'ldb dump' case of
segfaulting when it couldn't open a database.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6903
2012-11-26 11:11:30 -08:00
Chip Turner 879e45eb99 Fix ldb segfault and use static libsnappy for all builds
Summary:
Link statically against snappy, using the gvfs one for facebook
environments, and the bundled one otherwise.

In addition, fix a few minor segfaults in ldb when it couldn't open the
database, and update .gitignore to include a few other build artifacts.

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6855
2012-11-21 11:07:19 -08:00
Dhruba Borthakur 7632fdb5cb Support taking a configurable number of files from the same level to compact in a single compaction run.
Summary:
The compaction process takes some files from LevelK and
merges it into LevelK+1. The number of files it picks from
LevelK was capped such a way that the total amount of
data picked does not exceed the maxfilesize of that level.
This essentially meant that only one file from LevelK
is picked for a single compaction.

For bulkloads, we would like to take many many file from
LevelK and compact them using a single compaction run.

This patch introduces a option called the 'source_compaction_factor'
(similar to expanded_compaction_factor). It is a multiplier
that is multiplied by the maxfilesize of that level to arrive
at the limit that is used to throttle the number of source
files from LevelK.  For bulk loads, set source_compaction_factor
to a very high number so that multiple files from the same
level are picked for compaction in a single compaction.

The default value of source_compaction_factor is 1, so that
we can keep backward compatibilty with existing compaction semantics.

Test Plan: make clean check

Reviewers: emayanke, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6867
2012-11-21 08:37:03 -08:00
Dhruba Borthakur fbb73a4ac3 Support to disable background compactions on a database.
Summary:
This option is needed for fast bulk uploads. The goal is to load
all the data into files in L0 without any interference from
background compactions.

Test Plan: make clean check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D6849
2012-11-20 21:12:06 -08:00
Abhishek Kona 661dc15721 Fix LDB dumpwal to print the messages as in the file.
Summary:
StringStream.clear() does not clear the stream. It sets some flags.
Who knew? Fixing that is not printing the stuff again and again.

Test Plan: ran it on a local db

Reviewers: dhruba, emayanke

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6795
2012-11-19 12:04:35 -08:00
Abhishek Kona 30742e1692 LDB can read WAL.
Summary:
Add option to read WAL and print a summary for each record.
facebook task => #1885013

E.G. Output :
./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header
Sequence,Count,ByteSize
49981,1,100033
49981,1,100033
49982,1,100033
49981,1,100033
49982,1,100033
49983,1,100033
49981,1,100033
49982,1,100033
49983,1,100033
49984,1,100033
49981,1,100033
49982,1,100033

Test Plan:
Works run
./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

CC: emayanke, leveldb, zshao

Differential Revision: https://reviews.facebook.net/D6675
2012-11-19 12:04:34 -08:00
Dhruba Borthakur 6c5a4d646a Merge branch 'master' into performance
Conflicts:
	db/db_impl.h
2012-11-14 21:39:52 -08:00
Dhruba Borthakur 5d16e503a6 Improved CompactionFilter api: pass in a opaque argument to CompactionFilter invocation.
Summary:
There are applications that operate on multiple leveldb instances.
These applications will like to pass in an opaque type for each
leveldb instance and this type should be passed back to the application
with every invocation of the CompactionFilter api.

Test Plan: Enehanced unit test for opaque parameter to CompactionFilter.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan, sheki, emayanke

Differential Revision: https://reviews.facebook.net/D6711
2012-11-13 16:22:26 -08:00
heyongqiang c64796fd34 Fix test failure of reduce_num_levels
Summary:
I changed the reduce_num_levels logic to avoid "compactRange()" call if the current number of levels in use (levels that contain files) is smaller than the new num of levels.
And that change breaks the assert in reduce_levels_test

Test Plan: run reduce_levels_test

Reviewers: dhruba, MarkCallaghan

Reviewed By: dhruba

CC: emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6651
2012-11-12 12:05:38 -08:00
Dhruba Borthakur 9c6c232e47 Compilation error while compiling with OPT=-g
Summary:
make clean check OPT=-g fails
leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’:
./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope
util/ldb_cmd.cc:255: warning: left shift count >= width of type

Test Plan:
make clean check OPT=-g

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-11-11 00:20:40 -08:00
heyongqiang 20d18a89a3 disable size compaction in ldb reduce_levels and added compression and file size parameter to it
Summary:
disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction,

added --compression=none|snappy|zlib|bzip2 and --file_size= per-file size to ldb reduce_levels command

Test Plan: run ldb

Reviewers: dhruba, MarkCallaghan

Reviewed By: dhruba

CC: sheki, emayanke

Differential Revision: https://reviews.facebook.net/D6597
2012-11-09 10:14:47 -08:00
Dhruba Borthakur 8143062edd Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/version_set.cc
	util/options.cc
2012-11-07 15:11:37 -08:00
Dhruba Borthakur aa42c66814 Fix all warnings generated by -Wall option to the compiler.
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.

Test Plan: make all check

Reviewers: heyongqiang, emayanke, sheki

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6525
2012-11-06 14:07:31 -08:00
Dhruba Borthakur 5f91868cee Merge branch 'master' into performance
Conflicts:
	db/version_set.cc
	util/options.cc
2012-11-05 16:51:55 -08:00
Dhruba Borthakur 5273c81483 Ability to invoke application hook for every key during compaction.
Summary:
There are certain use-cases where the application intends to
delete older keys aftre they have expired a certian time period.
One option for those applications is to periodically scan the
entire database and delete appropriate keys.

A better way is to allow the application to hook into the
compaction process. This patch allows the application to set
a method callback for every key that is being compacted. If
this method returns true, then the key is not preserved in
the output of the compaction.

Test Plan:
This is mostly to preview the proposed new public api.
Since it is a public api, please do due diligence on reviewing it.

I will be writing test cases for this api in mynext version of
this patch.

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: heyongqiang

CC: sheki, adsharma

Differential Revision: https://reviews.facebook.net/D6285
2012-11-05 16:02:13 -08:00
heyongqiang d55c2ba305 Add a tool to change number of levels
Summary: as subject.

Test Plan: manually test it, will add a testcase

Reviewers: dhruba, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6345
2012-11-05 10:17:39 -08:00
Dhruba Borthakur 81f735d97c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	util/options.cc
2012-11-05 09:41:38 -08:00
amayank 854c66b089 Make compression options configurable. These include window-bits, level and strategy for ZlibCompression
Summary: Leveldb currently uses windowBits=-14 while using zlib compression.(It was earlier 15). This makes the setting configurable. Related changes here: https://reviews.facebook.net/D6105

Test Plan: make all check

Reviewers: dhruba, MarkCallaghan, sheki, heyongqiang

Differential Revision: https://reviews.facebook.net/D6393
2012-11-02 11:26:39 -07:00
heyongqiang 3096fa7534 Add two more options: disable block cache and make table cache shard number configuable
Summary:

as subject

Test Plan:

run db_bench and db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6111
2012-11-01 13:23:21 -07:00
Dhruba Borthakur 53e04311b1 Merge branch 'master' into performance
Conflicts:
	db/db_bench.cc
	util/options.cc
2012-10-29 14:18:00 -07:00
Dhruba Borthakur 321dfdc3ae Allow having different compression algorithms on different levels.
Summary:
The leveldb API is enhanced to support different compression algorithms at
different levels.

This adds the option min_level_to_compress to db_bench that specifies
the minimum level for which compression should be done when
compression is enabled. This can be used to disable compression for levels
0 and 1 which are likely to suffer from stalls because of the CPU load
for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
gets frequent memtable flushes. Level 1 is special as it frequently
gets all:all file compactions between it and level 0. But all other levels
could be the same. For any level N where N > 1, the rate of sequential
IO for that level should be the same. The last level is the
exception because it might not be full and because files from it are
not read to compact with the next larger level.

The same amount of time will be spent doing compaction at any
level N excluding N=0, 1 or the last level. By this standard all
of those levels should use the same compression. The difference is that
the loss (using more disk space) from a faster compression algorithm
is less significant for N=2 than for N=3. So we might be willing to
trade disk space for faster write rates with no compression
for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
algorithm for the mid levels also allows us to reclaim some cpu
without trading off much loss in disk space overhead.

Also note that little is to be gained by compressing levels 0 and 1. For
a 4-level tree they account for 10% of the data. For a 5-level tree they
account for 1% of the data.

With compression enabled:
* memtable flush rate is ~18MB/second
* (L0,L1) compaction rate is ~30MB/second

With compression enabled but min_level_to_compress=2
* memtable flush rate is ~320MB/second
* (L0,L1) compaction rate is ~560MB/second

This practicaly takes the same code from https://reviews.facebook.net/D6225
but makes the leveldb api more general purpose with a few additional
lines of code.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6261
2012-10-29 11:48:09 -07:00
Mark Callaghan 70c42bf05f Adds DB::GetNextCompaction and then uses that for rate limiting db_bench
Summary:
Adds a method that returns the score for the next level that most
needs compaction. That method is then used by db_bench to rate limit threads.
Threads are put to sleep at the end of each stats interval until the score
is less than the limit. The limit is set via the --rate_limit=$double option.
The specified value must be > 1.0. Also adds the option --stats_per_interval
to enable additional metrics reported every stats interval.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6243
2012-10-29 10:17:43 -07:00
Kai Liu 8965c8d0b9 Add the missing util/auto_split_logger.h
Summary:

Test Plan:

Reviewers:

CC:

Task ID: 1803577

Blame Rev:
2012-10-26 15:23:50 -07:00
Kai Liu d50f8eb603 Enable LevelDb to create a new log file if current log file is too large.
Summary: Enable LevelDb to create a new log file if current log file is too large.

Test Plan:
Write a script and manually check the generated info LOG.

Task ID: 1803577

Blame Rev:

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: zshao

Differential Revision: https://reviews.facebook.net/D6003
2012-10-26 14:55:02 -07:00
Dhruba Borthakur e982f5a1d2 Merge branch 'master' into performance
Conflicts:
	util/options.cc
2012-10-19 15:16:42 -07:00
Dhruba Borthakur cf5adc8016 db_bench was not correctly initializing the value for delete_obsolete_files_period_micros option.
Summary:
The parameter delete_obsolete_files_period_micros controls the
periodicity of deleting obsolete files. db_bench was reading in
this parameter intoa local variable called 'l' but was incorrectly
using another local variable called 'n' while setting it in the
db.options data structure.
This patch also logs the value of delete_obsolete_files_period_micros
in the LOG file at db startup time.

I am hoping that this will improve the overall write throughput drastically.

Test Plan: run db_bench

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6099
2012-10-19 15:10:12 -07:00
Dhruba Borthakur 1ca0584345 This is the mega-patch multi-threaded compaction
published in https://reviews.facebook.net/D5997.

Summary:
This patch allows compaction to occur in multiple background threads
concurrently.

If a manual compaction is issued, the system falls back to a
single-compaction-thread model. This is done to ensure correctess
and simplicity of code. When the manual compaction is finished,
the system resumes its concurrent-compaction mode automatically.

The updates to the manifest are done via group-commit approach.

Test Plan: run db_bench
2012-10-19 14:00:53 -07:00
Dhruba Borthakur aa73538f2a The deletion of obsolete files should not occur very frequently.
Summary:
The method DeleteObsolete files is a very costly methind, especially
when the number of files in a system is large. It makes a list of
all live-files and then scans the directory to compute the diff.
By default, this method is executed after every compaction run.

This patch makes it such that DeleteObsolete files is never
invoked twice within a configured period.

Test Plan: run all unit tests

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6045
2012-10-16 10:26:10 -07:00
Dhruba Borthakur f7975ac733 Implement RowLocks for assoc schema
Summary:
Each assoc is identified by (id1, assocType). This is the rowkey.
Each row has a read/write rowlock. There is statically allocated array
of 2000 read/write locks. A rowkey is murmur-hashed to one of the
read/write locks.

assocPut and assocDelete acquires the rowlock in Write mode.
The key-updates are done within the rowlock with a atomic nosync
batch write to leveldb. Then the rowlock is released and
a write-with-sync is done to sync leveldb transaction log.

Test Plan: added unit test

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5859
2012-10-03 23:19:01 -07:00
Dhruba Borthakur c1006d4276 An configurable option to write data using write instead of mmap.
Summary:
We have seen that reading data via the pread call (instead of
mmap) is much faster on Linux 2.6.x kernels. This patch makes
an equivalent option to switch off mmaps for the write path
as well.

db_bench --mmap_write=0 will use write() instead of mmap() to
write data to a file.

This change is backward compatible, the default
option is to continue using mmap for writing to a file.

Test Plan: "make check all"

Differential Revision: https://reviews.facebook.net/D5781
2012-10-03 17:08:13 -07:00
Dhruba Borthakur a58d48de79 Implement ReadWrite locks for leveldb
Summary:
Implement ReadWrite locks for leveldb. These will be helpful
to implement a read-modify-write operation (e.g. atomic increments).

Test Plan: does not modify any existing code

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5787
2012-10-01 22:37:39 -07:00
Dhruba Borthakur 72c45c66c6 Print the block cache size in the LOG.
Summary: Print the block cache size in the LOG.

Test Plan: run db_bench and look at LOG. This is helpful while I was debugging one use-case.

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5739
2012-09-29 21:39:19 -07:00
Dhruba Borthakur ae36e509f8 The BackupAPI should also list the length of the manifest file.
Summary:
The GetLiveFiles() api lists the set of sst files and the current
MANIFEST file. But the database continues to append new data to the
MANIFEST file even when the application is backing it up to the
backup location. This means that the database-version that is
stored in the MANIFEST FILE in the backup location
does not correspond to the sst files returned by GetLiveFiles.

This API adds a new parameter to GetLiveFiles. This new parmeter
returns the current size of the MANIFEST file.

Test Plan: Unit test attached.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5631
2012-09-25 03:13:25 -07:00
Dhruba Borthakur 9e84834eb4 Allow a configurable number of background threads.
Summary:
The background threads are necessary for compaction.
For slower storage, it might be necessary to have more than
one compaction thread per DB. This patch allows creating
a configurable number of worker threads.
The default reamins at 1 (to maintain backward compatibility).

Test Plan:
run all unit tests. changes to db-bench coming in
a separate patch.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5559
2012-09-19 15:51:08 -07:00
heyongqiang a8464ed820 add an option to disable seek compaction
Summary:
as subject. This diff should be good for benchmarking.

will send another diff to make it better in the case the seek compaction is enable.
In that coming diff, will not count a seek if the bloomfilter filters.

Test Plan: build

Reviewers: dhruba, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5481
2012-09-17 13:59:57 -07:00
heyongqiang b85cdca690 add a global var leveldb::useMmapRead to enable mmap Summary:
Summary:
as subject. this can be used for benchmarking.
If we want it for some cases, we can do more changes to make this part of the option.

Test Plan: db_test

Reviewers: dhruba

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5451
2012-09-16 22:07:35 -07:00
Mark Callaghan 33323f2111 Remove use of mmap for random reads
Summary:
Reads via mmap on concurrent workloads are much slower than pread.
For example on a 24-core server with storage that can do 100k IOPS or more
I can get no more than 10k IOPS with mmap reads and 32+ threads.

Test Plan: db_bench benchmarks

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5433
2012-09-14 16:43:50 -07:00
Dhruba Borthakur 93f4952089 Ability to switch off filesystem read-aheads
Summary:
Ability to switch off filesystem read-aheads. This change is
backward-compatible: the default setting is to allow file
system read-aheads.

Test Plan: run benchmarks

Reviewers: heyongqiang, adsharma

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5391
2012-09-13 12:09:56 -07:00
Dhruba Borthakur 4028ae7d31 Do not cache readahead-pages in the OS cache.
Summary:
When posix_fadvise(offset, offset) is usedm it frees up only those
pages in that specified range. But the filesystem could have done some
read-aheads and those get cached in the OS cache.

Do not cache readahead-pages in the OS cache.

Test Plan: run db_bench benchmark.

Reviewers: vamsi, heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5379
2012-09-13 10:56:02 -07:00
Dhruba Borthakur 407727b75f Fix compiler warnings. Use uint64_t instead of uint.
Summary: Fix compiler warnings. Use uint64_t instead of uint.

Test Plan: build using -Wall

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5355
2012-09-12 14:42:36 -07:00
heyongqiang 0f43aa474e put log in a seperate dir
Summary: added a new option db_log_dir, which points the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the leveldb data dir absolute path.

Test Plan: db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5205
2012-09-06 17:52:08 -07:00
Dhruba Borthakur fe93631678 Clean up compiler warnings generated by -Wall option.
Summary:
Clean up compiler warnings generated by -Wall option.
make clean all OPT=-Wall

This is a pre-requisite before making a new release.

Test Plan: compile and run unit tests

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5019
2012-08-29 14:24:51 -07:00
Dhruba Borthakur e5fe80e4e3 The sharding of the block cache is limited to 2*20 pieces.
Summary:
The numbers of shards that the block cache is divided into is
configurable. However, if the user specifies that he/she wants
the block cache to be divided into more than 2**20 pieces, then
the system will rey to allocate a huge array of that size) that
could fail.

It is better to limit the sharding of the block cache to an
upper bound. The default sharding is 16 shards (i.e. 2**4)
and the maximum is now 2 million shards (i.e. 2**20).

Also, fixed a bug with the LRUCache where the numShardBits
should be a private member of the LRUCache object rather than
a static variable.

Test Plan:
run db_bench with --cache_numshardbits=64.

Task ID: #

Blame Rev:

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5013
2012-08-29 12:17:59 -07:00
heyongqiang a4f9b8b49e merge 1.5
Summary:

as subject

Test Plan:

db_test table_test

Reviewers: dhruba
2012-08-28 11:43:33 -07:00
Dhruba Borthakur fc20273e73 Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
Summary:
Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
This is needed for data durability when running on ext3 filesystems.
Added options to the benchmark db_bench to generate performance numbers
with either fsync or fdatasync enabled.

Cleaned up Makefile to build leveldb_shell only when building the thrift
leveldb server.

Test Plan: build and run benchmark

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4911
2012-08-27 21:24:17 -07:00
Dhruba Borthakur e5a7c8e580 Log the open-options to the LOG.
Summary: Log the open-options to the LOG. Use options_ instead of options because SanitizeOptions could modify the max_file_open limit.

Test Plan: num db_bench

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4833
2012-08-22 12:22:12 -07:00
heyongqiang 21082fa13c regression for trigger compaction logic
Summary: as subject

Test Plan: manually run db_bench confirmed

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4809
2012-08-21 18:11:21 -07:00
Dhruba Borthakur f4e7febf22 Record the version of the source repository that was used to build the leveldb library.
Summary: Record the version of the source that we are compiling. We keep a record of the git revision in util/version.cc. This source file is then built as a regular source file as part of the compilation process. One can run "strings executable_filename | grep _build_" to find the version of the source that we used to build the executable file.

Test Plan: none

Differential Revision: https://reviews.facebook.net/D4785
2012-08-21 14:47:15 -07:00
heyongqiang 6ba1f17789 adding a scribe logger in leveldb to log leveldb deploy stats
Summary:
as subject.

A new log is written to scribe via thrift client when a new db is opened and when there is
a compaction.

a new option var scribe_log_db_stats is added.

Test Plan: manually checked using command "ptail -time 0 leveldb_deploy_stats"

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4659
2012-08-21 11:43:22 -07:00
Dhruba Borthakur e56b2c5a31 Prevent concurrent multiple opens of leveldb database.
Summary:
The fcntl call cannot detect lock conflicts when invoked multiple times
from the same thread.
Use a static lockedFile Set to record the paths that are locked.
A lockfile request checks to see if htis filename already exists in
lockedFiles, if so, then it triggers an error. Otherwise, it inserts
the filename in the lockedFiles Set.
A unlock file request verifies that the filename is in the lockedFiles
set and removes it from lockedFiles set.

Test Plan: unit test attached

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4755
2012-08-20 23:55:04 -07:00
Dhruba Borthakur c3096afd61 Introduce a new option disableDataSync for opening the database. If this is set to true, then the data written to newly created data files are not sycned to disk, instead depend on the OS to flush dirty data to stable storage. This option is good for bulk
Test Plan:
manual tests

Task ID: #

Blame Rev:

Differential Revision: https://reviews.facebook.net/D4515
2012-08-03 15:23:53 -07:00
Dhruba Borthakur d11b637f34 bits_per_key is already configurable. It defines how many bloom bits will be used for every key in the database.
My change in this patch is to make the Hash code that is used for blooms to be confgurable. In fact,
one can specify a modified HashCode that inspects only parts of the Key to generate the Hash (used
by booms).

Test Plan: none

Differential Revision: https://reviews.facebook.net/D4059
2012-07-09 23:06:07 -07:00
Dhruba Borthakur 80c663882a Create leveldb server via Thrift.
Summary:
First draft.
Unit tests pass.

Test Plan: unit tests attached

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D3969
2012-07-07 09:42:39 -07:00
heyongqiang 7600228072 fix compile warning
Summary: as subject

Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3957
2012-07-02 17:37:45 -07:00
heyongqiang 4e4b6812ff Make some variables configurable for each db instance
Summary:
Make configurable 'targetFileSize', 'targetFileSizeMultiplier',
'maxBytesForLevelBase', 'maxBytesForLevelMultiplier',
'expandedCompactionFactor', 'maxGrandParentOverlapFactor'

Test Plan: N/A

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3801
2012-06-27 14:36:31 -07:00
Dhruba Borthakur a35e574344 Make Leveldb save data into HDFS files. You have to set USE_HDFS in your environment variable to compile leveldb with HDFS support.
Test Plan: Run benchmark.

Differential Revision: https://reviews.facebook.net/D3549
2012-06-14 00:29:01 -07:00
Dhruba Borthakur 8f293b68a9 Support --bufferedio=[0,1] from db_bench. If bufferedio = 0, then the read code path clears the OS page cache after the IO is completed. The default remains as bufferedio=1
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3429
2012-05-29 13:29:44 -07:00
Dhruba Borthakur a2a0e358cb Add support to specify the number of shards for the Block cache. By default, the block cache is sharded into 16 parts.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3273
2012-05-16 17:23:49 -07:00
Arun Sharma 95af128225 SSE4 optimization
Summary:
This speeds up CRC computation significantly on
hardware that supports it. Enabled via -msse4.

Note: the binary won't be usable on older CPUs
that don't support the instruction.

Test Plan: crc32c_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3201
2012-05-15 10:10:01 -07:00
Arun Sharma 921a48428e Optimize for lp64
Summary:
Some code reorganization in-preparation for replacing with a hardware
instruction.

* Use u64 for some of the key types
* Use an ALIGN macro so code is easier to read

Test Plan: crc32c_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3135
2012-05-14 15:40:11 -07:00
Sanjay Ghemawat 85584d497e Added bloom filter support.
In particular, we add a new FilterPolicy class.  An instance
of this class can be supplied in Options when opening a
database.  If supplied, the instance is used to generate
summaries of keys (e.g., a bloom filter) which are placed in
sstables.  These summaries are consulted by DB::Get() so we
can avoid reading sstable blocks that are guaranteed to not
contain the key we are looking for.

This change provides one implementation of FilterPolicy
based on bloom filters.

Other changes:
- Updated version number to 1.4.
- Some build tweaks.
- C binding for CompactRange.
- A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
- Minor .gitignore update.
2012-04-17 08:36:46 -07:00
Sanjay Ghemawat 9013f13b15 use mmap on 64-bit machines to speed-up reads; small build fixes 2012-03-15 09:14:00 -07:00
Sanjay Ghemawat 3c8be108bf fixed issues 66 (leaking files on disk error) and 68 (no sync of CURRENT file) 2012-01-25 14:56:52 -08:00
Hans Wennborg 42fb47f6ed Pass system's CFLAGS, remove exit time destructor, sstable bug fix.
- Pass system's values of CFLAGS,LDFLAGS.
  Don't override OPT if it's already set.
  Original patch by Alessio Treglia <alessio@debian.org>:
  http://code.google.com/p/leveldb/issues/detail?id=27#c6

- Remove 1 exit time destructor from leveldb.
  See http://crbug.com/101600

- Fix problem where sstable building code would pass an
  internal key to the user comparator.

(Sync with uptream at 25436817.)
2011-11-14 17:06:16 +00:00
Hans Wennborg 36a5f8ed7f A number of fixes:
- Replace raw slice comparison with a call to user comparator.
  Added test for custom comparators.

- Fix end of namespace comments.

- Fixed bug in picking inputs for a level-0 compaction.

  When finding overlapping files, the covered range may expand
  as files are added to the input set.  We now correctly expand
  the range when this happens instead of continuing to use the
  old range.  For example, suppose L0 contains files with the
  following ranges:

      F1: a .. d
      F2:    c .. g
      F3:       f .. j

  and the initial compaction target is F3.  We used to search
  for range f..j which yielded {F2,F3}.  However we now expand
  the range as soon as another file is added.  In this case,
  when F2 is added, we expand the range to c..j and restart the
  search.  That picks up file F1 as well.

  This change fixes a bug related to deleted keys showing up
  incorrectly after a compaction as described in Issue 44.

(Sync with upstream @25072954)
2011-10-31 17:22:06 +00:00
Gabor Cselle 299ccedfec A number of bugfixes:
- Added DB::CompactRange() method.

  Changed manual compaction code so it breaks up compactions of
  big ranges into smaller compactions.

  Changed the code that pushes the output of memtable compactions
  to higher levels to obey the grandparent constraint: i.e., we
  must never have a single file in level L that overlaps too
  much data in level L+1 (to avoid very expensive L-1 compactions).

  Added code to pretty-print internal keys.

- Fixed bug where we would not detect overlap with files in
  level-0 because we were incorrectly using binary search
  on an array of files with overlapping ranges.

  Added "leveldb.sstables" property that can be used to dump
  all of the sstables and ranges that make up the db state.

- Removing post_write_snapshot support.  Email to leveldb mailing
  list brought up no users, just confusion from one person about
  what it meant.

- Fixing static_cast char to unsigned on BIG_ENDIAN platforms.

  Fixes	Issue 35 and Issue 36.

- Comment clarification to address leveldb Issue 37.

- Change license in posix_logger.h to match other files.

- A build problem where uint32 was used instead of uint32_t.

Sync with upstream @24408625
2011-10-05 16:30:28 -07:00
Hans Wennborg 213a68eb68 Sync with upstream @23860137.
Fix GCC -Wshadow warnings in LevelDB's public header files,
reported by Dustin.

Add in-memory Env implementation (helpers/memenv/*).
This enables users to create LevelDB databases in-memory.

Initialize ShardedLRUCache::last_id_ to zero.
This fixes a Valgrind warning.

(Also delete port/sha1_* which were removed upstream some time ago.)
2011-09-12 10:21:10 +01:00
gabor@google.com e3584f9c28 Bugfix for issue 33; reduce lock contention in Get(), parallel benchmarks.
- Fix for issue 33 (non-null-terminated result from
  leveldb_property_value())

- Support for running multiple instances of a benchmark in parallel.

- Reduce lock contention on Get():
  (1) Do not hold the lock while searching memtables.
  (2) Shard block and table caches 16-ways.

  Benchmark for evaluating this change:
  $ db_bench --benchmarks=fillseq1,readrandom --threads=$n
  (fillseq1 is a small hack to make sure fillseq runs once regardless
  of number of threads specified on the command line).



git-svn-id: https://leveldb.googlecode.com/svn/trunk@49 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-22 21:08:51 +00:00
gabor@google.com ab323f7e1e Bugfixes for iterator and documentation.
- Fix bug in Iterator::Prev where it would return the wrong key.
  Fixes issues 29 and 30.

- Added a tweak to testharness to allow running just some tests.

- Fixing two minor documentation errors based on issues 28 and 25.

- Cleanup; fix namespaces of export-to-C code.
  Also fix one "const char*" vs "char*" mismatch.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@48 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-16 01:21:01 +00:00
dgrogan@chromium.org a05525d13b @23023120
git-svn-id: https://leveldb.googlecode.com/svn/trunk@47 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-06 00:19:37 +00:00
gabor@google.com f122c6dfbb Adding FreeBSD support, removing Chromium files, adding benchmark.
- LevelDB patch for FreeBSD. This resolves Issue 22.
  Contributed by dforsythe (thanks!).

- Removing Chromium-specific files.
  They are now going to live in the Chromium repository.

- Adding a benchmark page comparing LevelDB performance
  to SQLite and Kyoto Cabinet's TreeDB, along with
  code to generate the benchmarks.
  Thanks to Kevin Tseng for compiling the benchmarks,
  and Scott Hess and Mikio Hirabayashi for their
  help and advice.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@40 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 01:46:25 +00:00
gabor@google.com 60bd8015f2 Speed up Snappy uncompression, new Logger interface.
- Removed one copy of an uncompressed block contents changing
  the signature of Snappy_Uncompress() so it uncompresses into a
  flat array instead of a std::string.
        
  Speeds up readrandom ~10%.

- Instead of a combination of Env/WritableFile, we now have a
  Logger interface that can be easily overridden applications
  that want to supply their own logging.

- Separated out the gcc and Sun Studio parts of atomic_pointer.h
  so we can use 'asm', 'volatile' keywords for Sun Studio.




git-svn-id: https://leveldb.googlecode.com/svn/trunk@39 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-21 02:40:18 +00:00
gabor@google.com 6872ace901 Sun Studio support, and fix for test related memory fixes.
- LevelDB patch for Sun Studio
  Based on a patch submitted by Theo Schlossnagle - thanks!
  This fixes Issue 17.

- Fix a couple of test related memory leaks.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@38 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-19 23:36:47 +00:00
gabor@google.com 6699c7ebe6 Small tweaks and bugfixes for Issue 18 and 19.
Slight tweak to the no-overlap optimization: only push to
level 2 to reduce the amount of wasted space when the same
small key range is being repeatedly overwritten.

Fix for Issue 18: Avoid failure on Windows by avoiding
deletion of lock file until the end of DestroyDB().

Fix for Issue 19: Disregard sequence numbers when checking for 
overlap in sstable ranges. This fixes issue 19: when writing 
the same key over and over again, we would generate a sequence 
of sstables that were never merged together since their sequence
numbers were disjoint.

Don't ignore map/unmap error checks.

Miscellaneous fixes for small problems Sanjay found while diagnosing
issue/9 and issue/16 (corruption_testr failures).
- log::Reader reports the record type when it finds an unexpected type.
- log::Reader no longer reports an error when it encounters an expected
  zero record regardless of the setting of the "checksum" flag.
- Added a missing forward declaration.
- Documented a side-effects of larger write buffer sizes
  (longer recovery time).



git-svn-id: https://leveldb.googlecode.com/svn/trunk@37 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-15 00:20:57 +00:00
gabor@google.com f57e23351f Platform detection during build, plus compatibility patches for machines without <cstdatomic>.
This revision adds two major changes:
1. build_detect_platform which generates build_config.mk
   with platform-dependent flags for the build process
2. /port/atomic_pointer.h with anAtomicPointerimplementation
   for platforms without <cstdatomic>

Some of this code is loosely based on patches submitted to the 
LevelDB mailing list at https://groups.google.com/forum/#!forum/leveldb
Tip of the hat to Dave Smith and Edouard A, who both sent patches.

The presence of Snappy (http://code.google.com/p/snappy/) and
cstdatomic are now both detected in the build_detect_platform
script (1.) which gets executing during make.

For (2.), instead of broadly importing atomicops_* from Chromium or
the Google performance tools, we chose to just implement AtomicPointer 
and the limited atomic load and store operations it needs. 
This resulted in much less code and fewer files - everything is 
contained in atomic_pointer.h.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@34 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-29 00:30:50 +00:00
dgrogan@chromium.org 740d8b3d00 Update from upstream @21551990
* Patch LevelDB to build for OSX and iOS
* Fix race condition in memtable iterator deletion.
* Other small fixes.

git-svn-id: https://leveldb.googlecode.com/svn/trunk@29 62dab493-f737-651d-591e-8d6aee1b9529
2011-05-28 00:53:58 +00:00
dgrogan@chromium.org da79909507 sync with upstream @ 21409451
Check the NEWS file for details of what changed.

git-svn-id: https://leveldb.googlecode.com/svn/trunk@28 62dab493-f737-651d-591e-8d6aee1b9529
2011-05-21 02:17:43 +00:00
dgrogan@chromium.org be9f061d2f pull in hans' mac build fix
git-svn-id: https://leveldb.googlecode.com/svn/trunk@26 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-21 01:54:51 +00:00
dgrogan@chromium.org ba6dac0e80 @20776309
* env_chromium.cc should not export symbols.
* Fix MSVC warnings.
* Removed large value support.
* Fix broken reference to documentation file

git-svn-id: https://leveldb.googlecode.com/svn/trunk@24 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-20 22:48:11 +00:00
dgrogan@chromium.org 69c6d38342 reverting disastrous MOE commit, returning to r21
git-svn-id: https://leveldb.googlecode.com/svn/trunk@23 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-19 23:11:15 +00:00
dgrogan@chromium.org b743906eea Revision created by MOE tool push_codebase.
MOE_MIGRATION=


git-svn-id: https://leveldb.googlecode.com/svn/trunk@22 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-19 23:01:25 +00:00
dgrogan@chromium.org b409afe968 chmod a-x
git-svn-id: https://leveldb.googlecode.com/svn/trunk@21 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-18 23:15:58 +00:00
dgrogan@chromium.org f779e7a5d8 @20602303. Default file permission is now 755.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@20 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-12 19:38:58 +00:00
jorlow@chromium.org 4671a695fc Move include files into a leveldb subdir.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@18 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-30 18:35:40 +00:00
jorlow@chromium.org 4d66fd5af3 Upstream change.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@17 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-29 22:41:11 +00:00
jorlow@chromium.org e2da744e12 Upstream changes.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@16 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-28 20:43:44 +00:00
jorlow@chromium.org e11bdf1935 Upstream changes
git-svn-id: https://leveldb.googlecode.com/svn/trunk@15 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-25 20:27:43 +00:00
jorlow@chromium.org 8303bb1b33 Pull from upstream.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@14 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-22 23:24:02 +00:00
jorlow@chromium.org 6d243ebf79 Make GetTestDirectory threadsafe within Chromium and make it work on Windows.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@13 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-22 19:07:54 +00:00
jorlow@chromium.org 4bcb231187 more upstream changes
git-svn-id: https://leveldb.googlecode.com/svn/trunk@10 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-21 21:06:49 +00:00
jorlow@chromium.org 0e38925490 Sync in bug fixes
git-svn-id: https://leveldb.googlecode.com/svn/trunk@9 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-21 19:40:57 +00:00
jorlow@chromium.org f67e15e50f Initial checkin.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@2 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-18 22:37:00 +00:00