Commit graph

2246 commits

Author SHA1 Message Date
Yueh-Hsuan Chiang 70828557ef Some fixes on size compensation logic for deletion entry in compaction
Summary:
This patch include two fixes:
1. newly created Version will now takes the aggregated stats for average-value-size from the latest Version.
2. compensated size of a file is now computed only for newly created / loaded file, this addresses the issue where files are already sorted by their compensated file size but might sometimes observe some out-of-order due to later update on compensated file size.

Test Plan:
export ROCKSDB_TESTS=CompactionDele
./db_test

Reviewers: ljin, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19557
2014-07-09 12:46:08 -07:00
Ankit Gupta 0a4d930264 Caching methodId and fieldId is fine: v2 2014-07-09 09:09:08 -07:00
Ankit Gupta b6caaea9d3 Caching methodId and fieldId is fine 2014-07-09 09:06:40 -07:00
Ankit Gupta 21e522673f Revert wrong commit 2014-07-09 09:06:20 -07:00
Ankit Gupta 05bd545000 Caching methodId and fieldId is fine 2014-07-09 09:02:05 -07:00
Lei Jin ef1aad97f9 fix one more internal_stats issue
Summary: stall count is wrong

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19539
2014-07-08 15:29:13 -07:00
Lei Jin 73d7147096 make rate limiter test more reliable
Summary:
Randomize keys so that compaction actually happens.
Change the config so that compaction happens more aggressively.
The test takes longer time, but the results are more stable shown by
iostat

Test Plan: ran it

Reviewers: igor, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19533
2014-07-08 15:15:00 -07:00
Lei Jin 8a9cc7885c report correct interval amplification
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19515
2014-07-08 12:48:10 -07:00
Lei Jin 534357ca3a integrate rate limiter into rocksdb
Summary:
Add option and plugin rate limiter for PosixWritableFile. The rate
limiter only applies to flush and compaction. WAL and MANIFEST are
excluded from this enforcement.

Test Plan: db_test

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19425
2014-07-08 12:31:49 -07:00
Lei Jin 5ef1ba7ff5 generic rate limiter
Summary:
A generic rate limiter that can be shared by threads and rocksdb
instances. Will use this to smooth out write traffic generated by
compaction and flush. This will help us get better p99 behavior on flash
storage.

Test Plan:
unit test output
==== Test RateLimiterTest.Rate
request size [1 - 1023], limit 10 KB/sec, actual rate: 10.374969 KB/sec, elapsed 2002265
request size [1 - 2047], limit 20 KB/sec, actual rate: 20.771242 KB/sec, elapsed 2002139
request size [1 - 4095], limit 40 KB/sec, actual rate: 41.285299 KB/sec, elapsed 2202424
request size [1 - 8191], limit 80 KB/sec, actual rate: 81.371605 KB/sec, elapsed 2402558
request size [1 - 16383], limit 160 KB/sec, actual rate: 162.541268 KB/sec, elapsed 3303500

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19359
2014-07-08 11:41:57 -07:00
Lei Jin b278ae8e50 Apply fractional cascading in ForwardIterator::Seek()
Summary:
Use search hint to reduce FindFile range thus avoid comparison
For a small DB with 50M keys, perf_context counter shows it reduces
comparison from 2B to 1.3B for a 15-minute run. No perf change was
observed for 1 seek thread, but quite good improvement was seen for 32
seek threads, when CPU was busy.
will post detail results when ready

Test Plan: db_bench and db_test

Reviewers: haobo, sdong, dhruba, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18879
2014-07-08 11:40:42 -07:00
Igor Canadi 1b95bf734d Merge pull request #197 from rdallman/update-options
C API: update options w/ convenience funcs & fifo compaction
2014-07-08 11:40:11 -07:00
Reed Allman fd3fb4b0bf C API: update options w/ convenience funcs & fifo compaction 2014-07-08 10:57:45 -07:00
Igor Canadi db1a10e9b8 Merge pull request #198 from rdallman/cf-compact-range
C API: bugfix column_family_compact_range
2014-07-08 08:51:41 -07:00
Igor Canadi 5491c486eb Merge pull request #199 from jrobeson/patch-1
facebook accounts are not required for CLA signers
2014-07-08 08:49:49 -07:00
Johnny Robeson 058421dea8 facebook accounts are not required for CLA signers
slightly reword the sentence to avoid mentioning it.
2014-07-08 05:57:54 -04:00
Reed Allman e9b18b6b89 C API: bugfix column_family_comact_range 2014-07-07 21:48:49 -07:00
Ankit Gupta 4216ca36ae Class IDs and method IDs should not be cached 2014-07-07 21:45:02 -07:00
Ankit Gupta 637744ce15 Package .so file with JNI jar 2014-07-07 20:50:21 -07:00
sdong 01159aa802 stackable_db: add default function for GetLiveFilesMetaData()
Summary: stackable_db doesn't have GetLiveFilesMetaData() implemented. Add it

Test Plan: make all check

Reviewers: yhchiang, ljin, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19491
2014-07-07 17:02:57 -07:00
Igor Canadi 4adf64e068 Fix compile issue 2014-07-07 14:54:11 -07:00
Igor Canadi 862227769a Merge pull request #194 from ankgup87/master
Add compression type to options
2014-07-07 14:48:48 -07:00
Igor Canadi 8a03935f8c Fix valgrind error in c_test
Summary:
External contribution caused some valgrind errors: 1a34aaaef0

This diff fixes them

Test Plan: ran valgrind

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19485
2014-07-07 14:41:54 -07:00
Igor Canadi 8f6e9ab209 Merge pull request #191 from edsrzf/c-api-compaction-filter-factory
C API: Add support for compaction filter factories (v1)
2014-07-07 13:42:46 -07:00
Evan Shaw 13a130cc00 C API: Add test for compaction filter factories
Also refactored the compaction filter tests to share some code and ensure that
options were getting reset so future test results aren't confused.
2014-07-08 08:12:36 +12:00
Evan Shaw 3f7104d7c5 C API: Allow setting compaction filter factory 2014-07-08 08:12:36 +12:00
Evan Shaw 91bede79cc C API: Add support for compaction filter factories (v1) 2014-07-08 08:12:36 +12:00
Evan Shaw 0bf5589c74 Fix compaction_filter.h typos 2014-07-08 08:11:01 +12:00
Ankit Gupta 97bfcd6a16 Update doc of options.cc 2014-07-07 12:26:06 -07:00
Ankit Gupta c9cfb88256 Remove doc from options class 2014-07-07 12:21:15 -07:00
Ankit Gupta 4efca96bb3 Change enum type from int to byte 2014-07-07 12:18:55 -07:00
Radheshyam Balasundaram f0660d5253 Adding NUMA support to db_bench tests
Summary:
Changes:
- Adding numa_aware flag to db_bench.cc
- Using numa.h library to bind memory and cpu of threads to a fixed NUMA node
Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op.

Test Plan:
Ran db_bench tests using following command:
./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True]

The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True.

Reviewers: sdong, yhchiang, ljin, igor

Reviewed By: ljin, igor

Subscribers: igor, leveldb

Differential Revision: https://reviews.facebook.net/D19353
2014-07-07 10:53:31 -07:00
Ankit Gupta e8c592c625 Add newline 2014-07-07 10:25:27 -07:00
Igor Canadi 0bc5fa9f40 Merge pull request #190 from edsrzf/c-api-writebatch-serialized
C API: support constructing write batch from serialized representation
2014-07-07 10:17:43 -07:00
Ankit Gupta 47f0cf6d38 Add docs 2014-07-07 10:06:28 -07:00
Ankit Gupta da12f9ec4e Remove .so package with JNI 2014-07-07 10:01:28 -07:00
Ankit Gupta 54d7a2c0c5 Add compression type to options 2014-07-07 09:58:54 -07:00
Igor Canadi efbb4bd387 Merge pull request #193 from rdallman/columnfamilies
C API: column family support
2014-07-07 09:47:30 -07:00
Ankit Gupta bc708e0012 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-07-07 09:14:32 -07:00
Reed Allman 1a34aaaef0 C API: column family support 2014-07-07 01:41:01 -07:00
Evan Shaw 9fc23d0c56 C API: support constructing write batch from serialized representation 2014-07-06 10:36:33 +12:00
Yueh-Hsuan Chiang 7b85c1e900 Improve SimpleWriteTimeoutTest to avoid false alarm.
Summary:
SimpleWriteTimeoutTest has two parts: 1) insert two large key/values
to make memtable full and expect both of them are successful; 2) insert
another key / value and expect it to be timed-out.  Previously we also
set a timeout in the first step, but this might sometimes cause
false alarm.

This diff makes the first two writes run without timeout setting.

Test Plan:
export ROCKSDB_TESTS=Time
make db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19461
2014-07-04 00:02:12 -07:00
Lei Jin 8d9a46fcd1 initialize decoded_internal_key_valid
Summary:
ReadInternalKey() will assign correct value anyway. Initialize it to
true to suppress compiler error reported
https://github.com/facebook/rocksdb/issues/186

Test Plan: I cannot reproduce it but this is obvious

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19467
2014-07-03 23:13:08 -07:00
Yueh-Hsuan Chiang d33657a4a5 Fixed a warning in release mode.
Summary: Removed a variable that is only used in assertion check.

Test Plan: make release

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19455
2014-07-03 17:19:17 -07:00
Yueh-Hsuan Chiang 90a6aca48e Finer report I/O stats about Flush and Compaction.
Summary:
This diff allows the I/O stats about Flush and Compaction to be reported
in a more accurate way.  Instead of measuring the size of a file, it
measure I/O cost in per read / write basis.

Test Plan: make all check

Reviewers: sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19383
2014-07-03 16:28:03 -07:00
Yueh-Hsuan Chiang a1df6c1fc8 Update HISTORY.md to include TimeOut write API and compaction update.
Summary: Update HISTORY.md to include TimeOut write API and compaction update.

Test Plan: n/a

Reviewers: ljin, sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19449
2014-07-03 15:58:27 -07:00
Yueh-Hsuan Chiang d4d338de33 Add timeout_hint_us to WriteOptions and introduce Status::TimeOut.
Summary:
This diff adds timeout_hint_us to WriteOptions.  If it's non-zero, then
1) writes associated with this options MAY be aborted when it has been
  waiting for longer than the specified time.  If an abortion happens,
  associated writes will return Status::TimeOut.
2) the stall time of the associated write caused by flush or compaction
  will be limited by timeout_hint_us.

The default value of timeout_hint_us is 0 (i.e., OFF.)

The statistics of timeout writes will be recorded in WRITE_TIMEDOUT.

Test Plan:
export ROCKSDB_TESTS=WriteTimeoutAndDelayTest
make db_test
./db_test

Reviewers: igor, ljin, haobo, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18837
2014-07-03 15:47:02 -07:00
Ankit Gupta 6b835c6009 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-07-03 14:13:46 -07:00
Ankit Gupta 70aa243fa1 Package .so file with .jar 2014-07-03 14:13:05 -07:00
Igor Canadi 4203431e71 Fix mac os compile error 2014-07-03 23:03:24 +02:00