Commit Graph

289 Commits

Author SHA1 Message Date
Jay Zhuang 98c1333806 Disable a known flaky test: RandomAccessUniqueIDDeletes (#7511)
Summary:
It's a known issue, which is tracked in https://github.com/facebook/rocksdb/issues/7405, https://github.com/facebook/rocksdb/issues/7470. Disable it for
now.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7511

Reviewed By: zhichao-cao

Differential Revision: D24145075

Pulled By: jay-zhuang

fbshipit-source-id: 1858497972f2baba617867aaeac30d93b8305c80
2020-10-08 09:40:59 -07:00
Zhichao Cao 4146276885 Add ldb_cmd_test to ASSERT_STATUS_CHECKED list (#7499)
Summary:
Add ldb_cmd_test to ASSERT_STATUS_CHECKED list

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7499

Test Plan: pass ASSERT_STATUS_CHECKED=1 make -j48 ldb_cmd_test

Reviewed By: cheng-chang

Differential Revision: D24086203

Pulled By: zhichao-cao

fbshipit-source-id: 29592202b1d4335e566de15e7937269d98d57841
2020-10-08 00:00:48 -07:00
Andrew Kryczka 1600aac46f Flush info log for warning and higher severity (#7462)
Summary:
After unclean crash, the tail of the log could look as follows due to block buffering, even when the call to `ROCKSDB_LOG_ERROR()` finished.

```
2020/09/29-13:54:39.596710 7f67025fe700 [ERROR] [/db_impl/db_impl_compaction_flush.cc:2500] Waiting after background compaction err
```

This PR forces the flush while logging warning severity or higher to prevent that case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7462

Reviewed By: riversand963

Differential Revision: D24000154

Pulled By: ajkr

fbshipit-source-id: 3bf5f1e69a62ee10e84095cebc88937a8f81b4ad
2020-09-29 16:06:14 -07:00
Zhichao Cao 0ce9b3a22d Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419)
Summary:
Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. This PR only include the API definition, no functional codes are added to unblock other developers which depend on these APIs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7419

Test Plan: make -j32

Reviewed By: pdillinger

Differential Revision: D23883196

Pulled By: zhichao-cao

fbshipit-source-id: 94676c26bc56144cc32e3661f84f21eccd790411
2020-09-23 19:02:26 -07:00
Akanksha Mahajan 98ac6b646a Add IO Tracer Parser (#7333)
Summary:
Implement a parsing tool io_tracer_parser that takes IO trace file (binary file) with command line argument --io_trace_file and output file with --output_file and dumps the IO trace records in outputfile in human readable form.

Also added unit test cases that generates IO trace records and calls io_tracer_parse to parse those records.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333

Test Plan:
make check -j64,
 Add unit test cases.

Reviewed By: anand1976

Differential Revision: D23772360

Pulled By: akankshamahajan15

fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
2020-09-23 15:50:26 -07:00
Xavier Deguillard 249f2b59a0 build: make it compile with @mode/win (#7406)
Summary:
While rocksdb can compile on both macOS and Linux with Buck, it couldn't be
compiled on Windows. The only way to compile it on Windows was with the CMake
build.

To keep the multi-platform complexity low, I've simply included all the Windows
bits in the TARGETS file, and added large #if blocks when not on Windows, the
same was done on the posix specific files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7406

Test Plan:
On my devserver:
  buck test //rocksdb/...
On Windows:
  buck build mode/win //rocksdb/src:rocksdb_lib

Reviewed By: pdillinger

Differential Revision: D23874358

Pulled By: xavierd

fbshipit-source-id: 8768b5d16d7e8f44b5ca1e2483881ca4b24bffbe
2020-09-23 12:55:54 -07:00
mrambacher 67bd5401e9 Changes to EncryptedEnv public API (#7279)
Summary:
Cleaned up the public API to use the EncryptedEnv.  This change will allow providers to be developed and added to the system easier in the future.  It will also allow better integration in the future with the OPTIONS file.

- The internal classes were moved out of the public API into an internal "env_encryption_ctr.h" header.  Short-cut constructors were added to provide the original API functionality.
- The APIs to the constructors were changed to take shared_ptr, rather than raw pointers or references to allow better memory management and alternative implementations.
- CreateFromString methods were added to allow future expansion to other provider and cipher implementations through a standard API.

Additionally, there was a code duplication in the NewXXXFile methods.  This common code was moved under a templatized function.

A first-pass at structuring the code was made to potentially allow multiple EncryptionProviders in a single EncryptedEnv.  The idea was that different providers may use different cipher keys or different versions/algorithms.  The EncryptedEnv should have some means of picking different providers based on information.  The groundwork was started for this (the use of the provider_ member variable was localized) but the work has not been completed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7279

Reviewed By: jay-zhuang

Differential Revision: D23709440

Pulled By: zhichao-cao

fbshipit-source-id: 0e845fff0e03a52603eb9672b4ade32d063ff2f2
2020-09-15 17:14:10 -07:00
Akanksha Mahajan 0de335e076 Use FSRandomRWFilePtr Object to call underlying file system. (#7198)
Summary:
Replace FSRandomRWFile pointer with FSRandomRWFilePtr object in the rocksdb internal code.
This new object wraps FSRandomRWFile pointer.

Objective: If tracing is enabled, FSRandomRWFile object returns FSRandomRWFileTracingWrapper pointer that includes all necessary information in IORecord and calls underlying FileSystem and invokes IOTracer to dump that record in a binary file. If tracing is disabled then, underlying FileSystem pointer is returned directly.
FSRandomRWFilePtr wrapper class is added to bypass the FSRandomRWFileWrapper when
tracing is disabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7198

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D23421116

Pulled By: akankshamahajan15

fbshipit-source-id: 8a5ba0e7d9c1ba34c3a6f29829b107c5f09ab6a3
2020-09-08 12:21:58 -07:00
Akanksha Mahajan b175eceb09 Store FSWritableFilePtr object in WritableFileWriter (#7193)
Summary:
Replace FSWritableFile pointer with FSWritableFilePtr
    object in WritableFileWriter.
    This new object wraps FSWritableFile pointer.

    Objective: If tracing is enabled, FSWritableFile Ptr returns
    FSWritableFileTracingWrapper pointer that includes all necessary
    information in IORecord and calls underlying FileSystem and invokes
    IOTracer to dump that record in a binary file. If tracing is disabled
    then, underlying FileSystem pointer is returned directly.
    FSWritableFilePtr wrapper class is added to bypass the
    FSWritableFileWrapper when
    tracing is disabled.

    Test Plan: make check -j64

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7193

Reviewed By: anand1976

Differential Revision: D23355915

Pulled By: akankshamahajan15

fbshipit-source-id: e62a27a13c1fd77e36a6dbafc7006d969bed25cf
2020-09-08 10:56:08 -07:00
Peter Dillinger 4e258d3e63 Fix backup/restore in stress/crash test (#7357)
Summary:
(1) Skip check on specific key if restoring an old backup
(small minority of cases) because it can fail in those cases. (2) Remove
an old assertion about number of column families and number of keys
passed in, which is broken by atomic flush (cf_consistency) test. Like
other code (for better or worse) assume a single key and iterate over
column families. (3) Apply mock_direct_io to NewSequentialFile so that
db_stress backup works on /dev/shm.

Also add more context to output in case of backup/restore db_stress
failure.

Also a minor fix to BackupEngine to report first failure status in
creating new backup, and drop another clue about the potential
source of a "Backup failed" status.

Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357

Test Plan:
Using backup_one_in=10000,
"USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
"USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb

Reviewed By: riversand963

Differential Revision: D23567244

Pulled By: pdillinger

fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
2020-09-08 10:50:19 -07:00
Akanksha Mahajan 8e0df9050c Store FSRandomAccessPtr object in RandomAccessFileReader (#7192)
Summary:
Replace FSRandomAccessFile pointer with FSRandomAccessFilePtr
    object in RandomAccessFileReader.
    This new object wraps FSRandomAccessFile pointer.

    Objective: If tracing is enabled, FSRandomAccessFile Ptr returns
    FSRandomAccessFileTracingWrapper pointer that includes all necessary
    information in IORecord and calls underlying FileSystem and invokes
    IOTracer to dump that record in a binary file. If tracing is disabled
    then, underlying FileSystem pointer is returned directly.
    FSRandomAccessFilePtr wrapper class is added to bypass the FSRandomAccessFileWrapper when
    tracing is disabled.

    Test Plan: make check -j64

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7192

Reviewed By: anand1976

Differential Revision: D23356867

Pulled By: akankshamahajan15

fbshipit-source-id: 48f31168166a17a7444b40be44a9a9d4a5c7182c
2020-08-27 11:21:52 -07:00
mrambacher e9befdebbf Add EnvTestWithParam::OptionsTest to the ASSERT_STATUS_CHECKED passes (#7283)
Summary:
This test uses database functionality and required more extensive work to get it to pass than the other tests.  The DB functionality required for this test now passes the check.

When it was unclear what the proper behavior was for unchecked status codes, a TODO was added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7283

Reviewed By: akankshamahajan15

Differential Revision: D23251497

Pulled By: ajkr

fbshipit-source-id: 52b79629bdafa0a58de8ead1d1d66f141b331523
2020-08-20 19:18:35 -07:00
Akanksha Mahajan cc24ac14eb Store FSSequentialFilePtr object in SequenceFileReader (#7190)
Summary:
This diff contains following changes:
    1. Replace `FSSequentialFile` pointer with `FSSequentialFilePtr` object that wraps `FSSequentialFile` pointer in `SequenceFileReader`.

Objective: If tracing is enabled, `FSSequentialFilePtr` returns `FSSequentialFileTracingWrapper` pointer that includes all necessary information in `IORecord` and calls underlying FileSystem and invokes `IOTracer` to dump that record in a binary file. If tracing is disabled then, underlying `FileSystem` pointer is returned directly. `FSSequentialFilePtr` wrapper class is added to bypass the `FSSequentialFileTracingWrapper` when tracing is disabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7190

Test Plan:
make check -j64
          COMPILE_WITH_TSAN=1 make check -j64

Reviewed By: anand1976

Differential Revision: D23059616

Pulled By: akankshamahajan15

fbshipit-source-id: 1564b94dd1297cd0fbfe2ed5c9cc3e20f7395301
2020-08-18 16:20:54 -07:00
Hans Holmberg 2a0d3c7054 Add a file system parameter: --fs_uri to db_stress and db_bench (#6878)
Summary:
This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system.

This makes it easier to develop and test new RocksDB FileSystems.

The pull request also registers the posix file system for testing purposes.

Examples:
```
$./db_bench --fs_uri=posix:// --benchmarks=fillseq

$./db_stress --fs_uri=zenfs://nullb1
```

zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878

Reviewed By: siying

Differential Revision: D23023063

Pulled By: ajkr

fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247
2020-08-17 11:55:24 -07:00
Akanksha Mahajan 1f9f630b27 Store FileSystemPtr object that contains FileSystem ptr (#7180)
Summary:
As part of the IOTracing project, this PR
    1. Caches "FileSystemPtr" object(wrapper class that returns file system pointer based on tracing enabled) instead of "FileSystem" pointer.
    2. FileSystemPtr object is created using FileSystem pointer and IOTracer
    pointer.
    3. IOTracer shared_ptr is created in DBImpl and it is passed to different classes through constructor.
    4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr
    returns FileSystemTracingWrapper pointer for tracing purpose and when
    it is disabled underlying FileSystem pointer is returned.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180

Test Plan:
make check -j64
                COMPILE_WITH_TSAN=1 make check -j64

Reviewed By: anand1976

Differential Revision: D22987117

Pulled By: akankshamahajan15

fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1
2020-08-12 17:31:23 -07:00
Yingchun Lai 67bbac3621 Remove duplicate colon in Status message (#7041)
Summary:
A colon will be added after 'msg' automatically when invoke function Status(Code _code, const Slice& msg, const Slice& msg2),
it's not needed to append a colon explicitly to 'msg'.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7041

Reviewed By: ajkr

Differential Revision: D22292801

fbshipit-source-id: 8f2d69065bb779d2613468bf9fc9169f32c3f1ec
2020-08-06 15:18:04 -07:00
Akanksha Mahajan 493f425e77 Add support to start and end IOTracing through DB APIs (#7203)
Summary:
1. Add support to start io tracing through DB::StartIOTrace(Env*, const TraceOptions&, std::unique_ptr<TraceWriter>&&) and end tracing through DB::EndIOTrace(). This doesn't trace DB::Open.

User side code:

//Open DB
DB::Open(options, dbname, &db);

/* Start tracing */
db->StartIOTrace(env, trace_opt, std::move(trace_writer));

/* Perform Operations */

/*End tracing*/
db->EndIOTrace();

2. Fix the build errors for Windows.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7203

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D22901947

Pulled By: akankshamahajan15

fbshipit-source-id: e59c0b785a802168e6f1aa028d99c224a35cb30c
2020-08-04 18:41:45 -07:00
mrambacher d9d190742c Make env*_test work with ASSERT_STATUS_CHECKED (#7176)
Summary:
Make (most of) the env*_test pass when ASSERT_STATUS_CHECKED is enabled.

One test that opens a database is currently disabled in this mode, as there are many errors that need revisited for DB tests and status checks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176

Reviewed By: cheng-chang

Differential Revision: D22799278

Pulled By: ajkr

fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed
2020-07-28 22:59:48 -07:00
Akanksha Mahajan d93bd3ce25 Add FileSystem wrapper classes for IO tracing. (#7002)
Summary:
1. Add the wrapper classes FileSystemTracingWrapper, FSSequentialFileTracingWrapper, FSRandomAccessFileTracingWrapper, FSWritableFileTracingWrapper, FSRandomRWFileTracingWrapper that forward the calls to underlying storage system and then pass the file operation information to IOTracer. IOTracer dumps the record in binary format for tracing.
2. Add the wrapper classes FileSystemPtr, FSSequentialFilePtr, FSRandomAccessFilePtr, FSWritableFilePtr and FSRandomRWFilePtr that overload operator-> and return ptr to underlying storage system or Tracing wrapper class based on enabling/disabling of IO tracing. These classes are added to bypass Tracing Wrapper classes when we disable tracing.
3. Add enums in trace.h that distinguish which options need to be added for different file operations(Read, close, write etc) as part of tracing record.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7002

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D22127897

Pulled By: akankshamahajan15

fbshipit-source-id: 74cff58ce5661c9a3832dfaa52483f3b2d8565e0
2020-07-13 16:36:55 -07:00
mrambacher c7c7b07f06 More Makefile Cleanup (#7097)
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env

These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.

Tested both release and debug builds via Make and CMake for both static and shared libraries.

More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097

Reviewed By: riversand963

Differential Revision: D22463160

Pulled By: pdillinger

fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
2020-07-09 14:35:17 -07:00
sdong d64cf0e4ee Move away from direct TmpDir() call in some tests (#7030)
Summary:
Some tests directly uses TmpDir() as temporary directory without adding any randomize factor. This would cause failures when tests run in parallel. Fix it by moving some of them to test::PerThreadDBPath()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7030

Test Plan: Watch existing tests pass

Reviewed By: zhichao-cao

Differential Revision: D22224710

fbshipit-source-id: 28c9932fede0a4a64670e5b5fdb08f4fb5dccdd0
2020-06-25 12:09:57 -07:00
sdong 9cc25190e1 Test CircleCI with CLANG-10 (#7025)
Summary:
It's useful to build RocksDB using a more recent clang version in CI. Add a CircleCI build and fix some issues with it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7025

Test Plan: See all tests pass.

Reviewed By: pdillinger

Differential Revision: D22215700

fbshipit-source-id: 914a729c2cd3f3ac4a627cc0ac58d4691dca2168
2020-06-24 16:22:49 -07:00
Zhichao Cao 83a4dd1a67 Fix the memory leak in Env_basic_test (#7017)
Summary:
Fix the memory leak broken asan and other test introduced by https://github.com/facebook/rocksdb/issues/6830
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7017

Test Plan: pass asan_check

Reviewed By: siying

Differential Revision: D22190289

Pulled By: zhichao-cao

fbshipit-source-id: 03a095f698b4f9d72fd9374191b17c890d7c2b56
2020-06-24 11:05:24 -07:00
Matthew Von-Maszewski 1092f19d95 Make EncryptEnv inheritable (#6830)
Summary:
EncryptEnv class is both declared and defined within env_encryption.cc.  This makes it really tough to derive new classes from that base.

This branch moves declaration of the class to rocksdb/env_encryption.h.  The change facilitates making new encryption modules (such as an upcoming openssl AES CTR pull request) possible / easy.

The only coding change was to add the EncryptEnv object to env_basic_test.cc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6830

Reviewed By: riversand963

Differential Revision: D21706593

Pulled By: ajkr

fbshipit-source-id: 64d2da95a1569ceeb9b1549c3bec5404cf4c89f0
2020-06-22 13:27:16 -07:00
Peter Dillinger 88b4210701 Remove racially charged terms "whitelist" and "blacklist" (#7008)
Summary:
We don't need them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008

Test Plan: "make check" and ensure "make crash_test" starts

Reviewed By: ajkr

Differential Revision: D22143838

Pulled By: pdillinger

fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e
2020-06-19 15:27:32 -07:00
Andrew Kryczka 312f23c92d build fixes for GNU/kFreeBSD (#6992)
Summary:
Upstream https://salsa.debian.org/mariadb-team/mariadb-10.4/-/blob/master/debian/patches/rocksdb-kfreebsd.patch
by jrtc27.

Fixes https://github.com/facebook/rocksdb/issues/5223.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6992

Reviewed By: zhichao-cao

Differential Revision: D22084150

Pulled By: ajkr

fbshipit-source-id: 1822311ba16f112a15065b2180ce89d36af9cafc
2020-06-18 09:51:28 -07:00
Cheng Chang f7613e2a9e Make it able to lower cpu priority to specific level in threadpool (#6969)
Summary:
`Env::LowerThreadPoolCPUPriority` takes a new parameter `CpuPriority` to be able to lower to a specific priority such as `CpuPriority::kIdle`, previously, the priority is always lowered to `CpuPriority::kLow`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6969

Test Plan: unit test `EnvPosixTest::LowerThreadPoolCpuPriority` added to `env_test.cc`.

Reviewed By: siying

Differential Revision: D22011169

Pulled By: cheng-chang

fbshipit-source-id: 568878c24a924912e35cef00c552d4a63431cdf4
2020-06-13 13:25:20 -07:00
Zhen Li d63f86e506 fix build with 'USE_HDFS' on windows (#6950)
Summary:
Build with "USE_HDFS" failed with below errors on Windows. This PR is trying to fix them
Severity	Code	Description	Project	File	Line	Suppression State
Error (active)	E0020	identifier "ssize_t" is undefined	rocksdb	D:\Git\rocksdb\rocksdb\env\env_hdfs.cc	127
Error (active)	E1696	cannot open source file "sys/time.h"	rocksdb	D:\Git\rocksdb\rocksdb\env\env_hdfs.cc	15
Error	C2065	'pthread_t': undeclared identifier	rocksdb	d:\git\rocksdb\rocksdb\hdfs\env_hdfs.h	166
Error	C3861	'pthread_self': identifier not found	rocksdb	d:\git\rocksdb\rocksdb\hdfs\env_hdfs.h	167
Error	C1083	Cannot open include file: 'sys/time.h': No such file or directory	rocksdb	d:\git\rocksdb\rocksdb\env\env_hdfs.cc	15
Error	C2065	'pthread_t': undeclared identifier	db_bench	d:\git\rocksdb\rocksdb\hdfs\env_hdfs.h	166
Error	C3861	'pthread_self': identifier not found	db_bench	d:\git\rocksdb\rocksdb\hdfs\env_hdfs.h	167
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6950

Test Plan:
1. manually test build with "USE_HDFS" on Windows, verified HDFS Env related function by db_bench.exe.
D:\Git\rocksdb\build\Debug>db_bench.exe --hdfs="abfs://test@rdbtest2.dfs.core.windows.net" --num=100 --benchmarks="fillseq,readseq,fillseekseq" --db="abfs://test@rdbtest2.dfs.core.windows.net/test"
2020-06-05 20:42:21,102 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-05 20:42:22,646 WARN utils.SSLSocketFactoryEx: Failed to load OpenSSL. Falling back to the JSSE default.
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 6.10
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    100
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    0.0 MB (estimated)
FileSize:   0.0 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [abfs://test@rdbtest2.dfs.core.windows.net/test]
fillseq      :    1138.350 micros/op 877 ops/sec;    0.1 MB/s
DB path: [abfs://test@rdbtest2.dfs.core.windows.net/test]
readseq      :      63.580 micros/op 15627 ops/sec;    1.7 MB/s
DB path: [abfs://test@rdbtest2.dfs.core.windows.net/test]
fillseekseq  :      45.615 micros/op 21762 ops/sec;

Reviewed By: cheng-chang

Differential Revision: D21964806

Pulled By: riversand963

fbshipit-source-id: 9d7413178ece0113d11bc4398583f7d0590d5dbd
2020-06-12 16:21:50 -07:00
Yanqin Jin a8170d774c Close file to avoid file-descriptor leakage (#6936)
Summary:
When operation on an open file descriptor fails, we should close the file descriptor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6936

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D21885458

Pulled By: riversand963

fbshipit-source-id: ba077a76b256a8537f21e22e4ec198f45390bf50
2020-06-04 14:21:15 -07:00
sdong 0b45a68c59 env_test */RunMany/* tests to run individually (#6931)
Summary:
When run */RunMany/* tests individually, e.g. ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0, they hang. It's because they insert to background thread pool without initializing them. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6931

Test Plan: Run ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0 by itself and see it passes.

Reviewed By: riversand963

Differential Revision: D21875603

fbshipit-source-id: 7f848174c1a660254a2b1f7e11cca5370793ba30
2020-06-04 09:51:38 -07:00
sdong afa3518839 Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923)
Summary:
This reverts commit 8d87e9cea1.

Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate to some other systems. Revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923

Reviewed By: pdillinger

Differential Revision: D21864799

fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
2020-06-03 15:55:03 -07:00
Hans Holmberg 0f85d163e6 Route GetTestDirectory to FileSystem in CompositeEnvWrappers (#6896)
Summary:
GetTestDirectory implies a file system operation (it creates the
default test directory if missing), so it should be routed to
the FileSystem rather than the Env.

Also remove the GetTestDirectory implementation in the PosixEnv,
since it overrides GetTestDirectory in CompositeEnv making it
impossible to override with a custom FileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6896

Reviewed By: cheng-chang

Differential Revision: D21868984

Pulled By: ajkr

fbshipit-source-id: e79bfef758d06dacef727c54b96abe62e78726fd
2020-06-03 14:57:46 -07:00
hfrt456 f005dac2d9 fix IsDirectory function in env_hdfs.cc (#6917)
Summary:
fix IsDirectory function for hdfsEnv
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6917

Reviewed By: cheng-chang

Differential Revision: D21865020

Pulled By: riversand963

fbshipit-source-id: ad69ed564d027b7bbdf4c693dd57cd02622fb3f8
2020-06-03 13:50:17 -07:00
Zhichao Cao 2adb7e3768 Fix potential overflow of unsigned type in for loop (#6902)
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902

Test Plan: pass make asan_check

Reviewed By: ajkr

Differential Revision: D21843767

Pulled By: zhichao-cao

fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
2020-06-02 15:05:07 -07:00
Adam Retter 8d87e9cea1 Update googletest from 1.8.1 to 1.10.0 (#6808)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6808

Reviewed By: anand1976

Differential Revision: D21483984

Pulled By: pdillinger

fbshipit-source-id: 70c5eff2bd54ddba469761d95e4cd4611fb8e598
2020-06-01 20:33:42 -07:00
Akanksha Mahajan a1523efcdf Status check enforcement for io_posix_test and options_settable_test (#6857)
Summary:
Added status check enforcement for io_posix_test and options_settable_test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6857

Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 check

Reviewed By: ajkr

Differential Revision: D21647904

Pulled By: akankshamahajan15

fbshipit-source-id: b7f2321eb6c141a88cd5e1270ecb7d58f00341af
2020-05-19 19:22:28 -07:00
Cheng Chang ada700b906 Re-read the whole request in direct IO mode when IO uring returns partial result (#6853)
Summary:
If both direct IO and IO uring are enabled, when IO uring returns partial result, we'll try to read the remaining part of the request, but the starting address/offset of the remaining part might not be aligned to the block size, in direct IO mode, the unaligned offset causes bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6853

Test Plan: run make check with both direct IO and IO uring enabled, this is covered by one of the continuous tests.

Reviewed By: anand1976

Differential Revision: D21603023

Pulled By: cheng-chang

fbshipit-source-id: 942f6a11ff21e1892af6c4464e02bab4c707787c
2020-05-18 17:25:57 -07:00
Derrick Pallas 5272305437 Fix FilterBench when RTTI=0 (#6732)
Summary:
The dynamic_cast in the filter benchmark causes release mode to fail due to
no-rtti.  Replace with static_cast_with_check.

Signed-off-by: Derrick Pallas <derrick@pallas.us>

Addition by peterd: Remove unnecessary 2nd template arg on all static_cast_with_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6732

Reviewed By: ltamasi

Differential Revision: D21304260

Pulled By: pdillinger

fbshipit-source-id: 6e8eb437c4ca5a16dbbfa4053d67c4ad55f1608c
2020-04-29 13:09:23 -07:00
Cheng Chang 1758f76f2d Fix unused variable of r in release mode (#6750)
Summary:
In release mode, asserts are not compiled, so `r` is not used, causing compiler warnings.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6750

Test Plan: make check under release mode

Reviewed By: anand1976

Differential Revision: D21220365

Pulled By: cheng-chang

fbshipit-source-id: fd4afa9843d54af68c4da8660ec61549803e1167
2020-04-24 15:14:13 -07:00
Cheng Chang 51bdfae010 Check alignment of MultiRead requests in direct IO mode (#6739)
Summary:
Add assertions to check direct IO's alignment requirements in MultiRead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6739

Test Plan: make check

Reviewed By: siying

Differential Revision: D21143825

Pulled By: cheng-chang

fbshipit-source-id: 26f1623b062a1851080771128feac0669a61f5e9
2020-04-23 15:19:31 -07:00
Yanqin Jin 243852ec15 Add IsDirectory() to Env and FS (#6711)
Summary:
IsDirectory() is a common API to check whether a path is a regular file or
directory.
POSIX: call stat() and use S_ISDIR(st_mode)
Windows: PathIsDirectoryA() and PathIsDirectoryW()
HDFS: FileSystem.IsDirectory()
Java: File.IsDirectory()
...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6711

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D21053520

Pulled By: riversand963

fbshipit-source-id: 680aadfd8ce982b63689190cf31b3145d5a89e27
2020-04-17 14:39:18 -07:00
Cheng Chang 2767972386 Fix warning when O_CLOEXEC is not defined (#6695)
Summary:
Compilation fails on systems that do not support O_CLOEXEC. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6695

Test Plan: compile without O_CLOEXEC support

Reviewed By: anand1976

Differential Revision: D21011850

Pulled By: cheng-chang

fbshipit-source-id: f1bf1cce2aa65c7d10b5a9613e941db30e928347
2020-04-16 11:02:50 -07:00
Yi Wu 2b02ea25e2 Add counter in perf_context to time cipher time (#6596)
Summary:
Add `encrypt_data_time` and `decrypt_data_time` perf_context counters to time encryption/decryption time when `EnvEncryption` is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6596

Test Plan: CI

Reviewed By: anand1976

Differential Revision: D20678617

fbshipit-source-id: 7b57536143aa38509cde011f704de33382169e07
2020-04-01 16:59:35 -07:00
phantomape cb671ea1ca env: Add clearerr() before repeating an interrupted file read (#6609)
Summary:
This change updates PosixSequentialFile::Read to call clearerr()
before fread()ing again after an EINTR is returned on a previous
fread.

The original fix is from bd8f1ebb91.
Fixing https://github.com/facebook/rocksdb/issues/6509

Signed-off-by: phantomape <cxucheng@outlook.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6609

Reviewed By: zhichao-cao

Differential Revision: D20731482

Pulled By: riversand963

fbshipit-source-id: 7f1f3a1449077d5560f45c465a78d08633740ba0
2020-03-29 21:56:31 -07:00
Zhichao Cao 4246888101 Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487)
Summary:
In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO related functions, the current Status cannot reflect the IO Error details such as error scope, error retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new Wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of write path and transferred to Status.

The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and Compaction). The second job is to identify the Retryable IO Error as HardError, and set the bg_error_ as HardError. In this case, the DB Instance becomes read only. User is informed of the Status and need to take actions to deal with it (e.g., call db->Resume()).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487

Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check

Reviewed By: anand1976

Differential Revision: D20685017

Pulled By: zhichao-cao

fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2020-03-27 16:04:43 -07:00
anand76 a9d168cfd7 Simplify migration to FileSystem API (#6552)
Summary:
The current Env/FileSystem API separation has a couple of issues -
1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.

This PR solves the above issues and simplifies the migration in the following ways -
1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
1a. - This also makes it more robust by ensuring that even if RocksDB
  has some stray calls to Env APIs rather than FileSystem, they will go
  through the same object and thus there is no risk of getting out of
  sync.
2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
PosixEnv with a custom FileSystem implementation. This eliminates an
indirection to call Env APIs, and relieves the FileSystem developer of
the burden of having to implement wrappers for the Env APIs.
3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
```NewLogger()```

Tests:
1. New unit tests
2. make check and make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552

Reviewed By: riversand963

Differential Revision: D20592038

Pulled By: anand1976

fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
2020-03-23 21:54:21 -07:00
Yanqin Jin fb09ef05dc Attempt to recover from db with missing table files (#6334)
Summary:
There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, previous RocksDB will just fail the recovery and return a non-ok status.
This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files, and try to recover to the most recent state without missing table file. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version.
`DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush.

Test plan (on devserver):
```
$make check
$COMPILE_WITH_ASAN=1 make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334

Reviewed By: anand1976

Differential Revision: D19778960

Pulled By: riversand963

fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
2020-03-20 19:30:48 -07:00
Cheng Chang 5fd152b7ad Get block size only in direct IO mode (#6522)
Summary:
When `use_direct_reads` and `use_direct_writes` are `false`, `logical_sector_size_` inside various `*File` implementations are not actually used, so `GetLogicalBlockSize` does not necessarily need to be called for `logical_sector_size_`, just set a default page size.

This is a follow up PR for https://github.com/facebook/rocksdb/pull/6457.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6522

Test Plan: make check

Reviewed By: siying

Differential Revision: D20408885

Pulled By: cheng-chang

fbshipit-source-id: f2d3808f41265237e7fa2c0be9f084f8fa97fe3d
2020-03-20 15:26:10 -07:00
Cheng Chang 2d9efc9ab2 Cache result of GetLogicalBufferSize in Linux (#6457)
Summary:
In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down.

This PR introduces two new APIs:
1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.

Other modifications:
1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457

Test Plan:
1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.

`make check`

Differential Revision: D20131243

Pulled By: cheng-chang

fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
2020-03-11 18:40:05 -07:00
sdong 331e6199df Include more information in file lock failure (#6507)
Summary:
When users fail to open a DB with file lock failure, it is sometimes hard for users to debug. We now include the time the lock is acquired and the thread ID that acquired the lock, to help users debug problems like this. Default Env's thread ID is used.

Since type of lockedFiles is changed, rename it to follow naming convention too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6507

Test Plan: Add a unit test and improve an existing test to validate the case.

Differential Revision: D20378333

fbshipit-source-id: 312fe0e9733fd1d1e9969c321b90ce523cf4708a
2020-03-11 16:23:08 -07:00
sumeerbhola 48d8d076a3 Add missing MutexLock to MockEnv::CreateDir (#6474)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6474

Differential Revision: D20205109

Pulled By: ltamasi

fbshipit-source-id: ec136005c63740f5b713ff537b5671ea9b8e217a
2020-03-02 20:52:19 -08:00
sdong 86f1ad7046 Add more unit test coverage to MultiRead (#6452)
Summary:
MultiRead tests in env_test cannot simulate the io_uring case when queries need to be submitted in multiple rounds. Add a new unit test to cover up more requests per MultiRead
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6452

Test Plan: Run it and see it pass when liburing is enabled or not enabled.

Differential Revision: D20078924

fbshipit-source-id: 6cff7fe345a4c5aa47135186e6181bf00df02b68
2020-02-28 16:42:44 -08:00
Michael R. Crusoe 051696bf98 fix some spelling typos (#6464)
Summary:
Found from Debian's "Lintian" program
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464

Differential Revision: D20162862

Pulled By: zhichao-cao

fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12
2020-02-28 14:14:03 -08:00
Peter Dillinger 43dde332cb Share kPageSize (and other small tweaks) (#6443)
Summary:
Make kPageSize extern const size_t (used in draft https://github.com/facebook/rocksdb/issues/6427)
Make kLitteEndian constexpr bool
Clarify a couple of comments
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6443

Test Plan: make check, CI

Differential Revision: D20044558

Pulled By: pdillinger

fbshipit-source-id: e0c5cc13229c82726280dc0ddcba4078346b8418
2020-02-22 08:01:36 -08:00
sdong 942eaba091 Handle io_uring partial results (#6441)
Summary:
The logic that handles io_uring partial results was wrong. Fix the logic by putting it into a queue and continue reading.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441

Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct.

Differential Revision: D20018616

fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62
2020-02-21 16:57:37 -08:00
sdong fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
Cheng Chang 46516778dd Fix flaky test DecreaseNumBgThreads (#6393)
Summary:
The DecreaseNumBgThreads test keeps failing on Windows in AppVeyor.
It fails because it depends on a timed wait for the tasks to be dequeued from the threadpool's internal queue, but within the specified time, the task might have not been scheduled onto the newly created threads.
https://github.com/facebook/rocksdb/pull/6232 tries to fix this by waiting for longer time to let the threads scheduled.
This PR tries to fix this by replacing the timed wait with a synchronization on the task's internal conditional variable.
When the number of threads increases, instead of guessing the time needed for the task to be scheduled, it directly blocks on the conditional variable until the task starts running.
But when thread number is reduced, it still does a timed wait, but this does not lead to the flakiness now, will try to remove these timed waits in a future PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6393

Test Plan: Wait to see whether AppVeyor tests pass.

Differential Revision: D19890928

Pulled By: cheng-chang

fbshipit-source-id: 4e56e4addf625c98c0876e62d9d57a6f0a156f76
2020-02-13 17:27:18 -08:00
sdong 3a073234da Consolidate ReadFileToString() (#6366)
Summary:
It's a minor refactoring. We have two ReadFileToString() but they are very similar. Make the one with Env argument calls the one with FS argument instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6366

Test Plan: Run all existing tests

Differential Revision: D19712332

fbshipit-source-id: 5ae6fabf6355938690d95cda52afd1f39e0a7823
2020-02-04 11:39:23 -08:00
sdong 39410bcb3d Fix some shadow warning (#6242)
Summary:
Some shadow warning shows up when using gcc 4.8. An example:

./utilities/blob_db/blob_compaction_filter.h: In constructor ‘rocksdb::blob_db::BlobIndexCompactionFilterFactoryBase::BlobIndexCompactionFilterFactoryBase(rocksdb::blob_db::lobDBImpl*, rocksdb::Env*, rocksdb::Statistics*)’:
./utilities/blob_db/blob_compaction_filter.h:121:7: error: declaration of ‘blob_db_impl’ shadows a member of 'this' [-Werror=shadow]
       : blob_db_impl_(blob_db_impl), env_(_env), statistics_(_statistics) {}
       ^

Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6242

Test Plan: Build and see the warnings go away.

Differential Revision: D19217789

fbshipit-source-id: 8ef631941f23dab47a388e060adec24b72efd65e
2020-01-08 18:20:13 -08:00
anand76 ad34faba15 Fix unity test (#6178)
Summary:
Fix the test failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6178

Differential Revision: D19071208

Pulled By: maysamyabandeh

fbshipit-source-id: 71622832ac93ff2663946c546d9642d5b9e3d194
2019-12-14 15:39:41 -08:00
anand76 afa2420c2b Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.

This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.

The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.

This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.

The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761

Differential Revision: D18868376

Pulled By: anand1976

fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
2019-12-13 14:48:41 -08:00
sdong d1ae2c3faf Fix an asan warning caused by the recent io_uring change (#6135)
Summary:
ASAN reports:

internal_repo_rocksdb/repo:db_test - MultiThreaded/MultiThreadedDBTest.MultiThreaded/43: fatal
==2692739==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6130000500ca at pc 0x0000006be780 bp 0x7efef85ccd20 sp 0x7efef85cc4d0
[CONTEXT] === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
[CONTEXT] READ of size 331 at 0x6130000500ca thread T195
[CONTEXT]      #0 db_test_bin+0x6be77f                     __interceptor_strlen.part.35
[CONTEXT]      https://github.com/facebook/rocksdb/issues/1 internal_repo_rocksdb/repo/include/rocksdb/slice.h:55 rocksdb::Slice::Slice(char const*)
[CONTEXT]      https://github.com/facebook/rocksdb/issues/2 internal_repo_rocksdb/repo/env/io_posix.cc:522 rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::ReadRequest*, unsigned long)

I looked at env/io_posix.cc:522 but don't see a reason why the line needs to be there at all, because it is not used before overwritten. So it must be a line that is put there as a bug. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6135

Test Plan: Rerun the same test which passes after the fix. Run all the tests and make sure they all pass.

Differential Revision: D18880251

fbshipit-source-id: 3b84ac6a05b67b529c4202e0ceb4c047460f44f2
2019-12-09 10:25:09 -08:00
sdong 7d79b32618 Break db_stress_tool.cc to a list of source files (#6134)
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134

Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1

Differential Revision: D18876064

fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d
2019-12-08 23:51:01 -08:00
sdong e3a82bb934 PosixRandomAccessFile::MultiRead() to use I/O uring if supported (#5881)
Summary:
Right now, PosixRandomAccessFile::MultiRead() executes read requests in parallel. In this PR, it leverages I/O Uring library to run it in parallel, even when page cache is enabled. This function will fall back if the kernel version doesn't support it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881

Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes.

Differential Revision: D17742266

fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a
2019-12-07 20:55:52 -08:00
Yanqin Jin 231fffd07c Add Env::SanitizeEnvOptions (#5885)
Summary:
Add Env::SanitizeEnvOptions to allow underlying environments properly
configure env options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5885

Test Plan:
```
make check
```

Differential Revision: D17910327

Pulled By: riversand963

fbshipit-source-id: 86a1ac616e485742c35c4a9cc9f1227c529fc00f
2019-10-14 12:25:00 -07:00
Andrew Kryczka b00761eea6 Fix block cache ID uniqueness for Windows builds (#5844)
Summary:
Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.

We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844

Test Plan:
we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change

Differential Revision: D17752442

fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
2019-10-11 18:19:31 -07:00
Yanqin Jin 167cdc9f17 Support custom env in sst_dump (#5845)
Summary:
This PR allows for the creation of custom env when using sst_dump. If
the user does not set options.env or set options.env to nullptr, then sst_dump
will automatically try to create a custom env depending on the path to the sst
file or db directory. In order to use this feature, the user must call
ObjectRegistry::Register() beforehand.

Test Plan (on devserver):
```
$make all && make check
```
All tests must pass to ensure this change does not break anything.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845

Differential Revision: D17678038

Pulled By: riversand963

fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626
2019-10-08 19:19:12 -07:00
sdong e8263dbdaa Apply formatter to recent 200+ commits. (#5830)
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830

Test Plan: Run all existing tests.

Differential Revision: D17488031

fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
2019-09-20 12:04:26 -07:00
Maysam Yabandeh 638d239507 Charge block cache for cache internal usage (#5797)
Summary:
For our default block cache, each additional entry has extra memory overhead. It include LRUHandle (72 bytes currently) and the cache key (two varint64, file id and offset). The usage is not negligible. For example for block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charging the cache for the extra usage, reducing untracked memory usage outside block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
This PR builds up on https://github.com/facebook/rocksdb/issues/4258
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797

Test Plan:
- Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata.
- The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.

Differential Revision: D17396833

Pulled By: maysamyabandeh

fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
2019-09-16 15:26:21 -07:00
Shylock Hg 9eb3e1f77d Use delete to disable automatic generated methods. (#5009)
Summary:
Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009

Differential Revision: D17288733

fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3
2019-09-11 18:09:00 -07:00
Andrew Kryczka 20dd828c01 Avoid clock_gettime on pre-10.12 macOS versions (#5570)
Summary:
On older macOS like 10.10 we saw the following compiler error:

```
/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/env/env_posix.cc:845:19:
error: use of undeclared identifier 'CLOCK_THREAD_CPUTIME_ID'
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
                  ^
```

According to mac's `man clock_gettime`: "These functions first appeared in Mac
OSX 10.12". So we should not try to compile it on earlier versions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5570

Test Plan:
verified it compiles now on 10.10. Also did some investigation to
ensure it does not cause regression on macOS 10.12+, although I do not
have access to such an environment to really test.

Differential Revision: D17322629

Pulled By: riversand963

fbshipit-source-id: e0a412223854f826b4d83e6d15c3739ff4620d7d
2019-09-11 14:07:25 -07:00
Richard He cfc20019d1 Fixed FALLOC_FL_KEEP_SIZE undefined (#5614)
Summary:
Fix `error: ‘FALLOC_FL_KEEP_SIZE’` undeclared error in `io_posix.cc` during Vagrant build in CentOS as per issue https://github.com/facebook/rocksdb/issues/5599
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614

Differential Revision: D17217960

fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c
2019-09-05 17:37:21 -07:00
Yi Wu 83b991922e Fix EncryptedEnv assert (#5735)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/5734. By reading the code the assert don't quite make sense to me, since `dataSize` and `fileOffset` has no correlation. But my knowledge about `EncryptedEnv` is very limited.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735

Test Plan:
run `ENCRYPTED_ENV=1 ./db_encryption_test`

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Differential Revision: D17133849

fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0
2019-09-05 17:21:42 -07:00
sheng qiu c762efc4a9 fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared (#5708)
Summary:
add "linux/falloc.h" in env/io_posix.cc to fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared

Signed-off-by: sheng qiu <herbert1984106@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5708

Differential Revision: D16832922

fbshipit-source-id: 30e787c4a1b5a9724a8acfd68962ff5ec5f27d3e
2019-08-16 13:58:05 -07:00
Yi Wu 849a8c0ae0 fix sign compare warnings (#5651)
Summary:
Fix -Wsign-compare warnings for gcc9.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5651

Test Plan: Tested with ubuntu19.10+gcc9

Differential Revision: D16567428

fbshipit-source-id: 730b2704d42ba0c4e4ea946a3199bbb34be4c25c
2019-07-30 14:12:54 -07:00
Mark Rambacher cfcf045acc The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293)
Summary:
The ObjectRegistry class replaces the Registrar and NewCustomObjects.  Objects are registered with the registry by Type (the class must implement the static const char *Type() method).

This change is necessary for a few reasons:
- By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
- By having a class with instances, different units could have different objects registered.  This could be useful if, for example, one Option allowed for a dynamic library and one did not.

When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.

Test plan (on riversand963's  devserver)
```
$COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
```
All tests pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293

Differential Revision: D16363396

Pulled By: riversand963

fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
2019-07-23 17:13:05 -07:00
ggaurav28 60d8b19836 Implemented a file logger that uses WritableFileWriter (#5491)
Summary:
Current PosixLogger performs IO operations using posix calls. Thus the
current implementation will not work for non-posix env. Created a new
logger class EnvLogger that uses env specific WritableFileWriter for IO operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491

Test Plan: make check

Differential Revision: D15909002

Pulled By: ggaurav28

fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965
2019-07-09 16:27:22 -07:00
Sagar Vemuri 68614a9608 Fix AlignedBuffer's usage in Encryption Env (#5396)
Summary:
The usage of `AlignedBuffer` in env_encryption.cc writes and reads to/from the AlignedBuffer's internal buffer directly without going through AlignedBuffer's APIs (like `Append` and `Read`), causing encapsulation to break in some cases. The writes are especially problematic as after the data is written to the buffer (directly using either memmove or memcpy), the size of the buffer is not updated ... causing the AlignedBuffer to lose track of the encapsulated buffer's current size.
Fixed this by updating the buffer size after every write.

Todo for later:
Add an overloaded method to AlignedBuffer to support a memmove in addition to a memcopy. Encryption env does a memmove, and hence I couldn't switch to using `AlignedBuffer.Append()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5396

Test Plan: `make check`

Differential Revision: D15764756

Pulled By: sagar0

fbshipit-source-id: 2e24b52bd3b4b5056c5c1da157f91ddf89370183
2019-06-19 16:46:20 -07:00
Andrew Kryczka 220870523c Fix compilation with USE_HDFS (#5444)
Summary:
The changes in 8272a6de57 were untested with `USE_HDFS=1`. There were a couple compiler errors. This PR fixes them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5444

Test Plan:
```
$ EXTRA_LDFLAGS="-L/tmp/hadoop-3.1.2/lib/native/" EXTRA_CXXFLAGS="-I/tmp/hadoop-3.1.2/include" USE_HDFS=1 make -j12 check
```

Differential Revision: D15885009

fbshipit-source-id: 2a0a63739e0b9a2819b461ad63ce1292c4833fe2
2019-06-18 14:55:59 -07:00
Andrew Kryczka 2c9df9f9e5 Dynamic test whether sync_file_range returns ENOSYS (#5416)
Summary:
`sync_file_range` returns `ENOSYS` on Windows Subsystem for Linux even
when using a supposedly supported filesystem like ext4. To handle this
case we can do a dynamic check that a no-op `sync_file_range`
invocation, which is accomplished by passing zero for the `flags`
argument, succeeds.

Also I rearranged the function and comments to hopefully make it more
easily understandable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5416

Differential Revision: D15807061

fbshipit-source-id: d31d94e1f228b7850ea500e6199f8b5daf8cfbd3
2019-06-13 13:56:10 -07:00
Levi Tamasi a16d0cc494 Fix build errors regarding const qualifier being ignored on cast result type (#5432)
Summary:
This affects some TSAN builds:

env/env_test.cc: In member function ‘virtual void rocksdb::EnvPosixTestWithParam_MultiRead_Test::TestBody()’:
env/env_test.cc:1126:76: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
       auto data = NewAligned(kSectorSize * 8, static_cast<const char>(i + 1));
                                                                            ^
env/env_test.cc:1154:77: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
       auto buf = NewAligned(kSectorSize * 8, static_cast<const char>(i*2 + 1));
                                                                             ^
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5432

Differential Revision: D15727277

Pulled By: ltamasi

fbshipit-source-id: dc0e687b123e7c4d703ccc0c16b7167e07d1c9b0
2019-06-07 19:37:41 -07:00
Yanqin Jin cb1bf09bfc Fix tsan error (#5414)
Summary:
Previous code has a warning when compile with tsan, leading to an error since we have -Werror.
Compilation result
```
In file included from ./env/env_chroot.h:12,
                 from env/env_test.cc:40:
./include/rocksdb/env.h: In instantiation of ‘rocksdb::Status rocksdb::DynamicLibrary::LoadFunction(const string&, std::function<T>*) [with T = void*(void*, const char*); std::__cxx11::string = std::__cxx11::basic_string<char>]’:
env/env_test.cc:260:5:   required from here
./include/rocksdb/env.h:1010:17: error: cast between incompatible function types from ‘rocksdb::DynamicLibrary::FunctionPtr’ {aka ‘void* (*)()’} to ‘void* (*)(void*, const char*)’ [-Werror=cast-function-type]
     *function = reinterpret_cast<T*>(ptr);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [env/env_test.o] Error 1
```
It also has another error reported by clang
```
env/env_posix.cc:141:11: warning: Value stored to 'err' during its initialization is never read
    char* err = dlerror();  // Clear any old error
          ^~~   ~~~~~~~~~
1 warning generated.
```

Test plan (on my devserver).
```
$make clean
$OPT=-g ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1 COMPILE_WITH_TSAN=1 make -j32
$
$make clean
$USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j1 analyze
```
Both should pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5414

Differential Revision: D15637315

Pulled By: riversand963

fbshipit-source-id: 8e307483761019a4d5998cab92d49516d7edffbf
2019-06-05 15:42:23 -07:00
anand76 0153e14569 Add a MultiRead() method to Env (#5311)
Summary:
Define the Env:: MultiRead() method to allow callers to request multiple block reads in one shot. The underlying Env implementation can parallelize it if it chooses to in order to reduce the overall IO latency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5311

Differential Revision: D15502172

Pulled By: anand1976

fbshipit-source-id: 2b228269c2e11b5f54694d6b2bb3119c8a8ce2b9
2019-06-05 09:41:34 -07:00
Mark Rambacher c8267120d8 Add support for loading dynamic libraries into the RocksDB environment (#5281)
Summary:
This change adds a Dynamic Library class to the RocksDB Env.  Dynamic libraries are populated via the  Env::LoadLibrary method.

The addition of dynamic library support allows for a few different features to be developed:
1.  The compression code can be changed to use dynamic library support.  This would allow RocksDB to determine at run-time what compression packages were installed.  This change would eliminate the need to make sure the build-time and run-time environment had the same library set.  It would also simplify some of the Java build issues (where it attempts to build and include various packages inside the RocksDB jars).

2.  Along with other features (to be provided in a subsequent PR), this change would allow code/configurations to be added to RocksDB at run-time.  For example, the build system includes code for building an "rados" environment and adding "Cassandra" features.  Instead of these extensions being built into the base RocksDB code, these extensions could be loaded at run-time as required/appropriate, either by configuration or explicitly.

We intend to push out other changes in support of the extending RocksDB at run-time via configurations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5281

Differential Revision: D15447613

Pulled By: riversand963

fbshipit-source-id: 452cd4f54511c0bceee18f6d9d919aae9fd25fef
2019-06-03 23:02:56 -07:00
Siying Dong 000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Siying Dong 8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Siying Dong e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Raphael Bost 468ca61105 Break large file writes into 1GB chunks (#5213)
Summary:
This is a workaround for the issue described in #5169.
It has been tested on a database with very large values, but not dedicated test has been added to the code base.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5213

Differential Revision: D15243116

Pulled By: siying

fbshipit-source-id: e0c226a6cd71a60924dcd7ce7af74abcb4054484
2019-05-15 14:20:24 -07:00
Andrew Kryczka 8272a6de57 Optionally wait on bytes_per_sync to smooth I/O (#5183)
Summary:
The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways.

My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.

Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.

There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).

The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183

Differential Revision: D14953553

Pulled By: siying

fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
2019-04-22 11:51:39 -07:00
Fosco Marotto 6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Siying Dong ed9f5e21aa Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165)
Summary:
Change the behavior of OptimizeForSmallDb() so that it is less likely to go out of memory.
Change the behavior of OptimizeForPointLookup() to take advantage of the new memtable whole key filter, and move away from prefix extractor as well as hash-based indexing, as they are prone to misuse.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5165

Differential Revision: D14880709

Pulled By: siying

fbshipit-source-id: 9af30e3c9e151eceea6d6b38701a58f1f9fb692d
2019-04-11 10:45:36 -07:00
jsteemann 313e877285 fix reading encrypted files beyond file boundaries (#5160)
Summary:
This fix should help reading from encrypted files if the file-to-be-read
is smaller than expected. For example, when using the encrypted env and
making it read a journal file of exactly 0 bytes size, the encrypted env
code crashes with SIGSEGV in its Decrypt function, as there is no check
if the read attempts to read over the file's boundaries (as specified
originally by the `dataSize` parameter).

The most important problem this patch addresses is however that there is
no size underlow check in `CTREncryptionProvider::CreateCipherStream`:

The stream to be read will be initialized to a size of always
`prefix.size() - (2 * blockSize)`. If the prefix however is smaller than
twice the block size, this will obviously assume a _very_ large stream
and read over the bounds. The patch adds a check here as follows:

    // If the prefix is smaller than twice the block size, we would below read a
    // very large chunk of the file (and very likely read over the bounds)
    assert(prefix.size() >= 2 * blockSize);
    if (prefix.size() < 2 * blockSize) {
      return Status::Corruption("Unable to read from file " + fname + ": read attempt would read beyond file bounds");
    }

so embedders can catch the error in their release builds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5160

Differential Revision: D14834633

Pulled By: sagar0

fbshipit-source-id: 47aa39a6db8977252cede054c7eb9a663b9a3484
2019-04-08 14:57:25 -07:00
Yanqin Jin d77476ef55 Fix db_stress for custom env (#5122)
Summary:
Fix some hdfs-related code so that it can compile and run 'db_stress'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122

Differential Revision: D14675495

Pulled By: riversand963

fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
2019-03-28 19:20:27 -07:00
Yanqin Jin 9358178edc Support for single-primary, multi-secondary instances (#4899)
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.

This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.

2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.

3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.

4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899

Differential Revision: D14510945

Pulled By: riversand963

fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
2019-03-26 16:45:31 -07:00
Zhongyi Xie 3c5eed5ebe remove incorrect assert in `GetUniqueIdFromFile` (#5102)
Summary:
User report has shown that sometimes `BlockBasedTable::SetupCacheKeyPrefix` would assert when trying to generate an id from the file. The actual cause seems to be hardware related but we might be better off without the incorrect assertion
See T42178927 for more information
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5102

Differential Revision: D14604677

Pulled By: miasantreble

fbshipit-source-id: fcb09207ebdc4fa66e941afbc0523d84797e7ad7
2019-03-25 23:28:29 -07:00
Rashmi Sharma a4396f9218 Make it easier for users to load options from option file and set shared block cache. (#5063)
Summary:
[RocksDB] Make it easier for users to load options from option file and set shared block cache.
Right now, it requires several dynamic casting for users to set the shared block cache to their option struct cast from the option file.
If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess.
Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063

Differential Revision: D14518584

Pulled By: rashmishrm

fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
2019-03-21 16:25:28 -07:00
Zhongyi Xie a291f3a1e5 Collect compaction stats by priority and dump to info LOG (#5050)
Summary:
In order to better understand compaction done by different priority thread pool, we now collect compaction stats by priority and also print them to info LOG through stats dump.

```
** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Low      0/0    0.00 KB   0.0     16.8    11.3      5.5       5.6      0.1       0.0   0.0    406.4    136.1     42.24             34.96        45    0.939     13M  8865K
High      0/0    0.00 KB   0.0      0.0     0.0      0.0      11.4     11.4       0.0   0.0      0.0     76.2    153.00             35.74     12185    0.013       0      0
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5050

Differential Revision: D14408583

Pulled By: miasantreble

fbshipit-source-id: e53746586ea27cb8abc9fec35805bd80ed30f608
2019-03-19 17:28:19 -07:00
Andrew Kryczka 186b3afaa8 Use `fallocate` even if hole-punching unsupported (#5023)
Summary:
The compiler flag `-DROCKSDB_FALLOCATE_PRESENT` was only set when
`fallocate`, `FALLOC_FL_KEEP_SIZE`, and `FALLOC_FL_PUNCH_HOLE` were all
present. However, the last of the three is not really necessary for the
primary `fallocate` use case; furthermore, it was introduced only in later
Linux kernel versions (2.6.38+).

This PR changes the flag `-DROCKSDB_FALLOCATE_PRESENT` to only require
`fallocate` and `FALLOC_FL_KEEP_SIZE` to be present. There is a separate
check for `FALLOC_FL_PUNCH_HOLE` only in the place where it is used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5023

Differential Revision: D14248487

Pulled By: siying

fbshipit-source-id: a10ed0b902fa755988e957bd2dcec9081ec0502e
2019-03-04 15:43:17 -08:00
Michael Liu 3c5d1b16b1 Apply modernize-use-override (3)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

bypass-lint
drop-conflicts

Reviewed By: igorsugak

Differential Revision: D14131816

fbshipit-source-id: f20e7f7cecf2e699d70f5fa036f72c0e3f59b50e
2019-02-19 13:39:49 -08:00
Siying Dong c2affccc18 Header logger should call LogHeader() (#4980)
Summary:
The info log header feature never worked well, because log level Header was not
translated to Logger::LogHeader() call. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4980

Differential Revision: D14087283

Pulled By: siying

fbshipit-source-id: 7e7d03ce35fa8d13d4ee549f46f7326f7bc0006d
2019-02-15 16:59:36 -08:00
Young Tack Jin 4091597c67 fix for nvme device path (#4866)
Summary:
nvme device path doesn't have "block" as like "nvme/nvme0/nvme0n1"
or "nvme/nvme0/nvme0n1/nvme0n1p1". the last directory such as
"nvme0n1p1" should be removed if nvme drive is partitioned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866

Differential Revision: D13627824

Pulled By: riversand963

fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317
2019-01-31 19:08:37 -08:00
Alexander Zinoviev 32a6dd9a41 Add a new CPU time counter to compaction report (#4889)
Summary:
Measure CPU time consumed for a compaction and report it in the stats report
Enable NowCPUNanos() to work for MacOS
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889

Differential Revision: D13701276

Pulled By: zinoale

fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055
2019-01-29 17:24:00 -08:00
Siying Dong da1c64b6e7 Introduce a CPU time counter in perf_context (#4741)
Summary:
Introduce the first CPU timing counter, perf_context.get_cpu_nanos. This opens a door to more CPU counters in the future.
Only Posix Env has it implemented using clock_gettime() with CLOCK_THREAD_CPUTIME_ID. How accurate the counter is depends on the platform.
Make PerfStepTimer to take an Env as an argument, and sometimes pass it in. The direct reason is to make the unit tests to use SpecialEnv where we can ingest logic there. But in long term, this is a good change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4741

Differential Revision: D13287798

Pulled By: siying

fbshipit-source-id: 090361049d9d5095d1d1a369fe1338d2e2e1c73f
2018-12-20 12:03:44 -08:00
Sagar Vemuri dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Sagar Vemuri abb8ecb4cd Add missing methods to WritableFileWrapper (#4584)
Summary:
`WritableFileWrapper` was missing some newer methods that were added to `WritableFile`. Without these functions, the missing wrapper methods would fallback to using the default implementations in WritableFile instead of using the corresponding implementations in, say, `PosixWritableFile` or `WinWritableFile`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4584

Differential Revision: D10559199

Pulled By: sagar0

fbshipit-source-id: 0d0f18a486aee727d5b8eebd3110a41988e27391
2018-10-24 12:19:54 -07:00
Zhongyi Xie cac87fcf57 move dump stats to a separate thread (#4382)
Summary:
Currently statistics are supposed to be dumped to info log at intervals of `options.stats_dump_period_sec`. However the implementation choice was to bind it with compaction thread, meaning if the database has been serving very light traffic, the stats may not get dumped at all.
We decided to separate stats dumping into a new timed thread using `TimerQueue`, which is already used in blob_db. This will allow us schedule new timed tasks with more deterministic behavior.

Tested with db_bench using `--stats_dump_period_sec=20` in command line:
> LOG:2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
LOG:2018/09/17-14:08:05.643286 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
LOG:2018/09/17-14:08:25.691325 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
LOG:2018/09/17-14:08:45.740989 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------

LOG content:
> 2018/09/17-14:07:45.575025 7fe99fbfe700 [WARN] [db/db_impl.cc:605] ------- DUMPING STATS -------
2018/09/17-14:07:45.575080 7fe99fbfe700 [WARN] [db/db_impl.cc:606]
** DB Stats **
Uptime(secs): 20.0 total, 20.0 interval
Cumulative writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5.57 GB, 285.01 MB/s
Cumulative WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 GB, 285.01 MB/s
Cumulative stall: 00:00:0.012 H:M:S, 0.1 percent
Interval writes: 4447K writes, 4447K keys, 4447K commit groups, 1.0 writes per commit group, ingest: 5700.71 MB, 285.01 MB/s
Interval WAL: 4447K writes, 0 syncs, 4447638.00 writes per sync, written: 5.57 MB, 285.01 MB/s
Interval stall: 00:00:0.012 H:M:S, 0.1 percent
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4382

Differential Revision: D9933051

Pulled By: miasantreble

fbshipit-source-id: 6d12bb1e4977674eea4bf2d2ac6d486b814bb2fa
2018-10-08 22:54:43 -07:00
Sagar Vemuri b1dad4cfcc assert in PosixEnv::FileExists should be based on errno (#4427)
Summary:
The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno.

Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more it feels like the assert is doing the right thing ...  its good to crash on EROFS, EFAULT, EINVAL, and other major filesystem related problems so that the user is immediately aware of the problems while testing.
(I think it might be ok to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So letting the letting the assert checks remain as is for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427

Differential Revision: D10037200

Pulled By: sagar0

fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51
2018-09-26 13:25:15 -07:00
Anand Ananthabhotla a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
cngzhnp 64324e329e Support pragma once in all header files and cleanup some warnings (#4339)
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.

Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339

Differential Revision: D9654990

Pulled By: ajkr

fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
2018-09-05 18:13:31 -07:00
Wez Furlong d00e5de7fc use atomic O_CLOEXEC when available (#4328)
Summary:
In our application we spawn helper child processes concurrently with
opening rocksdb.  In one situation I observed that the child process had inherited
the rocksdb lock file as well as directory handles to the rocksdb storage location.

The code in env_posix takes care to set CLOEXEC but doesn't use `O_CLOEXEC` at the
time that the files are opened which means that there is a window of opportunity
to leak the descriptors across a fork/exec boundary.

This diff introduces a helper that can conditionally set the `O_CLOEXEC` bit for
the open call using the same logic as that in the existing helper for setting
that flag post-open.

I've preserved the post-open logic for systems that don't have `O_CLOEXEC`.

I've introduced setting `O_CLOEXEC` for what appears to be a number of temporary
or transient files and directory handles; I suspect that none of the files
opened by Rocks are intended to be inherited by a forked child process.

In one case, `fopen` is used to open a file.  I've added the use of the glibc-specific `e`
mode to turn on `O_CLOEXEC` for this case.  While this doesn't cover all posix systems,
it is an improvement for our common deployment system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4328

Reviewed By: ajkr

Differential Revision: D9553046

Pulled By: wez

fbshipit-source-id: acdb89f7a85ca649b22fe3c3bd76f82142bec2bf
2018-08-29 20:27:43 -07:00
Jean-Marc Le Roux bbf30330b4 Fix the build failure with OS_ANDROID (#4232)
Summary:
sysmacros.h should be included in OS_ANDROID build as well otherwise the compile would complain: error: use of undeclared identifier 'major'.
Fixes https://github.com/facebook/rocksdb/issues/4231
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4232

Differential Revision: D9217350

Pulled By: maysamyabandeh

fbshipit-source-id: 21f4b62dbbda3163120ac0b38b95d95d35d67dce
2018-08-08 08:12:02 -07:00
Maysam Yabandeh 8581a93a6b Per-thread unique test db names (#4135)
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135

Differential Revision: D8846653

Pulled By: maysamyabandeh

fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
2018-07-13 17:27:39 -07:00
Yanqin Jin 520bbb1774 Disable EnvPosixTest.RunImmediately, add EnvPosixTest.RunEventually. (#4126)
Summary:
The original `EnvPosixTest.RunImmediately` assumes that after scheduling
a background thread, the thread is guaranteed to complete after 0.1 second.
I do not know about any non-real-time OS/runtime providing this guarantee. Nor
does C++11 standard say anything about this in the documentation of `std::thread`.
In fact, we have observed this test failure multiple times on appveyor, and we
haven't been able to reproduce the failure deterministically. Therefore,
I disable this test for now until we know for sure how it used to fail.

Instead, I add another test `EnvPosixTest.RunEventually` that checks that
a thread will be scheduled eventually.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4126

Differential Revision: D8827086

Pulled By: riversand963

fbshipit-source-id: abc5cb655f90d50b791493da5eeb3716885dfe93
2018-07-12 18:27:15 -07:00
Siying Dong 926f3a78a6 In delete scheduler, before ftruncate file for slow delete, check whether there is other hard links (#4093)
Summary:
Right now slow deletion with ftruncate doesn't work well with checkpoints because it ruin hard linked files in checkpoints. To fix it, check the file has no other hard link before ftruncate it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4093

Differential Revision: D8730360

Pulled By: siying

fbshipit-source-id: 756eea5bce8a87b9a2ea3a5bfa190b2cab6f75df
2018-07-09 15:28:12 -07:00
Sagar Vemuri 645e57c22d Assert for Direct IO at the beginning in PositionedRead (#3891)
Summary:
Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO.
Closes https://github.com/facebook/rocksdb/pull/3891

Differential Revision: D8267972

Pulled By: sagar0

fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7
2018-06-21 14:58:01 -07:00
Tomas Kolda 906a602c2c Build and tests fixes for Solaris Sparc (#4000)
Summary:
Here are some fixes for build on Solaris Sparc.

It is also fixing CRC test on BigEndian platforms.
Closes https://github.com/facebook/rocksdb/pull/4000

Differential Revision: D8455394

Pulled By: ajkr

fbshipit-source-id: c9289a7b541a5628139c6b77e84368e14dc3d174
2018-06-15 12:42:53 -07:00
Andrew Kryczka 1f32dc7d2b Check with PosixEnv before opening LOCK file (#3993)
Summary:
Rebased and resubmitting #1831 on behalf of stevelittle.

The problem is when a single process attempts to open the same DB twice, the second attempt fails due to LOCK file held. If the second attempt had opened the LOCK file, it'll now need to close it, and closing causes the file to be unlocked. Then, any subsequent attempt to open the DB will succeed, which is the wrong behavior.

The solution was to track which files a process has locked in PosixEnv, and check those before opening a LOCK file.

Fixes #1780.
Closes https://github.com/facebook/rocksdb/pull/3993

Differential Revision: D8398984

Pulled By: ajkr

fbshipit-source-id: 2755fe66950a0c9de63075f932f9e15768041918
2018-06-13 17:32:04 -07:00
Zhongyi Xie f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Andrew Kryczka 2210152947 Fix singleton destruction order of PosixEnv and SyncPoint (#3951)
Summary:
Ensure the PosixEnv singleton is destroyed first since its destructor waits for background threads to all complete. This ensures background threads cannot hit sync points after the SyncPoint singleton is destroyed, which was previously possible.
Closes https://github.com/facebook/rocksdb/pull/3951

Differential Revision: D8265295

Pulled By: ajkr

fbshipit-source-id: 7738dd458c5d993a78377dd0420e82badada81ab
2018-06-04 15:58:46 -07:00
奏之章 6e08916eb3 Fix Fadvise on closed file when reads use mmap
Summary:
```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called.
Closes https://github.com/facebook/rocksdb/pull/2764

Differential Revision: D8152515

Pulled By: ajkr

fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65
2018-05-25 10:57:57 -07:00
Dmitri Smirnov 3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Andrew Kryczka 072ae671a7 Apply use_direct_io_for_flush_and_compaction to writes only
Summary:
Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.

This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
Closes https://github.com/facebook/rocksdb/pull/3829

Differential Revision: D7915443

Pulled By: ajkr

fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
2018-05-09 19:42:58 -07:00
Siying Dong 3690276e74 Disallow to open RandomRW file if the file doesn't exist
Summary:
The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case.
Closes https://github.com/facebook/rocksdb/pull/3827

Differential Revision: D7913719

Pulled By: siying

fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492
2018-05-09 10:27:26 -07:00
Andrew Kryczka 19fde54841 initialize local variable for UBSAN in PosixEnv function
Summary:
this is a repeat commit of a8a28da215, which got reverted together with 6afe22db2e, but forgotten about when that commit was un-reverted in 46152d53bf.
Closes https://github.com/facebook/rocksdb/pull/3796

Differential Revision: D7826077

Pulled By: ajkr

fbshipit-source-id: edb22375da56e2feda50c5b35f942f4d2d52b19c
2018-05-01 13:27:05 -07:00
Andrew Kryczka 46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka 6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Nathan VanBenschoten 37cd617b6b Add virtual Truncate method to Env
Summary:
This change adds a virtual `Truncate` method to `Env`, which truncates
the named file to the specified size. At the moment, this is only
supported for `MockEnv`, but other `Env's` could be extended to override
the method too. This is the same approach that methods like `LinkFile` and
`AreSameFile` have taken.

This is useful for any user of the in-memory `Env`. The implementation's
header is not exported, so before this change, it was impossible to
access it's already existing `Truncate` method.
Closes https://github.com/facebook/rocksdb/pull/3779

Differential Revision: D7785789

Pulled By: ajkr

fbshipit-source-id: 3bcdaeea7b7180529f7d9b496dc67b791a00bbf0
2018-04-26 21:12:51 -07:00
Andrew Kryczka dfc61e7c24 initialize local variable for UBSAN in PosixEnv function
Summary:
It seems clear to me that the variable is initialized before line 492, but it wasn't clear to UBSAN. The failure was:

```
In file included from ./env/io_posix.h:14:0,
                 from env/env_posix.cc:44:
./include/rocksdb/env.h: In member function ‘virtual rocksdb::Status rocksdb::{anonymous}::PosixEnv::NewMemoryMappedFileBuffer(const string&, std::unique_ptr<rocksdb::MemoryMappedFileBuffer>*)’:
./include/rocksdb/env.h:822:36: error: ‘base’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
       : base(_base), length(_length) {}
                                    ^
env/env_posix.cc:482:11: note: ‘base’ was declared here
     void* base;
```

We can just initialize to nullptr to keep UBSAN happy.
Closes https://github.com/facebook/rocksdb/pull/3770

Differential Revision: D7756287

Pulled By: ajkr

fbshipit-source-id: 0f2efb9594e2d3a30706a4ca7e1d4a6328031bf2
2018-04-25 13:42:02 -07:00
Andrew Kryczka a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Gabriel Wicke 090c78a0d7 Support lowering CPU priority of background threads
Summary:
Background activities like compaction can negatively affect
latency of higher-priority tasks like request processing. To avoid this,
rocksdb already lowers the IO priority of background threads on Linux
systems. While this takes care of typical IO-bound systems, it does not
help much when CPU (temporarily) becomes the bottleneck. This is
especially likely when using more expensive compression settings.

This patch adds an API to allow for lowering the CPU priority of
background threads, modeled on the IO priority API. Benchmarks (see
below) show significant latency and throughput improvements when CPU
bound. As a result, workloads with some CPU usage bursts should benefit
from lower latencies at a given utilization, or should be able to push
utilization higher at a given request latency target.

A useful side effect is that compaction CPU usage is now easily visible
in common tools, allowing for an easier estimation of the contribution
of compaction vs. request processing threads.

As with IO priority, the implementation is limited to Linux, degrading
to a no-op on other systems.
Closes https://github.com/facebook/rocksdb/pull/3763

Differential Revision: D7740096

Pulled By: gwicke

fbshipit-source-id: e5d32373e8dc403a7b0c2227023f9ce4f22b413c
2018-04-24 08:41:51 -07:00
Yi Wu 2e72a5899b Disable EnvPosixTest::FilePermission
Summary:
The test is flaky in our CI but could not be reproduce manually on the same CI host. Disabling it.
Closes https://github.com/facebook/rocksdb/pull/3753

Differential Revision: D7716320

Pulled By: yiwu-arbug

fbshipit-source-id: 6bed3b05880c1d24e8dc86bc970e5181bc98fb45
2018-04-20 15:42:42 -07:00
Andrew Kryczka 3cea61392f include thread-pool priority in thread names
Summary:
Previously threads were named "rocksdb:bg\<index in thread pool\>", so the first thread in all thread pools would be named "rocksdb:bg0". Users want to be able to distinguish threads used for flush (high-pri) vs regular compaction (low-pri) vs compaction to bottom-level (bottom-pri). So I changed the thread naming convention to include the thread-pool priority.
Closes https://github.com/facebook/rocksdb/pull/3702

Differential Revision: D7581415

Pulled By: ajkr

fbshipit-source-id: ce04482b6acd956a401ef22dc168b84f76f7d7c1
2018-04-18 17:27:56 -07:00
Zhongyi Xie 954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
Xiaofei Du a0102aa6d7 Make database files' permissions configurable
Summary: Closes https://github.com/facebook/rocksdb/pull/3709

Differential Revision: D7610227

Pulled By: xiaofeidu008

fbshipit-source-id: 88a52f0f9f96e2195fccde995cf9760b785e9f07
2018-04-13 13:13:04 -07:00
Steven Fackler 1f5457ef21 Merge raw and shared pointer log method impls
Summary:
Calling rocksdb::Log, rocksdb::Info, etc with a `shared_ptr<Logger>` should behave the same as calling those functions with a `Logger *`. This PR achieves it by making the `shared_ptr<Logger>` versions delegate to the `Logger *` versions.

Closes #3689
Closes https://github.com/facebook/rocksdb/pull/3710

Differential Revision: D7595557

Pulled By: ajkr

fbshipit-source-id: 64dd7f20fd42dc821bac7b8032705c35b483e00d
2018-04-13 11:12:54 -07:00
David Lai 3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
Bruce Mitchener a3a3f5497c Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568

Differential Revision: D7170953

Pulled By: siying

fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
2018-03-08 10:27:25 -08:00
Bruce Mitchener 0de710f5b8 Use nullptr instead of NULL / 0 more consistently.
Summary: Closes https://github.com/facebook/rocksdb/pull/3569

Differential Revision: D7170968

Pulled By: yiwu-arbug

fbshipit-source-id: 308a6b7dd358a04fd9a7de3d927bfd8abd57d348
2018-03-07 12:42:12 -08:00
amytai 0a3db28d98 Disallow compactions if there isn't enough free space
Summary:
This diff handles cases where compaction causes an ENOSPC error.
This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize.
It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC.

Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions.
Closes https://github.com/facebook/rocksdb/pull/3449

Differential Revision: D7016941

Pulled By: amytai

fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445
2018-03-06 16:27:54 -08:00
Fosco Marotto d518fe1da6 uint64_t and size_t changes to compile for iOS
Summary:
In attempting to build a static lib for use in iOS, I ran in to lots of type errors between uint64_t and size_t.  This PR contains the changes I made to get `TARGET_OS=IOS make static_lib` to succeed while also getting Xcode to build successfully with the resulting `librocksdb.a` library imported.

This also compiles for me on macOS and tests fine, but I'm really not sure if I made the correct decisions about where to `static_cast` and where to change types.

Also up for discussion: is iOS worth supporting?  Getting the static lib is just part one, we aren't providing any bridging headers or wrappers like the ObjectiveRocks project, it won't be a great experience.
Closes https://github.com/facebook/rocksdb/pull/3503

Differential Revision: D7106457

Pulled By: gfosco

fbshipit-source-id: 82ac2073de7e1f09b91f6b4faea91d18bd311f8e
2018-03-06 12:43:51 -08:00
Dmitri Smirnov c364eb42b5 Windows cumulative patch
Summary:
This patch addressed several issues.
  Portability including db_test std::thread -> port::Thread Cc: @
  and %z to ROCKSDB portable macro. Cc: maysamyabandeh

  Implement Env::AreFilesSame

  Make the implementation of file unique number more robust

  Get rid of C-runtime and go directly to Windows API when dealing
  with file primitives.

  Implement GetSectorSize() and aling unbuffered read on the value if
  available.

  Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976

  Fix test running script issue where $status var was of incorrect scope
  so the failures were swallowed and not reported.

  DestroyDB() creates a logger and opens a LOG file in the directory
  being cleaned up. This holds a lock on the folder and the cleanup is
  prevented. This fails one of the checkpoin tests. We observe the same in production.
  We close the log file in this change.

 Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
 attempts to open a directory with NewRandomAccessFile which does not
 work on Windows.
  Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
Closes https://github.com/facebook/rocksdb/pull/3552

Differential Revision: D7156304

Pulled By: siying

fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
2018-03-06 11:57:43 -08:00
Andrew Kryczka 5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Zhongyi Xie 92b1a68d87 fix FreeBSD build
Summary:
Currently FreeBSD build is broken in master and possibly some previous releases due to unrecognized symbol `O_DIRECT`.
This PR will fix the build on FreeBSD
Closes https://github.com/facebook/rocksdb/pull/3560

Differential Revision: D7148646

Pulled By: miasantreble

fbshipit-source-id: 95b6c3d310fa531267c086b2cd40a5ab1c042b5a
2018-03-05 11:12:28 -08:00
Anand Ananthabhotla dfbe52e099 Fix the Logger::Close() and DBImpl::Close() design pattern
Summary:
The recent Logger::Close() and DBImpl::Close() implementation rely on
calling the CloseImpl() virtual function from the destructor, which will
not work. Refactor the implementation to have a private close helper
function in derived classes that can be called by both CloseImpl() and
the destructor.
Closes https://github.com/facebook/rocksdb/pull/3528

Reviewed By: gfosco

Differential Revision: D7049303

Pulled By: anand1976

fbshipit-source-id: 76a64cbf403209216dfe4864ecf96b5d7f3db9f4
2018-02-23 13:57:26 -08:00
Igor Sugak aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
jsteemann 4e7a182d09 Several small "fixes"
Summary:
- removed a few unneeded variables
- fused some variable declarations and their assignments
- fixed right-trimming code in string_util.cc to not underflow
- simplifed an assertion
- move non-nullptr check assertion before dereferencing of that pointer
- pass an std::string function parameter by const reference instead of by value (avoiding potential copy)
Closes https://github.com/facebook/rocksdb/pull/3507

Differential Revision: D7004679

Pulled By: sagar0

fbshipit-source-id: 52944952d9b56dfcac3bea3cd7878e315bb563c4
2018-02-15 16:57:37 -08:00
Tamir Duberstein cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Anand Ananthabhotla d0f1b49ab6 Add a Close() method to DB to return status when closing a db
Summary:
Currently, the only way to close an open DB is to destroy the DB
object. There is no way for the caller to know the status. In one
instance, the destructor encountered an error due to failure to
close a log file on HDFS. In order to prevent silent failures, we add
DB::Close() that calls CloseImpl() which must be implemented by its
descendants.
The main failure point in the destructor is closing the log file. This
patch also adds a Close() entry point to Logger in order to get status.
When DBOptions::info_log is allocated and owned by the DBImpl, it is
explicitly closed by DBImpl::CloseImpl().
Closes https://github.com/facebook/rocksdb/pull/3348

Differential Revision: D6698158

Pulled By: anand1976

fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d
2018-01-16 11:08:57 -08:00
Yi Wu bbcd3b0bd2 Suppress valgrind "unimplemented functionality" error
Summary:
Add ROCKSDB_VALGRIND_RUN macro and suppress false-positive "unimplemented functionality" throw by valgrind for steam hints.

Another approach would be add a valgrind suppress file. Valgrind is suppose to print the suppression when given "--gen-suppressions=all" param, which is suppose to be the content for the suppression file. But it doesn't print.
Closes https://github.com/facebook/rocksdb/pull/3174

Differential Revision: D6338786

Pulled By: yiwu-arbug

fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa
2017-11-15 14:28:34 -08:00