Commit Graph

331 Commits

Author SHA1 Message Date
sdong 69a32eecab Use madvise() for mmaped file advise (#10170)
Summary:
A recent PR https://github.com/facebook/rocksdb/pull/10142 enabled fadvise for mmaped file. However, we were told that it might not take effective and madvise() should be used.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10170

Test Plan:
Run existing tests
Run a benchmark using mmap with advise random and see I/O size is indeed small.

Reviewed By: anand1976

Differential Revision: D37158582

fbshipit-source-id: 8b3a74f0e89d2e16aac78ee4124c05841d4135c3
2022-06-15 13:05:58 -07:00
Akanksha Mahajan 19345de60d fix cancel argument for latest liburing (#10168)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10168

the arg changed to u64

Reviewed By: ajkr

Differential Revision: D37155407

fbshipit-source-id: 464eab2806675f148fce075a6fea369fa3d7a9bb
2022-06-15 09:10:19 -07:00
Akanksha Mahajan 40d19bc12c Fix the failure related to io_uring_prep_cancel (#10165)
Summary:
Fix for Internal jobs are failing with
```
 error: no matching function for call to 'io_uring_prep_cancel'
      io_uring_prep_cancel(sqe, posix_handle, 0);
      ^~~~~~~~~~~~~~~~~~~~
note: candidate function not viable: no known conversion from 'rocksdb::Posix_IOHandle *' to '__u64' (aka 'unsigned long long') for 2nd argument
static inline void io_uring_prep_cancel(struct io_uring_sqe *sqe,
```

User data is set using `io_uring_set_data` API so no need to pass posix_handle here.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10165

Test Plan: CircleCI jobs

Reviewed By: jay-zhuang

Differential Revision: D37145233

Pulled By: akankshamahajan15

fbshipit-source-id: 05da650e1240e9c6fcc8aed5f0067308dccb164a
2022-06-14 12:35:11 -07:00
Akanksha Mahajan c0e0f30667 Implement AbortIO using io_uring (#10125)
Summary:
Implement AbortIO in posix using io_uring to cancel any pending read requests submitted. Its cancelled using io_uring_prep_cancel which sets the IORING_OP_ASYNC_CANCEL flag.

To cancel a request, the sqe must have ->addr set to the user_data of the request it wishes to cancel. If the request is cancelled successfully, the original request is completed with -ECANCELED and the cancel request is completed with a result of 0. If the request was already running, the original may or may not complete in error. The cancel request will complete with -EALREADY for that case. And finally, if the request to cancel wasn't found, the cancel request is completed with -ENOENT.

Reference: https://kernel.dk/io_uring-whatsnew.pdf,
https://lore.kernel.org/io-uring/d9a8d76d23690842f666c326631ecc2d85b6c1bc.1615566409.git.asml.silence@gmail.com/

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10125

Test Plan: Existing Posix tests.

Reviewed By: anand1976

Differential Revision: D36946970

Pulled By: akankshamahajan15

fbshipit-source-id: 3bc1f1521b3151d01a348fc6431eb3fc85db3a14
2022-06-13 18:07:24 -07:00
sdong 80afa77660 Run fadvise with mmap file (#10142)
Summary:
Right now with mmap file, we don't run fadvise following users' requests. There is no reason for that so this diff does that.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10142

Test Plan:
A simple readrandom against files with page cache dropped shows latency improvement from 7.8 us to 2.8:

./db_bench -use_existing_db --benchmarks=readrandom --num=100

Reviewed By: anand1976

Differential Revision: D37074975

fbshipit-source-id: ccc72bcac1b5fd634eb8fa2b6a5d9afe332e0bf6
2022-06-10 16:34:01 -07:00
zczhu e88d8935ae Add comments/permit unchecked error to close_db_dir pull requests (#10093)
Summary:
In [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merging conflicts occurred (some comments and one line `s.PermitUncheckedError()` are missing). This pull request aims to put them back.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093

Reviewed By: ajkr

Differential Revision: D36884117

Pulled By: littlepig2013

fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3
2022-06-02 21:52:35 -07:00
Zichen Zhu 65893ad959 Explicitly closing all directory file descriptors (#10049)
Summary:
Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run
```
strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
```
There is one directory file descriptor that is not closed in the strace log.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049

Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.

Reviewed By: ajkr, hx235

Differential Revision: D36722135

Pulled By: littlepig2013

fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
2022-06-01 18:03:34 -07:00
Gang Liao e228515740 Pass the size of blob files to SstFileManager during DB open (#10062)
Summary:
RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062

Test Plan:
Add unit tests for the change to the test suite
ltamasi riversand963  akankshamahajan15

Reviewed By: ltamasi

Differential Revision: D36726621

Pulled By: gangliao

fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a
2022-05-27 05:58:43 -07:00
Akanksha Mahajan 28ea1fb44a Provide support for IOTracing for ReadAsync API (#9833)
Summary:
Same as title

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9833

Test Plan:
Add unit test and manually check the output of tracing logs
For fixed readahead_size it logs as:
```
Access Time : 193352113447923     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15075     , IO Status: OK, Length: 12288, Offset: 659456
Access Time : 193352113465232     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 14425     , IO Status: OK, Length: 12288, Offset: 671744
Access Time : 193352113481539     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13062     , IO Status: OK, Length: 12288, Offset: 684032
Access Time : 193352113497692     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13649     , IO Status: OK, Length: 12288, Offset: 696320
Access Time : 193352113520043     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 19384     , IO Status: OK, Length: 12288, Offset: 708608
Access Time : 193352113538401     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15406     , IO Status: OK, Length: 12288, Offset: 720896
Access Time : 193352113554855     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13670     , IO Status: OK, Length: 12288, Offset: 733184
Access Time : 193352113571624     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13855     , IO Status: OK, Length: 12288, Offset: 745472
Access Time : 193352113587924     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13953     , IO Status: OK, Length: 12288, Offset: 757760
Access Time : 193352113603285     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 59        , IO Status: Not implemented: Prefetch not supported, Length: 8868, Offset: 898349
```

For implicit readahead:
```
Access Time : 193351865156587     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 48        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 391174
Access Time : 193351865160354     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 51        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 395248
Access Time : 193351865164253     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 49        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 399322
Access Time : 193351865165461     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 222871    , IO Status: OK, Length: 135168, Offset: 401408
```

Reviewed By: anand1976

Differential Revision: D35601634

Pulled By: akankshamahajan15

fbshipit-source-id: 5a4f32a850af878efa0767bd5706380152a1f26e
2022-05-25 19:47:03 -07:00
Levi Tamasi af7ae912e2 Fix potential ambiguities in/around port/sys_time.h (#10045)
Summary:
There are some time-related POSIX APIs that are not available on Windows
(e.g. `localtime_r`), which we have worked around by providing our own
implementations in `port/sys_time.h`. This workaround actually relies on
some ambiguity: on Windows, a call to `localtime_r` calls
`ROCKSDB_NAMESPACE::port::localtime_r` (which is pulled into
`ROCKSDB_NAMESPACE` by a using-declaration), while on other platforms
it calls the global `localtime_r`. This works fine as long as there is only one
candidate function; however, it breaks down when there is more than one
`localtime_r` visible in a scope.

The patch fixes this by introducing `ROCKSDB_NAMESPACE::port::{TimeVal, GetTimeOfDay, LocalTimeR}`
to eliminate any ambiguity.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10045

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D36639372

Pulled By: ltamasi

fbshipit-source-id: fc13dbfa421b7c8918111a6d9e24ce77e91a7c50
2022-05-24 18:20:17 -07:00
Changyu Bi 8515bd50c9 Support read rate-limiting in SequentialFileReader (#9973)
Summary:
Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now).

The PR is separated into four commits: the first one added the rate-limiting support, but with some fixes in the unit test since the number of request bytes from rate limiter in SequentialFileReader are not accurate (there is overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check file size and determine how many bytes are left in the file to read. The third commit added benchmark related code. The fourth commit moved the logic of using file size to avoid overcharging the rate limiter into backup engine (the main user of SequentialFileReader).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973

Test Plan:
- `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter.
- Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart.
  - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb`
  - Benchmark:
```
strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db
strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db
```
- db bench on backup and restore to ensure no performance regression.
  - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%)
  - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%)

```
# Set up
./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000

# benchmark
TEST_TMPDIR=/tmp/test_rocksdb
NUM_RUN=50
for ((j=0;j<$NUM_RUN;j++))
do
   ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db | egrep 'backup'
  # Restore
  #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db
done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt
```

Reviewed By: hx235

Differential Revision: D36327418

Pulled By: cbi42

fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691
2022-05-24 10:28:57 -07:00
mrambacher b11ff347b4 Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958)
Summary:
Changed the static objects that had non-trivial destructors to use the STATIC_AVOID_DESTRUCTION construct.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9958

Reviewed By: pdillinger

Differential Revision: D36442982

Pulled By: mrambacher

fbshipit-source-id: 029d47b1374d30d198bfede369a4c0ae7a4eb519
2022-05-17 09:39:22 -07:00
mrambacher 204a42ca97 Added GetFactoryCount/Names/Types to ObjectRegistry (#9358)
Summary:
These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner.  With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion.

With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes.

Note that there were a few other minor changes.  There was a "posix://*" register with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that.  The nickname and default names for the FileSystem classes was also inverted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358

Reviewed By: pdillinger

Differential Revision: D33433542

Pulled By: mrambacher

fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e
2022-05-16 09:44:43 -07:00
mrambacher bfc6a8ee4a Option type info functions (#9411)
Summary:
Add methods to set the various functions (Parse, Serialize, Equals) to the OptionTypeInfo.  These methods simplify the number of constructors required for OptionTypeInfo and make the code a little clearer.

Add functions to the OptionTypeInfo for Prepare and Validate.  These methods allow types other than Configurable and Customizable to have Prepare and Validate logic.  These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411

Reviewed By: pdillinger

Differential Revision: D36174849

Pulled By: mrambacher

fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125
2022-05-13 04:57:08 -07:00
sdong 736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Akanksha Mahajan fce65e7e4f Fix bug in async_io path which reads incorrect length (#9916)
Summary:
In FilePrefetchBuffer, in case data is overlapping between two
buffers and more data is required to read and copy that to third buffer,
incorrect length was updated resulting in
```
Iterator diverged from control iterator which has value 00000000000310C3000000000000012B0000000000000274 total_order_seek: 1 auto_prefix_mode: 0 S 000000000002C37F000000000000012B000000000000001C NNNPPPPPNN; total_order_seek: 1 auto_prefix_mode: 0 S 000000000002F10B00000000000000BF78787878787878 NNNPNNNNPN; total_order_seek: 1 auto_prefix_mode: 0 S 00000000000310C3000000000000012B000000000000026B
iterator is not valid
Control CF default
db_stress: db_stress_tool/db_stress_test_base.cc:1388: void rocksdb::StressTest::VerifyIterator(rocksdb::ThreadState*, rocksdb::ColumnFamilyHandle*, const rocksdb::ReadOptions&, rocksdb::Iterator*, rocksdb::Iterator*, rocksdb::StressTest::LastIterateOp, const rocksdb::Slice&, const string&, bool*): Assertion `false' failed.
Aborted (core dumped)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9916

Test Plan:
```
- CircleCI jobs
- Ran db_stress with OPTIONS file which caught the bug
 ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=42.26248932628998 --bottommost_compression_type=disable --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/ --db_write_buffer_size=134217728 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=0 --file_checksum_impl=none --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=12 --index_type=2 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_prefix_bloom_size_ratio=0.001 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --read_only=0 --readpercent=50 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_catch_up_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=3 --unpartitioned_pinning=3 --use_blob_db=0 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=0 --use_txn=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35 --options_file=/home/akankshamahajan/OPTIONS.orig -column_families=1

db_bench with async_io enabled to make sure db_bench completes successfully without any failure.
- ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1
```

crash_test in progress

Reviewed By: anand1976

Differential Revision: D35985789

Pulled By: akankshamahajan15

fbshipit-source-id: 5abe185f34caa99ca587d4bdc8954bd0802b1bf9
2022-04-27 22:33:29 -07:00
Akanksha Mahajan 5bd374b392 Add experimental new FS API AbortIO to cancel read request (#9901)
Summary:
Add experimental new API AbortIO in FileSystem to abort the
read requests submitted asynchronously through ReadAsync API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9901

Test Plan: Existing tests

Reviewed By: anand1976

Differential Revision: D35885591

Pulled By: akankshamahajan15

fbshipit-source-id: df3944e6e9e6e487af1fa688376b4abb6837fb02
2022-04-25 14:20:03 -07:00
Akanksha Mahajan 36bc3da97f Fix segfault in FilePrefetchBuffer with async_io enabled (#9777)
Summary:
If FilePrefetchBuffer object is destroyed and then later Poll() calls callback on object which has been destroyed, it gives segfault on accessing destroyed object. It was caught after adding unit tests that tests Posix implementation of ReadAsync and Poll APIs.
This PR also updates and fixes existing IOURing tests which were not running locally because RocksDbIOUringEnable function wasn't defined and IOUring was disabled for those tests

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777

Test Plan: Added new unit test

Reviewed By: anand1976

Differential Revision: D35254002

Pulled By: akankshamahajan15

fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1
2022-04-04 15:35:43 -07:00
KNOEEE cb4d188a34 Fix a bug in PosixClock (#9695)
Summary:
Multiplier here should be 1e6 to get microseconds.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9695

Reviewed By: ajkr

Differential Revision: D34897086

Pulled By: jay-zhuang

fbshipit-source-id: 9c1d0811ea740ba0a007edc2da199edbd000b88b
2022-03-21 16:11:02 -07:00
Yanqin Jin 3bdbf67e1a Fix race condition caused by concurrent accesses to forceMmapOff_ when opening Posix WritableFile (#9685)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9685

Our TSAN reports a race condition as follows when running test
```
gtest-parallel -r 100 ./external_sst_file_test --gtest_filter=ExternalSSTFileTest.MultiThreaded
```
leads to the following

```
WARNING: ThreadSanitizer: data race (pid=2683148)
  Write of size 1 at 0x556fede63340 by thread T7:
    #0 rocksdb::(anonymous namespace)::PosixFileSystem::OpenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, bool, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:334 (external_sst_file_test+0xb61ac4)
    #1 rocksdb::(anonymous namespace)::PosixFileSystem::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:382 (external_sst_file_test+0xb5ba96)
    #2 rocksdb::CompositeEnv::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/env/composite_env.cc:334 (external_sst_file_test+0xa6ab7f)
    #3 rocksdb::EnvWrapper::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/include/rocksdb/env.h:1428 (external_sst_file_test+0x561f3e)
Previous read of size 1 at 0x556fede63340 by thread T4:
    #0 rocksdb::(anonymous namespace)::PosixFileSystem::OpenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, bool, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/fs_posix.cc:328 (external_sst_file_test+0xb61a70)
    #1 rocksdb::(anonymous namespace)::PosixFileSystem::ReopenWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
...
```

Fix by making sure the following block gets executed only once:
```
      if (!checkedDiskForMmap_) {
        // this will be executed once in the program's lifetime.
        // do not use mmapWrite on non ext-3/xfs/tmpfs systems.
        if (!SupportsFastAllocate(fname)) {
          forceMmapOff_ = true;
        }
        checkedDiskForMmap_ = true;
      }
```

Reviewed By: pdillinger

Differential Revision: D34780308

fbshipit-source-id: b761f66b24c8b5b8389d86ea371c8542b8d869d5
2022-03-17 19:50:30 -07:00
Jay Zhuang efd767d14a Fix build for io_uring (#9690)
Summary:
Minor fix for build failure:
```
./env/io_posix.h:68:33: error: unused parameter 'len' [-Werror=unused-parameter]
   68 |                          size_t len, size_t iov_len, bool async_read,
      |                          ~~~~~~~^~~
```
Only happens for release build with io_uring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9690

Test Plan: build pass with io_uring

Reviewed By: akankshamahajan15

Differential Revision: D34846347

Pulled By: jay-zhuang

fbshipit-source-id: 2d7afb585097262d7722ef1beac486fc8ef28419
2022-03-12 22:12:18 -08:00
Akanksha Mahajan 8465cccde2 Posix API support for Async Read and Poll APIs (#9578)
Summary:
Provide support for Async Read and Poll in Posix file system using IOUring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9578

Test Plan: In progress

Reviewed By: anand1976

Differential Revision: D34690256

Pulled By: akankshamahajan15

fbshipit-source-id: 291cbd1380a3cb904b726c34c0560d1b2ce44a2e
2022-03-10 18:28:31 -08:00
anand76 7574841aac Fix issue #9627 (#9657)
Summary:
SMB mounts do not support hard links. The ENOTSUP error code is
returned, which should be interpreted by PosixFileSystem as
IOStatus::NotSupported().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9657

Reviewed By: mrambacher

Differential Revision: D34634783

Pulled By: anand1976

fbshipit-source-id: 0d57f5b2e6118e4c20e9ed1a293327428c3aecac
2022-03-07 11:39:31 -08:00
Peter Dillinger ce60d0cbe5 Test refactoring for Backups+Temperatures (#9655)
Summary:
In preparation for more support for file Temperatures in BackupEngine,
this change does some test refactoring:
* Move DBTest2::BackupFileTemperature test to
BackupEngineTest::FileTemperatures, with some updates to make it work
in the new home. This test will soon be expanded for deeper backup work.
* Move FileTemperatureTestFS from db_test2.cc to db_test_util.h, to
support sharing because of above moved test, but split off the "no link"
part to the test needing it.
* Use custom FileSystems in backupable_db_test rather than custom Envs,
because going through Env file interfaces doesn't support temperatures.
* Fix RemapFileSystem to map DirFsyncOptions::renamed_new_name
parameter to FsyncWithDirOptions, which was required because this
limitation caused a crash only after moving to higher fidelity of
FileSystem interface (vs. LegacyDirectoryWrapper throwing away some
parameter details)
* `backupable_options_` -> `engine_options_` as part of the ongoing
work to get rid of the obsolete "backupable" naming.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9655

Test Plan: test code updates only

Reviewed By: jay-zhuang

Differential Revision: D34622183

Pulled By: pdillinger

fbshipit-source-id: f24b7a596a89b9e089e960f4e5d772575513e93f
2022-03-04 12:32:30 -08:00
Akanksha Mahajan d74468e348 Update Poll and ReadAsync API in File System (#9623)
Summary:
Update the signature of Poll and ReadAsync APIs in filesystem.
Instead of unique_ptr, void** will be passed as io_handle and the delete function.
io_handle and delete function should be provided by underlying
FileSystem and its lifetime will be maintained by RocksDB. io_handle
will be deleted by RocksDB once callback is made to update the results or Poll is
called to get the results.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9623

Test Plan: Add a new unit test.

Reviewed By: anand1976

Differential Revision: D34403529

Pulled By: akankshamahajan15

fbshipit-source-id: ea185a5f4c7bec334631e4f781ea7ba4135645f0
2022-03-01 17:11:42 -08:00
mrambacher fe9d495112 Return different Status based on ObjectRegistry::NewObject calls (#9333)
Summary:
This fix addresses https://github.com/facebook/rocksdb/issues/9299.

If attempting to create a new object via the ObjectRegistry and a factory is not found, the ObjectRegistry will return a "NotSupported" status.  This is the same behavior as previously.

If the factory is found but could not successfully create the object, an "InvalidArgument" status is returned.  If the factory returned a reason why (in the errmsg), this message will be in the returned status.

In practice, there are two options in the ConfigOptions that control how these errors are propagated:
- If "ignore_unknown_options=true", then both InvalidArgument and NotSupported status codes will be swallowed internally.  Both cases will return success
- If "ignore_unsupported_options=true", then having no factory will return success but a failing factory will return an error
- If both options are false, both cases (no and failing factory) will return errors.

In practice this likely only changes Customizable that may be partially available.  For example, the JEMallocMemoryAllocator is a built-in allocator that is registered with the system but may not be compiled in.  In this case, the status code for this allocator changed from NotSupported("JEMalloc not available") to InvalidArgumen("JEMalloc not available").  Other Customizable builtins/plugins would have the same semantics.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9333

Reviewed By: pdillinger

Differential Revision: D33517681

Pulled By: mrambacher

fbshipit-source-id: 8033052d4a4a7b88c2d9f90147b1b4467e51f6fd
2022-02-11 05:11:24 -08:00
Jay Zhuang b69f4360ea Fix flaky test EnvPosixTestWithParam.RunMany (#9502)
Summary:
Thread-pool pops a thread function and then run the function,
which may cause thread-pool is empty but the last function is still
running.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9502

Test Plan:
`gtest-parallel ./env_test
--gtest_filter=DefaultEnvWithoutDirectIO/EnvPosixTestWithParam.RunMany/0
-r 10000 -w 1000`

Reviewed By: ajkr

Differential Revision: D34011184

Pulled By: jay-zhuang

fbshipit-source-id: 8c38bef155205bef96fd1c988dcc643a6b2ac270
2022-02-07 09:07:59 -08:00
Peter Dillinger fd3e0f43b3 Require C++17 (#9481)
Summary:
Drop support for some old compilers by requiring C++17 standard
(or higher). See https://github.com/facebook/rocksdb/issues/9388

First modification based on this is to remove some conditional compilation in slice.h (also
better for ODR)

Also in this PR:
* Fix some Makefile formatting that seems to affect ASSERT_STATUS_CHECKED config in
some cases
* Add c_test to NON_PARALLEL_TEST in Makefile
* Fix a clang-analyze reported "potential leak" in lru_cache_test
* Better "compatibility" definition of DEFINE_uint32 for old versions of gflags
* Fix a linking problem with shared libraries in Makefile (`./random_test: error while loading shared libraries: librocksdb.so.6.29: cannot open shared object file: No such file or directory`)
* Always set ROCKSDB_SUPPORT_THREAD_LOCAL and use thread_local (from C++11)
  * TODO in later PR: clean up that obsolete flag
* Fix a cosmetic typo in c.h (https://github.com/facebook/rocksdb/issues/9488)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9481

Test Plan:
CircleCI config substantially updated.

* Upgrade to latest Ubuntu images for each release
* Generally prefer Ubuntu 20, but keep a couple Ubuntu 16 builds with oldest supported
compilers, to ensure compatibility
* Remove .circleci/cat_ignore_eagain except for Ubuntu 16 builds, because this is to work
around a kernel bug that should not affect anything but Ubuntu 16.
* Remove designated gcc-9 build, because the default linux build now uses GCC 9 from
Ubuntu 20.
* Add some `apt-key add` to fix some apt "couldn't be verified" errors
* Generally drop SKIP_LINK=1; work-around no longer needed
* Generally `add-apt-repository` before `apt-get update` as manual testing indicated the
reverse might not work.

Travis:
* Use gcc-7 by default (remove specific gcc-7 and gcc-4.8 builds)
* TODO in later PR: fix s390x "Assembler messages: Error: invalid switch -march=z14" failure

AppVeyor:
* Completely dropped because we are dropping VS2015 support and CircleCI covers
VS >= 2017

Also local testing with old gflags (out of necessity when using ROCKSDB_NO_FBCODE=1).

Reviewed By: mrambacher

Differential Revision: D33946377

Pulled By: pdillinger

fbshipit-source-id: ae077c823905b45370a26c0103ada119459da6c1
2022-02-04 17:13:10 -08:00
mrambacher aae3093719 Introduce a CountedFileSystem for counting file operations (#9283)
Summary:
Added a CountedFileSystem that tracks a number of file operations (opens, closes, deletes, renames, flushes, syncs, fsyncs, reads, writes).    This class was based on the ReportFileOpEnv from db_bench.

This is a stepping stone PR to be able to change the SpecialEnv into a SpecialFileSystem, where several of the file varieties wish to do operation counting.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9283

Reviewed By: pdillinger

Differential Revision: D33062004

Pulled By: mrambacher

fbshipit-source-id: d0d297a7fb9c48c06cbf685e5fa755c27193b6f5
2022-02-03 15:01:23 -08:00
Yanqin Jin 50135c1bf3 Move HDFS support to separate repo (#9170)
Summary:
This PR moves HDFS support from RocksDB repo to a separate repo. The new (temporary?) repo
in this PR serves as an example before we finalize the decision on where and who to host hdfs support. At this point,
people can start from the example repo and fork.

Java/JNI is not included yet, and needs to be done later if necessary.

The goal is to include this commit in RocksDB 7.0 release.

Reference:
https://github.com/ajkr/dedupfs by ajkr

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9170

Test Plan:
Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress.

make check

Reviewed By: ajkr

Differential Revision: D33751662

Pulled By: riversand963

fbshipit-source-id: 22b4db7f31762ed417a20239f5a08dcd1696244f
2022-01-24 20:23:54 -08:00
sdong 1cecd22de9 Increase wait time within EnvPosixTestWithParam.RunMany (#9413)
Summary:
We see:

[ RUN      ] ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0
env/env_test.cc:464: Failure
Expected equality of these values:
  4
  cur
    Which is: 0

The suspicious is that the wait time is not long enough. Increase the wait time to 10s and allows earlier check.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9413

Test Plan: Run the test

Reviewed By: riversand963

Differential Revision: D33697715

fbshipit-source-id: 3d71715562a8cceb694b773276dd9e4e451a18bc
2022-01-24 12:50:18 -08:00
Peter Dillinger e7ac7363b4 Add to HISTORY and minor loose ends from #9294, #9254 (#9386)
Summary:
Loose ends relate to mmap on 32-bit systems. (Testing is more
complicated when the feature was completely disabled on 32-bit.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9386

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D33590715

Pulled By: pdillinger

fbshipit-source-id: f2637036a538a552200adee65b6765fce8cae27b
2022-01-21 13:04:19 -08:00
Yanqin Jin 1a8e9f0e07 Use fcntl(F_FULLFSYNC) on OS X (#9356)
Summary:
Closing https://github.com/facebook/rocksdb/issues/5954

fsync/fdatasync on Linux:
```
(fsync/fdatasync) includes writing through or flushing a disk cache if present.
```

However, on OS X and iOS:
```
(fsync) will flush all data from the host to the drive (i.e. the "permanent storage device"),
the drive itself may not physically write the data to the platters for quite some time and it
may be written in an out-of-order sequence.
```

Solution is to use `fcntl(F_FULLFSYNC)` on OS X so that we get the same
persistence guarantee.

According to OSX man page,
```
The F_FULLFSYNC fcntl asks the drive to flush **all** buffered data to permanent storage.
```
This suggests that it will be no faster than `fsync` on Linux, since Linux, according to its man page,
```
writing through or flushing a disk cache if present
```
It means Linux may not flush **all** data from disk cache.

This is similar to bug reports/fixes in:
- golang: https://github.com/golang/go/issues/26650
- leveldb: 296de8d5b8.

Not sure if we should fallback to fsync since we break persistence contract.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9356

Reviewed By: jay-zhuang

Differential Revision: D33417416

Pulled By: riversand963

fbshipit-source-id: 475548ff9c5eaccde325e0f6842694271cbc8cb7
2022-01-18 20:23:11 -08:00
Si Ke 93b1de4f45 Enable db_test running in Centos 32 bit OS and Alpine 32 bit OS (#9294)
Summary:
Closes https://github.com/facebook/rocksdb/issues/9271

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9294

Reviewed By: riversand963, hx235

Differential Revision: D33586002

Pulled By: pdillinger

fbshipit-source-id: 3d1a2fa71023e108613ff03dbd37a5f954fc4920
2022-01-14 11:58:18 -08:00
mrambacher 1973fcba11 Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362)
Summary:
In order to support old-style regex function registration, restored the original "Register<T>(string, Factory)" method using regular expressions.  The PatternEntry methods were left in place but renamed to AddFactory.  The goal is to allow for the deprecation of the original regex Registry method in an upcoming release.

Added modes to the PatternEntry kMatchZeroOrMore and kMatchAtLeastOne to match * or +, respectively (kMatchAtLeastOne was the original behavior).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9362

Reviewed By: pdillinger

Differential Revision: D33432562

Pulled By: mrambacher

fbshipit-source-id: ed88ab3f9a2ad0d525c7bd1692873f9bb3209d02
2022-01-11 06:33:48 -08:00
mrambacher fe31dc53ca Make the Env class Customizable (#9293)
Summary:
Allows the Env to have options (Configurable) and loads like other Customizable classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9293

Reviewed By: pdillinger, zhichao-cao

Differential Revision: D33181591

Pulled By: mrambacher

fbshipit-source-id: 55e823886c654d214eda9eedd45ccdc54dac14d7
2022-01-04 16:45:49 -08:00
mrambacher 1c39b7952b Remove/Reduce use of Regex in ObjectRegistry/Library (#9264)
Summary:
Added new ObjectLibrary::Entry classes to replace/reduce the use of Regex.  For simple factories that only do name matching, there are "StringEntry" and "AltStringEntry" classes.  For classes that use some semblance of regular expressions, there is a PatternEntry class that can match a name and prefixes.  There is also a class for Customizable::IndividualId format matches.

Added tests for the new derivative classes and got all unit tests to pass.

Resolves https://github.com/facebook/rocksdb/issues/9225.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9264

Reviewed By: pdillinger

Differential Revision: D33062001

Pulled By: mrambacher

fbshipit-source-id: c2d2143bd2d38bdf522705c8280c35381b135c03
2021-12-29 07:56:23 -08:00
Peter Dillinger 0050a73a4f New stable, fixed-length cache keys (#9126)
Summary:
This change standardizes on a new 16-byte cache key format for
block cache (incl compressed and secondary) and persistent cache (but
not table cache and row cache).

The goal is a really fast cache key with practically ideal stability and
uniqueness properties without external dependencies (e.g. from FileSystem).
A fixed key size of 16 bytes should enable future optimizations to the
concurrent hash table for block cache, which is a heavy CPU user /
bottleneck, but there appears to be measurable performance improvement
even with no changes to LRUCache.

This change replaces a lot of disjointed and ugly code handling cache
keys with calls to a simple, clean new internal API (cache_key.h).
(Preserving the old cache key logic under an option would be very ugly
and likely negate the performance gain of the new approach. Complete
replacement carries some inherent risk, but I think that's acceptable
with sufficient analysis and testing.)

The scheme for encoding new cache keys is complicated but explained
in cache_key.cc.

Also: EndianSwapValue is moved to math.h to be next to other bit
operations. (Explains some new include "math.h".) ReverseBits operation
added and unit tests added to hash_test for both.

Fixes https://github.com/facebook/rocksdb/issues/7405 (presuming a root cause)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9126

Test Plan:
### Basic correctness
Several tests needed updates to work with the new functionality, mostly
because we are no longer relying on filesystem for stable cache keys
so table builders & readers need more context info to agree on cache
keys. This functionality is so core, a huge number of existing tests
exercise the cache key functionality.

### Performance
Create db with
`TEST_TMPDIR=/dev/shm ./db_bench -bloom_bits=10 -benchmarks=fillrandom -num=3000000 -partition_index_and_filters`
And test performance with
`TEST_TMPDIR=/dev/shm ./db_bench -readonly -use_existing_db -bloom_bits=10 -benchmarks=readrandom -num=3000000 -duration=30 -cache_index_and_filter_blocks -cache_size=250000 -threads=4`
using DEBUG_LEVEL=0 and simultaneous before & after runs.
Before ops/sec, avg over 100 runs: 121924
After ops/sec, avg over 100 runs: 125385 (+2.8%)

### Collision probability
I have built a tool, ./cache_bench -stress_cache_key to broadly simulate host-wide cache activity
over many months, by making some pessimistic simplifying assumptions:
* Every generated file has a cache entry for every byte offset in the file (contiguous range of cache keys)
* All of every file is cached for its entire lifetime

We use a simple table with skewed address assignment and replacement on address collision
to simulate files coming & going, with quite a variance (super-Poisson) in ages. Some output
with `./cache_bench -stress_cache_key -sck_keep_bits=40`:

```
Total cache or DBs size: 32TiB  Writing 925.926 MiB/s or 76.2939TiB/day
Multiply by 9.22337e+18 to correct for simulation losses (but still assume whole file cached)
```

These come from default settings of 2.5M files per day of 32 MB each, and
`-sck_keep_bits=40` means that to represent a single file, we are only keeping 40 bits of
the 128-bit cache key.  With file size of 2\*\*25 contiguous keys (pessimistic), our simulation
is about 2\*\*(128-40-25) or about 9 billion billion times more prone to collision than reality.

More default assumptions, relatively pessimistic:
* 100 DBs in same process (doesn't matter much)
* Re-open DB in same process (new session ID related to old session ID) on average
every 100 files generated
* Restart process (all new session IDs unrelated to old) 24 times per day

After enough data, we get a result at the end:

```
(keep 40 bits)  17 collisions after 2 x 90 days, est 10.5882 days between (9.76592e+19 corrected)
```

If we believe the (pessimistic) simulation and the mathematical generalization, we would need to run a billion machines all for 97 billion days to expect a cache key collision. To help verify that our generalization ("corrected") is robust, we can make our simulation more precise with `-sck_keep_bits=41` and `42`, which takes more running time to get enough data:

```
(keep 41 bits)  16 collisions after 4 x 90 days, est 22.5 days between (1.03763e+20 corrected)
(keep 42 bits)  19 collisions after 10 x 90 days, est 47.3684 days between (1.09224e+20 corrected)
```

The generalized prediction still holds. With the `-sck_randomize` option, we can see that we are beating "random" cache keys (except offsets still non-randomized) by a modest amount (roughly 20x less collision prone than random), which should make us reasonably comfortable even in "degenerate" cases:

```
197 collisions after 1 x 90 days, est 0.456853 days between (4.21372e+18 corrected)
```

I've run other tests to validate other conditions behave as expected, never behaving "worse than random" unless we start chopping off structured data.

Reviewed By: zhichao-cao

Differential Revision: D33171746

Pulled By: pdillinger

fbshipit-source-id: f16a57e369ed37be5e7e33525ace848d0537c88f
2021-12-16 17:15:13 -08:00
Adam Retter c879910102 Fix fstatfs call for compilation on 32 bit systems (#9251)
Summary:
On some 32-bit systems, BTRFS_SUPER_MAGIC is unsigned while __fsword_t is signed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9251

Reviewed By: ajkr

Differential Revision: D32961651

Pulled By: pdillinger

fbshipit-source-id: 78e85fc1336f304a21e4d5961e60957c90daed63
2021-12-08 22:01:23 -08:00
jsteemann 5384f0af6e Fix compile warnings (#9199)
Summary:
* added missing override specifiers for overriden methods
  this fixes compiler warnings emitted by g++ and clang++ when compile option `-Wsuggest-override` is turned on.
* fix compile warning with -Wmaybe-uninitialized
  g++-11 warns about a _potentially_ uninitialized variable when using `-Wmaybe_uninitialized`:
  ```
      env/env.cc: In member function ‘virtual rocksdb::Status rocksdb::Env::GetHostNameString(std::string*)’:
      env/env.cc:738:66: error: ‘hostname_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
        738 |   Status s = GetHostName(hostname_buf.data(), hostname_buf.size());
            |                                                                  ^
      In file included from /usr/include/c++/11/tuple:39,
                       from /usr/include/c++/11/functional:54,
                       from ./include/rocksdb/env.h:22,
                       from env/env.cc:10:
      /usr/include/c++/11/array:176:7: note: by argument 1 of type ‘const std::array<char, 256>*’ to ‘constexpr std::array<_Tp, _Nm>::size_type std::array<_Tp, _Nm>::size() const [with _Tp = char; long unsigned int _Nm = 256]’ declared here
        176 |       size() const noexcept { return _Nm; }
            |       ^~~~
      env/env.cc:737:37: note: ‘hostname_buf’ declared here
        737 |   std::array<char, kMaxHostNameLen> hostname_buf;
  ```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9199

Reviewed By: jay-zhuang

Differential Revision: D32630703

Pulled By: pdillinger

fbshipit-source-id: 9ea3010b1105a582548e3c3c0db4475b201e4a10
2021-11-24 11:19:06 -08:00
Andrew Kryczka 9bb13c56b3 Use system-wide thread ID in info log lines (#9164)
Summary:
This makes it easier to debug with tools like `ps`. The change only
applies to builds with glibc 2.30+ and _GNU_SOURCE extensions enabled.
We could adopt it in more cases by using the syscall but this is enough
for our build.

Replaces https://github.com/facebook/rocksdb/issues/2973.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9164

Test Plan:
- ran some benchmarks and correlated logged thread IDs with those shown by `ps -L`.
- verified no noticeable regression in throughput for log heavy (more than 700k log lines and over 5k / second) scenario.

Benchmark command:

```
$ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom -compression_type=none -max_bytes_for_level_multiplier=2 -write_buffer_size=262144 -num_levels=7 -max_bytes_for_level_base=2097152 -target_file_size_base=524288 -level_compaction_dynamic_level_bytes=true -max_background_jobs=12 -num=20000000
```

Results before: 15.9MB/s, 15.8MB/s, 16.0MB/s
Results after: 16.3MB/s, 16.3MB/s, 15.8MB/s

- Rely on CI to test the fallback behavior

Reviewed By: riversand963

Differential Revision: D32399660

Pulled By: ajkr

fbshipit-source-id: c24d44fdf7782faa616ef0a0964eaca3539d9c24
2021-11-12 19:46:06 -08:00
Jay Zhuang 29102641dd Skip directory fsync for filesystem btrfs (#8903)
Summary:
Directory fsync might be expensive on btrfs and it may not be needed.
Here are 4 directory fsync cases:
1. creating a new file: dir-fsync is not needed on btrfs, as long as the
   new file itself is synced.
2. renaming a file: dir-fsync is not needed if the renamed file is
   synced. So an API `FsyncAfterFileRename(filename, ...)` is provided
   to sync the file on btrfs. By default, it just calls dir-fsync.
3. deleting files: dir-fsync is forced by set
   `IOOptions.force_dir_fsync = true`
4. renaming multiple files (like backup and checkpoint): dir-fsync is
   forced, the same as above.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8903

Test Plan: run tests on btrfs and non btrfs

Reviewed By: ajkr

Differential Revision: D30885059

Pulled By: jay-zhuang

fbshipit-source-id: dd2730b31580b0bcaedffc318a762d7dbf25de4a
2021-11-03 12:21:27 -07:00
mrambacher f72c834eab Make FileSystem a Customizable Class (#8649)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8649

Reviewed By: zhichao-cao

Differential Revision: D32036059

Pulled By: mrambacher

fbshipit-source-id: 4f1e7557ecac52eb849b83ae02b8d7d232112295
2021-11-02 09:07:11 -07:00
Peter Dillinger ad5325a736 Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).

Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990

Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D31582865

Pulled By: pdillinger

fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-18 23:32:01 -07:00
jsteemann ab6755711b add IsSyncThreadSafe() override to EncryptedWritableFile (#8993)
Summary:
EncryptedWritableFile is derived from FSWritableFile, which implements
the `IsSyncThreadSafe()` function as

    bool IsSyncThreadSafe() const { return false; }

EncryptedWritableFile does not override this method from the base class,
so the `IsSyncThreadSafe()` function on an EncryptedWritableFile will
always return false.
This change adds an override of `IsSyncThreadSafe()` to
EncryptedWritableFile so that the latter will now ask its underlying
`file_` object for the thread-safety of sync operations.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8993

Reviewed By: jay-zhuang

Differential Revision: D31613123

Pulled By: ajkr

fbshipit-source-id: b18625e21a9911744eef3215c29913490e4b6001
2021-10-14 16:14:26 -07:00
mrambacher 13ae16c315 Cleanup includes in dbformat.h (#8930)
Summary:
This header file was including everything and the kitchen sink when it did not need to.  This resulted in many places including this header when they needed other pieces instead.

Cleaned up this header to only include what was needed and fixed up the remaining code to include what was now missing.

Hopefully, this sort of code hygiene cleanup will speed up the builds...

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8930

Reviewed By: pdillinger

Differential Revision: D31142788

Pulled By: mrambacher

fbshipit-source-id: 6b45de3f300750c79f751f6227dece9cfd44085d
2021-09-29 04:04:40 -07:00
mrambacher 6924869867 Make SystemClock into a Customizable Class (#8636)
Summary:
Made SystemClock into a Customizable class, complete with CreateFromString.

Cleaned up some of the existing SystemClock implementations that were redundant (NoSleep was the same as the internal one for MockEnv).

Changed MockEnv construction to allow Clock to be passed to the Memory/MockFileSystem.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8636

Reviewed By: zhichao-cao

Differential Revision: D30483360

Pulled By: mrambacher

fbshipit-source-id: cd0e3a876c39f8c98fe13374c06e8edbd5b9f2a1
2021-09-21 09:23:48 -07:00
anand76 99fe4c5005 Add a gflag for IO uring enable/disable (#8931)
Summary:
In case of IO uring bugs, we need to provide a way for users to turn it off.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8931

Test Plan: Manually run db_bench with/without the option and verify the behavior

Reviewed By: pdillinger

Differential Revision: D31040252

Pulled By: anand1976

fbshipit-source-id: 56f2537d6ac8488c9e126296d8190ad9e0158f70
2021-09-18 10:24:56 -07:00
anand76 7743f033b1 More robust checking of IO uring completion data (#8894)
Summary:
Potential bugs in the IO uring implementation can cause bad data to be returned in the completion queue. Add some checks in the PosixRandomAccessFile::MultiRead completion handling code to catch such errors and fail the entire MultiRead. Also log some diagnostic messages and stack trace.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8894

Reviewed By: siying, pdillinger

Differential Revision: D30826982

Pulled By: anand1976

fbshipit-source-id: af91815ac760e095d6cc0466cf8bd5c10167fd15
2021-09-15 12:44:43 -07:00
mrambacher dafa584fd1 Change the File System File Wrappers to std::unique_ptr (#8618)
Summary:
This allows the wrapper classes to own the wrapped object and eliminates confusion as to ownership.  Previously, many classes implemented their own ownership solutions.  Fixes https://github.com/facebook/rocksdb/issues/8606

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8618

Reviewed By: pdillinger

Differential Revision: D30136064

Pulled By: mrambacher

fbshipit-source-id: d0bf471df8818dbb1770a86335fe98f761cca193
2021-09-13 08:46:19 -07:00
Peter Dillinger e40b04e9fa Fix POSIX LockFile after failure to create file (#8747)
Summary:
Failure to create the lock file (e.g. out of space) could
prevent future LockFile attempts in the same process on the same file
from succeeding.

Also added DEBUG code to fail assertion if PosixFileLock is destroyed
without using UnlockFile (which is a risk because FileLock is in the
public API with virtual destructor).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8747

Test Plan: test added

Reviewed By: ajkr

Differential Revision: D30732543

Pulled By: pdillinger

fbshipit-source-id: 4c30a959566d91f778d6fad3fbbd5f3941b097c1
2021-09-07 22:41:36 -07:00
Peter Dillinger 4750421ece Replace most typedef with using= (#8751)
Summary:
Old typedef syntax is confusing

Most but not all changes with

    perl -pi -e 's/typedef (.*) ([a-zA-Z0-9_]+);/using $2 = $1;/g' list_of_files
    make format

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8751

Test Plan: existing

Reviewed By: zhichao-cao

Differential Revision: D30745277

Pulled By: pdillinger

fbshipit-source-id: 6f65f0631c3563382d43347896020413cc2366d9
2021-09-07 11:31:59 -07:00
Peter Dillinger 13ded69484 Built-in support for generating unique IDs, bug fix (#8708)
Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collision in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.

This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with goal of uniqueness
and portability, not cryptographic strength nor maximum speed.

Specifically:
* Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
rather than in Env implementation details. Callers are now told whether
the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
  * Lots of info from default Env like time, process id, and hostname.
  * std::random_device
  * port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from platform-specific API or
by converting the output of GenerateRawUniqueId.

DB session IDs now use GenerateRawUniqueId while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.

GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.

Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708

Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D30563780

Pulled By: pdillinger

fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
2021-08-30 15:20:41 -07:00
Merlin Mao f58d276764 Make TraceRecord and Replayer public (#8611)
Summary:
New public interfaces:
`TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h".
`Replayer`, available in `rocksdb/utilities/replayer.h`.

User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file.

Unit tests:
- `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
- `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611

Reviewed By: ajkr

Differential Revision: D30266329

Pulled By: autopear

fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
2021-08-11 19:32:46 -07:00
Mark Rambacher 42ba60b3ba Make EncryptionProvider and BlockCipher into Customizable objects (#8354)
Summary:
Made the EncryptionProvider and BlockCipher classes inherit from Customizable.  Added/fixed the CreateFromString method to these classes to create instances from builtin or registered classes.  Added tests to verify that instances can be registered and retrieved as appropriate.

Added the ability to configure the builtin (CTR, ROT13) classes from configurable properties.  Added the appropriate tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8354

Reviewed By: zhichao-cao

Differential Revision: D29558949

Pulled By: mrambacher

fbshipit-source-id: c20286b32d179777e060f51a58943e9b0cf81d04
2021-07-16 07:58:51 -07:00
Peter Dillinger a53d6d25e0 Improve support for valgrind error on reachable (#8503)
Summary:
MyRocks apparently uses valgrind to check for unreachable
unfreed data, which is stricter than our valgrind checks. Internal ref:
D29257815

This patch adds valgrind support to STATIC_AVOID_DESTRUCTION so that it's
not reported with those stricter checks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8503

Test Plan:
make valgrind_test
Also, with modified VALGRIND_OPTS (see Makefile), more kinds of
failures seen before than after this commit.

Reviewed By: ajkr, yizhang82

Differential Revision: D29597784

Pulled By: pdillinger

fbshipit-source-id: 360de157a176aec4d1be99ca20d160ecd47c0873
2021-07-12 17:00:27 -07:00
mrambacher 281ac9c89e Add CreateFrom methods to Env/FileSystem (#8174)
Summary:
- Added CreateFromString method to Env and FilesSystem to replace LoadEnv/Load.  This method/signature is a precursor to making these classes extend Customizable.

- Added CreateFromSystem to Env.  This method standardizes creating an Env from the environment variables.  Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI.  Now the code is more command/standardized.

- Added CreateFromFlags to Env.  These method allows Env to be create from string options (such as GFLAGS options) in a more standard way.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174

Reviewed By: zhichao-cao

Differential Revision: D28999603

Pulled By: mrambacher

fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb
2021-06-15 03:43:48 -07:00
matthewvon 5a2b4ed671 BugFix: fs_posix.cc GetFreeSpace uses wrong value non-root users (#8370)
Summary:
fs_posix.cc GetFreeSpace() calculates free space based upon a call to statvfs().  However, there are two extremely different values in statvfs's returned structure:  f_bfree which is free space for root and f_bavail which is free space for non-root users.  The existing code uses f_bfree.  Many disks have 5 to 10% of the total disk space reserved for root only.  Therefore GetFreeSpace() does not realize that non-root users may not have storage available.

This PR detects whether the effective posix user is root or not, then selects the appropriate available space value.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8370

Reviewed By: mrambacher

Differential Revision: D29032710

Pulled By: jay-zhuang

fbshipit-source-id: 57feba34ed035615a479956d28f98d85735281c0
2021-06-10 11:11:54 -07:00
sdong 871a2cb292 Fix test issue in new env_test tests (#8319)
Summary:
The two new tests added to env_test don't clear sync points, so if tests are run in continuous mode, rather than parallel mode, the next test will trigger previous sync point and fail. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8319

Test Plan: Run the tests in continuous mode which used to fail and see them passing.

Reviewed By: pdillinger

Differential Revision: D28542562

fbshipit-source-id: 4052d487635188fe68a2a9df4b03d97b23f96720
2021-05-19 10:59:02 -07:00
sdong ce0fc71adf Minor improvements in env_test (#8317)
Summary:
Fix typo in comments in env_test and add PermitUncheckedError() to two statuses.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8317

Reviewed By: jay-zhuang

Differential Revision: D28525093

fbshipit-source-id: 7a1ed3e45b6f500b8d2ae19fa339c9368111e922
2021-05-19 10:28:08 -07:00
sdong 60e5af83c1 Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311)
Summary:
Right now return codes by io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled. It is not the good practice. Although these two functions are not supposed to return non-0 values in normal exeuction, people suspect that they might return non-0 value when an interruption happens, and the code might cause hanging.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311

Test Plan: Make sure at least normal test cases still pass.

Reviewed By: anand1976

Differential Revision: D28500828

fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211
2021-05-18 16:09:14 -07:00
sdong e19908cba6 Refactor kill point (#8241)
Summary:
Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to a pointer to fault ingestion fs to the killing class, but it ended up with harder than I thought. Perhaps we'll need to do this in another way. But I thought the refactoring itself is good so I send it out.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241

Test Plan: make release and run crash test for a while.

Reviewed By: anand1976

Differential Revision: D28078486

fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f
2021-05-05 15:50:29 -07:00
Jay Zhuang a89740fbc6 Fix unittest no space issue (#8204)
Summary:
Unittest reports no space from time to time, which can be reproduced on a small memory machine with SHM. It's caused by large WAL files generated during the test, which is preallocated, but didn't truncate during close(). Adding the missing APIs to set preallocation.
It added arm test as nightly build, as the test runs more than 1 hour.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8204

Test Plan: test on small memory arm machine

Reviewed By: mrambacher

Differential Revision: D27873145

Pulled By: jay-zhuang

fbshipit-source-id: f797c429d6bc13cbcc673bc03fcc72adda55f506
2021-04-20 08:42:28 -07:00
Akanksha Mahajan d52b520d51 Integrated BlobDB for backup/restore support (#8129)
Summary:
Add support for blob files for backup/restore like table files.
    Since DB session ID is currently not supported for blob files (there is no place to store it in
    the header), so for blob files uses the
    kLegacyCrc32cAndFileSize naming scheme even if
    share_files_with_checksum_naming is set to kUseDbSessionId.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8129

Test Plan: Add new test units

Reviewed By: ltamasi

Differential Revision: D27408510

Pulled By: akankshamahajan15

fbshipit-source-id: b27434d189a639ef3e6ad165c61a143a2daaf06e
2021-04-07 13:38:54 -07:00
Peter Dillinger 35af0433cf Fix crash test with backup as read-only DB (#8161)
Summary:
Forgot to re-test crash test after adding read-only filesystem
enforcement to https://github.com/facebook/rocksdb/issues/8142. The problem is ReadOnlyFileSystem would reject
CreateDirIfMissing whenever DBOptions::create_if_missing=true. The fix
that is better for users is to allow CreateDirIfMissing in
ReadOnlyFileSystem if the directory exists, so that they don't cause a
failure on using create_if_missing with opening backups as read-only
DBs. Added this option test to the unit test (in addition to being in the
crash test).

Also fixed a couple of lints.

And some better messaging from 'make format' so that when you run it
with uncommitted changes, it's clear that it's only checking the
uncommitted changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8161

Test Plan: local blackbox_crash_test with amplified backup_one_in

Reviewed By: ajkr

Differential Revision: D27614409

Pulled By: pdillinger

fbshipit-source-id: 63ccb626c7e34c200d61c6bca2a8f60da9015179
2021-04-06 23:31:51 -07:00
Peter Dillinger 879357fdb0 Make backups openable as read-only DBs (#8142)
Summary:
A current limitation of backups is that you don't know the
exact database state of when the backup was taken. With this new
feature, you can at least inspect the backup's DB state without
restoring it by opening it as a read-only DB.

Rather than add something like OpenAsReadOnlyDB to the BackupEngine API,
which would inhibit opening stackable DB implementations read-only
(if/when their APIs support it), we instead provide a DB name and Env
that can be used to open as a read-only DB.

Possible follow-up work:

* Add a version of GetBackupInfo for a single backup.
* Let CreateNewBackup return the BackupID of the newly-created backup.

Implementation details:

Refactored ChrootFileSystem to split off new base class RemapFileSystem,
which allows more general remapping of files. We use this base class to
implement BackupEngineImpl::RemapSharedFileSystem.

To minimize API impact, I decided to just add these fields `name_for_open`
and `env_for_open` to those set by GetBackupInfo when
include_file_details=true. Creating the RemapSharedFileSystem adds a bit
to the memory consumption, perhaps unnecessarily in some cases, but this
has been mitigated by (a) only initialize the RemapSharedFileSystem
lazily when GetBackupInfo with include_file_details=true is called, and
(b) using the existing `shared_ptr<FileInfo>` objects to hold most of the
mapping data.

To enhance API safety, RemapSharedFileSystem is wrapped by new
ReadOnlyFileSystem which rejects any attempts to write. This uncovered a
couple of places in which DB::OpenForReadOnly would write to the
filesystem, so I fixed these. Added a release note because this affects
logging.

Additional minor refactoring in backupable_db.cc to support the new
functionality.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8142

Test Plan:
new test (run with ASAN and UBSAN), added to stress test and
ran it for a while with amplified backup_one_in

Reviewed By: ajkr

Differential Revision: D27535408

Pulled By: pdillinger

fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148
2021-04-06 14:37:53 -07:00
Akanksha Mahajan 689b13e639 Add request_id in IODebugContext. (#8045)
Summary:
Add request_id in IODebugContext which will be populated by
    underlying FileSystem for IOTracing purposes. Update IOTracer to trace
    request_id in the tracing records. Provided API
    IODebugContext::SetRequestId which will set the request_id and enable
    tracing for request_id. The API hides the implementation and underlying
    file system needs to call this API directly.

Update DB::StartIOTrace API and remove redundant Env* from the
    argument as its not used and DB already has Env that is passed down to
    IOTracer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8045

Test Plan: Update unit test.

Differential Revision: D26899871

Pulled By: akankshamahajan15

fbshipit-source-id: 56adef52ee5af0fb3060b607c3af1ec01635fa2b
2021-04-01 13:14:51 -07:00
Imanol-Mikel Barba Sabariego 04191e1c5d Adding safer permissions to PosixFilesystem::NewLogger (#8106)
Summary:
We have observed rocksdb databases creating info log files with world-writeable permissions.

The reason why the file is created like so is because stdio streams opened with fopen calls use mode 0666, and while normally most systems have a umask of 022, in some occasions (for instance, while running daemons), you may find that the application is running with a less restrictive umask. The result is that when opening the DB, the LOG file would be created with world-writeable perms:

```
$ ls -lh db/
total 6.4M
-rw-r--r-- 1 ibarba users  115 Mar 24 17:41 000004.log
-rw-r--r-- 1 ibarba users   16 Mar 24 17:41 CURRENT
-rw-r--r-- 1 ibarba users   37 Mar 24 17:41 IDENTITY
-rw-r--r-- 1 ibarba users    0 Mar 24 17:41 LOCK
-rw-rw-r-- 1 ibarba users 114K Mar 24 17:41 LOG
-rw-r--r-- 1 ibarba users  514 Mar 24 17:41 MANIFEST-000003
-rw-r--r-- 1 ibarba users  31K Mar 24 17:41 OPTIONS-000018
-rw-r--r-- 1 ibarba users  31K Mar 24 17:41 OPTIONS-000020
```

This diff replaces the fopen call with a regular open() call restricting mode, and then using fdopen to associate an stdio stream with that file descriptor. Resulting in the following files being created:

```
-rw-r--r-- 1 ibarba users   58 Mar 24 18:16 000004.log
-rw-r--r-- 1 ibarba users   16 Mar 24 18:16 CURRENT
-rw-r--r-- 1 ibarba users   37 Mar 24 18:16 IDENTITY
-rw-r--r-- 1 ibarba users    0 Mar 24 18:16 LOCK
-rw-r--r-- 1 ibarba users 111K Mar 24 18:16 LOG
-rw-r--r-- 1 ibarba users  514 Mar 24 18:16 MANIFEST-000003
-rw-r--r-- 1 ibarba users  31K Mar 24 18:16 OPTIONS-000018
-rw-r--r-- 1 ibarba users  31K Mar 24 18:16 OPTIONS-000020
```

With the correct permissions

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8106

Reviewed By: akankshamahajan15

Differential Revision: D27415377

Pulled By: mrambacher

fbshipit-source-id: 97ac6c215700a7ea306f4a1fdf9fcf64a3cbb202
2021-03-29 20:47:21 -07:00
anand76 7d7f14480e Always truncate the latest WAL file on DB Open (#8122)
Summary:
Currently, we only truncate the latest alive WAL files when the DB is opened. If the latest WAL file is empty or was flushed during Open, its not truncated since the file will be deleted later on in the Open path. However, before deletion, a new WAL file is created, and if the process crash loops between the new WAL file creation and deletion of the old WAL file, the preallocated space will keep accumulating and eventually use up all disk space. To prevent this, always truncate the latest WAL file, even if its empty or the data was flushed.

Tests:
Add unit tests to db_wal_test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8122

Reviewed By: riversand963

Differential Revision: D27366132

Pulled By: anand1976

fbshipit-source-id: f923cc03ef033ccb32b140d36c6a63a8152f0e8e
2021-03-28 10:00:08 -07:00
Jay Zhuang ce6de862c1 Avoid checking errno on success call (#8119)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8119

Reviewed By: sushilpa

Differential Revision: D27365407

Pulled By: jay-zhuang

fbshipit-source-id: 327c09bf76834ce0be4287680640adc8b88bcec2
2021-03-26 18:46:38 -07:00
Jay Zhuang 45c65d6dcf Use thread-safe `strerror_r()` to get error message (#8087)
Summary:
`strerror()` is not thread-safe, using `strerror_r()` instead. The API could be different on the different platforms, used the code from 0deef031cb/folly/String.cpp (L457)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8087

Reviewed By: mrambacher

Differential Revision: D27267151

Pulled By: jay-zhuang

fbshipit-source-id: 4b8856d1ec069d5f239b764750682c56e5be9ddb
2021-03-24 23:07:27 -07:00
Zhichao Cao 8dc6d8c748 Added append with checksum handoff API to hdfs (#8084)
Summary:
Added append with checksum handoff API to hdfs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8084

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D27237823

Pulled By: zhichao-cao

fbshipit-source-id: 93b38db23b1811a6daa049afb89240089ec6f67c
2021-03-23 15:12:03 -07:00
Zhichao Cao dd0447ae2c Add new Append API with DataVerificationInfo to Env WritableFile (#8071)
Summary:
Add the new Append and PositionedAppend API to env WritableFile. User is able to benefit from the write checksum handoff API when using the legacy Env classes. FileSystem already implemented the checksum handoff API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8071

Test Plan: make check, added new unit test.

Reviewed By: anand1976

Differential Revision: D27177043

Pulled By: zhichao-cao

fbshipit-source-id: 430c8331fc81099fa6d00f4fff703b68b9e8080e
2021-03-19 11:44:13 -07:00
mrambacher 1a343bc393 Make ChRootEnv, EncryptedEnv, and TimedEnv into FileSystems (#7968)
Summary:
These classes were wraps of Env that provided only extensions to the FileSystem functionality.  Changed the classes to be FileSystems and the wraps to be of the CompositeEnvWrapper.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7968

Reviewed By: anand1976

Differential Revision: D26900253

Pulled By: mrambacher

fbshipit-source-id: 94001d8024a3c54a1c11adadca2bac66c3af2a77
2021-03-15 19:50:11 -07:00
mrambacher 3dff28cf9b Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033)
Summary:
For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>.  The shared ptr has some performance degradation on certain hardware classes.

For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere.  For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it.  The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource.

There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold.  In those cases, the shared pointer was preserved.

Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17:

6.17: readrandom   :      28.046 micros/op 854902 ops/sec;   61.3 MB/s (355999 of 355999 found)
6.18: readrandom   :      32.615 micros/op 735306 ops/sec;   52.7 MB/s (290999 of 290999 found)
PR: readrandom   :      27.500 micros/op 871909 ops/sec;   62.5 MB/s (367999 of 367999 found)

(Note that the times for 6.18 are prior to revert of the SystemClock).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033

Reviewed By: pdillinger

Differential Revision: D27014563

Pulled By: mrambacher

fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
2021-03-15 04:34:11 -07:00
Peter Dillinger 4b18c46d10 Refactor: add LineFileReader and Status::MustCheck (#8026)
Summary:
Removed confusing, awkward, and undocumented internal API
ReadOneLine and replaced with very simple LineFileReader.

In refactoring backupable_db.cc, this has the side benefit of
removing the arbitrary cap on the size of backup metadata files.

Also added Status::MustCheck to make it easy to mark a Status as
"must check." Using this, I can ensure that after
LineFileReader::ReadLine returns false the caller checks GetStatus().

Also removed some excessive conditional compilation in status.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8026

Test Plan: added unit test, and running tests with ASSERT_STATUS_CHECKED

Reviewed By: mrambacher

Differential Revision: D26831687

Pulled By: pdillinger

fbshipit-source-id: ef749c265a7a26bb13cd44f6f0f97db2955f6f0f
2021-03-09 20:12:38 -08:00
Deepak Ojha cbf38af705 Wal recovery failure with encryption due to zero bytes WAL size. (#7924)
Summary:
Fixes issue https://github.com/facebook/rocksdb/issues/7410

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7924

Reviewed By: jay-zhuang

Differential Revision: D26216805

Pulled By: ajkr

fbshipit-source-id: 5720635ee06003269bfaf8c8616b3671974b08a8
2021-02-05 12:40:52 -08:00
mrambacher 4a09d632c4 Remove Legacy and Custom FileWrapper classes from header files (#7851)
Summary:
Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.

Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use the the CustomEnv class.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851

Reviewed By: anand1976

Differential Revision: D26114816

Pulled By: mrambacher

fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
2021-01-28 22:10:32 -08:00
mrambacher 12f1137355 Add a SystemClock class to capture the time functions of an Env (#7858)
Summary:
Introduces and uses a SystemClock class to RocksDB.  This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.

Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.

There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).

Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858

Reviewed By: pdillinger

Differential Revision: D26006406

Pulled By: mrambacher

fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
2021-01-25 22:09:11 -08:00
Akanksha Mahajan 1d226018af In IOTracing, add filename with each operation in trace file. (#7885)
Summary:
1. In IOTracing, add filename with each IOTrace record. Filename is stored in file object (Tracing Wrappers).
         2. Change the logic of figuring out which additional information (file_size,
            length, offset etc) needs to be store with each operation
            which is different for different operations.
            When new information will be added in future (depends on operation),
            this change would make the future additions simple.

Logic: In IOTraceRecord, io_op_data is added and its
         bitwise positions represent which additional information need
         to added in the record from enum IOTraceOp. Values in IOTraceOp represent bitwise positions.
         So if length and offset needs to be stored (IOTraceOp::kIOLen
         is 1 and IOTraceOp::kIOOffset is 2), position 1 and 2 (from rightmost bit) will be set
         and io_op_data will contain 110.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7885

Test Plan: Updated io_tracer_test and verified the trace file manually.

Reviewed By: anand1976

Differential Revision: D25982353

Pulled By: akankshamahajan15

fbshipit-source-id: ebfc5539cc0e231d7794a6b42b73f5403e360b22
2021-01-25 14:37:35 -08:00
Adam Retter 4926b33742 Improvements to Env::GetChildren (#7819)
Summary:
The main improvement here is to not include `.` or `..` in the results of `Env::GetChildren`. The occurrence of `.` or `..`; it is non-portable, dependent on the Operating System and the File System. See: https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html

There were lots of duplicate checks spread through the RocksDB codebase previously to skip `.` and `..`. This new removes the need for those at the source.

Also some minor fixes to `Env::GetChildren`:
* Improve error handling in POSIX implementation
* Remove unnecessary array allocation on Windows
* Fix struct name for Windows Non-UTF-8 API

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7819

Reviewed By: ajkr

Differential Revision: D25837394

Pulled By: jay-zhuang

fbshipit-source-id: 1e137e7218d38b450af9c083f73d5357abcbba2e
2021-01-09 09:44:34 -08:00
mrambacher e628f59e87 Create a CustomEnv class; Add WinFileSystem; Make LegacyFileSystemWrapper private (#7703)
Summary:
This PR does the following:
-> Creates a WinFileSystem class.  This class is the Windows equivalent of the PosixFileSystem and will be used on Windows systems.
-> Introduces a CustomEnv class.  A CustomEnv is an Env that takes a FileSystem as constructor argument.  I believe there will only ever be two implementations of this class (PosixEnv and WinEnv).  There is still a CustomEnvWrapper class that takes an Env and a FileSystem and wraps the Env calls with the input Env but uses the FileSystem for the FileSystem calls
-> Eliminates the public uses of the LegacyFileSystemWrapper.

With this change in place, there are effectively the following patterns of Env:
- "Base Env classes" (PosixEnv, WinEnv).  These classes implement the core Env functions (e.g. Threads) and have a hard-coded input FileSystem.  These classes inherit from CompositeEnv, implement the core Env functions (threads) and delegate the FileSystem-like calls to the input file system.
- Wrapped Composite Env classes (MemEnv).  These classes take in an Env and a FileSystem.  The core env functions are re-directed to the wrapped env.  The file system calls are redirected to the input file system
- Legacy Wrapped Env classes.  These classes take in an Env input (but no FileSystem).  The core env functions are re-directed to the wrapped env.  A "Legacy File System" is created using this env and the file system calls directed to the env itself.

With these changes in place, the PosixEnv becomes a singleton -- there is only ever one created.  Any other use of the PosixEnv is via another wrapped env.  This cleans up some of the issues with the env construction and destruction.

Additionally, there were places in the code that required had an Env when they required a FileSystem.  Many of these places would wrap the Env with a LegacyFileSystemWrapper instead of using the env->GetFileSystem().  These places were changed, thereby removing layers of additional redirection (LegacyFileSystem --> Env --> Env::FileSystem).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7703

Reviewed By: zhichao-cao

Differential Revision: D25762190

Pulled By: anand1976

fbshipit-source-id: 1a088e97fc916f28ac69c149cd1dcad0ab31704b
2021-01-06 10:49:32 -08:00
mrambacher 55e99688cc No elide constructors (#7798)
Summary:
Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds.  This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it.  In this case,  without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798

Reviewed By: jay-zhuang

Differential Revision: D25680451

Pulled By: pdillinger

fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
2020-12-23 16:55:53 -08:00
Azat Khuzhin 40f2b65008 Eliminate possible race between LockFile() vs UnlockFile() (#7721)
Summary:
LockFile() accessing LockHoldingInfo (element of locked_files) by
reference after mutex_locked_files had been released.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7721

Reviewed By: pdillinger

Differential Revision: D25431839

Pulled By: jay-zhuang

fbshipit-source-id: eefee93f12a8016a98e2466e442af2605b3e2a5e
2020-12-10 09:35:11 -08:00
anand76 9627e342c8 Use default FileSystem in GenerateUniqueId (#7672)
Summary:
Use ```FileSystem::Default``` to read ```/proc/sys/kernel/uuid```, so it works for ```Envs``` with remote ```FileSystem``` as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7672

Reviewed By: riversand963

Differential Revision: D24998702

Pulled By: anand1976

fbshipit-source-id: fa95c1d70f0e4ed17561201f047aa055046d06c3
2020-11-16 20:48:13 -08:00
Peter Dillinger c57f914482 Use NPHash64 in more places (#7632)
Summary:
Since the hashes should not be persisted in output_validator
nor mock_env.

Also updated NPHash64 to use 64-bit seed, and comments.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7632

Test Plan:
make check, and new build setting that enables modification
to NPHash64, to check for behavior depending on specific values. Added
that setting to one of the CircleCI configurations.

Reviewed By: jay-zhuang

Differential Revision: D24833780

Pulled By: pdillinger

fbshipit-source-id: 02a57652ccf1ac105fbca79e77875bb7bf7c071f
2020-11-10 23:42:13 -08:00
Cheng Chang 5e794b0841 Fix a recovery corner case (#7621)
Summary:
Consider the following sequence of events:

1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
2. Syncing MANIFEST failed and db crashed.
3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
4. Db crashed again.
5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.

It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.

After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621

Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.

Reviewed By: riversand963

Differential Revision: D24668144

Pulled By: cheng-chang

fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
2020-11-07 22:23:27 -08:00
Yanqin Jin 394210f280 Remove unused includes (#7604)
Summary:
This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D24579392

Pulled By: riversand963

fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
2020-10-28 23:22:27 -07:00
Andrew Kryczka 6c2c0635c9 Require only one `Logger::Logv()` implementation (#7605)
Summary:
A user who extended `Logger` recently pointed out it is unusual to
require they implement the two-argument `Logv()` overload when they've
already implemented the three-argument `Logv()` overload. I agree with
that and think we can fix it by only calling the two-argument overload
from the default implementation of the three-argument overload. Then
when the three-argument overload is overridden, RocksDB would not
rely on the two-argument overload. Only `Logger::LogHeader()` needed
adjustment to achieve this.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7605

Reviewed By: riversand963

Differential Revision: D24584749

Pulled By: ajkr

fbshipit-source-id: 9aabe040ac761c4c0dbebc4be046967403ecaf21
2020-10-28 10:00:51 -07:00
mrambacher f35f7f2704 Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566)
Summary:
This PR does a few things:

1.  The MockFileSystem class was split out from the MockEnv.  This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one).  The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation.

2.  Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set.  To accomplish this, a few things were needed:
- The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated
- The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory).

3.  Updated the test framework to have a ROCKSDB_GTEST_SKIP macro.  This can be used to flag tests that are skipped.  Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10).

I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV,  both, and neither under both MacOS and RedHat.  A few tests were disabled/skipped for the MEM/ENCRYPTED cases.  The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem.  (I will also push a change to disable those tests soon).  There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged.

Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently.

Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale.  I do not know how to do that, so if someone could write that job, it would be appreciated :)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566

Reviewed By: zhichao-cao

Differential Revision: D24408980

Pulled By: jay-zhuang

fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f
2020-10-27 10:33:09 -07:00
anand76 00751e4292 Add a host location property to TableProperties (#7479)
Summary:
This PR adds support for writing a location identifier of the DB host to SST files as a table property. By default, the hostname is used, but can be overridden by the user. There have been some recent corruptions in files written by ```SstFileWriter``` before checksumming, so this property can be used to trace it back to the writing host and checking the host for hardware isues.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7479

Test Plan: Add new unit tests

Reviewed By: pdillinger

Differential Revision: D24340671

Pulled By: anand1976

fbshipit-source-id: 2038949fd8d160c0633ccb4f9da77740f19fa2a2
2020-10-19 11:38:48 -07:00
Akanksha Mahajan 7b65666cf1 Update IOTrace operations in stackable_db.h (#7514)
Summary:
Update IOTrace operations in stackabledb.h and also trace few
other IO operations.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7514

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D24151202

Pulled By: akankshamahajan15

fbshipit-source-id: 112cd3d2041f8c6398b7b0ba1a783b8c93224d4a
2020-10-14 10:16:15 -07:00
Jay Zhuang 98c1333806 Disable a known flaky test: RandomAccessUniqueIDDeletes (#7511)
Summary:
It's a known issue, which is tracked in https://github.com/facebook/rocksdb/issues/7405, https://github.com/facebook/rocksdb/issues/7470. Disable it for
now.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7511

Reviewed By: zhichao-cao

Differential Revision: D24145075

Pulled By: jay-zhuang

fbshipit-source-id: 1858497972f2baba617867aaeac30d93b8305c80
2020-10-08 09:40:59 -07:00
Zhichao Cao 4146276885 Add ldb_cmd_test to ASSERT_STATUS_CHECKED list (#7499)
Summary:
Add ldb_cmd_test to ASSERT_STATUS_CHECKED list

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7499

Test Plan: pass ASSERT_STATUS_CHECKED=1 make -j48 ldb_cmd_test

Reviewed By: cheng-chang

Differential Revision: D24086203

Pulled By: zhichao-cao

fbshipit-source-id: 29592202b1d4335e566de15e7937269d98d57841
2020-10-08 00:00:48 -07:00
Andrew Kryczka 1600aac46f Flush info log for warning and higher severity (#7462)
Summary:
After unclean crash, the tail of the log could look as follows due to block buffering, even when the call to `ROCKSDB_LOG_ERROR()` finished.

```
2020/09/29-13:54:39.596710 7f67025fe700 [ERROR] [/db_impl/db_impl_compaction_flush.cc:2500] Waiting after background compaction err
```

This PR forces the flush while logging warning severity or higher to prevent that case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7462

Reviewed By: riversand963

Differential Revision: D24000154

Pulled By: ajkr

fbshipit-source-id: 3bf5f1e69a62ee10e84095cebc88937a8f81b4ad
2020-09-29 16:06:14 -07:00
Zhichao Cao 0ce9b3a22d Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419)
Summary:
Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. This PR only include the API definition, no functional codes are added to unblock other developers which depend on these APIs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7419

Test Plan: make -j32

Reviewed By: pdillinger

Differential Revision: D23883196

Pulled By: zhichao-cao

fbshipit-source-id: 94676c26bc56144cc32e3661f84f21eccd790411
2020-09-23 19:02:26 -07:00
Akanksha Mahajan 98ac6b646a Add IO Tracer Parser (#7333)
Summary:
Implement a parsing tool io_tracer_parser that takes IO trace file (binary file) with command line argument --io_trace_file and output file with --output_file and dumps the IO trace records in outputfile in human readable form.

Also added unit test cases that generates IO trace records and calls io_tracer_parse to parse those records.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333

Test Plan:
make check -j64,
 Add unit test cases.

Reviewed By: anand1976

Differential Revision: D23772360

Pulled By: akankshamahajan15

fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
2020-09-23 15:50:26 -07:00
Xavier Deguillard 249f2b59a0 build: make it compile with @mode/win (#7406)
Summary:
While rocksdb can compile on both macOS and Linux with Buck, it couldn't be
compiled on Windows. The only way to compile it on Windows was with the CMake
build.

To keep the multi-platform complexity low, I've simply included all the Windows
bits in the TARGETS file, and added large #if blocks when not on Windows, the
same was done on the posix specific files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7406

Test Plan:
On my devserver:
  buck test //rocksdb/...
On Windows:
  buck build mode/win //rocksdb/src:rocksdb_lib

Reviewed By: pdillinger

Differential Revision: D23874358

Pulled By: xavierd

fbshipit-source-id: 8768b5d16d7e8f44b5ca1e2483881ca4b24bffbe
2020-09-23 12:55:54 -07:00
mrambacher 67bd5401e9 Changes to EncryptedEnv public API (#7279)
Summary:
Cleaned up the public API to use the EncryptedEnv.  This change will allow providers to be developed and added to the system easier in the future.  It will also allow better integration in the future with the OPTIONS file.

- The internal classes were moved out of the public API into an internal "env_encryption_ctr.h" header.  Short-cut constructors were added to provide the original API functionality.
- The APIs to the constructors were changed to take shared_ptr, rather than raw pointers or references to allow better memory management and alternative implementations.
- CreateFromString methods were added to allow future expansion to other provider and cipher implementations through a standard API.

Additionally, there was a code duplication in the NewXXXFile methods.  This common code was moved under a templatized function.

A first-pass at structuring the code was made to potentially allow multiple EncryptionProviders in a single EncryptedEnv.  The idea was that different providers may use different cipher keys or different versions/algorithms.  The EncryptedEnv should have some means of picking different providers based on information.  The groundwork was started for this (the use of the provider_ member variable was localized) but the work has not been completed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7279

Reviewed By: jay-zhuang

Differential Revision: D23709440

Pulled By: zhichao-cao

fbshipit-source-id: 0e845fff0e03a52603eb9672b4ade32d063ff2f2
2020-09-15 17:14:10 -07:00
Akanksha Mahajan 0de335e076 Use FSRandomRWFilePtr Object to call underlying file system. (#7198)
Summary:
Replace FSRandomRWFile pointer with FSRandomRWFilePtr object in the rocksdb internal code.
This new object wraps FSRandomRWFile pointer.

Objective: If tracing is enabled, FSRandomRWFile object returns FSRandomRWFileTracingWrapper pointer that includes all necessary information in IORecord and calls underlying FileSystem and invokes IOTracer to dump that record in a binary file. If tracing is disabled then, underlying FileSystem pointer is returned directly.
FSRandomRWFilePtr wrapper class is added to bypass the FSRandomRWFileWrapper when
tracing is disabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7198

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D23421116

Pulled By: akankshamahajan15

fbshipit-source-id: 8a5ba0e7d9c1ba34c3a6f29829b107c5f09ab6a3
2020-09-08 12:21:58 -07:00