Commit graph

170 commits

Author SHA1 Message Date
Levi Tamasi b00fa5597e Fix the handling of wide-column base values in the max_successive_merges logic (#11913)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11913

The `max_successive_merges` logic currently does not handle wide-column base values correctly, since it uses the `Get` API, which only returns the value of the default column. The patch fixes this by switching to `GetEntity` and passing all columns (if applicable) to the merge operator.

Reviewed By: jaykorean

Differential Revision: D49795097

fbshipit-source-id: 75eb7cc9476226255062cdb3d43ab6bd1cc2faa3
2023-10-02 16:25:25 -07:00
Jay Huh 2cfe53ec05 Add helpful message for ldb when unknown option found (#11907)
Summary:
Users may run into an issue when running ldb on db that's in a different version and they have different set of options: `Failed: Invalid argument: Could not find option: <MISSING_OPTION>`

They can work around this by setting `--ignore_unknown_options`, but the error message is not clear for users to find why the option is missing. It's also hard for the users to find the `ignore_unknown_options` option especially if they are not familiar with the codebase or `ldb` tool.

This PR changes the error message to help users to find out what's wrong and possible workaround for the issue

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11907

Test Plan:
Testing by reproducing the issue locally
```
❯./ldb --db=/data/users/jewoongh/db_crash_whitebox_T164195541/ get a
Failed: Invalid argument: Could not find option: : unknown_option_test
This tool was built with version 8.8.0. If your db is in a different version, please try again with option --ignore_unknown_options.
```

Reviewed By: jowlyzhang

Differential Revision: D49762291

Pulled By: jaykorean

fbshipit-source-id: 895570150fde886d5ec524908c4b2664c9230ac9
2023-09-29 09:58:40 -07:00
Jay Huh 4b79e8c003 GetEntity and PutEntity Support in ldb (#11796)
Summary:
- `get_entity` and `put_entity` command support in ldb
- Input Format for `put_entity`: `ldb --db=<DB_PATH> put_entity <KEY> <COLUMN_1_NAME>:<COLUMN_1_VALUE> <COLUMN_2_NAME>:<COLUMN_2_VALUE> ...`
- Output Format for `get_entity`: `<COLUMN_1_NAME>:<COLUMN_1_VALUE> <COLUMN_2_NAME>:<COLUMN_2_VALUE>`
- If `get_entity` is called against non-wide column value (existing behavior), empty key (kDefaultWideColumnName) will be printed, appended by `:<COLUMN_VALUE>`
- If `get` is called against wide column value (existing behavior), first column value is printed if the first column name is kDefaultWideColumnName.

# Test

Checks for `put_entity` and `get_entity` added in `ldb_test.py`
```
❯ python3 tools/ldb_test.py                                                                                                                                                                                                                                                                                                                                                                    took 45s at 10:45:44 AM
Running testBlobBatchPut...
.Running testBlobDump
.Running testBlobPut...
.Running testBlobStartingLevel...
.Running testCheckConsistency...
.Running testColumnFamilies...
.Running testCountDelimDump...
.Running testCountDelimIDump...
.Running testDumpLiveFiles...
.Running testDumpLoad...
Warning: 7 bad lines ignored.
.Running testGetProperty...
.Running testHexPutGet...
.Running testIDumpBasics...
.Running testIDumpDecodeBlobIndex...
.Running testIngestExternalSst...
.Running testInvalidCmdLines...
.Running testListColumnFamilies...
.Running testListLiveFilesMetadata...
.Running testManifestDump...
.Running testMiscAdminTask...
Compacting the db...
Sequence,Count,ByteSize,Physical Offset,Key(s)
.Running testSSTDump...
.Running testSimpleStringPutGet...
.Running testStringBatchPut...
.Running testTtlPutGet...
.Running testWALDump...
.
----------------------------------------------------------------------
Ran 25 tests in 57.742s
```

Manual Test
```
# Invalid format for wide columns
❯ ./ldb --db=/tmp/test_db put_entity x4 x5
Failed: wide column format needs to be <column_name>:<column_value> (did you mean put <key> <value>?)

# empty column name (kDefaultWideColumnName)
❯ ./ldb --db=/tmp/test_db put_entity x4 :x5
OK
❯ ./ldb --db=/tmp/test_db get_entity x4
:x5
❯ ./ldb --db=/tmp/test_db get x4
x5

❯ ./ldb --db=/tmp/test_db put_entity a1 :z1 b1:c1 b2:f1
OK
❯ ./ldb --db=/tmp/test_db get_entity a1
:z1 b1:c1 b2:f1

# Keeping the existing behavior if `get` was called on wide column values
❯ ./ldb --db=/tmp/test_db get a1
z1

# Scan
❯ ./ldb --db=/tmp/test_db scan
a1 ==> b1:c1 b2:f1
x4 ==> x5
x5 ==> cn1:cv1 cn2:cv2

# Scan hex
❯ ./ldb --db=/tmp/test_db scan --hex
0x6131 ==> 0x6231:0x6331 0x6232:0x6631
0x7834 ==> 0x7835
0x7835 ==> 0x636E31:0x637631 0x636E32:0x637632

# More testing with hex values
❯ ./ldb --db=/tmp/test_db get_entity 0x6131 --hex
0x6231:0x6331 0x6232:0x6631

❯ ./ldb --db=/tmp/test_db get_entity 0x78 --hex
Failed: GetEntity failed: NotFound:

❯ ./ldb --db=/tmp/test_db get_entity 0x7834 --hex
:0x7835

❯ ./ldb --db=/tmp/test_db put_entity 0x7834 0x6234:0x6635 --hex
OK

❯ ./ldb --db=/tmp/test_db get_entity 0x7834 --hex
0x6234:0x6635

❯ ./ldb --db=/tmp/test_db get_entity 0x7834 --key_hex
b4:f5

❯ ./ldb --db=/tmp/test_db get_entity x4 --value_hex
0x6234:0x6635
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11796

Reviewed By: jowlyzhang

Differential Revision: D48978141

Pulled By: jaykorean

fbshipit-source-id: 4f87c222417ed90a6dbf39bd7b0f068b01e68393
2023-09-12 16:32:40 -07:00
Levi Tamasi 8fc78a3a9e Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11813

The patch adds a couple of helper methods `WideColumnsHelper::{Has,Get}DefaultColumn` to eliminate some code duplication.

Reviewed By: jaykorean

Differential Revision: D49166682

fbshipit-source-id: f229ca5b94599f7445a0112b10f8317292505c82
2023-09-11 16:32:32 -07:00
Jay Huh 47be3ffffb Minor refactor on LDB command for wide column support and release note (#11777)
Summary:
As mentioned in https://github.com/facebook/rocksdb/issues/11754 , refactor to clean up some nearly identical logic. This PR changes the existing debugging string format of Scan command as the following.

```
❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex
```

Before
```
0x6669727374 : :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
0x7365636F6E64 : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
0x7468697264 : 0x62617A
```
After
```
0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
0x7468697264 ==> 0x62617A
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11777

Test Plan:
```
❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump
first ==> :hello attr_name1:foo attr_name2:bar
second ==> attr_one:two attr_three:four
third ==> baz
Keys in range: 3

❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan
first ==> :hello attr_name1:foo attr_name2:bar
second ==> attr_one:two attr_three:four
third ==> baz

❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ dump --hex
0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
0x7468697264 ==> 0x62617A
Keys in range: 3

❯ ./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/ scan --hex
0x6669727374 ==> :0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
0x7365636F6E64 ==> 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
0x7468697264 ==> 0x62617A
```

Reviewed By: jowlyzhang

Differential Revision: D48876755

Pulled By: jaykorean

fbshipit-source-id: b1c608a810fe038999ac528b690d398abf5f21d7
2023-08-31 16:17:03 -07:00
Jay Huh ea9a5b2914 Wide Column support in ldb (#11754)
Summary:
wide_columns can now be pretty-printed in the following commands
- `./ldb dump_wal`
- `./ldb dump`
- `./ldb idump`
- `./ldb dump_live_files`
- `./ldb scan`
- `./sst_dump --command=scan`

There are opportunities to refactor to reduce some nearly identical code. This PR is initial change to add wide column support in `ldb` and `sst_dump` tool. More PRs to come for the refactor.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11754

Test Plan:
**New Tests added**
- `WideColumnsHelperTest::DumpWideColumns`
- `WideColumnsHelperTest::DumpSliceAsWideColumns`

**Changes added to existing tests**
- `ExternalSSTFileTest::BasicMixed` added to cover mixed case (This test should have been added in https://github.com/facebook/rocksdb/issues/11688). This test does not verify the ldb or sst_dump output. This test was used to create test SST files having some rows with wide columns and some without and the generated SST files were used to manually test sst_dump_tool.
- `createSST()` in `sst_dump_test` now takes `wide_column_one_in` to add wide column value in SST

**dump_wal**
```
./ldb dump_wal --walfile=/tmp/rocksdbtest-226125/db_wide_basic_test_2675429_2308393776696827948/000004.log --print_value --header
```
```
Sequence,Count,ByteSize,Physical Offset,Key(s) : value
1,1,59,0,PUT_ENTITY(0) : 0x:0x68656C6C6F 0x617474725F6E616D6531:0x666F6F 0x617474725F6E616D6532:0x626172
2,1,34,42,PUT_ENTITY(0) : 0x617474725F6F6E65:0x74776F 0x617474725F7468726565:0x666F7572
3,1,17,7d,PUT(0) : 0x7468697264 : 0x62617A
```

**idump**
```
./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ idump
```
```
'first' seq:1, type:22 => :hello attr_name1:foo attr_name2:bar
'second' seq:2, type:22 => attr_one:two attr_three:four
'third' seq:3, type:1 => baz
Internal keys in range: 3
```

**SST Dump from dump_live_files**
```
./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ compact
./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump_live_files
```
```
...
==============================
SST Files
==============================
/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst level:1
------------------------------
Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst
Sst file format: block-based
'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar
'second' seq:0, type:22 => attr_one:two attr_three:four
'third' seq:0, type:1 => baz
...
```

**dump**
```
./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ dump
```
```
first ==> :hello attr_name1:foo attr_name2:bar
second ==> attr_one:two attr_three:four
third ==> baz
Keys in range: 3
```

**scan**
```
./ldb --db=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/ scan
```
```
first : :hello attr_name1:foo attr_name2:bar
second : attr_one:two attr_three:four
third : baz
```

**sst_dump**
```
./sst_dump --file=/tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst --command=scan
```
```
options.env is 0x7ff54b296000
Process /tmp/rocksdbtest-226125/db_wide_basic_test_3481961_2308393776696827948/000013.sst
Sst file format: block-based
from [] to []
'first' seq:0, type:22 => :hello attr_name1:foo attr_name2:bar
'second' seq:0, type:22 => attr_one:two attr_three:four
'third' seq:0, type:1 => baz
```

Reviewed By: ltamasi

Differential Revision: D48837999

Pulled By: jaykorean

fbshipit-source-id: b0280f0589d2b9716bb9b50530ffcabb397d140f
2023-08-30 12:45:52 -07:00
Yu Zhang 2a9f3b6cc5 Try to use a db's OPTIONS file for some ldb commands (#11721)
Summary:
For some ldb commands that doesn't need to open the DB, it's still useful to use the DB's existing OPTIONS file if it's available.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11721

Reviewed By: pdillinger

Differential Revision: D48485540

Pulled By: jowlyzhang

fbshipit-source-id: 2d2db837523044066f1a2c4b59a5c03f6cd35e6b
2023-08-21 15:04:22 -07:00
huangmengbin 98d0f6ec08 fix: VersionSet::DumpManifest (#11605)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/11604

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11605

Reviewed By: jowlyzhang

Differential Revision: D47459254

Pulled By: ajkr

fbshipit-source-id: 4420e443fbf4bd01ddaa2b47285fc4445bf36246
2023-07-19 10:44:10 -07:00
Hui Xiao 151242ce46 Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288)
Summary:
**Context:**
The existing stat rocksdb.sst.read.micros does not reflect each of compaction and flush cases but aggregate them, which is not so helpful for us to understand IO read behavior of each of them.

**Summary**
- Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.{flush/compaction}.read.micros`
   - Fixed the default histogram in `RandomAccessFileReader`
- New field `ReadOptions/IOOptions::io_activity`; Pass `ReadOptions` through paths under db open, flush and compaction to where we can prepare `IOOptions` and pass it to `RandomAccessFileReader`
- Use `thread_status_util` for assertion in `DbStressFSWrapper` for continuous testing on we are passing correct `io_activity` under db open, flush and compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288

Test Plan:
- **Stress test**
- **Db bench 1: rocksdb.sst.read.micros COUNT ≈ sum of rocksdb.file.read.flush.micros's and rocksdb.file.read.compaction.micros's.**  (without blob)
     - May not be exactly the same due to `HistogramStat::Add` only guarantees atomic not accuracy across threads.
```
./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 (-use_plain_table=1 -prefix_size=10)
```
```
// BlockBasedTable
rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805
rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116
rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689

// PlainTable
Does not apply
```
- **Db bench 2: performance**

**Read**

SETUP: db with 900 files
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none
```run till convergence
```
./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3
```
Pre-change
`readrandom [AVG 60 runs] : 21568 (± 248) ops/sec`
Post-change (no regression, -0.3%)
`readrandom [AVG 60 runs] : 21486 (± 236) ops/sec`

**Compaction/Flush**run till convergence
```
./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none

rocksdb.sst.read.micros  COUNT : 33820
rocksdb.sst.read.flush.micros COUNT : 1800
rocksdb.sst.read.compaction.micros COUNT : 32020
```
Pre-change
`fillseq [AVG 46 runs] : 1391 (± 214) ops/sec;    0.7 (± 0.1) MB/sec`

Post-change (no regression, ~-0.4%)
`fillseq [AVG 46 runs] : 1385 (± 216) ops/sec;    0.7 (± 0.1) MB/sec`

Reviewed By: ajkr

Differential Revision: D44007011

Pulled By: hx235

fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132
2023-04-21 09:07:18 -07:00
sdong 4720ba4391 Remove RocksDB LITE (#11147)
Summary:
We haven't been actively mantaining RocksDB LITE recently and the size must have been gone up significantly. We are removing the support.

Most of changes were done through following comments:

unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`

by Peter Dillinger. Others changes were manually applied to build scripts, CircleCI manifests, ROCKSDB_LITE is used in an expression and file db_stress_test_base.cc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147

Test Plan: See CI

Reviewed By: pdillinger

Differential Revision: D42796341

fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
2023-01-27 13:14:19 -08:00
sdong 48fe921754 Run clang format against files under tools/ and db_stress_tool/ (#10868)
Summary:
Some lines of .h and .cc files are not properly fomatted. Clear them up with clang format.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10868

Test Plan: Watch existing CI to pass

Reviewed By: ajkr

Differential Revision: D40683485

fbshipit-source-id: 491fbb78b2cdcb948164f306829909ad816d5d0b
2022-10-25 14:29:41 -07:00
Hui Xiao e484b81eee Sync dir containing CURRENT after RenameFile on CURRENT as much as possible (#10573)
Summary:
**Context:**
Below crash test revealed a bug that directory containing CURRENT file (short for `dir_contains_current_file` below) was not always get synced after a new CURRENT is created and being called with `RenameFile` as part of the creation.

This bug exposes a risk that such un-synced directory containing the updated CURRENT can’t survive a host crash (e.g, power loss) hence get corrupted. This then will be followed by a recovery from a corrupted CURRENT that we don't want.

The root-cause is that a nullptr `FSDirectory* dir_contains_current_file` sometimes gets passed-down to `SetCurrentFile()` hence in those case `dir_contains_current_file->FSDirectory::FsyncWithDirOptions()` will be skipped  (which otherwise will internally call`Env/FS::SyncDic()` )
```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_data_in_errors=True --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=134.8015470676662 --bottommost_compression_type=disable --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_ttl=100 --compression_max_dict_buffer_bytes=511 --compression_max_dict_bytes=16384 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=$exp --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --mmap_read=1 --nooverwritepercent=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync_fault_injection=1 --target_file_size_base=2097 --target_file_size_multiplier=2 --test_batches_snapshots=1 --top_level_index_pinning=1 --use_full_merge_v1=1 --use_merge=1 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --write_buffer_size=4194 --writepercent=35
```

```
stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
db_stress: utilities/fault_injection_fs.cc:748: virtual rocksdb::IOStatus rocksdb::FaultInjectionTestFS::RenameFile(const std::string &, const std::string &, const rocksdb::IOOptions &, rocksdb::IODebugContext *): Assertion `tlist.find(tdn.second) == tlist.end()' failed.`
```

**Summary:**
The PR ensured the non-test path pass down a non-null dir containing CURRENT (which is by current RocksDB assumption just db_dir) by doing the following:
- Renamed `directory_to_fsync` as `dir_contains_current_file` in `SetCurrentFile()` to tighten the association between this directory and CURRENT file
- Changed `SetCurrentFile()` API to require `dir_contains_current_file` being passed-in, instead of making it by default nullptr.
    -  Because `SetCurrentFile()`'s `dir_contains_current_file` is passed down from `VersionSet::LogAndApply()` then `VersionSet::ProcessManifestWrites()` (i.e, think about this as a chain of 3 functions related to MANIFEST update), these 2 functions also got refactored to require `dir_contains_current_file`
- Updated the non-test-path callers of these 3 functions to obtain and pass in non-nullptr `dir_contains_current_file`, which by current assumption of RocksDB, is the `FSDirectory* db_dir`.
    - `db_impl` path will obtain `DBImpl::directories_.getDbDir()` while others with no access to such `directories_` are obtained on the fly by creating such object `FileSystem::NewDirectory(..)` and manage it by unique pointers to ensure short life time.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10573

Test Plan:
- `make check`
- Passed the repro db_stress command
- For future improvement, since we currently don't assert dir containing CURRENT to be non-nullptr due to https://github.com/facebook/rocksdb/pull/10573#pullrequestreview-1087698899, there is still chances that future developers mistakenly pass down nullptr dir containing CURRENT thus resulting skipped sync dir and cause the bug again. Therefore a smarter test (e.g, such as quoted from ajkr  "(make) unsynced data loss to be dropping files corresponding to unsynced directory entries") is still needed.

Reviewed By: ajkr

Differential Revision: D39005886

Pulled By: hx235

fbshipit-source-id: 336fb9090d0cfa6ca3dd580db86268007dde7f5a
2022-08-29 17:35:21 -07:00
Jay Zhuang 6a0010eb46 ldb to display public unique id and dump work with key range (#10417)
Summary:
2 ldb command improvements:
1. `ldb manifest_dump --verbose` display both the internal unique id and public id. which is useful to manually check sst_unique_id between manifest and SST;
2. `ldb dump` has `--from/to` option, but not working. Add support for that.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10417

Test Plan:
run the command locally
```
$ ldb manifest_dump --path=MANIFEST-000026 --verbose
...
AddFile: 0 18 1023 'bar' seq:6, type:1 .. 'foo' seq:5, type:1 oldest_ancester_time:1658787615 file_creation_time:1658787615 file_checksum: file_checksum_func_name: Unknown unique_id(internal): {8800772265202404198,16149248642318466463} public_unique_id: F3E0A029B631D7D4-6E402DE08E771780
```
```
$ ldb dump --path=000036.sst --from=key000006 --to=key000009
Sst file format: block-based
'key000006' seq:2411, type:1 => value6
'key000007' seq:2412, type:1 => value7
'key000008' seq:2413, type:1 => value8
...
```

Reviewed By: ajkr

Differential Revision: D38136140

Pulled By: jay-zhuang

fbshipit-source-id: 8be6eeaa07ff9f089e33011ebe90fd0b69d33bf3
2022-07-26 20:40:18 -07:00
Gang Liao ec4ebeff30 Support prepopulating/warming the blob cache (#10298)
Summary:
Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298

Reviewed By: ltamasi

Differential Revision: D37908743

Pulled By: gangliao

fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1
2022-07-17 07:13:59 -07:00
Gang Liao deff48bcef Add blob source to retrieve blobs in RocksDB (#10198)
Summary:
There is currently no caching mechanism for blobs, which is not ideal especially when the database resides on remote storage (where we cannot rely on the OS page cache). As part of this task, we would like to make it possible for the application to configure a blob cache.
In this task, we formally introduced the blob source to RocksDB.  BlobSource is a new abstraction layer that provides universal access to blobs, regardless of whether they are in the blob cache, secondary cache, or (remote) storage. Depending on user settings, it always fetch blobs from multi-tier cache and storage with minimal cost.

Note: The new `MultiGetBlob()` implementation is not included in the current PR. To go faster, we aim to create a separate PR for it in parallel!

This PR is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10198

Reviewed By: ltamasi

Differential Revision: D37294735

Pulled By: gangliao

fbshipit-source-id: 9cb50422d9dd1bc03798501c2778b6c7520c7a1e
2022-06-20 20:58:11 -07:00
Gang Liao e6432dfd4c Make it possible to enable blob files starting from a certain LSM tree level (#10077)
Summary:
Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.

In order to achieve this, we would like to do the following:
- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077

Reviewed By: ltamasi

Differential Revision: D36884156

Pulled By: gangliao

fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
2022-06-02 20:04:33 -07:00
Zichen Zhu 65893ad959 Explicitly closing all directory file descriptors (#10049)
Summary:
Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run
```
strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
```
There is one directory file descriptor that is not closed in the strace log.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049

Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.

Reviewed By: ajkr, hx235

Differential Revision: D36722135

Pulled By: littlepig2013

fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
2022-06-01 18:03:34 -07:00
Changyu Bi 8515bd50c9 Support read rate-limiting in SequentialFileReader (#9973)
Summary:
Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now).

The PR is separated into four commits: the first one added the rate-limiting support, but with some fixes in the unit test since the number of request bytes from rate limiter in SequentialFileReader are not accurate (there is overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check file size and determine how many bytes are left in the file to read. The third commit added benchmark related code. The fourth commit moved the logic of using file size to avoid overcharging the rate limiter into backup engine (the main user of SequentialFileReader).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973

Test Plan:
- `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter.
- Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart.
  - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb`
  - Benchmark:
```
strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db
strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db
```
- db bench on backup and restore to ensure no performance regression.
  - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%)
  - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%)

```
# Set up
./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000

# benchmark
TEST_TMPDIR=/tmp/test_rocksdb
NUM_RUN=50
for ((j=0;j<$NUM_RUN;j++))
do
   ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db | egrep 'backup'
  # Restore
  #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db
done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt
```

Reviewed By: hx235

Differential Revision: D36327418

Pulled By: cbi42

fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691
2022-05-24 10:28:57 -07:00
yaphet 26768edb65 Support single delete in ldb (#9469)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9469

Reviewed By: riversand963

Differential Revision: D33953484

fbshipit-source-id: f4e84a2d9865957d744c7e84ff02ffbb0a62b0a8
2022-05-10 16:37:19 -07:00
sdong 736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Jay Zhuang 270179bb12 Default try_load_options to true when DB is specified (#9937)
Summary:
If the DB path is specified, the user would expect ldb loads the
options from the path, but it's not:
```
$ ldb list_live_files_metadata --db=`pwd`
```
Default `try_load_options` to true in that case. The user can still
disable that by:
```
$ ldb list_live_files_metadata --db=`pwd` --try_load_options=false
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9937

Test Plan:
`ldb list_live_files_metadata --db=`pwd`` is able to work for
a db generated with different options.num_levels.

Reviewed By: ajkr

Differential Revision: D36106708

Pulled By: jay-zhuang

fbshipit-source-id: 2732fdc027a4d172436b2c9b6a9787b56b10c710
2022-05-04 08:49:46 -07:00
Anvesh Komuravelli aafb377bb5 Update protection info on recovered logs data (#9875)
Summary:
Update protection info on recovered logs data

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9875

Test Plan:
- Benchmark setup: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576000`
- Benchmark command: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=overwrite -write_buffer_size=1048576000 -writes=1 -report_open_timing=true`
- Results before this PR
```
OpenDb:     2350.14 milliseconds
OpenDb:     2296.94 milliseconds
OpenDb:     2184.29 milliseconds
OpenDb:     2167.59 milliseconds
OpenDb:     2231.24 milliseconds
OpenDb:     2109.57 milliseconds
OpenDb:     2197.71 milliseconds
OpenDb:     2120.8 milliseconds
OpenDb:     2148.12 milliseconds
OpenDb:     2207.95 milliseconds
```
- Results after this PR
```
OpenDb:     2424.52 milliseconds
OpenDb:     2359.84 milliseconds
OpenDb:     2317.68 milliseconds
OpenDb:     2339.4 milliseconds
OpenDb:     2325.36 milliseconds
OpenDb:     2321.06 milliseconds
OpenDb:     2353.98 milliseconds
OpenDb:     2344.64 milliseconds
OpenDb:     2384.09 milliseconds
OpenDb:     2428.58 milliseconds
```

Mean regressed 7.2% (2201.4 -> 2359.9)

Reviewed By: ajkr

Differential Revision: D36012787

Pulled By: akomurav

fbshipit-source-id: d2aba09f29c6beb2fd0fe8e1e359be910b4ef02a
2022-04-28 14:42:00 -07:00
yuzhangyu ac29645743 Add blob dump support to the dump_live_files command (#9896)
Summary:
This patch completes the second part of the task: "Add blob support to the dump and dump_live_files command"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9896

Reviewed By: ltamasi

Differential Revision: D35852667

Pulled By: jowlyzhang

fbshipit-source-id: a006456c881f468a92da689e895134762e9574e1
2022-04-22 16:54:43 -07:00
yuzhangyu fff28a7725 Add blob dump support to the dump command (#9881)
Summary:
This patch is the first part of adding blob dump support. It only adds blob dump support to the dump command. A follow up patch will add blob dump support to the dump_live_files command.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9881

Reviewed By: ltamasi

Differential Revision: D35796731

Pulled By: jowlyzhang

fbshipit-source-id: 2cc5973b222d505a331ac7b969edcf992b47c5ee
2022-04-21 20:37:07 -07:00
yuzhangyu 9b5790f018 Add --decode_blob_index option to idump and dump commands (#9870)
Summary:
This patch completes the first part of the task: "Extend all three commands so they can decode and print blob references if a new option --decode_blob_index is specified"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9870

Reviewed By: ltamasi

Differential Revision: D35753932

Pulled By: jowlyzhang

fbshipit-source-id: 9d2bbba0eef2ed86b982767eba9de1b4881f35c9
2022-04-20 11:10:20 -07:00
yuzhangyu 082eb04200 Add option --decode_blob_index to dump_live_files command (#9842)
Summary:
This change only add decode blob index support to dump_live_files command, which is part of a task to add blob support to a few commands.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842

Reviewed By: ltamasi

Differential Revision: D35650167

Pulled By: jowlyzhang

fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14
2022-04-15 09:04:04 -07:00
Peter Dillinger 6534c6dea4 Fix remaining uses of "backupable" (#9792)
Summary:
Various renaming and fixes to get rid of remaining uses of
"backupable" which is terminology leftover from the original, flawed
design of BackupableDB. Now any DB can be backed up, using BackupEngine.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9792

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D35334386

Pulled By: pdillinger

fbshipit-source-id: 2108a42b4575c8cccdfd791c549aae93ec2f3329
2022-04-05 09:52:33 -07:00
Peter Dillinger a8a422e962 Add manifest fix-up utility for file temperatures (#9683)
Summary:
The goal of this change is to allow changes to the "current" (in
FileSystem) file temperatures to feed back into DB metadata, so that
they can inform decisions and stats reporting. In part because of
modular code factoring, it doesn't seem easy to do this automagically,
where opening an SST file and observing current Temperature different
from expected would trigger a change in metadata and DB manifest write
(essentially giving the deep read path access to the write path). It is also
difficult to do this while the DB is open because of the limitations of
LogAndApply.

This change allows updating file temperature metadata on a closed DB
using an experimental utility function UpdateManifestForFilesState()
or `ldb update_manifest --update_temperatures`. This should suffice for
"migration" scenarios where outside tooling has placed or re-arranged DB
files into a (different) tiered configuration without going through
RocksDB itself (currently, only compaction can change temperature
metadata).

Some details:
* Refactored and added unit test for `ldb unsafe_remove_sst_file` because
of shared functionality
* Pulled in autovector.h changes from https://github.com/facebook/rocksdb/issues/9546 to fix SuperVersionContext
move constructor (related to an older draft of this change)

Possible follow-up work:
* Support updating manifest with file checksums, such as when a
new checksum function is used and want existing DB metadata updated
for it.
* It's possible that for some repair scenarios, lighter weight than
full repair, we might want to support UpdateManifestForFilesState() to
modify critical file details like size or checksum using same
algorithm. But let's make sure these are differentiated from modifying
file details in ways that don't suspect corruption (or require extreme
trust).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9683

Test Plan: unit tests added

Reviewed By: jay-zhuang

Differential Revision: D34798828

Pulled By: pdillinger

fbshipit-source-id: cfd83e8fb10761d8c9e7f9c020d68c9106a95554
2022-03-18 16:35:51 -07:00
Peter Dillinger cff0d1e8e6 New backup meta schema, with file temperatures (#9660)
Summary:
The primary goal of this change is to add support for backing up and
restoring (applying on restore) file temperature metadata, without
committing to either the DB manifest or the FS reported "current"
temperatures being exclusive "source of truth".

To achieve this goal, we need to add temperature information to backup
metadata, which requires updated backup meta schema. Fortunately I
prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version
6.19.0 for this kind of schema update. (Previously, backup meta schema
was not extensible! Making this schema update public will allow some
other "nice to have" features like taking backups with hard links, and
avoiding crc32c checksum computation when another checksum is already
available.) While schema version 2 is newly public, the default schema
version is still 1. Until we change the default, users will need to set
to 2 to enable features like temperature data backup+restore. New
metadata like temperature information will be ignored with a warning
in versions before this change and since 6.19.0. The metadata is
considered ignorable because a functioning DB can be restored without
it.

Some detail:
* Some renaming because "future schema" is now just public schema 2.
* Initialize some atomics in TestFs (linter reported)
* Add temperature hint support to SstFileDumper (used by BackupEngine)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660

Test Plan:
related unit test majorly updated for the new functionality,
including some shared testing support for tracking temperatures in a FS.

Some other tests and testing hooks into production code also updated for
making the backup meta schema change public.

Reviewed By: ajkr

Differential Revision: D34686968

Pulled By: pdillinger

fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a
2022-03-18 11:06:17 -07:00
Changneng Chen 9ed96703d1 Add support for BlobDB to ldb (#9630)
Summary:
Add the configuration options and help messages of BlobDB to `ldb`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9630

Test Plan: `python ./tools/ldb_test.py`

Reviewed By: ltamasi

Differential Revision: D34443176

Pulled By: changneng

fbshipit-source-id: 5b3f185cdfc2561e06dd37215c7edfbca07dbe80
2022-02-25 23:13:11 -08:00
Peter Dillinger 78aee6fedc Remove obsolete backupable_db.h, utility_db.h (#9438)
Summary:
This also removes the obsolete names BackupableDBOptions
and UtilityDB. API users must now use BackupEngineOptions and
DBWithTTL::Open. In C API, `rocksdb_backupable_db_*` is replaced
`rocksdb_backup_engine_*`. Similar renaming in Java API.

In reference to https://github.com/facebook/rocksdb/issues/9389

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9438

Test Plan: CI

Reviewed By: mrambacher

Differential Revision: D33780269

Pulled By: pdillinger

fbshipit-source-id: 4a6cfc5c1b4c78bcad790b9d3dd13c5fdf4a1fac
2022-01-27 15:45:30 -08:00
yaphet 7fb723f581 Using back to get the last element (#9415)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9415

Reviewed By: ajkr

Differential Revision: D33773673

Pulled By: riversand963

fbshipit-source-id: 52b59ec5a6b01a91d3f990b7f2b0f16320afb49b
2022-01-27 11:35:33 -08:00
Yanqin Jin bd513fd075 Add commit marker with timestamp (#9266)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9266

This diff adds a new tag `CommitWithTimestamp`. Currently, there is no API to trigger writing
this tag to WAL, thus it is unavailable to users.
This is an ongoing effort to add user-defined timestamp support to write-committed transactions.
This diff also indicates all column families that may potentially participate in the same
transaction must either disable timestamp or have the same timestamp format, since
`CommitWithTimestamp` tag is followed by a single byte-array denoting the commit
timestamp of the transaction. We will enforce this checking in a future diff. We keep this
diff small.

Reviewed By: ltamasi

Differential Revision: D31721350

fbshipit-source-id: e1450811443647feb6ca01adec4c8aaae270ffc6
2021-12-10 11:05:35 -08:00
Pradeep Ambati e5bfb91d09 List blob files when using command - list_live_files_metadata (#8976)
Summary:
The ldb list_live_files_metadata command does not print any information about blob files currently. We would like to add this functionality. Note that list_live_files_metadata has two different modes of operation: the one shown above, which shows the LSM tree structure, and another one, which can be enabled using the flag --sort_by_filename and simply lists the files in numerical order regardless of level. We would like to show blob files in both modes.

Changes:
1. Using GetAllColumnFamilyMetaData API instead of GetLiveFilesMetaData API for fetching live files data.

Testing:
1. Created a sample rocksdb instance using dbbench command (this creates both SST and blob files)
2. Checked if the blob files are listed or not by using ldb commands.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8976

Reviewed By: ltamasi

Differential Revision: D31316061

Pulled By: pradeepambati

fbshipit-source-id: d15cdea192febf7a45f28deee2ba40615d3d84ab
2021-09-30 15:13:11 -07:00
mrambacher 13ae16c315 Cleanup includes in dbformat.h (#8930)
Summary:
This header file was including everything and the kitchen sink when it did not need to.  This resulted in many places including this header when they needed other pieces instead.

Cleaned up this header to only include what was needed and fixed up the remaining code to include what was now missing.

Hopefully, this sort of code hygiene cleanup will speed up the builds...

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8930

Reviewed By: pdillinger

Differential Revision: D31142788

Pulled By: mrambacher

fbshipit-source-id: 6b45de3f300750c79f751f6227dece9cfd44085d
2021-09-29 04:04:40 -07:00
sdong ba48ff8303 Fix ldb --try_load_options doesn't use customized Env (#8929)
Summary:
As title. The reason is that after loading customized options, the env is not set back to the correct one. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8929

Test Plan: Manually validate in an environment where the command failed.

Reviewed By: riversand963

Differential Revision: D31026931

fbshipit-source-id: c25dc788bf80ed5bf4b24922c442781943bcd65b
2021-09-17 15:26:27 -07:00
Peter Dillinger a5566d508b Fix flaky, dubious LdbCmdTest::*DumpFileChecksum* (#8898)
Summary:
These tests would frequently fail to find SST files due to race
condition in running ldb (read-only) on an open DB which might do automatic
compaction. But only sometimes would that failure translate into test
failure because the implementation of ldb file_checksum_dump would
swallow many errors. Now,

* DB closed while running ldb to avoid unnecessary race condition
* Detect and report/propagate more failures in `ldb file_checksum_dump`
* Use --hex so that random binary data is not printed to console

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8898

Test Plan: ./ldb_cmd_test --gtest_filter=*Checksum* --gtest_repeat=100

Reviewed By: zhichao-cao

Differential Revision: D30848738

Pulled By: pdillinger

fbshipit-source-id: 20290b517eeceba99bb538bb5a17088f7e878405
2021-09-13 17:07:21 -07:00
Peter Dillinger c9cd5d25a8 Remove some unneeded code (#8736)
Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
2021-09-01 14:28:58 -07:00
mrambacher ab7f7c9e49 Allow WAL dir to change with db dir (#8582)
Summary:
Prior to this change, the "wal_dir"  DBOption would always be set (defaults to dbname) when the DBOptions were sanitized.  Because of this setitng in the options file, it was not possible to rename/relocate a database directory after it had been created and use the existing options file.

After this change, the "wal_dir" option is only set under specific circumstances.  Methods were added to the ImmutableDBOptions class to see if it is set and if it is set to something other than the dbname.  Additionally, a method was added to retrieve the effective value of the WAL dir (either the option or the dbname/path).

Tests were added to the core and ldb to test that a database could be created and renamed without issue.  Additional tests for various permutations of wal_dir were also added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8582

Reviewed By: pdillinger, autopear

Differential Revision: D29881122

Pulled By: mrambacher

fbshipit-source-id: 67d3d033dc8813d59917b0a3fba2550c0efd6dfb
2021-07-30 12:16:44 -07:00
Peter Dillinger 0804b44fb6 Some fixes and enhancements to ldb repair (#8544)
Summary:
* Basic handling of SST file with just range tombstones rather than
failing assertion about smallest_seqno <= largest_seqno
* Adds --verbose option so that there exists a way to see the INFO
output from Repairer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8544

Test Plan: unit test added, manual testing for --verbose

Reviewed By: ajkr

Differential Revision: D29954805

Pulled By: pdillinger

fbshipit-source-id: 696af25805fc36cc178b04ba6045922a22625fd9
2021-07-28 16:44:14 -07:00
Baptiste Lemaire 3f20925dc4 Add list live files metadata (#8446)
Summary:
Add an argument to ldb to dump live file names, column families, and levels, `list_live_files_metadata`. The output shows all active SST file names, sorted first by column family and then by level. For each level the SST files are sorted alphabetically.

Typically, the output looks like this:
```
./ldb --db=/tmp/test_db list_live_files_metadata
Live SST Files:
===== Column Family: default =====
---------- level 0 ----------
/tmp/test_db/000069.sst
---------- level 1 ----------
/tmp/test_db/000064.sst
/tmp/test_db/000065.sst
/tmp/test_db/000066.sst
/tmp/test_db/000071.sst
---------- level 2 ----------
/tmp/test_db/000038.sst
/tmp/test_db/000039.sst
/tmp/test_db/000052.sst
/tmp/test_db/000067.sst
/tmp/test_db/000070.sst
------------------------------
```

Second, a flag was added `--sort_by_filename`, to change the layout of the output. When this flag is added to the command, the output shows all active SST files sorted by name, in front of which the LSM level and the column family are mentioned. With the same example, the following command would return:
```
./ldb --db=/tmp/test_db list_live_files_metadata --sort_by_filename
Live SST Files:
/tmp/test_db/000038.sst : level 2, column family 'default'
/tmp/test_db/000039.sst : level 2, column family 'default'
/tmp/test_db/000052.sst : level 2, column family 'default'
/tmp/test_db/000064.sst : level 1, column family 'default'
/tmp/test_db/000065.sst : level 1, column family 'default'
/tmp/test_db/000066.sst : level 1, column family 'default'
/tmp/test_db/000067.sst : level 2, column family 'default'
/tmp/test_db/000069.sst : level 0, column family 'default'
/tmp/test_db/000070.sst : level 2, column family 'default'
/tmp/test_db/000071.sst : level 1, column family 'default'
------------------------------
```

Thus, the user can either request to show the files by levels, or sorted by filenames.
This PR includes a simple Python unit test that makes sure the file name and level printed out by this new feature matches the one found with an existing feature, `dump_live_file`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8446

Reviewed By: akankshamahajan15

Differential Revision: D29320080

Pulled By: bjlemaire

fbshipit-source-id: 01fb7b5637c59010d74c80730a28d815994e7009
2021-06-22 19:07:46 -07:00
Baptiste Lemaire 0a1aed4e71 Fix double slashes in user-provided db path. (#8439)
Summary:
At the moment, the following command : "`./ --db=mypath/ dump_file_files`" returns a series of erronous names with double slashes, ie: "`mypath//000xxx.sst`", including manifest file names with double slashes "`mypath//MANIFEST-00XXX`", whereas "`./ --db=mypath dump_file_files`" correctly returns "`mypath/000xxx.sst`" and "`mypath/MANIFEST-00XXX`".

This (very short) PR simply checks if there is a need to add or remove any '`/`' character when the `db_path` and `manifest_filename`/sst `filenames` are concatenated.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8439

Reviewed By: akankshamahajan15

Differential Revision: D29301349

Pulled By: bjlemaire

fbshipit-source-id: 3e9e58f9749d278b654ae838fcee13ad698705a8
2021-06-22 11:46:25 -07:00
mrambacher 281ac9c89e Add CreateFrom methods to Env/FileSystem (#8174)
Summary:
- Added CreateFromString method to Env and FilesSystem to replace LoadEnv/Load.  This method/signature is a precursor to making these classes extend Customizable.

- Added CreateFromSystem to Env.  This method standardizes creating an Env from the environment variables.  Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI.  Now the code is more command/standardized.

- Added CreateFromFlags to Env.  These method allows Env to be create from string options (such as GFLAGS options) in a more standard way.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174

Reviewed By: zhichao-cao

Differential Revision: D28999603

Pulled By: mrambacher

fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb
2021-06-15 03:43:48 -07:00
Zhichao Cao f44e69c64a Use DbSessionId as cache key prefix when secondary cache is enabled (#8360)
Summary:
Currently, we either use the file system inode or a monotonically incrementing runtime ID as the block cache key prefix. However, if we use a monotonically incrementing runtime ID (in the case that the file system does not support inode id generation), in some cases, it cannot ensure uniqueness (e.g., we have secondary cache migrated from host to host). We use DbSessionID (20 bytes) + current file number (at most 10 bytes) as the new cache block key prefix when the secondary cache is enabled. So can accommodate scenarios such as transfer of cache state across hosts.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8360

Test Plan: add the test to lru_cache_test

Reviewed By: pdillinger

Differential Revision: D29006215

Pulled By: zhichao-cao

fbshipit-source-id: 6cff686b38d83904667a2bd39923cd030df16814
2021-06-10 11:02:43 -07:00
Yanqin Jin 063a68b9cd Check and handle failure in ldb (#8072)
Summary:
Currently, a few ldb commands do not check the execution result of
database operations. This PR checks the execution results and tries to
improve the error reporting.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8072

Test Plan:
```
make check
```
and
```
ASSERT_STATUS_CHECKED=1 make -j20 ldb
python tools/ldb_test.py
```

Reviewed By: zhichao-cao

Differential Revision: D27152466

Pulled By: riversand963

fbshipit-source-id: b94220496a4b3591b61c1d350f665860a6579f30
2021-03-18 14:43:34 -07:00
Hans Holmberg 670567db09 Add support for custom file systems to ldb and sst_dump (#8010)
Summary:
This PR adds support for custom file systems to ldb and sst_dump by adding command line options for specifying --fs_uri and --backup_fs uri (for ldb backup/restore commands). fs_uri is already supported in db_bench and db_stress, and there is already support in ldb and db stress for specifying customized envs.

The PR also fixes what looks like a bug in the ldb backup/restore commands. As it is right now, backups can only be made from and to the same environment/file system which does not seem to be the intended behavior. This PR makes it possible to do/restore backups between different envs/file systems.

Example:
`./ldb backup --fs_uri=zenfs://dev:nvme2n1 --backup_fs_uri=posix:// --backup_dir=/tmp/my_rocksdb_backup  --db=rocksdbtest/dbbench
`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8010

Reviewed By: jay-zhuang

Differential Revision: D26904654

Pulled By: ajkr

fbshipit-source-id: 9b695ed8b944fcc6b27c4daaa9f52e87ee2c1fb4
2021-03-09 20:49:15 -08:00
mrambacher 4a09d632c4 Remove Legacy and Custom FileWrapper classes from header files (#7851)
Summary:
Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.

Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use the the CustomEnv class.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851

Reviewed By: anand1976

Differential Revision: D26114816

Pulled By: mrambacher

fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
2021-01-28 22:10:32 -08:00
anand76 8e7b068ecc Make ldb load column family options from OPTIONS file (#7847)
Summary:
When the --try_load_options is used in conjunction with the
--column_family option, ldb incorrectly sets the ColumnFamilyOptions for
that column family to defaults. This PR fixes that by retaining from the
OPTIONS file and applying command line overrides.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7847

Test Plan: Add a unit test in ldb_cmd_test

Reviewed By: ajkr

Differential Revision: D25874720

Pulled By: anand1976

fbshipit-source-id: 04bcf23b55e5a30b5b6a59b0e5cb4faef3da7429
2021-01-11 20:56:34 -08:00
Adam Retter 6e0f62f2b6 Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715)
Summary:
Third batch of adding more tests to ASSERT_STATUS_CHECKED.

* db_compaction_filter_test
* db_compaction_test
* db_dynamic_level_test
* db_inplace_update_test
* db_sst_test
* db_tailing_iter_test
* db_io_failure_test

Also update GetApproximateSizes APIs to all return Status.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7715

Reviewed By: jay-zhuang

Differential Revision: D25806896

Pulled By: pdillinger

fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262
2021-01-06 14:15:02 -08:00
Andrew Kryczka b8c01ed38a Support --hex flag in ldb file_checksum_dump (#7820)
Summary:
Prior to this PR it prints the raw bytes which can include non-printable
characters. This PR adds the option to print in hex instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7820

Test Plan:
try it out

```
$ ./ldb file_checksum_dump --hex --db=/tmp/rocksdbtest-9383//db_basic_test_12281129388755189514/
16, FileChecksumCrc32c, 0xC789D948
```

Reviewed By: jay-zhuang

Differential Revision: D25738072

Pulled By: ajkr

fbshipit-source-id: 8cf2856877971756c0495cfa63a9a1281c414dc7
2021-01-04 11:13:14 -08:00