Commit graph

3 commits

Author SHA1 Message Date
Yu Zhang d458331ee9 Move file tracking in VersionEditHandlerPointInTime to VersionBuilder (#12928)
Summary:
`VersionEditHandlerPointInTime` is tracking found files, missing files, intermediate files in order to decide to build a `Version` on negative edge trigger (transition from valid to invalid) without applying  the current `VersionEdit`.  However, applying `VersionEdit` and check completeness of a `Version` are specialization of `VersionBuilder`.  More importantly, when we augment best efforts recovery to recover not just complete point in time Version but also a prefix of seqno for a point in time Version, such checks need to be duplicated in `VersionEditHandlerPointInTime` and `VersionBuilder`.

To avoid this, this refactor move all the file tracking functionality in `VersionEditHandlerPointInTime` into `VersionBuilder`.  To continue to let `VersionEditHandlerPIT` do the edge trigger check and  build a `Version` before applying the current `VersionEdit`, a suite of APIs to supporting creating a save point and its associated functions are added in `VersionBuilder` to achieve this.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12928

Test Plan: Existing tests

Reviewed By: anand1976

Differential Revision: D61171320

Pulled By: jowlyzhang

fbshipit-source-id: 604f66f8b1e3a3e13da59d8ba357c74e8a366dbc
2024-08-12 21:09:37 -07:00
anand76 0ed93552f4 Implement obsolete file deletion (GC) in follower (#12657)
Summary:
This PR implements deletion of obsolete files in a follower RocksDB instance. The follower tails the leader's MANIFEST and creates links to newly added SST files. These links need to be deleted once those files become obsolete in order to reclaim space. There are three cases to be considered -
1. New files added and links created, but the Version could not be installed due to some missing files. Those links need to be preserved so a subsequent catch up attempt can succeed. We insert the next file number in the `VersionSet` to `pending_outputs_` to prevent their deletion.
2. Files deleted from the previous successfully installed `Version`. These are deleted as usual in `PurgeObsoleteFiles`.
3. New files added by a `VersionEdit` and deleted by a subsequent `VersionEdit`, both processed in the same catchup attempt. Links will be created for the new files when verifying a candidate `Version`. Those need to be deleted explicitly as they're never added to `VersionStorageInfo`, and thus not deleted by `PurgeObsoleteFiles`.

Test plan -
New unit tests in `db_follower_test`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12657

Reviewed By: jowlyzhang

Differential Revision: D57462697

Pulled By: anand1976

fbshipit-source-id: 898f15570638dd4930f839ffd31c560f9cb73916
2024-05-17 19:13:33 -07:00
anand76 d8fb849b7e Basic RocksDB follower implementation (#12540)
Summary:
A basic implementation of RocksDB follower mode, which opens a remote database (referred to as leader) on a distributed file system by tailing its MANIFEST. It leverages the secondary instance mode, but is different in some key ways -
1. It has its own directory with links to the leader's database
2. Periodically refreshes itself
3. (Future) Snapshot support
4. (Future) Garbage collection of obsolete links
5. (Long term) Memtable replication

There are two main classes implementing this functionality - `DBImplFollower` and `OnDemandFileSystem`. The former is derived from `DBImplSecondary`. Similar to `DBImplSecondary`, it implements recovery and catch up through MANIFEST tailing using the `ReactiveVersionSet`, but does not consider logs. In a future PR, we will implement memtable replication, which will eliminate the need to catch up using logs. In addition, the recovery and catch-up tries to avoid directory listing as repeated metadata operations are expensive.

The second main piece is the `OnDemandFileSystem`, which plugs in as an `Env` for the follower instance and creates the illusion of the follower directory as a clone of the leader directory. It creates links to SSTs on first reference. When the follower tails the MANIFEST and attempts to create a new `Version`, it calls `VerifyFileMetadata` to verify the size of the file, and optionally the unique ID of the file. During this process, links are created which prevent the underlying files from getting deallocated even if the leader deletes the files.

TODOs: Deletion of obsolete links, snapshots, robust checking against misconfigurations, better observability etc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12540

Reviewed By: jowlyzhang

Differential Revision: D56315718

Pulled By: anand1976

fbshipit-source-id: d19e1aca43a6af4000cb8622a718031b69ebd97b
2024-04-19 19:13:31 -07:00