Integrity protection for live updates to WriteBatch (#7748)
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
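The XOR construction described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the stand-in hash (`SketchHash`, an FNV-1a variant), the seed constants, and the free functions `ProtectKVOT`, `ProtectC`, `StripC`, and `ProtectS` are all hypothetical names chosen for the example. It shows why coverage can be added or dropped with a single XOR, and how casting to a narrower integer type discards the most significant bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in seeded hash (FNV-1a style). This is an assumption made to keep the
// sketch self-contained; it is not the hash RocksDB uses.
inline uint64_t SketchHash(const void* data, size_t len, uint64_t seed) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  uint64_t h = 14695981039346656037ULL ^ seed;
  for (size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical per-field seeds (key, value, optype, timestamp, seqno, CF ID).
constexpr uint64_t kSeedK = 1, kSeedV = 2, kSeedO = 3, kSeedT = 4,
                   kSeedS = 5, kSeedC = 6;

// T is the unsigned integer type that stores the protection info. Casting the
// 64-bit hash to a narrower T drops its most significant bytes.
template <typename T>
T HashStr(const std::string& s, uint64_t seed) {
  return static_cast<T>(SketchHash(s.data(), s.size(), seed));
}

// Integer fields are hashed over their raw in-memory bytes, so the result
// depends on the machine's endianness.
template <typename T, typename Int>
T HashInt(Int v, uint64_t seed) {
  return static_cast<T>(SketchHash(&v, sizeof(v), seed));
}

// Coverage of key, value, optype, and timestamp: XOR of the seeded hashes.
template <typename T>
T ProtectKVOT(const std::string& key, const std::string& value,
              uint8_t op_type, const std::string& ts) {
  return static_cast<T>(HashStr<T>(key, kSeedK) ^ HashStr<T>(value, kSeedV) ^
                        HashInt<T>(op_type, kSeedO) ^ HashStr<T>(ts, kSeedT));
}

// XOR-ing a field's hash in adds coverage; XOR-ing it in again removes it.
template <typename T>
T ProtectC(T info, uint32_t cf_id) {
  return static_cast<T>(info ^ HashInt<T>(cf_id, kSeedC));
}

template <typename T>
T StripC(T info, uint32_t cf_id) {
  return static_cast<T>(info ^ HashInt<T>(cf_id, kSeedC));
}

template <typename T>
T ProtectS(T info, uint64_t seqno) {
  return static_cast<T>(info ^ HashInt<T>(seqno, kSeedS));
}
```

Under these assumptions, stripping the CF ID from a KVOTC value and then adding the sequence number yields the same integer as computing KVOTS coverage directly, which is what allows the memtable write path to re-derive and verify the protection info.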
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in a variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
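The idea behind the first test-plan item can be sketched without RocksDB: corrupt one byte of an encoded entry at a time and check that a checksum over the entry no longer matches. The names `SketchChecksum` and `CountUndetectedCorruptions` are hypothetical, and the FNV-1a checksum is a stand-in assumption rather than the protection info this PR computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in checksum (FNV-1a); an assumption for illustration only.
inline uint64_t SketchChecksum(const std::string& buf) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : buf) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Corrupts each byte of `encoded` in turn by adding `addend`, and counts how
// many of those single-byte corruptions the checksum fails to notice.
inline size_t CountUndetectedCorruptions(const std::string& encoded,
                                         char addend) {
  const uint64_t expected = SketchChecksum(encoded);
  size_t undetected = 0;
  for (size_t offset = 0; offset < encoded.size(); ++offset) {
    std::string corrupted = encoded;
    corrupted[offset] = static_cast<char>(corrupted[offset] + addend);
    if (SketchChecksum(corrupted) == expected) {
      ++undetected;
    }
  }
  return undetected;
}
```

A nonzero addend changes the byte at every offset, so a sweep such as `CountUndetectedCorruptions("keyval", 1)` exercises each position once, much like a test that advances a corruption offset on every write attempt.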
Reviewed By: pdillinger
Differential Revision: D25754492
Pulled By: ajkr
fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
2021-01-29 20:17:17 +00:00
// Copyright (c) 2020-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#include "db/blob/blob_index.h"
#include "db/db_test_util.h"
#include "rocksdb/rocksdb_namespace.h"

namespace ROCKSDB_NAMESPACE {

enum class WriteBatchOpType {
  kPut = 0,
  kDelete,
  kSingleDelete,
  kDeleteRange,
  kMerge,
  kPutEntity,
  kNum,
};

// Integer addition is needed for `::testing::Range()` to take the enum type.
WriteBatchOpType operator+(WriteBatchOpType lhs, const int rhs) {
  using T = std::underlying_type<WriteBatchOpType>::type;
  return static_cast<WriteBatchOpType>(static_cast<T>(lhs) + rhs);
}

enum class WriteMode {
  // `Write()` a `WriteBatch` constructed with `protection_bytes_per_key > 0`.
  kWriteProtectedBatch = 0,
  // `Write()` a `WriteBatch` constructed with `protection_bytes_per_key == 0`.
  // Protection is enabled via `WriteOptions::protection_bytes_per_key > 0`.
  kWriteUnprotectedBatch,
  // TODO(ajkr): add a mode that uses `Write()` wrappers, e.g., `Put()`.
  kNum,
};

// Integer addition is needed for `::testing::Range()` to take the enum type.
WriteMode operator+(WriteMode lhs, const int rhs) {
  using T = std::underlying_type<WriteMode>::type;
  return static_cast<WriteMode>(static_cast<T>(lhs) + rhs);
}

std::pair<WriteBatch, Status> GetWriteBatch(ColumnFamilyHandle* cf_handle,
                                            size_t protection_bytes_per_key,
                                            WriteBatchOpType op_type) {
  Status s;
  WriteBatch wb(0 /* reserved_bytes */, 0 /* max_bytes */,
                protection_bytes_per_key, 0 /* default_cf_ts_sz */);
  switch (op_type) {
    case WriteBatchOpType::kPut:
      s = wb.Put(cf_handle, "key", "val");
      break;
    case WriteBatchOpType::kDelete:
      s = wb.Delete(cf_handle, "key");
      break;
    case WriteBatchOpType::kSingleDelete:
      s = wb.SingleDelete(cf_handle, "key");
      break;
    case WriteBatchOpType::kDeleteRange:
      s = wb.DeleteRange(cf_handle, "begin", "end");
      break;
    case WriteBatchOpType::kMerge:
      s = wb.Merge(cf_handle, "key", "val");
      break;
    case WriteBatchOpType::kPutEntity:
      s = wb.PutEntity(cf_handle, "key",
                       {{"attr_name1", "foo"}, {"attr_name2", "bar"}});
      break;
    case WriteBatchOpType::kNum:
      assert(false);
  }
  return {std::move(wb), std::move(s)};
}

class DbKvChecksumTestBase : public DBTestBase {
 public:
  DbKvChecksumTestBase(const std::string& path, bool env_do_fsync)
      : DBTestBase(path, env_do_fsync) {}

  ColumnFamilyHandle* GetCFHandleToUse(ColumnFamilyHandle* column_family,
                                       WriteBatchOpType op_type) const {
    // Note: PutEntity cannot be called without column family
    if (op_type == WriteBatchOpType::kPutEntity && !column_family) {
      return db_->DefaultColumnFamily();
    }

    return column_family;
  }
};

class DbKvChecksumTest : public DbKvChecksumTestBase,
                         public ::testing::WithParamInterface<
                             std::tuple<WriteBatchOpType, char, WriteMode>> {
 public:
  DbKvChecksumTest()
      : DbKvChecksumTestBase("db_kv_checksum_test", /*env_do_fsync=*/false) {
    op_type_ = std::get<0>(GetParam());
    corrupt_byte_addend_ = std::get<1>(GetParam());
    write_mode_ = std::get<2>(GetParam());
  }

  Status ExecuteWrite(ColumnFamilyHandle* cf_handle) {
    switch (write_mode_) {
      case WriteMode::kWriteProtectedBatch: {
        auto batch_and_status =
            GetWriteBatch(GetCFHandleToUse(cf_handle, op_type_),
                          8 /* protection_bytes_per_key */, op_type_);
        assert(batch_and_status.second.ok());
        return db_->Write(WriteOptions(), &batch_and_status.first);
      }
      case WriteMode::kWriteUnprotectedBatch: {
        auto batch_and_status =
            GetWriteBatch(GetCFHandleToUse(cf_handle, op_type_),
                          0 /* protection_bytes_per_key */, op_type_);
        assert(batch_and_status.second.ok());
        WriteOptions write_opts;
        write_opts.protection_bytes_per_key = 8;
        return db_->Write(write_opts, &batch_and_status.first);
      }
      case WriteMode::kNum:
        assert(false);
    }
    return Status::NotSupported("WriteMode " +
                                std::to_string(static_cast<int>(write_mode_)));
  }

  void CorruptNextByteCallBack(void* arg) {
    Slice encoded = *static_cast<Slice*>(arg);
    if (entry_len_ == std::numeric_limits<size_t>::max()) {
      // We learn the entry size on the first attempt
      entry_len_ = encoded.size();
    }
    // All entries should be the same size
    assert(entry_len_ == encoded.size());
    char* buf = const_cast<char*>(encoded.data());
    buf[corrupt_byte_offset_] += corrupt_byte_addend_;
    ++corrupt_byte_offset_;
  }

  bool MoreBytesToCorrupt() { return corrupt_byte_offset_ < entry_len_; }

 protected:
  WriteBatchOpType op_type_;
  char corrupt_byte_addend_;
  WriteMode write_mode_;
  size_t corrupt_byte_offset_ = 0;
  size_t entry_len_ = std::numeric_limits<size_t>::max();
};

std::string GetOpTypeString(const WriteBatchOpType& op_type) {
  switch (op_type) {
Integrity protection for live updates to WriteBatch (#7748)
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in a variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
Reviewed By: pdillinger
Differential Revision: D25754492
Pulled By: ajkr
fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
2021-01-29 20:17:17 +00:00
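The XOR-of-seeded-hashes scheme described in the summary above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SeededHash` is a plain FNV-1a stand-in rather than the hash RocksDB uses, the seed constants are arbitrary, and the helper names (`ProtectKVOTC`, `ToKVOTS`, `ProtectKVOTS`) are hypothetical and not the `ProtectionInfo.*` API. It only shows why XOR lets coverage of the CF ID be stripped and coverage of the sequence number be added without rehashing the key or value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in seeded 64-bit FNV-1a hash (assumption: not RocksDB's actual hash).
inline uint64_t SeededHash(const std::string& data, uint64_t seed) {
  uint64_t h = 14695981039346656037ULL ^ seed;
  for (unsigned char c : data) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Integer fields are hashed over their raw in-memory bytes, so the result
// depends on the host's endianness, as the summary notes.
template <typename Int>
std::string RawBytes(Int v) {
  return std::string(reinterpret_cast<const char*>(&v), sizeof(v));
}

// Arbitrary illustrative seeds, one per covered field.
constexpr uint64_t kSeedK = 0x11, kSeedV = 0x22, kSeedO = 0x33,
                   kSeedT = 0x44, kSeedC = 0x55, kSeedS = 0x66;

// T is the unsigned integer storing the protection info. Casting the 64-bit
// hash to a narrower T keeps the low-order bytes and drops the high ones.
template <typename T>
T FieldHash(const std::string& bytes, uint64_t seed) {
  return static_cast<T>(SeededHash(bytes, seed));
}

// Covers key, value, optype, timestamp and CF ID.
template <typename T>
T ProtectKVOTC(const std::string& key, const std::string& value, uint8_t op,
               const std::string& ts, uint32_t cf_id) {
  return FieldHash<T>(key, kSeedK) ^ FieldHash<T>(value, kSeedV) ^
         FieldHash<T>(RawBytes(op), kSeedO) ^ FieldHash<T>(ts, kSeedT) ^
         FieldHash<T>(RawBytes(cf_id), kSeedC);
}

// XOR is its own inverse: XOR-ing the CF ID hash again strips it, and
// XOR-ing the sequence number hash adds coverage of the new field.
template <typename T>
T ToKVOTS(T kvotc, uint32_t cf_id, uint64_t seq) {
  return kvotc ^ FieldHash<T>(RawBytes(cf_id), kSeedC) ^
         FieldHash<T>(RawBytes(seq), kSeedS);
}

// Covers key, value, optype, timestamp and sequence number, computed directly.
template <typename T>
T ProtectKVOTS(const std::string& key, const std::string& value, uint8_t op,
               const std::string& ts, uint64_t seq) {
  return FieldHash<T>(key, kSeedK) ^ FieldHash<T>(value, kSeedV) ^
         FieldHash<T>(RawBytes(op), kSeedO) ^ FieldHash<T>(ts, kSeedT) ^
         FieldHash<T>(RawBytes(seq), kSeedS);
}
```

In this sketch, verification at the memtable would amount to recomputing `ProtectKVOTS` from the encoded entry and comparing it with the value carried over from the batch via `ToKVOTS`.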
    case WriteBatchOpType::kPut:
      return "Put";
    case WriteBatchOpType::kDelete:
      return "Delete";
    case WriteBatchOpType::kSingleDelete:
      return "SingleDelete";
    case WriteBatchOpType::kDeleteRange:
      return "DeleteRange";
    case WriteBatchOpType::kMerge:
      return "Merge";
    case WriteBatchOpType::kPutEntity:
      return "PutEntity";
    case WriteBatchOpType::kNum:
      assert(false);
  }
  assert(false);
  return "";
}

INSTANTIATE_TEST_CASE_P(
    DbKvChecksumTest, DbKvChecksumTest,
    ::testing::Combine(::testing::Range(static_cast<WriteBatchOpType>(0),
                                        WriteBatchOpType::kNum),
                       ::testing::Values(2, 103, 251),
                       ::testing::Range(static_cast<WriteMode>(0),
                                        WriteMode::kNum)),
    [](const testing::TestParamInfo<
        std::tuple<WriteBatchOpType, char, WriteMode>>& args) {
      std::ostringstream oss;
      oss << GetOpTypeString(std::get<0>(args.param)) << "Add"
          << static_cast<int>(
                 static_cast<unsigned char>(std::get<1>(args.param)));
      switch (std::get<2>(args.param)) {
        case WriteMode::kWriteProtectedBatch:
          oss << "WriteProtectedBatch";
          break;
        case WriteMode::kWriteUnprotectedBatch:
          oss << "WriteUnprotectedBatch";
          break;
        case WriteMode::kNum:
          assert(false);
      }
      return oss.str();
    });

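The name generator above casts the `char` addend through `unsigned char` before converting it to `int`. A minimal sketch of why that matters, assuming a hypothetical helper `MakeTestName` that takes the operation and mode names as plain strings instead of the test's enums: on platforms where `char` is signed, an addend such as 251 is stored as a negative value, and streaming it as a plain `int` would put a minus sign into the test name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the test-name lambda; it is not part of the
// test file and takes the already-stringified op and mode names.
std::string MakeTestName(const std::string& op_name, char addend,
                         const std::string& mode_name) {
  std::ostringstream oss;
  oss << op_name << "Add"
      // Going through unsigned char maps the byte to 0..255 whether or not
      // plain char is signed, so the name contains only digits here.
      << static_cast<int>(static_cast<unsigned char>(addend)) << mode_name;
  return oss.str();
}
```

With this helper, the `kPut` operation, addend 251, and protected-batch mode yield the name `PutAdd251WriteProtectedBatch`.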
// TODO(ajkr): add a test that corrupts the `WriteBatch` contents. Such
// corruptions should only be detectable in `WriteMode::kWriteProtectedBatch`.

TEST_P(DbKvChecksumTest, MemTableAddCorrupted) {
  // This test repeatedly attempts to write `WriteBatch`es containing a single
  // entry of type `op_type_`. Each attempt has one byte corrupted in its
  // memtable entry by adding `corrupt_byte_addend_` to its original value. The
  // test repeats until an attempt has been made on each byte in the encoded
  // memtable entry. All attempts are expected to fail with `Status::Corruption`
  SyncPoint::GetInstance()->SetCallBack(
      "MemTable::Add:Encoded",
      std::bind(&DbKvChecksumTest::CorruptNextByteCallBack, this,
                std::placeholders::_1));

  while (MoreBytesToCorrupt()) {
    // Failed memtable insert always leads to read-only mode, so we have to
    // reopen for every attempt.
    Options options = CurrentOptions();
    if (op_type_ == WriteBatchOpType::kMerge) {
      options.merge_operator = MergeOperators::CreateStringAppendOperator();
    }
    Reopen(options);

    SyncPoint::GetInstance()->EnableProcessing();
    ASSERT_TRUE(ExecuteWrite(nullptr /* cf_handle */).IsCorruption());
    SyncPoint::GetInstance()->DisableProcessing();

    // In case the above callback is not invoked, this test will run
    // numeric_limits<size_t>::max() times until it reports an error (or will
    // exhaust disk space). Added this assert to report error early.
    ASSERT_TRUE(entry_len_ < std::numeric_limits<size_t>::max());
  }
}

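The corrupt-one-byte-per-attempt loop used by the test above can be sketched in isolation. This is only an illustration under stated assumptions: the fixture's real `CorruptNextByteCallBack` and `MoreBytesToCorrupt` are not shown in this section, so `ByteCorrupter`, `Checksum`, and `CountDetectedCorruptions` are hypothetical stand-ins, and a simple FNV-1a checksum replaces the actual protection info. The sketch shows why the entry length starts at `std::numeric_limits<size_t>::max()` (it is unknown until the callback first sees the encoded entry) and why the loop asserts that it was updated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in checksum (FNV-1a); any single-byte change alters the result.
inline uint64_t Checksum(const std::vector<uint8_t>& bytes) {
  uint64_t h = 14695981039346656037ULL;
  for (uint8_t b : bytes) {
    h ^= b;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical analogue of the fixture state driving the loop.
struct ByteCorrupter {
  explicit ByteCorrupter(uint8_t a) : addend(a) {}

  uint8_t addend;
  size_t offset = 0;
  // Unknown until the first callback, so start at the maximum value.
  size_t entry_len = std::numeric_limits<size_t>::max();

  bool MoreBytesToCorrupt() const { return offset < entry_len; }

  // Plays the role of the sync point callback: records the entry length and
  // adds `addend` (mod 256) to the next not-yet-corrupted byte.
  void CorruptNextByte(std::vector<uint8_t>* encoded) {
    entry_len = encoded->size();
    if (offset >= encoded->size()) return;
    (*encoded)[offset] = static_cast<uint8_t>((*encoded)[offset] + addend);
    ++offset;
  }
};

// Returns how many single-byte corruptions the checksum caught.
size_t CountDetectedCorruptions(const std::vector<uint8_t>& entry,
                                uint8_t addend) {
  const uint64_t expected = Checksum(entry);
  ByteCorrupter corrupter(addend);
  size_t detected = 0;
  while (corrupter.MoreBytesToCorrupt()) {
    std::vector<uint8_t> attempt = entry;  // fresh copy, like reopening the DB
    corrupter.CorruptNextByte(&attempt);
    if (Checksum(attempt) != expected) ++detected;
    // If the callback never ran, entry_len would still be the maximum and
    // the loop would effectively never end; fail early instead.
    assert(corrupter.entry_len < std::numeric_limits<size_t>::max());
  }
  return detected;
}
```

Because the addends used by the test (2, 103, 251) are all nonzero modulo 256, every attempt changes exactly one byte, so in this sketch each attempt is detected.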
TEST_P(DbKvChecksumTest, MemTableAddWithColumnFamilyCorrupted) {
  // This test repeatedly attempts to write `WriteBatch`es containing a single
  // entry of type `op_type_` to a non-default column family. Each attempt has
  // one byte corrupted in its memtable entry by adding `corrupt_byte_addend_`
  // to its original value. The test repeats until an attempt has been made on
  // each byte in the encoded memtable entry. All attempts are expected to fail
  // with `Status::Corruption`.
  Options options = CurrentOptions();
  if (op_type_ == WriteBatchOpType::kMerge) {
    options.merge_operator = MergeOperators::CreateStringAppendOperator();
  }
  CreateAndReopenWithCF({"pikachu"}, options);
  SyncPoint::GetInstance()->SetCallBack(
      "MemTable::Add:Encoded",
      std::bind(&DbKvChecksumTest::CorruptNextByteCallBack, this,
                std::placeholders::_1));

  while (MoreBytesToCorrupt()) {
    // Failed memtable insert always leads to read-only mode, so we have to
    // reopen for every attempt.
    ReopenWithColumnFamilies({kDefaultColumnFamilyName, "pikachu"}, options);

    SyncPoint::GetInstance()->EnableProcessing();
    ASSERT_TRUE(ExecuteWrite(handles_[1]).IsCorruption());
    SyncPoint::GetInstance()->DisableProcessing();

    // In case the above callback is not invoked, this test will run
    // numeric_limits<size_t>::max() times until it reports an error (or will
    // exhaust disk space). Added this assert to report error early.
    ASSERT_TRUE(entry_len_ < std::numeric_limits<size_t>::max());
  }
}

TEST_P(DbKvChecksumTest, NoCorruptionCase) {
  // If this test fails, we may have found a piece of malfunctioning hardware
  auto batch_and_status =
      GetWriteBatch(GetCFHandleToUse(nullptr, op_type_),
                    8 /* protection_bytes_per_key */, op_type_);
  ASSERT_OK(batch_and_status.second);
  ASSERT_OK(batch_and_status.first.VerifyChecksum());
}

TEST_P(DbKvChecksumTest, WriteToWALCorrupted) {
  // This test repeatedly attempts to write `WriteBatch`es containing a single
  // entry of type `op_type_`. Each attempt has one byte corrupted by adding
  // `corrupt_byte_addend_` to its original value. The test repeats until an
  // attempt has been made on each byte in the encoded write batch. All attempts
  // are expected to fail with `Status::Corruption`.
  Options options = CurrentOptions();
  if (op_type_ == WriteBatchOpType::kMerge) {
    options.merge_operator = MergeOperators::CreateStringAppendOperator();
  }
  SyncPoint::GetInstance()->SetCallBack(
      "DBImpl::WriteToWAL:log_entry",
      std::bind(&DbKvChecksumTest::CorruptNextByteCallBack, this,
                std::placeholders::_1));
  // First 8 bytes are for sequence number which is not protected in write batch
  corrupt_byte_offset_ = 8;

  while (MoreBytesToCorrupt()) {
    // Corrupted write batch leads to read-only mode, so we have to
    // reopen for every attempt.
    Reopen(options);
    auto log_size_pre_write = dbfull()->TEST_total_log_size();

    SyncPoint::GetInstance()->EnableProcessing();
    ASSERT_TRUE(ExecuteWrite(nullptr /* cf_handle */).IsCorruption());
    // Confirm that nothing was written to WAL
    ASSERT_EQ(log_size_pre_write, dbfull()->TEST_total_log_size());
    ASSERT_TRUE(dbfull()->TEST_GetBGError().IsCorruption());
    SyncPoint::GetInstance()->DisableProcessing();

    // In case the above callback is not invoked, this test will run
    // numeric_limits<size_t>::max() times until it reports an error (or will
    // exhaust disk space). Added this assert to report error early.
    ASSERT_TRUE(entry_len_ < std::numeric_limits<size_t>::max());
  }
}

TEST_P(DbKvChecksumTest, WriteToWALWithColumnFamilyCorrupted) {
  // This test repeatedly attempts to write `WriteBatch`es containing a single
  // entry of type `op_type_`. Each attempt has one byte corrupted by adding
  // `corrupt_byte_addend_` to its original value. The test repeats until an
  // attempt has been made on each byte in the encoded write batch. All attempts
  // are expected to fail with `Status::Corruption`.
  Options options = CurrentOptions();
  if (op_type_ == WriteBatchOpType::kMerge) {
    options.merge_operator = MergeOperators::CreateStringAppendOperator();
  }
  CreateAndReopenWithCF({"pikachu"}, options);
  SyncPoint::GetInstance()->SetCallBack(
      "DBImpl::WriteToWAL:log_entry",
      std::bind(&DbKvChecksumTest::CorruptNextByteCallBack, this,
                std::placeholders::_1));
  // First 8 bytes are for sequence number which is not protected in write batch
  corrupt_byte_offset_ = 8;

  while (MoreBytesToCorrupt()) {
    // Corrupted write batch leads to read-only mode, so we have to
    // reopen for every attempt.
    ReopenWithColumnFamilies({kDefaultColumnFamilyName, "pikachu"}, options);
    auto log_size_pre_write = dbfull()->TEST_total_log_size();

    SyncPoint::GetInstance()->EnableProcessing();
    ASSERT_TRUE(ExecuteWrite(nullptr /* cf_handle */).IsCorruption());
    // Confirm that nothing was written to WAL
    ASSERT_EQ(log_size_pre_write, dbfull()->TEST_total_log_size());
    ASSERT_TRUE(dbfull()->TEST_GetBGError().IsCorruption());
    SyncPoint::GetInstance()->DisableProcessing();

    // In case the above callback is not invoked, this test will run
    // numeric_limits<size_t>::max() times until it reports an error (or will
    // exhaust disk space). Added this assert to report error early.
    ASSERT_TRUE(entry_len_ < std::numeric_limits<size_t>::max());
  }
}

class DbKvChecksumTestMergedBatch
    : public DbKvChecksumTestBase,
      public ::testing::WithParamInterface<
          std::tuple<WriteBatchOpType, WriteBatchOpType, char>> {
 public:
  DbKvChecksumTestMergedBatch()
      : DbKvChecksumTestBase("db_kv_checksum_test", /*env_do_fsync=*/false) {
    op_type1_ = std::get<0>(GetParam());
    op_type2_ = std::get<1>(GetParam());
    corrupt_byte_addend_ = std::get<2>(GetParam());
  }

 protected:
  WriteBatchOpType op_type1_;
  WriteBatchOpType op_type2_;
  char corrupt_byte_addend_;
};

void CorruptWriteBatch(Slice* content, size_t offset,
                       char corrupt_byte_addend) {
  ASSERT_TRUE(offset < content->size());
  char* buf = const_cast<char*>(content->data());
  buf[offset] += corrupt_byte_addend;
}

TEST_P(DbKvChecksumTestMergedBatch, NoCorruptionCase) {
  // Verify write batch checksum after write batch append
  auto batch1 = GetWriteBatch(GetCFHandleToUse(nullptr, op_type1_),
                              8 /* protection_bytes_per_key */, op_type1_);
  ASSERT_OK(batch1.second);
  auto batch2 = GetWriteBatch(GetCFHandleToUse(nullptr, op_type2_),
                              8 /* protection_bytes_per_key */, op_type2_);
  ASSERT_OK(batch2.second);
  ASSERT_OK(WriteBatchInternal::Append(&batch1.first, &batch2.first));
  ASSERT_OK(batch1.first.VerifyChecksum());
}

TEST_P(DbKvChecksumTestMergedBatch, WriteToWALCorrupted) {
  // This test has two writers repeatedly attempt to write `WriteBatch`es
  // containing a single entry of type op_type1_ and op_type2_ respectively. The
  // leader of the write group writes the batch containing the entry of type
  // op_type1_. One byte of the pre-merged write batches is corrupted by adding
  // `corrupt_byte_addend_` to the batch's original value during each attempt.
  // The test repeats until an attempt has been made on each byte in both
  // pre-merged write batches. All attempts are expected to fail with
  // `Status::Corruption`.
  Options options = CurrentOptions();
  if (op_type1_ == WriteBatchOpType::kMerge ||
      op_type2_ == WriteBatchOpType::kMerge) {
    options.merge_operator = MergeOperators::CreateStringAppendOperator();
  }

  auto leader_batch_and_status =
      GetWriteBatch(GetCFHandleToUse(nullptr, op_type1_),
                    8 /* protection_bytes_per_key */, op_type1_);
  ASSERT_OK(leader_batch_and_status.second);
  auto follower_batch_and_status =
      GetWriteBatch(GetCFHandleToUse(nullptr, op_type2_),
                    8 /* protection_bytes_per_key */, op_type2_);
  size_t leader_batch_size = leader_batch_and_status.first.GetDataSize();
  size_t total_bytes =
      leader_batch_size + follower_batch_and_status.first.GetDataSize();
  // First 8 bytes are for sequence number which is not protected in write batch
  size_t corrupt_byte_offset = 8;

  std::atomic<bool> follower_joined{false};
  std::atomic<int> leader_count{0};
  port::Thread follower_thread;
  // This callback should only be called by the leader thread
  SyncPoint::GetInstance()->SetCallBack(
      "WriteThread::JoinBatchGroup:Wait2", [&](void* arg_leader) {
        auto* leader = reinterpret_cast<WriteThread::Writer*>(arg_leader);
        ASSERT_EQ(leader->state, WriteThread::STATE_GROUP_LEADER);

        // This callback should only be called by the follower thread
        SyncPoint::GetInstance()->SetCallBack(
            "WriteThread::JoinBatchGroup:Wait", [&](void* arg_follower) {
              auto* follower =
                  reinterpret_cast<WriteThread::Writer*>(arg_follower);
              // The leader thread will wait on this bool and hence wait until
              // this writer joins the write group
              ASSERT_NE(follower->state, WriteThread::STATE_GROUP_LEADER);
              if (corrupt_byte_offset >= leader_batch_size) {
                Slice batch_content = follower->batch->Data();
                CorruptWriteBatch(&batch_content,
                                  corrupt_byte_offset - leader_batch_size,
                                  corrupt_byte_addend_);
              }
              // Leader busy waits on this flag
              follower_joined = true;
              // So the follower does not enter the outer callback at
              // WriteThread::JoinBatchGroup:Wait2
              SyncPoint::GetInstance()->DisableProcessing();
            });

        // Start the other writer thread which will join the write group as
        // follower
        follower_thread = port::Thread([&]() {
          follower_batch_and_status =
              GetWriteBatch(GetCFHandleToUse(nullptr, op_type2_),
                            8 /* protection_bytes_per_key */, op_type2_);
          ASSERT_OK(follower_batch_and_status.second);
          ASSERT_TRUE(
              db_->Write(WriteOptions(), &follower_batch_and_status.first)
                  .IsCorruption());
        });

        ASSERT_EQ(leader->batch->GetDataSize(), leader_batch_size);
        if (corrupt_byte_offset < leader_batch_size) {
          Slice batch_content = leader->batch->Data();
          CorruptWriteBatch(&batch_content, corrupt_byte_offset,
                            corrupt_byte_addend_);
        }
        leader_count++;
        while (!follower_joined) {
          // busy waiting
        }
      });
  while (corrupt_byte_offset < total_bytes) {
    // Reopen DB since it failed WAL write which leads to read-only mode
    Reopen(options);
    SyncPoint::GetInstance()->EnableProcessing();
    auto log_size_pre_write = dbfull()->TEST_total_log_size();
    leader_batch_and_status =
        GetWriteBatch(GetCFHandleToUse(nullptr, op_type1_),
                      8 /* protection_bytes_per_key */, op_type1_);
    ASSERT_OK(leader_batch_and_status.second);
    ASSERT_TRUE(db_->Write(WriteOptions(), &leader_batch_and_status.first)
                    .IsCorruption());
    follower_thread.join();
    // Prevent leader thread from entering this callback
    SyncPoint::GetInstance()->ClearCallBack("WriteThread::JoinBatchGroup:Wait");
    ASSERT_EQ(1, leader_count);
    // Nothing should have been written to WAL
    ASSERT_EQ(log_size_pre_write, dbfull()->TEST_total_log_size());
    ASSERT_TRUE(dbfull()->TEST_GetBGError().IsCorruption());

    corrupt_byte_offset++;
    if (corrupt_byte_offset == leader_batch_size) {
      // skip over the sequence number part of follower's write batch
      corrupt_byte_offset += 8;
    }
    follower_joined = false;
    leader_count = 0;
  }
  SyncPoint::GetInstance()->DisableProcessing();
}

TEST_P(DbKvChecksumTestMergedBatch, WriteToWALWithColumnFamilyCorrupted) {
  // This test has two writers repeatedly attempt to write `WriteBatch`es
  // containing a single entry of type op_type1_ and op_type2_ respectively. The
  // leader of the write group writes the batch containing the entry of type
  // op_type1_. One byte of the pre-merged write batches is corrupted by adding
  // `corrupt_byte_addend_` to the batch's original value during each attempt.
  // The test repeats until an attempt has been made on each byte in both
  // pre-merged write batches. All attempts are expected to fail with
  // `Status::Corruption`.
  Options options = CurrentOptions();
  if (op_type1_ == WriteBatchOpType::kMerge ||
      op_type2_ == WriteBatchOpType::kMerge) {
    options.merge_operator = MergeOperators::CreateStringAppendOperator();
  }
  CreateAndReopenWithCF({"ramen"}, options);

  auto leader_batch_and_status =
      GetWriteBatch(GetCFHandleToUse(handles_[1], op_type1_),
                    8 /* protection_bytes_per_key */, op_type1_);
  ASSERT_OK(leader_batch_and_status.second);
  auto follower_batch_and_status =
      GetWriteBatch(GetCFHandleToUse(handles_[1], op_type2_),
                    8 /* protection_bytes_per_key */, op_type2_);
  size_t leader_batch_size = leader_batch_and_status.first.GetDataSize();
  size_t total_bytes =
      leader_batch_size + follower_batch_and_status.first.GetDataSize();
  // First 8 bytes are for sequence number which is not protected in write batch
  size_t corrupt_byte_offset = 8;

  std::atomic<bool> follower_joined{false};
  std::atomic<int> leader_count{0};
  port::Thread follower_thread;
  // This callback should only be called by the leader thread
  SyncPoint::GetInstance()->SetCallBack(
      "WriteThread::JoinBatchGroup:Wait2", [&](void* arg_leader) {
        auto* leader = reinterpret_cast<WriteThread::Writer*>(arg_leader);
        ASSERT_EQ(leader->state, WriteThread::STATE_GROUP_LEADER);

        // This callback should only be called by the follower thread
        SyncPoint::GetInstance()->SetCallBack(
            "WriteThread::JoinBatchGroup:Wait", [&](void* arg_follower) {
              auto* follower =
                  reinterpret_cast<WriteThread::Writer*>(arg_follower);
              // The leader thread will wait on this bool and hence wait until
              // this writer joins the write group
              ASSERT_NE(follower->state, WriteThread::STATE_GROUP_LEADER);
              if (corrupt_byte_offset >= leader_batch_size) {
                Slice batch_content =
                    WriteBatchInternal::Contents(follower->batch);
                CorruptWriteBatch(&batch_content,
                                  corrupt_byte_offset - leader_batch_size,
                                  corrupt_byte_addend_);
              }
              follower_joined = true;
              // So the follower does not enter the outer callback at
              // WriteThread::JoinBatchGroup:Wait2
              SyncPoint::GetInstance()->DisableProcessing();
            });

        // Start the other writer thread which will join the write group as
        // follower
        follower_thread = port::Thread([&]() {
          follower_batch_and_status =
              GetWriteBatch(GetCFHandleToUse(handles_[1], op_type2_),
                            8 /* protection_bytes_per_key */, op_type2_);
          ASSERT_OK(follower_batch_and_status.second);
          ASSERT_TRUE(
              db_->Write(WriteOptions(), &follower_batch_and_status.first)
                  .IsCorruption());
        });

        ASSERT_EQ(leader->batch->GetDataSize(), leader_batch_size);
        if (corrupt_byte_offset < leader_batch_size) {
          Slice batch_content = WriteBatchInternal::Contents(leader->batch);
          CorruptWriteBatch(&batch_content, corrupt_byte_offset,
                            corrupt_byte_addend_);
        }
        leader_count++;
        while (!follower_joined) {
          // busy waiting
        }
      });
  SyncPoint::GetInstance()->EnableProcessing();
  while (corrupt_byte_offset < total_bytes) {
    // Reopen DB since it failed WAL write which leads to read-only mode
    ReopenWithColumnFamilies({kDefaultColumnFamilyName, "ramen"}, options);
    SyncPoint::GetInstance()->EnableProcessing();
    auto log_size_pre_write = dbfull()->TEST_total_log_size();
    leader_batch_and_status =
        GetWriteBatch(GetCFHandleToUse(handles_[1], op_type1_),
                      8 /* protection_bytes_per_key */, op_type1_);
    ASSERT_OK(leader_batch_and_status.second);
    ASSERT_TRUE(db_->Write(WriteOptions(), &leader_batch_and_status.first)
                    .IsCorruption());
    follower_thread.join();
    // Prevent leader thread from entering this callback
    SyncPoint::GetInstance()->ClearCallBack("WriteThread::JoinBatchGroup:Wait");

    ASSERT_EQ(1, leader_count);
    // Nothing should have been written to WAL
    ASSERT_EQ(log_size_pre_write, dbfull()->TEST_total_log_size());
    ASSERT_TRUE(dbfull()->TEST_GetBGError().IsCorruption());

    corrupt_byte_offset++;
    if (corrupt_byte_offset == leader_batch_size) {
      // skip over the sequence number part of follower's write batch
      corrupt_byte_offset += 8;
    }
    follower_joined = false;
    leader_count = 0;
  }
  SyncPoint::GetInstance()->DisableProcessing();
}

INSTANTIATE_TEST_CASE_P(
    DbKvChecksumTestMergedBatch, DbKvChecksumTestMergedBatch,
    ::testing::Combine(::testing::Range(static_cast<WriteBatchOpType>(0),
                                        WriteBatchOpType::kNum),
                       ::testing::Range(static_cast<WriteBatchOpType>(0),
                                        WriteBatchOpType::kNum),
                       ::testing::Values(2, 103, 251)),
    [](const testing::TestParamInfo<
        std::tuple<WriteBatchOpType, WriteBatchOpType, char>>& args) {
      std::ostringstream oss;
      oss << GetOpTypeString(std::get<0>(args.param))
          << GetOpTypeString(std::get<1>(args.param)) << "Add"
          << static_cast<int>(
                 static_cast<unsigned char>(std::get<2>(args.param)));
      return oss.str();
    });

// TODO: add test for transactions
// TODO: add test for corrupted write batch with WAL disabled

class DbKVChecksumWALToWriteBatchTest : public DBTestBase {
 public:
  DbKVChecksumWALToWriteBatchTest()
      : DBTestBase("db_kv_checksum_test", /*env_do_fsync=*/false) {}
};

TEST_F(DbKVChecksumWALToWriteBatchTest, WriteBatchChecksumHandoff) {
  Options options = CurrentOptions();
  Reopen(options);
  ASSERT_OK(db_->Put(WriteOptions(), "key", "val"));
  std::string content = "";
  SyncPoint::GetInstance()->SetCallBack(
      "DBImpl::RecoverLogFiles:BeforeUpdateProtectionInfo:batch",
      [&](void* batch_ptr) {
        WriteBatch* batch = reinterpret_cast<WriteBatch*>(batch_ptr);
        content.assign(batch->Data().data(), batch->GetDataSize());
        Slice batch_content = batch->Data();
        // Corrupt the first byte
        CorruptWriteBatch(&batch_content, 0, 1);
      });
  SyncPoint::GetInstance()->SetCallBack(
      "DBImpl::RecoverLogFiles:BeforeUpdateProtectionInfo:checksum",
      [&](void* checksum_ptr) {
        // Verify that checksum is produced on the batch content
        uint64_t checksum = *reinterpret_cast<uint64_t*>(checksum_ptr);
        ASSERT_EQ(checksum, XXH3_64bits(content.data(), content.size()));
      });
  SyncPoint::GetInstance()->EnableProcessing();
  ASSERT_TRUE(TryReopen(options).IsCorruption());
  SyncPoint::GetInstance()->DisableProcessing();
}

}  // namespace ROCKSDB_NAMESPACE

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}