2016-02-09 23:12:00 +00:00
|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
2017-07-15 23:03:42 +00:00
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
2013-12-05 21:09:13 +00:00
|
|
|
|
|
|
|
#include "rocksdb/table_properties.h"
|
2019-05-30 21:47:29 +00:00
|
|
|
|
2022-07-15 04:49:34 +00:00
|
|
|
#include "db/seqno_to_time_mapping.h"
|
2022-04-06 17:33:00 +00:00
|
|
|
#include "port/malloc.h"
|
2014-11-25 04:44:49 +00:00
|
|
|
#include "port/port.h"
|
2016-08-19 22:10:31 +00:00
|
|
|
#include "rocksdb/env.h"
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
#include "rocksdb/unique_id.h"
|
2016-08-19 22:10:31 +00:00
|
|
|
#include "table/table_properties_internal.h"
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
#include "table/unique_id_impl.h"
|
Improve / clean up meta block code & integrity (#9163)
Summary:
* Checksums are now checked on meta blocks unless specifically
suppressed or not applicable (e.g. plain table). (Was other way around.)
This means a number of cases that were not checking checksums now are,
including direct read TableProperties in Version::GetTableProperties
(fixed in meta_blocks ReadTableProperties), reading any block from
PersistentCache (fixed in BlockFetcher), read TableProperties in
SstFileDumper (ldb/sst_dump/BackupEngine) before table reader open,
maybe more.
* For that to work, I moved the global_seqno+TableProperties checksum
logic to the shared table/ code, because that is used by many utilies
such as SstFileDumper.
* Also for that to work, we have to know when we're dealing with a block
that has a checksum (trailer), so added that capability to Footer based
on magic number, and from there BlockFetcher.
* Knowledge of trailer presence has also fixed a problem where other
table formats were reading blocks including bytes for a non-existant
trailer--and awkwardly kind-of not using them, e.g. no shared code
checking checksums. (BlockFetcher compression type was populated
incorrectly.) Now we only read what is needed.
* Minimized code duplication and differing/incompatible/awkward
abstractions in meta_blocks.{cc,h} (e.g. SeekTo in metaindex block
without parsing block handle)
* Moved some meta block handling code from table_properties*.*
* Moved some code specific to block-based table from shared table/ code
to BlockBasedTable class. The checksum stuff means we can't completely
separate it, but things that don't need to be in shared table/ code
should not be.
* Use unique_ptr rather than raw ptr in more places. (Note: you can
std::move from unique_ptr to shared_ptr.)
Without enhancements to GetPropertiesOfAllTablesTest (see below),
net reduction of roughly 100 lines of code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9163
Test Plan:
existing tests and
* Enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to verify that
checksums are now checked on direct read of table properties by TableCache
(new test would fail before this change)
* Also enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to test
putting table properties under old meta name
* Also generally enhanced that same test to actually test what it was
supposed to be testing already, by kicking things out of table cache when
we don't want them there.
Reviewed By: ajkr, mrambacher
Differential Revision: D32514757
Pulled By: pdillinger
fbshipit-source-id: 507964b9311d186ae8d1131182290cbd97a99fa9
2021-11-18 19:42:12 +00:00
|
|
|
#include "util/random.h"
|
2014-11-25 04:44:49 +00:00
|
|
|
#include "util/string_util.h"
|
2013-12-05 21:09:13 +00:00
|
|
|
|
2020-02-20 20:07:53 +00:00
|
|
|
namespace ROCKSDB_NAMESPACE {
|
2013-12-05 21:09:13 +00:00
|
|
|
|
2015-10-08 23:57:35 +00:00
|
|
|
const uint32_t TablePropertiesCollectorFactory::Context::kUnknownColumnFamily =
|
2022-05-05 20:08:21 +00:00
|
|
|
std::numeric_limits<int32_t>::max();
|
2015-10-08 23:57:35 +00:00
|
|
|
|
2013-12-05 21:09:13 +00:00
|
|
|
namespace {
|
2022-10-25 18:50:38 +00:00
|
|
|
void AppendProperty(std::string& props, const std::string& key,
|
|
|
|
const std::string& value, const std::string& prop_delim,
|
|
|
|
const std::string& kv_delim) {
|
|
|
|
props.append(key);
|
|
|
|
props.append(kv_delim);
|
|
|
|
props.append(value);
|
|
|
|
props.append(prop_delim);
|
|
|
|
}
|
2013-12-05 21:09:13 +00:00
|
|
|
|
2022-10-25 18:50:38 +00:00
|
|
|
template <class TValue>
|
|
|
|
void AppendProperty(std::string& props, const std::string& key,
|
|
|
|
const TValue& value, const std::string& prop_delim,
|
|
|
|
const std::string& kv_delim) {
|
|
|
|
AppendProperty(props, key, std::to_string(value), prop_delim, kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
}
|
2022-10-25 18:50:38 +00:00
|
|
|
} // namespace
|
2013-12-05 21:09:13 +00:00
|
|
|
|
2022-10-25 18:50:38 +00:00
|
|
|
std::string TableProperties::ToString(const std::string& prop_delim,
|
|
|
|
const std::string& kv_delim) const {
|
2013-12-05 21:09:13 +00:00
|
|
|
std::string result;
|
|
|
|
result.reserve(1024);
|
|
|
|
|
|
|
|
// Basic Info
|
2014-02-05 00:21:47 +00:00
|
|
|
AppendProperty(result, "# data blocks", num_data_blocks, prop_delim,
|
|
|
|
kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
AppendProperty(result, "# entries", num_entries, prop_delim, kv_delim);
|
2018-10-30 22:29:58 +00:00
|
|
|
AppendProperty(result, "# deletions", num_deletions, prop_delim, kv_delim);
|
|
|
|
AppendProperty(result, "# merge operands", num_merge_operands, prop_delim,
|
|
|
|
kv_delim);
|
2018-06-27 03:18:43 +00:00
|
|
|
AppendProperty(result, "# range deletions", num_range_deletions, prop_delim,
|
|
|
|
kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
|
|
|
|
AppendProperty(result, "raw key size", raw_key_size, prop_delim, kv_delim);
|
2014-02-05 00:21:47 +00:00
|
|
|
AppendProperty(result, "raw average key size",
|
|
|
|
num_entries != 0 ? 1.0 * raw_key_size / num_entries : 0.0,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
AppendProperty(result, "raw value size", raw_value_size, prop_delim,
|
|
|
|
kv_delim);
|
|
|
|
AppendProperty(result, "raw average value size",
|
|
|
|
num_entries != 0 ? 1.0 * raw_value_size / num_entries : 0.0,
|
|
|
|
prop_delim, kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
|
|
|
|
AppendProperty(result, "data block size", data_size, prop_delim, kv_delim);
|
2018-05-26 01:41:31 +00:00
|
|
|
char index_block_size_str[80];
|
|
|
|
snprintf(index_block_size_str, sizeof(index_block_size_str),
|
2018-08-09 23:49:45 +00:00
|
|
|
"index block size (user-key? %d, delta-value? %d)",
|
|
|
|
static_cast<int>(index_key_is_user_key),
|
|
|
|
static_cast<int>(index_value_is_delta_encoded));
|
2018-05-26 01:41:31 +00:00
|
|
|
AppendProperty(result, index_block_size_str, index_size, prop_delim,
|
|
|
|
kv_delim);
|
2017-06-13 17:59:22 +00:00
|
|
|
if (index_partitions != 0) {
|
|
|
|
AppendProperty(result, "# index partitions", index_partitions, prop_delim,
|
|
|
|
kv_delim);
|
2022-10-25 18:50:38 +00:00
|
|
|
AppendProperty(result, "top-level index size", top_level_index_size,
|
|
|
|
prop_delim, kv_delim);
|
2017-06-13 17:59:22 +00:00
|
|
|
}
|
2014-02-05 00:21:47 +00:00
|
|
|
AppendProperty(result, "filter block size", filter_size, prop_delim,
|
|
|
|
kv_delim);
|
2021-05-22 00:10:29 +00:00
|
|
|
AppendProperty(result, "# entries for filter", num_filter_entries, prop_delim,
|
|
|
|
kv_delim);
|
2014-02-05 00:21:47 +00:00
|
|
|
AppendProperty(result, "(estimated) table size",
|
|
|
|
data_size + index_size + filter_size, prop_delim, kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
|
|
|
|
AppendProperty(
|
2014-02-05 00:21:47 +00:00
|
|
|
result, "filter policy name",
|
2013-12-05 21:09:13 +00:00
|
|
|
filter_policy_name.empty() ? std::string("N/A") : filter_policy_name,
|
2014-02-05 00:21:47 +00:00
|
|
|
prop_delim, kv_delim);
|
2013-12-05 21:09:13 +00:00
|
|
|
|
2018-05-21 21:33:55 +00:00
|
|
|
AppendProperty(result, "prefix extractor name",
|
|
|
|
prefix_extractor_name.empty() ? std::string("N/A")
|
|
|
|
: prefix_extractor_name,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
2016-04-09 01:50:18 +00:00
|
|
|
AppendProperty(result, "column family ID",
|
2020-02-20 20:07:53 +00:00
|
|
|
column_family_id ==
|
|
|
|
ROCKSDB_NAMESPACE::TablePropertiesCollectorFactory::
|
|
|
|
Context::kUnknownColumnFamily
|
2016-04-09 01:50:18 +00:00
|
|
|
? std::string("N/A")
|
2022-05-06 20:03:58 +00:00
|
|
|
: std::to_string(column_family_id),
|
2016-04-09 01:50:18 +00:00
|
|
|
prop_delim, kv_delim);
|
|
|
|
AppendProperty(
|
|
|
|
result, "column family name",
|
|
|
|
column_family_name.empty() ? std::string("N/A") : column_family_name,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
2016-04-21 17:16:28 +00:00
|
|
|
AppendProperty(result, "comparator name",
|
|
|
|
comparator_name.empty() ? std::string("N/A") : comparator_name,
|
|
|
|
prop_delim, kv_delim);
|
2023-08-08 19:25:21 +00:00
|
|
|
AppendProperty(result, "user defined timestamps persisted",
|
|
|
|
user_defined_timestamps_persisted ? std::string("true")
|
|
|
|
: std::string("false"),
|
|
|
|
prop_delim, kv_delim);
|
2024-08-21 23:24:18 +00:00
|
|
|
AppendProperty(result, "largest sequence number in file", key_largest_seqno,
|
|
|
|
prop_delim, kv_delim);
|
2016-04-21 17:16:28 +00:00
|
|
|
|
|
|
|
AppendProperty(
|
|
|
|
result, "merge operator name",
|
|
|
|
merge_operator_name.empty() ? std::string("N/A") : merge_operator_name,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
|
|
|
AppendProperty(result, "property collectors names",
|
|
|
|
property_collectors_names.empty() ? std::string("N/A")
|
|
|
|
: property_collectors_names,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
2016-05-12 16:47:16 +00:00
|
|
|
AppendProperty(
|
|
|
|
result, "SST file compression algo",
|
|
|
|
compression_name.empty() ? std::string("N/A") : compression_name,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
2019-04-02 21:48:52 +00:00
|
|
|
AppendProperty(
|
|
|
|
result, "SST file compression options",
|
|
|
|
compression_options.empty() ? std::string("N/A") : compression_options,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
2017-06-28 00:02:20 +00:00
|
|
|
AppendProperty(result, "creation time", creation_time, prop_delim, kv_delim);
|
|
|
|
|
2017-10-23 22:22:05 +00:00
|
|
|
AppendProperty(result, "time stamp of earliest key", oldest_key_time,
|
|
|
|
prop_delim, kv_delim);
|
|
|
|
|
Periodic Compactions (#5166)
Summary:
Introducing Periodic Compactions.
This feature allows all the files in a CF to be periodically compacted. It could help in catching any corruptions that could creep into the DB proactively as every file is constantly getting re-compacted. And also, of course, it helps to cleanup data older than certain threshold.
- Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF.
- This works across all levels.
- The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used).
- Compaction filters, if any, are invoked as usual.
- A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS).
This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on 0 to last but one levels, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time` all files in last but one levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166
Differential Revision: D14884441
Pulled By: sagar0
fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47
2019-04-11 02:24:25 +00:00
|
|
|
AppendProperty(result, "file creation time", file_creation_time, prop_delim,
|
|
|
|
kv_delim);
|
|
|
|
|
2021-04-01 01:20:44 +00:00
|
|
|
AppendProperty(result, "slow compression estimated data size",
|
|
|
|
slow_compression_estimated_data_size, prop_delim, kv_delim);
|
|
|
|
AppendProperty(result, "fast compression estimated data size",
|
|
|
|
fast_compression_estimated_data_size, prop_delim, kv_delim);
|
|
|
|
|
2020-06-17 17:55:42 +00:00
|
|
|
// DB identity and DB session ID
|
|
|
|
AppendProperty(result, "DB identity", db_id, prop_delim, kv_delim);
|
|
|
|
AppendProperty(result, "DB session identity", db_session_id, prop_delim,
|
|
|
|
kv_delim);
|
2021-08-21 03:39:52 +00:00
|
|
|
AppendProperty(result, "DB host id", db_host_id, prop_delim, kv_delim);
|
|
|
|
AppendProperty(result, "original file number", orig_file_number, prop_delim,
|
|
|
|
kv_delim);
|
2020-06-17 17:55:42 +00:00
|
|
|
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
// Unique ID, when available
|
|
|
|
std::string id;
|
|
|
|
Status s = GetUniqueIdFromTableProperties(*this, &id);
|
|
|
|
AppendProperty(result, "unique ID",
|
|
|
|
s.ok() ? UniqueIdToHumanString(id) : "N/A", prop_delim,
|
|
|
|
kv_delim);
|
|
|
|
|
2022-07-15 04:49:34 +00:00
|
|
|
SeqnoToTimeMapping seq_time_mapping;
|
Fix/cleanup SeqnoToTimeMapping (#12253)
Summary:
The SeqnoToTimeMapping class (RocksDB internal) used by the preserve_internal_time_seconds / preclude_last_level_data_seconds options was essentially in a prototype state with some significant flaws that would risk biting us some day. This is a big, complicated change because both the implementation and the behavioral requirements of the class needed to be upgraded together. In short, this makes SeqnoToTimeMapping more internally responsible for maintaining good invariants, so that callers don't easily encounter dangerous scenarios.
* Some API functions were confusingly named and structured, so I fully refactored the APIs to use clear naming (e.g. `DecodeFrom` and `CopyFromSeqnoRange`), object states, function preconditions, etc.
* Previously the object could informally be sorted / compacted or not, and there was limited checking or enforcement on these states. Now there's a well-defined "enforced" state that is consistently checked in debug mode for applicable operations. (I attempted to create a separate "builder" class for unenforced states, but IIRC found that more cumbersome for existing uses than it was worth.)
* Previously operations would coalesce data in a way that was better for `GetProximalTimeBeforeSeqno` than for `GetProximalSeqnoBeforeTime` which is odd because the latter is the only one used by DB code currently (what is the seqno cut-off for data definitely older than this given time?). This is now reversed to consistently favor `GetProximalSeqnoBeforeTime`, with that logic concentrated in one place: `SeqnoToTimeMapping::SeqnoTimePair::Merge()`. Unfortunately, a lot of unit test logic was specifically testing the old, suboptimal behavior.
* Previously, the natural behavior of SeqnoToTimeMapping was to THROW AWAY data needed to get reasonable answers to the important `GetProximalSeqnoBeforeTime` queries. This is because SeqnoToTimeMapping only had a FIFO policy for staying within the entry capacity (except in aggregate+sort+serialize mode). If the DB wasn't extremely careful to avoid gathering too many time mappings, it could lose track of where the seqno cutoff was for cold data (`GetProximalSeqnoBeforeTime()` returning 0) and preventing all further data migration to the cold tier--until time passes etc. for mappings to catch up with FIFO purging of them. (The problem is not so acute because SST files contain relevant snapshots of the mappings, but the problem would apply to long-lived memtables.)
* Now the SeqnoToTimeMapping class has fully-integrated smarts for keeping a sufficiently complete history, within capacity limits, to give good answers to `GetProximalSeqnoBeforeTime` queries.
* Fixes old `// FIXME: be smarter about how we erase to avoid data falling off the front prematurely.`
* Fix an apparent bug in how entries are selected for storing into SST files. Previously, it only selected entries within the seqno range of the file, but that would easily leave a gap at the beginning of the timeline for data in the file for the purposes of answering GetProximalXXX queries with reasonable accuracy. This could probably lead to the same problem discussed above in naively throwing away entries in FIFO order in the old SeqnoToTimeMapping. The updated testing of GetProximalSeqnoBeforeTime in BasicSeqnoToTimeMapping relies on the fixed behavior.
* Fix a potential compaction CPU efficiency/scaling issue in which each compaction output file would iterate over and sort all seqno-to-time mappings from all compaction input files. Now we distill the input file entries to a constant size before processing each compaction output file.
Intended follow-up (me or others):
* Expand some direct testing of SeqnoToTimeMapping APIs. Here I've focused on updating existing tests to make sense.
* There are likely more gaps in availability of needed SeqnoToTimeMapping data when the DB shuts down and is restarted, at least with WAL.
* The data tracked in the DB could be kept more accurate and limited if it used the oldest seqno of unflushed data. This might require some more API refactoring.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12253
Test Plan: unit tests updated
Reviewed By: jowlyzhang
Differential Revision: D52913733
Pulled By: pdillinger
fbshipit-source-id: 020737fcbbe6212f6701191a6ab86565054c9593
2024-01-20 05:50:38 +00:00
|
|
|
s = seq_time_mapping.DecodeFrom(seqno_to_time_mapping);
|
2022-07-15 04:49:34 +00:00
|
|
|
AppendProperty(result, "Sequence number to time mapping",
|
|
|
|
s.ok() ? seq_time_mapping.ToHumanString() : "N/A", prop_delim,
|
|
|
|
kv_delim);
|
|
|
|
|
2013-12-05 21:09:13 +00:00
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2015-08-25 19:03:54 +00:00
|
|
|
void TableProperties::Add(const TableProperties& tp) {
|
|
|
|
data_size += tp.data_size;
|
|
|
|
index_size += tp.index_size;
|
2017-06-13 17:59:22 +00:00
|
|
|
index_partitions += tp.index_partitions;
|
|
|
|
top_level_index_size += tp.top_level_index_size;
|
2018-05-26 01:41:31 +00:00
|
|
|
index_key_is_user_key += tp.index_key_is_user_key;
|
2018-08-09 23:49:45 +00:00
|
|
|
index_value_is_delta_encoded += tp.index_value_is_delta_encoded;
|
2015-08-25 19:03:54 +00:00
|
|
|
filter_size += tp.filter_size;
|
|
|
|
raw_key_size += tp.raw_key_size;
|
|
|
|
raw_value_size += tp.raw_value_size;
|
|
|
|
num_data_blocks += tp.num_data_blocks;
|
|
|
|
num_entries += tp.num_entries;
|
2021-05-22 00:10:29 +00:00
|
|
|
num_filter_entries += tp.num_filter_entries;
|
2018-10-30 22:29:58 +00:00
|
|
|
num_deletions += tp.num_deletions;
|
|
|
|
num_merge_operands += tp.num_merge_operands;
|
2018-06-27 03:18:43 +00:00
|
|
|
num_range_deletions += tp.num_range_deletions;
|
2021-04-01 01:20:44 +00:00
|
|
|
slow_compression_estimated_data_size +=
|
|
|
|
tp.slow_compression_estimated_data_size;
|
|
|
|
fast_compression_estimated_data_size +=
|
|
|
|
tp.fast_compression_estimated_data_size;
|
2015-08-25 19:03:54 +00:00
|
|
|
}
|
|
|
|
|
2020-12-19 15:59:08 +00:00
|
|
|
std::map<std::string, uint64_t>
|
|
|
|
TableProperties::GetAggregatablePropertiesAsMap() const {
|
|
|
|
std::map<std::string, uint64_t> rv;
|
|
|
|
rv["data_size"] = data_size;
|
|
|
|
rv["index_size"] = index_size;
|
|
|
|
rv["index_partitions"] = index_partitions;
|
|
|
|
rv["top_level_index_size"] = top_level_index_size;
|
|
|
|
rv["filter_size"] = filter_size;
|
|
|
|
rv["raw_key_size"] = raw_key_size;
|
|
|
|
rv["raw_value_size"] = raw_value_size;
|
|
|
|
rv["num_data_blocks"] = num_data_blocks;
|
|
|
|
rv["num_entries"] = num_entries;
|
2021-05-22 00:10:29 +00:00
|
|
|
rv["num_filter_entries"] = num_filter_entries;
|
2020-12-19 15:59:08 +00:00
|
|
|
rv["num_deletions"] = num_deletions;
|
|
|
|
rv["num_merge_operands"] = num_merge_operands;
|
|
|
|
rv["num_range_deletions"] = num_range_deletions;
|
2021-04-01 01:20:44 +00:00
|
|
|
rv["slow_compression_estimated_data_size"] =
|
|
|
|
slow_compression_estimated_data_size;
|
|
|
|
rv["fast_compression_estimated_data_size"] =
|
|
|
|
fast_compression_estimated_data_size;
|
2020-12-19 15:59:08 +00:00
|
|
|
return rv;
|
|
|
|
}
|
|
|
|
|
2022-04-06 17:33:00 +00:00
|
|
|
// WARNING: manual update to this function is needed
|
|
|
|
// whenever a new string property is added to TableProperties
|
|
|
|
// to reduce approximation error.
|
|
|
|
//
|
|
|
|
// TODO: eliminate the need of manually updating this function
|
|
|
|
// for new string properties
|
|
|
|
std::size_t TableProperties::ApproximateMemoryUsage() const {
|
|
|
|
std::size_t usage = 0;
|
|
|
|
#ifdef ROCKSDB_MALLOC_USABLE_SIZE
|
|
|
|
usage += malloc_usable_size((void*)this);
|
|
|
|
#else
|
|
|
|
usage += sizeof(*this);
|
|
|
|
#endif // ROCKSDB_MALLOC_USABLE_SIZE
|
|
|
|
|
|
|
|
std::size_t string_props_mem_usage =
|
|
|
|
db_id.size() + db_session_id.size() + db_host_id.size() +
|
|
|
|
column_family_name.size() + filter_policy_name.size() +
|
|
|
|
comparator_name.size() + merge_operator_name.size() +
|
|
|
|
prefix_extractor_name.size() + property_collectors_names.size() +
|
|
|
|
compression_name.size() + compression_options.size();
|
|
|
|
usage += string_props_mem_usage;
|
|
|
|
|
|
|
|
for (auto iter = user_collected_properties.begin();
|
|
|
|
iter != user_collected_properties.end(); ++iter) {
|
|
|
|
usage += (iter->first.size() + iter->second.size());
|
|
|
|
}
|
|
|
|
|
|
|
|
return usage;
|
|
|
|
}
|
|
|
|
|
2020-06-17 17:55:42 +00:00
|
|
|
const std::string TablePropertiesNames::kDbId = "rocksdb.creating.db.identity";
|
|
|
|
const std::string TablePropertiesNames::kDbSessionId =
|
|
|
|
"rocksdb.creating.session.identity";
|
2020-10-19 18:37:05 +00:00
|
|
|
const std::string TablePropertiesNames::kDbHostId =
|
|
|
|
"rocksdb.creating.host.identity";
|
2021-08-21 03:39:52 +00:00
|
|
|
const std::string TablePropertiesNames::kOriginalFileNumber =
|
|
|
|
"rocksdb.original.file.number";
|
2022-10-25 18:50:38 +00:00
|
|
|
const std::string TablePropertiesNames::kDataSize = "rocksdb.data.size";
|
|
|
|
const std::string TablePropertiesNames::kIndexSize = "rocksdb.index.size";
|
2017-06-13 17:59:22 +00:00
|
|
|
const std::string TablePropertiesNames::kIndexPartitions =
|
|
|
|
"rocksdb.index.partitions";
|
|
|
|
const std::string TablePropertiesNames::kTopLevelIndexSize =
|
|
|
|
"rocksdb.top-level.index.size";
|
2018-05-26 01:41:31 +00:00
|
|
|
const std::string TablePropertiesNames::kIndexKeyIsUserKey =
|
|
|
|
"rocksdb.index.key.is.user.key";
|
2018-08-09 23:49:45 +00:00
|
|
|
const std::string TablePropertiesNames::kIndexValueIsDeltaEncoded =
|
|
|
|
"rocksdb.index.value.is.delta.encoded";
|
2022-10-25 18:50:38 +00:00
|
|
|
const std::string TablePropertiesNames::kFilterSize = "rocksdb.filter.size";
|
|
|
|
const std::string TablePropertiesNames::kRawKeySize = "rocksdb.raw.key.size";
|
2013-12-05 21:09:13 +00:00
|
|
|
const std::string TablePropertiesNames::kRawValueSize =
|
|
|
|
"rocksdb.raw.value.size";
|
|
|
|
const std::string TablePropertiesNames::kNumDataBlocks =
|
|
|
|
"rocksdb.num.data.blocks";
|
2022-10-25 18:50:38 +00:00
|
|
|
const std::string TablePropertiesNames::kNumEntries = "rocksdb.num.entries";
|
2021-05-22 00:10:29 +00:00
|
|
|
const std::string TablePropertiesNames::kNumFilterEntries =
|
|
|
|
"rocksdb.num.filter_entries";
|
2018-10-30 22:29:58 +00:00
|
|
|
const std::string TablePropertiesNames::kDeletedKeys = "rocksdb.deleted.keys";
|
|
|
|
const std::string TablePropertiesNames::kMergeOperands =
|
|
|
|
"rocksdb.merge.operands";
|
2018-06-27 03:18:43 +00:00
|
|
|
const std::string TablePropertiesNames::kNumRangeDeletions =
|
|
|
|
"rocksdb.num.range-deletions";
|
2022-10-25 18:50:38 +00:00
|
|
|
const std::string TablePropertiesNames::kFilterPolicy = "rocksdb.filter.policy";
|
2013-12-20 17:35:24 +00:00
|
|
|
const std::string TablePropertiesNames::kFormatVersion =
|
|
|
|
"rocksdb.format.version";
|
|
|
|
const std::string TablePropertiesNames::kFixedKeyLen =
|
|
|
|
"rocksdb.fixed.key.length";
|
2016-04-07 06:10:32 +00:00
|
|
|
const std::string TablePropertiesNames::kColumnFamilyId =
|
|
|
|
"rocksdb.column.family.id";
|
|
|
|
const std::string TablePropertiesNames::kColumnFamilyName =
|
|
|
|
"rocksdb.column.family.name";
|
2016-04-21 17:16:28 +00:00
|
|
|
const std::string TablePropertiesNames::kComparator = "rocksdb.comparator";
|
|
|
|
const std::string TablePropertiesNames::kMergeOperator =
|
|
|
|
"rocksdb.merge.operator";
|
2016-08-26 18:46:32 +00:00
|
|
|
const std::string TablePropertiesNames::kPrefixExtractorName =
|
|
|
|
"rocksdb.prefix.extractor.name";
|
2016-04-21 17:16:28 +00:00
|
|
|
const std::string TablePropertiesNames::kPropertyCollectors =
|
|
|
|
"rocksdb.property.collectors";
|
2016-05-12 16:47:16 +00:00
|
|
|
const std::string TablePropertiesNames::kCompression = "rocksdb.compression";
|
2019-04-02 21:48:52 +00:00
|
|
|
const std::string TablePropertiesNames::kCompressionOptions =
|
|
|
|
"rocksdb.compression_options";
|
2017-06-28 00:02:20 +00:00
|
|
|
const std::string TablePropertiesNames::kCreationTime = "rocksdb.creation.time";
|
2017-10-23 22:22:05 +00:00
|
|
|
const std::string TablePropertiesNames::kOldestKeyTime =
|
|
|
|
"rocksdb.oldest.key.time";
|
Periodic Compactions (#5166)
Summary:
Introducing Periodic Compactions.
This feature allows all the files in a CF to be periodically compacted. It could help in catching any corruptions that could creep into the DB proactively as every file is constantly getting re-compacted. And also, of course, it helps to cleanup data older than certain threshold.
- Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF.
- This works across all levels.
- The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used).
- Compaction filters, if any, are invoked as usual.
- A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS).
This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on 0 to last but one levels, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time` all files in last but one levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166
Differential Revision: D14884441
Pulled By: sagar0
fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47
2019-04-11 02:24:25 +00:00
|
|
|
const std::string TablePropertiesNames::kFileCreationTime =
|
|
|
|
"rocksdb.file.creation.time";
|
2021-04-01 01:20:44 +00:00
|
|
|
const std::string TablePropertiesNames::kSlowCompressionEstimatedDataSize =
|
|
|
|
"rocksdb.sample_for_compression.slow.data.size";
|
|
|
|
const std::string TablePropertiesNames::kFastCompressionEstimatedDataSize =
|
|
|
|
"rocksdb.sample_for_compression.fast.data.size";
|
2022-07-15 04:49:34 +00:00
|
|
|
const std::string TablePropertiesNames::kSequenceNumberTimeMapping =
|
|
|
|
"rocksdb.seqno.time.map";
|
Record and use the tail size to prefetch table tail (#11406)
Summary:
**Context:**
We prefetch the tail part of a SST file (i.e, the blocks after data blocks till the end of the file) during each SST file open in hope to prefetch all the stuff at once ahead of time for later read e.g, footer, meta index, filter/index etc. The existing approach to estimate the tail size to prefetch is through `TailPrefetchStats` heuristics introduced in https://github.com/facebook/rocksdb/pull/4156, which has caused small reads in unlucky case (e.g, small read into the tail buffer during table open in thread 1 under the same BlockBasedTableFactory object can make thread 2's tail prefetching use a small size that it shouldn't) and is hard to debug. Therefore we decide to record the exact tail size and use it directly to prefetch tail of the SST instead of relying heuristics.
**Summary:**
- Obtain and record in manifest the tail size in `BlockBasedTableBuilder::Finish()`
- For backward compatibility, we fall back to TailPrefetchStats and last to simple heuristics that the tail size is a linear portion of the file size - see PR conversation for more.
- Make`tail_start_offset` part of the table properties and deduct tail size to record in manifest for external files (e.g, file ingestion, import CF) and db repair (with no access to manifest).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11406
Test Plan:
1. New UT
2. db bench
Note: db bench on /tmp/ where direct read is supported is too slow to finish and the default pinning setting in db bench is not helpful to profile # sst read of Get. Therefore I hacked the following to obtain the following comparison.
```
diff --git a/table/block_based/block_based_table_reader.cc b/table/block_based/block_based_table_reader.cc
index bd5669f0f..791484c1f 100644
--- a/table/block_based/block_based_table_reader.cc
+++ b/table/block_based/block_based_table_reader.cc
@@ -838,7 +838,7 @@ Status BlockBasedTable::PrefetchTail(
&tail_prefetch_size);
// Try file system prefetch
- if (!file->use_direct_io() && !force_direct_prefetch) {
+ if (false && !file->use_direct_io() && !force_direct_prefetch) {
if (!file->Prefetch(prefetch_off, prefetch_len, ro.rate_limiter_priority)
.IsNotSupported()) {
prefetch_buffer->reset(new FilePrefetchBuffer(
diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc
index ea40f5fa0..39a0ac385 100644
--- a/tools/db_bench_tool.cc
+++ b/tools/db_bench_tool.cc
@@ -4191,6 +4191,8 @@ class Benchmark {
std::shared_ptr<TableFactory>(NewCuckooTableFactory(table_options));
} else {
BlockBasedTableOptions block_based_options;
+ block_based_options.metadata_cache_options.partition_pinning =
+ PinningTier::kAll;
block_based_options.checksum =
static_cast<ChecksumType>(FLAGS_checksum_type);
if (FLAGS_use_hash_search) {
```
Create DB
```
./db_bench --bloom_bits=3 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 -db=/dev/shm/testdb/ -benchmarks=readrandom -key_size=3200 -value_size=512 -num=1000000 -write_buffer_size=6550000 -disable_auto_compactions=false -target_file_size_base=6550000 -compression_type=none
```
ReadRandom
```
./db_bench --bloom_bits=3 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 -db=/dev/shm/testdb/ -benchmarks=readrandom -key_size=3200 -value_size=512 -num=1000000 -write_buffer_size=6550000 -disable_auto_compactions=false -target_file_size_base=6550000 -compression_type=none
```
(a) Existing (Use TailPrefetchStats for tail size + use seperate prefetch buffer in PartitionedFilter/IndexReader::CacheDependencies())
```
rocksdb.table.open.prefetch.tail.hit COUNT : 3395
rocksdb.sst.read.micros P50 : 5.655570 P95 : 9.931396 P99 : 14.845454 P100 : 585.000000 COUNT : 999905 SUM : 6590614
```
(b) This PR (Record tail size + use the same tail buffer in PartitionedFilter/IndexReader::CacheDependencies())
```
rocksdb.table.open.prefetch.tail.hit COUNT : 14257
rocksdb.sst.read.micros P50 : 5.173347 P95 : 9.015017 P99 : 12.912610 P100 : 228.000000 COUNT : 998547 SUM : 5976540
```
As we can see, we increase the prefetch tail hit count and decrease SST read count with this PR
3. Test backward compatibility by stepping through reading with post-PR code on a db generated pre-PR.
Reviewed By: pdillinger
Differential Revision: D45413346
Pulled By: hx235
fbshipit-source-id: 7d5e36a60a72477218f79905168d688452a4c064
2023-05-08 20:14:28 +00:00
|
|
|
const std::string TablePropertiesNames::kTailStartOffset =
|
|
|
|
"rocksdb.tail.start.offset";
|
2023-06-22 04:49:01 +00:00
|
|
|
const std::string TablePropertiesNames::kUserDefinedTimestampsPersisted =
|
|
|
|
"rocksdb.user.defined.timestamps.persisted";
|
2024-08-21 23:24:18 +00:00
|
|
|
const std::string TablePropertiesNames::kKeyLargestSeqno =
|
|
|
|
"rocksdb.key.largest.seqno";
|
2013-12-05 21:09:13 +00:00
|
|
|
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
#ifndef NDEBUG
|
2022-04-07 19:25:43 +00:00
|
|
|
// WARNING: TEST_SetRandomTableProperties assumes the following layout of
|
|
|
|
// TableProperties
|
|
|
|
//
|
|
|
|
// struct TableProperties {
|
|
|
|
// int64_t orig_file_number = 0;
|
|
|
|
// ...
|
|
|
|
// ... int64_t properties only
|
|
|
|
// ...
|
|
|
|
// std::string db_id;
|
|
|
|
// ...
|
|
|
|
// ... std::string properties only
|
|
|
|
// ...
|
|
|
|
// std::string compression_options;
|
|
|
|
// UserCollectedProperties user_collected_properties;
|
|
|
|
// ...
|
|
|
|
// ... Other extra properties: non-int64_t/non-std::string properties only
|
|
|
|
// ...
|
|
|
|
// }
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
void TEST_SetRandomTableProperties(TableProperties* props) {
|
|
|
|
Random* r = Random::GetTLSInstance();
|
|
|
|
uint64_t* pu = &props->orig_file_number;
|
|
|
|
assert(static_cast<void*>(pu) == static_cast<void*>(props));
|
|
|
|
std::string* ps = &props->db_id;
|
|
|
|
const uint64_t* const pu_end = reinterpret_cast<const uint64_t*>(ps);
|
2022-04-07 19:25:43 +00:00
|
|
|
// Use the last string property's address instead of
|
|
|
|
// the first extra property (e.g `user_collected_properties`)'s address
|
|
|
|
// in the for-loop to avoid advancing pointer to pointing to
|
|
|
|
// potential non-zero padding bytes between these two addresses due to
|
|
|
|
// user_collected_properties's alignment requirement
|
|
|
|
const std::string* const ps_end_inclusive = &props->compression_options;
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
|
|
|
|
for (; pu < pu_end; ++pu) {
|
|
|
|
*pu = r->Next64();
|
|
|
|
}
|
|
|
|
assert(static_cast<void*>(pu) == static_cast<void*>(ps));
|
2022-04-07 19:25:43 +00:00
|
|
|
for (; ps <= ps_end_inclusive; ++ps) {
|
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-19 06:28:28 +00:00
|
|
|
*ps = r->RandomBinaryString(13);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2020-02-20 20:07:53 +00:00
|
|
|
} // namespace ROCKSDB_NAMESPACE
|