rocksdb/util
Peter Dillinger 459969e993 Simplify detection of x86 CPU features (#11419)
Summary:
**Background** - runtime detection of certain x86 CPU features was added for optimizing CRC32c checksums, where performance is dramatically affected by the availability of certain CPU instructions and of code using intrinsics for those instructions. Java builds with the native library also try to be broadly compatible while remaining performant.

What has changed is that CRC32c is no longer the most efficient checksum on contemporary x86_64 hardware, nor the default checksum. XXH3 is generally faster and not as dramatically impacted by the availability of certain CPU instructions. For example, on my Skylake system using db_bench (similar on an older Skylake system without AVX512):

PORTABLE=1 empty USE_SSE  : xxh3->8 GB/s   crc32c->0.8 GB/s  (no SSE4.2 nor AVX2 instructions)
PORTABLE=1 USE_SSE=1      : xxh3->19 GB/s  crc32c->16 GB/s  (with SSE4.2 and AVX2)
PORTABLE=0 USE_SSE ignored: xxh3->28 GB/s  crc32c->16 GB/s  (also some AVX512)

Testing a ~10 year old system, with SSE4.2 but without AVX2, crc32c runs at a similar speed to the new systems but xxh3 is only about half that speed, also 8 GB/s like the non-AVX2 compile above. Given that xxh3 has specific optimization for AVX2, I think we can infer that crc32c is only fastest for that ~2008-2013 period when SSE4.2 was included but not AVX2. And given that xxh3 is only about 2x slower on these systems (not like >10x slower for unoptimized crc32c), I don't think we need to invest too much in optimally adapting to these old cases.

x86 hardware that doesn't support fast CRC32c is now extremely rare, so requiring a custom build to support such hardware is fine IMHO.

**This change** does two related things:
* Remove runtime CPU detection for optimizing CRC32c on x86. Maintaining this code is non-zero work, and compiling special code that doesn't work on the configured target instruction set for code generation is always dubious. (On the one hand we have to ensure the CRC32c code uses SSE4.2 but on the other hand we have to ensure nothing else does.)
* Detect CPU features in source code, not in build scripts. Although there are some hypothetical advantages to detecting in build scripts (compiler generality), RocksDB supports at least three build systems: make, cmake, and buck. It's not practical to support feature detection on all three, and we have suffered from missed optimization opportunities by relying on missing or incomplete detection in cmake and buck. We also depend on some components like xxhash that do source code detection anyway.
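
As a rough illustration of the first point, the sketch below picks a CRC32c implementation at compile time from the standard `__SSE4_2__` macro, with no runtime CPU check. It is not RocksDB's actual `crc32c.cc`; the names `Crc32cPortableSketch`, `Crc32cSketch`, and `Crc32cSketchSelfCheck` are hypothetical, and the byte-at-a-time loops are kept simple rather than fast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE4_2__)
#include <nmmintrin.h>  // _mm_crc32_u8 (SSE4.2 CRC32 instruction)
#endif

// Hypothetical portable fallback: bitwise CRC32c using the reflected
// Castagnoli polynomial 0x82F63B78. Slow, but works on any target.
inline uint32_t Crc32cPortableSketch(const char* data, size_t n) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i) {
    crc ^= static_cast<unsigned char>(data[i]);
    for (int k = 0; k < 8; ++k) {
      // Mask is all ones when the low bit is set, zero otherwise.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical entry point: the implementation is chosen purely by what the
// compiler was told it may generate (e.g. -msse4.2 or -march=haswell).
inline uint32_t Crc32cSketch(const char* data, size_t n) {
#if defined(__SSE4_2__)
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i) {
    crc = _mm_crc32_u8(crc, static_cast<unsigned char>(data[i]));
  }
  return crc ^ 0xFFFFFFFFu;
#else
  return Crc32cPortableSketch(data, n);
#endif
}

// Both paths should agree on the standard CRC32c check value.
inline void Crc32cSketchSelfCheck() {
  assert(Crc32cSketch("123456789", 9) == 0xE3069283u);
  assert(Crc32cPortableSketch("123456789", 9) == 0xE3069283u);
}
```

The trade-off this illustrates: a binary compiled without SSE4.2 in its target instruction set gets the slow path even on a CPU that supports the instruction, which is the case the summary argues is now rare enough to handle with a custom build.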

**In more detail:**
* `HAVE_SSE42`, `HAVE_AVX2`, and `HAVE_PCLMUL` replaced by standard macros `__SSE4_2__`, `__AVX2__`, and `__PCLMUL__`.
* MSVC does not provide high fidelity defines for SSE, PCLMUL, or POPCNT, but we can infer those from `__AVX__` or `__AVX2__` in a compatibility header. In rare cases of false negative or false positive feature detection, a build engineer should be able to set defines to work around the issue.
* `__POPCNT__` is another standard define, but we happen to only need it on MSVC, where it is set by that compatibility header, or can be set by the build engineer.
* `PORTABLE` can be set to a CPU type, e.g. "haswell", to compile for that CPU type.
* `USE_SSE` is deprecated, now equivalent to PORTABLE=haswell, which roughly approximates its old behavior.
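
The macro handling described in this list might look roughly like the sketch below. The exact mapping from `__AVX__` / `__AVX2__` to the inferred defines is an assumption made for illustration (the summary does not spell it out), and `kHasSse42`, `kHasAvx2`, `kHasPclmul`, `kHasPopcnt`, and `EnabledFeaturesSketch` are hypothetical names, not RocksDB identifiers.

```cpp
#include <cassert>
#include <string>

// Compatibility shim (sketch): MSVC does not define the SSE4.2, PCLMUL, or
// POPCNT macros itself, so infer them from the AVX macros it does define.
// ASSUMPTION: which AVX level implies which feature is illustrative only.
// The #ifndef guards let a build engineer override the inference.
#if defined(_MSC_VER) && !defined(__clang__)
#if defined(__AVX__) || defined(__AVX2__)
#ifndef __SSE4_2__
#define __SSE4_2__ 1
#endif
#ifndef __PCLMUL__
#define __PCLMUL__ 1
#endif
#ifndef __POPCNT__
#define __POPCNT__ 1
#endif
#endif
#endif

// Source code then tests only the standard macros, regardless of which build
// system (make, cmake, buck) produced the compiler flags.
#if defined(__SSE4_2__)
constexpr bool kHasSse42 = true;
#else
constexpr bool kHasSse42 = false;
#endif

#if defined(__AVX2__)
constexpr bool kHasAvx2 = true;
#else
constexpr bool kHasAvx2 = false;
#endif

#if defined(__PCLMUL__)
constexpr bool kHasPclmul = true;
#else
constexpr bool kHasPclmul = false;
#endif

#if defined(__POPCNT__)
constexpr bool kHasPopcnt = true;
#else
constexpr bool kHasPopcnt = false;
#endif

// Hypothetical helper: report which optimized paths this build would enable.
inline std::string EnabledFeaturesSketch() {
  std::string out;
  if (kHasSse42) out += "sse4.2 ";
  if (kHasAvx2) out += "avx2 ";
  if (kHasPclmul) out += "pclmul ";
  if (kHasPopcnt) out += "popcnt ";
  if (out.empty()) out = "baseline";
  assert(!out.empty());
  return out;
}
```

With GCC or Clang, a target such as `-march=haswell` defines all four standard macros, so a build configured with `PORTABLE=haswell` would presumably report every feature here, while a fully portable build would typically report only `baseline`.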

Notably, this change should enable more builds to use the AVX2-optimized Bloom filter implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11419

Test Plan:
existing tests, CI

Manual performance tests after the change match the "before" numbers above (no difference expected with the make build).

We also see AVX2 optimized Bloom filter code enabled when expected, by injecting a compiler error. (Performance difference is not big on my current CPU.)

Reviewed By: ajkr

Differential Revision: D45489041

Pulled By: pdillinger

fbshipit-source-id: 60ceb0dd2aa3b365c99ed08a8b2a087a9abb6a70
2023-05-09 22:25:45 -07:00
aligned_buffer.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
async_file_reader.cc Return any errors returned by ReadAsync to the MultiGet caller (#11171) 2023-02-02 16:35:27 -08:00
async_file_reader.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
autovector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
autovector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
bloom_impl.h Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
bloom_test.cc Fix an uninitialized variable warning for g++ 12.2.0 (#10995) 2022-11-30 19:27:28 -08:00
build_version.cc.in Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cast_util.h
channel.h
cleanable.cc Fix compile error in Clang 13 (#10033) 2022-05-28 00:15:28 -07:00
coding.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compaction_job_stats_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
comparator.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compression.cc Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
compression.h Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
compression_context_cache.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compression_context_cache.h
concurrent_task_limiter_impl.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
concurrent_task_limiter_impl.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
core_local.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
coro_utils.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
crc32c.cc Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
crc32c.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
crc32c_arm64.cc Add OpenBSD/arm64 support for detection of CRC32 and PMULL (#10902) 2022-11-02 14:35:27 -07:00
crc32c_arm64.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
crc32c_ppc.c
crc32c_ppc.h
crc32c_ppc_asm.S
crc32c_ppc_constants.h
crc32c_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
data_structure.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
defer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
defer_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
distributed_mutex.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
duplicate_detector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
dynamic_bloom.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
fastrange.h
file_checksum_helper.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
file_checksum_helper.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
file_reader_writer_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filelock_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
filter_bench.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
gflags_compat.h Fix gflags_compat.h (#11346) 2023-04-03 10:41:00 -07:00
hash.cc
hash.h
hash128.h
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h
hash_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
kv_map.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
log_write_bench.cc
math.h Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
math128.h Derive cache keys from SST unique IDs (#10394) 2022-08-12 13:49:49 -07:00
murmurhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
murmurhash.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
mutexlock.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
ppc-opcode.h
random.cc remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
random.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
random_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
rate_limiter.cc Fix race conditions in GenericRateLimiter (#10374) 2022-07-19 09:31:14 -07:00
rate_limiter.h Fix race conditions in GenericRateLimiter (#10374) 2022-07-19 09:31:14 -07:00
rate_limiter_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
repeatable_thread.h
repeatable_thread_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
ribbon_alg.h Use only ASCII in source files (#10164) 2022-06-15 14:44:43 -07:00
ribbon_config.cc
ribbon_config.h
ribbon_impl.h
ribbon_test.cc util/ribbon_test.cc: avoid ambiguous reversed operator error in c++20 (#11371) 2023-04-12 13:24:34 -07:00
set_comparator.h
single_thread_executor.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
slice.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
slice_test.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
slice_transform_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
status.cc Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
stderr_logger.cc Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stderr_logger.h Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stop_watch.h Changes and enhancements to compression stats, thresholds (#11388) 2023-04-21 21:57:40 -07:00
string_util.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
string_util.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_guard.h
thread_list_test.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
thread_local.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_operation.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
threadpool_imp.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
threadpool_imp.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
user_comparator_wrapper.h Make UserComparatorWrapper not Customizable (#10837) 2022-10-21 12:27:50 -07:00
vector_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
work_queue.h
work_queue_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
xxhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
xxhash.h Upgrade xxhash.h to latest dev (#11098) 2023-01-19 12:07:50 -08:00
xxph3.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00