Commit Graph

5 Commits

Author SHA1 Message Date
Peter Dillinger 459969e993 Simplify detection of x86 CPU features (#11419)
Summary:
**Background** - runtime detection of certain x86 CPU features was added for optimizing CRC32c checksums, where performance is dramatically affected by the availability of certain CPU instructions and code using intrinsics for those instructions. And Java builds with native library try to be broadly compatible but performant.

What has changed is that CRC32c is no longer the most efficient cheecksum on contemporary x86_64 hardware, nor the default checksum. XXH3 is generally faster and not as dramatically impacted by the availability of certain CPU instructions. For example, on my Skylake system using db_bench (similar on an older Skylake system without AVX512):

PORTABLE=1 empty USE_SSE  : xxh3->8 GB/s   crc32c->0.8 GB/s  (no SSE4.2 nor AVX2 instructions)
PORTABLE=1 USE_SSE=1      : xxh3->19 GB/s  crc32c->16 GB/s  (with SSE4.2 and AVX2)
PORTABLE=0 USE_SSE ignored: xxh3->28 GB/s  crc32c->16 GB/s  (also some AVX512)

Testing a ~10 year old system, with SSE4.2 but without AVX2, crc32c is a similar speed to the new systems but xxh3 is only about half that speed, also 8GB/s like the non-AVX2 compile above. Given that xxh3 has specific optimization for AVX2, I think we can infer that that crc32c is only fastest for that ~2008-2013 period when SSE4.2 was included but not AVX2. And given that xxh3 is only about 2x slower on these systems (not like >10x slower for unoptimized crc32c), I don't think we need to invest too much in optimally adapting to these old cases.

x86 hardware that doesn't support fast CRC32c is now extremely rare, so requiring a custom build to support such hardware is fine IMHO.

**This change** does two related things:
* Remove runtime CPU detection for optimizing CRC32c on x86. Maintaining this code is non-zero work, and compiling special code that doesn't work on the configured target instruction set for code generation is always dubious. (On the one hand we have to ensure the CRC32c code uses SSE4.2 but on the other hand we have to ensure nothing else does.)
* Detect CPU features in source code, not in build scripts. Although there are some hypothetical advantages to detectiong in build scripts (compiler generality), RocksDB supports at least three build systems: make, cmake, and buck. It's not practical to support feature detection on all three, and we have suffered from missed optimization opportunities by relying on missing or incomplete detection in cmake and buck. We also depend on some components like xxhash that do source code detection anyway.

**In more detail:**
* `HAVE_SSE42`, `HAVE_AVX2`, and `HAVE_PCLMUL` replaced by standard macros `__SSE4_2__`, `__AVX2__`, and `__PCLMUL__`.
* MSVC does not provide high fidelity defines for SSE, PCLMUL, or POPCNT, but we can infer those from `__AVX__` or `__AVX2__` in a compatibility header. In rare cases of false negative or false positive feature detection, a build engineer should be able to set defines to work around the issue.
* `__POPCNT__` is another standard define, but we happen to only need it on MSVC, where it is set by that compatibility header, or can be set by the build engineer.
* `PORTABLE` can be set to a CPU type, e.g. "haswell", to compile for that CPU type.
* `USE_SSE` is deprecated, now equivalent to PORTABLE=haswell, which roughly approximates its old behavior.

Notably, this change should enable more builds to use the AVX2-optimized Bloom filter implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11419

Test Plan:
existing tests, CI

Manual performance tests after the change match the before above (none expected with make build).

We also see AVX2 optimized Bloom filter code enabled when expected, by injecting a compiler error. (Performance difference is not big on my current CPU.)

Reviewed By: ajkr

Differential Revision: D45489041

Pulled By: pdillinger

fbshipit-source-id: 60ceb0dd2aa3b365c99ed08a8b2a087a9abb6a70
2023-05-09 22:25:45 -07:00
Yaroslav Stepanchuk 0a43061f8d Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015)
Summary:
ROCKSDB_SUPPORT_THREAD_LOCAL definition has been removed.
`__thread`(#define) has been replaced with `thread_local`(C++ keyword) across the code base.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10015

Reviewed By: siying

Differential Revision: D36485491

Pulled By: pdillinger

fbshipit-source-id: 6522d212514ee190b90b4e2750c80c7e34013c78
2022-05-18 15:25:19 -07:00
sdong 46f8889b6a platform010 gcc (#9946)
Summary:
Make platform010 gcc build work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9946

Test Plan:
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 make release -j
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 make all check -j

Reviewed By: pdillinger, mdcallag

Differential Revision: D36152684

fbshipit-source-id: ca7b0916c51501a72bb15ad33a85e8c5cac5b505
2022-05-05 11:45:51 -07:00
Peter Dillinger 41237dd306 Add "no compression" job to CircleCI (#9850)
Summary:
Since they operate at distinct abstraction layers, I thought it
was prudent to combine with EncryptedEnv CI test for each PR, for efficiency
in testing. Also added supported compressions to sst_dump --help output
so that CI job can verify no compiled-in compression support.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850

Test Plan: CI, some manual stuff

Reviewed By: riversand963

Differential Revision: D35682346

Pulled By: pdillinger

fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc
2022-04-18 12:47:16 -07:00
sdong d5dfa8c6fe Upgrade development environment. (#9843)
Summary:
It's to support Meta's internal environment platform010. Gcc still doesn't work but USE_CLANG=1 should work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9843

Test Plan: Try to make and ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_CLANG=1 make

Reviewed By: pdillinger

Differential Revision: D35652507

fbshipit-source-id: a4a14b2fa4a2d6ca6fbf1b65060e81c39f079363
2022-04-15 16:05:38 -07:00