mirror of
https://github.com/google/snappy.git
synced 2024-11-29 00:34:34 +00:00
decompress: add hint to remove extra AND
Clang doesn't realize that the 1-byte load comes with free zero-extension, and emits an extra 'and xn, xm, 0xff' to calculate the offset. With this change, the extra op is removed, and a consistent 1.7% performance uplift is observed.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ica4617852c4b93eadc6c5c551dc3961ffbadb8f0
parent f2db8f77ce
commit d643b9a988
@@ -1108,6 +1108,15 @@ std::pair<const uint8_t*, ptrdiff_t> DecompressBranchless(
     // ip points just past the tag and we are touching at maximum kSlopBytes
     // in an iteration.
     size_t tag = ip[-1];
+#if defined(__clang__) && defined(__aarch64__)
+    // Workaround for https://bugs.llvm.org/show_bug.cgi?id=51317
+    // when loading 1 byte, clang for aarch64 doesn't realize that it(ldrb)
+    // comes with free zero-extension, so clang generates another
+    // 'and xn, xm, 0xff' before it use that as the offset. This 'and' is
+    // redundant and can be removed by adding this dummy asm, which gives
+    // clang a hint that we're doing the zero-extension at the load.
+    asm("" ::"r"(tag));
+#endif
     do {
       // The throughput is limited by instructions, unrolling the inner loop
       // twice reduces the amount of instructions checking limits and also
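The hint pattern from this change can be tried in isolation. The following is a minimal sketch, not code from snappy: `SumByTag` and its `table` parameter are hypothetical names introduced only for illustration. It loads one byte into a `size_t`, passes it through an empty `asm` statement as a register input on clang/aarch64, and then uses it as a table index, which is the situation where the redundant `and` was reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of snappy): sums the entries of a
// 256-entry table selected by each byte of `data`.
size_t SumByTag(const uint16_t* table, const uint8_t* data, size_t n) {
  assert(table != nullptr);
  assert(data != nullptr || n == 0);
  size_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    // 1-byte load widened to size_t; on AArch64 this is an ldrb, which
    // already zero-extends into the full register.
    size_t tag = data[i];
#if defined(__clang__) && defined(__aarch64__)
    // Empty asm that takes `tag` as a register input ("r"). It emits no
    // instructions; per the commit, it serves as a hint so that clang
    // does not add an extra 'and xn, xm, 0xff' before indexing.
    asm("" ::"r"(tag));
#endif
    total += table[tag];  // `tag` is used as the offset into the table
  }
  return total;
}
```

On other compilers or targets the guarded block compiles away, so the function behaves identically everywhere; whether the extra `and` actually disappears depends on the clang version and should be checked in the generated assembly.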