368 Commits

Author SHA1 Message Date
Snappy Team 8dd58a519f Fix compilation for older GCC and Clang versions.
Not everything defining __GNUC__ supports flag outputs
from asm statements; in particular, some Clang versions
on macOS do not. The correct test per the GCC documentation
is __GCC_ASM_FLAG_OUTPUTS__, so use that instead.

PiperOrigin-RevId: 423749308
2022-02-20 18:19:45 +00:00
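As a rough illustration of the feature test described in the commit above, the sketch below guards an x86-64 flag-output asm statement with `__GCC_ASM_FLAG_OUTPUTS__` and falls back to plain C++ otherwise. The helper name `IsBelow` is hypothetical and is not taken from the snappy sources; the asm body assumes the default AT&T syntax.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if a < b (unsigned comparison).
inline bool IsBelow(uint64_t a, uint64_t b) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
  bool below;
  // "=@ccb" asks the compiler to read the "below" (carry) condition
  // directly from the flags register after the cmp instruction.
  asm("cmp %2, %1" : "=@ccb"(below) : "r"(a), "r"(b));
  return below;
#else
  // Portable fallback for compilers without flag outputs.
  return a < b;
#endif
}
```

Testing `__GNUC__` alone would select the asm path on any compiler that defines it, including ones that cannot compile the `=@cc` constraint.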
David Green 6c6e890ef9 Change LittleEndian loads/stores to use memcpy
The existing code uses a series of 8bit loads with shifts and ors to
emulate an (unaligned) load of a larger type. These are then expected to
become single loads in the compiler, producing optimal assembly. Whilst
this is true it happens very late in the compiler, meaning that
throughout most of the pipeline it is treated (and cost-modelled) as
multiple loads, shifts and ors. This can lead the compiler to make poor
decisions (such as not unrolling loops that should be unrolled), or to break up
the pattern before it is turned into a single load.

For example the loops in CompressFragment do not get unrolled as
expected due to a higher cost than the unroll threshold in clang.

Instead, this patch uses a more conventional method of loading unaligned
data: a direct memcpy, which the compiler can handle much more
straightforwardly by modelling it as a single unaligned load. The old code
is left as-is for big-endian systems.

This helps improve the performance of the BM_ZFlat benchmarks by up to
10-15% on an Arm Neoverse N1.

Change-Id: I986f845ebd0a0806d052d2be3e4dbcbee91713d7
2022-01-19 07:14:46 +00:00
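The two load styles contrasted in the commit above can be sketched as follows. This is a minimal illustration, not the actual snappy code: the function names are hypothetical, and the endianness probe is a runtime stand-in for whatever compile-time check a real build would use.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-by-byte load: always yields the little-endian interpretation,
// but the compiler sees four loads, shifts and ors until late in the pipeline.
inline uint32_t Load32Shifts(const void* ptr) {
  const uint8_t* p = static_cast<const uint8_t*>(ptr);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// memcpy-based load: modelled as a single unaligned load. The result is in
// native byte order, so it matches Load32Shifts only on little-endian hosts.
inline uint32_t Load32Memcpy(const void* ptr) {
  uint32_t value;
  std::memcpy(&value, ptr, sizeof(value));
  return value;
}

// Hypothetical runtime endianness probe, used here only to keep the
// example portable.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Little-endian load that uses memcpy where that is equivalent.
inline uint32_t Load32(const void* ptr) {
  return HostIsLittleEndian() ? Load32Memcpy(ptr) : Load32Shifts(ptr);
}
```

Both paths accept unaligned pointers; memcpy with a constant size is the standard way to express that without undefined behaviour.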
Victor Costan 8b07ff196a Update contributing guidelines.
* Align CONTRIBUTING.md with the google/new-project template.
* Explain the support story for the CMake config.

PiperOrigin-RevId: 421311695
2022-01-12 17:25:50 +00:00
Antoine Pitrou 64df9f28c8 Fix UBSan error (ptr + offset overflow)
As `i + offset` is promoted to a "negative" size_t,
UBSan would complain when adding the resulting offset to `dst`:
```
/tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43: runtime error: addition of unsigned offset to 0x6120003c5ec1 overflowed to 0x6120003c5ec0
    #0 0x7f9ebd21769c in snappy::(anonymous namespace)::Copy64BytesWithPatternExtension(char*, unsigned long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43
    #1 0x7f9ebd21769c in std::__1::pair<unsigned char const*, long> snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, long, char*, long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:1160:15
```
2021-11-30 19:46:18 +01:00
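The class of problem reported above can be sketched as follows, assuming an unsigned index combined with a negative signed offset. The helper `ByteAt` and its parameters are hypothetical and do not reproduce the code at snappy.cc:343.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads the byte at dst[i + offset], where offset
// may be negative.
inline char ByteAt(const char* dst, size_t i, ptrdiff_t offset) {
  // Risky form (not executed here): dst + (i + offset).
  // The signed offset is converted to size_t, so a negative sum becomes a
  // huge unsigned value and the pointer addition wraps around, which
  // UBSan's pointer-overflow check reports.
  //
  // Safer form: keep the arithmetic signed before adding it to the pointer.
  const ptrdiff_t delta = static_cast<ptrdiff_t>(i) + offset;
  return *(dst + delta);
}
```

The signed form produces the same address on common platforms, but without relying on unsigned wraparound in pointer arithmetic.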
Snappy Team 65dc7b3839 Pass by reference the first argument of ExtractLowBytes
to avoid UB of passing uninitialized argument by value.

PiperOrigin-RevId: 406052814
2021-11-14 22:09:42 +00:00
Victor Costan fe18b46322 Switch CI to GitHub Actions.
PiperOrigin-RevId: 394247182
2021-09-01 16:57:31 +00:00
Victor Costan a7ddc144d1 Merge pull request #140 from JunHe77:adv
PiperOrigin-RevId: 394061345
2021-08-31 19:47:38 +00:00
Jun He aeb5de55a9 decompress: refine data dependency
The final ip advance value doesn't have to wait for
the result of offset to load *tag. It can be computed
along with the offset, so the codegen will use one
csinc in parallel with ldrb. This will improve the
throughput.
With this change, an uplift of ~4.2% is observed in UFlat/10
and ~3.7% in UFlatMedley.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I20ab211235bbf578c6c978f2bbd9160a49e920da
2021-08-30 09:51:37 +08:00
Victor Costan 7062d7f1d8 Merge pull request #133 from JunHe77:simd
PiperOrigin-RevId: 393681630
2021-08-30 01:36:24 +00:00
Victor Costan cbb83a1d64 Migrate feature detection macro checks from #ifdef to #if.
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.

The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).

PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
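The difference between the two preprocessor checks described above can be seen in a small sketch. The macro names `EXAMPLE_HAVE_FOO` and `EXAMPLE_HAVE_BAR` are made up for this example and are not snappy's actual feature macros.

```cpp
#include <cassert>

// Hypothetical feature macros, named for this example only.
#define EXAMPLE_HAVE_FOO 0  // explicitly disabled
// EXAMPLE_HAVE_BAR is intentionally left undefined.

#ifdef EXAMPLE_HAVE_FOO
constexpr bool kFooViaIfdef = true;   // #ifdef only asks "is it defined?"
#else
constexpr bool kFooViaIfdef = false;
#endif

#if EXAMPLE_HAVE_FOO
constexpr bool kFooViaIf = true;
#else
constexpr bool kFooViaIf = false;     // #if sees the value 0
#endif

#if EXAMPLE_HAVE_BAR
constexpr bool kBarViaIf = true;
#else
constexpr bool kBarViaIf = false;     // undefined identifiers evaluate to 0
#endif

// "Unknown" can still be told apart from "definitely off" when needed.
#if !defined(EXAMPLE_HAVE_BAR)
constexpr bool kBarUnknown = true;
#else
constexpr bool kBarUnknown = false;
#endif
```

With `#ifdef`, defining a macro to 0 would silently enable the guarded code; with `#if`, a build system can pass `-DEXAMPLE_HAVE_FOO=0` to switch a capability off.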
Victor Costan a8400f1fab Add baseline CPU level to Travis CI.
PiperOrigin-RevId: 391082698
2021-08-16 17:42:27 +00:00
Victor Costan b9c9a989b2 Merge pull request #135 from JunHe77:remove_extra
PiperOrigin-RevId: 390767998
2021-08-14 08:15:44 +00:00
Victor Costan 5c87bc61b6 Merge pull request #136 from JunHe77:ext_arm
PiperOrigin-RevId: 390715690
2021-08-13 23:24:49 +00:00
Jun He 734b32bfe3 Add config and header file for NEON support
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3fade568ff92b4303387705f843d0051d5e88349
2021-08-12 15:45:46 +08:00
Jun He ab9a57280d Fix SSE3 and BMI2 compile error
After the SHUFFLE code blocks were refactored, the "tmmintrin.h"
include went missing, and the BMI2 code path fails to build
because of type conflicts.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
2021-08-12 15:45:41 +08:00
Jun He d643b9a988 decompress: add hint to remove extra AND
Clang doesn't realize that the load comes with a free zero-extension,
and emits an extra 'and xn, xm, 0xff' to calculate the offset.
With this change, the extra op is removed, and a consistent
1.7% performance uplift is observed.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ica4617852c4b93eadc6c5c551dc3961ffbadb8f0
2021-08-12 15:19:53 +08:00
Jun He f52721b2b4 decompression: optimize ExtractOffset for Arm
Inspired by kExtractMasksCombined, this patch uses shifts
to replace the table lookup. On Arm the codegen is 2 shift ops
(lsl+lsr). Compared with the previous ldr, which has a 4-cycle
latency, the lsl+lsr pair needs only 2 cycles.
A slight (~0.3%) uplift is observed on N1, and ~3% on A72.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I5b53632d22d9e5cf1a49d0c5cdd16265a15de23b
2021-08-06 15:44:27 +08:00
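One way to replace a mask-table lookup with shifts, in the spirit of the commit above, is to pack the per-tag masks into a single 64-bit constant. The sketch below is an assumption about the general technique; the function names and the exact constant are illustrative and may differ from the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Table-based variant: the mask is fetched with a load from memory.
inline uint32_t ExtractOffsetTable(uint32_t val, size_t tag_type) {
  static constexpr uint32_t kMasks[4] = {0, 0xFF, 0xFFFF, 0};
  return val & kMasks[tag_type];
}

// Shift-based variant: the four 16-bit masks live in one constant,
// and the right one is selected by shifting (tag_type * 16 bits).
inline uint32_t ExtractOffsetShift(uint32_t val, size_t tag_type) {
  constexpr uint64_t kMasksCombined = 0x0000FFFF00FF0000ull;
  const uint32_t mask =
      static_cast<uint32_t>((kMasksCombined >> (tag_type * 16)) & 0xFFFF);
  return val & mask;
}
```

Both variants keep 0, 1 or 2 low bytes of `val` depending on `tag_type` (0 through 3); the shift-based one avoids the memory access.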
Snappy Team f2db8f77ce Move the extract masks variable out in zippy. I see a consistent 1.5-2% improvement for ARM, probably because ARM has more relaxed address computation than x86: https://www.godbolt.org/z/bfM1ezx41. I don't think this is a compiler bug, or that the compiler can do anything about it.
PiperOrigin-RevId: 387569896
2021-08-02 14:50:16 +00:00
Snappy Team c8f7641646 Remove inline assembly as the bug in clang was fixed
PiperOrigin-RevId: 387356237
2021-08-02 14:50:09 +00:00
Snappy Team 9cc3689b21 Optimize memset to pure SIMD because compilers (clang for ARM and gcc for x86) generate consistently bad code: https://gcc.godbolt.org/z/oxeGG7aEx
PiperOrigin-RevId: 383467656
2021-08-02 14:49:57 +00:00
Snappy Team b4888f7616 Optimize tag extraction for ARM with conditional increment instruction generation (csinc). For codegen see https://gcc.godbolt.org/z/a8z9j95Pv
PiperOrigin-RevId: 382688740
2021-07-05 01:05:54 +00:00
atdt b3fb0b5b4b Enable vector byte shuffle optimizations on ARM NEON
The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.

PiperOrigin-RevId: 381280165
2021-07-05 01:05:44 +00:00
Victor Costan b638ebe5d9 Update Travis CI config.
Xcode (drives macOS image) : 12.2 => 12.5
Clang                      : 10 => 12
GCC                        : 10 => 11
PiperOrigin-RevId: 375610083
2021-05-25 02:20:52 +00:00
Snappy Team d8f5dd8eca Clarify, in a comment, that offset/256 fits in 3 bits. It has to in this context, because the other 5 bits in the byte are used for len-4 and the tag.
PiperOrigin-RevId: 374926553
2021-05-25 02:20:42 +00:00
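The bit budget mentioned in the commit above follows from the layout of a copy element with a 1-byte offset in the Snappy format: 2 bits of tag type, 3 bits of len-4, and the top 3 bits of an 11-bit offset. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Tag byte for a copy with a 1-byte offset:
//   bits 0-1: tag type (01)
//   bits 2-4: len - 4      (so 4 <= len <= 11)
//   bits 5-7: offset / 256 (so offset < 2048)
// The low 8 bits of the offset go in the byte that follows the tag.
inline uint8_t EncodeCopy1Tag(size_t offset, size_t len) {
  constexpr uint8_t kCopy1ByteOffset = 1;
  return static_cast<uint8_t>(kCopy1ByteOffset | ((len - 4) << 2) |
                              ((offset >> 8) << 5));
}

// Recover the two fields stored in the tag byte.
inline size_t DecodeOffsetHigh(uint8_t tag) { return tag >> 5; }
inline size_t DecodeLen(uint8_t tag) { return ((tag >> 2) & 0x7) + 4; }
```

Since the offset is below 2048, offset/256 is at most 7, which is exactly what three bits can hold.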
Victor Costan 2b63814b15 Tag open source release 1.1.9.
PiperOrigin-RevId: 372007801
2021-05-04 22:53:34 +00:00
atdt 9c1be17938 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
While we're here, take care of a couple of lint warnings by converting CHECK(a != b) to CHECK_NE(a, b).

PiperOrigin-RevId: 369132446
2021-04-22 04:27:48 +00:00
Chris Mumford 78650d126a Add project goals to CONTRIBUTING.md.
PiperOrigin-RevId: 362386747
2021-03-12 06:41:07 +00:00
Victor Costan 5e7c14bd05 Add stubs for abseil flags.
This CL also removes support for using the gflags library to modify the
flags.

PiperOrigin-RevId: 361583626
2021-03-08 17:26:48 +00:00
Victor Costan 80a2a10c8c Remove unused run_microbenchmarks flag.
PiperOrigin-RevId: 361582956
2021-03-08 17:26:39 +00:00
Snappy Team 453942b38f Add absl::GetFlag and absl::SetFlag to uses of flags.
PiperOrigin-RevId: 357807059
2021-02-17 04:41:41 +00:00
Victor Costan ea368c2f07 Add AppVeyor status badge.
PiperOrigin-RevId: 347861379
2020-12-16 19:28:23 +00:00
Victor Costan d1d1f48604 Remove unused include in snappy_benchmark.cc.
PiperOrigin-RevId: 347861229
2020-12-16 19:28:12 +00:00
Victor Costan 4ebd8b2f23 Split benchmarks and test tools into separate targets.
This lets us remove main() from snappy_bench.cc and snappy_unittest.cc,
which simplifies integrating these tests and benchmarks with other
suites.

PiperOrigin-RevId: 347857427
2020-12-16 19:09:56 +00:00
Victor Costan 0793e2ae2d Merge pull request #117 from cmumford:disable-osx-fuzzer
PiperOrigin-RevId: 347736844
2020-12-16 03:02:51 +00:00
Victor Costan ac55f842f7 Test stub improvements.
PiperOrigin-RevId: 347736380
2020-12-16 02:58:39 +00:00
Chris Mumford 6e9ae72423 Disable fuzzing on OSX.
LibFuzzer does not ship with the Mac OSX Command Line Tools.

```
ld: file not found: /Applications/Xcode-12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.fuzzer_osx.a

clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
2020-12-15 17:10:14 -08:00
Victor Costan 402d88812c Fixup for adding the third_party/{benchmark, googletest} submodules. (#115)
2020-12-15 12:01:28 -08:00
Victor Costan 6badb0a261 Merge pull request #114 from cmumford:werror-only-clang
PiperOrigin-RevId: 347660305
2020-12-15 19:49:13 +00:00
Chris Mumford bc53daa7be Fixed endif clause.
2020-12-15 11:19:53 -08:00
Chris Mumford e9a6a08439 Matching clang.
2020-12-15 11:17:28 -08:00
Chris Mumford 955a5dd1b3 Building with `-Werror` only with clang.
gcc was unable to inline a function call, which caused a build
failure due to `-Wall -Werror`.

The build error was:

```
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
  292 | static inline std::pair<__m128i /* pattern */, __m128i /* reshuffle_mask */>
      |                                                                            ^
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
cc1plus: all warnings being treated as errors
```
2020-12-15 11:02:17 -08:00
Chris Mumford 42d1dd7ea3 Fix CHECK_EQ to call ok() instead of CheckSuccess().
CheckSuccess was removed in e1e91ee464.

PiperOrigin-RevId: 347625874
2020-12-15 09:16:39 -08:00
Victor Costan eaaa0ed0ca Fixup for adding the third_party/{benchmark, googletest} submodules. (#111)
2020-12-15 08:49:01 -08:00
Victor Costan e1e91ee464 Rework file:: stubs.
PiperOrigin-RevId: 347541488
2020-12-15 06:21:47 +00:00
Victor Costan 6aa79cb471 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
PiperOrigin-RevId: 347541028
2020-12-15 06:18:35 +00:00
Victor Costan bae9f9bef8 Fixup for adding the third_party/{benchmark, googletest} submodules. (#110)
2020-12-14 20:27:33 -08:00
Victor Costan 5f913be04e Fix unused local variable warnings.
This will not change the compilation output.

PiperOrigin-RevId: 347525836
2020-12-15 04:14:46 +00:00
Victor Costan 549685a598 Remove custom testing and benchmarking code.
Snappy includes a testing framework, which implements a subset of the
Google Test API, and can be used when Google Test is not available.
Snappy also includes a micro-benchmark framework, which implements an
old version of the Google Benchmark API.

This CL replaces the custom test and micro-benchmark frameworks with
google/googletest and google/benchmark. The code is vendored in
third_party/ via git submodules. The setup is similar to google/crc32c
and google/leveldb.

This CL also updates the benchmarking code to the modern Google
Benchmark API.

Benchmark results are expected to be more precise, as the old framework
ran each benchmark with a fixed number of iterations, whereas Google
Benchmark keeps iterating until the noise is low.

PiperOrigin-RevId: 347456142
2020-12-14 21:27:31 +00:00
Chris Mumford 11f9a77a2f Add Travis-CI build status badge to README.md.
PiperOrigin-RevId: 347402877
2020-12-14 09:40:22 -08:00
Victor Costan 49540965a3 Update Travis CI config.
PiperOrigin-RevId: 347397797
2020-12-14 09:11:46 -08:00