Commit Graph

323 Commits

Author SHA1 Message Date
Victor Costan a505476792 Test: fix build error. 2022-09-29 09:56:09 -07:00
Matt Callanan 9758c9dfd7 Add `snappy::CompressFromIOVec`.
This reads from an `iovec` array rather than from a `char` array as in `snappy::Compress`.

PiperOrigin-RevId: 476930623
2022-09-29 09:32:28 -07:00
Victor Costan af720f9a3b Merge pull request #148 from pitrou:ubsan-ptr-add-overflow
PiperOrigin-RevId: 463090354
2022-07-27 15:28:16 +00:00
Marcin Kowalczyk 44caf79086 Move the comment about non-overlap requirement from the implementation to the
contract of `MemCopy64()`, and clarify that it applies to `size`, not to 64.

PiperOrigin-RevId: 453920284
2022-07-27 15:28:08 +00:00
Snappy Team d261d2766f Optimize zippy MemCpy / MemMove during decompression
By default MemCpy() / MemMove() always copies 64 bytes in DecompressBranchless().  Profiling shows that the vast majority of the time we need to copy many fewer bytes (typically <= 16 bytes).  It is safe to copy fewer bytes as long as we exceed len.

This change improves throughput by ~12% on ARM, ~35% on AMD Milan, and ~7% on Intel Cascade Lake.

PiperOrigin-RevId: 453917840
2022-07-27 15:27:58 +00:00
Snappy Team 6a2b78a379 Optimize Zippy compression for ARM by 5-10% by choosing csel instructions
PiperOrigin-RevId: 444863689
2022-05-09 16:19:11 +00:00
Snappy Team 8dd58a519f Fix compilation for older GCC and Clang versions.
Not everything defining __GNUC__ supports flag outputs
from asm statements; in particular, some Clang versions
on macOS does not. The correct test per the GCC documentation
is __GCC_ASM_FLAG_OUTPUTS__, so use that instead.

PiperOrigin-RevId: 423749308
2022-02-20 18:19:45 +00:00
Victor Costan 8b07ff196a Update contributing guidelines.
* Align CONTRIBUTING.md with the google/new-project template.
* Explain the support story for the CMake config.

PiperOrigin-RevId: 421311695
2022-01-12 17:25:50 +00:00
Antoine Pitrou 64df9f28c8 Fix UBSan error (ptr + offset overflow)
As `i + offset` is promoted to a "negative" size_t,
UBSan would complain when adding the resulting offset to `dst`:
```
/tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43: runtime error: addition of unsigned offset to 0x6120003c5ec1 overflowed to 0x6120003c5ec0
    #0 0x7f9ebd21769c in snappy::(anonymous namespace)::Copy64BytesWithPatternExtension(char*, unsigned long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43
    #1 0x7f9ebd21769c in std::__1::pair<unsigned char const*, long> snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, long, char*, long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:1160:15
```
2021-11-30 19:46:18 +01:00
Snappy Team 65dc7b3839 Pass by reference the first argument of ExtractLowBytes
to avoid UB of passing uninitialized argument by value.

PiperOrigin-RevId: 406052814
2021-11-14 22:09:42 +00:00
Victor Costan fe18b46322 Switch CI to GitHub Actions.
PiperOrigin-RevId: 394247182
2021-09-01 16:57:31 +00:00
Victor Costan a7ddc144d1 Merge pull request #140 from JunHe77:adv
PiperOrigin-RevId: 394061345
2021-08-31 19:47:38 +00:00
Jun He aeb5de55a9 decompress: refine data depdency
The final ip advance value doesn't have to wait for
the result of offset to load *tag. It can be computed
along with the offset, so the codegen will use one
csinc in parallel with ldrb. This will improve the
throughput.
With this change it is observed ~4.2% uplift in UFlat/10
and ~3.7% in UFlatMedley

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I20ab211235bbf578c6c978f2bbd9160a49e920da
2021-08-30 09:51:37 +08:00
Victor Costan 7062d7f1d8 Merge pull request #133 from JunHe77:simd
PiperOrigin-RevId: 393681630
2021-08-30 01:36:24 +00:00
Victor Costan cbb83a1d64 Migrate feature detection macro checks from #ifdef to #if.
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.

The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).

PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
Victor Costan a8400f1fab Add baseline CPU level to Travis CI.
PiperOrigin-RevId: 391082698
2021-08-16 17:42:27 +00:00
Victor Costan b9c9a989b2 Merge pull request #135 from JunHe77:remove_extra
PiperOrigin-RevId: 390767998
2021-08-14 08:15:44 +00:00
Victor Costan 5c87bc61b6 Merge pull request #136 from JunHe77:ext_arm
PiperOrigin-RevId: 390715690
2021-08-13 23:24:49 +00:00
Jun He 734b32bfe3 Add config and header file for NEON support
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3fade568ff92b4303387705f843d0051d5e88349
2021-08-12 15:45:46 +08:00
Jun He ab9a57280d Fix SSE3 and BMI2 compile error
After SHUFFLE code blocks are refactored, "tmmintrin.h"
is missed, and bmi2 code part will have build failure
as type conflicts.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
2021-08-12 15:45:41 +08:00
Jun He d643b9a988 decompress: add hint to remove extra AND
Clang doesn't realize the load with free zero-extension,
and emits another extra 'and xn, xm, 0xff' to calc offset.
With this change ,this extra op is removed, and consistent
1.7% performance uplift is observed.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ica4617852c4b93eadc6c5c551dc3961ffbadb8f0
2021-08-12 15:19:53 +08:00
Jun He f52721b2b4 decompression: optimize ExtractOffset for Arm
Inspired by kExtractMasksCombined, this patch uses shift
to replace table lookup. On Arm the codegen is 2 shift ops
(lsl+lsr). Comparing to previous ldr which requires 4 cycles
latency, the lsl+lsr only need 2 cycles.
Slight (~0.3%) uplift observed on N1, and ~3% on A72.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I5b53632d22d9e5cf1a49d0c5cdd16265a15de23b
2021-08-06 15:44:27 +08:00
Snappy Team f2db8f77ce Move the extract masks variable out in zippy. I see a consistent 1.5-2% improvement for ARM. Probably because ARM has more relaxed address computation than x86 https://www.godbolt.org/z/bfM1ezx41. I don't think this is a compiler bug or it can do something about it
PiperOrigin-RevId: 387569896
2021-08-02 14:50:16 +00:00
Snappy Team c8f7641646 Remove inline assembly as the bug in clang was fixed
PiperOrigin-RevId: 387356237
2021-08-02 14:50:09 +00:00
Snappy Team 9cc3689b21 Optimize memset to pure SIMD because compilers generate consistently bad code. clang for ARM and gcc for x86 https://gcc.godbolt.org/z/oxeGG7aEx
PiperOrigin-RevId: 383467656
2021-08-02 14:49:57 +00:00
Snappy Team b4888f7616 Optimize tag extraction for ARM with conditional increment instruction generation (csinc). For codegen see https://gcc.godbolt.org/z/a8z9j95Pv
PiperOrigin-RevId: 382688740
2021-07-05 01:05:54 +00:00
atdt b3fb0b5b4b Enable vector byte shuffle optimizations on ARM NEON
The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.

PiperOrigin-RevId: 381280165
2021-07-05 01:05:44 +00:00
Victor Costan b638ebe5d9 Update Travis CI config.
Xcode (drives macOS image) : 12.2 => 12.5
Clang                      : 10 => 12
GCC                        : 10 => 11
PiperOrigin-RevId: 375610083
2021-05-25 02:20:52 +00:00
Snappy Team d8f5dd8eca Clarify, in a comment, that offset/256 fits in 3 bits. It has to in this context, because the other 5 bits in the byte are used for len-4 and the tag.
PiperOrigin-RevId: 374926553
2021-05-25 02:20:42 +00:00
Victor Costan 2b63814b15 Tag open source release 1.1.9.
PiperOrigin-RevId: 372007801
2021-05-04 22:53:34 +00:00
atdt 9c1be17938 'size' remains unused if none of ZLIB, LZO and LZ4 are available.
While we're here, take care of a couple of lint warnings by converting CHECK(a != b) to CHECK_NE(a, b).

PiperOrigin-RevId: 369132446
2021-04-22 04:27:48 +00:00
Chris Mumford 78650d126a Add project goals to CONTRIBUTING.md.
PiperOrigin-RevId: 362386747
2021-03-12 06:41:07 +00:00
Victor Costan 5e7c14bd05 Add stubs for abseil flags.
This CL also removes support for using the gflags library to modify the
flags.

PiperOrigin-RevId: 361583626
2021-03-08 17:26:48 +00:00
Victor Costan 80a2a10c8c Remove unused run_microbenchmarks flag.
PiperOrigin-RevId: 361582956
2021-03-08 17:26:39 +00:00
Snappy Team 453942b38f Add absl::GetFlag and absl::SetFlag to uses of flags.
PiperOrigin-RevId: 357807059
2021-02-17 04:41:41 +00:00
Victor Costan ea368c2f07 Add AppVeyor status badge.
PiperOrigin-RevId: 347861379
2020-12-16 19:28:23 +00:00
Victor Costan d1d1f48604 Remove unused include in snappy_benchmark.cc.
PiperOrigin-RevId: 347861229
2020-12-16 19:28:12 +00:00
Victor Costan 4ebd8b2f23 Split benchmarks and test tools into separate targets.
This lets us remove main() from snappy_bench.cc and snappy_unittest.cc,
which simplifies integrating these tests and benchmarks with other
suites.

PiperOrigin-RevId: 347857427
2020-12-16 19:09:56 +00:00
Victor Costan 0793e2ae2d Merge pull request #117 from cmumford:disable-osx-fuzzer
PiperOrigin-RevId: 347736844
2020-12-16 03:02:51 +00:00
Victor Costan ac55f842f7 Test stub improvements.
PiperOrigin-RevId: 347736380
2020-12-16 02:58:39 +00:00
Chris Mumford 6e9ae72423 Disable fuzzing on OSX.
LibFuzzer does not ship with the Mac OSX Command Line Tools.

```
ld: file not found: /Applications/Xcode-12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.fuzzer_osx.a

clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
2020-12-15 17:10:14 -08:00
Victor Costan 402d88812c
Fixup for adding the third_party/{benchmark, googletest} submodules. (#115) 2020-12-15 12:01:28 -08:00
Victor Costan 6badb0a261 Merge pull request #114 from cmumford:werror-only-clang
PiperOrigin-RevId: 347660305
2020-12-15 19:49:13 +00:00
Chris Mumford bc53daa7be Fixed endif clause. 2020-12-15 11:19:53 -08:00
Chris Mumford e9a6a08439 Matching clang. 2020-12-15 11:17:28 -08:00
Chris Mumford 955a5dd1b3 Building with `-Werror` only with clang.
gcc was unable to inline a function call, which caused a build
failure due to `-Wall -Werror`.

The build error was:

```
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
  292 | static inline std::pair<__m128i /* pattern */, __m128i /* reshuffle_mask */>
      |                                                                            ^
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
cc1plus: all warnings being treated as errors
```
2020-12-15 11:02:17 -08:00
Chris Mumford 42d1dd7ea3 Fix CHECK_EQ to call ok() instead of CheckSuccess().
CheckSuccess was removed in e1e91ee464.

PiperOrigin-RevId: 347625874
2020-12-15 09:16:39 -08:00
Victor Costan eaaa0ed0ca
Fixup for adding the third_party/{benchmark, googletest} submodules. (#111) 2020-12-15 08:49:01 -08:00
Victor Costan e1e91ee464 Rework file:: stubs.
PiperOrigin-RevId: 347541488
2020-12-15 06:21:47 +00:00
Victor Costan 6aa79cb471 Wrap snappy_unittest in an anonymous namespace and remove static from functions.
PiperOrigin-RevId: 347541028
2020-12-15 06:18:35 +00:00