Commit Graph

353 Commits

Author SHA1 Message Date
Danila Kutenin ab38064abe Fix compilation in the benchmark 2024-04-04 18:44:31 +00:00
Danila Kutenin 4e693db158 Use C++11 style instead of C++20 2024-04-04 18:42:29 +00:00
Danila Kutenin a60fd602ce Fix sync 2024-04-04 18:36:37 +00:00
Snappy Team 766d24c95e Zippy level 2 for denser compression and faster decompression
We also increased the hashtable size by 1 bit, as the smaller table significantly degraded the ratio. As a result, even level 1 might improve slightly.

PiperOrigin-RevId: 621456036
2024-04-04 18:27:00 +00:00
Snappy Team 4f5cf9a8d6 Internal changes
PiperOrigin-RevId: 599838882
2024-04-04 18:26:53 +00:00
Snappy Team 8bf2640823 Internal changes
PiperOrigin-RevId: 599151099
2024-04-04 18:26:42 +00:00
Snappy Team f0b0c9b8ce Internal changes
PiperOrigin-RevId: 597991348
2024-04-04 18:24:48 +00:00
Snappy Team 54d07d53a2 Restructure compression sampling for comparative analysis
PiperOrigin-RevId: 597989810
2024-04-04 18:21:10 +00:00
Richard O'Grady 27f34a580b Fix -Wsign-compare warning
PiperOrigin-RevId: 547529709
2023-07-12 11:12:48 -07:00
Richard O'Grady c9f9edf6d7 Fixes for Windows bazel build.
Don't pass -Wno-sign-compare on Windows.
Add a #define HAVE_WINDOWS_H if _WIN32 is defined.
Don't assume sys/uio.h is available on Windows.

PiperOrigin-RevId: 524416809
2023-04-14 18:02:20 -07:00
Richard O'Grady 66a30b803f Add initial bazel build support for snappy.
PiperOrigin-RevId: 524135175
2023-04-13 17:10:32 -07:00
Richard O'Grady f725f6766b Upgrade googletest to v1.13.0 release. 2023-04-13 10:31:13 -07:00
Richard O'Grady 8325392950 Disable Wimplicit-int-float-conversion warning in googletest
PiperOrigin-RevId: 524031046
2023-04-13 10:04:53 -07:00
Richard O'Grady 108139d275 Upgrade benchmark library to v1.7.1 release. 2023-04-11 13:16:42 -07:00
Richard O'Grady 00aa9ac61d Disable -Wsign-compare warning.
PiperOrigin-RevId: 523460180
2023-04-11 11:55:49 -07:00
Richard O'Grady cfc573e08f Define missing SNAPPY_PREFETCH macros.
PiperOrigin-RevId: 523287305
2023-04-11 10:38:23 -07:00
Ilya Tokar 92f18e66fd Add prefetch to zippy compress
PiperOrigin-RevId: 518358512
2023-03-29 17:31:17 -07:00
Snappy Team f603a02008 Explicitly #include <utility> in snappy-internal.h
snappy-internal.h uses std::pair, which is defined in the <utility>
header. Typically, this works because existing C++ standard library
implementations provide <utility> via other transitive includes;
however, these transitive includes are not guaranteed to exist, and
don't exist in certain contexts (e.g. compiling against LLVM's libc++
with Clang modules.)

PiperOrigin-RevId: 517213822
2023-03-29 17:31:10 -07:00
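Note on f603a02008: a minimal sketch of the include-what-you-use point made in the message above. `SplitLength` is a hypothetical helper invented for illustration and is not snappy code. The only point is that a file naming `std::pair` includes `<utility>` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>  // std::pair is defined here; do not rely on transitive includes.

// Hypothetical helper (not part of snappy): splits `len` into the number of
// whole 64-byte blocks and the number of remaining tail bytes.
inline std::pair<std::size_t, std::size_t> SplitLength(std::size_t len) {
  return std::make_pair(len / 64, len % 64);
}
```

For example, `SplitLength(130)` yields 2 whole blocks and 2 tail bytes.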
Snappy Team 9c42b71b19 Optimize check for uncommon decompression for ARM, saving two instructions and three cycles.
PiperOrigin-RevId: 517141646
2023-03-29 17:30:58 -07:00
Victor Costan dc05e02648 Tag open source release 1.1.10.
PiperOrigin-RevId: 515161676
2023-03-08 15:44:00 -08:00
Snappy Team 7b82423c59 The output buffer in DecompressBranchless is never read from and the source buffers are never written. This allows us to defer any writes to the output buffer for an arbitrary amount of time, as long as the writes all occur in the proper order. When a MemCopy64 would normally have occurred, we save away the source address and length. Once we reach the location of the next write to the output buffer, we first perform the deferred copy. This gives the source address and length calculations time to finish before the deferred copy.
This change gives 1.84% on CLX and 0.97% on Milan.

PiperOrigin-RevId: 504012310
2023-03-07 06:35:00 -08:00
Victor Costan 30326e5b8c Merge pull request #150 from davemgreen:betterunalignedloads
PiperOrigin-RevId: 501489679
2023-01-12 13:33:26 +00:00
Snappy Team 74960e8bd6 Allow some buffer overwrite on literal emitting
Calls to memcpy seem to be quite expensive

```
BM_ZFlat/0                  [html (22.24 %)   ]     114µs ± 6%   110µs ± 6%  -3.97%  (p=0.000 n=118+115)
BM_ZFlat/1                  [urls (47.84 %)   ]    1.63ms ± 5%  1.58ms ± 5%  -3.39%  (p=0.000 n=117+115)
BM_ZFlat/2                  [jpg (99.95 %)    ]    7.84µs ± 6%  7.70µs ± 6%  -1.66%  (p=0.000 n=119+117)
BM_ZFlat/3                  [jpg_200 (73.00 %)]     265ns ± 6%   255ns ± 6%  -3.48%   (p=0.000 n=101+98)
BM_ZFlat/4                  [pdf (83.31 %)    ]    11.8µs ± 6%  11.6µs ± 6%  -2.14%  (p=0.000 n=118+116)
BM_ZFlat/5                  [html4 (22.52 %)  ]     525µs ± 6%   513µs ± 6%  -2.36%  (p=0.000 n=117+116)
BM_ZFlat/6                  [txt1 (57.87 %)   ]     494µs ± 5%   480µs ± 6%  -2.84%  (p=0.000 n=118+116)
BM_ZFlat/7                  [txt2 (62.02 %)   ]     444µs ± 4%   428µs ± 7%  -3.51%  (p=0.000 n=119+117)
BM_ZFlat/8                  [txt3 (55.17 %)   ]    1.34ms ± 5%  1.30ms ± 5%  -2.40%  (p=0.000 n=120+116)
BM_ZFlat/9                  [txt4 (66.41 %)   ]    1.84ms ± 5%  1.78ms ± 5%  -3.55%  (p=0.000 n=110+111)
BM_ZFlat/10                 [pb (19.61 %)     ]     101µs ± 5%    97µs ± 5%  -4.67%  (p=0.000 n=118+118)
BM_ZFlat/11                 [gaviota (37.73 %)]     368µs ± 5%   360µs ± 6%  -2.13%    (p=0.000 n=91+90)
BM_ZFlat/12                 [cp (48.25 %)     ]    38.9µs ± 6%  36.8µs ± 6%  -5.36%    (p=0.000 n=88+87)
BM_ZFlat/13                 [c (42.52 %)      ]    13.4µs ± 6%  13.1µs ± 8%  -2.38%  (p=0.000 n=115+116)
BM_ZFlat/14                 [lsp (48.94 %)    ]    4.05µs ± 4%  3.94µs ± 4%  -2.58%    (p=0.000 n=91+85)
BM_ZFlat/15                 [xls (41.10 %)    ]    1.42ms ± 5%  1.39ms ± 7%  -2.49%  (p=0.000 n=116+117)
BM_ZFlat/16                 [xls_200 (78.00 %)]     313ns ± 6%   307ns ± 5%  -1.89%    (p=0.000 n=89+84)
BM_ZFlat/17                 [bin (18.12 %)    ]     518µs ± 5%   506µs ± 5%  -2.42%  (p=0.000 n=118+116)
BM_ZFlat/18                 [bin_200 (7.50 %) ]    86.8ns ± 6%  85.3ns ± 6%  -1.76%  (p=0.000 n=118+114)
BM_ZFlat/19                 [sum (48.99 %)    ]    67.9µs ± 4%  61.1µs ± 6%  -9.96%  (p=0.000 n=114+117)
BM_ZFlat/20                 [man (59.45 %)    ]    5.64µs ± 6%  5.47µs ± 7%  -3.06%  (p=0.000 n=117+115)
BM_ZFlatAll                 [21 kTestDataFiles]    9.23ms ± 4%  9.01ms ± 5%  -2.44%    (p=0.000 n=80+83)
BM_ZFlatIncreasingTableSize [7 tables         ]    30.4µs ± 5%  29.3µs ± 7%  -3.45%    (p=0.000 n=96+96)
```

PiperOrigin-RevId: 490184133
2023-01-12 13:33:17 +00:00
Ilya Tokar 37f375ddeb Add prefetch to zippy decompress
PiperOrigin-RevId: 489554313
2023-01-12 13:33:10 +00:00
Snappy Team 15e2a0e13d Add "cc" clobbers to inline asm that modifies flags.
As far as we know, the lack of "cc" in the clobbers hasn't caused
problems yet, but it could.  This change is to improve correctness,
and is also almost certainly performance neutral.

PiperOrigin-RevId: 487133620
2023-01-12 13:33:01 +00:00
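Note on 15e2a0e13d: a minimal sketch of what listing "cc" in the clobbers looks like in GCC-style extended asm. `AddWithAsm` is a hypothetical function, not snappy code, and the x86-64 guard is an assumption made so the sketch also builds on other targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not snappy code): adds two values with inline asm on
// x86-64 and falls back to plain C++ everywhere else.
inline std::uint64_t AddWithAsm(std::uint64_t a, std::uint64_t b) {
#if defined(__GNUC__) && defined(__x86_64__)
  // `addq` modifies the flags register, so "cc" is declared as clobbered.
  // This tells the compiler not to assume flags survive across the statement.
  __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
  return a;
#else
  return a + b;
#endif
}
```

Either path returns the same sum. The clobber changes only what the compiler may assume about the flags around the asm statement.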
Snappy Team 8881ba172a Improve the speed of hashing in zippy compression.
This change replaces the hashing function used during compression with
one that is roughly as good but faster.  This speeds up compression by
two to a few percent on the Intel-, AMD-, and Arm-based machines we
tested.  The amount of compression is roughly unchanged.

PiperOrigin-RevId: 485960303
2023-01-12 13:32:54 +00:00
Snappy Team a2d219a8a8 Modify MemCopy64 to use AVX 32 byte copies instead of SSE2 16 byte copies on capable x86 platforms. This gives an average speedup of 6.87% on Milan and 1.90% on Skylake.
PiperOrigin-RevId: 480370725
2023-01-12 13:32:43 +00:00
Marcin Kowalczyk 984b191f0f Fix the remaining occurrence of non-const `std::string::data()`.
PiperOrigin-RevId: 479818960
2022-10-08 21:59:12 +02:00
Matt Callanan 974fcc49e8 Fix compilation errors under C++11.
`std::string::data()` is const-only until C++17.

PiperOrigin-RevId: 479708109
2022-10-08 08:41:35 +02:00
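Note on 974fcc49e8: a minimal sketch of a C++11-compatible way to get a writable pointer into a `std::string`. `FillString` is a hypothetical helper, not the code changed by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not snappy code): copies `n` bytes from `src` into `out`.
inline void FillString(std::string* out, const char* src, std::size_t n) {
  assert(out != nullptr);
  out->resize(n);
  if (n == 0) return;
  // Before C++17, `out->data()` returns `const char*`, so writing through it
  // does not compile. `&(*out)[0]` is a mutable pointer in C++11 and later.
  std::memcpy(&(*out)[0], src, n);
}
```

Calling `FillString(&s, "snappy", 6)` leaves `s` equal to `"snappy"` under both C++11 and C++17.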
Marcin Kowalczyk d644ca8770 Fix warnings due to use of `__attribute__(always_inline)` without `inline`.
PiperOrigin-RevId: 478984028
2022-10-05 10:38:16 +02:00
Matt Callanan 9758c9dfd7 Add `snappy::CompressFromIOVec`.
This reads from an `iovec` array rather than from a `char` array as in `snappy::Compress`.

PiperOrigin-RevId: 476930623
2022-09-29 09:32:28 -07:00
Victor Costan af720f9a3b Merge pull request #148 from pitrou:ubsan-ptr-add-overflow
PiperOrigin-RevId: 463090354
2022-07-27 15:28:16 +00:00
Marcin Kowalczyk 44caf79086 Move the comment about non-overlap requirement from the implementation to the
contract of `MemCopy64()`, and clarify that it applies to `size`, not to 64.

PiperOrigin-RevId: 453920284
2022-07-27 15:28:08 +00:00
Snappy Team d261d2766f Optimize zippy MemCpy / MemMove during decompression
By default MemCpy() / MemMove() always copies 64 bytes in DecompressBranchless().  Profiling shows that the vast majority of the time we need to copy many fewer bytes (typically <= 16 bytes).  It is safe to copy fewer bytes as long as we exceed len.

This change improves throughput by ~12% on ARM, ~35% on AMD Milan, and ~7% on Intel Cascade Lake.

PiperOrigin-RevId: 453917840
2022-07-27 15:27:58 +00:00
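Note on d261d2766f: a rough sketch of the idea described above, which is to copy a small fixed-size block unconditionally and touch the rest only when the length requires it. The name `CopyUpTo64`, the 16-byte threshold, and the preconditions are assumptions made for illustration. The actual snappy implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch only (not snappy code). Preconditions assumed here: len <= 64, both
// pointers have at least 64 accessible bytes, and the ranges do not overlap.
inline void CopyUpTo64(char* dst, const char* src, std::size_t len) {
  assert(len <= 64);
  // Common case: a fixed-size 16-byte copy, which compilers can typically
  // lower to a pair of wide moves instead of a call to memcpy.
  std::memcpy(dst, src, 16);
  if (len > 16) {
    // Uncommon case: copy the remaining 48 bytes as one fixed-size block.
    std::memcpy(dst + 16, src + 16, 48);
  }
}
```

The copy may write more than `len` bytes but never fewer, which is why it needs slack space after the destination.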
Snappy Team 6a2b78a379 Optimize Zippy compression for ARM by 5-10% by choosing csel instructions
PiperOrigin-RevId: 444863689
2022-05-09 16:19:11 +00:00
Snappy Team 8dd58a519f Fix compilation for older GCC and Clang versions.
Not everything defining __GNUC__ supports flag outputs
from asm statements; in particular, some Clang versions
on macOS do not. The correct test per the GCC documentation
is __GCC_ASM_FLAG_OUTPUTS__, so use that instead.

PiperOrigin-RevId: 423749308
2022-02-20 18:19:45 +00:00
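Note on 8dd58a519f: a minimal sketch of guarding asm flag outputs with `__GCC_ASM_FLAG_OUTPUTS__` rather than `__GNUC__`. `AddOverflows` is a hypothetical function, not snappy code. The additional x86-64 check is an assumption, added because the asm text is x86-specific.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not snappy code): reports whether a + b wraps around.
inline bool AddOverflows(std::uint32_t a, std::uint32_t b) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
  // Flag-output path: "=@ccc" binds the carry flag to `carry`.
  bool carry;
  __asm__("addl %2, %0" : "+r"(a), "=@ccc"(carry) : "r"(b));
  return carry;
#else
  // Portable path for compilers that lack flag outputs: unsigned addition
  // wraps, so a sum smaller than an operand means a carry occurred.
  return static_cast<std::uint32_t>(a + b) < a;
#endif
}
```

Both paths return the same answer. Compilers that define `__GNUC__` but lack flag outputs simply take the portable branch.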
David Green 6c6e890ef9 Change LittleEndian loads/stores to use memcpy
The existing code uses a series of 8bit loads with shifts and ors to
emulate an (unaligned) load of a larger type. These are then expected to
become single loads in the compiler, producing optimal assembly. Whilst
this is true, it happens very late in the compiler, meaning that
throughout most of the pipeline it is treated (and cost-modelled) as
multiple loads, shifts and ors. This can lead the compiler to make poor
decisions (such as not unrolling loops that should be unrolled), or to
break up the pattern before it is turned into a single load.

For example the loops in CompressFragment do not get unrolled as
expected due to a higher cost than the unroll threshold in clang.

Instead, this patch uses a more conventional method of loading unaligned
data, using a memcpy directly, which the compiler will be able to deal
with much more straightforwardly, modelling it as a single unaligned
load. The old code is left as-is for big-endian systems.

This helps improve the performance of the BM_ZFlat benchmarks by up to
10-15% on an Arm Neoverse N1.

Change-Id: I986f845ebd0a0806d052d2be3e4dbcbee91713d7
2022-01-19 07:14:46 +00:00
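Note on 6c6e890ef9: a minimal sketch contrasting the two load styles described above. The function names are hypothetical and the `__BYTE_ORDER__` check is an assumption about how the host's endianness is detected. Snappy's own helpers may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-by-byte form: correct on any host, but the optimizer sees four loads
// plus shifts and ors until a late pass merges them.
inline std::uint32_t Load32Bytewise(const void* ptr) {
  const std::uint8_t* p = static_cast<const std::uint8_t*>(ptr);
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// memcpy form: modelled as a single unaligned load on little-endian hosts.
// Other hosts keep the byte-by-byte form.
inline std::uint32_t Load32LittleEndian(const void* ptr) {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
  std::uint32_t value;
  std::memcpy(&value, ptr, sizeof(value));
  return value;
#else
  return Load32Bytewise(ptr);
#endif
}
```

Both functions return the same value for any input, including unaligned addresses. Only the shape seen by the optimizer differs.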
Victor Costan 8b07ff196a Update contributing guidelines.
* Align CONTRIBUTING.md with the google/new-project template.
* Explain the support story for the CMake config.

PiperOrigin-RevId: 421311695
2022-01-12 17:25:50 +00:00
Antoine Pitrou 64df9f28c8 Fix UBSan error (ptr + offset overflow)
As `i + offset` is promoted to a "negative" size_t,
UBSan would complain when adding the resulting offset to `dst`:
```
/tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43: runtime error: addition of unsigned offset to 0x6120003c5ec1 overflowed to 0x6120003c5ec0
    #0 0x7f9ebd21769c in snappy::(anonymous namespace)::Copy64BytesWithPatternExtension(char*, unsigned long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43
    #1 0x7f9ebd21769c in std::__1::pair<unsigned char const*, long> snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, long, char*, long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:1160:15
```
2021-11-30 19:46:18 +01:00
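Note on 64df9f28c8: a minimal sketch of the class of problem reported above, with one way to avoid it. `BackReference` is a hypothetical helper. The actual fix in `snappy.cc` is not shown here and may be written differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not snappy code): returns the address that lies
// `offset` bytes before `dst + i`.
inline const char* BackReference(const char* dst, int i, std::size_t offset) {
  // Problematic form: dst + (i - offset). The subtraction is evaluated in
  // size_t, so it wraps to a huge value when offset > i, and UBSan flags the
  // pointer addition as an overflow.
  // Doing the arithmetic in a signed type keeps the offset small and negative.
  const std::ptrdiff_t delta =
      static_cast<std::ptrdiff_t>(i) - static_cast<std::ptrdiff_t>(offset);
  return dst + delta;
}
```

With `buf + 4` as `dst`, `i = 1` and `offset = 3`, the result is `buf + 2`.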
Snappy Team 65dc7b3839 Pass by reference the first argument of ExtractLowBytes
to avoid UB of passing uninitialized argument by value.

PiperOrigin-RevId: 406052814
2021-11-14 22:09:42 +00:00
Victor Costan fe18b46322 Switch CI to GitHub Actions.
PiperOrigin-RevId: 394247182
2021-09-01 16:57:31 +00:00
Victor Costan a7ddc144d1 Merge pull request #140 from JunHe77:adv
PiperOrigin-RevId: 394061345
2021-08-31 19:47:38 +00:00
Jun He aeb5de55a9 decompress: refine data dependency
The final ip advance value doesn't have to wait for
the result of offset to load *tag. It can be computed
along with the offset, so the codegen will use one
csinc in parallel with ldrb. This will improve the
throughput.
With this change, an uplift of ~4.2% is observed in UFlat/10
and ~3.7% in UFlatMedley.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I20ab211235bbf578c6c978f2bbd9160a49e920da
2021-08-30 09:51:37 +08:00
Victor Costan 7062d7f1d8 Merge pull request #133 from JunHe77:simd
PiperOrigin-RevId: 393681630
2021-08-30 01:36:24 +00:00
Victor Costan cbb83a1d64 Migrate feature detection macro checks from #ifdef to #if.
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.

The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).

PiperOrigin-RevId: 391094241
2021-08-16 18:26:33 +00:00
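Note on cbb83a1d64: a minimal sketch of the difference between `#ifdef` and `#if` for a macro that is defined to 0. `DEMO_HAVE_FAST_PATH` and both functions are hypothetical names, not snappy's feature macros.

```cpp
#include <cassert>

// Hypothetical feature macro, explicitly set to 0 ("definitely unavailable").
#define DEMO_HAVE_FAST_PATH 0

inline bool UsesFastPathIfdef() {
#ifdef DEMO_HAVE_FAST_PATH
  return true;   // Taken: the macro is defined, even though its value is 0.
#else
  return false;
#endif
}

inline bool UsesFastPathIf() {
#if DEMO_HAVE_FAST_PATH
  return true;
#else
  return false;  // Taken: the macro evaluates to 0.
#endif
}
```

With `#if`, defining the macro to 0 disables the feature. With `#ifdef`, the same definition would enable it.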
Victor Costan a8400f1fab Add baseline CPU level to Travis CI.
PiperOrigin-RevId: 391082698
2021-08-16 17:42:27 +00:00
Victor Costan b9c9a989b2 Merge pull request #135 from JunHe77:remove_extra
PiperOrigin-RevId: 390767998
2021-08-14 08:15:44 +00:00
Victor Costan 5c87bc61b6 Merge pull request #136 from JunHe77:ext_arm
PiperOrigin-RevId: 390715690
2021-08-13 23:24:49 +00:00
Jun He 734b32bfe3 Add config and header file for NEON support
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3fade568ff92b4303387705f843d0051d5e88349
2021-08-12 15:45:46 +08:00
Jun He ab9a57280d Fix SSE3 and BMI2 compile error
After the SHUFFLE code blocks were refactored, the "tmmintrin.h"
include went missing, and the BMI2 code path fails to build
due to type conflicts.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
2021-08-12 15:45:41 +08:00