snappy

mirror of https://github.com/google/snappy.git synced 2024-11-28 05:44:33 +00:00

Author	SHA1	Message	Date
Richard O'Grady	cfc573e08f	Define missing SNAPPY_PREFETCH macros. PiperOrigin-RevId: 523287305	2023-04-11 10:38:23 -07:00
Ilya Tokar	92f18e66fd	Add prefetch to zippy compress PiperOrigin-RevId: 518358512	2023-03-29 17:31:17 -07:00
Snappy Team	f603a02008	Explicitly #include <utility> in snappy-internal.h snappy-internal.h uses std::pair, which is defined in the <utility> header. Typically, this works because existing C++ standard library implementations provide <utility> via other transitive includes; however, these transitive includes are not guaranteed to exist, and don't exist in certain contexts (e.g. compiling against LLVM's libc++ with Clang modules.) PiperOrigin-RevId: 517213822	2023-03-29 17:31:10 -07:00
Snappy Team	9c42b71b19	Optimize check for uncommon decompression for ARM, saving two instructions and three cycles. PiperOrigin-RevId: 517141646	2023-03-29 17:30:58 -07:00
Victor Costan	dc05e02648	Tag open source release 1.1.10. PiperOrigin-RevId: 515161676	2023-03-08 15:44:00 -08:00
Snappy Team	7b82423c59	The output buffer in DecompressBranchless is never read from, and the source buffers are never written to. This allows us to defer any writes to the output buffer for an arbitrary amount of time, as long as the writes all occur in the proper order. When a MemCopy64 would normally have occurred, we save away the source address and length. Once we reach the location of the next write to the output buffer, we first perform the deferred copy. This gives the source address and length calculations time to finish before the deferred copy. This change gives a 1.84% improvement on CLX and 0.97% on Milan. PiperOrigin-RevId: 504012310	2023-03-07 06:35:00 -08:00
Victor Costan	30326e5b8c	Merge pull request #150 from davemgreen:betterunalignedloads PiperOrigin-RevId: 501489679	2023-01-12 13:33:26 +00:00
Snappy Team	74960e8bd6	Allow some buffer overwrite on literal emitting. Calls to memcpy seem to be quite expensive:
```
BM_ZFlat/0                   [html (22.24 %)    ]   114µs ± 6%   110µs ± 6%  -3.97%  (p=0.000 n=118+115)
BM_ZFlat/1                   [urls (47.84 %)    ]  1.63ms ± 5%  1.58ms ± 5%  -3.39%  (p=0.000 n=117+115)
BM_ZFlat/2                   [jpg (99.95 %)     ]  7.84µs ± 6%  7.70µs ± 6%  -1.66%  (p=0.000 n=119+117)
BM_ZFlat/3                   [jpg_200 (73.00 %) ]   265ns ± 6%   255ns ± 6%  -3.48%  (p=0.000 n=101+98)
BM_ZFlat/4                   [pdf (83.31 %)     ]  11.8µs ± 6%  11.6µs ± 6%  -2.14%  (p=0.000 n=118+116)
BM_ZFlat/5                   [html4 (22.52 %)   ]   525µs ± 6%   513µs ± 6%  -2.36%  (p=0.000 n=117+116)
BM_ZFlat/6                   [txt1 (57.87 %)    ]   494µs ± 5%   480µs ± 6%  -2.84%  (p=0.000 n=118+116)
BM_ZFlat/7                   [txt2 (62.02 %)    ]   444µs ± 4%   428µs ± 7%  -3.51%  (p=0.000 n=119+117)
BM_ZFlat/8                   [txt3 (55.17 %)    ]  1.34ms ± 5%  1.30ms ± 5%  -2.40%  (p=0.000 n=120+116)
BM_ZFlat/9                   [txt4 (66.41 %)    ]  1.84ms ± 5%  1.78ms ± 5%  -3.55%  (p=0.000 n=110+111)
BM_ZFlat/10                  [pb (19.61 %)      ]   101µs ± 5%    97µs ± 5%  -4.67%  (p=0.000 n=118+118)
BM_ZFlat/11                  [gaviota (37.73 %) ]   368µs ± 5%   360µs ± 6%  -2.13%  (p=0.000 n=91+90)
BM_ZFlat/12                  [cp (48.25 %)      ]  38.9µs ± 6%  36.8µs ± 6%  -5.36%  (p=0.000 n=88+87)
BM_ZFlat/13                  [c (42.52 %)       ]  13.4µs ± 6%  13.1µs ± 8%  -2.38%  (p=0.000 n=115+116)
BM_ZFlat/14                  [lsp (48.94 %)     ]  4.05µs ± 4%  3.94µs ± 4%  -2.58%  (p=0.000 n=91+85)
BM_ZFlat/15                  [xls (41.10 %)     ]  1.42ms ± 5%  1.39ms ± 7%  -2.49%  (p=0.000 n=116+117)
BM_ZFlat/16                  [xls_200 (78.00 %) ]   313ns ± 6%   307ns ± 5%  -1.89%  (p=0.000 n=89+84)
BM_ZFlat/17                  [bin (18.12 %)     ]   518µs ± 5%   506µs ± 5%  -2.42%  (p=0.000 n=118+116)
BM_ZFlat/18                  [bin_200 (7.50 %)  ]  86.8ns ± 6%  85.3ns ± 6%  -1.76%  (p=0.000 n=118+114)
BM_ZFlat/19                  [sum (48.99 %)     ]  67.9µs ± 4%  61.1µs ± 6%  -9.96%  (p=0.000 n=114+117)
BM_ZFlat/20                  [man (59.45 %)     ]  5.64µs ± 6%  5.47µs ± 7%  -3.06%  (p=0.000 n=117+115)
BM_ZFlatAll                  [21 kTestDataFiles ]  9.23ms ± 4%  9.01ms ± 5%  -2.44%  (p=0.000 n=80+83)
BM_ZFlatIncreasingTableSize  [7 tables          ]  30.4µs ± 5%  29.3µs ± 7%  -3.45%  (p=0.000 n=96+96)
```
PiperOrigin-RevId: 490184133	2023-01-12 13:33:17 +00:00
Ilya Tokar	37f375ddeb	Add prefetch to zippy decompress. PiperOrigin-RevId: 489554313	2023-01-12 13:33:10 +00:00
Snappy Team	15e2a0e13d	Add "cc" clobbers to inline asm that modifies flags. As far as we know, the lack of "cc" in the clobbers hasn't caused problems yet, but it could. This change is to improve correctness, and is also almost certainly performance neutral. PiperOrigin-RevId: 487133620	2023-01-12 13:33:01 +00:00
Snappy Team	8881ba172a	Improve the speed of hashing in zippy compression. This change replaces the hashing function used during compression with one that is roughly as good but faster. This speeds up compression by two to a few percent on the Intel-, AMD-, and Arm-based machines we tested. The amount of compression is roughly unchanged. PiperOrigin-RevId: 485960303	2023-01-12 13:32:54 +00:00
Snappy Team	a2d219a8a8	Modify MemCopy64 to use AVX 32 byte copies instead of SSE2 16 byte copies on capable x86 platforms. This gives an average speedup of 6.87% on Milan and 1.90% on Skylake. PiperOrigin-RevId: 480370725	2023-01-12 13:32:43 +00:00
Marcin Kowalczyk	984b191f0f	Fix the remaining occurrence of non-const `std::string::data()`. PiperOrigin-RevId: 479818960	2022-10-08 21:59:12 +02:00
Matt Callanan	974fcc49e8	Fix compilation errors under C++11. `std::string::data()` is const-only until C++17. PiperOrigin-RevId: 479708109	2022-10-08 08:41:35 +02:00
Marcin Kowalczyk	d644ca8770	Fix warnings due to use of `__attribute__(always_inline)` without `inline`. PiperOrigin-RevId: 478984028	2022-10-05 10:38:16 +02:00
Matt Callanan	9758c9dfd7	Add `snappy::CompressFromIOVec`. This reads from an `iovec` array rather than from a `char` array as in `snappy::Compress`. PiperOrigin-RevId: 476930623	2022-09-29 09:32:28 -07:00
Victor Costan	af720f9a3b	Merge pull request #148 from pitrou:ubsan-ptr-add-overflow PiperOrigin-RevId: 463090354	2022-07-27 15:28:16 +00:00
Marcin Kowalczyk	44caf79086	Move the comment about non-overlap requirement from the implementation to the contract of `MemCopy64()`, and clarify that it applies to `size`, not to 64. PiperOrigin-RevId: 453920284	2022-07-27 15:28:08 +00:00
Snappy Team	d261d2766f	Optimize zippy MemCpy / MemMove during decompression. By default, MemCpy() / MemMove() always copies 64 bytes in DecompressBranchless(). Profiling shows that the vast majority of the time we need to copy far fewer bytes (typically <= 16 bytes). It is safe to copy fewer bytes as long as we still copy at least len bytes. This change improves throughput by ~12% on ARM, ~35% on AMD Milan, and ~7% on Intel Cascade Lake. PiperOrigin-RevId: 453917840	2022-07-27 15:27:58 +00:00
Snappy Team	6a2b78a379	Optimize Zippy compression for ARM by 5-10% by choosing csel instructions PiperOrigin-RevId: 444863689	2022-05-09 16:19:11 +00:00
Snappy Team	8dd58a519f	Fix compilation for older GCC and Clang versions. Not everything defining __GNUC__ supports flag outputs from asm statements; in particular, some Clang versions on macOS do not. The correct test per the GCC documentation is __GCC_ASM_FLAG_OUTPUTS__, so use that instead. PiperOrigin-RevId: 423749308	2022-02-20 18:19:45 +00:00
David Green	6c6e890ef9	Change LittleEndian loads/stores to use memcpy. The existing code uses a series of 8-bit loads with shifts and ors to emulate an (unaligned) load of a larger type. These are then expected to become single loads in the compiler, producing optimal assembly. Whilst this is true, it happens very late in the compiler, meaning that throughout most of the pipeline it is treated (and cost-modelled) as multiple loads, shifts and ors. This can lead the compiler to make poor decisions (such as not unrolling loops that should be unrolled) or to break up the pattern before it is turned into a single load. For example, the loops in CompressFragment do not get unrolled as expected, due to a higher cost than the unroll threshold in clang. Instead, this patch uses a more conventional method of loading unaligned data: a direct memcpy, which the compiler can deal with much more straightforwardly, modelling it as a single unaligned load. The old code is left as-is for big-endian systems. This helps improve the performance of the BM_ZFlat benchmarks by up to 10-15% on an Arm Neoverse N1. Change-Id: I986f845ebd0a0806d052d2be3e4dbcbee91713d7	2022-01-19 07:14:46 +00:00
Victor Costan	8b07ff196a	Update contributing guidelines: align CONTRIBUTING.md with the google/new-project template, and explain the support story for the CMake config. PiperOrigin-RevId: 421311695	2022-01-12 17:25:50 +00:00
Antoine Pitrou	64df9f28c8	Fix UBSan error (ptr + offset overflow). As `i + offset` is promoted to a "negative" size_t, UBSan would complain when adding the resulting offset to `dst`:
```
/tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43: runtime error: addition of unsigned offset to 0x6120003c5ec1 overflowed to 0x6120003c5ec0
    #0 0x7f9ebd21769c in snappy::(anonymous namespace)::Copy64BytesWithPatternExtension(char*, unsigned long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43
    #1 0x7f9ebd21769c in std::__1::pair<unsigned char const*, long> snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, long, char*, long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:1160:15
```	2021-11-30 19:46:18 +01:00
Snappy Team	65dc7b3839	Pass by reference the first argument of ExtractLowBytes to avoid UB of passing uninitialized argument by value. PiperOrigin-RevId: 406052814	2021-11-14 22:09:42 +00:00
Victor Costan	fe18b46322	Switch CI to GitHub Actions. PiperOrigin-RevId: 394247182	2021-09-01 16:57:31 +00:00
Victor Costan	a7ddc144d1	Merge pull request #140 from JunHe77:adv PiperOrigin-RevId: 394061345	2021-08-31 19:47:38 +00:00
Jun He	aeb5de55a9	decompress: refine data dependency. The final ip advance value doesn't have to wait for the result of offset to load *tag. It can be computed along with the offset, so the codegen will use one csinc in parallel with ldrb. This improves throughput. With this change, a ~4.2% uplift is observed in UFlat/10 and ~3.7% in UFlatMedley. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I20ab211235bbf578c6c978f2bbd9160a49e920da	2021-08-30 09:51:37 +08:00
Victor Costan	7062d7f1d8	Merge pull request #133 from JunHe77:simd PiperOrigin-RevId: 393681630	2021-08-30 01:36:24 +00:00
Victor Costan	cbb83a1d64	Migrate feature detection macro checks from #ifdef to #if. The #if predicate evaluates to false if the macro is undefined, or defined to 0. #ifdef (and its synonym #if defined) evaluates to false only if the macro is undefined. The new setup allows differentiating between setting a macro to 0 (to express that the capability definitely does not exist / should not be used) and leaving a macro undefined (to express not knowing whether a capability exists / not caring if a capability is used). PiperOrigin-RevId: 391094241	2021-08-16 18:26:33 +00:00
Victor Costan	a8400f1fab	Add baseline CPU level to Travis CI. PiperOrigin-RevId: 391082698	2021-08-16 17:42:27 +00:00
Victor Costan	b9c9a989b2	Merge pull request #135 from JunHe77:remove_extra PiperOrigin-RevId: 390767998	2021-08-14 08:15:44 +00:00
Victor Costan	5c87bc61b6	Merge pull request #136 from JunHe77:ext_arm PiperOrigin-RevId: 390715690	2021-08-13 23:24:49 +00:00
Jun He	734b32bfe3	Add config and header file for NEON support. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I3fade568ff92b4303387705f843d0051d5e88349	2021-08-12 15:45:46 +08:00
Jun He	ab9a57280d	Fix SSE3 and BMI2 compile error. After the SHUFFLE code blocks were refactored, "tmmintrin.h" was no longer included, and the BMI2 code path failed to build due to type conflicts. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f	2021-08-12 15:45:41 +08:00
Jun He	d643b9a988	decompress: add hint to remove extra AND. Clang doesn't realize that the load already zero-extends for free, and emits an extra 'and xn, xm, 0xff' to calculate the offset. With this change, this extra op is removed, and a consistent 1.7% performance uplift is observed. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: Ica4617852c4b93eadc6c5c551dc3961ffbadb8f0	2021-08-12 15:19:53 +08:00
Jun He	f52721b2b4	decompression: optimize ExtractOffset for Arm. Inspired by kExtractMasksCombined, this patch uses shifts to replace the table lookup. On Arm the codegen is 2 shift ops (lsl+lsr). Compared to the previous ldr, which has 4 cycles of latency, lsl+lsr need only 2 cycles. A slight (~0.3%) uplift is observed on N1, and ~3% on A72. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I5b53632d22d9e5cf1a49d0c5cdd16265a15de23b	2021-08-06 15:44:27 +08:00
Snappy Team	f2db8f77ce	Move the extract masks variable out in zippy. I see a consistent 1.5-2% improvement for ARM, probably because ARM has more relaxed address computation than x86 (https://www.godbolt.org/z/bfM1ezx41). I don't think this is a compiler bug, or that the compiler can do anything about it. PiperOrigin-RevId: 387569896	2021-08-02 14:50:16 +00:00
Snappy Team	c8f7641646	Remove inline assembly as the bug in clang was fixed PiperOrigin-RevId: 387356237	2021-08-02 14:50:09 +00:00
Snappy Team	9cc3689b21	Optimize memset to pure SIMD because compilers generate consistently bad code: clang for ARM and gcc for x86 (https://gcc.godbolt.org/z/oxeGG7aEx). PiperOrigin-RevId: 383467656	2021-08-02 14:49:57 +00:00
Snappy Team	b4888f7616	Optimize tag extraction for ARM with conditional increment instruction generation (csinc). For codegen see https://gcc.godbolt.org/z/a8z9j95Pv PiperOrigin-RevId: 382688740	2021-07-05 01:05:54 +00:00
atdt	b3fb0b5b4b	Enable vector byte shuffle optimizations on ARM NEON The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer. PiperOrigin-RevId: 381280165	2021-07-05 01:05:44 +00:00
Victor Costan	b638ebe5d9	Update Travis CI config. Xcode (drives macOS image): 12.2 => 12.5; Clang: 10 => 12; GCC: 10 => 11. PiperOrigin-RevId: 375610083	2021-05-25 02:20:52 +00:00
Snappy Team	d8f5dd8eca	Clarify, in a comment, that offset/256 fits in 3 bits. It has to in this context, because the other 5 bits in the byte are used for len-4 and the tag. PiperOrigin-RevId: 374926553	2021-05-25 02:20:42 +00:00
Victor Costan	2b63814b15	Tag open source release 1.1.9. PiperOrigin-RevId: 372007801	2021-05-04 22:53:34 +00:00
atdt	9c1be17938	'size' remains unused if none of ZLIB, LZO and LZ4 are available. While we're here, take care of a couple of lint warnings by converting CHECK(a != b) to CHECK_NE(a, b). PiperOrigin-RevId: 369132446	2021-04-22 04:27:48 +00:00
Chris Mumford	78650d126a	Add project goals to CONTRIBUTING.md. PiperOrigin-RevId: 362386747	2021-03-12 06:41:07 +00:00
Victor Costan	5e7c14bd05	Add stubs for abseil flags. This CL also removes support for using the gflags library to modify the flags. PiperOrigin-RevId: 361583626	2021-03-08 17:26:48 +00:00
Victor Costan	80a2a10c8c	Remove unused run_microbenchmarks flag. PiperOrigin-RevId: 361582956	2021-03-08 17:26:39 +00:00
Snappy Team	453942b38f	Add absl::GetFlag and absl::SetFlag to uses of flags. PiperOrigin-RevId: 357807059	2021-02-17 04:41:41 +00:00
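Commits cfc573e08f, 92f18e66fd and 37f375ddeb add prefetching to compression and decompression, but the log does not show what the SNAPPY_PREFETCH macros expand to. As a minimal sketch of the general technique, assuming GCC or Clang, a software prefetch hint with a no-op fallback could look like this; EXAMPLE_PREFETCH_READ, SumWithPrefetch and the distance constants are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// EXAMPLE_PREFETCH_READ is a hypothetical macro, not snappy's own
// SNAPPY_PREFETCH_* definition. GCC and Clang provide __builtin_prefetch;
// on other compilers the macro only evaluates its argument.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_PREFETCH_READ(addr) \
  __builtin_prefetch((addr), 0 /* read */, 3 /* high temporal locality */)
#else
#define EXAMPLE_PREFETCH_READ(addr) ((void)(addr))
#endif

// Sums a byte buffer, issuing one prefetch per 64-byte step for data
// kDistance bytes ahead. Both constants are illustrative, not tuned values.
inline uint64_t SumWithPrefetch(const uint8_t* data, size_t n) {
  constexpr size_t kStep = 64;
  constexpr size_t kDistance = 256;
  uint64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    if (i % kStep == 0 && i + kDistance < n) {
      EXAMPLE_PREFETCH_READ(data + i + kDistance);
    }
    sum += data[i];
  }
  return sum;
}
```

A prefetch is only a hint: removing it never changes the result, which is why a no-op fallback is acceptable on compilers without `__builtin_prefetch`.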
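Commits 974fcc49e8 and 984b191f0f deal with `std::string::data()` being const-only before C++17. One C++11-compatible way to get a writable pointer is to take the address of the first character, shown here with a hypothetical MutableData helper (not necessarily the exact change those commits made):

```cpp
#include <string>

// MutableData is a hypothetical helper. Before C++17, std::string::data()
// returns const char*, so C++11 code that needs a writable pointer takes
// the address of the first element instead; C++11 guarantees the
// characters are stored contiguously.
inline char* MutableData(std::string* s) {
  // Return nullptr for an empty string rather than pointing at the
  // terminating null character, which must not be modified.
  return s->empty() ? nullptr : &(*s)[0];
}
```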
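Commit d261d2766f notes that DecompressBranchless usually needs far fewer than 64 bytes, and that copying fewer is safe as long as len is still covered. Assuming the caller provides slop space, that idea can be sketched as a chunked copy that stops early. CopyAtMost64 and Copy16 are hypothetical names, and memcpy through a temporary stands in for whatever SIMD loads and stores the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies one 16-byte chunk through a temporary, mirroring a vector load
// followed by a vector store.
inline void Copy16(char* dst, const char* src) {
  char tmp[16];
  std::memcpy(tmp, src, sizeof(tmp));
  std::memcpy(dst, tmp, sizeof(tmp));
}

// Hypothetical helper: copies at least `size` bytes (size <= 64) from src to
// dst in 16-byte chunks, stopping as soon as `size` is covered. It always
// copies a multiple of 16 bytes (at least 16), so the caller must guarantee
// 64 readable bytes at src and 64 writable bytes at dst, and
// [src, src + size) must not overlap [dst, dst + size).
inline void CopyAtMost64(char* dst, const char* src, size_t size) {
  assert(size <= 64);
  Copy16(dst, src);  // common case: size <= 16 needs only this chunk
  for (size_t i = 16; i < size; i += 16) {
    Copy16(dst + i, src + i);
  }
}
```

Because the commit reports that most copies are 16 bytes or fewer, the loop body rarely runs and the typical cost is a single 16-byte load and store.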
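Commit 6c6e890ef9 contrasts byte-wise little-endian loads with memcpy-based ones. A minimal sketch of both forms, using hypothetical helper names rather than snappy's own, is:

```cpp
#include <cstdint>
#include <cstring>

// All helpers below are hypothetical illustrations, not snappy's functions.

// Byte-wise form: correct on any host byte order, but the compiler sees four
// loads, three shifts and three ors until late in optimization.
inline uint32_t LoadLE32Bytewise(const void* p) {
  const uint8_t* b = static_cast<const uint8_t*>(p);
  return static_cast<uint32_t>(b[0]) |
         (static_cast<uint32_t>(b[1]) << 8) |
         (static_cast<uint32_t>(b[2]) << 16) |
         (static_cast<uint32_t>(b[3]) << 24);
}

// memcpy form: modeled as one unaligned 32-bit load from the start. It
// yields the little-endian value only on a little-endian host, which is why
// the byte-wise form is kept for big-endian systems.
inline uint32_t LoadHost32(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

inline bool HostIsLittleEndian() {
  const uint32_t one = 1;
  uint8_t first_byte;
  std::memcpy(&first_byte, &one, 1);
  return first_byte == 1;
}

// Uses the memcpy form only where it gives the little-endian value.
inline uint32_t LoadLE32(const void* p) {
  return HostIsLittleEndian() ? LoadHost32(p) : LoadLE32Bytewise(p);
}
```

Both forms return the same value on a little-endian host; the difference the commit describes is how early the compiler's cost model treats the memcpy form as a single load.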
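The UBSan report quoted in commit 64df9f28c8 comes from adding an unsigned, wrapped-around displacement to a pointer. The log does not show the fix itself; as a sketch of one common way to avoid this class of report, the displacement can be computed in a signed type. BackRef and its parameters are hypothetical, and the caller must still keep the result inside the same buffer.

```cpp
#include <cstddef>

// BackRef is a hypothetical helper: it returns the address `offset` bytes
// before dst + i, where i may be smaller than offset.
//
// Writing `dst + (i - offset)` with size_t operands makes the displacement
// wrap around to a huge unsigned value when i < offset. That pointer
// arithmetic is undefined behavior, and UBSan's pointer-overflow check
// reports it even though the address usually comes out as intended.
inline char* BackRef(char* dst, size_t i, size_t offset) {
  // Subtracting in the signed ptrdiff_t type keeps a negative displacement
  // negative, so the pointer addition never wraps.
  const std::ptrdiff_t delta = static_cast<std::ptrdiff_t>(i) -
                               static_cast<std::ptrdiff_t>(offset);
  return dst + delta;
}
```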
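Commit cbb83a1d64 relies on the difference between `#ifdef` and `#if` for a macro defined to 0. The following compile-time sketch, using hypothetical EXAMPLE_HAVE_* macros, shows the cases the message describes:

```cpp
// Hypothetical feature macros, for illustration only.
#define EXAMPLE_HAVE_FOO 0  // capability definitely absent / must not be used
// EXAMPLE_HAVE_BAR is deliberately left undefined: "unknown / don't care".

#ifdef EXAMPLE_HAVE_FOO
constexpr bool kFooViaIfdef = true;  // taken: FOO is defined, even though it is 0
#else
constexpr bool kFooViaIfdef = false;
#endif

#if EXAMPLE_HAVE_FOO
constexpr bool kFooViaIf = true;
#else
constexpr bool kFooViaIf = false;  // taken: FOO expands to 0
#endif

#if EXAMPLE_HAVE_BAR
constexpr bool kBarViaIf = true;
#else
constexpr bool kBarViaIf = false;  // taken: an undefined identifier counts as 0 in #if
#endif

static_assert(kFooViaIfdef && !kFooViaIf && !kBarViaIf,
              "#ifdef sees a macro defined to 0; #if treats it as false");
```

With `#if`, both "defined to 0" and "undefined" disable the feature, while `#if defined(...)` remains available for the rare case where the two must be told apart.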
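Commits f2db8f77ce and f52721b2b4 replace a per-tag-type mask table lookup with shifts, inspired by kExtractMasksCombined. The sketch below shows the general idea of packing a four-entry table of 16-bit masks into one 64-bit constant and selecting an entry with a shift. The function names and mask values (0, 0xFF, 0xFFFF, 0 for tag types 0 to 3, that is literal, 1-byte offset copy, 2-byte offset copy and 4-byte offset copy) are assumptions for illustration, not copied from snappy.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Table form: one memory load per call to fetch the mask.
inline uint32_t ExtractOffsetTable(uint32_t val, size_t tag_type) {
  static constexpr uint32_t kMasks[4] = {0, 0xFF, 0xFFFF, 0};
  assert(tag_type < 4);
  return val & kMasks[tag_type];
}

// Packed form: the same four masks, 16 bits apiece, in one 64-bit immediate
// (entry 0 in the lowest 16 bits). A shift selects the entry, so no table
// needs to be loaded from memory.
inline uint32_t ExtractOffsetShift(uint32_t val, size_t tag_type) {
  constexpr uint64_t kMasksCombined = 0x0000FFFF00FF0000ull;
  assert(tag_type < 4);
  return val & static_cast<uint32_t>(
                   (kMasksCombined >> (tag_type * 16)) & 0xFFFF);
}
```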
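Commit b3fb0b5b4b notes that the SSSE3 byte shuffle has a direct NEON analogue. The scalar model below, with hypothetical names, shows what such a 16-byte shuffle computes, and, as an illustrative use, how an index vector built from `i % period` repeats a short pattern across 16 bytes. For indices 0 to 15, SSSE3 `_mm_shuffle_epi8` and AArch64 NEON `vqtbl1q_u8` both select `src[index]`; they differ for out-of-range indices, which this sketch rejects with an assert.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of a 16-byte table lookup: lane i of the result is
// src[idx[i]]. Only in-range indices (0..15) are accepted.
inline Bytes16 Shuffle16(const Bytes16& src, const Bytes16& idx) {
  Bytes16 out{};
  for (size_t i = 0; i < 16; ++i) {
    assert(idx[i] < 16);
    out[i] = src[idx[i]];
  }
  return out;
}

// Hypothetical helper: an index vector that repeats the first `period`
// bytes of the source across all 16 lanes.
inline Bytes16 PatternIndices(size_t period) {
  assert(period >= 1 && period <= 16);
  Bytes16 idx{};
  for (size_t i = 0; i < 16; ++i) idx[i] = static_cast<uint8_t>(i % period);
  return idx;
}
```

Because both instruction sets agree on in-range indices, code that only ever builds such index vectors can sit behind a thin wrapper that calls whichever intrinsic the target provides.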
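Commit d8f5dd8eca explains why offset/256 fits in 3 bits in the tag byte of a copy with a 1-byte offset: the other 5 bits hold the tag type and len-4. Following that layout (tag type in bits 0-1, len-4 in bits 2-4, offset/256 in bits 5-7, and the low 8 offset bits in the next byte), a hypothetical encoder and decoder can be sketched as:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kCopy1ByteOffset = 1;  // tag type 01: copy with 1-byte offset

// Hypothetical encoder for one copy element; returns the bytes written (2).
inline size_t EncodeCopy1ByteOffset(uint8_t* out, size_t offset, size_t len) {
  assert(len >= 4 && len <= 11);  // len - 4 must fit in 3 bits
  assert(offset < 2048);          // offset / 256 must fit in 3 bits
  out[0] = static_cast<uint8_t>(kCopy1ByteOffset | ((len - 4) << 2) |
                                ((offset >> 8) << 5));
  out[1] = static_cast<uint8_t>(offset & 0xFF);
  return 2;
}

// Hypothetical decoder: inverts EncodeCopy1ByteOffset.
inline void DecodeCopy1ByteOffset(const uint8_t* in, size_t* offset,
                                  size_t* len) {
  assert((in[0] & 0x3) == kCopy1ByteOffset);
  *len = ((in[0] >> 2) & 0x7) + 4;
  *offset = (static_cast<size_t>(in[0] >> 5) << 8) | in[1];
}
```

For example, offset 1500 (0x5DC) with length 7 encodes as the bytes 0xAD 0xDC.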


338 commits