Inspired by kExtractMasksCombined, this patch uses shifts to replace
a table lookup. On Arm the codegen is 2 shift ops (lsl+lsr). Compared
to the previous ldr, which has 4 cycles of latency, the lsl+lsr pair
needs only 2 cycles.
A slight (~0.3%) uplift was observed on N1, and ~3% on A72.
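A sketch of the technique (the constant is the one named above; the
helper and its use here are illustrative, not the exact patch):
```
#include <cstdint>

// Four 16-bit masks for tag types 0..3 packed into one immediate.
constexpr uint64_t kExtractMasksCombined = 0x0000FFFF00FF0000ull;

// tag_type must be in [0, 3]. lsl drops the higher lanes, lsr drops
// the lower ones: two shifts and no memory access, vs. an ldr from a
// 4-entry table.
inline uint32_t ExtractMask(uint32_t tag_type) {
  return static_cast<uint32_t>(
      (kExtractMasksCombined << (48 - 16 * tag_type)) >> 48);
}
```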
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I5b53632d22d9e5cf1a49d0c5cdd16265a15de23b
The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.
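A sketch of what such a layer can look like (the wrapper name is
illustrative):
```
#include <cstdint>

#if defined(__SSSE3__)
#include <tmmintrin.h>
using V128 = __m128i;
inline V128 V128_Shuffle(V128 input, V128 shuffle_mask) {
  return _mm_shuffle_epi8(input, shuffle_mask);  // PSHUFB
}
#elif defined(__aarch64__)
#include <arm_neon.h>
using V128 = uint8x16_t;
inline V128 V128_Shuffle(V128 input, V128 shuffle_mask) {
  // TBL zeroes out-of-range indices (>= 16); PSHUFB zeroes bytes whose
  // high bit is set, so a sentinel like 0xFF behaves the same in both.
  return vqtbl1q_u8(input, shuffle_mask);
}
#endif
```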
PiperOrigin-RevId: 381280165
This lets us remove main() from snappy_bench.cc and snappy_unittest.cc,
which simplifies integrating these tests and benchmarks with other
suites.
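For instance (an illustrative test, not one from snappy_unittest.cc):
```
#include "gtest/gtest.h"

// No main() in this file: linking against gtest_main provides one that
// calls testing::InitGoogleTest() and RUN_ALL_TESTS().
TEST(ExampleTest, Addition) {
  EXPECT_EQ(2 + 2, 4);
}
```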
PiperOrigin-RevId: 347857427
LibFuzzer does not ship with the macOS Command Line Tools.
```
ld: file not found: /Applications/Xcode-12.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.fuzzer_osx.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
gcc warned about attributes being ignored on the `__m128i` template
argument, which caused a build failure due to `-Wall -Werror`.
The build error was:
```
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
292 | static inline std::pair<__m128i /* pattern */, __m128i /* reshuffle_mask */>
| ^
../snappy.cc:292:76: error: ignoring attributes on template argument ‘__m128i’ [-Werror=ignored-attributes]
cc1plus: all warnings being treated as errors
```
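One way to avoid this class of warning (a sketch, assuming the pair
held a pattern and its reshuffle mask as the signature above suggests;
not necessarily the fix that landed) is to avoid instantiating a
template with the attributed `__m128i` type:
```
#include <emmintrin.h>

// A plain aggregate carries the two vectors without making __m128i a
// template argument, so its attributes are never dropped.
struct PatternAndReshuffleMask {
  __m128i pattern;
  __m128i reshuffle_mask;
};
```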
Snappy includes a testing framework that implements a subset of the
Google Test API and can be used when Google Test is not available.
Snappy also includes a micro-benchmark framework that implements an
old version of the Google Benchmark API.
This CL replaces the custom test and micro-benchmark frameworks with
google/googletest and google/benchmark. The code is vendored in
third_party/ via git submodules. The setup is similar to google/crc32c
and google/leveldb.
This CL also updates the benchmarking code to the modern Google
Benchmark API.
Benchmark results are expected to be more precise, as the old framework
ran each benchmark with a fixed number of iterations, whereas Google
Benchmark keeps iterating until the noise is low.
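For reference, a minimal benchmark against the modern API (an
illustrative benchmark, not snappy's actual code):
```
#include <cstdint>
#include <string>

#include "benchmark/benchmark.h"

static void BM_CopyString(benchmark::State& state) {
  std::string src(4096, 'x');
  for (auto _ : state) {  // the framework picks the iteration count
    std::string dst = src;
    benchmark::DoNotOptimize(dst);
  }
  state.SetBytesProcessed(state.iterations() *
                          static_cast<int64_t>(src.size()));
}
BENCHMARK(BM_CopyString);

BENCHMARK_MAIN();  // supplies main(), as described above
```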
PiperOrigin-RevId: 347456142
Inline variables require C++17. Fortunately, inline is only useful for declarations in headers, which may be included in multiple compilation units. The declarations modified by this CL occur in a single compilation unit, so the specifier can be dropped.
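A short illustration of the distinction (names are illustrative):
```
// In a header included by many .cc files, inline gives the constant a
// single definition across all of them -- the C++17 feature:
inline constexpr int kHeaderConstant = 42;

// In a single .cc file, plain constexpr suffices and is valid C++11,
// which is why the specifier can be dropped:
constexpr int kFileLocalConstant = 42;
```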
PiperOrigin-RevId: 347338760
2) Replace offset extraction with a lookup mask. This takes fewer uops, and it is needed because type 3 must be special-cased to always return 0 so as to properly trigger the fallback (see the sketch after this list).
3) Unroll the loop twice. This removes some loop-condition checks and it improves the generated assembly: the loop variables tend to end up in different registers, requiring mov's, and having two consecutive copies of the loop body allows those mov's to be elided.
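A minimal sketch of the mask-based extraction in 2) (illustrative
code, not the shipped helper; the point is that type 3 maps to 0 so
the caller falls through to the fallback):
```
#include <cstdint>
#include <cstring>

// Per-tag-type masks; type 3 yields 0 so no offset bytes are kept.
constexpr uint32_t kExtractMasks[4] = {0, 0xFF, 0xFFFF, 0};

inline uint32_t ExtractOffset(const uint8_t* ip, uint32_t tag_type) {
  uint32_t v;
  std::memcpy(&v, ip, sizeof(v));      // unaligned load (little-endian)
  return v & kExtractMasks[tag_type];  // mask instead of branchy decode
}
```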
PiperOrigin-RevId: 346663328
When SSSE3 is available:
- Use PSHUFB (_mm_shuffle_epi8) to handle pattern sizes 1 to 15 (previously it handled sizes 1 to 7).
- This enables us to do 16-byte copies instead of 8-byte copies, because after the initial shuffle the expanded pattern is at least 16 bytes long.
- Use a shuffle-reshuffle strategy to generate the next pattern after loading the initial pattern. This enables us to write 4 conditionals (similar to when pattern size >= 16), which allows FDO to lay out the code with respect to the actual probabilities of each length.
- The PSHUFB masks are now generated programmatically at compile time.
When SSSE3 is unavailable:
- No change.
In both cases:
- Assert op < op_limit in IncrementalCopy so that we can check 'op_limit <= buf_limit - 15' instead of 'op_limit <= buf_limit - 16'. All existing call sites of IncrementalCopy guarantee this.
The 'bin' case is notably >20% faster because it has many repeated character patterns (i.e. pattern_size = 1).
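A sketch of the shuffle-based expansion (illustrative: snappy builds
the masks at compile time, whereas this computes one at runtime for
clarity):
```
#include <tmmintrin.h>

#include <cstdint>
#include <cstring>

// Repeats the first pattern_size bytes of src across a 16-byte vector,
// e.g. pattern_size = 3: abc... -> abcabcabcabcabca. Requires 16
// readable bytes at src and pattern_size in [1, 15].
static inline __m128i ExpandPattern(const char* src, int pattern_size) {
  uint8_t idx[16];
  for (int i = 0; i < 16; ++i)
    idx[i] = static_cast<uint8_t>(i % pattern_size);
  __m128i shuffle_mask, bytes;
  std::memcpy(&shuffle_mask, idx, sizeof(shuffle_mask));
  std::memcpy(&bytes, src, sizeof(bytes));
  return _mm_shuffle_epi8(bytes, shuffle_mask);
}
```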
PiperOrigin-RevId: 346454471
I also made the compression happen only once per benchmark. This way we get a cleaner measurement of branch misses using "perf stat". Compression naturally suffers from a large number of branch misses, which was polluting the measurements.
This showed that with the new decompression the branch-miss rate is actually much lower than initially reported, only 0.2% and very stable, i.e. it doesn't really fluctuate with how you execute the benchmarks.
PiperOrigin-RevId: 342628576
((a*b)>>18) & mask has higher throughput than (a*b)>>shift, because the
shift amount becomes a compile-time constant instead of a value derived
from the table size at runtime, and it produces the same results when
the hash table size is 2**14. In other cases the hash function is still
good, but it's not as necessary for that to be the case, as the input
is small anyway. This speeds up encoding, especially in cases where
hashing is a significant part of the encoding critical path (small or
incompressible files).
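A sketch of the resulting hash (0x1e35a7bd is snappy's multiplier;
the helper shape is illustrative):
```
#include <cstdint>

// For a 2**14-entry table, mask = (1 << 14) - 1 and this matches
// (kMagic * bytes) >> 18 bit-for-bit; smaller tables simply keep
// fewer bits of the same hash.
inline uint32_t HashBytes(uint32_t bytes, uint32_t mask) {
  constexpr uint32_t kMagic = 0x1e35a7bd;
  return ((kMagic * bytes) >> 18) & mask;  // constant shift, then mask
}
```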
PiperOrigin-RevId: 341498741