The existing code uses a series of 8-bit loads combined with shifts and
ors to emulate an (unaligned) load of a larger type. These are expected
to become single loads in the compiler, producing optimal assembly.
Whilst this is true, it happens very late in the compiler, meaning that
throughout most of the pipeline the pattern is treated (and
cost-modelled) as multiple loads, shifts and ors. This can lead the
compiler to make poor decisions (such as not unrolling loops that
should be), or to break up the pattern before it is turned into a
single load.
For example, the loops in CompressFragment do not get unrolled as
expected because their modelled cost exceeds clang's unroll threshold.
Instead, this patch uses a more conventional method of loading
unaligned data: a direct memcpy, which the compiler can handle much
more straightforwardly, modelling it as a single unaligned load. The
old code is left as-is for big-endian systems.
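As a minimal sketch (the helper name is illustrative, not the exact
code in this patch), the conventional form looks like this:

  #include <cstdint>
  #include <cstring>

  // Sketch only: a memcpy-based unaligned load that the compiler models
  // as a single unaligned load throughout the optimization pipeline.
  inline std::uint64_t UnalignedLoad64(const void* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }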
This helps improve the performance of the BM_ZFlat benchmarks by up to
10-15% on an Arm Neoverse N1.
Change-Id: I986f845ebd0a0806d052d2be3e4dbcbee91713d7
The #if predicate evaluates to false if the macro is undefined, or
defined to 0. #ifdef (and its synonym #if defined) evaluates to false
only if the macro is undefined.
The new setup allows differentiating between setting a macro to 0 (to
express that the capability definitely does not exist / should not be
used) and leaving a macro undefined (to express not knowing whether a
capability exists / not caring if a capability is used).
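For illustration (HAVE_SOME_FEATURE is a hypothetical macro name, not
one used by this project):

  #include <cstdio>

  // Uncomment exactly one of these, or leave both commented out:
  // #define HAVE_SOME_FEATURE 0  // capability definitely absent / unwanted
  // #define HAVE_SOME_FEATURE 1  // capability definitely present

  int main() {
  #if HAVE_SOME_FEATURE          // false when undefined *or* defined to 0
    std::printf("use the capability\n");
  #else
    std::printf("capability off or unknown\n");
  #endif

  #ifdef HAVE_SOME_FEATURE       // false only when undefined
    std::printf("capability was configured explicitly\n");
  #endif
    return 0;
  }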
PiperOrigin-RevId: 391094241
Bits::FindLSBSetNonZero64() is now available unconditionally, and it
should be easier to reason about the code included in each build
configuration.
This reduces the amount of conditional compilation going on, which
makes it easier to reason about which stubs are used in each build
configuration.
The guards were added to work around the fact that MSVC has a
_BitScanForward64() intrinsic, but the intrinsic is only implemented on
64-bit targets, and causes a compilation error on 32-bit targets.
By contrast, Clang and GCC support __builtin_ctzll() everywhere, and
implement it with varying efficiency.
This CL reworks the conditional compilation directives so that
Bits::FindLSBSetNonZero64() uses the _BitScanForward64() intrinsic on
MSVC when available, and the portable implementation otherwise.
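A minimal sketch of the resulting shape (the guards and the portable
fallback here are simplified and illustrative, not the exact code):

  #include <cstdint>

  #if defined(_MSC_VER) && defined(_M_X64)
  #include <intrin.h>
  inline int FindLSBSetNonZero64(std::uint64_t n) {
    unsigned long index;
    _BitScanForward64(&index, n);      // MSVC intrinsic, 64-bit targets only
    return static_cast<int>(index);
  }
  #elif defined(__GNUC__) || defined(__clang__)
  inline int FindLSBSetNonZero64(std::uint64_t n) {
    return __builtin_ctzll(n);         // available everywhere on GCC/Clang
  }
  #else
  inline int FindLSBSetNonZero64(std::uint64_t n) {
    int rc = 0;                        // portable fallback; n must be non-zero
    while ((n & 1) == 0) {
      n >>= 1;
      ++rc;
    }
    return rc;
  }
  #endif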
PiperOrigin-RevId: 310007748
This CL makes the following substitutions.
* assert.h -> cassert
* math.h -> cmath
* stdarg.h -> cstdarg
* stdio.h -> cstdio
* stdlib.h -> cstdlib
* string.h -> cstring
stddef.h and stdint.h are not migrated to C++ headers.
PiperOrigin-RevId: 309074805
Snappy issues multi-byte (16/32/64-bit) loads and stores that are not
aligned, meaning the addresses are not necessarily multiples of the
access size (2/4/8 bytes). This is accomplished using two methods:
1) The portable method allocates a uint{16,32,64}_t on the stack, and
std::memcpy()s the bytes into/from the integer. This method relies on
well-defined behavior (std::memcpy() works on all valid pointers, and
fixed-width unsigned integer types use a pure binary representation and
therefore have no invalid values), and should compile to valid code on
all platforms.
2) The fast method reinterpret_casts the address to a pointer to a
uint{16,32,64}_t and dereferences the pointer. This is expected to
compile to one hardware instruction (mov on x86, ldr/str on arm). The
caveat is that the reinterpret_cast is undefined behavior (UB) unless the
address happened to be a valid uint{16,32,64}_t pointer. The UB shows up
as follows.
* On architectures that don't have hardware instructions for unaligned
loads / stores, the pointer access can trigger a hardware exception.
This is mitigated by #ifdef blocks that attempt to restrict the fast
method to platforms that support it.
* On architectures that have separate instructions for aligned and
unaligned access, the compiler may need an explicit hint to emit the
hardware instruction for unaligned access. This is accomplished on
Clang and GCC by wrapping the pointers into structs tagged with
__attribute__((__packed__)).
This CL removes the fast method. Fortunately, compilers have advanced
enough that the portable method gets compiled down to the same
instructions as the fast method, without the need for the caveats
explained above. Specifically, modern Clang, GCC and MSVC optimize
std::memcpy() to a single instruction (mov / ldr / str). A test case
proving this can be seen at https://godbolt.org/z/gZg2Fk
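A sketch of the two shapes discussed above (names and the packed-struct
wrapper are illustrative; the removed path is shown in Clang/GCC syntax):

  #include <cstdint>
  #include <cstring>

  // Portable method (kept): copy the bytes into a stack integer.
  inline std::uint32_t LoadPortable32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }

  // Fast method (removed): reinterpret_cast and dereference. This is UB
  // unless p really points to a suitably aligned uint32_t; on Clang/GCC a
  // packed struct was used as the hint to emit an unaligned access.
  struct Unaligned32 { std::uint32_t value; } __attribute__((__packed__));
  inline std::uint32_t LoadFast32(const void* p) {
    return reinterpret_cast<const Unaligned32*>(p)->value;
  }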
PiperOrigin-RevId: 306342728
The platform-independent code that breaks down the loads and stores into
byte-level operations is optimized into single instructions (mov or
ldr/str) and instruction pairs (mov+bswap or ldr/str+rev) by recent
versions of Clang and GCC. Tested at https://godbolt.org/z/2BQP-o
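For example, a byte-level little-endian load of roughly this shape (a
sketch, not the exact stubs code) folds into a single mov / ldr, plus
bswap / rev where the target is big-endian:

  #include <cstdint>

  inline std::uint32_t LoadLE32ByteWise(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
  }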
PiperOrigin-RevId: 306321608
An internal CL started using ABSL_ATTRIBUTE_ALWAYS_INLINE
from Abseil. This CL introduces equivalent functionality as
SNAPPY_ALWAYS_INLINE.
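A plausible shape for the macro, as a sketch (the actual guards and
definition may differ):

  #if defined(__GNUC__) || defined(__clang__)
  #define SNAPPY_ALWAYS_INLINE inline __attribute__((always_inline))
  #elif defined(_MSC_VER)
  #define SNAPPY_ALWAYS_INLINE __forceinline
  #else
  #define SNAPPY_ALWAYS_INLINE inline
  #endif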
PiperOrigin-RevId: 306289650
The following changes are done via find/replace.
* int8 -> int8_t
* int16 -> int16_t
* int32 -> int32_t
* int64 -> int64_t
The aliases were removed from snappy-stubs-public.h.
PiperOrigin-RevId: 306141557
This CL replaces memcpy() with std::memcpy()
and memmove() with std::memmove(), and #includes
<cstring> in files that use either function.
PiperOrigin-RevId: 306067788
Two ideas
1) The "heuristic match skipping" code uses a quadratic interpolation. However, for the first 32 bytes it effectively tries every byte. Special-casing the first 16 bytes removes a lot of code.
2) Load 64-bit integers and shift instead of reloading. The hashing loop has a very long dependency chain: data = Load32(ip) -> hash = Hash(data) -> offset = table[hash] -> copy_data = Load32(base_ip + offset), followed by a compare between data and copy_data. This chain is around 20 cycles. It is unreasonable to expect the branch predictor to predict when there is a match (that is driven entirely by the content of the data), so on a miss this chain is on the critical path. By loading 64 bits and shifting we can effectively remove the first load (see the sketch below).
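A minimal sketch of the load-once-and-shift idea (names are
illustrative; assumes a little-endian target):

  #include <cstdint>
  #include <cstring>

  inline std::uint64_t Load64(const char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }

  void HashThreePositions(const char* ip) {
    std::uint64_t data = Load64(ip);                   // covers ip[0..7]
    std::uint32_t d0 = static_cast<std::uint32_t>(data);        // ip[0..3]
    std::uint32_t d1 = static_cast<std::uint32_t>(data >> 8);   // ip[1..4]
    std::uint32_t d2 = static_cast<std::uint32_t>(data >> 16);  // ip[2..5]
    // d0/d1/d2 can feed Hash() without a fresh Load32 at each position.
    (void)d0; (void)d1; (void)d2;
  }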
PiperOrigin-RevId: 302893821
Copybara transforms code slightly differently than MOE. One example is
the TODO username stripping, where Copybara produces different results
than MOE did. This change
moves the Copybara versions of comments to the public
repository.
Note: These changes didn't originate in cl/247950252.
PiperOrigin-RevId: 247950252
A previous CL introduced __builtin_clz in zippy.cc. This is a GCC / Clang
intrinsic, and is not supported in Visual Studio. The rest of the
project uses bit manipulation intrinsics via the functions in Bits::,
which are stubbed out for the open source build in
zippy-stubs-internal.h.
This CL extracts Bits::Log2FloorNonZero() out of Bits::Log2Floor() in
the stubbed version of Bits, adds assertions to the Bits::*NonZero()
functions in the stubs, and converts __builtin_clz to a
Bits::Log2FloorNonZero() call.
The latter part is not obvious. A mathematical proof of correctness is
outlined in added comments. An empirical proof is available at
https://godbolt.org/z/mPKWmh -- CalculateTableSizeOld(), which is the
current code, compiles to the same assembly on Clang as
CalculateTableSizeNew1(), which is the bigger jump in the proof.
CalculateTableSizeNew2() is a fairly obvious transformation from
CalculateTableSizeNew1(), and results in slightly better assembly on all
supported compilers.
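The underlying identity, as a minimal sketch (32-bit argument shown,
using the GCC/Clang builtin):

  #include <cassert>
  #include <cstdint>

  // For n != 0, floor(log2(n)) equals 31 minus the number of leading zeros.
  inline int Log2FloorNonZero(std::uint32_t n) {
    assert(n != 0);
    return 31 - __builtin_clz(n);
  }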
Two benchmark runs with the same arguments as the original CL only
showed differences in completely disjoint tests, suggesting that the
differences are pure noise.
getpagesize(), as well as its POSIX.1-2001 replacement
sysconf(_SC_PAGESIZE), is defined in <unistd.h>. On Linux and OS X,
including <sys/mman.h> is sufficient to get a definition for
getpagesize(). However, this is not true for the Android NDK. This CL
brings back the HAVE_UNISTD_H definition and its associated header
check.
This also adds a HAVE_FUNC_SYSCONF definition, which checks for the
presence of sysconf(). The definition can be used later to replace
getpagesize() with sysconf().
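A sketch of how the two definitions could be combined at a call site
(the helper name is hypothetical):

  #include <cstddef>

  #if HAVE_UNISTD_H
  #include <unistd.h>  // declares getpagesize() and sysconf()
  #endif

  inline std::size_t SystemPageSize() {
  #if HAVE_FUNC_SYSCONF && HAVE_UNISTD_H
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  #elif HAVE_UNISTD_H
    return static_cast<std::size_t>(getpagesize());
  #else
    return 4096;  // conservative fallback when neither is available
  #endif
  }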
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the
definition propagated to all translation units that included the open
source headers. The macro is now inlined, thus avoiding polluting the
macro environment of snappy users.
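A sketch of the inlined form (class name hypothetical; shown with C++11
deleted members rather than the classic private declarations):

  class Buffer {  // hypothetical class name
   public:
    Buffer() = default;

   private:
    // The inlined equivalent of DISALLOW_COPY_AND_ASSIGN(Buffer).
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
  };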
to avoid the compiler coalescing multiple loads into a single load instruction
(which only works for aligned accesses).
A typical example where GCC would coalesce:
uint8* p = ...;
uint32 a = UNALIGNED_LOAD32(p);
uint32 b = UNALIGNED_LOAD32(p + 4);
uint32 c = a | b;
apparently Debian still targets these by default, giving us segfaults on
armel.
R=sanjay
git-svn-id: https://snappy.googlecode.com/svn/trunk@64 03e5f5b5-db94-4691-08a0-1a8bf15f6143
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.
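For example, a fatal-error path in non-test code now looks roughly like
this (sketch; the function is hypothetical):

  #include <cassert>
  #include <cstdio>
  #include <cstdlib>

  std::FILE* OpenOrDie(const char* filename) {
    assert(filename != nullptr);           // invariant check instead of CHECK
    std::FILE* f = std::fopen(filename, "rb");
    if (f == nullptr) {
      std::fprintf(stderr, "Cannot open %s\n", filename);
      std::abort();                        // instead of LOG(FATAL)
    }
    return f;
  }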
R=sesse
git-svn-id: https://snappy.googlecode.com/svn/trunk@62 03e5f5b5-db94-4691-08a0-1a8bf15f6143
warnings. There are still some in the unit test, but the main .cc file should
be clean. We haven't enabled -Wall for the default build, since the unit test
is still not clean.
This also fixes a real bug in the open-source implementation of
ReadFileToStringOrDie(); it would not detect errors correctly.
I had to go through some pains to avoid performance loss as the types
were changed; I think there might still be some loss with 32-bit if and
only if LFS is enabled (i.e., size_t is 64-bit), but for regular 32-bit
and 64-bit I can't see any losses, and I've diffed the generated GCC
assembler between the old and new code without seeing any significant
differences. If anything, it's ever so slightly faster.
This may or may not enable compression of very large blocks (>2^32 bytes)
when size_t is 64-bit, but I haven't checked, and it is still not a supported
case.
git-svn-id: https://snappy.googlecode.com/svn/trunk@56 03e5f5b5-db94-4691-08a0-1a8bf15f6143
among others, Windows support. For Windows specifically, we could have used
CreateFileMapping/MapViewOfFile, but this should at least get us a bit closer
to compiling, and is of course also relevant for embedded systems with no MMU.
(Part 1/2)
R=csilvers
DELTA=9 (8 added, 0 deleted, 1 changed)
Revision created by MOE tool push_codebase.
MOE_MIGRATION=1031
git-svn-id: https://snappy.googlecode.com/svn/trunk@15 03e5f5b5-db94-4691-08a0-1a8bf15f6143