243 Commits

Author SHA1 Message Date
Victor Costan 113cd97ab3 Tighten types on a few for loops.
* Replace post-increment with pre-increment in for loops.
* Replace unsigned int counters with precise types, like uint8_t.
* Switch to C++11 iterating loops when possible.

PiperOrigin-RevId: 309724233
2020-05-04 12:32:00 +00:00
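The loop-style changes in 113cd97ab3 can be illustrated with a minimal sketch. The helpers SumBytes and CountSetLowBits are hypothetical and do not come from Snappy. They only show a C++11 range-based for loop, a pre-incremented counter, and a counter typed as std::uint8_t because its range is known to fit in 8 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: a C++11 range-based loop replaces an index-based loop.
inline int SumBytes(const std::vector<std::uint8_t>& bytes) {
  int total = 0;
  for (std::uint8_t b : bytes) total += b;
  return total;
}

// Hypothetical helper: the counter only ranges over 0..7, so it gets a
// precise 8-bit type, and it uses pre-increment.
inline int CountSetLowBits(std::uint32_t value) {
  int count = 0;
  for (std::uint8_t bit = 0; bit < 8; ++bit) {
    if (value & (1u << bit)) ++count;
  }
  return count;
}
```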
Victor Costan abde3abb1f Fix Travis CI build.
PiperOrigin-RevId: 309143110
2020-04-30 02:09:07 +00:00
Victor Costan e6506681fa Fix accidental double std:: qualifiers.
PiperOrigin-RevId: 309136120
2020-04-30 01:19:26 +00:00
Victor Costan 63620c06d2 Add some std:: qualifiers to types and functions.
PiperOrigin-RevId: 309110343
2020-04-29 22:31:55 +00:00
Victor Costan 5417da69b7 Switch from C headers to C++ headers.
This CL makes the following substitutions.

* assert.h -> cassert
* math.h -> cmath
* stdarg.h -> cstdarg
* stdio.h -> cstdio
* stdlib.h -> cstdlib
* string.h -> cstring

stddef.h and stdint.h are not migrated to C++ headers.

PiperOrigin-RevId: 309074805
2020-04-29 19:38:03 +00:00
Victor Costan 251d935d50 Remove #include <string> from snappy-stubs-public.h.
The header hasn't been needed since the removal of the snappy::string
alias to std::string.

PiperOrigin-RevId: 306446542
2020-04-14 16:50:30 +00:00
Victor Costan 4f195aee43 Remove mismatched #endif.
PiperOrigin-RevId: 306345559
2020-04-14 00:38:04 +00:00
Victor Costan 041c608086 Remove platform-dependent code for unaligned loads/stores.
Snappy issues multi-byte (16/32/64-bit) loads and stores that may be
unaligned, meaning the addresses are not necessarily multiples of the
access size (2/4/8 bytes). This is
accomplished using two methods:

1) The portable method allocates a uint{16,32,64}_t on the stack, and
std::memcpy()s the bytes into/from the integer. This method relies on
well-defined behavior (std::memcpy() works on all valid pointers,
fixed-width unsigned integer types use a pure binary representation and
therefore have no invalid values), and should compile to valid code on
all platforms.

2) The fast method reinterpret_casts the address to a pointer to a
uint{16,32,64}_t and dereferences the pointer. This is expected to
compile to one hardware instruction (mov on x86, ldr/str on arm). The
caveat is that the reinterpret_cast is undefined behavior (UB) unless the
address happens to be a valid uint{16,32,64}_t pointer. The UB shows up
as follows.
* On architectures that don't have hardware instructions for unaligned
  loads / stores, the pointer access can trigger a hardware exception.
  This is mitigated by #ifdef blocks that attempt to restrict the fast
  method to platforms that support it.
* On architectures that have separate instructions for aligned and
  unaligned access, the compiler may need an explicit hint to emit the
  hardware instruction for unaligned access. This is accomplished on
  Clang and GCC by wrapping the pointers into structs tagged with
  __attribute__((__packed__)).

This CL removes the fast method. Fortunately, compilers have advanced
enough that the portable method gets compiled down to the same
instructions as the fast method, without the need for the caveats
explained above. Specifically, modern Clang, GCC and MSVC optimize
std::memcpy() to a single instruction (mov / ldr / str). A test case
demonstrating this can be seen at https://godbolt.org/z/gZg2Fk
PiperOrigin-RevId: 306342728
2020-04-14 00:22:20 +00:00
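A minimal sketch of the portable method kept by 041c608086, assuming 32-bit accesses only. The names UnalignedLoad32 and UnalignedStore32 are hypothetical, not Snappy's actual identifiers. The removed fast method appears only as a comment, because dereferencing a misaligned reinterpret_cast pointer is undefined behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable unaligned load: copy the bytes into a properly aligned local
// integer. Modern Clang, GCC and MSVC typically compile this to one load.
inline std::uint32_t UnalignedLoad32(const void* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Portable unaligned store, the mirror image of the load above.
inline void UnalignedStore32(void* p, std::uint32_t v) {
  std::memcpy(p, &v, sizeof(v));
}

// The removed "fast" method looked roughly like the line below. It is UB
// unless p is suitably aligned for std::uint32_t, so it is not compiled here:
//   return *reinterpret_cast<const std::uint32_t*>(p);
```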
Victor Costan 27ff130ff9 Remove platform-dependent code for little-endian loads and stores.
The platform-independent code that breaks down the loads and stores into
byte-level operations is optimized into single instructions (mov or
ldr/str) and instruction pairs (mov+bswap or ldr/str+rev) by recent
versions of Clang and GCC. Tested at https://godbolt.org/z/2BQP-o

PiperOrigin-RevId: 306321608
2020-04-13 22:30:59 +00:00
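The byte-level approach from 27ff130ff9 might look like the following sketch; LoadLE32 and StoreLE32 are hypothetical names. Because the value is assembled with shifts rather than by reinterpreting memory, the result is the same on little- and big-endian hosts, and the compiler is left to choose the instructions.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit little-endian value one byte at a time. Recent Clang and
// GCC usually recognize this pattern and emit a single load (plus a byte swap
// when the host is big-endian).
inline std::uint32_t LoadLE32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Store the low byte first, independent of host byte order.
inline void StoreLE32(std::uint8_t* p, std::uint32_t v) {
  p[0] = static_cast<std::uint8_t>(v);
  p[1] = static_cast<std::uint8_t>(v >> 8);
  p[2] = static_cast<std::uint8_t>(v >> 16);
  p[3] = static_cast<std::uint8_t>(v >> 24);
}
```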
Victor Costan a4cdb5d133 Introduce SNAPPY_ATTRIBUTE_ALWAYS_INLINE.
An internal CL started using ABSL_ATTRIBUTE_ALWAYS_INLINE
from Abseil. This CL introduces equivalent functionality as
SNAPPY_ATTRIBUTE_ALWAYS_INLINE.

PiperOrigin-RevId: 306289650
2020-04-13 19:51:05 +00:00
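One plausible way to define a macro like SNAPPY_ATTRIBUTE_ALWAYS_INLINE is sketched below. This is an assumption for illustration: Snappy's real definition may be driven by a build-time feature check rather than compiler detection, and the AddOne function is hypothetical.

```cpp
#include <cassert>

// Hypothetical definition: use the GCC/Clang attribute when available and
// expand to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SNAPPY_ATTRIBUTE_ALWAYS_INLINE __attribute__((always_inline))
#else
#define SNAPPY_ATTRIBUTE_ALWAYS_INLINE
#endif

// The attribute goes before the declaration; `inline` is still needed so the
// function can live in a header.
SNAPPY_ATTRIBUTE_ALWAYS_INLINE
inline int AddOne(int x) { return x + 1; }
```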
Victor Costan 231b8be076 Migrate to standard integral types.
The following changes are done via find/replace.
* int8 -> int8_t
* int16 -> int16_t
* int32 -> int32_t
* int64 -> int64_t

The aliases were removed from snappy-stubs-public.h.

PiperOrigin-RevId: 306141557
2020-04-12 20:10:03 +00:00
Victor Costan 14bef66290 Modernize memcpy() and memmove() usage.
This CL replaces memcpy() with std::memcpy()
and memmove() with std::memmove(), and #includes
<cstring> in files that use either function.

PiperOrigin-RevId: 306067788
2020-04-12 00:06:15 +00:00
Snappy Team d674348a0c Improve zippy by 5-10%.
name                                       old speed               new speed   delta
BM_ZCord/0        [html   ]            1.26GB/s ± 0%           1.35GB/s ± 0%   +7.90%          (p=0.008 n=5+5)
BM_ZCord/1        [urls   ]             535MB/s ± 0%            562MB/s ± 0%   +5.05%          (p=0.008 n=5+5)
BM_ZCord/2        [jpg    ]            10.2GB/s ± 1%           10.2GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZCord/3        [jpg_200]             841MB/s ± 1%            846MB/s ± 1%     ~             (p=0.421 n=5+5)
BM_ZCord/4        [pdf    ]            6.77GB/s ± 1%           7.06GB/s ± 1%   +4.28%          (p=0.008 n=5+5)
BM_ZCord/5        [html4  ]            1.00GB/s ± 0%           1.08GB/s ± 0%   +7.94%          (p=0.008 n=5+5)
BM_ZCord/6        [txt1   ]             391MB/s ± 0%            417MB/s ± 0%   +6.71%          (p=0.008 n=5+5)
BM_ZCord/7        [txt2   ]             363MB/s ± 0%            388MB/s ± 0%   +6.73%          (p=0.016 n=5+4)
BM_ZCord/8        [txt3   ]             400MB/s ± 0%            426MB/s ± 0%   +6.55%          (p=0.008 n=5+5)
BM_ZCord/9        [txt4   ]             328MB/s ± 0%            350MB/s ± 0%   +6.66%          (p=0.008 n=5+5)
BM_ZCord/10       [pb     ]            1.67GB/s ± 1%           1.80GB/s ± 0%   +7.52%          (p=0.008 n=5+5)

1) A key bottleneck in the data dependency chain is determining how many bytes matched and loading the data for the next hash value. The load-to-use latency is 5 cycles; in the previous cl/303353110 we replaced the load with "shrd" to realign previously loaded data. Unfortunately, "shrd" itself has a latency of 4 cycles, so we prefer "shrx", which takes 1 cycle for variable shifts.
2) Make maximal use of data already computed. The above trick yields 5 bytes of useful data, so if we need to search for a new match, we can use it for the first search (which starts one byte further).

PiperOrigin-RevId: 303875535
2020-04-11 04:41:15 +00:00
Snappy Team 4dfcad9f4e assertion failure on darwin_x86_64, have to investigate
PiperOrigin-RevId: 303428229
2020-04-11 04:41:07 +00:00
Snappy Team e19178748f assertion failure on darwin_x86_64, have to investigate
PiperOrigin-RevId: 303346402
2020-04-11 04:40:57 +00:00
Snappy Team 0faf56378e This CL does two things:
1) It shaves off a few cycles from the data dependency chain by using "shrd" instead of a load.
2) The important loop is the one that finds small copies (4-12 bytes), which are encoded as either "copy 1" or "copy 2" depending on whether the offset is < 2048. It turns out that this branch is mispredicted often. Due to the long dependency chain, the CPU runs at IPC ~1 anyway, so we can freely add instructions to emit copies branch-free instead. This reduces branch mispredicts from 15% to 11% (for BM_ZFlat/6, txt1) and from 5.6% to 4% (for BM_ZFlat/10, pb).

PiperOrigin-RevId: 303328967
2020-04-11 04:40:48 +00:00
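A sketch of the branch-free emission described in 0faf56378e, assuming the Snappy format's "copy 1" element (tag 01, length 4-11 stored as len - 4 in 3 bits, offset bits 8-10 in the top 3 bits, then the low offset byte) and "copy 2" element (tag 10, len - 1 in 6 bits, then a 2-byte little-endian offset). EmitShortCopy is a hypothetical helper, not the code from that CL: it computes both encodings and selects one, which compilers often lower to a conditional move rather than a predicted branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper. Emits a copy element for 4 <= len <= 11 and
// 1 <= offset < 65536, and returns the element size (2 or 3 bytes).
// It always writes 4 bytes, so the caller must leave slop space.
inline std::size_t EmitShortCopy(std::uint8_t* op, std::uint32_t offset,
                                 std::uint32_t len) {
  assert(len >= 4 && len <= 11);
  assert(offset >= 1 && offset < 65536);
  // "Copy 1": only valid when offset < 2048 (11 bits).
  const std::uint32_t copy1 =
      1u | ((len - 4) << 2) | ((offset >> 8) << 5) | ((offset & 0xFFu) << 8);
  // "Copy 2": valid for any 16-bit offset.
  const std::uint32_t copy2 = 2u | ((len - 1) << 2) | (offset << 8);
  const bool fits_copy1 = offset < 2048;
  // Select instead of branching; both candidates are already computed.
  const std::uint32_t word = fits_copy1 ? copy1 : copy2;
  for (int i = 0; i < 4; ++i) {
    op[i] = static_cast<std::uint8_t>(word >> (8 * i));  // little-endian
  }
  return fits_copy1 ? 2 : 3;
}
```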
Snappy Team 0c7ed08a25 The improvement on the protobuf benchmark is around 19%. Results vary with how compressible the data is, as the frequency of finding matches influences the number of branch mispredicts and the amount of hashing.
Two ideas:
1) The code uses "heuristic match skipping" with a quadratic interpolation. However, for the first 32 bytes it just checks every byte. Special-case 16 bytes. This removes a lot of code.
2) Load 64-bit integers and shift instead of reloading. The hashing loop has a very long chain: data = Load32(ip) -> hash = Hash(data) -> offset = table[hash] -> copy_data = Load32(base_ip + offset), followed by a compare between data and copy_data. This chain is around 20 cycles. The branch predictor cannot reasonably be expected to predict when there is a match (that is driven entirely by the content of the data), so on a miss this chain is on the critical path. By loading 64 bits and shifting, we can effectively remove the first load.

PiperOrigin-RevId: 302893821
2020-04-11 04:40:39 +00:00
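The second idea in 0c7ed08a25 (loading 64 bits and shifting, also the basis of the shrd/shrx discussion in d674348a0c) can be sketched as follows. One 8-byte load covers the 4-byte windows starting at ip, ip + 1, ..., ip + 4, so later windows come from a shift instead of a new load. LoadLE64, LoadLE32 and Window32 are hypothetical names; the loads assemble bytes explicitly so the sketch does not depend on host byte order.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian 64-bit load assembled from bytes (p[0] ends up least
// significant). Compilers typically turn this into a single load on x86-64.
inline std::uint64_t LoadLE64(const std::uint8_t* p) {
  std::uint64_t v = 0;
  for (int k = 7; k >= 0; --k) v = (v << 8) | p[k];
  return v;
}

inline std::uint32_t LoadLE32(const std::uint8_t* p) {
  std::uint32_t v = 0;
  for (int k = 3; k >= 0; --k) v = (v << 8) | p[k];
  return v;
}

// The 4 bytes starting at p + i, taken from data = LoadLE64(p) with a shift
// instead of a fresh LoadLE32(p + i). Valid for 0 <= i <= 4.
inline std::uint32_t Window32(std::uint64_t data, int i) {
  assert(i >= 0 && i <= 4);
  return static_cast<std::uint32_t>(data >> (8 * i));
}
```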
Snappy Team 3c77e01459 1) Make the output pointer a local variable so that it doesn't need a load and store on its loop-carried dependency chain.
2) Reduce the input pointer's loop-carried dependency chain from 7 cycles to 4 cycles by using pre-loading. This is a very subtle point.
3) Unconditionally copy 64 bytes, which removes a hard-to-predict branch from the innermost loop. There is enough bandwidth to do so within the intrinsic cycles of the loop.
4) Implement limit pointers that include the slop region. This removes unnecessary instructions from the hot path.
5) Removing the hard-to-predict branch seems to have removed the code's sensitivity to alignment, so remove the asm nops.

PiperOrigin-RevId: 294692928
2020-04-11 04:40:29 +00:00
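Points 3) and 4) of 3c77e01459 can be sketched as follows, under stated assumptions: the constant kSlopBytes and the helper CopyFixed64 are hypothetical, and the source and destination are assumed not to overlap. The copy always moves 64 bytes regardless of len, so the hot path has no length-dependent branch; bytes past op + len are scratch that later writes overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical slop size: the fixed copy below may touch this many bytes.
constexpr std::size_t kSlopBytes = 64;

// Copies `len` (<= kSlopBytes) bytes from `src` to `op` by always copying
// kSlopBytes. Requires non-overlapping buffers with at least kSlopBytes
// readable at `src` and writable at `op`.
inline std::uint8_t* CopyFixed64(std::uint8_t* op, const std::uint8_t* src,
                                 std::size_t len) {
  assert(len <= kSlopBytes);
  std::memcpy(op, src, kSlopBytes);
  return op + len;
}
```

A decoder built this way would compute a limit such as op_limit_min_slop = op_limit - kSlopBytes once, run the fast loop only while op < op_limit_min_slop, and handle the last few elements on a slower, exact path.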
Snappy Team 9eabb7baba Cut a load from the critical dependency chain of the input pointer by speculating that the uncommon COPY_4 case does not occur.
PiperOrigin-RevId: 293803653
2020-04-11 04:40:20 +00:00
Snappy Team cddd9c0875 Improve comments in IncrementalCopy, add an assert.
PiperOrigin-RevId: 292506754
2020-04-11 04:40:09 +00:00
Victor Costan 537f4ad624 Tag open source release 1.1.8.
PiperOrigin-RevId: 289675084
2020-01-14 10:58:53 -08:00
Snappy Team b5477a8457 Optimize IncrementalCopy: There are between 1 and 4 copy iterations. Allow FDO to work with full knowledge of the probabilities for each branch.
On Skylake, this improves protobuf and html decompression speed by 15% and 9% respectively, and the rest by ~2%.
On Haswell, this improves protobuf and html decompression speed by 23% and 16% respectively, and the rest by ~3%.

PiperOrigin-RevId: 289090401
2020-01-14 10:58:42 -08:00
Victor Costan f5acee902c Move CI to Visual Studio 2019.
PiperOrigin-RevId: 279785698
2019-11-11 12:05:59 -08:00
Victor Costan 26410cc4f8 Merge pull request #85 from bitomaxsp:patch-1
PiperOrigin-RevId: 279633518
2019-11-10 14:10:50 -08:00
Victor Costan 0eec45ed16 Align CMake configuration with related projects.
PiperOrigin-RevId: 279237837
2019-11-07 22:39:04 -08:00
Victor Costan 6617df53fa Remove redundant PROJECT_SOURCE_DIR usage from CMake config.
Inspired by https://github.com/google/crc32c/pull/32

PiperOrigin-RevId: 278718367
2019-11-05 16:35:29 -08:00
Victor Costan f48c38f91a Fix one forgotten instance of StringPrintf -> StrFormat.
PiperOrigin-RevId: 278315159
2019-11-04 00:09:19 -08:00
Victor Costan c9212708b2 Fix build errors.
PiperOrigin-RevId: 278310119
2019-11-03 23:24:02 -08:00
Victor Costan eb2eb73e6b Test CMake installation on Travis.
PiperOrigin-RevId: 278300416
2019-11-03 21:51:20 -08:00
Snappy Team 8f32e3fbc0 Internal changes
PiperOrigin-RevId: 277555451
2019-11-03 21:51:08 -08:00
Dmitry 38945971d6 Allow build with different standard if lib used as a subproject
2019-10-17 14:17:49 +02:00
Victor Costan e9e11b84e6 Fix Travis CI build.
* Fix bash conditionals: [ a == b ] should be [ a = b ].
* Upgrade to LLVM 9 on Travis.
* Upgrade fuzzer build arguments for LLVM 9.

PiperOrigin-RevId: 271898655
2019-09-29 20:39:28 -07:00
Victor Costan 9dabbca006 Remove snappy::string alias to std::string.
PiperOrigin-RevId: 271678325
2019-09-28 09:04:06 -07:00
Victor Costan 62363d9a79 Fully qualify std::string.
This is in preparation for removing the snappy::string alias
of std::string.

PiperOrigin-RevId: 271383199
2019-09-26 10:57:29 -07:00
Victor Costan d837d5cfe1 Merge pull request #80 from tmm1:patch-2
PiperOrigin-RevId: 264514195
2019-08-21 09:11:04 -07:00
Victor Costan 44d84addf2 Fix benchmarks.
PiperOrigin-RevId: 264501168
2019-08-20 17:17:53 -07:00
Victor Costan c6bf1170d8 Fix benchmarks.
PiperOrigin-RevId: 264420835
2019-08-20 13:16:53 -07:00
Victor Costan 6219c7787b Fix unused variable warnings in fuzzers.
PiperOrigin-RevId: 264377331
2019-08-20 13:16:41 -07:00
Victor Costan 5a57d32566 Rename zippy_*_fuzzer.cc -> snappy_*_fuzzer.cc.
PiperOrigin-RevId: 264321311
2019-08-19 23:43:34 -07:00
Victor Costan fd79e6f9b2 Merge pull request #78 from bshastry:libfuzzer-harness
PiperOrigin-RevId: 264241380
2019-08-19 14:30:13 -07:00
Shahriar Rouf 4c7f2d5dfb Add BM_ZFlatAll, BM_ZFlatIncreasingTableSize benchmarks to see how well zippy performs when it is processing different data one after the other.
PiperOrigin-RevId: 257518137
2019-08-19 14:30:00 -07:00
Bhargava Shastry a58d4b03c5 Update travis config for fuzzer builds
2019-07-27 10:57:49 +02:00
Aman Gupta d926a6bcb5 Updated to match .gitignore from google/leveldb
2019-07-20 12:49:48 -07:00
Aman Gupta 6662dfb5d4 Create .gitignore
2019-07-13 13:08:35 -07:00
Bhargava Shastry d71375bf8a Add libFuzzer harnesses, a cmake option to build them
2019-07-12 14:42:48 +02:00
Chris Mumford 156cd8939c Removed reference to deprecated autotools.
PiperOrigin-RevId: 253128048
2019-06-14 15:40:42 -07:00
Victor Costan fe702ad2a3 Use GCC 9 on Travis CI
PiperOrigin-RevId: 249995900
2019-05-25 14:37:17 -07:00
Chris Mumford a3e012d762 The snappy landing page at http://google.github.io/snappy/ is
served by [GitHub Pages](https://pages.github.com/) and lives
in the gh-pages branch. This change moves the page contents
to a more easily accessed Markdown file.

PiperOrigin-RevId: 248561542
2019-05-16 11:11:34 -07:00
Chris Mumford 4312f49315 Merge pull request #75 from Maikuolan:patch-1
PiperOrigin-RevId: 248558516
2019-05-16 11:11:21 -07:00
Chris Mumford 407712f4c9 Merge pull request #76 from abyss7:patch-1
PiperOrigin-RevId: 248211389
2019-05-14 14:27:56 -07:00