jgorbe ca37ab7fb9 Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long
explanation is enough to justify it.

We have observed that, in some conditions, the results for dataset number 10
(pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.

In order to diagnose this, we profiled the benchmark looking at hot functions
(99% of the time is spent on DecompressAllTags), then looked at the generated
code to see if there was any difference. To rule out a minor difference we had
observed in register allocation, we replaced zippy.cc with a pre-built assembly
file so the code was identical in both variants, and we were still able to
reproduce the regression.

After ruling out the compiler as the cause, we dug a bit further
and noticed that the alignment of the function in the final binary was
different. Both were aligned to a 16-byte boundary, but the slower one was also
(by chance) aligned to a 32-byte boundary. A regression caused by alignment
differences would explain why I could reproduce it consistently on the same CitC
client, but not others: slight differences in the sources can cause the resulting
binary to have different layout.

Here are some detailed benchmark results before/after the fix. Note how fixing
the alignment makes the difference between baseline and experiment go away, but
regular 32-byte alignment puts both variants in the same ballpark as the
original regression:

Original (note BM_UCord_10 and BM_UDataBuffer_10 around the -24% line):

  BASELINE
  BM_UCord/10                    2938           2932          24194 3.767GB/s  pb
  BM_UDataBuffer/10              3008           3004          23316 3.677GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3797           3789          18512 2.915GB/s  pb
  BM_UDataBuffer/10              4024           4016          17543 2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:

  BASELINE
  BM_UCord/10                    3872           3862          18035 2.860GB/s  pb
  BM_UDataBuffer/10              4010           3998          17591 2.763GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3884           3876          18126 2.850GB/s  pb
  BM_UDataBuffer/10              4037           4027          17199 2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):

  BASELINE
  BM_UCord/10                    3103           3095          22642 3.569GB/s  pb
  BM_UDataBuffer/10              3186           3177          21947 3.476GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3104           3095          22632 3.569GB/s  pb
  BM_UDataBuffer/10              3167           3159          22076 3.496GB/s  pb

This change forces the "good" alignment for DecompressAllTags which, if
anything, should make benchmark results more stable (and maybe we'll improve
some unlucky application!).
2018-02-17 00:47:18 -08:00

README.md

Snappy, a fast compressor/decompressor.

Introduction

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

  • Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code. See "Performance" below.
  • Stable: Over the last few years, Snappy has compressed and decompressed petabytes of data in Google's production environment. The Snappy bitstream format is stable and will not change between versions.
  • Robust: The Snappy decompressor is designed not to crash in the face of corrupted or malicious input.
  • Free and open source software: Snappy is licensed under a BSD-type license. For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations and the like.

Performance

Snappy is intended to be fast. On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, QuickLZ, etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are capable of achieving yet higher compression rates, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.

Although Snappy should be fairly portable, it is primarily optimized for 64-bit x86-compatible processors, and may run slower in other environments. In particular:

  • Snappy uses 64-bit operations in several places to process more data at once than would otherwise be possible.
  • Snappy assumes unaligned 32- and 64-bit loads and stores are cheap. On some platforms, these must be emulated with single-byte loads and stores, which is much slower.
  • Snappy assumes little-endian throughout, and needs to byte-swap data in several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved. Performance optimizations, whether for 64-bit x86 or other platforms, are of course most welcome; see "Contact", below.

Building

CMake is supported and autotools will soon be deprecated. You need CMake 3.4 or above to build:

mkdir build
cd build && cmake ../ && make

Usage

Note that Snappy, both the implementation and the main interface, is written in C++. However, several third-party bindings to other languages are available; see the home page at http://google.github.io/snappy/ for more information. Also, if you want to use Snappy from C code, you can use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

snappy::Compress(input.data(), input.size(), &output);

and similarly

snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including support for custom (non-array) input sources. See the header file for more information.

Tests and benchmarks

When you compile Snappy, snappy_unittest is compiled in addition to the library itself. You do not need it to use the compressor from your own library, but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in various scenarios. If you want to change or optimize Snappy, please run the tests to verify you have not broken anything. Note that if you have the Google Test library installed, unit test behavior (especially failures) will be significantly more user-friendly. You can find Google Test at

http://github.com/google/googletest

You probably also want the gflags library for handling of command-line flags; you can find it at

http://gflags.github.io/gflags/

In addition to the unit tests, snappy contains microbenchmarks used to tune compression and decompression performance. These are automatically run before the unit tests, but you can disable them using the flag --run_microbenchmarks=false if you have gflags installed (otherwise you will need to edit the source).

Finally, snappy can benchmark Snappy against a few other compression libraries (zlib, LZO, LZF, and QuickLZ), if they were detected at configure time. To benchmark using a given file, give the compression algorithm you want to test Snappy against (e.g. --zlib) and then a list of one or more file names on the command line. The testdata/ directory contains the files used by the microbenchmark, which should provide a reasonably balanced starting point for benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they are used to verify correctness in the presence of corrupted data in the unit test.)

Contact

Snappy is distributed through GitHub. For the latest version, a bug tracker, and other information, see

http://google.github.io/snappy/

or the repository at

https://github.com/google/snappy