Find a file
snappy.mirrorbot@gmail.com c266bbf321 Speed up decompression by caching ip_.
It is seemingly hard for the compiler to understand that ip_, the current input
pointer into the compressed data stream, can not alias on anything else, and
thus using it directly will incur memory traffic as it cannot be kept in a
register. The code already knew about this and cached it into a local
variable, but since Step() only decoded one tag, it had to move ip_ back into
place between every tag. This seems to have cost us a significant amount of
performance, so changing Step() into a function that decodes as much as it can
before it saves ip_ back and returns. (Note that Step() was already inlined,
so it is not the manual inlining that buys the performance here.)

The wins are about 3-6% for Core 2, 6-13% on Core i7 and 5-12% on Opteron
(for plain array-to-array decompression, in 64-bit opt mode).

There is a tiny difference in the behavior here; if an invalid literal is
encountered (ie., the writer refuses the Append() operation), ip_ will now
point to the byte past the tag byte, instead of where the literal was
originally thought to end. However, we don't use ip_ for anything after
DecompressAllTags() has returned, so this should not change external behavior
in any way.

Microbenchmark results for Core i7, 64-bit (Opteron results are similar):

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0              79134      79110       8835 1.2GB/s  html      [ +6.2%]
BM_UFlat/1             786126     786096        891 851.8MB/s  urls    [+10.0%]
BM_UFlat/2               9948       9948      69125 11.9GB/s  jpg      [ -1.3%]
BM_UFlat/3              31999      31998      21898 2.7GB/s  pdf       [ +6.5%]
BM_UFlat/4             318909     318829       2204 1.2GB/s  html4     [ +6.5%]
BM_UFlat/5              31384      31390      22363 747.5MB/s  cp      [ +9.2%]
BM_UFlat/6              14037      14034      49858 757.7MB/s  c       [+10.6%]
BM_UFlat/7               4612       4612     151395 769.5MB/s  lsp     [ +9.5%]
BM_UFlat/8            1203174    1203007        582 816.3MB/s  xls     [+19.3%]
BM_UFlat/9             253869     253955       2757 571.1MB/s  txt1    [+11.4%]
BM_UFlat/10            219292     219290       3194 544.4MB/s  txt2    [+12.1%]
BM_UFlat/11            672135     672131       1000 605.5MB/s  txt3    [+11.2%]
BM_UFlat/12            902512     902492        776 509.2MB/s  txt4    [+12.5%]
BM_UFlat/13            372110     371998       1881 1.3GB/s  bin       [ +5.8%]
BM_UFlat/14             50407      50407      10000 723.5MB/s  sum     [+13.5%]
BM_UFlat/15              5699       5701     100000 707.2MB/s  man     [+12.4%]
BM_UFlat/16             83448      83424       8383 1.3GB/s  pb        [ +5.7%]
BM_UFlat/17            256958     256963       2723 684.1MB/s  gaviota [ +7.9%]
BM_UValidate/0          42795      42796      16351 2.2GB/s  html      [+25.8%]
BM_UValidate/1         490672     490622       1427 1.3GB/s  urls      [+22.7%]
BM_UValidate/2            237        237    2950297 499.0GB/s  jpg     [+24.9%]
BM_UValidate/3          14610      14611      47901 6.0GB/s  pdf       [+26.8%]
BM_UValidate/4         171973     171990       4071 2.2GB/s  html4     [+25.7%]




git-svn-id: https://snappy.googlecode.com/svn/trunk@38 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2011-06-02 17:59:40 +00:00
m4
testdata Replace geo.protodata with a newer version. 2011-04-07 16:36:43 +00:00
AUTHORS
autogen.sh Fix public issue #31: Don't reset PATH in autogen.sh; instead, do the trickery 2011-04-26 12:34:37 +00:00
ChangeLog Release Snappy 1.0.2, to get the license change and various other fixes into 2011-05-03 23:22:33 +00:00
configure.ac Fix public issue #37: Only link snappy_unittest against -lz and other autodetected 2011-05-03 23:22:52 +00:00
COPYING Change Snappy from the Apache 2.0 to a BSD-type license. 2011-03-25 16:14:41 +00:00
format_description.txt Fix the numbering of the headlines in the Snappy format description. 2011-05-17 08:48:25 +00:00
Makefile.am Fix public issue #32: Add compressed format documentation for Snappy. 2011-05-16 08:59:18 +00:00
NEWS Release Snappy 1.0.2, to get the license change and various other fixes into 2011-05-03 23:22:33 +00:00
README Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-c.cc Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-c.h Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-internal.h Put back the final few lines of what was truncated during the 2011-03-28 22:17:04 +00:00
snappy-sinksource.cc Change on 2011-03-25 19:18:00-07:00 by sesse 2011-03-26 02:34:34 +00:00
snappy-sinksource.h Change on 2011-03-25 19:18:00-07:00 by sesse 2011-03-26 02:34:34 +00:00
snappy-stubs-internal.cc Change Snappy from the Apache 2.0 to a BSD-type license. 2011-03-25 16:14:41 +00:00
snappy-stubs-internal.h Fix public issue #27: Add HAVE_CONFIG_H tests around the config.h 2011-03-30 20:27:53 +00:00
snappy-stubs-public.h.in Change Snappy from the Apache 2.0 to a BSD-type license. 2011-03-25 16:14:41 +00:00
snappy-test.cc Fix public issue #39: Pick out the median runs based on CPU time, 2011-05-09 21:29:02 +00:00
snappy-test.h Fix public issue #30: Stop using gettimeofday() altogether on Win32, 2011-04-26 12:34:55 +00:00
snappy.cc Speed up decompression by caching ip_. 2011-06-02 17:59:40 +00:00
snappy.h Put back the final few lines of what was truncated during the 2011-03-28 22:17:04 +00:00
snappy_unittest.cc Fix public issue #26: Take memory allocation and reallocation entirely out of the 2011-03-30 20:27:39 +00:00

Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
   See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed
   petabytes of data in Google's production environment. The Snappy bitstream
   format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of
   corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license.
   For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========
 
Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy usually
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression rates, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

 - Snappy uses 64-bit operations in several places to process more data at
   once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads 
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in
   several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other languages
are available; see the Google Code page at http://code.google.com/p/snappy/
for more information. Also, if you want to use Snappy from C code, you can
use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input, &output);

and similarly

  snappy::Uncompress(input, &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including
support for custom (non-array) input sources. See the header file for more
information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself. You do not need it to use the compressor from your own library,
but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in
various scenarios. If you want to change or optimize Snappy, please run the
tests to verify you have not broken anything. Note that if you have the
Google Test library installed, unit test behavior (especially failures) will be
significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags;
you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy contains microbenchmarks used to
tune compression and decompression performance. These are automatically run
before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you will
need to edit the source).

Finally, snappy can benchmark Snappy against a few other compression libraries
(zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at configure time.
To benchmark using a given file, give the compression algorithm you want to test
Snappy against (e.g. --zlib) and then a list of one or more file names on the
command line. The testdata/ directory contains the files used by the
microbenchmark, which should provide a reasonably balanced starting point for
benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
are used to verify correctness in the presence of corrupted data in the unit
test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug tracker,
and other information, see

  http://code.google.com/p/snappy/