snappy.mirrorbot@gmail.com f8829ea39d Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant 
speed boost on ARM, both for compression and decompression. 
It should not affect x86 at all. 
 
There are more changes possible to speed up ARM, but it might not be 
that easy to do without hurting x86 or making the code uglier. 
Also, we do not try to use NEON yet. 
 
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro), 
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork: 
 
Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             524806     529100        378 184.6MB/s  html            [+33.6%]
BM_UFlat/1            5139790    5200000        100 128.8MB/s  urls            [+28.8%]
BM_UFlat/2              86540      84166       1901 1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3             215351     210176        904 428.0MB/s  pdf             [+29.8%]
BM_UFlat/4            2144490    2100000        100 186.0MB/s  html4           [+33.3%]
BM_UFlat/5             194482     190000       1000 123.5MB/s  cp              [+36.2%]
BM_UFlat/6              91843      90175       2107 117.9MB/s  c               [+38.6%]
BM_UFlat/7              28535      28426       6684 124.8MB/s  lsp             [+34.7%]
BM_UFlat/8            9206600    9200000        100 106.7MB/s  xls             [+42.4%]
BM_UFlat/9            1865273    1886792        106 76.9MB/s  txt1             [+32.5%]
BM_UFlat/10           1576809    1587301        126 75.2MB/s  txt2             [+32.3%]
BM_UFlat/11           4968450    4900000        100 83.1MB/s  txt3             [+32.7%]
BM_UFlat/12           6673970    6700000        100 68.6MB/s  txt4             [+32.8%]
BM_UFlat/13           2391470    2400000        100 203.9MB/s  bin             [+29.2%]
BM_UFlat/14            334601     344827        522 105.8MB/s  sum             [+30.6%]
BM_UFlat/15             37404      38080       5252 105.9MB/s  man             [+33.8%]
BM_UFlat/16            535470     540540        370 209.2MB/s  pb              [+31.2%]
BM_UFlat/17           1875245    1886792        106 93.2MB/s  gaviota          [+37.8%]
BM_UValidate/0         178425     179533       1114 543.9MB/s  html            [ +2.7%]
BM_UValidate/1        2100450    2000000        100 334.8MB/s  urls            [ +5.0%]
BM_UValidate/2           1039       1044     172413 113.3GB/s  jpg             [ +3.4%]
BM_UValidate/3          59423      59470       3363 1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4         760716     766283        261 509.8MB/s  html4           [ +6.5%]
BM_ZFlat/0            1204632    1204819        166 81.1MB/s  html (23.57 %)   [+32.8%]
BM_ZFlat/1           15656190   15600000        100 42.9MB/s  urls (50.89 %)   [+27.6%]
BM_ZFlat/2             403336     410677        487 294.8MB/s  jpg (99.88 %)   [+16.5%]
BM_ZFlat/3             664073     671140        298 134.0MB/s  pdf (82.13 %)   [+28.4%]
BM_ZFlat/4            4961940    4900000        100 79.7MB/s  html4 (23.55 %)  [+30.6%]
BM_ZFlat/5             500664     501253        399 46.8MB/s  cp (48.12 %)     [+33.4%]
BM_ZFlat/6             217276     215982        926 49.2MB/s  c (42.40 %)      [+25.0%]
BM_ZFlat/7              64122      65487       3054 54.2MB/s  lsp (48.37 %)    [+36.1%]
BM_ZFlat/8           18045730   18000000        100 54.6MB/s  xls (41.34 %)    [+34.4%]
BM_ZFlat/9            4051530    4000000        100 36.3MB/s  txt1 (59.81 %)   [+25.0%]
BM_ZFlat/10           3451800    3500000        100 34.1MB/s  txt2 (64.07 %)   [+25.7%]
BM_ZFlat/11          11052340   11100000        100 36.7MB/s  txt3 (57.11 %)   [+24.3%]
BM_ZFlat/12          14538690   14600000        100 31.5MB/s  txt4 (68.35 %)   [+24.7%]
BM_ZFlat/13           5041850    5000000        100 97.9MB/s  bin (18.21 %)    [+32.0%]
BM_ZFlat/14            908840     909090        220 40.1MB/s  sum (51.88 %)    [+22.2%]
BM_ZFlat/15             86921      86206       1972 46.8MB/s  man (59.36 %)    [+42.2%]
BM_ZFlat/16           1312315    1315789        152 86.0MB/s  pb (23.15 %)     [+34.5%]
BM_ZFlat/17           3173120    3200000        100 54.9MB/s  gaviota (38.27%) [+28.1%]


The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0                86279      86140       7778 1.1GB/s  html             [ +7.5%]
BM_UFlat/1               839265     822622        778 813.9MB/s  urls           [ +9.4%]
BM_UFlat/2                 9180       9143      87500 12.9GB/s  jpg             [ +1.2%]
BM_UFlat/3                35080      35000      20000 2.5GB/s  pdf              [+10.1%]
BM_UFlat/4               350318     345000       2000 1.1GB/s  html4            [ +7.0%]
BM_UFlat/5                33808      33472      21212 701.0MB/s  cp             [ +9.0%]
BM_UFlat/6                15201      15214      46667 698.9MB/s  c              [+14.9%]
BM_UFlat/7                 4652       4651     159091 762.9MB/s  lsp            [ +7.5%]
BM_UFlat/8              1285551    1282528        538 765.7MB/s  xls            [+10.7%]
BM_UFlat/9               282510     281690       2414 514.9MB/s  txt1           [+13.6%]
BM_UFlat/10              243494     239286       2800 498.9MB/s  txt2           [+14.4%]
BM_UFlat/11              743625     740000       1000 550.0MB/s  txt3           [+14.3%]
BM_UFlat/12              999441     989717        778 464.3MB/s  txt4           [+16.1%]
BM_UFlat/13              412402     410076       1707 1.2GB/s  bin              [ +7.3%]
BM_UFlat/14               54876      54000      10000 675.3MB/s  sum            [+13.0%]
BM_UFlat/15                6146       6100     100000 660.8MB/s  man            [+14.8%]
BM_UFlat/16               90496      90286       8750 1.2GB/s  pb               [ +4.0%]
BM_UFlat/17              292650     292000       2500 602.0MB/s  gaviota        [+18.1%]
BM_UValidate/0            49620      49699      14286 1.9GB/s  html             [ +0.0%]
BM_UValidate/1           501371     500000       1000 1.3GB/s  urls             [ +0.0%]
BM_UValidate/2              232        227    3043478 521.5GB/s  jpg            [ +1.3%]
BM_UValidate/3            17250      17143      43750 5.1GB/s  pdf              [ -1.3%]
BM_UValidate/4           198643     200000       3500 1.9GB/s  html4            [ -0.9%]
BM_ZFlat/0               227128     229415       3182 425.7MB/s  html (23.57 %) [ -1.4%]
BM_ZFlat/1              2970089    2960000        250 226.2MB/s  urls (50.89 %) [ -1.9%]
BM_ZFlat/2                45683      44999      15556 2.6GB/s  jpg (99.88 %)    [ +2.2%]
BM_ZFlat/3               114661     113136       6364 795.1MB/s  pdf (82.13 %)  [ -1.5%]
BM_ZFlat/4               919702     914286        875 427.2MB/s  html4 (23.55%) [ -1.3%]
BM_ZFlat/5               108189     108422       6364 216.4MB/s  cp (48.12 %)   [ -1.2%]
BM_ZFlat/6                44525      44000      15909 241.7MB/s  c (42.40 %)    [ -2.9%]
BM_ZFlat/7                15973      15857      46667 223.8MB/s  lsp (48.37 %)  [ +0.0%]
BM_ZFlat/8              2677888    2639405        269 372.1MB/s  xls (41.34 %)  [ -1.4%]
BM_ZFlat/9               800715     780000       1000 186.0MB/s  txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10              700089     700000       1000 170.5MB/s  txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11             2159356    2138365        318 190.3MB/s  txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12             2796143    2779923        259 165.3MB/s  txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13              856458     835476        778 585.8MB/s  bin (18.21 %)  [ -0.1%]
BM_ZFlat/14              166908     166857       4375 218.6MB/s  sum (51.88 %)  [ -1.4%]
BM_ZFlat/15               21181      20857      35000 193.3MB/s  man (59.36 %)  [ -0.8%]
BM_ZFlat/16              244009     239973       2917 471.3MB/s  pb (23.15 %)   [ -1.4%]
BM_ZFlat/17              596362     590000       1000 297.9MB/s  gaviota (38.27%) [ +0.0%]

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2012-02-21 17:02:17 +00:00
m4
testdata
AUTHORS
autogen.sh Fix public issue #31: Don't reset PATH in autogen.sh; instead, do the trickery 2011-04-26 12:34:37 +00:00
ChangeLog Release Snappy 1.0.4. 2011-09-15 19:34:06 +00:00
configure.ac Release Snappy 1.0.4. 2011-09-15 19:34:06 +00:00
COPYING
format_description.txt In the format description, use a clearer example to emphasize that varints are 2011-10-05 12:27:12 +00:00
framing_format.txt Add a framing format description. We do not have any implementation of this at 2012-01-04 10:46:39 +00:00
Makefile.am Add a framing format description. We do not have any implementation of this at 2012-01-04 10:46:39 +00:00
NEWS Release Snappy 1.0.4. 2011-09-15 19:34:06 +00:00
README Fix public issue #53: Update the README to the API we actually open-sourced 2011-11-08 14:46:39 +00:00
snappy-c.cc Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-c.h Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-internal.h
snappy-sinksource.cc Minor refactoring to accommodate changes in Google's internal code tree. 2012-01-08 17:55:48 +00:00
snappy-sinksource.h Minor refactoring to accommodate changes in Google's internal code tree. 2012-01-08 17:55:48 +00:00
snappy-stubs-internal.cc
snappy-stubs-internal.h Enable the use of unaligned loads and stores for ARM-based architectures 2012-02-21 17:02:17 +00:00
snappy-stubs-public.h.in
snappy-test.cc Fix public issue r57: Fix most warnings with -Wall, mostly signed/unsigned 2012-01-04 13:10:46 +00:00
snappy-test.h Fix public issue r57: Fix most warnings with -Wall, mostly signed/unsigned 2012-01-04 13:10:46 +00:00
snappy.cc Enable the use of unaligned loads and stores for ARM-based architectures 2012-02-21 17:02:17 +00:00
snappy.h Fix public issue r57: Fix most warnings with -Wall, mostly signed/unsigned 2012-01-04 13:10:46 +00:00
snappy_unittest.cc Lower the size allocated in the "corrupted input" unit test from 256 MB 2012-02-11 22:11:22 +00:00

Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
   See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed
   petabytes of data in Google's production environment. The Snappy bitstream
   format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of
   corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license.
   For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========
 
Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy is usually
faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ)
while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression ratios, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.
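
As a rough illustration of how these ratios relate to the "20% to 100% bigger"
figure in the introduction, the sketch below converts between a compression
ratio, a compressed size given as a percentage of the original, and the
relative size of two outputs. The function names are made up for this example
and are not part of Snappy.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not part of the Snappy API.

// Converts "compressed size as a percentage of the original" into a
// compression ratio, e.g. 25 % of the original size corresponds to 4.0x.
inline double RatioFromPercent(double compressed_percent) {
  assert(compressed_percent > 0.0);
  return 100.0 / compressed_percent;
}

// Given the compression ratios two compressors reach on the same input,
// returns how much bigger (in percent) the output of A is than that of B.
// With input size N, size_a = N / ratio_a and size_b = N / ratio_b, so
// size_a / size_b = ratio_b / ratio_a.
inline double PercentBigger(double ratio_a, double ratio_b) {
  assert(ratio_a > 0.0 && ratio_b > 0.0);
  return (ratio_b / ratio_a - 1.0) * 100.0;
}
```

For example, PercentBigger(1.5, 2.6) is about 73, meaning that a 1.5x ratio
against a 2.6x ratio on the same input gives an output roughly 73% bigger.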

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

 - Snappy uses 64-bit operations in several places to process more data at
   once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads 
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in
   several places if running on a big-endian platform.
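
The assumptions above can be sketched in portable C++ as follows. This is a
minimal illustration, not Snappy's actual implementation: the helper names are
hypothetical, and it relies on the common (but not guaranteed) behavior that
compilers turn a fixed-size memcpy into a single load or store on platforms
where unaligned access is cheap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not taken from Snappy's sources.

// Reads 32 bits from an address that need not be 4-byte aligned.
// Going through memcpy avoids undefined behavior from misaligned pointer
// casts; on platforms without cheap unaligned access the compiler may
// fall back to byte-by-byte loads.
inline std::uint32_t UnalignedLoad32(const void* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Writes 32 bits to an address that need not be 4-byte aligned.
inline void UnalignedStore32(void* p, std::uint32_t v) {
  assert(p != nullptr);
  std::memcpy(p, &v, sizeof(v));
}

// Copies 8 bytes at once through a 64-bit temporary instead of a byte loop.
inline void Copy64(const void* src, void* dst) {
  std::uint64_t v;
  std::memcpy(&v, src, sizeof(v));
  std::memcpy(dst, &v, sizeof(v));
}

// Decodes a little-endian 32-bit value independently of the host byte
// order. On a little-endian host this is equivalent to UnalignedLoad32;
// on a big-endian host it amounts to a load plus a byte swap.
inline std::uint32_t LoadLittleEndian32(const void* p) {
  const unsigned char* b = static_cast<const unsigned char*>(p);
  return static_cast<std::uint32_t>(b[0]) |
         (static_cast<std::uint32_t>(b[1]) << 8) |
         (static_cast<std::uint32_t>(b[2]) << 16) |
         (static_cast<std::uint32_t>(b[3]) << 24);
}
```

The memcpy-based helpers trade a little readability for portability: the same
source compiles on strict-alignment platforms, only slower.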

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other languages
are available; see the Google Code page at http://code.google.com/p/snappy/
for more information. Also, if you want to use Snappy from C code, you can
use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including
support for custom (non-array) input sources. See the header file for more
information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself. You do not need it to use the compressor from your own library,
but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in
various scenarios. If you want to change or optimize Snappy, please run the
tests to verify you have not broken anything. Note that if you have the
Google Test library installed, unit test behavior (especially failures) will be
significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags;
you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy_unittest contains microbenchmarks used to
tune compression and decompression performance. These are automatically run
before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you will
need to edit the source).

Finally, snappy_unittest can benchmark Snappy against a few other compression
libraries (zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at
configure time.
To benchmark using a given file, give the compression algorithm you want to test
Snappy against (e.g. --zlib) and then a list of one or more file names on the
command line. The testdata/ directory contains the files used by the
microbenchmark, which should provide a reasonably balanced starting point for
benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
are used to verify correctness in the presence of corrupted data in the unit
test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug tracker,
and other information, see

  http://code.google.com/p/snappy/