Go to file
Steinar H. Gunderson b2312c4c25 Add support for Uncompress(source, sink). Various changes to allow
Uncompress(source, sink) to get the same performance as the different
variants of Uncompress to Cord/DataBuffer/String/FlatBuffer.

Changes to efficiently support Uncompress(source, sink)
--------

a) For strings - we add support to StringByteSink to do GetAppendBuffer so we
   can write to it without copying.
b) For flat array buffers, we do GetAppendBuffer and see if we can get a full buffer.

With the above changes we get performance with ByteSource/ByteSink
that is	very close to directly using flat arrays and strings.

We add various benchmark cases to demonstrate that.

Orthogonal change
------------------

Add support for TryFastAppend() for SnappyScatteredWriter.

Benchmark results are below

CPU: Intel Core2 dL1:32KB dL2:4096KB
Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0               109065     108996       6410 896.0MB/s  html
BM_UFlat/1              1012175    1012343        691 661.4MB/s  urls
BM_UFlat/2                26775      26771      26149 4.4GB/s  jpg
BM_UFlat/3                48947      48940      14363 1.8GB/s  pdf
BM_UFlat/4               441029     440835       1589 886.1MB/s  html4
BM_UFlat/5                39861      39880      17823 588.3MB/s  cp
BM_UFlat/6                18315      18300      38126 581.1MB/s  c
BM_UFlat/7                 5254       5254     100000 675.4MB/s  lsp
BM_UFlat/8              1568060    1567376        447 626.6MB/s  xls
BM_UFlat/9               337512     337734       2073 429.5MB/s  txt1
BM_UFlat/10              287269     287054       2434 415.9MB/s  txt2
BM_UFlat/11              890098     890219        787 457.2MB/s  txt3
BM_UFlat/12             1186593    1186863        590 387.2MB/s  txt4
BM_UFlat/13              573927     573318       1000 853.7MB/s  bin
BM_UFlat/14               64250      64294      10000 567.2MB/s  sum
BM_UFlat/15                7301       7300      96153 552.2MB/s  man
BM_UFlat/16              109617     109636       6375 1031.5MB/s  pb
BM_UFlat/17              364438     364497       1921 482.3MB/s  gaviota
BM_UFlatSink/0           108518     108465       6450 900.4MB/s  html
BM_UFlatSink/1           991952     991997        705 675.0MB/s  urls
BM_UFlatSink/2            26815      26798      26065 4.4GB/s  jpg
BM_UFlatSink/3            49127      49122      14255 1.8GB/s  pdf
BM_UFlatSink/4           436674     436731       1604 894.4MB/s  html4
BM_UFlatSink/5            39738      39733      17345 590.5MB/s  cp
BM_UFlatSink/6            18413      18416      37962 577.4MB/s  c
BM_UFlatSink/7             5677       5676     100000 625.2MB/s  lsp
BM_UFlatSink/8          1552175    1551026        451 633.2MB/s  xls
BM_UFlatSink/9           338526     338489       2065 428.5MB/s  txt1
BM_UFlatSink/10          289387     289307       2420 412.6MB/s  txt2
BM_UFlatSink/11          893803     893706        783 455.4MB/s  txt3
BM_UFlatSink/12         1195919    1195459        586 384.4MB/s  txt4
BM_UFlatSink/13          559637     559779       1000 874.3MB/s  bin
BM_UFlatSink/14           65073      65094      10000 560.2MB/s  sum
BM_UFlatSink/15            7618       7614      92823 529.5MB/s  man
BM_UFlatSink/16          110085     110121       6352 1027.0MB/s  pb
BM_UFlatSink/17          369196     368915       1896 476.5MB/s  gaviota
BM_UValidate/0            46954      46957      14899 2.0GB/s  html
BM_UValidate/1           500621     500868       1000 1.3GB/s  urls
BM_UValidate/2              283        283    2481447 417.2GB/s  jpg
BM_UValidate/3            16230      16228      43137 5.4GB/s  pdf
BM_UValidate/4           189129     189193       3701 2.0GB/s  html4

A=uday
R=sanjay
2015-07-06 14:21:00 +02:00
m4 Fix public issue #12: Don't keep autogenerated auto* files in Subversion; 2011-03-23 23:17:36 +00:00
testdata Fix public issue 82: Stop distributing benchmark data files that have 2014-02-19 10:31:49 +00:00
AUTHORS Revision created by MOE tool push_codebase. 2011-03-18 17:14:15 +00:00
COPYING Change some internal path names. 2015-06-22 15:39:08 +02:00
ChangeLog Release Snappy 1.1.2. 2014-02-28 11:18:07 +00:00
Makefile.am Fix public issue 82: Stop distributing benchmark data files that have 2014-02-19 10:31:49 +00:00
NEWS Release Snappy 1.1.2. 2014-02-28 11:18:07 +00:00
README Fix public issue #53: Update the README to the API we actually open-sourced 2011-11-08 14:46:39 +00:00
autogen.sh Fix public issue #31: Don't reset PATH in autogen.sh; instead, do the trickery 2011-04-26 12:34:37 +00:00
configure.ac Release Snappy 1.1.2. 2014-02-28 11:18:07 +00:00
format_description.txt In the format description, use a clearer example to emphasize that varints are 2011-10-05 12:27:12 +00:00
framing_format.txt Add support for padding in the Snappy framed format. 2013-10-25 13:31:27 +00:00
snappy-c.cc Include C bindings of Snappy, contributed by Martin Gieseking. 2011-04-08 09:51:53 +00:00
snappy-c.h Change some internal path names. 2015-06-22 15:39:08 +02:00
snappy-internal.h Change a few branch annotations that profiling found to be wrong. 2015-06-22 16:09:56 +02:00
snappy-sinksource.cc Add support for Uncompress(source, sink). Various changes to allow 2015-07-06 14:21:00 +02:00
snappy-sinksource.h Add support for Uncompress(source, sink). Various changes to allow 2015-07-06 14:21:00 +02:00
snappy-stubs-internal.cc Change Snappy from the Apache 2.0 to a BSD-type license. 2011-03-25 16:14:41 +00:00
snappy-stubs-internal.h Change some internal path names. 2015-06-22 15:39:08 +02:00
snappy-stubs-public.h.in Change some internal path names. 2015-06-22 15:39:08 +02:00
snappy-test.cc Fixed unit tests to compile under MSVC. 2015-06-22 16:09:56 +02:00
snappy-test.h Fixed unit tests to compile under MSVC. 2015-06-22 16:09:56 +02:00
snappy.cc Add support for Uncompress(source, sink). Various changes to allow 2015-07-06 14:21:00 +02:00
snappy.h Add support for Uncompress(source, sink). Various changes to allow 2015-07-06 14:21:00 +02:00
snappy_unittest.cc Add support for Uncompress(source, sink). Various changes to allow 2015-07-06 14:21:00 +02:00

README

Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
   See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed
   petabytes of data in Google's production environment. The Snappy bitstream
   format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of
   corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license.
   For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========
 
Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy usually
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression rates, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

 - Snappy uses 64-bit operations in several places to process more data at
   once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads 
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in
   several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other languages
are available; see the Google Code page at http://code.google.com/p/snappy/
for more information. Also, if you want to use Snappy from C code, you can
use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including
support for custom (non-array) input sources. See the header file for more
information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself. You do not need it to use the compressor from your own library,
but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in
various scenarios. If you want to change or optimize Snappy, please run the
tests to verify you have not broken anything. Note that if you have the
Google Test library installed, unit test behavior (especially failures) will be
significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags;
you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy contains microbenchmarks used to
tune compression and decompression performance. These are automatically run
before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you will
need to edit the source).

Finally, snappy can benchmark Snappy against a few other compression libraries
(zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at configure time.
To benchmark using a given file, give the compression algorithm you want to test
Snappy against (e.g. --zlib) and then a list of one or more file names on the
command line. The testdata/ directory contains the files used by the
microbenchmark, which should provide a reasonably balanced starting point for
benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
are used to verify correctness in the presence of corrupted data in the unit
test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug tracker,
and other information, see

  http://code.google.com/p/snappy/