Mirror of https://github.com/google/snappy.git
(synced 2024-11-29 09:36:43 +00:00)

Commit 5ed51ce15f
We do the fast-path step as soon as possible; in fact, as soon as we know the
literal length. Since we usually hit the fast path, we can then skip the
checks for long literals and available input space (beyond what the fast path
check already does).

Note that this changes the decompression Writer API; however, it does not
change the ABI, since writers are always templatized and as such never cross
compilation units. The new API is slightly more general, in that it doesn't
hard-code the value 16. Note that we also take care to check for len <= 16
first, since the other two checks almost always succeed (so we don't want to
waste time checking for them until we have to).

The improvements are most marked on Nehalem, but are generally positive on
other platforms as well. All microbenchmarks are 64-bit, opt.

Clovertown (Core 2):

Benchmark    Time(ns)  CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0     110226   110224     100000   886.0MB/s  html    [ +1.5%]
BM_UFlat/1    1036523  1036508      10000   646.0MB/s  urls    [ -0.8%]
BM_UFlat/2      26775    26775     522570     4.4GB/s  jpg     [ +0.0%]
BM_UFlat/3      49738    49737     280974     1.8GB/s  pdf     [ +0.3%]
BM_UFlat/4     446790   446792      31334   874.3MB/s  html4   [ +0.8%]
BM_UFlat/5      40561    40562     350424   578.5MB/s  cp      [ +1.3%]
BM_UFlat/6      18722    18722     746903   568.0MB/s  c       [ +1.4%]
BM_UFlat/7       5373     5373    2608632   660.5MB/s  lsp     [ +8.3%]
BM_UFlat/8    1615716  1615718       8670   607.8MB/s  xls     [ +2.0%]
BM_UFlat/9     345278   345281      40481   420.1MB/s  txt1    [ +1.4%]
BM_UFlat/10    294855   294855      47452   404.9MB/s  txt2    [ +1.6%]
BM_UFlat/11    914263   914263      15316   445.2MB/s  txt3    [ +1.1%]
BM_UFlat/12   1222694  1222691      10000   375.8MB/s  txt4    [ +1.4%]
BM_UFlat/13    584495   584489      23954   837.4MB/s  bin     [ -0.6%]
BM_UFlat/14     66662    66662     210123   547.1MB/s  sum     [ +1.2%]
BM_UFlat/15      7368     7368    1881856   547.1MB/s  man     [ +4.0%]
BM_UFlat/16    110727   110726     100000  1021.4MB/s  pb      [ +2.3%]
BM_UFlat/17    382138   382141      36616   460.0MB/s  gaviota [ -0.7%]

Westmere (Core i7):

Benchmark    Time(ns)  CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0      78861    78853     177703     1.2GB/s  html    [ +2.1%]
BM_UFlat/1     739560   739491      18912   905.4MB/s  urls    [ +3.4%]
BM_UFlat/2       9867     9866    1419014    12.0GB/s  jpg     [ +3.4%]
BM_UFlat/3      31989    31986     438385     2.7GB/s  pdf     [ +0.2%]
BM_UFlat/4     319406   319380      43771     1.2GB/s  html4   [ +1.9%]
BM_UFlat/5      29639    29636     472862   791.7MB/s  cp      [ +5.2%]
BM_UFlat/6      13478    13477    1000000   789.0MB/s  c       [ +2.3%]
BM_UFlat/7       4030     4029    3475364   880.7MB/s  lsp     [ +8.7%]
BM_UFlat/8    1036585  1036492      10000   947.5MB/s  xls     [ +6.9%]
BM_UFlat/9     242127   242105      57838   599.1MB/s  txt1    [ +3.0%]
BM_UFlat/10    206499   206480      67595   578.2MB/s  txt2    [ +3.4%]
BM_UFlat/11    641635   641570      21811   634.4MB/s  txt3    [ +2.4%]
BM_UFlat/12    848847   848769      16443   541.4MB/s  txt4    [ +3.1%]
BM_UFlat/13    384968   384938      36366     1.2GB/s  bin     [ +0.3%]
BM_UFlat/14     47106    47101     297770   774.3MB/s  sum     [ +4.4%]
BM_UFlat/15      5063     5063    2772202   796.2MB/s  man     [ +7.7%]
BM_UFlat/16     83663    83656     167697     1.3GB/s  pb      [ +1.8%]
BM_UFlat/17    260224   260198      53823   675.6MB/s  gaviota [ -0.5%]

Barcelona (Opteron):

Benchmark    Time(ns)  CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0     112490   112457     100000   868.4MB/s  html    [ -0.4%]
BM_UFlat/1    1066719  1066339      10000   627.9MB/s  urls    [ +1.0%]
BM_UFlat/2      24679    24672     563802     4.8GB/s  jpg     [ +0.7%]
BM_UFlat/3      50603    50589     277285     1.7GB/s  pdf     [ +2.6%]
BM_UFlat/4     452982   452849      30900   862.6MB/s  html4   [ -0.2%]
BM_UFlat/5      43860    43848     319554   535.1MB/s  cp      [ +1.2%]
BM_UFlat/6      21419    21413     653573   496.6MB/s  c       [ +1.0%]
BM_UFlat/7       6646     6645    2105405   534.1MB/s  lsp     [ +0.3%]
BM_UFlat/8    1828487  1827886       7658   537.3MB/s  xls     [ +2.6%]
BM_UFlat/9     391824   391714      35708   370.3MB/s  txt1    [ +2.2%]
BM_UFlat/10    334913   334816      41885   356.6MB/s  txt2    [ +1.7%]
BM_UFlat/11   1042062  1041674      10000   390.7MB/s  txt3    [ +1.1%]
BM_UFlat/12   1398902  1398456      10000   328.6MB/s  txt4    [ +1.7%]
BM_UFlat/13    545706   545530      25669   897.2MB/s  bin     [ -0.4%]
BM_UFlat/14     71512    71505     196035   510.0MB/s  sum     [ +1.4%]
BM_UFlat/15      8422     8421    1665036   478.7MB/s  man     [ +2.6%]
BM_UFlat/16    112053   112048     100000  1009.3MB/s  pb      [ -0.4%]
BM_UFlat/17    416723   416713      33612   421.8MB/s  gaviota [ -2.0%]

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@53 03e5f5b5-db94-4691-08a0-1a8bf15f6143
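
The fast path described in the commit message can be illustrated with a small, self-contained sketch. It is not Snappy's actual Writer API: ArrayWriter, TryFastAppend and CopyLiteral are hypothetical names, and a fixed 16-byte std::memcpy stands in for whatever unaligned copy the real decompressor uses. The point it shows is that a short literal (len <= 16) can be copied with a fixed-size move as long as at least 16 bytes of input are readable and 16 bytes of output are writable, and that only when this fails does the decoder pay for the general checks.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical writer over a fixed output buffer, for illustration only;
// it is not Snappy's actual Writer class.
class ArrayWriter {
 public:
  ArrayWriter(char* dst, std::size_t capacity)
      : base_(dst), op_(dst), op_limit_(dst + capacity) {}

  // Fast path for short literals: always copies exactly 16 bytes, which is
  // only safe with 16 readable input bytes and 16 writable output bytes.
  // `len <= 16` is tested first: the other two conditions almost always
  // hold, so evaluating them first would be wasted work when len > 16.
  bool TryFastAppend(const char* ip, std::size_t available, std::size_t len) {
    std::size_t space_left = static_cast<std::size_t>(op_limit_ - op_);
    if (len <= 16 && available >= 16 && space_left >= 16) {
      // Bytes copied past `len` lie beyond the logical end of the output;
      // any later write starts at op_ and overwrites them.
      std::memcpy(op_, ip, 16);
      op_ += len;
      return true;
    }
    return false;
  }

  // General path: bounds-checked copy of exactly `len` bytes.
  bool Append(const char* ip, std::size_t len) {
    if (len > static_cast<std::size_t>(op_limit_ - op_)) return false;
    std::memcpy(op_, ip, len);
    op_ += len;
    return true;
  }

  std::size_t size() const { return static_cast<std::size_t>(op_ - base_); }

 private:
  char* base_;
  char* op_;
  char* op_limit_;
};

// Hypothetical decoder step for one literal: try the fast path first and
// only check the remaining input space when it does not apply.
bool CopyLiteral(ArrayWriter* writer, const char* ip, std::size_t available,
                 std::size_t len) {
  if (writer->TryFastAppend(ip, available, len)) return true;
  if (len > available) return false;  // literal runs past the end of input
  return writer->Append(ip, len);
}
```

The design trade-off is the slack requirement: the fixed-size copy has no data-dependent length, so compilers can typically lower it to a couple of wide loads and stores, while literals near the end of the input or output buffer fall back to the exact, bounds-checked copy.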

Repository contents:

  m4/
  testdata/
  AUTHORS
  autogen.sh
  ChangeLog
  configure.ac
  COPYING
  format_description.txt
  Makefile.am
  NEWS
  README
  snappy-c.cc
  snappy-c.h
  snappy-internal.h
  snappy-sinksource.cc
  snappy-sinksource.h
  snappy-stubs-internal.cc
  snappy-stubs-internal.h
  snappy-stubs-public.h.in
  snappy-test.cc
  snappy-test.h
  snappy.cc
  snappy.h
  snappy_unittest.cc

Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
   See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed
   petabytes of data in Google's production environment. The Snappy bitstream
   format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of
   corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license.
   For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========

Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy is usually
faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively.
More sophisticated algorithms are capable of achieving yet higher compression
ratios, although usually at the expense of speed. Of course, compression ratio
will vary significantly with the input.

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

 - Snappy uses 64-bit operations in several places to process more data at
   once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in
   several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other languages
are available; see the Google Code page at http://code.google.com/p/snappy/
for more information. Also, if you want to use Snappy from C code, you can
use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including
support for custom (non-array) input sources. See the header file for more
information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself.
You do not need snappy_unittest to use the compressor from your own library,
but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine
in various scenarios. If you want to change or optimize Snappy, please run
the tests to verify you have not broken anything. Note that if you have the
Google Test library installed, unit test behavior (especially failures) will
be significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags;
you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy_unittest contains microbenchmarks used
to tune compression and decompression performance. These are automatically
run before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you will
need to edit the source).

Finally, snappy_unittest can benchmark Snappy against a few other compression
libraries (zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at
configure time. To benchmark using a given file, give the compression
algorithm you want to test Snappy against (e.g. --zlib) and then a list of
one or more file names on the command line. The testdata/ directory contains
the files used by the microbenchmark, which should provide a reasonably
balanced starting point for benchmarking. (Note that baddata[1-3].snappy are
not intended as benchmarks; they are used to verify correctness in the
presence of corrupted data in the unit test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug
tracker, and other information, see

  http://code.google.com/p/snappy/