mirror of
https://github.com/google/snappy.git
synced 2024-11-25 22:47:10 +00:00
f8829ea39d
where they are available (ARMv7 and higher). This gives a significant
speed boost on ARM, both for compression and decompression. It should
not affect x86 at all.

There are more changes possible to speed up ARM, but it might not be
that easy to do without hurting x86 or making the code uglier. Also, we
do not try to use NEON yet.

Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from
Ubuntu/Linaro), -O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9
-mthumb-interwork:

Benchmark        Time(ns)   CPU(ns) Iterations
----------------------------------------------
BM_UFlat/0         524806    529100        378  184.6MB/s  html              [+33.6%]
BM_UFlat/1        5139790   5200000        100  128.8MB/s  urls              [+28.8%]
BM_UFlat/2          86540     84166       1901    1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3         215351    210176        904  428.0MB/s  pdf               [+29.8%]
BM_UFlat/4        2144490   2100000        100  186.0MB/s  html4             [+33.3%]
BM_UFlat/5         194482    190000       1000  123.5MB/s  cp                [+36.2%]
BM_UFlat/6          91843     90175       2107  117.9MB/s  c                 [+38.6%]
BM_UFlat/7          28535     28426       6684  124.8MB/s  lsp               [+34.7%]
BM_UFlat/8        9206600   9200000        100  106.7MB/s  xls               [+42.4%]
BM_UFlat/9        1865273   1886792        106   76.9MB/s  txt1              [+32.5%]
BM_UFlat/10       1576809   1587301        126   75.2MB/s  txt2              [+32.3%]
BM_UFlat/11       4968450   4900000        100   83.1MB/s  txt3              [+32.7%]
BM_UFlat/12       6673970   6700000        100   68.6MB/s  txt4              [+32.8%]
BM_UFlat/13       2391470   2400000        100  203.9MB/s  bin               [+29.2%]
BM_UFlat/14        334601    344827        522  105.8MB/s  sum               [+30.6%]
BM_UFlat/15         37404     38080       5252  105.9MB/s  man               [+33.8%]
BM_UFlat/16        535470    540540        370  209.2MB/s  pb                [+31.2%]
BM_UFlat/17       1875245   1886792        106   93.2MB/s  gaviota           [+37.8%]
BM_UValidate/0     178425    179533       1114  543.9MB/s  html              [ +2.7%]
BM_UValidate/1    2100450   2000000        100  334.8MB/s  urls              [ +5.0%]
BM_UValidate/2       1039      1044     172413  113.3GB/s  jpg               [ +3.4%]
BM_UValidate/3      59423     59470       3363    1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4     760716    766283        261  509.8MB/s  html4             [ +6.5%]
BM_ZFlat/0        1204632   1204819        166   81.1MB/s  html (23.57 %)    [+32.8%]
BM_ZFlat/1       15656190  15600000        100   42.9MB/s  urls (50.89 %)    [+27.6%]
BM_ZFlat/2         403336    410677        487  294.8MB/s  jpg (99.88 %)     [+16.5%]
BM_ZFlat/3         664073    671140        298  134.0MB/s  pdf (82.13 %)     [+28.4%]
BM_ZFlat/4        4961940   4900000        100   79.7MB/s  html4 (23.55 %)   [+30.6%]
BM_ZFlat/5         500664    501253        399   46.8MB/s  cp (48.12 %)      [+33.4%]
BM_ZFlat/6         217276    215982        926   49.2MB/s  c (42.40 %)       [+25.0%]
BM_ZFlat/7          64122     65487       3054   54.2MB/s  lsp (48.37 %)     [+36.1%]
BM_ZFlat/8       18045730  18000000        100   54.6MB/s  xls (41.34 %)     [+34.4%]
BM_ZFlat/9        4051530   4000000        100   36.3MB/s  txt1 (59.81 %)    [+25.0%]
BM_ZFlat/10       3451800   3500000        100   34.1MB/s  txt2 (64.07 %)    [+25.7%]
BM_ZFlat/11      11052340  11100000        100   36.7MB/s  txt3 (57.11 %)    [+24.3%]
BM_ZFlat/12      14538690  14600000        100   31.5MB/s  txt4 (68.35 %)    [+24.7%]
BM_ZFlat/13       5041850   5000000        100   97.9MB/s  bin (18.21 %)     [+32.0%]
BM_ZFlat/14        908840    909090        220   40.1MB/s  sum (51.88 %)     [+22.2%]
BM_ZFlat/15         86921     86206       1972   46.8MB/s  man (59.36 %)     [+42.2%]
BM_ZFlat/16       1312315   1315789        152   86.0MB/s  pb (23.15 %)      [+34.5%]
BM_ZFlat/17       3173120   3200000        100   54.9MB/s  gaviota (38.27 %) [+28.1%]

The move from 64-bit to 32-bit operations for the copies also affected
32-bit x86; positive on the decompression side, and slightly negative on
the compression side (unless that is noise; I only ran once):

Benchmark        Time(ns)   CPU(ns) Iterations
----------------------------------------------
BM_UFlat/0          86279     86140       7778    1.1GB/s  html              [ +7.5%]
BM_UFlat/1         839265    822622        778  813.9MB/s  urls              [ +9.4%]
BM_UFlat/2           9180      9143      87500   12.9GB/s  jpg               [ +1.2%]
BM_UFlat/3          35080     35000      20000    2.5GB/s  pdf               [+10.1%]
BM_UFlat/4         350318    345000       2000    1.1GB/s  html4             [ +7.0%]
BM_UFlat/5          33808     33472      21212  701.0MB/s  cp                [ +9.0%]
BM_UFlat/6          15201     15214      46667  698.9MB/s  c                 [+14.9%]
BM_UFlat/7           4652      4651     159091  762.9MB/s  lsp               [ +7.5%]
BM_UFlat/8        1285551   1282528        538  765.7MB/s  xls               [+10.7%]
BM_UFlat/9         282510    281690       2414  514.9MB/s  txt1              [+13.6%]
BM_UFlat/10        243494    239286       2800  498.9MB/s  txt2              [+14.4%]
BM_UFlat/11        743625    740000       1000  550.0MB/s  txt3              [+14.3%]
BM_UFlat/12        999441    989717        778  464.3MB/s  txt4              [+16.1%]
BM_UFlat/13        412402    410076       1707    1.2GB/s  bin               [ +7.3%]
BM_UFlat/14         54876     54000      10000  675.3MB/s  sum               [+13.0%]
BM_UFlat/15          6146      6100     100000  660.8MB/s  man               [+14.8%]
BM_UFlat/16         90496     90286       8750    1.2GB/s  pb                [ +4.0%]
BM_UFlat/17        292650    292000       2500  602.0MB/s  gaviota           [+18.1%]
BM_UValidate/0      49620     49699      14286    1.9GB/s  html              [ +0.0%]
BM_UValidate/1     501371    500000       1000    1.3GB/s  urls              [ +0.0%]
BM_UValidate/2        232       227    3043478  521.5GB/s  jpg               [ +1.3%]
BM_UValidate/3      17250     17143      43750    5.1GB/s  pdf               [ -1.3%]
BM_UValidate/4     198643    200000       3500    1.9GB/s  html4             [ -0.9%]
BM_ZFlat/0         227128    229415       3182  425.7MB/s  html (23.57 %)    [ -1.4%]
BM_ZFlat/1        2970089   2960000        250  226.2MB/s  urls (50.89 %)    [ -1.9%]
BM_ZFlat/2          45683     44999      15556    2.6GB/s  jpg (99.88 %)     [ +2.2%]
BM_ZFlat/3         114661    113136       6364  795.1MB/s  pdf (82.13 %)     [ -1.5%]
BM_ZFlat/4         919702    914286        875  427.2MB/s  html4 (23.55 %)   [ -1.3%]
BM_ZFlat/5         108189    108422       6364  216.4MB/s  cp (48.12 %)      [ -1.2%]
BM_ZFlat/6          44525     44000      15909  241.7MB/s  c (42.40 %)       [ -2.9%]
BM_ZFlat/7          15973     15857      46667  223.8MB/s  lsp (48.37 %)     [ +0.0%]
BM_ZFlat/8        2677888   2639405        269  372.1MB/s  xls (41.34 %)     [ -1.4%]
BM_ZFlat/9         800715    780000       1000  186.0MB/s  txt1 (59.81 %)    [ -0.4%]
BM_ZFlat/10        700089    700000       1000  170.5MB/s  txt2 (64.07 %)    [ -2.9%]
BM_ZFlat/11       2159356   2138365        318  190.3MB/s  txt3 (57.11 %)    [ -0.3%]
BM_ZFlat/12       2796143   2779923        259  165.3MB/s  txt4 (68.35 %)    [ -1.4%]
BM_ZFlat/13        856458    835476        778  585.8MB/s  bin (18.21 %)     [ -0.1%]
BM_ZFlat/14        166908    166857       4375  218.6MB/s  sum (51.88 %)     [ -1.4%]
BM_ZFlat/15         21181     20857      35000  193.3MB/s  man (59.36 %)     [ -0.8%]
BM_ZFlat/16        244009    239973       2917  471.3MB/s  pb (23.15 %)      [ -1.4%]
BM_ZFlat/17        596362    590000       1000  297.9MB/s  gaviota (38.27 %) [ +0.0%]

R=sanjay

git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
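A note on reading the benchmark rows in the commit message: the parenthesised figure on each BM_ZFlat row appears to be the compressed size as a percentage of the input, so dividing 100 by it gives an "Nx" compression ratio. The C++ sketch below does that arithmetic. It also backs out the throughput before the change from a bracketed delta, under the assumption (which the commit message does not state explicitly) that the delta is the relative change in throughput. Both function names are hypothetical and not part of Snappy.

```cpp
#include <cassert>

// Hypothetical helpers for reading the benchmark rows; not part of Snappy.

// Converts "compressed size as a percentage of the input" (for example
// 23.57 for the html row) into a compression ratio.
double RatioFromPercent(double compressed_percent) {
  assert(compressed_percent > 0.0);
  return 100.0 / compressed_percent;
}

// Backs out the throughput before the change, ASSUMING the bracketed
// figure is the relative throughput change in percent (for example +33.6).
double BaselineThroughput(double new_mb_per_sec, double delta_percent) {
  assert(delta_percent > -100.0);
  return new_mb_per_sec / (1.0 + delta_percent / 100.0);
}
```

For the ARM html rows, RatioFromPercent(23.57) is roughly 4.24, and BaselineThroughput(184.6, 33.6) is roughly 138 MB/s under the stated assumption.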
m4
testdata
AUTHORS
autogen.sh
ChangeLog
configure.ac
COPYING
format_description.txt
framing_format.txt
Makefile.am
NEWS
README
snappy-c.cc
snappy-c.h
snappy-internal.h
snappy-sinksource.cc
snappy-sinksource.h
snappy-stubs-internal.cc
snappy-stubs-internal.h
snappy-stubs-public.h.in
snappy-test.cc
snappy-test.h
snappy.cc
snappy.h
snappy_unittest.cc
Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for
maximum compression, or compatibility with any other compression
library; instead, it aims for very high speeds and reasonable
compression. For instance, compared to the fastest mode of zlib, Snappy
is an order of magnitude faster for most inputs, but the resulting
compressed files are anywhere from 20% to 100% bigger. (For more
information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler
   code. See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and
   decompressed petabytes of data in Google's production environment.
   The Snappy bitstream format is stable and will not change between
   versions.
 * Robust: The Snappy decompressor is designed not to crash in the face
   of corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type
   license. For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========

Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and
decompresses at about 500 MB/sec or more. (These numbers are for the
slowest inputs in our benchmark suite; others are much faster.) In our
tests, Snappy usually is faster than algorithms in the same class (e.g.
LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression
ratios.

Typical compression ratios (based on the benchmark suite) are about
1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More
sophisticated algorithms are capable of achieving yet higher
compression rates, although usually at the expense of speed. Of course,
compression ratio will vary significantly with the input.

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other
environments. In particular:

 - Snappy uses 64-bit operations in several places to process more data
   at once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data
   in several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other
languages are available; see the Google Code page at
http://code.google.com/p/snappy/ for more information. Also, if you
want to use Snappy from C code, you can use the included C bindings
in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h"
from your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways,
including support for custom (non-array) input sources. See the header
file for more information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself. You do not need it to use the compressor from your own
library, but it contains several useful components for Snappy
development.

First of all, it contains unit tests, verifying correctness on your
machine in various scenarios. If you want to change or optimize Snappy,
please run the tests to verify you have not broken anything. Note that
if you have the Google Test library installed, unit test behavior
(especially failures) will be significantly more user-friendly. You can
find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line
flags; you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy contains microbenchmarks used to
tune compression and decompression performance. These are automatically
run before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you
will need to edit the source).

Finally, snappy can benchmark Snappy against a few other compression
libraries (zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected
at configure time. To benchmark using a given file, give the
compression algorithm you want to test Snappy against (e.g. --zlib) and
then a list of one or more file names on the command line. The
testdata/ directory contains the files used by the microbenchmark,
which should provide a reasonably balanced starting point for
benchmarking. (Note that baddata[1-3].snappy are not intended as
benchmarks; they are used to verify correctness in the presence of
corrupted data in the unit test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a
bug tracker, and other information, see

  http://code.google.com/p/snappy/
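
As an illustration of the portability notes under "Performance" (cheap unaligned loads, little-endian assumption), the following C++ sketch shows two ways to read a 32-bit value from an arbitrary byte offset. It is not Snappy's actual code, and the function names are hypothetical. The memcpy version typically compiles to a single load on platforms where unaligned access is cheap, while the byte-wise version works regardless of alignment support or host byte order, at the cost of four single-byte loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; not taken from Snappy's sources.

// Reads 32 bits from a possibly unaligned address in host byte order.
// Using memcpy avoids the undefined behaviour of casting to uint32_t*.
inline std::uint32_t UnalignedLoad32(const void* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Reads a little-endian 32-bit value one byte at a time. Independent of
// host byte order and of hardware support for unaligned access.
inline std::uint32_t LoadLE32Bytewise(const void* p) {
  assert(p != nullptr);
  const unsigned char* b = static_cast<const unsigned char*>(p);
  return static_cast<std::uint32_t>(b[0]) |
         (static_cast<std::uint32_t>(b[1]) << 8) |
         (static_cast<std::uint32_t>(b[2]) << 16) |
         (static_cast<std::uint32_t>(b[3]) << 24);
}
```

On a little-endian host the two functions return the same value for the same address; on a big-endian host the result of UnalignedLoad32 would need a byte swap to match.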