mirror of https://github.com/google/snappy.git
Make heuristic match skipping more aggressive.
This causes compression to be much faster on incompressible inputs (such as the
jpeg and pdf tests), and is neutral or even positive on the other tests. The
test set shows only microscopic density regressions; I attempted to construct a
worst-case test set containing ~1500 different cases of mixed plaintext +
/dev/urandom, and even those seemed to be only 0.38 percentage points less
dense on average (the single worst case was 87.8% -> 89.0%), which we can live
with given that this is already an edge case.

The original idea is by Klaus Post; I only tweaked the implementation.
Ironically, the new implementation is almost more in line with the comment that
was there, so I've left that largely alone, albeit with a small modification.

Microbenchmark results (opt mode, 64-bit, static linking):

Ivy Bridge:

Benchmark       Base (ns)  New (ns)                                  Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0         120284    115480   847.0MB/s  html (22.31 %)           +4.2%
BM_ZFlat/1        1527911   1522242   440.7MB/s  urls (47.78 %)           +0.4%
BM_ZFlat/2          17591     10582    10.9GB/s  jpg (99.95 %)           +66.2%
BM_ZFlat/3            323       322   593.3MB/s  jpg_200 (73.00 %)        +0.3%
BM_ZFlat/4          53691     14063     6.8GB/s  pdf (83.30 %)          +281.8%
BM_ZFlat/5         495442    492347   794.8MB/s  html4 (22.52 %)          +0.6%
BM_ZFlat/6         473523    473622   306.7MB/s  txt1 (57.88 %)           -0.0%
BM_ZFlat/7         421406    420120   284.5MB/s  txt2 (61.91 %)           +0.3%
BM_ZFlat/8        1265632   1270538   320.8MB/s  txt3 (54.99 %)           -0.4%
BM_ZFlat/9        1742688   1737894   264.8MB/s  txt4 (66.26 %)           +0.3%
BM_ZFlat/10        107950    103404  1095.1MB/s  pb (19.68 %)             +4.4%
BM_ZFlat/11        372660    371818   473.5MB/s  gaviota (37.72 %)        +0.2%
BM_ZFlat/12         53239     49528   474.4MB/s  cp (48.12 %)             +7.5%
BM_ZFlat/13         18940     17349   613.9MB/s  c (42.47 %)              +9.2%
BM_ZFlat/14          5155      5075   700.3MB/s  lsp (48.37 %)            +1.6%
BM_ZFlat/15       1474757   1474471   667.2MB/s  xls (41.23 %)            +0.0%
BM_ZFlat/16           363       362   528.0MB/s  xls_200 (78.00 %)        +0.3%
BM_ZFlat/17        453849    456931  1073.2MB/s  bin (18.11 %)            -0.7%
BM_ZFlat/18            90        87     2.1GB/s  bin_200 (7.50 %)         +3.4%
BM_ZFlat/19         82163     80498   453.7MB/s  sum (48.96 %)            +2.1%
BM_ZFlat/20          7174      7124   566.7MB/s  man (59.21 %)            +0.7%
Sum of all benchmarks 8694831  8623857                                    +0.8%

Sandy Bridge:

Benchmark       Base (ns)  New (ns)                                  Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0         117426    112649   868.2MB/s  html (22.31 %)           +4.2%
BM_ZFlat/1        1517095   1498522   447.5MB/s  urls (47.78 %)           +1.2%
BM_ZFlat/2          18601     10649    10.8GB/s  jpg (99.95 %)           +74.7%
BM_ZFlat/3            359       356   536.0MB/s  jpg_200 (73.00 %)        +0.8%
BM_ZFlat/4          60249     13832     6.9GB/s  pdf (83.30 %)          +335.6%
BM_ZFlat/5         481246    475571   822.7MB/s  html4 (22.52 %)          +1.2%
BM_ZFlat/6         460541    455693   318.8MB/s  txt1 (57.88 %)           +1.1%
BM_ZFlat/7         407751    404147   295.8MB/s  txt2 (61.91 %)           +0.9%
BM_ZFlat/8        1228255   1222519   333.4MB/s  txt3 (54.99 %)           +0.5%
BM_ZFlat/9        1678299   1666379   276.2MB/s  txt4 (66.26 %)           +0.7%
BM_ZFlat/10        106499    101715  1113.4MB/s  pb (19.68 %)             +4.7%
BM_ZFlat/11        361913    360222   488.7MB/s  gaviota (37.72 %)        +0.5%
BM_ZFlat/12         53137     49618   473.6MB/s  cp (48.12 %)             +7.1%
BM_ZFlat/13         18801     17812   597.8MB/s  c (42.47 %)              +5.6%
BM_ZFlat/14          5394      5383   660.2MB/s  lsp (48.37 %)            +0.2%
BM_ZFlat/15       1435411   1432870   686.4MB/s  xls (41.23 %)            +0.2%
BM_ZFlat/16           389       395   483.3MB/s  xls_200 (78.00 %)        -1.5%
BM_ZFlat/17        447255    445510  1100.4MB/s  bin (18.11 %)            +0.4%
BM_ZFlat/18            86        86     2.2GB/s  bin_200 (7.50 %)         +0.0%
BM_ZFlat/19         82555     79512   459.3MB/s  sum (48.96 %)            +3.8%
BM_ZFlat/20          7527      7553   534.5MB/s  man (59.21 %)            -0.3%
Sum of all benchmarks 8488789  8360993                                    +1.5%

Haswell:

Benchmark       Base (ns)  New (ns)                                  Improvement
----------------------------------------------------------------------------------------
BM_ZFlat/0         107512    105621   925.6MB/s  html (22.31 %)           +1.8%
BM_ZFlat/1        1344306   1332479   503.1MB/s  urls (47.78 %)           +0.9%
BM_ZFlat/2          14752      9471    12.1GB/s  jpg (99.95 %)           +55.8%
BM_ZFlat/3            287       275   694.0MB/s  jpg_200 (73.00 %)        +4.4%
BM_ZFlat/4          48810     12263     7.8GB/s  pdf (83.30 %)          +298.0%
BM_ZFlat/5         443013    442064   884.6MB/s  html4 (22.52 %)          +0.2%
BM_ZFlat/6         429239    432124   336.0MB/s  txt1 (57.88 %)           -0.7%
BM_ZFlat/7         381765    383681   311.5MB/s  txt2 (61.91 %)           -0.5%
BM_ZFlat/8        1136667   1154304   353.0MB/s  txt3 (54.99 %)           -1.5%
BM_ZFlat/9        1579925   1592431   288.9MB/s  txt4 (66.26 %)           -0.8%
BM_ZFlat/10         98345     92411     1.2GB/s  pb (19.68 %)             +6.4%
BM_ZFlat/11        340397    340466   516.8MB/s  gaviota (37.72 %)        -0.0%
BM_ZFlat/12         47076     43536   539.5MB/s  cp (48.12 %)             +8.1%
BM_ZFlat/13         16680     15637   680.8MB/s  c (42.47 %)              +6.7%
BM_ZFlat/14          4616      4539   782.6MB/s  lsp (48.37 %)            +1.7%
BM_ZFlat/15       1331231   1334094   736.9MB/s  xls (41.23 %)            -0.2%
BM_ZFlat/16           326       322   593.5MB/s  xls_200 (78.00 %)        +1.2%
BM_ZFlat/17        404383    400326     1.2GB/s  bin (18.11 %)            +1.0%
BM_ZFlat/18            69        69     2.7GB/s  bin_200 (7.50 %)         +0.0%
BM_ZFlat/19         74771     71348   511.7MB/s  sum (48.96 %)            +4.8%
BM_ZFlat/20          6461      6383   632.2MB/s  man (59.21 %)            +1.2%
Sum of all benchmarks 7810631  7773844                                    +0.5%

I've also done a quick test confirming there are no performance regressions on
external GCC (4.9.2, Debian, Haswell, 64-bit).
parent 2b9152d9c5
commit d53de18799
@@ -364,9 +364,9 @@ char* CompressFragment(const char* input,
       //
       // Heuristic match skipping: If 32 bytes are scanned with no matches
       // found, start looking only at every other byte. If 32 more bytes are
-      // scanned, look at every third byte, etc.. When a match is found,
-      // immediately go back to looking at every byte. This is a small loss
-      // (~5% performance, ~0.1% density) for compressible data due to more
+      // scanned (or skipped), look at every third byte, etc.. When a match is
+      // found, immediately go back to looking at every byte. This is a small
+      // loss (~5% performance, ~0.1% density) for compressible data due to more
       // bookkeeping, but for non-compressible data (such as JPEG) it's a huge
       // win since the compressor quickly "realizes" the data is incompressible
       // and doesn't bother looking for matches everywhere.
@@ -382,7 +382,8 @@ char* CompressFragment(const char* input,
       ip = next_ip;
       uint32 hash = next_hash;
       assert(hash == Hash(ip, shift));
-      uint32 bytes_between_hash_lookups = skip++ >> 5;
+      uint32 bytes_between_hash_lookups = skip >> 5;
+      skip += bytes_between_hash_lookups;
       next_ip = ip + bytes_between_hash_lookups;
       if (PREDICT_FALSE(next_ip > ip_limit)) {
        goto emit_remainder;