mirror of https://github.com/google/snappy.git
323 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
atdt | 5913c5f8e4 |
Don't use _bzhi_u32 under MSan
MSan knows that x & 0xFF only depends on the low byte of x, but it isn't as smart about _bzhi_u32(x, 8). (I'll file an upstream bug.) |
|
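The workaround above amounts to a compile-time switch: take the BZHI path only when BMI2 is enabled and the build is not instrumented by MemorySanitizer, and otherwise fall back to a plain AND with a mask, which MSan tracks bit by bit. The sketch below is an illustration under those assumptions, not snappy's actual code; `ExtractLowBytes` and `SKETCH_UNDER_MSAN` are hypothetical names, and `__has_feature(memory_sanitizer)` is Clang's feature check for MSan builds.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>  // _bzhi_u32
#endif

// Detect a MemorySanitizer build via Clang's feature check.
// SKETCH_UNDER_MSAN is a hypothetical macro name used only in this sketch.
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define SKETCH_UNDER_MSAN 1
#endif
#endif

// Hypothetical helper: keep only the low `n` bytes (0 <= n <= 4) of `v`.
inline uint32_t ExtractLowBytes(uint32_t v, int n) {
#if defined(__BMI2__) && !defined(SKETCH_UNDER_MSAN)
  // Fast path: a single BZHI clears every bit at position 8*n and above.
  // For n == 4 the index is 32, and BZHI leaves `v` unchanged.
  return _bzhi_u32(v, static_cast<unsigned>(8 * n));
#else
  // Fallback without BMI2 or under MSan: MSan propagates definedness
  // through AND bit by bit, so uninitialized bits of `v` that this fully
  // initialized mask clears are not reported.
  static constexpr uint32_t kMasks[5] = {0x0u, 0xFFu, 0xFFFFu, 0xFFFFFFu,
                                         0xFFFFFFFFu};
  return v & kMasks[n];
#endif
}
```

Both branches return the same value for every `n` in 0..4; only the instruction sequence, and how precisely MSan can follow it, differs.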
atdt | 136b3ebc31 |
If BMI2 instructions are available, use BZHI to extract low bytes.
With --cpu=haswell, this yields a significant speedup (notably 12-14% higher throughput for html and pb). On k8, performance is unaffected, as expected, because k8 lacks BMI2 and keeps the existing code path. Full benchmark results for --cpu={k8,haswell} follow.
Haswell ------- name old time/op new time/op delta BM_UFlat/0 [html ] 55.2µs ± 0% 49.0µs ± 0% -11.34% (p=0.008 n=5+5) BM_UFlat/1 [urls ] 612µs ± 0% 604µs ± 0% -1.21% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 6.11µs ± 2% 6.07µs ± 1% ~ (p=0.421 n=5+5) BM_UFlat/3 [jpg_200 ] 134ns ± 0% 132ns ± 5% -1.49% (p=0.048 n=5+5) BM_UFlat/4 [pdf ] 8.41µs ± 2% 8.34µs ± 1% ~ (p=0.222 n=5+5) BM_UFlat/5 [html4 ] 239µs ± 0% 234µs ± 0% -2.24% (p=0.008 n=5+5) BM_UFlat/6 [txt1 ] 211µs ± 0% 205µs ± 0% -2.73% (p=0.008 n=5+5) BM_UFlat/7 [txt2 ] 185µs ± 0% 181µs ± 0% -2.34% (p=0.008 n=5+5) BM_UFlat/8 [txt3 ] 560µs ± 0% 545µs ± 0% -2.55% (p=0.008 n=5+5) BM_UFlat/9 [txt4 ] 773µs ± 0% 753µs ± 0% -2.61% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 51.6µs ± 0% 45.3µs ± 0% -12.28% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 209µs ± 0% 204µs ± 0% -2.28% (p=0.008 n=5+5) BM_UFlat/12 [cp ] 17.3µs ± 0% 15.7µs ± 1% -9.57% (p=0.008 n=5+5) BM_UFlat/13 [c ] 8.08µs ± 0% 8.00µs ± 0% -0.99% (p=0.008 n=5+5) BM_UFlat/14 [lsp ] 2.48µs ± 0% 2.45µs ± 0% -1.11% (p=0.008 n=5+5) BM_UFlat/15 [xls ] 967µs ± 0% 954µs ± 0% -1.36% (p=0.008 n=5+5) BM_UFlat/16 [xls_200 ] 219ns ± 1% 218ns ± 1% ~ (p=0.444 n=5+5) BM_UFlat/17 [bin ] 278µs ± 0% 275µs ± 0% -0.92% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 100ns ± 0% 99ns ± 1% -1.04% (p=0.008 n=5+5) BM_UFlat/19 [sum ] 34.0µs ± 0% 30.9µs ± 0% -9.10% (p=0.008 n=5+5) BM_UFlat/20 [man ] 3.21µs ± 0% 3.20µs ± 0% ~ (p=0.063 n=5+5) BM_UValidate/0 [html ] 33.1µs ± 0% 33.6µs ± 0% +1.69% (p=0.008 n=5+5) BM_UValidate/1 [urls ] 436µs ± 0% 441µs ± 0% +1.06% (p=0.008 n=5+5) BM_UValidate/2 [jpg ] 141ns ± 0% 142ns ± 0% +0.71% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 94.3ns ± 0% 95.3ns ± 0% +1.06% (p=0.008 n=5+5) BM_UValidate/4 [pdf ] 2.87µs ± 0% 2.95µs ± 0% +2.74% (p=0.008 n=5+5) BM_UIOVec/0 [html ] 126µs ± 0% 124µs ±
0% -1.50% (p=0.008 n=5+5) BM_UIOVec/1 [urls ] 1.13ms ± 0% 1.11ms ± 0% -1.95% (p=0.008 n=5+5) BM_UIOVec/2 [jpg ] 6.31µs ± 3% 7.44µs ± 3% +17.75% (p=0.008 n=5+5) BM_UIOVec/3 [jpg_200 ] 332ns ± 1% 318ns ± 1% -4.22% (p=0.008 n=5+5) BM_UIOVec/4 [pdf ] 12.7µs ± 3% 12.6µs ± 9% ~ (p=0.222 n=5+5) BM_UFlatSink/0 [html ] 55.2µs ± 0% 49.0µs ± 0% -11.31% (p=0.008 n=5+5) BM_UFlatSink/1 [urls ] 612µs ± 0% 605µs ± 0% -1.17% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 6.29µs ±12% 6.57µs ± 9% ~ (p=0.548 n=5+5) BM_UFlatSink/3 [jpg_200 ] 138ns ± 2% 134ns ± 0% -2.76% (p=0.000 n=5+4) BM_UFlatSink/4 [pdf ] 8.35µs ± 0% 8.34µs ± 1% ~ (p=0.905 n=4+5) BM_UFlatSink/5 [html4 ] 239µs ± 0% 234µs ± 0% -2.33% (p=0.008 n=5+5) BM_UFlatSink/6 [txt1 ] 211µs ± 0% 205µs ± 0% -2.82% (p=0.008 n=5+5) BM_UFlatSink/7 [txt2 ] 185µs ± 0% 181µs ± 0% -2.18% (p=0.008 n=5+5) BM_UFlatSink/8 [txt3 ] 560µs ± 0% 545µs ± 0% -2.57% (p=0.008 n=5+5) BM_UFlatSink/9 [txt4 ] 773µs ± 0% 754µs ± 0% -2.54% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 51.6µs ± 0% 45.3µs ± 0% -12.19% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 209µs ± 0% 204µs ± 0% -2.39% (p=0.008 n=5+5) BM_UFlatSink/12 [cp ] 17.3µs ± 0% 15.6µs ± 0% -9.98% (p=0.008 n=5+5) BM_UFlatSink/13 [c ] 8.10µs ± 1% 7.98µs ± 0% -1.53% (p=0.008 n=5+5) BM_UFlatSink/14 [lsp ] 2.49µs ± 1% 2.47µs ± 0% -0.84% (p=0.008 n=5+5) BM_UFlatSink/15 [xls ] 968µs ± 0% 953µs ± 0% -1.48% (p=0.008 n=5+5) BM_UFlatSink/16 [xls_200 ] 220ns ± 1% 220ns ± 0% ~ (p=1.000 n=5+4) BM_UFlatSink/17 [bin ] 278µs ± 0% 275µs ± 0% -0.99% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 102ns ± 1% 103ns ± 0% +1.18% (p=0.048 n=5+5) BM_UFlatSink/19 [sum ] 34.0µs ± 0% 30.9µs ± 0% -9.21% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 3.22µs ± 1% 3.20µs ± 0% -0.76% (p=0.032 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 122µs ± 0% 122µs ± 0% ~ (p=0.413 n=4+5) BM_ZFlat/1 [urls (47.78 %) ] 1.60ms ± 0% 1.60ms ± 0% -0.06% (p=0.032 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.5µs ± 2% 10.7µs ± 9% ~ (p=0.841 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 310ns ± 1% 309ns ± 3% 
~ (p=0.349 n=4+5) BM_ZFlat/4 [pdf (83.30 %) ] 13.5µs ± 1% 13.6µs ± 2% ~ (p=0.595 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 533µs ± 0% 532µs ± 0% -0.08% (p=0.032 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 529µs ± 0% 528µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 469µs ± 0% 469µs ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.40ms ± 0% 1.40ms ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 1.93ms ± 0% 1.92ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 106µs ± 0% 106µs ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 404µs ± 0% 404µs ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 43.2µs ± 0% 43.3µs ± 1% ~ (p=0.151 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 16.4µs ± 1% 16.4µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 4.96µs ± 0% 4.96µs ± 1% ~ (p=0.651 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 1.54ms ± 0% 1.54ms ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 352ns ± 2% 351ns ± 1% ~ (p=0.762 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 491µs ± 0% 491µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 75.6ns ± 1% 77.2ns ± 0% +2.06% (p=0.016 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 76.9µs ± 0% 76.7µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 6.87µs ± 1% 6.81µs ± 0% -0.87% (p=0.008 n=5+5) name old speed new speed delta BM_UFlat/0 [html ] 1.85GB/s ± 0% 2.09GB/s ± 0% +12.83% (p=0.016 n=4+5) BM_UFlat/1 [urls ] 1.15GB/s ± 0% 1.16GB/s ± 0% +1.25% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 20.1GB/s ± 2% 20.3GB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/3 [jpg_200 ] 1.49GB/s ± 0% 1.53GB/s ± 0% +2.83% (p=0.016 n=5+4) BM_UFlat/4 [pdf ] 12.2GB/s ± 2% 12.3GB/s ± 1% ~ (p=0.222 n=5+5) BM_UFlat/5 [html4 ] 1.71GB/s ± 0% 1.75GB/s ± 0% +2.29% (p=0.008 n=5+5) BM_UFlat/6 [txt1 ] 722MB/s ± 0% 742MB/s ± 0% +2.81% (p=0.008 n=5+5) BM_UFlat/7 [txt2 ] 676MB/s ± 0% 692MB/s ± 0% +2.40% (p=0.008 n=5+5) BM_UFlat/8 [txt3 ] 762MB/s ± 0% 782MB/s ± 0% +2.62% (p=0.008 n=5+5) BM_UFlat/9 [txt4 ] 623MB/s ± 0% 640MB/s ± 0% +2.68% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 2.30GB/s ± 0% 2.62GB/s ± 0% 
+13.99% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 883MB/s ± 0% 903MB/s ± 0% +2.33% (p=0.008 n=5+5) BM_UFlat/12 [cp ] 1.42GB/s ± 0% 1.57GB/s ± 1% +10.57% (p=0.008 n=5+5) BM_UFlat/13 [c ] 1.38GB/s ± 0% 1.39GB/s ± 0% +1.00% (p=0.008 n=5+5) BM_UFlat/14 [lsp ] 1.50GB/s ± 0% 1.52GB/s ± 0% +1.12% (p=0.008 n=5+5) BM_UFlat/15 [xls ] 1.06GB/s ± 0% 1.08GB/s ± 0% +1.34% (p=0.016 n=5+4) BM_UFlat/16 [xls_200 ] 913MB/s ± 1% 918MB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/17 [bin ] 1.85GB/s ± 0% 1.86GB/s ± 0% +0.92% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 2.01GB/s ± 0% 2.03GB/s ± 1% +1.10% (p=0.008 n=5+5) BM_UFlat/19 [sum ] 1.13GB/s ± 0% 1.24GB/s ± 0% +9.99% (p=0.008 n=5+5) BM_UFlat/20 [man ] 1.32GB/s ± 0% 1.32GB/s ± 1% ~ (p=0.063 n=5+5) BM_UValidate/0 [html ] 3.10GB/s ± 0% 3.04GB/s ± 0% -1.66% (p=0.008 n=5+5) BM_UValidate/1 [urls ] 1.61GB/s ± 0% 1.59GB/s ± 0% -1.04% (p=0.008 n=5+5) BM_UValidate/2 [jpg ] 875GB/s ± 0% 866GB/s ± 0% -1.11% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 2.12GB/s ± 0% 2.10GB/s ± 0% -1.01% (p=0.016 n=5+4) BM_UValidate/4 [pdf ] 35.7GB/s ± 0% 34.7GB/s ± 0% -2.66% (p=0.008 n=5+5) BM_UIOVec/0 [html ] 813MB/s ± 0% 825MB/s ± 0% +1.52% (p=0.008 n=5+5) BM_UIOVec/1 [urls ] 622MB/s ± 0% 634MB/s ± 0% +1.99% (p=0.008 n=5+5) BM_UIOVec/2 [jpg ] 19.5GB/s ± 3% 16.6GB/s ± 3% -15.08% (p=0.008 n=5+5) BM_UIOVec/3 [jpg_200 ] 603MB/s ± 1% 630MB/s ± 1% +4.42% (p=0.008 n=5+5) BM_UIOVec/4 [pdf ] 8.05GB/s ± 3% 8.12GB/s ± 8% ~ (p=0.222 n=5+5) BM_UFlatSink/0 [html ] 1.85GB/s ± 0% 2.09GB/s ± 0% +12.76% (p=0.008 n=5+5) BM_UFlatSink/1 [urls ] 1.15GB/s ± 0% 1.16GB/s ± 0% +1.18% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 19.6GB/s ±11% 18.8GB/s ± 9% ~ (p=0.548 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.45GB/s ± 1% 1.49GB/s ± 0% +2.82% (p=0.016 n=5+4) BM_UFlatSink/4 [pdf ] 12.3GB/s ± 0% 12.3GB/s ± 1% ~ (p=0.905 n=4+5) BM_UFlatSink/5 [html4 ] 1.71GB/s ± 0% 1.75GB/s ± 0% +2.41% (p=0.008 n=5+5) BM_UFlatSink/6 [txt1 ] 722MB/s ± 0% 743MB/s ± 0% +2.90% (p=0.008 n=5+5) BM_UFlatSink/7 [txt2 ] 676MB/s ± 0% 691MB/s ± 0% +2.23% 
(p=0.008 n=5+5) BM_UFlatSink/8 [txt3 ] 763MB/s ± 0% 783MB/s ± 0% +2.64% (p=0.008 n=5+5) BM_UFlatSink/9 [txt4 ] 623MB/s ± 0% 639MB/s ± 0% +2.61% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 2.30GB/s ± 0% 2.62GB/s ± 0% +13.86% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 882MB/s ± 0% 904MB/s ± 0% +2.45% (p=0.008 n=5+5) BM_UFlatSink/12 [cp ] 1.42GB/s ± 0% 1.58GB/s ± 0% +11.09% (p=0.008 n=5+5) BM_UFlatSink/13 [c ] 1.38GB/s ± 1% 1.40GB/s ± 0% +1.56% (p=0.008 n=5+5) BM_UFlatSink/14 [lsp ] 1.50GB/s ± 1% 1.51GB/s ± 1% +0.85% (p=0.008 n=5+5) BM_UFlatSink/15 [xls ] 1.06GB/s ± 0% 1.08GB/s ± 0% +1.51% (p=0.016 n=5+4) BM_UFlatSink/16 [xls_200 ] 908MB/s ± 1% 911MB/s ± 0% ~ (p=0.730 n=5+4) BM_UFlatSink/17 [bin ] 1.85GB/s ± 0% 1.86GB/s ± 0% +1.01% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.96GB/s ± 1% 1.94GB/s ± 1% -1.18% (p=0.016 n=5+5) BM_UFlatSink/19 [sum ] 1.12GB/s ± 0% 1.24GB/s ± 0% +10.16% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 1.31GB/s ± 1% 1.32GB/s ± 0% +0.77% (p=0.048 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 839MB/s ± 0% 839MB/s ± 0% ~ (p=0.413 n=4+5) BM_ZFlat/1 [urls (47.78 %) ] 439MB/s ± 0% 439MB/s ± 0% +0.06% (p=0.032 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.7GB/s ± 2% 11.5GB/s ± 9% ~ (p=0.841 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 645MB/s ± 1% 647MB/s ± 3% ~ (p=0.413 n=4+5) BM_ZFlat/4 [pdf (83.30 %) ] 7.57GB/s ± 1% 7.54GB/s ± 2% ~ (p=0.595 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 769MB/s ± 0% 770MB/s ± 0% +0.08% (p=0.032 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 288MB/s ± 0% 288MB/s ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 267MB/s ± 0% 267MB/s ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 305MB/s ± 0% 305MB/s ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 250MB/s ± 0% 251MB/s ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.12GB/s ± 0% 1.12GB/s ± 0% ~ (p=0.635 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 457MB/s ± 0% 457MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 570MB/s ± 0% 568MB/s ± 1% ~ (p=0.151 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 682MB/s ± 1% 681MB/s ± 0% ~ 
(p=0.310 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 750MB/s ± 0% 751MB/s ± 1% ~ (p=0.690 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 668MB/s ± 0% 668MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 569MB/s ± 2% 570MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 1.04GB/s ± 0% 1.04GB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.64GB/s ± 1% 2.59GB/s ± 0% -1.99% (p=0.016 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 497MB/s ± 0% 498MB/s ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 615MB/s ± 1% 621MB/s ± 0% +0.87% (p=0.008 n=5+5) K8 -- name old time/op new time/op delta BM_UFlat/0 [html ] 41.7µs ± 0% 41.7µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/1 [urls ] 588µs ± 0% 588µs ± 0% ~ (p=0.310 n=5+5) BM_UFlat/2 [jpg ] 7.11µs ± 1% 7.10µs ± 1% ~ (p=0.556 n=5+4) BM_UFlat/3 [jpg_200 ] 130ns ± 0% 130ns ± 0% ~ (all samples are equal) BM_UFlat/4 [pdf ] 8.19µs ± 0% 8.26µs ± 2% ~ (p=0.460 n=5+5) BM_UFlat/5 [html4 ] 219µs ± 0% 219µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/6 [txt1 ] 192µs ± 0% 191µs ± 0% ~ (p=0.341 n=5+5) BM_UFlat/7 [txt2 ] 170µs ± 0% 170µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/8 [txt3 ] 509µs ± 0% 509µs ± 0% ~ (p=0.151 n=5+5) BM_UFlat/9 [txt4 ] 712µs ± 0% 712µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/10 [pb ] 38.5µs ± 0% 38.5µs ± 0% ~ (p=0.452 n=5+5) BM_UFlat/11 [gaviota ] 189µs ± 0% 189µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/12 [cp ] 14.2µs ± 1% 14.2µs ± 0% ~ (p=0.889 n=5+5) BM_UFlat/13 [c ] 7.32µs ± 0% 7.33µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/14 [lsp ] 2.26µs ± 0% 2.27µs ± 0% ~ (p=0.222 n=4+5) BM_UFlat/15 [xls ] 954µs ± 0% 955µs ± 0% ~ (p=0.222 n=5+5) BM_UFlat/16 [xls_200 ] 215ns ± 4% 212ns ± 0% ~ (p=0.095 n=5+4) BM_UFlat/17 [bin ] 276µs ± 0% 276µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/18 [bin_200 ] 104ns ±10% 103ns ± 3% ~ (p=0.825 n=5+5) BM_UFlat/19 [sum ] 29.2µs ± 0% 29.2µs ± 0% ~ (p=0.690 n=5+5) BM_UFlat/20 [man ] 2.96µs ± 0% 2.97µs ± 0% +0.43% (p=0.032 n=5+5) BM_UValidate/0 [html ] 33.4µs ± 0% 33.4µs ± 0% ~ (p=0.151 n=5+5) BM_UValidate/1 [urls ] 441µs ± 0% 441µs ± 0% ~ (p=0.548 n=5+5) 
BM_UValidate/2 [jpg ] 146ns ± 0% 146ns ± 0% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 98.0ns ± 0% 98.0ns ± 0% ~ (p=1.000 n=5+5) BM_UValidate/4 [pdf ] 2.89µs ± 0% 2.89µs ± 0% ~ (p=0.794 n=5+5) BM_UIOVec/0 [html ] 121µs ± 0% 121µs ± 0% ~ (p=0.151 n=5+5) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.095 n=5+5) BM_UIOVec/2 [jpg ] 7.47µs ± 5% 7.31µs ± 2% ~ (p=0.222 n=5+5) BM_UIOVec/3 [jpg_200 ] 330ns ± 0% 330ns ± 0% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 12.3µs ± 2% 12.0µs ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/0 [html ] 41.6µs ± 0% 41.6µs ± 0% ~ (p=0.095 n=5+5) BM_UFlatSink/1 [urls ] 589µs ± 0% 589µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/2 [jpg ] 7.84µs ±26% 7.23µs ± 5% ~ (p=0.690 n=5+5) BM_UFlatSink/3 [jpg_200 ] 132ns ± 0% 132ns ± 0% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 8.43µs ± 3% 8.27µs ± 2% ~ (p=0.254 n=5+5) BM_UFlatSink/5 [html4 ] 219µs ± 0% 219µs ± 0% ~ (p=0.524 n=5+5) BM_UFlatSink/6 [txt1 ] 192µs ± 0% 192µs ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/7 [txt2 ] 170µs ± 0% 170µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/8 [txt3 ] 509µs ± 0% 509µs ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/9 [txt4 ] 712µs ± 0% 712µs ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/10 [pb ] 38.5µs ± 0% 38.5µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/11 [gaviota ] 189µs ± 0% 189µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/12 [cp ] 14.2µs ± 0% 14.2µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/13 [c ] 7.37µs ± 1% 7.36µs ± 1% ~ (p=0.746 n=5+5) BM_UFlatSink/14 [lsp ] 2.27µs ± 0% 2.27µs ± 1% ~ (p=0.714 n=5+5) BM_UFlatSink/15 [xls ] 954µs ± 0% 954µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/16 [xls_200 ] 215ns ± 1% 215ns ± 1% ~ (p=0.921 n=5+5) BM_UFlatSink/17 [bin ] 276µs ± 0% 276µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 103ns ± 2% 104ns ± 1% ~ (p=0.429 n=5+5) BM_UFlatSink/19 [sum ] 29.2µs ± 0% 29.2µs ± 0% ~ (p=0.452 n=5+5) BM_UFlatSink/20 [man ] 2.96µs ± 0% 2.97µs ± 1% ~ (p=0.484 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 126µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 1.67ms ± 0% 1.67ms ± 0% ~ 
(p=0.841 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.6µs ± 4% 11.6µs ± 3% ~ (p=1.000 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 368ns ± 1% 367ns ± 0% ~ (p=0.159 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 14.7µs ± 1% 14.6µs ± 0% ~ (p=0.190 n=5+4) BM_ZFlat/5 [html4 (22.52 %) ] 550µs ± 0% 550µs ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 540µs ± 0% 540µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 479µs ± 0% 480µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.44ms ± 0% 1.44ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 1.97ms ± 0% 1.97ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 110µs ± 0% 109µs ± 0% ~ (p=0.730 n=5+4) BM_ZFlat/11 [gaviota (37.72 %)] 412µs ± 0% 412µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 46.3µs ± 0% 46.3µs ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 17.7µs ± 0% 17.7µs ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 5.54µs ± 1% 5.55µs ± 0% ~ (p=0.254 n=5+4) BM_ZFlat/15 [xls (41.23 %) ] 1.62ms ± 0% 1.63ms ± 0% ~ (p=0.151 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 395ns ± 2% 394ns ± 1% ~ (p=1.000 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 507µs ± 0% 507µs ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 89.6ns ± 5% 89.8ns ± 5% ~ (p=1.000 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 79.9µs ± 0% 79.9µs ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 7.67µs ± 0% 7.67µs ± 1% ~ (p=0.548 n=5+5) name old speed new speed delta BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.45GB/s ± 0% ~ (p=0.889 n=5+5) BM_UFlat/1 [urls ] 1.19GB/s ± 0% 1.19GB/s ± 0% ~ (all samples are equal) BM_UFlat/2 [jpg ] 17.3GB/s ± 1% 17.3GB/s ± 1% ~ (p=0.556 n=5+4) BM_UFlat/3 [jpg_200 ] 1.54GB/s ± 0% 1.54GB/s ± 0% ~ (p=0.833 n=5+5) BM_UFlat/4 [pdf ] 12.5GB/s ± 0% 12.4GB/s ± 2% ~ (p=0.421 n=5+5) BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% ~ (p=1.000 n=4+5) BM_UFlat/6 [txt1 ] 794MB/s ± 0% 794MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlat/7 [txt2 ] 738MB/s ± 0% 738MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/8 [txt3 ] 839MB/s ± 0% 838MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/9 [txt4 ] 
677MB/s ± 0% 677MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/10 [pb ] 3.08GB/s ± 0% 3.08GB/s ± 0% ~ (p=0.452 n=5+5) BM_UFlat/11 [gaviota ] 975MB/s ± 0% 975MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/12 [cp ] 1.73GB/s ± 1% 1.73GB/s ± 0% ~ (p=0.984 n=5+5) BM_UFlat/13 [c ] 1.52GB/s ± 0% 1.52GB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 0% ~ (p=0.254 n=4+5) BM_UFlat/15 [xls ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.095 n=5+4) BM_UFlat/16 [xls_200 ] 931MB/s ± 4% 941MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/17 [bin ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=0.762 n=5+5) BM_UFlat/18 [bin_200 ] 1.92GB/s ± 9% 1.95GB/s ± 3% ~ (p=1.000 n=5+5) BM_UFlat/19 [sum ] 1.31GB/s ± 1% 1.31GB/s ± 0% ~ (p=0.548 n=5+5) BM_UFlat/20 [man ] 1.43GB/s ± 0% 1.42GB/s ± 1% -0.42% (p=0.040 n=5+5) BM_UValidate/0 [html ] 3.06GB/s ± 0% 3.06GB/s ± 0% ~ (p=0.151 n=5+5) BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.59GB/s ± 0% ~ (p=0.357 n=5+5) BM_UValidate/2 [jpg ] 845GB/s ± 0% 845GB/s ± 0% ~ (p=0.548 n=5+5) BM_UValidate/3 [jpg_200 ] 2.04GB/s ± 0% 2.04GB/s ± 0% ~ (p=1.000 n=5+5) BM_UValidate/4 [pdf ] 35.4GB/s ± 0% 35.4GB/s ± 0% ~ (p=0.421 n=5+5) BM_UIOVec/0 [html ] 845MB/s ± 0% 845MB/s ± 0% ~ (p=0.151 n=5+5) BM_UIOVec/1 [urls ] 650MB/s ± 0% 650MB/s ± 0% ~ (p=0.087 n=5+5) BM_UIOVec/2 [jpg ] 16.5GB/s ± 5% 16.8GB/s ± 2% ~ (p=0.222 n=5+5) BM_UIOVec/3 [jpg_200 ] 605MB/s ± 0% 605MB/s ± 0% ~ (p=0.690 n=5+5) BM_UIOVec/4 [pdf ] 8.36GB/s ± 2% 8.54GB/s ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/0 [html ] 2.46GB/s ± 0% 2.46GB/s ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/1 [urls ] 1.19GB/s ± 0% 1.19GB/s ± 0% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 16.0GB/s ±22% 17.0GB/s ± 5% ~ (p=0.690 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 0% 1.51GB/s ± 2% ~ (p=1.000 n=5+5) BM_UFlatSink/4 [pdf ] 12.2GB/s ± 3% 12.4GB/s ± 2% ~ (p=0.254 n=5+5) BM_UFlatSink/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% ~ (p=0.532 n=5+5) BM_UFlatSink/6 [txt1 ] 794MB/s ± 0% 794MB/s ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/7 [txt2 ] 738MB/s ± 0% 738MB/s ± 0% ~ (p=0.421 n=5+5) 
BM_UFlatSink/8 [txt3 ] 838MB/s ± 0% 838MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/9 [txt4 ] 676MB/s ± 0% 676MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/10 [pb ] 3.08GB/s ± 0% 3.08GB/s ± 0% ~ (p=0.365 n=5+5) BM_UFlatSink/11 [gaviota ] 975MB/s ± 0% 975MB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/12 [cp ] 1.73GB/s ± 0% 1.74GB/s ± 0% ~ (p=0.286 n=5+5) BM_UFlatSink/13 [c ] 1.51GB/s ± 1% 1.52GB/s ± 1% ~ (p=0.683 n=5+5) BM_UFlatSink/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 0% ~ (p=0.444 n=5+5) BM_UFlatSink/15 [xls ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.333 n=4+5) BM_UFlatSink/16 [xls_200 ] 930MB/s ± 1% 930MB/s ± 1% ~ (p=0.841 n=5+5) BM_UFlatSink/17 [bin ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.93GB/s ± 2% 1.93GB/s ± 1% ~ (p=0.651 n=5+5) BM_UFlatSink/19 [sum ] 1.31GB/s ± 0% 1.31GB/s ± 0% ~ (p=0.508 n=5+5) BM_UFlatSink/20 [man ] 1.43GB/s ± 0% 1.42GB/s ± 1% ~ (p=0.524 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 815MB/s ± 0% 815MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 420MB/s ± 0% 420MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.6GB/s ± 4% 10.6GB/s ± 3% ~ (p=1.000 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 543MB/s ± 1% 546MB/s ± 0% ~ (p=0.095 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 6.96GB/s ± 1% 7.01GB/s ± 0% ~ (p=0.190 n=5+4) BM_ZFlat/5 [html4 (22.52 %) ] 745MB/s ± 0% 745MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 282MB/s ± 0% 282MB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 261MB/s ± 0% 261MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 297MB/s ± 0% 297MB/s ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 244MB/s ± 0% 244MB/s ± 0% ~ (p=0.389 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.238 n=5+4) BM_ZFlat/11 [gaviota (37.72 %)] 448MB/s ± 0% 447MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 532MB/s ± 0% 531MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 632MB/s ± 0% 631MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 672MB/s ± 1% 671MB/s ± 0% ~ (p=0.286 n=5+4) 
BM_ZFlat/15 [xls (41.23 %) ] 634MB/s ± 0% 633MB/s ± 0% ~ (p=0.151 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 507MB/s ± 2% 508MB/s ± 1% ~ (p=1.000 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.01GB/s ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.24GB/s ± 5% 2.23GB/s ± 5% ~ (p=0.889 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 479MB/s ± 0% 479MB/s ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 551MB/s ± 0% 551MB/s ± 1% ~ (p=0.548 n=5+5) |
|
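For readers unfamiliar with BZHI, the following portable model shows what the new code path computes. It is written from the documented semantics of the BMI2 instruction (bits of the source at position `index` and above are zeroed; only the low 8 bits of `index` are used; an index of 32 or more leaves a 32-bit value unchanged), not taken from snappy, and `BzhiModel` and `LowBytes` are hypothetical names.

```cpp
#include <cstdint>

// Portable model of the 32-bit BMI2 BZHI instruction (hypothetical name):
// zero every bit of `v` at position `index` and above.
inline uint32_t BzhiModel(uint32_t v, uint32_t index) {
  index &= 0xFFu;            // the hardware reads only bits 7:0 of the index
  if (index >= 32) return v; // out-of-range index: value passes through
  // index == 0 gives a mask of 0; index == 31 gives 0x7FFFFFFF.
  return v & ((uint32_t{1} << index) - 1u);
}

// Hypothetical helper: keep only the low `n` bytes (0 <= n <= 4) of `v`,
// the operation this commit maps onto BZHI.
inline uint32_t LowBytes(uint32_t v, int n) {
  return BzhiModel(v, static_cast<uint32_t>(8 * n));
}
```

On a BMI2 target, `_bzhi_u32(v, 8 * n)` from `<immintrin.h>` performs this in a single instruction, whereas the portable form typically needs several (shift, subtract, AND, plus handling of the full-width case) or a mask-table load.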
nafi | eb47f79631 |
Optimize by about 0.5%.
How? Move the boolean arguments of EmitLiteral, EmitCopyAtMost64 and EmitCopy into template arguments, so that the compiler generates two separate, pruned versions of each function, one for arg=true and one for arg=false, with the branch on the flag resolved at compile time. FWIW, CompressFragment calls 1) EmitLiteral from within a single loop and 2) EmitCopy from within a doubly nested loop, and CompressFragment is itself called from a while-loop in the public 'Compress' function, so these calls sit on the hot path.
name old time/op new time/op delta BM_UFlat/0 [html ] 41.9µs ± 0% 41.1µs ± 0% -1.92% (p=0.000 n=10+10) BM_UFlat/1 [urls ] 576µs ± 0% 572µs ± 0% -0.68% (p=0.000 n=10+10) BM_UFlat/2 [jpg ] 7.25µs ± 6% 7.13µs ± 1% ~ (p=0.074 n=9+8) BM_UFlat/3 [jpg_200 ] 132ns ± 1% 130ns ± 0% -1.45% (p=0.000 n=10+8) BM_UFlat/4 [pdf ] 8.27µs ± 3% 8.22µs ± 0% ~ (p=0.277 n=9+8) BM_UFlat/5 [html4 ] 220µs ± 0% 219µs ± 0% -0.75% (p=0.000 n=10+10) BM_UFlat/6 [txt1 ] 192µs ± 0% 190µs ± 0% -0.80% (p=0.000 n=10+10) BM_UFlat/7 [txt2 ] 169µs ± 0% 168µs ± 0% -0.69% (p=0.000 n=10+10) BM_UFlat/8 [txt3 ] 510µs ± 0% 508µs ± 0% -0.42% (p=0.000 n=10+10) BM_UFlat/9 [txt4 ] 707µs ± 0% 702µs ± 0% -0.67% (p=0.000 n=10+10) BM_UFlat/10 [pb ] 38.5µs ± 0% 37.4µs ± 1% -2.84% (p=0.000 n=10+10) BM_UFlat/11 [gaviota ] 189µs ± 0% 190µs ± 0% +0.55% (p=0.000 n=10+10) BM_UFlat/12 [cp ] 14.2µs ± 0% 14.1µs ± 0% -0.44% (p=0.000 n=10+10) BM_UFlat/13 [c ] 7.31µs ± 1% 7.35µs ± 0% +0.54% (p=0.002 n=10+10) BM_UFlat/14 [lsp ] 2.27µs ± 0% 2.27µs ± 1% ~ (p=0.161 n=9+9) BM_UFlat/15 [xls ] 905µs ± 0% 903µs ± 0% -0.25% (p=0.000 n=10+10) BM_UFlat/16 [xls_200 ] 214ns ± 1% 213ns ± 1% -0.57% (p=0.043 n=10+10) BM_UFlat/17 [bin ] 275µs ± 0% 274µs ± 0% -0.31% (p=0.000 n=10+10) BM_UFlat/18 [bin_200 ] 102ns ± 5% 101ns ± 3% ~ (p=0.161 n=9+9) BM_UFlat/19 [sum ] 27.9µs ± 0% 27.2µs ± 0% -2.68% (p=0.000 n=10+10) BM_UFlat/20 [man ] 2.97µs ± 1% 2.97µs ± 0% ~ (p=0.400 n=9+10) BM_UValidate/0 [html ] 33.3µs ± 0% 33.7µs ± 0% +1.18% (p=0.000 n=10+10) BM_UValidate/1 [urls ] 442µs ± 0% 442µs ± 0% ~ (p=0.353 n=10+10) BM_UValidate/2 [jpg ] 146ns ± 0% 146ns ±
0% ~ (p=0.063 n=10+10) BM_UValidate/3 [jpg_200 ] 98.4ns ± 0% 98.5ns ± 0% ~ (p=0.184 n=10+10) BM_UValidate/4 [pdf ] 2.88µs ± 0% 2.90µs ± 1% +0.68% (p=0.000 n=10+10) BM_UIOVec/0 [html ] 122µs ± 0% 122µs ± 0% -0.39% (p=0.000 n=10+10) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.529 n=10+10) BM_UIOVec/2 [jpg ] 7.71µs ±11% 7.76µs ± 9% ~ (p=0.853 n=10+10) BM_UIOVec/3 [jpg_200 ] 327ns ± 0% 328ns ± 0% ~ (p=0.146 n=8+10) BM_UIOVec/4 [pdf ] 12.1µs ± 1% 12.1µs ± 3% ~ (p=0.315 n=10+10) BM_UFlatSink/0 [html ] 41.8µs ± 0% 41.0µs ± 0% -1.87% (p=0.000 n=10+9) BM_UFlatSink/1 [urls ] 576µs ± 0% 572µs ± 0% -0.74% (p=0.000 n=9+10) BM_UFlatSink/2 [jpg ] 7.58µs ± 8% 7.56µs ± 9% ~ (p=0.739 n=10+10) BM_UFlatSink/3 [jpg_200 ] 133ns ± 0% 134ns ± 0% +0.60% (p=0.000 n=10+9) BM_UFlatSink/4 [pdf ] 8.44µs ± 3% 8.30µs ± 1% -1.65% (p=0.029 n=10+10) BM_UFlatSink/5 [html4 ] 220µs ± 0% 218µs ± 0% -0.81% (p=0.000 n=10+10) BM_UFlatSink/6 [txt1 ] 192µs ± 0% 190µs ± 0% -0.78% (p=0.000 n=10+10) BM_UFlatSink/7 [txt2 ] 169µs ± 0% 168µs ± 0% -0.59% (p=0.000 n=10+10) BM_UFlatSink/8 [txt3 ] 510µs ± 0% 508µs ± 0% -0.39% (p=0.000 n=10+10) BM_UFlatSink/9 [txt4 ] 707µs ± 0% 703µs ± 0% -0.62% (p=0.000 n=10+10) BM_UFlatSink/10 [pb ] 38.4µs ± 0% 37.4µs ± 0% -2.62% (p=0.000 n=9+9) BM_UFlatSink/11 [gaviota ] 189µs ± 0% 190µs ± 0% +0.63% (p=0.000 n=10+10) BM_UFlatSink/12 [cp ] 14.2µs ± 0% 14.1µs ± 0% -0.27% (p=0.011 n=10+10) BM_UFlatSink/13 [c ] 7.33µs ± 1% 7.35µs ± 1% ~ (p=0.243 n=10+9) BM_UFlatSink/14 [lsp ] 2.27µs ± 0% 2.26µs ± 0% -0.39% (p=0.000 n=9+9) BM_UFlatSink/15 [xls ] 904µs ± 0% 902µs ± 0% -0.28% (p=0.000 n=10+10) BM_UFlatSink/16 [xls_200 ] 216ns ± 1% 217ns ± 1% ~ (p=0.661 n=10+9) BM_UFlatSink/17 [bin ] 275µs ± 0% 274µs ± 0% -0.24% (p=0.000 n=8+9) BM_UFlatSink/18 [bin_200 ] 104ns ± 2% 104ns ± 1% -0.70% (p=0.043 n=9+10) BM_UFlatSink/19 [sum ] 27.8µs ± 0% 27.1µs ± 0% -2.51% (p=0.000 n=9+10) BM_UFlatSink/20 [man ] 3.02µs ± 1% 3.00µs ± 1% ~ (p=0.079 n=10+9) BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 126µs ± 0% 
-0.24% (p=0.000 n=10+10) BM_ZFlat/1 [urls (47.78 %) ] 1.68ms ± 0% 1.67ms ± 0% -1.06% (p=0.000 n=10+10) BM_ZFlat/2 [jpg (99.95 %) ] 11.8µs ± 5% 11.6µs ± 5% ~ (p=0.165 n=10+10) BM_ZFlat/3 [jpg_200 (73.00 %)] 360ns ± 3% 358ns ± 1% ~ (p=0.762 n=10+8) BM_ZFlat/4 [pdf (83.30 %) ] 14.8µs ± 2% 14.6µs ± 1% -1.57% (p=0.022 n=10+9) BM_ZFlat/5 [html4 (22.52 %) ] 556µs ± 0% 552µs ± 0% -0.87% (p=0.000 n=10+10) BM_ZFlat/6 [txt1 (57.88 %) ] 542µs ± 0% 540µs ± 0% -0.47% (p=0.000 n=10+10) BM_ZFlat/7 [txt2 (61.91 %) ] 483µs ± 0% 480µs ± 0% -0.62% (p=0.000 n=10+10) BM_ZFlat/8 [txt3 (54.99 %) ] 1.45ms ± 0% 1.44ms ± 0% -0.47% (p=0.000 n=10+10) BM_ZFlat/9 [txt4 (66.26 %) ] 1.98ms ± 0% 1.97ms ± 0% -0.19% (p=0.007 n=10+10) BM_ZFlat/10 [pb (19.68 %) ] 111µs ± 0% 109µs ± 0% -1.75% (p=0.000 n=10+10) BM_ZFlat/11 [gaviota (37.72 %)] 411µs ± 0% 410µs ± 0% -0.21% (p=0.004 n=10+10) BM_ZFlat/12 [cp (48.12 %) ] 45.9µs ± 0% 45.5µs ± 0% -0.76% (p=0.000 n=10+10) BM_ZFlat/13 [c (42.47 %) ] 17.6µs ± 0% 17.5µs ± 0% -0.80% (p=0.000 n=10+10) BM_ZFlat/14 [lsp (48.37 %) ] 5.50µs ± 0% 5.44µs ± 0% -1.19% (p=0.000 n=9+10) BM_ZFlat/15 [xls (41.23 %) ] 1.63ms ± 0% 1.61ms ± 0% -1.21% (p=0.000 n=10+10) BM_ZFlat/16 [xls_200 (78.00 %)] 389ns ± 2% 391ns ± 1% ~ (p=0.182 n=10+9) BM_ZFlat/17 [bin (18.11 %) ] 509µs ± 0% 506µs ± 0% -0.51% (p=0.000 n=10+10) BM_ZFlat/18 [bin_200 (7.50 %) ] 92.7ns ± 0% 89.4ns ± 1% -3.55% (p=0.000 n=8+8) BM_ZFlat/19 [sum (48.96 %) ] 80.2µs ± 0% 78.9µs ± 0% -1.65% (p=0.000 n=10+10) BM_ZFlat/20 [man (59.21 %) ] 7.59µs ± 1% 7.59µs ± 1% ~ (p=0.912 n=10+10) name old allocs/op new allocs/op delta BM_UFlat/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) 
BM_UFlat/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) 
BM_UFlatSink/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 1.00 
± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/19 [sum (48.96 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_UFlat/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/5 [html4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/6 [txt1 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/10 [pb ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/12 [cp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/13 [c ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/14 [lsp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/15 [xls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/16 [xls_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/17 [bin ] 4.00 ± 0% 4.00 
± 0% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/19 [sum ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/20 [man ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlatSink/0 [html ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 702k ± 0% 702k ± 0% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 123k ± 0% 123k ± 0% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 410k ± 0% 410k ± 0% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 152k ± 0% 152k ± 0% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 125k ± 0% 125k ± 0% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 427k ± 0% 427k ± 0% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 482k ± 0% 482k ± 0% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 119k ± 0% 119k ± 0% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 184k ± 0% 184k ± 0% ~ (all samples are equal) BM_UFlatSink/12 [cp ] 24.6k ± 0% 24.6k ± 0% ~ (all samples are equal) BM_UFlatSink/13 [c ] 11.2k ± 0% 11.2k ± 0% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 3.72k ± 0% 3.72k ± 0% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 1.03M ± 0% 1.03M ± 0% ~ (all 
samples are equal) BM_UFlatSink/16 [xls_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 513k ± 0% 513k ± 0% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 38.2k ± 0% 38.2k ± 0% ~ (all samples are equal) BM_UFlatSink/20 [man ] 4.23k ± 0% 4.23k ± 0% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 86.1k ± 0% 86.1k ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) BM_ZFlat/19 [sum (48.96 %) ] 116k ± 0% 116k ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal) name old speed new speed delta 
BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.50GB/s ± 0% +1.96% (p=0.000 n=10+10) BM_UFlat/1 [urls ] 1.22GB/s ± 0% 1.23GB/s ± 0% +0.69% (p=0.000 n=10+10) BM_UFlat/2 [jpg ] 17.0GB/s ± 5% 17.3GB/s ± 1% ~ (p=0.074 n=9+8) BM_UFlat/3 [jpg_200 ] 1.52GB/s ± 1% 1.54GB/s ± 0% +1.44% (p=0.000 n=10+8) BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 0% ~ (p=0.721 n=8+8) BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.76% (p=0.000 n=10+10) BM_UFlat/6 [txt1 ] 795MB/s ± 0% 801MB/s ± 0% +0.79% (p=0.000 n=10+10) BM_UFlat/7 [txt2 ] 741MB/s ± 0% 746MB/s ± 0% +0.68% (p=0.000 n=10+10) BM_UFlat/8 [txt3 ] 840MB/s ± 0% 844MB/s ± 0% +0.44% (p=0.000 n=10+10) BM_UFlat/9 [txt4 ] 684MB/s ± 0% 688MB/s ± 0% +0.65% (p=0.000 n=9+10) BM_UFlat/10 [pb ] 3.09GB/s ± 0% 3.18GB/s ± 0% +2.88% (p=0.000 n=10+9) BM_UFlat/11 [gaviota ] 980MB/s ± 0% 975MB/s ± 0% -0.57% (p=0.000 n=10+10) BM_UFlat/12 [cp ] 1.74GB/s ± 0% 1.75GB/s ± 0% +0.38% (p=0.001 n=10+9) BM_UFlat/13 [c ] 1.53GB/s ± 1% 1.52GB/s ± 0% -0.55% (p=0.003 n=10+10) BM_UFlat/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 1% ~ (p=0.400 n=9+10) BM_UFlat/15 [xls ] 1.14GB/s ± 0% 1.14GB/s ± 0% +0.23% (p=0.000 n=10+10) BM_UFlat/16 [xls_200 ] 936MB/s ± 1% 941MB/s ± 1% ~ (p=0.052 n=10+10) BM_UFlat/17 [bin ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.28% (p=0.000 n=10+10) BM_UFlat/18 [bin_200 ] 1.97GB/s ± 5% 1.99GB/s ± 3% ~ (p=0.136 n=9+9) BM_UFlat/19 [sum ] 1.37GB/s ± 0% 1.41GB/s ± 0% +2.82% (p=0.000 n=10+9) BM_UFlat/20 [man ] 1.42GB/s ± 1% 1.42GB/s ± 0% ~ (p=0.579 n=10+10) BM_UValidate/0 [html ] 3.08GB/s ± 0% 3.05GB/s ± 0% -1.18% (p=0.000 n=10+10) BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.59GB/s ± 0% ~ (p=0.247 n=10+10) BM_UValidate/2 [jpg ] 845GB/s ± 0% 846GB/s ± 0% +0.09% (p=0.000 n=10+10) BM_UValidate/3 [jpg_200 ] 2.04GB/s ± 0% 2.04GB/s ± 0% -0.09% (p=0.019 n=10+10) BM_UValidate/4 [pdf ] 35.7GB/s ± 0% 35.4GB/s ± 1% -0.70% (p=0.000 n=10+10) BM_UIOVec/0 [html ] 841MB/s ± 0% 844MB/s ± 0% +0.36% (p=0.000 n=10+10) BM_UIOVec/1 [urls ] 650MB/s ± 0% 650MB/s ± 0% ~ (p=0.105 n=10+10) BM_UIOVec/2 [jpg ] 
16.1GB/s ±10% 15.9GB/s ± 8% ~ (p=0.853 n=10+10) BM_UIOVec/3 [jpg_200 ] 612MB/s ± 1% 612MB/s ± 0% ~ (p=0.243 n=9+10) BM_UIOVec/4 [pdf ] 8.52GB/s ± 2% 8.46GB/s ± 3% ~ (p=0.436 n=10+10) BM_UFlatSink/0 [html ] 2.46GB/s ± 0% 2.50GB/s ± 0% +1.83% (p=0.000 n=9+10) BM_UFlatSink/1 [urls ] 1.22GB/s ± 0% 1.23GB/s ± 0% +0.73% (p=0.000 n=10+10) BM_UFlatSink/2 [jpg ] 16.3GB/s ± 8% 16.4GB/s ± 9% ~ (p=0.739 n=10+10) BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 0% 1.50GB/s ± 0% -0.62% (p=0.000 n=10+9) BM_UFlatSink/4 [pdf ] 12.2GB/s ± 3% 12.4GB/s ± 1% +1.62% (p=0.029 n=10+10) BM_UFlatSink/5 [html4 ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.79% (p=0.000 n=10+10) BM_UFlatSink/6 [txt1 ] 795MB/s ± 0% 801MB/s ± 0% +0.74% (p=0.000 n=10+9) BM_UFlatSink/7 [txt2 ] 741MB/s ± 0% 745MB/s ± 0% +0.59% (p=0.000 n=10+9) BM_UFlatSink/8 [txt3 ] 840MB/s ± 0% 843MB/s ± 0% +0.37% (p=0.000 n=9+10) BM_UFlatSink/9 [txt4 ] 684MB/s ± 0% 688MB/s ± 0% +0.57% (p=0.000 n=9+10) BM_UFlatSink/10 [pb ] 3.10GB/s ± 0% 3.18GB/s ± 0% +2.64% (p=0.000 n=9+10) BM_UFlatSink/11 [gaviota ] 980MB/s ± 0% 974MB/s ± 0% -0.64% (p=0.000 n=10+10) BM_UFlatSink/12 [cp ] 1.74GB/s ± 0% 1.75GB/s ± 0% +0.26% (p=0.005 n=10+10) BM_UFlatSink/13 [c ] 1.52GB/s ± 1% 1.52GB/s ± 1% ~ (p=0.123 n=10+10) BM_UFlatSink/14 [lsp ] 1.64GB/s ± 0% 1.65GB/s ± 0% +0.46% (p=0.000 n=10+8) BM_UFlatSink/15 [xls ] 1.14GB/s ± 0% 1.15GB/s ± 0% +0.27% (p=0.000 n=10+10) BM_UFlatSink/16 [xls_200 ] 927MB/s ± 1% 926MB/s ± 1% ~ (p=0.497 n=10+9) BM_UFlatSink/17 [bin ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.27% (p=0.000 n=10+10) BM_UFlatSink/18 [bin_200 ] 1.92GB/s ± 2% 1.93GB/s ± 1% +0.70% (p=0.035 n=9+10) BM_UFlatSink/19 [sum ] 1.38GB/s ± 0% 1.41GB/s ± 0% +2.59% (p=0.000 n=9+10) BM_UFlatSink/20 [man ] 1.40GB/s ± 1% 1.41GB/s ± 1% ~ (p=0.079 n=10+9) BM_ZFlat/0 [html (22.31 %) ] 814MB/s ± 0% 816MB/s ± 0% +0.23% (p=0.000 n=10+10) BM_ZFlat/1 [urls (47.78 %) ] 418MB/s ± 0% 423MB/s ± 0% +1.06% (p=0.000 n=10+10) BM_ZFlat/2 [jpg (99.95 %) ] 10.5GB/s ± 5% 10.7GB/s ± 5% ~ (p=0.165 n=10+10) BM_ZFlat/3 
[jpg_200 (73.00 %)] 558MB/s ± 3% 560MB/s ± 1% ~ (p=0.696 n=10+8) BM_ZFlat/4 [pdf (83.30 %) ] 6.94GB/s ± 2% 7.05GB/s ± 1% +1.59% (p=0.028 n=10+9) BM_ZFlat/5 [html4 (22.52 %) ] 739MB/s ± 0% 745MB/s ± 0% +0.86% (p=0.000 n=10+10) BM_ZFlat/6 [txt1 (57.88 %) ] 281MB/s ± 0% 283MB/s ± 0% +0.46% (p=0.000 n=10+10) BM_ZFlat/7 [txt2 (61.91 %) ] 260MB/s ± 0% 261MB/s ± 0% +0.59% (p=0.000 n=10+10) BM_ZFlat/8 [txt3 (54.99 %) ] 296MB/s ± 0% 297MB/s ± 0% +0.45% (p=0.000 n=10+10) BM_ZFlat/9 [txt4 (66.26 %) ] 244MB/s ± 0% 245MB/s ± 0% +0.16% (p=0.000 n=10+10) BM_ZFlat/10 [pb (19.68 %) ] 1.07GB/s ± 0% 1.09GB/s ± 0% +1.75% (p=0.000 n=10+10) BM_ZFlat/11 [gaviota (37.72 %)] 450MB/s ± 0% 451MB/s ± 0% +0.17% (p=0.000 n=9+10) BM_ZFlat/12 [cp (48.12 %) ] 538MB/s ± 0% 542MB/s ± 0% +0.74% (p=0.000 n=10+10) BM_ZFlat/13 [c (42.47 %) ] 635MB/s ± 0% 640MB/s ± 0% +0.80% (p=0.000 n=10+10) BM_ZFlat/14 [lsp (48.37 %) ] 678MB/s ± 0% 686MB/s ± 1% +1.18% (p=0.000 n=9+10) BM_ZFlat/15 [xls (41.23 %) ] 633MB/s ± 0% 641MB/s ± 0% +1.23% (p=0.000 n=10+7) BM_ZFlat/16 [xls_200 (78.00 %)] 516MB/s ± 2% 513MB/s ± 1% ~ (p=0.156 n=10+9) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.02GB/s ± 0% +0.49% (p=0.000 n=10+10) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.16GB/s ± 0% 2.24GB/s ± 1% +3.65% (p=0.000 n=8+8) BM_ZFlat/19 [sum (48.96 %) ] 478MB/s ± 0% 486MB/s ± 0% +1.66% (p=0.000 n=10+10) BM_ZFlat/20 [man (59.21 %) ] 558MB/s ± 1% 558MB/s ± 1% ~ (p=0.912 n=10+10) |
|
jueminyang | 254966c71e | Migrate to use absl::random | |
alkis | 53a38e5e33 |
Reduce number of allocations when compressing and simplify the code.
Previously we allocated at least once: twice with a large table, and three times when we used a scratch buffer. With this approach we always allocate once.

name old speed new speed delta
BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.45GB/s ± 0% -0.13% (p=0.000 n=11+11)
BM_UFlat/1 [urls ] 1.19GB/s ± 0% 1.22GB/s ± 0% +2.48% (p=0.000 n=11+11)
BM_UFlat/2 [jpg ] 17.2GB/s ± 2% 17.3GB/s ± 1% ~ (p=0.193 n=11+11)
BM_UFlat/3 [jpg_200 ] 1.52GB/s ± 0% 1.51GB/s ± 0% -0.78% (p=0.000 n=10+9)
BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 1% ~ (p=0.881 n=9+9)
BM_UFlat/5 [html4 ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=0.123 n=11+11)
BM_UFlat/6 [txt1 ] 793MB/s ± 0% 799MB/s ± 0% +0.78% (p=0.000 n=11+9)
BM_UFlat/7 [txt2 ] 739MB/s ± 0% 744MB/s ± 0% +0.77% (p=0.000 n=11+11)
BM_UFlat/8 [txt3 ] 839MB/s ± 0% 845MB/s ± 0% +0.71% (p=0.000 n=11+11)
BM_UFlat/9 [txt4 ] 678MB/s ± 0% 685MB/s ± 0% +1.01% (p=0.000 n=11+11)
BM_UFlat/10 [pb ] 3.08GB/s ± 0% 3.12GB/s ± 0% +1.21% (p=0.000 n=11+11)
BM_UFlat/11 [gaviota ] 975MB/s ± 0% 976MB/s ± 0% +0.11% (p=0.000 n=11+11)
BM_UFlat/12 [cp ] 1.73GB/s ± 1% 1.74GB/s ± 1% +0.46% (p=0.010 n=11+11)
BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.53GB/s ± 0% ~ (p=0.987 n=11+10)
BM_UFlat/14 [lsp ] 1.65GB/s ± 0% 1.63GB/s ± 1% -1.04% (p=0.000 n=11+11)
BM_UFlat/15 [xls ] 1.08GB/s ± 0% 1.15GB/s ± 0% +6.12% (p=0.000 n=10+11)
BM_UFlat/16 [xls_200 ] 944MB/s ± 0% 920MB/s ± 3% -2.51% (p=0.000 n=9+11)
BM_UFlat/17 [bin ] 1.86GB/s ± 0% 1.87GB/s ± 0% +0.68% (p=0.000 n=10+11)
BM_UFlat/18 [bin_200 ] 1.91GB/s ± 3% 1.92GB/s ± 5% ~ (p=0.356 n=11+11)
BM_UFlat/19 [sum ] 1.31GB/s ± 0% 1.40GB/s ± 0% +6.53% (p=0.000 n=11+11)
BM_UFlat/20 [man ] 1.42GB/s ± 0% 1.42GB/s ± 0% +0.33% (p=0.000 n=10+10) |
|
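One way to guarantee a single allocation is to size one block for everything the compressor needs and carve the hash table and the scratch buffers out of it. The sketch below assumes a table of 16-bit entries plus equally sized input and output scratch buffers; `WorkingMemorySketch` and its accessors are hypothetical names for illustration, not Snappy's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical holder that carves a hash table and two scratch buffers out of
// one allocation, so every compression path allocates exactly once.
class WorkingMemorySketch {
 public:
  WorkingMemorySketch(std::size_t table_entries, std::size_t scratch_bytes)
      : table_bytes_(table_entries * sizeof(std::uint16_t)),
        scratch_bytes_(scratch_bytes),
        mem_(new char[table_bytes_ + 2 * scratch_bytes_]) {}

  // The table comes first, so it starts at the (suitably aligned) block start.
  std::uint16_t* table() {
    return reinterpret_cast<std::uint16_t*>(mem_.get());
  }
  char* scratch_input() { return mem_.get() + table_bytes_; }
  char* scratch_output() { return scratch_input() + scratch_bytes_; }

 private:
  std::size_t table_bytes_;      // declared first: used to size mem_
  std::size_t scratch_bytes_;    // declared before mem_ for the same reason
  std::unique_ptr<char[]> mem_;  // the single allocation
};
```

Because every sub-buffer lives inside the same block, destroying the holder frees everything at once, and no path needs a second or third allocation.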
ckennelly | df5548c0b3 |
Use sized deallocation when releasing Zippy's scratch buffers.
name old time/op new time/op delta
BM_UFlat/0 [html ] 41.7µs ± 0% 41.7µs ± 0% ~ (p=0.222 n=5+5)
BM_UFlat/1 [urls ] 587µs ± 0% 574µs ± 0% -2.31% (p=0.008 n=5+5)
BM_UFlat/2 [jpg ] 7.24µs ± 2% 7.25µs ± 2% ~ (p=0.690 n=5+5)
BM_UFlat/3 [jpg_200 ] 130ns ± 0% 131ns ± 1% ~ (p=0.556 n=4+5)
BM_UFlat/4 [pdf ] 8.21µs ± 0% 8.24µs ± 1% ~ (p=0.278 n=5+5)
BM_UFlat/5 [html4 ] 219µs ± 0% 220µs ± 0% +0.45% (p=0.008 n=5+5)
BM_UFlat/6 [txt1 ] 192µs ± 0% 190µs ± 0% -0.86% (p=0.008 n=5+5)
BM_UFlat/7 [txt2 ] 169µs ± 0% 168µs ± 0% -0.54% (p=0.008 n=5+5)
BM_UFlat/8 [txt3 ] 509µs ± 0% 505µs ± 0% -0.66% (p=0.008 n=5+5)
BM_UFlat/9 [txt4 ] 710µs ± 0% 702µs ± 0% -1.14% (p=0.008 n=5+5)
BM_UFlat/10 [pb ] 38.2µs ± 0% 37.9µs ± 0% -0.82% (p=0.008 n=5+5)
BM_UFlat/11 [gaviota ] 189µs ± 0% 189µs ± 0% ~ (p=0.746 n=5+5)
BM_UFlat/12 [cp ] 14.2µs ± 0% 14.2µs ± 1% ~ (p=0.421 n=5+5)
BM_UFlat/13 [c ] 7.29µs ± 0% 7.34µs ± 1% +0.69% (p=0.016 n=5+5)
BM_UFlat/14 [lsp ] 2.27µs ± 0% 2.28µs ± 0% +0.34% (p=0.008 n=5+5)
BM_UFlat/15 [xls ] 954µs ± 0% 900µs ± 0% -5.67% (p=0.008 n=5+5)
BM_UFlat/16 [xls_200 ] 213ns ± 1% 217ns ± 2% ~ (p=0.056 n=5+5)
BM_UFlat/17 [bin ] 276µs ± 0% 274µs ± 0% -0.94% (p=0.008 n=5+5)
BM_UFlat/18 [bin_200 ] 101ns ± 1% 101ns ± 1% ~ (p=0.524 n=5+5)
BM_UFlat/19 [sum ] 29.3µs ± 0% 27.3µs ± 0% -6.98% (p=0.008 n=5+5)
BM_UFlat/20 [man ] 2.95µs ± 0% 2.95µs ± 0% ~ (p=0.651 n=5+5)

For microbenchmarks, the overhead of allocating/deallocating should be small (the relevant metadata for TCMalloc's PageMap will be in cache), but this helps demonstrate that the refactoring does not adversely impact performance. |
|
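Sized deallocation means handing the allocator the size of the block being freed, so it does not have to recover that size from the pointer (for TCMalloc, a page-map lookup). A minimal sketch, assuming a hypothetical `ScratchBufferSketch` that records its own size: `std::allocator<char>::deallocate` takes the element count, and standard library implementations may forward it to sized `operator delete` when sized deallocation is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical scratch buffer that remembers its size so it can hand that
// size back to the allocator when the buffer is released.
class ScratchBufferSketch {
 public:
  explicit ScratchBufferSketch(std::size_t size)
      : size_(size), data_(std::allocator<char>().allocate(size)) {}
  ~ScratchBufferSketch() { std::allocator<char>().deallocate(data_, size_); }

  ScratchBufferSketch(const ScratchBufferSketch&) = delete;
  ScratchBufferSketch& operator=(const ScratchBufferSketch&) = delete;

  char* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;  // declared before data_, so it is initialized first
  char* data_;
};
```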
alkis | 1b7466e143 |
Compute the wordmask instead of looking it up in a table.
Tested:

name old speed new speed delta
BM_UFlat/0 [html ] 2.13GB/s ± 0% 2.46GB/s ± 0% +15.70% (p=0.000 n=10+8)
BM_UFlat/1 [urls ] 1.21GB/s ± 0% 1.20GB/s ± 0% -1.49% (p=0.000 n=9+10)
BM_UFlat/2 [jpg ] 17.1GB/s ± 1% 17.2GB/s ± 1% ~ (p=0.120 n=11+11)
BM_UFlat/3 [jpg_200] 1.55GB/s ± 0% 1.54GB/s ± 0% -0.96% (p=0.000 n=10+7)
BM_UFlat/4 [pdf ] 12.9GB/s ± 0% 12.6GB/s ± 0% -1.98% (p=0.000 n=11+9)
BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% -0.06% (p=0.033 n=11+11)
BM_UFlat/6 [txt1 ] 816MB/s ± 0% 793MB/s ± 0% -2.84% (p=0.000 n=11+11)
BM_UFlat/7 [txt2 ] 758MB/s ± 0% 737MB/s ± 0% -2.77% (p=0.000 n=11+11)
BM_UFlat/8 [txt3 ] 865MB/s ± 0% 839MB/s ± 0% -2.94% (p=0.000 n=11+8)
BM_UFlat/9 [txt4 ] 701MB/s ± 0% 679MB/s ± 0% -3.11% (p=0.000 n=11+10)
BM_UFlat/10 [pb ] 2.60GB/s ± 2% 3.07GB/s ± 0% +17.81% (p=0.000 n=11+11)
BM_UFlat/11 [gaviota] 1.01GB/s ± 0% 0.97GB/s ± 0% -3.83% (p=0.000 n=11+10)
BM_UFlat/12 [cp ] 1.66GB/s ± 1% 1.73GB/s ± 1% +4.32% (p=0.000 n=11+11)
BM_UFlat/13 [c ] 1.52GB/s ± 1% 1.53GB/s ± 0% +0.49% (p=0.002 n=11+11)
BM_UFlat/14 [lsp ] 1.61GB/s ± 0% 1.64GB/s ± 0% +2.10% (p=0.000 n=10+11)
BM_UFlat/15 [xls ] 1.12GB/s ± 0% 1.08GB/s ± 0% -3.95% (p=0.000 n=11+7)
BM_UFlat/16 [xls_200] 926MB/s ± 1% 935MB/s ± 1% ~ (p=0.056 n=9+11)
BM_UFlat/17 [bin ] 1.89GB/s ± 0% 1.86GB/s ± 0% -1.32% (p=0.000 n=11+11)
BM_UFlat/18 [bin_200] 1.96GB/s ± 0% 1.99GB/s ± 1% +1.78% (p=0.000 n=11+11)
BM_UFlat/19 [sum ] 1.32GB/s ± 0% 1.31GB/s ± 0% -0.79% (p=0.000 n=11+10)
BM_UFlat/20 [man ] 1.40GB/s ± 0% 1.43GB/s ± 0% +2.51% (p=0.000 n=9+10)
BM_UValidate/0 [html ] 2.95GB/s ± 1% 3.07GB/s ± 0% +4.11% (p=0.000 n=10+11)
BM_UValidate/1 [urls ] 1.57GB/s ± 0% 1.60GB/s ± 0% +2.24% (p=0.000 n=10+11)
BM_UValidate/2 [jpg ] 822GB/s ± 0% 850GB/s ± 0% +3.42% (p=0.000 n=10+11)
BM_UValidate/3 [jpg_200] 2.01GB/s ± 0% 2.04GB/s ± 0% +1.24% (p=0.000 n=11+11)
BM_UValidate/4 [pdf ] 33.7GB/s ± 0% 35.9GB/s ± 1% +6.51% (p=0.000 n=10+11)
BM_UIOVec/0 [html ] 852MB/s ± 0% 852MB/s ± 0% ~ (p=0.898 n=11+11)
BM_UIOVec/1 [urls ] 663MB/s ± 0% 652MB/s ± 0% -1.61% (p=0.000 n=11+11)
BM_UIOVec/2 [jpg ] 15.3GB/s ± 1% 15.3GB/s ± 2% ~ (p=0.459 n=9+10)
BM_UIOVec/3 [jpg_200] 652MB/s ± 0% 627MB/s ± 1% -3.80% (p=0.000 n=10+11)
BM_UIOVec/4 [pdf ] 8.80GB/s ± 1% 8.57GB/s ± 1% -2.62% (p=0.000 n=10+11)
BM_UFlatSink/0 [html ] 2.13GB/s ± 0% 2.46GB/s ± 0% +15.63% (p=0.000 n=11+11)
BM_UFlatSink/1 [urls ] 1.21GB/s ± 0% 1.20GB/s ± 0% -1.42% (p=0.000 n=11+10)
BM_UFlatSink/2 [jpg ] 17.1GB/s ± 2% 17.2GB/s ± 1% ~ (p=0.175 n=11+9)
BM_UFlatSink/3 [jpg_200] 1.52GB/s ± 1% 1.47GB/s ± 3% -3.15% (p=0.000 n=11+11)
BM_UFlatSink/4 [pdf ] 12.8GB/s ± 1% 12.6GB/s ± 1% -1.76% (p=0.000 n=11+11)
BM_UFlatSink/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% -0.19% (p=0.000 n=11+10)
BM_UFlatSink/6 [txt1 ] 816MB/s ± 0% 792MB/s ± 0% -2.94% (p=0.000 n=11+11)
BM_UFlatSink/7 [txt2 ] 758MB/s ± 0% 736MB/s ± 0% -2.83% (p=0.000 n=11+11)
BM_UFlatSink/8 [txt3 ] 865MB/s ± 0% 838MB/s ± 0% -3.13% (p=0.000 n=11+11)
BM_UFlatSink/9 [txt4 ] 701MB/s ± 0% 678MB/s ± 0% -3.20% (p=0.000 n=11+11)
BM_UFlatSink/10 [pb ] 2.60GB/s ± 2% 3.07GB/s ± 0% +18.27% (p=0.000 n=11+10)
BM_UFlatSink/11 [gaviota] 1.01GB/s ± 0% 0.97GB/s ± 0% -3.90% (p=0.000 n=11+11)
BM_UFlatSink/12 [cp ] 1.66GB/s ± 1% 1.73GB/s ± 1% +4.62% (p=0.000 n=11+10)
BM_UFlatSink/13 [c ] 1.52GB/s ± 0% 1.53GB/s ± 1% ~ (p=0.180 n=9+11)
BM_UFlatSink/14 [lsp ] 1.61GB/s ± 0% 1.64GB/s ± 1% +1.98% (p=0.000 n=9+11)
BM_UFlatSink/15 [xls ] 1.12GB/s ± 0% 1.08GB/s ± 0% -3.76% (p=0.000 n=11+11)
BM_UFlatSink/16 [xls_200] 909MB/s ± 2% 924MB/s ± 1% +1.62% (p=0.000 n=11+11)
BM_UFlatSink/17 [bin ] 1.88GB/s ± 0% 1.86GB/s ± 0% -1.18% (p=0.000 n=9+11)
BM_UFlatSink/18 [bin_200] 1.94GB/s ± 2% 1.94GB/s ± 1% ~ (p=0.090 n=11+11)
BM_UFlatSink/19 [sum ] 1.32GB/s ± 0% 1.31GB/s ± 0% -0.76% (p=0.000 n=11+11)
BM_UFlatSink/20 [man ] 1.39GB/s ± 2% 1.43GB/s ± 0% +2.75% (p=0.000 n=11+10)

Assembly before:
* 44 8b 5c 85 a0        mov    -0x60(%rbp,%rax,4),%r11d
  45 23 5d 00           and    0x0(%r13),%r11d
  89 d6                 mov    %edx,%esi
  81 e6 00 07 00 00     and    $0x700,%esi

Assembly after:
* 89 c1                 mov    %eax,%ecx
* c0 e1 03              shl    $0x3,%cl
* bf ff ff ff ff        mov    $0xffffffff,%edi
* 48 d3 e7              shl    %cl,%rdi
* f7 d7                 not    %edi
  41 23 7d 00           and    0x0(%r13),%edi
  41 89 d3              mov    %edx,%r11d
  41 81 e3 00 07 00 00  and    $0x700,%r11d |
|
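The "after" listing computes the mask with shifts instead of loading it from a table: it shifts 0xffffffff left by 8 * n bits in a 64-bit register, inverts, and keeps the low 32 bits. A C++ sketch of both forms, assuming the mask selects the low n bytes of a 32-bit word for 0 <= n <= 4; the table and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Table form: kWordmaskSketch[n] keeps the low n bytes of a 32-bit word.
constexpr std::uint32_t kWordmaskSketch[] = {0x0u, 0xffu, 0xffffu, 0xffffffu,
                                             0xffffffffu};

inline std::uint32_t WordmaskFromTable(std::uint32_t n) {
  assert(n <= 4);
  return kWordmaskSketch[n];
}

// Computed form, mirroring the "after" listing: shift 0xffffffff left by
// 8 * n bits in a 64-bit register, invert, and keep the low 32 bits.
inline std::uint32_t WordmaskComputed(std::uint32_t n) {
  assert(n <= 4);
  const std::uint64_t shifted = std::uint64_t{0xffffffff} << (8 * n);
  return static_cast<std::uint32_t>(~shifted);
}
```

Doing the shift in 64 bits matters: for n == 4 a 32-bit shift by 32 would be undefined behavior in C++, whereas the 64-bit shift leaves zeros in the low word, which the inversion turns into 0xffffffff.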
Caleb Mazalevskis | a866f7181c |
Update README to use HTTPS instead of HTTP.
HTTPS is currently available for all the HTTP links included in the README. As such, using HTTPS instead of HTTP for those links may be preferable. |
|
costan | ea660b57d6 | Fix unused private field warning in NDEBUG builds. | |
costan | 7fefd231a1 |
C++11 guarantees <cstddef> and <cstdint>.
The build configuration can be cleaned up a bit. |
|
costan | db082d2cd6 | Remove GCC on OSX from the Travis CI matrix. | |
costan | ad82620f6f |
Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy. |
|
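PSHUFB rearranges the 16 bytes of an XMM register according to a mask, so a table of masks like `pshufb_fill_patterns` lets the fast path expand a short repeating pattern with one instruction. The sketch below is a portable model rather than SSSE3 code. It assumes each mask simply selects byte `i % k` for a pattern of length `k`, which is one plausible layout and not necessarily the exact table in snappy.cc; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Shuffle mask for a pattern of length k: lane i selects source byte i % k,
// so the first k bytes are repeated across all 16 lanes.
Bytes16 MakeFillPatternSketch(std::size_t k) {
  assert(k >= 1 && k <= 16);
  Bytes16 mask{};
  for (std::size_t i = 0; i < mask.size(); ++i) {
    mask[i] = static_cast<std::uint8_t>(i % k);
  }
  return mask;
}

// Scalar model of PSHUFB on one 16-byte register: each output lane takes the
// source byte named by the low 4 bits of its mask byte, or zero if the mask
// byte's high bit is set.
Bytes16 ShuffleBytesSketch(const Bytes16& src, const Bytes16& mask) {
  Bytes16 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<std::uint8_t>((mask[i] & 0x80) ? 0
                                                        : src[mask[i] & 0x0f]);
  }
  return out;
}
```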
costan | 73c31e824c |
Fix Visual Studio build.
Commit |
|
jefflim | 27ff0af12a |
Improve performance of zippy decompression to IOVecs by up to almost 50%
1) Simplify loop condition for small pattern IncrementalCopy 2) Use pointers rather than indices to track current iovec. 3) Use fast IncrementalCopy 4) Bypass Append check from within AppendFromSelf While this code greatly improves the performance of ZippyIOVecWriter, a bigger question is whether IOVec writing should be improved, or removed. Perf tests: name old speed new speed delta BM_UFlat/0 [html ] 2.13GB/s ± 0% 2.14GB/s ± 1% ~ BM_UFlat/1 [urls ] 1.22GB/s ± 0% 1.24GB/s ± 0% +1.87% BM_UFlat/2 [jpg ] 17.2GB/s ± 1% 17.1GB/s ± 0% ~ BM_UFlat/3 [jpg_200 ] 1.55GB/s ± 0% 1.53GB/s ± 2% ~ BM_UFlat/4 [pdf ] 12.8GB/s ± 1% 12.7GB/s ± 2% -0.36% BM_UFlat/5 [html4 ] 1.89GB/s ± 0% 1.90GB/s ± 1% ~ BM_UFlat/6 [txt1 ] 811MB/s ± 0% 829MB/s ± 1% +2.24% BM_UFlat/7 [txt2 ] 756MB/s ± 0% 774MB/s ± 1% +2.41% BM_UFlat/8 [txt3 ] 860MB/s ± 0% 879MB/s ± 1% +2.16% BM_UFlat/9 [txt4 ] 699MB/s ± 0% 715MB/s ± 1% +2.31% BM_UFlat/10 [pb ] 2.64GB/s ± 0% 2.65GB/s ± 1% ~ BM_UFlat/11 [gaviota ] 1.00GB/s ± 0% 0.99GB/s ± 2% ~ BM_UFlat/12 [cp ] 1.66GB/s ± 1% 1.66GB/s ± 2% ~ BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.47GB/s ± 5% -3.97% BM_UFlat/14 [lsp ] 1.60GB/s ± 1% 1.55GB/s ± 5% -3.41% BM_UFlat/15 [xls ] 1.12GB/s ± 0% 1.15GB/s ± 0% +1.93% BM_UFlat/16 [xls_200 ] 918MB/s ± 2% 929MB/s ± 1% +1.15% BM_UFlat/17 [bin ] 1.86GB/s ± 0% 1.89GB/s ± 1% +1.61% BM_UFlat/18 [bin_200 ] 1.90GB/s ± 1% 1.97GB/s ± 1% +3.67% BM_UFlat/19 [sum ] 1.32GB/s ± 0% 1.33GB/s ± 1% ~ BM_UFlat/20 [man ] 1.39GB/s ± 0% 1.36GB/s ± 3% ~ BM_UValidate/0 [html ] 2.85GB/s ± 3% 2.90GB/s ± 0% ~ BM_UValidate/1 [urls ] 1.57GB/s ± 0% 1.56GB/s ± 0% -0.20% BM_UValidate/2 [jpg ] 824GB/s ± 0% 825GB/s ± 0% +0.11% BM_UValidate/3 [jpg_200 ] 2.01GB/s ± 0% 2.02GB/s ± 0% +0.10% BM_UValidate/4 [pdf ] 30.4GB/s ±11% 33.5GB/s ± 0% ~ BM_UIOVec/0 [html ] 604MB/s ± 0% 856MB/s ± 0% +41.70% BM_UIOVec/1 [urls ] 440MB/s ± 0% 660MB/s ± 0% +49.91% BM_UIOVec/2 [jpg ] 15.1GB/s ± 1% 15.3GB/s ± 1% +1.22% BM_UIOVec/3 [jpg_200 ] 567MB/s ± 1% 629MB/s ± 0% +10.89% BM_UIOVec/4 [pdf ] 
7.16GB/s ± 2% 8.56GB/s ± 1% +19.64% BM_UFlatSink/0 [html ] 2.13GB/s ± 0% 2.16GB/s ± 0% +1.47% BM_UFlatSink/1 [urls ] 1.22GB/s ± 0% 1.25GB/s ± 0% +2.18% BM_UFlatSink/2 [jpg ] 17.1GB/s ± 2% 17.1GB/s ± 2% ~ BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 1% 1.53GB/s ± 2% +1.11% BM_UFlatSink/4 [pdf ] 12.7GB/s ± 2% 12.8GB/s ± 1% +0.67% BM_UFlatSink/5 [html4 ] 1.90GB/s ± 0% 1.92GB/s ± 0% +1.31% BM_UFlatSink/6 [txt1 ] 810MB/s ± 0% 835MB/s ± 0% +3.04% BM_UFlatSink/7 [txt2 ] 755MB/s ± 0% 779MB/s ± 0% +3.19% BM_UFlatSink/8 [txt3 ] 859MB/s ± 0% 884MB/s ± 0% +2.86% BM_UFlatSink/9 [txt4 ] 698MB/s ± 0% 718MB/s ± 0% +2.96% BM_UFlatSink/10 [pb ] 2.64GB/s ± 0% 2.67GB/s ± 0% +1.16% BM_UFlatSink/11 [gaviota ] 1.00GB/s ± 0% 1.01GB/s ± 0% +1.04% BM_UFlatSink/12 [cp ] 1.66GB/s ± 1% 1.68GB/s ± 1% +0.83% BM_UFlatSink/13 [c ] 1.52GB/s ± 1% 1.53GB/s ± 0% +0.38% BM_UFlatSink/14 [lsp ] 1.60GB/s ± 1% 1.61GB/s ± 0% +0.91% BM_UFlatSink/15 [xls ] 1.12GB/s ± 0% 1.15GB/s ± 0% +1.96% BM_UFlatSink/16 [xls_200 ] 906MB/s ± 3% 920MB/s ± 1% +1.55% BM_UFlatSink/17 [bin ] 1.86GB/s ± 0% 1.90GB/s ± 0% +2.15% BM_UFlatSink/18 [bin_200 ] 1.85GB/s ± 2% 1.92GB/s ± 2% +4.01% BM_UFlatSink/19 [sum ] 1.32GB/s ± 1% 1.35GB/s ± 0% +2.23% BM_UFlatSink/20 [man ] 1.39GB/s ± 1% 1.40GB/s ± 0% +1.12% BM_ZFlat/0 [html (22.31 %) ] 800MB/s ± 0% 793MB/s ± 0% -0.95% BM_ZFlat/1 [urls (47.78 %) ] 423MB/s ± 0% 424MB/s ± 0% +0.11% BM_ZFlat/2 [jpg (99.95 %) ] 12.0GB/s ± 2% 12.0GB/s ± 4% ~ BM_ZFlat/3 [jpg_200 (73.00 %)] 592MB/s ± 3% 594MB/s ± 2% ~ BM_ZFlat/4 [pdf (83.30 %) ] 7.26GB/s ± 1% 7.23GB/s ± 2% -0.49% BM_ZFlat/5 [html4 (22.52 %) ] 738MB/s ± 0% 739MB/s ± 0% +0.17% BM_ZFlat/6 [txt1 (57.88 %) ] 286MB/s ± 0% 285MB/s ± 0% -0.09% BM_ZFlat/7 [txt2 (61.91 %) ] 264MB/s ± 0% 264MB/s ± 0% +0.08% BM_ZFlat/8 [txt3 (54.99 %) ] 300MB/s ± 0% 300MB/s ± 0% ~ BM_ZFlat/9 [txt4 (66.26 %) ] 248MB/s ± 0% 247MB/s ± 0% -0.20% BM_ZFlat/10 [pb (19.68 %) ] 1.04GB/s ± 0% 1.03GB/s ± 0% -1.17% BM_ZFlat/11 [gaviota (37.72 %)] 451MB/s ± 0% 450MB/s ± 0% -0.35% BM_ZFlat/12 
[cp (48.12 %) ] 543MB/s ± 0% 538MB/s ± 0% -1.04% BM_ZFlat/13 [c (42.47 %) ] 638MB/s ± 1% 643MB/s ± 0% +0.68% BM_ZFlat/14 [lsp (48.37 %) ] 686MB/s ± 0% 691MB/s ± 1% +0.76% BM_ZFlat/15 [xls (41.23 %) ] 636MB/s ± 0% 633MB/s ± 0% -0.52% BM_ZFlat/16 [xls_200 (78.00 %)] 523MB/s ± 2% 520MB/s ± 2% -0.56% BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.01GB/s ± 0% +0.50% BM_ZFlat/18 [bin_200 (7.50 %) ] 2.45GB/s ± 1% 2.44GB/s ± 1% -0.54% BM_ZFlat/19 [sum (48.96 %) ] 487MB/s ± 0% 478MB/s ± 0% -1.89% BM_ZFlat/20 [man (59.21 %) ] 567MB/s ± 1% 566MB/s ± 1% ~ The BM_UFlat/13 and BM_UFlat/14 results showed high variance, so I reran them: name old speed new speed delta BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.53GB/s ± 1% ~ BM_UFlat/14 [lsp] 1.61GB/s ± 1% 1.61GB/s ± 1% +0.25% |
|
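Item 2 in the list above, tracking the current iovec with pointers instead of indices, can be sketched as a small writer that advances a `const struct iovec*` until it reaches a one-past-the-end pointer. This is an illustration, not ZippyIOVecWriter itself: `IOVecWriterSketch` and its members are hypothetical, and the code assumes POSIX `struct iovec` from `<sys/uio.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/uio.h>  // POSIX struct iovec

// Hypothetical writer that tracks the current iovec with a pointer rather
// than an index, and remembers where in that iovec the next byte goes.
class IOVecWriterSketch {
 public:
  IOVecWriterSketch(const struct iovec* iov, std::size_t iov_count)
      : curr_iov_(iov),
        end_iov_(iov + iov_count),
        curr_pos_(iov_count > 0 ? static_cast<char*>(iov->iov_base) : nullptr),
        curr_left_(iov_count > 0 ? iov->iov_len : 0) {}

  // Appends len bytes, moving to the next iovec whenever the current one is
  // full. Returns false if the iovecs run out of space.
  bool Append(const char* data, std::size_t len) {
    while (len > 0) {
      if (curr_left_ == 0) {
        // Advance by pointer; stop once we step past the last iovec.
        if (curr_iov_ == end_iov_ || ++curr_iov_ == end_iov_) return false;
        curr_pos_ = static_cast<char*>(curr_iov_->iov_base);
        curr_left_ = curr_iov_->iov_len;
        continue;
      }
      const std::size_t n = len < curr_left_ ? len : curr_left_;
      std::memcpy(curr_pos_, data, n);
      curr_pos_ += n;
      curr_left_ -= n;
      data += n;
      len -= n;
    }
    return true;
  }

 private:
  const struct iovec* curr_iov_;  // iovec currently being filled
  const struct iovec* end_iov_;   // one past the last iovec
  char* curr_pos_;                // next byte to write in *curr_iov_
  std::size_t curr_left_;         // bytes remaining in *curr_iov_
};
```

Comparing against `end_iov_` replaces an index-versus-count check, so the loop only ever dereferences the pointer it already holds.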
costan | 4ffb0e62c5 | Update Travis CI configuration. | |
atdt | be490ef9ec | Test for SSSE3 support before using pshufb. | |
atdt | 8f469d97e2 |
Avoid store-forwarding stalls in Zippy's IncrementalCopy
NEW: Annotate `pattern` as initialized, for MSan.

Snappy's IncrementalCopy routine optimizes for speed by reading and writing memory in blocks of eight or sixteen bytes. If the gap between the source and destination pointers is smaller than eight bytes, snappy's strategy is to expand the gap by issuing a series of partly-overlapping eight-byte loads+stores. Because the range of each load partly overlaps that of the store which preceded it, the stored data cannot be forwarded from the store buffer to the load, and the load stalls while it waits for the store to retire. This is called a store-forwarding stall.

We can use fewer loads and avoid most of the stalls by loading the first eight bytes into a 128-bit XMM register, then using PSHUFB to permute the register's contents in place into the desired repeating sequence of bytes.

When falling back to IncrementalCopySlow, use memset if the pattern size == 1. This eliminates around 60% of the stalls.

name old time/op new time/op delta
BM_UFlat/0 [html] 48.6µs ± 0% 48.2µs ± 0% -0.92% (p=0.000 n=19+18)
BM_UFlat/1 [urls] 589µs ± 0% 576µs ± 0% -2.17% (p=0.000 n=19+18)
BM_UFlat/2 [jpg] 7.12µs ± 0% 7.10µs ± 0% ~ (p=0.071 n=19+18)
BM_UFlat/3 [jpg_200] 162ns ± 0% 151ns ± 0% -7.06% (p=0.000 n=19+18)
BM_UFlat/4 [pdf] 8.25µs ± 0% 8.19µs ± 0% -0.74% (p=0.000 n=19+18)
BM_UFlat/5 [html4] 218µs ± 0% 218µs ± 0% +0.09% (p=0.000 n=17+18)
BM_UFlat/6 [txt1] 191µs ± 0% 189µs ± 0% -1.12% (p=0.000 n=19+18)
BM_UFlat/7 [txt2] 168µs ± 0% 167µs ± 0% -1.01% (p=0.000 n=19+18)
BM_UFlat/8 [txt3] 502µs ± 0% 499µs ± 0% -0.52% (p=0.000 n=19+18)
BM_UFlat/9 [txt4] 704µs ± 0% 695µs ± 0% -1.26% (p=0.000 n=19+18)
BM_UFlat/10 [pb] 45.6µs ± 0% 44.2µs ± 0% -3.13% (p=0.000 n=19+15)
BM_UFlat/11 [gaviota] 188µs ± 0% 194µs ± 0% +3.06% (p=0.000 n=15+18)
BM_UFlat/12 [cp] 15.1µs ± 2% 14.7µs ± 1% -2.09% (p=0.000 n=18+18)
BM_UFlat/13 [c] 7.38µs ± 0% 7.36µs ± 0% -0.28% (p=0.000 n=16+18)
BM_UFlat/14 [lsp] 2.31µs ± 0% 2.37µs ± 0% +2.64% (p=0.000 n=19+18)
BM_UFlat/15 [xls] 984µs ± 0% 909µs ± 0% -7.59% (p=0.000 n=19+18)
BM_UFlat/16 [xls_200] 215ns ± 0% 217ns ± 0% +0.71% (p=0.000 n=19+15)
BM_UFlat/17 [bin] 289µs ± 0% 287µs ± 0% -0.71% (p=0.000 n=19+18)
BM_UFlat/18 [bin_200] 161ns ± 0% 116ns ± 0% -28.09% (p=0.000 n=19+16)
BM_UFlat/19 [sum] 31.9µs ± 0% 29.2µs ± 0% -8.37% (p=0.000 n=19+18)
BM_UFlat/20 [man] 3.13µs ± 1% 3.07µs ± 0% -1.79% (p=0.000 n=19+18)

name old allocs/op new allocs/op delta
BM_UFlat/0 [html] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/1 [urls] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/2 [jpg] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/3 [jpg_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/4 [pdf] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/5 [html4] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/6 [txt1] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/7 [txt2] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/8 [txt3] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/9 [txt4] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/10 [pb] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/11 [gaviota] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/12 [cp] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/13 [c] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/14 [lsp] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/15 [xls] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/16 [xls_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/17 [bin] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/18 [bin_200] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/19 [sum] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/20 [man] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)

name old speed new speed delta
BM_UFlat/0 [html] 2.11GB/s ± 0% 2.13GB/s ± 0% +0.92% (p=0.000 n=19+18)
BM_UFlat/1 [urls] 1.19GB/s ± 0% 1.22GB/s ± 0% +2.22% (p=0.000 n=16+17)
BM_UFlat/2 [jpg] 17.3GB/s ± 0% 17.3GB/s ± 0% ~ (p=0.074 n=19+18)
BM_UFlat/3 [jpg_200] 1.23GB/s ± 0% 1.33GB/s ± 0% +7.58% (p=0.000 n=19+18)
BM_UFlat/4 [pdf] 12.4GB/s ± 0% 12.5GB/s ± 0% +0.74% (p=0.000 n=19+18)
BM_UFlat/5 [html4] 1.88GB/s ± 0% 1.88GB/s ± 0% -0.09% (p=0.000 n=18+18)
BM_UFlat/6 [txt1] 798MB/s ± 0% 807MB/s ± 0% +1.13% (p=0.000 n=19+18)
BM_UFlat/7 [txt2] 743MB/s ± 0% 751MB/s ± 0% +1.02% (p=0.000 n=19+18)
BM_UFlat/8 [txt3] 850MB/s ± 0% 855MB/s ± 0% +0.52% (p=0.000 n=19+18)
BM_UFlat/9 [txt4] 684MB/s ± 0% 693MB/s ± 0% +1.28% (p=0.000 n=19+18)
BM_UFlat/10 [pb] 2.60GB/s ± 0% 2.69GB/s ± 0% +3.25% (p=0.000 n=19+16)
BM_UFlat/11 [gaviota] 979MB/s ± 0% 950MB/s ± 0% -2.97% (p=0.000 n=15+18)
BM_UFlat/12 [cp] 1.63GB/s ± 2% 1.67GB/s ± 1% +2.13% (p=0.000 n=18+18)
BM_UFlat/13 [c] 1.51GB/s ± 0% 1.52GB/s ± 0% +0.29% (p=0.000 n=16+18)
BM_UFlat/14 [lsp] 1.61GB/s ± 1% 1.57GB/s ± 0% -2.57% (p=0.000 n=19+18)
BM_UFlat/15 [xls] 1.05GB/s ± 0% 1.13GB/s ± 0% +8.22% (p=0.000 n=19+18)
BM_UFlat/16 [xls_200] 928MB/s ± 0% 921MB/s ± 0% -0.81% (p=0.000 n=19+17)
BM_UFlat/17 [bin] 1.78GB/s ± 0% 1.79GB/s ± 0% +0.71% (p=0.000 n=19+18)
BM_UFlat/18 [bin_200] 1.24GB/s ± 0% 1.72GB/s ± 0% +38.92% (p=0.000 n=19+18)
BM_UFlat/19 [sum] 1.20GB/s ± 0% 1.31GB/s ± 0% +9.15% (p=0.000 n=19+18)
BM_UFlat/20 [man] 1.35GB/s ± 1% 1.38GB/s ± 0% +1.84% (p=0.000 n=19+18) |
|
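The memset fallback works because a pattern of size 1 is just a run of a single byte value. A minimal sketch under that assumption contrasts the byte-at-a-time overlapping copy (the behavior an incremental copy has to reproduce) with a slow path that special-cases `op - src == 1`. Both function names are hypothetical and this is not the IncrementalCopySlow source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reference behavior: byte-by-byte copy where src < op and the ranges may
// overlap, so bytes written earlier are read again and the
// (op - src)-byte pattern repeats until op_limit.
void IncrementalCopyReferenceSketch(const char* src, char* op, char* op_limit) {
  assert(src < op && op <= op_limit);
  while (op < op_limit) *op++ = *src++;
}

// Slow-path sketch: a one-byte pattern is a run of a single value, so memset
// writes the same bytes without any overlapping loads and stores.
void IncrementalCopySlowSketch(const char* src, char* op, char* op_limit) {
  assert(src < op && op <= op_limit);
  if (op - src == 1) {
    std::memset(op, *src, static_cast<std::size_t>(op_limit - op));
    return;
  }
  while (op < op_limit) *op++ = *src++;
}
```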
costan | 4f7bd2dbfd |
Update CI configurations.
Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor. |
|
jgorbe | ca37ab7fb9 |
Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long explanation is enough to justify it. We have observed that, in some conditions, the results for dataset number 10 (pb) in the zippy benchmark can show a >20% regression on Skylake CPUs. To diagnose this, we profiled the benchmark looking at hot functions (99% of the time is spent in DecompressAllTags), then looked at the generated code to see if there was any difference. To rule out a minor difference we observed in register allocation, we replaced zippy.cc with a pre-built assembly file so that it was identical in both variants, and we were still able to reproduce the regression. Having ruled out the compiler as the cause, we dug a bit further and noticed that the function's alignment in the final binary differed. Both were aligned to a 16-byte boundary, but the slower one was also (by chance) aligned to a 32-byte boundary. A regression caused by alignment differences would explain why I could reproduce it consistently on the same CitC client, but not on others: slight differences in the sources can change the layout of the resulting binary. Here are some detailed benchmark results before/after the fix.
Note how fixing the alignment makes the difference between baseline and experiment go away, while regular 32-byte alignment puts both variants in the same ballpark as the original regression:

Original (note BM_UCord/10 and BM_UDataBuffer/10 around the -24% line):
BASELINE
BM_UCord/10        2938  2932  24194  3.767GB/s  pb
BM_UDataBuffer/10  3008  3004  23316  3.677GB/s  pb
EXPERIMENT
BM_UCord/10        3797  3789  18512  2.915GB/s  pb
BM_UDataBuffer/10  4024  4016  17543  2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:
BASELINE
BM_UCord/10        3872  3862  18035  2.860GB/s  pb
BM_UDataBuffer/10  4010  3998  17591  2.763GB/s  pb
EXPERIMENT
BM_UCord/10        3884  3876  18126  2.850GB/s  pb
BM_UDataBuffer/10  4037  4027  17199  2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):
BASELINE
BM_UCord/10        3103  3095  22642  3.569GB/s  pb
BM_UDataBuffer/10  3186  3177  21947  3.476GB/s  pb
EXPERIMENT
BM_UCord/10        3104  3095  22632  3.569GB/s  pb
BM_UDataBuffer/10  3167  3159  22076  3.496GB/s  pb

This change forces the "good" alignment for DecompressAllTags which, if anything, should make benchmark results more stable (and maybe we'll improve some unlucky application!). |
|
scrubbed | 15a2804cd2 |
Fix an incorrect analysis / comment in the "pattern doubling" code.
This should have a minuscule positive effect on performance; the main purpose of the CL is just to fix the incorrect comment. |
|
costan | e69d9f8806 | Fix Travis CI configuration for OSX. | |
chandlerc | 4aba5426d4 |
Rework a very hot, very sensitive part of snappy to reduce the number of
instructions, the number of dynamic branches, and avoid a particular loop structure that LLVM has a very hard time optimizing for this particular case. The code being changed is part of the hottest path for snappy decompression. In the benchmarks for decompressing protocol buffers, this has proven to be amazingly sensitive to the slightest changes in code layout. For example, previously we added a '.p2align 5' assembly directive to the code. This essentially padded the loop out from the function. Merely by doing this we saw significant performance improvements. As a consequence, several of the compiler's typically reasonable optimizations can have surprisingly bad impacts. Loop unrolling is a primary culprit, but in the next LLVM release we are seeing an issue due to loop rotation. While some of the problems caused by the newly triggered loop rotation in LLVM can be mitigated with ongoing work on LLVM's code layout optimizations (specifically, loop header cloning), that is a fairly long-term project. And even minor fluctuations in how that subsequent optimization is performed may prevent gaining the performance back. For now, we need some way to unblock the next LLVM release, which contains a generic improvement to the LLVM loop optimizer that enables loop rotation in more places but uncovers this sensitivity and weakness in a particular case. This CL restructures the loop to have a simpler structure. Specifically, we eagerly test what the terminal condition will be and provide two versions of the copy loop that each use a single loop predicate. The comments in the source code and benchmarks indicate that only one of these two cases is actually hot: we generally expect to have enough slop in the buffer. That in turn allows us to generate a much simpler branch and loop structure for the hot path (especially for the protocol buffer decompression benchmark).
However, structuring even this simple loop in a way that doesn't trigger some other performance bubble (often a more severe one) is quite challenging. We have to carefully manage the variables used in the loop and the addressing pattern. We should teach LLVM how to do this reliably, but that too is a *much* more significant undertaking and is extremely rare to have this degree of importance. The desired structure of the loop, as shown with IACA's analysis for the broadwell micro-architecture (HSW and SKX are similar): | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | | --------------------------------------------------------------------------------- | 1 | | | 1.0 1.0 | | | | | | | mov rcx, qword ptr [rdi+rdx*1-0x8] | 2^ | | | | 0.4 | 1.0 | | | 0.6 | | mov qword ptr [rdi], rcx | 1 | | | | 1.0 1.0 | | | | | | mov rcx, qword ptr [rdi+rdx*1] | 2^ | | | 0.3 | | 1.0 | | | 0.7 | | mov qword ptr [rdi+0x8], rcx | 1 | 0.5 | | | | | 0.5 | | | | add rdi, 0x10 | 1 | 0.2 | | | | | | 0.8 | | | cmp rdi, rax | 0F | | | | | | | | | | jb 0xffffffffffffffe9 Specifically, the arrangement of addressing modes for the stores such that micro-op fusion (indicated by the `^` on the `2` micro-op count) is important to achieve good throughput for this loop. The other thing necessary to make this change effective is to remove our previous hack using `.p2align 5` to pad out the main decompression loop, and to forcibly disable loop unrolling for critical loops. Because this change simplifies the loop structure, more unrolling opportunities show up. Also, the next LLVM release's generic loop optimization improvements allow unrolling in more places, requiring still more disabling of unrolling in this change. Perhaps most surprising of these is that we must disable loop unrolling in the *slow* path. While unrolling there seems pointless, it should also be harmless. This cold code is laid out very far away from all of the hot code. 
All the samples shown in a profile of the benchmark occur before this loop in the function. And yet, if the loop gets unrolled (which seems to only happen reliably with the next LLVM release) we see a nearly 20% regression in decompressing protocol buffers! With the current release of LLVM, we still observe some regression from this source change, but it is fairly small (5% on decompressing protocol buffers, less elsewhere). And with the next LLVM release it drops to under 1% even in that case. Meanwhile, without this change, the next release of LLVM will regress decompressing protocol buffers by more than 10%. |
|
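The restructuring described in the loop rework above (test the terminal condition once, then run one of two loops that each have a single predicate) can be sketched as follows. This is a hypothetical helper, not snappy's code: the name `CopyBackReference`, the 8-byte chunking, the requirement that `offset >= 8`, and the 8-byte slop threshold are assumptions made to keep the example short. `#pragma clang loop unroll(disable)` is shown as one way to ask Clang not to unroll a loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not snappy's real implementation.
// Copies `len` bytes from `op - offset` to `op` with LZ77 semantics (as if
// copied one byte at a time, front to back). Assumes offset >= 8, that
// op - offset is inside the buffer, and that op + len <= buf_limit.
inline char* CopyBackReference(char* op, size_t offset, size_t len,
                               char* buf_limit) {
  char* const op_end = op + len;
  const char* src = op - offset;
  // Test the terminal condition eagerly: with at least 8 bytes of slop after
  // op_end, the 8-byte loop may overrun op_end without leaving the buffer.
  if (buf_limit - op_end >= 8) {
    // Hot path: one loop, one predicate.
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
    while (op < op_end) {
      uint64_t chunk;
      // offset >= 8, so every source byte read here is already final.
      std::memcpy(&chunk, src, sizeof(chunk));
      std::memcpy(op, &chunk, sizeof(chunk));
      src += 8;
      op += 8;
    }
    return op_end;
  }
  // Cold path: no slop, so copy exactly len bytes.
#if defined(__clang__)
#pragma clang loop unroll(disable)
#endif
  while (op < op_end) *op++ = *src++;
  return op_end;
}
```

The hot loop may write up to 7 bytes past `op + len`, which is only safe because the eager check established that the buffer has slop; the cold loop never overruns.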
costan | 26102a0c66 |
Fix generated version number in open source release.
Lands GitHub PR #61. The patch was also independently contributed by Martin Gieseking <martin.gieseking@uos.de>. |
|
costan | b02bfa754e | Tag open source release 1.1.7. | |
wmi | 824e6718b5 |
Add a loop alignment directive to work around a performance regression.
We found that LLVM upstream change rL310792 degraded the zippy benchmark by ~3%. Performance analysis showed that the regression was caused by a side effect: the incidental change in loop alignment (from 32 bytes to 16 bytes) increased branch mispredictions. The regression was reproducible on several Intel micro-architectures, such as Sandy Bridge, Haswell and Skylake. Unfortunately, we still don't have a good understanding of the internals of the Intel branch predictor and cannot explain why branch mispredictions increase when the loop alignment changes, so we cannot make a real fix here. The workaround in this patch is to add a directive that aligns the hot loop to 32 bytes, which restores the performance. This is needed to unblock flipping the default compiler to LLVM. |
|
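As a rough illustration of this workaround, a `.p2align 5` directive emitted through GNU inline assembly asks the assembler to pad with no-ops up to the next 32-byte boundary. The function `SumBytes` below is hypothetical and the directive's placement is an assumption; the compiler does not promise that the loop header lands right after the directive, which is part of why this kind of fix is fragile (the loop rework listed above later removed this `.p2align 5` hack).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hot loop. On GCC/Clang x86-64, the basic asm statement pads
// the instruction stream to a 32-byte (2^5) boundary with no-ops, which
// execution simply falls through. Placement relative to the loop header is
// a layout hint, not a guarantee.
inline uint64_t SumBytes(const unsigned char* p, size_t n) {
  uint64_t sum = 0;
#if defined(__GNUC__) && defined(__x86_64__)
  __asm__(".p2align 5");
#endif
  for (size_t i = 0; i < n; ++i) sum += p[i];
  return sum;
}
```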
costan | 55924d1109 |
Add GNUInstallDirs to CMake configuration.
This is modeled after https://github.com/google/googletest/pull/1160. The immediate benefit is fixing the library install paths on 64-bit Linux distributions, which tend to support running 32-bit and 64-bit code side by side by installing 32-bit libraries in /usr/lib and 64-bit libraries in /usr/lib64. |
|
costan | 632cd0f128 |
Use 64-bit optimized code path for ARM64.
This is inspired by https://github.com/google/snappy/pull/22.
Benchmark results with the change, Pixel C with Android N2G48B:
    Benchmark       Time(ns)   CPU(ns) Iterations
    ---------------------------------------------------
    BM_UFlat/0        119544    119253       1501  818.9MB/s html
    BM_UFlat/1       1223950   1208588        163  554.0MB/s urls
    BM_UFlat/2         16081     15962      11527  7.2GB/s jpg
    BM_UFlat/3           356       352     416666  540.6MB/s jpg_200
    BM_UFlat/4         25010     24860       7683  3.8GB/s pdf
    BM_UFlat/5        484832    481572        407  811.1MB/s html4
    BM_UFlat/6        408410    408713        482  354.9MB/s txt1
    BM_UFlat/7        361714    361663        553  330.1MB/s txt2
    BM_UFlat/8       1090582   1087912        182  374.1MB/s txt3
    BM_UFlat/9       1503127   1503759        133  305.6MB/s txt4
    BM_UFlat/10       114183    114285       1715  989.6MB/s pb
    BM_UFlat/11       406714    407331        491  431.5MB/s gaviota
    BM_UIOVec/0       370397    369888        538  264.0MB/s html
    BM_UIOVec/1      3207510   3190000        100  209.9MB/s urls
    BM_UIOVec/2        16589     16573      11223  6.9GB/s jpg
    BM_UIOVec/3         1052      1052     165289  181.2MB/s jpg_200
    BM_UIOVec/4        49151     49184       3985  1.9GB/s pdf
    BM_UValidate/0     68115     68095       2893  1.4GB/s html
    BM_UValidate/1    792652    792000        250  845.4MB/s urls
    BM_UValidate/2       334       334     487804  343.1GB/s jpg
    BM_UValidate/3       235       235     666666  809.9MB/s jpg_200
    BM_UValidate/4      6126      6130      32626  15.6GB/s pdf
    BM_ZFlat/0        292697    290560        678  336.1MB/s html (22.31 %)
    BM_ZFlat/1       4062080   4050000        100  165.3MB/s urls (47.78 %)
    BM_ZFlat/2         29225     29274       6422  3.9GB/s jpg (99.95 %)
    BM_ZFlat/3          1099      1098     163934  173.7MB/s jpg_200 (73.00 %)
    BM_ZFlat/4         44117     44233       4205  2.2GB/s pdf (83.30 %)
    BM_ZFlat/5       1158058   1157894        171  337.4MB/s html4 (22.52 %)
    BM_ZFlat/6       1102983   1093922        181  132.6MB/s txt1 (57.88 %)
    BM_ZFlat/7        974142    975490        204  122.4MB/s txt2 (61.91 %)
    BM_ZFlat/8       2984670   2990000        100  136.1MB/s txt3 (54.99 %)
    BM_ZFlat/9       4100130   4090000        100  112.4MB/s txt4 (66.26 %)
    BM_ZFlat/10       276236    275139        716  411.0MB/s pb (19.68 %)
    BM_ZFlat/11       760091    759541        262  231.4MB/s gaviota (37.72 %)
Baseline benchmark results, Pixel C with Android N2G48B:
    Benchmark       Time(ns)   CPU(ns) Iterations
    ---------------------------------------------------
    BM_UFlat/0        148957    147565       1335  661.8MB/s html
    BM_UFlat/1       1527257   1500000        132  446.4MB/s urls
    BM_UFlat/2         19589     19397       8764  5.9GB/s jpg
    BM_UFlat/3           425       418     408163  455.3MB/s jpg_200
    BM_UFlat/4         30096     29552       6497  3.2GB/s pdf
    BM_UFlat/5        595933    594594        333  657.0MB/s html4
    BM_UFlat/6        516315    514360        383  282.0MB/s txt1
    BM_UFlat/7        454653    453514        441  263.2MB/s txt2
    BM_UFlat/8       1382687   1361111        144  299.0MB/s txt3
    BM_UFlat/9       1967590   1904761        105  241.3MB/s txt4
    BM_UFlat/10       148271    144560       1342  782.3MB/s pb
    BM_UFlat/11       523997    510471        382  344.4MB/s gaviota
    BM_UIOVec/0       478443    465227        417  209.9MB/s html
    BM_UIOVec/1      4172860   4060000        100  164.9MB/s urls
    BM_UIOVec/2        21470     20975       7342  5.5GB/s jpg
    BM_UIOVec/3         1357      1330      75187  143.4MB/s jpg_200
    BM_UIOVec/4        63143     61365       3031  1.6GB/s pdf
    BM_UValidate/0     86910     85125       2279  1.1GB/s html
    BM_UValidate/1   1022256   1000000        195  669.6MB/s urls
    BM_UValidate/2       420       417     400000  274.6GB/s jpg
    BM_UValidate/3       311       302     571428  630.0MB/s jpg_200
    BM_UValidate/4      7778      7584      25445  12.6GB/s pdf
    BM_ZFlat/0        469209    457547        424  213.4MB/s html (22.31 %)
    BM_ZFlat/1       5633510   5460000        100  122.6MB/s urls (47.78 %)
    BM_ZFlat/2         37896     36693       4524  3.1GB/s jpg (99.95 %)
    BM_ZFlat/3          1485      1441     123456  132.3MB/s jpg_200 (73.00 %)
    BM_ZFlat/4         74870     72775       2652  1.3GB/s pdf (83.30 %)
    BM_ZFlat/5       1857321   1785714        112  218.8MB/s html4 (22.52 %)
    BM_ZFlat/6       1538723   1492307        130  97.2MB/s txt1 (57.88 %)
    BM_ZFlat/7       1338236   1310810        148  91.1MB/s txt2 (61.91 %)
    BM_ZFlat/8       4050820   4040000        100  100.7MB/s txt3 (54.99 %)
    BM_ZFlat/9       5234940   5230000        100  87.9MB/s txt4 (66.26 %)
    BM_ZFlat/10       400309    400000        495  282.7MB/s pb (19.68 %)
    BM_ZFlat/11      1063042   1058510        188  166.1MB/s gaviota (37.72 %) |
|
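One common shape for a 64-bit optimized code path is comparing eight bytes per iteration when measuring the length of a match. The sketch below assumes GCC or Clang (for `__builtin_ctzll` and `__BYTE_ORDER__`) and a little-endian target; the macro `EXAMPLE_USE_64BIT_PATH` and the function `MatchLength` are hypothetical and are not snappy's actual names or code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical switch: take the word-at-a-time path on 64-bit little-endian
// x86-64 and ARM64 builds compiled with GCC or Clang.
#if (defined(__x86_64__) || defined(__aarch64__)) && defined(__GNUC__) && \
    defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EXAMPLE_USE_64BIT_PATH 1
#else
#define EXAMPLE_USE_64BIT_PATH 0
#endif

// Returns how many leading bytes s1 and s2 share, looking at no more than
// s2_limit - s2 bytes. Both ranges must be readable for that many bytes.
inline size_t MatchLength(const char* s1, const char* s2,
                          const char* s2_limit) {
  const size_t n = static_cast<size_t>(s2_limit - s2);
  size_t matched = 0;
#if EXAMPLE_USE_64BIT_PATH
  // Compare 8 bytes per iteration. On a little-endian machine, the lowest
  // set bit of the XOR lies in the first byte that differs.
  while (matched + 8 <= n) {
    uint64_t a, b;
    std::memcpy(&a, s1 + matched, 8);
    std::memcpy(&b, s2 + matched, 8);
    const uint64_t diff = a ^ b;
    if (diff != 0) return matched + (__builtin_ctzll(diff) >> 3);
    matched += 8;
  }
#endif
  // Byte-at-a-time tail (and the whole comparison on other targets).
  while (matched < n && s1[matched] == s2[matched]) ++matched;
  return matched;
}
```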
costan | 77c12adc19 |
Add unistd.h checks back to the CMake build.
getpagesize(), as well as its POSIX.1-2001 replacement sysconf(_SC_PAGESIZE), is declared in <unistd.h>. On Linux and OS X, including <sys/mman.h> is sufficient to get a declaration of getpagesize(). However, this is not true for the Android NDK. This CL brings back the HAVE_UNISTD_H definition and its associated header check. This also adds a HAVE_FUNC_SYSCONF definition, which checks for the presence of sysconf(). The definition can be used later to replace getpagesize() with sysconf(). |
|
costan | c8049c5827 |
Replace getpagesize() with sysconf(_SC_PAGESIZE).
getpagesize() has been removed from POSIX.1-2001. Its recommended replacement is sysconf(_SC_PAGESIZE). |
|
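A minimal page-size helper built on `sysconf(_SC_PAGESIZE)` might look like this. It assumes a POSIX target where `<unistd.h>` is available (the situation the `HAVE_UNISTD_H` and `HAVE_FUNC_SYSCONF` checks above are meant to detect, omitted here for brevity); the helper name `SystemPageSize` and the 4096-byte fallback are illustrative assumptions.

```cpp
#include <cstddef>
#include <unistd.h>  // sysconf() and _SC_PAGESIZE (POSIX)

// Hypothetical helper: returns the system page size via the POSIX.1-2001
// replacement for the legacy getpagesize(). sysconf() returns -1 on error,
// in which case the caller-supplied fallback is used instead.
inline size_t SystemPageSize(size_t fallback = 4096) {
  const long result = sysconf(_SC_PAGESIZE);
  return result > 0 ? static_cast<size_t>(result) : fallback;
}
```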
costan | 18e2f220d8 |
Add guidelines for opensource contributions.
The guidelines follow the instructions at https://opensource.google.com/docs/releasing/preparing/#CONTRIBUTING |
|
costan | f0d3237c32 |
Use _BitScanForward and _BitScanReverse on MSVC.
Based on https://github.com/google/snappy/pull/30 |
|
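A sketch of how the MSVC intrinsics can sit beside the GCC/Clang builtins. `_BitScanForward` and `_BitScanReverse` (declared in `<intrin.h>`) write the bit index through a pointer, while `__builtin_ctz` and `__builtin_clz` return it directly. The function names are illustrative, and both functions assume a nonzero argument, since the builtins are undefined for zero.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>  // _BitScanForward, _BitScanReverse
#endif

// Index of the least significant set bit. Precondition: n != 0.
inline int FindLSBSetNonZero(uint32_t n) {
#if defined(_MSC_VER)
  unsigned long where;
  _BitScanForward(&where, n);
  return static_cast<int>(where);
#else
  return __builtin_ctz(n);
#endif
}

// floor(log2(n)), i.e. the index of the most significant set bit.
// Precondition: n != 0.
inline int Log2FloorNonZero(uint32_t n) {
#if defined(_MSC_VER)
  unsigned long where;
  _BitScanReverse(&where, n);
  return static_cast<int>(where);
#else
  return 31 ^ __builtin_clz(n);  // equals 31 - clz for clz in [0, 31]
#endif
}
```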
jueminyang | 71b8f86887 | Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros. | |
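A likely motivation for the prefix is that unprefixed `PREDICT_TRUE`/`PREDICT_FALSE` macros can collide with same-named macros in projects that include snappy's headers. The definitions below are a hedged guess at a typical form built on GCC/Clang's `__builtin_expect`; the real header's conditions and spelling may differ, and `ReadByte` is a hypothetical caller.

```cpp
#include <cstddef>

// Hypothetical definitions: branch-prediction hints on GCC/Clang, and
// plain pass-through expressions elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SNAPPY_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#define SNAPPY_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
#define SNAPPY_PREDICT_FALSE(x) (x)
#define SNAPPY_PREDICT_TRUE(x) (x)
#endif

// Example use: the out-of-range branch is marked unlikely so the compiler
// can lay out the common path as straight-line code.
inline bool ReadByte(const unsigned char* data, size_t size, size_t pos,
                     unsigned char* out) {
  if (SNAPPY_PREDICT_FALSE(pos >= size)) return false;
  *out = data[pos];
  return true;
}
```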
costan | be6dc3db83 |
Redo CMake configuration.
The style was changed to match the official manual [1], the install configuration was simplified and now matches the official packaging guide [2], and the config files use the CMake-specific variable syntax ${VAR} instead of the autoconf-compatible syntax @VAR@, as documented in [3]. The public header files are declared as such (for CMake 3.3+), and the generated headers are included in the library target definition. The tests are only built if SNAPPY_BUILD_TESTS (default ON) is true, so zippy can be easily used in projects that add_subdirectory() its source code directly, instead of using find_package(). [1] https://cmake.org/cmake/help/git-master/manual/cmake-language.7.html [2] https://cmake.org/cmake/help/git-master/manual/cmake-packages.7.html [3] https://cmake.org/cmake/help/git-master/command/configure_file.html |
|
costan | e4de6ce087 |
Small improvements to open source CI configuration.
This CL fixes 64-bit Windows testing (), makes it possible to view the test output in the Travis / AppVeyor CI console while the test is running, and takes advantage of the new support for the .appveyor.yml file name to make the CI configuration less obtrusive. |
|
costan | c756f7f5d9 |
Support both static and shared library CMake builds.
This can be used to fix https://github.com/Homebrew/homebrew-core/issues/15722. |
|
costan | 038a3329b1 |
Inline DISALLOW_COPY_AND_ASSIGN.
snappy-stubs-public.h defined the DISALLOW_COPY_AND_ASSIGN macro, so the definition propagated to all translation units that included the open source headers. The macro is now inlined, thus avoiding polluting the macro environment of snappy users. |
|
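With the macro gone, each class states its copy policy directly. The sketch below uses C++11 deleted members on a hypothetical class `ExampleSink`; the commit does not say exactly what the inlined declarations look like, so treat this as one possible spelling rather than snappy's code.

```cpp
#include <type_traits>

// A typical definition of such a macro (shown only as a comment):
//   #define DISALLOW_COPY_AND_ASSIGN(T) T(const T&); void operator=(const T&)
// Defining it in a public header makes the name visible to every includer.
// Spelling the intent out inside the class avoids the macro entirely.
class ExampleSink {  // hypothetical class, for illustration only
 public:
  ExampleSink() = default;
  ~ExampleSink() = default;

  // Copying is disabled explicitly instead of via a shared macro.
  ExampleSink(const ExampleSink&) = delete;
  ExampleSink& operator=(const ExampleSink&) = delete;
};
```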
costan | a8b239c3de | snappy: Remove autoconf build configuration. | |
costan | 27671c6aec |
Clean up CMake header and type checks.
Unused macros: HAVE_DLFCN_H, HAVE_INTTYPES_H, HAVE_MEMORY_H, HAVE_STDLIB_H, HAVE_STRINGS_H, HAVE_STRING_H, HAVE_SYS_BYTESWAP_H, HAVE_SYS_STAT_H, HAVE_SYS_TYPES_H, HAVE_UNISTD_H. Used but never set macros: HAVE_LIBLZF, HAVE_LIBQUICKLZ. These only gate conditional includes. The code that takes advantage of them was removed. Unused types: ssize_t. The testing code uses HAVE_FUNC_MMAP, which was not wired into the CMake build, causing a whole test to be skipped. |
|
costan | 548501c988 |
zippy: Re-release snappy 1.1.5 as 1.1.6.
The migration from autotools to CMake in 1.1.5 wasn't as smooth as intended. The SONAME / SOVERSION were broken in both build systems, causing breakages in systems that upgraded from snappy 1.1.4 to 1.1.5, as reported in https://github.com/Homebrew/homebrew-core/issues/15274 and https://github.com/google/snappy/pull/45. |
|
costan | 513df5fb5a | Tag open source release 1.1.5. | |
costan | 5bc9c82ae3 |
Set minimum CMake version to 3.1.
The project only needs CMake 3.1 features, and some Travis CI bots have CMake 3.2.2. Therefore, requiring CMake 3.4 is inconvenient. |
|
costan | e9720a001d | Update Travis CI config, add AppVeyor for Windows CI coverage. | |
tmsriram | f24f9d2d97 |
Explicitly copy internal::wordmask to the stack array to work around a compiler
optimization with LLVM that converts const stack arrays to global arrays. This is a temporary change and should be reverted when https://reviews.llvm.org/D30759 is fixed. With PIE, accessing stack arrays is more efficient than accessing global arrays, and wordmask was moved to the stack for that reason. However, the LLVM compiler automatically converts stack arrays that it detects as constant into global arrays, and this transformation hurts PIE performance with LLVM. We are working to fix this in the LLVM compiler, via https://reviews.llvm.org/D30759, so that it does not do this conversion in PIE mode. Until that patch is finished, please consider this source change a temporary workaround to keep this array on the stack. This source change is important to allow some projects to flip the default compiler from GCC to LLVM for optimized builds. This change works for the following reason: the LLVM compiler does not convert non-const stack arrays to global arrays, and explicitly copying the elements is enough to make the compiler treat this as a non-const array. With GCC, this change does not affect code generation in any significant way; the array initialization code is slightly different, as it copies the constants directly to the stack. With LLVM, this keeps the array on the stack. No change in performance with GCC (within the noise range). With LLVM, ~0.7% improvement in optimized mode (no FDO) and ~1.75% improvement in FDO mode. |
|
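The workaround can be sketched as follows: the constant table lives in a namespace-scope array, and the function copies it element by element into a non-const local array before indexing it. `example_internal::kWordmask` and `LoadLowBytes` are hypothetical stand-ins for `internal::wordmask` and its caller, and whether a particular compiler version actually keeps the copy on the stack is not something the language guarantees.

```cpp
#include <cstdint>

namespace example_internal {
// Hypothetical stand-in for internal::wordmask: entry n keeps the low n
// bytes of a 32-bit value.
const uint32_t kWordmask[] = {0u, 0xffu, 0xffffu, 0xffffffu, 0xffffffffu};
}  // namespace example_internal

// Returns the little-endian value of the first n bytes at p (0 <= n <= 4).
// Assumes 4 bytes are readable at p.
inline uint32_t LoadLowBytes(const unsigned char* p, int n) {
  // Copy the table element by element into a non-const local array. Per the
  // commit message, LLVM did not promote non-const stack arrays to globals,
  // so this was enough to keep the table on the stack under PIE.
  uint32_t wordmask[5];
  wordmask[0] = example_internal::kWordmask[0];
  wordmask[1] = example_internal::kWordmask[1];
  wordmask[2] = example_internal::kWordmask[2];
  wordmask[3] = example_internal::kWordmask[3];
  wordmask[4] = example_internal::kWordmask[4];

  // Assemble a 32-bit little-endian word byte by byte (endian-neutral).
  const uint32_t word = static_cast<uint32_t>(p[0]) |
                        (static_cast<uint32_t>(p[1]) << 8) |
                        (static_cast<uint32_t>(p[2]) << 16) |
                        (static_cast<uint32_t>(p[3]) << 24);
  return word & wordmask[n];
}
```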
ysaed | 82deffcde7 | Remove benchmarking support for fastlz. | |
alkis | 18488d6212 |
Use 64 bit little endian on ppc64le.
This has tangible performance benefits. This lands https://github.com/google/snappy/pull/27 |
|
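A sketch of how a 64-bit little-endian fast path might be gated so that ppc64le takes it alongside x86-64 and ARM64. The macro `EXAMPLE_64BIT_LITTLE_ENDIAN` and the function `LoadLE64` are hypothetical; the check relies on the GCC/Clang predefined macros `__BYTE_ORDER__` and `__powerpc64__`, and other compilers fall through to the portable byte-by-byte path.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical feature macro: treat x86-64, ARM64 and little-endian 64-bit
// PowerPC (ppc64le) as "64-bit little-endian" targets.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ && \
    (defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__))
#define EXAMPLE_64BIT_LITTLE_ENDIAN 1
#else
#define EXAMPLE_64BIT_LITTLE_ENDIAN 0
#endif

// Loads 8 bytes at p as a little-endian 64-bit integer.
inline uint64_t LoadLE64(const unsigned char* p) {
#if EXAMPLE_64BIT_LITTLE_ENDIAN
  // Native byte order already matches: a single unaligned 8-byte load.
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
#else
  // Portable fallback: assemble the value one byte at a time.
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
#endif
}
```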
alkis | 7b9532b878 |
Improve the SSE2 macro check on Windows.
This lands https://github.com/google/snappy/pull/37 |
|
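The commit message does not show the new check, so the following is only a common pattern for detecting SSE2 across compilers: GCC and Clang define `__SSE2__`, while MSVC does not, and instead defines `_M_X64` for x64 builds and sets `_M_IX86_FP` to 2 under `/arch:SSE2` or higher for 32-bit x86. `EXAMPLE_HAVE_SSE2` and `Copy16` are hypothetical.

```cpp
// Hypothetical macro combining the GCC/Clang and MSVC ways of signaling
// that SSE2 instructions may be used.
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define EXAMPLE_HAVE_SSE2 1
#else
#define EXAMPLE_HAVE_SSE2 0
#endif

#include <cstring>
#if EXAMPLE_HAVE_SSE2
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Copies 16 bytes, using one unaligned SSE2 load/store pair when available.
inline void Copy16(const char* src, char* dst) {
#if EXAMPLE_HAVE_SSE2
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
#else
  std::memcpy(dst, src, 16);
#endif
}
```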
alkis | 7dadceea52 |
Check for the existence of sys/uio.h in autoconf build.
This lands https://github.com/google/snappy/pull/32 |
|
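A sketch of how such a check is typically consumed: when the configure step finds `<sys/uio.h>`, that header supplies `struct iovec`; otherwise the code declares a compatible struct itself. In an autoconf build `HAVE_SYS_UIO_H` would come from the generated config header; here `__has_include` stands in for that configure-time check as an assumption, and `TotalLength` is a hypothetical helper.

```cpp
#include <cstddef>

// Stand-in for the configure-time check (assumption): autoconf would
// normally define HAVE_SYS_UIO_H in a generated config header.
#if !defined(HAVE_SYS_UIO_H) && defined(__has_include)
#if __has_include(<sys/uio.h>)
#define HAVE_SYS_UIO_H 1
#endif
#endif

#if defined(HAVE_SYS_UIO_H) && HAVE_SYS_UIO_H
#include <sys/uio.h>  // struct iovec
#else
// Fallback for platforms without <sys/uio.h> (for example, Windows).
struct iovec {
  void* iov_base;
  std::size_t iov_len;
};
#endif

// Sums the lengths of an iovec array.
inline std::size_t TotalLength(const struct iovec* iov, std::size_t count) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; ++i) total += iov[i].iov_len;
  return total;
}
```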
jyrki | 83179dd8be | Remove quicklz and lzf support in benchmarks. | |
vrabaud | c8131680d0 |
Provide a CMakeLists.txt.
This lands https://github.com/google/snappy/pull/29 |
|
costan | ed3b7b242b | Clean up unused function warnings in snappy. |