mirror of https://github.com/google/snappy.git
200 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Bhargava Shastry | a58d4b03c5 | Update travis config for fuzzer builds | |
Bhargava Shastry | d71375bf8a | Add libFuzzer harnesses, a cmake option to build them | |
Chris Mumford | 156cd8939c |
Removed reference to deprecated autotools.
PiperOrigin-RevId: 253128048 |
|
Victor Costan | fe702ad2a3 |
Use GCC 9 on Travis CI
PiperOrigin-RevId: 249995900 |
|
Chris Mumford | a3e012d762 |
The snappy landing page at http://google.github.io/snappy/ is
served by [GitHub Pages](https://pages.github.com/) and lives in the gh-pages branch. This change moves the page contents to a more easily accessed Markdown file. PiperOrigin-RevId: 248561542 |
|
Chris Mumford | 4312f49315 |
Merge pull request #75 from Maikuolan:patch-1
PiperOrigin-RevId: 248558516 |
|
Chris Mumford | 407712f4c9 |
Merge pull request #76 from abyss7:patch-1
PiperOrigin-RevId: 248211389 |
|
Chris Mumford | 8c188a6c78 |
Minor typo fix in README.
PiperOrigin-RevId: 248170160 |
|
Chris Mumford | c76b053449 |
Sync TODO and comment processing with external repo.
Copybara transforms code slightly differently than MOE. One example is the TODO username stripping where Copybara produces different results than MOE did. This change moves the Copybara versions of comments to the public repository. Note: These changes didn't originate in cl/247950252. PiperOrigin-RevId: 247950252 |
|
Chris Mumford | 54b6379e9f |
Changed CMake version from 3.4 to that in CMakeLists.txt in README.
PiperOrigin-RevId: 247484946 |
|
Victor Costan | 0af4349bf0 |
Update Travis CI configuration.
The Travis configuration: 1) Installs recent versions of clang and GCC. 2) Sets up the environment so that CMake picks up the installed compilers. Previously, the pre-installed clang compiler was used instead. 3) Requests a modern macOS image that has all the headers needed by GCC. The CL also removes now-unnecessary old workarounds from the Travis configuration. PiperOrigin-RevId: 245832795 |
|
Chris Mumford | 877cc86f0e |
Fixed formatted (bash/c++) sections of README.md.
PiperOrigin-RevId: 244695986 |
|
atdt | 02cf187555 |
Remove MSan exemption for _bzhi_u32, since LLVM now handles it correctly.
This cleans up a TODO from cl/225463783 and cl/225655713. PiperOrigin-RevId: 241933185 |
|
Ivan | be831dc98c | Fix compilation | |
costan | d58cd618be | Remove MSBuild section from AppVeyor configuration. | |
nafi | c197d686a9 |
Optimize snappy compression by about 2.2%.
'jpg_200' is notably optimized by ~8%. name old time/op new time/op delta BM_UFlat/0 [html ] 41.8µs ± 0% 41.9µs ± 0% +0.33% (p=0.016 n=5+5) BM_UFlat/1 [urls ] 590µs ± 0% 590µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/2 [jpg ] 7.14µs ± 1% 7.12µs ± 1% ~ (p=0.310 n=5+5) BM_UFlat/3 [jpg_200 ] 129ns ± 0% 129ns ± 0% ~ (p=0.167 n=5+5) BM_UFlat/4 [pdf ] 8.21µs ± 0% 8.20µs ± 0% ~ (p=0.310 n=5+5) BM_UFlat/5 [html4 ] 220µs ± 1% 220µs ± 0% ~ (p=0.421 n=5+5) BM_UFlat/6 [txt1 ] 193µs ± 0% 193µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/7 [txt2 ] 171µs ± 0% 171µs ± 0% ~ (p=0.056 n=5+5) BM_UFlat/8 [txt3 ] 512µs ± 0% 511µs ± 0% ~ (p=0.310 n=5+5) BM_UFlat/9 [txt4 ] 716µs ± 0% 716µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/10 [pb ] 38.8µs ± 1% 38.8µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/11 [gaviota ] 190µs ± 0% 190µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/12 [cp ] 14.4µs ± 1% 14.4µs ± 1% ~ (p=0.151 n=5+5) BM_UFlat/13 [c ] 7.33µs ± 0% 7.32µs ± 0% ~ (p=0.690 n=5+5) BM_UFlat/14 [lsp ] 2.30µs ± 0% 2.31µs ± 1% ~ (p=0.548 n=5+5) BM_UFlat/15 [xls ] 984µs ± 0% 984µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/16 [xls_200 ] 213ns ± 0% 213ns ± 0% ~ (p=0.310 n=5+5) BM_UFlat/17 [bin ] 277µs ± 0% 278µs ± 0% ~ (p=0.690 n=5+5) BM_UFlat/18 [bin_200 ] 101ns ± 0% 102ns ± 0% ~ (p=0.190 n=5+4) BM_UFlat/19 [sum ] 29.6µs ± 0% 29.6µs ± 0% ~ (p=0.310 n=5+5) BM_UFlat/20 [man ] 2.98µs ± 1% 2.98µs ± 0% ~ (p=1.000 n=5+5) BM_UValidate/0 [html ] 33.5µs ± 0% 33.6µs ± 0% ~ (p=0.310 n=5+5) BM_UValidate/1 [urls ] 443µs ± 0% 443µs ± 0% ~ (p=0.841 n=5+5) BM_UValidate/2 [jpg ] 146ns ± 0% 146ns ± 0% ~ (p=0.222 n=5+5) BM_UValidate/3 [jpg_200 ] 95.6ns ± 0% 95.5ns ± 0% ~ (p=0.421 n=5+5) BM_UValidate/4 [pdf ] 2.92µs ± 0% 2.92µs ± 0% ~ (p=0.841 n=5+5) BM_UIOVec/0 [html ] 122µs ± 0% 122µs ± 0% ~ (p=0.548 n=5+5) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.151 n=5+5) BM_UIOVec/2 [jpg ] 7.48µs ± 5% 7.75µs ±12% ~ (p=0.690 n=5+5) BM_UIOVec/3 [jpg_200 ] 331ns ± 1% 327ns ± 1% ~ (p=0.056 n=5+5) BM_UIOVec/4 [pdf ] 12.0µs ± 0% 12.0µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/0 [html ] 
41.7µs ± 0% 41.8µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/1 [urls ] 591µs ± 0% 590µs ± 0% ~ (p=0.151 n=5+5) BM_UFlatSink/2 [jpg ] 7.18µs ± 2% 7.31µs ± 3% ~ (p=0.190 n=4+5) BM_UFlatSink/3 [jpg_200 ] 134ns ± 2% 134ns ± 2% ~ (p=1.000 n=5+5) BM_UFlatSink/4 [pdf ] 8.22µs ± 0% 8.23µs ± 0% ~ (p=0.730 n=4+5) BM_UFlatSink/5 [html4 ] 219µs ± 0% 219µs ± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/6 [txt1 ] 193µs ± 0% 193µs ± 0% ~ (p=0.095 n=5+5) BM_UFlatSink/7 [txt2 ] 171µs ± 0% 171µs ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/8 [txt3 ] 512µs ± 0% 512µs ± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/9 [txt4 ] 718µs ± 0% 718µs ± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/10 [pb ] 38.7µs ± 0% 38.6µs ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/11 [gaviota ] 191µs ± 0% 190µs ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/12 [cp ] 14.3µs ± 0% 14.4µs ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/13 [c ] 7.33µs ± 0% 7.34µs ± 1% ~ (p=0.690 n=5+5) BM_UFlatSink/14 [lsp ] 2.29µs ± 1% 2.30µs ± 1% ~ (p=0.095 n=5+5) BM_UFlatSink/15 [xls ] 981µs ± 0% 980µs ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/16 [xls_200 ] 216ns ± 1% 216ns ± 1% ~ (p=1.000 n=5+5) BM_UFlatSink/17 [bin ] 277µs ± 0% 277µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 104ns ± 0% 104ns ± 1% ~ (p=0.905 n=5+4) BM_UFlatSink/19 [sum ] 29.5µs ± 0% 29.5µs ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/20 [man ] 3.01µs ± 1% 3.01µs ± 0% ~ (p=0.730 n=5+4) BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 124µs ± 0% -1.66% (p=0.008 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 1.68ms ± 0% 1.63ms ± 0% -2.73% (p=0.008 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.6µs ± 8% 11.4µs ± 6% ~ (p=0.310 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 369ns ± 1% 340ns ± 1% -7.93% (p=0.008 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 14.9µs ± 4% 14.4µs ± 1% -3.56% (p=0.008 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 551µs ± 0% 545µs ± 0% -1.21% (p=0.008 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 540µs ± 0% 534µs ± 0% -1.15% (p=0.008 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 480µs ± 0% 475µs ± 0% -1.13% (p=0.008 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.44ms ± 0% 1.43ms ± 0% -1.14% (p=0.008 n=5+5) 
BM_ZFlat/9 [txt4 (66.26 %) ] 1.97ms ± 0% 1.95ms ± 0% -1.00% (p=0.008 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 110µs ± 0% 107µs ± 0% -2.77% (p=0.008 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 413µs ± 0% 411µs ± 0% -0.50% (p=0.008 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 46.6µs ± 1% 44.8µs ± 1% -3.89% (p=0.008 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 17.8µs ± 0% 17.5µs ± 0% -1.87% (p=0.008 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 5.62µs ± 1% 5.35µs ± 1% -4.81% (p=0.008 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 1.63ms ± 0% 1.63ms ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 393ns ± 1% 384ns ± 2% -2.45% (p=0.008 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 510µs ± 0% 503µs ± 0% -1.50% (p=0.016 n=4+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 83.2ns ± 3% 84.5ns ± 4% ~ (p=0.206 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 80.0µs ± 0% 78.3µs ± 0% -2.20% (p=0.008 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 7.79µs ± 1% 7.45µs ± 1% -4.38% (p=0.008 n=5+5) name old allocs/op new allocs/op delta BM_UFlat/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all 
samples are equal) BM_UFlat/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) 
BM_UFlatSink/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are 
equal) BM_ZFlat/19 [sum (48.96 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_UFlat/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/5 [html4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/6 [txt1 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/10 [pb ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/12 [cp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/13 [c ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/14 [lsp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/15 [xls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/16 [xls_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/17 [bin ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/19 [sum ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/20 [man ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are 
equal) BM_UIOVec/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlatSink/0 [html ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 702k ± 0% 702k ± 0% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 123k ± 0% 123k ± 0% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 410k ± 0% 410k ± 0% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 152k ± 0% 152k ± 0% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 125k ± 0% 125k ± 0% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 427k ± 0% 427k ± 0% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 482k ± 0% 482k ± 0% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 119k ± 0% 119k ± 0% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 184k ± 0% 184k ± 0% ~ (all samples are equal) BM_UFlatSink/12 [cp ] 24.6k ± 0% 24.6k ± 0% ~ (all samples are equal) BM_UFlatSink/13 [c ] 11.2k ± 0% 11.2k ± 0% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 3.72k ± 0% 3.72k ± 0% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 1.03M ± 0% 1.03M ± 0% ~ (all samples are equal) BM_UFlatSink/16 [xls_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 513k ± 0% 513k ± 0% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 38.2k ± 0% 38.2k ± 0% ~ (all samples are equal) BM_UFlatSink/20 [man ] 4.23k ± 0% 4.23k ± 0% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 30.7k ± 
0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 86.1k ± 0% 86.1k ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 57.0k ± 0% 57.0k ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 30.6k ± 0% 30.6k ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 30.7k ± 0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 30.7k ± 0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/19 [sum (48.96 %) ] 116k ± 0% 116k ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 30.6k ± 0% 30.6k ± 0% ~ (all samples are equal) name old speed new speed delta BM_UFlat/0 [html ] 2.46GB/s ± 0% 2.45GB/s ± 1% ~ (p=0.841 n=5+5) BM_UFlat/1 [urls ] 1.19GB/s ± 1% 1.20GB/s ± 1% ~ (p=0.310 n=5+5) BM_UFlat/2 [jpg ] 17.3GB/s ± 1% 17.4GB/s ± 1% ~ (p=0.310 n=5+5) BM_UFlat/3 [jpg_200 ] 1.56GB/s ± 0% 1.56GB/s ± 0% ~ (p=0.190 n=4+5) BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 0% ~ (p=0.548 n=5+5) BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 1% ~ (p=1.000 n=5+5) BM_UFlat/6 [txt1 ] 791MB/s ± 1% 791MB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlat/7 [txt2 ] 737MB/s ± 0% 738MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/8 [txt3 ] 839MB/s ± 0% 839MB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlat/9 [txt4 ] 675MB/s ± 1% 674MB/s ± 0% ~ (p=0.730 
n=5+4) BM_UFlat/10 [pb ] 3.08GB/s ± 1% 3.06GB/s ± 0% ~ (p=0.095 n=5+5) BM_UFlat/11 [gaviota ] 974MB/s ± 0% 976MB/s ± 0% ~ (p=0.238 n=5+5) BM_UFlat/12 [cp ] 1.70GB/s ± 0% 1.72GB/s ± 0% +1.07% (p=0.016 n=4+5) BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.53GB/s ± 1% ~ (p=1.000 n=5+5) BM_UFlat/14 [lsp ] 1.62GB/s ± 1% 1.62GB/s ± 1% ~ (p=1.000 n=5+5) BM_UFlat/15 [xls ] 1.05GB/s ± 1% 1.05GB/s ± 0% ~ (p=0.556 n=5+4) BM_UFlat/16 [xls_200 ] 943MB/s ± 0% 940MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/17 [bin ] 1.86GB/s ± 1% 1.86GB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlat/18 [bin_200 ] 1.99GB/s ± 0% 1.97GB/s ± 1% ~ (p=0.190 n=5+4) BM_UFlat/19 [sum ] 1.30GB/s ± 0% 1.30GB/s ± 1% ~ (p=0.151 n=5+5) BM_UFlat/20 [man ] 1.42GB/s ± 1% 1.42GB/s ± 0% ~ (p=1.000 n=5+5) BM_UValidate/0 [html ] 3.06GB/s ± 0% 3.06GB/s ± 1% ~ (p=1.000 n=5+5) BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.59GB/s ± 0% ~ (p=0.095 n=5+5) BM_UValidate/2 [jpg ] 845GB/s ± 0% 845GB/s ± 0% ~ (p=1.000 n=5+5) BM_UValidate/3 [jpg_200 ] 2.10GB/s ± 0% 2.10GB/s ± 0% ~ (p=0.310 n=5+5) BM_UValidate/4 [pdf ] 35.1GB/s ± 0% 35.1GB/s ± 1% ~ (p=0.690 n=5+5) BM_UIOVec/0 [html ] 843MB/s ± 0% 847MB/s ± 1% ~ (p=0.222 n=5+5) BM_UIOVec/1 [urls ] 652MB/s ± 1% 652MB/s ± 1% ~ (p=0.310 n=5+5) BM_UIOVec/2 [jpg ] 16.5GB/s ± 5% 16.0GB/s ±10% ~ (p=0.841 n=5+5) BM_UIOVec/3 [jpg_200 ] 606MB/s ± 1% 614MB/s ± 1% ~ (p=0.056 n=5+5) BM_UIOVec/4 [pdf ] 8.57GB/s ± 0% 8.57GB/s ± 0% ~ (p=0.343 n=4+4) BM_UFlatSink/0 [html ] 2.47GB/s ± 0% 2.45GB/s ± 0% -0.58% (p=0.016 n=5+5) BM_UFlatSink/1 [urls ] 1.19GB/s ± 0% 1.20GB/s ± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/2 [jpg ] 16.4GB/s ±19% 16.9GB/s ± 4% ~ (p=0.690 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.50GB/s ± 2% 1.50GB/s ± 2% ~ (p=1.000 n=5+5) BM_UFlatSink/4 [pdf ] 12.5GB/s ± 0% 12.5GB/s ± 0% ~ (p=0.730 n=4+5) BM_UFlatSink/5 [html4 ] 1.87GB/s ± 1% 1.88GB/s ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/6 [txt1 ] 793MB/s ± 0% 792MB/s ± 1% ~ (p=0.690 n=5+5) BM_UFlatSink/7 [txt2 ] 736MB/s ± 0% 736MB/s ± 1% ~ (p=0.841 n=5+5) BM_UFlatSink/8 [txt3 ] 839MB/s ± 0% 839MB/s 
± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/9 [txt4 ] 675MB/s ± 0% 675MB/s ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/10 [pb ] 3.07GB/s ± 0% 3.09GB/s ± 0% +0.54% (p=0.016 n=5+5) BM_UFlatSink/11 [gaviota ] 973MB/s ± 0% 971MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlatSink/12 [cp ] 1.72GB/s ± 1% 1.71GB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlatSink/13 [c ] 1.53GB/s ± 1% 1.52GB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/14 [lsp ] 1.63GB/s ± 0% 1.62GB/s ± 1% ~ (p=0.222 n=5+5) BM_UFlatSink/15 [xls ] 1.06GB/s ± 0% 1.05GB/s ± 0% ~ (p=0.111 n=4+5) BM_UFlatSink/16 [xls_200 ] 932MB/s ± 1% 928MB/s ± 1% ~ (p=0.548 n=5+5) BM_UFlatSink/17 [bin ] 1.86GB/s ± 0% 1.86GB/s ± 1% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.93GB/s ± 1% 1.94GB/s ± 1% ~ (p=0.730 n=5+4) BM_UFlatSink/19 [sum ] 1.30GB/s ± 0% 1.30GB/s ± 1% ~ (p=0.690 n=5+5) BM_UFlatSink/20 [man ] 1.41GB/s ± 1% 1.41GB/s ± 2% ~ (p=0.690 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 815MB/s ± 1% 829MB/s ± 0% +1.78% (p=0.008 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 420MB/s ± 1% 432MB/s ± 1% +2.87% (p=0.008 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.7GB/s ± 8% 10.9GB/s ± 6% ~ (p=0.421 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 544MB/s ± 2% 590MB/s ± 1% +8.41% (p=0.008 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 6.92GB/s ± 3% 7.16GB/s ± 1% +3.51% (p=0.008 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 745MB/s ± 0% 755MB/s ± 0% +1.34% (p=0.008 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 282MB/s ± 0% 285MB/s ± 1% +1.04% (p=0.008 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 262MB/s ± 0% 265MB/s ± 0% +1.22% (p=0.008 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 297MB/s ± 0% 300MB/s ± 0% +1.09% (p=0.008 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 246MB/s ± 1% 248MB/s ± 0% +0.95% (p=0.008 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.08GB/s ± 1% 1.11GB/s ± 1% +2.57% (p=0.008 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 449MB/s ± 1% 451MB/s ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 530MB/s ± 1% 552MB/s ± 0% +4.17% (p=0.008 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 628MB/s ± 1% 640MB/s ± 0% +1.85% (p=0.008 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 665MB/s ± 0% 697MB/s ± 
1% +4.71% (p=0.008 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 635MB/s ± 0% 634MB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 511MB/s ± 1% 522MB/s ± 2% +2.23% (p=0.008 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 1% 1.02GB/s ± 0% +1.67% (p=0.008 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.41GB/s ± 3% 2.37GB/s ± 4% ~ (p=0.222 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 480MB/s ± 0% 490MB/s ± 1% +2.24% (p=0.008 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 545MB/s ± 0% 569MB/s ± 1% +4.38% (p=0.008 n=5+5) |
|
costan | 3f194acb57 |
Convert DCHECK to assert.
A previous CL introduced a use of DCHECK. The open source build does not support DCHECK, and this project uses assert() instead of DCHECK. |
|
costan | 97a20b480f |
Reduce the LeftShiftOverflows() table size.
A previous CL introduced LeftShiftOverflows(), which takes a uint32 input. However, the value it operates on is guaranteed to only have 8 bits set. This CL takes advantage of this restriction to reduce the size of the static table used to compute LeftShiftOverflows(). The same methodology as the previous CL suggests a 0.6% improvement. The improvement is likely bigger on mobile CPUs that have much smaller caches. Benchmark results: name old time/op new time/op delta BM_UFlat/0 [html ] 42.5µs ± 1% 42.1µs ± 0% -0.87% (p=0.000 n=20+20) BM_UFlat/1 [urls ] 575µs ± 0% 574µs ± 0% -0.16% (p=0.000 n=20+19) BM_UFlat/2 [jpg ] 7.13µs ± 1% 7.20µs ± 5% ~ (p=0.422 n=16+19) BM_UFlat/3 [jpg_200 ] 129ns ± 0% 130ns ± 0% +0.82% (p=0.000 n=20+17) BM_UFlat/4 [pdf ] 8.22µs ± 1% 8.21µs ± 0% ~ (p=0.586 n=17+17) BM_UFlat/5 [html4 ] 222µs ± 0% 222µs ± 0% -0.11% (p=0.047 n=19+20) BM_UFlat/6 [txt1 ] 192µs ± 0% 191µs ± 0% -0.69% (p=0.000 n=20+20) BM_UFlat/7 [txt2 ] 169µs ± 0% 169µs ± 0% -0.28% (p=0.000 n=20+20) BM_UFlat/8 [txt3 ] 510µs ± 0% 507µs ± 0% -0.50% (p=0.000 n=20+20) BM_UFlat/9 [txt4 ] 707µs ± 0% 703µs ± 0% -0.53% (p=0.000 n=20+20) BM_UFlat/10 [pb ] 39.1µs ± 0% 38.5µs ± 0% -1.56% (p=0.000 n=20+20) BM_UFlat/11 [gaviota ] 189µs ± 0% 189µs ± 0% -0.42% (p=0.000 n=20+20) BM_UFlat/12 [cp ] 14.2µs ± 0% 14.2µs ± 1% -0.30% (p=0.001 n=18+19) BM_UFlat/13 [c ] 7.29µs ± 0% 7.34µs ± 1% +0.59% (p=0.000 n=19+20) BM_UFlat/14 [lsp ] 2.28µs ± 0% 2.29µs ± 1% +0.39% (p=0.000 n=19+18) BM_UFlat/15 [xls ] 905µs ± 0% 904µs ± 0% -0.12% (p=0.030 n=20+20) BM_UFlat/16 [xls_200 ] 213ns ± 2% 215ns ± 4% +0.92% (p=0.011 n=20+20) BM_UFlat/17 [bin ] 274µs ± 0% 275µs ± 0% +0.55% (p=0.000 n=20+20) BM_UFlat/18 [bin_200 ] 101ns ± 1% 101ns ± 1% ~ (p=0.913 n=18+18) BM_UFlat/19 [sum ] 27.9µs ± 1% 27.5µs ± 1% -1.38% (p=0.000 n=20+20) BM_UFlat/20 [man ] 2.97µs ± 1% 2.97µs ± 1% ~ (p=0.835 n=20+19) BM_UValidate/0 [html ] 33.5µs ± 0% 34.2µs ± 0% +2.32% (p=0.000 n=20+20) BM_UValidate/1 [urls ] 441µs ± 0% 442µs ± 0% +0.15% (p=0.010 
n=20+20) BM_UValidate/2 [jpg ] 144ns ± 0% 146ns ± 0% +1.32% (p=0.000 n=20+20) BM_UValidate/3 [jpg_200 ] 95.3ns ± 0% 96.0ns ± 0% +0.68% (p=0.000 n=20+20) BM_UValidate/4 [pdf ] 2.86µs ± 0% 2.88µs ± 1% +0.67% (p=0.000 n=19+19) BM_UIOVec/0 [html ] 122µs ± 0% 122µs ± 0% -0.25% (p=0.000 n=20+20) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.068 n=20+20) BM_UIOVec/2 [jpg ] 7.63µs ± 7% 7.76µs ±11% ~ (p=0.396 n=19+20) BM_UIOVec/3 [jpg_200 ] 325ns ± 0% 326ns ± 0% +0.27% (p=0.000 n=20+18) BM_UIOVec/4 [pdf ] 12.1µs ± 2% 12.1µs ± 3% ~ (p=0.967 n=19+20) BM_UFlatSink/0 [html ] 42.4µs ± 0% 42.1µs ± 0% -0.89% (p=0.000 n=20+20) BM_UFlatSink/1 [urls ] 575µs ± 0% 575µs ± 0% ~ (p=0.883 n=20+20) BM_UFlatSink/2 [jpg ] 7.58µs ±16% 7.52µs ±15% ~ (p=0.945 n=19+20) BM_UFlatSink/3 [jpg_200 ] 133ns ± 4% 133ns ± 4% ~ (p=0.627 n=19+20) BM_UFlatSink/4 [pdf ] 8.29µs ± 4% 8.39µs ± 4% +1.14% (p=0.013 n=19+18) BM_UFlatSink/5 [html4 ] 223µs ± 0% 222µs ± 0% -0.18% (p=0.001 n=20+20) BM_UFlatSink/6 [txt1 ] 192µs ± 0% 191µs ± 0% -0.71% (p=0.000 n=20+20) BM_UFlatSink/7 [txt2 ] 169µs ± 0% 169µs ± 0% -0.26% (p=0.000 n=20+20) BM_UFlatSink/8 [txt3 ] 510µs ± 0% 508µs ± 0% -0.50% (p=0.000 n=20+20) BM_UFlatSink/9 [txt4 ] 707µs ± 0% 704µs ± 0% -0.44% (p=0.000 n=20+20) BM_UFlatSink/10 [pb ] 39.1µs ± 0% 38.5µs ± 1% -1.62% (p=0.000 n=19+20) BM_UFlatSink/11 [gaviota ] 189µs ± 0% 189µs ± 0% -0.39% (p=0.000 n=20+20) BM_UFlatSink/12 [cp ] 14.2µs ± 0% 14.2µs ± 1% ~ (p=0.435 n=19+19) BM_UFlatSink/13 [c ] 7.29µs ± 0% 7.33µs ± 1% +0.57% (p=0.000 n=19+20) BM_UFlatSink/14 [lsp ] 2.29µs ± 0% 2.29µs ± 1% ~ (p=0.791 n=18+18) BM_UFlatSink/15 [xls ] 903µs ± 0% 902µs ± 0% -0.11% (p=0.044 n=20+19) BM_UFlatSink/16 [xls_200 ] 215ns ± 1% 215ns ± 1% ~ (p=0.885 n=19+19) BM_UFlatSink/17 [bin ] 274µs ± 0% 275µs ± 0% +0.51% (p=0.000 n=20+20) BM_UFlatSink/18 [bin_200 ] 103ns ± 2% 103ns ± 0% -0.41% (p=0.016 n=20+15) BM_UFlatSink/19 [sum ] 27.9µs ± 1% 27.5µs ± 1% -1.34% (p=0.000 n=20+19) BM_UFlatSink/20 [man ] 2.98µs ± 1% 2.97µs ± 1% ~ 
(p=0.358 n=18+19) BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 126µs ± 0% +0.14% (p=0.011 n=20+20) BM_ZFlat/1 [urls (47.78 %) ] 1.67ms ± 0% 1.67ms ± 0% +0.11% (p=0.043 n=20+20) BM_ZFlat/2 [jpg (99.95 %) ] 11.5µs ± 6% 11.7µs ± 7% ~ (p=0.142 n=20+20) BM_ZFlat/3 [jpg_200 (73.00 %)] 349ns ± 3% 351ns ± 3% ~ (p=0.573 n=18+20) BM_ZFlat/4 [pdf (83.30 %) ] 14.6µs ± 2% 14.7µs ± 4% ~ (p=0.879 n=19+20) BM_ZFlat/5 [html4 (22.52 %) ] 553µs ± 0% 552µs ± 0% -0.23% (p=0.000 n=20+20) BM_ZFlat/6 [txt1 (57.88 %) ] 540µs ± 0% 540µs ± 0% ~ (p=0.221 n=20+20) BM_ZFlat/7 [txt2 (61.91 %) ] 479µs ± 0% 481µs ± 1% +0.47% (p=0.000 n=20+20) BM_ZFlat/8 [txt3 (54.99 %) ] 1.44ms ± 0% 1.44ms ± 0% +0.13% (p=0.040 n=20+20) BM_ZFlat/9 [txt4 (66.26 %) ] 1.97ms ± 0% 1.97ms ± 0% +0.16% (p=0.009 n=20+20) BM_ZFlat/10 [pb (19.68 %) ] 110µs ± 1% 109µs ± 1% -0.79% (p=0.000 n=20+20) BM_ZFlat/11 [gaviota (37.72 %)] 410µs ± 0% 410µs ± 0% ~ (p=0.149 n=20+19) BM_ZFlat/12 [cp (48.12 %) ] 45.4µs ± 1% 44.9µs ± 1% -1.23% (p=0.000 n=20+20) BM_ZFlat/13 [c (42.47 %) ] 17.5µs ± 0% 17.5µs ± 1% ~ (p=0.883 n=20+20) BM_ZFlat/14 [lsp (48.37 %) ] 5.51µs ± 1% 5.46µs ± 1% -0.95% (p=0.000 n=20+18) BM_ZFlat/15 [xls (41.23 %) ] 1.61ms ± 0% 1.62ms ± 0% ~ (p=0.183 n=20+20) BM_ZFlat/16 [xls_200 (78.00 %)] 389ns ± 2% 391ns ± 3% ~ (p=0.740 n=18+20) BM_ZFlat/17 [bin (18.11 %) ] 508µs ± 0% 508µs ± 0% ~ (p=0.779 n=20+20) BM_ZFlat/18 [bin_200 (7.50 %) ] 87.4ns ± 5% 88.1ns ± 8% ~ (p=0.367 n=16+19) BM_ZFlat/19 [sum (48.96 %) ] 79.1µs ± 0% 80.2µs ± 0% +1.39% (p=0.000 n=20+20) BM_ZFlat/20 [man (59.21 %) ] 7.55µs ± 1% 7.57µs ± 1% +0.31% (p=0.025 n=19+19) name old speed new speed delta BM_UFlat/0 [html ] 2.42GB/s ± 0% 2.44GB/s ± 0% +0.77% (p=0.000 n=19+19) BM_UFlat/1 [urls ] 1.22GB/s ± 0% 1.23GB/s ± 0% +0.06% (p=0.000 n=20+19) BM_UFlat/2 [jpg ] 17.3GB/s ± 2% 17.2GB/s ± 4% ~ (p=0.433 n=17+19) BM_UFlat/3 [jpg_200 ] 1.56GB/s ± 0% 1.54GB/s ± 0% -0.82% (p=0.000 n=20+20) BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 1% ~ (p=0.322 n=17+17) BM_UFlat/5 [html4 ] 
1.85GB/s ± 0% 1.85GB/s ± 0% +0.16% (p=0.000 n=20+20) BM_UFlat/6 [txt1 ] 794MB/s ± 0% 800MB/s ± 0% +0.68% (p=0.000 n=18+20) BM_UFlat/7 [txt2 ] 741MB/s ± 0% 743MB/s ± 0% +0.30% (p=0.000 n=19+19) BM_UFlat/8 [txt3 ] 840MB/s ± 0% 844MB/s ± 0% +0.53% (p=0.000 n=18+20) BM_UFlat/9 [txt4 ] 684MB/s ± 0% 688MB/s ± 0% +0.57% (p=0.000 n=20+17) BM_UFlat/10 [pb ] 3.04GB/s ± 0% 3.09GB/s ± 0% +1.60% (p=0.000 n=19+20) BM_UFlat/11 [gaviota ] 977MB/s ± 0% 981MB/s ± 0% +0.45% (p=0.000 n=19+19) BM_UFlat/12 [cp ] 1.74GB/s ± 0% 1.74GB/s ± 0% +0.29% (p=0.000 n=20+19) BM_UFlat/13 [c ] 1.53GB/s ± 0% 1.52GB/s ± 1% -0.56% (p=0.000 n=19+20) BM_UFlat/14 [lsp ] 1.64GB/s ± 0% 1.63GB/s ± 1% -0.38% (p=0.000 n=19+20) BM_UFlat/15 [xls ] 1.14GB/s ± 0% 1.14GB/s ± 0% +0.11% (p=0.000 n=19+20) BM_UFlat/16 [xls_200 ] 941MB/s ± 1% 931MB/s ± 4% -1.02% (p=0.001 n=19+20) BM_UFlat/17 [bin ] 1.88GB/s ± 0% 1.87GB/s ± 0% -0.51% (p=0.000 n=20+20) BM_UFlat/18 [bin_200 ] 1.98GB/s ± 0% 1.98GB/s ± 1% ~ (p=0.767 n=18+18) BM_UFlat/19 [sum ] 1.37GB/s ± 0% 1.39GB/s ± 0% +1.46% (p=0.000 n=20+20) BM_UFlat/20 [man ] 1.43GB/s ± 0% 1.43GB/s ± 0% ~ (p=0.501 n=18+18) BM_UValidate/0 [html ] 3.07GB/s ± 0% 3.00GB/s ± 0% -2.25% (p=0.000 n=20+20) BM_UValidate/1 [urls ] 1.60GB/s ± 0% 1.59GB/s ± 0% -0.11% (p=0.000 n=18+19) BM_UValidate/2 [jpg ] 859GB/s ± 0% 848GB/s ± 0% -1.29% (p=0.000 n=20+19) BM_UValidate/3 [jpg_200 ] 2.10GB/s ± 0% 2.09GB/s ± 0% -0.68% (p=0.000 n=19+20) BM_UValidate/4 [pdf ] 35.9GB/s ± 0% 35.6GB/s ± 1% -0.71% (p=0.000 n=20+20) BM_UIOVec/0 [html ] 843MB/s ± 0% 844MB/s ± 0% +0.21% (p=0.000 n=20+20) BM_UIOVec/1 [urls ] 651MB/s ± 0% 650MB/s ± 0% -0.10% (p=0.000 n=20+20) BM_UIOVec/2 [jpg ] 16.2GB/s ± 6% 16.0GB/s ±10% ~ (p=0.380 n=19+20) BM_UIOVec/3 [jpg_200 ] 617MB/s ± 0% 615MB/s ± 0% -0.24% (p=0.000 n=20+17) BM_UIOVec/4 [pdf ] 8.52GB/s ± 3% 8.50GB/s ± 3% ~ (p=0.771 n=19+20) BM_UFlatSink/0 [html ] 2.42GB/s ± 0% 2.44GB/s ± 0% +0.93% (p=0.000 n=20+20) BM_UFlatSink/1 [urls ] 1.23GB/s ± 0% 1.23GB/s ± 0% +0.04% (p=0.006 n=20+20) 
BM_UFlatSink/2 [jpg ] 16.4GB/s ±14% 16.5GB/s ±13% ~ (p=0.879 n=19+20) BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 4% 1.51GB/s ± 4% ~ (p=0.874 n=18+20) BM_UFlatSink/4 [pdf ] 12.4GB/s ± 4% 12.3GB/s ± 4% -1.11% (p=0.016 n=19+18) BM_UFlatSink/5 [html4 ] 1.85GB/s ± 0% 1.85GB/s ± 0% +0.20% (p=0.000 n=20+20) BM_UFlatSink/6 [txt1 ] 794MB/s ± 0% 799MB/s ± 0% +0.72% (p=0.000 n=19+20) BM_UFlatSink/7 [txt2 ] 741MB/s ± 0% 743MB/s ± 0% +0.30% (p=0.000 n=18+20) BM_UFlatSink/8 [txt3 ] 839MB/s ± 0% 843MB/s ± 0% +0.52% (p=0.000 n=20+18) BM_UFlatSink/9 [txt4 ] 684MB/s ± 0% 687MB/s ± 0% +0.46% (p=0.000 n=20+20) BM_UFlatSink/10 [pb ] 3.04GB/s ± 0% 3.09GB/s ± 0% +1.71% (p=0.000 n=20+19) BM_UFlatSink/11 [gaviota ] 976MB/s ± 0% 980MB/s ± 0% +0.45% (p=0.000 n=20+20) BM_UFlatSink/12 [cp ] 1.74GB/s ± 1% 1.74GB/s ± 1% ~ (p=0.904 n=20+20) BM_UFlatSink/13 [c ] 1.53GB/s ± 0% 1.53GB/s ± 1% -0.50% (p=0.000 n=19+20) BM_UFlatSink/14 [lsp ] 1.63GB/s ± 1% 1.63GB/s ± 1% ~ (p=0.358 n=19+18) BM_UFlatSink/15 [xls ] 1.14GB/s ± 0% 1.15GB/s ± 0% +0.12% (p=0.000 n=20+20) BM_UFlatSink/16 [xls_200 ] 931MB/s ± 1% 931MB/s ± 1% ~ (p=0.686 n=19+19) BM_UFlatSink/17 [bin ] 1.88GB/s ± 0% 1.87GB/s ± 0% -0.53% (p=0.000 n=20+20) BM_UFlatSink/18 [bin_200 ] 1.94GB/s ± 2% 1.95GB/s ± 1% +0.42% (p=0.014 n=20+15) BM_UFlatSink/19 [sum ] 1.37GB/s ± 0% 1.39GB/s ± 0% +1.38% (p=0.000 n=19+18) BM_UFlatSink/20 [man ] 1.42GB/s ± 1% 1.43GB/s ± 0% ~ (p=0.284 n=18+19) BM_ZFlat/0 [html (22.31 %) ] 815MB/s ± 0% 814MB/s ± 0% -0.15% (p=0.000 n=20+20) BM_ZFlat/1 [urls (47.78 %) ] 423MB/s ± 0% 422MB/s ± 0% -0.14% (p=0.000 n=20+20) BM_ZFlat/2 [jpg (99.95 %) ] 10.8GB/s ± 5% 10.6GB/s ± 7% ~ (p=0.142 n=20+20) BM_ZFlat/3 [jpg_200 (73.00 %)] 574MB/s ± 2% 572MB/s ± 2% ~ (p=0.613 n=18+20) BM_ZFlat/4 [pdf (83.30 %) ] 7.01GB/s ± 2% 7.01GB/s ± 4% ~ (p=0.593 n=18+20) BM_ZFlat/5 [html4 (22.52 %) ] 743MB/s ± 0% 745MB/s ± 0% +0.25% (p=0.000 n=20+19) BM_ZFlat/6 [txt1 (57.88 %) ] 283MB/s ± 0% 282MB/s ± 0% ~ (p=0.261 n=18+19) BM_ZFlat/7 [txt2 (61.91 %) ] 262MB/s ± 0% 
261MB/s ± 0% -0.35% (p=0.000 n=20+19) BM_ZFlat/8 [txt3 (54.99 %) ] 298MB/s ± 0% 297MB/s ± 0% -0.11% (p=0.000 n=20+19) BM_ZFlat/9 [txt4 (66.26 %) ] 245MB/s ± 0% 245MB/s ± 0% -0.13% (p=0.000 n=19+20) BM_ZFlat/10 [pb (19.68 %) ] 1.08GB/s ± 0% 1.09GB/s ± 0% +0.82% (p=0.000 n=18+19) BM_ZFlat/11 [gaviota (37.72 %)] 451MB/s ± 0% 451MB/s ± 0% -0.05% (p=0.004 n=19+20) BM_ZFlat/12 [cp (48.12 %) ] 543MB/s ± 1% 550MB/s ± 1% +1.24% (p=0.000 n=20+20) BM_ZFlat/13 [c (42.47 %) ] 638MB/s ± 0% 637MB/s ± 0% ~ (p=0.708 n=19+19) BM_ZFlat/14 [lsp (48.37 %) ] 678MB/s ± 2% 684MB/s ± 1% +0.89% (p=0.000 n=20+19) BM_ZFlat/15 [xls (41.23 %) ] 640MB/s ± 0% 640MB/s ± 0% -0.10% (p=0.000 n=19+19) BM_ZFlat/16 [xls_200 (78.00 %)] 515MB/s ± 2% 514MB/s ± 3% ~ (p=0.916 n=18+19) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.01GB/s ± 0% +0.03% (p=0.033 n=20+20) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.30GB/s ± 6% 2.28GB/s ± 9% ~ (p=0.502 n=16+19) BM_ZFlat/19 [sum (48.96 %) ] 485MB/s ± 0% 478MB/s ± 0% -1.39% (p=0.000 n=19+20) BM_ZFlat/20 [man (59.21 %) ] 562MB/s ± 1% 560MB/s ± 1% -0.37% (p=0.016 n=18+19) |
|
costan | 4f0adca400 |
Wrap BMI2 instruction usage in support checks.
A previous version of this was submitted and rolled back due to breakage -- an attempt to accommodate Visual Studio resulted in compiler errors on GCC/Clang with -mavx2 but without -mbmi2. This version makes the BMI2 support check more strict, to avoid the errors. A previous CL introduced _bzhi_u32 (part of Intel's BMI2 instruction set, released in Haswell) gated by a check for the __BMI2__ preprocessor macro. This works for Clang and GCC, but does not work on Visual Studio, and may not work on other compilers. This CL plumbs the BMI2 support checks through the CMake configuration used by the open source build. It also replaces the <x86intrin.h> header, which does not exist on Visual Studio, with the more scoped headers <tmmintrin.h> (for SSSE3) and <immintrin.h> (for BMI2/AVX2). Aside from fixing the open source build, the more scoped headers make it slightly less likely that newer intrinsics will creep in without proper gating. |
|
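The gating described in the commit above can be sketched roughly as follows. This is an illustration only, not the project's code: `EXAMPLE_HAVE_BMI2` is a hypothetical macro standing in for whatever the build configuration defines once it has confirmed that the compiler can build BMI2 intrinsics, and `ExtractLowBytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL macro: assumed to be set to 1 by the build system only after
// it has verified that BMI2 intrinsics compile with the current flags.
#if defined(EXAMPLE_HAVE_BMI2) && EXAMPLE_HAVE_BMI2
#include <immintrin.h>  // scoped header that declares _bzhi_u32
#endif

// Returns the low `n` bytes of `v`, for n in [0, 4] (hypothetical helper).
inline std::uint32_t ExtractLowBytes(std::uint32_t v, int n) {
  assert(n >= 0 && n <= 4);
#if defined(EXAMPLE_HAVE_BMI2) && EXAMPLE_HAVE_BMI2
  // BZHI zeroes all bits of v at positions >= 8 * n; an index of 32 or more
  // leaves v unchanged.
  return _bzhi_u32(v, static_cast<unsigned>(8 * n));
#else
  // Portable fallback used when the BMI2 check did not pass.
  static const std::uint32_t kMasks[] = {0u, 0xFFu, 0xFFFFu, 0xFFFFFFu,
                                         0xFFFFFFFFu};
  return v & kMasks[n];
#endif
}
```

Keeping the `#include` inside the same guard as the intrinsic call means a compiler without BMI2 support never sees either, which is the point of the stricter check.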
nafi | 46768e335d |
Optimize decompression by about 0.82%.
Assembly difference: https://godbolt.org/z/cvlH9b name old time/op new time/op delta BM_UFlat/0 [html ] 42.3µs ± 0% 42.5µs ± 0% +0.57% (p=0.008 n=5+5) BM_UFlat/1 [urls ] 590µs ± 0% 575µs ± 0% -2.60% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 7.16µs ± 1% 7.15µs ± 1% ~ (p=0.841 n=5+5) BM_UFlat/3 [jpg_200 ] 131ns ± 0% 129ns ± 0% -1.41% (p=0.008 n=5+5) BM_UFlat/4 [pdf ] 8.21µs ± 0% 8.22µs ± 1% ~ (p=0.690 n=5+5) BM_UFlat/5 [html4 ] 222µs ± 0% 223µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/6 [txt1 ] 193µs ± 0% 192µs ± 0% ~ (p=0.095 n=5+5) BM_UFlat/7 [txt2 ] 171µs ± 0% 169µs ± 0% -0.83% (p=0.008 n=5+5) BM_UFlat/8 [txt3 ] 511µs ± 0% 510µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/9 [txt4 ] 717µs ± 0% 707µs ± 0% -1.42% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 38.8µs ± 0% 39.3µs ± 0% +1.26% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 190µs ± 0% 189µs ± 0% -0.43% (p=0.032 n=5+5) BM_UFlat/12 [cp ] 14.3µs ± 0% 14.2µs ± 0% -0.92% (p=0.008 n=5+5) BM_UFlat/13 [c ] 7.35µs ± 1% 7.30µs ± 0% -0.66% (p=0.032 n=5+5) BM_UFlat/14 [lsp ] 2.30µs ± 1% 2.28µs ± 0% ~ (p=0.056 n=5+5) BM_UFlat/15 [xls ] 983µs ± 0% 904µs ± 0% -7.99% (p=0.008 n=5+5) BM_UFlat/16 [xls_200 ] 213ns ± 0% 213ns ± 1% ~ (p=0.690 n=5+5) BM_UFlat/17 [bin ] 278µs ± 0% 274µs ± 0% -1.56% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 101ns ± 0% 101ns ± 1% ~ (p=1.000 n=5+5) BM_UFlat/19 [sum ] 29.4µs ± 1% 28.0µs ± 1% -4.98% (p=0.008 n=5+5) BM_UFlat/20 [man ] 2.97µs ± 0% 2.97µs ± 0% ~ (p=0.421 n=5+5) BM_UValidate/0 [html ] 33.6µs ± 0% 33.6µs ± 0% ~ (p=0.548 n=5+5) BM_UValidate/1 [urls ] 443µs ± 0% 441µs ± 0% -0.43% (p=0.016 n=4+5) BM_UValidate/2 [jpg ] 146ns ± 0% 144ns ± 0% -1.63% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 98.6ns ± 0% 95.3ns ± 0% -3.32% (p=0.008 n=5+5) BM_UValidate/4 [pdf ] 2.89µs ± 1% 2.85µs ± 0% -1.22% (p=0.008 n=5+5) BM_UIOVec/0 [html ] 122µs ± 0% 122µs ± 0% ~ (p=1.000 n=5+5) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.095 n=5+5) BM_UIOVec/2 [jpg ] 7.51µs ± 4% 7.69µs ± 6% ~ (p=0.421 n=5+5) BM_UIOVec/3 [jpg_200 ] 327ns ± 0% 327ns ± 1% ~ (p=0.730 n=4+5) 
BM_UIOVec/4 [pdf ] 12.0µs ± 1% 12.0µs ± 0% ~ (p=0.286 n=5+4) BM_UFlatSink/0 [html ] 42.3µs ± 0% 42.5µs ± 0% +0.46% (p=0.008 n=5+5) BM_UFlatSink/1 [urls ] 589µs ± 0% 575µs ± 0% -2.36% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 7.40µs ± 8% 7.74µs ± 9% ~ (p=0.310 n=5+5) BM_UFlatSink/3 [jpg_200 ] 134ns ± 0% 131ns ± 0% -1.78% (p=0.008 n=5+5) BM_UFlatSink/4 [pdf ] 8.28µs ± 3% 8.35µs ± 6% ~ (p=0.548 n=5+5) BM_UFlatSink/5 [html4 ] 222µs ± 0% 222µs ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/6 [txt1 ] 193µs ± 0% 192µs ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/7 [txt2 ] 171µs ± 0% 169µs ± 0% -0.91% (p=0.008 n=5+5) BM_UFlatSink/8 [txt3 ] 512µs ± 0% 510µs ± 0% -0.28% (p=0.032 n=5+5) BM_UFlatSink/9 [txt4 ] 717µs ± 0% 707µs ± 0% -1.32% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 38.7µs ± 0% 39.2µs ± 0% +1.29% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 190µs ± 0% 189µs ± 0% -0.47% (p=0.008 n=5+5) BM_UFlatSink/12 [cp ] 14.3µs ± 0% 14.2µs ± 0% -0.65% (p=0.008 n=5+5) BM_UFlatSink/13 [c ] 7.36µs ± 1% 7.29µs ± 0% -0.92% (p=0.008 n=5+5) BM_UFlatSink/14 [lsp ] 2.30µs ± 1% 2.29µs ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/15 [xls ] 980µs ± 0% 903µs ± 0% -7.92% (p=0.008 n=5+5) BM_UFlatSink/16 [xls_200 ] 217ns ± 0% 215ns ± 0% -0.94% (p=0.008 n=5+5) BM_UFlatSink/17 [bin ] 278µs ± 0% 273µs ± 0% -1.56% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 107ns ± 5% 104ns ± 0% ~ (p=0.056 n=5+5) BM_UFlatSink/19 [sum ] 29.5µs ± 0% 27.9µs ± 0% -5.32% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 3.01µs ± 0% 3.00µs ± 1% ~ (p=0.310 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 127µs ± 0% 126µs ± 0% -0.46% (p=0.008 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 1.67ms ± 0% 1.67ms ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.5µs ± 3% 11.6µs ± 6% ~ (p=0.841 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 350ns ± 2% 347ns ± 0% ~ (p=0.905 n=5+4) BM_ZFlat/4 [pdf (83.30 %) ] 14.6µs ± 4% 14.6µs ± 1% ~ (p=0.421 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 553µs ± 0% 553µs ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 540µs ± 0% 540µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/7 [txt2 
(61.91 %) ] 481µs ± 0% 479µs ± 0% -0.54% (p=0.008 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.44ms ± 0% 1.44ms ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 1.97ms ± 0% 1.97ms ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 110µs ± 0% 110µs ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 411µs ± 0% 410µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 46.1µs ± 1% 45.8µs ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 17.6µs ± 0% 17.6µs ± 1% ~ (p=0.310 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 5.46µs ± 1% 5.49µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 1.62ms ± 0% 1.61ms ± 0% ~ (p=0.190 n=4+5) BM_ZFlat/16 [xls_200 (78.00 %)] 392ns ± 2% 385ns ± 1% ~ (p=0.200 n=4+4) BM_ZFlat/17 [bin (18.11 %) ] 509µs ± 0% 508µs ± 0% -0.26% (p=0.008 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 90.2ns ±15% 80.8ns ± 0% -10.39% (p=0.016 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 81.1µs ± 0% 79.1µs ± 1% -2.37% (p=0.008 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 7.61µs ± 1% 7.57µs ± 1% ~ (p=0.421 n=5+5) name old allocs/op new allocs/op delta BM_UFlat/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/14 [lsp ] 0.00 
±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlat/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UValidate/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are 
equal) BM_UFlatSink/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_UFlatSink/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are 
equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/19 [sum (48.96 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal) name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta BM_UFlat/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/5 [html4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/6 [txt1 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/7 [txt2 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/8 [txt3 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/9 [txt4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/10 [pb ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/11 [gaviota ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/12 [cp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/13 [c ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/14 [lsp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/15 [xls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/16 [xls_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/17 [bin ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/18 [bin_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/19 [sum ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlat/20 [man ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UValidate/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all 
samples are equal) BM_UIOVec/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal) BM_UFlatSink/0 [html ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/1 [urls ] 702k ± 0% 702k ± 0% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 123k ± 0% 123k ± 0% ~ (all samples are equal) BM_UFlatSink/3 [jpg_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 102k ± 0% 102k ± 0% ~ (all samples are equal) BM_UFlatSink/5 [html4 ] 410k ± 0% 410k ± 0% ~ (all samples are equal) BM_UFlatSink/6 [txt1 ] 152k ± 0% 152k ± 0% ~ (all samples are equal) BM_UFlatSink/7 [txt2 ] 125k ± 0% 125k ± 0% ~ (all samples are equal) BM_UFlatSink/8 [txt3 ] 427k ± 0% 427k ± 0% ~ (all samples are equal) BM_UFlatSink/9 [txt4 ] 482k ± 0% 482k ± 0% ~ (all samples are equal) BM_UFlatSink/10 [pb ] 119k ± 0% 119k ± 0% ~ (all samples are equal) BM_UFlatSink/11 [gaviota ] 184k ± 0% 184k ± 0% ~ (all samples are equal) BM_UFlatSink/12 [cp ] 24.6k ± 0% 24.6k ± 0% ~ (all samples are equal) BM_UFlatSink/13 [c ] 11.2k ± 0% 11.2k ± 0% ~ (all samples are equal) BM_UFlatSink/14 [lsp ] 3.72k ± 0% 3.72k ± 0% ~ (all samples are equal) BM_UFlatSink/15 [xls ] 1.03M ± 0% 1.03M ± 0% ~ (all samples are equal) BM_UFlatSink/16 [xls_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/17 [bin ] 513k ± 0% 513k ± 0% ~ (all samples are equal) BM_UFlatSink/18 [bin_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal) BM_UFlatSink/19 [sum ] 38.2k ± 0% 38.2k ± 0% ~ (all samples are equal) BM_UFlatSink/20 [man ] 4.23k ± 0% 4.23k ± 0% ~ (all samples are equal) BM_ZFlat/0 [html (22.31 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/1 [urls (47.78 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/2 [jpg (99.95 %) ] 175k ± 
0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/3 [jpg_200 (73.00 %)] 30.7k ± 0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/4 [pdf (83.30 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/5 [html4 (22.52 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/6 [txt1 (57.88 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/7 [txt2 (61.91 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/8 [txt3 (54.99 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/9 [txt4 (66.26 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/10 [pb (19.68 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/11 [gaviota (37.72 %)] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/12 [cp (48.12 %) ] 86.1k ± 0% 86.1k ± 0% ~ (all samples are equal) BM_ZFlat/13 [c (42.47 %) ] 57.0k ± 0% 57.0k ± 0% ~ (all samples are equal) BM_ZFlat/14 [lsp (48.37 %) ] 30.6k ± 0% 30.6k ± 0% ~ (all samples are equal) BM_ZFlat/15 [xls (41.23 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/16 [xls_200 (78.00 %)] 30.7k ± 0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/17 [bin (18.11 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal) BM_ZFlat/18 [bin_200 (7.50 %) ] 30.7k ± 0% 30.7k ± 0% ~ (all samples are equal) BM_ZFlat/19 [sum (48.96 %) ] 116k ± 0% 116k ± 0% ~ (all samples are equal) BM_ZFlat/20 [man (59.21 %) ] 30.6k ± 0% 30.6k ± 0% ~ (all samples are equal) name old speed new speed delta BM_UFlat/0 [html ] 2.43GB/s ± 0% 2.41GB/s ± 0% -0.59% (p=0.032 n=5+5) BM_UFlat/1 [urls ] 1.19GB/s ± 1% 1.22GB/s ± 0% +2.58% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 17.2GB/s ± 1% 17.3GB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/3 [jpg_200 ] 1.54GB/s ± 1% 1.56GB/s ± 1% +1.23% (p=0.008 n=5+5) BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 0% ~ (p=0.413 n=5+4) BM_UFlat/5 [html4 ] 1.85GB/s ± 1% 1.85GB/s ± 0% ~ (p=0.690 n=5+5) BM_UFlat/6 [txt1 ] 793MB/s ± 0% 794MB/s ± 0% ~ (p=0.690 n=5+5) BM_UFlat/7 [txt2 ] 738MB/s ± 0% 742MB/s ± 1% ~ (p=0.151 n=5+5) BM_UFlat/8 [txt3 ] 
839MB/s ± 0% 838MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlat/9 [txt4 ] 674MB/s ± 0% 684MB/s ± 0% +1.55% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 3.07GB/s ± 1% 3.03GB/s ± 1% -1.27% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 974MB/s ± 0% 978MB/s ± 0% +0.50% (p=0.032 n=5+5) BM_UFlat/12 [cp ] 1.72GB/s ± 0% 1.74GB/s ± 1% +0.79% (p=0.008 n=5+5) BM_UFlat/13 [c ] 1.52GB/s ± 1% 1.53GB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/14 [lsp ] 1.62GB/s ± 1% 1.64GB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/15 [xls ] 1.05GB/s ± 0% 1.14GB/s ± 1% +8.60% (p=0.008 n=5+5) BM_UFlat/16 [xls_200 ] 942MB/s ± 0% 941MB/s ± 1% ~ (p=0.690 n=5+5) BM_UFlat/17 [bin ] 1.85GB/s ± 0% 1.88GB/s ± 0% +1.60% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 1.99GB/s ± 0% 1.99GB/s ± 0% ~ (p=0.421 n=5+5) BM_UFlat/19 [sum ] 1.30GB/s ± 1% 1.37GB/s ± 1% +5.28% (p=0.008 n=5+5) BM_UFlat/20 [man ] 1.43GB/s ± 1% 1.42GB/s ± 0% ~ (p=0.421 n=5+5) BM_UValidate/0 [html ] 3.07GB/s ± 0% 3.05GB/s ± 1% ~ (p=0.222 n=5+5) BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.60GB/s ± 0% ~ (p=0.310 n=5+5) BM_UValidate/2 [jpg ] 845GB/s ± 0% 860GB/s ± 0% +1.75% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 2.04GB/s ± 1% 2.11GB/s ± 1% +3.61% (p=0.008 n=5+5) BM_UValidate/4 [pdf ] 35.6GB/s ± 1% 36.1GB/s ± 1% +1.40% (p=0.016 n=5+5) BM_UIOVec/0 [html ] 845MB/s ± 1% 843MB/s ± 1% ~ (p=0.310 n=5+5) BM_UIOVec/1 [urls ] 653MB/s ± 0% 651MB/s ± 1% ~ (p=0.190 n=4+5) BM_UIOVec/2 [jpg ] 16.4GB/s ± 4% 16.1GB/s ± 5% ~ (p=0.548 n=5+5) BM_UIOVec/3 [jpg_200 ] 611MB/s ± 2% 614MB/s ± 0% ~ (p=0.548 n=5+5) BM_UIOVec/4 [pdf ] 8.53GB/s ± 1% 8.52GB/s ± 3% ~ (p=0.841 n=5+5) BM_UFlatSink/0 [html ] 2.43GB/s ± 1% 2.42GB/s ± 0% ~ (p=0.222 n=5+5) BM_UFlatSink/1 [urls ] 1.20GB/s ± 0% 1.23GB/s ± 1% +2.38% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 16.7GB/s ± 8% 16.0GB/s ± 8% ~ (p=0.151 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.50GB/s ± 0% 1.53GB/s ± 0% +2.13% (p=0.008 n=5+5) BM_UFlatSink/4 [pdf ] 12.5GB/s ± 0% 12.3GB/s ± 5% ~ (p=0.730 n=4+5) BM_UFlatSink/5 [html4 ] 1.85GB/s ± 0% 1.84GB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlatSink/6 [txt1 ] 
791MB/s ± 0% 791MB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/7 [txt2 ] 735MB/s ± 0% 739MB/s ± 0% +0.51% (p=0.016 n=5+4) BM_UFlatSink/8 [txt3 ] 838MB/s ± 0% 840MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlatSink/9 [txt4 ] 674MB/s ± 0% 683MB/s ± 0% +1.37% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 3.07GB/s ± 0% 3.03GB/s ± 1% -1.34% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 973MB/s ± 0% 975MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/12 [cp ] 1.73GB/s ± 1% 1.74GB/s ± 1% ~ (p=0.056 n=5+5) BM_UFlatSink/13 [c ] 1.52GB/s ± 1% 1.53GB/s ± 1% +0.76% (p=0.032 n=5+5) BM_UFlatSink/14 [lsp ] 1.62GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.548 n=5+5) BM_UFlatSink/15 [xls ] 1.05GB/s ± 0% 1.14GB/s ± 0% +8.57% (p=0.008 n=5+5) BM_UFlatSink/16 [xls_200 ] 925MB/s ± 0% 933MB/s ± 0% +0.85% (p=0.008 n=5+5) BM_UFlatSink/17 [bin ] 1.85GB/s ± 1% 1.88GB/s ± 0% +1.47% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.88GB/s ± 5% 1.93GB/s ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/19 [sum ] 1.30GB/s ± 1% 1.37GB/s ± 1% +5.18% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 1.41GB/s ± 0% 1.41GB/s ± 1% ~ (p=0.222 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 809MB/s ± 0% 814MB/s ± 1% +0.61% (p=0.016 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 423MB/s ± 0% 422MB/s ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.8GB/s ± 3% 10.6GB/s ± 5% ~ (p=0.690 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 575MB/s ± 2% 579MB/s ± 0% ~ (p=1.000 n=5+4) BM_ZFlat/4 [pdf (83.30 %) ] 7.06GB/s ± 4% 7.05GB/s ± 2% ~ (p=0.421 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 745MB/s ± 0% 744MB/s ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 282MB/s ± 0% 282MB/s ± 1% ~ (p=1.000 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 261MB/s ± 0% 263MB/s ± 0% +0.55% (p=0.032 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 297MB/s ± 1% 297MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 245MB/s ± 0% 246MB/s ± 0% ~ (p=0.286 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.08GB/s ± 1% 1.08GB/s ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 450MB/s ± 0% 452MB/s ± 0% +0.55% (p=0.016 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 537MB/s ± 1% 538MB/s ± 
0% ~ (p=0.421 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 637MB/s ± 1% 634MB/s ± 1% ~ (p=0.222 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 684MB/s ± 1% 680MB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 641MB/s ± 0% 640MB/s ± 1% ~ (p=0.310 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 501MB/s ± 9% 521MB/s ± 1% ~ (p=0.111 n=5+4) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.151 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.24GB/s ±14% 2.48GB/s ± 0% ~ (p=0.063 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 473MB/s ± 1% 485MB/s ± 1% +2.47% (p=0.008 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 558MB/s ± 1% 558MB/s ± 1% ~ (p=1.000 n=5+5) |
|
costan | fdba21ffd6 |
Fix typo in two argument names in stubs.
The stubs are only used in the open source version, so the typo wasn't caught in internal tests. |
|
costan | 81d444e4e4 |
Remove direct use of __builtin_clz.
A previous CL introduced __builtin_clz in zippy.cc. This is a GCC / Clang intrinsic, and is not supported in Visual Studio. The rest of the project uses bit manipulation intrinsics via the functions in Bits::, which are stubbed out for the open source build in zippy-stubs-internal.h. This CL extracts Bits::Log2FloorNonZero() out of Bits::Log2Floor() in the stubbed version of Bits, adds assertions to the Bits::*NonZero() functions in the stubs, and converts __builtin_clz to a Bits::Log2FloorNonZero() call. The latter part is not obvious. A mathematical proof of correctness is outlined in added comments. An empirical proof is available at https://godbolt.org/z/mPKWmh -- CalculateTableSizeOld(), which is the current code, compiles to the same assembly on Clang as CalculateTableSizeNew1(), which is the bigger jump in the proof. CalculateTableSizeNew2() is a fairly obvious transformation from CalculateTableSizeNew1(), and results in slightly better assembly on all supported compilers. Two benchmark runs with the same arguments as the original CL only showed differences in completely disjoint tests, suggesting that the differences are pure noise. |
|
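A minimal sketch of the split described above, assuming free functions rather than the project's `Bits::` class. The bodies below are illustrative and are not taken from the stubs header; returning -1 for a zero input is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(n)) for n != 0. The precondition is asserted, mirroring the
// assertions the commit says were added to the *NonZero() stubs.
inline int Log2FloorNonZero(std::uint32_t n) {
  assert(n != 0);
#if defined(__GNUC__)
  // GCC and Clang: count leading zeros, then flip within the range 0..31.
  return 31 ^ __builtin_clz(n);
#else
  // Portable fallback for compilers without __builtin_clz.
  int log = 0;
  while (n >>= 1) ++log;
  return log;
#endif
}

// Wrapper that also accepts zero. Returning -1 for zero is this sketch's
// convention (an assumption, not confirmed by the commit message).
inline int Log2Floor(std::uint32_t n) {
  return n == 0 ? -1 : Log2FloorNonZero(n);
}
```

Callers that can prove their argument is nonzero use `Log2FloorNonZero()` directly and skip the zero test; everything else goes through `Log2Floor()`.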
costan | 9a6fa91217 |
Remove use of std::uniform_int_distribution<uint8_t>.
A previous CL removed use of Google-specific random number generating
functionality, such as ACMRandom, and used the C++11 standard library
instead. The CL used std::uniform_int_distribution<uint8_t> to generate
random bytes, which seems to be unsupported by the standard [1, 2].
For better or for worse, our toolchain does not complain. However,
Visual Studio errors out with "invalid template argument for
uniform_int_distribution: N4659 29.6.1.1 [rand.req.genl]/1e requires one
of short, int, long, long long, unsigned short, unsigned int, unsigned
long, or unsigned long long".
This CL replaces std::uniform_int_distribution<uint8_t> with
std::uniform_int_distribution<int>(0, 255) and appropriate static_cast<>s.
[1] http://eel.is/c++draft/rand.req.genl#1.6
[2]
|
|
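The replacement pattern can be sketched as below. `RandomBytes` and its parameters are hypothetical names chosen for illustration, and `std::mt19937` is an assumed choice of engine; the commit message does not say which engine the tests use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Generates `count` pseudo-random bytes from a seeded engine.
std::vector<std::uint8_t> RandomBytes(std::size_t count, std::uint32_t seed) {
  std::mt19937 rng(seed);
  // uint8_t is not among the integer types the standard allows for
  // uniform_int_distribution, so draw ints in [0, 255] and narrow them.
  std::uniform_int_distribution<int> dist(0, 255);
  std::vector<std::uint8_t> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    out.push_back(static_cast<std::uint8_t>(dist(rng)));
  }
  assert(out.size() == count);
  return out;
}
```

Because the distribution is bounded to [0, 255], the `static_cast` never truncates a value.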
costan | 3fcbc47f99 |
Use std random number generators in tests.
An earlier CL introduced absl::Uniform, which is not yet open sourced, and therefore unavailable in the open source build. This CL removes absl::Uniform and ACMRandom in favor of equivalent C++11 standard random generators. Abseil promises to be faster than the standard library, but we can afford a speed hit in tests in return for an easier open sourcing story. |
|
costan | 925c3094c4 |
Convert DCHECK to assert.
The open source build does not support DCHECK, and this project uses assert() instead of DCHECK. |
|
costan | 02de4ff1d1 |
Update Travis CI configuration.
The Travis CI configuration updates reflect the following changes:
* Container-based builds (sudo: false) have been removed. https://changelog.travis-ci.com/the-container-based-build-environment-is-fully-deprecated-84517
* Ubuntu Xenial (16.04) is available as a base image. https://blog.travis-ci.com/2018-11-08-xenial-release
* Homebrew now has a dedicated DSL. https://docs.travis-ci.com/user/installing-dependencies/#installing-packages-on-os-x
To take full advantage of VM resources, CI builds now use Ninja https://ninja-build.org/ instead of Make. |
|
atdt | f7aece15e2 | Add comment explaining MSan false-positive workaround | |
atdt | 5913c5f8e4 |
Don't use _bzhi_u32 under MSan
MSan knows that x & 0xFF only uses the lower byte from x but it isn't as smart about _bzhi_u32(val, 8). (I'll file an upstream bug.) |
|
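One way to express "don't use _bzhi_u32 under MSan" is sketched below. This is a guess at the shape of such a workaround, not the code from the commit: `EXAMPLE_MSAN` and `LowByte` are hypothetical names, and the sketch relies on Clang's `__has_feature(memory_sanitizer)` check and the GCC/Clang `__BMI2__` macro.

```cpp
#include <cassert>
#include <cstdint>

// Detect MemorySanitizer through Clang's __has_feature; compilers that lack
// __has_feature fall through to 0. EXAMPLE_MSAN is a hypothetical macro.
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define EXAMPLE_MSAN 1
#endif
#endif
#ifndef EXAMPLE_MSAN
#define EXAMPLE_MSAN 0
#endif

#if defined(__BMI2__) && !EXAMPLE_MSAN
#include <immintrin.h>
// BZHI path: keep only the lowest 8 bits of x.
inline std::uint32_t LowByte(std::uint32_t x) { return _bzhi_u32(x, 8); }
#else
// Plain mask: the form the commit says MSan understands.
inline std::uint32_t LowByte(std::uint32_t x) { return x & 0xFF; }
#endif
```

Both branches compute the same value, so the sanitizer build differs only in which instruction sequence the compiler is asked to emit.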
atdt | 136b3ebc31 |
If BMI instructions are available, use BZHI to extract low bytes.
With --cpu=haswell, this results in some significant speed improvement (notably 12-14% for html and pb). On k8, performance is not affected (as expected). Full benchmark results for --cpu={k8,haswell} below. Haswell ------- name old time/op new time/op delta BM_UFlat/0 [html ] 55.2µs ± 0% 49.0µs ± 0% -11.34% (p=0.008 n=5+5) BM_UFlat/1 [urls ] 612µs ± 0% 604µs ± 0% -1.21% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 6.11µs ± 2% 6.07µs ± 1% ~ (p=0.421 n=5+5) BM_UFlat/3 [jpg_200 ] 134ns ± 0% 132ns ± 5% -1.49% (p=0.048 n=5+5) BM_UFlat/4 [pdf ] 8.41µs ± 2% 8.34µs ± 1% ~ (p=0.222 n=5+5) BM_UFlat/5 [html4 ] 239µs ± 0% 234µs ± 0% -2.24% (p=0.008 n=5+5) BM_UFlat/6 [txt1 ] 211µs ± 0% 205µs ± 0% -2.73% (p=0.008 n=5+5) BM_UFlat/7 [txt2 ] 185µs ± 0% 181µs ± 0% -2.34% (p=0.008 n=5+5) BM_UFlat/8 [txt3 ] 560µs ± 0% 545µs ± 0% -2.55% (p=0.008 n=5+5) BM_UFlat/9 [txt4 ] 773µs ± 0% 753µs ± 0% -2.61% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 51.6µs ± 0% 45.3µs ± 0% -12.28% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 209µs ± 0% 204µs ± 0% -2.28% (p=0.008 n=5+5) BM_UFlat/12 [cp ] 17.3µs ± 0% 15.7µs ± 1% -9.57% (p=0.008 n=5+5) BM_UFlat/13 [c ] 8.08µs ± 0% 8.00µs ± 0% -0.99% (p=0.008 n=5+5) BM_UFlat/14 [lsp ] 2.48µs ± 0% 2.45µs ± 0% -1.11% (p=0.008 n=5+5) BM_UFlat/15 [xls ] 967µs ± 0% 954µs ± 0% -1.36% (p=0.008 n=5+5) BM_UFlat/16 [xls_200 ] 219ns ± 1% 218ns ± 1% ~ (p=0.444 n=5+5) BM_UFlat/17 [bin ] 278µs ± 0% 275µs ± 0% -0.92% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 100ns ± 0% 99ns ± 1% -1.04% (p=0.008 n=5+5) BM_UFlat/19 [sum ] 34.0µs ± 0% 30.9µs ± 0% -9.10% (p=0.008 n=5+5) BM_UFlat/20 [man ] 3.21µs ± 0% 3.20µs ± 0% ~ (p=0.063 n=5+5) BM_UValidate/0 [html ] 33.1µs ± 0% 33.6µs ± 0% +1.69% (p=0.008 n=5+5) BM_UValidate/1 [urls ] 436µs ± 0% 441µs ± 0% +1.06% (p=0.008 n=5+5) BM_UValidate/2 [jpg ] 141ns ± 0% 142ns ± 0% +0.71% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 94.3ns ± 0% 95.3ns ± 0% +1.06% (p=0.008 n=5+5) BM_UValidate/4 [pdf ] 2.87µs ± 0% 2.95µs ± 0% +2.74% (p=0.008 n=5+5) BM_UIOVec/0 [html ] 126µs ± 0% 124µs ± 
0% -1.50% (p=0.008 n=5+5) BM_UIOVec/1 [urls ] 1.13ms ± 0% 1.11ms ± 0% -1.95% (p=0.008 n=5+5) BM_UIOVec/2 [jpg ] 6.31µs ± 3% 7.44µs ± 3% +17.75% (p=0.008 n=5+5) BM_UIOVec/3 [jpg_200 ] 332ns ± 1% 318ns ± 1% -4.22% (p=0.008 n=5+5) BM_UIOVec/4 [pdf ] 12.7µs ± 3% 12.6µs ± 9% ~ (p=0.222 n=5+5) BM_UFlatSink/0 [html ] 55.2µs ± 0% 49.0µs ± 0% -11.31% (p=0.008 n=5+5) BM_UFlatSink/1 [urls ] 612µs ± 0% 605µs ± 0% -1.17% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 6.29µs ±12% 6.57µs ± 9% ~ (p=0.548 n=5+5) BM_UFlatSink/3 [jpg_200 ] 138ns ± 2% 134ns ± 0% -2.76% (p=0.000 n=5+4) BM_UFlatSink/4 [pdf ] 8.35µs ± 0% 8.34µs ± 1% ~ (p=0.905 n=4+5) BM_UFlatSink/5 [html4 ] 239µs ± 0% 234µs ± 0% -2.33% (p=0.008 n=5+5) BM_UFlatSink/6 [txt1 ] 211µs ± 0% 205µs ± 0% -2.82% (p=0.008 n=5+5) BM_UFlatSink/7 [txt2 ] 185µs ± 0% 181µs ± 0% -2.18% (p=0.008 n=5+5) BM_UFlatSink/8 [txt3 ] 560µs ± 0% 545µs ± 0% -2.57% (p=0.008 n=5+5) BM_UFlatSink/9 [txt4 ] 773µs ± 0% 754µs ± 0% -2.54% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 51.6µs ± 0% 45.3µs ± 0% -12.19% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 209µs ± 0% 204µs ± 0% -2.39% (p=0.008 n=5+5) BM_UFlatSink/12 [cp ] 17.3µs ± 0% 15.6µs ± 0% -9.98% (p=0.008 n=5+5) BM_UFlatSink/13 [c ] 8.10µs ± 1% 7.98µs ± 0% -1.53% (p=0.008 n=5+5) BM_UFlatSink/14 [lsp ] 2.49µs ± 1% 2.47µs ± 0% -0.84% (p=0.008 n=5+5) BM_UFlatSink/15 [xls ] 968µs ± 0% 953µs ± 0% -1.48% (p=0.008 n=5+5) BM_UFlatSink/16 [xls_200 ] 220ns ± 1% 220ns ± 0% ~ (p=1.000 n=5+4) BM_UFlatSink/17 [bin ] 278µs ± 0% 275µs ± 0% -0.99% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 102ns ± 1% 103ns ± 0% +1.18% (p=0.048 n=5+5) BM_UFlatSink/19 [sum ] 34.0µs ± 0% 30.9µs ± 0% -9.21% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 3.22µs ± 1% 3.20µs ± 0% -0.76% (p=0.032 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 122µs ± 0% 122µs ± 0% ~ (p=0.413 n=4+5) BM_ZFlat/1 [urls (47.78 %) ] 1.60ms ± 0% 1.60ms ± 0% -0.06% (p=0.032 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.5µs ± 2% 10.7µs ± 9% ~ (p=0.841 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 310ns ± 1% 309ns ± 3% 
~ (p=0.349 n=4+5) BM_ZFlat/4 [pdf (83.30 %) ] 13.5µs ± 1% 13.6µs ± 2% ~ (p=0.595 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 533µs ± 0% 532µs ± 0% -0.08% (p=0.032 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 529µs ± 0% 528µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 469µs ± 0% 469µs ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.40ms ± 0% 1.40ms ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 1.93ms ± 0% 1.92ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 106µs ± 0% 106µs ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 404µs ± 0% 404µs ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 43.2µs ± 0% 43.3µs ± 1% ~ (p=0.151 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 16.4µs ± 1% 16.4µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 4.96µs ± 0% 4.96µs ± 1% ~ (p=0.651 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 1.54ms ± 0% 1.54ms ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 352ns ± 2% 351ns ± 1% ~ (p=0.762 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 491µs ± 0% 491µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 75.6ns ± 1% 77.2ns ± 0% +2.06% (p=0.016 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 76.9µs ± 0% 76.7µs ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 6.87µs ± 1% 6.81µs ± 0% -0.87% (p=0.008 n=5+5) name old speed new speed delta BM_UFlat/0 [html ] 1.85GB/s ± 0% 2.09GB/s ± 0% +12.83% (p=0.016 n=4+5) BM_UFlat/1 [urls ] 1.15GB/s ± 0% 1.16GB/s ± 0% +1.25% (p=0.008 n=5+5) BM_UFlat/2 [jpg ] 20.1GB/s ± 2% 20.3GB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/3 [jpg_200 ] 1.49GB/s ± 0% 1.53GB/s ± 0% +2.83% (p=0.016 n=5+4) BM_UFlat/4 [pdf ] 12.2GB/s ± 2% 12.3GB/s ± 1% ~ (p=0.222 n=5+5) BM_UFlat/5 [html4 ] 1.71GB/s ± 0% 1.75GB/s ± 0% +2.29% (p=0.008 n=5+5) BM_UFlat/6 [txt1 ] 722MB/s ± 0% 742MB/s ± 0% +2.81% (p=0.008 n=5+5) BM_UFlat/7 [txt2 ] 676MB/s ± 0% 692MB/s ± 0% +2.40% (p=0.008 n=5+5) BM_UFlat/8 [txt3 ] 762MB/s ± 0% 782MB/s ± 0% +2.62% (p=0.008 n=5+5) BM_UFlat/9 [txt4 ] 623MB/s ± 0% 640MB/s ± 0% +2.68% (p=0.008 n=5+5) BM_UFlat/10 [pb ] 2.30GB/s ± 0% 2.62GB/s ± 0% 
+13.99% (p=0.008 n=5+5) BM_UFlat/11 [gaviota ] 883MB/s ± 0% 903MB/s ± 0% +2.33% (p=0.008 n=5+5) BM_UFlat/12 [cp ] 1.42GB/s ± 0% 1.57GB/s ± 1% +10.57% (p=0.008 n=5+5) BM_UFlat/13 [c ] 1.38GB/s ± 0% 1.39GB/s ± 0% +1.00% (p=0.008 n=5+5) BM_UFlat/14 [lsp ] 1.50GB/s ± 0% 1.52GB/s ± 0% +1.12% (p=0.008 n=5+5) BM_UFlat/15 [xls ] 1.06GB/s ± 0% 1.08GB/s ± 0% +1.34% (p=0.016 n=5+4) BM_UFlat/16 [xls_200 ] 913MB/s ± 1% 918MB/s ± 1% ~ (p=0.421 n=5+5) BM_UFlat/17 [bin ] 1.85GB/s ± 0% 1.86GB/s ± 0% +0.92% (p=0.008 n=5+5) BM_UFlat/18 [bin_200 ] 2.01GB/s ± 0% 2.03GB/s ± 1% +1.10% (p=0.008 n=5+5) BM_UFlat/19 [sum ] 1.13GB/s ± 0% 1.24GB/s ± 0% +9.99% (p=0.008 n=5+5) BM_UFlat/20 [man ] 1.32GB/s ± 0% 1.32GB/s ± 1% ~ (p=0.063 n=5+5) BM_UValidate/0 [html ] 3.10GB/s ± 0% 3.04GB/s ± 0% -1.66% (p=0.008 n=5+5) BM_UValidate/1 [urls ] 1.61GB/s ± 0% 1.59GB/s ± 0% -1.04% (p=0.008 n=5+5) BM_UValidate/2 [jpg ] 875GB/s ± 0% 866GB/s ± 0% -1.11% (p=0.008 n=5+5) BM_UValidate/3 [jpg_200 ] 2.12GB/s ± 0% 2.10GB/s ± 0% -1.01% (p=0.016 n=5+4) BM_UValidate/4 [pdf ] 35.7GB/s ± 0% 34.7GB/s ± 0% -2.66% (p=0.008 n=5+5) BM_UIOVec/0 [html ] 813MB/s ± 0% 825MB/s ± 0% +1.52% (p=0.008 n=5+5) BM_UIOVec/1 [urls ] 622MB/s ± 0% 634MB/s ± 0% +1.99% (p=0.008 n=5+5) BM_UIOVec/2 [jpg ] 19.5GB/s ± 3% 16.6GB/s ± 3% -15.08% (p=0.008 n=5+5) BM_UIOVec/3 [jpg_200 ] 603MB/s ± 1% 630MB/s ± 1% +4.42% (p=0.008 n=5+5) BM_UIOVec/4 [pdf ] 8.05GB/s ± 3% 8.12GB/s ± 8% ~ (p=0.222 n=5+5) BM_UFlatSink/0 [html ] 1.85GB/s ± 0% 2.09GB/s ± 0% +12.76% (p=0.008 n=5+5) BM_UFlatSink/1 [urls ] 1.15GB/s ± 0% 1.16GB/s ± 0% +1.18% (p=0.008 n=5+5) BM_UFlatSink/2 [jpg ] 19.6GB/s ±11% 18.8GB/s ± 9% ~ (p=0.548 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.45GB/s ± 1% 1.49GB/s ± 0% +2.82% (p=0.016 n=5+4) BM_UFlatSink/4 [pdf ] 12.3GB/s ± 0% 12.3GB/s ± 1% ~ (p=0.905 n=4+5) BM_UFlatSink/5 [html4 ] 1.71GB/s ± 0% 1.75GB/s ± 0% +2.41% (p=0.008 n=5+5) BM_UFlatSink/6 [txt1 ] 722MB/s ± 0% 743MB/s ± 0% +2.90% (p=0.008 n=5+5) BM_UFlatSink/7 [txt2 ] 676MB/s ± 0% 691MB/s ± 0% +2.23% 
(p=0.008 n=5+5) BM_UFlatSink/8 [txt3 ] 763MB/s ± 0% 783MB/s ± 0% +2.64% (p=0.008 n=5+5) BM_UFlatSink/9 [txt4 ] 623MB/s ± 0% 639MB/s ± 0% +2.61% (p=0.008 n=5+5) BM_UFlatSink/10 [pb ] 2.30GB/s ± 0% 2.62GB/s ± 0% +13.86% (p=0.008 n=5+5) BM_UFlatSink/11 [gaviota ] 882MB/s ± 0% 904MB/s ± 0% +2.45% (p=0.008 n=5+5) BM_UFlatSink/12 [cp ] 1.42GB/s ± 0% 1.58GB/s ± 0% +11.09% (p=0.008 n=5+5) BM_UFlatSink/13 [c ] 1.38GB/s ± 1% 1.40GB/s ± 0% +1.56% (p=0.008 n=5+5) BM_UFlatSink/14 [lsp ] 1.50GB/s ± 1% 1.51GB/s ± 1% +0.85% (p=0.008 n=5+5) BM_UFlatSink/15 [xls ] 1.06GB/s ± 0% 1.08GB/s ± 0% +1.51% (p=0.016 n=5+4) BM_UFlatSink/16 [xls_200 ] 908MB/s ± 1% 911MB/s ± 0% ~ (p=0.730 n=5+4) BM_UFlatSink/17 [bin ] 1.85GB/s ± 0% 1.86GB/s ± 0% +1.01% (p=0.008 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.96GB/s ± 1% 1.94GB/s ± 1% -1.18% (p=0.016 n=5+5) BM_UFlatSink/19 [sum ] 1.12GB/s ± 0% 1.24GB/s ± 0% +10.16% (p=0.008 n=5+5) BM_UFlatSink/20 [man ] 1.31GB/s ± 1% 1.32GB/s ± 0% +0.77% (p=0.048 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 839MB/s ± 0% 839MB/s ± 0% ~ (p=0.413 n=4+5) BM_ZFlat/1 [urls (47.78 %) ] 439MB/s ± 0% 439MB/s ± 0% +0.06% (p=0.032 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.7GB/s ± 2% 11.5GB/s ± 9% ~ (p=0.841 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 645MB/s ± 1% 647MB/s ± 3% ~ (p=0.413 n=4+5) BM_ZFlat/4 [pdf (83.30 %) ] 7.57GB/s ± 1% 7.54GB/s ± 2% ~ (p=0.595 n=5+5) BM_ZFlat/5 [html4 (22.52 %) ] 769MB/s ± 0% 770MB/s ± 0% +0.08% (p=0.032 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 288MB/s ± 0% 288MB/s ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 267MB/s ± 0% 267MB/s ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 305MB/s ± 0% 305MB/s ± 0% ~ (p=0.548 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 250MB/s ± 0% 251MB/s ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.12GB/s ± 0% 1.12GB/s ± 0% ~ (p=0.635 n=5+5) BM_ZFlat/11 [gaviota (37.72 %)] 457MB/s ± 0% 457MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 570MB/s ± 0% 568MB/s ± 1% ~ (p=0.151 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 682MB/s ± 1% 681MB/s ± 0% ~ 
(p=0.310 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 750MB/s ± 0% 751MB/s ± 1% ~ (p=0.690 n=5+5) BM_ZFlat/15 [xls (41.23 %) ] 668MB/s ± 0% 668MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 569MB/s ± 2% 570MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 1.04GB/s ± 0% 1.04GB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.64GB/s ± 1% 2.59GB/s ± 0% -1.99% (p=0.016 n=5+4) BM_ZFlat/19 [sum (48.96 %) ] 497MB/s ± 0% 498MB/s ± 0% ~ (p=0.222 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 615MB/s ± 1% 621MB/s ± 0% +0.87% (p=0.008 n=5+5) K8 -- name old time/op new time/op delta BM_UFlat/0 [html ] 41.7µs ± 0% 41.7µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/1 [urls ] 588µs ± 0% 588µs ± 0% ~ (p=0.310 n=5+5) BM_UFlat/2 [jpg ] 7.11µs ± 1% 7.10µs ± 1% ~ (p=0.556 n=5+4) BM_UFlat/3 [jpg_200 ] 130ns ± 0% 130ns ± 0% ~ (all samples are equal) BM_UFlat/4 [pdf ] 8.19µs ± 0% 8.26µs ± 2% ~ (p=0.460 n=5+5) BM_UFlat/5 [html4 ] 219µs ± 0% 219µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/6 [txt1 ] 192µs ± 0% 191µs ± 0% ~ (p=0.341 n=5+5) BM_UFlat/7 [txt2 ] 170µs ± 0% 170µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/8 [txt3 ] 509µs ± 0% 509µs ± 0% ~ (p=0.151 n=5+5) BM_UFlat/9 [txt4 ] 712µs ± 0% 712µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/10 [pb ] 38.5µs ± 0% 38.5µs ± 0% ~ (p=0.452 n=5+5) BM_UFlat/11 [gaviota ] 189µs ± 0% 189µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/12 [cp ] 14.2µs ± 1% 14.2µs ± 0% ~ (p=0.889 n=5+5) BM_UFlat/13 [c ] 7.32µs ± 0% 7.33µs ± 0% ~ (p=1.000 n=5+5) BM_UFlat/14 [lsp ] 2.26µs ± 0% 2.27µs ± 0% ~ (p=0.222 n=4+5) BM_UFlat/15 [xls ] 954µs ± 0% 955µs ± 0% ~ (p=0.222 n=5+5) BM_UFlat/16 [xls_200 ] 215ns ± 4% 212ns ± 0% ~ (p=0.095 n=5+4) BM_UFlat/17 [bin ] 276µs ± 0% 276µs ± 0% ~ (p=0.841 n=5+5) BM_UFlat/18 [bin_200 ] 104ns ±10% 103ns ± 3% ~ (p=0.825 n=5+5) BM_UFlat/19 [sum ] 29.2µs ± 0% 29.2µs ± 0% ~ (p=0.690 n=5+5) BM_UFlat/20 [man ] 2.96µs ± 0% 2.97µs ± 0% +0.43% (p=0.032 n=5+5) BM_UValidate/0 [html ] 33.4µs ± 0% 33.4µs ± 0% ~ (p=0.151 n=5+5) BM_UValidate/1 [urls ] 441µs ± 0% 441µs ± 0% ~ (p=0.548 n=5+5) 
BM_UValidate/2 [jpg ] 146ns ± 0% 146ns ± 0% ~ (all samples are equal) BM_UValidate/3 [jpg_200 ] 98.0ns ± 0% 98.0ns ± 0% ~ (p=1.000 n=5+5) BM_UValidate/4 [pdf ] 2.89µs ± 0% 2.89µs ± 0% ~ (p=0.794 n=5+5) BM_UIOVec/0 [html ] 121µs ± 0% 121µs ± 0% ~ (p=0.151 n=5+5) BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.095 n=5+5) BM_UIOVec/2 [jpg ] 7.47µs ± 5% 7.31µs ± 2% ~ (p=0.222 n=5+5) BM_UIOVec/3 [jpg_200 ] 330ns ± 0% 330ns ± 0% ~ (all samples are equal) BM_UIOVec/4 [pdf ] 12.3µs ± 2% 12.0µs ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/0 [html ] 41.6µs ± 0% 41.6µs ± 0% ~ (p=0.095 n=5+5) BM_UFlatSink/1 [urls ] 589µs ± 0% 589µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/2 [jpg ] 7.84µs ±26% 7.23µs ± 5% ~ (p=0.690 n=5+5) BM_UFlatSink/3 [jpg_200 ] 132ns ± 0% 132ns ± 0% ~ (all samples are equal) BM_UFlatSink/4 [pdf ] 8.43µs ± 3% 8.27µs ± 2% ~ (p=0.254 n=5+5) BM_UFlatSink/5 [html4 ] 219µs ± 0% 219µs ± 0% ~ (p=0.524 n=5+5) BM_UFlatSink/6 [txt1 ] 192µs ± 0% 192µs ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/7 [txt2 ] 170µs ± 0% 170µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/8 [txt3 ] 509µs ± 0% 509µs ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/9 [txt4 ] 712µs ± 0% 712µs ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/10 [pb ] 38.5µs ± 0% 38.5µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/11 [gaviota ] 189µs ± 0% 189µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/12 [cp ] 14.2µs ± 0% 14.2µs ± 0% ~ (p=0.421 n=5+5) BM_UFlatSink/13 [c ] 7.37µs ± 1% 7.36µs ± 1% ~ (p=0.746 n=5+5) BM_UFlatSink/14 [lsp ] 2.27µs ± 0% 2.27µs ± 1% ~ (p=0.714 n=5+5) BM_UFlatSink/15 [xls ] 954µs ± 0% 954µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/16 [xls_200 ] 215ns ± 1% 215ns ± 1% ~ (p=0.921 n=5+5) BM_UFlatSink/17 [bin ] 276µs ± 0% 276µs ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 103ns ± 2% 104ns ± 1% ~ (p=0.429 n=5+5) BM_UFlatSink/19 [sum ] 29.2µs ± 0% 29.2µs ± 0% ~ (p=0.452 n=5+5) BM_UFlatSink/20 [man ] 2.96µs ± 0% 2.97µs ± 1% ~ (p=0.484 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 126µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 1.67ms ± 0% 1.67ms ± 0% ~ 
(p=0.841 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 11.6µs ± 4% 11.6µs ± 3% ~ (p=1.000 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 368ns ± 1% 367ns ± 0% ~ (p=0.159 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 14.7µs ± 1% 14.6µs ± 0% ~ (p=0.190 n=5+4) BM_ZFlat/5 [html4 (22.52 %) ] 550µs ± 0% 550µs ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 540µs ± 0% 540µs ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 479µs ± 0% 480µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 1.44ms ± 0% 1.44ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 1.97ms ± 0% 1.97ms ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 110µs ± 0% 109µs ± 0% ~ (p=0.730 n=5+4) BM_ZFlat/11 [gaviota (37.72 %)] 412µs ± 0% 412µs ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 46.3µs ± 0% 46.3µs ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 17.7µs ± 0% 17.7µs ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 5.54µs ± 1% 5.55µs ± 0% ~ (p=0.254 n=5+4) BM_ZFlat/15 [xls (41.23 %) ] 1.62ms ± 0% 1.63ms ± 0% ~ (p=0.151 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 395ns ± 2% 394ns ± 1% ~ (p=1.000 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 507µs ± 0% 507µs ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 89.6ns ± 5% 89.8ns ± 5% ~ (p=1.000 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 79.9µs ± 0% 79.9µs ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 7.67µs ± 0% 7.67µs ± 1% ~ (p=0.548 n=5+5) name old speed new speed delta BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.45GB/s ± 0% ~ (p=0.889 n=5+5) BM_UFlat/1 [urls ] 1.19GB/s ± 0% 1.19GB/s ± 0% ~ (all samples are equal) BM_UFlat/2 [jpg ] 17.3GB/s ± 1% 17.3GB/s ± 1% ~ (p=0.556 n=5+4) BM_UFlat/3 [jpg_200 ] 1.54GB/s ± 0% 1.54GB/s ± 0% ~ (p=0.833 n=5+5) BM_UFlat/4 [pdf ] 12.5GB/s ± 0% 12.4GB/s ± 2% ~ (p=0.421 n=5+5) BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% ~ (p=1.000 n=4+5) BM_UFlat/6 [txt1 ] 794MB/s ± 0% 794MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlat/7 [txt2 ] 738MB/s ± 0% 738MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/8 [txt3 ] 839MB/s ± 0% 838MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/9 [txt4 ] 
677MB/s ± 0% 677MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/10 [pb ] 3.08GB/s ± 0% 3.08GB/s ± 0% ~ (p=0.452 n=5+5) BM_UFlat/11 [gaviota ] 975MB/s ± 0% 975MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/12 [cp ] 1.73GB/s ± 1% 1.73GB/s ± 0% ~ (p=0.984 n=5+5) BM_UFlat/13 [c ] 1.52GB/s ± 0% 1.52GB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlat/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 0% ~ (p=0.254 n=4+5) BM_UFlat/15 [xls ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.095 n=5+4) BM_UFlat/16 [xls_200 ] 931MB/s ± 4% 941MB/s ± 0% ~ (p=0.151 n=5+5) BM_UFlat/17 [bin ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=0.762 n=5+5) BM_UFlat/18 [bin_200 ] 1.92GB/s ± 9% 1.95GB/s ± 3% ~ (p=1.000 n=5+5) BM_UFlat/19 [sum ] 1.31GB/s ± 1% 1.31GB/s ± 0% ~ (p=0.548 n=5+5) BM_UFlat/20 [man ] 1.43GB/s ± 0% 1.42GB/s ± 1% -0.42% (p=0.040 n=5+5) BM_UValidate/0 [html ] 3.06GB/s ± 0% 3.06GB/s ± 0% ~ (p=0.151 n=5+5) BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.59GB/s ± 0% ~ (p=0.357 n=5+5) BM_UValidate/2 [jpg ] 845GB/s ± 0% 845GB/s ± 0% ~ (p=0.548 n=5+5) BM_UValidate/3 [jpg_200 ] 2.04GB/s ± 0% 2.04GB/s ± 0% ~ (p=1.000 n=5+5) BM_UValidate/4 [pdf ] 35.4GB/s ± 0% 35.4GB/s ± 0% ~ (p=0.421 n=5+5) BM_UIOVec/0 [html ] 845MB/s ± 0% 845MB/s ± 0% ~ (p=0.151 n=5+5) BM_UIOVec/1 [urls ] 650MB/s ± 0% 650MB/s ± 0% ~ (p=0.087 n=5+5) BM_UIOVec/2 [jpg ] 16.5GB/s ± 5% 16.8GB/s ± 2% ~ (p=0.222 n=5+5) BM_UIOVec/3 [jpg_200 ] 605MB/s ± 0% 605MB/s ± 0% ~ (p=0.690 n=5+5) BM_UIOVec/4 [pdf ] 8.36GB/s ± 2% 8.54GB/s ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/0 [html ] 2.46GB/s ± 0% 2.46GB/s ± 0% ~ (p=0.063 n=5+5) BM_UFlatSink/1 [urls ] 1.19GB/s ± 0% 1.19GB/s ± 0% ~ (all samples are equal) BM_UFlatSink/2 [jpg ] 16.0GB/s ±22% 17.0GB/s ± 5% ~ (p=0.690 n=5+5) BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 0% 1.51GB/s ± 2% ~ (p=1.000 n=5+5) BM_UFlatSink/4 [pdf ] 12.2GB/s ± 3% 12.4GB/s ± 2% ~ (p=0.254 n=5+5) BM_UFlatSink/5 [html4 ] 1.87GB/s ± 0% 1.87GB/s ± 0% ~ (p=0.532 n=5+5) BM_UFlatSink/6 [txt1 ] 794MB/s ± 0% 794MB/s ± 0% ~ (p=0.690 n=5+5) BM_UFlatSink/7 [txt2 ] 738MB/s ± 0% 738MB/s ± 0% ~ (p=0.421 n=5+5) 
BM_UFlatSink/8 [txt3 ] 838MB/s ± 0% 838MB/s ± 0% ~ (p=0.310 n=5+5) BM_UFlatSink/9 [txt4 ] 676MB/s ± 0% 676MB/s ± 0% ~ (p=0.841 n=5+5) BM_UFlatSink/10 [pb ] 3.08GB/s ± 0% 3.08GB/s ± 0% ~ (p=0.365 n=5+5) BM_UFlatSink/11 [gaviota ] 975MB/s ± 0% 975MB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/12 [cp ] 1.73GB/s ± 0% 1.74GB/s ± 0% ~ (p=0.286 n=5+5) BM_UFlatSink/13 [c ] 1.51GB/s ± 1% 1.52GB/s ± 1% ~ (p=0.683 n=5+5) BM_UFlatSink/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 0% ~ (p=0.444 n=5+5) BM_UFlatSink/15 [xls ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.333 n=4+5) BM_UFlatSink/16 [xls_200 ] 930MB/s ± 1% 930MB/s ± 1% ~ (p=0.841 n=5+5) BM_UFlatSink/17 [bin ] 1.86GB/s ± 0% 1.86GB/s ± 0% ~ (p=1.000 n=5+5) BM_UFlatSink/18 [bin_200 ] 1.93GB/s ± 2% 1.93GB/s ± 1% ~ (p=0.651 n=5+5) BM_UFlatSink/19 [sum ] 1.31GB/s ± 0% 1.31GB/s ± 0% ~ (p=0.508 n=5+5) BM_UFlatSink/20 [man ] 1.43GB/s ± 0% 1.42GB/s ± 1% ~ (p=0.524 n=5+5) BM_ZFlat/0 [html (22.31 %) ] 815MB/s ± 0% 815MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/1 [urls (47.78 %) ] 420MB/s ± 0% 420MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/2 [jpg (99.95 %) ] 10.6GB/s ± 4% 10.6GB/s ± 3% ~ (p=1.000 n=5+5) BM_ZFlat/3 [jpg_200 (73.00 %)] 543MB/s ± 1% 546MB/s ± 0% ~ (p=0.095 n=5+5) BM_ZFlat/4 [pdf (83.30 %) ] 6.96GB/s ± 1% 7.01GB/s ± 0% ~ (p=0.190 n=5+4) BM_ZFlat/5 [html4 (22.52 %) ] 745MB/s ± 0% 745MB/s ± 0% ~ (p=0.841 n=5+5) BM_ZFlat/6 [txt1 (57.88 %) ] 282MB/s ± 0% 282MB/s ± 0% ~ (p=0.310 n=5+5) BM_ZFlat/7 [txt2 (61.91 %) ] 261MB/s ± 0% 261MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/8 [txt3 (54.99 %) ] 297MB/s ± 0% 297MB/s ± 0% ~ (p=0.421 n=5+5) BM_ZFlat/9 [txt4 (66.26 %) ] 244MB/s ± 0% 244MB/s ± 0% ~ (p=0.389 n=5+5) BM_ZFlat/10 [pb (19.68 %) ] 1.08GB/s ± 0% 1.08GB/s ± 0% ~ (p=0.238 n=5+4) BM_ZFlat/11 [gaviota (37.72 %)] 448MB/s ± 0% 447MB/s ± 0% ~ (p=1.000 n=5+5) BM_ZFlat/12 [cp (48.12 %) ] 532MB/s ± 0% 531MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/13 [c (42.47 %) ] 632MB/s ± 0% 631MB/s ± 1% ~ (p=0.841 n=5+5) BM_ZFlat/14 [lsp (48.37 %) ] 672MB/s ± 1% 671MB/s ± 0% ~ (p=0.286 n=5+4) 
BM_ZFlat/15 [xls (41.23 %) ] 634MB/s ± 0% 633MB/s ± 0% ~ (p=0.151 n=5+5) BM_ZFlat/16 [xls_200 (78.00 %)] 507MB/s ± 2% 508MB/s ± 1% ~ (p=1.000 n=5+5) BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.01GB/s ± 0% ~ (p=0.056 n=5+5) BM_ZFlat/18 [bin_200 (7.50 %) ] 2.24GB/s ± 5% 2.23GB/s ± 5% ~ (p=0.889 n=5+5) BM_ZFlat/19 [sum (48.96 %) ] 479MB/s ± 0% 479MB/s ± 0% ~ (p=0.690 n=5+5) BM_ZFlat/20 [man (59.21 %) ] 551MB/s ± 0% 551MB/s ± 1% ~ (p=0.548 n=5+5) |
|
nafi | eb47f79631 |
Optimize by about 0.5%.
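A minimal sketch of the technique behind this change, assuming a hypothetical helper (`EmitLiteralSketch`, with an invented one-byte header format) rather than snappy's actual `EmitLiteral` signature: a runtime `bool` argument becomes a template parameter, so the compiler emits one specialized copy of the function per value and can drop the unused branch from each.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT snappy's real EmitLiteral: writes a one-byte
// length header followed by the payload and returns the bytes written.
// kAllowFastPath is a compile-time constant, so in the <false>
// instantiation the compiler can remove the fast-path branch entirely.
template <bool kAllowFastPath>
std::size_t EmitLiteralSketch(std::uint8_t* dst, const std::uint8_t* src,
                              std::size_t len) {
  assert(len >= 1 && len <= 64);
  dst[0] = static_cast<std::uint8_t>(len - 1);
  if (kAllowFastPath && len <= 16) {
    // Fast path: fixed-size 16-byte copy. The caller must guarantee at
    // least 16 readable bytes at src and 17 writable bytes at dst.
    std::memcpy(dst + 1, src, 16);
  } else {
    // General path: copy exactly len bytes.
    std::memcpy(dst + 1, src, len);
  }
  return len + 1;
}

// Runtime-flag wrapper, shown only for comparison: the flag is tested once
// here, and each call then runs the already-specialized function body.
inline std::size_t EmitLiteralRuntime(std::uint8_t* dst,
                                      const std::uint8_t* src,
                                      std::size_t len, bool allow_fast_path) {
  return allow_fast_path ? EmitLiteralSketch<true>(dst, src, len)
                         : EmitLiteralSketch<false>(dst, src, len);
}
```

A caller that already knows the flag at compile time would call `EmitLiteralSketch<true>(...)` or `EmitLiteralSketch<false>(...)` directly inside its loop, which avoids testing the flag on every iteration.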
How? Move the boolean arguments of EmitLiteral, EmitCopyAtMost64, and EmitCopy to template arguments so that the compiler generates two separate, pruned versions of each function, one for arg=true and one for arg=false. FWIW, the CompressFragment function calls 1) EmitLiteral from inside a 1-level loop and 2) EmitCopy from inside a 2-level nested loop. CompressFragment is itself called from inside another while-loop in the public 'Compress' function.

name old time/op new time/op delta
BM_UFlat/0 [html ] 41.9µs ± 0% 41.1µs ± 0% -1.92% (p=0.000 n=10+10)
BM_UFlat/1 [urls ] 576µs ± 0% 572µs ± 0% -0.68% (p=0.000 n=10+10)
BM_UFlat/2 [jpg ] 7.25µs ± 6% 7.13µs ± 1% ~ (p=0.074 n=9+8)
BM_UFlat/3 [jpg_200 ] 132ns ± 1% 130ns ± 0% -1.45% (p=0.000 n=10+8)
BM_UFlat/4 [pdf ] 8.27µs ± 3% 8.22µs ± 0% ~ (p=0.277 n=9+8)
BM_UFlat/5 [html4 ] 220µs ± 0% 219µs ± 0% -0.75% (p=0.000 n=10+10)
BM_UFlat/6 [txt1 ] 192µs ± 0% 190µs ± 0% -0.80% (p=0.000 n=10+10)
BM_UFlat/7 [txt2 ] 169µs ± 0% 168µs ± 0% -0.69% (p=0.000 n=10+10)
BM_UFlat/8 [txt3 ] 510µs ± 0% 508µs ± 0% -0.42% (p=0.000 n=10+10)
BM_UFlat/9 [txt4 ] 707µs ± 0% 702µs ± 0% -0.67% (p=0.000 n=10+10)
BM_UFlat/10 [pb ] 38.5µs ± 0% 37.4µs ± 1% -2.84% (p=0.000 n=10+10)
BM_UFlat/11 [gaviota ] 189µs ± 0% 190µs ± 0% +0.55% (p=0.000 n=10+10)
BM_UFlat/12 [cp ] 14.2µs ± 0% 14.1µs ± 0% -0.44% (p=0.000 n=10+10)
BM_UFlat/13 [c ] 7.31µs ± 1% 7.35µs ± 0% +0.54% (p=0.002 n=10+10)
BM_UFlat/14 [lsp ] 2.27µs ± 0% 2.27µs ± 1% ~ (p=0.161 n=9+9)
BM_UFlat/15 [xls ] 905µs ± 0% 903µs ± 0% -0.25% (p=0.000 n=10+10)
BM_UFlat/16 [xls_200 ] 214ns ± 1% 213ns ± 1% -0.57% (p=0.043 n=10+10)
BM_UFlat/17 [bin ] 275µs ± 0% 274µs ± 0% -0.31% (p=0.000 n=10+10)
BM_UFlat/18 [bin_200 ] 102ns ± 5% 101ns ± 3% ~ (p=0.161 n=9+9)
BM_UFlat/19 [sum ] 27.9µs ± 0% 27.2µs ± 0% -2.68% (p=0.000 n=10+10)
BM_UFlat/20 [man ] 2.97µs ± 1% 2.97µs ± 0% ~ (p=0.400 n=9+10)
BM_UValidate/0 [html ] 33.3µs ± 0% 33.7µs ± 0% +1.18% (p=0.000 n=10+10)
BM_UValidate/1 [urls ] 442µs ± 0% 442µs ± 0% ~ (p=0.353 n=10+10)
BM_UValidate/2 [jpg ] 146ns ± 0% 146ns ± 0% ~ (p=0.063 n=10+10)
BM_UValidate/3 [jpg_200 ] 98.4ns ± 0% 98.5ns ± 0% ~ (p=0.184 n=10+10)
BM_UValidate/4 [pdf ] 2.88µs ± 0% 2.90µs ± 1% +0.68% (p=0.000 n=10+10)
BM_UIOVec/0 [html ] 122µs ± 0% 122µs ± 0% -0.39% (p=0.000 n=10+10)
BM_UIOVec/1 [urls ] 1.08ms ± 0% 1.08ms ± 0% ~ (p=0.529 n=10+10)
BM_UIOVec/2 [jpg ] 7.71µs ±11% 7.76µs ± 9% ~ (p=0.853 n=10+10)
BM_UIOVec/3 [jpg_200 ] 327ns ± 0% 328ns ± 0% ~ (p=0.146 n=8+10)
BM_UIOVec/4 [pdf ] 12.1µs ± 1% 12.1µs ± 3% ~ (p=0.315 n=10+10)
BM_UFlatSink/0 [html ] 41.8µs ± 0% 41.0µs ± 0% -1.87% (p=0.000 n=10+9)
BM_UFlatSink/1 [urls ] 576µs ± 0% 572µs ± 0% -0.74% (p=0.000 n=9+10)
BM_UFlatSink/2 [jpg ] 7.58µs ± 8% 7.56µs ± 9% ~ (p=0.739 n=10+10)
BM_UFlatSink/3 [jpg_200 ] 133ns ± 0% 134ns ± 0% +0.60% (p=0.000 n=10+9)
BM_UFlatSink/4 [pdf ] 8.44µs ± 3% 8.30µs ± 1% -1.65% (p=0.029 n=10+10)
BM_UFlatSink/5 [html4 ] 220µs ± 0% 218µs ± 0% -0.81% (p=0.000 n=10+10)
BM_UFlatSink/6 [txt1 ] 192µs ± 0% 190µs ± 0% -0.78% (p=0.000 n=10+10)
BM_UFlatSink/7 [txt2 ] 169µs ± 0% 168µs ± 0% -0.59% (p=0.000 n=10+10)
BM_UFlatSink/8 [txt3 ] 510µs ± 0% 508µs ± 0% -0.39% (p=0.000 n=10+10)
BM_UFlatSink/9 [txt4 ] 707µs ± 0% 703µs ± 0% -0.62% (p=0.000 n=10+10)
BM_UFlatSink/10 [pb ] 38.4µs ± 0% 37.4µs ± 0% -2.62% (p=0.000 n=9+9)
BM_UFlatSink/11 [gaviota ] 189µs ± 0% 190µs ± 0% +0.63% (p=0.000 n=10+10)
BM_UFlatSink/12 [cp ] 14.2µs ± 0% 14.1µs ± 0% -0.27% (p=0.011 n=10+10)
BM_UFlatSink/13 [c ] 7.33µs ± 1% 7.35µs ± 1% ~ (p=0.243 n=10+9)
BM_UFlatSink/14 [lsp ] 2.27µs ± 0% 2.26µs ± 0% -0.39% (p=0.000 n=9+9)
BM_UFlatSink/15 [xls ] 904µs ± 0% 902µs ± 0% -0.28% (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200 ] 216ns ± 1% 217ns ± 1% ~ (p=0.661 n=10+9)
BM_UFlatSink/17 [bin ] 275µs ± 0% 274µs ± 0% -0.24% (p=0.000 n=8+9)
BM_UFlatSink/18 [bin_200 ] 104ns ± 2% 104ns ± 1% -0.70% (p=0.043 n=9+10)
BM_UFlatSink/19 [sum ] 27.8µs ± 0% 27.1µs ± 0% -2.51% (p=0.000 n=9+10)
BM_UFlatSink/20 [man ] 3.02µs ± 1% 3.00µs ± 1% ~ (p=0.079 n=10+9)
BM_ZFlat/0 [html (22.31 %) ] 126µs ± 0% 126µs ± 0% -0.24% (p=0.000 n=10+10)
BM_ZFlat/1 [urls (47.78 %) ] 1.68ms ± 0% 1.67ms ± 0% -1.06% (p=0.000 n=10+10)
BM_ZFlat/2 [jpg (99.95 %) ] 11.8µs ± 5% 11.6µs ± 5% ~ (p=0.165 n=10+10)
BM_ZFlat/3 [jpg_200 (73.00 %)] 360ns ± 3% 358ns ± 1% ~ (p=0.762 n=10+8)
BM_ZFlat/4 [pdf (83.30 %) ] 14.8µs ± 2% 14.6µs ± 1% -1.57% (p=0.022 n=10+9)
BM_ZFlat/5 [html4 (22.52 %) ] 556µs ± 0% 552µs ± 0% -0.87% (p=0.000 n=10+10)
BM_ZFlat/6 [txt1 (57.88 %) ] 542µs ± 0% 540µs ± 0% -0.47% (p=0.000 n=10+10)
BM_ZFlat/7 [txt2 (61.91 %) ] 483µs ± 0% 480µs ± 0% -0.62% (p=0.000 n=10+10)
BM_ZFlat/8 [txt3 (54.99 %) ] 1.45ms ± 0% 1.44ms ± 0% -0.47% (p=0.000 n=10+10)
BM_ZFlat/9 [txt4 (66.26 %) ] 1.98ms ± 0% 1.97ms ± 0% -0.19% (p=0.007 n=10+10)
BM_ZFlat/10 [pb (19.68 %) ] 111µs ± 0% 109µs ± 0% -1.75% (p=0.000 n=10+10)
BM_ZFlat/11 [gaviota (37.72 %)] 411µs ± 0% 410µs ± 0% -0.21% (p=0.004 n=10+10)
BM_ZFlat/12 [cp (48.12 %) ] 45.9µs ± 0% 45.5µs ± 0% -0.76% (p=0.000 n=10+10)
BM_ZFlat/13 [c (42.47 %) ] 17.6µs ± 0% 17.5µs ± 0% -0.80% (p=0.000 n=10+10)
BM_ZFlat/14 [lsp (48.37 %) ] 5.50µs ± 0% 5.44µs ± 0% -1.19% (p=0.000 n=9+10)
BM_ZFlat/15 [xls (41.23 %) ] 1.63ms ± 0% 1.61ms ± 0% -1.21% (p=0.000 n=10+10)
BM_ZFlat/16 [xls_200 (78.00 %)] 389ns ± 2% 391ns ± 1% ~ (p=0.182 n=10+9)
BM_ZFlat/17 [bin (18.11 %) ] 509µs ± 0% 506µs ± 0% -0.51% (p=0.000 n=10+10)
BM_ZFlat/18 [bin_200 (7.50 %) ] 92.7ns ± 0% 89.4ns ± 1% -3.55% (p=0.000 n=8+8)
BM_ZFlat/19 [sum (48.96 %) ] 80.2µs ± 0% 78.9µs ± 0% -1.65% (p=0.000 n=10+10)
BM_ZFlat/20 [man (59.21 %) ] 7.59µs ± 1% 7.59µs ± 1% ~ (p=0.912 n=10+10)

name old allocs/op new allocs/op delta
BM_UFlat/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlat/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UValidate/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UValidate/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UValidate/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UValidate/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UValidate/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UIOVec/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UIOVec/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UIOVec/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UIOVec/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UIOVec/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/0 [html ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/1 [urls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/2 [jpg ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/3 [jpg_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/4 [pdf ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/5 [html4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/6 [txt1 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/7 [txt2 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/8 [txt3 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/9 [txt4 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/10 [pb ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/11 [gaviota ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/12 [cp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/13 [c ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/14 [lsp ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/15 [xls ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/16 [xls_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/17 [bin ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/18 [bin_200 ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/19 [sum ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_UFlatSink/20 [man ] 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
BM_ZFlat/0 [html (22.31 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/1 [urls (47.78 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/2 [jpg (99.95 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/3 [jpg_200 (73.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/4 [pdf (83.30 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/5 [html4 (22.52 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/6 [txt1 (57.88 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/7 [txt2 (61.91 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/8 [txt3 (54.99 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/9 [txt4 (66.26 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/10 [pb (19.68 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/11 [gaviota (37.72 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/12 [cp (48.12 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/13 [c (42.47 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/14 [lsp (48.37 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/15 [xls (41.23 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/16 [xls_200 (78.00 %)] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/17 [bin (18.11 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/18 [bin_200 (7.50 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/19 [sum (48.96 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
BM_ZFlat/20 [man (59.21 %) ] 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)

name old peak-mem(Bytes)/op new peak-mem(Bytes)/op delta
BM_UFlat/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/5 [html4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/6 [txt1 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/7 [txt2 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/8 [txt3 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/9 [txt4 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/10 [pb ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/11 [gaviota ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/12 [cp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/13 [c ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/14 [lsp ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/15 [xls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/16 [xls_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/17 [bin ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/18 [bin_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/19 [sum ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlat/20 [man ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UValidate/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UValidate/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UValidate/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UValidate/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UValidate/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UIOVec/0 [html ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UIOVec/1 [urls ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UIOVec/2 [jpg ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UIOVec/3 [jpg_200 ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UIOVec/4 [pdf ] 4.00 ± 0% 4.00 ± 0% ~ (all samples are equal)
BM_UFlatSink/0 [html ] 102k ± 0% 102k ± 0% ~ (all samples are equal)
BM_UFlatSink/1 [urls ] 702k ± 0% 702k ± 0% ~ (all samples are equal)
BM_UFlatSink/2 [jpg ] 123k ± 0% 123k ± 0% ~ (all samples are equal)
BM_UFlatSink/3 [jpg_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal)
BM_UFlatSink/4 [pdf ] 102k ± 0% 102k ± 0% ~ (all samples are equal)
BM_UFlatSink/5 [html4 ] 410k ± 0% 410k ± 0% ~ (all samples are equal)
BM_UFlatSink/6 [txt1 ] 152k ± 0% 152k ± 0% ~ (all samples are equal)
BM_UFlatSink/7 [txt2 ] 125k ± 0% 125k ± 0% ~ (all samples are equal)
BM_UFlatSink/8 [txt3 ] 427k ± 0% 427k ± 0% ~ (all samples are equal)
BM_UFlatSink/9 [txt4 ] 482k ± 0% 482k ± 0% ~ (all samples are equal)
BM_UFlatSink/10 [pb ] 119k ± 0% 119k ± 0% ~ (all samples are equal)
BM_UFlatSink/11 [gaviota ] 184k ± 0% 184k ± 0% ~ (all samples are equal)
BM_UFlatSink/12 [cp ] 24.6k ± 0% 24.6k ± 0% ~ (all samples are equal)
BM_UFlatSink/13 [c ] 11.2k ± 0% 11.2k ± 0% ~ (all samples are equal)
BM_UFlatSink/14 [lsp ] 3.72k ± 0% 3.72k ± 0% ~ (all samples are equal)
BM_UFlatSink/15 [xls ] 1.03M ± 0% 1.03M ± 0% ~ (all samples are equal)
BM_UFlatSink/16 [xls_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal)
BM_UFlatSink/17 [bin ] 513k ± 0% 513k ± 0% ~ (all samples are equal)
BM_UFlatSink/18 [bin_200 ] 201 ± 0% 201 ± 0% ~ (all samples are equal)
BM_UFlatSink/19 [sum ] 38.2k ± 0% 38.2k ± 0% ~ (all samples are equal)
BM_UFlatSink/20 [man ] 4.23k ± 0% 4.23k ± 0% ~ (all samples are equal)
BM_ZFlat/0 [html (22.31 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/1 [urls (47.78 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/2 [jpg (99.95 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/3 [jpg_200 (73.00 %)] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)
BM_ZFlat/4 [pdf (83.30 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/5 [html4 (22.52 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/6 [txt1 (57.88 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/7 [txt2 (61.91 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/8 [txt3 (54.99 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/9 [txt4 (66.26 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/10 [pb (19.68 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/11 [gaviota (37.72 %)] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/12 [cp (48.12 %) ] 86.1k ± 0% 86.1k ± 0% ~ (all samples are equal)
BM_ZFlat/13 [c (42.47 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)
BM_ZFlat/14 [lsp (48.37 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)
BM_ZFlat/15 [xls (41.23 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/16 [xls_200 (78.00 %)] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)
BM_ZFlat/17 [bin (18.11 %) ] 175k ± 0% 175k ± 0% ~ (all samples are equal)
BM_ZFlat/18 [bin_200 (7.50 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)
BM_ZFlat/19 [sum (48.96 %) ] 116k ± 0% 116k ± 0% ~ (all samples are equal)
BM_ZFlat/20 [man (59.21 %) ] 63.3k ± 0% 63.3k ± 0% ~ (all samples are equal)

name old speed new speed delta
BM_UFlat/0 [html ] 2.45GB/s ± 0% 2.50GB/s ± 0% +1.96% (p=0.000 n=10+10)
BM_UFlat/1 [urls ] 1.22GB/s ± 0% 1.23GB/s ± 0% +0.69% (p=0.000 n=10+10)
BM_UFlat/2 [jpg ] 17.0GB/s ± 5% 17.3GB/s ± 1% ~ (p=0.074 n=9+8)
BM_UFlat/3 [jpg_200 ] 1.52GB/s ± 1% 1.54GB/s ± 0% +1.44% (p=0.000 n=10+8)
BM_UFlat/4 [pdf ] 12.5GB/s ± 1% 12.5GB/s ± 0% ~ (p=0.721 n=8+8)
BM_UFlat/5 [html4 ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.76% (p=0.000 n=10+10)
BM_UFlat/6 [txt1 ] 795MB/s ± 0% 801MB/s ± 0% +0.79% (p=0.000 n=10+10)
BM_UFlat/7 [txt2 ] 741MB/s ± 0% 746MB/s ± 0% +0.68% (p=0.000 n=10+10)
BM_UFlat/8 [txt3 ] 840MB/s ± 0% 844MB/s ± 0% +0.44% (p=0.000 n=10+10)
BM_UFlat/9 [txt4 ] 684MB/s ± 0% 688MB/s ± 0% +0.65% (p=0.000 n=9+10)
BM_UFlat/10 [pb ] 3.09GB/s ± 0% 3.18GB/s ± 0% +2.88% (p=0.000 n=10+9)
BM_UFlat/11 [gaviota ] 980MB/s ± 0% 975MB/s ± 0% -0.57% (p=0.000 n=10+10)
BM_UFlat/12 [cp ] 1.74GB/s ± 0% 1.75GB/s ± 0% +0.38% (p=0.001 n=10+9)
BM_UFlat/13 [c ] 1.53GB/s ± 1% 1.52GB/s ± 0% -0.55% (p=0.003 n=10+10)
BM_UFlat/14 [lsp ] 1.64GB/s ± 0% 1.64GB/s ± 1% ~ (p=0.400 n=9+10)
BM_UFlat/15 [xls ] 1.14GB/s ± 0% 1.14GB/s ± 0% +0.23% (p=0.000 n=10+10)
BM_UFlat/16 [xls_200 ] 936MB/s ± 1% 941MB/s ± 1% ~ (p=0.052 n=10+10)
BM_UFlat/17 [bin ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.28% (p=0.000 n=10+10)
BM_UFlat/18 [bin_200 ] 1.97GB/s ± 5% 1.99GB/s ± 3% ~ (p=0.136 n=9+9)
BM_UFlat/19 [sum ] 1.37GB/s ± 0% 1.41GB/s ± 0% +2.82% (p=0.000 n=10+9)
BM_UFlat/20 [man ] 1.42GB/s ± 1% 1.42GB/s ± 0% ~ (p=0.579 n=10+10)
BM_UValidate/0 [html ] 3.08GB/s ± 0% 3.05GB/s ± 0% -1.18% (p=0.000 n=10+10)
BM_UValidate/1 [urls ] 1.59GB/s ± 0% 1.59GB/s ± 0% ~ (p=0.247 n=10+10)
BM_UValidate/2 [jpg ] 845GB/s ± 0% 846GB/s ± 0% +0.09% (p=0.000 n=10+10)
BM_UValidate/3 [jpg_200 ] 2.04GB/s ± 0% 2.04GB/s ± 0% -0.09% (p=0.019 n=10+10)
BM_UValidate/4 [pdf ] 35.7GB/s ± 0% 35.4GB/s ± 1% -0.70% (p=0.000 n=10+10)
BM_UIOVec/0 [html ] 841MB/s ± 0% 844MB/s ± 0% +0.36% (p=0.000 n=10+10)
BM_UIOVec/1 [urls ] 650MB/s ± 0% 650MB/s ± 0% ~ (p=0.105 n=10+10)
BM_UIOVec/2 [jpg ] 16.1GB/s ±10% 15.9GB/s ± 8% ~ (p=0.853 n=10+10)
BM_UIOVec/3 [jpg_200 ] 612MB/s ± 1% 612MB/s ± 0% ~ (p=0.243 n=9+10)
BM_UIOVec/4 [pdf ] 8.52GB/s ± 2% 8.46GB/s ± 3% ~ (p=0.436 n=10+10)
BM_UFlatSink/0 [html ] 2.46GB/s ± 0% 2.50GB/s ± 0% +1.83% (p=0.000 n=9+10)
BM_UFlatSink/1 [urls ] 1.22GB/s ± 0% 1.23GB/s ± 0% +0.73% (p=0.000 n=10+10)
BM_UFlatSink/2 [jpg ] 16.3GB/s ± 8% 16.4GB/s ± 9% ~ (p=0.739 n=10+10)
BM_UFlatSink/3 [jpg_200 ] 1.51GB/s ± 0% 1.50GB/s ± 0% -0.62% (p=0.000 n=10+9)
BM_UFlatSink/4 [pdf ] 12.2GB/s ± 3% 12.4GB/s ± 1% +1.62% (p=0.029 n=10+10)
BM_UFlatSink/5 [html4 ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.79% (p=0.000 n=10+10)
BM_UFlatSink/6 [txt1 ] 795MB/s ± 0% 801MB/s ± 0% +0.74% (p=0.000 n=10+9)
BM_UFlatSink/7 [txt2 ] 741MB/s ± 0% 745MB/s ± 0% +0.59% (p=0.000 n=10+9)
BM_UFlatSink/8 [txt3 ] 840MB/s ± 0% 843MB/s ± 0% +0.37% (p=0.000 n=9+10)
BM_UFlatSink/9 [txt4 ] 684MB/s ± 0% 688MB/s ± 0% +0.57% (p=0.000 n=9+10)
BM_UFlatSink/10 [pb ] 3.10GB/s ± 0% 3.18GB/s ± 0% +2.64% (p=0.000 n=9+10)
BM_UFlatSink/11 [gaviota ] 980MB/s ± 0% 974MB/s ± 0% -0.64% (p=0.000 n=10+10)
BM_UFlatSink/12 [cp ] 1.74GB/s ± 0% 1.75GB/s ± 0% +0.26% (p=0.005 n=10+10)
BM_UFlatSink/13 [c ] 1.52GB/s ± 1% 1.52GB/s ± 1% ~ (p=0.123 n=10+10)
BM_UFlatSink/14 [lsp ] 1.64GB/s ± 0% 1.65GB/s ± 0% +0.46% (p=0.000 n=10+8)
BM_UFlatSink/15 [xls ] 1.14GB/s ± 0% 1.15GB/s ± 0% +0.27% (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200 ] 927MB/s ± 1% 926MB/s ± 1% ~ (p=0.497 n=10+9)
BM_UFlatSink/17 [bin ] 1.87GB/s ± 0% 1.88GB/s ± 0% +0.27% (p=0.000 n=10+10)
BM_UFlatSink/18 [bin_200 ] 1.92GB/s ± 2% 1.93GB/s ± 1% +0.70% (p=0.035 n=9+10)
BM_UFlatSink/19 [sum ] 1.38GB/s ± 0% 1.41GB/s ± 0% +2.59% (p=0.000 n=9+10)
BM_UFlatSink/20 [man ] 1.40GB/s ± 1% 1.41GB/s ± 1% ~ (p=0.079 n=10+9)
BM_ZFlat/0 [html (22.31 %) ] 814MB/s ± 0% 816MB/s ± 0% +0.23% (p=0.000 n=10+10)
BM_ZFlat/1 [urls (47.78 %) ] 418MB/s ± 0% 423MB/s ± 0% +1.06% (p=0.000 n=10+10)
BM_ZFlat/2 [jpg (99.95 %) ] 10.5GB/s ± 5% 10.7GB/s ± 5% ~ (p=0.165 n=10+10)
BM_ZFlat/3 [jpg_200 (73.00 %)] 558MB/s ± 3% 560MB/s ± 1% ~ (p=0.696 n=10+8)
BM_ZFlat/4 [pdf (83.30 %) ] 6.94GB/s ± 2% 7.05GB/s ± 1% +1.59% (p=0.028 n=10+9)
BM_ZFlat/5 [html4 (22.52 %) ] 739MB/s ± 0% 745MB/s ± 0% +0.86% (p=0.000 n=10+10)
BM_ZFlat/6 [txt1 (57.88 %) ] 281MB/s ± 0% 283MB/s ± 0% +0.46% (p=0.000 n=10+10)
BM_ZFlat/7 [txt2 (61.91 %) ] 260MB/s ± 0% 261MB/s ± 0% +0.59% (p=0.000 n=10+10)
BM_ZFlat/8 [txt3 (54.99 %) ] 296MB/s ± 0% 297MB/s ± 0% +0.45% (p=0.000 n=10+10)
BM_ZFlat/9 [txt4 (66.26 %) ] 244MB/s ± 0% 245MB/s ± 0% +0.16% (p=0.000 n=10+10)
BM_ZFlat/10 [pb (19.68 %) ] 1.07GB/s ± 0% 1.09GB/s ± 0% +1.75% (p=0.000 n=10+10)
BM_ZFlat/11 [gaviota (37.72 %)] 450MB/s ± 0% 451MB/s ± 0% +0.17% (p=0.000 n=9+10)
BM_ZFlat/12 [cp (48.12 %) ] 538MB/s ± 0% 542MB/s ± 0% +0.74% (p=0.000 n=10+10)
BM_ZFlat/13 [c (42.47 %) ] 635MB/s ± 0% 640MB/s ± 0% +0.80% (p=0.000 n=10+10)
BM_ZFlat/14 [lsp (48.37 %) ] 678MB/s ± 0% 686MB/s ± 1% +1.18% (p=0.000 n=9+10)
BM_ZFlat/15 [xls (41.23 %) ] 633MB/s ± 0% 641MB/s ± 0% +1.23% (p=0.000 n=10+7)
BM_ZFlat/16 [xls_200 (78.00 %)] 516MB/s ± 2% 513MB/s ± 1% ~ (p=0.156 n=10+9)
BM_ZFlat/17 [bin (18.11 %) ] 1.01GB/s ± 0% 1.02GB/s ± 0% +0.49% (p=0.000 n=10+10)
BM_ZFlat/18 [bin_200 (7.50 %) ] 2.16GB/s ± 0% 2.24GB/s ± 1% +3.65% (p=0.000 n=8+8)
BM_ZFlat/19 [sum (48.96 %) ] 478MB/s ± 0% 486MB/s ± 0% +1.66% (p=0.000 n=10+10)
BM_ZFlat/20 [man (59.21 %) ] 558MB/s ± 1% 558MB/s ± 1% ~ (p=0.912 n=10+10) |
|
jueminyang | 254966c71e | Migrate to use absl::random | |
alkis | 53a38e5e33 |
Reduce number of allocations when compressing and simplify the code.
Previously we allocated at least once: twice with a large table, and three times when a scratch buffer was used. With this approach we always allocate exactly once.

name                   old speed      new speed      delta
BM_UFlat/0  [html    ]  2.45GB/s ± 0%  2.45GB/s ± 0%   -0.13%  (p=0.000 n=11+11)
BM_UFlat/1  [urls    ]  1.19GB/s ± 0%  1.22GB/s ± 0%   +2.48%  (p=0.000 n=11+11)
BM_UFlat/2  [jpg     ]  17.2GB/s ± 2%  17.3GB/s ± 1%        ~  (p=0.193 n=11+11)
BM_UFlat/3  [jpg_200 ]  1.52GB/s ± 0%  1.51GB/s ± 0%   -0.78%  (p=0.000 n=10+9)
BM_UFlat/4  [pdf     ]  12.5GB/s ± 1%  12.5GB/s ± 1%        ~  (p=0.881 n=9+9)
BM_UFlat/5  [html4   ]  1.86GB/s ± 0%  1.86GB/s ± 0%        ~  (p=0.123 n=11+11)
BM_UFlat/6  [txt1    ]   793MB/s ± 0%   799MB/s ± 0%   +0.78%  (p=0.000 n=11+9)
BM_UFlat/7  [txt2    ]   739MB/s ± 0%   744MB/s ± 0%   +0.77%  (p=0.000 n=11+11)
BM_UFlat/8  [txt3    ]   839MB/s ± 0%   845MB/s ± 0%   +0.71%  (p=0.000 n=11+11)
BM_UFlat/9  [txt4    ]   678MB/s ± 0%   685MB/s ± 0%   +1.01%  (p=0.000 n=11+11)
BM_UFlat/10 [pb      ]  3.08GB/s ± 0%  3.12GB/s ± 0%   +1.21%  (p=0.000 n=11+11)
BM_UFlat/11 [gaviota ]   975MB/s ± 0%   976MB/s ± 0%   +0.11%  (p=0.000 n=11+11)
BM_UFlat/12 [cp      ]  1.73GB/s ± 1%  1.74GB/s ± 1%   +0.46%  (p=0.010 n=11+11)
BM_UFlat/13 [c       ]  1.53GB/s ± 0%  1.53GB/s ± 0%        ~  (p=0.987 n=11+10)
BM_UFlat/14 [lsp     ]  1.65GB/s ± 0%  1.63GB/s ± 1%   -1.04%  (p=0.000 n=11+11)
BM_UFlat/15 [xls     ]  1.08GB/s ± 0%  1.15GB/s ± 0%   +6.12%  (p=0.000 n=10+11)
BM_UFlat/16 [xls_200 ]   944MB/s ± 0%   920MB/s ± 3%   -2.51%  (p=0.000 n=9+11)
BM_UFlat/17 [bin     ]  1.86GB/s ± 0%  1.87GB/s ± 0%   +0.68%  (p=0.000 n=10+11)
BM_UFlat/18 [bin_200 ]  1.91GB/s ± 3%  1.92GB/s ± 5%        ~  (p=0.356 n=11+11)
BM_UFlat/19 [sum     ]  1.31GB/s ± 0%  1.40GB/s ± 0%   +6.53%  (p=0.000 n=11+11)
BM_UFlat/20 [man     ]  1.42GB/s ± 0%  1.42GB/s ± 0%   +0.33%  (p=0.000 n=10+10) |
|
ckennelly | df5548c0b3 |
Use sized deallocation when releasing Zippy's scratch buffers.
name                   old time/op  new time/op  delta
BM_UFlat/0  [html    ]  41.7µs ± 0%  41.7µs ± 0%       ~  (p=0.222 n=5+5)
BM_UFlat/1  [urls    ]   587µs ± 0%   574µs ± 0%  -2.31%  (p=0.008 n=5+5)
BM_UFlat/2  [jpg     ]  7.24µs ± 2%  7.25µs ± 2%       ~  (p=0.690 n=5+5)
BM_UFlat/3  [jpg_200 ]   130ns ± 0%   131ns ± 1%       ~  (p=0.556 n=4+5)
BM_UFlat/4  [pdf     ]  8.21µs ± 0%  8.24µs ± 1%       ~  (p=0.278 n=5+5)
BM_UFlat/5  [html4   ]   219µs ± 0%   220µs ± 0%  +0.45%  (p=0.008 n=5+5)
BM_UFlat/6  [txt1    ]   192µs ± 0%   190µs ± 0%  -0.86%  (p=0.008 n=5+5)
BM_UFlat/7  [txt2    ]   169µs ± 0%   168µs ± 0%  -0.54%  (p=0.008 n=5+5)
BM_UFlat/8  [txt3    ]   509µs ± 0%   505µs ± 0%  -0.66%  (p=0.008 n=5+5)
BM_UFlat/9  [txt4    ]   710µs ± 0%   702µs ± 0%  -1.14%  (p=0.008 n=5+5)
BM_UFlat/10 [pb      ]  38.2µs ± 0%  37.9µs ± 0%  -0.82%  (p=0.008 n=5+5)
BM_UFlat/11 [gaviota ]   189µs ± 0%   189µs ± 0%       ~  (p=0.746 n=5+5)
BM_UFlat/12 [cp      ]  14.2µs ± 0%  14.2µs ± 1%       ~  (p=0.421 n=5+5)
BM_UFlat/13 [c       ]  7.29µs ± 0%  7.34µs ± 1%  +0.69%  (p=0.016 n=5+5)
BM_UFlat/14 [lsp     ]  2.27µs ± 0%  2.28µs ± 0%  +0.34%  (p=0.008 n=5+5)
BM_UFlat/15 [xls     ]   954µs ± 0%   900µs ± 0%  -5.67%  (p=0.008 n=5+5)
BM_UFlat/16 [xls_200 ]   213ns ± 1%   217ns ± 2%       ~  (p=0.056 n=5+5)
BM_UFlat/17 [bin     ]   276µs ± 0%   274µs ± 0%  -0.94%  (p=0.008 n=5+5)
BM_UFlat/18 [bin_200 ]   101ns ± 1%   101ns ± 1%       ~  (p=0.524 n=5+5)
BM_UFlat/19 [sum     ]  29.3µs ± 0%  27.3µs ± 0%  -6.98%  (p=0.008 n=5+5)
BM_UFlat/20 [man     ]  2.95µs ± 0%  2.95µs ± 0%       ~  (p=0.651 n=5+5)

For microbenchmarks, the overhead of allocating/deallocating should be small (the relevant metadata for TCMalloc's PageMap will be in cache), but this helps demonstrate that the refactoring does not adversely impact performance. |
|
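The sized-deallocation idea in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the class name `ScratchBuffer` is hypothetical, and the sketch assumes the buffer is obtained through `std::allocator<char>`, whose `deallocate` takes the size that was originally requested. Whether that size actually reaches the underlying allocator depends on the standard library and compiler settings.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical scratch buffer that remembers its size so the same size can
// be handed back to the allocator when the buffer is released.
class ScratchBuffer {
 public:
  explicit ScratchBuffer(std::size_t size)
      : size_(size), data_(std::allocator<char>().allocate(size)) {
    assert(size > 0);
  }

  ~ScratchBuffer() {
    // Sized release: the allocator is told how many bytes are being freed,
    // so it does not have to look the size up in its own metadata.
    std::allocator<char>().deallocate(data_, size_);
  }

  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;

  char* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  char* data_;
};
```

The only design point is that the size travels with the pointer, so the release path never has to rediscover it.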
alkis | 1b7466e143 |
Compute the wordmask instead of looking it up in a table.
Tested:

name                      old speed      new speed      delta
BM_UFlat/0      [html   ]  2.13GB/s ± 0%  2.46GB/s ± 0%  +15.70%  (p=0.000 n=10+8)
BM_UFlat/1      [urls   ]  1.21GB/s ± 0%  1.20GB/s ± 0%   -1.49%  (p=0.000 n=9+10)
BM_UFlat/2      [jpg    ]  17.1GB/s ± 1%  17.2GB/s ± 1%        ~  (p=0.120 n=11+11)
BM_UFlat/3      [jpg_200]  1.55GB/s ± 0%  1.54GB/s ± 0%   -0.96%  (p=0.000 n=10+7)
BM_UFlat/4      [pdf    ]  12.9GB/s ± 0%  12.6GB/s ± 0%   -1.98%  (p=0.000 n=11+9)
BM_UFlat/5      [html4  ]  1.87GB/s ± 0%  1.87GB/s ± 0%   -0.06%  (p=0.033 n=11+11)
BM_UFlat/6      [txt1   ]   816MB/s ± 0%   793MB/s ± 0%   -2.84%  (p=0.000 n=11+11)
BM_UFlat/7      [txt2   ]   758MB/s ± 0%   737MB/s ± 0%   -2.77%  (p=0.000 n=11+11)
BM_UFlat/8      [txt3   ]   865MB/s ± 0%   839MB/s ± 0%   -2.94%  (p=0.000 n=11+8)
BM_UFlat/9      [txt4   ]   701MB/s ± 0%   679MB/s ± 0%   -3.11%  (p=0.000 n=11+10)
BM_UFlat/10     [pb     ]  2.60GB/s ± 2%  3.07GB/s ± 0%  +17.81%  (p=0.000 n=11+11)
BM_UFlat/11     [gaviota]  1.01GB/s ± 0%  0.97GB/s ± 0%   -3.83%  (p=0.000 n=11+10)
BM_UFlat/12     [cp     ]  1.66GB/s ± 1%  1.73GB/s ± 1%   +4.32%  (p=0.000 n=11+11)
BM_UFlat/13     [c      ]  1.52GB/s ± 1%  1.53GB/s ± 0%   +0.49%  (p=0.002 n=11+11)
BM_UFlat/14     [lsp    ]  1.61GB/s ± 0%  1.64GB/s ± 0%   +2.10%  (p=0.000 n=10+11)
BM_UFlat/15     [xls    ]  1.12GB/s ± 0%  1.08GB/s ± 0%   -3.95%  (p=0.000 n=11+7)
BM_UFlat/16     [xls_200]   926MB/s ± 1%   935MB/s ± 1%        ~  (p=0.056 n=9+11)
BM_UFlat/17     [bin    ]  1.89GB/s ± 0%  1.86GB/s ± 0%   -1.32%  (p=0.000 n=11+11)
BM_UFlat/18     [bin_200]  1.96GB/s ± 0%  1.99GB/s ± 1%   +1.78%  (p=0.000 n=11+11)
BM_UFlat/19     [sum    ]  1.32GB/s ± 0%  1.31GB/s ± 0%   -0.79%  (p=0.000 n=11+10)
BM_UFlat/20     [man    ]  1.40GB/s ± 0%  1.43GB/s ± 0%   +2.51%  (p=0.000 n=9+10)
BM_UValidate/0  [html   ]  2.95GB/s ± 1%  3.07GB/s ± 0%   +4.11%  (p=0.000 n=10+11)
BM_UValidate/1  [urls   ]  1.57GB/s ± 0%  1.60GB/s ± 0%   +2.24%  (p=0.000 n=10+11)
BM_UValidate/2  [jpg    ]   822GB/s ± 0%   850GB/s ± 0%   +3.42%  (p=0.000 n=10+11)
BM_UValidate/3  [jpg_200]  2.01GB/s ± 0%  2.04GB/s ± 0%   +1.24%  (p=0.000 n=11+11)
BM_UValidate/4  [pdf    ]  33.7GB/s ± 0%  35.9GB/s ± 1%   +6.51%  (p=0.000 n=10+11)
BM_UIOVec/0     [html   ]   852MB/s ± 0%   852MB/s ± 0%        ~  (p=0.898 n=11+11)
BM_UIOVec/1     [urls   ]   663MB/s ± 0%   652MB/s ± 0%   -1.61%  (p=0.000 n=11+11)
BM_UIOVec/2     [jpg    ]  15.3GB/s ± 1%  15.3GB/s ± 2%        ~  (p=0.459 n=9+10)
BM_UIOVec/3     [jpg_200]   652MB/s ± 0%   627MB/s ± 1%   -3.80%  (p=0.000 n=10+11)
BM_UIOVec/4     [pdf    ]  8.80GB/s ± 1%  8.57GB/s ± 1%   -2.62%  (p=0.000 n=10+11)
BM_UFlatSink/0  [html   ]  2.13GB/s ± 0%  2.46GB/s ± 0%  +15.63%  (p=0.000 n=11+11)
BM_UFlatSink/1  [urls   ]  1.21GB/s ± 0%  1.20GB/s ± 0%   -1.42%  (p=0.000 n=11+10)
BM_UFlatSink/2  [jpg    ]  17.1GB/s ± 2%  17.2GB/s ± 1%        ~  (p=0.175 n=11+9)
BM_UFlatSink/3  [jpg_200]  1.52GB/s ± 1%  1.47GB/s ± 3%   -3.15%  (p=0.000 n=11+11)
BM_UFlatSink/4  [pdf    ]  12.8GB/s ± 1%  12.6GB/s ± 1%   -1.76%  (p=0.000 n=11+11)
BM_UFlatSink/5  [html4  ]  1.87GB/s ± 0%  1.87GB/s ± 0%   -0.19%  (p=0.000 n=11+10)
BM_UFlatSink/6  [txt1   ]   816MB/s ± 0%   792MB/s ± 0%   -2.94%  (p=0.000 n=11+11)
BM_UFlatSink/7  [txt2   ]   758MB/s ± 0%   736MB/s ± 0%   -2.83%  (p=0.000 n=11+11)
BM_UFlatSink/8  [txt3   ]   865MB/s ± 0%   838MB/s ± 0%   -3.13%  (p=0.000 n=11+11)
BM_UFlatSink/9  [txt4   ]   701MB/s ± 0%   678MB/s ± 0%   -3.20%  (p=0.000 n=11+11)
BM_UFlatSink/10 [pb     ]  2.60GB/s ± 2%  3.07GB/s ± 0%  +18.27%  (p=0.000 n=11+10)
BM_UFlatSink/11 [gaviota]  1.01GB/s ± 0%  0.97GB/s ± 0%   -3.90%  (p=0.000 n=11+11)
BM_UFlatSink/12 [cp     ]  1.66GB/s ± 1%  1.73GB/s ± 1%   +4.62%  (p=0.000 n=11+10)
BM_UFlatSink/13 [c      ]  1.52GB/s ± 0%  1.53GB/s ± 1%        ~  (p=0.180 n=9+11)
BM_UFlatSink/14 [lsp    ]  1.61GB/s ± 0%  1.64GB/s ± 1%   +1.98%  (p=0.000 n=9+11)
BM_UFlatSink/15 [xls    ]  1.12GB/s ± 0%  1.08GB/s ± 0%   -3.76%  (p=0.000 n=11+11)
BM_UFlatSink/16 [xls_200]   909MB/s ± 2%   924MB/s ± 1%   +1.62%  (p=0.000 n=11+11)
BM_UFlatSink/17 [bin    ]  1.88GB/s ± 0%  1.86GB/s ± 0%   -1.18%  (p=0.000 n=9+11)
BM_UFlatSink/18 [bin_200]  1.94GB/s ± 2%  1.94GB/s ± 1%        ~  (p=0.090 n=11+11)
BM_UFlatSink/19 [sum    ]  1.32GB/s ± 0%  1.31GB/s ± 0%   -0.76%  (p=0.000 n=11+11)
BM_UFlatSink/20 [man    ]  1.39GB/s ± 2%  1.43GB/s ± 0%   +2.75%  (p=0.000 n=11+10)

Assembly before:

* 44 8b 5c 85 a0        mov -0x60(%rbp,%rax,4),%r11d
  45 23 5d 00           and 0x0(%r13),%r11d
  89 d6                 mov %edx,%esi
  81 e6 00 07 00 00     and $0x700,%esi

Assembly after:

* 89 c1                 mov %eax,%ecx
* c0 e1 03              shl $0x3,%cl
* bf ff ff ff ff        mov $0xffffffff,%edi
* 48 d3 e7              shl %cl,%rdi
* f7 d7                 not %edi
  41 23 7d 00           and 0x0(%r13),%edi
  41 89 d3              mov %edx,%r11d
  41 81 e3 00 07 00 00  and $0x700,%r11d |
|
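The "assembly after" listing in the commit above shifts an all-ones value left by eight times a byte count and inverts the result. A minimal C++ sketch of that computation, next to the table lookup it replaces, might look like the following. The function names and the table contents are assumptions made for illustration; they are inferred from the assembly, not copied from the snappy source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Table-based variant (assumed shape): one memory load per lookup.
inline std::uint32_t WordMaskFromTable(std::size_t n) {
  static const std::uint32_t kWordMask[] = {0u, 0xffu, 0xffffu, 0xffffffu,
                                            0xffffffffu};
  assert(n <= 4);
  return kWordMask[n];
}

// Computed variant: keeps the low n bytes of a 32-bit word.
// The shift is done in 64 bits so that n == 4 (a shift by 32) is
// well-defined; the result is then truncated back to 32 bits.
inline std::uint32_t WordMaskComputed(std::size_t n) {
  assert(n <= 4);
  const std::uint64_t all_ones = 0xffffffffu;
  return static_cast<std::uint32_t>(~(all_ones << (8 * n)));
}
```

The computed form trades a load for a few ALU instructions, which is consistent with the mixed benchmark deltas reported in the commit message.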
Caleb Mazalevskis | a866f7181c |
Update README to use HTTPS instead of HTTP.
HTTPS is currently available for all the HTTP links included in the README. As such, using HTTPS instead of HTTP for those links may be preferable. |
|
costan | ea660b57d6 | Fix unused private field warning in NDEBUG builds. | |
costan | 7fefd231a1 |
C++11 guarantees <cstddef> and <cstdint>.
The build configuration can be cleaned up a bit. |
|
costan | db082d2cd6 | Remove GCC on OSX from the Travis CI matrix. | |
costan | ad82620f6f |
Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy. |
|
costan | 73c31e824c |
Fix Visual Studio build.
Commit |
|
jefflim | 27ff0af12a |
Improve performance of zippy decompression to IOVecs by up to almost 50%
1) Simplify loop condition for small pattern IncrementalCopy
2) Use pointers rather than indices to track current iovec.
3) Use fast IncrementalCopy
4) Bypass Append check from within AppendFromSelf

While this code greatly improves the performance of ZippyIOVecWriter, a bigger question is whether IOVec writing should be improved, or removed.

Perf tests:

name                                 old speed      new speed      delta
BM_UFlat/0      [html    ]            2.13GB/s ± 0%  2.14GB/s ± 1%        ~
BM_UFlat/1      [urls    ]            1.22GB/s ± 0%  1.24GB/s ± 0%   +1.87%
BM_UFlat/2      [jpg     ]            17.2GB/s ± 1%  17.1GB/s ± 0%        ~
BM_UFlat/3      [jpg_200 ]            1.55GB/s ± 0%  1.53GB/s ± 2%        ~
BM_UFlat/4      [pdf     ]            12.8GB/s ± 1%  12.7GB/s ± 2%   -0.36%
BM_UFlat/5      [html4   ]            1.89GB/s ± 0%  1.90GB/s ± 1%        ~
BM_UFlat/6      [txt1    ]             811MB/s ± 0%   829MB/s ± 1%   +2.24%
BM_UFlat/7      [txt2    ]             756MB/s ± 0%   774MB/s ± 1%   +2.41%
BM_UFlat/8      [txt3    ]             860MB/s ± 0%   879MB/s ± 1%   +2.16%
BM_UFlat/9      [txt4    ]             699MB/s ± 0%   715MB/s ± 1%   +2.31%
BM_UFlat/10     [pb      ]            2.64GB/s ± 0%  2.65GB/s ± 1%        ~
BM_UFlat/11     [gaviota ]            1.00GB/s ± 0%  0.99GB/s ± 2%        ~
BM_UFlat/12     [cp      ]            1.66GB/s ± 1%  1.66GB/s ± 2%        ~
BM_UFlat/13     [c       ]            1.53GB/s ± 0%  1.47GB/s ± 5%   -3.97%
BM_UFlat/14     [lsp     ]            1.60GB/s ± 1%  1.55GB/s ± 5%   -3.41%
BM_UFlat/15     [xls     ]            1.12GB/s ± 0%  1.15GB/s ± 0%   +1.93%
BM_UFlat/16     [xls_200 ]             918MB/s ± 2%   929MB/s ± 1%   +1.15%
BM_UFlat/17     [bin     ]            1.86GB/s ± 0%  1.89GB/s ± 1%   +1.61%
BM_UFlat/18     [bin_200 ]            1.90GB/s ± 1%  1.97GB/s ± 1%   +3.67%
BM_UFlat/19     [sum     ]            1.32GB/s ± 0%  1.33GB/s ± 1%        ~
BM_UFlat/20     [man     ]            1.39GB/s ± 0%  1.36GB/s ± 3%        ~
BM_UValidate/0  [html    ]            2.85GB/s ± 3%  2.90GB/s ± 0%        ~
BM_UValidate/1  [urls    ]            1.57GB/s ± 0%  1.56GB/s ± 0%   -0.20%
BM_UValidate/2  [jpg     ]             824GB/s ± 0%   825GB/s ± 0%   +0.11%
BM_UValidate/3  [jpg_200 ]            2.01GB/s ± 0%  2.02GB/s ± 0%   +0.10%
BM_UValidate/4  [pdf     ]            30.4GB/s ±11%  33.5GB/s ± 0%        ~
BM_UIOVec/0     [html    ]             604MB/s ± 0%   856MB/s ± 0%  +41.70%
BM_UIOVec/1     [urls    ]             440MB/s ± 0%   660MB/s ± 0%  +49.91%
BM_UIOVec/2     [jpg     ]            15.1GB/s ± 1%  15.3GB/s ± 1%   +1.22%
BM_UIOVec/3     [jpg_200 ]             567MB/s ± 1%   629MB/s ± 0%  +10.89%
BM_UIOVec/4     [pdf     ]            7.16GB/s ± 2%  8.56GB/s ± 1%  +19.64%
BM_UFlatSink/0  [html    ]            2.13GB/s ± 0%  2.16GB/s ± 0%   +1.47%
BM_UFlatSink/1  [urls    ]            1.22GB/s ± 0%  1.25GB/s ± 0%   +2.18%
BM_UFlatSink/2  [jpg     ]            17.1GB/s ± 2%  17.1GB/s ± 2%        ~
BM_UFlatSink/3  [jpg_200 ]            1.51GB/s ± 1%  1.53GB/s ± 2%   +1.11%
BM_UFlatSink/4  [pdf     ]            12.7GB/s ± 2%  12.8GB/s ± 1%   +0.67%
BM_UFlatSink/5  [html4   ]            1.90GB/s ± 0%  1.92GB/s ± 0%   +1.31%
BM_UFlatSink/6  [txt1    ]             810MB/s ± 0%   835MB/s ± 0%   +3.04%
BM_UFlatSink/7  [txt2    ]             755MB/s ± 0%   779MB/s ± 0%   +3.19%
BM_UFlatSink/8  [txt3    ]             859MB/s ± 0%   884MB/s ± 0%   +2.86%
BM_UFlatSink/9  [txt4    ]             698MB/s ± 0%   718MB/s ± 0%   +2.96%
BM_UFlatSink/10 [pb      ]            2.64GB/s ± 0%  2.67GB/s ± 0%   +1.16%
BM_UFlatSink/11 [gaviota ]            1.00GB/s ± 0%  1.01GB/s ± 0%   +1.04%
BM_UFlatSink/12 [cp      ]            1.66GB/s ± 1%  1.68GB/s ± 1%   +0.83%
BM_UFlatSink/13 [c       ]            1.52GB/s ± 1%  1.53GB/s ± 0%   +0.38%
BM_UFlatSink/14 [lsp     ]            1.60GB/s ± 1%  1.61GB/s ± 0%   +0.91%
BM_UFlatSink/15 [xls     ]            1.12GB/s ± 0%  1.15GB/s ± 0%   +1.96%
BM_UFlatSink/16 [xls_200 ]             906MB/s ± 3%   920MB/s ± 1%   +1.55%
BM_UFlatSink/17 [bin     ]            1.86GB/s ± 0%  1.90GB/s ± 0%   +2.15%
BM_UFlatSink/18 [bin_200 ]            1.85GB/s ± 2%  1.92GB/s ± 2%   +4.01%
BM_UFlatSink/19 [sum     ]            1.32GB/s ± 1%  1.35GB/s ± 0%   +2.23%
BM_UFlatSink/20 [man     ]            1.39GB/s ± 1%  1.40GB/s ± 0%   +1.12%
BM_ZFlat/0      [html (22.31 %)   ]    800MB/s ± 0%   793MB/s ± 0%   -0.95%
BM_ZFlat/1      [urls (47.78 %)   ]    423MB/s ± 0%   424MB/s ± 0%   +0.11%
BM_ZFlat/2      [jpg (99.95 %)    ]   12.0GB/s ± 2%  12.0GB/s ± 4%        ~
BM_ZFlat/3      [jpg_200 (73.00 %)]    592MB/s ± 3%   594MB/s ± 2%        ~
BM_ZFlat/4      [pdf (83.30 %)    ]   7.26GB/s ± 1%  7.23GB/s ± 2%   -0.49%
BM_ZFlat/5      [html4 (22.52 %)  ]    738MB/s ± 0%   739MB/s ± 0%   +0.17%
BM_ZFlat/6      [txt1 (57.88 %)   ]    286MB/s ± 0%   285MB/s ± 0%   -0.09%
BM_ZFlat/7      [txt2 (61.91 %)   ]    264MB/s ± 0%   264MB/s ± 0%   +0.08%
BM_ZFlat/8      [txt3 (54.99 %)   ]    300MB/s ± 0%   300MB/s ± 0%        ~
BM_ZFlat/9      [txt4 (66.26 %)   ]    248MB/s ± 0%   247MB/s ± 0%   -0.20%
BM_ZFlat/10     [pb (19.68 %)     ]   1.04GB/s ± 0%  1.03GB/s ± 0%   -1.17%
BM_ZFlat/11     [gaviota (37.72 %)]    451MB/s ± 0%   450MB/s ± 0%   -0.35%
BM_ZFlat/12     [cp (48.12 %)     ]    543MB/s ± 0%   538MB/s ± 0%   -1.04%
BM_ZFlat/13     [c (42.47 %)      ]    638MB/s ± 1%   643MB/s ± 0%   +0.68%
BM_ZFlat/14     [lsp (48.37 %)    ]    686MB/s ± 0%   691MB/s ± 1%   +0.76%
BM_ZFlat/15     [xls (41.23 %)    ]    636MB/s ± 0%   633MB/s ± 0%   -0.52%
BM_ZFlat/16     [xls_200 (78.00 %)]    523MB/s ± 2%   520MB/s ± 2%   -0.56%
BM_ZFlat/17     [bin (18.11 %)    ]   1.01GB/s ± 0%  1.01GB/s ± 0%   +0.50%
BM_ZFlat/18     [bin_200 (7.50 %) ]   2.45GB/s ± 1%  2.44GB/s ± 1%   -0.54%
BM_ZFlat/19     [sum (48.96 %)    ]    487MB/s ± 0%   478MB/s ± 0%   -1.89%
BM_ZFlat/20     [man (59.21 %)    ]    567MB/s ± 1%   566MB/s ± 1%        ~

The BM_UFlat/13 and BM_UFlat/14 results showed high variance, so I reran them:

name              old speed      new speed      delta
BM_UFlat/13 [c  ]  1.53GB/s ± 0%  1.53GB/s ± 1%       ~
BM_UFlat/14 [lsp]  1.61GB/s ± 1%  1.61GB/s ± 1%  +0.25% |
|
costan | 4ffb0e62c5 | Update Travis CI configuration. | |
atdt | be490ef9ec | Test for SSE3 support before using pshufb. | |
atdt | 8f469d97e2 |
Avoid store-forwarding stalls in Zippy's IncrementalCopy
NEW: Annotate `pattern` as initialized, for MSan.

Snappy's IncrementalCopy routine optimizes for speed by reading and writing memory in blocks of eight or sixteen bytes. If the gap between the source and destination pointers is smaller than eight bytes, snappy's strategy is to expand the gap by issuing a series of partly-overlapping eight-byte loads+stores. Because the range of each load partly overlaps that of the store which preceded it, the store buffer cannot be forwarded to the load, and the load stalls while it waits for the store to retire. This is called a store-forwarding stall.

We can use fewer loads and avoid most of the stalls by loading the first eight bytes into a 128-bit XMM register, then using PSHUFB to permute the register's contents in-place into the desired repeating sequence of bytes. When falling back to IncrementalCopySlow, use memset if the pattern size == 1. This eliminates around 60% of the stalls.

name                  old time/op  new time/op  delta
BM_UFlat/0  [html]     48.6µs ± 0%  48.2µs ± 0%   -0.92%  (p=0.000 n=19+18)
BM_UFlat/1  [urls]      589µs ± 0%   576µs ± 0%   -2.17%  (p=0.000 n=19+18)
BM_UFlat/2  [jpg]      7.12µs ± 0%  7.10µs ± 0%        ~  (p=0.071 n=19+18)
BM_UFlat/3  [jpg_200]   162ns ± 0%   151ns ± 0%   -7.06%  (p=0.000 n=19+18)
BM_UFlat/4  [pdf]      8.25µs ± 0%  8.19µs ± 0%   -0.74%  (p=0.000 n=19+18)
BM_UFlat/5  [html4]     218µs ± 0%   218µs ± 0%   +0.09%  (p=0.000 n=17+18)
BM_UFlat/6  [txt1]      191µs ± 0%   189µs ± 0%   -1.12%  (p=0.000 n=19+18)
BM_UFlat/7  [txt2]      168µs ± 0%   167µs ± 0%   -1.01%  (p=0.000 n=19+18)
BM_UFlat/8  [txt3]      502µs ± 0%   499µs ± 0%   -0.52%  (p=0.000 n=19+18)
BM_UFlat/9  [txt4]      704µs ± 0%   695µs ± 0%   -1.26%  (p=0.000 n=19+18)
BM_UFlat/10 [pb]       45.6µs ± 0%  44.2µs ± 0%   -3.13%  (p=0.000 n=19+15)
BM_UFlat/11 [gaviota]   188µs ± 0%   194µs ± 0%   +3.06%  (p=0.000 n=15+18)
BM_UFlat/12 [cp]       15.1µs ± 2%  14.7µs ± 1%   -2.09%  (p=0.000 n=18+18)
BM_UFlat/13 [c]        7.38µs ± 0%  7.36µs ± 0%   -0.28%  (p=0.000 n=16+18)
BM_UFlat/14 [lsp]      2.31µs ± 0%  2.37µs ± 0%   +2.64%  (p=0.000 n=19+18)
BM_UFlat/15 [xls]       984µs ± 0%   909µs ± 0%   -7.59%  (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]   215ns ± 0%   217ns ± 0%   +0.71%  (p=0.000 n=19+15)
BM_UFlat/17 [bin]       289µs ± 0%   287µs ± 0%   -0.71%  (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]   161ns ± 0%   116ns ± 0%  -28.09%  (p=0.000 n=19+16)
BM_UFlat/19 [sum]      31.9µs ± 0%  29.2µs ± 0%   -8.37%  (p=0.000 n=19+18)
BM_UFlat/20 [man]      3.13µs ± 1%  3.07µs ± 0%   -1.79%  (p=0.000 n=19+18)

name                  old allocs/op  new allocs/op  delta
BM_UFlat/0  [html]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/1  [urls]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/2  [jpg]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/3  [jpg_200]  0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/4  [pdf]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/5  [html4]    0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/6  [txt1]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/7  [txt2]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/8  [txt3]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/9  [txt4]     0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/10 [pb]       0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/11 [gaviota]  0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/12 [cp]       0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/13 [c]        0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/14 [lsp]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/15 [xls]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/16 [xls_200]  0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/17 [bin]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/18 [bin_200]  0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/19 [sum]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)
BM_UFlat/20 [man]      0.00 ±NaN%  0.00 ±NaN%  ~  (all samples are equal)

name                  old speed      new speed      delta
BM_UFlat/0  [html]     2.11GB/s ± 0%  2.13GB/s ± 0%   +0.92%  (p=0.000 n=19+18)
BM_UFlat/1  [urls]     1.19GB/s ± 0%  1.22GB/s ± 0%   +2.22%  (p=0.000 n=16+17)
BM_UFlat/2  [jpg]      17.3GB/s ± 0%  17.3GB/s ± 0%        ~  (p=0.074 n=19+18)
BM_UFlat/3  [jpg_200]  1.23GB/s ± 0%  1.33GB/s ± 0%   +7.58%  (p=0.000 n=19+18)
BM_UFlat/4  [pdf]      12.4GB/s ± 0%  12.5GB/s ± 0%   +0.74%  (p=0.000 n=19+18)
BM_UFlat/5  [html4]    1.88GB/s ± 0%  1.88GB/s ± 0%   -0.09%  (p=0.000 n=18+18)
BM_UFlat/6  [txt1]      798MB/s ± 0%   807MB/s ± 0%   +1.13%  (p=0.000 n=19+18)
BM_UFlat/7  [txt2]      743MB/s ± 0%   751MB/s ± 0%   +1.02%  (p=0.000 n=19+18)
BM_UFlat/8  [txt3]      850MB/s ± 0%   855MB/s ± 0%   +0.52%  (p=0.000 n=19+18)
BM_UFlat/9  [txt4]      684MB/s ± 0%   693MB/s ± 0%   +1.28%  (p=0.000 n=19+18)
BM_UFlat/10 [pb]       2.60GB/s ± 0%  2.69GB/s ± 0%   +3.25%  (p=0.000 n=19+16)
BM_UFlat/11 [gaviota]   979MB/s ± 0%   950MB/s ± 0%   -2.97%  (p=0.000 n=15+18)
BM_UFlat/12 [cp]       1.63GB/s ± 2%  1.67GB/s ± 1%   +2.13%  (p=0.000 n=18+18)
BM_UFlat/13 [c]        1.51GB/s ± 0%  1.52GB/s ± 0%   +0.29%  (p=0.000 n=16+18)
BM_UFlat/14 [lsp]      1.61GB/s ± 1%  1.57GB/s ± 0%   -2.57%  (p=0.000 n=19+18)
BM_UFlat/15 [xls]      1.05GB/s ± 0%  1.13GB/s ± 0%   +8.22%  (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]   928MB/s ± 0%   921MB/s ± 0%   -0.81%  (p=0.000 n=19+17)
BM_UFlat/17 [bin]      1.78GB/s ± 0%  1.79GB/s ± 0%   +0.71%  (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]  1.24GB/s ± 0%  1.72GB/s ± 0%  +38.92%  (p=0.000 n=19+18)
BM_UFlat/19 [sum]      1.20GB/s ± 0%  1.31GB/s ± 0%   +9.15%  (p=0.000 n=19+18)
BM_UFlat/20 [man]      1.35GB/s ± 1%  1.38GB/s ± 0%   +1.84%  (p=0.000 n=19+18) |
|
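The byte permutation described in the commit above can be modelled in portable scalar code. The sketch below is not the SSSE3 implementation and uses no intrinsics: `ExpandPattern16Sketch` only shows which byte ends up where after the shuffle, and `IncrementalCopySlowSketch` shows the `memset` shortcut for a one-byte pattern. Both function names are hypothetical, and the code assumes, as in the commit text, that source and destination lie in the same buffer with the destination ahead of the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Scalar model of the shuffle result: fills dst[0..16) with the first
// pattern_size bytes of src repeated, i.e. dst[i] = src[i % pattern_size].
inline void ExpandPattern16Sketch(const char* src, std::size_t pattern_size,
                                  char* dst) {
  assert(pattern_size >= 1 && pattern_size < 8);
  for (std::size_t i = 0; i < 16; ++i) {
    dst[i] = src[i % pattern_size];
  }
}

// Byte-by-byte overlapping copy. When the gap between src and op is one
// byte, the output is a run of a single value, so memset does the job.
inline char* IncrementalCopySlowSketch(const char* src, char* op,
                                       char* op_end) {
  assert(src < op && op <= op_end);
  if (op - src == 1) {
    std::memset(op, *src, static_cast<std::size_t>(op_end - op));
    return op_end;
  }
  while (op < op_end) {
    *op++ = *src++;
  }
  return op_end;
}
```

In the scalar model the repeated pattern is produced without reading back bytes that were just stored, which is the property the shuffle-based version relies on to avoid store-forwarding stalls.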
costan | 4f7bd2dbfd |
Update CI configurations.
Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor. |
|
jgorbe | ca37ab7fb9 |
Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long explanation is enough to justify it.

We have observed that, in some conditions, the results for dataset number 10 (pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.

In order to diagnose this, we profiled the benchmark looking at hot functions (99% of the time is spent on DecompressAllTags), then looked at the generated code to see if there was any difference. To rule out a minor difference we observed in register allocation, we replaced zippy.cc with a pre-built assembly file so it was the same in both variants, and we were still able to reproduce the regression.

After ruling out a regression caused by the compiler, we dug a bit further and noticed that the alignment of the function in the final binary was different. Both were aligned to a 16-byte boundary, but the slower one was also (by chance) aligned to a 32-byte boundary. A regression caused by alignment differences would explain why I could reproduce it consistently on the same CitC client, but not others: slight differences in the sources can cause the resulting binary to have a different layout.

Here are some detailed benchmark results before/after the fix. Note how fixing the alignment makes the difference between baseline and experiment go away, but regular 32-byte alignment puts both variants in the same ballpark as the original regression:

Original (note BM_UCord_10 and BM_UDataBuffer_10 around the -24% line):

BASELINE
BM_UCord/10        2938  2932  24194  3.767GB/s  pb
BM_UDataBuffer/10  3008  3004  23316  3.677GB/s  pb
EXPERIMENT
BM_UCord/10        3797  3789  18512  2.915GB/s  pb
BM_UDataBuffer/10  4024  4016  17543  2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:

BASELINE
BM_UCord/10        3872  3862  18035  2.860GB/s  pb
BM_UDataBuffer/10  4010  3998  17591  2.763GB/s  pb
EXPERIMENT
BM_UCord/10        3884  3876  18126  2.850GB/s  pb
BM_UDataBuffer/10  4037  4027  17199  2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):

BASELINE
BM_UCord/10        3103  3095  22642  3.569GB/s  pb
BM_UDataBuffer/10  3186  3177  21947  3.476GB/s  pb
EXPERIMENT
BM_UCord/10        3104  3095  22632  3.569GB/s  pb
BM_UDataBuffer/10  3167  3159  22076  3.496GB/s  pb

This change forces the "good" alignment for DecompressAllTags which, if anything, should make benchmark results more stable (and maybe we'll improve some unlucky application!). |
|
scrubbed | 15a2804cd2 |
Fix an incorrect analysis / comment in the "pattern doubling" code.
This should have a minuscule positive effect on performance; the main idea of the CL is just to fix the incorrect comment. |
|
costan | e69d9f8806 | Fix Travis CI configuration for OSX. | |
chandlerc | 4aba5426d4 |
Rework a very hot, very sensitive part of snappy to reduce the number of
instructions, the number of dynamic branches, and avoid a particular loop structure that LLVM has a very hard time optimizing for this particular case.

The code being changed is part of the hottest path for snappy decompression. In the benchmarks for decompressing protocol buffers, this has proven to be amazingly sensitive to the slightest changes in code layout. For example, previously we added a '.p2align 5' assembly directive to the code. This essentially padded the loop out from the function. Merely by doing this we saw significant performance improvements.

As a consequence, several of the compiler's typically reasonable optimizations can have surprisingly bad impacts. Loop unrolling is a primary culprit, but in the next LLVM release we are seeing an issue due to loop rotation. While some of the problems caused by the newly triggered loop rotation in LLVM can be mitigated with ongoing work on LLVM's code layout optimizations (specifically, loop header cloning), that is a fairly long term project. And even minor fluctuations in how that subsequent optimization is performed may prevent gaining the performance back.

For now, we need some way to unblock the next LLVM release, which contains a generic improvement to the LLVM loop optimizer that enables loop rotation in more places, but uncovers this sensitivity and weakness in a particular case.

This CL restructures the loop to have a simpler structure. Specifically, we eagerly test what the terminal condition will be and provide two versions of the copy loop that use a single loop predicate.

The comments in the source code and benchmarks indicate that only one of these two cases is actually hot: we expect to generally have enough slop in the buffer. That in turn allows us to generate a much simpler branch and loop structure for the hot path (especially for the protocol buffer decompression benchmark).

However, structuring even this simple loop in a way that doesn't trigger some other performance bubble (often a more severe one) is quite challenging. We have to carefully manage the variables used in the loop and the addressing pattern. We should teach LLVM how to do this reliably, but that too is a *much* more significant undertaking, and it is extremely rare for this to have this degree of importance.

The desired structure of the loop, as shown with IACA's analysis for the Broadwell micro-architecture (HSW and SKX are similar):

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1-0x8]
|   2^   |           |     |           | 0.4       | 1.0 |     |     | 0.6 |    | mov qword ptr [rdi], rcx
|   1    |           |     |           | 1.0   1.0 |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1]
|   2^   |           |     | 0.3       |           | 1.0 |     |     | 0.7 |    | mov qword ptr [rdi+0x8], rcx
|   1    | 0.5       |     |           |           |     | 0.5 |     |     |    | add rdi, 0x10
|   1    | 0.2       |     |           |           |     |     | 0.8 |     |    | cmp rdi, rax
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xffffffffffffffe9

Specifically, arranging the addressing modes for the stores so that micro-op fusion occurs (indicated by the `^` on the `2` micro-op count) is important to achieve good throughput for this loop.

The other thing necessary to make this change effective is to remove our previous hack using `.p2align 5` to pad out the main decompression loop, and to forcibly disable loop unrolling for critical loops. Because this change simplifies the loop structure, more unrolling opportunities show up. Also, the next LLVM release's generic loop optimization improvements allow unrolling in more places, requiring still more disabling of unrolling in this change.

Perhaps most surprising of these is that we must disable loop unrolling in the *slow* path. While unrolling there seems pointless, it should also be harmless. This cold code is laid out very far away from all of the hot code. All the samples shown in a profile of the benchmark occur before this loop in the function. And yet, if the loop gets unrolled (which seems to only happen reliably with the next LLVM release) we see a nearly 20% regression in decompressing protocol buffers!

With the current release of LLVM, we still observe some regression from this source change, but it is fairly small (5% on decompressing protocol buffers, less elsewhere). And with the next LLVM release it drops to under 1% even in that case. Meanwhile, without this change, the next release of LLVM will regress decompressing protocol buffers by more than 10%. |
|
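The restructuring described in the commit above (test the terminal condition once, then run one of two loops that each have a single predicate) can be sketched roughly as follows. This is an illustrative reading of the commit message, not the code from snappy.cc: the function name `CopyWithSlopSketch` and its parameters are hypothetical, and the sketch assumes the source and destination are at least eight bytes apart within the same buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies bytes from src to [op, op_limit), where src and op point into the
// same buffer that ends at buf_limit. Returns op_limit.
inline char* CopyWithSlopSketch(const char* src, char* op, char* op_limit,
                                char* buf_limit) {
  assert(src < op && op <= op_limit && op_limit <= buf_limit);
  assert(op - src >= 8);  // each 8-byte chunk copy is non-overlapping

  // Decide up front which loop to run, so each loop has one predicate.
  if (buf_limit - op_limit >= 16) {
    // Hot path: enough slop to write up to 15 bytes past op_limit.
    // Two 8-byte copies per iteration, mirroring the instruction
    // sequence shown in the IACA listing.
    do {
      std::memcpy(op, src, 8);
      std::memcpy(op + 8, src + 8, 8);
      src += 16;
      op += 16;
    } while (op < op_limit);
    return op_limit;
  }

  // Cold path: not enough slop, so copy exactly, one byte at a time.
  while (op < op_limit) {
    *op++ = *src++;
  }
  return op_limit;
}
```

Whether a compiler keeps the hot loop in this shape is exactly the sensitivity the commit message describes, so the sketch shows the source-level structure only.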
costan | 26102a0c66 |
Fix generated version number in open source release.
Lands GitHub PR #61. The patch was also independently contributed by Martin Gieseking <martin.gieseking@uos.de>. |