Commit Graph

200 Commits

Author SHA1 Message Date
Bhargava Shastry a58d4b03c5 Update travis config for fuzzer builds 2019-07-27 10:57:49 +02:00
Bhargava Shastry d71375bf8a Add libFuzzer harnesses, a cmake option to build them 2019-07-12 14:42:48 +02:00
Chris Mumford 156cd8939c Removed reference to deprecated autotools.
PiperOrigin-RevId: 253128048
2019-06-14 15:40:42 -07:00
Victor Costan fe702ad2a3 Use GCC 9 on Travis CI
PiperOrigin-RevId: 249995900
2019-05-25 14:37:17 -07:00
Chris Mumford a3e012d762 The snappy landing page at http://google.github.io/snappy/ is
served by [GitHub Pages](https://pages.github.com/) and lives
in the gh-pages branch. This change moves the page contents
to a more easily accessed Markdown file.

PiperOrigin-RevId: 248561542
2019-05-16 11:11:34 -07:00
Chris Mumford 4312f49315 Merge pull request #75 from Maikuolan:patch-1
PiperOrigin-RevId: 248558516
2019-05-16 11:11:21 -07:00
Chris Mumford 407712f4c9 Merge pull request #76 from abyss7:patch-1
PiperOrigin-RevId: 248211389
2019-05-14 14:27:56 -07:00
Chris Mumford 8c188a6c78 Minor typo fix in README.
PiperOrigin-RevId: 248170160
2019-05-14 11:05:38 -07:00
Chris Mumford c76b053449 Sync TODO and comment processing with external repo.
Copybara transforms code slightly differently than MOE. One
example is the TODO username stripping where Copybara
produces different results than MOE did. This change
moves the Copybara versions of comments to the public
repository.

Note: These changes didn't originate in cl/247950252.

PiperOrigin-RevId: 247950252
2019-05-14 11:02:57 -07:00
Chris Mumford 54b6379e9f Changed the CMake version in README from 3.4 to the one in CMakeLists.txt.
PiperOrigin-RevId: 247484946
2019-05-13 10:11:19 -07:00
Victor Costan 0af4349bf0 Update Travis CI configuration.
The Travis configuration:
1) Installs recent versions of clang and GCC.
2) Sets up the environment so that CMake picks up the installed
   compilers. Previously, the pre-installed clang compiler was used
   instead.
3) Requests a modern macOS image that has all the headers needed by GCC.

The CL also removes now-unnecessary old workarounds from the
Travis configuration.

PiperOrigin-RevId: 245832795
2019-05-13 10:11:19 -07:00
Chris Mumford 877cc86f0e Fixed formatted (bash/c++) sections of README.md.
PiperOrigin-RevId: 244695986
2019-05-13 10:11:19 -07:00
atdt 02cf187555 Remove MSan exemption for _bzhi_u32, since LLVM now handles it correctly.
This cleans up a TODO from cl/225463783 and cl/225655713.

PiperOrigin-RevId: 241933185
2019-05-13 10:11:12 -07:00
Ivan be831dc98c Fix compilation 2019-04-25 18:44:08 +03:00
costan d58cd618be Remove MSBuild section from AppVeyor configuration. 2019-02-26 18:28:14 -08:00
nafi c197d686a9 Optimize snappy compression by about 2.2%.
'jpg_200' is notably optimized by ~8%.

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.8µs ± 0%             41.9µs ± 0%  +0.33%          (p=0.016 n=5+5)
BM_UFlat/1      [urls             ]             590µs ± 0%              590µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/2      [jpg              ]            7.14µs ± 1%             7.12µs ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/3      [jpg_200          ]              129ns ± 0%              129ns ± 0%    ~             (p=0.167 n=5+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.20µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/5      [html4            ]             220µs ± 1%              220µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlat/6      [txt1             ]             193µs ± 0%              193µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/7      [txt2             ]             171µs ± 0%              171µs ± 0%    ~             (p=0.056 n=5+5)
BM_UFlat/8      [txt3             ]             512µs ± 0%              511µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/9      [txt4             ]             716µs ± 0%              716µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/10     [pb               ]            38.8µs ± 1%             38.8µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/11     [gaviota          ]             190µs ± 0%              190µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]            14.4µs ± 1%             14.4µs ± 1%    ~             (p=0.151 n=5+5)
BM_UFlat/13     [c                ]            7.33µs ± 0%             7.32µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/14     [lsp              ]            2.30µs ± 0%             2.31µs ± 1%    ~             (p=0.548 n=5+5)
BM_UFlat/15     [xls              ]             984µs ± 0%              984µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 0%              213ns ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/17     [bin              ]             277µs ± 0%              278µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 0%              102ns ± 0%    ~             (p=0.190 n=5+4)
BM_UFlat/19     [sum              ]            29.6µs ± 0%             29.6µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/20     [man              ]            2.98µs ± 1%             2.98µs ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/0  [html             ]            33.5µs ± 0%             33.6µs ± 0%    ~             (p=0.310 n=5+5)
BM_UValidate/1  [urls             ]             443µs ± 0%              443µs ± 0%    ~             (p=0.841 n=5+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~             (p=0.222 n=5+5)
BM_UValidate/3  [jpg_200          ]             95.6ns ± 0%             95.5ns ± 0%    ~             (p=0.421 n=5+5)
BM_UValidate/4  [pdf              ]            2.92µs ± 0%             2.92µs ± 0%    ~             (p=0.841 n=5+5)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%    ~             (p=0.548 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/2     [jpg              ]            7.48µs ± 5%             7.75µs ±12%    ~             (p=0.690 n=5+5)
BM_UIOVec/3     [jpg_200          ]              331ns ± 1%              327ns ± 1%    ~             (p=0.056 n=5+5)
BM_UIOVec/4     [pdf              ]            12.0µs ± 0%             12.0µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/0  [html             ]            41.7µs ± 0%             41.8µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/1  [urls             ]             591µs ± 0%              590µs ± 0%    ~             (p=0.151 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.18µs ± 2%             7.31µs ± 3%    ~             (p=0.190 n=4+5)
BM_UFlatSink/3  [jpg_200          ]              134ns ± 2%              134ns ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]            8.22µs ± 0%             8.23µs ± 0%    ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/6  [txt1             ]             193µs ± 0%              193µs ± 0%    ~             (p=0.095 n=5+5)
BM_UFlatSink/7  [txt2             ]             171µs ± 0%              171µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/8  [txt3             ]             512µs ± 0%              512µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/9  [txt4             ]             718µs ± 0%              718µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/10 [pb               ]            38.7µs ± 0%             38.6µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/11 [gaviota          ]             191µs ± 0%              190µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/12 [cp               ]            14.3µs ± 0%             14.4µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/13 [c                ]            7.33µs ± 0%             7.34µs ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.29µs ± 1%             2.30µs ± 1%    ~             (p=0.095 n=5+5)
BM_UFlatSink/15 [xls              ]             981µs ± 0%              980µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/16 [xls_200          ]              216ns ± 1%              216ns ± 1%    ~             (p=1.000 n=5+5)
BM_UFlatSink/17 [bin              ]             277µs ± 0%              277µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]              104ns ± 0%              104ns ± 1%    ~             (p=0.905 n=5+4)
BM_UFlatSink/19 [sum              ]            29.5µs ± 0%             29.5µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/20 [man              ]            3.01µs ± 1%             3.01µs ± 0%    ~             (p=0.730 n=5+4)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              124µs ± 0%  -1.66%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.68ms ± 0%             1.63ms ± 0%  -2.73%          (p=0.008 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.6µs ± 8%             11.4µs ± 6%    ~             (p=0.310 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              369ns ± 1%              340ns ± 1%  -7.93%          (p=0.008 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.9µs ± 4%             14.4µs ± 1%  -3.56%          (p=0.008 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             551µs ± 0%              545µs ± 0%  -1.21%          (p=0.008 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              534µs ± 0%  -1.15%          (p=0.008 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             480µs ± 0%              475µs ± 0%  -1.13%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.43ms ± 0%  -1.14%          (p=0.008 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.95ms ± 0%  -1.00%          (p=0.008 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              107µs ± 0%  -2.77%          (p=0.008 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             413µs ± 0%              411µs ± 0%  -0.50%          (p=0.008 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.6µs ± 1%             44.8µs ± 1%  -3.89%          (p=0.008 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.8µs ± 0%             17.5µs ± 0%  -1.87%          (p=0.008 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.62µs ± 1%             5.35µs ± 1%  -4.81%          (p=0.008 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.63ms ± 0%             1.63ms ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              393ns ± 1%              384ns ± 2%  -2.45%          (p=0.008 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             510µs ± 0%              503µs ± 0%  -1.50%          (p=0.016 n=4+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             83.2ns ± 3%             84.5ns ± 4%    ~             (p=0.206 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            80.0µs ± 0%             78.3µs ± 0%  -2.20%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.79µs ± 1%             7.45µs ± 1%  -4.38%          (p=0.008 n=5+5)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              57.0k ± 0%              57.0k ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              30.6k ± 0%              30.6k ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              30.6k ± 0%              30.6k ± 0%    ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.46GB/s ± 0%           2.45GB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 1%           1.20GB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 1%           17.4GB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.56GB/s ± 0%           1.56GB/s ± 0%    ~             (p=0.190 n=4+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/6      [txt1             ]            791MB/s ± 1%            791MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/7      [txt2             ]            737MB/s ± 0%            738MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            839MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/9      [txt4             ]            675MB/s ± 1%            674MB/s ± 0%    ~             (p=0.730 n=5+4)
BM_UFlat/10     [pb               ]           3.08GB/s ± 1%           3.06GB/s ± 0%    ~             (p=0.095 n=5+5)
BM_UFlat/11     [gaviota          ]            974MB/s ± 0%            976MB/s ± 0%    ~             (p=0.238 n=5+5)
BM_UFlat/12     [cp               ]           1.70GB/s ± 0%           1.72GB/s ± 0%  +1.07%          (p=0.016 n=4+5)
BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/14     [lsp              ]           1.62GB/s ± 1%           1.62GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/15     [xls              ]           1.05GB/s ± 1%           1.05GB/s ± 0%    ~             (p=0.556 n=5+4)
BM_UFlat/16     [xls_200          ]            943MB/s ± 0%            940MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/17     [bin              ]           1.86GB/s ± 1%           1.86GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/18     [bin_200          ]           1.99GB/s ± 0%           1.97GB/s ± 1%    ~             (p=0.190 n=5+4)
BM_UFlat/19     [sum              ]           1.30GB/s ± 0%           1.30GB/s ± 1%    ~             (p=0.151 n=5+5)
BM_UFlat/20     [man              ]           1.42GB/s ± 1%           1.42GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/0  [html             ]           3.06GB/s ± 0%           3.06GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~             (p=0.095 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            845GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.10GB/s ± 0%           2.10GB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UValidate/4  [pdf              ]           35.1GB/s ± 0%           35.1GB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UIOVec/0     [html             ]            843MB/s ± 0%            847MB/s ± 1%    ~             (p=0.222 n=5+5)
BM_UIOVec/1     [urls             ]            652MB/s ± 1%            652MB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UIOVec/2     [jpg              ]           16.5GB/s ± 5%           16.0GB/s ±10%    ~             (p=0.841 n=5+5)
BM_UIOVec/3     [jpg_200          ]            606MB/s ± 1%            614MB/s ± 1%    ~             (p=0.056 n=5+5)
BM_UIOVec/4     [pdf              ]           8.57GB/s ± 0%           8.57GB/s ± 0%    ~             (p=0.343 n=4+4)
BM_UFlatSink/0  [html             ]           2.47GB/s ± 0%           2.45GB/s ± 0%  -0.58%          (p=0.016 n=5+5)
BM_UFlatSink/1  [urls             ]           1.19GB/s ± 0%           1.20GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/2  [jpg              ]           16.4GB/s ±19%           16.9GB/s ± 4%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.50GB/s ± 2%           1.50GB/s ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.5GB/s ± 0%           12.5GB/s ± 0%    ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 1%           1.88GB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/6  [txt1             ]            793MB/s ± 0%            792MB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]            736MB/s ± 0%            736MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlatSink/8  [txt3             ]            839MB/s ± 0%            839MB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/9  [txt4             ]            675MB/s ± 0%            675MB/s ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/10 [pb               ]           3.07GB/s ± 0%           3.09GB/s ± 0%  +0.54%          (p=0.016 n=5+5)
BM_UFlatSink/11 [gaviota          ]            973MB/s ± 0%            971MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlatSink/12 [cp               ]           1.72GB/s ± 1%           1.71GB/s ± 1%    ~             (p=0.421 n=5+5)
BM_UFlatSink/13 [c                ]           1.53GB/s ± 1%           1.52GB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.63GB/s ± 0%           1.62GB/s ± 1%    ~             (p=0.222 n=5+5)
BM_UFlatSink/15 [xls              ]           1.06GB/s ± 0%           1.05GB/s ± 0%    ~             (p=0.111 n=4+5)
BM_UFlatSink/16 [xls_200          ]            932MB/s ± 1%            928MB/s ± 1%    ~             (p=0.548 n=5+5)
BM_UFlatSink/17 [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.93GB/s ± 1%           1.94GB/s ± 1%    ~             (p=0.730 n=5+4)
BM_UFlatSink/19 [sum              ]           1.30GB/s ± 0%           1.30GB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/20 [man              ]           1.41GB/s ± 1%           1.41GB/s ± 2%    ~             (p=0.690 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 1%            829MB/s ± 0%  +1.78%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            420MB/s ± 1%            432MB/s ± 1%  +2.87%          (p=0.008 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.7GB/s ± 8%           10.9GB/s ± 6%    ~             (p=0.421 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            544MB/s ± 2%            590MB/s ± 1%  +8.41%          (p=0.008 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.92GB/s ± 3%           7.16GB/s ± 1%  +3.51%          (p=0.008 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            755MB/s ± 0%  +1.34%          (p=0.008 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            285MB/s ± 1%  +1.04%          (p=0.008 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            262MB/s ± 0%            265MB/s ± 0%  +1.22%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 0%            300MB/s ± 0%  +1.09%          (p=0.008 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            246MB/s ± 1%            248MB/s ± 0%  +0.95%          (p=0.008 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 1%           1.11GB/s ± 1%  +2.57%          (p=0.008 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            449MB/s ± 1%            451MB/s ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            530MB/s ± 1%            552MB/s ± 0%  +4.17%          (p=0.008 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            628MB/s ± 1%            640MB/s ± 0%  +1.85%          (p=0.008 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            665MB/s ± 0%            697MB/s ± 1%  +4.71%          (p=0.008 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            635MB/s ± 0%            634MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            511MB/s ± 1%            522MB/s ± 2%  +2.23%          (p=0.008 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 1%           1.02GB/s ± 0%  +1.67%          (p=0.008 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.41GB/s ± 3%           2.37GB/s ± 4%    ~             (p=0.222 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            480MB/s ± 0%            490MB/s ± 1%  +2.24%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            545MB/s ± 0%            569MB/s ± 1%  +4.38%          (p=0.008 n=5+5)
2019-02-26 18:27:31 -08:00
costan 3f194acb57 Convert DCHECK to assert.
A previous CL introduced a use of DCHECK. The open source build does not
support DCHECK, and this project uses assert() instead of DCHECK.
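A minimal sketch of the pattern. The helper `ShiftLeftChecked` is hypothetical and not taken from the snappy source. The standard `assert()` from `<cassert>` is checked in debug builds and compiled out when `NDEBUG` is defined. That is the same debug-only behavior `DCHECK` provides, and it does not require a DCHECK implementation in the open source build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper used only to illustrate the DCHECK -> assert change.
// Before: DCHECK(shift < 32);  (needs a DCHECK implementation, e.g. glog)
// After:  assert(shift < 32);  (standard C++, no extra dependency)
inline uint32_t ShiftLeftChecked(uint32_t value, uint32_t shift) {
  // Evaluated only in debug builds; removed entirely when NDEBUG is
  // defined, matching the debug-only semantics of DCHECK.
  assert(shift < 32);
  return value << shift;
}
```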
2019-01-08 13:49:15 -08:00
costan 97a20b480f Reduce the LeftShiftOverflows() table size.
A previous CL introduced LeftShiftOverflows(), which takes a uint32
input. However, the value it operates on is guaranteed to fit in 8
bits. This CL takes advantage of this restriction to reduce the size
of the static table used to compute LeftShiftOverflows().

Using the same methodology as the previous CL, the benchmarks suggest an
improvement of about 0.6%. The improvement is likely larger on mobile
CPUs, which have much smaller caches.

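A minimal sketch of the idea, not the snappy source. Assume `LeftShiftOverflows(value, shift)` reports whether `value << shift` would lose set bits from a 32-bit result, with `shift < 32`. Because `value` fits in 8 bits, every entry of the 32-entry lookup table also fits in a `uint8_t`. The table then needs 32 bytes instead of the 128 bytes a `uint32_t` table would take.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if shifting the 8-bit `value` left by `shift` bits would push
// any set bit past bit 31 of a 32-bit result. Requires shift < 32.
inline bool LeftShiftOverflows(uint8_t value, uint32_t shift) {
  assert(shift < 32);
  // kMasks[shift] selects the bits of `value` that land at position >= 32
  // after the shift: bit i overflows when i + shift >= 32. For shift <= 24
  // no bit of an 8-bit value can overflow, so those entries are zero.
  static const uint8_t kMasks[32] = {
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe};
  return (value & kMasks[shift]) != 0;
}
```

The lookup replaces a comparison with one masked load. A 32-byte table fits inside a single 64-byte cache line when suitably aligned, whereas a 128-byte table spans at least two lines.
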
Benchmark results:

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            42.5µs ± 1%             42.1µs ± 0%  -0.87%        (p=0.000 n=20+20)
BM_UFlat/1      [urls             ]             575µs ± 0%              574µs ± 0%  -0.16%        (p=0.000 n=20+19)
BM_UFlat/2      [jpg              ]            7.13µs ± 1%             7.20µs ± 5%    ~           (p=0.422 n=16+19)
BM_UFlat/3      [jpg_200          ]              129ns ± 0%              130ns ± 0%  +0.82%        (p=0.000 n=20+17)
BM_UFlat/4      [pdf              ]            8.22µs ± 1%             8.21µs ± 0%    ~           (p=0.586 n=17+17)
BM_UFlat/5      [html4            ]             222µs ± 0%              222µs ± 0%  -0.11%        (p=0.047 n=19+20)
BM_UFlat/6      [txt1             ]             192µs ± 0%              191µs ± 0%  -0.69%        (p=0.000 n=20+20)
BM_UFlat/7      [txt2             ]             169µs ± 0%              169µs ± 0%  -0.28%        (p=0.000 n=20+20)
BM_UFlat/8      [txt3             ]             510µs ± 0%              507µs ± 0%  -0.50%        (p=0.000 n=20+20)
BM_UFlat/9      [txt4             ]             707µs ± 0%              703µs ± 0%  -0.53%        (p=0.000 n=20+20)
BM_UFlat/10     [pb               ]            39.1µs ± 0%             38.5µs ± 0%  -1.56%        (p=0.000 n=20+20)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%  -0.42%        (p=0.000 n=20+20)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.2µs ± 1%  -0.30%        (p=0.001 n=18+19)
BM_UFlat/13     [c                ]            7.29µs ± 0%             7.34µs ± 1%  +0.59%        (p=0.000 n=19+20)
BM_UFlat/14     [lsp              ]            2.28µs ± 0%             2.29µs ± 1%  +0.39%        (p=0.000 n=19+18)
BM_UFlat/15     [xls              ]             905µs ± 0%              904µs ± 0%  -0.12%        (p=0.030 n=20+20)
BM_UFlat/16     [xls_200          ]              213ns ± 2%              215ns ± 4%  +0.92%        (p=0.011 n=20+20)
BM_UFlat/17     [bin              ]             274µs ± 0%              275µs ± 0%  +0.55%        (p=0.000 n=20+20)
BM_UFlat/18     [bin_200          ]              101ns ± 1%              101ns ± 1%    ~           (p=0.913 n=18+18)
BM_UFlat/19     [sum              ]            27.9µs ± 1%             27.5µs ± 1%  -1.38%        (p=0.000 n=20+20)
BM_UFlat/20     [man              ]            2.97µs ± 1%             2.97µs ± 1%    ~           (p=0.835 n=20+19)
BM_UValidate/0  [html             ]            33.5µs ± 0%             34.2µs ± 0%  +2.32%        (p=0.000 n=20+20)
BM_UValidate/1  [urls             ]             441µs ± 0%              442µs ± 0%  +0.15%        (p=0.010 n=20+20)
BM_UValidate/2  [jpg              ]              144ns ± 0%              146ns ± 0%  +1.32%        (p=0.000 n=20+20)
BM_UValidate/3  [jpg_200          ]             95.3ns ± 0%             96.0ns ± 0%  +0.68%        (p=0.000 n=20+20)
BM_UValidate/4  [pdf              ]            2.86µs ± 0%             2.88µs ± 1%  +0.67%        (p=0.000 n=19+19)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%  -0.25%        (p=0.000 n=20+20)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~           (p=0.068 n=20+20)
BM_UIOVec/2     [jpg              ]            7.63µs ± 7%             7.76µs ±11%    ~           (p=0.396 n=19+20)
BM_UIOVec/3     [jpg_200          ]              325ns ± 0%              326ns ± 0%  +0.27%        (p=0.000 n=20+18)
BM_UIOVec/4     [pdf              ]            12.1µs ± 2%             12.1µs ± 3%    ~           (p=0.967 n=19+20)
BM_UFlatSink/0  [html             ]            42.4µs ± 0%             42.1µs ± 0%  -0.89%        (p=0.000 n=20+20)
BM_UFlatSink/1  [urls             ]             575µs ± 0%              575µs ± 0%    ~           (p=0.883 n=20+20)
BM_UFlatSink/2  [jpg              ]            7.58µs ±16%             7.52µs ±15%    ~           (p=0.945 n=19+20)
BM_UFlatSink/3  [jpg_200          ]              133ns ± 4%              133ns ± 4%    ~           (p=0.627 n=19+20)
BM_UFlatSink/4  [pdf              ]            8.29µs ± 4%             8.39µs ± 4%  +1.14%        (p=0.013 n=19+18)
BM_UFlatSink/5  [html4            ]             223µs ± 0%              222µs ± 0%  -0.18%        (p=0.001 n=20+20)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              191µs ± 0%  -0.71%        (p=0.000 n=20+20)
BM_UFlatSink/7  [txt2             ]             169µs ± 0%              169µs ± 0%  -0.26%        (p=0.000 n=20+20)
BM_UFlatSink/8  [txt3             ]             510µs ± 0%              508µs ± 0%  -0.50%        (p=0.000 n=20+20)
BM_UFlatSink/9  [txt4             ]             707µs ± 0%              704µs ± 0%  -0.44%        (p=0.000 n=20+20)
BM_UFlatSink/10 [pb               ]            39.1µs ± 0%             38.5µs ± 1%  -1.62%        (p=0.000 n=19+20)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              189µs ± 0%  -0.39%        (p=0.000 n=20+20)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.2µs ± 1%    ~           (p=0.435 n=19+19)
BM_UFlatSink/13 [c                ]            7.29µs ± 0%             7.33µs ± 1%  +0.57%        (p=0.000 n=19+20)
BM_UFlatSink/14 [lsp              ]            2.29µs ± 0%             2.29µs ± 1%    ~           (p=0.791 n=18+18)
BM_UFlatSink/15 [xls              ]             903µs ± 0%              902µs ± 0%  -0.11%        (p=0.044 n=20+19)
BM_UFlatSink/16 [xls_200          ]              215ns ± 1%              215ns ± 1%    ~           (p=0.885 n=19+19)
BM_UFlatSink/17 [bin              ]             274µs ± 0%              275µs ± 0%  +0.51%        (p=0.000 n=20+20)
BM_UFlatSink/18 [bin_200          ]              103ns ± 2%              103ns ± 0%  -0.41%        (p=0.016 n=20+15)
BM_UFlatSink/19 [sum              ]            27.9µs ± 1%             27.5µs ± 1%  -1.34%        (p=0.000 n=20+19)
BM_UFlatSink/20 [man              ]            2.98µs ± 1%             2.97µs ± 1%    ~           (p=0.358 n=18+19)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%  +0.14%        (p=0.011 n=20+20)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%  +0.11%        (p=0.043 n=20+20)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.5µs ± 6%             11.7µs ± 7%    ~           (p=0.142 n=20+20)
BM_ZFlat/3      [jpg_200 (73.00 %)]              349ns ± 3%              351ns ± 3%    ~           (p=0.573 n=18+20)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.6µs ± 2%             14.7µs ± 4%    ~           (p=0.879 n=19+20)
BM_ZFlat/5      [html4 (22.52 %)  ]             553µs ± 0%              552µs ± 0%  -0.23%        (p=0.000 n=20+20)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%    ~           (p=0.221 n=20+20)
BM_ZFlat/7      [txt2 (61.91 %)   ]             479µs ± 0%              481µs ± 1%  +0.47%        (p=0.000 n=20+20)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%  +0.13%        (p=0.040 n=20+20)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%  +0.16%        (p=0.009 n=20+20)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 1%              109µs ± 1%  -0.79%        (p=0.000 n=20+20)
BM_ZFlat/11     [gaviota (37.72 %)]             410µs ± 0%              410µs ± 0%    ~           (p=0.149 n=20+19)
BM_ZFlat/12     [cp (48.12 %)     ]            45.4µs ± 1%             44.9µs ± 1%  -1.23%        (p=0.000 n=20+20)
BM_ZFlat/13     [c (42.47 %)      ]            17.5µs ± 0%             17.5µs ± 1%    ~           (p=0.883 n=20+20)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.51µs ± 1%             5.46µs ± 1%  -0.95%        (p=0.000 n=20+18)
BM_ZFlat/15     [xls (41.23 %)    ]             1.61ms ± 0%             1.62ms ± 0%    ~           (p=0.183 n=20+20)
BM_ZFlat/16     [xls_200 (78.00 %)]              389ns ± 2%              391ns ± 3%    ~           (p=0.740 n=18+20)
BM_ZFlat/17     [bin (18.11 %)    ]             508µs ± 0%              508µs ± 0%    ~           (p=0.779 n=20+20)
BM_ZFlat/18     [bin_200 (7.50 %) ]             87.4ns ± 5%             88.1ns ± 8%    ~           (p=0.367 n=16+19)
BM_ZFlat/19     [sum (48.96 %)    ]            79.1µs ± 0%             80.2µs ± 0%  +1.39%        (p=0.000 n=20+20)
BM_ZFlat/20     [man (59.21 %)    ]            7.55µs ± 1%             7.57µs ± 1%  +0.31%        (p=0.025 n=19+19)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.42GB/s ± 0%           2.44GB/s ± 0%  +0.77%        (p=0.000 n=19+19)
BM_UFlat/1      [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.06%        (p=0.000 n=20+19)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 2%           17.2GB/s ± 4%    ~           (p=0.433 n=17+19)
BM_UFlat/3      [jpg_200          ]           1.56GB/s ± 0%           1.54GB/s ± 0%  -0.82%        (p=0.000 n=20+20)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%    ~           (p=0.322 n=17+17)
BM_UFlat/5      [html4            ]           1.85GB/s ± 0%           1.85GB/s ± 0%  +0.16%        (p=0.000 n=20+20)
BM_UFlat/6      [txt1             ]            794MB/s ± 0%            800MB/s ± 0%  +0.68%        (p=0.000 n=18+20)
BM_UFlat/7      [txt2             ]            741MB/s ± 0%            743MB/s ± 0%  +0.30%        (p=0.000 n=19+19)
BM_UFlat/8      [txt3             ]            840MB/s ± 0%            844MB/s ± 0%  +0.53%        (p=0.000 n=18+20)
BM_UFlat/9      [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.57%        (p=0.000 n=20+17)
BM_UFlat/10     [pb               ]           3.04GB/s ± 0%           3.09GB/s ± 0%  +1.60%        (p=0.000 n=19+20)
BM_UFlat/11     [gaviota          ]            977MB/s ± 0%            981MB/s ± 0%  +0.45%        (p=0.000 n=19+19)
BM_UFlat/12     [cp               ]           1.74GB/s ± 0%           1.74GB/s ± 0%  +0.29%        (p=0.000 n=20+19)
BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.52GB/s ± 1%  -0.56%        (p=0.000 n=19+20)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.63GB/s ± 1%  -0.38%        (p=0.000 n=19+20)
BM_UFlat/15     [xls              ]           1.14GB/s ± 0%           1.14GB/s ± 0%  +0.11%        (p=0.000 n=19+20)
BM_UFlat/16     [xls_200          ]            941MB/s ± 1%            931MB/s ± 4%  -1.02%        (p=0.001 n=19+20)
BM_UFlat/17     [bin              ]           1.88GB/s ± 0%           1.87GB/s ± 0%  -0.51%        (p=0.000 n=20+20)
BM_UFlat/18     [bin_200          ]           1.98GB/s ± 0%           1.98GB/s ± 1%    ~           (p=0.767 n=18+18)
BM_UFlat/19     [sum              ]           1.37GB/s ± 0%           1.39GB/s ± 0%  +1.46%        (p=0.000 n=20+20)
BM_UFlat/20     [man              ]           1.43GB/s ± 0%           1.43GB/s ± 0%    ~           (p=0.501 n=18+18)
BM_UValidate/0  [html             ]           3.07GB/s ± 0%           3.00GB/s ± 0%  -2.25%        (p=0.000 n=20+20)
BM_UValidate/1  [urls             ]           1.60GB/s ± 0%           1.59GB/s ± 0%  -0.11%        (p=0.000 n=18+19)
BM_UValidate/2  [jpg              ]            859GB/s ± 0%            848GB/s ± 0%  -1.29%        (p=0.000 n=20+19)
BM_UValidate/3  [jpg_200          ]           2.10GB/s ± 0%           2.09GB/s ± 0%  -0.68%        (p=0.000 n=19+20)
BM_UValidate/4  [pdf              ]           35.9GB/s ± 0%           35.6GB/s ± 1%  -0.71%        (p=0.000 n=20+20)
BM_UIOVec/0     [html             ]            843MB/s ± 0%            844MB/s ± 0%  +0.21%        (p=0.000 n=20+20)
BM_UIOVec/1     [urls             ]            651MB/s ± 0%            650MB/s ± 0%  -0.10%        (p=0.000 n=20+20)
BM_UIOVec/2     [jpg              ]           16.2GB/s ± 6%           16.0GB/s ±10%    ~           (p=0.380 n=19+20)
BM_UIOVec/3     [jpg_200          ]            617MB/s ± 0%            615MB/s ± 0%  -0.24%        (p=0.000 n=20+17)
BM_UIOVec/4     [pdf              ]           8.52GB/s ± 3%           8.50GB/s ± 3%    ~           (p=0.771 n=19+20)
BM_UFlatSink/0  [html             ]           2.42GB/s ± 0%           2.44GB/s ± 0%  +0.93%        (p=0.000 n=20+20)
BM_UFlatSink/1  [urls             ]           1.23GB/s ± 0%           1.23GB/s ± 0%  +0.04%        (p=0.006 n=20+20)
BM_UFlatSink/2  [jpg              ]           16.4GB/s ±14%           16.5GB/s ±13%    ~           (p=0.879 n=19+20)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 4%           1.51GB/s ± 4%    ~           (p=0.874 n=18+20)
BM_UFlatSink/4  [pdf              ]           12.4GB/s ± 4%           12.3GB/s ± 4%  -1.11%        (p=0.016 n=19+18)
BM_UFlatSink/5  [html4            ]           1.85GB/s ± 0%           1.85GB/s ± 0%  +0.20%        (p=0.000 n=20+20)
BM_UFlatSink/6  [txt1             ]            794MB/s ± 0%            799MB/s ± 0%  +0.72%        (p=0.000 n=19+20)
BM_UFlatSink/7  [txt2             ]            741MB/s ± 0%            743MB/s ± 0%  +0.30%        (p=0.000 n=18+20)
BM_UFlatSink/8  [txt3             ]            839MB/s ± 0%            843MB/s ± 0%  +0.52%        (p=0.000 n=20+18)
BM_UFlatSink/9  [txt4             ]            684MB/s ± 0%            687MB/s ± 0%  +0.46%        (p=0.000 n=20+20)
BM_UFlatSink/10 [pb               ]           3.04GB/s ± 0%           3.09GB/s ± 0%  +1.71%        (p=0.000 n=20+19)
BM_UFlatSink/11 [gaviota          ]            976MB/s ± 0%            980MB/s ± 0%  +0.45%        (p=0.000 n=20+20)
BM_UFlatSink/12 [cp               ]           1.74GB/s ± 1%           1.74GB/s ± 1%    ~           (p=0.904 n=20+20)
BM_UFlatSink/13 [c                ]           1.53GB/s ± 0%           1.53GB/s ± 1%  -0.50%        (p=0.000 n=19+20)
BM_UFlatSink/14 [lsp              ]           1.63GB/s ± 1%           1.63GB/s ± 1%    ~           (p=0.358 n=19+18)
BM_UFlatSink/15 [xls              ]           1.14GB/s ± 0%           1.15GB/s ± 0%  +0.12%        (p=0.000 n=20+20)
BM_UFlatSink/16 [xls_200          ]            931MB/s ± 1%            931MB/s ± 1%    ~           (p=0.686 n=19+19)
BM_UFlatSink/17 [bin              ]           1.88GB/s ± 0%           1.87GB/s ± 0%  -0.53%        (p=0.000 n=20+20)
BM_UFlatSink/18 [bin_200          ]           1.94GB/s ± 2%           1.95GB/s ± 1%  +0.42%        (p=0.014 n=20+15)
BM_UFlatSink/19 [sum              ]           1.37GB/s ± 0%           1.39GB/s ± 0%  +1.38%        (p=0.000 n=19+18)
BM_UFlatSink/20 [man              ]           1.42GB/s ± 1%           1.43GB/s ± 0%    ~           (p=0.284 n=18+19)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 0%            814MB/s ± 0%  -0.15%        (p=0.000 n=20+20)
BM_ZFlat/1      [urls (47.78 %)   ]            423MB/s ± 0%            422MB/s ± 0%  -0.14%        (p=0.000 n=20+20)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.8GB/s ± 5%           10.6GB/s ± 7%    ~           (p=0.142 n=20+20)
BM_ZFlat/3      [jpg_200 (73.00 %)]            574MB/s ± 2%            572MB/s ± 2%    ~           (p=0.613 n=18+20)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.01GB/s ± 2%           7.01GB/s ± 4%    ~           (p=0.593 n=18+20)
BM_ZFlat/5      [html4 (22.52 %)  ]            743MB/s ± 0%            745MB/s ± 0%  +0.25%        (p=0.000 n=20+19)
BM_ZFlat/6      [txt1 (57.88 %)   ]            283MB/s ± 0%            282MB/s ± 0%    ~           (p=0.261 n=18+19)
BM_ZFlat/7      [txt2 (61.91 %)   ]            262MB/s ± 0%            261MB/s ± 0%  -0.35%        (p=0.000 n=20+19)
BM_ZFlat/8      [txt3 (54.99 %)   ]            298MB/s ± 0%            297MB/s ± 0%  -0.11%        (p=0.000 n=20+19)
BM_ZFlat/9      [txt4 (66.26 %)   ]            245MB/s ± 0%            245MB/s ± 0%  -0.13%        (p=0.000 n=19+20)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 0%           1.09GB/s ± 0%  +0.82%        (p=0.000 n=18+19)
BM_ZFlat/11     [gaviota (37.72 %)]            451MB/s ± 0%            451MB/s ± 0%  -0.05%        (p=0.004 n=19+20)
BM_ZFlat/12     [cp (48.12 %)     ]            543MB/s ± 1%            550MB/s ± 1%  +1.24%        (p=0.000 n=20+20)
BM_ZFlat/13     [c (42.47 %)      ]            638MB/s ± 0%            637MB/s ± 0%    ~           (p=0.708 n=19+19)
BM_ZFlat/14     [lsp (48.37 %)    ]            678MB/s ± 2%            684MB/s ± 1%  +0.89%        (p=0.000 n=20+19)
BM_ZFlat/15     [xls (41.23 %)    ]            640MB/s ± 0%            640MB/s ± 0%  -0.10%        (p=0.000 n=19+19)
BM_ZFlat/16     [xls_200 (78.00 %)]            515MB/s ± 2%            514MB/s ± 3%    ~           (p=0.916 n=18+19)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.01GB/s ± 0%  +0.03%        (p=0.033 n=20+20)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.30GB/s ± 6%           2.28GB/s ± 9%    ~           (p=0.502 n=16+19)
BM_ZFlat/19     [sum (48.96 %)    ]            485MB/s ± 0%            478MB/s ± 0%  -1.39%        (p=0.000 n=19+20)
BM_ZFlat/20     [man (59.21 %)    ]            562MB/s ± 1%            560MB/s ± 1%  -0.37%        (p=0.016 n=18+19)
2019-01-08 13:48:30 -08:00
costan 4f0adca400 Wrap BMI2 instruction usage in support checks.
A previous version of this change was submitted and rolled back due to
breakage: an attempt to accommodate Visual Studio resulted in compiler
errors on GCC/Clang with -mavx2 but without -mbmi2. This version makes
the BMI2 support check stricter to avoid those errors.

A previous CL introduced _bzhi_u32 (part of Intel's BMI2 instruction
set, released in Haswell) gated by a check for the __BMI2__ preprocessor
macro. This works for Clang and GCC, but does not work on Visual Studio,
and may not work on other compilers.

This CL plumbs the BMI2 support checks through the CMake configuration
used by the open source build. It also replaces the <x86intrin.h>
header, which does not exist on Visual Studio, with the more scoped
headers <tmmintrin.h> (for SSSE3) and <immintrin.h> (for BMI2/AVX2).
Aside from fixing the open source build, the more scoped headers make
it slightly less likely that newer intrinsics will creep in without
proper gating.
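A minimal sketch of the gating pattern. It assumes the compiler defines `__BMI2__` when BMI2 code generation is enabled, as GCC and Clang do with `-mbmi2`. On compilers that do not define it, a macro supplied by the CMake configuration could be substituted. The helper name `ClearHighBits` is hypothetical. The portable fallback mirrors `_bzhi_u32`, which clears all bits of `src` from position `index` upward and returns `src` unchanged when `index` is 32 or more.

```cpp
#include <cstdint>

#if defined(__BMI2__)
#include <immintrin.h>  // _bzhi_u32 (BMI2)
#endif

// Hypothetical helper: keeps the low `index` bits of `src`, clears the rest.
// This sketch supports index in [0, 32].
inline uint32_t ClearHighBits(uint32_t src, uint32_t index) {
#if defined(__BMI2__)
  // Single BZHI instruction; compiled only when the target supports BMI2.
  return _bzhi_u32(src, index);
#else
  // Portable fallback with the same result for index in [0, 32].
  // Shifting a 32-bit value by 32 is undefined, so handle it separately.
  return index >= 32 ? src : src & ((uint32_t{1} << index) - 1);
#endif
}
```

Keeping the `#include` inside the same `#if` as the call means builds for targets without BMI2, including non-x86 targets, never reference the intrinsic or its header.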
2019-01-08 06:44:11 -08:00
nafi 46768e335d Optimize decompression by about 0.82%.
Assembly difference: https://godbolt.org/z/cvlH9b

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            42.3µs ± 0%             42.5µs ± 0%   +0.57%          (p=0.008 n=5+5)
BM_UFlat/1      [urls             ]             590µs ± 0%              575µs ± 0%   -2.60%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            7.16µs ± 1%             7.15µs ± 1%     ~             (p=0.841 n=5+5)
BM_UFlat/3      [jpg_200          ]              131ns ± 0%              129ns ± 0%   -1.41%          (p=0.008 n=5+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.22µs ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/5      [html4            ]             222µs ± 0%              223µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlat/6      [txt1             ]             193µs ± 0%              192µs ± 0%     ~             (p=0.095 n=5+5)
BM_UFlat/7      [txt2             ]             171µs ± 0%              169µs ± 0%   -0.83%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             511µs ± 0%              510µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlat/9      [txt4             ]             717µs ± 0%              707µs ± 0%   -1.42%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            38.8µs ± 0%             39.3µs ± 0%   +1.26%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             190µs ± 0%              189µs ± 0%   -0.43%          (p=0.032 n=5+5)
BM_UFlat/12     [cp               ]            14.3µs ± 0%             14.2µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]            7.35µs ± 1%             7.30µs ± 0%   -0.66%          (p=0.032 n=5+5)
BM_UFlat/14     [lsp              ]            2.30µs ± 1%             2.28µs ± 0%     ~             (p=0.056 n=5+5)
BM_UFlat/15     [xls              ]             983µs ± 0%              904µs ± 0%   -7.99%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 0%              213ns ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/17     [bin              ]             278µs ± 0%              274µs ± 0%   -1.56%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 0%              101ns ± 1%     ~             (p=1.000 n=5+5)
BM_UFlat/19     [sum              ]            29.4µs ± 1%             28.0µs ± 1%   -4.98%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            2.97µs ± 0%             2.97µs ± 0%     ~             (p=0.421 n=5+5)
BM_UValidate/0  [html             ]            33.6µs ± 0%             33.6µs ± 0%     ~             (p=0.548 n=5+5)
BM_UValidate/1  [urls             ]             443µs ± 0%              441µs ± 0%   -0.43%          (p=0.016 n=4+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              144ns ± 0%   -1.63%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]             98.6ns ± 0%             95.3ns ± 0%   -3.32%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]            2.89µs ± 1%             2.85µs ± 0%   -1.22%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%     ~             (p=1.000 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%     ~             (p=0.095 n=5+5)
BM_UIOVec/2     [jpg              ]            7.51µs ± 4%             7.69µs ± 6%     ~             (p=0.421 n=5+5)
BM_UIOVec/3     [jpg_200          ]              327ns ± 0%              327ns ± 1%     ~             (p=0.730 n=4+5)
BM_UIOVec/4     [pdf              ]            12.0µs ± 1%             12.0µs ± 0%     ~             (p=0.286 n=5+4)
BM_UFlatSink/0  [html             ]            42.3µs ± 0%             42.5µs ± 0%   +0.46%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]             589µs ± 0%              575µs ± 0%   -2.36%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.40µs ± 8%             7.74µs ± 9%     ~             (p=0.310 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              134ns ± 0%              131ns ± 0%   -1.78%          (p=0.008 n=5+5)
BM_UFlatSink/4  [pdf              ]            8.28µs ± 3%             8.35µs ± 6%     ~             (p=0.548 n=5+5)
BM_UFlatSink/5  [html4            ]             222µs ± 0%              222µs ± 0%     ~             (p=0.690 n=5+5)
BM_UFlatSink/6  [txt1             ]             193µs ± 0%              192µs ± 0%     ~             (p=0.222 n=5+5)
BM_UFlatSink/7  [txt2             ]             171µs ± 0%              169µs ± 0%   -0.91%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]             512µs ± 0%              510µs ± 0%   -0.28%          (p=0.032 n=5+5)
BM_UFlatSink/9  [txt4             ]             717µs ± 0%              707µs ± 0%   -1.32%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]            38.7µs ± 0%             39.2µs ± 0%   +1.29%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]             190µs ± 0%              189µs ± 0%   -0.47%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]            14.3µs ± 0%             14.2µs ± 0%   -0.65%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]            7.36µs ± 1%             7.29µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.30µs ± 1%             2.29µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlatSink/15 [xls              ]             980µs ± 0%              903µs ± 0%   -7.92%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]              217ns ± 0%              215ns ± 0%   -0.94%          (p=0.008 n=5+5)
BM_UFlatSink/17 [bin              ]             278µs ± 0%              273µs ± 0%   -1.56%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]              107ns ± 5%              104ns ± 0%     ~             (p=0.056 n=5+5)
BM_UFlatSink/19 [sum              ]            29.5µs ± 0%             27.9µs ± 0%   -5.32%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]            3.01µs ± 0%             3.00µs ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             127µs ± 0%              126µs ± 0%   -0.46%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.5µs ± 3%             11.6µs ± 6%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              350ns ± 2%              347ns ± 0%     ~             (p=0.905 n=5+4)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.6µs ± 4%             14.6µs ± 1%     ~             (p=0.421 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             553µs ± 0%              553µs ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%     ~             (p=1.000 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             481µs ± 0%              479µs ± 0%   -0.54%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              110µs ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             411µs ± 0%              410µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.1µs ± 1%             45.8µs ± 0%     ~             (p=0.056 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.6µs ± 0%             17.6µs ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.46µs ± 1%             5.49µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.62ms ± 0%             1.61ms ± 0%     ~             (p=0.190 n=4+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              392ns ± 2%              385ns ± 1%     ~             (p=0.200 n=4+4)
BM_ZFlat/17     [bin (18.11 %)    ]             509µs ± 0%              508µs ± 0%   -0.26%          (p=0.008 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             90.2ns ±15%             80.8ns ± 0%  -10.39%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            81.1µs ± 0%             79.1µs ± 1%   -2.37%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.61µs ± 1%             7.57µs ± 1%     ~             (p=0.421 n=5+5)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%     ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%     ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%     ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%     ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%     ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%     ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%     ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%     ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%     ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%     ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%     ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%     ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%     ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%     ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%     ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%     ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%     ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%     ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%     ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              57.0k ± 0%              57.0k ± 0%     ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              30.6k ± 0%              30.6k ± 0%     ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%     ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              30.6k ± 0%              30.6k ± 0%     ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.43GB/s ± 0%           2.41GB/s ± 0%   -0.59%          (p=0.032 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 1%           1.22GB/s ± 0%   +2.58%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]           17.2GB/s ± 1%           17.3GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.54GB/s ± 1%           1.56GB/s ± 1%   +1.23%          (p=0.008 n=5+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%     ~             (p=0.413 n=5+4)
BM_UFlat/5      [html4            ]           1.85GB/s ± 1%           1.85GB/s ± 0%     ~             (p=0.690 n=5+5)
BM_UFlat/6      [txt1             ]            793MB/s ± 0%            794MB/s ± 0%     ~             (p=0.690 n=5+5)
BM_UFlat/7      [txt2             ]            738MB/s ± 0%            742MB/s ± 1%     ~             (p=0.151 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            838MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UFlat/9      [txt4             ]            674MB/s ± 0%            684MB/s ± 0%   +1.55%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]           3.07GB/s ± 1%           3.03GB/s ± 1%   -1.27%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]            974MB/s ± 0%            978MB/s ± 0%   +0.50%          (p=0.032 n=5+5)
BM_UFlat/12     [cp               ]           1.72GB/s ± 0%           1.74GB/s ± 1%   +0.79%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]           1.52GB/s ± 1%           1.53GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/14     [lsp              ]           1.62GB/s ± 1%           1.64GB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlat/15     [xls              ]           1.05GB/s ± 0%           1.14GB/s ± 1%   +8.60%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]            942MB/s ± 0%            941MB/s ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/17     [bin              ]           1.85GB/s ± 0%           1.88GB/s ± 0%   +1.60%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]           1.99GB/s ± 0%           1.99GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UFlat/19     [sum              ]           1.30GB/s ± 1%           1.37GB/s ± 1%   +5.28%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]           1.43GB/s ± 1%           1.42GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UValidate/0  [html             ]           3.07GB/s ± 0%           3.05GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.60GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            860GB/s ± 0%   +1.75%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 1%           2.11GB/s ± 1%   +3.61%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]           35.6GB/s ± 1%           36.1GB/s ± 1%   +1.40%          (p=0.016 n=5+5)
BM_UIOVec/0     [html             ]            845MB/s ± 1%            843MB/s ± 1%     ~             (p=0.310 n=5+5)
BM_UIOVec/1     [urls             ]            653MB/s ± 0%            651MB/s ± 1%     ~             (p=0.190 n=4+5)
BM_UIOVec/2     [jpg              ]           16.4GB/s ± 4%           16.1GB/s ± 5%     ~             (p=0.548 n=5+5)
BM_UIOVec/3     [jpg_200          ]            611MB/s ± 2%            614MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_UIOVec/4     [pdf              ]           8.53GB/s ± 1%           8.52GB/s ± 3%     ~             (p=0.841 n=5+5)
BM_UFlatSink/0  [html             ]           2.43GB/s ± 1%           2.42GB/s ± 0%     ~             (p=0.222 n=5+5)
BM_UFlatSink/1  [urls             ]           1.20GB/s ± 0%           1.23GB/s ± 1%   +2.38%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]           16.7GB/s ± 8%           16.0GB/s ± 8%     ~             (p=0.151 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.50GB/s ± 0%           1.53GB/s ± 0%   +2.13%          (p=0.008 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.5GB/s ± 0%           12.3GB/s ± 5%     ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]           1.85GB/s ± 0%           1.84GB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlatSink/6  [txt1             ]            791MB/s ± 0%            791MB/s ± 0%     ~             (p=1.000 n=5+5)
BM_UFlatSink/7  [txt2             ]            735MB/s ± 0%            739MB/s ± 0%   +0.51%          (p=0.016 n=5+4)
BM_UFlatSink/8  [txt3             ]            838MB/s ± 0%            840MB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlatSink/9  [txt4             ]            674MB/s ± 0%            683MB/s ± 0%   +1.37%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]           3.07GB/s ± 0%           3.03GB/s ± 1%   -1.34%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]            973MB/s ± 0%            975MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UFlatSink/12 [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%     ~             (p=0.056 n=5+5)
BM_UFlatSink/13 [c                ]           1.52GB/s ± 1%           1.53GB/s ± 1%   +0.76%          (p=0.032 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.62GB/s ± 0%           1.63GB/s ± 0%     ~             (p=0.548 n=5+5)
BM_UFlatSink/15 [xls              ]           1.05GB/s ± 0%           1.14GB/s ± 0%   +8.57%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]            925MB/s ± 0%            933MB/s ± 0%   +0.85%          (p=0.008 n=5+5)
BM_UFlatSink/17 [bin              ]           1.85GB/s ± 1%           1.88GB/s ± 0%   +1.47%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.88GB/s ± 5%           1.93GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UFlatSink/19 [sum              ]           1.30GB/s ± 1%           1.37GB/s ± 1%   +5.18%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]           1.41GB/s ± 0%           1.41GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            809MB/s ± 0%            814MB/s ± 1%   +0.61%          (p=0.016 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            423MB/s ± 0%            422MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.8GB/s ± 3%           10.6GB/s ± 5%     ~             (p=0.690 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            575MB/s ± 2%            579MB/s ± 0%     ~             (p=1.000 n=5+4)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.06GB/s ± 4%           7.05GB/s ± 2%     ~             (p=0.421 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            744MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            282MB/s ± 1%     ~             (p=1.000 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            261MB/s ± 0%            263MB/s ± 0%   +0.55%          (p=0.032 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 1%            297MB/s ± 0%     ~             (p=1.000 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            245MB/s ± 0%            246MB/s ± 0%     ~             (p=0.286 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 1%           1.08GB/s ± 0%     ~             (p=0.056 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            450MB/s ± 0%            452MB/s ± 0%   +0.55%          (p=0.016 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            537MB/s ± 1%            538MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            637MB/s ± 1%            634MB/s ± 1%     ~             (p=0.222 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            684MB/s ± 1%            680MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            641MB/s ± 0%            640MB/s ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            501MB/s ± 9%            521MB/s ± 1%     ~             (p=0.111 n=5+4)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.02GB/s ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.24GB/s ±14%           2.48GB/s ± 0%     ~             (p=0.063 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            473MB/s ± 1%            485MB/s ± 1%   +2.47%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            558MB/s ± 1%            558MB/s ± 1%     ~             (p=1.000 n=5+5)
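
The time/op and speed sections above are two views of the same measurement. For a fixed input, throughput is bytes divided by time, so a relative time change d corresponds to a relative speed change of 1/(1 + d) - 1. A minimal sketch of that conversion (the function name is illustrative, not part of any tool):

```cpp
#include <cassert>

// Converts a relative change in time/op (e.g. -0.0792 for -7.92%) into the
// matching relative change in throughput, since speed = bytes / time.
inline double SpeedDeltaFromTimeDelta(double time_delta) {
  assert(time_delta > -1.0);  // time/op cannot drop by 100% or more
  return 1.0 / (1.0 + time_delta) - 1.0;
}
```

For BM_UFlatSink/15 (xls), a -7.92% change in time/op maps to roughly +8.6%, consistent with its +8.57% speed delta up to rounding of the displayed values. The same conversion turns the -11.34% (html) and -12.28% (pb) Haswell time deltas in the BZHI commit below into about +12.8% and +14.0% throughput, which appears to be the basis of that commit's "12-14%" figure.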
2019-01-08 06:35:12 -08:00
costan fdba21ffd6 Fix typo in two argument names in stubs.
The stubs are only used in the open source version, so it wasn't caught
in internal tests.
2019-01-06 13:49:33 -08:00
costan 81d444e4e4 Remove direct use of __builtin_clz.
A previous CL introduced __builtin_clz in zippy.cc. This is a GCC / Clang
intrinsic, and is not supported in Visual Studio. The rest of the
project uses bit manipulation intrinsics via the functions in Bits::,
which are stubbed out for the open source build in
zippy-stubs-internal.h.

This CL extracts Bits::Log2FloorNonZero() out of Bits::Log2Floor() in
the stubbed version of Bits, adds assertions to the Bits::*NonZero()
functions in the stubs, and converts __builtin_clz to a
Bits::Log2FloorNonZero() call.

The latter part is not obvious. A mathematical proof of correctness is
outlined in added comments. An empirical proof is available at
https://godbolt.org/z/mPKWmh -- CalculateTableSizeOld(), which is the
current code, compiles to the same assembly on Clang as
CalculateTableSizeNew1(), which is the bigger jump in the proof.
CalculateTableSizeNew2() is a fairly obvious transformation from
CalculateTableSizeNew1(), and results in slightly better assembly on all
supported compilers.

Two benchmark runs with the same arguments as the original CL only
showed differences in completely disjoint tests, suggesting that the
differences are pure noise.
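
A minimal sketch of the idea, not the project's code: a Log2FloorNonZero() helper that uses the GCC / Clang intrinsic where available and a portable loop elsewhere, plus a table-size function that rounds up to a power of two via floor(log2(x - 1)). The helper names, the clamping bounds, and the exact form of the table-size function are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modeled on Bits::Log2FloorNonZero():
// returns floor(log2(n)) for n != 0.
inline int Log2FloorNonZero(uint32_t n) {
  assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
  // __builtin_clz(0) is undefined, hence the NonZero contract. For n != 0,
  // clz is in [0, 31], so 31 ^ clz equals 31 - clz.
  return 31 ^ __builtin_clz(n);
#else
  // Portable fallback for toolchains without the intrinsic (e.g. MSVC).
  int log = 0;
  while (n >>= 1) ++log;
  return log;
#endif
}

// Assumed bounds, for illustration only.
constexpr int kMinTableSize = 1 << 8;
constexpr int kMaxTableSize = 1 << 14;

// Smallest power of two >= input_size, clamped to [kMin, kMax].
// For x >= 2, 2 << floor(log2(x - 1)) == 2^ceil(log2(x)).
inline int CalculateTableSizeSketch(int input_size) {
  if (input_size > kMaxTableSize) return kMaxTableSize;
  if (input_size < kMinTableSize) return kMinTableSize;
  return 2 << Log2FloorNonZero(static_cast<uint32_t>(input_size - 1));
}
```

Because input_size is at least kMinTableSize when the last line runs, input_size - 1 is never zero, which keeps the NonZero precondition satisfied.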
2019-01-06 12:49:08 -08:00
costan 9a6fa91217 Remove use of std::uniform_int_distribution<uint8_t>.
A previous CL removed use of Google-specific random number generating
functionality, such as ACMRandom, and used the C++11 standard library
instead. The CL used std::uniform_int_distribution<uint8_t> to generate
random bytes, which seems to be unsupported by the standard [1, 2].

For better or for worse, our toolchain does not complain. However,
Visual Studio errors out with "invalid template argument for
uniform_int_distribution: N4659 29.6.1.1 [rand.req.genl]/1e requires one
of short, int, long, long long, unsigned short, unsigned int, unsigned
long, or unsigned long long".

This CL replaces std::uniform_int_distribution<uint8_t> with
std::uniform_int_distribution<int>(0, 255) and appropriate static_cast<>s.
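
A minimal sketch of that pattern, assuming a std::mt19937 engine and a hypothetical RandomBytes() test helper: draw ints in [0, 255] and narrow each one explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical test helper: returns n random bytes drawn from rng.
// std::uniform_int_distribution<uint8_t> is not a valid instantiation (only
// short, int, long, long long and their unsigned forms are permitted), so
// draw ints in [0, 255] and narrow them explicitly.
std::vector<uint8_t> RandomBytes(std::mt19937& rng, std::size_t n) {
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::vector<uint8_t> bytes(n);
  for (uint8_t& b : bytes) {
    int value = byte_dist(rng);
    assert(value >= 0 && value <= 255);  // so the cast below is lossless
    b = static_cast<uint8_t>(value);
  }
  return bytes;
}
```

Seeding two engines identically yields identical byte sequences, which keeps test inputs reproducible on a given standard library.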

[1] http://eel.is/c++draft/rand.req.genl#1.6
[2] be83c0b472/source/numerics.tex (L1807-L1817)
2019-01-06 12:48:39 -08:00
costan 3fcbc47f99 Use std random number generators in tests.
An earlier CL introduced absl::Uniform, which is not yet open sourced,
and therefore unavailable in the open source build.

This CL removes absl::Uniform and ACMRandom in favor of equivalent C++11
standard random generators. Abseil promises to be faster than the
standard library, but we can afford a speed hit in tests in return for
an easier open sourcing story.
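
A minimal sketch of how test-only randomness can be expressed with C++11 <random> alone. The helper names UniformBelow() and OneIn() are hypothetical stand-ins, not the actual replacements for ACMRandom or absl::Uniform.

```cpp
#include <cassert>
#include <random>

// Returns an integer uniformly distributed in [0, n). Requires n > 0.
int UniformBelow(std::minstd_rand& rng, int n) {
  assert(n > 0);
  std::uniform_int_distribution<int> dist(0, n - 1);
  return dist(rng);
}

// Returns true with probability 1/n. Requires n > 0.
bool OneIn(std::minstd_rand& rng, int n) {
  assert(n > 0);
  std::bernoulli_distribution dist(1.0 / n);
  return dist(rng);
}
```

Constructing a distribution per call is cheap for these types and keeps each helper stateless apart from the engine, at some cost in speed that tests can usually afford.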
2019-01-04 19:09:39 -08:00
costan 925c3094c4 Convert DCHECK to assert.
The open source build does not support DCHECK, and this project uses
assert() instead of DCHECK.
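
A minimal before/after sketch of the conversion, using a hypothetical helper. Like DCHECK in optimized builds, assert() compiles to nothing when NDEBUG is defined, so the check costs nothing in release builds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, used only to show the conversion.
// Internal-only form:  DCHECK(used <= capacity);
// Open-source form:    assert(used <= capacity);
inline std::size_t RemainingCapacity(std::size_t used, std::size_t capacity) {
  assert(used <= capacity);
  return capacity - used;
}
```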
2019-01-04 19:09:15 -08:00
costan 02de4ff1d1 Update Travis CI configuration.
The Travis CI configuration updates reflect the following changes:
* Container-based builds (sudo: false) have been removed.
  https://changelog.travis-ci.com/the-container-based-build-environment-is-fully-deprecated-84517
* Ubuntu Xenial (16.04) is available as a base image.
  https://blog.travis-ci.com/2018-11-08-xenial-release
* Homebrew now has a dedicated DSL.
  https://docs.travis-ci.com/user/installing-dependencies/#installing-packages-on-os-x

To take full advantage of VM resources, CI builds now use Ninja
https://ninja-build.org/ instead of Make.
2019-01-04 19:09:07 -08:00
atdt f7aece15e2 Add comment explaining MSan false-positive workaround 2019-01-04 19:09:01 -08:00
atdt 5913c5f8e4 Don't use _bzhi_u32 under MSan
MSan knows that x & 0xFF only uses the lower byte from x but it isn't as
smart about _bzhi_u32(val, 8). (I'll file an upstream bug.)
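
A minimal sketch of the workaround, assuming Clang's __has_feature(memory_sanitizer) check for detecting MSan builds; the SKETCH_* macro names are hypothetical. Under MSan the code falls back to the mask that MSan reasons about correctly.

```cpp
#include <cassert>
#include <cstdint>

// __has_feature is a Clang extension, so test that it exists first.
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define SKETCH_UNDER_MSAN 1
#endif
#endif
#ifndef SKETCH_UNDER_MSAN
#define SKETCH_UNDER_MSAN 0
#endif

// Use BZHI only when the target has BMI2 and MSan is not instrumenting us.
#if defined(__BMI2__) && !SKETCH_UNDER_MSAN
#define SKETCH_USE_BZHI 1
#include <immintrin.h>
#else
#define SKETCH_USE_BZHI 0
#endif

// Returns the low byte of val.
inline uint32_t LowByte(uint32_t val) {
#if SKETCH_USE_BZHI
  return _bzhi_u32(val, 8);  // clear every bit from index 8 upward
#else
  // MSan knows this mask reads only the low byte, so uninitialized upper
  // bytes in val do not produce a false positive here.
  return val & 0xFF;
#endif
}
```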
2019-01-04 19:08:53 -08:00
atdt 136b3ebc31 If BMI instructions are available, use BZHI to extract low bytes.
With --cpu=haswell, this results in a significant speed improvement
(notably 12-14% for html and pb). On k8, performance is not affected (as
expected). Full benchmark results for --cpu={k8,haswell} below.
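
A minimal sketch of extracting the low n bytes of a 32-bit value with BZHI and a portable mask fallback. BZHI belongs to the BMI2 extension, so this sketch keys off the __BMI2__ macro that GCC and Clang define under -mbmi2 (or -march=haswell); the helper name ExtractLowBytes() and its 0..4 byte range are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Hypothetical helper: returns the low n bytes of v, for 0 <= n <= 4.
inline uint32_t ExtractLowBytes(uint32_t v, int n) {
  assert(n >= 0 && n <= 4);
#if defined(__BMI2__)
  // BZHI zeroes every bit at or above the given index in one instruction;
  // an index of 32 leaves v unchanged.
  return _bzhi_u32(v, static_cast<unsigned>(8 * n));
#else
  // Mask-based fallback. The mask is 64 bits wide because shifting a
  // 32-bit value by 32 would be undefined behavior.
  uint64_t mask = 0xFFFFFFFFu;
  return static_cast<uint32_t>(v & ~(mask << (8 * n)));
#endif
}
```

On k8 the fallback path is compiled, which matches the commit's note that performance there is unaffected.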

Haswell
-------

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            55.2µs ± 0%             49.0µs ± 0%  -11.34%          (p=0.008 n=5+5)
BM_UFlat/1      [urls             ]             612µs ± 0%              604µs ± 0%   -1.21%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            6.11µs ± 2%             6.07µs ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]              134ns ± 0%              132ns ± 5%   -1.49%          (p=0.048 n=5+5)
BM_UFlat/4      [pdf              ]            8.41µs ± 2%             8.34µs ± 1%     ~             (p=0.222 n=5+5)
BM_UFlat/5      [html4            ]             239µs ± 0%              234µs ± 0%   -2.24%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]             211µs ± 0%              205µs ± 0%   -2.73%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]             185µs ± 0%              181µs ± 0%   -2.34%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             560µs ± 0%              545µs ± 0%   -2.55%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]             773µs ± 0%              753µs ± 0%   -2.61%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            51.6µs ± 0%             45.3µs ± 0%  -12.28%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             209µs ± 0%              204µs ± 0%   -2.28%          (p=0.008 n=5+5)
BM_UFlat/12     [cp               ]            17.3µs ± 0%             15.7µs ± 1%   -9.57%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]            8.08µs ± 0%             8.00µs ± 0%   -0.99%          (p=0.008 n=5+5)
BM_UFlat/14     [lsp              ]            2.48µs ± 0%             2.45µs ± 0%   -1.11%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]             967µs ± 0%              954µs ± 0%   -1.36%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              219ns ± 1%              218ns ± 1%     ~             (p=0.444 n=5+5)
BM_UFlat/17     [bin              ]             278µs ± 0%              275µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              100ns ± 0%               99ns ± 1%   -1.04%          (p=0.008 n=5+5)
BM_UFlat/19     [sum              ]            34.0µs ± 0%             30.9µs ± 0%   -9.10%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            3.21µs ± 0%             3.20µs ± 0%     ~             (p=0.063 n=5+5)
BM_UValidate/0  [html             ]            33.1µs ± 0%             33.6µs ± 0%   +1.69%          (p=0.008 n=5+5)
BM_UValidate/1  [urls             ]             436µs ± 0%              441µs ± 0%   +1.06%          (p=0.008 n=5+5)
BM_UValidate/2  [jpg              ]              141ns ± 0%              142ns ± 0%   +0.71%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]             94.3ns ± 0%             95.3ns ± 0%   +1.06%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]            2.87µs ± 0%             2.95µs ± 0%   +2.74%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]             126µs ± 0%              124µs ± 0%   -1.50%          (p=0.008 n=5+5)
BM_UIOVec/1     [urls             ]             1.13ms ± 0%             1.11ms ± 0%   -1.95%          (p=0.008 n=5+5)
BM_UIOVec/2     [jpg              ]            6.31µs ± 3%             7.44µs ± 3%  +17.75%          (p=0.008 n=5+5)
BM_UIOVec/3     [jpg_200          ]              332ns ± 1%              318ns ± 1%   -4.22%          (p=0.008 n=5+5)
BM_UIOVec/4     [pdf              ]            12.7µs ± 3%             12.6µs ± 9%     ~             (p=0.222 n=5+5)
BM_UFlatSink/0  [html             ]            55.2µs ± 0%             49.0µs ± 0%  -11.31%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]             612µs ± 0%              605µs ± 0%   -1.17%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]            6.29µs ±12%             6.57µs ± 9%     ~             (p=0.548 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              138ns ± 2%              134ns ± 0%   -2.76%          (p=0.000 n=5+4)
BM_UFlatSink/4  [pdf              ]            8.35µs ± 0%             8.34µs ± 1%     ~             (p=0.905 n=4+5)
BM_UFlatSink/5  [html4            ]             239µs ± 0%              234µs ± 0%   -2.33%          (p=0.008 n=5+5)
BM_UFlatSink/6  [txt1             ]             211µs ± 0%              205µs ± 0%   -2.82%          (p=0.008 n=5+5)
BM_UFlatSink/7  [txt2             ]             185µs ± 0%              181µs ± 0%   -2.18%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]             560µs ± 0%              545µs ± 0%   -2.57%          (p=0.008 n=5+5)
BM_UFlatSink/9  [txt4             ]             773µs ± 0%              754µs ± 0%   -2.54%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]            51.6µs ± 0%             45.3µs ± 0%  -12.19%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]             209µs ± 0%              204µs ± 0%   -2.39%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]            17.3µs ± 0%             15.6µs ± 0%   -9.98%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]            8.10µs ± 1%             7.98µs ± 0%   -1.53%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.49µs ± 1%             2.47µs ± 0%   -0.84%          (p=0.008 n=5+5)
BM_UFlatSink/15 [xls              ]             968µs ± 0%              953µs ± 0%   -1.48%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]              220ns ± 1%              220ns ± 0%     ~             (p=1.000 n=5+4)
BM_UFlatSink/17 [bin              ]             278µs ± 0%              275µs ± 0%   -0.99%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]              102ns ± 1%              103ns ± 0%   +1.18%          (p=0.048 n=5+5)
BM_UFlatSink/19 [sum              ]            34.0µs ± 0%             30.9µs ± 0%   -9.21%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]            3.22µs ± 1%             3.20µs ± 0%   -0.76%          (p=0.032 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             122µs ± 0%              122µs ± 0%     ~             (p=0.413 n=4+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.60ms ± 0%             1.60ms ± 0%   -0.06%          (p=0.032 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            10.5µs ± 2%             10.7µs ± 9%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              310ns ± 1%              309ns ± 3%     ~             (p=0.349 n=4+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            13.5µs ± 1%             13.6µs ± 2%     ~             (p=0.595 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             533µs ± 0%              532µs ± 0%   -0.08%          (p=0.032 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             529µs ± 0%              528µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             469µs ± 0%              469µs ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.40ms ± 0%             1.40ms ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.93ms ± 0%             1.92ms ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             106µs ± 0%              106µs ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             404µs ± 0%              404µs ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            43.2µs ± 0%             43.3µs ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            16.4µs ± 1%             16.4µs ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            4.96µs ± 0%             4.96µs ± 1%     ~             (p=0.651 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.54ms ± 0%             1.54ms ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              352ns ± 2%              351ns ± 1%     ~             (p=0.762 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             491µs ± 0%              491µs ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             75.6ns ± 1%             77.2ns ± 0%   +2.06%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            76.9µs ± 0%             76.7µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            6.87µs ± 1%             6.81µs ± 0%   -0.87%          (p=0.008 n=5+5)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           1.85GB/s ± 0%           2.09GB/s ± 0%  +12.83%          (p=0.016 n=4+5)
BM_UFlat/1      [urls             ]           1.15GB/s ± 0%           1.16GB/s ± 0%   +1.25%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]           20.1GB/s ± 2%           20.3GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.49GB/s ± 0%           1.53GB/s ± 0%   +2.83%          (p=0.016 n=5+4)
BM_UFlat/4      [pdf              ]           12.2GB/s ± 2%           12.3GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_UFlat/5      [html4            ]           1.71GB/s ± 0%           1.75GB/s ± 0%   +2.29%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]            722MB/s ± 0%            742MB/s ± 0%   +2.81%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]            676MB/s ± 0%            692MB/s ± 0%   +2.40%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]            762MB/s ± 0%            782MB/s ± 0%   +2.62%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]            623MB/s ± 0%            640MB/s ± 0%   +2.68%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]           2.30GB/s ± 0%           2.62GB/s ± 0%  +13.99%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]            883MB/s ± 0%            903MB/s ± 0%   +2.33%          (p=0.008 n=5+5)
BM_UFlat/12     [cp               ]           1.42GB/s ± 0%           1.57GB/s ± 1%  +10.57%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]           1.38GB/s ± 0%           1.39GB/s ± 0%   +1.00%          (p=0.008 n=5+5)
BM_UFlat/14     [lsp              ]           1.50GB/s ± 0%           1.52GB/s ± 0%   +1.12%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]           1.06GB/s ± 0%           1.08GB/s ± 0%   +1.34%          (p=0.016 n=5+4)
BM_UFlat/16     [xls_200          ]            913MB/s ± 1%            918MB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/17     [bin              ]           1.85GB/s ± 0%           1.86GB/s ± 0%   +0.92%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]           2.01GB/s ± 0%           2.03GB/s ± 1%   +1.10%          (p=0.008 n=5+5)
BM_UFlat/19     [sum              ]           1.13GB/s ± 0%           1.24GB/s ± 0%   +9.99%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]           1.32GB/s ± 0%           1.32GB/s ± 1%     ~             (p=0.063 n=5+5)
BM_UValidate/0  [html             ]           3.10GB/s ± 0%           3.04GB/s ± 0%   -1.66%          (p=0.008 n=5+5)
BM_UValidate/1  [urls             ]           1.61GB/s ± 0%           1.59GB/s ± 0%   -1.04%          (p=0.008 n=5+5)
BM_UValidate/2  [jpg              ]            875GB/s ± 0%            866GB/s ± 0%   -1.11%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.12GB/s ± 0%           2.10GB/s ± 0%   -1.01%          (p=0.016 n=5+4)
BM_UValidate/4  [pdf              ]           35.7GB/s ± 0%           34.7GB/s ± 0%   -2.66%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]            813MB/s ± 0%            825MB/s ± 0%   +1.52%          (p=0.008 n=5+5)
BM_UIOVec/1     [urls             ]            622MB/s ± 0%            634MB/s ± 0%   +1.99%          (p=0.008 n=5+5)
BM_UIOVec/2     [jpg              ]           19.5GB/s ± 3%           16.6GB/s ± 3%  -15.08%          (p=0.008 n=5+5)
BM_UIOVec/3     [jpg_200          ]            603MB/s ± 1%            630MB/s ± 1%   +4.42%          (p=0.008 n=5+5)
BM_UIOVec/4     [pdf              ]           8.05GB/s ± 3%           8.12GB/s ± 8%     ~             (p=0.222 n=5+5)
BM_UFlatSink/0  [html             ]           1.85GB/s ± 0%           2.09GB/s ± 0%  +12.76%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]           1.15GB/s ± 0%           1.16GB/s ± 0%   +1.18%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]           19.6GB/s ±11%           18.8GB/s ± 9%     ~             (p=0.548 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.45GB/s ± 1%           1.49GB/s ± 0%   +2.82%          (p=0.016 n=5+4)
BM_UFlatSink/4  [pdf              ]           12.3GB/s ± 0%           12.3GB/s ± 1%     ~             (p=0.905 n=4+5)
BM_UFlatSink/5  [html4            ]           1.71GB/s ± 0%           1.75GB/s ± 0%   +2.41%          (p=0.008 n=5+5)
BM_UFlatSink/6  [txt1             ]            722MB/s ± 0%            743MB/s ± 0%   +2.90%          (p=0.008 n=5+5)
BM_UFlatSink/7  [txt2             ]            676MB/s ± 0%            691MB/s ± 0%   +2.23%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]            763MB/s ± 0%            783MB/s ± 0%   +2.64%          (p=0.008 n=5+5)
BM_UFlatSink/9  [txt4             ]            623MB/s ± 0%            639MB/s ± 0%   +2.61%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]           2.30GB/s ± 0%           2.62GB/s ± 0%  +13.86%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]            882MB/s ± 0%            904MB/s ± 0%   +2.45%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]           1.42GB/s ± 0%           1.58GB/s ± 0%  +11.09%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]           1.38GB/s ± 1%           1.40GB/s ± 0%   +1.56%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.50GB/s ± 1%           1.51GB/s ± 1%   +0.85%          (p=0.008 n=5+5)
BM_UFlatSink/15 [xls              ]           1.06GB/s ± 0%           1.08GB/s ± 0%   +1.51%          (p=0.016 n=5+4)
BM_UFlatSink/16 [xls_200          ]            908MB/s ± 1%            911MB/s ± 0%     ~             (p=0.730 n=5+4)
BM_UFlatSink/17 [bin              ]           1.85GB/s ± 0%           1.86GB/s ± 0%   +1.01%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.96GB/s ± 1%           1.94GB/s ± 1%   -1.18%          (p=0.016 n=5+5)
BM_UFlatSink/19 [sum              ]           1.12GB/s ± 0%           1.24GB/s ± 0%  +10.16%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]           1.31GB/s ± 1%           1.32GB/s ± 0%   +0.77%          (p=0.048 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            839MB/s ± 0%            839MB/s ± 0%     ~             (p=0.413 n=4+5)
BM_ZFlat/1      [urls (47.78 %)   ]            439MB/s ± 0%            439MB/s ± 0%   +0.06%          (p=0.032 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           11.7GB/s ± 2%           11.5GB/s ± 9%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            645MB/s ± 1%            647MB/s ± 3%     ~             (p=0.413 n=4+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.57GB/s ± 1%           7.54GB/s ± 2%     ~             (p=0.595 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            769MB/s ± 0%            770MB/s ± 0%   +0.08%          (p=0.032 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            288MB/s ± 0%            288MB/s ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            267MB/s ± 0%            267MB/s ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            305MB/s ± 0%            305MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            250MB/s ± 0%            251MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.12GB/s ± 0%           1.12GB/s ± 0%     ~             (p=0.635 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            457MB/s ± 0%            457MB/s ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            570MB/s ± 0%            568MB/s ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            682MB/s ± 1%            681MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            750MB/s ± 0%            751MB/s ± 1%     ~             (p=0.690 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            668MB/s ± 0%            668MB/s ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            569MB/s ± 2%            570MB/s ± 1%     ~             (p=0.841 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.04GB/s ± 0%           1.04GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.64GB/s ± 1%           2.59GB/s ± 0%   -1.99%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            497MB/s ± 0%            498MB/s ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            615MB/s ± 1%            621MB/s ± 0%   +0.87%          (p=0.008 n=5+5)
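Each benchmark is reported both as time per operation and as throughput. For a fixed input size, throughput is inversely proportional to time, so the two deltas are linked. A worked check using the pb row of the Haswell tables:

```latex
\frac{\mathrm{speed}_{\mathrm{new}}}{\mathrm{speed}_{\mathrm{old}}}
  = \frac{t_{\mathrm{old}}}{t_{\mathrm{new}}}
  = \frac{1}{1 + \Delta t},
\qquad
\Delta t = -12.28\% \;\Rightarrow\; \frac{1}{1 - 0.1228} = \frac{1}{0.8772} \approx 1.1400
\;\Rightarrow\; \Delta\mathrm{speed} \approx +14.00\%
```

This agrees with the reported +13.99% to within rounding. The 12-14% figures quoted in the message for html and pb appear to be these throughput deltas, which is why they are larger in magnitude than the corresponding time deltas (-11.34% and -12.28%).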

K8
--

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.7µs ± 0%             41.7µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/1      [urls             ]             588µs ± 0%              588µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/2      [jpg              ]            7.11µs ± 1%             7.10µs ± 1%    ~             (p=0.556 n=5+4)
BM_UFlat/3      [jpg_200          ]              130ns ± 0%              130ns ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]            8.19µs ± 0%             8.26µs ± 2%    ~             (p=0.460 n=5+5)
BM_UFlat/5      [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/6      [txt1             ]             192µs ± 0%              191µs ± 0%    ~             (p=0.341 n=5+5)
BM_UFlat/7      [txt2             ]             170µs ± 0%              170µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]             509µs ± 0%              509µs ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/9      [txt4             ]             712µs ± 0%              712µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/10     [pb               ]            38.5µs ± 0%             38.5µs ± 0%    ~             (p=0.452 n=5+5)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]            14.2µs ± 1%             14.2µs ± 0%    ~             (p=0.889 n=5+5)
BM_UFlat/13     [c                ]            7.32µs ± 0%             7.33µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/14     [lsp              ]            2.26µs ± 0%             2.27µs ± 0%    ~             (p=0.222 n=4+5)
BM_UFlat/15     [xls              ]             954µs ± 0%              955µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlat/16     [xls_200          ]              215ns ± 4%              212ns ± 0%    ~             (p=0.095 n=5+4)
BM_UFlat/17     [bin              ]             276µs ± 0%              276µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/18     [bin_200          ]              104ns ±10%              103ns ± 3%    ~             (p=0.825 n=5+5)
BM_UFlat/19     [sum              ]            29.2µs ± 0%             29.2µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/20     [man              ]            2.96µs ± 0%             2.97µs ± 0%  +0.43%          (p=0.032 n=5+5)
BM_UValidate/0  [html             ]            33.4µs ± 0%             33.4µs ± 0%    ~             (p=0.151 n=5+5)
BM_UValidate/1  [urls             ]             441µs ± 0%              441µs ± 0%    ~             (p=0.548 n=5+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]             98.0ns ± 0%             98.0ns ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/4  [pdf              ]            2.89µs ± 0%             2.89µs ± 0%    ~             (p=0.794 n=5+5)
BM_UIOVec/0     [html             ]             121µs ± 0%              121µs ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~             (p=0.095 n=5+5)
BM_UIOVec/2     [jpg              ]            7.47µs ± 5%             7.31µs ± 2%    ~             (p=0.222 n=5+5)
BM_UIOVec/3     [jpg_200          ]              330ns ± 0%              330ns ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]            12.3µs ± 2%             12.0µs ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/0  [html             ]            41.6µs ± 0%             41.6µs ± 0%    ~             (p=0.095 n=5+5)
BM_UFlatSink/1  [urls             ]             589µs ± 0%              589µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.84µs ±26%             7.23µs ± 5%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              132ns ± 0%              132ns ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]            8.43µs ± 3%             8.27µs ± 2%    ~             (p=0.254 n=5+5)
BM_UFlatSink/5  [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=0.524 n=5+5)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              192µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]             170µs ± 0%              170µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/8  [txt3             ]             509µs ± 0%              509µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/9  [txt4             ]             712µs ± 0%              712µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/10 [pb               ]            38.5µs ± 0%             38.5µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.2µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/13 [c                ]            7.37µs ± 1%             7.36µs ± 1%    ~             (p=0.746 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.27µs ± 0%             2.27µs ± 1%    ~             (p=0.714 n=5+5)
BM_UFlatSink/15 [xls              ]             954µs ± 0%              954µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/16 [xls_200          ]              215ns ± 1%              215ns ± 1%    ~             (p=0.921 n=5+5)
BM_UFlatSink/17 [bin              ]             276µs ± 0%              276µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]              103ns ± 2%              104ns ± 1%    ~             (p=0.429 n=5+5)
BM_UFlatSink/19 [sum              ]            29.2µs ± 0%             29.2µs ± 0%    ~             (p=0.452 n=5+5)
BM_UFlatSink/20 [man              ]            2.96µs ± 0%             2.97µs ± 1%    ~             (p=0.484 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.6µs ± 4%             11.6µs ± 3%    ~             (p=1.000 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              368ns ± 1%              367ns ± 0%    ~             (p=0.159 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.7µs ± 1%             14.6µs ± 0%    ~             (p=0.190 n=5+4)
BM_ZFlat/5      [html4 (22.52 %)  ]             550µs ± 0%              550µs ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             479µs ± 0%              480µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              109µs ± 0%    ~             (p=0.730 n=5+4)
BM_ZFlat/11     [gaviota (37.72 %)]             412µs ± 0%              412µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.3µs ± 0%             46.3µs ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.7µs ± 0%             17.7µs ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.54µs ± 1%             5.55µs ± 0%    ~             (p=0.254 n=5+4)
BM_ZFlat/15     [xls (41.23 %)    ]             1.62ms ± 0%             1.63ms ± 0%    ~             (p=0.151 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              395ns ± 2%              394ns ± 1%    ~             (p=1.000 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             507µs ± 0%              507µs ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             89.6ns ± 5%             89.8ns ± 5%    ~             (p=1.000 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            79.9µs ± 0%             79.9µs ± 0%    ~             (p=0.690 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.67µs ± 0%             7.67µs ± 1%    ~             (p=0.548 n=5+5)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%    ~             (p=0.889 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.19GB/s ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 1%           17.3GB/s ± 1%    ~             (p=0.556 n=5+4)
BM_UFlat/3      [jpg_200          ]           1.54GB/s ± 0%           1.54GB/s ± 0%    ~             (p=0.833 n=5+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 0%           12.4GB/s ± 2%    ~             (p=0.421 n=5+5)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 0%    ~             (p=1.000 n=4+5)
BM_UFlat/6      [txt1             ]            794MB/s ± 0%            794MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/7      [txt2             ]            738MB/s ± 0%            738MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            838MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/9      [txt4             ]            677MB/s ± 0%            677MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.08GB/s ± 0%    ~             (p=0.452 n=5+5)
BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            975MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.73GB/s ± 0%    ~             (p=0.984 n=5+5)
BM_UFlat/13     [c                ]           1.52GB/s ± 0%           1.52GB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 0%    ~             (p=0.254 n=4+5)
BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.095 n=5+4)
BM_UFlat/16     [xls_200          ]            931MB/s ± 4%            941MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 0%    ~             (p=0.762 n=5+5)
BM_UFlat/18     [bin_200          ]           1.92GB/s ± 9%           1.95GB/s ± 3%    ~             (p=1.000 n=5+5)
BM_UFlat/19     [sum              ]           1.31GB/s ± 1%           1.31GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlat/20     [man              ]           1.43GB/s ± 0%           1.42GB/s ± 1%  -0.42%          (p=0.040 n=5+5)
BM_UValidate/0  [html             ]           3.06GB/s ± 0%           3.06GB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~             (p=0.357 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            845GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 0%           2.04GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/4  [pdf              ]           35.4GB/s ± 0%           35.4GB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UIOVec/0     [html             ]            845MB/s ± 0%            845MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/1     [urls             ]            650MB/s ± 0%            650MB/s ± 0%    ~             (p=0.087 n=5+5)
BM_UIOVec/2     [jpg              ]           16.5GB/s ± 5%           16.8GB/s ± 2%    ~             (p=0.222 n=5+5)
BM_UIOVec/3     [jpg_200          ]            605MB/s ± 0%            605MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_UIOVec/4     [pdf              ]           8.36GB/s ± 2%           8.54GB/s ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/0  [html             ]           2.46GB/s ± 0%           2.46GB/s ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/1  [urls             ]           1.19GB/s ± 0%           1.19GB/s ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]           16.0GB/s ±22%           17.0GB/s ± 5%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 0%           1.51GB/s ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.2GB/s ± 3%           12.4GB/s ± 2%    ~             (p=0.254 n=5+5)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 0%    ~             (p=0.532 n=5+5)
BM_UFlatSink/6  [txt1             ]            794MB/s ± 0%            794MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]            738MB/s ± 0%            738MB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/8  [txt3             ]            838MB/s ± 0%            838MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/9  [txt4             ]            676MB/s ± 0%            676MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/10 [pb               ]           3.08GB/s ± 0%           3.08GB/s ± 0%    ~             (p=0.365 n=5+5)
BM_UFlatSink/11 [gaviota          ]            975MB/s ± 0%            975MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/12 [cp               ]           1.73GB/s ± 0%           1.74GB/s ± 0%    ~             (p=0.286 n=5+5)
BM_UFlatSink/13 [c                ]           1.51GB/s ± 1%           1.52GB/s ± 1%    ~             (p=0.683 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 0%    ~             (p=0.444 n=5+5)
BM_UFlatSink/15 [xls              ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.333 n=4+5)
BM_UFlatSink/16 [xls_200          ]            930MB/s ± 1%            930MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlatSink/17 [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.93GB/s ± 2%           1.93GB/s ± 1%    ~             (p=0.651 n=5+5)
BM_UFlatSink/19 [sum              ]           1.31GB/s ± 0%           1.31GB/s ± 0%    ~             (p=0.508 n=5+5)
BM_UFlatSink/20 [man              ]           1.43GB/s ± 0%           1.42GB/s ± 1%    ~             (p=0.524 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 0%            815MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            420MB/s ± 0%            420MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.6GB/s ± 4%           10.6GB/s ± 3%    ~             (p=1.000 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            543MB/s ± 1%            546MB/s ± 0%    ~             (p=0.095 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.96GB/s ± 1%           7.01GB/s ± 0%    ~             (p=0.190 n=5+4)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            745MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            282MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            261MB/s ± 0%            261MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 0%            297MB/s ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            244MB/s ± 0%            244MB/s ± 0%    ~             (p=0.389 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.238 n=5+4)
BM_ZFlat/11     [gaviota (37.72 %)]            448MB/s ± 0%            447MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            532MB/s ± 0%            531MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            632MB/s ± 0%            631MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            672MB/s ± 1%            671MB/s ± 0%    ~             (p=0.286 n=5+4)
BM_ZFlat/15     [xls (41.23 %)    ]            634MB/s ± 0%            633MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            507MB/s ± 2%            508MB/s ± 1%    ~             (p=1.000 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.01GB/s ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.24GB/s ± 5%           2.23GB/s ± 5%    ~             (p=0.889 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            479MB/s ± 0%            479MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            551MB/s ± 0%            551MB/s ± 1%    ~             (p=0.548 n=5+5)
2019-01-04 19:08:39 -08:00
nafi eb47f79631 Optimize by about 0.5%.
How? Move the boolean arguments of EmitLiteral, EmitCopyAtMost64 and
EmitCopy to template arguments, so that the compiler generates two
separate, pruned versions of each function: one for arg=true and one for
arg=false. For context, CompressFragment calls (1) EmitLiteral from
inside a single loop and (2) EmitCopy from inside a two-level nested
loop. CompressFragment itself is called from inside another while-loop
in the public 'Compress' function.
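A minimal C++ sketch of the bool-to-template-argument technique described above. It assumes a toy literal format (one length byte followed by the raw bytes) rather than snappy's real wire format; `ToyEmitLiteral`, `ToyEmitLiterals` and the 16-byte fast path are hypothetical illustrations, not snappy's code. Because `allow_fast_path` is a template argument, each instantiation sees it as a compile-time constant, so the optimizer can drop the untaken branch instead of testing the flag on every call inside the loop.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical toy encoder, NOT snappy's real literal format:
// one length byte followed by the literal bytes.
//
// The bool is a template argument, so the compiler emits two separate
// functions, ToyEmitLiteral<true> and ToyEmitLiteral<false>. In each one
// `allow_fast_path` is a constant, and the optimizer can fold the `if`.
template <bool allow_fast_path>
inline char* ToyEmitLiteral(char* op, const char* literal, int len) {
  *op++ = static_cast<char>(len);  // toy length tag
  if (allow_fast_path && len <= 16) {
    // Fast path: one fixed-size 16-byte copy. Assumes the caller guarantees
    // at least 16 readable bytes at `literal` and 16 writable bytes at `op`;
    // bytes past `len` are scratch and get overwritten by later writes.
    std::memcpy(op, literal, 16);
    return op + len;
  }
  // General path: copy exactly `len` bytes.
  std::memcpy(op, literal, len);
  return op + len;
}

// Hypothetical caller standing in for the loop in CompressFragment: the
// instantiation is picked at compile time, so no runtime test of the flag
// remains inside the loop body.
inline char* ToyEmitLiterals(char* op, const char* src, const int* lens,
                             int n) {
  for (int i = 0; i < n; ++i) {
    op = ToyEmitLiteral</*allow_fast_path=*/true>(op, src, lens[i]);
    src += lens[i];
  }
  return op;
}
```

With C++17, `if constexpr (allow_fast_path)` makes the pruning explicit; a plain `if` on a template constant, as above, is normally folded away by an optimizing compiler as well. Both instantiations produce identical output; they differ only in how the bytes are copied.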

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.9µs ± 0%             41.1µs ± 0%  -1.92%        (p=0.000 n=10+10)
BM_UFlat/1      [urls             ]             576µs ± 0%              572µs ± 0%  -0.68%        (p=0.000 n=10+10)
BM_UFlat/2      [jpg              ]            7.25µs ± 6%             7.13µs ± 1%    ~             (p=0.074 n=9+8)
BM_UFlat/3      [jpg_200          ]              132ns ± 1%              130ns ± 0%  -1.45%         (p=0.000 n=10+8)
BM_UFlat/4      [pdf              ]            8.27µs ± 3%             8.22µs ± 0%    ~             (p=0.277 n=9+8)
BM_UFlat/5      [html4            ]             220µs ± 0%              219µs ± 0%  -0.75%        (p=0.000 n=10+10)
BM_UFlat/6      [txt1             ]             192µs ± 0%              190µs ± 0%  -0.80%        (p=0.000 n=10+10)
BM_UFlat/7      [txt2             ]             169µs ± 0%              168µs ± 0%  -0.69%        (p=0.000 n=10+10)
BM_UFlat/8      [txt3             ]             510µs ± 0%              508µs ± 0%  -0.42%        (p=0.000 n=10+10)
BM_UFlat/9      [txt4             ]             707µs ± 0%              702µs ± 0%  -0.67%        (p=0.000 n=10+10)
BM_UFlat/10     [pb               ]            38.5µs ± 0%             37.4µs ± 1%  -2.84%        (p=0.000 n=10+10)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              190µs ± 0%  +0.55%        (p=0.000 n=10+10)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.1µs ± 0%  -0.44%        (p=0.000 n=10+10)
BM_UFlat/13     [c                ]            7.31µs ± 1%             7.35µs ± 0%  +0.54%        (p=0.002 n=10+10)
BM_UFlat/14     [lsp              ]            2.27µs ± 0%             2.27µs ± 1%    ~             (p=0.161 n=9+9)
BM_UFlat/15     [xls              ]             905µs ± 0%              903µs ± 0%  -0.25%        (p=0.000 n=10+10)
BM_UFlat/16     [xls_200          ]              214ns ± 1%              213ns ± 1%  -0.57%        (p=0.043 n=10+10)
BM_UFlat/17     [bin              ]             275µs ± 0%              274µs ± 0%  -0.31%        (p=0.000 n=10+10)
BM_UFlat/18     [bin_200          ]              102ns ± 5%              101ns ± 3%    ~             (p=0.161 n=9+9)
BM_UFlat/19     [sum              ]            27.9µs ± 0%             27.2µs ± 0%  -2.68%        (p=0.000 n=10+10)
BM_UFlat/20     [man              ]            2.97µs ± 1%             2.97µs ± 0%    ~            (p=0.400 n=9+10)
BM_UValidate/0  [html             ]            33.3µs ± 0%             33.7µs ± 0%  +1.18%        (p=0.000 n=10+10)
BM_UValidate/1  [urls             ]             442µs ± 0%              442µs ± 0%    ~           (p=0.353 n=10+10)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~           (p=0.063 n=10+10)
BM_UValidate/3  [jpg_200          ]             98.4ns ± 0%             98.5ns ± 0%    ~           (p=0.184 n=10+10)
BM_UValidate/4  [pdf              ]            2.88µs ± 0%             2.90µs ± 1%  +0.68%        (p=0.000 n=10+10)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%  -0.39%        (p=0.000 n=10+10)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~           (p=0.529 n=10+10)
BM_UIOVec/2     [jpg              ]            7.71µs ±11%             7.76µs ± 9%    ~           (p=0.853 n=10+10)
BM_UIOVec/3     [jpg_200          ]              327ns ± 0%              328ns ± 0%    ~            (p=0.146 n=8+10)
BM_UIOVec/4     [pdf              ]            12.1µs ± 1%             12.1µs ± 3%    ~           (p=0.315 n=10+10)
BM_UFlatSink/0  [html             ]            41.8µs ± 0%             41.0µs ± 0%  -1.87%         (p=0.000 n=10+9)
BM_UFlatSink/1  [urls             ]             576µs ± 0%              572µs ± 0%  -0.74%         (p=0.000 n=9+10)
BM_UFlatSink/2  [jpg              ]            7.58µs ± 8%             7.56µs ± 9%    ~           (p=0.739 n=10+10)
BM_UFlatSink/3  [jpg_200          ]              133ns ± 0%              134ns ± 0%  +0.60%         (p=0.000 n=10+9)
BM_UFlatSink/4  [pdf              ]            8.44µs ± 3%             8.30µs ± 1%  -1.65%        (p=0.029 n=10+10)
BM_UFlatSink/5  [html4            ]             220µs ± 0%              218µs ± 0%  -0.81%        (p=0.000 n=10+10)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              190µs ± 0%  -0.78%        (p=0.000 n=10+10)
BM_UFlatSink/7  [txt2             ]             169µs ± 0%              168µs ± 0%  -0.59%        (p=0.000 n=10+10)
BM_UFlatSink/8  [txt3             ]             510µs ± 0%              508µs ± 0%  -0.39%        (p=0.000 n=10+10)
BM_UFlatSink/9  [txt4             ]             707µs ± 0%              703µs ± 0%  -0.62%        (p=0.000 n=10+10)
BM_UFlatSink/10 [pb               ]            38.4µs ± 0%             37.4µs ± 0%  -2.62%          (p=0.000 n=9+9)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              190µs ± 0%  +0.63%        (p=0.000 n=10+10)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.1µs ± 0%  -0.27%        (p=0.011 n=10+10)
BM_UFlatSink/13 [c                ]            7.33µs ± 1%             7.35µs ± 1%    ~            (p=0.243 n=10+9)
BM_UFlatSink/14 [lsp              ]            2.27µs ± 0%             2.26µs ± 0%  -0.39%          (p=0.000 n=9+9)
BM_UFlatSink/15 [xls              ]             904µs ± 0%              902µs ± 0%  -0.28%        (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200          ]              216ns ± 1%              217ns ± 1%    ~            (p=0.661 n=10+9)
BM_UFlatSink/17 [bin              ]             275µs ± 0%              274µs ± 0%  -0.24%          (p=0.000 n=8+9)
BM_UFlatSink/18 [bin_200          ]              104ns ± 2%              104ns ± 1%  -0.70%         (p=0.043 n=9+10)
BM_UFlatSink/19 [sum              ]            27.8µs ± 0%             27.1µs ± 0%  -2.51%         (p=0.000 n=9+10)
BM_UFlatSink/20 [man              ]            3.02µs ± 1%             3.00µs ± 1%    ~            (p=0.079 n=10+9)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%  -0.24%        (p=0.000 n=10+10)
BM_ZFlat/1      [urls (47.78 %)   ]             1.68ms ± 0%             1.67ms ± 0%  -1.06%        (p=0.000 n=10+10)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.8µs ± 5%             11.6µs ± 5%    ~           (p=0.165 n=10+10)
BM_ZFlat/3      [jpg_200 (73.00 %)]              360ns ± 3%              358ns ± 1%    ~            (p=0.762 n=10+8)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.8µs ± 2%             14.6µs ± 1%  -1.57%         (p=0.022 n=10+9)
BM_ZFlat/5      [html4 (22.52 %)  ]             556µs ± 0%              552µs ± 0%  -0.87%        (p=0.000 n=10+10)
BM_ZFlat/6      [txt1 (57.88 %)   ]             542µs ± 0%              540µs ± 0%  -0.47%        (p=0.000 n=10+10)
BM_ZFlat/7      [txt2 (61.91 %)   ]             483µs ± 0%              480µs ± 0%  -0.62%        (p=0.000 n=10+10)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.45ms ± 0%             1.44ms ± 0%  -0.47%        (p=0.000 n=10+10)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.98ms ± 0%             1.97ms ± 0%  -0.19%        (p=0.007 n=10+10)
BM_ZFlat/10     [pb (19.68 %)     ]             111µs ± 0%              109µs ± 0%  -1.75%        (p=0.000 n=10+10)
BM_ZFlat/11     [gaviota (37.72 %)]             411µs ± 0%              410µs ± 0%  -0.21%        (p=0.004 n=10+10)
BM_ZFlat/12     [cp (48.12 %)     ]            45.9µs ± 0%             45.5µs ± 0%  -0.76%        (p=0.000 n=10+10)
BM_ZFlat/13     [c (42.47 %)      ]            17.6µs ± 0%             17.5µs ± 0%  -0.80%        (p=0.000 n=10+10)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.50µs ± 0%             5.44µs ± 0%  -1.19%         (p=0.000 n=9+10)
BM_ZFlat/15     [xls (41.23 %)    ]             1.63ms ± 0%             1.61ms ± 0%  -1.21%        (p=0.000 n=10+10)
BM_ZFlat/16     [xls_200 (78.00 %)]              389ns ± 2%              391ns ± 1%    ~            (p=0.182 n=10+9)
BM_ZFlat/17     [bin (18.11 %)    ]             509µs ± 0%              506µs ± 0%  -0.51%        (p=0.000 n=10+10)
BM_ZFlat/18     [bin_200 (7.50 %) ]             92.7ns ± 0%             89.4ns ± 1%  -3.55%          (p=0.000 n=8+8)
BM_ZFlat/19     [sum (48.96 %)    ]            80.2µs ± 0%             78.9µs ± 0%  -1.65%        (p=0.000 n=10+10)
BM_ZFlat/20     [man (59.21 %)    ]            7.59µs ± 1%             7.59µs ± 1%    ~           (p=0.912 n=10+10)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.50GB/s ± 0%  +1.96%        (p=0.000 n=10+10)
BM_UFlat/1      [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.69%        (p=0.000 n=10+10)
BM_UFlat/2      [jpg              ]           17.0GB/s ± 5%           17.3GB/s ± 1%    ~             (p=0.074 n=9+8)
BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 1%           1.54GB/s ± 0%  +1.44%         (p=0.000 n=10+8)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%    ~             (p=0.721 n=8+8)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.76%        (p=0.000 n=10+10)
BM_UFlat/6      [txt1             ]            795MB/s ± 0%            801MB/s ± 0%  +0.79%        (p=0.000 n=10+10)
BM_UFlat/7      [txt2             ]            741MB/s ± 0%            746MB/s ± 0%  +0.68%        (p=0.000 n=10+10)
BM_UFlat/8      [txt3             ]            840MB/s ± 0%            844MB/s ± 0%  +0.44%        (p=0.000 n=10+10)
BM_UFlat/9      [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.65%         (p=0.000 n=9+10)
BM_UFlat/10     [pb               ]           3.09GB/s ± 0%           3.18GB/s ± 0%  +2.88%         (p=0.000 n=10+9)
BM_UFlat/11     [gaviota          ]            980MB/s ± 0%            975MB/s ± 0%  -0.57%        (p=0.000 n=10+10)
BM_UFlat/12     [cp               ]           1.74GB/s ± 0%           1.75GB/s ± 0%  +0.38%         (p=0.001 n=10+9)
BM_UFlat/13     [c                ]           1.53GB/s ± 1%           1.52GB/s ± 0%  -0.55%        (p=0.003 n=10+10)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 1%    ~            (p=0.400 n=9+10)
BM_UFlat/15     [xls              ]           1.14GB/s ± 0%           1.14GB/s ± 0%  +0.23%        (p=0.000 n=10+10)
BM_UFlat/16     [xls_200          ]            936MB/s ± 1%            941MB/s ± 1%    ~           (p=0.052 n=10+10)
BM_UFlat/17     [bin              ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.28%        (p=0.000 n=10+10)
BM_UFlat/18     [bin_200          ]           1.97GB/s ± 5%           1.99GB/s ± 3%    ~             (p=0.136 n=9+9)
BM_UFlat/19     [sum              ]           1.37GB/s ± 0%           1.41GB/s ± 0%  +2.82%         (p=0.000 n=10+9)
BM_UFlat/20     [man              ]           1.42GB/s ± 1%           1.42GB/s ± 0%    ~           (p=0.579 n=10+10)
BM_UValidate/0  [html             ]           3.08GB/s ± 0%           3.05GB/s ± 0%  -1.18%        (p=0.000 n=10+10)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~           (p=0.247 n=10+10)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            846GB/s ± 0%  +0.09%        (p=0.000 n=10+10)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 0%           2.04GB/s ± 0%  -0.09%        (p=0.019 n=10+10)
BM_UValidate/4  [pdf              ]           35.7GB/s ± 0%           35.4GB/s ± 1%  -0.70%        (p=0.000 n=10+10)
BM_UIOVec/0     [html             ]            841MB/s ± 0%            844MB/s ± 0%  +0.36%        (p=0.000 n=10+10)
BM_UIOVec/1     [urls             ]            650MB/s ± 0%            650MB/s ± 0%    ~           (p=0.105 n=10+10)
BM_UIOVec/2     [jpg              ]           16.1GB/s ±10%           15.9GB/s ± 8%    ~           (p=0.853 n=10+10)
BM_UIOVec/3     [jpg_200          ]            612MB/s ± 1%            612MB/s ± 0%    ~            (p=0.243 n=9+10)
BM_UIOVec/4     [pdf              ]           8.52GB/s ± 2%           8.46GB/s ± 3%    ~           (p=0.436 n=10+10)
BM_UFlatSink/0  [html             ]           2.46GB/s ± 0%           2.50GB/s ± 0%  +1.83%         (p=0.000 n=9+10)
BM_UFlatSink/1  [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.73%        (p=0.000 n=10+10)
BM_UFlatSink/2  [jpg              ]           16.3GB/s ± 8%           16.4GB/s ± 9%    ~           (p=0.739 n=10+10)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 0%           1.50GB/s ± 0%  -0.62%         (p=0.000 n=10+9)
BM_UFlatSink/4  [pdf              ]           12.2GB/s ± 3%           12.4GB/s ± 1%  +1.62%        (p=0.029 n=10+10)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.79%        (p=0.000 n=10+10)
BM_UFlatSink/6  [txt1             ]            795MB/s ± 0%            801MB/s ± 0%  +0.74%         (p=0.000 n=10+9)
BM_UFlatSink/7  [txt2             ]            741MB/s ± 0%            745MB/s ± 0%  +0.59%         (p=0.000 n=10+9)
BM_UFlatSink/8  [txt3             ]            840MB/s ± 0%            843MB/s ± 0%  +0.37%         (p=0.000 n=9+10)
BM_UFlatSink/9  [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.57%         (p=0.000 n=9+10)
BM_UFlatSink/10 [pb               ]           3.10GB/s ± 0%           3.18GB/s ± 0%  +2.64%         (p=0.000 n=9+10)
BM_UFlatSink/11 [gaviota          ]            980MB/s ± 0%            974MB/s ± 0%  -0.64%        (p=0.000 n=10+10)
BM_UFlatSink/12 [cp               ]           1.74GB/s ± 0%           1.75GB/s ± 0%  +0.26%        (p=0.005 n=10+10)
BM_UFlatSink/13 [c                ]           1.52GB/s ± 1%           1.52GB/s ± 1%    ~           (p=0.123 n=10+10)
BM_UFlatSink/14 [lsp              ]           1.64GB/s ± 0%           1.65GB/s ± 0%  +0.46%         (p=0.000 n=10+8)
BM_UFlatSink/15 [xls              ]           1.14GB/s ± 0%           1.15GB/s ± 0%  +0.27%        (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200          ]            927MB/s ± 1%            926MB/s ± 1%    ~            (p=0.497 n=10+9)
BM_UFlatSink/17 [bin              ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.27%        (p=0.000 n=10+10)
BM_UFlatSink/18 [bin_200          ]           1.92GB/s ± 2%           1.93GB/s ± 1%  +0.70%         (p=0.035 n=9+10)
BM_UFlatSink/19 [sum              ]           1.38GB/s ± 0%           1.41GB/s ± 0%  +2.59%         (p=0.000 n=9+10)
BM_UFlatSink/20 [man              ]           1.40GB/s ± 1%           1.41GB/s ± 1%    ~            (p=0.079 n=10+9)
BM_ZFlat/0      [html (22.31 %)   ]            814MB/s ± 0%            816MB/s ± 0%  +0.23%        (p=0.000 n=10+10)
BM_ZFlat/1      [urls (47.78 %)   ]            418MB/s ± 0%            423MB/s ± 0%  +1.06%        (p=0.000 n=10+10)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.5GB/s ± 5%           10.7GB/s ± 5%    ~           (p=0.165 n=10+10)
BM_ZFlat/3      [jpg_200 (73.00 %)]            558MB/s ± 3%            560MB/s ± 1%    ~            (p=0.696 n=10+8)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.94GB/s ± 2%           7.05GB/s ± 1%  +1.59%         (p=0.028 n=10+9)
BM_ZFlat/5      [html4 (22.52 %)  ]            739MB/s ± 0%            745MB/s ± 0%  +0.86%        (p=0.000 n=10+10)
BM_ZFlat/6      [txt1 (57.88 %)   ]            281MB/s ± 0%            283MB/s ± 0%  +0.46%        (p=0.000 n=10+10)
BM_ZFlat/7      [txt2 (61.91 %)   ]            260MB/s ± 0%            261MB/s ± 0%  +0.59%        (p=0.000 n=10+10)
BM_ZFlat/8      [txt3 (54.99 %)   ]            296MB/s ± 0%            297MB/s ± 0%  +0.45%        (p=0.000 n=10+10)
BM_ZFlat/9      [txt4 (66.26 %)   ]            244MB/s ± 0%            245MB/s ± 0%  +0.16%        (p=0.000 n=10+10)
BM_ZFlat/10     [pb (19.68 %)     ]           1.07GB/s ± 0%           1.09GB/s ± 0%  +1.75%        (p=0.000 n=10+10)
BM_ZFlat/11     [gaviota (37.72 %)]            450MB/s ± 0%            451MB/s ± 0%  +0.17%         (p=0.000 n=9+10)
BM_ZFlat/12     [cp (48.12 %)     ]            538MB/s ± 0%            542MB/s ± 0%  +0.74%        (p=0.000 n=10+10)
BM_ZFlat/13     [c (42.47 %)      ]            635MB/s ± 0%            640MB/s ± 0%  +0.80%        (p=0.000 n=10+10)
BM_ZFlat/14     [lsp (48.37 %)    ]            678MB/s ± 0%            686MB/s ± 1%  +1.18%         (p=0.000 n=9+10)
BM_ZFlat/15     [xls (41.23 %)    ]            633MB/s ± 0%            641MB/s ± 0%  +1.23%         (p=0.000 n=10+7)
BM_ZFlat/16     [xls_200 (78.00 %)]            516MB/s ± 2%            513MB/s ± 1%    ~            (p=0.156 n=10+9)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.02GB/s ± 0%  +0.49%        (p=0.000 n=10+10)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.16GB/s ± 0%           2.24GB/s ± 1%  +3.65%          (p=0.000 n=8+8)
BM_ZFlat/19     [sum (48.96 %)    ]            478MB/s ± 0%            486MB/s ± 0%  +1.66%        (p=0.000 n=10+10)
BM_ZFlat/20     [man (59.21 %)    ]            558MB/s ± 1%            558MB/s ± 1%    ~           (p=0.912 n=10+10)
2019-01-04 19:08:30 -08:00
jueminyang 254966c71e Migrate to use absl::random 2019-01-04 19:08:11 -08:00
alkis 53a38e5e33 Reduce number of allocations when compressing and simplify the code.
Previously we allocated at least once: twice with a large table, and three
times when we used a scratch buffer. With this approach we always
allocate exactly once.
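Here is a minimal sketch of the single-allocation idea. It assumes the compressor needs a hash table of uint16_t entries plus an input and an output scratch buffer. ScratchArena and the sizes used with it are hypothetical and are not snappy's actual working-memory type. The point is that one block is allocated up front and then carved into pieces, so the number of allocations no longer depends on which buffers are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch (not snappy's actual class): one heap block is carved
// into a hash table and two scratch buffers, so exactly one allocation is
// made no matter which buffers the compressor ends up using.
class ScratchArena {
 public:
  ScratchArena(std::size_t table_entries, std::size_t input_bytes,
               std::size_t output_bytes)
      : table_bytes_(table_entries * sizeof(std::uint16_t)),
        size_(table_bytes_ + input_bytes + output_bytes),
        mem_(new char[size_]) {
    // The table goes first. Storage from new char[] is aligned for
    // fundamental types, so uint16_t entries at offset 0 are aligned.
    table_ = reinterpret_cast<std::uint16_t*>(mem_.get());
    input_ = mem_.get() + table_bytes_;
    output_ = input_ + input_bytes;
  }

  std::uint16_t* table() const { return table_; }
  char* input() const { return input_; }
  char* output() const { return output_; }
  std::size_t size() const { return size_; }

 private:
  std::size_t table_bytes_;
  std::size_t size_;
  std::unique_ptr<char[]> mem_;  // The one and only allocation.
  std::uint16_t* table_;
  char* input_;
  char* output_;
};
```

Putting the table at offset 0 keeps its entries aligned without padding. A layout mixing more element types would need to round each offset up to that type's alignment.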

  name                                          old speed               new speed               delta
  BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%   -0.13%        (p=0.000 n=11+11)
  BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.22GB/s ± 0%   +2.48%        (p=0.000 n=11+11)
  BM_UFlat/2      [jpg              ]           17.2GB/s ± 2%           17.3GB/s ± 1%     ~           (p=0.193 n=11+11)
  BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 0%           1.51GB/s ± 0%   -0.78%         (p=0.000 n=10+9)
  BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%     ~             (p=0.881 n=9+9)
  BM_UFlat/5      [html4            ]           1.86GB/s ± 0%           1.86GB/s ± 0%     ~           (p=0.123 n=11+11)
  BM_UFlat/6      [txt1             ]            793MB/s ± 0%            799MB/s ± 0%   +0.78%         (p=0.000 n=11+9)
  BM_UFlat/7      [txt2             ]            739MB/s ± 0%            744MB/s ± 0%   +0.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3             ]            839MB/s ± 0%            845MB/s ± 0%   +0.71%        (p=0.000 n=11+11)
  BM_UFlat/9      [txt4             ]            678MB/s ± 0%            685MB/s ± 0%   +1.01%        (p=0.000 n=11+11)
  BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.12GB/s ± 0%   +1.21%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            976MB/s ± 0%   +0.11%        (p=0.000 n=11+11)
  BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%   +0.46%        (p=0.010 n=11+11)
  BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 0%     ~           (p=0.987 n=11+10)
  BM_UFlat/14     [lsp              ]           1.65GB/s ± 0%           1.63GB/s ± 1%   -1.04%        (p=0.000 n=11+11)
  BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.15GB/s ± 0%   +6.12%        (p=0.000 n=10+11)
  BM_UFlat/16     [xls_200          ]            944MB/s ± 0%            920MB/s ± 3%   -2.51%         (p=0.000 n=9+11)
  BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.87GB/s ± 0%   +0.68%        (p=0.000 n=10+11)
  BM_UFlat/18     [bin_200          ]           1.91GB/s ± 3%           1.92GB/s ± 5%     ~           (p=0.356 n=11+11)
  BM_UFlat/19     [sum              ]           1.31GB/s ± 0%           1.40GB/s ± 0%   +6.53%        (p=0.000 n=11+11)
  BM_UFlat/20     [man              ]           1.42GB/s ± 0%           1.42GB/s ± 0%   +0.33%        (p=0.000 n=10+10)
2019-01-04 19:07:49 -08:00
ckennelly df5548c0b3 Use sized deallocation when releasing Zippy's scratch buffers.
name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.7µs ± 0%             41.7µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlat/1      [urls             ]             587µs ± 0%              574µs ± 0%  -2.31%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            7.24µs ± 2%             7.25µs ± 2%    ~             (p=0.690 n=5+5)
BM_UFlat/3      [jpg_200          ]              130ns ± 0%              131ns ± 1%    ~             (p=0.556 n=4+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.24µs ± 1%    ~             (p=0.278 n=5+5)
BM_UFlat/5      [html4            ]             219µs ± 0%              220µs ± 0%  +0.45%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]             192µs ± 0%              190µs ± 0%  -0.86%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]             169µs ± 0%              168µs ± 0%  -0.54%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             509µs ± 0%              505µs ± 0%  -0.66%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]             710µs ± 0%              702µs ± 0%  -1.14%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            38.2µs ± 0%             37.9µs ± 0%  -0.82%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=0.746 n=5+5)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.2µs ± 1%    ~             (p=0.421 n=5+5)
BM_UFlat/13     [c                ]            7.29µs ± 0%             7.34µs ± 1%  +0.69%          (p=0.016 n=5+5)
BM_UFlat/14     [lsp              ]            2.27µs ± 0%             2.28µs ± 0%  +0.34%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]             954µs ± 0%              900µs ± 0%  -5.67%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 1%              217ns ± 2%    ~             (p=0.056 n=5+5)
BM_UFlat/17     [bin              ]             276µs ± 0%              274µs ± 0%  -0.94%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 1%              101ns ± 1%    ~             (p=0.524 n=5+5)
BM_UFlat/19     [sum              ]            29.3µs ± 0%             27.3µs ± 0%  -6.98%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            2.95µs ± 0%             2.95µs ± 0%    ~             (p=0.651 n=5+5)

For microbenchmarks, the overhead of allocating/deallocating should be
small (the relevant metadata for TCMalloc's PageMap will be in cache),
but this helps demonstrate that the refactoring does not adversely
impact performance.
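The following sketch shows what sized deallocation looks like in C++14. It assumes the scratch buffer is obtained from ::operator new. AllocateScratch and ReleaseScratch are hypothetical helpers, not snappy's code. The sized call is guarded by the feature-test macro __cpp_sized_deallocation because some compilers enable sized deallocation only with a flag such as -fsized-deallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helpers illustrating sized deallocation; the names are not
// taken from snappy.
inline char* AllocateScratch(std::size_t n) {
  return static_cast<char*>(::operator new(n));
}

inline void ReleaseScratch(char* p, std::size_t n) {
#if defined(__cpp_sized_deallocation)
  // C++14 sized deallocation: the caller passes back the size it allocated,
  // so an allocator such as TCMalloc need not look it up in its metadata.
  ::operator delete(p, n);
#else
  // Fallback when the compiler does not enable sized deallocation.
  (void)n;
  ::operator delete(p);
#endif
}
```

The size passed to ReleaseScratch must be the size originally requested. Passing any other value is undefined behavior.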
2019-01-04 19:07:40 -08:00
alkis 1b7466e143 Compute the wordmask instead of looking it up in a table.
Tested:
  name                                old speed               new speed               delta
  BM_UFlat/0      [html   ]           2.13GB/s ± 0%           2.46GB/s ± 0%  +15.70%         (p=0.000 n=10+8)
  BM_UFlat/1      [urls   ]           1.21GB/s ± 0%           1.20GB/s ± 0%   -1.49%         (p=0.000 n=9+10)
  BM_UFlat/2      [jpg    ]           17.1GB/s ± 1%           17.2GB/s ± 1%     ~           (p=0.120 n=11+11)
  BM_UFlat/3      [jpg_200]           1.55GB/s ± 0%           1.54GB/s ± 0%   -0.96%         (p=0.000 n=10+7)
  BM_UFlat/4      [pdf    ]           12.9GB/s ± 0%           12.6GB/s ± 0%   -1.98%         (p=0.000 n=11+9)
  BM_UFlat/5      [html4  ]           1.87GB/s ± 0%           1.87GB/s ± 0%   -0.06%        (p=0.033 n=11+11)
  BM_UFlat/6      [txt1   ]            816MB/s ± 0%            793MB/s ± 0%   -2.84%        (p=0.000 n=11+11)
  BM_UFlat/7      [txt2   ]            758MB/s ± 0%            737MB/s ± 0%   -2.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3   ]            865MB/s ± 0%            839MB/s ± 0%   -2.94%         (p=0.000 n=11+8)
  BM_UFlat/9      [txt4   ]            701MB/s ± 0%            679MB/s ± 0%   -3.11%        (p=0.000 n=11+10)
  BM_UFlat/10     [pb     ]           2.60GB/s ± 2%           3.07GB/s ± 0%  +17.81%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota]           1.01GB/s ± 0%           0.97GB/s ± 0%   -3.83%        (p=0.000 n=11+10)
  BM_UFlat/12     [cp     ]           1.66GB/s ± 1%           1.73GB/s ± 1%   +4.32%        (p=0.000 n=11+11)
  BM_UFlat/13     [c      ]           1.52GB/s ± 1%           1.53GB/s ± 0%   +0.49%        (p=0.002 n=11+11)
  BM_UFlat/14     [lsp    ]           1.61GB/s ± 0%           1.64GB/s ± 0%   +2.10%        (p=0.000 n=10+11)
  BM_UFlat/15     [xls    ]           1.12GB/s ± 0%           1.08GB/s ± 0%   -3.95%         (p=0.000 n=11+7)
  BM_UFlat/16     [xls_200]            926MB/s ± 1%            935MB/s ± 1%     ~            (p=0.056 n=9+11)
  BM_UFlat/17     [bin    ]           1.89GB/s ± 0%           1.86GB/s ± 0%   -1.32%        (p=0.000 n=11+11)
  BM_UFlat/18     [bin_200]           1.96GB/s ± 0%           1.99GB/s ± 1%   +1.78%        (p=0.000 n=11+11)
  BM_UFlat/19     [sum    ]           1.32GB/s ± 0%           1.31GB/s ± 0%   -0.79%        (p=0.000 n=11+10)
  BM_UFlat/20     [man    ]           1.40GB/s ± 0%           1.43GB/s ± 0%   +2.51%         (p=0.000 n=9+10)
  BM_UValidate/0  [html   ]           2.95GB/s ± 1%           3.07GB/s ± 0%   +4.11%        (p=0.000 n=10+11)
  BM_UValidate/1  [urls   ]           1.57GB/s ± 0%           1.60GB/s ± 0%   +2.24%        (p=0.000 n=10+11)
  BM_UValidate/2  [jpg    ]            822GB/s ± 0%            850GB/s ± 0%   +3.42%        (p=0.000 n=10+11)
  BM_UValidate/3  [jpg_200]           2.01GB/s ± 0%           2.04GB/s ± 0%   +1.24%        (p=0.000 n=11+11)
  BM_UValidate/4  [pdf    ]           33.7GB/s ± 0%           35.9GB/s ± 1%   +6.51%        (p=0.000 n=10+11)
  BM_UIOVec/0     [html   ]            852MB/s ± 0%            852MB/s ± 0%     ~           (p=0.898 n=11+11)
  BM_UIOVec/1     [urls   ]            663MB/s ± 0%            652MB/s ± 0%   -1.61%        (p=0.000 n=11+11)
  BM_UIOVec/2     [jpg    ]           15.3GB/s ± 1%           15.3GB/s ± 2%     ~            (p=0.459 n=9+10)
  BM_UIOVec/3     [jpg_200]            652MB/s ± 0%            627MB/s ± 1%   -3.80%        (p=0.000 n=10+11)
  BM_UIOVec/4     [pdf    ]           8.80GB/s ± 1%           8.57GB/s ± 1%   -2.62%        (p=0.000 n=10+11)
  BM_UFlatSink/0  [html   ]           2.13GB/s ± 0%           2.46GB/s ± 0%  +15.63%        (p=0.000 n=11+11)
  BM_UFlatSink/1  [urls   ]           1.21GB/s ± 0%           1.20GB/s ± 0%   -1.42%        (p=0.000 n=11+10)
  BM_UFlatSink/2  [jpg    ]           17.1GB/s ± 2%           17.2GB/s ± 1%     ~            (p=0.175 n=11+9)
  BM_UFlatSink/3  [jpg_200]           1.52GB/s ± 1%           1.47GB/s ± 3%   -3.15%        (p=0.000 n=11+11)
  BM_UFlatSink/4  [pdf    ]           12.8GB/s ± 1%           12.6GB/s ± 1%   -1.76%        (p=0.000 n=11+11)
  BM_UFlatSink/5  [html4  ]           1.87GB/s ± 0%           1.87GB/s ± 0%   -0.19%        (p=0.000 n=11+10)
  BM_UFlatSink/6  [txt1   ]            816MB/s ± 0%            792MB/s ± 0%   -2.94%        (p=0.000 n=11+11)
  BM_UFlatSink/7  [txt2   ]            758MB/s ± 0%            736MB/s ± 0%   -2.83%        (p=0.000 n=11+11)
  BM_UFlatSink/8  [txt3   ]            865MB/s ± 0%            838MB/s ± 0%   -3.13%        (p=0.000 n=11+11)
  BM_UFlatSink/9  [txt4   ]            701MB/s ± 0%            678MB/s ± 0%   -3.20%        (p=0.000 n=11+11)
  BM_UFlatSink/10 [pb     ]           2.60GB/s ± 2%           3.07GB/s ± 0%  +18.27%        (p=0.000 n=11+10)
  BM_UFlatSink/11 [gaviota]           1.01GB/s ± 0%           0.97GB/s ± 0%   -3.90%        (p=0.000 n=11+11)
  BM_UFlatSink/12 [cp     ]           1.66GB/s ± 1%           1.73GB/s ± 1%   +4.62%        (p=0.000 n=11+10)
  BM_UFlatSink/13 [c      ]           1.52GB/s ± 0%           1.53GB/s ± 1%     ~            (p=0.180 n=9+11)
  BM_UFlatSink/14 [lsp    ]           1.61GB/s ± 0%           1.64GB/s ± 1%   +1.98%         (p=0.000 n=9+11)
  BM_UFlatSink/15 [xls    ]           1.12GB/s ± 0%           1.08GB/s ± 0%   -3.76%        (p=0.000 n=11+11)
  BM_UFlatSink/16 [xls_200]            909MB/s ± 2%            924MB/s ± 1%   +1.62%        (p=0.000 n=11+11)
  BM_UFlatSink/17 [bin    ]           1.88GB/s ± 0%           1.86GB/s ± 0%   -1.18%         (p=0.000 n=9+11)
  BM_UFlatSink/18 [bin_200]           1.94GB/s ± 2%           1.94GB/s ± 1%     ~           (p=0.090 n=11+11)
  BM_UFlatSink/19 [sum    ]           1.32GB/s ± 0%           1.31GB/s ± 0%   -0.76%        (p=0.000 n=11+11)
  BM_UFlatSink/20 [man    ]           1.39GB/s ± 2%           1.43GB/s ± 0%   +2.75%        (p=0.000 n=11+10)

  Assembly before:

*	44 8b 5c 85 a0       	mov    -0x60(%rbp,%rax,4),%r11d
	45 23 5d 00          	and    0x0(%r13),%r11d
	89 d6                	mov    %edx,%esi
	81 e6 00 07 00 00    	and    $0x700,%esi

  Assembly after:

*	89 c1                	mov    %eax,%ecx
*	c0 e1 03             	shl    $0x3,%cl
*	bf ff ff ff ff       	mov    $0xffffffff,%edi
*	48 d3 e7             	shl    %cl,%rdi
*	f7 d7                	not    %edi
	41 23 7d 00          	and    0x0(%r13),%edi
	41 89 d3             	mov    %edx,%r11d
	41 81 e3 00 07 00 00 	and    $0x700,%r11d
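The listings above suggest that the table mapped a byte count n in [0, 4] to a mask with the low n bytes set. The sketch below assumes exactly that. kWordmask, TableWordmask and ComputedWordmask are illustrative names. The computed form replaces a memory load with a few register instructions, which is what the starred lines in the "after" listing appear to do.

```cpp
#include <cassert>
#include <cstdint>

// Table version: entry n has the low 8*n bits set, for n in [0, 4].
static const std::uint32_t kWordmask[] = {0u, 0xffu, 0xffffu, 0xffffffu,
                                          0xffffffffu};

inline std::uint32_t TableWordmask(unsigned n) { return kWordmask[n]; }

// Computed version, following the "after" assembly: load 0xffffffff, shift it
// left by 8*n as a 64-bit value, then invert and keep the low 32 bits. Doing
// the shift in 64 bits keeps n == 4 (a shift by 32) well defined.
inline std::uint32_t ComputedWordmask(unsigned n) {
  return static_cast<std::uint32_t>(~(std::uint64_t{0xffffffff} << (8 * n)));
}
```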
2019-01-04 19:07:28 -08:00
Caleb Mazalevskis a866f7181c
Update README to use HTTPS instead of HTTP.
HTTPS is currently available for all the HTTP links included in the README,
so using HTTPS instead of HTTP for those links may be preferable.
2018-12-14 17:12:32 +08:00
costan ea660b57d6 Fix unused private field warning in NDEBUG builds. 2018-08-17 14:31:23 -07:00
costan 7fefd231a1 C++11 guarantees <cstddef> and <cstdint>.
The build configuration can be cleaned up a bit.
2018-08-16 11:36:45 -07:00
costan db082d2cd6 Remove GCC on OSX from the Travis CI matrix. 2018-08-16 11:36:19 -07:00
costan ad82620f6f Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy.
2018-08-09 12:08:12 -07:00
costan 73c31e824c Fix Visual Studio build.
Commit 8f469d97e2 introduced SSSE3 fast
paths that are gated by __SSE3__ macro checks and the <x86intrin.h>
header, neither of which exists in Visual Studio. This commit adds logic
for detecting SSSE3 compiler support that works for all compilers
supported by the open source release.

The commit also replaces the header with <tmmintrin.h>, which only
defines intrinsics supported by SSSE3 and below. This should help flag
any use of SIMD instructions that require more advanced SSE support, so
the uses can be gated by checks that also work in the open source
release.

Lastly, this commit requires C++11 support for the open source build. C++11
is needed for the alignas specifier, which was also introduced in commit
8f469d97e2.
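Here is a sketch of compiler-independent gating. It assumes the build system defines a macro whenever it can compile SSSE3 code. EXAMPLE_HAVE_SSSE3, EXAMPLE_USE_SSSE3, kReverseMask and Shuffle16 are hypothetical names and do not reflect snappy's actual configuration. The scalar fallback reproduces PSHUFB semantics, so both branches give the same result.

```cpp
#include <cassert>
#include <cstdint>

// EXAMPLE_HAVE_SSSE3 is a hypothetical macro that a build system could set
// after test-compiling a snippet that calls _mm_shuffle_epi8; this is needed
// because Visual Studio does not predefine __SSSE3__. GCC and Clang predefine
// __SSSE3__ when SSSE3 code generation is enabled (for example, -mssse3).
#if defined(EXAMPLE_HAVE_SSSE3) || defined(__SSSE3__)
#define EXAMPLE_USE_SSSE3 1
// <tmmintrin.h> declares SSSE3 and older intrinsics only, so an accidental
// use of an SSE4 or AVX intrinsic should fail to compile here.
#include <tmmintrin.h>
#else
#define EXAMPLE_USE_SSSE3 0
#endif

// C++11 alignas: a 16-byte-aligned constant, suitable for aligned loads.
alignas(16) static const std::uint8_t kReverseMask[16] = {
    15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0};

// out[i] = in[mask[i] & 15], or 0 when the mask byte's high bit is set.
inline void Shuffle16(const std::uint8_t* in, const std::uint8_t* mask,
                      std::uint8_t* out) {
#if EXAMPLE_USE_SSSE3
  const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
  const __m128i m = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_shuffle_epi8(v, m));
#else
  // Portable fallback with the same semantics as PSHUFB.
  for (int i = 0; i < 16; ++i) {
    out[i] = static_cast<std::uint8_t>((mask[i] & 0x80) ? 0
                                                        : in[mask[i] & 0x0f]);
  }
#endif
}
```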
2018-08-08 22:25:14 -07:00
jefflim 27ff0af12a Improve performance of zippy decompression to IOVecs by up to almost 50%
1) Simplify loop condition for small pattern IncrementalCopy
2) Use pointers rather than indices to track current iovec.
3) Use fast IncrementalCopy
4) Bypass Append check from within AppendFromSelf
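The following is a sketch of item 2, which tracks the current iovec with a pointer rather than an index. IoVec and IoVecWriterSketch are hypothetical stand-ins; IoVec mirrors the base/length pair of POSIX struct iovec. This is not ZippyIOVecWriter itself, and it covers only plain appends, not AppendFromSelf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for POSIX struct iovec, kept local for portability.
struct IoVec {
  void* base;
  std::size_t len;
};

// Hypothetical writer: the current iovec is tracked with a pointer
// (curr_), not an index into the array.
class IoVecWriterSketch {
 public:
  IoVecWriterSketch(const IoVec* iov, std::size_t count)
      : curr_(iov), end_(iov + count), curr_offset_(0), remaining_(0) {
    for (const IoVec* v = iov; v != end_; ++v) remaining_ += v->len;
  }

  // Copies len bytes, spilling into following iovecs; fails if they don't fit.
  bool Append(const char* data, std::size_t len) {
    if (len > remaining_) return false;
    remaining_ -= len;
    while (len > 0) {
      const std::size_t room = curr_->len - curr_offset_;
      const std::size_t n = len < room ? len : room;
      if (n != 0) {
        std::memcpy(static_cast<char*>(curr_->base) + curr_offset_, data, n);
      }
      data += n;
      len -= n;
      curr_offset_ += n;
      if (curr_offset_ == curr_->len) {  // Current iovec is full: advance.
        ++curr_;
        curr_offset_ = 0;
      }
    }
    return true;
  }

 private:
  const IoVec* curr_;
  const IoVec* end_;
  std::size_t curr_offset_;
  std::size_t remaining_;
};
```

Checking total capacity once up front means the copy loop never needs to test curr_ against end_.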

While this code greatly improves the performance of ZippyIOVecWriter, a
bigger question is whether IOVec writing should be improved, or removed.

Perf tests:

name                                 old speed      new speed      delta
BM_UFlat/0      [html             ]  2.13GB/s ± 0%  2.14GB/s ± 1%     ~
BM_UFlat/1      [urls             ]  1.22GB/s ± 0%  1.24GB/s ± 0%   +1.87%
BM_UFlat/2      [jpg              ]  17.2GB/s ± 1%  17.1GB/s ± 0%     ~
BM_UFlat/3      [jpg_200          ]  1.55GB/s ± 0%  1.53GB/s ± 2%     ~
BM_UFlat/4      [pdf              ]  12.8GB/s ± 1%  12.7GB/s ± 2%   -0.36%
BM_UFlat/5      [html4            ]  1.89GB/s ± 0%  1.90GB/s ± 1%     ~
BM_UFlat/6      [txt1             ]   811MB/s ± 0%   829MB/s ± 1%   +2.24%
BM_UFlat/7      [txt2             ]   756MB/s ± 0%   774MB/s ± 1%   +2.41%
BM_UFlat/8      [txt3             ]   860MB/s ± 0%   879MB/s ± 1%   +2.16%
BM_UFlat/9      [txt4             ]   699MB/s ± 0%   715MB/s ± 1%   +2.31%
BM_UFlat/10     [pb               ]  2.64GB/s ± 0%  2.65GB/s ± 1%     ~
BM_UFlat/11     [gaviota          ]  1.00GB/s ± 0%  0.99GB/s ± 2%     ~
BM_UFlat/12     [cp               ]  1.66GB/s ± 1%  1.66GB/s ± 2%     ~
BM_UFlat/13     [c                ]  1.53GB/s ± 0%  1.47GB/s ± 5%   -3.97%
BM_UFlat/14     [lsp              ]  1.60GB/s ± 1%  1.55GB/s ± 5%   -3.41%
BM_UFlat/15     [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.93%
BM_UFlat/16     [xls_200          ]   918MB/s ± 2%   929MB/s ± 1%   +1.15%
BM_UFlat/17     [bin              ]  1.86GB/s ± 0%  1.89GB/s ± 1%   +1.61%
BM_UFlat/18     [bin_200          ]  1.90GB/s ± 1%  1.97GB/s ± 1%   +3.67%
BM_UFlat/19     [sum              ]  1.32GB/s ± 0%  1.33GB/s ± 1%     ~
BM_UFlat/20     [man              ]  1.39GB/s ± 0%  1.36GB/s ± 3%     ~
BM_UValidate/0  [html             ]  2.85GB/s ± 3%  2.90GB/s ± 0%     ~
BM_UValidate/1  [urls             ]  1.57GB/s ± 0%  1.56GB/s ± 0%   -0.20%
BM_UValidate/2  [jpg              ]   824GB/s ± 0%   825GB/s ± 0%   +0.11%
BM_UValidate/3  [jpg_200          ]  2.01GB/s ± 0%  2.02GB/s ± 0%   +0.10%
BM_UValidate/4  [pdf              ]  30.4GB/s ±11%  33.5GB/s ± 0%     ~
BM_UIOVec/0     [html             ]   604MB/s ± 0%   856MB/s ± 0%  +41.70%
BM_UIOVec/1     [urls             ]   440MB/s ± 0%   660MB/s ± 0%  +49.91%
BM_UIOVec/2     [jpg              ]  15.1GB/s ± 1%  15.3GB/s ± 1%   +1.22%
BM_UIOVec/3     [jpg_200          ]   567MB/s ± 1%   629MB/s ± 0%  +10.89%
BM_UIOVec/4     [pdf              ]  7.16GB/s ± 2%  8.56GB/s ± 1%  +19.64%
BM_UFlatSink/0  [html             ]  2.13GB/s ± 0%  2.16GB/s ± 0%   +1.47%
BM_UFlatSink/1  [urls             ]  1.22GB/s ± 0%  1.25GB/s ± 0%   +2.18%
BM_UFlatSink/2  [jpg              ]  17.1GB/s ± 2%  17.1GB/s ± 2%     ~
BM_UFlatSink/3  [jpg_200          ]  1.51GB/s ± 1%  1.53GB/s ± 2%   +1.11%
BM_UFlatSink/4  [pdf              ]  12.7GB/s ± 2%  12.8GB/s ± 1%   +0.67%
BM_UFlatSink/5  [html4            ]  1.90GB/s ± 0%  1.92GB/s ± 0%   +1.31%
BM_UFlatSink/6  [txt1             ]   810MB/s ± 0%   835MB/s ± 0%   +3.04%
BM_UFlatSink/7  [txt2             ]   755MB/s ± 0%   779MB/s ± 0%   +3.19%
BM_UFlatSink/8  [txt3             ]   859MB/s ± 0%   884MB/s ± 0%   +2.86%
BM_UFlatSink/9  [txt4             ]   698MB/s ± 0%   718MB/s ± 0%   +2.96%
BM_UFlatSink/10 [pb               ]  2.64GB/s ± 0%  2.67GB/s ± 0%   +1.16%
BM_UFlatSink/11 [gaviota          ]  1.00GB/s ± 0%  1.01GB/s ± 0%   +1.04%
BM_UFlatSink/12 [cp               ]  1.66GB/s ± 1%  1.68GB/s ± 1%   +0.83%
BM_UFlatSink/13 [c                ]  1.52GB/s ± 1%  1.53GB/s ± 0%   +0.38%
BM_UFlatSink/14 [lsp              ]  1.60GB/s ± 1%  1.61GB/s ± 0%   +0.91%
BM_UFlatSink/15 [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.96%
BM_UFlatSink/16 [xls_200          ]   906MB/s ± 3%   920MB/s ± 1%   +1.55%
BM_UFlatSink/17 [bin              ]  1.86GB/s ± 0%  1.90GB/s ± 0%   +2.15%
BM_UFlatSink/18 [bin_200          ]  1.85GB/s ± 2%  1.92GB/s ± 2%   +4.01%
BM_UFlatSink/19 [sum              ]  1.32GB/s ± 1%  1.35GB/s ± 0%   +2.23%
BM_UFlatSink/20 [man              ]  1.39GB/s ± 1%  1.40GB/s ± 0%   +1.12%
BM_ZFlat/0      [html (22.31 %)   ]   800MB/s ± 0%   793MB/s ± 0%   -0.95%
BM_ZFlat/1      [urls (47.78 %)   ]   423MB/s ± 0%   424MB/s ± 0%   +0.11%
BM_ZFlat/2      [jpg (99.95 %)    ]  12.0GB/s ± 2%  12.0GB/s ± 4%     ~
BM_ZFlat/3      [jpg_200 (73.00 %)]   592MB/s ± 3%   594MB/s ± 2%     ~
BM_ZFlat/4      [pdf (83.30 %)    ]  7.26GB/s ± 1%  7.23GB/s ± 2%   -0.49%
BM_ZFlat/5      [html4 (22.52 %)  ]   738MB/s ± 0%   739MB/s ± 0%   +0.17%
BM_ZFlat/6      [txt1 (57.88 %)   ]   286MB/s ± 0%   285MB/s ± 0%   -0.09%
BM_ZFlat/7      [txt2 (61.91 %)   ]   264MB/s ± 0%   264MB/s ± 0%   +0.08%
BM_ZFlat/8      [txt3 (54.99 %)   ]   300MB/s ± 0%   300MB/s ± 0%     ~
BM_ZFlat/9      [txt4 (66.26 %)   ]   248MB/s ± 0%   247MB/s ± 0%   -0.20%
BM_ZFlat/10     [pb (19.68 %)     ]  1.04GB/s ± 0%  1.03GB/s ± 0%   -1.17%
BM_ZFlat/11     [gaviota (37.72 %)]   451MB/s ± 0%   450MB/s ± 0%   -0.35%
BM_ZFlat/12     [cp (48.12 %)     ]   543MB/s ± 0%   538MB/s ± 0%   -1.04%
BM_ZFlat/13     [c (42.47 %)      ]   638MB/s ± 1%   643MB/s ± 0%   +0.68%
BM_ZFlat/14     [lsp (48.37 %)    ]   686MB/s ± 0%   691MB/s ± 1%   +0.76%
BM_ZFlat/15     [xls (41.23 %)    ]   636MB/s ± 0%   633MB/s ± 0%   -0.52%
BM_ZFlat/16     [xls_200 (78.00 %)]   523MB/s ± 2%   520MB/s ± 2%   -0.56%
BM_ZFlat/17     [bin (18.11 %)    ]  1.01GB/s ± 0%  1.01GB/s ± 0%   +0.50%
BM_ZFlat/18     [bin_200 (7.50 %) ]  2.45GB/s ± 1%  2.44GB/s ± 1%   -0.54%
BM_ZFlat/19     [sum (48.96 %)    ]   487MB/s ± 0%   478MB/s ± 0%   -1.89%
BM_ZFlat/20     [man (59.21 %)    ]   567MB/s ± 1%   566MB/s ± 1%     ~

The BM_UFlat/13 and BM_UFlat/14 results showed high variance, so I reran them:

name               old speed      new speed      delta
BM_UFlat/13 [c  ]  1.53GB/s ± 0%  1.53GB/s ± 1%    ~
BM_UFlat/14 [lsp]  1.61GB/s ± 1%  1.61GB/s ± 1%  +0.25%
2018-08-07 23:41:17 -07:00
costan 4ffb0e62c5 Update Travis CI configuration. 2018-08-07 21:33:14 -07:00
atdt be490ef9ec Test for SSE3 support before using pshufb. 2018-08-04 18:51:13 -07:00
atdt 8f469d97e2 Avoid store-forwarding stalls in Zippy's IncrementalCopy
NEW: Annotate `pattern` as initialized, for MSan.

Snappy's IncrementalCopy routine optimizes for speed by reading and writing
memory in blocks of eight or sixteen bytes. If the gap between the source
and destination pointers is smaller than eight bytes, snappy's strategy is
to expand the gap by issuing a series of partly-overlapping eight-byte
loads+stores. Because the range of each load partly overlaps that of the
store that preceded it, the data in the store buffer cannot be forwarded to the load,
and the load stalls while it waits for the store to retire. This is called a
store-forwarding stall.

We can use fewer loads and avoid most of the stalls by loading the first
eight bytes into a 128-bit XMM register, then using PSHUFB to permute the
register's contents in-place into the desired repeating sequence of bytes.
When falling back to IncrementalCopySlow, use memset if the pattern size == 1.
This eliminates around 60% of the stalls.
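Here is a sketch of the pattern approach with a scalar fallback. It assumes the gap (offset) is between 1 and 7 bytes and that 8 bytes starting at src are readable. kFillPatterns, MakePattern and IncrementalCopySmallOffset are hypothetical. Advancing by the largest multiple of the offset that fits in 16 bytes is this sketch's own way of keeping the pattern in phase and is not necessarily what the commit does. The 8-byte load in the SSSE3 branch also reads destination bytes that have not been written yet. That is presumably why the commit mentions annotating pattern as initialized for MSan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(__SSSE3__)
#include <tmmintrin.h>
#endif

// Hypothetical shuffle masks: row k-1 holds i % k for i in [0, 16), so
// shuffling the first k source bytes with it repeats them across 16 bytes.
alignas(16) static const char kFillPatterns[7][16] = {
    {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1},
    {0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0},
    {0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3},
    {0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0},
    {0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3},
    {0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1}};

// Fills out[0..16) with the first `offset` bytes of src repeated.
// Precondition: 1 <= offset <= 7 and src[0..8) is readable.
inline void MakePattern(const char* src, std::size_t offset, char* out) {
  const char* mask = kFillPatterns[offset - 1];
#if defined(__SSSE3__)
  // One 8-byte load plus one PSHUFB, instead of a chain of overlapping
  // loads that each depend on the preceding store.
  const __m128i data = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));
  const __m128i shuf = _mm_load_si128(reinterpret_cast<const __m128i*>(mask));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                   _mm_shuffle_epi8(data, shuf));
#else
  for (int i = 0; i < 16; ++i) {
    out[i] = src[static_cast<unsigned char>(mask[i])];
  }
#endif
}

// Reference LZ77-style copy: bytes are copied one at a time, in order, so
// the destination may overlap the source.
inline void IncrementalCopyReference(const char* src, char* op,
                                     char* op_limit) {
  while (op < op_limit) *op++ = *src++;
}

// Sketch for small offsets (op - src in [1, 7]).
inline void IncrementalCopySmallOffset(const char* src, char* op,
                                       char* op_limit) {
  const std::size_t offset = static_cast<std::size_t>(op - src);
  if (offset == 1) {  // One repeated byte: memset is enough.
    std::memset(op, src[0], static_cast<std::size_t>(op_limit - op));
    return;
  }
  char pattern[16];
  MakePattern(src, offset, pattern);
  // Advance by the largest multiple of `offset` that fits in 16 bytes, so
  // every 16-byte store starts at the same phase of the repeating pattern.
  const std::size_t step = 16 - 16 % offset;
  while (op_limit - op >= 16) {
    std::memcpy(op, pattern, 16);
    op += step;
  }
  // Tail: op is still in phase, so pattern[0] belongs at op.
  for (std::size_t i = 0; op < op_limit; ++i) *op++ = pattern[i];
}
```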

name                       old time/op    new time/op    delta
BM_UFlat/0 [html]        48.6µs ± 0%    48.2µs ± 0%   -0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]         589µs ± 0%     576µs ± 0%   -2.17%        (p=0.000 n=19+18)
BM_UFlat/2 [jpg]         7.12µs ± 0%    7.10µs ± 0%     ~           (p=0.071 n=19+18)
BM_UFlat/3 [jpg_200]      162ns ± 0%     151ns ± 0%   -7.06%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]         8.25µs ± 0%    8.19µs ± 0%   -0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]        218µs ± 0%     218µs ± 0%   +0.09%        (p=0.000 n=17+18)
BM_UFlat/6 [txt1]         191µs ± 0%     189µs ± 0%   -1.12%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]         168µs ± 0%     167µs ± 0%   -1.01%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]         502µs ± 0%     499µs ± 0%   -0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]         704µs ± 0%     695µs ± 0%   -1.26%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]         45.6µs ± 0%    44.2µs ± 0%   -3.13%        (p=0.000 n=19+15)
BM_UFlat/11 [gaviota]     188µs ± 0%     194µs ± 0%   +3.06%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]         15.1µs ± 2%    14.7µs ± 1%   -2.09%        (p=0.000 n=18+18)
BM_UFlat/13 [c]          7.38µs ± 0%    7.36µs ± 0%   -0.28%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]        2.31µs ± 0%    2.37µs ± 0%   +2.64%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]         984µs ± 0%     909µs ± 0%   -7.59%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]     215ns ± 0%     217ns ± 0%   +0.71%        (p=0.000 n=19+15)
BM_UFlat/17 [bin]         289µs ± 0%     287µs ± 0%   -0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]     161ns ± 0%     116ns ± 0%  -28.09%        (p=0.000 n=19+16)
BM_UFlat/19 [sum]        31.9µs ± 0%    29.2µs ± 0%   -8.37%        (p=0.000 n=19+18)
BM_UFlat/20 [man]        3.13µs ± 1%    3.07µs ± 0%   -1.79%        (p=0.000 n=19+18)

name                       old allocs/op  new allocs/op  delta
BM_UFlat/0 [html]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/1 [urls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/2 [jpg]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/3 [jpg_200]      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/4 [pdf]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/5 [html4]        0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/6 [txt1]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/7 [txt2]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/8 [txt3]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/9 [txt4]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/10 [pb]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/11 [gaviota]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/12 [cp]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/13 [c]           0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/14 [lsp]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/15 [xls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/16 [xls_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/17 [bin]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/18 [bin_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/19 [sum]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/20 [man]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)

name                       old speed      new speed      delta
BM_UFlat/0 [html]      2.11GB/s ± 0%  2.13GB/s ± 0%   +0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]      1.19GB/s ± 0%  1.22GB/s ± 0%   +2.22%        (p=0.000 n=16+17)
BM_UFlat/2 [jpg]       17.3GB/s ± 0%  17.3GB/s ± 0%     ~           (p=0.074 n=19+18)
BM_UFlat/3 [jpg_200]   1.23GB/s ± 0%  1.33GB/s ± 0%   +7.58%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]       12.4GB/s ± 0%  12.5GB/s ± 0%   +0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]     1.88GB/s ± 0%  1.88GB/s ± 0%   -0.09%        (p=0.000 n=18+18)
BM_UFlat/6 [txt1]       798MB/s ± 0%   807MB/s ± 0%   +1.13%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]       743MB/s ± 0%   751MB/s ± 0%   +1.02%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]       850MB/s ± 0%   855MB/s ± 0%   +0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]       684MB/s ± 0%   693MB/s ± 0%   +1.28%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]       2.60GB/s ± 0%  2.69GB/s ± 0%   +3.25%        (p=0.000 n=19+16)
BM_UFlat/11 [gaviota]   979MB/s ± 0%   950MB/s ± 0%   -2.97%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]       1.63GB/s ± 2%  1.67GB/s ± 1%   +2.13%        (p=0.000 n=18+18)
BM_UFlat/13 [c]        1.51GB/s ± 0%  1.52GB/s ± 0%   +0.29%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]      1.61GB/s ± 1%  1.57GB/s ± 0%   -2.57%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]      1.05GB/s ± 0%  1.13GB/s ± 0%   +8.22%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]   928MB/s ± 0%   921MB/s ± 0%   -0.81%        (p=0.000 n=19+17)
BM_UFlat/17 [bin]      1.78GB/s ± 0%  1.79GB/s ± 0%   +0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]  1.24GB/s ± 0%  1.72GB/s ± 0%  +38.92%        (p=0.000 n=19+18)
BM_UFlat/19 [sum]      1.20GB/s ± 0%  1.31GB/s ± 0%   +9.15%        (p=0.000 n=19+18)
BM_UFlat/20 [man]      1.35GB/s ± 1%  1.38GB/s ± 0%   +1.84%        (p=0.000 n=19+18)
2018-08-04 18:51:07 -07:00
costan 4f7bd2dbfd Update CI configurations.
Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor.
2018-03-09 09:02:34 -08:00
jgorbe ca37ab7fb9 Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long
explanation is enough to justify it.

We have observed that, in some conditions, the results for dataset number 10
(pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.

To diagnose this, we profiled the benchmark looking for hot functions
(99% of the time is spent in DecompressAllTags), then looked at the generated
code to see whether there was any difference. To rule out a minor difference
we observed in register allocation, we replaced zippy.cc with a pre-built
assembly file so it was the same in both variants, and we were still able to
reproduce the regression.

After ruling out the compiler as the cause, we dug a bit further and noticed
that the alignment of the function in the final binary was different. Both
were aligned to a 16-byte boundary, but the slower one was also (by chance)
aligned to a 32-byte boundary. A regression caused by alignment differences
would explain why I could reproduce it consistently on the same CitC client,
but not on others: slight differences in the sources can give the resulting
binary a different layout.
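The 16-byte versus 32-byte distinction comes down to the low bits of the function's start address. The sketch below, in C++ to match the project, classifies an address as "on a 32-byte boundary" or "on a 32-byte boundary + 16 bytes". The names `OffsetInBlock`, `Placement`, `Classify` and `PlacementOf` are hypothetical helpers for illustration, not part of snappy.

```cpp
#include <cstdint>

// Offset of `addr` inside its enclosing `block`-byte window.
// Assumes `block` is a power of two, so masking is equivalent to modulo.
constexpr std::uintptr_t OffsetInBlock(std::uintptr_t addr,
                                       std::uintptr_t block) {
  return addr & (block - 1);
}

enum class Placement {
  kAligned32,        // starts exactly on a 32-byte boundary
  kAligned32Plus16,  // 16-byte aligned, but 16 bytes past a 32-byte boundary
  kOther             // not 16-byte aligned at all
};

constexpr Placement Classify(std::uintptr_t addr) {
  return OffsetInBlock(addr, 32) == 0    ? Placement::kAligned32
         : OffsetInBlock(addr, 32) == 16 ? Placement::kAligned32Plus16
                                         : Placement::kOther;
}

// Compile-time checks of the classification.
static_assert(Classify(0x1000) == Placement::kAligned32, "");
static_assert(Classify(0x1010) == Placement::kAligned32Plus16, "");
static_assert(Classify(0x1008) == Placement::kOther, "");

// Where a free function landed in the current binary. The pointer-to-integer
// mapping is implementation-defined; on mainstream x86-64 toolchains it
// yields the code address.
template <typename Fn>
Placement PlacementOf(Fn* fn) {
  return Classify(reinterpret_cast<std::uintptr_t>(fn));
}
```

Logging `PlacementOf(&SomeFreeFunction)` from a benchmark binary is one hedged way to see which layout a particular build received.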

Here are some detailed benchmark results before/after the fix. Note how forcing
a fixed alignment makes the difference between baseline and experiment go away,
but plain 32-byte alignment puts both variants in the same ballpark as the slow
side of the original regression:

Original (note BM_UCord/10 and BM_UDataBuffer/10 around the -24% line):

  BASELINE
  BM_UCord/10                    2938           2932          24194 3.767GB/s  pb
  BM_UDataBuffer/10              3008           3004          23316 3.677GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3797           3789          18512 2.915GB/s  pb
  BM_UDataBuffer/10              4024           4016          17543 2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:

  BASELINE
  BM_UCord/10                    3872           3862          18035 2.860GB/s  pb
  BM_UDataBuffer/10              4010           3998          17591 2.763GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3884           3876          18126 2.850GB/s  pb
  BM_UDataBuffer/10              4037           4027          17199 2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):

  BASELINE
  BM_UCord/10                    3103           3095          22642 3.569GB/s  pb
  BM_UDataBuffer/10              3186           3177          21947 3.476GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3104           3095          22632 3.569GB/s  pb
  BM_UDataBuffer/10              3167           3159          22076 3.496GB/s  pb
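The "-24%" figure for the original run can be checked from the GB/s column. A minimal sketch, assuming the usual convention delta = (new - old) / old; `PercentDelta` and `ApproxEqual` are hypothetical helper names.

```cpp
#include <cmath>

// Relative change from `baseline` to `experiment`, in percent:
//   (experiment - baseline) / baseline * 100
// For throughput (GB/s) a negative result is a slowdown.
double PercentDelta(double baseline, double experiment) {
  return (experiment - baseline) / baseline * 100.0;
}

// True if `actual` is within `tolerance` of `expected`.
bool ApproxEqual(double actual, double expected, double tolerance) {
  return std::fabs(actual - expected) <= tolerance;
}
```

For the original run, `PercentDelta(3.767, 2.915)` is about -22.6 (BM_UCord/10) and `PercentDelta(3.677, 2.750)` is about -25.2 (BM_UDataBuffer/10), which averages to roughly -24%. With this patch the same rows give 0.0 and about +0.6.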

This change forces the "good" alignment for DecompressAllTags which, if
anything, should make benchmark results more stable (and maybe we'll improve
some unlucky application!).
2018-02-17 00:47:18 -08:00
scrubbed 15a2804cd2 Fix an incorrect analysis / comment in the "pattern doubling" code.
This should have a minuscule positive effect on performance; the
main idea of the CL is just to fix the incorrect comment.
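As a rough illustration of "pattern doubling": when a back-reference's offset is smaller than its length, source and destination overlap, but the bytes already in place are whole repeats of the pattern. Keeping the source fixed while the destination advances lets each non-overlapping copy double the distance between them. This is a simplified sketch of the technique with hypothetical names (`PatternDoublingCopy`, `ByteByByteCopy`), not snappy's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reference semantics: forward byte-by-byte copy from op - offset. When
// offset < len this replicates the `offset`-byte pattern preceding op.
void ByteByByteCopy(char* op, std::size_t offset, std::size_t len) {
  const char* src = op - offset;
  for (std::size_t i = 0; i < len; ++i) op[i] = src[i];
}

// Same result, using memcpy calls whose size doubles each round.
void PatternDoublingCopy(char* op, std::size_t offset, std::size_t len) {
  assert(offset > 0);
  const char* src = op - offset;  // src stays put; only op advances
  while (len > 0) {
    // [src, op) holds a whole number of pattern repeats, so copying up to
    // (op - src) bytes from src is non-overlapping and stays in phase.
    std::size_t gap = static_cast<std::size_t>(op - src);
    std::size_t chunk = gap < len ? gap : len;
    std::memcpy(op, src, chunk);
    op += chunk;  // gap doubles for the next round
    len -= chunk;
  }
}
```

With offset 2 and length 13, the loop issues memcpy calls of 2, 4 and then the remaining 7 bytes instead of 13 single-byte moves.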
2018-02-17 00:46:31 -08:00
costan e69d9f8806 Fix Travis CI configuration for OSX. 2018-01-04 15:27:36 -08:00
chandlerc 4aba5426d4 Rework a very hot, very sensitive part of snappy to reduce the number of
instructions and the number of dynamic branches, and to avoid a particular
loop structure that LLVM has a very hard time optimizing for this
particular case.

The code being changed is part of the hottest path for snappy
decompression. In the benchmarks for decompressing protocol buffers,
this has proven to be amazingly sensitive to the slightest changes in
code layout. For example, previously we added a '.p2align 5' assembly
directive to the code. This essentially padded the loop within the
function out to a 32-byte boundary. Merely by doing this we saw
significant performance improvements.

As a consequence, several of the compiler's typically reasonable
optimizations can have surprisingly bad impacts. Loop unrolling is a
primary culprit, but in the next LLVM release we are seeing an issue due
to loop rotation. While some of the problems caused by the newly
triggered loop rotation in LLVM can be mitigated with ongoing work on
LLVM's code layout optimizations (specifically, loop header cloning),
that is a fairly long term project. And even minor fluctuations in how
that subsequent optimization is performed may prevent gaining the
performance back.

For now, we need some way to unblock the next LLVM release which
contains a generic improvement to the LLVM loop optimizer that enables
loop rotation in more places, but uncovers this sensitivity and weakness
in a particular case.

This CL restructures the loop to have a simpler structure. Specifically,
we eagerly test what the terminal condition will be and provide two
versions of the copy loop that use a single loop predicate.

The comments in the source code and benchmarks indicate that only one of
these two cases is actually hot: we expect to generally have enough slop
in the buffer. That in turn allows us to generate a much simpler branch
and loop structure for the hot path (especially for the protocol buffer
decompression benchmark).
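The restructuring described above (decide once whether the buffer has enough slop, then run one of two loops that each have a single predicate) might look roughly like this. A hedged sketch: `CopyMatch`, `kSlop` (16 bytes here) and the precondition that the source trails the destination by at least 8 bytes are assumptions for illustration, not snappy's actual code or constants.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical slop budget: the fast loop may write up to kSlop - 1 bytes
// past op_limit, which must still lie inside the buffer.
constexpr std::ptrdiff_t kSlop = 16;

// Copies [op, op_limit) from src. Requires op - src >= 8 so that every
// 8-byte memcpy below is non-overlapping. Returns op_limit.
char* CopyMatch(const char* src, char* op, char* op_limit, char* buf_limit) {
  // Test the terminal condition eagerly, so each loop has one predicate.
  if (buf_limit - op_limit >= kSlop) {
    // Hot path: 16 bytes per iteration; may overshoot op_limit harmlessly.
    while (op < op_limit) {
      std::memcpy(op, src, 8);
      std::memcpy(op + 8, src + 8, 8);
      op += 16;
      src += 16;
    }
    return op_limit;
  }
  // Cold path: never writes past op_limit.
  while (op < op_limit) *op++ = *src++;
  return op_limit;
}
```

Whether a compiler turns the hot loop into exactly the addressing pattern shown in the IACA listing is not guaranteed; as the message explains, that part took careful tuning.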

However, structuring even this simple loop in a way that doesn't trigger
some other performance bubble (often a more severe one) is quite
challenging. We have to carefully manage the variables used in the loop
and the addressing pattern. We should teach LLVM how to do this
reliably, but that too is a *much* more significant undertaking, and it is
extremely rare for this to matter to this degree. The desired structure
of the loop, as shown by IACA's analysis for the Broadwell
micro-architecture (HSW and SKX are similar):

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1-0x8]
|   2^   |           |     |           | 0.4       | 1.0 |     |     | 0.6 |    | mov qword ptr [rdi], rcx
|   1    |           |     |           | 1.0   1.0 |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1]
|   2^   |           |     | 0.3       |           | 1.0 |     |     | 0.7 |    | mov qword ptr [rdi+0x8], rcx
|   1    | 0.5       |     |           |           |     | 0.5 |     |     |    | add rdi, 0x10
|   1    | 0.2       |     |           |           |     |     | 0.8 |     |    | cmp rdi, rax
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xffffffffffffffe9

Specifically, arranging the addressing modes of the stores so that
micro-op fusion occurs (indicated by the `^` on the `2` micro-op count)
is important to achieve good throughput for this loop.

The other thing necessary to make this change effective is to remove our
previous hack using `.p2align 5` to pad out the main decompression loop,
and to forcibly disable loop unrolling for critical loops. Because this
change simplifies the loop structure, more unrolling opportunities show
up. Also, the next LLVM release's generic loop optimization improvements
allow unrolling in more places, requiring still more disabling of
unrolling in this change.  Perhaps most surprising of these is that we
must disable loop unrolling in the *slow* path. While unrolling there
seems pointless, it should also be harmless.  This cold code is laid out
very far away from all of the hot code. All the samples shown in a
profile of the benchmark occur before this loop in the function. And
yet, if the loop gets unrolled (which seems to only happen reliably with
the next LLVM release) we see a nearly 20% regression in decompressing
protocol buffers!
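Disabling unrolling for one specific loop, as described here for the slow path, can be expressed with compiler-specific loop pragmas. A minimal sketch: `NO_UNROLL` and `SlowCopy` are hypothetical names; the pragmas are Clang's `#pragma clang loop unroll(disable)` and GCC's `#pragma GCC unroll 0` (GCC 8 and later), and other compilers simply get no hint.

```cpp
#include <cstddef>

// "Do not unroll the next loop" hint, where the compiler supports one.
#if defined(__clang__)
#define NO_UNROLL _Pragma("clang loop unroll(disable)")
#elif defined(__GNUC__) && __GNUC__ >= 8
#define NO_UNROLL _Pragma("GCC unroll 0")  // 0 or 1 blocks unrolling
#else
#define NO_UNROLL
#endif

// A cold, byte-at-a-time fallback copy. Unrolling it gains nothing, and
// per the commit message it can still perturb performance of the hot code.
char* SlowCopy(char* op, const char* src, std::size_t len) {
  NO_UNROLL
  while (len-- > 0) *op++ = *src++;
  return op;
}
```

The hint applies only to the loop immediately following it, so each critical loop needs its own annotation.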

With the current release of LLVM, we still observe some regression from
this source change, but it is fairly small (5% on decompressing protocol
buffers, less elsewhere). And with the next LLVM release it drops to
under 1% even in that case. Meanwhile, without this change, the next
release of LLVM will regress decompressing protocol buffers by more than
10%.
2018-01-04 15:27:15 -08:00
costan 26102a0c66 Fix generated version number in open source release.
Lands GitHub PR #61. The patch was also independently contributed by
Martin Gieseking <martin.gieseking@uos.de>.
2017-12-20 14:32:54 -08:00