Commit Graph

239 Commits

Author SHA1 Message Date
Chris Mumford 877cc86f0e Fixed formatted (bash/c++) sections of README.md.
PiperOrigin-RevId: 244695986
2019-05-13 10:11:19 -07:00
atdt 02cf187555 Remove MSan exemption for _bzhi_u32, since LLVM now handles it correctly.
This cleans up a TODO from cl/225463783 and cl/225655713.

PiperOrigin-RevId: 241933185
2019-05-13 10:11:12 -07:00
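For readers unfamiliar with `_bzhi_u32`: it is the x86 BMI2 intrinsic for the BZHI instruction, which returns its first operand with every bit at position `index` and above cleared. The function below is a minimal portable sketch of those semantics, for illustration only. The name `BzhiU32Portable` is hypothetical and is not part of snappy. It follows the instruction's documented behavior: only the low 8 bits of `index` are used, and an index of 32 or more leaves the value unchanged.

```cpp
#include <cstdint>

// Hypothetical portable equivalent of _bzhi_u32(x, index) (not snappy code).
// Clears the bits of x at positions >= index.
inline uint32_t BzhiU32Portable(uint32_t x, uint32_t index) {
  // BZHI reads only bits [7:0] of the index operand.
  const uint32_t n = index & 0xFFu;
  if (n >= 32) {
    // Nothing to clear; this branch also avoids the undefined shift 1u << 32.
    return x;
  }
  // Mask with the low n bits set; n == 0 gives an all-zero mask.
  return x & ((uint32_t{1} << n) - 1u);
}
```

For example, `BzhiU32Portable(0x12345678u, 16)` keeps only the low 16 bits and yields `0x5678`. The hardware instruction computes the same result in a single operation, without the branch.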
Ivan be831dc98c Fix compilation 2019-04-25 18:44:08 +03:00
costan d58cd618be Remove MSBuild section from AppVeyor configuration. 2019-02-26 18:28:14 -08:00
nafi c197d686a9 Optimize snappy compression by about 2.2%.
'jpg_200' is notably optimized by ~8%.

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.8µs ± 0%             41.9µs ± 0%  +0.33%          (p=0.016 n=5+5)
BM_UFlat/1      [urls             ]             590µs ± 0%              590µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/2      [jpg              ]            7.14µs ± 1%             7.12µs ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/3      [jpg_200          ]              129ns ± 0%              129ns ± 0%    ~             (p=0.167 n=5+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.20µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/5      [html4            ]             220µs ± 1%              220µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlat/6      [txt1             ]             193µs ± 0%              193µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/7      [txt2             ]             171µs ± 0%              171µs ± 0%    ~             (p=0.056 n=5+5)
BM_UFlat/8      [txt3             ]             512µs ± 0%              511µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/9      [txt4             ]             716µs ± 0%              716µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/10     [pb               ]            38.8µs ± 1%             38.8µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/11     [gaviota          ]             190µs ± 0%              190µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]            14.4µs ± 1%             14.4µs ± 1%    ~             (p=0.151 n=5+5)
BM_UFlat/13     [c                ]            7.33µs ± 0%             7.32µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/14     [lsp              ]            2.30µs ± 0%             2.31µs ± 1%    ~             (p=0.548 n=5+5)
BM_UFlat/15     [xls              ]             984µs ± 0%              984µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 0%              213ns ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/17     [bin              ]             277µs ± 0%              278µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 0%              102ns ± 0%    ~             (p=0.190 n=5+4)
BM_UFlat/19     [sum              ]            29.6µs ± 0%             29.6µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/20     [man              ]            2.98µs ± 1%             2.98µs ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/0  [html             ]            33.5µs ± 0%             33.6µs ± 0%    ~             (p=0.310 n=5+5)
BM_UValidate/1  [urls             ]             443µs ± 0%              443µs ± 0%    ~             (p=0.841 n=5+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~             (p=0.222 n=5+5)
BM_UValidate/3  [jpg_200          ]             95.6ns ± 0%             95.5ns ± 0%    ~             (p=0.421 n=5+5)
BM_UValidate/4  [pdf              ]            2.92µs ± 0%             2.92µs ± 0%    ~             (p=0.841 n=5+5)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%    ~             (p=0.548 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/2     [jpg              ]            7.48µs ± 5%             7.75µs ±12%    ~             (p=0.690 n=5+5)
BM_UIOVec/3     [jpg_200          ]              331ns ± 1%              327ns ± 1%    ~             (p=0.056 n=5+5)
BM_UIOVec/4     [pdf              ]            12.0µs ± 0%             12.0µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/0  [html             ]            41.7µs ± 0%             41.8µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/1  [urls             ]             591µs ± 0%              590µs ± 0%    ~             (p=0.151 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.18µs ± 2%             7.31µs ± 3%    ~             (p=0.190 n=4+5)
BM_UFlatSink/3  [jpg_200          ]              134ns ± 2%              134ns ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]            8.22µs ± 0%             8.23µs ± 0%    ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/6  [txt1             ]             193µs ± 0%              193µs ± 0%    ~             (p=0.095 n=5+5)
BM_UFlatSink/7  [txt2             ]             171µs ± 0%              171µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/8  [txt3             ]             512µs ± 0%              512µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/9  [txt4             ]             718µs ± 0%              718µs ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/10 [pb               ]            38.7µs ± 0%             38.6µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/11 [gaviota          ]             191µs ± 0%              190µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/12 [cp               ]            14.3µs ± 0%             14.4µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/13 [c                ]            7.33µs ± 0%             7.34µs ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.29µs ± 1%             2.30µs ± 1%    ~             (p=0.095 n=5+5)
BM_UFlatSink/15 [xls              ]             981µs ± 0%              980µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/16 [xls_200          ]              216ns ± 1%              216ns ± 1%    ~             (p=1.000 n=5+5)
BM_UFlatSink/17 [bin              ]             277µs ± 0%              277µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]              104ns ± 0%              104ns ± 1%    ~             (p=0.905 n=5+4)
BM_UFlatSink/19 [sum              ]            29.5µs ± 0%             29.5µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/20 [man              ]            3.01µs ± 1%             3.01µs ± 0%    ~             (p=0.730 n=5+4)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              124µs ± 0%  -1.66%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.68ms ± 0%             1.63ms ± 0%  -2.73%          (p=0.008 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.6µs ± 8%             11.4µs ± 6%    ~             (p=0.310 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              369ns ± 1%              340ns ± 1%  -7.93%          (p=0.008 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.9µs ± 4%             14.4µs ± 1%  -3.56%          (p=0.008 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             551µs ± 0%              545µs ± 0%  -1.21%          (p=0.008 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              534µs ± 0%  -1.15%          (p=0.008 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             480µs ± 0%              475µs ± 0%  -1.13%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.43ms ± 0%  -1.14%          (p=0.008 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.95ms ± 0%  -1.00%          (p=0.008 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              107µs ± 0%  -2.77%          (p=0.008 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             413µs ± 0%              411µs ± 0%  -0.50%          (p=0.008 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.6µs ± 1%             44.8µs ± 1%  -3.89%          (p=0.008 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.8µs ± 0%             17.5µs ± 0%  -1.87%          (p=0.008 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.62µs ± 1%             5.35µs ± 1%  -4.81%          (p=0.008 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.63ms ± 0%             1.63ms ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              393ns ± 1%              384ns ± 2%  -2.45%          (p=0.008 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             510µs ± 0%              503µs ± 0%  -1.50%          (p=0.016 n=4+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             83.2ns ± 3%             84.5ns ± 4%    ~             (p=0.206 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            80.0µs ± 0%             78.3µs ± 0%  -2.20%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.79µs ± 1%             7.45µs ± 1%  -4.38%          (p=0.008 n=5+5)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              57.0k ± 0%              57.0k ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              30.6k ± 0%              30.6k ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              30.7k ± 0%              30.7k ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              30.6k ± 0%              30.6k ± 0%    ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.46GB/s ± 0%           2.45GB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 1%           1.20GB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 1%           17.4GB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.56GB/s ± 0%           1.56GB/s ± 0%    ~             (p=0.190 n=4+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/6      [txt1             ]            791MB/s ± 1%            791MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/7      [txt2             ]            737MB/s ± 0%            738MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            839MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/9      [txt4             ]            675MB/s ± 1%            674MB/s ± 0%    ~             (p=0.730 n=5+4)
BM_UFlat/10     [pb               ]           3.08GB/s ± 1%           3.06GB/s ± 0%    ~             (p=0.095 n=5+5)
BM_UFlat/11     [gaviota          ]            974MB/s ± 0%            976MB/s ± 0%    ~             (p=0.238 n=5+5)
BM_UFlat/12     [cp               ]           1.70GB/s ± 0%           1.72GB/s ± 0%  +1.07%          (p=0.016 n=4+5)
BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/14     [lsp              ]           1.62GB/s ± 1%           1.62GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlat/15     [xls              ]           1.05GB/s ± 1%           1.05GB/s ± 0%    ~             (p=0.556 n=5+4)
BM_UFlat/16     [xls_200          ]            943MB/s ± 0%            940MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/17     [bin              ]           1.86GB/s ± 1%           1.86GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/18     [bin_200          ]           1.99GB/s ± 0%           1.97GB/s ± 1%    ~             (p=0.190 n=5+4)
BM_UFlat/19     [sum              ]           1.30GB/s ± 0%           1.30GB/s ± 1%    ~             (p=0.151 n=5+5)
BM_UFlat/20     [man              ]           1.42GB/s ± 1%           1.42GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/0  [html             ]           3.06GB/s ± 0%           3.06GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~             (p=0.095 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            845GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.10GB/s ± 0%           2.10GB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UValidate/4  [pdf              ]           35.1GB/s ± 0%           35.1GB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UIOVec/0     [html             ]            843MB/s ± 0%            847MB/s ± 1%    ~             (p=0.222 n=5+5)
BM_UIOVec/1     [urls             ]            652MB/s ± 1%            652MB/s ± 1%    ~             (p=0.310 n=5+5)
BM_UIOVec/2     [jpg              ]           16.5GB/s ± 5%           16.0GB/s ±10%    ~             (p=0.841 n=5+5)
BM_UIOVec/3     [jpg_200          ]            606MB/s ± 1%            614MB/s ± 1%    ~             (p=0.056 n=5+5)
BM_UIOVec/4     [pdf              ]           8.57GB/s ± 0%           8.57GB/s ± 0%    ~             (p=0.343 n=4+4)
BM_UFlatSink/0  [html             ]           2.47GB/s ± 0%           2.45GB/s ± 0%  -0.58%          (p=0.016 n=5+5)
BM_UFlatSink/1  [urls             ]           1.19GB/s ± 0%           1.20GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/2  [jpg              ]           16.4GB/s ±19%           16.9GB/s ± 4%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.50GB/s ± 2%           1.50GB/s ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.5GB/s ± 0%           12.5GB/s ± 0%    ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 1%           1.88GB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/6  [txt1             ]            793MB/s ± 0%            792MB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]            736MB/s ± 0%            736MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlatSink/8  [txt3             ]            839MB/s ± 0%            839MB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlatSink/9  [txt4             ]            675MB/s ± 0%            675MB/s ± 0%    ~             (p=0.222 n=5+5)
BM_UFlatSink/10 [pb               ]           3.07GB/s ± 0%           3.09GB/s ± 0%  +0.54%          (p=0.016 n=5+5)
BM_UFlatSink/11 [gaviota          ]            973MB/s ± 0%            971MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlatSink/12 [cp               ]           1.72GB/s ± 1%           1.71GB/s ± 1%    ~             (p=0.421 n=5+5)
BM_UFlatSink/13 [c                ]           1.53GB/s ± 1%           1.52GB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.63GB/s ± 0%           1.62GB/s ± 1%    ~             (p=0.222 n=5+5)
BM_UFlatSink/15 [xls              ]           1.06GB/s ± 0%           1.05GB/s ± 0%    ~             (p=0.111 n=4+5)
BM_UFlatSink/16 [xls_200          ]            932MB/s ± 1%            928MB/s ± 1%    ~             (p=0.548 n=5+5)
BM_UFlatSink/17 [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 1%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.93GB/s ± 1%           1.94GB/s ± 1%    ~             (p=0.730 n=5+4)
BM_UFlatSink/19 [sum              ]           1.30GB/s ± 0%           1.30GB/s ± 1%    ~             (p=0.690 n=5+5)
BM_UFlatSink/20 [man              ]           1.41GB/s ± 1%           1.41GB/s ± 2%    ~             (p=0.690 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 1%            829MB/s ± 0%  +1.78%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            420MB/s ± 1%            432MB/s ± 1%  +2.87%          (p=0.008 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.7GB/s ± 8%           10.9GB/s ± 6%    ~             (p=0.421 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            544MB/s ± 2%            590MB/s ± 1%  +8.41%          (p=0.008 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.92GB/s ± 3%           7.16GB/s ± 1%  +3.51%          (p=0.008 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            755MB/s ± 0%  +1.34%          (p=0.008 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            285MB/s ± 1%  +1.04%          (p=0.008 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            262MB/s ± 0%            265MB/s ± 0%  +1.22%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 0%            300MB/s ± 0%  +1.09%          (p=0.008 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            246MB/s ± 1%            248MB/s ± 0%  +0.95%          (p=0.008 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 1%           1.11GB/s ± 1%  +2.57%          (p=0.008 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            449MB/s ± 1%            451MB/s ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            530MB/s ± 1%            552MB/s ± 0%  +4.17%          (p=0.008 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            628MB/s ± 1%            640MB/s ± 0%  +1.85%          (p=0.008 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            665MB/s ± 0%            697MB/s ± 1%  +4.71%          (p=0.008 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            635MB/s ± 0%            634MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            511MB/s ± 1%            522MB/s ± 2%  +2.23%          (p=0.008 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 1%           1.02GB/s ± 0%  +1.67%          (p=0.008 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.41GB/s ± 3%           2.37GB/s ± 4%    ~             (p=0.222 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            480MB/s ± 0%            490MB/s ± 1%  +2.24%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            545MB/s ± 0%            569MB/s ± 1%  +4.38%          (p=0.008 n=5+5)
2019-02-26 18:27:31 -08:00
costan 3f194acb57 Convert DCHECK to assert.
A previous CL introduced a use of DCHECK. The open-source build does not
provide DCHECK, so this project uses assert() instead.
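A minimal sketch of the substitution, using a hypothetical helper `ByteAt()` that is not part of snappy. The precondition is written with the standard `assert()` macro from `<cassert>`. Like DCHECK, it is a debug-only check, and it is compiled out when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper used only to illustrate the pattern.
// Where Google-internal code might write DCHECK(pos < len), the
// open-source build uses the standard assert() macro instead.
inline char ByteAt(const char* buf, std::size_t len, std::size_t pos) {
  assert(pos < len);  // checked in debug builds; removed when NDEBUG is defined
  return buf[pos];
}
```

Because `assert()` ships with every C++ toolchain, the check needs no extra dependency in the open-source build.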
2019-01-08 13:49:15 -08:00
costan 97a20b480f Reduce the LeftShiftOverflows() table size.
A previous CL introduced LeftShiftOverflows(), which takes a uint32
input. However, the value it operates on is guaranteed to fit in 8
bits. This CL takes advantage of this restriction to reduce the size
of the static table used to compute LeftShiftOverflows().
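A minimal sketch of why an 8-bit input allows a small table. It assumes the function should return true when `value << shift` no longer fits in 32 bits, and that `shift < 32`. The name `LeftShiftOverflowsSketch` and the table layout are illustrative and are not necessarily snappy's actual code. An 8-bit value shifted by 24 or less always fits. For shifts of 25 to 31, the bits that spill past bit 31 are exactly the top `shift - 24` bits of the byte, so one byte-sized mask per shift amount is enough.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: returns true if (value << shift) does not fit in 32 bits.
// Because value has only 8 bits, shifts of 0..24 can never overflow. For
// shift in 25..31, the overflowing bits are the top (shift - 24) bits of value.
inline bool LeftShiftOverflowsSketch(uint8_t value, uint32_t shift) {
  assert(shift < 32);
  // One byte per shift amount (32 bytes total). A uint32 input would
  // instead need 32-bit masks, which makes the table four times larger.
  static const uint8_t kMasks[32] = {
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe};
  return (value & kMasks[shift]) != 0;
}
```

A smaller table occupies fewer cache lines, which is consistent with the expectation below that CPUs with smaller caches benefit more.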

Benchmarking with the same methodology as the previous CL suggests a 0.6%
improvement. The improvement is likely larger on mobile CPUs, which have
much smaller caches.

Benchmark results:

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            42.5µs ± 1%             42.1µs ± 0%  -0.87%        (p=0.000 n=20+20)
BM_UFlat/1      [urls             ]             575µs ± 0%              574µs ± 0%  -0.16%        (p=0.000 n=20+19)
BM_UFlat/2      [jpg              ]            7.13µs ± 1%             7.20µs ± 5%    ~           (p=0.422 n=16+19)
BM_UFlat/3      [jpg_200          ]              129ns ± 0%              130ns ± 0%  +0.82%        (p=0.000 n=20+17)
BM_UFlat/4      [pdf              ]            8.22µs ± 1%             8.21µs ± 0%    ~           (p=0.586 n=17+17)
BM_UFlat/5      [html4            ]             222µs ± 0%              222µs ± 0%  -0.11%        (p=0.047 n=19+20)
BM_UFlat/6      [txt1             ]             192µs ± 0%              191µs ± 0%  -0.69%        (p=0.000 n=20+20)
BM_UFlat/7      [txt2             ]             169µs ± 0%              169µs ± 0%  -0.28%        (p=0.000 n=20+20)
BM_UFlat/8      [txt3             ]             510µs ± 0%              507µs ± 0%  -0.50%        (p=0.000 n=20+20)
BM_UFlat/9      [txt4             ]             707µs ± 0%              703µs ± 0%  -0.53%        (p=0.000 n=20+20)
BM_UFlat/10     [pb               ]            39.1µs ± 0%             38.5µs ± 0%  -1.56%        (p=0.000 n=20+20)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%  -0.42%        (p=0.000 n=20+20)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.2µs ± 1%  -0.30%        (p=0.001 n=18+19)
BM_UFlat/13     [c                ]            7.29µs ± 0%             7.34µs ± 1%  +0.59%        (p=0.000 n=19+20)
BM_UFlat/14     [lsp              ]            2.28µs ± 0%             2.29µs ± 1%  +0.39%        (p=0.000 n=19+18)
BM_UFlat/15     [xls              ]             905µs ± 0%              904µs ± 0%  -0.12%        (p=0.030 n=20+20)
BM_UFlat/16     [xls_200          ]              213ns ± 2%              215ns ± 4%  +0.92%        (p=0.011 n=20+20)
BM_UFlat/17     [bin              ]             274µs ± 0%              275µs ± 0%  +0.55%        (p=0.000 n=20+20)
BM_UFlat/18     [bin_200          ]              101ns ± 1%              101ns ± 1%    ~           (p=0.913 n=18+18)
BM_UFlat/19     [sum              ]            27.9µs ± 1%             27.5µs ± 1%  -1.38%        (p=0.000 n=20+20)
BM_UFlat/20     [man              ]            2.97µs ± 1%             2.97µs ± 1%    ~           (p=0.835 n=20+19)
BM_UValidate/0  [html             ]            33.5µs ± 0%             34.2µs ± 0%  +2.32%        (p=0.000 n=20+20)
BM_UValidate/1  [urls             ]             441µs ± 0%              442µs ± 0%  +0.15%        (p=0.010 n=20+20)
BM_UValidate/2  [jpg              ]              144ns ± 0%              146ns ± 0%  +1.32%        (p=0.000 n=20+20)
BM_UValidate/3  [jpg_200          ]             95.3ns ± 0%             96.0ns ± 0%  +0.68%        (p=0.000 n=20+20)
BM_UValidate/4  [pdf              ]            2.86µs ± 0%             2.88µs ± 1%  +0.67%        (p=0.000 n=19+19)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%  -0.25%        (p=0.000 n=20+20)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~           (p=0.068 n=20+20)
BM_UIOVec/2     [jpg              ]            7.63µs ± 7%             7.76µs ±11%    ~           (p=0.396 n=19+20)
BM_UIOVec/3     [jpg_200          ]              325ns ± 0%              326ns ± 0%  +0.27%        (p=0.000 n=20+18)
BM_UIOVec/4     [pdf              ]            12.1µs ± 2%             12.1µs ± 3%    ~           (p=0.967 n=19+20)
BM_UFlatSink/0  [html             ]            42.4µs ± 0%             42.1µs ± 0%  -0.89%        (p=0.000 n=20+20)
BM_UFlatSink/1  [urls             ]             575µs ± 0%              575µs ± 0%    ~           (p=0.883 n=20+20)
BM_UFlatSink/2  [jpg              ]            7.58µs ±16%             7.52µs ±15%    ~           (p=0.945 n=19+20)
BM_UFlatSink/3  [jpg_200          ]              133ns ± 4%              133ns ± 4%    ~           (p=0.627 n=19+20)
BM_UFlatSink/4  [pdf              ]            8.29µs ± 4%             8.39µs ± 4%  +1.14%        (p=0.013 n=19+18)
BM_UFlatSink/5  [html4            ]             223µs ± 0%              222µs ± 0%  -0.18%        (p=0.001 n=20+20)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              191µs ± 0%  -0.71%        (p=0.000 n=20+20)
BM_UFlatSink/7  [txt2             ]             169µs ± 0%              169µs ± 0%  -0.26%        (p=0.000 n=20+20)
BM_UFlatSink/8  [txt3             ]             510µs ± 0%              508µs ± 0%  -0.50%        (p=0.000 n=20+20)
BM_UFlatSink/9  [txt4             ]             707µs ± 0%              704µs ± 0%  -0.44%        (p=0.000 n=20+20)
BM_UFlatSink/10 [pb               ]            39.1µs ± 0%             38.5µs ± 1%  -1.62%        (p=0.000 n=19+20)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              189µs ± 0%  -0.39%        (p=0.000 n=20+20)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.2µs ± 1%    ~           (p=0.435 n=19+19)
BM_UFlatSink/13 [c                ]            7.29µs ± 0%             7.33µs ± 1%  +0.57%        (p=0.000 n=19+20)
BM_UFlatSink/14 [lsp              ]            2.29µs ± 0%             2.29µs ± 1%    ~           (p=0.791 n=18+18)
BM_UFlatSink/15 [xls              ]             903µs ± 0%              902µs ± 0%  -0.11%        (p=0.044 n=20+19)
BM_UFlatSink/16 [xls_200          ]              215ns ± 1%              215ns ± 1%    ~           (p=0.885 n=19+19)
BM_UFlatSink/17 [bin              ]             274µs ± 0%              275µs ± 0%  +0.51%        (p=0.000 n=20+20)
BM_UFlatSink/18 [bin_200          ]              103ns ± 2%              103ns ± 0%  -0.41%        (p=0.016 n=20+15)
BM_UFlatSink/19 [sum              ]            27.9µs ± 1%             27.5µs ± 1%  -1.34%        (p=0.000 n=20+19)
BM_UFlatSink/20 [man              ]            2.98µs ± 1%             2.97µs ± 1%    ~           (p=0.358 n=18+19)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%  +0.14%        (p=0.011 n=20+20)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%  +0.11%        (p=0.043 n=20+20)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.5µs ± 6%             11.7µs ± 7%    ~           (p=0.142 n=20+20)
BM_ZFlat/3      [jpg_200 (73.00 %)]              349ns ± 3%              351ns ± 3%    ~           (p=0.573 n=18+20)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.6µs ± 2%             14.7µs ± 4%    ~           (p=0.879 n=19+20)
BM_ZFlat/5      [html4 (22.52 %)  ]             553µs ± 0%              552µs ± 0%  -0.23%        (p=0.000 n=20+20)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%    ~           (p=0.221 n=20+20)
BM_ZFlat/7      [txt2 (61.91 %)   ]             479µs ± 0%              481µs ± 1%  +0.47%        (p=0.000 n=20+20)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%  +0.13%        (p=0.040 n=20+20)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%  +0.16%        (p=0.009 n=20+20)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 1%              109µs ± 1%  -0.79%        (p=0.000 n=20+20)
BM_ZFlat/11     [gaviota (37.72 %)]             410µs ± 0%              410µs ± 0%    ~           (p=0.149 n=20+19)
BM_ZFlat/12     [cp (48.12 %)     ]            45.4µs ± 1%             44.9µs ± 1%  -1.23%        (p=0.000 n=20+20)
BM_ZFlat/13     [c (42.47 %)      ]            17.5µs ± 0%             17.5µs ± 1%    ~           (p=0.883 n=20+20)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.51µs ± 1%             5.46µs ± 1%  -0.95%        (p=0.000 n=20+18)
BM_ZFlat/15     [xls (41.23 %)    ]             1.61ms ± 0%             1.62ms ± 0%    ~           (p=0.183 n=20+20)
BM_ZFlat/16     [xls_200 (78.00 %)]              389ns ± 2%              391ns ± 3%    ~           (p=0.740 n=18+20)
BM_ZFlat/17     [bin (18.11 %)    ]             508µs ± 0%              508µs ± 0%    ~           (p=0.779 n=20+20)
BM_ZFlat/18     [bin_200 (7.50 %) ]             87.4ns ± 5%             88.1ns ± 8%    ~           (p=0.367 n=16+19)
BM_ZFlat/19     [sum (48.96 %)    ]            79.1µs ± 0%             80.2µs ± 0%  +1.39%        (p=0.000 n=20+20)
BM_ZFlat/20     [man (59.21 %)    ]            7.55µs ± 1%             7.57µs ± 1%  +0.31%        (p=0.025 n=19+19)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.42GB/s ± 0%           2.44GB/s ± 0%  +0.77%        (p=0.000 n=19+19)
BM_UFlat/1      [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.06%        (p=0.000 n=20+19)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 2%           17.2GB/s ± 4%    ~           (p=0.433 n=17+19)
BM_UFlat/3      [jpg_200          ]           1.56GB/s ± 0%           1.54GB/s ± 0%  -0.82%        (p=0.000 n=20+20)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%    ~           (p=0.322 n=17+17)
BM_UFlat/5      [html4            ]           1.85GB/s ± 0%           1.85GB/s ± 0%  +0.16%        (p=0.000 n=20+20)
BM_UFlat/6      [txt1             ]            794MB/s ± 0%            800MB/s ± 0%  +0.68%        (p=0.000 n=18+20)
BM_UFlat/7      [txt2             ]            741MB/s ± 0%            743MB/s ± 0%  +0.30%        (p=0.000 n=19+19)
BM_UFlat/8      [txt3             ]            840MB/s ± 0%            844MB/s ± 0%  +0.53%        (p=0.000 n=18+20)
BM_UFlat/9      [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.57%        (p=0.000 n=20+17)
BM_UFlat/10     [pb               ]           3.04GB/s ± 0%           3.09GB/s ± 0%  +1.60%        (p=0.000 n=19+20)
BM_UFlat/11     [gaviota          ]            977MB/s ± 0%            981MB/s ± 0%  +0.45%        (p=0.000 n=19+19)
BM_UFlat/12     [cp               ]           1.74GB/s ± 0%           1.74GB/s ± 0%  +0.29%        (p=0.000 n=20+19)
BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.52GB/s ± 1%  -0.56%        (p=0.000 n=19+20)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.63GB/s ± 1%  -0.38%        (p=0.000 n=19+20)
BM_UFlat/15     [xls              ]           1.14GB/s ± 0%           1.14GB/s ± 0%  +0.11%        (p=0.000 n=19+20)
BM_UFlat/16     [xls_200          ]            941MB/s ± 1%            931MB/s ± 4%  -1.02%        (p=0.001 n=19+20)
BM_UFlat/17     [bin              ]           1.88GB/s ± 0%           1.87GB/s ± 0%  -0.51%        (p=0.000 n=20+20)
BM_UFlat/18     [bin_200          ]           1.98GB/s ± 0%           1.98GB/s ± 1%    ~           (p=0.767 n=18+18)
BM_UFlat/19     [sum              ]           1.37GB/s ± 0%           1.39GB/s ± 0%  +1.46%        (p=0.000 n=20+20)
BM_UFlat/20     [man              ]           1.43GB/s ± 0%           1.43GB/s ± 0%    ~           (p=0.501 n=18+18)
BM_UValidate/0  [html             ]           3.07GB/s ± 0%           3.00GB/s ± 0%  -2.25%        (p=0.000 n=20+20)
BM_UValidate/1  [urls             ]           1.60GB/s ± 0%           1.59GB/s ± 0%  -0.11%        (p=0.000 n=18+19)
BM_UValidate/2  [jpg              ]            859GB/s ± 0%            848GB/s ± 0%  -1.29%        (p=0.000 n=20+19)
BM_UValidate/3  [jpg_200          ]           2.10GB/s ± 0%           2.09GB/s ± 0%  -0.68%        (p=0.000 n=19+20)
BM_UValidate/4  [pdf              ]           35.9GB/s ± 0%           35.6GB/s ± 1%  -0.71%        (p=0.000 n=20+20)
BM_UIOVec/0     [html             ]            843MB/s ± 0%            844MB/s ± 0%  +0.21%        (p=0.000 n=20+20)
BM_UIOVec/1     [urls             ]            651MB/s ± 0%            650MB/s ± 0%  -0.10%        (p=0.000 n=20+20)
BM_UIOVec/2     [jpg              ]           16.2GB/s ± 6%           16.0GB/s ±10%    ~           (p=0.380 n=19+20)
BM_UIOVec/3     [jpg_200          ]            617MB/s ± 0%            615MB/s ± 0%  -0.24%        (p=0.000 n=20+17)
BM_UIOVec/4     [pdf              ]           8.52GB/s ± 3%           8.50GB/s ± 3%    ~           (p=0.771 n=19+20)
BM_UFlatSink/0  [html             ]           2.42GB/s ± 0%           2.44GB/s ± 0%  +0.93%        (p=0.000 n=20+20)
BM_UFlatSink/1  [urls             ]           1.23GB/s ± 0%           1.23GB/s ± 0%  +0.04%        (p=0.006 n=20+20)
BM_UFlatSink/2  [jpg              ]           16.4GB/s ±14%           16.5GB/s ±13%    ~           (p=0.879 n=19+20)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 4%           1.51GB/s ± 4%    ~           (p=0.874 n=18+20)
BM_UFlatSink/4  [pdf              ]           12.4GB/s ± 4%           12.3GB/s ± 4%  -1.11%        (p=0.016 n=19+18)
BM_UFlatSink/5  [html4            ]           1.85GB/s ± 0%           1.85GB/s ± 0%  +0.20%        (p=0.000 n=20+20)
BM_UFlatSink/6  [txt1             ]            794MB/s ± 0%            799MB/s ± 0%  +0.72%        (p=0.000 n=19+20)
BM_UFlatSink/7  [txt2             ]            741MB/s ± 0%            743MB/s ± 0%  +0.30%        (p=0.000 n=18+20)
BM_UFlatSink/8  [txt3             ]            839MB/s ± 0%            843MB/s ± 0%  +0.52%        (p=0.000 n=20+18)
BM_UFlatSink/9  [txt4             ]            684MB/s ± 0%            687MB/s ± 0%  +0.46%        (p=0.000 n=20+20)
BM_UFlatSink/10 [pb               ]           3.04GB/s ± 0%           3.09GB/s ± 0%  +1.71%        (p=0.000 n=20+19)
BM_UFlatSink/11 [gaviota          ]            976MB/s ± 0%            980MB/s ± 0%  +0.45%        (p=0.000 n=20+20)
BM_UFlatSink/12 [cp               ]           1.74GB/s ± 1%           1.74GB/s ± 1%    ~           (p=0.904 n=20+20)
BM_UFlatSink/13 [c                ]           1.53GB/s ± 0%           1.53GB/s ± 1%  -0.50%        (p=0.000 n=19+20)
BM_UFlatSink/14 [lsp              ]           1.63GB/s ± 1%           1.63GB/s ± 1%    ~           (p=0.358 n=19+18)
BM_UFlatSink/15 [xls              ]           1.14GB/s ± 0%           1.15GB/s ± 0%  +0.12%        (p=0.000 n=20+20)
BM_UFlatSink/16 [xls_200          ]            931MB/s ± 1%            931MB/s ± 1%    ~           (p=0.686 n=19+19)
BM_UFlatSink/17 [bin              ]           1.88GB/s ± 0%           1.87GB/s ± 0%  -0.53%        (p=0.000 n=20+20)
BM_UFlatSink/18 [bin_200          ]           1.94GB/s ± 2%           1.95GB/s ± 1%  +0.42%        (p=0.014 n=20+15)
BM_UFlatSink/19 [sum              ]           1.37GB/s ± 0%           1.39GB/s ± 0%  +1.38%        (p=0.000 n=19+18)
BM_UFlatSink/20 [man              ]           1.42GB/s ± 1%           1.43GB/s ± 0%    ~           (p=0.284 n=18+19)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 0%            814MB/s ± 0%  -0.15%        (p=0.000 n=20+20)
BM_ZFlat/1      [urls (47.78 %)   ]            423MB/s ± 0%            422MB/s ± 0%  -0.14%        (p=0.000 n=20+20)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.8GB/s ± 5%           10.6GB/s ± 7%    ~           (p=0.142 n=20+20)
BM_ZFlat/3      [jpg_200 (73.00 %)]            574MB/s ± 2%            572MB/s ± 2%    ~           (p=0.613 n=18+20)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.01GB/s ± 2%           7.01GB/s ± 4%    ~           (p=0.593 n=18+20)
BM_ZFlat/5      [html4 (22.52 %)  ]            743MB/s ± 0%            745MB/s ± 0%  +0.25%        (p=0.000 n=20+19)
BM_ZFlat/6      [txt1 (57.88 %)   ]            283MB/s ± 0%            282MB/s ± 0%    ~           (p=0.261 n=18+19)
BM_ZFlat/7      [txt2 (61.91 %)   ]            262MB/s ± 0%            261MB/s ± 0%  -0.35%        (p=0.000 n=20+19)
BM_ZFlat/8      [txt3 (54.99 %)   ]            298MB/s ± 0%            297MB/s ± 0%  -0.11%        (p=0.000 n=20+19)
BM_ZFlat/9      [txt4 (66.26 %)   ]            245MB/s ± 0%            245MB/s ± 0%  -0.13%        (p=0.000 n=19+20)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 0%           1.09GB/s ± 0%  +0.82%        (p=0.000 n=18+19)
BM_ZFlat/11     [gaviota (37.72 %)]            451MB/s ± 0%            451MB/s ± 0%  -0.05%        (p=0.004 n=19+20)
BM_ZFlat/12     [cp (48.12 %)     ]            543MB/s ± 1%            550MB/s ± 1%  +1.24%        (p=0.000 n=20+20)
BM_ZFlat/13     [c (42.47 %)      ]            638MB/s ± 0%            637MB/s ± 0%    ~           (p=0.708 n=19+19)
BM_ZFlat/14     [lsp (48.37 %)    ]            678MB/s ± 2%            684MB/s ± 1%  +0.89%        (p=0.000 n=20+19)
BM_ZFlat/15     [xls (41.23 %)    ]            640MB/s ± 0%            640MB/s ± 0%  -0.10%        (p=0.000 n=19+19)
BM_ZFlat/16     [xls_200 (78.00 %)]            515MB/s ± 2%            514MB/s ± 3%    ~           (p=0.916 n=18+19)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.01GB/s ± 0%  +0.03%        (p=0.033 n=20+20)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.30GB/s ± 6%           2.28GB/s ± 9%    ~           (p=0.502 n=16+19)
BM_ZFlat/19     [sum (48.96 %)    ]            485MB/s ± 0%            478MB/s ± 0%  -1.39%        (p=0.000 n=19+20)
BM_ZFlat/20     [man (59.21 %)    ]            562MB/s ± 1%            560MB/s ± 1%  -0.37%        (p=0.016 n=18+19)
2019-01-08 13:48:30 -08:00
costan 4f0adca400 Wrap BMI2 instruction usage in support checks.
A previous version of this was submitted and rolled back due to breakage
-- an attempt to accommodate Visual Studio resulted in compiler errors
on GCC/Clang with -mavx2 but without -mbmi2. This version makes the BMI2
support check more strict, to avoid the errors.

A previous CL introduced _bzhi_u32 (part of Intel's BMI2 instruction
set, introduced with Haswell) gated by a check for the __BMI2__
preprocessor macro. This works for Clang and GCC, but does not work on
Visual Studio, and may not work on other compilers.

This CL plumbs the BMI2 support checks through the CMake configuration
used by the open-source build. It also replaces the <x86intrin.h>
header, which does not exist on Visual Studio, with the more narrowly
scoped headers <tmmintrin.h> (for SSSE3) and <immintrin.h> (for BMI2/AVX2).
Aside from fixing the open-source build, the narrower headers make
it slightly less likely that newer intrinsics will creep in without
proper gating.
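A minimal sketch of the gating pattern described above. It relies only on the compiler-defined `__BMI2__` macro. The CMake-provided check that this CL plumbs through is not shown, and the helper name `ClearHighBits()` is hypothetical. Because Visual Studio does not define `__BMI2__`, this version always takes the portable fallback there, which is the gap a build-system check closes.

```cpp
#include <cstdint>

#if defined(__BMI2__)
#include <immintrin.h>  // _bzhi_u32 (BMI2)
#endif

// Hypothetical helper: keeps the low `n` bits of `x` and clears the rest.
// This sketch assumes n <= 32; for n == 32 both paths return x unchanged.
inline uint32_t ClearHighBits(uint32_t x, uint32_t n) {
#if defined(__BMI2__)
  return _bzhi_u32(x, n);  // single BZHI instruction on BMI2-capable CPUs
#else
  // Portable fallback; special-case n == 32 to avoid the undefined 1u << 32.
  return n >= 32 ? x : x & ((1u << n) - 1u);
#endif
}
```

Including `<immintrin.h>` only inside the guarded block mirrors the narrower-header approach. An intrinsic used outside such a guard fails to compile on toolchains that lack it instead of slipping in unnoticed.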
2019-01-08 06:44:11 -08:00
nafi 46768e335d Optimize decompression by about 0.82%.
Assembly difference: https://godbolt.org/z/cvlH9b

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            42.3µs ± 0%             42.5µs ± 0%   +0.57%          (p=0.008 n=5+5)
BM_UFlat/1      [urls             ]             590µs ± 0%              575µs ± 0%   -2.60%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            7.16µs ± 1%             7.15µs ± 1%     ~             (p=0.841 n=5+5)
BM_UFlat/3      [jpg_200          ]              131ns ± 0%              129ns ± 0%   -1.41%          (p=0.008 n=5+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.22µs ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/5      [html4            ]             222µs ± 0%              223µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlat/6      [txt1             ]             193µs ± 0%              192µs ± 0%     ~             (p=0.095 n=5+5)
BM_UFlat/7      [txt2             ]             171µs ± 0%              169µs ± 0%   -0.83%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             511µs ± 0%              510µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlat/9      [txt4             ]             717µs ± 0%              707µs ± 0%   -1.42%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            38.8µs ± 0%             39.3µs ± 0%   +1.26%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             190µs ± 0%              189µs ± 0%   -0.43%          (p=0.032 n=5+5)
BM_UFlat/12     [cp               ]            14.3µs ± 0%             14.2µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]            7.35µs ± 1%             7.30µs ± 0%   -0.66%          (p=0.032 n=5+5)
BM_UFlat/14     [lsp              ]            2.30µs ± 1%             2.28µs ± 0%     ~             (p=0.056 n=5+5)
BM_UFlat/15     [xls              ]             983µs ± 0%              904µs ± 0%   -7.99%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 0%              213ns ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/17     [bin              ]             278µs ± 0%              274µs ± 0%   -1.56%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 0%              101ns ± 1%     ~             (p=1.000 n=5+5)
BM_UFlat/19     [sum              ]            29.4µs ± 1%             28.0µs ± 1%   -4.98%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            2.97µs ± 0%             2.97µs ± 0%     ~             (p=0.421 n=5+5)
BM_UValidate/0  [html             ]            33.6µs ± 0%             33.6µs ± 0%     ~             (p=0.548 n=5+5)
BM_UValidate/1  [urls             ]             443µs ± 0%              441µs ± 0%   -0.43%          (p=0.016 n=4+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              144ns ± 0%   -1.63%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]             98.6ns ± 0%             95.3ns ± 0%   -3.32%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]            2.89µs ± 1%             2.85µs ± 0%   -1.22%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%     ~             (p=1.000 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%     ~             (p=0.095 n=5+5)
BM_UIOVec/2     [jpg              ]            7.51µs ± 4%             7.69µs ± 6%     ~             (p=0.421 n=5+5)
BM_UIOVec/3     [jpg_200          ]              327ns ± 0%              327ns ± 1%     ~             (p=0.730 n=4+5)
BM_UIOVec/4     [pdf              ]            12.0µs ± 1%             12.0µs ± 0%     ~             (p=0.286 n=5+4)
BM_UFlatSink/0  [html             ]            42.3µs ± 0%             42.5µs ± 0%   +0.46%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]             589µs ± 0%              575µs ± 0%   -2.36%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.40µs ± 8%             7.74µs ± 9%     ~             (p=0.310 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              134ns ± 0%              131ns ± 0%   -1.78%          (p=0.008 n=5+5)
BM_UFlatSink/4  [pdf              ]            8.28µs ± 3%             8.35µs ± 6%     ~             (p=0.548 n=5+5)
BM_UFlatSink/5  [html4            ]             222µs ± 0%              222µs ± 0%     ~             (p=0.690 n=5+5)
BM_UFlatSink/6  [txt1             ]             193µs ± 0%              192µs ± 0%     ~             (p=0.222 n=5+5)
BM_UFlatSink/7  [txt2             ]             171µs ± 0%              169µs ± 0%   -0.91%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]             512µs ± 0%              510µs ± 0%   -0.28%          (p=0.032 n=5+5)
BM_UFlatSink/9  [txt4             ]             717µs ± 0%              707µs ± 0%   -1.32%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]            38.7µs ± 0%             39.2µs ± 0%   +1.29%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]             190µs ± 0%              189µs ± 0%   -0.47%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]            14.3µs ± 0%             14.2µs ± 0%   -0.65%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]            7.36µs ± 1%             7.29µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.30µs ± 1%             2.29µs ± 0%     ~             (p=0.841 n=5+5)
BM_UFlatSink/15 [xls              ]             980µs ± 0%              903µs ± 0%   -7.92%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]              217ns ± 0%              215ns ± 0%   -0.94%          (p=0.008 n=5+5)
BM_UFlatSink/17 [bin              ]             278µs ± 0%              273µs ± 0%   -1.56%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]              107ns ± 5%              104ns ± 0%     ~             (p=0.056 n=5+5)
BM_UFlatSink/19 [sum              ]            29.5µs ± 0%             27.9µs ± 0%   -5.32%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]            3.01µs ± 0%             3.00µs ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             127µs ± 0%              126µs ± 0%   -0.46%          (p=0.008 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.5µs ± 3%             11.6µs ± 6%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              350ns ± 2%              347ns ± 0%     ~             (p=0.905 n=5+4)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.6µs ± 4%             14.6µs ± 1%     ~             (p=0.421 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             553µs ± 0%              553µs ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%     ~             (p=1.000 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             481µs ± 0%              479µs ± 0%   -0.54%          (p=0.008 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              110µs ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             411µs ± 0%              410µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.1µs ± 1%             45.8µs ± 0%     ~             (p=0.056 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.6µs ± 0%             17.6µs ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.46µs ± 1%             5.49µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.62ms ± 0%             1.61ms ± 0%     ~             (p=0.190 n=4+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              392ns ± 2%              385ns ± 1%     ~             (p=0.200 n=4+4)
BM_ZFlat/17     [bin (18.11 %)    ]             509µs ± 0%              508µs ± 0%   -0.26%          (p=0.008 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             90.2ns ±15%             80.8ns ± 0%  -10.39%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            81.1µs ± 0%             79.1µs ± 1%   -2.37%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.61µs ± 1%             7.57µs ± 1%     ~             (p=0.421 n=5+5)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%     ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%     ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%     ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%     ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%     ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%     ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%     ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%     ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%     ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%     ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%     ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%     ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%     ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%     ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%     ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%     ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%     ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%     ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%     ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%     ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%     ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%     ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%     ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              57.0k ± 0%              57.0k ± 0%     ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              30.6k ± 0%              30.6k ± 0%     ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%     ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              30.7k ± 0%              30.7k ± 0%     ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%     ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              30.6k ± 0%              30.6k ± 0%     ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.43GB/s ± 0%           2.41GB/s ± 0%   -0.59%          (p=0.032 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 1%           1.22GB/s ± 0%   +2.58%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]           17.2GB/s ± 1%           17.3GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.54GB/s ± 1%           1.56GB/s ± 1%   +1.23%          (p=0.008 n=5+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%     ~             (p=0.413 n=5+4)
BM_UFlat/5      [html4            ]           1.85GB/s ± 1%           1.85GB/s ± 0%     ~             (p=0.690 n=5+5)
BM_UFlat/6      [txt1             ]            793MB/s ± 0%            794MB/s ± 0%     ~             (p=0.690 n=5+5)
BM_UFlat/7      [txt2             ]            738MB/s ± 0%            742MB/s ± 1%     ~             (p=0.151 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            838MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UFlat/9      [txt4             ]            674MB/s ± 0%            684MB/s ± 0%   +1.55%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]           3.07GB/s ± 1%           3.03GB/s ± 1%   -1.27%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]            974MB/s ± 0%            978MB/s ± 0%   +0.50%          (p=0.032 n=5+5)
BM_UFlat/12     [cp               ]           1.72GB/s ± 0%           1.74GB/s ± 1%   +0.79%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]           1.52GB/s ± 1%           1.53GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/14     [lsp              ]           1.62GB/s ± 1%           1.64GB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlat/15     [xls              ]           1.05GB/s ± 0%           1.14GB/s ± 1%   +8.60%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]            942MB/s ± 0%            941MB/s ± 1%     ~             (p=0.690 n=5+5)
BM_UFlat/17     [bin              ]           1.85GB/s ± 0%           1.88GB/s ± 0%   +1.60%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]           1.99GB/s ± 0%           1.99GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UFlat/19     [sum              ]           1.30GB/s ± 1%           1.37GB/s ± 1%   +5.28%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]           1.43GB/s ± 1%           1.42GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UValidate/0  [html             ]           3.07GB/s ± 0%           3.05GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.60GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            860GB/s ± 0%   +1.75%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 1%           2.11GB/s ± 1%   +3.61%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]           35.6GB/s ± 1%           36.1GB/s ± 1%   +1.40%          (p=0.016 n=5+5)
BM_UIOVec/0     [html             ]            845MB/s ± 1%            843MB/s ± 1%     ~             (p=0.310 n=5+5)
BM_UIOVec/1     [urls             ]            653MB/s ± 0%            651MB/s ± 1%     ~             (p=0.190 n=4+5)
BM_UIOVec/2     [jpg              ]           16.4GB/s ± 4%           16.1GB/s ± 5%     ~             (p=0.548 n=5+5)
BM_UIOVec/3     [jpg_200          ]            611MB/s ± 2%            614MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_UIOVec/4     [pdf              ]           8.53GB/s ± 1%           8.52GB/s ± 3%     ~             (p=0.841 n=5+5)
BM_UFlatSink/0  [html             ]           2.43GB/s ± 1%           2.42GB/s ± 0%     ~             (p=0.222 n=5+5)
BM_UFlatSink/1  [urls             ]           1.20GB/s ± 0%           1.23GB/s ± 1%   +2.38%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]           16.7GB/s ± 8%           16.0GB/s ± 8%     ~             (p=0.151 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.50GB/s ± 0%           1.53GB/s ± 0%   +2.13%          (p=0.008 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.5GB/s ± 0%           12.3GB/s ± 5%     ~             (p=0.730 n=4+5)
BM_UFlatSink/5  [html4            ]           1.85GB/s ± 0%           1.84GB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlatSink/6  [txt1             ]            791MB/s ± 0%            791MB/s ± 0%     ~             (p=1.000 n=5+5)
BM_UFlatSink/7  [txt2             ]            735MB/s ± 0%            739MB/s ± 0%   +0.51%          (p=0.016 n=5+4)
BM_UFlatSink/8  [txt3             ]            838MB/s ± 0%            840MB/s ± 0%     ~             (p=0.151 n=5+5)
BM_UFlatSink/9  [txt4             ]            674MB/s ± 0%            683MB/s ± 0%   +1.37%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]           3.07GB/s ± 0%           3.03GB/s ± 1%   -1.34%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]            973MB/s ± 0%            975MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_UFlatSink/12 [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%     ~             (p=0.056 n=5+5)
BM_UFlatSink/13 [c                ]           1.52GB/s ± 1%           1.53GB/s ± 1%   +0.76%          (p=0.032 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.62GB/s ± 0%           1.63GB/s ± 0%     ~             (p=0.548 n=5+5)
BM_UFlatSink/15 [xls              ]           1.05GB/s ± 0%           1.14GB/s ± 0%   +8.57%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]            925MB/s ± 0%            933MB/s ± 0%   +0.85%          (p=0.008 n=5+5)
BM_UFlatSink/17 [bin              ]           1.85GB/s ± 1%           1.88GB/s ± 0%   +1.47%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.88GB/s ± 5%           1.93GB/s ± 0%     ~             (p=0.421 n=5+5)
BM_UFlatSink/19 [sum              ]           1.30GB/s ± 1%           1.37GB/s ± 1%   +5.18%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]           1.41GB/s ± 0%           1.41GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            809MB/s ± 0%            814MB/s ± 1%   +0.61%          (p=0.016 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            423MB/s ± 0%            422MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.8GB/s ± 3%           10.6GB/s ± 5%     ~             (p=0.690 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            575MB/s ± 2%            579MB/s ± 0%     ~             (p=1.000 n=5+4)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.06GB/s ± 4%           7.05GB/s ± 2%     ~             (p=0.421 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            744MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            282MB/s ± 1%     ~             (p=1.000 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            261MB/s ± 0%            263MB/s ± 0%   +0.55%          (p=0.032 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 1%            297MB/s ± 0%     ~             (p=1.000 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            245MB/s ± 0%            246MB/s ± 0%     ~             (p=0.286 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 1%           1.08GB/s ± 0%     ~             (p=0.056 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            450MB/s ± 0%            452MB/s ± 0%   +0.55%          (p=0.016 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            537MB/s ± 1%            538MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            637MB/s ± 1%            634MB/s ± 1%     ~             (p=0.222 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            684MB/s ± 1%            680MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            641MB/s ± 0%            640MB/s ± 1%     ~             (p=0.310 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            501MB/s ± 9%            521MB/s ± 1%     ~             (p=0.111 n=5+4)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.02GB/s ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.24GB/s ±14%           2.48GB/s ± 0%     ~             (p=0.063 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            473MB/s ± 1%            485MB/s ± 1%   +2.47%          (p=0.008 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            558MB/s ± 1%            558MB/s ± 1%     ~             (p=1.000 n=5+5)
2019-01-08 06:35:12 -08:00
costan fdba21ffd6 Fix typo in two argument names in stubs.
The stubs are only used in the open source version, so it wasn't caught
in internal tests.
2019-01-06 13:49:33 -08:00
costan 81d444e4e4 Remove direct use of __builtin_clz.
A previous CL introduced __builtin_clz in zippy.cc. This is a GCC / Clang
intrinsic, and is not supported in Visual Studio. The rest of the
project uses bit manipulation intrinsics via the functions in Bits::,
which are stubbed out for the open source build in
zippy-stubs-internal.h.

This CL extracts Bits::Log2FloorNonZero() out of Bits::Log2Floor() in
the stubbed version of Bits, adds assertions to the Bits::*NonZero()
functions in the stubs, and converts __builtin_clz to a
Bits::Log2FloorNonZero() call.
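As a rough illustration of the split described above, a portable stub could define Log2FloorNonZero() with an assert() precondition and build Log2Floor() on top of it. This is a sketch under assumptions: the namespace and the loop-based body are illustrative choices, not the actual zippy-stubs-internal.h code, which may use compiler intrinsics where they are available.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-ins for the Bits:: helpers named above.
namespace Bits {

// Returns floor(log2(n)). Precondition: n != 0, checked with assert() as the
// commit describes for the *NonZero() stubs.
inline int Log2FloorNonZero(uint32_t n) {
  assert(n != 0);
  int log = 0;
  // Each shift that leaves a nonzero value means one more bit position
  // below the most significant set bit.
  while (n >>= 1) ++log;
  return log;
}

// Returns floor(log2(n)), or -1 when n == 0, so only this wrapper needs the
// zero check.
inline int Log2Floor(uint32_t n) {
  return n == 0 ? -1 : Log2FloorNonZero(n);
}

}  // namespace Bits
```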

The latter part is not obvious. A mathematical proof of correctness is
outlined in added comments. An empirical proof is available at
https://godbolt.org/z/mPKWmh -- CalculateTableSizeOld(), which is the
current code, compiles to the same assembly on Clang as
CalculateTableSizeNew1(), which is the bigger jump in the proof.
CalculateTableSizeNew2() is a fairly obvious transformation from
CalculateTableSizeNew1(), and results in slightly better assembly on all
supported compilers.
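The core identity fits in a few lines. For n >= 2, let k = floor(log2(n - 1)). Then 2^k <= n - 1 < 2^(k+1), so 2^k < n <= 2^(k+1), which makes 2 << k the smallest power of two that is at least n. For a nonzero 32-bit x, 31 - __builtin_clz(x) also equals floor(log2(x)), which is why a clz-based expression can be rewritten in terms of Log2FloorNonZero(). The sketch checks that rounding against a brute-force loop; the shape of CalculateTableSize() and the kMinTableSize / kMaxTableSize bounds are assumptions for illustration, not values taken from this commit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed clamping bounds, for illustration only.
constexpr uint32_t kMinTableSize = 1u << 8;
constexpr uint32_t kMaxTableSize = 1u << 14;

// Portable floor(log2(n)). Precondition: n != 0.
inline int Log2FloorNonZero(uint32_t n) {
  assert(n != 0);
  int log = 0;
  while (n >>= 1) ++log;
  return log;
}

// Reference implementation: smallest power of two >= n, by repeated doubling.
inline uint32_t NextPowerOfTwoSlow(uint32_t n) {
  uint32_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Hypothetical table sizing: clamp to [kMinTableSize, kMaxTableSize],
// otherwise round up to a power of two. After clamping, input_size >= 2, so
// input_size - 1 is nonzero and 2u << Log2FloorNonZero(input_size - 1) is the
// smallest power of two >= input_size.
inline uint32_t CalculateTableSize(uint32_t input_size) {
  if (input_size > kMaxTableSize) return kMaxTableSize;
  if (input_size < kMinTableSize) return kMinTableSize;
  return 2u << Log2FloorNonZero(input_size - 1);
}
```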

Two benchmark runs with the same arguments as the original CL only
showed differences in completely disjoint tests, suggesting that the
differences are pure noise.
2019-01-06 12:49:08 -08:00
costan 9a6fa91217 Remove use of std::uniform_distribution<uint8_t>.
A previous CL removed use of Google-specific random number generating
functionality, such as ACMRandom, and used the C++11 standard library
instead. The CL used std::uniform_int_distribution<uint8_t> to generate
random bytes, which the standard does not permit [1, 2].

For better or for worse, our toolchain does not complain. However,
Visual Studio errors out with "invalid template argument for
uniform_int_distribution: N4659 29.6.1.1 [rand.req.genl]/1e requires one
of short, int, long, long long, unsigned short, unsigned int, unsigned
long, or unsigned long long".

This CL replaces std::uniform_int_distribution<uint8_t> with
std::uniform_int_distribution<int>(0, 255) and appropriate static_cast<>s.
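A minimal sketch of the replacement pattern, assuming a std::mt19937 engine and a hypothetical RandomBytes() helper (neither name comes from the commit): draw ints from std::uniform_int_distribution<int>(0, 255), then narrow each one with static_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical test helper: returns `size` random bytes drawn from `rng`.
// std::uniform_int_distribution<uint8_t> is undefined behavior because
// uint8_t is not one of the permitted IntType arguments, so draw ints in
// [0, 255] and narrow each one explicitly.
std::vector<uint8_t> RandomBytes(std::size_t size, std::mt19937& rng) {
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::vector<uint8_t> bytes(size);
  for (uint8_t& b : bytes) {
    b = static_cast<uint8_t>(byte_dist(rng));
  }
  return bytes;
}
```

Two engines seeded identically yield identical byte sequences within one build, which keeps such tests reproducible.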

[1] http://eel.is/c++draft/rand.req.genl#1.6
[2] be83c0b472/source/numerics.tex (L1807-L1817)
2019-01-06 12:48:39 -08:00
costan 3fcbc47f99 Use std random number generators in tests.
An earlier CL introduced absl::Uniform, which is not yet open sourced,
and therefore unavailable in the open source build.

This CL removes absl::Uniform and ACMRandom in favor of equivalent C++11
standard random generators. Abseil promises to be faster than the
standard library, but we can afford a speed hit in tests in return for
an easier open sourcing story.
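For test helpers that previously relied on a custom RNG class, an equivalent built only from C++11 <random> might look like the following. OneIn() is a hypothetical name and the engine choice is an assumption; this is not the code the CL actually added.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: returns true with probability 1/n, using only C++11
// <random>. Precondition: n >= 1.
inline bool OneIn(std::mt19937& rng, int n) {
  assert(n >= 1);
  std::uniform_int_distribution<int> dist(0, n - 1);
  // Each of the n values is equally likely; exactly one counts as a hit.
  return dist(rng) == 0;
}
```

For example, `if (OneIn(rng, 10)) { ... }` would exercise a rare path in roughly a tenth of iterations. One design caveat: std::mt19937's output sequence is fully specified by the standard, but the way std::uniform_int_distribution maps that sequence onto a range is left to the implementation, so a seeded test is reproducible with a given standard library rather than across libstdc++, libc++, and MSVC.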
2019-01-04 19:09:39 -08:00
costan 925c3094c4 Convert DCHECK to assert.
The open source build does not support DCHECK, and this project uses
assert() instead of DCHECK.
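The conversion is mechanical. In this sketch, ByteAt() is a hypothetical helper, and the DCHECK_LT line in the comment only illustrates what an internal check might have looked like; the point is that both forms are checked in debug builds and compiled out when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper showing the pattern. Internal code might have written
//   DCHECK_LT(offset, size);
// whereas the open source build spells the same check with assert().
// Both are evaluated in debug builds and compiled out when NDEBUG is defined.
inline char ByteAt(const char* data, std::size_t size, std::size_t offset) {
  assert(data != nullptr);
  assert(offset < size);
  return data[offset];
}
```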
2019-01-04 19:09:15 -08:00
costan 02de4ff1d1 Update Travis CI configuration.
The Travis CI configuration updates reflect the following changes:
* Container-based builds (sudo: false) have been removed.
  https://changelog.travis-ci.com/the-container-based-build-environment-is-fully-deprecated-84517
* Ubuntu Xenial (16.04) is available as a base image.
  https://blog.travis-ci.com/2018-11-08-xenial-release
* Homebrew now has a dedicated DSL.
  https://docs.travis-ci.com/user/installing-dependencies/#installing-packages-on-os-x

To take full advantage of VM resources, CI builds now use Ninja
(https://ninja-build.org/) instead of Make.
2019-01-04 19:09:07 -08:00
atdt f7aece15e2 Add comment explaining MSan false-positive workaround 2019-01-04 19:09:01 -08:00
atdt 5913c5f8e4 Don't use _bzhi_u32 under MSan
MSan knows that x & 0xFF only uses the lowest byte of x, but it isn't as
smart about _bzhi_u32(x, 8). (I'll file an upstream bug.)
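The workaround is safe because the two expressions compute the same value. This portable model, assuming the documented BZHI semantics (copy the source, then clear every bit at or above the index; an index of 32 or more clears nothing), shows that _bzhi_u32(x, 8) equals x & 0xFF. BzhiModel() is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of BZHI on 32-bit operands: copy x, then clear every bit at
// position >= index. An index of 32 or more leaves x unchanged. (The hardware
// reads only the low 8 bits of the index; this model assumes index < 256.)
inline uint32_t BzhiModel(uint32_t x, uint32_t index) {
  if (index >= 32) return x;
  return x & ((uint32_t{1} << index) - 1);
}
```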
2019-01-04 19:08:53 -08:00
atdt 136b3ebc31 If BMI instructions are available, use BZHI to extract low bytes.
With --cpu=haswell, this results in significant speed improvements
(notably 12-14% for html and pb). On k8, performance is not affected (as
expected). Full benchmark results for --cpu={k8,haswell} are below.
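A sketch of the compile-time dispatch, assuming GCC or Clang, which define __BMI2__ when targeting hardware with the BMI2 extension (where BZHI lives), for example with -mbmi2 or -march=haswell. ExtractLowBytes() is a hypothetical name here. The fallback builds its mask in 64 bits so that n == 4 does not shift a 32-bit value by 32, which would be undefined behavior.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Hypothetical helper: keeps the low n bytes of v, for 0 <= n <= 4.
inline uint32_t ExtractLowBytes(uint32_t v, int n) {
  assert(n >= 0 && n <= 4);
#if defined(__BMI2__)
  // A single BZHI instruction; an index of 32 (n == 4) leaves v unchanged.
  return _bzhi_u32(v, static_cast<unsigned int>(8 * n));
#else
  // 64-bit mask so that shifting by 32 is well defined.
  uint64_t mask = 0xFFFFFFFFu;
  return v & static_cast<uint32_t>(~(mask << (8 * n)));
#endif
}
```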

Haswell
-------

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            55.2µs ± 0%             49.0µs ± 0%  -11.34%          (p=0.008 n=5+5)
BM_UFlat/1      [urls             ]             612µs ± 0%              604µs ± 0%   -1.21%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            6.11µs ± 2%             6.07µs ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]              134ns ± 0%              132ns ± 5%   -1.49%          (p=0.048 n=5+5)
BM_UFlat/4      [pdf              ]            8.41µs ± 2%             8.34µs ± 1%     ~             (p=0.222 n=5+5)
BM_UFlat/5      [html4            ]             239µs ± 0%              234µs ± 0%   -2.24%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]             211µs ± 0%              205µs ± 0%   -2.73%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]             185µs ± 0%              181µs ± 0%   -2.34%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             560µs ± 0%              545µs ± 0%   -2.55%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]             773µs ± 0%              753µs ± 0%   -2.61%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            51.6µs ± 0%             45.3µs ± 0%  -12.28%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             209µs ± 0%              204µs ± 0%   -2.28%          (p=0.008 n=5+5)
BM_UFlat/12     [cp               ]            17.3µs ± 0%             15.7µs ± 1%   -9.57%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]            8.08µs ± 0%             8.00µs ± 0%   -0.99%          (p=0.008 n=5+5)
BM_UFlat/14     [lsp              ]            2.48µs ± 0%             2.45µs ± 0%   -1.11%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]             967µs ± 0%              954µs ± 0%   -1.36%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              219ns ± 1%              218ns ± 1%     ~             (p=0.444 n=5+5)
BM_UFlat/17     [bin              ]             278µs ± 0%              275µs ± 0%   -0.92%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              100ns ± 0%               99ns ± 1%   -1.04%          (p=0.008 n=5+5)
BM_UFlat/19     [sum              ]            34.0µs ± 0%             30.9µs ± 0%   -9.10%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            3.21µs ± 0%             3.20µs ± 0%     ~             (p=0.063 n=5+5)
BM_UValidate/0  [html             ]            33.1µs ± 0%             33.6µs ± 0%   +1.69%          (p=0.008 n=5+5)
BM_UValidate/1  [urls             ]             436µs ± 0%              441µs ± 0%   +1.06%          (p=0.008 n=5+5)
BM_UValidate/2  [jpg              ]              141ns ± 0%              142ns ± 0%   +0.71%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]             94.3ns ± 0%             95.3ns ± 0%   +1.06%          (p=0.008 n=5+5)
BM_UValidate/4  [pdf              ]            2.87µs ± 0%             2.95µs ± 0%   +2.74%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]             126µs ± 0%              124µs ± 0%   -1.50%          (p=0.008 n=5+5)
BM_UIOVec/1     [urls             ]             1.13ms ± 0%             1.11ms ± 0%   -1.95%          (p=0.008 n=5+5)
BM_UIOVec/2     [jpg              ]            6.31µs ± 3%             7.44µs ± 3%  +17.75%          (p=0.008 n=5+5)
BM_UIOVec/3     [jpg_200          ]              332ns ± 1%              318ns ± 1%   -4.22%          (p=0.008 n=5+5)
BM_UIOVec/4     [pdf              ]            12.7µs ± 3%             12.6µs ± 9%     ~             (p=0.222 n=5+5)
BM_UFlatSink/0  [html             ]            55.2µs ± 0%             49.0µs ± 0%  -11.31%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]             612µs ± 0%              605µs ± 0%   -1.17%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]            6.29µs ±12%             6.57µs ± 9%     ~             (p=0.548 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              138ns ± 2%              134ns ± 0%   -2.76%          (p=0.000 n=5+4)
BM_UFlatSink/4  [pdf              ]            8.35µs ± 0%             8.34µs ± 1%     ~             (p=0.905 n=4+5)
BM_UFlatSink/5  [html4            ]             239µs ± 0%              234µs ± 0%   -2.33%          (p=0.008 n=5+5)
BM_UFlatSink/6  [txt1             ]             211µs ± 0%              205µs ± 0%   -2.82%          (p=0.008 n=5+5)
BM_UFlatSink/7  [txt2             ]             185µs ± 0%              181µs ± 0%   -2.18%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]             560µs ± 0%              545µs ± 0%   -2.57%          (p=0.008 n=5+5)
BM_UFlatSink/9  [txt4             ]             773µs ± 0%              754µs ± 0%   -2.54%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]            51.6µs ± 0%             45.3µs ± 0%  -12.19%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]             209µs ± 0%              204µs ± 0%   -2.39%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]            17.3µs ± 0%             15.6µs ± 0%   -9.98%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]            8.10µs ± 1%             7.98µs ± 0%   -1.53%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.49µs ± 1%             2.47µs ± 0%   -0.84%          (p=0.008 n=5+5)
BM_UFlatSink/15 [xls              ]             968µs ± 0%              953µs ± 0%   -1.48%          (p=0.008 n=5+5)
BM_UFlatSink/16 [xls_200          ]              220ns ± 1%              220ns ± 0%     ~             (p=1.000 n=5+4)
BM_UFlatSink/17 [bin              ]             278µs ± 0%              275µs ± 0%   -0.99%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]              102ns ± 1%              103ns ± 0%   +1.18%          (p=0.048 n=5+5)
BM_UFlatSink/19 [sum              ]            34.0µs ± 0%             30.9µs ± 0%   -9.21%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]            3.22µs ± 1%             3.20µs ± 0%   -0.76%          (p=0.032 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             122µs ± 0%              122µs ± 0%     ~             (p=0.413 n=4+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.60ms ± 0%             1.60ms ± 0%   -0.06%          (p=0.032 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            10.5µs ± 2%             10.7µs ± 9%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              310ns ± 1%              309ns ± 3%     ~             (p=0.349 n=4+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            13.5µs ± 1%             13.6µs ± 2%     ~             (p=0.595 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]             533µs ± 0%              532µs ± 0%   -0.08%          (p=0.032 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             529µs ± 0%              528µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             469µs ± 0%              469µs ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.40ms ± 0%             1.40ms ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.93ms ± 0%             1.92ms ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             106µs ± 0%              106µs ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]             404µs ± 0%              404µs ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            43.2µs ± 0%             43.3µs ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            16.4µs ± 1%             16.4µs ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            4.96µs ± 0%             4.96µs ± 1%     ~             (p=0.651 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]             1.54ms ± 0%             1.54ms ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              352ns ± 2%              351ns ± 1%     ~             (p=0.762 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             491µs ± 0%              491µs ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             75.6ns ± 1%             77.2ns ± 0%   +2.06%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            76.9µs ± 0%             76.7µs ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            6.87µs ± 1%             6.81µs ± 0%   -0.87%          (p=0.008 n=5+5)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           1.85GB/s ± 0%           2.09GB/s ± 0%  +12.83%          (p=0.016 n=4+5)
BM_UFlat/1      [urls             ]           1.15GB/s ± 0%           1.16GB/s ± 0%   +1.25%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]           20.1GB/s ± 2%           20.3GB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/3      [jpg_200          ]           1.49GB/s ± 0%           1.53GB/s ± 0%   +2.83%          (p=0.016 n=5+4)
BM_UFlat/4      [pdf              ]           12.2GB/s ± 2%           12.3GB/s ± 1%     ~             (p=0.222 n=5+5)
BM_UFlat/5      [html4            ]           1.71GB/s ± 0%           1.75GB/s ± 0%   +2.29%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]            722MB/s ± 0%            742MB/s ± 0%   +2.81%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]            676MB/s ± 0%            692MB/s ± 0%   +2.40%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]            762MB/s ± 0%            782MB/s ± 0%   +2.62%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]            623MB/s ± 0%            640MB/s ± 0%   +2.68%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]           2.30GB/s ± 0%           2.62GB/s ± 0%  +13.99%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]            883MB/s ± 0%            903MB/s ± 0%   +2.33%          (p=0.008 n=5+5)
BM_UFlat/12     [cp               ]           1.42GB/s ± 0%           1.57GB/s ± 1%  +10.57%          (p=0.008 n=5+5)
BM_UFlat/13     [c                ]           1.38GB/s ± 0%           1.39GB/s ± 0%   +1.00%          (p=0.008 n=5+5)
BM_UFlat/14     [lsp              ]           1.50GB/s ± 0%           1.52GB/s ± 0%   +1.12%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]           1.06GB/s ± 0%           1.08GB/s ± 0%   +1.34%          (p=0.016 n=5+4)
BM_UFlat/16     [xls_200          ]            913MB/s ± 1%            918MB/s ± 1%     ~             (p=0.421 n=5+5)
BM_UFlat/17     [bin              ]           1.85GB/s ± 0%           1.86GB/s ± 0%   +0.92%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]           2.01GB/s ± 0%           2.03GB/s ± 1%   +1.10%          (p=0.008 n=5+5)
BM_UFlat/19     [sum              ]           1.13GB/s ± 0%           1.24GB/s ± 0%   +9.99%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]           1.32GB/s ± 0%           1.32GB/s ± 1%     ~             (p=0.063 n=5+5)
BM_UValidate/0  [html             ]           3.10GB/s ± 0%           3.04GB/s ± 0%   -1.66%          (p=0.008 n=5+5)
BM_UValidate/1  [urls             ]           1.61GB/s ± 0%           1.59GB/s ± 0%   -1.04%          (p=0.008 n=5+5)
BM_UValidate/2  [jpg              ]            875GB/s ± 0%            866GB/s ± 0%   -1.11%          (p=0.008 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.12GB/s ± 0%           2.10GB/s ± 0%   -1.01%          (p=0.016 n=5+4)
BM_UValidate/4  [pdf              ]           35.7GB/s ± 0%           34.7GB/s ± 0%   -2.66%          (p=0.008 n=5+5)
BM_UIOVec/0     [html             ]            813MB/s ± 0%            825MB/s ± 0%   +1.52%          (p=0.008 n=5+5)
BM_UIOVec/1     [urls             ]            622MB/s ± 0%            634MB/s ± 0%   +1.99%          (p=0.008 n=5+5)
BM_UIOVec/2     [jpg              ]           19.5GB/s ± 3%           16.6GB/s ± 3%  -15.08%          (p=0.008 n=5+5)
BM_UIOVec/3     [jpg_200          ]            603MB/s ± 1%            630MB/s ± 1%   +4.42%          (p=0.008 n=5+5)
BM_UIOVec/4     [pdf              ]           8.05GB/s ± 3%           8.12GB/s ± 8%     ~             (p=0.222 n=5+5)
BM_UFlatSink/0  [html             ]           1.85GB/s ± 0%           2.09GB/s ± 0%  +12.76%          (p=0.008 n=5+5)
BM_UFlatSink/1  [urls             ]           1.15GB/s ± 0%           1.16GB/s ± 0%   +1.18%          (p=0.008 n=5+5)
BM_UFlatSink/2  [jpg              ]           19.6GB/s ±11%           18.8GB/s ± 9%     ~             (p=0.548 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.45GB/s ± 1%           1.49GB/s ± 0%   +2.82%          (p=0.016 n=5+4)
BM_UFlatSink/4  [pdf              ]           12.3GB/s ± 0%           12.3GB/s ± 1%     ~             (p=0.905 n=4+5)
BM_UFlatSink/5  [html4            ]           1.71GB/s ± 0%           1.75GB/s ± 0%   +2.41%          (p=0.008 n=5+5)
BM_UFlatSink/6  [txt1             ]            722MB/s ± 0%            743MB/s ± 0%   +2.90%          (p=0.008 n=5+5)
BM_UFlatSink/7  [txt2             ]            676MB/s ± 0%            691MB/s ± 0%   +2.23%          (p=0.008 n=5+5)
BM_UFlatSink/8  [txt3             ]            763MB/s ± 0%            783MB/s ± 0%   +2.64%          (p=0.008 n=5+5)
BM_UFlatSink/9  [txt4             ]            623MB/s ± 0%            639MB/s ± 0%   +2.61%          (p=0.008 n=5+5)
BM_UFlatSink/10 [pb               ]           2.30GB/s ± 0%           2.62GB/s ± 0%  +13.86%          (p=0.008 n=5+5)
BM_UFlatSink/11 [gaviota          ]            882MB/s ± 0%            904MB/s ± 0%   +2.45%          (p=0.008 n=5+5)
BM_UFlatSink/12 [cp               ]           1.42GB/s ± 0%           1.58GB/s ± 0%  +11.09%          (p=0.008 n=5+5)
BM_UFlatSink/13 [c                ]           1.38GB/s ± 1%           1.40GB/s ± 0%   +1.56%          (p=0.008 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.50GB/s ± 1%           1.51GB/s ± 1%   +0.85%          (p=0.008 n=5+5)
BM_UFlatSink/15 [xls              ]           1.06GB/s ± 0%           1.08GB/s ± 0%   +1.51%          (p=0.016 n=5+4)
BM_UFlatSink/16 [xls_200          ]            908MB/s ± 1%            911MB/s ± 0%     ~             (p=0.730 n=5+4)
BM_UFlatSink/17 [bin              ]           1.85GB/s ± 0%           1.86GB/s ± 0%   +1.01%          (p=0.008 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.96GB/s ± 1%           1.94GB/s ± 1%   -1.18%          (p=0.016 n=5+5)
BM_UFlatSink/19 [sum              ]           1.12GB/s ± 0%           1.24GB/s ± 0%  +10.16%          (p=0.008 n=5+5)
BM_UFlatSink/20 [man              ]           1.31GB/s ± 1%           1.32GB/s ± 0%   +0.77%          (p=0.048 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            839MB/s ± 0%            839MB/s ± 0%     ~             (p=0.413 n=4+5)
BM_ZFlat/1      [urls (47.78 %)   ]            439MB/s ± 0%            439MB/s ± 0%   +0.06%          (p=0.032 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           11.7GB/s ± 2%           11.5GB/s ± 9%     ~             (p=0.841 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            645MB/s ± 1%            647MB/s ± 3%     ~             (p=0.413 n=4+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           7.57GB/s ± 1%           7.54GB/s ± 2%     ~             (p=0.595 n=5+5)
BM_ZFlat/5      [html4 (22.52 %)  ]            769MB/s ± 0%            770MB/s ± 0%   +0.08%          (p=0.032 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            288MB/s ± 0%            288MB/s ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            267MB/s ± 0%            267MB/s ± 0%     ~             (p=0.690 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            305MB/s ± 0%            305MB/s ± 0%     ~             (p=0.548 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            250MB/s ± 0%            251MB/s ± 0%     ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.12GB/s ± 0%           1.12GB/s ± 0%     ~             (p=0.635 n=5+5)
BM_ZFlat/11     [gaviota (37.72 %)]            457MB/s ± 0%            457MB/s ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            570MB/s ± 0%            568MB/s ± 1%     ~             (p=0.151 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            682MB/s ± 1%            681MB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            750MB/s ± 0%            751MB/s ± 1%     ~             (p=0.690 n=5+5)
BM_ZFlat/15     [xls (41.23 %)    ]            668MB/s ± 0%            668MB/s ± 0%     ~             (p=0.841 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            569MB/s ± 2%            570MB/s ± 1%     ~             (p=0.841 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.04GB/s ± 0%           1.04GB/s ± 0%     ~             (p=0.310 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.64GB/s ± 1%           2.59GB/s ± 0%   -1.99%          (p=0.016 n=5+4)
BM_ZFlat/19     [sum (48.96 %)    ]            497MB/s ± 0%            498MB/s ± 0%     ~             (p=0.222 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            615MB/s ± 1%            621MB/s ± 0%   +0.87%          (p=0.008 n=5+5)

K8
--

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.7µs ± 0%             41.7µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/1      [urls             ]             588µs ± 0%              588µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/2      [jpg              ]            7.11µs ± 1%             7.10µs ± 1%    ~             (p=0.556 n=5+4)
BM_UFlat/3      [jpg_200          ]              130ns ± 0%              130ns ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]            8.19µs ± 0%             8.26µs ± 2%    ~             (p=0.460 n=5+5)
BM_UFlat/5      [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/6      [txt1             ]             192µs ± 0%              191µs ± 0%    ~             (p=0.341 n=5+5)
BM_UFlat/7      [txt2             ]             170µs ± 0%              170µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]             509µs ± 0%              509µs ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/9      [txt4             ]             712µs ± 0%              712µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/10     [pb               ]            38.5µs ± 0%             38.5µs ± 0%    ~             (p=0.452 n=5+5)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]            14.2µs ± 1%             14.2µs ± 0%    ~             (p=0.889 n=5+5)
BM_UFlat/13     [c                ]            7.32µs ± 0%             7.33µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlat/14     [lsp              ]            2.26µs ± 0%             2.27µs ± 0%    ~             (p=0.222 n=4+5)
BM_UFlat/15     [xls              ]             954µs ± 0%              955µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlat/16     [xls_200          ]              215ns ± 4%              212ns ± 0%    ~             (p=0.095 n=5+4)
BM_UFlat/17     [bin              ]             276µs ± 0%              276µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/18     [bin_200          ]              104ns ±10%              103ns ± 3%    ~             (p=0.825 n=5+5)
BM_UFlat/19     [sum              ]            29.2µs ± 0%             29.2µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlat/20     [man              ]            2.96µs ± 0%             2.97µs ± 0%  +0.43%          (p=0.032 n=5+5)
BM_UValidate/0  [html             ]            33.4µs ± 0%             33.4µs ± 0%    ~             (p=0.151 n=5+5)
BM_UValidate/1  [urls             ]             441µs ± 0%              441µs ± 0%    ~             (p=0.548 n=5+5)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]             98.0ns ± 0%             98.0ns ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/4  [pdf              ]            2.89µs ± 0%             2.89µs ± 0%    ~             (p=0.794 n=5+5)
BM_UIOVec/0     [html             ]             121µs ± 0%              121µs ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~             (p=0.095 n=5+5)
BM_UIOVec/2     [jpg              ]            7.47µs ± 5%             7.31µs ± 2%    ~             (p=0.222 n=5+5)
BM_UIOVec/3     [jpg_200          ]              330ns ± 0%              330ns ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]            12.3µs ± 2%             12.0µs ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/0  [html             ]            41.6µs ± 0%             41.6µs ± 0%    ~             (p=0.095 n=5+5)
BM_UFlatSink/1  [urls             ]             589µs ± 0%              589µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/2  [jpg              ]            7.84µs ±26%             7.23µs ± 5%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]              132ns ± 0%              132ns ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]            8.43µs ± 3%             8.27µs ± 2%    ~             (p=0.254 n=5+5)
BM_UFlatSink/5  [html4            ]             219µs ± 0%              219µs ± 0%    ~             (p=0.524 n=5+5)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              192µs ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]             170µs ± 0%              170µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/8  [txt3             ]             509µs ± 0%              509µs ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/9  [txt4             ]             712µs ± 0%              712µs ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/10 [pb               ]            38.5µs ± 0%             38.5µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.2µs ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/13 [c                ]            7.37µs ± 1%             7.36µs ± 1%    ~             (p=0.746 n=5+5)
BM_UFlatSink/14 [lsp              ]            2.27µs ± 0%             2.27µs ± 1%    ~             (p=0.714 n=5+5)
BM_UFlatSink/15 [xls              ]             954µs ± 0%              954µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/16 [xls_200          ]              215ns ± 1%              215ns ± 1%    ~             (p=0.921 n=5+5)
BM_UFlatSink/17 [bin              ]             276µs ± 0%              276µs ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]              103ns ± 2%              104ns ± 1%    ~             (p=0.429 n=5+5)
BM_UFlatSink/19 [sum              ]            29.2µs ± 0%             29.2µs ± 0%    ~             (p=0.452 n=5+5)
BM_UFlatSink/20 [man              ]            2.96µs ± 0%             2.97µs ± 1%    ~             (p=0.484 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]             1.67ms ± 0%             1.67ms ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.6µs ± 4%             11.6µs ± 3%    ~             (p=1.000 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]              368ns ± 1%              367ns ± 0%    ~             (p=0.159 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.7µs ± 1%             14.6µs ± 0%    ~             (p=0.190 n=5+4)
BM_ZFlat/5      [html4 (22.52 %)  ]             550µs ± 0%              550µs ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]             540µs ± 0%              540µs ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]             479µs ± 0%              480µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.44ms ± 0%             1.44ms ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.97ms ± 0%             1.97ms ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]             110µs ± 0%              109µs ± 0%    ~             (p=0.730 n=5+4)
BM_ZFlat/11     [gaviota (37.72 %)]             412µs ± 0%              412µs ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            46.3µs ± 0%             46.3µs ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            17.7µs ± 0%             17.7µs ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.54µs ± 1%             5.55µs ± 0%    ~             (p=0.254 n=5+4)
BM_ZFlat/15     [xls (41.23 %)    ]             1.62ms ± 0%             1.63ms ± 0%    ~             (p=0.151 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]              395ns ± 2%              394ns ± 1%    ~             (p=1.000 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]             507µs ± 0%              507µs ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]             89.6ns ± 5%             89.8ns ± 5%    ~             (p=1.000 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            79.9µs ± 0%             79.9µs ± 0%    ~             (p=0.690 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            7.67µs ± 0%             7.67µs ± 1%    ~             (p=0.548 n=5+5)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%    ~             (p=0.889 n=5+5)
BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.19GB/s ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]           17.3GB/s ± 1%           17.3GB/s ± 1%    ~             (p=0.556 n=5+4)
BM_UFlat/3      [jpg_200          ]           1.54GB/s ± 0%           1.54GB/s ± 0%    ~             (p=0.833 n=5+5)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 0%           12.4GB/s ± 2%    ~             (p=0.421 n=5+5)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 0%    ~             (p=1.000 n=4+5)
BM_UFlat/6      [txt1             ]            794MB/s ± 0%            794MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UFlat/7      [txt2             ]            738MB/s ± 0%            738MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/8      [txt3             ]            839MB/s ± 0%            838MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/9      [txt4             ]            677MB/s ± 0%            677MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.08GB/s ± 0%    ~             (p=0.452 n=5+5)
BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            975MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.73GB/s ± 0%    ~             (p=0.984 n=5+5)
BM_UFlat/13     [c                ]           1.52GB/s ± 0%           1.52GB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 0%    ~             (p=0.254 n=4+5)
BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.095 n=5+4)
BM_UFlat/16     [xls_200          ]            931MB/s ± 4%            941MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 0%    ~             (p=0.762 n=5+5)
BM_UFlat/18     [bin_200          ]           1.92GB/s ± 9%           1.95GB/s ± 3%    ~             (p=1.000 n=5+5)
BM_UFlat/19     [sum              ]           1.31GB/s ± 1%           1.31GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UFlat/20     [man              ]           1.43GB/s ± 0%           1.42GB/s ± 1%  -0.42%          (p=0.040 n=5+5)
BM_UValidate/0  [html             ]           3.06GB/s ± 0%           3.06GB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~             (p=0.357 n=5+5)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            845GB/s ± 0%    ~             (p=0.548 n=5+5)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 0%           2.04GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UValidate/4  [pdf              ]           35.4GB/s ± 0%           35.4GB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UIOVec/0     [html             ]            845MB/s ± 0%            845MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_UIOVec/1     [urls             ]            650MB/s ± 0%            650MB/s ± 0%    ~             (p=0.087 n=5+5)
BM_UIOVec/2     [jpg              ]           16.5GB/s ± 5%           16.8GB/s ± 2%    ~             (p=0.222 n=5+5)
BM_UIOVec/3     [jpg_200          ]            605MB/s ± 0%            605MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_UIOVec/4     [pdf              ]           8.36GB/s ± 2%           8.54GB/s ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/0  [html             ]           2.46GB/s ± 0%           2.46GB/s ± 0%    ~             (p=0.063 n=5+5)
BM_UFlatSink/1  [urls             ]           1.19GB/s ± 0%           1.19GB/s ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]           16.0GB/s ±22%           17.0GB/s ± 5%    ~             (p=0.690 n=5+5)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 0%           1.51GB/s ± 2%    ~             (p=1.000 n=5+5)
BM_UFlatSink/4  [pdf              ]           12.2GB/s ± 3%           12.4GB/s ± 2%    ~             (p=0.254 n=5+5)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 0%           1.87GB/s ± 0%    ~             (p=0.532 n=5+5)
BM_UFlatSink/6  [txt1             ]            794MB/s ± 0%            794MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_UFlatSink/7  [txt2             ]            738MB/s ± 0%            738MB/s ± 0%    ~             (p=0.421 n=5+5)
BM_UFlatSink/8  [txt3             ]            838MB/s ± 0%            838MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_UFlatSink/9  [txt4             ]            676MB/s ± 0%            676MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_UFlatSink/10 [pb               ]           3.08GB/s ± 0%           3.08GB/s ± 0%    ~             (p=0.365 n=5+5)
BM_UFlatSink/11 [gaviota          ]            975MB/s ± 0%            975MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/12 [cp               ]           1.73GB/s ± 0%           1.74GB/s ± 0%    ~             (p=0.286 n=5+5)
BM_UFlatSink/13 [c                ]           1.51GB/s ± 1%           1.52GB/s ± 1%    ~             (p=0.683 n=5+5)
BM_UFlatSink/14 [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 0%    ~             (p=0.444 n=5+5)
BM_UFlatSink/15 [xls              ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.333 n=4+5)
BM_UFlatSink/16 [xls_200          ]            930MB/s ± 1%            930MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_UFlatSink/17 [bin              ]           1.86GB/s ± 0%           1.86GB/s ± 0%    ~             (p=1.000 n=5+5)
BM_UFlatSink/18 [bin_200          ]           1.93GB/s ± 2%           1.93GB/s ± 1%    ~             (p=0.651 n=5+5)
BM_UFlatSink/19 [sum              ]           1.31GB/s ± 0%           1.31GB/s ± 0%    ~             (p=0.508 n=5+5)
BM_UFlatSink/20 [man              ]           1.43GB/s ± 0%           1.42GB/s ± 1%    ~             (p=0.524 n=5+5)
BM_ZFlat/0      [html (22.31 %)   ]            815MB/s ± 0%            815MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/1      [urls (47.78 %)   ]            420MB/s ± 0%            420MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.6GB/s ± 4%           10.6GB/s ± 3%    ~             (p=1.000 n=5+5)
BM_ZFlat/3      [jpg_200 (73.00 %)]            543MB/s ± 1%            546MB/s ± 0%    ~             (p=0.095 n=5+5)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.96GB/s ± 1%           7.01GB/s ± 0%    ~             (p=0.190 n=5+4)
BM_ZFlat/5      [html4 (22.52 %)  ]            745MB/s ± 0%            745MB/s ± 0%    ~             (p=0.841 n=5+5)
BM_ZFlat/6      [txt1 (57.88 %)   ]            282MB/s ± 0%            282MB/s ± 0%    ~             (p=0.310 n=5+5)
BM_ZFlat/7      [txt2 (61.91 %)   ]            261MB/s ± 0%            261MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/8      [txt3 (54.99 %)   ]            297MB/s ± 0%            297MB/s ± 0%    ~             (p=0.421 n=5+5)
BM_ZFlat/9      [txt4 (66.26 %)   ]            244MB/s ± 0%            244MB/s ± 0%    ~             (p=0.389 n=5+5)
BM_ZFlat/10     [pb (19.68 %)     ]           1.08GB/s ± 0%           1.08GB/s ± 0%    ~             (p=0.238 n=5+4)
BM_ZFlat/11     [gaviota (37.72 %)]            448MB/s ± 0%            447MB/s ± 0%    ~             (p=1.000 n=5+5)
BM_ZFlat/12     [cp (48.12 %)     ]            532MB/s ± 0%            531MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/13     [c (42.47 %)      ]            632MB/s ± 0%            631MB/s ± 1%    ~             (p=0.841 n=5+5)
BM_ZFlat/14     [lsp (48.37 %)    ]            672MB/s ± 1%            671MB/s ± 0%    ~             (p=0.286 n=5+4)
BM_ZFlat/15     [xls (41.23 %)    ]            634MB/s ± 0%            633MB/s ± 0%    ~             (p=0.151 n=5+5)
BM_ZFlat/16     [xls_200 (78.00 %)]            507MB/s ± 2%            508MB/s ± 1%    ~             (p=1.000 n=5+5)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.01GB/s ± 0%    ~             (p=0.056 n=5+5)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.24GB/s ± 5%           2.23GB/s ± 5%    ~             (p=0.889 n=5+5)
BM_ZFlat/19     [sum (48.96 %)    ]            479MB/s ± 0%            479MB/s ± 0%    ~             (p=0.690 n=5+5)
BM_ZFlat/20     [man (59.21 %)    ]            551MB/s ± 0%            551MB/s ± 1%    ~             (p=0.548 n=5+5)
2019-01-04 19:08:39 -08:00
nafi eb47f79631 Optimize by about 0.5%.
How? Move the boolean arguments of EmitLiteral, EmitCopyAtMost64 and EmitCopy
to template arguments, so that the compiler generates two separate, pruned
versions of each function: one for arg=true and one for arg=false. FWIW, the
CompressFragment function calls 1) EmitLiteral from inside a single loop and
2) EmitCopy from inside a two-level nested loop. CompressFragment itself is
called from inside another while-loop in the public 'Compress' function.
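The sketch below shows the general "boolean argument to template argument" technique this commit describes. It is not snappy's code: `EmitLiteralSketch`, the `kAllowFastPath` parameter and the 16-byte fast path are hypothetical, chosen only to show how each instantiation keeps one branch and drops the other. The tag byte follows the short-literal layout (`(len - 1) << 2`) purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical literal emitter (not snappy's actual implementation).
// kAllowFastPath is a template argument instead of a runtime bool. When it
// is true, the caller promises at least 16 readable bytes at `src` and 16
// writable bytes after the tag at `dst`, so short literals can be copied
// with a fixed-size memcpy.
template <bool kAllowFastPath>
inline char* EmitLiteralSketch(char* dst, const char* src, std::size_t len) {
  assert(len >= 1 && len <= 60);
  // Toy tag byte: low 2 bits = 0 (literal), upper 6 bits = len - 1.
  *dst++ = static_cast<char>((len - 1) << 2);
  // kAllowFastPath is a compile-time constant, so in the <false>
  // instantiation this condition folds to false and the fast branch is
  // removed; no per-call test of the flag remains in either version.
  if (kAllowFastPath && len <= 16) {
    std::memcpy(dst, src, 16);  // over-copies; relies on the caller's slack
  } else {
    std::memcpy(dst, src, len);
  }
  return dst + len;
}

// Hypothetical call site: the flag is fixed where the call is written,
// e.g. <false> near the end of the input where no slack is guaranteed.
inline char* EmitTailLiteral(char* dst, const char* src, std::size_t len) {
  return EmitLiteralSketch<false>(dst, src, len);
}
```

With a runtime `bool`, the branch is only eliminated if the compiler happens to inline the call and propagate the constant; the template form guarantees two specialized copies, which matters most when the function is called from hot inner loops like the ones in CompressFragment.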

name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.9µs ± 0%             41.1µs ± 0%  -1.92%        (p=0.000 n=10+10)
BM_UFlat/1      [urls             ]             576µs ± 0%              572µs ± 0%  -0.68%        (p=0.000 n=10+10)
BM_UFlat/2      [jpg              ]            7.25µs ± 6%             7.13µs ± 1%    ~             (p=0.074 n=9+8)
BM_UFlat/3      [jpg_200          ]              132ns ± 1%              130ns ± 0%  -1.45%         (p=0.000 n=10+8)
BM_UFlat/4      [pdf              ]            8.27µs ± 3%             8.22µs ± 0%    ~             (p=0.277 n=9+8)
BM_UFlat/5      [html4            ]             220µs ± 0%              219µs ± 0%  -0.75%        (p=0.000 n=10+10)
BM_UFlat/6      [txt1             ]             192µs ± 0%              190µs ± 0%  -0.80%        (p=0.000 n=10+10)
BM_UFlat/7      [txt2             ]             169µs ± 0%              168µs ± 0%  -0.69%        (p=0.000 n=10+10)
BM_UFlat/8      [txt3             ]             510µs ± 0%              508µs ± 0%  -0.42%        (p=0.000 n=10+10)
BM_UFlat/9      [txt4             ]             707µs ± 0%              702µs ± 0%  -0.67%        (p=0.000 n=10+10)
BM_UFlat/10     [pb               ]            38.5µs ± 0%             37.4µs ± 1%  -2.84%        (p=0.000 n=10+10)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              190µs ± 0%  +0.55%        (p=0.000 n=10+10)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.1µs ± 0%  -0.44%        (p=0.000 n=10+10)
BM_UFlat/13     [c                ]            7.31µs ± 1%             7.35µs ± 0%  +0.54%        (p=0.002 n=10+10)
BM_UFlat/14     [lsp              ]            2.27µs ± 0%             2.27µs ± 1%    ~             (p=0.161 n=9+9)
BM_UFlat/15     [xls              ]             905µs ± 0%              903µs ± 0%  -0.25%        (p=0.000 n=10+10)
BM_UFlat/16     [xls_200          ]              214ns ± 1%              213ns ± 1%  -0.57%        (p=0.043 n=10+10)
BM_UFlat/17     [bin              ]             275µs ± 0%              274µs ± 0%  -0.31%        (p=0.000 n=10+10)
BM_UFlat/18     [bin_200          ]              102ns ± 5%              101ns ± 3%    ~             (p=0.161 n=9+9)
BM_UFlat/19     [sum              ]            27.9µs ± 0%             27.2µs ± 0%  -2.68%        (p=0.000 n=10+10)
BM_UFlat/20     [man              ]            2.97µs ± 1%             2.97µs ± 0%    ~            (p=0.400 n=9+10)
BM_UValidate/0  [html             ]            33.3µs ± 0%             33.7µs ± 0%  +1.18%        (p=0.000 n=10+10)
BM_UValidate/1  [urls             ]             442µs ± 0%              442µs ± 0%    ~           (p=0.353 n=10+10)
BM_UValidate/2  [jpg              ]              146ns ± 0%              146ns ± 0%    ~           (p=0.063 n=10+10)
BM_UValidate/3  [jpg_200          ]             98.4ns ± 0%             98.5ns ± 0%    ~           (p=0.184 n=10+10)
BM_UValidate/4  [pdf              ]            2.88µs ± 0%             2.90µs ± 1%  +0.68%        (p=0.000 n=10+10)
BM_UIOVec/0     [html             ]             122µs ± 0%              122µs ± 0%  -0.39%        (p=0.000 n=10+10)
BM_UIOVec/1     [urls             ]             1.08ms ± 0%             1.08ms ± 0%    ~           (p=0.529 n=10+10)
BM_UIOVec/2     [jpg              ]            7.71µs ±11%             7.76µs ± 9%    ~           (p=0.853 n=10+10)
BM_UIOVec/3     [jpg_200          ]              327ns ± 0%              328ns ± 0%    ~            (p=0.146 n=8+10)
BM_UIOVec/4     [pdf              ]            12.1µs ± 1%             12.1µs ± 3%    ~           (p=0.315 n=10+10)
BM_UFlatSink/0  [html             ]            41.8µs ± 0%             41.0µs ± 0%  -1.87%         (p=0.000 n=10+9)
BM_UFlatSink/1  [urls             ]             576µs ± 0%              572µs ± 0%  -0.74%         (p=0.000 n=9+10)
BM_UFlatSink/2  [jpg              ]            7.58µs ± 8%             7.56µs ± 9%    ~           (p=0.739 n=10+10)
BM_UFlatSink/3  [jpg_200          ]              133ns ± 0%              134ns ± 0%  +0.60%         (p=0.000 n=10+9)
BM_UFlatSink/4  [pdf              ]            8.44µs ± 3%             8.30µs ± 1%  -1.65%        (p=0.029 n=10+10)
BM_UFlatSink/5  [html4            ]             220µs ± 0%              218µs ± 0%  -0.81%        (p=0.000 n=10+10)
BM_UFlatSink/6  [txt1             ]             192µs ± 0%              190µs ± 0%  -0.78%        (p=0.000 n=10+10)
BM_UFlatSink/7  [txt2             ]             169µs ± 0%              168µs ± 0%  -0.59%        (p=0.000 n=10+10)
BM_UFlatSink/8  [txt3             ]             510µs ± 0%              508µs ± 0%  -0.39%        (p=0.000 n=10+10)
BM_UFlatSink/9  [txt4             ]             707µs ± 0%              703µs ± 0%  -0.62%        (p=0.000 n=10+10)
BM_UFlatSink/10 [pb               ]            38.4µs ± 0%             37.4µs ± 0%  -2.62%          (p=0.000 n=9+9)
BM_UFlatSink/11 [gaviota          ]             189µs ± 0%              190µs ± 0%  +0.63%        (p=0.000 n=10+10)
BM_UFlatSink/12 [cp               ]            14.2µs ± 0%             14.1µs ± 0%  -0.27%        (p=0.011 n=10+10)
BM_UFlatSink/13 [c                ]            7.33µs ± 1%             7.35µs ± 1%    ~            (p=0.243 n=10+9)
BM_UFlatSink/14 [lsp              ]            2.27µs ± 0%             2.26µs ± 0%  -0.39%          (p=0.000 n=9+9)
BM_UFlatSink/15 [xls              ]             904µs ± 0%              902µs ± 0%  -0.28%        (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200          ]              216ns ± 1%              217ns ± 1%    ~            (p=0.661 n=10+9)
BM_UFlatSink/17 [bin              ]             275µs ± 0%              274µs ± 0%  -0.24%          (p=0.000 n=8+9)
BM_UFlatSink/18 [bin_200          ]              104ns ± 2%              104ns ± 1%  -0.70%         (p=0.043 n=9+10)
BM_UFlatSink/19 [sum              ]            27.8µs ± 0%             27.1µs ± 0%  -2.51%         (p=0.000 n=9+10)
BM_UFlatSink/20 [man              ]            3.02µs ± 1%             3.00µs ± 1%    ~            (p=0.079 n=10+9)
BM_ZFlat/0      [html (22.31 %)   ]             126µs ± 0%              126µs ± 0%  -0.24%        (p=0.000 n=10+10)
BM_ZFlat/1      [urls (47.78 %)   ]             1.68ms ± 0%             1.67ms ± 0%  -1.06%        (p=0.000 n=10+10)
BM_ZFlat/2      [jpg (99.95 %)    ]            11.8µs ± 5%             11.6µs ± 5%    ~           (p=0.165 n=10+10)
BM_ZFlat/3      [jpg_200 (73.00 %)]              360ns ± 3%              358ns ± 1%    ~            (p=0.762 n=10+8)
BM_ZFlat/4      [pdf (83.30 %)    ]            14.8µs ± 2%             14.6µs ± 1%  -1.57%         (p=0.022 n=10+9)
BM_ZFlat/5      [html4 (22.52 %)  ]             556µs ± 0%              552µs ± 0%  -0.87%        (p=0.000 n=10+10)
BM_ZFlat/6      [txt1 (57.88 %)   ]             542µs ± 0%              540µs ± 0%  -0.47%        (p=0.000 n=10+10)
BM_ZFlat/7      [txt2 (61.91 %)   ]             483µs ± 0%              480µs ± 0%  -0.62%        (p=0.000 n=10+10)
BM_ZFlat/8      [txt3 (54.99 %)   ]             1.45ms ± 0%             1.44ms ± 0%  -0.47%        (p=0.000 n=10+10)
BM_ZFlat/9      [txt4 (66.26 %)   ]             1.98ms ± 0%             1.97ms ± 0%  -0.19%        (p=0.007 n=10+10)
BM_ZFlat/10     [pb (19.68 %)     ]             111µs ± 0%              109µs ± 0%  -1.75%        (p=0.000 n=10+10)
BM_ZFlat/11     [gaviota (37.72 %)]             411µs ± 0%              410µs ± 0%  -0.21%        (p=0.004 n=10+10)
BM_ZFlat/12     [cp (48.12 %)     ]            45.9µs ± 0%             45.5µs ± 0%  -0.76%        (p=0.000 n=10+10)
BM_ZFlat/13     [c (42.47 %)      ]            17.6µs ± 0%             17.5µs ± 0%  -0.80%        (p=0.000 n=10+10)
BM_ZFlat/14     [lsp (48.37 %)    ]            5.50µs ± 0%             5.44µs ± 0%  -1.19%         (p=0.000 n=9+10)
BM_ZFlat/15     [xls (41.23 %)    ]             1.63ms ± 0%             1.61ms ± 0%  -1.21%        (p=0.000 n=10+10)
BM_ZFlat/16     [xls_200 (78.00 %)]              389ns ± 2%              391ns ± 1%    ~            (p=0.182 n=10+9)
BM_ZFlat/17     [bin (18.11 %)    ]             509µs ± 0%              506µs ± 0%  -0.51%        (p=0.000 n=10+10)
BM_ZFlat/18     [bin_200 (7.50 %) ]             92.7ns ± 0%             89.4ns ± 1%  -3.55%          (p=0.000 n=8+8)
BM_ZFlat/19     [sum (48.96 %)    ]            80.2µs ± 0%             78.9µs ± 0%  -1.65%        (p=0.000 n=10+10)
BM_ZFlat/20     [man (59.21 %)    ]            7.59µs ± 1%             7.59µs ± 1%    ~           (p=0.912 n=10+10)

name                                          old allocs/op           new allocs/op           delta
BM_UFlat/0      [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]               0.00 ±NaN%              0.00 ±NaN%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]               1.00 ± 0%               1.00 ± 0%    ~     (all samples are equal)

name                                          old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_UFlat/0      [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/1      [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/2      [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/3      [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/4      [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/5      [html4            ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/6      [txt1             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/7      [txt2             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/8      [txt3             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/9      [txt4             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/10     [pb               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/11     [gaviota          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/12     [cp               ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/13     [c                ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/14     [lsp              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/15     [xls              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/16     [xls_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/17     [bin              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/18     [bin_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/19     [sum              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlat/20     [man              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/0  [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/1  [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/2  [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/3  [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UValidate/4  [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/0     [html             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/1     [urls             ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/2     [jpg              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/3     [jpg_200          ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UIOVec/4     [pdf              ]               4.00 ± 0%               4.00 ± 0%    ~     (all samples are equal)
BM_UFlatSink/0  [html             ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/1  [urls             ]               702k ± 0%               702k ± 0%    ~     (all samples are equal)
BM_UFlatSink/2  [jpg              ]               123k ± 0%               123k ± 0%    ~     (all samples are equal)
BM_UFlatSink/3  [jpg_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/4  [pdf              ]               102k ± 0%               102k ± 0%    ~     (all samples are equal)
BM_UFlatSink/5  [html4            ]               410k ± 0%               410k ± 0%    ~     (all samples are equal)
BM_UFlatSink/6  [txt1             ]               152k ± 0%               152k ± 0%    ~     (all samples are equal)
BM_UFlatSink/7  [txt2             ]               125k ± 0%               125k ± 0%    ~     (all samples are equal)
BM_UFlatSink/8  [txt3             ]               427k ± 0%               427k ± 0%    ~     (all samples are equal)
BM_UFlatSink/9  [txt4             ]               482k ± 0%               482k ± 0%    ~     (all samples are equal)
BM_UFlatSink/10 [pb               ]               119k ± 0%               119k ± 0%    ~     (all samples are equal)
BM_UFlatSink/11 [gaviota          ]               184k ± 0%               184k ± 0%    ~     (all samples are equal)
BM_UFlatSink/12 [cp               ]              24.6k ± 0%              24.6k ± 0%    ~     (all samples are equal)
BM_UFlatSink/13 [c                ]              11.2k ± 0%              11.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/14 [lsp              ]              3.72k ± 0%              3.72k ± 0%    ~     (all samples are equal)
BM_UFlatSink/15 [xls              ]              1.03M ± 0%              1.03M ± 0%    ~     (all samples are equal)
BM_UFlatSink/16 [xls_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/17 [bin              ]               513k ± 0%               513k ± 0%    ~     (all samples are equal)
BM_UFlatSink/18 [bin_200          ]                201 ± 0%                201 ± 0%    ~     (all samples are equal)
BM_UFlatSink/19 [sum              ]              38.2k ± 0%              38.2k ± 0%    ~     (all samples are equal)
BM_UFlatSink/20 [man              ]              4.23k ± 0%              4.23k ± 0%    ~     (all samples are equal)
BM_ZFlat/0      [html (22.31 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/1      [urls (47.78 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/2      [jpg (99.95 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/3      [jpg_200 (73.00 %)]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/4      [pdf (83.30 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/5      [html4 (22.52 %)  ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/6      [txt1 (57.88 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/7      [txt2 (61.91 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/8      [txt3 (54.99 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/9      [txt4 (66.26 %)   ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/10     [pb (19.68 %)     ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/11     [gaviota (37.72 %)]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/12     [cp (48.12 %)     ]              86.1k ± 0%              86.1k ± 0%    ~     (all samples are equal)
BM_ZFlat/13     [c (42.47 %)      ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/14     [lsp (48.37 %)    ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/15     [xls (41.23 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/16     [xls_200 (78.00 %)]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/17     [bin (18.11 %)    ]               175k ± 0%               175k ± 0%    ~     (all samples are equal)
BM_ZFlat/18     [bin_200 (7.50 %) ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)
BM_ZFlat/19     [sum (48.96 %)    ]               116k ± 0%               116k ± 0%    ~     (all samples are equal)
BM_ZFlat/20     [man (59.21 %)    ]              63.3k ± 0%              63.3k ± 0%    ~     (all samples are equal)

name                                          old speed               new speed               delta
BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.50GB/s ± 0%  +1.96%        (p=0.000 n=10+10)
BM_UFlat/1      [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.69%        (p=0.000 n=10+10)
BM_UFlat/2      [jpg              ]           17.0GB/s ± 5%           17.3GB/s ± 1%    ~             (p=0.074 n=9+8)
BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 1%           1.54GB/s ± 0%  +1.44%         (p=0.000 n=10+8)
BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 0%    ~             (p=0.721 n=8+8)
BM_UFlat/5      [html4            ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.76%        (p=0.000 n=10+10)
BM_UFlat/6      [txt1             ]            795MB/s ± 0%            801MB/s ± 0%  +0.79%        (p=0.000 n=10+10)
BM_UFlat/7      [txt2             ]            741MB/s ± 0%            746MB/s ± 0%  +0.68%        (p=0.000 n=10+10)
BM_UFlat/8      [txt3             ]            840MB/s ± 0%            844MB/s ± 0%  +0.44%        (p=0.000 n=10+10)
BM_UFlat/9      [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.65%         (p=0.000 n=9+10)
BM_UFlat/10     [pb               ]           3.09GB/s ± 0%           3.18GB/s ± 0%  +2.88%         (p=0.000 n=10+9)
BM_UFlat/11     [gaviota          ]            980MB/s ± 0%            975MB/s ± 0%  -0.57%        (p=0.000 n=10+10)
BM_UFlat/12     [cp               ]           1.74GB/s ± 0%           1.75GB/s ± 0%  +0.38%         (p=0.001 n=10+9)
BM_UFlat/13     [c                ]           1.53GB/s ± 1%           1.52GB/s ± 0%  -0.55%        (p=0.003 n=10+10)
BM_UFlat/14     [lsp              ]           1.64GB/s ± 0%           1.64GB/s ± 1%    ~            (p=0.400 n=9+10)
BM_UFlat/15     [xls              ]           1.14GB/s ± 0%           1.14GB/s ± 0%  +0.23%        (p=0.000 n=10+10)
BM_UFlat/16     [xls_200          ]            936MB/s ± 1%            941MB/s ± 1%    ~           (p=0.052 n=10+10)
BM_UFlat/17     [bin              ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.28%        (p=0.000 n=10+10)
BM_UFlat/18     [bin_200          ]           1.97GB/s ± 5%           1.99GB/s ± 3%    ~             (p=0.136 n=9+9)
BM_UFlat/19     [sum              ]           1.37GB/s ± 0%           1.41GB/s ± 0%  +2.82%         (p=0.000 n=10+9)
BM_UFlat/20     [man              ]           1.42GB/s ± 1%           1.42GB/s ± 0%    ~           (p=0.579 n=10+10)
BM_UValidate/0  [html             ]           3.08GB/s ± 0%           3.05GB/s ± 0%  -1.18%        (p=0.000 n=10+10)
BM_UValidate/1  [urls             ]           1.59GB/s ± 0%           1.59GB/s ± 0%    ~           (p=0.247 n=10+10)
BM_UValidate/2  [jpg              ]            845GB/s ± 0%            846GB/s ± 0%  +0.09%        (p=0.000 n=10+10)
BM_UValidate/3  [jpg_200          ]           2.04GB/s ± 0%           2.04GB/s ± 0%  -0.09%        (p=0.019 n=10+10)
BM_UValidate/4  [pdf              ]           35.7GB/s ± 0%           35.4GB/s ± 1%  -0.70%        (p=0.000 n=10+10)
BM_UIOVec/0     [html             ]            841MB/s ± 0%            844MB/s ± 0%  +0.36%        (p=0.000 n=10+10)
BM_UIOVec/1     [urls             ]            650MB/s ± 0%            650MB/s ± 0%    ~           (p=0.105 n=10+10)
BM_UIOVec/2     [jpg              ]           16.1GB/s ±10%           15.9GB/s ± 8%    ~           (p=0.853 n=10+10)
BM_UIOVec/3     [jpg_200          ]            612MB/s ± 1%            612MB/s ± 0%    ~            (p=0.243 n=9+10)
BM_UIOVec/4     [pdf              ]           8.52GB/s ± 2%           8.46GB/s ± 3%    ~           (p=0.436 n=10+10)
BM_UFlatSink/0  [html             ]           2.46GB/s ± 0%           2.50GB/s ± 0%  +1.83%         (p=0.000 n=9+10)
BM_UFlatSink/1  [urls             ]           1.22GB/s ± 0%           1.23GB/s ± 0%  +0.73%        (p=0.000 n=10+10)
BM_UFlatSink/2  [jpg              ]           16.3GB/s ± 8%           16.4GB/s ± 9%    ~           (p=0.739 n=10+10)
BM_UFlatSink/3  [jpg_200          ]           1.51GB/s ± 0%           1.50GB/s ± 0%  -0.62%         (p=0.000 n=10+9)
BM_UFlatSink/4  [pdf              ]           12.2GB/s ± 3%           12.4GB/s ± 1%  +1.62%        (p=0.029 n=10+10)
BM_UFlatSink/5  [html4            ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.79%        (p=0.000 n=10+10)
BM_UFlatSink/6  [txt1             ]            795MB/s ± 0%            801MB/s ± 0%  +0.74%         (p=0.000 n=10+9)
BM_UFlatSink/7  [txt2             ]            741MB/s ± 0%            745MB/s ± 0%  +0.59%         (p=0.000 n=10+9)
BM_UFlatSink/8  [txt3             ]            840MB/s ± 0%            843MB/s ± 0%  +0.37%         (p=0.000 n=9+10)
BM_UFlatSink/9  [txt4             ]            684MB/s ± 0%            688MB/s ± 0%  +0.57%         (p=0.000 n=9+10)
BM_UFlatSink/10 [pb               ]           3.10GB/s ± 0%           3.18GB/s ± 0%  +2.64%         (p=0.000 n=9+10)
BM_UFlatSink/11 [gaviota          ]            980MB/s ± 0%            974MB/s ± 0%  -0.64%        (p=0.000 n=10+10)
BM_UFlatSink/12 [cp               ]           1.74GB/s ± 0%           1.75GB/s ± 0%  +0.26%        (p=0.005 n=10+10)
BM_UFlatSink/13 [c                ]           1.52GB/s ± 1%           1.52GB/s ± 1%    ~           (p=0.123 n=10+10)
BM_UFlatSink/14 [lsp              ]           1.64GB/s ± 0%           1.65GB/s ± 0%  +0.46%         (p=0.000 n=10+8)
BM_UFlatSink/15 [xls              ]           1.14GB/s ± 0%           1.15GB/s ± 0%  +0.27%        (p=0.000 n=10+10)
BM_UFlatSink/16 [xls_200          ]            927MB/s ± 1%            926MB/s ± 1%    ~            (p=0.497 n=10+9)
BM_UFlatSink/17 [bin              ]           1.87GB/s ± 0%           1.88GB/s ± 0%  +0.27%        (p=0.000 n=10+10)
BM_UFlatSink/18 [bin_200          ]           1.92GB/s ± 2%           1.93GB/s ± 1%  +0.70%         (p=0.035 n=9+10)
BM_UFlatSink/19 [sum              ]           1.38GB/s ± 0%           1.41GB/s ± 0%  +2.59%         (p=0.000 n=9+10)
BM_UFlatSink/20 [man              ]           1.40GB/s ± 1%           1.41GB/s ± 1%    ~            (p=0.079 n=10+9)
BM_ZFlat/0      [html (22.31 %)   ]            814MB/s ± 0%            816MB/s ± 0%  +0.23%        (p=0.000 n=10+10)
BM_ZFlat/1      [urls (47.78 %)   ]            418MB/s ± 0%            423MB/s ± 0%  +1.06%        (p=0.000 n=10+10)
BM_ZFlat/2      [jpg (99.95 %)    ]           10.5GB/s ± 5%           10.7GB/s ± 5%    ~           (p=0.165 n=10+10)
BM_ZFlat/3      [jpg_200 (73.00 %)]            558MB/s ± 3%            560MB/s ± 1%    ~            (p=0.696 n=10+8)
BM_ZFlat/4      [pdf (83.30 %)    ]           6.94GB/s ± 2%           7.05GB/s ± 1%  +1.59%         (p=0.028 n=10+9)
BM_ZFlat/5      [html4 (22.52 %)  ]            739MB/s ± 0%            745MB/s ± 0%  +0.86%        (p=0.000 n=10+10)
BM_ZFlat/6      [txt1 (57.88 %)   ]            281MB/s ± 0%            283MB/s ± 0%  +0.46%        (p=0.000 n=10+10)
BM_ZFlat/7      [txt2 (61.91 %)   ]            260MB/s ± 0%            261MB/s ± 0%  +0.59%        (p=0.000 n=10+10)
BM_ZFlat/8      [txt3 (54.99 %)   ]            296MB/s ± 0%            297MB/s ± 0%  +0.45%        (p=0.000 n=10+10)
BM_ZFlat/9      [txt4 (66.26 %)   ]            244MB/s ± 0%            245MB/s ± 0%  +0.16%        (p=0.000 n=10+10)
BM_ZFlat/10     [pb (19.68 %)     ]           1.07GB/s ± 0%           1.09GB/s ± 0%  +1.75%        (p=0.000 n=10+10)
BM_ZFlat/11     [gaviota (37.72 %)]            450MB/s ± 0%            451MB/s ± 0%  +0.17%         (p=0.000 n=9+10)
BM_ZFlat/12     [cp (48.12 %)     ]            538MB/s ± 0%            542MB/s ± 0%  +0.74%        (p=0.000 n=10+10)
BM_ZFlat/13     [c (42.47 %)      ]            635MB/s ± 0%            640MB/s ± 0%  +0.80%        (p=0.000 n=10+10)
BM_ZFlat/14     [lsp (48.37 %)    ]            678MB/s ± 0%            686MB/s ± 1%  +1.18%         (p=0.000 n=9+10)
BM_ZFlat/15     [xls (41.23 %)    ]            633MB/s ± 0%            641MB/s ± 0%  +1.23%         (p=0.000 n=10+7)
BM_ZFlat/16     [xls_200 (78.00 %)]            516MB/s ± 2%            513MB/s ± 1%    ~            (p=0.156 n=10+9)
BM_ZFlat/17     [bin (18.11 %)    ]           1.01GB/s ± 0%           1.02GB/s ± 0%  +0.49%        (p=0.000 n=10+10)
BM_ZFlat/18     [bin_200 (7.50 %) ]           2.16GB/s ± 0%           2.24GB/s ± 1%  +3.65%          (p=0.000 n=8+8)
BM_ZFlat/19     [sum (48.96 %)    ]            478MB/s ± 0%            486MB/s ± 0%  +1.66%        (p=0.000 n=10+10)
BM_ZFlat/20     [man (59.21 %)    ]            558MB/s ± 1%            558MB/s ± 1%    ~           (p=0.912 n=10+10)
2019-01-04 19:08:30 -08:00
jueminyang 254966c71e Migrate to use absl::random 2019-01-04 19:08:11 -08:00
alkis 53a38e5e33 Reduce number of allocations when compressing and simplify the code.
Before, we allocated at least once: twice with a large table, and three
times when we also used a scratch buffer. With this approach we always
allocate exactly once.
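A minimal sketch of the single-allocation idea. It assumes one heap block is carved into a hash table, an input scratch buffer and an output scratch buffer, in that order. The class name `WorkingMemorySketch` and its parameters are hypothetical and do not describe snappy's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical: one heap block is split into a uint16_t hash table, an input
// scratch buffer and an output scratch buffer, so compression needs a single
// allocation no matter which of the three regions it ends up using.
class WorkingMemorySketch {
 public:
  WorkingMemorySketch(size_t table_entries, size_t input_bytes,
                      size_t output_bytes)
      : size_(table_entries * sizeof(uint16_t) + input_bytes + output_bytes),
        mem_(new char[size_]) {
    // The table comes first: new[] returns storage suitably aligned for
    // uint16_t, and the regions are laid out back to back.
    table_ = reinterpret_cast<uint16_t*>(mem_.get());
    input_ = mem_.get() + table_entries * sizeof(uint16_t);
    output_ = input_ + input_bytes;
  }

  uint16_t* table() const { return table_; }
  char* input() const { return input_; }
  char* output() const { return output_; }
  size_t size() const { return size_; }

 private:
  size_t size_;                  // declared before mem_, so initialized first
  std::unique_ptr<char[]> mem_;  // the only allocation
  uint16_t* table_;
  char* input_;
  char* output_;
};
```

Because all three regions share one block, the allocation count no longer depends on whether a large table or scratch buffer is needed.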

  name                                          old speed               new speed               delta
  BM_UFlat/0      [html             ]           2.45GB/s ± 0%           2.45GB/s ± 0%   -0.13%        (p=0.000 n=11+11)
  BM_UFlat/1      [urls             ]           1.19GB/s ± 0%           1.22GB/s ± 0%   +2.48%        (p=0.000 n=11+11)
  BM_UFlat/2      [jpg              ]           17.2GB/s ± 2%           17.3GB/s ± 1%     ~           (p=0.193 n=11+11)
  BM_UFlat/3      [jpg_200          ]           1.52GB/s ± 0%           1.51GB/s ± 0%   -0.78%         (p=0.000 n=10+9)
  BM_UFlat/4      [pdf              ]           12.5GB/s ± 1%           12.5GB/s ± 1%     ~             (p=0.881 n=9+9)
  BM_UFlat/5      [html4            ]           1.86GB/s ± 0%           1.86GB/s ± 0%     ~           (p=0.123 n=11+11)
  BM_UFlat/6      [txt1             ]            793MB/s ± 0%            799MB/s ± 0%   +0.78%         (p=0.000 n=11+9)
  BM_UFlat/7      [txt2             ]            739MB/s ± 0%            744MB/s ± 0%   +0.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3             ]            839MB/s ± 0%            845MB/s ± 0%   +0.71%        (p=0.000 n=11+11)
  BM_UFlat/9      [txt4             ]            678MB/s ± 0%            685MB/s ± 0%   +1.01%        (p=0.000 n=11+11)
  BM_UFlat/10     [pb               ]           3.08GB/s ± 0%           3.12GB/s ± 0%   +1.21%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota          ]            975MB/s ± 0%            976MB/s ± 0%   +0.11%        (p=0.000 n=11+11)
  BM_UFlat/12     [cp               ]           1.73GB/s ± 1%           1.74GB/s ± 1%   +0.46%        (p=0.010 n=11+11)
  BM_UFlat/13     [c                ]           1.53GB/s ± 0%           1.53GB/s ± 0%     ~           (p=0.987 n=11+10)
  BM_UFlat/14     [lsp              ]           1.65GB/s ± 0%           1.63GB/s ± 1%   -1.04%        (p=0.000 n=11+11)
  BM_UFlat/15     [xls              ]           1.08GB/s ± 0%           1.15GB/s ± 0%   +6.12%        (p=0.000 n=10+11)
  BM_UFlat/16     [xls_200          ]            944MB/s ± 0%            920MB/s ± 3%   -2.51%         (p=0.000 n=9+11)
  BM_UFlat/17     [bin              ]           1.86GB/s ± 0%           1.87GB/s ± 0%   +0.68%        (p=0.000 n=10+11)
  BM_UFlat/18     [bin_200          ]           1.91GB/s ± 3%           1.92GB/s ± 5%     ~           (p=0.356 n=11+11)
  BM_UFlat/19     [sum              ]           1.31GB/s ± 0%           1.40GB/s ± 0%   +6.53%        (p=0.000 n=11+11)
  BM_UFlat/20     [man              ]           1.42GB/s ± 0%           1.42GB/s ± 0%   +0.33%        (p=0.000 n=10+10)
2019-01-04 19:07:49 -08:00
ckennelly df5548c0b3 Use sized deallocation when releasing Zippy's scratch buffers.
name                                          old time/op             new time/op             delta
BM_UFlat/0      [html             ]            41.7µs ± 0%             41.7µs ± 0%    ~             (p=0.222 n=5+5)
BM_UFlat/1      [urls             ]             587µs ± 0%              574µs ± 0%  -2.31%          (p=0.008 n=5+5)
BM_UFlat/2      [jpg              ]            7.24µs ± 2%             7.25µs ± 2%    ~             (p=0.690 n=5+5)
BM_UFlat/3      [jpg_200          ]              130ns ± 0%              131ns ± 1%    ~             (p=0.556 n=4+5)
BM_UFlat/4      [pdf              ]            8.21µs ± 0%             8.24µs ± 1%    ~             (p=0.278 n=5+5)
BM_UFlat/5      [html4            ]             219µs ± 0%              220µs ± 0%  +0.45%          (p=0.008 n=5+5)
BM_UFlat/6      [txt1             ]             192µs ± 0%              190µs ± 0%  -0.86%          (p=0.008 n=5+5)
BM_UFlat/7      [txt2             ]             169µs ± 0%              168µs ± 0%  -0.54%          (p=0.008 n=5+5)
BM_UFlat/8      [txt3             ]             509µs ± 0%              505µs ± 0%  -0.66%          (p=0.008 n=5+5)
BM_UFlat/9      [txt4             ]             710µs ± 0%              702µs ± 0%  -1.14%          (p=0.008 n=5+5)
BM_UFlat/10     [pb               ]            38.2µs ± 0%             37.9µs ± 0%  -0.82%          (p=0.008 n=5+5)
BM_UFlat/11     [gaviota          ]             189µs ± 0%              189µs ± 0%    ~             (p=0.746 n=5+5)
BM_UFlat/12     [cp               ]            14.2µs ± 0%             14.2µs ± 1%    ~             (p=0.421 n=5+5)
BM_UFlat/13     [c                ]            7.29µs ± 0%             7.34µs ± 1%  +0.69%          (p=0.016 n=5+5)
BM_UFlat/14     [lsp              ]            2.27µs ± 0%             2.28µs ± 0%  +0.34%          (p=0.008 n=5+5)
BM_UFlat/15     [xls              ]             954µs ± 0%              900µs ± 0%  -5.67%          (p=0.008 n=5+5)
BM_UFlat/16     [xls_200          ]              213ns ± 1%              217ns ± 2%    ~             (p=0.056 n=5+5)
BM_UFlat/17     [bin              ]             276µs ± 0%              274µs ± 0%  -0.94%          (p=0.008 n=5+5)
BM_UFlat/18     [bin_200          ]              101ns ± 1%              101ns ± 1%    ~             (p=0.524 n=5+5)
BM_UFlat/19     [sum              ]            29.3µs ± 0%             27.3µs ± 0%  -6.98%          (p=0.008 n=5+5)
BM_UFlat/20     [man              ]            2.95µs ± 0%             2.95µs ± 0%    ~             (p=0.651 n=5+5)

For microbenchmarks, the overhead of allocating/deallocating should be
small (the relevant metadata for TCMalloc's PageMap will be in cache),
but this helps demonstrate that the refactoring does not adversely
impact performance.
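A sketch of what releasing a scratch buffer with sized deallocation can look like. The buffer remembers its size and returns it through `std::allocator<char>::deallocate`, which takes the element count. A standard library may forward that count to the C++14 sized `::operator delete` when sized deallocation is enabled. An allocator such as TCMalloc can then skip looking the size up in its own metadata (the PageMap mentioned above). `ScratchBuffer` is a hypothetical name, not Zippy's class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical scratch buffer that remembers its size so it can be released
// with a sized deallocation. std::allocator<char>::deallocate receives the
// size; standard libraries may forward it to the C++14 sized
// ::operator delete(void*, std::size_t) when sized deallocation is enabled.
class ScratchBuffer {
 public:
  explicit ScratchBuffer(size_t size)
      : size_(size), data_(std::allocator<char>().allocate(size)) {
    assert(size > 0);
  }
  ~ScratchBuffer() { std::allocator<char>().deallocate(data_, size_); }

  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;

  char* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  size_t size_;  // must match the size passed to allocate()
  char* data_;
};
```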
2019-01-04 19:07:40 -08:00
alkis 1b7466e143 Compute the wordmask instead of looking it up in a table.
Tested:
  name                                old speed               new speed               delta
  BM_UFlat/0      [html   ]           2.13GB/s ± 0%           2.46GB/s ± 0%  +15.70%         (p=0.000 n=10+8)
  BM_UFlat/1      [urls   ]           1.21GB/s ± 0%           1.20GB/s ± 0%   -1.49%         (p=0.000 n=9+10)
  BM_UFlat/2      [jpg    ]           17.1GB/s ± 1%           17.2GB/s ± 1%     ~           (p=0.120 n=11+11)
  BM_UFlat/3      [jpg_200]           1.55GB/s ± 0%           1.54GB/s ± 0%   -0.96%         (p=0.000 n=10+7)
  BM_UFlat/4      [pdf    ]           12.9GB/s ± 0%           12.6GB/s ± 0%   -1.98%         (p=0.000 n=11+9)
  BM_UFlat/5      [html4  ]           1.87GB/s ± 0%           1.87GB/s ± 0%   -0.06%        (p=0.033 n=11+11)
  BM_UFlat/6      [txt1   ]            816MB/s ± 0%            793MB/s ± 0%   -2.84%        (p=0.000 n=11+11)
  BM_UFlat/7      [txt2   ]            758MB/s ± 0%            737MB/s ± 0%   -2.77%        (p=0.000 n=11+11)
  BM_UFlat/8      [txt3   ]            865MB/s ± 0%            839MB/s ± 0%   -2.94%         (p=0.000 n=11+8)
  BM_UFlat/9      [txt4   ]            701MB/s ± 0%            679MB/s ± 0%   -3.11%        (p=0.000 n=11+10)
  BM_UFlat/10     [pb     ]           2.60GB/s ± 2%           3.07GB/s ± 0%  +17.81%        (p=0.000 n=11+11)
  BM_UFlat/11     [gaviota]           1.01GB/s ± 0%           0.97GB/s ± 0%   -3.83%        (p=0.000 n=11+10)
  BM_UFlat/12     [cp     ]           1.66GB/s ± 1%           1.73GB/s ± 1%   +4.32%        (p=0.000 n=11+11)
  BM_UFlat/13     [c      ]           1.52GB/s ± 1%           1.53GB/s ± 0%   +0.49%        (p=0.002 n=11+11)
  BM_UFlat/14     [lsp    ]           1.61GB/s ± 0%           1.64GB/s ± 0%   +2.10%        (p=0.000 n=10+11)
  BM_UFlat/15     [xls    ]           1.12GB/s ± 0%           1.08GB/s ± 0%   -3.95%         (p=0.000 n=11+7)
  BM_UFlat/16     [xls_200]            926MB/s ± 1%            935MB/s ± 1%     ~            (p=0.056 n=9+11)
  BM_UFlat/17     [bin    ]           1.89GB/s ± 0%           1.86GB/s ± 0%   -1.32%        (p=0.000 n=11+11)
  BM_UFlat/18     [bin_200]           1.96GB/s ± 0%           1.99GB/s ± 1%   +1.78%        (p=0.000 n=11+11)
  BM_UFlat/19     [sum    ]           1.32GB/s ± 0%           1.31GB/s ± 0%   -0.79%        (p=0.000 n=11+10)
  BM_UFlat/20     [man    ]           1.40GB/s ± 0%           1.43GB/s ± 0%   +2.51%         (p=0.000 n=9+10)
  BM_UValidate/0  [html   ]           2.95GB/s ± 1%           3.07GB/s ± 0%   +4.11%        (p=0.000 n=10+11)
  BM_UValidate/1  [urls   ]           1.57GB/s ± 0%           1.60GB/s ± 0%   +2.24%        (p=0.000 n=10+11)
  BM_UValidate/2  [jpg    ]            822GB/s ± 0%            850GB/s ± 0%   +3.42%        (p=0.000 n=10+11)
  BM_UValidate/3  [jpg_200]           2.01GB/s ± 0%           2.04GB/s ± 0%   +1.24%        (p=0.000 n=11+11)
  BM_UValidate/4  [pdf    ]           33.7GB/s ± 0%           35.9GB/s ± 1%   +6.51%        (p=0.000 n=10+11)
  BM_UIOVec/0     [html   ]            852MB/s ± 0%            852MB/s ± 0%     ~           (p=0.898 n=11+11)
  BM_UIOVec/1     [urls   ]            663MB/s ± 0%            652MB/s ± 0%   -1.61%        (p=0.000 n=11+11)
  BM_UIOVec/2     [jpg    ]           15.3GB/s ± 1%           15.3GB/s ± 2%     ~            (p=0.459 n=9+10)
  BM_UIOVec/3     [jpg_200]            652MB/s ± 0%            627MB/s ± 1%   -3.80%        (p=0.000 n=10+11)
  BM_UIOVec/4     [pdf    ]           8.80GB/s ± 1%           8.57GB/s ± 1%   -2.62%        (p=0.000 n=10+11)
  BM_UFlatSink/0  [html   ]           2.13GB/s ± 0%           2.46GB/s ± 0%  +15.63%        (p=0.000 n=11+11)
  BM_UFlatSink/1  [urls   ]           1.21GB/s ± 0%           1.20GB/s ± 0%   -1.42%        (p=0.000 n=11+10)
  BM_UFlatSink/2  [jpg    ]           17.1GB/s ± 2%           17.2GB/s ± 1%     ~            (p=0.175 n=11+9)
  BM_UFlatSink/3  [jpg_200]           1.52GB/s ± 1%           1.47GB/s ± 3%   -3.15%        (p=0.000 n=11+11)
  BM_UFlatSink/4  [pdf    ]           12.8GB/s ± 1%           12.6GB/s ± 1%   -1.76%        (p=0.000 n=11+11)
  BM_UFlatSink/5  [html4  ]           1.87GB/s ± 0%           1.87GB/s ± 0%   -0.19%        (p=0.000 n=11+10)
  BM_UFlatSink/6  [txt1   ]            816MB/s ± 0%            792MB/s ± 0%   -2.94%        (p=0.000 n=11+11)
  BM_UFlatSink/7  [txt2   ]            758MB/s ± 0%            736MB/s ± 0%   -2.83%        (p=0.000 n=11+11)
  BM_UFlatSink/8  [txt3   ]            865MB/s ± 0%            838MB/s ± 0%   -3.13%        (p=0.000 n=11+11)
  BM_UFlatSink/9  [txt4   ]            701MB/s ± 0%            678MB/s ± 0%   -3.20%        (p=0.000 n=11+11)
  BM_UFlatSink/10 [pb     ]           2.60GB/s ± 2%           3.07GB/s ± 0%  +18.27%        (p=0.000 n=11+10)
  BM_UFlatSink/11 [gaviota]           1.01GB/s ± 0%           0.97GB/s ± 0%   -3.90%        (p=0.000 n=11+11)
  BM_UFlatSink/12 [cp     ]           1.66GB/s ± 1%           1.73GB/s ± 1%   +4.62%        (p=0.000 n=11+10)
  BM_UFlatSink/13 [c      ]           1.52GB/s ± 0%           1.53GB/s ± 1%     ~            (p=0.180 n=9+11)
  BM_UFlatSink/14 [lsp    ]           1.61GB/s ± 0%           1.64GB/s ± 1%   +1.98%         (p=0.000 n=9+11)
  BM_UFlatSink/15 [xls    ]           1.12GB/s ± 0%           1.08GB/s ± 0%   -3.76%        (p=0.000 n=11+11)
  BM_UFlatSink/16 [xls_200]            909MB/s ± 2%            924MB/s ± 1%   +1.62%        (p=0.000 n=11+11)
  BM_UFlatSink/17 [bin    ]           1.88GB/s ± 0%           1.86GB/s ± 0%   -1.18%         (p=0.000 n=9+11)
  BM_UFlatSink/18 [bin_200]           1.94GB/s ± 2%           1.94GB/s ± 1%     ~           (p=0.090 n=11+11)
  BM_UFlatSink/19 [sum    ]           1.32GB/s ± 0%           1.31GB/s ± 0%   -0.76%        (p=0.000 n=11+11)
  BM_UFlatSink/20 [man    ]           1.39GB/s ± 2%           1.43GB/s ± 0%   +2.75%        (p=0.000 n=11+10)

  Assembly before:

*	44 8b 5c 85 a0       	mov    -0x60(%rbp,%rax,4),%r11d
	45 23 5d 00          	and    0x0(%r13),%r11d
	89 d6                	mov    %edx,%esi
	81 e6 00 07 00 00    	and    $0x700,%esi

  Assembly after:

*	89 c1                	mov    %eax,%ecx
*	c0 e1 03             	shl    $0x3,%cl
*	bf ff ff ff ff       	mov    $0xffffffff,%edi
*	48 d3 e7             	shl    %cl,%rdi
*	f7 d7                	not    %edi
	41 23 7d 00          	and    0x0(%r13),%edi
	41 89 d3             	mov    %edx,%r11d
	41 81 e3 00 07 00 00 	and    $0x700,%r11d
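The new instruction sequence computes the mask instead of loading it. It corresponds roughly to the C++ below, a sketch with a hypothetical function name that keeps the low `n` bytes of a 32-bit word (0 <= n <= 4). The shift is done on a 64-bit value, mirroring the `shl %cl,%rdi` above. This sidesteps the undefined behavior of shifting a 32-bit value by 32 bits when `n == 4`.

```cpp
#include <cassert>
#include <cstdint>

// Table-based mask (the old approach): kWordmask[n] keeps the low n bytes.
static const uint32_t kWordmask[] = {0u, 0xffu, 0xffffu, 0xffffffu,
                                     0xffffffffu};

// Computed mask (the new approach); hypothetical name. Shifting a 64-bit
// all-ones-low-word value left by 8*n and inverting it leaves exactly the
// low n bytes set, including n == 4, where a 32-bit shift would be UB.
inline uint32_t LowBytes(uint32_t v, int n) {
  assert(n >= 0 && n <= 4);
  const uint64_t mask = 0xffffffffu;
  return static_cast<uint32_t>(v & ~(mask << (8 * n)));
}
```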
2019-01-04 19:07:28 -08:00
Caleb Mazalevskis a866f7181c
Update README to use HTTPS instead of HTTP.
HTTPS is currently available for all the HTTP links included in the README.
As such, using HTTPS instead of HTTP for those links may be preferable.
2018-12-14 17:12:32 +08:00
costan ea660b57d6 Fix unused private field warning in NDEBUG builds. 2018-08-17 14:31:23 -07:00
costan 7fefd231a1 C++11 guarantees <cstddef> and <cstdint>.
The build configuration can be cleaned up a bit.
2018-08-16 11:36:45 -07:00
costan db082d2cd6 Remove GCC on OSX from the Travis CI matrix. 2018-08-16 11:36:19 -07:00
costan ad82620f6f Move pshufb_fill_patterns from snappy-internal.h to snappy.cc.
The array of constants is only used in the SSSE3 fast-path in IncrementalCopy.
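One way to picture such a fill-pattern table: each row assigns lane `i` the byte index `i % p`, so a PSHUFB with that row turns the first `p` bytes of a register into a repeating pattern. The sketch below builds one row and applies a scalar model of PSHUFB. `MakeFillPattern` and `ShuffleBytes` are hypothetical names, and the row layout is an assumption consistent with the IncrementalCopy description, not a copy of snappy's array.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Builds the shuffle mask for a pattern of pattern_size bytes (1..7):
// lane i selects source byte i % pattern_size.
inline Bytes16 MakeFillPattern(int pattern_size) {
  assert(pattern_size >= 1 && pattern_size <= 7);
  Bytes16 mask{};
  for (int i = 0; i < 16; ++i) {
    mask[i] = static_cast<uint8_t>(i % pattern_size);
  }
  return mask;
}

// Scalar model of PSHUFB: lane i receives src[mask[i] & 0x0f], or zero when
// the high bit of mask[i] is set.
inline Bytes16 ShuffleBytes(const Bytes16& src, const Bytes16& mask) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i) {
    out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
  }
  return out;
}
```

For example, shuffling a register whose first bytes are `abc` with the row for `p = 3` yields `abcabcabcabcabca`.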
2018-08-09 12:08:12 -07:00
costan 73c31e824c Fix Visual Studio build.
Commit 8f469d97e2 introduced SSSE3 fast
paths that are gated by __SSE3__ macro checks and the <x86intrin.h>
header, neither of which exists in Visual Studio. This commit adds logic
for detecting SSSE3 compiler support that works for all compilers
supported by the open source release.

The commit also replaces the header with <tmmintrin.h>, which only
defines intrinsics supported by SSSE3 and below. This should help flag
any use of SIMD instructions that require more advanced SSE support, so
the uses can be gated by checks that also work in the open source
release.

Last, this commit requires C++11 support for the open source build. This is
needed by the alignas specifier, which was also introduced in commit
8f469d97e2.
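A sketch of this kind of gating, under stated assumptions. The macro name `SKETCH_HAVE_SSSE3` is hypothetical. GCC and Clang define `__SSSE3__` when SSSE3 code generation is enabled (for example with `-mssse3`). Visual Studio does not define that macro, so the sketch treats `__AVX__` (set by `/arch:AVX`) as implying SSSE3 support. Builds without either macro take a scalar path that produces the same bytes, and `alignas` is what requires C++11.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical feature macro. GCC and Clang define __SSSE3__ when SSSE3 code
// generation is enabled; Visual Studio does not, so __AVX__ (/arch:AVX) is
// treated as implying SSSE3 support.
#if defined(__SSSE3__) || defined(__AVX__)
#define SKETCH_HAVE_SSSE3 1
#include <tmmintrin.h>  // declares intrinsics up to SSSE3 only
#else
#define SKETCH_HAVE_SSSE3 0
#endif

// Writes 16 bytes to out: the first pattern_size bytes of src, repeated.
// Requires 1 <= pattern_size <= 8 and 8 readable bytes at src.
inline void FillPattern(const char* src, int pattern_size, char* out) {
  assert(pattern_size >= 1 && pattern_size <= 8);
  // alignas (C++11) makes the aligned 16-byte load below valid.
  alignas(16) uint8_t mask[16];
  for (int i = 0; i < 16; ++i) mask[i] = static_cast<uint8_t>(i % pattern_size);
#if SKETCH_HAVE_SSSE3
  const __m128i m = _mm_load_si128(reinterpret_cast<const __m128i*>(mask));
  const __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_shuffle_epi8(bytes, m));
#else
  for (int i = 0; i < 16; ++i) out[i] = src[mask[i]];  // scalar fallback
#endif
}
```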
2018-08-08 22:25:14 -07:00
jefflim 27ff0af12a Improve performance of zippy decompression to IOVecs by up to almost 50%
1) Simplify loop condition for small pattern IncrementalCopy
2) Use pointers rather than indices to track current iovec.
3) Use fast IncrementalCopy
4) Bypass Append check from within AppendFromSelf

While this code greatly improves the performance of ZippyIOVecWriter, a
bigger question is whether IOVec writing should be improved, or removed.
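Item 2 can be pictured with a small writer that scatters output across an `iovec` array and keeps a pointer to the current entry instead of an index. This is a hypothetical sketch (`IOVecWriterSketch` is not snappy's class). It covers plain appends only, not the copy-from-earlier-output path, and assumes POSIX `<sys/uio.h>`.

```cpp
#include <sys/uio.h>  // struct iovec (POSIX)

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: appends bytes across an array of iovecs, tracking the
// current iovec with a pointer (curr_) instead of an index into the array.
class IOVecWriterSketch {
 public:
  IOVecWriterSketch(const struct iovec* iov, size_t iov_count)
      : curr_(iov), end_(iov + iov_count), offset_(0) {
    assert(iov != nullptr || iov_count == 0);
  }

  // Appends len bytes. Returns false if the iovecs run out of space; in
  // that case a prefix of data may already have been written.
  bool Append(const char* data, size_t len) {
    while (len > 0) {
      if (curr_ == end_) return false;
      const size_t room = curr_->iov_len - offset_;
      if (room == 0) {  // current iovec is full: move to the next one
        ++curr_;
        offset_ = 0;
        continue;
      }
      const size_t n = len < room ? len : room;
      std::memcpy(static_cast<char*>(curr_->iov_base) + offset_, data, n);
      offset_ += n;
      data += n;
      len -= n;
    }
    return true;
  }

 private:
  const struct iovec* curr_;  // iovec currently being filled
  const struct iovec* end_;   // one past the last iovec
  size_t offset_;             // bytes already written into *curr_
};
```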

Perf tests:

name                                 old speed      new speed      delta
BM_UFlat/0      [html             ]  2.13GB/s ± 0%  2.14GB/s ± 1%     ~
BM_UFlat/1      [urls             ]  1.22GB/s ± 0%  1.24GB/s ± 0%   +1.87%
BM_UFlat/2      [jpg              ]  17.2GB/s ± 1%  17.1GB/s ± 0%     ~
BM_UFlat/3      [jpg_200          ]  1.55GB/s ± 0%  1.53GB/s ± 2%     ~
BM_UFlat/4      [pdf              ]  12.8GB/s ± 1%  12.7GB/s ± 2%   -0.36%
BM_UFlat/5      [html4            ]  1.89GB/s ± 0%  1.90GB/s ± 1%     ~
BM_UFlat/6      [txt1             ]   811MB/s ± 0%   829MB/s ± 1%   +2.24%
BM_UFlat/7      [txt2             ]   756MB/s ± 0%   774MB/s ± 1%   +2.41%
BM_UFlat/8      [txt3             ]   860MB/s ± 0%   879MB/s ± 1%   +2.16%
BM_UFlat/9      [txt4             ]   699MB/s ± 0%   715MB/s ± 1%   +2.31%
BM_UFlat/10     [pb               ]  2.64GB/s ± 0%  2.65GB/s ± 1%     ~
BM_UFlat/11     [gaviota          ]  1.00GB/s ± 0%  0.99GB/s ± 2%     ~
BM_UFlat/12     [cp               ]  1.66GB/s ± 1%  1.66GB/s ± 2%     ~
BM_UFlat/13     [c                ]  1.53GB/s ± 0%  1.47GB/s ± 5%   -3.97%
BM_UFlat/14     [lsp              ]  1.60GB/s ± 1%  1.55GB/s ± 5%   -3.41%
BM_UFlat/15     [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.93%
BM_UFlat/16     [xls_200          ]   918MB/s ± 2%   929MB/s ± 1%   +1.15%
BM_UFlat/17     [bin              ]  1.86GB/s ± 0%  1.89GB/s ± 1%   +1.61%
BM_UFlat/18     [bin_200          ]  1.90GB/s ± 1%  1.97GB/s ± 1%   +3.67%
BM_UFlat/19     [sum              ]  1.32GB/s ± 0%  1.33GB/s ± 1%     ~
BM_UFlat/20     [man              ]  1.39GB/s ± 0%  1.36GB/s ± 3%     ~
BM_UValidate/0  [html             ]  2.85GB/s ± 3%  2.90GB/s ± 0%     ~
BM_UValidate/1  [urls             ]  1.57GB/s ± 0%  1.56GB/s ± 0%   -0.20%
BM_UValidate/2  [jpg              ]   824GB/s ± 0%   825GB/s ± 0%   +0.11%
BM_UValidate/3  [jpg_200          ]  2.01GB/s ± 0%  2.02GB/s ± 0%   +0.10%
BM_UValidate/4  [pdf              ]  30.4GB/s ±11%  33.5GB/s ± 0%     ~
BM_UIOVec/0     [html             ]   604MB/s ± 0%   856MB/s ± 0%  +41.70%
BM_UIOVec/1     [urls             ]   440MB/s ± 0%   660MB/s ± 0%  +49.91%
BM_UIOVec/2     [jpg              ]  15.1GB/s ± 1%  15.3GB/s ± 1%   +1.22%
BM_UIOVec/3     [jpg_200          ]   567MB/s ± 1%   629MB/s ± 0%  +10.89%
BM_UIOVec/4     [pdf              ]  7.16GB/s ± 2%  8.56GB/s ± 1%  +19.64%
BM_UFlatSink/0  [html             ]  2.13GB/s ± 0%  2.16GB/s ± 0%   +1.47%
BM_UFlatSink/1  [urls             ]  1.22GB/s ± 0%  1.25GB/s ± 0%   +2.18%
BM_UFlatSink/2  [jpg              ]  17.1GB/s ± 2%  17.1GB/s ± 2%     ~
BM_UFlatSink/3  [jpg_200          ]  1.51GB/s ± 1%  1.53GB/s ± 2%   +1.11%
BM_UFlatSink/4  [pdf              ]  12.7GB/s ± 2%  12.8GB/s ± 1%   +0.67%
BM_UFlatSink/5  [html4            ]  1.90GB/s ± 0%  1.92GB/s ± 0%   +1.31%
BM_UFlatSink/6  [txt1             ]   810MB/s ± 0%   835MB/s ± 0%   +3.04%
BM_UFlatSink/7  [txt2             ]   755MB/s ± 0%   779MB/s ± 0%   +3.19%
BM_UFlatSink/8  [txt3             ]   859MB/s ± 0%   884MB/s ± 0%   +2.86%
BM_UFlatSink/9  [txt4             ]   698MB/s ± 0%   718MB/s ± 0%   +2.96%
BM_UFlatSink/10 [pb               ]  2.64GB/s ± 0%  2.67GB/s ± 0%   +1.16%
BM_UFlatSink/11 [gaviota          ]  1.00GB/s ± 0%  1.01GB/s ± 0%   +1.04%
BM_UFlatSink/12 [cp               ]  1.66GB/s ± 1%  1.68GB/s ± 1%   +0.83%
BM_UFlatSink/13 [c                ]  1.52GB/s ± 1%  1.53GB/s ± 0%   +0.38%
BM_UFlatSink/14 [lsp              ]  1.60GB/s ± 1%  1.61GB/s ± 0%   +0.91%
BM_UFlatSink/15 [xls              ]  1.12GB/s ± 0%  1.15GB/s ± 0%   +1.96%
BM_UFlatSink/16 [xls_200          ]   906MB/s ± 3%   920MB/s ± 1%   +1.55%
BM_UFlatSink/17 [bin              ]  1.86GB/s ± 0%  1.90GB/s ± 0%   +2.15%
BM_UFlatSink/18 [bin_200          ]  1.85GB/s ± 2%  1.92GB/s ± 2%   +4.01%
BM_UFlatSink/19 [sum              ]  1.32GB/s ± 1%  1.35GB/s ± 0%   +2.23%
BM_UFlatSink/20 [man              ]  1.39GB/s ± 1%  1.40GB/s ± 0%   +1.12%
BM_ZFlat/0      [html (22.31 %)   ]   800MB/s ± 0%   793MB/s ± 0%   -0.95%
BM_ZFlat/1      [urls (47.78 %)   ]   423MB/s ± 0%   424MB/s ± 0%   +0.11%
BM_ZFlat/2      [jpg (99.95 %)    ]  12.0GB/s ± 2%  12.0GB/s ± 4%     ~
BM_ZFlat/3      [jpg_200 (73.00 %)]   592MB/s ± 3%   594MB/s ± 2%     ~
BM_ZFlat/4      [pdf (83.30 %)    ]  7.26GB/s ± 1%  7.23GB/s ± 2%   -0.49%
BM_ZFlat/5      [html4 (22.52 %)  ]   738MB/s ± 0%   739MB/s ± 0%   +0.17%
BM_ZFlat/6      [txt1 (57.88 %)   ]   286MB/s ± 0%   285MB/s ± 0%   -0.09%
BM_ZFlat/7      [txt2 (61.91 %)   ]   264MB/s ± 0%   264MB/s ± 0%   +0.08%
BM_ZFlat/8      [txt3 (54.99 %)   ]   300MB/s ± 0%   300MB/s ± 0%     ~
BM_ZFlat/9      [txt4 (66.26 %)   ]   248MB/s ± 0%   247MB/s ± 0%   -0.20%
BM_ZFlat/10     [pb (19.68 %)     ]  1.04GB/s ± 0%  1.03GB/s ± 0%   -1.17%
BM_ZFlat/11     [gaviota (37.72 %)]   451MB/s ± 0%   450MB/s ± 0%   -0.35%
BM_ZFlat/12     [cp (48.12 %)     ]   543MB/s ± 0%   538MB/s ± 0%   -1.04%
BM_ZFlat/13     [c (42.47 %)      ]   638MB/s ± 1%   643MB/s ± 0%   +0.68%
BM_ZFlat/14     [lsp (48.37 %)    ]   686MB/s ± 0%   691MB/s ± 1%   +0.76%
BM_ZFlat/15     [xls (41.23 %)    ]   636MB/s ± 0%   633MB/s ± 0%   -0.52%
BM_ZFlat/16     [xls_200 (78.00 %)]   523MB/s ± 2%   520MB/s ± 2%   -0.56%
BM_ZFlat/17     [bin (18.11 %)    ]  1.01GB/s ± 0%  1.01GB/s ± 0%   +0.50%
BM_ZFlat/18     [bin_200 (7.50 %) ]  2.45GB/s ± 1%  2.44GB/s ± 1%   -0.54%
BM_ZFlat/19     [sum (48.96 %)    ]   487MB/s ± 0%   478MB/s ± 0%   -1.89%
BM_ZFlat/20     [man (59.21 %)    ]   567MB/s ± 1%   566MB/s ± 1%     ~

The BM_UFlat/13 and BM_UFlat/14 results showed high variance, so I reran them:

name               old speed      new speed      delta
BM_UFlat/13 [c  ]  1.53GB/s ± 0%  1.53GB/s ± 1%    ~
BM_UFlat/14 [lsp]  1.61GB/s ± 1%  1.61GB/s ± 1%  +0.25%
2018-08-07 23:41:17 -07:00
costan 4ffb0e62c5 Update Travis CI configuration. 2018-08-07 21:33:14 -07:00
atdt be490ef9ec Test for SSE3 support before using pshufb. 2018-08-04 18:51:13 -07:00
atdt 8f469d97e2 Avoid store-forwarding stalls in Zippy's IncrementalCopy
NEW: Annotate `pattern` as initialized, for MSan.

Snappy's IncrementalCopy routine optimizes for speed by reading and writing
memory in blocks of eight or sixteen bytes. If the gap between the source
and destination pointers is smaller than eight bytes, snappy's strategy is
to expand the gap by issuing a series of partly overlapping eight-byte
loads and stores. Because the range of each load partly overlaps that of the
store that preceded it, the data in the store buffer cannot be forwarded to
the load, and the load stalls while it waits for the store to retire. This is
called a store-forwarding stall.
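The gap-expanding strategy can be sketched as follows. The sketch assumes a pattern shorter than eight bytes and at least 8 writable bytes of slack past `op_limit`, and its names are hypothetical. Each 8-byte copy writes one valid copy of the pattern followed by junk, and the gap `op - src` doubles each round. Each load in that first loop re-reads bytes the previous store just wrote, which is where the store-forwarding stall described above comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies 8 bytes as one unaligned 64-bit load followed by one store. The
// load completes before the store, so overlapping ranges are allowed.
inline void Copy64(const char* src, char* dst) {
  uint64_t v;
  std::memcpy(&v, src, sizeof(v));
  std::memcpy(dst, &v, sizeof(v));
}

// Hypothetical sketch: copy op_limit - op bytes, where the source is the
// pattern of length op - src that ends at op. Writes up to 7 bytes past
// op_limit.
inline char* PatternDoublingCopy(const char* src, char* op, char* op_limit) {
  size_t pattern_size = static_cast<size_t>(op - src);
  assert(pattern_size >= 1);
  // Phase 1: while the gap is under 8 bytes, each Copy64 stores one valid
  // copy of the pattern (plus junk), and the gap op - src doubles.
  while (pattern_size < 8 && op < op_limit) {
    Copy64(src, op);
    op += pattern_size;
    pattern_size *= 2;
  }
  // Phase 2: the gap is now >= 8, so each load reads only final bytes and
  // the copy proceeds in plain 8-byte steps.
  while (op < op_limit) {
    Copy64(src, op);
    src += 8;
    op += 8;
  }
  return op_limit;
}
```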

We can use fewer loads and avoid most of the stalls by loading the first
eight bytes into a 128-bit XMM register, then using PSHUFB to permute the
register's contents in place into the desired repeating sequence of bytes.
When falling back to IncrementalCopySlow, we use memset if the pattern size is 1.
This eliminates around 60% of the stalls.
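The following is a portable model of the 16-byte approach, offered as a sketch rather than the commit's code. It builds the repeated pattern once in a local 16-byte buffer, standing in for the XMM register that PSHUFB fills. It then advances by the largest multiple of the pattern size that fits in 16 bytes, so no load depends on a recent store. `FillFromPattern` is a hypothetical name. The single-byte `memset` case is folded into the same function for brevity, whereas the commit applies it in the `IncrementalCopySlow` fallback. The buffer is assumed to have at least 15 writable bytes past `op_limit`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: extends the pattern src[0 .. op - src) from op up to
// op_limit, where 1 <= op - src < 8. Writes up to 15 bytes past op_limit.
inline char* FillFromPattern(const char* src, char* op, char* op_limit) {
  const size_t pattern_size = static_cast<size_t>(op - src);
  assert(pattern_size >= 1 && pattern_size < 8);
  if (pattern_size == 1) {
    // A one-byte pattern is just a run of that byte.
    std::memset(op, *src, static_cast<size_t>(op_limit - op));
    return op_limit;
  }
  // Build 16 bytes of repeated pattern once; this local array plays the role
  // of the XMM register filled by PSHUFB.
  char pattern[16];
  for (size_t i = 0; i < 16; ++i) pattern[i] = src[i % pattern_size];
  // Step by the largest multiple of pattern_size that fits in 16 bytes so
  // each 16-byte store starts in phase with the pattern.
  const size_t step = pattern_size * (16 / pattern_size);
  while (op < op_limit) {
    std::memcpy(op, pattern, sizeof(pattern));
    op += step;
  }
  return op_limit;
}
```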

name                       old time/op    new time/op    delta
BM_UFlat/0 [html]        48.6µs ± 0%    48.2µs ± 0%   -0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]         589µs ± 0%     576µs ± 0%   -2.17%        (p=0.000 n=19+18)
BM_UFlat/2 [jpg]         7.12µs ± 0%    7.10µs ± 0%     ~           (p=0.071 n=19+18)
BM_UFlat/3 [jpg_200]      162ns ± 0%     151ns ± 0%   -7.06%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]         8.25µs ± 0%    8.19µs ± 0%   -0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]        218µs ± 0%     218µs ± 0%   +0.09%        (p=0.000 n=17+18)
BM_UFlat/6 [txt1]         191µs ± 0%     189µs ± 0%   -1.12%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]         168µs ± 0%     167µs ± 0%   -1.01%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]         502µs ± 0%     499µs ± 0%   -0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]         704µs ± 0%     695µs ± 0%   -1.26%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]         45.6µs ± 0%    44.2µs ± 0%   -3.13%        (p=0.000 n=19+15)
BM_UFlat/11 [gaviota]     188µs ± 0%     194µs ± 0%   +3.06%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]         15.1µs ± 2%    14.7µs ± 1%   -2.09%        (p=0.000 n=18+18)
BM_UFlat/13 [c]          7.38µs ± 0%    7.36µs ± 0%   -0.28%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]        2.31µs ± 0%    2.37µs ± 0%   +2.64%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]         984µs ± 0%     909µs ± 0%   -7.59%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]     215ns ± 0%     217ns ± 0%   +0.71%        (p=0.000 n=19+15)
BM_UFlat/17 [bin]         289µs ± 0%     287µs ± 0%   -0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]     161ns ± 0%     116ns ± 0%  -28.09%        (p=0.000 n=19+16)
BM_UFlat/19 [sum]        31.9µs ± 0%    29.2µs ± 0%   -8.37%        (p=0.000 n=19+18)
BM_UFlat/20 [man]        3.13µs ± 1%    3.07µs ± 0%   -1.79%        (p=0.000 n=19+18)

name                       old allocs/op  new allocs/op  delta
BM_UFlat/0 [html]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/1 [urls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/2 [jpg]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/3 [jpg_200]      0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/4 [pdf]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/5 [html4]        0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/6 [txt1]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/7 [txt2]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/8 [txt3]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/9 [txt4]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/10 [pb]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/11 [gaviota]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/12 [cp]          0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/13 [c]           0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/14 [lsp]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/15 [xls]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/16 [xls_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/17 [bin]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/18 [bin_200]     0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/19 [sum]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
BM_UFlat/20 [man]         0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)

name                       old speed      new speed      delta
BM_UFlat/0 [html]      2.11GB/s ± 0%  2.13GB/s ± 0%   +0.92%        (p=0.000 n=19+18)
BM_UFlat/1 [urls]      1.19GB/s ± 0%  1.22GB/s ± 0%   +2.22%        (p=0.000 n=16+17)
BM_UFlat/2 [jpg]       17.3GB/s ± 0%  17.3GB/s ± 0%     ~           (p=0.074 n=19+18)
BM_UFlat/3 [jpg_200]   1.23GB/s ± 0%  1.33GB/s ± 0%   +7.58%        (p=0.000 n=19+18)
BM_UFlat/4 [pdf]       12.4GB/s ± 0%  12.5GB/s ± 0%   +0.74%        (p=0.000 n=19+18)
BM_UFlat/5 [html4]     1.88GB/s ± 0%  1.88GB/s ± 0%   -0.09%        (p=0.000 n=18+18)
BM_UFlat/6 [txt1]       798MB/s ± 0%   807MB/s ± 0%   +1.13%        (p=0.000 n=19+18)
BM_UFlat/7 [txt2]       743MB/s ± 0%   751MB/s ± 0%   +1.02%        (p=0.000 n=19+18)
BM_UFlat/8 [txt3]       850MB/s ± 0%   855MB/s ± 0%   +0.52%        (p=0.000 n=19+18)
BM_UFlat/9 [txt4]       684MB/s ± 0%   693MB/s ± 0%   +1.28%        (p=0.000 n=19+18)
BM_UFlat/10 [pb]       2.60GB/s ± 0%  2.69GB/s ± 0%   +3.25%        (p=0.000 n=19+16)
BM_UFlat/11 [gaviota]   979MB/s ± 0%   950MB/s ± 0%   -2.97%        (p=0.000 n=15+18)
BM_UFlat/12 [cp]       1.63GB/s ± 2%  1.67GB/s ± 1%   +2.13%        (p=0.000 n=18+18)
BM_UFlat/13 [c]        1.51GB/s ± 0%  1.52GB/s ± 0%   +0.29%        (p=0.000 n=16+18)
BM_UFlat/14 [lsp]      1.61GB/s ± 1%  1.57GB/s ± 0%   -2.57%        (p=0.000 n=19+18)
BM_UFlat/15 [xls]      1.05GB/s ± 0%  1.13GB/s ± 0%   +8.22%        (p=0.000 n=19+18)
BM_UFlat/16 [xls_200]   928MB/s ± 0%   921MB/s ± 0%   -0.81%        (p=0.000 n=19+17)
BM_UFlat/17 [bin]      1.78GB/s ± 0%  1.79GB/s ± 0%   +0.71%        (p=0.000 n=19+18)
BM_UFlat/18 [bin_200]  1.24GB/s ± 0%  1.72GB/s ± 0%  +38.92%        (p=0.000 n=19+18)
BM_UFlat/19 [sum]      1.20GB/s ± 0%  1.31GB/s ± 0%   +9.15%        (p=0.000 n=19+18)
BM_UFlat/20 [man]      1.35GB/s ± 1%  1.38GB/s ± 0%   +1.84%        (p=0.000 n=19+18)
2018-08-04 18:51:07 -07:00
costan 4f7bd2dbfd Update CI configurations.
Bump GCC and Clang on Travis and remove Visual Studio 2015 from AppVeyor.
2018-03-09 09:02:34 -08:00
jgorbe ca37ab7fb9 Ensure DecompressAllTags starts on a 32-byte boundary + 16 bytes.
First of all, I'm sorry about this ugly hack. I hope the following long
explanation is enough to justify it.

We have observed that, in some conditions, the results for dataset number 10
(pb) in the zippy benchmark can show a >20% regression on Skylake CPUs.

To diagnose this, we profiled the benchmark looking at hot functions
(99% of the time is spent in DecompressAllTags), then looked at the generated
code to see if there was any difference. To rule out a minor difference we
observed in register allocation, we replaced zippy.cc with a pre-built assembly
file so it was the same in both variants, and we were still able to reproduce
the regression.

After ruling out the compiler as the cause, we dug a bit further
and noticed that the alignment of the function in the final binary was
different. Both were aligned to a 16-byte boundary, but the slower one was also
(by chance) aligned to a 32-byte boundary. A regression caused by alignment
differences would explain why I could reproduce it consistently on the same CitC
client, but not on others: slight differences in the sources can cause the
resulting binary to have a different layout.

Here are some detailed benchmark results before/after the fix. Note how fixing
the alignment makes the difference between baseline and experiment go away, but
regular 32-byte alignment puts both variants in the same ballpark as the
original regression:

Original (note BM_UCord/10 and BM_UDataBuffer/10, at roughly -24%):

  BASELINE
  BM_UCord/10                    2938           2932          24194 3.767GB/s  pb
  BM_UDataBuffer/10              3008           3004          23316 3.677GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3797           3789          18512 2.915GB/s  pb
  BM_UDataBuffer/10              4024           4016          17543 2.750GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary:

  BASELINE
  BM_UCord/10                    3872           3862          18035 2.860GB/s  pb
  BM_UDataBuffer/10              4010           3998          17591 2.763GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3884           3876          18126 2.850GB/s  pb
  BM_UDataBuffer/10              4037           4027          17199 2.743GB/s  pb

Aligning DecompressAllTags to a 32-byte boundary + 16 bytes (this patch):

  BASELINE
  BM_UCord/10                    3103           3095          22642 3.569GB/s  pb
  BM_UDataBuffer/10              3186           3177          21947 3.476GB/s  pb

  EXPERIMENT
  BM_UCord/10                    3104           3095          22632 3.569GB/s  pb
  BM_UDataBuffer/10              3167           3159          22076 3.496GB/s  pb

This change forces the "good" alignment for DecompressAllTags which, if
anything, should make benchmark results more stable (and maybe we'll improve
some unlucky application!).
2018-02-17 00:47:18 -08:00
scrubbed 15a2804cd2 Fix an incorrect analysis / comment in the "pattern doubling" code.
This should have a minuscule positive effect on performance; the
main purpose of the CL is just to fix the incorrect comment.
2018-02-17 00:46:31 -08:00
costan e69d9f8806 Fix Travis CI configuration for OSX. 2018-01-04 15:27:36 -08:00
chandlerc 4aba5426d4 Rework a very hot, very sensitive part of snappy to reduce the number of
instructions and the number of dynamic branches, and to avoid a particular
loop structure that LLVM has a very hard time optimizing in this
particular case.

The code being changed is part of the hottest path for snappy
decompression. In the benchmarks for decompressing protocol buffers,
this has proven to be amazingly sensitive to the slightest changes in
code layout. For example, we previously added a '.p2align 5' assembly
directive to the code, which essentially padded the start of the loop
out to a 32-byte boundary within the function. Merely by doing this we
saw significant performance improvements.

As a consequence, several of the compiler's typically reasonable
optimizations can have surprisingly bad effects. Loop unrolling is a
primary culprit, but in the next LLVM release we are seeing an issue due
to loop rotation. While some of the problems caused by the newly
triggered loop rotation in LLVM can be mitigated with ongoing work on
LLVM's code layout optimizations (specifically, loop header cloning),
that is a fairly long term project. And even minor fluctuations in how
that subsequent optimization is performed may prevent gaining the
performance back.

For now, we need some way to unblock the next LLVM release which
contains a generic improvement to the LLVM loop optimizer that enables
loop rotation in more places, but uncovers this sensitivity and weakness
in a particular case.

This CL restructures the loop to have a simpler structure. Specifically,
we eagerly test what the terminal condition will be and provide two
versions of the copy loop that use a single loop predicate.
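A minimal sketch of the "test once, then run a single-predicate loop" idea, assuming an overlapping copy of `len` bytes from `op - offset` to `op` with `offset >= 8` and a writable buffer ending at `buf_limit`. The function names and signature are hypothetical and this is not the code from this CL. The fast loop moves 16 bytes per iteration as two 8-byte copies (the same shape as the loop analyzed with IACA in this message) and may write up to 15 bytes past `op + len`, so it is only chosen when the buffer has that much slop; otherwise an exact byte-at-a-time loop runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// 8-byte copy; memcpy keeps unaligned access well-defined.
inline void Copy8(const char* src, char* dst) { std::memcpy(dst, src, 8); }

// Hypothetical sketch: copy `len` bytes from `op - offset` to `op`, where the
// ranges may overlap. offset >= 8 guarantees every 8-byte load reads bytes
// that were already written. Returns op + len.
char* IncrementalCopySketch(char* op, std::size_t offset, std::size_t len,
                            char* buf_limit) {
  assert(offset >= 8);
  assert(len <= static_cast<std::size_t>(buf_limit - op));
  const char* src = op - offset;
  char* const op_end = op + len;
  // Test the terminal condition eagerly: the fast loop may overrun op_end by
  // up to 15 bytes, so it is only safe with that much slop in the buffer.
  if (static_cast<std::size_t>(buf_limit - op_end) >= 15) {
    // Hot path: a single predicate (op < op_end), 16 bytes per iteration.
    while (op < op_end) {
      Copy8(src, op);
      Copy8(src + 8, op + 8);
      src += 16;
      op += 16;
    }
  } else {
    // Cold path: exact byte-at-a-time copy, also a single predicate.
    while (op < op_end) *op++ = *src++;
  }
  return op_end;
}
```

Splitting the two cases up front means neither loop needs a second exit test, which is the structural simplification the message describes.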

The comments in the source code and benchmarks indicate that only one of
these two cases is actually hot: we expect to generally have enough slop
in the buffer. That in turn allows us to generate a much simpler branch
and loop structure for the hot path (especially for the protocol buffer
decompression benchmark).

However, structuring even this simple loop in a way that doesn't trigger
some other performance bubble (often a more severe one) is quite
challenging. We have to carefully manage the variables used in the loop
and the addressing pattern. We should teach LLVM how to do this
reliably, but that too is a *much* more significant undertaking, and it
is extremely rare for a loop to matter this much. The desired structure
of the loop, as shown by IACA's analysis for the Broadwell
micro-architecture (HSW and SKX are similar):

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    |           |     | 1.0   1.0 |           |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1-0x8]
|   2^   |           |     |           | 0.4       | 1.0 |     |     | 0.6 |    | mov qword ptr [rdi], rcx
|   1    |           |     |           | 1.0   1.0 |     |     |     |     |    | mov rcx, qword ptr [rdi+rdx*1]
|   2^   |           |     | 0.3       |           | 1.0 |     |     | 0.7 |    | mov qword ptr [rdi+0x8], rcx
|   1    | 0.5       |     |           |           |     | 0.5 |     |     |    | add rdi, 0x10
|   1    | 0.2       |     |           |           |     |     | 0.8 |     |    | cmp rdi, rax
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xffffffffffffffe9

Specifically, arranging the addressing modes of the stores so that
micro-op fusion occurs (indicated by the `^` on the `2` micro-op count)
is important to achieve good throughput for this loop.

The other thing necessary to make this change effective is to remove our
previous hack using `.p2align 5` to pad out the main decompression loop,
and to forcibly disable loop unrolling for critical loops. Because this
change simplifies the loop structure, more unrolling opportunities show
up. Also, the next LLVM release's generic loop optimization improvements
allow unrolling in more places, requiring still more disabling of
unrolling in this change.  Perhaps most surprising of these is that we
must disable loop unrolling in the *slow* path. While unrolling there
seems pointless, it should also be harmless.  This cold code is laid out
very far away from all of the hot code. All the samples shown in a
profile of the benchmark occur before this loop in the function. And
yet, if the loop gets unrolled (which seems to only happen reliably with
the next LLVM release) we see a nearly 20% regression in decompressing
protocol buffers!

With the current release of LLVM, we still observe some regression from
this source change, but it is fairly small (5% on decompressing protocol
buffers, less elsewhere). And with the next LLVM release it drops to
under 1% even in that case. Meanwhile, without this change, the next
release of LLVM will regress decompressing protocol buffers by more than
10%.
2018-01-04 15:27:15 -08:00
costan 26102a0c66 Fix generated version number in open source release.
Lands GitHub PR #61. The patch was also independently contributed by
Martin Gieseking <martin.gieseking@uos.de>.
2017-12-20 14:32:54 -08:00
costan b02bfa754e Tag open source release 1.1.7. 2017-08-24 16:54:23 -07:00
wmi 824e6718b5 Add a loop alignment directive to work around a performance regression.
We found that LLVM upstream change rL310792 degraded the zippy benchmark
by ~3%. Performance analysis showed the regression was caused by a
side effect: the incidental loop alignment change (from 32 bytes to 16
bytes) led to an increase in branch mispredictions. The regression was
reproducible on several Intel micro-architectures, like Sandy Bridge,
Haswell and Skylake. Sadly, we still don't have a good understanding of
the internals of Intel's branch predictor and cannot explain how the
branch mispredictions increase when the loop alignment changes, so we
cannot make a real fix here. The workaround in this patch is to add a
directive that aligns the hot loop to 32 bytes, which restores the
performance. This is needed to unblock switching the default compiler
to LLVM.
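A sketch of the kind of directive described above, assuming GCC or Clang targeting x86: an inline `asm` statement containing `.p2align 5` asks the assembler to pad, with NOPs in a code section, up to the next 32-byte (2^5) boundary at that point in the function. The loop and its name are hypothetical. The compiler may still place instructions between the directive and the loop header, so this only approximately controls where the loop starts, which is part of why such workarounds are fragile.

```cpp
#include <cstddef>

// Hypothetical hot loop, used only to show where the directive goes.
std::size_t SumBytes(const unsigned char* p, std::size_t n) {
  std::size_t sum = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  // Pad to a 32-byte boundary; the padding NOPs simply fall through.
  asm volatile(".p2align 5");
#endif
  for (std::size_t i = 0; i < n; ++i) sum += p[i];
  return sum;
}
```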
2017-08-24 16:54:12 -07:00
costan 55924d1109 Add GNUInstallDirs to CMake configuration.
This is modeled after https://github.com/google/googletest/pull/1160.
The immediate benefit is fixing the library install paths on 64-bit
Linux distributions, which tend to support running 32-bit and 64-bit
code side by side by installing 32-bit libraries in /usr/lib and 64-bit
libraries in /usr/lib64.
2017-08-16 19:19:31 -07:00
costan 632cd0f128 Use 64-bit optimized code path for ARM64.
This is inspired by https://github.com/google/snappy/pull/22.
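As an illustration of what a 64-bit optimized path can look like, the hypothetical sketch below (not snappy's code; the macro and function names are made up) counts how many leading bytes two buffers share by comparing 8 bytes at a time on little-endian x86-64 and ARM64, and falls back to a byte loop elsewhere. `__x86_64__`, `__aarch64__`, `__BYTE_ORDER__` and `__builtin_ctzll` are GCC/Clang predefined macros and built-ins.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical switch for the wide path; the macro name is illustrative.
#if (defined(__x86_64__) || defined(__aarch64__)) && \
    defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_USE_64BIT_PATH 1
#else
#define SKETCH_USE_64BIT_PATH 0
#endif

// Returns the length of the common prefix of s1[0, n) and s2[0, n).
std::size_t CommonPrefixLength(const char* s1, const char* s2, std::size_t n) {
  std::size_t matched = 0;
#if SKETCH_USE_64BIT_PATH
  while (n - matched >= 8) {
    std::uint64_t a, b;
    std::memcpy(&a, s1 + matched, 8);  // unaligned-safe 8-byte loads
    std::memcpy(&b, s2 + matched, 8);
    if (a != b) {
      // On little-endian targets the lowest set bit of a ^ b lies in the
      // first differing byte.
      return matched + (__builtin_ctzll(a ^ b) >> 3);
    }
    matched += 8;
  }
#endif
  // Byte loop: handles the tail, or everything on other targets.
  while (matched < n && s1[matched] == s2[matched]) ++matched;
  return matched;
}
```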

Benchmark results with the change, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             119544     119253       1501 818.9MB/s  html
BM_UFlat/1            1223950    1208588        163 554.0MB/s  urls
BM_UFlat/2              16081      15962      11527 7.2GB/s  jpg
BM_UFlat/3                356        352     416666 540.6MB/s  jpg_200
BM_UFlat/4              25010      24860       7683 3.8GB/s  pdf
BM_UFlat/5             484832     481572        407 811.1MB/s  html4
BM_UFlat/6             408410     408713        482 354.9MB/s  txt1
BM_UFlat/7             361714     361663        553 330.1MB/s  txt2
BM_UFlat/8            1090582    1087912        182 374.1MB/s  txt3
BM_UFlat/9            1503127    1503759        133 305.6MB/s  txt4
BM_UFlat/10            114183     114285       1715 989.6MB/s  pb
BM_UFlat/11            406714     407331        491 431.5MB/s  gaviota
BM_UIOVec/0            370397     369888        538 264.0MB/s  html
BM_UIOVec/1           3207510    3190000        100 209.9MB/s  urls
BM_UIOVec/2             16589      16573      11223 6.9GB/s  jpg
BM_UIOVec/3              1052       1052     165289 181.2MB/s  jpg_200
BM_UIOVec/4             49151      49184       3985 1.9GB/s  pdf
BM_UValidate/0          68115      68095       2893 1.4GB/s  html
BM_UValidate/1         792652     792000        250 845.4MB/s  urls
BM_UValidate/2            334        334     487804 343.1GB/s  jpg
BM_UValidate/3            235        235     666666 809.9MB/s  jpg_200
BM_UValidate/4           6126       6130      32626 15.6GB/s  pdf
BM_ZFlat/0             292697     290560        678 336.1MB/s  html (22.31 %)
BM_ZFlat/1            4062080    4050000        100 165.3MB/s  urls (47.78 %)
BM_ZFlat/2              29225      29274       6422 3.9GB/s  jpg (99.95 %)
BM_ZFlat/3               1099       1098     163934 173.7MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              44117      44233       4205 2.2GB/s  pdf (83.30 %)
BM_ZFlat/5            1158058    1157894        171 337.4MB/s  html4 (22.52 %)
BM_ZFlat/6            1102983    1093922        181 132.6MB/s  txt1 (57.88 %)
BM_ZFlat/7             974142     975490        204 122.4MB/s  txt2 (61.91 %)
BM_ZFlat/8            2984670    2990000        100 136.1MB/s  txt3 (54.99 %)
BM_ZFlat/9            4100130    4090000        100 112.4MB/s  txt4 (66.26 %)
BM_ZFlat/10            276236     275139        716 411.0MB/s  pb (19.68 %)
BM_ZFlat/11            760091     759541        262 231.4MB/s  gaviota (37.72 %)

Baseline benchmark results, Pixel C with Android N2G48B

Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             148957     147565       1335 661.8MB/s  html
BM_UFlat/1            1527257    1500000        132 446.4MB/s  urls
BM_UFlat/2              19589      19397       8764 5.9GB/s  jpg
BM_UFlat/3                425        418     408163 455.3MB/s  jpg_200
BM_UFlat/4              30096      29552       6497 3.2GB/s  pdf
BM_UFlat/5             595933     594594        333 657.0MB/s  html4
BM_UFlat/6             516315     514360        383 282.0MB/s  txt1
BM_UFlat/7             454653     453514        441 263.2MB/s  txt2
BM_UFlat/8            1382687    1361111        144 299.0MB/s  txt3
BM_UFlat/9            1967590    1904761        105 241.3MB/s  txt4
BM_UFlat/10            148271     144560       1342 782.3MB/s  pb
BM_UFlat/11            523997     510471        382 344.4MB/s  gaviota
BM_UIOVec/0            478443     465227        417 209.9MB/s  html
BM_UIOVec/1           4172860    4060000        100 164.9MB/s  urls
BM_UIOVec/2             21470      20975       7342 5.5GB/s  jpg
BM_UIOVec/3              1357       1330      75187 143.4MB/s  jpg_200
BM_UIOVec/4             63143      61365       3031 1.6GB/s  pdf
BM_UValidate/0          86910      85125       2279 1.1GB/s  html
BM_UValidate/1        1022256    1000000        195 669.6MB/s  urls
BM_UValidate/2            420        417     400000 274.6GB/s  jpg
BM_UValidate/3            311        302     571428 630.0MB/s  jpg_200
BM_UValidate/4           7778       7584      25445 12.6GB/s  pdf
BM_ZFlat/0             469209     457547        424 213.4MB/s  html (22.31 %)
BM_ZFlat/1            5633510    5460000        100 122.6MB/s  urls (47.78 %)
BM_ZFlat/2              37896      36693       4524 3.1GB/s  jpg (99.95 %)
BM_ZFlat/3               1485       1441     123456 132.3MB/s  jpg_200 (73.00 %)
BM_ZFlat/4              74870      72775       2652 1.3GB/s  pdf (83.30 %)
BM_ZFlat/5            1857321    1785714        112 218.8MB/s  html4 (22.52 %)
BM_ZFlat/6            1538723    1492307        130 97.2MB/s  txt1 (57.88 %)
BM_ZFlat/7            1338236    1310810        148 91.1MB/s  txt2 (61.91 %)
BM_ZFlat/8            4050820    4040000        100 100.7MB/s  txt3 (54.99 %)
BM_ZFlat/9            5234940    5230000        100 87.9MB/s  txt4 (66.26 %)
BM_ZFlat/10            400309     400000        495 282.7MB/s  pb (19.68 %)
BM_ZFlat/11           1063042    1058510        188 166.1MB/s  gaviota (37.72 %)
2017-08-16 19:18:22 -07:00
costan 77c12adc19 Add unistd.h checks back to the CMake build.
getpagesize(), as well as its POSIX.1-2001 replacement
sysconf(_SC_PAGESIZE), is defined in <unistd.h>. On Linux and OS X,
including <sys/mman.h> is sufficient to get a definition for
getpagesize(). However, this is not true for the Android NDK. This CL
brings back the HAVE_UNISTD_H definition and its associated header
check.

This also adds a HAVE_FUNC_SYSCONF definition, which checks for the
presence of sysconf(). The definition can be used later to replace
getpagesize() with sysconf().
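A minimal sketch of how the two definitions could select a page-size call, assuming the generated config header defines `HAVE_UNISTD_H` and `HAVE_FUNC_SYSCONF` to 1 when the CMake checks succeed. The defaults below exist only so the sketch compiles standalone on POSIX systems, and the 4096 fallback is a hypothetical placeholder.

```cpp
// Assumption: normally provided by the generated config header.
#if defined(__unix__) || defined(__APPLE__)
#ifndef HAVE_UNISTD_H
#define HAVE_UNISTD_H 1
#endif
#ifndef HAVE_FUNC_SYSCONF
#define HAVE_FUNC_SYSCONF 1
#endif
#endif

#if HAVE_UNISTD_H
#include <unistd.h>  // sysconf(), getpagesize()
#endif

// Returns the system page size in bytes.
long PageSizeSketch() {
#if HAVE_UNISTD_H && HAVE_FUNC_SYSCONF
  return sysconf(_SC_PAGESIZE);  // POSIX.1-2001
#elif HAVE_UNISTD_H
  return getpagesize();  // legacy interface, removed from POSIX.1-2001
#else
  return 4096;  // hypothetical fallback when neither is available
#endif
}
```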
2017-08-02 10:56:06 -07:00
costan c8049c5827 Replace getpagesize() with sysconf(_SC_PAGESIZE).
getpagesize() has been removed from POSIX.1-2001. Its recommended
replacement is sysconf(_SC_PAGESIZE).
2017-08-01 14:38:57 -07:00
costan 18e2f220d8 Add guidelines for opensource contributions.
The guidelines follow the instructions at
https://opensource.google.com/docs/releasing/preparing/#CONTRIBUTING
2017-08-01 14:38:24 -07:00
costan f0d3237c32 Use _BitScanForward and _BitScanReverse on MSVC.
Based on https://github.com/google/snappy/pull/30
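A sketch of the portable pattern this change implies: MSVC's `_BitScanForward` / `_BitScanReverse` intrinsics from `<intrin.h>` on one side, and the GCC/Clang built-ins `__builtin_ctz` / `__builtin_clz` on the other. The function names are hypothetical, not snappy's. Both functions require a nonzero argument, because the intrinsics and built-ins leave the result undefined for zero.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the lowest set bit of n. Precondition: n != 0.
inline int LowestSetBitIndex(std::uint32_t n) {
#if defined(_MSC_VER)
  unsigned long where;
  _BitScanForward(&where, n);
  return static_cast<int>(where);
#else
  return __builtin_ctz(n);
#endif
}

// Index of the highest set bit of n, i.e. floor(log2(n)). Precondition: n != 0.
inline int HighestSetBitIndex(std::uint32_t n) {
#if defined(_MSC_VER)
  unsigned long where;
  _BitScanReverse(&where, n);
  return static_cast<int>(where);
#else
  // clz is in [0, 31] for nonzero n, so 31 ^ clz == 31 - clz.
  return 31 ^ __builtin_clz(n);
#endif
}
```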
2017-08-01 14:38:02 -07:00
jueminyang 71b8f86887 Add SNAPPY_ prefix to PREDICT_{TRUE,FALSE} macros. 2017-08-01 14:36:26 -07:00
costan be6dc3db83 Redo CMake configuration.
The style was changed to match the official manual [1], the install
configuration was simplified and now matches the official packaging
guide [2], and the config files use the CMake-specific variable syntax
${VAR} instead of the autoconf-compatible syntax @VAR@, as documented in
[3]. The public header files are declared as such (for CMake 3.3+), and
the generated headers are included in the library target definition.

The tests are only built if SNAPPY_BUILD_TESTS (default ON) is true, so
zippy can be easily used in projects that add_subdirectory() its source
code directly, instead of using find_package().

[1] https://cmake.org/cmake/help/git-master/manual/cmake-language.7.html
[2] https://cmake.org/cmake/help/git-master/manual/cmake-packages.7.html
[3] https://cmake.org/cmake/help/git-master/command/configure_file.html
2017-07-28 10:14:21 -07:00
costan e4de6ce087 Small improvements to open source CI configuration.
This CL fixes 64-bit Windows testing (), makes it possible to view the
test output in the Travis / AppVeyor CI console while the test is
running, and takes advantage of the new support for the .appveyor.yml
file name to make the CI configuration less obtrusive.
2017-07-27 16:46:54 -07:00