mirror of https://github.com/google/benchmark.git
[Tools] Drop compare_bench.py, compare.py is to be used, add U-test docs. (#645)
As discussed in IRC, time to deduplicate.
parent f85304e4e3
commit 94c4d6d5c6
@ -910,6 +910,10 @@ by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.
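
For context, a typical invocation combining the two flags (the `./mybenchmark` binary name is only a placeholder) might look something like:

``` bash
# Write JSON results to a file while still printing the console report.
$ ./mybenchmark --benchmark_out=results.json --benchmark_out_format=json
```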

## Result comparison

It is possible to compare the benchmarking results. See [Additional Tooling Documentation](docs/tools.md)

## Debug vs Release

By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:
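
The command itself lies outside this hunk; as a sketch, the usual CMake release configuration (the `../benchmark` source path is an assumption, not taken from this diff) would be:

``` bash
# Configure and build an optimized (release) copy of the library.
$ cmake -DCMAKE_BUILD_TYPE=Release ../benchmark
$ make
```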
@ -1,86 +1,16 @@
# Benchmark Tools

## compare_bench.py

The `compare_bench.py` utility can be used to compare the results of benchmarks.
The program is invoked like:

``` bash
$ compare_bench.py <old-benchmark> <new-benchmark> [benchmark options]...
```

Note that it relies on the scipy package, which can be installed using [these instructions](https://www.scipy.org/install.html).

`<old-benchmark>` and `<new-benchmark>` each specify either a benchmark executable or a JSON output file; the type of the input file is detected automatically. If a benchmark executable is specified, the benchmark is run to obtain the results. Otherwise the results are simply loaded from the output file.

`[benchmark options]` will be passed to the benchmark invocations. They can be anything the binary accepts, whether normal `--benchmark_*` parameters or custom parameters your binary takes.
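
For example, a run that forwards a couple of the standard flags (the binary paths and flag values here are only illustrative) might look like:

``` bash
$ ./compare_bench.py ./old/basic_test ./new/basic_test --benchmark_repetitions=9 --benchmark_min_time=0.5
```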

The sample output using the JSON test files under `Inputs/` gives:

``` bash
$ ./compare_bench.py ./gbench/Inputs/test1_run1.json ./gbench/Inputs/test1_run2.json
Comparing ./gbench/Inputs/test1_run1.json to ./gbench/Inputs/test1_run2.json
Benchmark                   Time       CPU  Time Old  Time New   CPU Old   CPU New
----------------------------------------------------------------------------------
BM_SameTimes             +0.0000   +0.0000        10        10        10        10
BM_2xFaster              -0.5000   -0.5000        50        25        50        25
BM_2xSlower              +1.0000   +1.0000        50       100        50       100
BM_1PercentFaster        -0.0100   -0.0100       100        99       100        99
BM_1PercentSlower        +0.0100   +0.0100       100       101       100       101
BM_10PercentFaster       -0.1000   -0.1000       100        90       100        90
BM_10PercentSlower       +0.1000   +0.1000       100       110       100       110
BM_100xSlower           +99.0000  +99.0000       100     10000       100     10000
BM_100xFaster            -0.9900   -0.9900     10000       100     10000       100
BM_10PercentCPUToTime    +0.1000   -0.1000       100       110       100        90
BM_ThirdFaster           -0.3333   -0.3334       100        67       100        67
BM_BadTimeUnit           -0.9000   +0.2000         0         0         0         1
```

As you can see, the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.
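For instance, the `BM_2xFaster` row above works out to `(25 - 50) / |50| = -0.5000`, and `BM_100xSlower` to `(10000 - 100) / |100| = +99.0000`.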

When a benchmark executable is run, the raw output from the benchmark is printed in real time to stdout. The sample output using `benchmark/basic_test` for both arguments looks like:

```
./compare_bench.py test/basic_test test/basic_test --benchmark_filter=BM_empty.*
RUNNING: test/basic_test --benchmark_filter=BM_empty.* --benchmark_out=/tmp/tmpN7LF3a
Run on (8 X 4000 MHz CPU s)
2017-11-07 23:28:36
---------------------------------------------------------------------
Benchmark                             Time           CPU Iterations
---------------------------------------------------------------------
BM_empty                              4 ns          4 ns  170178757
BM_empty/threads:8                    1 ns          7 ns  103868920
BM_empty_stop_start                   0 ns          0 ns 1000000000
BM_empty_stop_start/threads:8         0 ns          0 ns 1403031720
RUNNING: /test/basic_test --benchmark_filter=BM_empty.* --benchmark_out=/tmp/tmplvrIp8
Run on (8 X 4000 MHz CPU s)
2017-11-07 23:28:38
---------------------------------------------------------------------
Benchmark                             Time           CPU Iterations
---------------------------------------------------------------------
BM_empty                              4 ns          4 ns  169534855
BM_empty/threads:8                    1 ns          7 ns  104188776
BM_empty_stop_start                   0 ns          0 ns 1000000000
BM_empty_stop_start/threads:8         0 ns          0 ns 1404159424
Comparing ../build/test/basic_test to ../build/test/basic_test
Benchmark                           Time       CPU  Time Old  Time New   CPU Old   CPU New
------------------------------------------------------------------------------------------
BM_empty                         -0.0048   -0.0049         4         4         4         4
BM_empty/threads:8               -0.0123   -0.0054         1         1         7         7
BM_empty_stop_start              -0.0000   -0.0000         0         0         0         0
BM_empty_stop_start/threads:8    -0.0029   +0.0001         0         0         0         0

```

As you can see, the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.
Obviously this example doesn't give any useful output, but it's intended to show the output format when `compare_bench.py` needs to run benchmarks.

## compare.py

The `compare.py` script can be used to compare the results of benchmarks.

**NOTE**: the utility relies on the scipy package, which can be installed using [these instructions](https://www.scipy.org/install.html).

### Modes of operation

There are three modes of operation:

1. Just compare two benchmarks, what `compare_bench.py` did.
1. Just compare two benchmarks
The program is invoked like:

``` bash
@ -242,3 +172,19 @@ Benchmark Time CPU Time Old
```

This is a mix of the previous two modes: two (potentially different) benchmark binaries are run, and a different filter is applied to each one.
As you can see, the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.
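
An invocation for this mixed mode would, assuming the same subcommand style as the other modes (the binary and filter names below are placeholders, not taken from this diff), look roughly like:

``` bash
$ compare.py benchmarksfiltered <benchmark_baseline> <filter_baseline> <benchmark_contender> <filter_contender> [benchmark options]...
```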

### U test

If there is a sufficient repetition count of the benchmarks, the tool can do
a [U Test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) of the
null hypothesis that it is equally likely that a randomly selected value from
one sample will be less than or greater than a randomly selected value from a
second sample.

If the calculated p-value is lower than the significance level alpha, the result
is said to be statistically significant and the null hypothesis is rejected; in
other words, the two benchmarks aren't identical.

**WARNING**: requires a **LARGE** (no less than 9) number of repetitions to be
meaningful!
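
For intuition, the underlying check is an ordinary two-sample Mann-Whitney U test over the per-repetition timings; a minimal Python sketch of the idea (not the tool's actual code, and with made-up timing samples) using the same scipy dependency:

``` python
from scipy.stats import mannwhitneyu

# Hypothetical per-repetition real times (ns) from two runs of one benchmark,
# nine repetitions each (the minimum the warning above considers meaningful).
time_old = [100, 102, 99, 101, 100, 103, 98, 100, 101]
time_new = [110, 108, 111, 109, 112, 107, 110, 111, 109]

# Two-sided test of the null hypothesis described above.
stat, p_value = mannwhitneyu(time_old, time_new, alternative='two-sided')

alpha = 0.05  # significance level
if p_value < alpha:
    print('statistically significant difference (p = %g)' % p_value)
else:
    print('cannot reject the null hypothesis (p = %g)' % p_value)
```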
@ -1,67 +0,0 @@
#!/usr/bin/env python
"""
compare_bench.py - Compare two benchmarks or their results and report the
                   difference.
"""
import argparse
from argparse import ArgumentParser
import sys
import gbench
from gbench import util, report
from gbench.util import *


def check_inputs(in1, in2, flags):
    """
    Perform checking on the user provided inputs and diagnose any abnormalities
    """
    in1_kind, in1_err = classify_input_file(in1)
    in2_kind, in2_err = classify_input_file(in2)
    output_file = find_benchmark_flag('--benchmark_out=', flags)
    output_type = find_benchmark_flag('--benchmark_out_format=', flags)
    if in1_kind == IT_Executable and in2_kind == IT_Executable and output_file:
        print(("WARNING: '--benchmark_out=%s' will be passed to both "
               "benchmarks causing it to be overwritten") % output_file)
    if in1_kind == IT_JSON and in2_kind == IT_JSON and len(flags) > 0:
        print("WARNING: passing --benchmark flags has no effect since both "
              "inputs are JSON")
    if output_type is not None and output_type != 'json':
        print(("ERROR: passing '--benchmark_out_format=%s' to 'compare_bench.py`"
               " is not supported.") % output_type)
        sys.exit(1)


def main():
    parser = ArgumentParser(
        description='compare the results of two benchmarks')
    parser.add_argument(
        'test1', metavar='test1', type=str, nargs=1,
        help='A benchmark executable or JSON output file')
    parser.add_argument(
        'test2', metavar='test2', type=str, nargs=1,
        help='A benchmark executable or JSON output file')
    parser.add_argument(
        'benchmark_options', metavar='benchmark_options', nargs=argparse.REMAINDER,
        help='Arguments to pass when running benchmark executables'
        )
    args, unknown_args = parser.parse_known_args()
    # Parse the command line flags
    test1 = args.test1[0]
    test2 = args.test2[0]
    if unknown_args:
        # should never happen
        print("Unrecognized positional argument arguments: '%s'"
              % unknown_args)
        exit(1)
    benchmark_options = args.benchmark_options
    check_inputs(test1, test2, benchmark_options)
    # Run the benchmarks and report the results
    json1 = gbench.util.run_or_load_benchmark(test1, benchmark_options)
    json2 = gbench.util.run_or_load_benchmark(test2, benchmark_options)
    output_lines = gbench.report.generate_difference_report(json1, json2)
    print('Comparing %s to %s' % (test1, test2))
    for ln in output_lines:
        print(ln)


if __name__ == '__main__':
    main()
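
With this script gone, the workflow it covered is assumed to correspond to the first mode of `compare.py`, invoked roughly like:

``` bash
$ compare.py benchmarks <old-benchmark> <new-benchmark> [benchmark options]...
```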