Commit graph

964 commits

Author SHA1 Message Date
Gregorio Litenstein 9ffb8df6c5 Fix Clang Detection (#697)
For several versions now, CMake by default identifies macOS’ Clang as AppleClang instead of just Clang, which made the STREQUAL comparison fail. Fixed by changing it to MATCHES.
2018-10-05 16:44:02 +01:00
Roman Lebedev a8082de5df
[NFC] Refactor RunBenchmark() (#690)
OK, so I'm still trying to get to the state where reporting all the separate iterations will be a trivial change.
The old code (LHS of the diff) was rather convoluted, I'd say.
I have tried to refactor it into *small* logical chunks, with proper comments.
As far as I can tell, I preserved the intent of the code, i.e. what it was doing before.
The road forward still isn't clear, but I'm quite sure it doesn't lie with the old code :)
2018-10-01 17:51:08 +03:00
Victor Costan d8c0f27448 Fix possible loss of data warnings in MSVC. (#694) 2018-10-01 13:00:13 +01:00
Dominic Hamon edc77a3669
Make State constructor private. (#650)
The State constructor should not be part of the public API. Adding a
utility method to BenchmarkInstance allows us to avoid leaking the
RunInThread method into the public API.
2018-09-28 12:28:43 +01:00
Roman Lebedev eb8cbec077 appveyor ci: drop Visual Studio 12 2013 - unsupported by Google Test master branch. (#691)
See https://github.com/google/googletest/pull/1815

Fixes #689.
2018-09-26 10:11:54 +01:00
Roman Lebedev aad33aab3c
[Tooling] Rewrite generate_difference_report(). (#678)
My knowledge of Python is not great, so this is kinda horrible.

Two things:
1. If there were repetitions, for the RHS (i.e. the new value) we were always using the first repetition,
    which naturally results in incorrect change reports for the second and following repetitions.
    Even worse, that completely broke the U test. :(
2. Better support for differing repetition counts in the U test was missing.
    This matters if we are to be able to report 'iteration as repetition',
    since it is rather likely that the iteration counts will mismatch.

Now, the rough idea of how this is implemented; I think this is the right solution. (A sketch of steps 1–6 follows after the list.)
1. Get all benchmark names (in order) from the lhs benchmark.
2. While preserving the order, keep the unique names
3. Get all benchmark names (in order) from the rhs benchmark.
4. While preserving the order, keep the unique names
5. Intersect `2.` and `4.`, get the list of unique benchmark names that exist on both sides.
6. Now, we want to group (partition) all the benchmarks with the same name.
   ```
   BM_FOO:
       [lhs]: BM_FOO/repetition0 BM_FOO/repetition1
       [rhs]: BM_FOO/repetition0 BM_FOO/repetition1 BM_FOO/repetition2
   ...
   ```
   We also drop mismatches in `time_unit` here.
   _(whose bright idea was it to store arbitrarily scaled timers in json **?!** )_
7. Iterate over each partition:
7.1. Conditionally, diff the overlapping repetitions (the repetition counts may differ).
7.2. Conditionally, do the U test:
7.2.1. Get **all** the values of the `"real_time"` field from the lhs benchmark
7.2.2. Get **all** the values of the `"cpu_time"` field from the lhs benchmark
7.2.3. Get **all** the values of the `"real_time"` field from the rhs benchmark
7.2.4. Get **all** the values of the `"cpu_time"` field from the rhs benchmark
          NOTE: the repetition counts may differ, but we want *all* the values!
7.2.5. Do the rest of the U test computation
7.2.6. Print the U test result
8. ???
9. **PROFIT**!
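
To make steps 1–6 concrete, here is a minimal, language-agnostic sketch of the name grouping, written in C++ here (the actual tool is Python; `Run`, `OrderedUniqueNames`, and `PartitionByName` are illustrative names, not the real code):
```
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for a single JSON benchmark entry.
struct Run { std::string name; double real_time; double cpu_time; };

// Steps 1-4: benchmark names in first-seen order, duplicates dropped.
std::vector<std::string> OrderedUniqueNames(const std::vector<Run>& runs) {
  std::vector<std::string> names;
  for (const auto& r : runs)
    if (std::find(names.begin(), names.end(), r.name) == names.end())
      names.push_back(r.name);
  return names;
}

// Steps 5-6: keep only names present on both sides, then gather every run
// sharing that name into one partition (repetition counts may differ).
struct NamePartition { std::string name; std::vector<Run> lhs, rhs; };

std::vector<NamePartition> PartitionByName(const std::vector<Run>& lhs,
                                           const std::vector<Run>& rhs) {
  std::vector<NamePartition> out;
  const auto rhs_names = OrderedUniqueNames(rhs);
  for (const auto& name : OrderedUniqueNames(lhs)) {
    if (std::find(rhs_names.begin(), rhs_names.end(), name) == rhs_names.end())
      continue;  // step 5: the name must exist on both sides
    NamePartition p;
    p.name = name;
    for (const auto& r : lhs) if (r.name == name) p.lhs.push_back(r);
    for (const auto& r : rhs) if (r.name == name) p.rhs.push_back(r);
    out.push_back(p);
  }
  return out;
}
```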

Fixes #677
2018-09-19 15:59:13 +03:00
Martin Storsjö 439d6b1c2a Include sys/time.h for cycleclock.h when building on MinGW (#680)
When building for ARM, there is a fallback codepath that uses
gettimeofday, which requires sys/time.h.

The Windows SDK doesn't have this header, but MinGW does have it.
Thus, this fixes building for Windows on ARM with MinGW
headers/libraries, while Windows on ARM with the Windows SDK
is still broken.
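
For context, a minimal sketch of the kind of gettimeofday-based fallback the ARM path relies on (the real code lives in cycleclock.h; this is an approximation, not the exact implementation):
```
#include <sys/time.h>  // provided by MinGW headers, absent from the Windows SDK
#include <cstdint>

// Approximate the cycle counter with microsecond wall-clock time when no
// architecture-specific counter is available.
inline int64_t FallbackCycleNow() {
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  return static_cast<int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}
```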
2018-09-19 11:52:05 +01:00
Martin Storsjö 5261307982 [benchmark] Lowercase windows specific includes (#679)
The Windows SDK headers don't have self-consistent casing anyway,
and many projects consistently use lowercase for them, in order
to fix cross-compilation with MinGW headers.
2018-09-18 09:42:20 +01:00
Roman Lebedev a5e9c061d9
[Tooling] 'display aggregates only' support. (#674)
Works as you'd expect, both for the benchmark binaries and
for the JSON files (naturally, since the tests need to work):
```
$ ~/src/googlebenchmark/tools/compare.py benchmarks ~/rawspeed/build-{0,1}/src/utilities/rsbench/rsbench --benchmark_repetitions=3 *
RUNNING: /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench --benchmark_repetitions=3 2K4A9927.CR2 2K4A9928.CR2 2K4A9929.CR2 --benchmark_out=/tmp/tmpYoui5H
2018-09-12 20:23:47
Running /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time               447 ms        447 ms          2 CPUTime,s=0.447302 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=117.738M Pixels/WallTime=117.738M Raws/CPUTime=2.23562 Raws/WallTime=2.23564 WallTime,s=0.4473
2K4A9927.CR2/threads:8/real_time               444 ms        444 ms          2 CPUTime,s=0.444228 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=118.552M Pixels/WallTime=118.553M Raws/CPUTime=2.2511 Raws/WallTime=2.25111 WallTime,s=0.444226
2K4A9927.CR2/threads:8/real_time               447 ms        447 ms          2 CPUTime,s=0.446983 CPUTime/WallTime=0.999999 Pixels=52.6643M Pixels/CPUTime=117.822M Pixels/WallTime=117.822M Raws/CPUTime=2.23722 Raws/WallTime=2.23722 WallTime,s=0.446984
2K4A9927.CR2/threads:8/real_time_mean          446 ms        446 ms          2 CPUTime,s=0.446171 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=118.037M Pixels/WallTime=118.038M Raws/CPUTime=2.24131 Raws/WallTime=2.24132 WallTime,s=0.44617
2K4A9927.CR2/threads:8/real_time_median        447 ms        447 ms          2 CPUTime,s=0.446983 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=117.822M Pixels/WallTime=117.822M Raws/CPUTime=2.23722 Raws/WallTime=2.23722 WallTime,s=0.446984
2K4A9927.CR2/threads:8/real_time_stddev          2 ms          2 ms          2 CPUTime,s=1.69052m CPUTime/WallTime=3.53737u Pixels=0 Pixels/CPUTime=448.178k Pixels/WallTime=448.336k Raws/CPUTime=8.51008m Raws/WallTime=8.51308m WallTime,s=1.6911m
2K4A9928.CR2/threads:8/real_time               563 ms        563 ms          1 CPUTime,s=0.562511 CPUTime/WallTime=0.999824 Pixels=27.9936M Pixels/CPUTime=49.7654M Pixels/WallTime=49.7567M Raws/CPUTime=1.77774 Raws/WallTime=1.77743 WallTime,s=0.56261
2K4A9928.CR2/threads:8/real_time               561 ms        561 ms          1 CPUTime,s=0.561328 CPUTime/WallTime=0.999917 Pixels=27.9936M Pixels/CPUTime=49.8703M Pixels/WallTime=49.8662M Raws/CPUTime=1.78149 Raws/WallTime=1.78134 WallTime,s=0.561375
2K4A9928.CR2/threads:8/real_time               570 ms        570 ms          1 CPUTime,s=0.570423 CPUTime/WallTime=0.999876 Pixels=27.9936M Pixels/CPUTime=49.0752M Pixels/WallTime=49.0691M Raws/CPUTime=1.75308 Raws/WallTime=1.75287 WallTime,s=0.570493
2K4A9928.CR2/threads:8/real_time_mean          565 ms        565 ms          1 CPUTime,s=0.564754 CPUTime/WallTime=0.999872 Pixels=27.9936M Pixels/CPUTime=49.5703M Pixels/WallTime=49.564M Raws/CPUTime=1.77077 Raws/WallTime=1.77055 WallTime,s=0.564826
2K4A9928.CR2/threads:8/real_time_median        563 ms        563 ms          1 CPUTime,s=0.562511 CPUTime/WallTime=0.999876 Pixels=27.9936M Pixels/CPUTime=49.7654M Pixels/WallTime=49.7567M Raws/CPUTime=1.77774 Raws/WallTime=1.77743 WallTime,s=0.56261
2K4A9928.CR2/threads:8/real_time_stddev          5 ms          5 ms          1 CPUTime,s=4.945m CPUTime/WallTime=46.3459u Pixels=0 Pixels/CPUTime=431.997k Pixels/WallTime=432.061k Raws/CPUTime=0.015432 Raws/WallTime=0.0154343 WallTime,s=4.94686m
2K4A9929.CR2/threads:8/real_time               306 ms        306 ms          2 CPUTime,s=0.306476 CPUTime/WallTime=0.999961 Pixels=12.4416M Pixels/CPUTime=40.5957M Pixels/WallTime=40.5941M Raws/CPUTime=3.2629 Raws/WallTime=3.26277 WallTime,s=0.306488
2K4A9929.CR2/threads:8/real_time               310 ms        310 ms          2 CPUTime,s=0.309978 CPUTime/WallTime=0.999939 Pixels=12.4416M Pixels/CPUTime=40.1371M Pixels/WallTime=40.1347M Raws/CPUTime=3.22604 Raws/WallTime=3.22584 WallTime,s=0.309996
2K4A9929.CR2/threads:8/real_time               309 ms        309 ms          2 CPUTime,s=0.30943 CPUTime/WallTime=0.999987 Pixels=12.4416M Pixels/CPUTime=40.2081M Pixels/WallTime=40.2076M Raws/CPUTime=3.23175 Raws/WallTime=3.23171 WallTime,s=0.309434
2K4A9929.CR2/threads:8/real_time_mean          309 ms        309 ms          2 CPUTime,s=0.308628 CPUTime/WallTime=0.999962 Pixels=12.4416M Pixels/CPUTime=40.3136M Pixels/WallTime=40.3121M Raws/CPUTime=3.24023 Raws/WallTime=3.24011 WallTime,s=0.308639
2K4A9929.CR2/threads:8/real_time_median        309 ms        309 ms          2 CPUTime,s=0.30943 CPUTime/WallTime=0.999961 Pixels=12.4416M Pixels/CPUTime=40.2081M Pixels/WallTime=40.2076M Raws/CPUTime=3.23175 Raws/WallTime=3.23171 WallTime,s=0.309434
2K4A9929.CR2/threads:8/real_time_stddev          2 ms          2 ms          2 CPUTime,s=1.88354m CPUTime/WallTime=23.7788u Pixels=0 Pixels/CPUTime=246.821k Pixels/WallTime=246.914k Raws/CPUTime=0.0198384 Raws/WallTime=0.0198458 WallTime,s=1.88442m
RUNNING: /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench --benchmark_repetitions=3 2K4A9927.CR2 2K4A9928.CR2 2K4A9929.CR2 --benchmark_out=/tmp/tmpRShmmf
2018-09-12 20:23:55
Running /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time               446 ms        446 ms          2 CPUTime,s=0.445589 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=118.19M Pixels/WallTime=118.19M Raws/CPUTime=2.24422 Raws/WallTime=2.24422 WallTime,s=0.445589
2K4A9927.CR2/threads:8/real_time               446 ms        446 ms          2 CPUTime,s=0.446008 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=118.079M Pixels/WallTime=118.08M Raws/CPUTime=2.24211 Raws/WallTime=2.24213 WallTime,s=0.446005
2K4A9927.CR2/threads:8/real_time               448 ms        448 ms          2 CPUTime,s=0.447763 CPUTime/WallTime=0.999994 Pixels=52.6643M Pixels/CPUTime=117.616M Pixels/WallTime=117.616M Raws/CPUTime=2.23332 Raws/WallTime=2.23331 WallTime,s=0.447766
2K4A9927.CR2/threads:8/real_time_mean          446 ms        446 ms          2 CPUTime,s=0.446453 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=117.962M Pixels/WallTime=117.962M Raws/CPUTime=2.23988 Raws/WallTime=2.23989 WallTime,s=0.446453
2K4A9927.CR2/threads:8/real_time_median        446 ms        446 ms          2 CPUTime,s=0.446008 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=118.079M Pixels/WallTime=118.08M Raws/CPUTime=2.24211 Raws/WallTime=2.24213 WallTime,s=0.446005
2K4A9927.CR2/threads:8/real_time_stddev          1 ms          1 ms          2 CPUTime,s=1.15367m CPUTime/WallTime=6.48501u Pixels=0 Pixels/CPUTime=304.437k Pixels/WallTime=305.025k Raws/CPUTime=5.7807m Raws/WallTime=5.79188m WallTime,s=1.15591m
2K4A9928.CR2/threads:8/real_time               561 ms        561 ms          1 CPUTime,s=0.560856 CPUTime/WallTime=0.999996 Pixels=27.9936M Pixels/CPUTime=49.9123M Pixels/WallTime=49.9121M Raws/CPUTime=1.78299 Raws/WallTime=1.78298 WallTime,s=0.560858
2K4A9928.CR2/threads:8/real_time               560 ms        560 ms          1 CPUTime,s=0.560023 CPUTime/WallTime=1.00001 Pixels=27.9936M Pixels/CPUTime=49.9865M Pixels/WallTime=49.9872M Raws/CPUTime=1.78564 Raws/WallTime=1.78567 WallTime,s=0.560015
2K4A9928.CR2/threads:8/real_time               562 ms        562 ms          1 CPUTime,s=0.562337 CPUTime/WallTime=0.999994 Pixels=27.9936M Pixels/CPUTime=49.7808M Pixels/WallTime=49.7805M Raws/CPUTime=1.77829 Raws/WallTime=1.77828 WallTime,s=0.56234
2K4A9928.CR2/threads:8/real_time_mean          561 ms        561 ms          1 CPUTime,s=0.561072 CPUTime/WallTime=1 Pixels=27.9936M Pixels/CPUTime=49.8932M Pixels/WallTime=49.8933M Raws/CPUTime=1.78231 Raws/WallTime=1.78231 WallTime,s=0.561071
2K4A9928.CR2/threads:8/real_time_median        561 ms        561 ms          1 CPUTime,s=0.560856 CPUTime/WallTime=0.999996 Pixels=27.9936M Pixels/CPUTime=49.9123M Pixels/WallTime=49.9121M Raws/CPUTime=1.78299 Raws/WallTime=1.78298 WallTime,s=0.560858
2K4A9928.CR2/threads:8/real_time_stddev          1 ms          1 ms          1 CPUTime,s=1.17202m CPUTime/WallTime=10.7929u Pixels=0 Pixels/CPUTime=104.164k Pixels/WallTime=104.612k Raws/CPUTime=3.721m Raws/WallTime=3.73701m WallTime,s=1.17706m
2K4A9929.CR2/threads:8/real_time               305 ms        305 ms          2 CPUTime,s=0.305436 CPUTime/WallTime=0.999926 Pixels=12.4416M Pixels/CPUTime=40.7339M Pixels/WallTime=40.7309M Raws/CPUTime=3.27401 Raws/WallTime=3.27376 WallTime,s=0.305459
2K4A9929.CR2/threads:8/real_time               306 ms        306 ms          2 CPUTime,s=0.30576 CPUTime/WallTime=0.999999 Pixels=12.4416M Pixels/CPUTime=40.6908M Pixels/WallTime=40.6908M Raws/CPUTime=3.27054 Raws/WallTime=3.27054 WallTime,s=0.30576
2K4A9929.CR2/threads:8/real_time               307 ms        307 ms          2 CPUTime,s=0.30724 CPUTime/WallTime=0.999991 Pixels=12.4416M Pixels/CPUTime=40.4947M Pixels/WallTime=40.4944M Raws/CPUTime=3.25478 Raws/WallTime=3.25475 WallTime,s=0.307243
2K4A9929.CR2/threads:8/real_time_mean          306 ms        306 ms          2 CPUTime,s=0.306145 CPUTime/WallTime=0.999972 Pixels=12.4416M Pixels/CPUTime=40.6398M Pixels/WallTime=40.6387M Raws/CPUTime=3.26645 Raws/WallTime=3.26635 WallTime,s=0.306154
2K4A9929.CR2/threads:8/real_time_median        306 ms        306 ms          2 CPUTime,s=0.30576 CPUTime/WallTime=0.999991 Pixels=12.4416M Pixels/CPUTime=40.6908M Pixels/WallTime=40.6908M Raws/CPUTime=3.27054 Raws/WallTime=3.27054 WallTime,s=0.30576
2K4A9929.CR2/threads:8/real_time_stddev          1 ms          1 ms          2 CPUTime,s=961.851u CPUTime/WallTime=40.2708u Pixels=0 Pixels/CPUTime=127.481k Pixels/WallTime=126.577k Raws/CPUTime=0.0102463 Raws/WallTime=0.0101737 WallTime,s=955.105u
Comparing /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench
Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time                       -0.0038         -0.0038           447           446           447           446
2K4A9927.CR2/threads:8/real_time                       +0.0031         +0.0031           444           446           444           446
2K4A9927.CR2/threads:8/real_time                       -0.0031         -0.0031           447           446           447           446
2K4A9927.CR2/threads:8/real_time_pvalue                 0.6428          0.6428      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9927.CR2/threads:8/real_time_mean                  +0.0006         +0.0006           446           446           446           446
2K4A9927.CR2/threads:8/real_time_median                -0.0022         -0.0022           447           446           447           446
2K4A9927.CR2/threads:8/real_time_stddev                -0.3161         -0.3175             2             1             2             1
2K4A9928.CR2/threads:8/real_time                       -0.0031         -0.0029           563           561           563           561
2K4A9928.CR2/threads:8/real_time                       -0.0009         -0.0008           561           561           561           561
2K4A9928.CR2/threads:8/real_time                       -0.0169         -0.0168           570           561           570           561
2K4A9928.CR2/threads:8/real_time_pvalue                 0.0636          0.0636      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9928.CR2/threads:8/real_time_mean                  -0.0066         -0.0065           565           561           565           561
2K4A9928.CR2/threads:8/real_time_median                -0.0031         -0.0029           563           561           563           561
2K4A9928.CR2/threads:8/real_time_stddev                -0.7620         -0.7630             5             1             5             1
2K4A9929.CR2/threads:8/real_time                       -0.0034         -0.0034           306           305           306           305
2K4A9929.CR2/threads:8/real_time                       -0.0146         -0.0146           310           305           310           305
2K4A9929.CR2/threads:8/real_time                       -0.0128         -0.0129           309           305           309           305
2K4A9929.CR2/threads:8/real_time_pvalue                 0.0636          0.0636      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9929.CR2/threads:8/real_time_mean                  -0.0081         -0.0080           309           306           309           306
2K4A9929.CR2/threads:8/real_time_median                -0.0119         -0.0119           309           306           309           306
2K4A9929.CR2/threads:8/real_time_stddev                -0.4931         -0.4894             2             1             2             1
$ ~/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{0,1}/src/utilities/rsbench/rsbench --benchmark_repetitions=3 *
RUNNING: /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench --benchmark_repetitions=3 2K4A9927.CR2 2K4A9928.CR2 2K4A9929.CR2 --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpjrD2I0
2018-09-12 20:24:11
Running /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time_mean          446 ms        446 ms          2 CPUTime,s=0.446077 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=118.061M Pixels/WallTime=118.062M Raws/CPUTime=2.24177 Raws/WallTime=2.24178 WallTime,s=0.446076
2K4A9927.CR2/threads:8/real_time_median        446 ms        446 ms          2 CPUTime,s=0.445676 CPUTime/WallTime=1 Pixels=52.6643M Pixels/CPUTime=118.167M Pixels/WallTime=118.168M Raws/CPUTime=2.24378 Raws/WallTime=2.2438 WallTime,s=0.445673
2K4A9927.CR2/threads:8/real_time_stddev          1 ms          1 ms          2 CPUTime,s=820.275u CPUTime/WallTime=3.51513u Pixels=0 Pixels/CPUTime=216.876k Pixels/WallTime=217.229k Raws/CPUTime=4.11809m Raws/WallTime=4.12478m WallTime,s=821.607u
2K4A9928.CR2/threads:8/real_time_mean          562 ms        562 ms          1 CPUTime,s=0.561843 CPUTime/WallTime=0.999983 Pixels=27.9936M Pixels/CPUTime=49.8247M Pixels/WallTime=49.8238M Raws/CPUTime=1.77986 Raws/WallTime=1.77983 WallTime,s=0.561852
2K4A9928.CR2/threads:8/real_time_median        562 ms        562 ms          1 CPUTime,s=0.561745 CPUTime/WallTime=0.999995 Pixels=27.9936M Pixels/CPUTime=49.8333M Pixels/WallTime=49.8307M Raws/CPUTime=1.78017 Raws/WallTime=1.78008 WallTime,s=0.561774
2K4A9928.CR2/threads:8/real_time_stddev          1 ms          1 ms          1 CPUTime,s=788.58u CPUTime/WallTime=30.2578u Pixels=0 Pixels/CPUTime=69.914k Pixels/WallTime=69.4978k Raws/CPUTime=2.4975m Raws/WallTime=2.48263m WallTime,s=783.873u
2K4A9929.CR2/threads:8/real_time_mean          306 ms        306 ms          2 CPUTime,s=0.305718 CPUTime/WallTime=1.00001 Pixels=12.4416M Pixels/CPUTime=40.6964M Pixels/WallTime=40.6967M Raws/CPUTime=3.271 Raws/WallTime=3.27102 WallTime,s=0.305716
2K4A9929.CR2/threads:8/real_time_median        306 ms        306 ms          2 CPUTime,s=0.305737 CPUTime/WallTime=1 Pixels=12.4416M Pixels/CPUTime=40.6939M Pixels/WallTime=40.6945M Raws/CPUTime=3.27079 Raws/WallTime=3.27084 WallTime,s=0.305732
2K4A9929.CR2/threads:8/real_time_stddev          1 ms          1 ms          2 CPUTime,s=584.969u CPUTime/WallTime=7.87539u Pixels=0 Pixels/CPUTime=77.8769k Pixels/WallTime=77.9118k Raws/CPUTime=6.2594m Raws/WallTime=6.2622m WallTime,s=585.232u
RUNNING: /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench --benchmark_repetitions=3 2K4A9927.CR2 2K4A9928.CR2 2K4A9929.CR2 --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpN9Xhk2
2018-09-12 20:24:18
Running /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
-----------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time_mean          446 ms        446 ms          2 CPUTime,s=0.445771 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=118.142M Pixels/WallTime=118.143M Raws/CPUTime=2.2433 Raws/WallTime=2.24332 WallTime,s=0.445769
2K4A9927.CR2/threads:8/real_time_median        446 ms        446 ms          2 CPUTime,s=0.445719 CPUTime/WallTime=1.00001 Pixels=52.6643M Pixels/CPUTime=118.156M Pixels/WallTime=118.156M Raws/CPUTime=2.24356 Raws/WallTime=2.24358 WallTime,s=0.445717
2K4A9927.CR2/threads:8/real_time_stddev          0 ms          0 ms          2 CPUTime,s=184.23u CPUTime/WallTime=1.69441u Pixels=0 Pixels/CPUTime=48.8185k Pixels/WallTime=49.0086k Raws/CPUTime=926.975u Raws/WallTime=930.585u WallTime,s=184.946u
2K4A9928.CR2/threads:8/real_time_mean          562 ms        562 ms          1 CPUTime,s=0.562071 CPUTime/WallTime=0.999969 Pixels=27.9936M Pixels/CPUTime=49.8045M Pixels/WallTime=49.803M Raws/CPUTime=1.77914 Raws/WallTime=1.77908 WallTime,s=0.562088
2K4A9928.CR2/threads:8/real_time_median        562 ms        562 ms          1 CPUTime,s=0.56237 CPUTime/WallTime=0.999986 Pixels=27.9936M Pixels/CPUTime=49.7779M Pixels/WallTime=49.7772M Raws/CPUTime=1.77819 Raws/WallTime=1.77816 WallTime,s=0.562378
2K4A9928.CR2/threads:8/real_time_stddev          1 ms          1 ms          1 CPUTime,s=967.38u CPUTime/WallTime=35.0024u Pixels=0 Pixels/CPUTime=85.7806k Pixels/WallTime=84.0545k Raws/CPUTime=3.06429m Raws/WallTime=3.00263m WallTime,s=947.993u
2K4A9929.CR2/threads:8/real_time_mean          307 ms        307 ms          2 CPUTime,s=0.306511 CPUTime/WallTime=0.999995 Pixels=12.4416M Pixels/CPUTime=40.5924M Pixels/WallTime=40.5922M Raws/CPUTime=3.26264 Raws/WallTime=3.26262 WallTime,s=0.306513
2K4A9929.CR2/threads:8/real_time_median        306 ms        306 ms          2 CPUTime,s=0.306169 CPUTime/WallTime=0.999999 Pixels=12.4416M Pixels/CPUTime=40.6364M Pixels/WallTime=40.637M Raws/CPUTime=3.26618 Raws/WallTime=3.26622 WallTime,s=0.306164
2K4A9929.CR2/threads:8/real_time_stddev          2 ms          2 ms          2 CPUTime,s=2.25763m CPUTime/WallTime=21.4306u Pixels=0 Pixels/CPUTime=298.503k Pixels/WallTime=299.126k Raws/CPUTime=0.0239924 Raws/WallTime=0.0240424 WallTime,s=2.26242m
Comparing /home/lebedevri/rawspeed/build-0/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-1/src/utilities/rsbench/rsbench
Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------
2K4A9927.CR2/threads:8/real_time_pvalue                 0.6428          0.6428      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9927.CR2/threads:8/real_time_mean                  -0.0007         -0.0007           446           446           446           446
2K4A9927.CR2/threads:8/real_time_median                +0.0001         +0.0001           446           446           446           446
2K4A9927.CR2/threads:8/real_time_stddev                -0.7746         -0.7749             1             0             1             0
2K4A9928.CR2/threads:8/real_time_pvalue                 0.0636          0.0636      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9928.CR2/threads:8/real_time_mean                  +0.0004         +0.0004           562           562           562           562
2K4A9928.CR2/threads:8/real_time_median                +0.0011         +0.0011           562           562           562           562
2K4A9928.CR2/threads:8/real_time_stddev                +0.2091         +0.2270             1             1             1             1
2K4A9929.CR2/threads:8/real_time_pvalue                 0.0636          0.0636      U Test, Repetitions: 3. WARNING: Results unreliable! 9+ repetitions recommended.
2K4A9929.CR2/threads:8/real_time_mean                  +0.0026         +0.0026           306           307           306           307
2K4A9929.CR2/threads:8/real_time_median                +0.0014         +0.0014           306           306           306           306
2K4A9929.CR2/threads:8/real_time_stddev                +2.8652         +2.8585             1             2             1             2
```
2018-09-17 11:59:39 +03:00
Roman Lebedev 1b44120cd1
Un-deprecate [SG]et{Item,Byte}sProcessed, re-implement as custom counters. (#676)
As discussed with @dominichamon and @dbabokin, sugar is nice.
Well, maybe not for the health, but it's sweet.
Alright, enough puns.

Special care needs to be taken not to break the CSV reporter. UGH.
We end up shedding some code over this.
We no longer specially pretty-print these counters; they are printed just like the rest of the custom counters.
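
A rough sketch of what the re-implementation amounts to, expressed here as free functions over the state (the counter names and exact signature are an approximation of the change, not a verbatim excerpt):
```
#include <benchmark/benchmark.h>
#include <cstdint>

// The old "processed" sugar can be expressed directly as custom rate counters.
void SetBytesProcessed(benchmark::State& state, int64_t bytes) {
  state.counters["bytes_per_second"] = benchmark::Counter(
      static_cast<double>(bytes), benchmark::Counter::kIsRate,
      benchmark::Counter::OneK::kIs1024);  // bytes scale with 1k == 1024
}

void SetItemsProcessed(benchmark::State& state, int64_t items) {
  state.counters["items_per_second"] = benchmark::Counter(
      static_cast<double>(items), benchmark::Counter::kIsRate);
}
```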

Fixes #627.
2018-09-13 22:03:47 +03:00
Roman Lebedev 58588476ce
Track two more details about runs - the aggregate name, and run name. (#675)
This is related to @BaaMeow's work in https://github.com/google/benchmark/pull/616 but is not based on it.

Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
  It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored,
  i.e. not the name of the benchmark function, but the actual
  run name with the 'aggregate name' suffix stripped.

This way one can group/filter all the runs,
and filter by the particular aggregate type.

I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
2018-09-13 15:08:15 +03:00
Roman Lebedev c614dfc0d4
*Display* aggregates only. (#665)
There is a flag 
d9cab612e4/src/benchmark.cc (L75-L78)
and a call
d9cab612e4/include/benchmark/benchmark.h (L837-L840)
But that affects everything, every reporter, destination:
d9cab612e4/src/benchmark.cc (L316)


It would be quite useful to have the ability to be more picky.

More specifically, I would like to be able to only see the aggregates in the on-screen output,
but for the file output to still contain everything. The former is useful when there are many repetitions
(even more so if every iteration is reported separately), while the latter is **great** for tooling.

Fixes https://github.com/google/benchmark/issues/664
2018-09-12 16:26:17 +03:00
Roman Lebedev f274c503e1 Backport LLVM's r341717 "Fix flags used to compile benchmark library with clang-cl" (#673)
`MSVC` is true for clang-cl, but `"${CMAKE_CXX_COMPILER_ID}" STREQUAL
"MSVC"` is false, so we would enable -Wall, which means -Weverything
with clang-cl, and we get tons of undesired warnings.

Use the simpler condition to fix things.

Patch by: Reid Kleckner @rnk
2018-09-10 16:30:40 -04:00
Roman Lebedev f0901417c8 GetCacheSizesMacOSX(): use consistent types. (#667)
I have absolutely no way to test this, but it looks obviously good.

This was reported by Tim Northover @TNorthover in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180903/584223.html

> I think this breaks some 32-bit configurations (well, mine at least).
> I was using Clang (from Xcode 10 beta) on macOS and got a bunch of
> errors referencing sysinfo.cc:292 and onwards:

> /Users/tim/llvm/llvm-project/llvm/utils/benchmark/src/sysinfo.cc:292:47:
> error: non-constant-expression cannot be narrowed from type
> 'std::__1::array<unsigned long long, 4>::value_type' (aka 'unsigned
> long long') to 'size_t' (aka 'unsigned long') in initializer list
> [-Wc++11-narrowing]
>   } Cases[] = {{"hw.l1dcachesize", "Data", 1, CacheCounts[1]},
>                                               ^~~~~~~~~~~~~~
>
> The same happens when self-hosting ToT. Unfortunately I couldn't
> reproduce the issue on Debian (Clang 6.0.1) even with libc++; I'm not
> sure what the difference is.
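
To illustrate the mismatch, a minimal reproduction sketch (the struct and field names here are simplified assumptions based on the quoted error, not the actual sysinfo.cc code):
```
#include <array>
#include <cstdint>

// On 32-bit macOS, size_t is 32 bits wide, so brace-initializing a size_t
// member from a uint64_t array element is a narrowing conversion
// (-Wc++11-narrowing). Keeping the member type consistent with the array's
// element type avoids it.
std::array<std::uint64_t, 4> CacheCounts = {{1, 1, 1, 1}};

struct CacheInfoCase {
  const char* sysctl_name;
  const char* type;
  int level;
  std::uint64_t count;  // was effectively size_t; now matches CacheCounts
};

CacheInfoCase Cases[] = {{"hw.l1dcachesize", "Data", 1, CacheCounts[1]}};
```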
2018-09-05 12:20:18 +01:00
Roman Lebedev a7ed76ad78 Travis-ci: attempt to add 32-bit osx xcode build (#669)
Maybe it will show https://github.com/google/benchmark/pull/667,
or maybe this is completely wrong.
2018-09-05 12:19:54 +01:00
Changming Sun 305ba313be Pass name by const-reference instead of by value in class Statistics' constructor (#668) 2018-09-04 23:46:40 +03:00
pseyfert fbfc495d7f add missing closing bracket in --help message (#666) 2018-09-03 19:45:09 +03:00
Roman Lebedev 5159967520
Mark Set{Items,Bytes}Processed()/{items,bytes}_processed() as deprecated. (#654)
They are basically a proto-version of custom user counters.
They do not seem to do anything that custom user counters
can't do, and having two similar entities is not good for generalization.

Migration plan:
* ```
  SetItemsProcessed(<val>)
    =>
  state.counters.insert({
    {"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate)},
    ...
  });
  ```
* ```
  SetBytesProcessed(<val>)
    =>
  state.counters.insert({
    {"<Name>", benchmark::Counter(<val>, benchmark::Counter::kIsRate,
                                  benchmark::Counter::OneK::kIs1024)},
    ...
  });
  ```
* ```
  <Name>_processed()
    =>
  state.counters["<Name>"]
  ```

One thing the custom user counters miss is better support
for units of measurement.

Refs. https://github.com/google/benchmark/issues/627
2018-08-30 11:59:50 +03:00
Roman Lebedev caa2fcb19c
Counter(): add 'one thousand' param. (#657)
* Counter(): add 'one thousand' param.

Needed for https://github.com/google/benchmark/pull/654

Custom user counters are quite custom. It is not guaranteed
that the user *always* expects these to have 1k == 1000.
If the counter represents bytes/memory/etc., 1k should be 1024 (see the usage sketch at the end of this message).

Some bikeshedding points:
1. Is this sufficient, or do we really want to go full on
   into custom types with names?
   I think just the '1000' is sufficient for now.
2. Should there be helper benchmark::Counter::Counter{1000,1024}()
   static 'constructor' functions, since these two, by far,
   will be the most used?
3. In the future, we should be somehow encoding this info into JSON.

* Counter(): use std::pair<> to represent 'one thousand'

* Counter(): just use a new enum with two values 1000 vs 1024.

Simpler is better. If someone comes up with a real reason
to need something more advanced, it can be added later on.

* Counter: just store the 1000 or 1024 in the One_K values directly

* Counter: s/One_K/OneK/
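
A minimal usage sketch of the resulting parameter, assuming the final OneK spelling from the last bullet (`bytes` and `items` are hypothetical values computed inside a benchmark):
```
#include <benchmark/benchmark.h>

// The third Counter() argument picks the prefix base: binary (1024) for
// byte-like counters, decimal (1000, the default) for everything else.
void ReportThroughput(benchmark::State& state, double bytes, double items) {
  state.counters["Bytes"] = benchmark::Counter(
      bytes, benchmark::Counter::kIsRate, benchmark::Counter::OneK::kIs1024);
  state.counters["Items"] = benchmark::Counter(
      items, benchmark::Counter::kIsRate);  // OneK::kIs1000 by default
}
```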
2018-08-29 21:11:06 +03:00
Roman Lebedev d9cab612e4
[NFC] s/console_reporter/display_reporter/ (#663)
There are two destinations:
* display (console, terminal) and
* file.

And each of the destinations can be populated with one of the reporters:
* console - human-friendly table-like display
* json
* csv (deprecated)

So using the name console_reporter is confusing.
Is it talking about the console reporter in the sense of
table-like reporter, or in the sense of display destination?
2018-08-29 14:58:54 +03:00
Michael "Croydon" Keck a0018c3931 Ignore 32 bit build option when using MSVC (#638) 2018-08-29 12:51:20 +01:00
Roman Lebedev 8688c5c4cf
Track 'type' of the run - is it an actual measurement, or an aggregate. (#658)
This is *only* exposed in the JSON. Not in CSV, which is deprecated.

This is *only* supposed to track these two states.
An additional field could later track which aggregate this is,
specifically (statistic name, rms, bigo, ...).

The motivation is that we already have ReportAggregatesOnly,
but it affects the entire report, both the display
and the reporters (JSON files), which isn't ideal.

It would be very useful to have a 'display aggregates only' option,
both in the library's console reporter and in the Python tooling.
This will be especially needed for the 'store separate iterations' feature.
2018-08-28 18:11:36 +03:00
Roman Lebedev 9a179cb93f
[NFC] Prefix "report(_)?mode" with Aggregation. (#656)
This only specifically represents the handling of aggregate reporting,
not of anything else. Making it more specific makes the name less generic.

This is an issue because I want to add an "iteration report mode",
so the naming would otherwise conflict.
2018-08-28 17:19:25 +03:00
Bernhard M. Wiedemann ede90ba6c8 Make tests pass on 1-core VMs (#653)
Found while working on reproducible builds for openSUSE.

To reproduce there:
osc checkout openSUSE:Factory/benchmark && cd $_
osc build -j1 --vm-type=kvm
2018-08-28 17:10:14 +03:00
BaaMeow af441fc114 properly escape json names (#652) 2018-08-16 09:47:09 -07:00
Roman Lebedev 94c4d6d5c6 [Tools] Drop compare_bench.py, compare.py is to be used, add U-test docs. (#645)
As discussed in IRC, time to deduplicate.
2018-08-13 07:42:35 -07:00
Kirill Bobyrev f85304e4e3 Remove redundant default which causes failures (#649)
* Remove redundant default which causes failures

* Fix old GCC warnings caused by poor analysis

* Use __builtin_unreachable

* Use BENCHMARK_UNREACHABLE()

* Pull __has_builtin to benchmark.h too

* Also move compiler identification macro to main header

* Move custom compiler identification macro back
2018-08-08 14:39:57 +01:00
Dominic Hamon d939634b8c
README improvements (#648)
* Clarifications and cleaning of the core documentation.
2018-07-26 14:29:33 +01:00
Dominic Hamon f965eab508
Memory management and reporting hooks (#625)
* Introduce memory manager interface

* Add memory stats to JSON reporter and a test

* Add comments and switch the JSON output test to int (a usage sketch follows below)
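
A rough sketch of how the new hook is meant to be wired up; the interface and RegisterMemoryManager call follow this change as I understand it, but treat the exact member names as an approximation rather than a reference:
```
#include <benchmark/benchmark.h>

// A do-nothing manager: Start() is called before a benchmark run and
// Stop() collects allocation statistics for the JSON report.
class NullMemoryManager : public benchmark::MemoryManager {
 public:
  void Start() override {}
  void Stop(Result* result) override {
    result->num_allocs = 0;       // fill in from real allocator hooks
    result->max_bytes_used = 0;
  }
};

int main(int argc, char** argv) {
  NullMemoryManager mm;
  benchmark::RegisterMemoryManager(&mm);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```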
2018-07-24 15:57:15 +01:00
Dominic Hamon 63e183b389
Add note to tools.md regarding scipy. 2018-07-23 12:08:20 +01:00
Ori Livneh e1150acab9 Add Ori Livneh to AUTHORS and CONTRIBUTORS 2018-07-09 10:52:28 -04:00
Ori Livneh da9ec3dfca Include system load average in console and JSON reports
High system load can skew benchmark results. By including system load averages
in the library's output, we help users identify a potential issue in the
quality of their measurements, and thus assist them in producing better (more
reproducible) results.

I got the idea for this from Brendan Gregg's checklist for benchmark accuracy
(http://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html).
2018-07-09 10:51:08 -04:00
Federico Ficarelli 1f35fa4aa7 Update AUTHORS and CONTRIBUTORS (#632)
Adding myself to AUTHORS and CONTRIBUTORS according to guidelines.
2018-07-09 12:47:16 +01:00
Federico Ficarelli 0c21bc369a Fix build with Intel compiler (#631)
* Set -Wno-deprecated-declarations for Intel

Intel compiler silently ignores -Wno-deprecated-declarations
so warning no. 1786 must be explicitly suppressed.

* Make std::int64_t → double casts explicit

While std::int64_t → double is a perfectly conformant
implicit conversion, the Intel compiler warns about it.
Make them explicit via static_cast<double> (see the sketch after this list).

* Make std::int64_t → int casts explicit

Intel compiler warns about emplacing an std::int64_t
into an int container. Just make the conversion explicit
via static_cast<int>.

* Cleanup Intel -Wno-deprecated-declarations workaround logic
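
An illustrative example of the pattern applied throughout (the function is hypothetical; the point is only the explicit cast):
```
#include <cstdint>

// std::int64_t -> double is a valid implicit conversion, but the Intel
// compiler warns on it, so the cast is spelled out.
double ItemsPerSecond(std::int64_t items, double seconds) {
  return static_cast<double>(items) / seconds;
}
```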
2018-07-09 11:45:10 +01:00
Federico Ficarelli 5946795e82 Disable Intel invalid offsetof warning (#629) 2018-07-03 10:13:22 +01:00
Yoshinari Takaoka 847c006902 fixed Google Test (Primer) Documentation link (#628) 2018-06-28 10:25:54 +01:00
Roman Lebedev b123abdcf4 Add Iteration-related Counter::Flags. Fixes #618 (#621)
Inspired by [two](a1ebe07bea) [bugs](0891555be5) I found and fixed in my own code, caused by the lack of these flags:
* `kIsIterationInvariant` - `* state.iterations()`
  The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
  The value is global over all the iterations, and needs to be **divided** by the iteration count.

They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.

I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess `kIsThreadInvariant` could be added, too, for symmetry with `kAvgThreads`. (A usage sketch follows below.)
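
A small usage sketch of the two flags (the benchmark itself is hypothetical; the flag semantics are as described above):
```
#include <benchmark/benchmark.h>
#include <cstdint>

static void BM_Chunk(benchmark::State& state) {
  const int64_t kBytesPerIteration = 4096;  // hypothetical per-iteration work
  int64_t total_bytes = 0;
  for (auto _ : state) {
    total_bytes += kBytesPerIteration;
    benchmark::DoNotOptimize(total_bytes);
  }
  // Constant per iteration: the library multiplies by state.iterations().
  state.counters["BytesRate"] = benchmark::Counter(
      static_cast<double>(kBytesPerIteration),
      benchmark::Counter::kIsIterationInvariantRate);
  // Accumulated over the whole run: the library divides by state.iterations().
  state.counters["AvgBytes"] = benchmark::Counter(
      static_cast<double>(total_bytes), benchmark::Counter::kAvgIterations);
}
BENCHMARK(BM_Chunk);
BENCHMARK_MAIN();
```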
2018-06-27 15:45:30 +01:00
Dominic Hamon d8584bda67
Use EXPECT_DOUBLE_EQ when comparing doubles in tests. (#624)
* Use EXPECT_DOUBLE_EQ when comparing doubles in tests.

Fixes #623

* Disable the 'float-equal' warning (an example follows below)
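
For illustration, the kind of comparison this is about (assuming a GoogleTest-based test; the 4-ULP tolerance is GoogleTest's documented behavior for EXPECT_DOUBLE_EQ):
```
#include <gtest/gtest.h>

TEST(Statistics, MeanOfDoubles) {
  const double mean = (0.1 + 0.2 + 0.3) / 3.0;
  // EXPECT_EQ on doubles can fail due to rounding and trips -Wfloat-equal;
  // EXPECT_DOUBLE_EQ compares within 4 ULPs instead.
  EXPECT_DOUBLE_EQ(0.2, mean);
}
```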
2018-06-27 12:11:30 +01:00
Roman Lebedev 7d03f2df49 [Tooling] Enable U Test by default, add tooltip about repetition count. (#617)
As previously discussed, let's flip the switch ^^.

This exposes the problem that it will now be run
for everyone, even if one did not read the help
about the recommended repetition count.

This is not good. So I think we can do the smart thing:
```
$ ./compare.py benchmarks gbench/Inputs/test3_run{0,1}.json
Comparing gbench/Inputs/test3_run0.json to gbench/Inputs/test3_run1.json
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_One                   -0.1000         +0.1000            10             9           100           110
BM_Two                   +0.1111         -0.0111             9            10            90            89
BM_Two                   +0.2500         +0.1125             8            10            80            89
BM_Two_pvalue             0.2207          0.6831      U Test, Repetitions: 2. WARNING: Results unreliable! 9+ repetitions recommended.
BM_Two_stat              +0.0000         +0.0000             8             8            80            80
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502182-ea25d872-71bc-11e8-9842-8aa049509b14.png)

Or, in the good case (noise omitted):
```
$ ./compare.py benchmarks /tmp/run{0,1}.json
Comparing /tmp/run0.json to /tmp/run1.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
<99 more rows like this>
./_T012014.RW2/threads:8/real_time                +0.0160         +0.0596            46            47            10            10
./_T012014.RW2/threads:8/real_time_pvalue          0.0000          0.0000      U Test, Repetitions: 100
./_T012014.RW2/threads:8/real_time_mean           +0.0094         +0.0609            46            47            10            10
./_T012014.RW2/threads:8/real_time_median         +0.0104         +0.0613            46            46            10            10
./_T012014.RW2/threads:8/real_time_stddev         -0.1160         -0.1807             1             1             0             0
```
(old screenshot)
![image](https://user-images.githubusercontent.com/88600/41502185-fb8193f4-71bc-11e8-85fa-cbba83e39db4.png)
2018-06-18 12:58:16 +01:00
Dominic Hamon 151ead6242
Disable deprecation warnings when -Werror is enabled. (#609)
Fixes #608
2018-06-07 12:54:14 +01:00
Marat Dukhan 505be96ab2 Avoid using CMake 3.6 feature list(FILTER ...) (#612)
list(FILTER ...) is a CMake 3.6 feature, but benchmark targets CMake 2.8.12.
2018-06-06 12:32:42 +01:00
Sergiu Deitsch 1301f53e31 cmake: use numeric version in package config (#611) 2018-06-05 15:01:44 +01:00
Marat Dukhan 7fb3c564e5 Fix compilation on Android with GNU STL (#596)
* Fix compilation on Android with GNU STL

GNU STL in the Android NDK lacks the C++11 string conversion functions, including std::stoul, std::stoi, and std::stod.
This patch reimplements these functions in the benchmark:: namespace using C-style equivalents from C++03 (a sketch follows below).

* Avoid use of log2, which doesn't exist in Android's GNU STL

GNU STL in the Android NDK lacks the log2 function from C99/C++11.
This patch replaces its use in the code with the double log(double) function.
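
A minimal sketch of the kind of C++03-style fallback this describes (error handling omitted; the real implementation lives in the library's string utilities):
```
#include <cstddef>
#include <cstdlib>
#include <string>

namespace benchmark {
// Fallback for std::stoul on toolchains that lack it: delegate to the
// C library's strtoul and report how many characters were consumed.
inline unsigned long stoul(const std::string& str, std::size_t* pos = nullptr,
                           int base = 10) {
  char* end = nullptr;
  const unsigned long result = std::strtoul(str.c_str(), &end, base);
  if (pos != nullptr) *pos = static_cast<std::size_t>(end - str.c_str());
  return result;
}
}  // namespace benchmark
```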
2018-06-05 11:36:26 +01:00
BaaMeow 4c2af07889 (clang-)format all the things (#610)
* Format all documents according to contributor guidelines and specifications;
use clang-format on/off to stop formatting where it makes excessively poor decisions

* Format all tests as well, and mark blocks which change too much
2018-06-01 11:14:19 +01:00
Dominic Hamon 4fbfa2f336
Some platforms and environments don't pass a valid argc/argv. (#607)
Specifically some iOS targets.
2018-05-30 13:17:41 +01:00
Dominic Hamon d07372e64b
clang-format run on the benchmark header (#606) 2018-05-29 14:12:51 +01:00
Eric 7b8d0249d8 Deprecate CSVReporter - A first step to overhauling reporting. (#488)
As @dominichamon and I have discussed, the current reporter interface
is poor at best. And something should be done to fix it.

I strongly suspect such a fix will require an entire reimagining
of the API, and therefore breaking backwards compatibility fully.

For that reason we should start deprecating and removing parts
that we don't intend to replace. One of these parts, I argue,
is the CSVReporter. I propose that the new reporter interface
should choose a single output format (JSON) and traffic entirely
in that. If somebody really wanted to replace the functionality
of the CSVReporter they would do so as an external tool which
transforms the JSON.

For these reasons I propose deprecating the CSVReporter.
2018-05-29 13:25:32 +01:00
Dominic Hamon 16703ff83c
cleaner and slightly larger statistics tests (#604) 2018-05-29 13:13:06 +01:00
Dominic Hamon c8adf4531f
Add some 'travis_wait' commands to avoid gcc@7 installation timeouts. (#605) 2018-05-29 13:12:48 +01:00
Roman Lebedev a6a1b0d765 Benchmarking is hard. Making sense of the benchmarking results is even harder. (#593)
The first problem you have to solve yourself. The second one can be aided.
The benchmark library can compute some statistics over the repetitions,
which helps with grasping the results somewhat.

But that is only for the one set of results. It does not really help to compare
the two benchmark results, which is the interesting bit. Thankfully, there are
these bundled `tools/compare.py` and `tools/compare_bench.py` scripts.

They can provide a diff between two benchmarking results. Yay!
Except not really: it's just a diff. While it is very informative and better than
nothing, it does not really help answer The Question - am I just looking at the noise?
It's like not having those per-benchmark statistics...

Roughly, we can formulate the question as:
> Are these two benchmarks the same?
> Did my change actually change anything, or is the difference below the noise level?

Well, this really sounds like a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis), does it not?
So maybe we can use statistics here, and solve all our problems?
lol, no, it won't solve all the problems. But maybe it will act as a tool,
to better understand the output, just like the usual statistics on the repetitions...

I'm making an assumption here that most people care about the change
of the average value, not the standard deviation. Thus I believe we can use a t-test,
be it either [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test), or [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test).
**EDIT**: however, after @dominichamon review, it was decided that it is better
to use more robust [Mann–Whitney U test](https://en.wikipedia.org/wiki/Mann–Whitney_U_test)
I'm using [scipy.stats.mannwhitneyu](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html#scipy.stats.mannwhitneyu).

There are two new user-facing knobs:
```
$ ./compare.py --help
usage: compare.py [-h] [-u] [--alpha UTEST_ALPHA]
                  {benchmarks,filters,benchmarksfiltered} ...

versatile benchmark output compare tool
<...>
optional arguments:
  -h, --help            show this help message and exit

  -u, --utest           Do a two-tailed Mann-Whitney U test with the null
                        hypothesis that it is equally likely that a randomly
                        selected value from one sample will be less than or
                        greater than a randomly selected value from a second
                        sample. WARNING: requires **LARGE** (9 or more)
                        number of repetitions to be meaningful!
  --alpha UTEST_ALPHA   significance level alpha. if the calculated p-value is
                        below this value, then the result is said to be
                        statistically significant and the null hypothesis is
                        rejected. (default: 0.0500)
```

Example output:
![screenshot_20180512_175517](https://user-images.githubusercontent.com/88600/39958581-ae897924-560d-11e8-81b9-806db6c3e691.png)
As you can guess, the alpha does not affect anything but the coloring of the computed p-values.
If it is green, then the change in the average values is statistically significant.

I'm detecting the repetitions by matching names. This way, no changes to the JSON are _needed_.
Caveats:
* This won't work if the JSON is not in the same order as output by the benchmark,
   or if the parsing does not retain the ordering.
* This won't work if, after the grouped repetitions, there isn't at least one row with a
  different name (e.g. a statistic). Since there isn't a knob to disable printing of statistics
  (only the other way around), I'm not too worried about this.
* **The results will be wrong if the repetition count is different between the two benchmarks being compared.**
* Even though I have added (hopefully full) test coverage, the code of these Python tools is starting
  to look a bit jumbled.
* So far I have added this only to `tools/compare.py`.
  Should I add it to `tools/compare_bench.py` too?
  Or should we deduplicate them (by removing the latter)?
2018-05-29 11:13:28 +01:00