Despite the wide variety of features we provide,
some people still have the audacity to complain and demand more.
Concretely, I *very* often would like to see the overall result
of the benchmark comparison: is the 'new' build better or worse, overall,
across all of the non-aggregate time/CPU measurements?
This comes up for me most often when I want to quickly see
what effect some LLVM optimization change has on the benchmark.
The idea is straightforward: produce four lists
(wall times for the LHS benchmark, CPU times for the LHS benchmark,
wall times for the RHS benchmark, CPU times for the RHS benchmark),
then compute the geomean of each of those four lists,
and finally compute the two percentage changes:
* between the geomean wall time of the LHS benchmark and the geomean wall time of the RHS benchmark
* between the geomean CPU time of the LHS benchmark and the geomean CPU time of the RHS benchmark
and voila!
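For reference, the math per list pair is just the usual geometric mean and relative change (a sketch; the percent convention shown is the obvious one, applied once to the wall-time lists and once to the CPU-time lists):

\[
\operatorname{geomean}(t_1,\dots,t_n) = \Big(\prod_{i=1}^{n} t_i\Big)^{1/n},
\qquad
\Delta\% = 100\cdot\Big(\frac{\operatorname{geomean}(\mathrm{RHS})}{\operatorname{geomean}(\mathrm{LHS})} - 1\Big)
\]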
It is complicated by the fact that the tool needs to gracefully handle
different time units, so a pandas.Timedelta dependency is introduced.
That is the only library I found that does not barf on floating-point times;
I have tried numpy.timedelta64 (only takes integers)
and Python's datetime.timedelta (does not take nanoseconds),
and they won't do.
Fixes https://github.com/google/benchmark/issues/1147
* Add Setup/Teardown option on Benchmark.
Motivations:
- feature parity with our internal library (which has ~718 callers)
- more flexible than coordinating setup/teardown inside the benchmark routine
A usage sketch follows the change list below.
* change Setup/Teardown callback type to raw function pointers
* add test file to cmake file
* move b.Teardown() up
* add const to param of Setup/Teardown callbacks
* fix comment and add doc to user_guide
* fix typo
* fix doc, fix test and add bindings to python/benchmark.cc
* fix binding again
* remove explicit C cast - that was wrong
* change policy to reference_internal
* try removing the bindings ...
* clean up
* add more tests with repetitions and fixtures
* more comments
* init setup/teardown callbacks to NULL
* s/nullptr/NULL
* removed unused var
* change assertion on fixture_interaction::fixture_setup
* move NULL init to .cc file
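For illustration, a minimal usage sketch of the resulting API, assuming the raw-function-pointer callbacks taking `const benchmark::State&` described above (the benchmark and callback names here are made up):

```cpp
#include <benchmark/benchmark.h>

// Run once before/after each benchmark run (and each repetition),
// not once per iteration.
static void DoSetup(const benchmark::State& state) {
  // e.g. allocate or warm up shared resources here
}

static void DoTeardown(const benchmark::State& state) {
  // e.g. release whatever DoSetup acquired
}

static void BM_WithSetup(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(state.iterations());  // the measured work
  }
}

BENCHMARK(BM_WithSetup)->Setup(DoSetup)->Teardown(DoTeardown);

BENCHMARK_MAIN();
```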
It seems, according to [1], that bazelbuild/rules_cc has been put on hold
and that the recommended way, for now, is to use the native cc rules.
[1]: https://github.com/bazelbuild/rules_go/pull/2950
* Fix dependency typo and unpin cibuildwheel version in wheel building action
* Move to monolithic build jobs, restrict to x64 architectures
As of this commit, all wheel building jobs complete on GitHub Actions. Since some platform-specific options had to be set along the way to fix different types of build problems, the build job matrix was unrolled.
Still left TODO:
* Wheel testing after build (running the Python bindings test)
* Emulating bazel on other architectures to build aarch64/i686/ppc64le
* Enabling Win32 (this fails due to linker errors).
* Add binding test commands for all wheels, set macOSX deployment target to 10.9
* Add instructions for updating Python __version__ variable before release creation
* Fix an uninitialized-variable error in a test.
Found by -Werror,-Wsometimes-uninitialized
* Update spec_arg_test.cc
* additional change:
- Change the API of GetBenchmarkFilter and the `spec` argument to std::string, because the Google C++ style guide discourages using raw const char*
* use docker container for ubuntu-16.04 builds
* install some bits
* no sudo in docker container
* cmake, not cmake3
* include perfcounters
* still no sudo in docker containers
* yes please, apt
* [RFC] Adding API for setting/getting benchmark_filter flag?
This PR is more of a request for comments; open to other ideas/suggestions as well.
Details:
This flag has different implementations (absl vs. benchmark), and since the proposal to add absl as a dependency was rejected, it would be nice to have a reliable (and less hacky) way to access this flag internally.
(Actually, reading it isn't much of a problem, but setting it is.)
Internally, we have a sizeable number of users who use absl::SetFlags to set this flag. This will not work with benchmark-flags.
Another motivation is that not all users use the command-line flag; some prefer to set this value programmatically.
(A usage sketch of the resulting API follows the change list below.)
* fixed build errors
* fix lints again
* per discussion: add additional RunSpecifiedBenchmarks instead.
* add tests
* fix up tests
* clarify comment
* fix stray : in test
* more assertions in test
* add test file to test/CMakeLists.txt
* more tests
* make test ISO C++ compliant
* fix up BUILD file to pass the flag
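For illustration, roughly how the API that came out of the discussion (the extra RunSpecifiedBenchmarks overload taking a `spec`, plus GetBenchmarkFilter) can be used; the benchmark name and filter string are made up:

```cpp
#include <benchmark/benchmark.h>

static void BM_Foo(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(state.iterations());
  }
}
BENCHMARK(BM_Foo);

int main(int argc, char** argv) {
  benchmark::Initialize(&argc, argv);
  // Supply the filter spec programmatically instead of relying solely on
  // the --benchmark_filter command-line flag.
  benchmark::RunSpecifiedBenchmarks("BM_Foo");
  // The current filter value can also be read back via
  // benchmark::GetBenchmarkFilter().
  benchmark::Shutdown();
  return 0;
}
```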
* Allow template arguments to be specified directly on the BENCHMARK macro.
Use cases:
- more convenient (than having to use a separate BENCHMARK_TEMPLATE)
- feature parity with our internal library.
A usage sketch follows below.
* fix tests
* updated docs
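A sketch of the new spelling next to the old one (the templated benchmark itself is made up):

```cpp
#include <benchmark/benchmark.h>

#include <deque>
#include <vector>

template <class Container>
static void BM_PushBack(benchmark::State& state) {
  for (auto _ : state) {
    Container c;
    c.push_back(42);
    benchmark::DoNotOptimize(c);
  }
}

// Old spelling: template arguments via a separate macro.
BENCHMARK_TEMPLATE(BM_PushBack, std::vector<int>);
// New spelling: template arguments given directly on BENCHMARK.
BENCHMARK(BM_PushBack<std::deque<int>>);

BENCHMARK_MAIN();
```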
In #1238, one of MemoryManager's Stop methods was marked as deprecated,
and that method is used in the same header. This change generated a
-Wdeprecated-declarations warning in every file that includes
"benchmark.h". Use gcc's diagnostic pragmas to suppress this warning.
WebRTC uses Google Benchmark as a dependency and builds with Chromium's
build infrastructure. Chromium is compiled using clang-cl on Windows, where
the -Wdeprecated-declarations warning is triggered. Because clang-cl accepts
gcc's diagnostic pragmas and defines the __clang__ macro,
guarding on that macro solves the issue there as well.
Bug: webrtc:13280
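The suppression pattern looks roughly like this (an illustrative sketch with hypothetical names, not the exact benchmark.h code):

```cpp
struct Manager {
  // The deprecated overload that is still referenced from within the header.
  [[deprecated("use the newer Stop() overload instead")]]
  virtual void Stop(int* result) { *result = 0; }
  virtual ~Manager() = default;
};

inline void InvokeStop(Manager& m, int* result) {
// clang-cl defines __clang__ and understands GCC diagnostic pragmas,
// so a single guard covers gcc, clang, and clang-cl.
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif
  m.Stop(result);  // no -Wdeprecated-declarations is emitted for this call
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif
}
```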
* Introduce Coefficient of variation aggregate
I believe it is much more useful / easier to understand,
because it is already normalized by the mean,
so, unlike the standard deviation,
it is not affected by the duration of the benchmark.
Example of real-world output:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ ~/rawspeed/build-old/src/utilities/rsbench/rsbench GOPR9172.GPR --benchmark_repetitions=27 --benchmark_display_aggregates_only=true --benchmark_counters_tabular=true
2021-09-03T18:05:56+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3596.16 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
Load Average: 7.00, 2.99, 1.85
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:32/process_time/real_time_mean 11.1 ms 353 ms 27 0.353122 31.9473 12M 33.9879M 1085.84M 2.83232 90.4864 0.0110535
GOPR9172.GPR/threads:32/process_time/real_time_median 11.0 ms 352 ms 27 0.351696 31.9599 12M 34.1203M 1090.11M 2.84336 90.8425 0.0110081
GOPR9172.GPR/threads:32/process_time/real_time_stddev 0.159 ms 4.60 ms 27 4.59539m 0.0462064 0 426.371k 14.9631M 0.0355309 1.24692 158.944u
GOPR9172.GPR/threads:32/process_time/real_time_cv 1.44 % 1.30 % 27 0.0130136 1.44633m 0 0.0125448 0.0137802 0.0125448 0.0137802 0.0143795
```
Fixes https://github.com/google/benchmark/issues/1146
* Be consistent, it's CV, not 'rel std dev'
* Statistics: add support for percentage unit in addition to time
I think the `stddev` statistic is useful, but confusing.
What does it mean if a `stddev` of `1ms` is reported?
Is that good or bad? If the `median` is `1s`,
then that means the measurements are pretty noise-less.
And what if a `stddev` of `100ms` is reported?
If the `median` is `1s` - awful; if the `median` is `10s` - good.
And hurray, there is exactly the statistic that we need:
https://en.wikipedia.org/wiki/Coefficient_of_variation
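In other words (using the illustrative numbers above, with the mean standing in for the median):

\[
\mathrm{CV} = \frac{\sigma}{\mu}:\qquad
\frac{100\,\mathrm{ms}}{1\,\mathrm{s}} = 10\%,
\qquad
\frac{100\,\mathrm{ms}}{10\,\mathrm{s}} = 1\%
\]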
But, naturally, that produces a value in percent,
while the statistics are currently hardcoded to produce time.
So this refactors things a bit, and allows a percentage unit for statistics.
I'm not sure whether or not `benchmark` would be okay
with adding this `RSD` statistic by default,
but regardless, that is a separate patch.
Refs. https://github.com/google/benchmark/issues/1146
* Address review notes
The test used to work with scipy 1.6, but does not work with 1.7.1
I believe this is because of https://github.com/scipy/scipy/pull/4933
Since there doesn't appear to be anything wrong with how
we call `mannwhitneyu()`, I guess we should just expect the new values.
Notably, the tests now fail with earlier scipy versions.