* Rewrite complexity_test to use (hardcoded) manual time
This test is fundamentally flaky, because it tried to read tea leafs,
and is inherently misbehaving in CI environments,
since there are unmitigated sources of noise.
That being said, the computed Big-O also depends on the `--benchmark_min_time=`
Fixes https://github.com/google/benchmark/issues/272
* Correctly compute Big-O for manual timings. Fixes#1758.
* complexity_test: do more stuff in empty loop
* Make all empty loops be a bit longer empty
Looks like on windows, some of these tests still fail,
i guess clock precision is too small.
Much like it makes sense to enumerate all the families,
it makes sense to enumerate stuff within families.
Alternatively, we could have a global instance index,
but i'm not sure why that would be better.
This will be useful when the benchmarks are run not in order,
for the tools to sort the results properly.
It may be useful for those wishing to further post-process JSON results,
but it is mainly geared towards better support for run interleaving,
where results from the same family may not be close-by in the JSON.
While we won't be able to do much about that for outputs,
the tools can and perhaps should reorder the results to that
at least in their output they are in proper order, not run order.
Note that this only counts the families that were filtered-in,
so if e.g. there were three families, and we filtered-out
the second one, the two families (which were first and third)
will have family indexes 0 and 1.
While current counters can e.g. answer the question
"how many items is processed per second", it is impossible to get
it to tell "how many seconds it takes to process a single item".
The solution is to add a yet another modifier `kInvert`,
that is *always* considered last, which simply inverts the answer.
Fixes#781, #830, #848.
* Update AUTHORS and CONTRIBUTORS
* Fix WSL self-test failures
Some of the benchmark self-tests expect and check for a particular
output format from the benchmark library. The numerical values must
not be infinity or not-a-number, or the test will report an error.
Some of the values are computed bytes-per-second or items-per-second
values, so these require that the measured CPU time for the test to be
non-zero. But the loop that is being measured was empty, so the
measured CPU time for the loop was extremely small. On systems like
Windows Subsystem for Linux (WSL) the timer doesn't have enough
resolution to measure this, so the measured CPU time was zero.
This fix just makes sure that these tests have something within the
timing loop, so that the benchmark library will not decide that the
loop takes zero CPU time. This makes these tests more robust, and in
particular makes them pass on WSL.
* [JSON] add threads and repetitions to the json output, for better ide…
[Tests] explicitly check for thread == 1
[Tests] specifically mark all repetition checks
[JSON] add repetition_index reporting, but only for non-aggregates (i…
* [Formatting] Be very, very explicit about pointer alignment so clang-format can not put pointers/references on the wrong side of arguments.
[Benchmark::Run] Make sure to use explanatory sentinel variable rather than a magic number.
* Do not pass redundant information
As discussed with @dominichamon and @dbabokin, sugar is nice.
Well, maybe not for the health, but it's sweet.
Alright, enough puns.
A special care needs to be applied not to break csv reporter. UGH.
We end up shedding some code over this.
We no longer specially pretty-print them, they are printed just like the rest of custom counters.
Fixes#627.
This is related to @BaaMeow's work in https://github.com/google/benchmark/pull/616 but is not based on it.
Two new fields are tracked, and dumped into JSON:
* If the run is an aggregate, the aggregate's name is stored.
It can be RMS, BigO, mean, median, stddev, or any custom stat name.
* The aggregate-name-less run name is additionally stored.
I.e. not some name of the benchmark function, but the actual
name, but without the 'aggregate name' suffix.
This way one can group/filter all the runs,
and filter by the particular aggregate type.
I *might* need this for further tooling improvement.
Or maybe not.
But this is certainly worthwhile for custom tooling.
This is *only* exposed in the JSON. Not in CSV, which is deprecated.
This *only* supposed to track these two states.
An additional field could later track which aggregate this is,
specifically (statistic name, rms, bigo, ...)
The motivation is that we already have ReportAggregatesOnly,
but it affects the entire reports, both the display,
and the reporters (json files), which isn't ideal.
It would be very useful to have a 'display aggregates only' option,
both in the library's console reporter, and the python tooling,
This will be especially needed for the 'store separate iterations'.
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) in my code due to the lack of those i have found fixed in my code:
* `kIsIterationInvariant` - `* state.iterations()`
The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
The is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions
* format all tests as well, and mark blocks which change too much
Recently the library added a new ranged-for variant of the KeepRunning
loop that is much faster. For this reason it should be preferred in all
new code.
Because a library, its documentation, and its tests should all embody
the best practices of using the library, this patch changes all but a
few usages of KeepRunning() into for (auto _ : state).
The remaining usages in the tests and documentation persist only
to document and test behavior that is different between the two formulations.
Also note that because the range-for loop requires C++11, the KeepRunning
variant has not been deprecated at this time.
* Json reporter: passthrough fp, don't cast it to int; adjust tooling
Json output format is generally meant for further processing
using some automated tools. Thus, it makes sense not to
intentionally limit the precision of the values contained
in the report.
As it can be seen, FormatKV() for doubles, used %.2f format,
which was meant to preserve at least some of the precision.
However, before that function is ever called, the doubles
were already cast to the integer via RoundDouble()...
This is also the case for console reporter, where it makes
sense because the screen space is limited, and this reporter,
however the CSV reporter does output some( decimal digits.
Thus i can only conclude that the loss of the precision
was not really considered, so i have decided to adjust the
code of the json reporter to output the full fp precision.
There can be several reasons why that is the right thing
to do, the bigger the time_unit used, the greater the
precision loss, so i'd say any sort of further processing
(like e.g. tools/compare_bench.py does) is best done
on the values with most precision.
Also, that cast skewed the data away from zero, which
i think may or may not result in false- positives/negatives
in the output of tools/compare_bench.py
* Json reporter: FormatKV(double): address review note
* tools/gbench/report.py: skip benchmarks with different time units
While it may be useful to teach it to operate on the
measurements with different time units, which is now
possible since floats are stored, and not the integers,
but for now at least doing such a sanity-checking
is better than providing misinformation.
This is needed for examining the values of user counters (needed
for #348). It is also needed for checking the values of standard
benchmark results like items_processed or complexities (for example,
checking the standard deviation is needed for unit testing #357
as discussed in #362).
The tests are still missing a way to check actual validity of
numerical results; this will be done next. As they currently are,
the tests pass, but the problem detected with #378 is still
standing and the results with non-standard counters are wrong.