* The parameterized tests check both floating point and integral types. We might as well use types that avoid truncation warnings across the platforms
* static_cast version of how to avoid truncation warnings in basic_test
Co-authored-by: Staffan Tjernstrom <staffantj@users.noreply.github.com>
* Allow template arguments to be specifed directly on the BENCHMARK macro/
Use cases:
- more convenient (than having to use a separate BENCHMARK_TEMPLATE)
- feature parity with our internal library.
* fix tests
* updated docs
Use the benchmark's reported iteration count when estimating
iterations for the next repetition, rather than the requested
iteration count. When the benchmark uses KeepRunningBatch the actual
iteration count can be larger than the one the runner requested.
Prior to this fix the runner was underestimating the next iteration
count, sometimes significantly so. Consider the case of a benchmark
using a batch size of 1024. Prior to this change, the benchmark
runner would attempt iteration counts 1, 10, 100 and 1000, yet the
benchmark itself would do the same amount of work each time: a single
batch of 1024 iterations. The discrepancy could also contribute to
estimation errors once the benchmark time reached 10% of the target.
For example, if the very first batch of 1024 iterations reached 10% of
benchmark_min_min time, the runner would attempt to scale that to 100%
from a basis of one iteration rather than 1024.
This bug was particularly noticeable in benchmarks with large batch
sizes, especially when the benchmark also had slow set up or tear down
phases.
With this fix in place it is possible to use KeepRunningBatch to
achieve a kind of "minimum iteration count" feature by using a larger
fixed batch size. For example, a benchmark may build a map of 500K
elements and test a "find" operation. There is no point in running
"find" just 1, 10, 100, etc., times. The benchmark can now pick a
batch size of something like 10K, and the runner will arrive at the
final max iteration count with in noticeably fewer repetitions.
This is a shameless rip-off of https://github.com/google/benchmark/pull/646
I did promise to look into why that proposed PR was producing
so much worse assembly, and so i finally did.
The reason is - that diff changes `size_t` (unsigned) to `int64_t` (signed).
There is this nice little `assert`:
7a1c370283/include/benchmark/benchmark.h (L744)
It ensures that we didn't magically decide to advance our iterator
when we should have finished benchmarking.
When `cached_` was unsigned, the `assert` was `cached_ UGT 0`.
But we only ever get to that `assert` if `cached_ NE 0`,
and naturally if `cached_` is not `0`, then it is bigger than `0`,
so the `assert` is tautological, and gets folded away.
But now that `cached_` became signed, the assert became `cached_ SGT 0`.
And we still only know that `cached_ NE 0`, so the assert can't be
optimized out, or at least it doesn't currently.
Regardless of whether or not that is a bug in itself,
that particular diff would have regressed the normal 64-bit systems,
by halving the maximal iteration space (since we go from unsigned counter
to signed one, of the same bit-width), which seems like a bug.
And just so it happens, fixing *this* bug, fixes the other bug.
This produces fully (bit-by-bit) identical state_assembly_test.s
The filecheck change is actually needed regardless of this patch,
else this test does not pass for me even without this diff.
Due to ADL lookup performed on the begin and end functions
of `for (auto _ : State)`, std::iterator_traits may get
incidentally instantiated. This patch ensures the library
can tolerate that.
* Support State::KeepRunningBatch().
State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.
This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.
* Create better fast path in State::KeepRunningBatch()
* Replace int parameter with size_t to fix signed mismatch warnings
* Ensure benchmark State has been started even on error.
* Simplify KeepRunningBatch()
Recently the library added a new ranged-for variant of the KeepRunning
loop that is much faster. For this reason it should be preferred in all
new code.
Because a library, its documentation, and its tests should all embody
the best practices of using the library, this patch changes all but a
few usages of KeepRunning() into for (auto _ : state).
The remaining usages in the tests and documentation persist only
to document and test behavior that is different between the two formulations.
Also note that because the range-for loop requires C++11, the KeepRunning
variant has not been deprecated at this time.
* Add C++11 Ranged For loop alternative to KeepRunning
As pointed out by @astrelni and @dominichamon, the KeepRunning
loop requires a bunch of memory loads and stores every iterations,
which affects the measurements.
The main reason for these additional loads and stores is that the
State object is passed in by reference, making its contents externally
visible memory, and the compiler doesn't know it hasn't been changed
by non-visible code.
It's also possible the large size of the State struct is hindering
optimizations.
This patch allows the `State` object to be iterated over using
a range-based for loop. Example:
void BM_Foo(benchmark::State& state) {
for (auto _ : state) {
[...]
}
}
This formulation is much more efficient, because the variable counting
the loop index is stored in the iterator produced by `State::begin()`,
which itself is stored in function-local memory and therefore not accessible
by code outside of the function. Therefore the compiler knows the iterator
hasn't been changed every iteration.
This initial patch and idea was from Alex Strelnikov.
* Fix null pointer initialization in C++03
* Make Benchmark a single header library (but not header-only)
This patch refactors benchmark into a single header, to allow
for slightly easier usage.
The initial reason for the header split was to keep C++ library
components from being included by benchmark_api.h, making that
part of the library STL agnostic. However this has since changed
and there seems to be little reason to separate the reporters from
the rest of the library.
* Fix internal_macros.h
* Remove more references to macros.h
* Support multiple ranges in the benchmark
google-benchmark library allows to provide up to two ranges to the
benchmark method (range_x and range_y). However, in many cases it's not
sufficient. The patch introduces multi-range features, so user can easily
define multiple ranges by passing a vector of integers, and access values
through the method range(i).
* Remove redundant API
Functions State::range_x() and State::range_y() have been removed. They should
be replaced by State::range(0) and State::range(1).
Functions Benchmark::ArgPair() and Benchmark::RangePair() have been removed.
They should be replaced by Benchmark::Args() and Benchmark::Ranges().
This patch adopts a new internal structure for how timings are performed.
Currently every iteration of a benchmark checks to see if it has been running
for an appropriate amount of time. Checking the clock introduces noise into
the timings and this can cause inconsistent output from each benchmark.
Now every iteration of a benchmark only checks an iteration count to see if it
should stop running. The iteration count is determined before hand by testing
the benchmark on a series of increasing iteration counts until a suitable count
is found. This increases the amount of time it takes to run the actual benchmarks
but it also greatly increases the accuracy of the results.
This patch introduces some breaking changes. The notable breaking changes are:
1. Benchmarks run on multiple threads no generate a report per thread. Instead
only a single report is generated.
2. ::benchmark::UseRealTime() was removed and replaced with State::UseRealTime().