Commit graph

820 commits

Author SHA1 Message Date
Samuel Panzer 296ec5693e Support State::KeepRunningBatch(). (#521)
* Support State::KeepRunningBatch().

State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.

This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.

* Create better fast path in State::KeepRunningBatch()

* Replace int parameter with size_t to fix signed mismatch warnings

* Ensure benchmark State has been started even on error.

* Simplify KeepRunningBatch()
2018-02-09 21:57:04 -07:00
Tim bc83262f9d .vs/ and CmakeSettings.json to gitignore (#522) 2018-02-03 22:04:36 -07:00
Dominic Hamon df415adb2a
Some small clang-tidy fixes (#520) 2018-01-29 08:38:47 -08:00
oskidan 4fe0206b65 Fixes compilation error caused by integer precision loss due to implicit (#518)
conversion in sysinfo.cc
2018-01-19 09:17:01 -08:00
Dominic Hamon 9f5694ceb6
Wrap COMPILER macros. (#514)
Some command line or build systems may already set these (eg, bazel) so
make sure that takes priority.

Fixes #513
2018-01-11 17:22:45 -08:00
Eric e1c3a83b81
Merge pull request #509 from efcs/fix-gtest-install
Prevent GTest and GMock from being installed with Google Benchmark.
2018-01-05 12:51:03 -07:00
Eric Fiselier 778b85a7a9 Prevent GTest and GMock from being installed with Google Benchmark.
When users satisfy the GTest dependancy by placing a googletest
directory in the project, the targets from GTest and GMock incorrectly
get installed along side this library. We shouldn't be installing
our test dependancies.

This patch forces the options that control installation for googletest
to OFF.
2018-01-05 11:04:22 -07:00
Winston Du 052421c823 Updated documentation. (#503)
For people who get this library via CMake's AddExternalProject like me.
Would like a long term tutorial from someone who really understands CMake on how to actually link an externalproject's dependencies to another added external project.
2018-01-04 17:13:34 -07:00
Dominic Hamon e4ccad7c4a
Update README.md 2017-12-14 09:40:26 -08:00
Eric 7db02be244
Add support for GTest based unit tests. (#485)
* Add support for GTest based unit tests.

As Dominic and I have previously discussed, there is some
need/desire to improve the testing situation in Google Benchmark.

One step to fixing this problem is to make it easier to write
unit tests by adding support for GTest, which is what this patch does.

By default it looks for an installed version of GTest. However the
user can specify -DBENCHMARK_BUILD_EXTERNAL_GTEST=ON to instead
download, build, and use copy of gtest from source. This is
quite useful when Benchmark is being built in non-standard configurations,
such as against libc++ or in 32 bit mode.
2017-12-13 16:26:47 -07:00
Eric Fiselier de725e5a7c Document new 'v2' branch meant for unstable development.
This patch documents the newly added v2 branch, which will
be used to stage, test, and receive feedback on upcoming
features, most of which will be breaking changes which can't
be directly applied to master.
2017-12-13 14:51:56 -07:00
Dominic Hamon 7f2d2cd5b9 fix xcode travis builds by skipping mkdir errors 2017-12-07 14:20:59 -08:00
Louis Dionne 5b2c08668c Enforce using a semicolon after BENCHMARK_MAIN to remove compiler warnings (#495) 2017-12-03 18:45:07 -07:00
Victor Costan 0bbaeeaf7a Add GCC on OSX to list of Travis CI configurations. (#492) 2017-11-30 15:21:32 -08:00
Victor Costan 95a1435b81 Fix compilation error with GCC on OSX (issue #490). (#491) 2017-11-30 08:05:38 -08:00
Roman Lebedev c45f01866b CMake: implement LTO for clang. Fixes #478 (#487)
* CMake: implement LTO for clang. Fixes #478

* LTO: add basic docs about required executables.
2017-11-29 12:48:43 -08:00
Kishan Kumar eae42212ce Added the installation guide for Ubuntu (#489)
* Initial CLA Requirement

* Added Installation steps to the Readme.md

* Fixed error in running benchmark of Installation

* Remove unwanted commands

Removed the lengthy install procedure with suggested install mechanism
2017-11-29 09:36:19 -08:00
Roman Lebedev ec5684ed75 Console reporter: properly account for the lenght of custom counter names (#484)
Old output example:
```
Benchmark                                                 Time           CPU Iterations  CPUTime,s   Pixels/s ThreadingFactor
------------------------------------------------------------------------------------------------------------------------------
20170525_0036TEST.RAF/threads:8/real_time                45 ms         45 ms         16   0.718738 79.6277M/s   0.999978   2.41419GB/s    22.2613 items/s FileSize,MB=111.050781; MPix=57.231360
```

New output example:
```
Benchmark                                                 Time           CPU Iterations  CPUTime,s   Pixels/s ThreadingFactor
------------------------------------------------------------------------------------------------------------------------------
20170525_0036TEST.RAF/threads:8/real_time                45 ms         45 ms         16   0.713575 80.1713M/s        0.999571   2.43067GB/s    22.4133 items/s FileSize,MB=111.050781; MPix=57.231360
```
2017-11-27 09:01:01 -08:00
Eric Fiselier 2ec7399cf1 Improve BENCHMARK_UNREACHABLE() implementation.
This patch primarily changes the BENCHMARK_UNREACHABLE()
implementation under MSVC to use __assume(false) instead
of being a NORETURN function, which ironically caused
unreachable code warnings.

Second, since the NOTHROW function attempt generated the
warnings we meant to avoid, it has been replaced with a dummy
null statement.
2017-11-26 13:58:24 -07:00
Eric 11dc36822b
Improve CPU Cache info reporting -- Add Windows support. (#486)
* Improve CPU Cache info reporting -- Add Windows support.

This patch does a couple of thing regarding CPU Cache reporting.

First, it adds an implementation on Windows. Second it fixes
the JSONReporter to correctly (and actually) output the CPU
configuration information.

And finally, third, it detects and reports the number of
physical CPU's that share the same cache.
2017-11-26 13:33:01 -07:00
Eric 27e0b439cf Refactor System information collection -- Add CPU Cache Info (#483)
* Refactor System information collection.

This patch refactors the system information collection,
and in particular information about the target CPU. The
motivation is to make it easier to access CPU information,
and easier to add new information as need be.

This patch additionally adds information about the cache
sizes of the CPU.

* Address review comments: Clean up integer types.

This commit cleans up the integer types used in ValueUnion to
follow the Google style guide.

Additionally it adds a BENCHMARK_UNREACHABLE macro to assist
in documenting/catching unreachable code paths.

* Rename ValueUnion accessors.
2017-11-22 08:33:52 -08:00
Kamil Rytarowski aad6a5fa76 Add NetBSD support (#482)
Define BENCHMARK_OS_NETBSD for NetBSD.

Add detection of cpuinfo_cycles_per_second and cpuinfo_num_cpus.
This code shared detection of these properties with FreeBSD.
2017-11-17 08:46:08 -08:00
Steinar H. Gunderson 0c3ec998c4 Add a pkg-config file, for the benefit of projects not using CMake. (#480) 2017-11-15 11:51:22 -08:00
Dominic Hamon ed5764ea28
Add doc specifying the scope of the timing calculation
Fixes #479
2017-11-13 09:20:12 -08:00
Roman Lebedev 5e66248b44 [Tools] A new, more versatile benchmark output compare tool (#474)
* [Tools] A new, more versatile benchmark output compare tool

Sometimes, there is more than one implementation of some functionality.
And the obvious use-case is to benchmark them, which is better?

Currently, there is no easy way to compare the benchmarking results
in that case:
    The obvious solution is to have multiple binaries, each one
containing/running one implementation. And each binary must use
exactly the same benchmark family name, which is super bad,
because now the binary name should contain all the info about
benchmark family...

What if i tell you that is not the solution?
What if we could avoid producing one binary per benchmark family,
with the same family name used in each binary,
but instead could keep all the related families in one binary,
with their proper names, AND still be able to compare them?

There are three modes of operation:
1. Just compare two benchmarks, what `compare_bench.py` did:
```
$ ../tools/compare.py benchmarks ./a.out ./a.out
RUNNING: ./a.out --benchmark_out=/tmp/tmprBT5nW
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:44
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   19101577   211.669MB/s
BM_memcpy/64           76 ns         76 ns    9412571   800.199MB/s
BM_memcpy/512          84 ns         84 ns    8249070   5.64771GB/s
BM_memcpy/1024        116 ns        116 ns    6181763   8.19505GB/s
BM_memcpy/8192        643 ns        643 ns    1062855   11.8636GB/s
BM_copy/8             222 ns        222 ns    3137987   34.3772MB/s
BM_copy/64           1608 ns       1608 ns     432758   37.9501MB/s
BM_copy/512         12589 ns      12589 ns      54806   38.7867MB/s
BM_copy/1024        25169 ns      25169 ns      27713   38.8003MB/s
BM_copy/8192       201165 ns     201112 ns       3486   38.8466MB/s
RUNNING: ./a.out --benchmark_out=/tmp/tmpt1wwG_
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:53
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   19397903   211.255MB/s
BM_memcpy/64           73 ns         73 ns    9691174   839.635MB/s
BM_memcpy/512          85 ns         85 ns    8312329   5.60101GB/s
BM_memcpy/1024        118 ns        118 ns    6438774   8.11608GB/s
BM_memcpy/8192        656 ns        656 ns    1068644   11.6277GB/s
BM_copy/8             223 ns        223 ns    3146977   34.2338MB/s
BM_copy/64           1611 ns       1611 ns     435340   37.8751MB/s
BM_copy/512         12622 ns      12622 ns      54818   38.6844MB/s
BM_copy/1024        25257 ns      25239 ns      27779   38.6927MB/s
BM_copy/8192       205013 ns     205010 ns       3479    38.108MB/s
Comparing ./a.out to ./a.out
Benchmark                 Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------
BM_memcpy/8            +0.0020         +0.0020            36            36            36            36
BM_memcpy/64           -0.0468         -0.0470            76            73            76            73
BM_memcpy/512          +0.0081         +0.0083            84            85            84            85
BM_memcpy/1024         +0.0098         +0.0097           116           118           116           118
BM_memcpy/8192         +0.0200         +0.0203           643           656           643           656
BM_copy/8              +0.0046         +0.0042           222           223           222           223
BM_copy/64             +0.0020         +0.0020          1608          1611          1608          1611
BM_copy/512            +0.0027         +0.0026         12589         12622         12589         12622
BM_copy/1024           +0.0035         +0.0028         25169         25257         25169         25239
BM_copy/8192           +0.0191         +0.0194        201165        205013        201112        205010
```

2. Compare two different filters of one benchmark:
(for simplicity, the benchmark is executed twice)
```
$ ../tools/compare.py filters ./a.out BM_memcpy BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmpBWKk0k
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:28
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   17891491   211.215MB/s
BM_memcpy/64           74 ns         74 ns    9400999   825.646MB/s
BM_memcpy/512          87 ns         87 ns    8027453   5.46126GB/s
BM_memcpy/1024        111 ns        111 ns    6116853    8.5648GB/s
BM_memcpy/8192        657 ns        656 ns    1064679   11.6247GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpAvWcOM
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:33
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_copy/8           227 ns        227 ns    3038700   33.6264MB/s
BM_copy/64         1640 ns       1640 ns     426893   37.2154MB/s
BM_copy/512       12804 ns      12801 ns      55417   38.1444MB/s
BM_copy/1024      25409 ns      25407 ns      27516   38.4365MB/s
BM_copy/8192     202986 ns     202990 ns       3454   38.4871MB/s
Comparing BM_memcpy to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.2829         +5.2812            36           227            36           227
[BM_memcpy vs. BM_copy]/64          +21.1719        +21.1856            74          1640            74          1640
[BM_memcpy vs. BM_copy]/512        +145.6487       +145.6097            87         12804            87         12801
[BM_memcpy vs. BM_copy]/1024       +227.1860       +227.1776           111         25409           111         25407
[BM_memcpy vs. BM_copy]/8192       +308.1664       +308.2898           657        202986           656        202990
```

3. Compare filter one from benchmark one to filter two from benchmark two:
(for simplicity, the benchmark is executed twice)
```
$ ../tools/compare.py benchmarksfiltered ./a.out BM_memcpy ./a.out BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmp_FvbYg
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:27
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            37 ns         37 ns   18953482   204.118MB/s
BM_memcpy/64           74 ns         74 ns    9206578   828.245MB/s
BM_memcpy/512          91 ns         91 ns    8086195   5.25476GB/s
BM_memcpy/1024        120 ns        120 ns    5804513   7.95662GB/s
BM_memcpy/8192        664 ns        664 ns    1028363   11.4948GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpDfL5iE
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:32
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_copy/8           230 ns        230 ns    2985909   33.1161MB/s
BM_copy/64         1654 ns       1653 ns     419408   36.9137MB/s
BM_copy/512       13122 ns      13120 ns      53403   37.2156MB/s
BM_copy/1024      26679 ns      26666 ns      26575   36.6218MB/s
BM_copy/8192     215068 ns     215053 ns       3221   36.3283MB/s
Comparing BM_memcpy (from ./a.out) to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.1649         +5.1637            37           230            37           230
[BM_memcpy vs. BM_copy]/64          +21.4352        +21.4374            74          1654            74          1653
[BM_memcpy vs. BM_copy]/512        +143.6022       +143.5865            91         13122            91         13120
[BM_memcpy vs. BM_copy]/1024       +221.5903       +221.4790           120         26679           120         26666
[BM_memcpy vs. BM_copy]/8192       +322.9059       +323.0096           664        215068           664        215053
```

* [Docs] Document tools/compare.py

* [docs] Document how the change is calculated
2017-11-07 13:35:25 -08:00
Dominic Hamon 90aa8665b5
Reorder inline to avoid warning on MSVC (#469)
Fixes #467
2017-11-07 10:33:07 -08:00
Dominic Hamon f4009ef8e3
Fix #476. Explicit coersion of size_t to boolean (#477) 2017-11-07 10:30:17 -08:00
Eric 72a4581caf Fix #382 - MinGW often reports negative CPU times. (#475)
When stopping a timer, the current time is subtracted
from the start time. However, when the times are identical,
or sufficiently close together, the subtraction can result
in a negative number.

For some reason MinGW is the only platform where this problem
manifests. I suspect it's due to MinGW specific behavior in either
the CPU timing code, floating point model, or printf formatting.

Either way, the fix for MinGW should be correct across all platforms.
2017-11-07 09:44:39 -08:00
Dominic Hamon f65c6d9a2c
Remove deprecated headers (#473) 2017-11-06 08:53:23 -08:00
Dominic Hamon 1e52560157
Add releasing doc (#472) 2017-11-03 12:45:16 -07:00
Roman Lebedev 336bb8db98 Update AUTHORS/CONTRIBUTORS (#471)
As requested, in a pr form :)
2017-11-03 10:00:29 -07:00
Stefan Sauer 4463a60efe Mention how to disable CPU frequency scaling while running the benchmark. (#466)
Describe how to use the cpupower command to disable CPU frequency scaling.
Document this, since there are other ways that don't see to have the same
effect. See #325
2017-11-02 08:34:06 -07:00
Leo Koppel fa341e51cb Improve BM_SetInsert example (#465)
* Fix BM_SetInsert example

Move declaration of `std::set<int> data` outside the timing loop, so that the
destructor is not timed.

* Speed up BM_SetInsert test

Since the time taken to ConstructRandomSet() is so large compared to the time
to insert one element, but only the latter is used to determine number of
iterations, this benchmark now takes an extremely long time to run in
benchmark_test.

Speed it up two ways:
  - Increase the Ranges() parameters
  - Cache ConstructRandomSet() result (it's not random anyway), and do only
    O(N) copy every iteration

* Fix same issue in BM_MapLookup test

* Make BM_SetInsert test consistent with README

- Use the same Ranges everywhere, but increase the 2nd range
- Change order of Args() calls in README to more closely match the result of Ranges
- Don't cache ConstructRandomSet, since it doesn't make sense in README
- Get a smaller optimization inside it, by givint a hint to insert()
2017-10-31 11:00:39 -07:00
Yangqing Jia 491360b833 Add option to install benchmark (#463)
* Add option to install benchmark

* Change to BENCHMARK_ENABLE_INSTALL per @dominichamon
2017-10-20 13:49:37 -07:00
Eric 25acf220a4 Refactor most usages of KeepRunning to use the perfered ranged-for. (#459)
Recently the library added a new ranged-for variant of the KeepRunning
loop that is much faster. For this reason it should be preferred in all
new code.

Because a library, its documentation, and its tests should all embody
the best practices of using the library, this patch changes all but a
few usages of KeepRunning() into for (auto _ : state).

The remaining usages in the tests and documentation persist only
to document and test behavior that is different between the two formulations.

Also note that because the range-for loop requires C++11, the KeepRunning
variant has not been deprecated at this time.
2017-10-17 12:17:02 -06:00
Eric Fiselier 22fd1a556e Fix and document SkipWithError(...) using ranged-for loop. 2017-10-17 10:24:13 -06:00
Eric a37fc0c48a Improve KeepRunning loop performance to be similar to the range-based for. (#460)
This patch improves the performance of the KeepRunning loop in two ways:

(A) it removes the dependency on the max_iterations variable, preventing
it from being loaded every iteration.

(B) it loops to zero, instead of to an upper bound. This allows a single
decrement instruction to be used instead of a arithmetic op followed by a
comparison.
2017-10-17 08:40:44 -07:00
Fred Tingaud 2fc2ea0e45 Correct typo in sample code for range-based for loop. (#458) 2017-10-16 09:17:17 -07:00
Raúl Marín cacd321808 Avoid implicit float to double conversion (#457)
Triggered by -Werror=double-promotion
2017-10-13 09:17:02 -07:00
Eric 0526755944 Add C++11 Ranged For loop alternative to KeepRunning (#454)
* Add C++11 Ranged For loop alternative to KeepRunning

As pointed out by @astrelni and @dominichamon, the KeepRunning
loop requires a bunch of memory loads and stores every iterations,
which affects the measurements.

The main reason for these additional loads and stores is that the
State object is passed in by reference, making its contents externally
visible memory, and the compiler doesn't know it hasn't been changed
by non-visible code.

It's also possible the large size of the State struct is hindering
optimizations.

This patch allows the `State` object to be iterated over using
a range-based for loop. Example:

void BM_Foo(benchmark::State& state) {
	for (auto _ : state) {
		[...]
	}
}

This formulation is much more efficient, because the variable counting
the loop index is stored in the iterator produced by `State::begin()`,
which itself is stored in function-local memory and therefore not accessible
by code outside of the function. Therefore the compiler knows the iterator
hasn't been changed every iteration.

This initial patch and idea was from Alex Strelnikov.

* Fix null pointer initialization in C++03
2017-10-10 08:56:42 -07:00
mwinterb f3cd636f18 Always use inline asm DoNotOptimize with clang. (#452)
* Always use inline asm DoNotOptimize with clang.

clang-cl masquerades as MSVC but not GCC, so it was using the
MSVC-compatible definitions of DoNotOptimize and ClobberMemory.
Presumably, it's better in general to use the targeted assembly for
this functionality (the codegen is different), but the specific issue
is that clang-cl deprecates the usage of _ReadWriteBarrier, and this
gets rid of that warning.

* triggering another AppVeyor run
2017-10-10 00:19:01 +02:00
Anton Lashkov 819adb4cd1 Add macros for create benchmark with templated fixture (#451)
* Add macros for create benchmark with templated fixture

* Add info about templated fixtures to README.md

* Add tests for templated fixtures
2017-10-09 21:10:37 +02:00
Dominic Hamon 2409cb2eb1 Minor move of code to cleanup up namespace spaghetti a bit 2017-10-09 12:01:30 -07:00
Dominic Hamon a96ff121b3 Alphabets are hard. AUTHORS version.
#448
2017-09-27 11:53:16 -07:00
Dominic Hamon 5d47e9878f Alphabets are hard. CONTRIBUTORS version.
#448
2017-09-27 11:52:47 -07:00
Dominic Hamon 8792dff1c9 Remove myself from AUTHORS
Covered by Google Inc here and i'm in CONTRIBUTORS
2017-09-27 20:01:49 +02:00
Dominic Hamon 359120be78 Order CONTRIBUTORS
Fixes #448
2017-09-27 20:01:10 +02:00
Dominic Hamon 84a54ae9f4 Organize AUTHORS
Part of #448
2017-09-27 20:00:12 +02:00
Eric 6d8339dd97 Fix #444 - Use BENCHMARK_HAS_CXX11 over __cplusplus. (#446)
* Fix #444 - Use BENCHMARK_HAS_CXX11 over __cplusplus.

MSVC incorrectly defines __cplusplus to report C++03, despite the compiler
actually providing C++11 or greater. Therefore we have to detect C++11 differently
for MSVC. This patch uses `_MSVC_LANG` which has been defined since
Visual Studio 2015 Update 3; which should be sufficient for detecting C++11.

Secondly this patch changes over most usages of __cplusplus >= 201103L to
check BENCHMARK_HAS_CXX11 instead.

* remove redunant comment
2017-09-14 15:50:33 -06:00
Disconnect3d 2a05f248be Improve README's basic usage example (#433) 2017-09-14 09:31:35 +02:00