benchmark

mirror of https://github.com/google/benchmark.git synced 2024-12-02 20:53:53 +00:00

Author	SHA1	Message	Date
Samuel Panzer	296ec5693e	Support State::KeepRunningBatch(). (#521 ) * Support State::KeepRunningBatch(). State::KeepRunning() can take large amounts of time relative to quick operations (on the order of 1ns, depending on hardware). For such sensitive operations, it is recommended to run batches of repeated operations. This commit simplifies handling of total_iterations_. Rather than predecrementing such that total_iterations_ == 1 signals that KeepRunning() should exit, total_iterations_ == 0 now signals the intention for the benchmark to exit. * Create better fast path in State::KeepRunningBatch() * Replace int parameter with size_t to fix signed mismatch warnings * Ensure benchmark State has been started even on error. * Simplify KeepRunningBatch()	2018-02-09 21:57:04 -07:00
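The countdown described in this commit can be sketched with a toy class. This is a minimal illustration, not the library's `State` implementation: `BatchState`, `remaining_`, and `CountBatches` are hypothetical names, and letting the final partial batch still run is an assumption of the sketch.

```cpp
#include <cstddef>

// Hypothetical toy model of a batched countdown; not benchmark::State.
class BatchState {
 public:
  explicit BatchState(std::size_t max_iterations)
      : remaining_(max_iterations) {}

  // Claims `n` iterations at once (n must be > 0).
  // remaining_ == 0 is the signal that the loop should exit.
  bool KeepRunningBatch(std::size_t n) {
    if (remaining_ >= n) {  // fast path: a full batch is still available
      remaining_ -= n;
      return true;
    }
    if (remaining_ != 0) {  // last, partial batch: run it and then stop
      remaining_ = 0;
      return true;
    }
    return false;
  }

 private:
  std::size_t remaining_;
};

// Counts how many batches of size `n` run for a given iteration budget.
inline std::size_t CountBatches(std::size_t max_iterations, std::size_t n) {
  BatchState state(max_iterations);
  std::size_t batches = 0;
  while (state.KeepRunningBatch(n)) ++batches;
  return batches;
}
```

With a budget of 10 iterations and batches of 4, the sketch runs three batches, so the caller performs 12 operations in total.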
Tim	bc83262f9d	.vs/ and CmakeSettings.json to gitignore (#522 )	2018-02-03 22:04:36 -07:00
Dominic Hamon	df415adb2a	Some small clang-tidy fixes (#520 )	2018-01-29 08:38:47 -08:00
oskidan	4fe0206b65	Fixes compilation error caused by integer precision loss due to implicit (#518 ) conversion in sysinfo.cc	2018-01-19 09:17:01 -08:00
Dominic Hamon	9f5694ceb6	Wrap COMPILER macros. (#514 ) Some command line or build systems may already set these (eg, bazel) so make sure that takes priority. Fixes #513	2018-01-11 17:22:45 -08:00
Eric	e1c3a83b81	Merge pull request #509 from efcs/fix-gtest-install Prevent GTest and GMock from being installed with Google Benchmark.	2018-01-05 12:51:03 -07:00
Eric Fiselier	778b85a7a9	Prevent GTest and GMock from being installed with Google Benchmark. When users satisfy the GTest dependency by placing a googletest directory in the project, the targets from GTest and GMock incorrectly get installed alongside this library. We shouldn't be installing our test dependencies. This patch forces the options that control installation for googletest to OFF.	2018-01-05 11:04:22 -07:00
Winston Du	052421c823	Updated documentation. (#503 ) For people who get this library via CMake's AddExternalProject like me. Would like a long term tutorial from someone who really understands CMake on how to actually link an externalproject's dependencies to another added external project.	2018-01-04 17:13:34 -07:00
Dominic Hamon	e4ccad7c4a	Update README.md	2017-12-14 09:40:26 -08:00
Eric	7db02be244	Add support for GTest based unit tests. (#485 ) * Add support for GTest based unit tests. As Dominic and I have previously discussed, there is some need/desire to improve the testing situation in Google Benchmark. One step to fixing this problem is to make it easier to write unit tests by adding support for GTest, which is what this patch does. By default it looks for an installed version of GTest. However the user can specify -DBENCHMARK_BUILD_EXTERNAL_GTEST=ON to instead download, build, and use a copy of gtest from source. This is quite useful when Benchmark is being built in non-standard configurations, such as against libc++ or in 32 bit mode.	2017-12-13 16:26:47 -07:00
Eric Fiselier	de725e5a7c	Document new 'v2' branch meant for unstable development. This patch documents the newly added v2 branch, which will be used to stage, test, and receive feedback on upcoming features, most of which will be breaking changes which can't be directly applied to master.	2017-12-13 14:51:56 -07:00
Dominic Hamon	7f2d2cd5b9	fix xcode travis builds by skipping mkdir errors	2017-12-07 14:20:59 -08:00
Louis Dionne	5b2c08668c	Enforce using a semicolon after BENCHMARK_MAIN to remove compiler warnings (#495 )	2017-12-03 18:45:07 -07:00
Victor Costan	0bbaeeaf7a	Add GCC on OSX to list of Travis CI configurations. (#492 )	2017-11-30 15:21:32 -08:00
Victor Costan	95a1435b81	Fix compilation error with GCC on OSX (issue #490 ). (#491 )	2017-11-30 08:05:38 -08:00
Roman Lebedev	c45f01866b	CMake: implement LTO for clang. Fixes #478 (#487 ) * CMake: implement LTO for clang. Fixes #478 * LTO: add basic docs about required executables.	2017-11-29 12:48:43 -08:00
Kishan Kumar	eae42212ce	Added the installation guide for Ubuntu (#489 ) * Initial CLA Requirement * Added Installation steps to the Readme.md * Fixed error in running benchmark of Installation * Remove unwanted commands Removed the lengthy install procedure with suggested install mechanism	2017-11-29 09:36:19 -08:00
Roman Lebedev	ec5684ed75	Console reporter: properly account for the length of custom counter names (#484 ) Old output example: ``` Benchmark Time CPU Iterations CPUTime,s Pixels/s ThreadingFactor ------------------------------------------------------------------------------------------------------------------------------ 20170525_0036TEST.RAF/threads:8/real_time 45 ms 45 ms 16 0.718738 79.6277M/s 0.999978 2.41419GB/s 22.2613 items/s FileSize,MB=111.050781; MPix=57.231360 ``` New output example: ``` Benchmark Time CPU Iterations CPUTime,s Pixels/s ThreadingFactor ------------------------------------------------------------------------------------------------------------------------------ 20170525_0036TEST.RAF/threads:8/real_time 45 ms 45 ms 16 0.713575 80.1713M/s 0.999571 2.43067GB/s 22.4133 items/s FileSize,MB=111.050781; MPix=57.231360 ```	2017-11-27 09:01:01 -08:00
Eric Fiselier	2ec7399cf1	Improve BENCHMARK_UNREACHABLE() implementation. This patch primarily changes the BENCHMARK_UNREACHABLE() implementation under MSVC to use __assume(false) instead of being a NORETURN function, which ironically caused unreachable code warnings. Second, since the NOTHROW function attempt generated the warnings we meant to avoid, it has been replaced with a dummy null statement.	2017-11-26 13:58:24 -07:00
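A per-compiler "unreachable" macro along the lines this commit describes might look like the sketch below. `SKETCH_UNREACHABLE`, `CacheLevel`, and `CacheLevelName` are hypothetical names for illustration; the library's actual macro may differ in its conditions and spelling.

```cpp
// Hypothetical sketch; not the library's BENCHMARK_UNREACHABLE definition.
#if defined(__GNUC__)
// GCC and Clang: tell the optimizer this point is never reached.
#define SKETCH_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
// MSVC: __assume(false) avoids the warnings a noreturn function can cause.
#define SKETCH_UNREACHABLE() __assume(false)
#else
// Other compilers: fall back to a dummy null statement.
#define SKETCH_UNREACHABLE() ((void)0)
#endif

enum CacheLevel { kL1 = 1, kL2 = 2, kL3 = 3 };

// Every enumerator returns inside the switch, so the code after it is
// documented as unreachable.
inline const char* CacheLevelName(CacheLevel level) {
  switch (level) {
    case kL1: return "L1";
    case kL2: return "L2";
    case kL3: return "L3";
  }
  SKETCH_UNREACHABLE();
}
```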
Eric	11dc36822b	Improve CPU Cache info reporting -- Add Windows support. (#486 ) * Improve CPU Cache info reporting -- Add Windows support. This patch does a couple of things regarding CPU Cache reporting. First, it adds an implementation on Windows. Second it fixes the JSONReporter to correctly (and actually) output the CPU configuration information. And finally, third, it detects and reports the number of physical CPUs that share the same cache.	2017-11-26 13:33:01 -07:00
Eric	27e0b439cf	Refactor System information collection -- Add CPU Cache Info (#483 ) * Refactor System information collection. This patch refactors the system information collection, and in particular information about the target CPU. The motivation is to make it easier to access CPU information, and easier to add new information as need be. This patch additionally adds information about the cache sizes of the CPU. * Address review comments: Clean up integer types. This commit cleans up the integer types used in ValueUnion to follow the Google style guide. Additionally it adds a BENCHMARK_UNREACHABLE macro to assist in documenting/catching unreachable code paths. * Rename ValueUnion accessors.	2017-11-22 08:33:52 -08:00
Kamil Rytarowski	aad6a5fa76	Add NetBSD support (#482 ) Define BENCHMARK_OS_NETBSD for NetBSD. Add detection of cpuinfo_cycles_per_second and cpuinfo_num_cpus. This code shared detection of these properties with FreeBSD.	2017-11-17 08:46:08 -08:00
Steinar H. Gunderson	0c3ec998c4	Add a pkg-config file, for the benefit of projects not using CMake. (#480 )	2017-11-15 11:51:22 -08:00
Dominic Hamon	ed5764ea28	Add doc specifying the scope of the timing calculation Fixes #479	2017-11-13 09:20:12 -08:00
Roman Lebedev	5e66248b44	[Tools] A new, more versatile benchmark output compare tool (#474 ) * [Tools] A new, more versatile benchmark output compare tool Sometimes, there is more than one implementation of some functionality. And the obvious use-case is to benchmark them, which is better? Currently, there is no easy way to compare the benchmarking results in that case: The obvious solution is to have multiple binaries, each one containing/running one implementation. And each binary must use exactly the same benchmark family name, which is super bad, because now the binary name should contain all the info about benchmark family... What if i tell you that is not the solution? What if we could avoid producing one binary per benchmark family, with the same family name used in each binary, but instead could keep all the related families in one binary, with their proper names, AND still be able to compare them? There are three modes of operation: 1. Just compare two benchmarks, what `compare_bench.py` did: ``` $ ../tools/compare.py benchmarks ./a.out ./a.out RUNNING: ./a.out --benchmark_out=/tmp/tmprBT5nW Run on (8 X 4000 MHz CPU s) 2017-11-07 21:16:44 ------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------ BM_memcpy/8 36 ns 36 ns 19101577 211.669MB/s BM_memcpy/64 76 ns 76 ns 9412571 800.199MB/s BM_memcpy/512 84 ns 84 ns 8249070 5.64771GB/s BM_memcpy/1024 116 ns 116 ns 6181763 8.19505GB/s BM_memcpy/8192 643 ns 643 ns 1062855 11.8636GB/s BM_copy/8 222 ns 222 ns 3137987 34.3772MB/s BM_copy/64 1608 ns 1608 ns 432758 37.9501MB/s BM_copy/512 12589 ns 12589 ns 54806 38.7867MB/s BM_copy/1024 25169 ns 25169 ns 27713 38.8003MB/s BM_copy/8192 201165 ns 201112 ns 3486 38.8466MB/s RUNNING: ./a.out --benchmark_out=/tmp/tmpt1wwG_ Run on (8 X 4000 MHz CPU s) 2017-11-07 21:16:53 ------------------------------------------------------ Benchmark Time CPU Iterations 
------------------------------------------------------ BM_memcpy/8 36 ns 36 ns 19397903 211.255MB/s BM_memcpy/64 73 ns 73 ns 9691174 839.635MB/s BM_memcpy/512 85 ns 85 ns 8312329 5.60101GB/s BM_memcpy/1024 118 ns 118 ns 6438774 8.11608GB/s BM_memcpy/8192 656 ns 656 ns 1068644 11.6277GB/s BM_copy/8 223 ns 223 ns 3146977 34.2338MB/s BM_copy/64 1611 ns 1611 ns 435340 37.8751MB/s BM_copy/512 12622 ns 12622 ns 54818 38.6844MB/s BM_copy/1024 25257 ns 25239 ns 27779 38.6927MB/s BM_copy/8192 205013 ns 205010 ns 3479 38.108MB/s Comparing ./a.out to ./a.out Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------ BM_memcpy/8 +0.0020 +0.0020 36 36 36 36 BM_memcpy/64 -0.0468 -0.0470 76 73 76 73 BM_memcpy/512 +0.0081 +0.0083 84 85 84 85 BM_memcpy/1024 +0.0098 +0.0097 116 118 116 118 BM_memcpy/8192 +0.0200 +0.0203 643 656 643 656 BM_copy/8 +0.0046 +0.0042 222 223 222 223 BM_copy/64 +0.0020 +0.0020 1608 1611 1608 1611 BM_copy/512 +0.0027 +0.0026 12589 12622 12589 12622 BM_copy/1024 +0.0035 +0.0028 25169 25257 25169 25239 BM_copy/8192 +0.0191 +0.0194 201165 205013 201112 205010 ``` 2. 
Compare two different filters of one benchmark: (for simplicity, the benchmark is executed twice) ``` $ ../tools/compare.py filters ./a.out BM_memcpy BM_copy RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmpBWKk0k Run on (8 X 4000 MHz CPU s) 2017-11-07 21:37:28 ------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------ BM_memcpy/8 36 ns 36 ns 17891491 211.215MB/s BM_memcpy/64 74 ns 74 ns 9400999 825.646MB/s BM_memcpy/512 87 ns 87 ns 8027453 5.46126GB/s BM_memcpy/1024 111 ns 111 ns 6116853 8.5648GB/s BM_memcpy/8192 657 ns 656 ns 1064679 11.6247GB/s RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpAvWcOM Run on (8 X 4000 MHz CPU s) 2017-11-07 21:37:33 ---------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------- BM_copy/8 227 ns 227 ns 3038700 33.6264MB/s BM_copy/64 1640 ns 1640 ns 426893 37.2154MB/s BM_copy/512 12804 ns 12801 ns 55417 38.1444MB/s BM_copy/1024 25409 ns 25407 ns 27516 38.4365MB/s BM_copy/8192 202986 ns 202990 ns 3454 38.4871MB/s Comparing BM_memcpy to BM_copy (from ./a.out) Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------- [BM_memcpy vs. BM_copy]/8 +5.2829 +5.2812 36 227 36 227 [BM_memcpy vs. BM_copy]/64 +21.1719 +21.1856 74 1640 74 1640 [BM_memcpy vs. BM_copy]/512 +145.6487 +145.6097 87 12804 87 12801 [BM_memcpy vs. BM_copy]/1024 +227.1860 +227.1776 111 25409 111 25407 [BM_memcpy vs. BM_copy]/8192 +308.1664 +308.2898 657 202986 656 202990 ``` 3. 
Compare filter one from benchmark one to filter two from benchmark two: (for simplicity, the benchmark is executed twice) ``` $ ../tools/compare.py benchmarksfiltered ./a.out BM_memcpy ./a.out BM_copy RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmp_FvbYg Run on (8 X 4000 MHz CPU s) 2017-11-07 21:38:27 ------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------ BM_memcpy/8 37 ns 37 ns 18953482 204.118MB/s BM_memcpy/64 74 ns 74 ns 9206578 828.245MB/s BM_memcpy/512 91 ns 91 ns 8086195 5.25476GB/s BM_memcpy/1024 120 ns 120 ns 5804513 7.95662GB/s BM_memcpy/8192 664 ns 664 ns 1028363 11.4948GB/s RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpDfL5iE Run on (8 X 4000 MHz CPU s) 2017-11-07 21:38:32 ---------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------- BM_copy/8 230 ns 230 ns 2985909 33.1161MB/s BM_copy/64 1654 ns 1653 ns 419408 36.9137MB/s BM_copy/512 13122 ns 13120 ns 53403 37.2156MB/s BM_copy/1024 26679 ns 26666 ns 26575 36.6218MB/s BM_copy/8192 215068 ns 215053 ns 3221 36.3283MB/s Comparing BM_memcpy (from ./a.out) to BM_copy (from ./a.out) Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------- [BM_memcpy vs. BM_copy]/8 +5.1649 +5.1637 37 230 37 230 [BM_memcpy vs. BM_copy]/64 +21.4352 +21.4374 74 1654 74 1653 [BM_memcpy vs. BM_copy]/512 +143.6022 +143.5865 91 13122 91 13120 [BM_memcpy vs. BM_copy]/1024 +221.5903 +221.4790 120 26679 120 26666 [BM_memcpy vs. BM_copy]/8192 +322.9059 +323.0096 664 215068 664 215053 ``` * [Docs] Document tools/compare.py * [docs] Document how the change is calculated	2017-11-07 13:35:25 -08:00
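The Time and CPU columns in the comparison tables above read as relative changes between the old and new measurements. A minimal sketch of that calculation follows, assuming the formula is (new - old) / |old|; `RelativeChange` is a hypothetical name and the tool's handling of a zero baseline is not covered here.

```cpp
#include <cmath>

// Relative change from an old measurement to a new one.
// Assumes old_value is nonzero; a zero baseline is outside this sketch.
inline double RelativeChange(double old_value, double new_value) {
  return (new_value - old_value) / std::fabs(old_value);
}
```

For the `[BM_memcpy vs. BM_copy]/8` row, the rounded times 36 and 227 give about +5.31, close to the printed +5.2829; the difference would be expected if the tool works from unrounded times.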
Dominic Hamon	90aa8665b5	Reorder inline to avoid warning on MSVC (#469 ) Fixes #467	2017-11-07 10:33:07 -08:00
Dominic Hamon	f4009ef8e3	Fix #476 . Explicit coercion of size_t to boolean (#477 )	2017-11-07 10:30:17 -08:00
Eric	72a4581caf	Fix #382 - MinGW often reports negative CPU times. (#475 ) When stopping a timer, the start time is subtracted from the current time. However, when the times are identical, or sufficiently close together, the subtraction can result in a negative number. For some reason MinGW is the only platform where this problem manifests. I suspect it's due to MinGW-specific behavior in either the CPU timing code, floating point model, or printf formatting. Either way, the fix for MinGW should be correct across all platforms.	2017-11-07 09:44:39 -08:00
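One platform-independent way to guard against a negative difference is to clamp it at zero. The sketch below assumes that approach; `ElapsedNonNegative` is a hypothetical helper, not a function from the library.

```cpp
#include <algorithm>

// Elapsed time between two clock readings, clamped so that rounding error
// or a clock reading that appears to go backwards never yields a negative
// duration.
inline double ElapsedNonNegative(double start_seconds, double stop_seconds) {
  return std::max(stop_seconds - start_seconds, 0.0);
}
```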
Dominic Hamon	f65c6d9a2c	Remove deprecated headers (#473 )	2017-11-06 08:53:23 -08:00
Dominic Hamon	1e52560157	Add releasing doc (#472 )	2017-11-03 12:45:16 -07:00
Roman Lebedev	336bb8db98	Update AUTHORS/CONTRIBUTORS (#471 ) As requested, in a pr form :)	2017-11-03 10:00:29 -07:00
Stefan Sauer	4463a60efe	Mention how to disable CPU frequency scaling while running the benchmark. (#466 ) Describe how to use the cpupower command to disable CPU frequency scaling. Document this, since there are other ways that don't seem to have the same effect. See #325	2017-11-02 08:34:06 -07:00
Leo Koppel	fa341e51cb	Improve BM_SetInsert example (#465 ) * Fix BM_SetInsert example Move declaration of `std::set<int> data` outside the timing loop, so that the destructor is not timed. * Speed up BM_SetInsert test Since the time taken to ConstructRandomSet() is so large compared to the time to insert one element, but only the latter is used to determine number of iterations, this benchmark now takes an extremely long time to run in benchmark_test. Speed it up two ways: - Increase the Ranges() parameters - Cache ConstructRandomSet() result (it's not random anyway), and do only O(N) copy every iteration * Fix same issue in BM_MapLookup test * Make BM_SetInsert test consistent with README - Use the same Ranges everywhere, but increase the 2nd range - Change order of Args() calls in README to more closely match the result of Ranges - Don't cache ConstructRandomSet, since it doesn't make sense in README - Get a smaller optimization inside it, by giving a hint to insert()	2017-10-31 11:00:39 -07:00
Yangqing Jia	491360b833	Add option to install benchmark (#463 ) * Add option to install benchmark * Change to BENCHMARK_ENABLE_INSTALL per @dominichamon	2017-10-20 13:49:37 -07:00
Eric	25acf220a4	Refactor most usages of KeepRunning to use the preferred ranged-for. (#459 ) Recently the library added a new ranged-for variant of the KeepRunning loop that is much faster. For this reason it should be preferred in all new code. Because a library, its documentation, and its tests should all embody the best practices of using the library, this patch changes all but a few usages of KeepRunning() into for (auto _ : state). The remaining usages in the tests and documentation persist only to document and test behavior that is different between the two formulations. Also note that because the range-for loop requires C++11, the KeepRunning variant has not been deprecated at this time.	2017-10-17 12:17:02 -06:00
Eric Fiselier	22fd1a556e	Fix and document SkipWithError(...) using ranged-for loop.	2017-10-17 10:24:13 -06:00
Eric	a37fc0c48a	Improve KeepRunning loop performance to be similar to the range-based for. (#460 ) This patch improves the performance of the KeepRunning loop in two ways: (A) it removes the dependency on the max_iterations variable, preventing it from being loaded every iteration. (B) it loops to zero, instead of to an upper bound. This allows a single decrement instruction to be used instead of an arithmetic op followed by a comparison.	2017-10-17 08:40:44 -07:00
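The two loop shapes contrasted in this commit can be sketched side by side. These toy classes are illustrative only: `CountUpLoop`, `CountDownLoop`, and `RunToCompletion` are hypothetical names, and the sketch shows the control flow rather than the generated instructions.

```cpp
#include <cstddef>

// Counts up to an upper bound: each step reads both the counter and the bound.
class CountUpLoop {
 public:
  explicit CountUpLoop(std::size_t max) : current_(0), max_(max) {}
  bool KeepRunning() { return current_++ < max_; }

 private:
  std::size_t current_;
  std::size_t max_;
};

// Counts down to zero: each step only tests and decrements one counter.
class CountDownLoop {
 public:
  explicit CountDownLoop(std::size_t max) : remaining_(max) {}
  bool KeepRunning() {
    if (remaining_ == 0) return false;
    --remaining_;
    return true;
  }

 private:
  std::size_t remaining_;
};

// Drives either loop until it stops and reports how many iterations ran.
template <typename Loop>
std::size_t RunToCompletion(Loop loop) {
  std::size_t iterations = 0;
  while (loop.KeepRunning()) ++iterations;
  return iterations;
}
```

Both formulations run the same number of iterations; only the bookkeeping per step differs.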
Fred Tingaud	2fc2ea0e45	Correct typo in sample code for range-based for loop. (#458 )	2017-10-16 09:17:17 -07:00
Raúl Marín	cacd321808	Avoid implicit float to double conversion (#457 ) Triggered by -Werror=double-promotion	2017-10-13 09:17:02 -07:00
Eric	0526755944	Add C++11 Ranged For loop alternative to KeepRunning (#454 ) * Add C++11 Ranged For loop alternative to KeepRunning As pointed out by @astrelni and @dominichamon, the KeepRunning loop requires a bunch of memory loads and stores every iteration, which affects the measurements. The main reason for these additional loads and stores is that the State object is passed in by reference, making its contents externally visible memory, and the compiler doesn't know it hasn't been changed by non-visible code. It's also possible the large size of the State struct is hindering optimizations. This patch allows the `State` object to be iterated over using a range-based for loop. Example: void BM_Foo(benchmark::State& state) { for (auto _ : state) { [...] } } This formulation is much more efficient, because the variable counting the loop index is stored in the iterator produced by `State::begin()`, which itself is stored in function-local memory and therefore not accessible by code outside of the function. Therefore the compiler knows the iterator hasn't been changed every iteration. This initial patch and idea was from Alex Strelnikov. * Fix null pointer initialization in C++03	2017-10-10 08:56:42 -07:00
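The idea of keeping the loop counter inside a function-local iterator can be sketched with a toy type. This is not the library's `State`: `RangeState`, its nested `Iterator` and `Value`, and `CountIterations` are hypothetical names chosen for the illustration.

```cpp
#include <cstddef>

// Hypothetical toy type that supports `for (auto _ : state)`.
class RangeState {
 public:
  struct Value {};  // what `_` binds to; it carries no data

  class Iterator {
   public:
    explicit Iterator(std::size_t remaining) : remaining_(remaining) {}
    Value operator*() const { return Value(); }
    Iterator& operator++() {
      --remaining_;
      return *this;
    }
    // The end iterator is only a sentinel; the loop stops when the local
    // counter reaches zero.
    bool operator!=(const Iterator&) const { return remaining_ != 0; }

   private:
    std::size_t remaining_;  // lives in the caller's local iterator object
  };

  explicit RangeState(std::size_t max_iterations)
      : max_iterations_(max_iterations) {}

  Iterator begin() const { return Iterator(max_iterations_); }
  Iterator end() const { return Iterator(0); }

 private:
  std::size_t max_iterations_;
};

// Runs the range-based for loop and reports how many iterations executed.
inline std::size_t CountIterations(std::size_t n) {
  RangeState state(n);
  std::size_t count = 0;
  for (auto _ : state) {
    (void)_;  // silence unused-variable warnings
    ++count;
  }
  return count;
}
```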
mwinterb	f3cd636f18	Always use inline asm DoNotOptimize with clang. (#452 ) * Always use inline asm DoNotOptimize with clang. clang-cl masquerades as MSVC but not GCC, so it was using the MSVC-compatible definitions of DoNotOptimize and ClobberMemory. Presumably, it's better in general to use the targeted assembly for this functionality (the codegen is different), but the specific issue is that clang-cl deprecates the usage of _ReadWriteBarrier, and this gets rid of that warning. * triggering another AppVeyor run	2017-10-10 00:19:01 +02:00
Anton Lashkov	819adb4cd1	Add macros for create benchmark with templated fixture (#451 ) * Add macros for create benchmark with templated fixture * Add info about templated fixtures to README.md * Add tests for templated fixtures	2017-10-09 21:10:37 +02:00
Dominic Hamon	2409cb2eb1	Minor move of code to clean up namespace spaghetti a bit	2017-10-09 12:01:30 -07:00
Dominic Hamon	a96ff121b3	Alphabets are hard. AUTHORS version. #448	2017-09-27 11:53:16 -07:00
Dominic Hamon	5d47e9878f	Alphabets are hard. CONTRIBUTORS version. #448	2017-09-27 11:52:47 -07:00
Dominic Hamon	8792dff1c9	Remove myself from AUTHORS Covered by Google Inc here and i'm in CONTRIBUTORS	2017-09-27 20:01:49 +02:00
Dominic Hamon	359120be78	Order CONTRIBUTORS Fixes #448	2017-09-27 20:01:10 +02:00
Dominic Hamon	84a54ae9f4	Organize AUTHORS Part of #448	2017-09-27 20:00:12 +02:00
Eric	6d8339dd97	Fix #444 - Use BENCHMARK_HAS_CXX11 over __cplusplus. (#446 ) * Fix #444 - Use BENCHMARK_HAS_CXX11 over __cplusplus. MSVC incorrectly defines __cplusplus to report C++03, despite the compiler actually providing C++11 or greater. Therefore we have to detect C++11 differently for MSVC. This patch uses `_MSVC_LANG` which has been defined since Visual Studio 2015 Update 3; which should be sufficient for detecting C++11. Secondly this patch changes over most usages of __cplusplus >= 201103L to check BENCHMARK_HAS_CXX11 instead. * remove redundant comment	2017-09-14 15:50:33 -06:00
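A detection macro of the kind this commit describes could be written roughly as follows. `SKETCH_HAS_CXX11` and `HasCxx11` are hypothetical names; the exact condition used by the library is an assumption here.

```cpp
// Hypothetical sketch of a C++11 feature-test macro.
// MSVC may report an old value in __cplusplus, so _MSVC_LANG is also checked
// when it is defined.
#if __cplusplus >= 201103L || (defined(_MSVC_LANG) && _MSVC_LANG >= 201103L)
#define SKETCH_HAS_CXX11 1
#endif

// Reports at run time what the preprocessor check decided.
inline bool HasCxx11() {
#ifdef SKETCH_HAS_CXX11
  return true;
#else
  return false;
#endif
}
```

Code that previously tested `__cplusplus >= 201103L` directly would test the single macro instead, so the MSVC special case lives in one place.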
Disconnect3d	2a05f248be	Improve README's basic usage example (#433 )	2017-09-14 09:31:35 +02:00

820 commits