* Remove redundant default which causes failures
* Fix old GCC warnings caused by poor analysis
* Use __builtin_unreachable
* Use BENCHMARK_UNREACHABLE()
* Pull __has_builtin to benchmark.h too
* Also move compiler identification macro to main header
* Move custom compiler identification macro back
High system load can skew benchmark results. By including system load averages
in the library's output, we help users identify a potential issue in the
quality of their measurements, and thus assist them in producing better (more
reproducible) results.
I got the idea for this from Brendan Gregg's checklist for benchmark accuracy
(http://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html).
* Set -Wno-deprecated-declarations for Intel
Intel compiler silently ignores -Wno-deprecated-declarations
so warning no. 1786 must be explicitly suppressed.
* Make std::int64_t → double casts explicit
While std::int64_t → double is a perfectly conformant
implicit conversion, Intel compiler warns about it.
Make them explicit via static_cast<double>.
* Make std::int64_t → int casts explicit
Intel compiler warns about emplacing an std::int64_t
into an int container. Just make the conversion explicit
via static_cast<int>.
* Cleanup Intel -Wno-deprecated-declarations workaround logic
Inspired by these [two](a1ebe07bea) [bugs](0891555be5) that i found and fixed in my code, both caused by the lack of these flags:
* `kIsIterationInvariant` - `* state.iterations()`
The value is constant for every iteration, and needs to be **multiplied** by the iteration count.
* `kAvgIterations` - `/ state.iterations()`
The value is global over all the iterations, and needs to be **divided** by the iteration count.
They play nice with `kIsRate`:
* `kIsIterationInvariantRate`
* `kAvgIterationsRate`.
I'm not sure how meaningful they are when combined with `kAvgThreads`.
I guess the `kIsThreadInvariant` can be added, too, for symmetry with `kAvgThreads`.
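The adjustments described above can be sketched in isolation. This is only an illustration: `SketchFlags`, `FinishCounter`, the bit values, and the `elapsed_seconds` parameter are hypothetical, and only the multiply/divide/rate arithmetic is taken from the description.

```cpp
#include <cstdint>

// Hypothetical flag set mirroring the names in the text; the bit values are
// an assumption made for this sketch.
enum SketchFlags : unsigned {
  kDefaults = 0,
  kIsRate = 1u << 0,                // divide by the elapsed time
  kIsIterationInvariant = 1u << 1,  // multiply by the iteration count
  kAvgIterations = 1u << 2,         // divide by the iteration count
  kIsIterationInvariantRate = kIsRate | kIsIterationInvariant,
  kAvgIterationsRate = kIsRate | kAvgIterations
};

// Applies the flag-driven adjustments to a raw counter value.
inline double FinishCounter(double value, unsigned flags,
                            std::int64_t iterations, double elapsed_seconds) {
  if (flags & kIsRate) value /= elapsed_seconds;
  if (flags & kIsIterationInvariant) value *= static_cast<double>(iterations);
  if (flags & kAvgIterations) value /= static_cast<double>(iterations);
  return value;
}
```

For example, a per-iteration constant of 2 over 10 iterations becomes 20 with `kIsIterationInvariant`, and 5 per second with `kIsIterationInvariantRate` over 4 seconds.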
* Fix compilation on Android with GNU STL
GNU STL in Android NDK lacks string conversion functions from C++11, including std::stoul, std::stoi, and std::stod.
This patch reimplements these functions in benchmark:: namespace using C-style equivalents from C++03.
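A rough sketch of what such a replacement could look like, built on `strtoul` and `strtod`. The `sketch` namespace and the exact error handling are assumptions, not the patch itself.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical namespace for this illustration

// Stand-in for std::stoul built on the C library's strtoul.
inline unsigned long stoul(const std::string& str, std::size_t* pos = nullptr,
                           int base = 10) {
  const char* begin = str.c_str();
  char* end = nullptr;
  errno = 0;
  const unsigned long result = std::strtoul(begin, &end, base);
  if (end == begin) throw std::invalid_argument("stoul: no conversion");
  if (errno == ERANGE) throw std::out_of_range("stoul: out of range");
  if (pos != nullptr) *pos = static_cast<std::size_t>(end - begin);
  return result;
}

// Stand-in for std::stod built on the C library's strtod.
inline double stod(const std::string& str, std::size_t* pos = nullptr) {
  const char* begin = str.c_str();
  char* end = nullptr;
  errno = 0;
  const double result = std::strtod(begin, &end);
  if (end == begin) throw std::invalid_argument("stod: no conversion");
  if (errno == ERANGE) throw std::out_of_range("stod: out of range");
  if (pos != nullptr) *pos = static_cast<std::size_t>(end - begin);
  return result;
}

}  // namespace sketch
```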
* Avoid use of log2 which doesn't exist in Android GNU STL
GNU STL in Android NDK lacks log2 function from C99/C++11.
This patch replaces their use in the code with double log(double) function.
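The substitution relies on the change-of-base identity log2(x) = ln(x) / ln(2). A minimal sketch follows; the helper name `Log2ViaLog` is hypothetical, and whether the patch uses exactly this form is an assumption.

```cpp
#include <cmath>

// log2(x) expressed through the natural logarithm, which is available
// even where log2 itself is missing.
inline double Log2ViaLog(double x) { return std::log(x) / std::log(2.0); }
```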
* format all documents according to contributor guidelines and specifications
use clang-format on/off to stop formatting when it makes excessively poor decisions
* format all tests as well, and mark blocks which change too much
* Add benchmark_main library with support for Bazel.
* fix newline at end of file
* Add CMake support for benchmark_main.
* Mention optionally using benchmark_main in README.
* Allow support for negative regex filtering
This patch allows one to negate the entire regex filter
by prefixing it with a '-' character, much in the same style as
GoogleTest uses.
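A minimal sketch of the negation logic, using `std::regex` for illustration. `MatchesFilter` is a hypothetical helper, not the library's function, and the library may use a different regex engine.

```cpp
#include <regex>
#include <string>

// Returns true if `name` should run under the filter `spec`.
// A leading '-' negates the whole regex.
inline bool MatchesFilter(std::string spec, const std::string& name) {
  bool negative = false;
  if (!spec.empty() && spec[0] == '-') {
    negative = true;
    spec.erase(0, 1);  // strip the marker, keep the regex itself
  }
  const bool matched = std::regex_search(name, std::regex(spec));
  return negative ? !matched : matched;
}
```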
* Address issues in PR
* Add unit tests for negative filtering
Before this change, we would report the number of requested iterations
passed to the state. After, we will report the actual number run. As a
side-effect, instead of multiplying the expected iterations by the
number of threads to get the total number, we can report the actual
number of iterations across all threads, which takes into account the
situation where some threads might run more iterations than others.
* Ensure 64-bit truncation doesn't happen for complexity results
* One more complexity_n 64-bit fix
* Missed another vector of int
* Piping through the int64_t
* Allow AddRange to work with int64_t.
Fixes #516
Also, tweak how we manage per-test build needs, and create a standard
_gtest suffix for googletest to differentiate from non-googletest tests.
I also ran clang-format on the files that I changed (but not the
benchmark include or main src as they have too many clang-format
issues).
* Add benchmark_gtest to cmake
* Set(Items|Bytes)Processed now take int64_t
Having the copts set on a per-target level can lead to ODR violations
in some cases. Avoid this by ensuring the regex engine is picked
through compiler intrinsics in the header directly.
This patch disables the -Winvalid-offsetof warning for GCC and Clang
when using it to check the cache lines of the State object.
Technically this usage of offsetof is undefined behavior until C++17.
However, all major compilers support this application as an extension,
as demonstrated by the passing static assert (If a compiler encounters UB
during evaluation of a constant expression, that UB must be diagnosed).
Unfortunately, Clang and GCC also produce a warning about it.
This patch temporarily suppresses the warning using #pragma's in the
source file (instead of globally suppressing the warning in the build systems).
This way the warning is ignored for both CMake and Bazel builds without
having to modify either build system.
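The pragma-based suppression could look roughly like this. `SketchState`, its members, and the 64-byte cache-line bound are assumptions standing in for the real `State` object.

```cpp
#include <cstddef>

// Hypothetical stand-in for State. Mixing public and private data members
// makes the class non-standard-layout, which is what triggers
// -Winvalid-offsetof.
class SketchState {
 public:
  SketchState() : total_iterations(0), started_(false), max_iterations(0) {}
  bool started() const { return started_; }

  unsigned long long total_iterations;

 private:
  bool started_;

 public:
  unsigned long long max_iterations;
};

// Suppress the warning only around the check. Clang also defines __GNUC__
// and honours the GCC diagnostic pragmas.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
#endif
static_assert(offsetof(SketchState, max_iterations) <= 64,
              "max_iterations should sit on the first cache line");
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Scoping the suppression with push/pop keeps the warning active for the rest of the translation unit.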
* Rename StringXxx to StrXxx in string_util.h and its users
This makes the naming consistent within string_util and moves it toward the
Abseil convention.
* Style guide is 2 spaces before end of line "//" comments
* Rename StrPrintF/StringPrintF to StrFormat for absl compatibility.
On Windows the Shlwapi.h file has a macro:
#define StrCat lstrcatA
And benchmark/src/string_util.h defines StrCat and it is renamed to
lstrcatA if we don't undef the macro in Shlwapi.h. This is an innocuous
bug if string_util.h is included after Shlwapi.h, but it is a compile
error if string_util.h is included before Shlwapi.h.
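The guard can be sketched as follows. The `#define` simulates what Shlwapi.h does so that the example builds on any platform, and the two-argument `StrCat` in the `sketch` namespace is a hypothetical simplification of the real function.

```cpp
#include <string>

// Simulates the macro from Shlwapi.h; on Windows the real header would
// supply it.
#define StrCat lstrcatA

// The guard: drop the macro before declaring a function with that name,
// so the declaration is not silently renamed to lstrcatA.
#ifdef StrCat
#undef StrCat
#endif

namespace sketch {
inline std::string StrCat(const std::string& a, const std::string& b) {
  return a + b;
}
}  // namespace sketch
```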
This fixes issue #545.
* Add Solaris support
Define BENCHMARK_OS_SOLARIS for Solaris.
Platform specific implementations added:
* number of CPUs detection
* CPU cycles per second detection
* Thread CPU usage
* Process CPU usage
* Remove the special case for per process CPU time for Solaris, it's the same as the default.
* Print the executable name as part of the context.
A common use case of the library is to run two different
versions of a benchmark to compare them. In my experience
this often means compiling a benchmark twice, renaming
one of the executables, and then running the executables
back-to-back. In this case the name of the executable
is important contextual information. Unfortunately the
benchmark does not report this information.
This patch adds the executable name to the context reported
by the benchmark.
* attempt to fix tests on Windows
* attempt to fix tests on Windows
* Don't include <sys/resource.h> on Fuchsia.
It doesn't support POSIX resource measurement and timing APIs.
Change-Id: Ifab4bac4296575f042c699db1ce5a4f7c2d82893
* Add BENCHMARK_OS_FUCHSIA for Fuchsia
Change-Id: Ic536f9625e413270285fbfd08471dcb6753ddad1
* Improve State packing: put important members on first cache line.
This patch does a few different things to ensure commonly accessed
data is on the first cache line of the `State` object.
First, it moves the `error_occurred_` member to reside after
the `started_` and `finished_` bools, since there was internal
padding there that was unused.
Second, it moves `batch_leftover_` and `max_iterations` further up
in the struct declaration. These variables are used in the calculation
of `iterations()` which users might call within the loop. Therefore
it's more important they exist on the first cache line.
Finally, this patch turns the bool members into bitfields. Although
this shouldn't have much of an effect currently, because padding is
still needed between the last bool and the first size_t, it should
help in future changes that require more "bool like" members.
* Remove bitfield change for now
* Move bools (and their padding) to end of "first cache line" vars.
I think it makes the most sense to move the padding required
following the group of bools to the end of the variables we want
on the first cache line.
This also means that the `total_iterations_` variable, which is the
most accessed, has the same address as the State object.
* Fix static assertion after moving bools
* Support State::KeepRunningBatch().
State::KeepRunning() can take large amounts of time relative to quick
operations (on the order of 1ns, depending on hardware). For such
sensitive operations, it is recommended to run batches of repeated
operations.
This commit simplifies handling of total_iterations_. Rather than
predecrementing such that total_iterations_ == 1 signals that
KeepRunning() should exit, total_iterations_ == 0 now signals the
intention for the benchmark to exit.
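A simplified model of the counting scheme, where a remaining count of zero means "stop". `MiniState` and its members are hypothetical, and start-up and timer handling are left out entirely.

```cpp
#include <cstddef>

class MiniState {
 public:
  explicit MiniState(std::size_t max_iterations)
      : remaining_(max_iterations), completed_(0) {}

  // One iteration per call: pays the loop-control cost every time.
  bool KeepRunning() {
    if (remaining_ == 0) return false;
    --remaining_;
    ++completed_;
    return true;
  }

  // One call per batch of n iterations. The final batch is run in full even
  // if it overshoots the requested count.
  bool KeepRunningBatch(std::size_t n) {
    if (remaining_ == 0) return false;
    remaining_ = (remaining_ > n) ? remaining_ - n : 0;
    completed_ += n;
    return true;
  }

  // Number of iterations actually run, which may exceed the request.
  std::size_t iterations() const { return completed_; }

 private:
  std::size_t remaining_;
  std::size_t completed_;
};
```

With 10 requested iterations and batches of 4, this model runs three batches and reports 12 iterations.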
* Create better fast path in State::KeepRunningBatch()
* Replace int parameter with size_t to fix signed mismatch warnings
* Ensure benchmark State has been started even on error.
* Simplify KeepRunningBatch()
This patch primarily changes the BENCHMARK_UNREACHABLE()
implementation under MSVC to use __assume(false) instead
of being a NORETURN function, which ironically caused
unreachable code warnings.
Second, since the NOTHROW function attempt generated the
warnings we meant to avoid, it has been replaced with a dummy
null statement.
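A sketch of how such a macro can be layered per compiler. `SKETCH_UNREACHABLE`, `TimeUnit`, and `TimeUnitName` are hypothetical names; the real macro's exact conditions may differ.

```cpp
#include <string>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#define SKETCH_UNREACHABLE() __assume(false)
#else
#define SKETCH_UNREACHABLE() ((void)0)  // dummy null statement
#endif

enum class TimeUnit { kNanosecond, kMicrosecond, kMillisecond };

// Every enumerator returns inside the switch, so the code after the switch
// can never run; the macro documents that and silences missing-return
// diagnostics.
inline std::string TimeUnitName(TimeUnit unit) {
  switch (unit) {
    case TimeUnit::kNanosecond:
      return "ns";
    case TimeUnit::kMicrosecond:
      return "us";
    case TimeUnit::kMillisecond:
      return "ms";
  }
  SKETCH_UNREACHABLE();
}
```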
* Improve CPU Cache info reporting -- Add Windows support.
This patch does three things regarding CPU Cache reporting.
First, it adds an implementation on Windows. Second, it fixes
the JSONReporter to correctly (and actually) output the CPU
configuration information.
Third, it detects and reports the number of
physical CPUs that share the same cache.
* Refactor System information collection.
This patch refactors the system information collection,
and in particular information about the target CPU. The
motivation is to make it easier to access CPU information,
and easier to add new information as need be.
This patch additionally adds information about the cache
sizes of the CPU.
* Address review comments: Clean up integer types.
This commit cleans up the integer types used in ValueUnion to
follow the Google style guide.
Additionally it adds a BENCHMARK_UNREACHABLE macro to assist
in documenting/catching unreachable code paths.
* Rename ValueUnion accessors.
Define BENCHMARK_OS_NETBSD for NetBSD.
Add detection of cpuinfo_cycles_per_second and cpuinfo_num_cpus.
This code shares detection of these properties with FreeBSD.
When stopping a timer, the start time is subtracted
from the current time. However, when the times are identical,
or sufficiently close together, the subtraction can result
in a negative number.
For some reason MinGW is the only platform where this problem
manifests. I suspect it's due to MinGW specific behavior in either
the CPU timing code, floating point model, or printf formatting.
Either way, the fix for MinGW should be correct across all platforms.
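The message does not spell out the fix. One common guard, shown here purely as an assumption, is to clamp the difference at zero. `ElapsedSeconds` is a hypothetical helper.

```cpp
#include <algorithm>

// Elapsed time between two clock readings, never negative.
inline double ElapsedSeconds(double start, double stop) {
  return std::max(0.0, stop - start);
}
```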
This patch improves the performance of the KeepRunning loop in two ways:
(A) it removes the dependency on the max_iterations variable, preventing
it from being loaded every iteration.
(B) it loops to zero, instead of to an upper bound. This allows a single
decrement instruction to be used instead of an arithmetic op followed by a
comparison.
* Drop Stat1, refactor statistics to be user-providable, add median.
My main goal was to add median statistic. Since Stat1
calculated the stats incrementally, and did not store
the values themselves, it was not possible. Thus,
i have replaced Stat1 with simple std::vector<double>,
containing all the values.
Then, i have refactored current mean/stdev to be a
function that is provided with values vector, and
returns the statistic. While there, it seemed to make
sense to deduplicate the code by storing all the
statistics functions in a map, and then simply iterate
over it. And the interface to add new statistics is
intentionally exposed, so they may be added easily.
The notable change is that Iterations are no longer
displayed as 0 for stdev. It could be changed, but
i'm not sure how to nicely fit that into the API.
Similarly, this dance about sometimes (for some fields,
for some statistics) dividing by run.iterations, and
then multiplying the calculated statistic back is also
dropped, and if you do the math, i fail to see why
it was needed there in the first place.
Since that was the only use of stat.h, it is removed.
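A minimal sketch of the design described above: each statistic is a plain function over the stored values, and the functions sit in a map that can be iterated or extended. All names here (`StatMean`, `StatMedian`, `StatStdDev`, `DefaultStatistics`) are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <vector>

inline double StatMean(const std::vector<double>& v) {
  if (v.empty()) return 0.0;
  return std::accumulate(v.begin(), v.end(), 0.0) /
         static_cast<double>(v.size());
}

// Takes the values by copy so they can be sorted locally.
inline double StatMedian(std::vector<double> v) {
  if (v.empty()) return 0.0;
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return (v.size() % 2 == 1) ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

inline double StatStdDev(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;
  const double mean = StatMean(v);
  double sum_sq = 0.0;
  for (double x : v) sum_sq += (x - mean) * (x - mean);
  return std::sqrt(sum_sq / static_cast<double>(v.size() - 1));
}

using StatFn = std::function<double(const std::vector<double>&)>;

// Name -> function table; users could add their own entries.
inline std::map<std::string, StatFn> DefaultStatistics() {
  return {{"mean", StatMean}, {"median", StatMedian}, {"stddev", StatStdDev}};
}
```

Median is the case that needs all the values at once, which is why an incremental accumulator is not enough.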
* complexity.h: attempt to fix MSVC build
* Update README.md
* Store statistics to compute in a vector, ensures ordering.
* Add a bit more tests for repetitions.
* Partially address review notes.
* Fix gcc build: drop extra ';'
clang, why didn't you warn me?
* Address review comments.
* double() -> 0.0
* early return
When generating a human-readable number for user counters, we don't
generally expect 1k to be 1024. This is the default due to the more
general purpose string utility.
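The difference between the two bases can be illustrated with a small formatter. `HumanReadable`, its `one_k` parameter, and the `%g` output format are assumptions for this sketch, not the library's utility.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Scales `value` by powers of `one_k` (1000 or 1024) and appends a prefix.
inline std::string HumanReadable(double value, double one_k) {
  static const char* const kPrefixes[] = {"", "k", "M", "G", "T"};
  const std::size_t count = sizeof(kPrefixes) / sizeof(kPrefixes[0]);
  std::size_t idx = 0;
  while (value >= one_k && idx + 1 < count) {
    value /= one_k;
    ++idx;
  }
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%g%s", value, kPrefixes[idx]);
  return std::string(buf);
}
```

With a base of 1000 a counter value of 1000 prints as `1k`; with a base of 1024 the same value stays `1000`.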
Fixes #437
* Json reporter: passthrough fp, don't cast it to int; adjust tooling
Json output format is generally meant for further processing
using some automated tools. Thus, it makes sense not to
intentionally limit the precision of the values contained
in the report.
As it can be seen, FormatKV() for doubles, used %.2f format,
which was meant to preserve at least some of the precision.
However, before that function is ever called, the doubles
were already cast to the integer via RoundDouble()...
This is also the case for the console reporter, where it makes
sense because the screen space is limited; the CSV reporter,
however, does output some decimal digits.
Thus i can only conclude that the loss of the precision
was not really considered, so i have decided to adjust the
code of the json reporter to output the full fp precision.
There can be several reasons why that is the right thing
to do, the bigger the time_unit used, the greater the
precision loss, so i'd say any sort of further processing
(like e.g. tools/compare_bench.py does) is best done
on the values with most precision.
Also, that cast skewed the data away from zero, which
i think may or may not result in false positives/negatives
in the output of tools/compare_bench.py
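To illustrate the precision argument, here are two formatters side by side. Both helpers are hypothetical, and using `max_digits10` is just one way to get a round-trippable representation, not necessarily what the reporter does.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Fixed two decimals, comparable to a "%.2f" format: compact but lossy.
inline std::string FormatTwoDecimals(double v) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(2) << v;
  return out.str();
}

// Enough significant digits for the text to parse back to the same double.
inline std::string FormatFullPrecision(double v) {
  std::ostringstream out;
  out << std::setprecision(std::numeric_limits<double>::max_digits10) << v;
  return out.str();
}
```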
* Json reporter: FormatKV(double): address review note
* tools/gbench/report.py: skip benchmarks with different time units
While it may be useful to teach it to operate on
measurements with different time units, which is now
possible since floats are stored rather than integers,
for now at least doing such sanity checking
is better than providing misinformation.
Change ThreadCPUUsage to call ProcessCPUUsage if __rtems__ is defined.
RTEMS real time OS doesn't support CLOCK_THREAD_CPUTIME_ID. See
https://github.com/RTEMS/rtems/blob/master/cpukit/posix/src/clockgettime.c#L58-L59
Prior to this change, ThreadCPUUsage would fail when running on RTEMS with:
ERROR: clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...) failed
* Make Benchmark a single header library (but not header-only)
This patch refactors benchmark into a single header, to allow
for slightly easier usage.
The initial reason for the header split was to keep C++ library
components from being included by benchmark_api.h, making that
part of the library STL agnostic. However this has since changed
and there seems to be little reason to separate the reporters from
the rest of the library.
* Fix internal_macros.h
* Remove more references to macros.h
* Add ClearRegisteredBenchmarks() function.
Since benchmarks can be registered at runtime using the RegisterBenchmark(...)
functions, it makes sense to have a ClearRegisteredBenchmarks() function too,
that can be used at runtime to clear the currently registered benchmarks and
re-register an entirely new set.
This allows users to run a set of registered benchmarks, get the output using
a custom reporter, and then clear and re-register new benchmarks based on the
previous results.
This fixes issue #400, at least partially.
* Remove unused change
Using target_include_directories, CMake will implicitly add the
necessary include paths to targets which link against the benchmark
library. This is useful when the benchmark repo is included as a
subdirectory in another CMake build.
This thing with the pragma ignore was getting out of hand: now
MinGW (and probably GCC) was erroring too. So I chose to move
the definition of IsZero() out of the anonymous namespace into
benchmark.cc.
The problem was that the call to Finish() the user counters was
lost in a big merge. If I had already written the tests for the
user counters, this would probably have been caught earlier.
* fix android compilation
* checking __GLIBCXX__ and __GLIBCPP__ macro in addition to __ANDROID__
* using vsnprintf instead of std::vsnprintf to compile on Android
* removed __GLIBCPP__ check on Android
* StringPrintF instead of std::to_string for Android
When using CPU time to determine the correct number of iterations the
library additionally checks if the benchmark has consumed 5x the minimum
required time according to the wall clock. This prevents benchmarks
with low CPU usage from running for much longer than actually intended.
However when a benchmark uses a manual timer this heuristic isn't helpful
and likely isn't correct since we don't know what the manual timer actually
measures.
This patch removes the above restriction when a benchmark specifies a manual
timer.
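The decision described above can be sketched as a small predicate. `EnoughTimeElapsed` and its parameters are hypothetical; only the 5x wall-clock guard and its removal for manual timers come from the description.

```cpp
// `measured_seconds` is the CPU time, or the manual time when a manual
// timer is in use. `real_seconds` is the wall-clock time.
inline bool EnoughTimeElapsed(double measured_seconds, double real_seconds,
                              double min_time, bool use_manual_time) {
  if (measured_seconds >= min_time) return true;
  // Wall-clock guard against low-CPU benchmarks running far too long.
  // It is skipped for manual timers, whose meaning is unknown.
  if (!use_manual_time && real_seconds >= 5.0 * min_time) return true;
  return false;
}
```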
* Add Benchmark::Iterations for explicitly specifying the number of iterations to use.
* Document that benchmark::Iterations should not be used to limit benchmark runtimes
- Remove target_include_directories of ${PROJECT_SOURCE_DIR}/include to
fix error: Target "benchmark" INTERFACE_INCLUDE_DIRECTORIES property
contains path which is prefixed in the source directory.
* Fix reading CPU info from file
The CHECK macro does nothing in release mode, meaning its arguments are
never evaluated
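The pitfall can be reproduced with a stand-in macro. `SKETCH_CHECK` models the release-mode behaviour, and `ReadValue` is a hypothetical function with a side effect.

```cpp
// Release-mode model of the check macro: the argument is discarded, so any
// side effect inside it never happens.
#define SKETCH_CHECK(cond) ((void)0)

inline bool ReadValue(int* out) {
  *out = 42;  // the side effect the caller relies on
  return true;
}

// Buggy: the read lives inside the macro argument and is compiled away.
inline int ReadBuggy() {
  int value = -1;
  SKETCH_CHECK(ReadValue(&value));
  return value;
}

// Fixed: perform the read unconditionally, then check only the result.
inline int ReadFixed() {
  int value = -1;
  const bool ok = ReadValue(&value);
  SKETCH_CHECK(ok);
  (void)ok;  // avoids an unused-variable warning when the check is empty
  return value;
}
```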
* Add myself to AUTHORS and CONTRIBUTORS
Previously the constants used for converting between
different units of time were declared using int64_t. However
we should only use explicitly sized integer types when they
are required, and should use 'int' everywhere else, and there is
no good reason to use int64_t here.
For that reason this patch changes the type of the constants.
This should help address issue #354 as well.
* Added user counters, and move use of bytes_processed and items_processed to user counter logic.
Each counter is a string-value pair. The counters were
made available through the State class. Two helper virtual
methods were added to the Fixture class to allow convenient
initialization and termination of the counters: InitState()
and TerminateState(). The reporting of the counters is buggy
and is still a work in progress, to be completed in the next commits.
* fix bad removal of BenchmarkCounters code during the merge
* add myself to AUTHORS/CONTRIBUTORS
* fix printing to std::cout in csv_reporter
* bytes_per_second and items_per_second are now in the UserCounters class
* add user counters to json reporter
* moving bytes_per_second and items_per_second to their old state
* console reporter dealing ok with user counters.
* update unit tests for user counters
* CSVReporter now prints user counters too.
* cleanup user counters
* reverted changes to cmake files which should have gone into later commits
* fixture_test: fix gcc 4.6 compilation
* remove ctor with default argument
see https://github.com/google/benchmark/pull/262#discussion_r72298055
* use (auto-defined) BENCHMARK_HAS_CXX11 instead of BENCHMARK_INITLIST.
https://github.com/google/benchmark/pull/262#discussion_r72298310
* leanify counters API
Discussions:
API complexity: https://github.com/google/benchmark/pull/262#discussion_r72298731
remove std::string dependency (WIP): https://github.com/google/benchmark/pull/262#discussion_r72298142
spacing & alignment: https://github.com/google/benchmark/pull/262#discussion_r72298422
* remove std::string dependency on public API - changed counter name storage to char*
* Counter ctor: use overloads instead of default arguments
discussion:
https://github.com/google/benchmark/pull/262#discussion_r72298055
* Use raw pointers to remove dependency on std::vector from public API .
For more info, see discussion at https://github.com/google/benchmark/pull/262#discussion_r72319678 .
* Move counter implementation from benchmark.cc to counter.cc.
See discussion: https://github.com/google/benchmark/pull/262#discussion_r72298980 .
* Remove unused (commented-out) code.
* Moved thread counters to ThreadStats.
* Counters: fixed copy and move constructors.
* Counter: use an inplace buffer for small names.
* benchmark_test: move counters test out of CXX11 preprocessor conditional.
* Counter: fix VS2013 compilation error in char[] initialization.
* Fix typo.
* Expose counters from State.
See discussion: https://github.com/google/benchmark/pull/262#issuecomment-237156951
* Changed counters interface to map-like.
* Fix printing of user counters in ConsoleReporter.
* Applied clang-format to counter.cc and console_reporter.cc.
Command was `clang-format -style=Google -i counter.cc console_reporter.cc`
I also applied to all other files, but the changes were very
far-reaching so I rolled those back.
* Rename Counter::Flags_e to Counter::Flags
* Fix use of reserved names in Counter and BenchmarkCounters.
* Counter: Fix move ctor bug + change order of members.
* Fixture: remove tentative methods InitState() and TerminateState().
* Update fixture_test to the new Fixture interface.
* BenchmarkCounters: fixed a bug in the move ctor. Remove call to CHECK_LT().
CHECK_LT() was making the size_t lookup take ~double the time of a string lookup!
* BenchmarkCounters: add option to not print zero counters (defaults to false).
* Add test to compare counter storage and access with std::map.
* README: clarify cost of counter access modes.
* move counter access test to an own test.
* BenchmarkCounters: add move Insert()
* Counters access test: add accelerated lookup by name.
* Fix old range syntax.
* Fix missing include of cstdio
* Fix Visual Studio warning
* VS2013 and lower: fix use of snprintf()
* VS2013: fix use of char[] as a member of std::pair<>.
* change counter storage to std::map
* Remove skipZeroCounters logic
* Fix VS compilation error.
* Implemented request changes to PR #262.
* PR #262: More requested changes.
* README: cleanup counter text.
* PR #262: remove clang-format changes for preexisting code
* Complexity+Counters: fix counter flags which were being ignored.
* Document all Counter::Flag members
* fixed loss of counter values
* ConsoleReporter: remove tabular printing of user counters.
* ConsoleReporter: header printing should not be contingent on user counter names.
* Minor white space and alignment fixes.
* cxx03_test + counters: reuse the BM_empty() function.
* user counters: add note to README on how counters are gathered across threads
* Implement cycleclock::Now for PNaCl
* Make cycleclock::Now compatible with NaCl/ARM
* Support Emscripten (Asm.js, WebAssembly)
* Rearrange #ifs to handle specific cases first
* DoNotOptimize without inline asm for Emscripten & PNaCl
* Add macro definition for iOS
Add an additional macro definition for iOS.
iOS is defined as a Mac OSX variant in the TargetConditionals include,
thus we treat it as a subset of OSX within the defines.
* Skip error for hw.cpufrequency on iOS
hw.cpufrequency is not available on iOS devices. As there is no way to reliably
retrieve the CPU frequency on iOS we are printing out a warning
that we were unable to detect the CPU frequency and set it to 0.
This only disables cpu frequency readouts on actual physical iOS devices.
Running this code on the simulator still gives the cpu architecture of
the host computer as the simulator passes down the sysctl calls to OSX.
* BENCHMARK_MAIN() now reports unrecognised command-line flags (see google/benchmark#320)
* add benchmark::ReportUnrecognizedArguments()
Update BENCHMARK_MAIN() to use ReportUnrecognizedArguments() instead of
having the reporting code directly in the macro.
See issue google/benchmark#320 for reference
* let's stick to american english -- fix typo in ReportUnrecognizedArguments()
* make ReportUnrecognizedArguments() print to stderr
* make ReportUnrecognizedArguments() return true if any arguments have been reported (i.e. argc > 1)
* Auto-detect whether to produce colorized output
Rename --color_print to --benchmark_color for consistency with the other
flags (and Google Test). Old flag name is kept around for compatibility.
The --benchmark_color/--color_print flag takes a third option, "auto",
which is the new default. In this mode, we attempt to auto-detect
whether to produce colorized output. (The logic for deciding whether to
use colorized output was lifted from GTest:
<https://github.com/google/googletest/blob/master/googletest/src/gtest.cc#L2925>.)
* Update CONTRIBUTORS, AUTHORS
If a reporter's output stream isn't line-buffered (e.g. it's not writing
to a terminal) then it can be some time before a write to it becomes
visible.
This is problematic if, say, you're wanting to use tail -f to view the
file written to via --benchmark_out. Or if the application crashes,
leaving you with no results.
Addressed by flushing the reporters' output streams whenever we invoke
methods that may write to them.
* Refactor benchmark.cc into benchmark_register.cc and benchmark_run.cc
The benchmark.cc file is getting really big and it contains a bunch of
unrelated components. This patch separates the files into two separate
parts. The "runtime" parts and the "registration" parts.
This patch also removes the PIMPL used by Benchmark. Previously we couldn't
have STL types in the interface but now we can. Therefore there is no reason
to keep BenchmarkImp.
* add missing include
* rework windows timers again
* Guard timespec on older Windows versions
* Remove old thread safety annotation workarounds
* Change to using per-thread timers
* fix bad assertions
* fix copy paste error on windows
* Fix thread safety annotations
* Make null-log thread safe
* remove remaining globals
* use chrono for walltime since it is thread safe
* consolidate timer functions
* Add missing ctime include
* Rename to be consistent with Google style
* Format patch using clang-format
* cleanup -Wthread-safety configuration
* Don't trust _POSIX_FEATURE macros because OS X lies.
* Fix OS X thread timings
* attempt to fix mingw build
* Attempt to make mingw work again
* Revert old mingw workaround
* improve diagnostics
* Drastically improve OS X measurements
* Use average real time instead of max
The `Ranges(...)` generation code uses a "control" vector which stores
the current index for each range passed to `Ranges`. Previously this vector
was incorrectly initialized to the size of the subranges, not the number
of subranges.
Additionally this patch suppresses unused warnings generated by
`stream_init_anchor`.
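The role of the control vector can be shown with a standalone Cartesian-product sketch. `CartesianProduct` is a hypothetical function, not the library's code. The key line is the one that sizes the control vector by the number of ranges.

```cpp
#include <cstddef>
#include <vector>

// Enumerates every combination of one value from each range.
inline std::vector<std::vector<int>> CartesianProduct(
    const std::vector<std::vector<int>>& ranges) {
  std::vector<std::vector<int>> out;
  for (const auto& r : ranges) {
    if (r.empty()) return out;  // no combinations exist
  }
  // One index per range: sized by the number of ranges, not their lengths.
  std::vector<std::size_t> ctl(ranges.size(), 0);
  while (true) {
    std::vector<int> args;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
      args.push_back(ranges[i][ctl[i]]);
    }
    out.push_back(args);
    // Advance like an odometer: bump the first index, carrying on overflow.
    std::size_t i = 0;
    for (; i < ranges.size(); ++i) {
      if (++ctl[i] < ranges[i].size()) break;
      ctl[i] = 0;
    }
    if (i == ranges.size()) break;  // every index wrapped around: done
  }
  return out;
}
```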
The benchmark library internals write to std::cout/std::cerr during program
startup. This can cause segfaults when the user doesn't include <iostream> in
the benchmark (which init's the streams). This patch fixes this by emitting
a dynamic initializer in every TU which initializes the streams.
Previously the FittingCurve functions for n^2 and n^3 did the calculation
using int types. This can overflow and cause UB. This patch changes the
calculations to use std::pow to prevent this.
Also re-enable VC 2013 appveyor bot since I *hope* this is what was causing
the failures.
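A sketch of the n^2 and n^3 fitting curves computed in floating point, as mentioned above. The function names are hypothetical. With `n = 100000`, `n * n` would overflow a 32-bit `int`, while the `double` result is simply 1e10.

```cpp
#include <cmath>
#include <cstdint>

// Both curves are evaluated in double, so no signed integer overflow
// can occur.
inline double FittingCurveSquared(std::int64_t n) {
  return std::pow(static_cast<double>(n), 2);
}

inline double FittingCurveCubed(std::int64_t n) {
  return std::pow(static_cast<double>(n), 3);
}
```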
* Support multiple ranges in the benchmark
The google-benchmark library allows providing up to two ranges to the
benchmark method (range_x and range_y). However, in many cases that is not
sufficient. This patch introduces multi-range support, so the user can easily
define multiple ranges by passing a vector of integers, and access the values
through the method range(i).
* Remove redundant API
Functions State::range_x() and State::range_y() have been removed. They should
be replaced by State::range(0) and State::range(1).
Functions Benchmark::ArgPair() and Benchmark::RangePair() have been removed.
They should be replaced by Benchmark::Args() and Benchmark::Ranges().
These options allow you to write the output of a benchmark to the specified
file and with the specified format. The goal of this change is to help support
tooling.
Without these, clang reorders these instructions as if they were
regular loads/stores which causes SIGILL from the kernel because
it performs all the loads before it starts testing the values.
* Move ComputeStats call out of the reporters
* Cleanup adjusted time calculations in reporters
* Move ComputeBigO call out of reporters
* Remove ReportComplexity interface using ReportRuns instead
* Factor out reporting of basic context information
* Attempt to fix GCC 4.6 build errors
* Move ComputeStats to complexity.cc
* Add test for reporter output.
* setup err_stream tests
* Fix warnings in tests
* whitespace
* Fix build errors caused by super pedantic compilers
* Pass streams by pointer not non-const reference
Some benchmarks may run a few milliseconds which makes it kind of hard to visually compare, since the currently only available nanoseconds numbers can get very large in this case. Therefore this commit adds an optional command line flag --benchmark_time_unit which lets the user choose between ns and ms time units for displaying the mean execution time.
Having access to the thread count from within a benchmark is useful,
for when one wants to distribute a workload dynamically among the
benchmarks running in parallel, e.g. when using ThreadRange() or
ThreadPerCpu().
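As an illustration of why the thread count matters, here is one way to split a fixed workload across threads. `SliceFor` is hypothetical and takes plain integers, since the message does not name the accessor for the thread index or count.

```cpp
#include <cstddef>
#include <utility>

// Half-open [begin, end) slice of `total` work items for thread `index`
// out of `threads`. Leftover items go to the lowest-numbered threads.
inline std::pair<std::size_t, std::size_t> SliceFor(std::size_t total,
                                                    std::size_t index,
                                                    std::size_t threads) {
  const std::size_t base = total / threads;
  const std::size_t extra = total % threads;
  const std::size_t begin = index * base + (index < extra ? index : extra);
  const std::size_t length = base + (index < extra ? 1 : 0);
  return {begin, begin + length};
}
```

Ten items over three threads come out as [0, 4), [4, 7) and [7, 10).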
Using `0` as a null pointer is illegal when `-Wzero-as-null-pointer-constant`
is given to G++. To avoid the warning `zero-as-null-pointer-constant`,
`nullptr` (C++11 keyword) instead of `0` is used in the `sysctl` invocation.
When the library is created as a *.dll on Windows it is treated like a
runtime object, so we must provide the destination for the runtime
objects in our install command.
For cross platform and cross compiler portability we use the
standard integer type for a 64-bit integer. MinGW on Windows doesn't
have the definition for `int64`.
We use the SHGetValueA on Windows to retrieve the MHz of the processor
but this requires the shlwapi library. Previous to this patch the
library was linked with a MSVC specific pragma but there is no
guarantee that on Windows we will be using MSVC. Therefore, it is much
more compiler-agnostic to use the standard CMake library linking mechanism
to provide the definition of SHGetValueA.