* Support optional, user-directed collection of performance counters
This patch lets an engineer collect performance counters during a
benchmark run, for example to drill into the root causes of a
regression. Currently, only single-threaded runs are supported. The
feature requires a build-time opt-in followed by a runtime opt-in.
The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.
This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.
Currently, this is only supported on platforms where libpfm is
supported.
libpfm: http://perfmon2.sourceforge.net/
* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS
This is to avoid unused parameter warning-as-error
* Added missing include for <vector> in perf_counters.cc
* Moved doc to docs
* Added license blurbs
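As a rough illustration of the unused-parameter workaround mentioned
above: the sketch below uses a hypothetical stand-in function
(`SnapshotStub`), not the real perf_counters code. Casting the
parameter to void marks it as used, so a build with
warnings-as-errors stays clean on a platform where no counters can
be read.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a platform without counter support.
// The name and signature are assumptions for illustration only.
bool SnapshotStub(std::vector<std::uint64_t>* values) {
  // Nothing can be read here, so the parameter would otherwise be
  // unused and trip -Werror=unused-parameter (or the MSVC equivalent).
  (void)values;
  return false;  // report that no counters were collected
}
```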
According to https://cmake.org/cmake/help/latest/command/function.html,
"Referencing to ARGV# arguments beyond ARGC have undefined behavior.",
which I hit with CMake 3.19.7.
This uses ARGC to check whether ARGV1 has been passed before referencing
it.
Currently, I get:
```
Run on (32 X 7326.56 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
which seems mostly right, except that the frequency is bogus.
Yes, I guess the CPU could theoretically achieve that,
but I have 3.6 GHz configured and scaling disabled,
so we are clearly reading the wrong value.
With this fix, I now get the expected
```
Run on (32 X 3598.53 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
```
Use the benchmark's reported iteration count when estimating
iterations for the next repetition, rather than the requested
iteration count. When the benchmark uses KeepRunningBatch the actual
iteration count can be larger than the one the runner requested.
Prior to this fix the runner was underestimating the next iteration
count, sometimes significantly so. Consider the case of a benchmark
using a batch size of 1024. Prior to this change, the benchmark
runner would attempt iteration counts 1, 10, 100 and 1000, yet the
benchmark itself would do the same amount of work each time: a single
batch of 1024 iterations. The discrepancy could also contribute to
estimation errors once the benchmark time reached 10% of the target.
For example, if the very first batch of 1024 iterations reached 10% of
benchmark_min_time, the runner would attempt to scale that to 100%
from a basis of one iteration rather than 1024.
This bug was particularly noticeable in benchmarks with large batch
sizes, especially when the benchmark also had slow set up or tear down
phases.
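The effect of the basis on the estimate can be sketched as follows.
This is a simplified, hypothetical predictor, not the runner's actual
code: the function name, the 1.4 overshoot factor and the exact
rounding are assumptions. The 10x growth cap below 10% of the target
time follows the description above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: estimate the next iteration count from the
// iteration count used as the basis and the time that run took.
std::int64_t PredictNextIters(std::int64_t basis_iters, double seconds,
                              double min_time) {
  // Aim slightly past min_time (the 1.4 factor is an assumption).
  double multiplier = min_time * 1.4 / std::max(seconds, 1e-9);
  // Below 10% of the target the timing is too noisy to trust,
  // so grow by at most 10x.
  const bool is_significant = (seconds / min_time) > 0.1;
  if (!is_significant) multiplier = std::min(10.0, multiplier);
  // Always make progress.
  if (multiplier <= 1.0) multiplier = 2.0;
  const double basis = static_cast<double>(basis_iters);
  return static_cast<std::int64_t>(
      std::llround(std::max(multiplier * basis, basis + 1.0)));
}
```

With a batch size of 1024, a first run that takes 0.1 ms against a
1 s target scales to 10 iterations when the requested count (1) is
the basis, but to 10240 when the reported count (1024) is the basis.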
With this fix in place it is possible to use KeepRunningBatch to
achieve a kind of "minimum iteration count" feature by using a larger
fixed batch size. For example, a benchmark may build a map of 500K
elements and test a "find" operation. There is no point in running
"find" just 1, 10, 100, etc., times. The benchmark can now pick a
batch size of something like 10K, and the runner will arrive at the
final max iteration count in noticeably fewer repetitions.
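To make the "fewer repetitions" claim concrete, here is a toy model
of the runner's search. It is only a sketch under stated
assumptions: each attempt requests 10x the previously reported
count, KeepRunningBatch rounds every request up to whole batches,
and timing is ignored entirely. `AttemptsToReach` is a hypothetical
helper, not part of the library.

```cpp
#include <cstdint>

// Hypothetical model: count the attempts needed before the reported
// iteration count reaches target_iters. Assumes batch_size > 0.
int AttemptsToReach(std::int64_t batch_size, std::int64_t target_iters) {
  std::int64_t requested = 1;
  int attempts = 0;
  for (;;) {
    ++attempts;
    // A batched loop always runs a whole number of batches.
    const std::int64_t reported =
        ((requested + batch_size - 1) / batch_size) * batch_size;
    if (reported >= target_iters) return attempts;
    // The next request is based on what was actually reported.
    requested = reported * 10;
  }
}
```

Under this model, reaching one million iterations takes 7 attempts
with a batch size of 1 and 3 attempts with a batch size of 10K.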
When building with gcc TSan on, and in Debug mode, we see a warning
like:
benchmark/src/timers.cc: In function ‘std::string benchmark::LocalDateTimeString()’:
src/timers.cc:241:15: warning: ‘char* strncat(char*, const char*, size_t)’ output may be truncated copying 108 bytes from a string of length 127 [-Wstringop-truncation]
241 | std::strncat(storage, tz_offset, sizeof(storage) - timestamp_len - 1);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While this is essentially a false positive (we never expect
the number of bytes in tz_offset to be too large), the compiler can't
actually tell that. Shrink the size of tz_offset to a smaller, but still safe
size to eliminate this warning.
Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
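A minimal sketch of the idea behind the tz_offset change, assuming a
"+HH:MM" offset format and illustrative buffer sizes; the real code
in timers.cc may differ, and `AppendTzOffset` is a hypothetical
helper. The point is that the offset buffer is small enough that its
maximum length is obviously below the space left in the destination.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: append a "+HH:MM" style offset to a timestamp.
std::string AppendTzOffset(const char* timestamp, int offset_minutes) {
  // "+HH:MM" needs 7 bytes including the terminator; 16 leaves slack
  // while staying far below the size of the destination buffer.
  char tz_offset[16];
  const char sign = offset_minutes < 0 ? '-' : '+';
  const int abs_minutes =
      offset_minutes < 0 ? -offset_minutes : offset_minutes;
  std::snprintf(tz_offset, sizeof(tz_offset), "%c%02d:%02d", sign,
                abs_minutes / 60, abs_minutes % 60);

  char storage[128];
  std::snprintf(storage, sizeof(storage), "%s", timestamp);
  const std::size_t timestamp_len = std::strlen(storage);
  // Bounded append: never writes past the end of storage.
  std::strncat(storage, tz_offset, sizeof(storage) - timestamp_len - 1);
  return storage;
}
```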
* Implement custom benchmark name
The benchmark's name can be changed using the Name() function
which internally uses SetName().
* Update AUTHORS and CONTRIBUTORS
* Describe new feature in README
* Move new name function up
Fixes #1106
Fixes google#1077
Bazel clients currently cannot build the benchmark library in Release
mode. This commit adds a new target ":benchmark_release" to enable this.
The existing behavior results in the `0` value being added twice. Since
`lo` is always added to `dst`, we never want to explicitly add `0` if
`lo` is equal to `0`.
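The rule can be illustrated with a simplified range builder. This is
a sketch, not the library's implementation: `MakeRange` is a
hypothetical name, `mult` is assumed to be at least 2, and overflow
handling is omitted for brevity.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: lo, powers of mult between lo and hi, then hi.
// Assumes lo <= hi and mult >= 2; ignores overflow.
std::vector<std::int64_t> MakeRange(std::int64_t lo, std::int64_t hi,
                                    int mult) {
  std::vector<std::int64_t> dst;
  dst.push_back(lo);  // lo is always added
  if (lo == hi) return dst;

  // Negated powers strictly between lo and hi, in ascending order.
  std::vector<std::int64_t> negatives;
  for (std::int64_t p = 1; -p > lo; p *= mult) {
    if (-p < hi) negatives.push_back(-p);
  }
  dst.insert(dst.end(), negatives.rbegin(), negatives.rend());

  // Add 0 only when the range strictly crosses it. If lo == 0 it is
  // already in dst, and if hi == 0 it is added as hi below.
  if (lo < 0 && hi > 0) dst.push_back(0);

  // Positive powers strictly between lo and hi.
  for (std::int64_t p = 1; p < hi; p *= mult) {
    if (p > lo) dst.push_back(p);
  }
  dst.push_back(hi);
  return dst;
}
```

For example, `MakeRange(0, 8, 2)` yields 0, 1, 2, 4, 8 with a single
zero, while `MakeRange(-8, 8, 2)` still includes the zero between -1
and 1.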
On the s390 architecture, the z/OS XL compiler uses HLASM inline assembly, which has a different syntax and must be distinguished to avoid compilation errors.
Noticed a missing header when building LLVM with gcc-11:
```
llvm-project/llvm/utils/benchmark/src/benchmark_register.h:17:30:
error: 'numeric_limits' is not a member of 'std'
17 | static const T kmax = std::numeric_limits<T>::max();
| ^~~~~~~~~~~~~~
```
Without this commit, compilation fails on DragonFly with the following message:
```
/home/mneumann/Dev/benchmark.old/src/sysinfo.cc:446:2: error: #warning "HOST_NAME_MAX not defined. using 64" [-Werror=cpp]
^~~~~~~
```
Also note that the sysctl is actually `hw.tsc_frequency` on DragonFly:
```
$ sysctl hw.tsc_frequency
hw.tsc_frequency: 3498984022
```
Tested on:
```
$ uname -a
DragonFly box.localnet 5.9-DEVELOPMENT DragonFly v5.9.0.742.g4b29dd-DEVELOPMENT #5: Tue Aug 18 00:21:31 CEST 2020
```
As per the discussion in [1], LLVM is going to get backend support for
Motorola 68000 series CPUs (a.k.a. M68K or M680x0). So it's necessary to
add a CycleTimer implementation here, which simply uses `gettimeofday`,
the same as MIPS. This fixes #1049
[1] https://reviews.llvm.org/D88389
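For reference, a `gettimeofday`-based fallback can look roughly like
the sketch below. It assumes a POSIX system, and the function name
is hypothetical; the value is wall-clock microseconds rather than
true CPU cycles.

```cpp
#include <sys/time.h>

#include <cstdint>

// Hypothetical fallback for targets without a readable cycle counter:
// return the current time of day in microseconds.
inline std::int64_t NowFromGettimeofday() {
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  return static_cast<std::int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}
```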
NOTE: This is a fresh start of pull request #738, which I messed up by re-editing the committer email that I forgot to modify before pushing. Sorry for the inconvenience.
This PR brings a proposed solution for the functionality described in #737. Fixes #737.
* Fix setup.py and reformat
* Bind benchmark
* Add benchmark option to Python
* Add Python examples for range, complexity, and thread
* Remove invalid multithreading in Python
* Bump Python bindings version to 0.2.0
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
* Bind Counter to Python
* Bind State methods to Python
* Bind state.counters to Python
* Import _benchmark.Counter
* Add Python example of state usage
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
* Create pylint.yml
* improve file matching
* fix some pylint issues
* run on PR and push (force on master only)
* more pylint fixes
* suppress noisy exit code and filter to fatals
* add conan as a dep so the module is importable
* fix lint error on unreachable branch
* Adds -lm linker flag for (Free|Open)BSD and uses github.com/bazelbuild/platforms for platform detection.
* Prefer selects.with_or to select the linkopts.
* @platforms appears to be implicitly available. @bazel_skylib would require updating every dependent repository.
* Re-enable platforms package.
Fixes #974. The `cxx_feature_check` now has an additional
optional argument which can be used to supply extra CMake flags
to pass to the `try_compile` command. The `CMAKE_CXX_STANDARD=14`
flag was determined to be the minimum flag necessary to correctly
compile and run the regex feature checks when compiling with Clang
under Windows (n.b. this does *not* refer to clang-cl, the frontend
to the MSVC compiler). The additional flag is not enabled for any
other compiler/platform tuple.