* Stop generating the export header and just check it in
* format the new header
* support windows
* format the header again
* avoid depending on internal macro
* ensure we define the right thing for windows static builds
* support older cmake
* and for tests
* attempt to fix sanitizer builds by moving away from llvm head
* extra verbosity
* try clang 13 and add extra logging
* get latest clang and try again
Non-const DoNotOptimize() can't compile when used with some types.
Example of code which can't compile:
char buffer3[3] = "";
benchmark::DoNotOptimize(buffer3);
Error message:
error: impossible constraint in 'asm'
asm volatile("" : "+r"(value) : : "memory");
Introduced in 8545dfb (Fix DoNotOptimize() GCC copy overhead (#1340) (#1410))
The cause is compiler can't work with the +r constraint for types that can't
be placed perfectly in registers. For example, char array[3] can't be perfectly
fit in register on x86_64 so it requires placed in memory but constraint
doesn't allow that.
Solution
- Use +m,r constraint for the small objects so the compiler can decide to use
register or/and memory
- For the big objects +m constraint is used which allows avoiding extra copy
bug(see #1340)
- The same approach is used for the const version of DoNotOptimize()
although the const version works fine with the "r" constraint only.
Using mixed r,m constraint looks more general solution.
See
- Issue #1340 ([BUG] DoNotOptimize() adds overhead with extra copy of argument(gcc))
- Pull request #1410 (Fix DoNotOptimize() GCC copy overhead (#1340) #1410)
- Commit 8545dfb (Fix DoNotOptimize() GCC copy overhead (#1340) (#1410))
* Fix DoNotOptimize() GCC copy overhead (#1340)
The issue is that GCC DoNotOptimize() does a full copy of an argument
if it's not a pointer and it slows down a benchmark. If an argument is big
enough there is a memcpy() call for copying the argument. An argument
object can be a big object so DoNotOptimize() could add sufficient
overhead and affects benchmark results.
The cause is in GCC behavior with asm volatile constraints. Looks like GCC
trying to use r(register) constraint for all cases despite object size.
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105519
The solution is the split DoNotOptimize() in two cases - value fits
in register and value doesn't fit in register. And use case specific
asm constraint. std::is_trivially_copyable trait is needed because
"+r" constraint doesn't work with non trivial copyable objects.
- Fix requires support C++11 feature std::is_trivially_copyable from GCC
compiler. The feature has been supported since GCC 5
- Fallback for GCC version < 5 still exists but it uses "m" constraint
which means a little bit more overhead in some cases
- Add assembly tests for issued cases
Fixes#1340
* Add supported compiler versions info for assembly tests
- Assembly tests are inherently non-portable. So explicitly add GCC
and Clang versions required for reliable tests passed
- Write a warning message if the current compiler version isn't supported
* Add possibility to ask for libbenchmark version number (#1004)
Add a header which holds the current major, minor, and
patch number of the library. The header is auto generated
by CMake.
* Do not generate unused functions (#1004)
* Add support for version number in bazel (#1004)
* Fix clang format #1004
* Fix more clang format problems (#1004)
* Use git version feature of cmake to determine current lib version
* Rename version_config header to version
* Bake git version into bazel build
* Use same input config header as in cmake for version.h
* Adapt the releasing.md to include versioning in bazel
* add multiple OSes to bazel workflow
* correct indent
* only set copts when they're supported by the OS
* os check should work
* pull out cxx03_test for per-platform stuff
* attempt to fix windows test output
Report all time numbers > 10 digits in scientific notation with
4 decimal places. This is necessary since only 10 digits
are currently reserved for the time columns (Time and CPU).
If exceeding 10 digits the output isnt properly aligned anymore.
* Introduce warmup phase to BenchmarkRunner (#1130)
In order to account for caching effects in user
benchmarks introduce a new command line option
"--benchmark_min_warmup_time"
which allows to specify an amount of time for
which the benchmark should be run before results
are meaningful.
* Adapt review suggestions regarding introduction of warmup phase (#1130)
* Fix BM_CHECK call in MinWarmUpTime (#1130)
* Fix comment on requirements of MinWarmUpTime (#1130)
* Add basic description of warmup phase mechanism to user guide (#1130)
* Add option to get the verbosity provided by commandline flag -v (#1330)
* replace assert with test failure
asserts are stripped out in non debug builds, and we run tests in non-debug CI bots.
* clang-format my own tweak
Co-authored-by: Dominic Hamon <dominichamon@users.noreply.github.com>
This commit adds a small section on how to install and build Python
bindings wheels to the docs, as well as a link to it from the main readme.
Notes were added that clearly state availability of Python wheels based
on Python version and OS/architecture combinations.
For the guide to build a wheel from source, the best practice of
creating a virtual environment and activating it before build was
detailed. Also, a note on the required installation of Bazel was added,
with a link to the official docs on installation.
* Filter out benchmarks that start with "DISABLED_"
This could be slightly more elegant, in that the registration and the
benchmark definition names have to change. Ideally, we'd still register
without the DISABLED_ prefix and it would all "just work".
Fixes#1365
* add some documentation
Previously, with the unrolled job matrix, all jobs had to be listed individually in the `needs` section of the PyPI upload job. But as the wheel build job was reimplemented as a job matrix now, with a
single build job name `build_wheels`, we need to adjust the name in the PyPI upload job as well here to avoid errors.
This commit adds a `bazel shutdown` command to the setuptools BazelExtension. This has the effect that wheel builds shut down the Bazel server and terminate gracefully after the build, something
that was previously an issue on Windows builds.
Since the windows-specific `--no-clean` flag option to `pip wheel` becomes unnecessary due to this change, this change has the side-effect that GitHub Actions wheel builds via `cibuildwheel` can now
be written as a compact job matrix again, which leads to a lot of deduplicated code in the corresponding workflow file.
Lastly, some GitHub-provided actions (checkout, setup-python, upload/download-artifact) were bumped to the latest v3 version.
If someone or something ever needs the dynamic library as a Bazel build
artifact, we can figure that out for them then, but right now, there is
no strong reason to be wrangling various `export.h`-controlling macros.
Fixes#1372.
This commit fixes the previous breakage in Python wheel builds for Windows by adding a `local_defines` field to the `cc_binary` generated in the process of the Python bindings builds. This define is being
picked up by the auto-generated export header `benchmark_export.h`, unsetting the benchmark export macro.
Furthermore, the `linkshared` and `linkstatic` attributes are passed booleans now instead of ints, making the command more directly interpretable to the human reader.
The fix was suggested by @junyer in the corresponding GitHub issue thread https://github.com/google/benchmark/issues/1367 - thank you for the suggestion!
This commit adds a job running after the wheel building job responsible for uploading the built wheels to PyPI.
The job only runs on successful completion of all build jobs, and uploads to PyPI using a secret added to the Google Benchmark repo (TBD).
Also, the setup-python action has been bumped to the latest version v3.
* Add SetBenchmarkFilter() to set --benchmark_filter flag value in user code.
Use case: Provide an API to set this flag indepedence of the flag's implementation (ie., absl flag vs benchmark's flag facility)
* add test
* added notes on Initialize()