* Add LTO builds on Windows+MSVC
Gates the MSVC switches behind an `@bazel_skylib:selects` statement.
This is a first experiment from best guesses and studying the MSVC docs.
* Fix misleading inline comment
* Reapply size optimization for clang, equivalent options for MSVC
Working towards cross-platform optimal nanobind building configurations.
* Add LTO back to non-Windows builds
The Windows case (the option name is "/GL") is more complicated, since
there, the compiler options also need to be passed to the linker if LTO
is enabled.
Since we are gating the linker options on platform at the moment instead
of compiler, we need to implement a Bazel boolean flag for the case
"Platform == MacOS && Compiler == AnyOf(gcc, clang)".
* Change nanobind linkage to response file approach
This change needs https://github.com/bazelbuild/bazel/pull/18952 to be
merged first. Fixes linkage of GBM's nanobind bindings on macOS by
supplying a linker response file instead of `-undefined dynamic_lookup`.
The latter has since been deprecated on macOS.
* Fix bazel_skylib checksum, bump skylib version in MODULE.bazel
* Bump Bazel to version 6.4.0 for linker response file support
* Add Python 3.12 support tag
* Bump nanobind to latest stable v1.6.2 tag
* Add PyPI trusted publishing to GitHub workflow, add Python 3.12 wheel builds
Trusted publishing has been available since v1.8.0 of the pypa-publish
action. It enables password-less authentication and wheel uploads from
the wheel upload job.
`cibuildwheel` was bumped to v2.16.2 to allow Python 3.12 wheel builds.
More info on trusted publishing:
https://github.com/marketplace/actions/pypi-publish#trusted-publishing
The Windows distribution was reverted to `latest` in the OS matrix,
since the discovery problem of MSVC was fixed in a Bazel patch release.
* Bump nanobind to stable v1.7.0 tag
We use assert() a lot in tests, and that can cause build breakages in some of the opt builds (since the assert() calls are removed there).
It's not practical to sprinkle "(void)" everywhere, so I think setting this warning option is the best option for now.
* Increase the kMaxIterations limit
This fixes #1663. Note that as a result of this change, the columns in the console output can become misaligned if the actual iteration count is too high. This will be dealt with in a separate commit.
* Fix failing test on Windows
* Fix formatting
---------
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>
* Make json and csv output consistent.
Currently, the --benchmark_format=csv option does not output the correct value for the cv statistics. Also, the json output should not contain a time unit for the cv statistics.
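For context on why a time unit does not fit the cv statistic: the coefficient of variation is the standard deviation divided by the mean, so the units cancel. A minimal sketch follows, assuming a sample standard deviation (n - 1 divisor); `CoefficientOfVariationSketch` is a hypothetical helper written for illustration, not the library's function.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, for illustration only.
// Returns stddev / mean, using the sample standard deviation (assumption).
double CoefficientOfVariationSketch(const std::vector<double>& v) {
  if (v.size() < 2) return 0.0;  // not enough samples for a deviation
  double sum = 0.0;
  for (double x : v) sum += x;
  const double mean = sum / static_cast<double>(v.size());
  if (mean == 0.0) return 0.0;  // avoid dividing by zero
  double squared_diffs = 0.0;
  for (double x : v) squared_diffs += (x - mean) * (x - mean);
  const double stddev =
      std::sqrt(squared_diffs / static_cast<double>(v.size() - 1));
  // Numerator and denominator carry the same unit, so the ratio has none.
  return stddev / mean;
}
```

Feeding the same measurements in seconds or in milliseconds gives the same ratio, which is why a unit column is not meaningful for this row.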
* fix formatting
* undo json change
---------
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>
There are three major compilers on Windows targeting the MSVC ABI (i.e.
linking with Microsoft's STL etc.):
- `MSVC`
- `clang-cl` aka clang with the MSVC-compatible CLI
- `clang++` aka clang with the gcc-compatible CLI
The cmake variable `MSVC` is only set for the first two, as it is defined in
terms of the command-line interface provided:
> Set to true when the compiler is some version of Microsoft Visual
> C++ or another compiler simulating the Visual C++ cl command-line syntax.
(from cmake docs)
For many of the tests in the library it's the ABI that matters, not the
command line, so check `CMAKE_CXX_SIMULATE_ID` too; if it is `MSVC`, the
current compiler is targeting the MSVC ABI. This handles `clang++`.
Previously, this could return the wrong result when there
was an even number of elements.
There were two `nth_element` calls. The second call could
change elements in `[center2, end)`, which was where
`center` pointed. Therefore, `*center` sometimes had the
wrong value after the second `nth_element` call.
Rewrite to use `max_element` instead of the second call to
`nth_element`. This avoids modifying the vector.
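The approach described above can be sketched as follows. This is a minimal illustration, not the library's actual code: `MedianSketch` is a hypothetical name, and taking the vector by value is an assumption made so the caller's data stays untouched.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, for illustration only.
double MedianSketch(std::vector<double> v) {
  if (v.empty()) return 0.0;
  auto center = v.begin() + v.size() / 2;
  // After this call *center is the upper middle element, and every element
  // in [begin, center) is less than or equal to it.
  std::nth_element(v.begin(), center, v.end());
  if (v.size() % 2 == 1) return *center;
  // Even count: the lower middle element is the largest value in
  // [begin, center). max_element only reads, so *center stays valid.
  auto center2 = std::max_element(v.begin(), center);
  return (*center + *center2) / 2.0;
}
```

Because `max_element` does not reorder anything, the value read through `center` cannot change between the two steps, which is the property the second `nth_element` call did not guarantee.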
* test: Use gtest_main only when needed
There are two types of tests. `*_gtest.cc` files use `gtest` and
`gtest_main`. `*_test.cc` files define their own main.
Only depend on `gtest`/`gtest_main` when needed. This is similar
to what `CMakeLists.txt` does.
* comment-only: gunit => gtest
* Fix typo
* perf_counters: Initialize once only when needed
This works around some performance problems running Android under QEMU.
Calling `pfm_initialize` was very slow, and was called during dynamic
initialization (before `main` or when loaded as a shared library).
This happened whenever benchmark was linked, even if no benchmarks
were run.
Instead, call `pfm_initialize` at most once, and only when one of:
1. `PerfCounters::Initialize` is called
2. `PerfCounters::Create` is called with a non-empty counter list
3. `PerfCounters::IsCounterSupported` is called
The return value of the first `pfm_initialize()` is saved and
returned from all subsequent `PerfCounters::Initialize` calls.
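A minimal sketch of the initialize-once pattern described above, assuming a function-local static is used to cache the first result. All names here are hypothetical stand-ins: `FakeSlowInitialize` takes the place of `pfm_initialize` so the sketch compiles without libpfm, and the `...Sketch` functions only mimic the shape of the `PerfCounters` entry points.

```cpp
#include <string>
#include <vector>

// Stand-in for pfm_initialize(); counts calls so the effect is observable.
static int g_init_calls = 0;
int FakeSlowInitialize() {
  ++g_init_calls;
  return 0;  // 0 stands for success in this sketch
}

// Runs the slow initializer at most once. A function-local static is
// initialized on the first call only, and that initialization is
// thread-safe since C++11. Later calls return the saved result.
bool InitOnceSketch() {
  static const bool success = (FakeSlowInitialize() == 0);
  return success;
}

// Explicit initialization always goes through the cached result.
bool InitializeSketch() { return InitOnceSketch(); }

// Creating with an empty counter list never triggers initialization.
bool CreateSketch(const std::vector<std::string>& counter_names) {
  if (counter_names.empty()) return true;
  return InitOnceSketch();
}
```

With this shape, nothing runs during dynamic initialization: a binary that links the code but never asks for counters never pays for the slow call.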
* perf_counters: Make success var const
* InitLibPfmOnce: Inline function
* State: Initialize counters with kAvgIteration in constructor
Previously, `counters` was updated in `PauseTiming()` with
`counters[name] += Counter(measurement, kAvgIteration)`.
The first `counters[name]` call inserts a counter with no flags.
There is no `operator+=` for `Counter`, so the insertion is done
by converting the `Counter` to a `double`, then constructing a
`Counter` to insert from the `double`, which drops the flags.
Pre-insert the `Counter` with the correct flags, then only
update `Counter::value`.
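The flag loss can be reproduced with a small stand-in type. This is a sketch under assumptions: `CounterSketch` is a simplified, hypothetical imitation of `Counter` (a value, a flag, and implicit conversions to `double`), and the helper functions are invented for illustration.

```cpp
#include <map>
#include <string>

// Simplified, hypothetical stand-in for the library's Counter.
struct CounterSketch {
  enum Flags { kDefaults = 0, kAvgIteration = 1 };
  double value;
  Flags flags;
  CounterSketch(double v = 0.0, Flags f = kDefaults) : value(v), flags(f) {}
  // Implicit conversions: these are what make `+=` compile at all.
  operator double&() { return value; }
  operator const double&() const { return value; }
};

using CounterMap = std::map<std::string, CounterSketch>;

// Old pattern: operator[] inserts a default entry (no flags), and `+=`
// works on the converted doubles, so the temporary's flags never reach
// the map entry.
void AccumulateLossy(CounterMap& counters, const std::string& name,
                     double measurement) {
  counters[name] += CounterSketch(measurement, CounterSketch::kAvgIteration);
}

// Fixed pattern, step 1: insert the entry once with the right flags.
void PreInsert(CounterMap& counters, const std::string& name) {
  counters.emplace(name, CounterSketch(0.0, CounterSketch::kAvgIteration));
}

// Fixed pattern, step 2: only touch the value afterwards.
void AccumulateValue(CounterMap& counters, const std::string& name,
                     double measurement) {
  counters[name].value += measurement;
}
```

In the lossy variant the accumulated value is correct but the entry keeps the default flags; in the pre-insert variant both the value and the flag survive repeated updates.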
Introduced in 1c64a36 ([perf-counters] Fix pause/resume (#1643)).
* perf_counters_test.cc: Don't divide by iterations
Perf counters are now divided by iterations, so dividing again
in the test is wrong.
* State: Fix shadowed param error
* benchmark.cc: Fix clang-tidy error
---------
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>
* perf_counters_gtest: Make test pass on Android
Tested on Pixel 3 and Pixel 6. Reduce test to the intersection of
what passes on all platforms.
Pixel 6 doesn't support BRANCHES, and only supports two perf
counters.
---------
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>
Change condition for `benchmarks_with_threads` from `benchmark.threads() > 0` to `> 1`. `threads()` appears to always be `>= 1`.
Introduced in fbc6efa (Refactoring of PerfCounters infrastructure (#1559))
* [perf-counters] Fix pause/resume
Using `state.PauseTiming() / state.ResumeTiming()` was broken.
Thanks [@virajbshah] for the repro test case.
* ran clang-format over the whole perf_counters_test.cc
* Remove check that perf counters are 0 on `Pause`, since `Pause`/`Resume`
sequences would cause a non-0 counter value
* both upper and lower bound for the with/without resume counters
---------
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>
The Windows toolchain detection fix made it into Bazel 6.3.0, so the CI
should work again with the re-enabled `windows-latest` marker.
Require Bazel 6.3.0 in the Linux container setup in `cibuildwheel`.
The dependencies have been listed in the `pyproject.toml` ever since it was added.
Switches to header and source file globbing instead of manually listing
the files. The selects for different platforms are removed; as a tradeoff,
we take a single- to low double-digit percentage hit in wheel sizes (between 5%
zipped and 12% installed on macOS 13.4).
The newly created `pyproject.toml` contains all static metadata as well
as the readme and version as dynamic arguments, to be read by setuptools
during the build.
What is left in the `setup.py` for now is the custom Bazel extension
class, since that is not properly supported yet.
* Add pfm CI actions for bazel
* Fix problems in unit test.
* Undo enabling the CI tests for pfm - GitHub CI machines seemingly do not support performance counters.
* Remove commented code - can be revisited in github history when needed, and there's a comment explaining the rationale behind the new test code.
---------
Co-authored-by: Andy Christiansen <achristiansen@google.com>
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>