Previously, the Start/Stop routines were called before the benchmark function
was called and after it returned. However, what we really want is for them
to be called within the core of the benchmark:
for (auto _ : state) {
// This is what we want traced, not the entire BM_foo function.
}
This API is akin to the MemoryManager API and lets tools provide
their own profiler which is wrapped in the same way MemoryManager is
wrapped. Namely, the profiler provides Start/Stop methods that are called
at the start/end of running the benchmark in a separate pass.
Co-authored-by: dominic <510002+dmah42@users.noreply.github.com>