The change to std::chrono::high_resolution_clock for cross-thread consistency has not been tested on all platforms, and the resulting slowdown still needs to be measured on each of them. We will probably use this change internally for a while before attempting to push it upstream.
This change allows a TimeProfiler profiling entry to be created
on one thread and finished on another. This was triggered by a use
case where we needed to trace the scheduling delay for work items
ultimately handled by independent worker threads.
The clock is changed to std::chrono::high_resolution_clock, both for
its increased precision (our work items can be in the microsecond range)
and for its cross-thread consistency (all threads see the same clock).
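
To illustrate the cross-thread use case, here is a minimal standalone sketch. It does not use the TimeProfiler API; `PendingEntry`, `beginEntry`, `finishEntry`, `schedulingDelay`, and `measureHandoff` are hypothetical names invented for this example. It only shows that a start time taken on one thread and an end time taken on another can be subtracted when both come from the same process-wide clock. Note that the standard does not guarantee that `std::chrono::high_resolution_clock` is steady; `Clock::is_steady` reports what a given platform provides.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a profiling entry; not the TimeProfiler type.
struct PendingEntry {
  using Clock = std::chrono::high_resolution_clock;
  std::string Name;
  Clock::time_point Start;
  Clock::time_point End;
  bool Finished = false;
};

// Called on the thread that enqueues the work item.
inline PendingEntry beginEntry(std::string Name) {
  PendingEntry E;
  E.Name = std::move(Name);
  E.Start = PendingEntry::Clock::now();
  return E;
}

// Called on whichever thread eventually handles the work item.
inline void finishEntry(PendingEntry &E) {
  E.End = PendingEntry::Clock::now();
  E.Finished = true;
}

// Elapsed time between the two calls, regardless of which threads made them.
inline std::chrono::microseconds schedulingDelay(const PendingEntry &E) {
  assert(E.Finished && "entry must be finished before reading its duration");
  return std::chrono::duration_cast<std::chrono::microseconds>(E.End - E.Start);
}

// Starts an entry on the calling thread and finishes it on a worker thread.
inline std::chrono::microseconds
measureHandoff(std::chrono::milliseconds Delay) {
  PendingEntry E = beginEntry("work-item");
  std::thread Worker([&E, Delay] {
    std::this_thread::sleep_for(Delay); // simulated queueing delay
    finishEntry(E);
  });
  Worker.join(); // join synchronizes, so reading E afterwards is safe
  return schedulingDelay(E);
}
```

In this sketch the entry is handed over by reference and the join provides the synchronization; a real work queue would need its own hand-off guarantees.
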
As a pleasant side effect, it is now possible to pay the cost of
constructing a profiling entry with a possibly expensive detail string
outside of the critical section we wish to trace. I think we could do
better by shifting the cost of constructing detail strings to the final
trace-writing phase, given that the API already supports providing detail
strings via a callback. However, I have not audited the code to check
whether detail closures could safely have their lifetimes extended so
dramatically.
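
The deferred-detail idea could look roughly like the sketch below. This is not the existing implementation; `DeferredEntry` and `DeferredTrace` are hypothetical types, and the sketch assumes the callback is stored as an owning `std::function` so that it outlives the call site. It also shows the lifetime concern: the closure must own everything it reads (captured by value here), because it runs long after the scope that created it has ended.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry that keeps the detail callback instead of its result.
struct DeferredEntry {
  std::string Name;
  std::function<std::string()> Detail; // owning copy of the closure
};

// Hypothetical trace buffer that builds detail strings only when written.
class DeferredTrace {
public:
  // Cheap at the call site: the callback is stored, not invoked.
  void add(std::string Name, std::function<std::string()> Detail) {
    assert(Detail && "detail callback must be callable");
    Entries.push_back({std::move(Name), std::move(Detail)});
  }

  // The possibly expensive detail strings are constructed here.
  std::vector<std::string> write() const {
    std::vector<std::string> Lines;
    for (const DeferredEntry &E : Entries)
      Lines.push_back(E.Name + ": " + E.Detail());
    return Lines;
  }

private:
  std::vector<DeferredEntry> Entries;
};
```

A closure that captured a local by reference would dangle by the time `write()` runs, which is the kind of problem an audit of existing callers would need to rule out.
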
I could not find unit tests; please let me know if I missed them.