The current code checks for CPU features twice, unnecessarily, and makes no use of the infrastructure in xray_tsc.h. At the cost of going back to the old API, unify FDR's TSC handling with the rest of the code base and remove two arguments from processFunctionHook, which D32844 will subsequently use for the first function call argument.
Whether we should go from (return value + reference) to std::tuple<> is a separate matter (for a separate review).
The functional change here is very subtle: it will use TSC emulation calls everywhere, guarded by a static bool checked as unlikely, in the same manner in both naïve and FDR modes.
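For readers unfamiliar with the pattern, here is a minimal sketch of a TSC read guarded by an unlikely static bool. It is not the code from this patch or from xray_tsc.h: every name in it (probeHardwareTSC, readHardwareTSC, emulateTSC, readTSCUnified) is hypothetical, both clock sources are stood in for by std::chrono::steady_clock, and __builtin_expect assumes GCC or Clang. It keeps the (return value + reference) shape mentioned above.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical probe: a real one would query CPU features once.
// Hard-coded to false here so the emulation path is exercised.
static bool probeHardwareTSC() { return false; }

// Hypothetical stand-in for a hardware counter read (e.g. rdtscp on x86).
static uint64_t readHardwareTSC(uint8_t &CPU) {
  CPU = 0;
  auto Now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<uint64_t>(Now.count());
}

// Hypothetical emulation: derive a monotonic tick count from the clock.
static uint64_t emulateTSC(uint8_t &CPU) {
  CPU = 0; // an emulated read has no meaningful CPU id
  auto Now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(Now).count());
}

// Single entry point shared by all modes: the feature check runs once,
// and the emulation branch is hinted as unlikely.
uint64_t readTSCUnified(uint8_t &CPU) {
  static const bool NeedsEmulation = !probeHardwareTSC();
  if (__builtin_expect(NeedsEmulation, 0))
    return emulateTSC(CPU);
  return readHardwareTSC(CPU);
}
```

Because the static bool is initialized on first use, callers do not need to pass the feature-check result down as extra arguments.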
Why do we need this forward declaration?