This is an archive of the discontinued LLVM Phabricator instance.

[XRay] [compiler-rt] unify all TSC API calls
Needs ReviewPublic

Authored by pelikan on Jun 19 2017, 8:42 PM.

Details

Reviewers
dberris
Summary

The current code unnecessarily checks for CPU features twice and doesn't take advantage of the infrastructure in xray_tsc.h at all. At the cost of going back to the old API, unify FDR's TSC handling with the rest of the code base and get rid of two arguments to processFunctionHook which will be subsequently used in D32844 for the first function call argument.

Should we decide to go from (return value + reference) to std::tuple<> is a separate matter (for a separate review).

The functional change here is very subtle - it will use TSC emulation calls everywhere, based on an unlikely static bool, in the same manner in both naïve and FDR modes.

Event Timeline

pelikan created this revision.Jun 19 2017, 8:42 PM
pelikan planned changes to this revision.Jun 19 2017, 8:54 PM

Actually, this is not NFC. The xray_tsc.h infrastructure doesn't check for old i386 CPUs and VMs where RDTSC is "not available" (because it's masked away, trapped and not handled properly). We should move this support into xray_tsc.h first, and remove the NFC tag.

pelikan updated this revision to Diff 103152.Jun 19 2017, 10:23 PM

convert the entire amd64 TSC handling to do the correct thing in broken environments

pelikan updated this revision to Diff 103153.Jun 19 2017, 10:27 PM

add a newline after clock_gettime(2)'s failure Report() string

pelikan retitled this revision from [XRay] [compiler-rt] switch FDR implementation to the TSC API [NFC] to [XRay] [compiler-rt] switch FDR implementation to the TSC API.Jun 19 2017, 10:35 PM
pelikan edited the summary of this revision. (Show Details)
pelikan updated this revision to Diff 103155.Jun 19 2017, 11:00 PM

unify inmemory log as well

pelikan edited the summary of this revision. (Show Details)Jun 19 2017, 11:01 PM
pelikan retitled this revision from [XRay] [compiler-rt] switch FDR implementation to the TSC API to [XRay] [compiler-rt] unify all TSC API calls.
dberris edited edge metadata.Jun 21 2017, 12:24 AM
dberris added a subscriber: eizan.

What are we trying to achieve here? It looks like it's sinking the decision on which implementation to invoke from the caller, down to the callee (readTSC) which gets less efficient.

While we don't have benchmarks yet upstream that measure the cost of these things, I'd really want to keep code in the readTSC(...) implementation to not have branches, and move branches out to the callers. It's unclear whether the benefit of reducing the arguments in processFunctionHook is worth it, if we're going to add more later anyway.

@eizan is working on getting some benchmarks upstream, but it would be good to make sure we're not unnecessarily being pessimistic with the implementation of key parts of the runtime.

lib/xray/xray_x86_64.inc
22

Why do we need this forward declaration?

26–28

I think we're going to have issues with this code in terms of performance impact. This is introducing a function call and synchronization on the first call, then in subsequent calls will introduce an unavoidable branch.

Is there anything else we need to do here?