This is an archive of the discontinued LLVM Phabricator instance.

Skip debuginfo intrinsic in markLiveBlocks.
ClosedPublic

Authored by trentxintong on Jul 13 2018, 9:38 PM.

Details

Summary

The optimizer is 10%+ slower with vs without debuginfo. I started checking where
the difference is coming from.

I compiled sqlite3.c with and without debug info from CTMark and compare the time difference.

I use Xcode Instrument to find where time is spent. This brings about 20ms, out of ~20s.

Diff Detail

Repository
rL LLVM

Event Timeline

trentxintong created this revision.Jul 13 2018, 9:38 PM

There's nothing special about debug intrinsics here except that there are a lot of them. The problem, as far as I can tell, is that we're repeatedly using dyn_cast on each instruction and doing multiple redundant tests. Adding yet another redundant test will help having a lot of debug intrinsics makes things incrementally more expensive for all other kinds of intrinsics. Thus, this doesn't seem like the right way to fix this. Instead of testing for IntrinsicInst, and then CallInst (which is always true whenever the IntrinsicInst is true), and then we always test for StoreInst (but we don't use an 'else' so we always do this test even when the IntrinsicInst/CallInst is true (which will include debug intrinsics)).

First, I'd try adding an 'else' when testing for a store. Also, get rid of the IntrinsicInst test, getIntrinsicID is a member of Function, so you can do dyn_cast Callee to Function, and then check for the intrinsic IDs.

Hi @hfinkel.

Thank you for taking a look at this code. I agree we should definitely "else" the StoreInst check and I have observed improvements in Xcode Instrument for doing so.

And I should also move the DbgInfoIntrinsic check inside the IntrinsicInst check by comparing IntrinsicID.

The problem with moving the IntrinsicInst checks inside the dyn_cast<CallInst> is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check. If we have a lot of intrinsic which we do when compiling with debuginfo, we could end up being overall slower than keeping the IntrinsicInst check.

When we compile without debuginfo, moving the intrinsic check inside the CallInst check will not necessarily bring us performance, as now every CallInst needs to checks for the existing cases in the if (dyn_cast<CallInst>)) and the Intrinsics (which require dyn_cast<Function>(CalledValue)).

Right now I am seeing the version which we keep the IntrinsicInst check and adding the DbgInfoIntrinsic inside it to be the fastest on sqlite3.c with debuginfo.

Address some of @hfinkel suggestions.

This brings ~25ms now out of ~20s spent to compile sqlite3.c.

Hi @hfinkel.

Thank you for taking a look at this code. I agree we should definitely "else" the StoreInst check and I have observed improvements in Xcode Instrument for doing so.

And I should also move the DbgInfoIntrinsic check inside the IntrinsicInst check by comparing IntrinsicID.

The problem with moving the IntrinsicInst checks inside the dyn_cast<CallInst> is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check. If we have a lot of intrinsic which we do when compiling with debuginfo, we could end up being overall slower than keeping the IntrinsicInst check.

When we compile without debuginfo, moving the intrinsic check inside the CallInst check will not necessarily bring us performance, as now every CallInst needs to checks for the existing cases in the if (dyn_cast<CallInst>)) and the Intrinsics (which require dyn_cast<Function>(CalledValue)). 4

I'm not sure that I understand what you're saying. "is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check." - this is exactly what the IntrinsicInst dyn_cast does internally. When we cast to IntrinsicInst we call the following in IntrinsicInst:

static bool classof(const CallInst *I) {
  if (const Function *CF = I->getCalledFunction())
    return CF->isIntrinsic();
  return false;
}
static bool classof(const Value *V) {
  return isa<CallInst>(V) && classof(cast<CallInst>(V));
}

so we pay for the isa<CallInst>, which should be the same cost as the dyn_cast<CallInst>, and then call I->getCalledFunction(), which is implemented as a dyn_cast<Function> of getCalledValue(). When we do isa<DbgInfoIntrinsic>, we do all of that, and then also call getIntrinsicID and compare to one of the IDs for the debug info intrinsics. Thus, to be clear, what I'm saying is that if we're worried about duplicated work here, we shouldn't be using IntrinsicInst at all.

Right now I am seeing the version which we keep the IntrinsicInst check and adding the DbgInfoIntrinsic inside it to be the fastest on sqlite3.c with debuginfo.

@hfinkel I get what you mean. Basically moving the IntrinsicInst check inside the call does not result us doing extra work when processing IntrinsicInst as we essentially do the same things in the dyn_cast<IntrinsicInst>. I will update the diff accordingly.

Update diff addressing all of @hfinkel comments.

This brings about 20ms - 25ms out of ~20s when compile sqlite3.c

hfinkel added inline comments.Jul 15 2018, 7:37 PM
lib/Transforms/Utils/Local.cpp
2025 ↗(On Diff #155581)

You shouldn't need to check for the debugging intrinsics at all. We're skipping all intrinsics except for assume and experimental_guard.

2043 ↗(On Diff #155581)

This can be else if.

2063 ↗(On Diff #155581)

This can be else if too (if Callee is a Function, then it's not a null pointer constant.

Address comments. Thanks @hfinkel

hfinkel accepted this revision.Jul 16 2018, 3:58 PM

LGTM

This revision is now accepted and ready to land.Jul 16 2018, 3:58 PM
This revision was automatically updated to reflect the committed changes.