This is an archive of the discontinued LLVM Phabricator instance.

[TailCallElim] Preserve DT and PDT
ClosedPublic

Authored by NutshellySima on Jul 30 2018, 6:09 AM.

Details

Summary

Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function.
For example,

CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
  return PreservedAnalyses::all();

After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call).

When optimizing SQLite with -passes="default<O3>" pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch.

Statistics:

Before the patch:
 23010 dom-tree-stats               - Number of DomTree recalculations
489264 dom-tree-stats               - Number of nodes visited by DFS -- DomTree
After the patch:
 21581 dom-tree-stats               - Number of DomTree recalculations
475088 dom-tree-stats               - Number of nodes visited by DFS -- DomTree

Diff Detail

Repository
rL LLVM

Event Timeline

NutshellySima created this revision.Jul 30 2018, 6:09 AM

Use Lazy Strategy instead of Eager to eliminate unnecessary recalculation like

DomTreeUpdater DTU(...);
// EntryBB changed
DTU.recalculate(F);
DTU.insertEdge(...)/deleteEdge(...);
DTU.flush(); // We can delay the recalculation to here if we use the Lazy UpdateStrategy.
kuhar added inline comments.Aug 2 2018, 8:51 AM
lib/Transforms/Scalar/TailRecursionElimination.cpp
596 ↗(On Diff #158695)

Can you add a comment explaining why we have to recalculate here?

826 ↗(On Diff #158695)

Why is Eager better than Lazy here?

857 ↗(On Diff #158695)

Same as above.

brzycki added inline comments.Aug 2 2018, 10:45 AM
lib/Transforms/Scalar/TailRecursionElimination.cpp
596 ↗(On Diff #158695)

+1. recalculate is very expensive and it should become practice for anyone using it to have to describe why a caller has to fall back on this can't rely on the normal API.

NutshellySima added inline comments.Aug 2 2018, 10:55 AM
lib/Transforms/Scalar/TailRecursionElimination.cpp
826 ↗(On Diff #158695)

I ran opt using the newPM O3 pipeline using both strategies on three bitcode files (clang-preopt/clang-opt/SQLite), and observed a performance difference of approximately 0.1%, which can be considered as noise.

There is actually no strong preference of using either one because both work fine from my test. (After applying D50173).
So I'd better left it not commented.

857 ↗(On Diff #158695)

Same as above.

Add the comment to point out the reason why we recalculate the DominatorTree.

NutshellySima marked 2 inline comments as done.Aug 2 2018, 11:17 AM
kuhar added inline comments.Aug 2 2018, 11:52 AM
lib/Transforms/Scalar/TailRecursionElimination.cpp
826 ↗(On Diff #158695)

I'd rather you left a comment here saying there's no difference for the tests we ran and we can later switch to Lazy in the future, if that proves to be profitable.

Address comments

NutshellySima marked 5 inline comments as done.

Amend comments.

kuhar accepted this revision.Aug 2 2018, 12:49 PM

Thanks for the changes!

This revision is now accepted and ready to land.Aug 2 2018, 12:49 PM
brzycki accepted this revision.Aug 2 2018, 1:14 PM

LGTM.

This revision was automatically updated to reflect the committed changes.