This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC)
ClosedPublic

Authored by tejohnson on Mar 5 2019, 1:20 PM.

Details

Summary

In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.

We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.

I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.

From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Mar 5 2019, 1:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2019, 1:20 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
spatel accepted this revision.Mar 5 2019, 1:36 PM

LGTM
Sorry I didn't see this problem, and thanks for fixing it!
If you have any perf data that you can share, it would be good to include that here or in the commit message. I'm curious if this change recovers all of the compile-time perf or if walking the icmp users is also expensive (although I'm not sure how to make that any faster).

This revision is now accepted and ready to land.Mar 5 2019, 1:36 PM

LGTM
Sorry I didn't see this problem, and thanks for fixing it!
If you have any perf data that you can share, it would be good to include that here or in the commit message. I'm curious if this change recovers all of the compile-time perf or if walking the icmp users is also expensive (although I'm not sure how to make that any faster).

I don't have data on how much the time was before your patches went in, but I do have some performance data before and after this patch. Will add it in the commit message.

tejohnson edited the summary of this revision. (Show Details)Mar 6 2019, 6:28 AM
This revision was automatically updated to reflect the committed changes.

Hi, do we know whether this patch fully mitigated the issue or if there are still opportunities to improve compile time? I'm using clang trunk from Wednesday March 20th, and I cannot finish the compilation of a large project with ThinLTO and autofdo profile (compilation time was reasonable before, but ballooned to over 10 hours). When I run perf top to see what is happening, I see that the compiler is most of the time in code dealing with dominator trees or in CodeGenPrepare::runOnFunction. I don't remember seeing this issue before (Clang from February). A snapshot of hottest functions while waiting for the never finishing linking:

 1.43%  [.] (anonymous namespace)::CodeGenPrepare::runOnFunction
 1.38%  [.] llvm::TargetLibraryInfoImpl::getLibFunc
  1.19%  [.] llvm::BasicBlock::getTerminator
 1.13%  [.] llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<l
 0.94%  [.] (anonymous namespace)::CodeGenPrepare::optimizeInst
 0.81%  [.] llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::attachNewSubtree
 0.65%  [.] (anonymous namespace)::CodeGenPrepare::optimizeCallInst
0.54%  [.] llvm::Instruction::getSuccessor
 0.51%  [.] llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runDFS<false, bool (*)(llvm::BasicBlock*, llvm::BasicBlock*)>

Just thought about mentioning here in case anyone has ideas, but I'm still figuring out the root cause.