This is an archive of the discontinued LLVM Phabricator instance.

[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
ClosedPublic

Authored by anna on Apr 27 2022, 8:11 AM.

Details

Summary

Similar to c515b2f39e77, If there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI). Reordered the analyses requests and early return
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Diff Detail

Event Timeline

anna created this revision.Apr 27 2022, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 8:11 AM
anna requested review of this revision.Apr 27 2022, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 8:11 AM
anna added a comment.Apr 27 2022, 8:14 AM

When tested on a large method from our internal workload, I see about 12% reduction in BranchFrequencyAnalysis (other analyses time also dropped to a lower degree). Overall opt time also dropped about 4%.

nikic added inline comments.Apr 27 2022, 12:46 PM
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
663

This doesn't really make sense for the legacy pass manager, because it will always calculate the analysis, independently of whether it is actually accessed. I'd suggest leaving the LegacyPM implementation alone.

713

I would keep the DT fetch below the LI.empty() check as well. Yes, it's not going to matter for compile-time because LI computes DT anyway, but there's also no need to specially highlight DT here. Same below.

anna added inline comments.Apr 28 2022, 7:01 AM
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
663

okay agreed. I didn't bother with legacy PM for the other passes, so will remove from this one as well.

713

Hmm. I actually kept it above to show exactly that DT will need to be computed as part of LI computation. Otherwise the comment at line 715 seems misleading "Return before computing other expensive analyses."

Although one can say that if the other analyses were already cached, we won't be computing them anyway here...
Maybe change it to AM.getCachedResult<DominatorTreeAnalysis>(F) if the check if moved after the LI.empty() check?

anna updated this revision to Diff 425897.Apr 28 2022, 1:54 PM

addressed review comments.

This revision is now accepted and ready to land.Apr 28 2022, 2:37 PM
This revision was landed with ongoing or failed builds.Apr 29 2022, 7:00 AM
This revision was automatically updated to reflect the committed changes.