This is an archive of the discontinued LLVM Phabricator instance.

[BranchProbabilityInfo] Handle irreducible loops.
ClosedPublic

Authored by gberry on Oct 27 2017, 1:06 PM.

Details

Summary

Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.

Diff Detail

Repository
rL LLVM

Event Timeline

gberry created this revision.Oct 27 2017, 1:06 PM
fhahn added a subscriber: fhahn.Oct 27 2017, 2:12 PM
fhahn added inline comments.Oct 27 2017, 3:12 PM
lib/Analysis/BranchProbabilityInfo.cpp
458 ↗(On Diff #120682)

Would it be possible to use insert here instead, to avoid doing another lookup at line 463?

gberry marked an inline comment as done.Oct 27 2017, 4:45 PM
gberry added inline comments.
lib/Analysis/BranchProbabilityInfo.cpp
458 ↗(On Diff #120682)

Yep, fixed. Thanks.

fhahn added a comment.Oct 30 2017, 3:58 AM

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

fhahn added a reviewer: fhahn.Oct 30 2017, 4:04 AM
davidxl added inline comments.Oct 30 2017, 10:45 AM
lib/Analysis/BranchProbabilityInfo.cpp
450 ↗(On Diff #120682)

Nit: It is probably better to change this lambda into a static function and move the body out to make the caller function body look cleaner.

gberry marked an inline comment as done.Oct 30 2017, 11:08 AM

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):

spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

lib/Analysis/BranchProbabilityInfo.cpp
450 ↗(On Diff #120682)

Sure, will do.

gberry updated this revision to Diff 120984.Oct 31 2017, 8:14 AM

Address Florian and David's comments.

davidxl added inline comments.Oct 31 2017, 9:51 AM
lib/Analysis/BranchProbabilityInfo.cpp
828 ↗(On Diff #120984)

Do you see any visible compile time impact of this?

gberry added inline comments.Oct 31 2017, 11:19 AM
lib/Analysis/BranchProbabilityInfo.cpp
828 ↗(On Diff #120984)

I did not (see my previous comment). For the record, I looked at compile times compiling CTMark for aarch64.

This revision is now accepted and ready to land.Oct 31 2017, 11:24 AM
fhahn accepted this revision.Oct 31 2017, 11:48 AM

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):

spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

Great thanks!

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

Yeah that's what I though too. LGTM

This revision was automatically updated to reflect the committed changes.