Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
458 ↗ | (On Diff #120682) | Would it be possible to use insert here instead, to avoid doing another lookup at line 463? |
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
458 ↗ | (On Diff #120682) | Yep, fixed. Thanks. |
This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
450 ↗ | (On Diff #120682) | Nit: It is probably better to change this lambda into a static function and move the body out to make the caller function body look cleaner. |
I observed the following performance improvements on AArch64 (Falkor):
spec2017/xz:train 1.00 spec2006/bzip2:lessnoise 7.24
As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
450 ↗ | (On Diff #120682) | Sure, will do. |
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
828 ↗ | (On Diff #120984) | Do you see any visible compile time impact of this? |
lib/Analysis/BranchProbabilityInfo.cpp | ||
---|---|---|
828 ↗ | (On Diff #120984) | I did not (see my previous comment). For the record, I looked at compile times compiling CTMark for aarch64. |
Great thanks!
As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.
Yeah that's what I though too. LGTM