Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
davidxl: Do you think you'll have time to look at this this week? I was hoping it could be made part of llvm 9-rc2.
See Some minor name suggestions, also
- the new test cases added in the previous patch should use --force-precise-rotation-cost option to avoid losing test coverage;
- the unrelated tests should restore to the exactly the same behavior as before.
lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
1960 ↗ | (On Diff #214011) | Change PlainMode to 'hasStaticProfileOnly' |
2075 ↗ | (On Diff #214011) | findBestLoopTopPlain --> findBestLoopTopNoProfile |
@Carrot this seems to be reverted, but aggressive loop rotate still has mostly negative affect on performance in internal benchmarks that I've run.
Could you please elaborate on how branches taken was reduced from 2 to 1 (and how it was calculated) for the test case you mentioned in the https://reviews.llvm.org/D43256#1534855
entry | V -->for.body | |\ | | \ | | \ | | if.then3 | | / | | / | |/ ---for.inc | V for.cond.cleanup
One of my benchmarks is very similar, however the aggressive transformation only introduces additional unconditional branches.
It seems that part of the reason for this is that the branches that the transformation was supposed to make fallthrough (i.e. for.inc -> for.body) are conditional and couldn't be eliminated even if the block is reordered, so previously fallthrough became unconditional branches, while no other branch was eliminated.
How is this transformation supposed to handle this?