This is an archive of the discontinued LLVM Phabricator instance.

[MBP] Disable aggressive loop rotate in plain mode
ClosedPublic

Authored by Carrot on Aug 2 2019, 1:27 PM.

Details

Summary

Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.

Diff Detail

Repository
rL LLVM

Event Timeline

Carrot created this revision.Aug 2 2019, 1:27 PM
hans added a comment.Aug 7 2019, 1:28 AM

davidxl: Do you think you'll have time to look at this this week? I was hoping it could be made part of llvm 9-rc2.

davidxl added inline comments.Aug 7 2019, 3:55 AM
lib/CodeGen/MachineBlockPlacement.cpp
2054 ↗(On Diff #213116)

Can this be merged with findBestLoopTop with some refactoring?

2545 ↗(On Diff #213116)

Does this change also turn on PreciseRotationCost by default? I don't think this is intended.

Carrot updated this revision to Diff 214011.Aug 7 2019, 2:59 PM
Carrot marked 2 inline comments as done.
davidxl requested changes to this revision.Aug 7 2019, 8:52 PM

See Some minor name suggestions, also

  1. the new test cases added in the previous patch should use --force-precise-rotation-cost option to avoid losing test coverage;
  2. the unrelated tests should restore to the exactly the same behavior as before.
lib/CodeGen/MachineBlockPlacement.cpp
1960 ↗(On Diff #214011)

Change PlainMode to 'hasStaticProfileOnly'

2075 ↗(On Diff #214011)

findBestLoopTopPlain --> findBestLoopTopNoProfile

This revision now requires changes to proceed.Aug 7 2019, 8:52 PM
Carrot updated this revision to Diff 214180.Aug 8 2019, 10:19 AM
Carrot marked 2 inline comments as done.
davidxl accepted this revision.Aug 8 2019, 10:33 AM

lgtm

This revision is now accepted and ready to land.Aug 8 2019, 10:33 AM
This revision was automatically updated to reflect the committed changes.

@Carrot this seems to be reverted, but aggressive loop rotate still has mostly negative affect on performance in internal benchmarks that I've run.
Could you please elaborate on how branches taken was reduced from 2 to 1 (and how it was calculated) for the test case you mentioned in the https://reviews.llvm.org/D43256#1534855

    entry
      |
      V
-->for.body
|     |\
|     | \
|     |  \
|     | if.then3
|     |  /
|     | /
|     |/
---for.inc
      |
      V
for.cond.cleanup

One of my benchmarks is very similar, however the aggressive transformation only introduces additional unconditional branches.
It seems that part of the reason for this is that the branches that the transformation was supposed to make fallthrough (i.e. for.inc -> for.body) are conditional and couldn't be eliminated even if the block is reordered, so previously fallthrough became unconditional branches, while no other branch was eliminated.

How is this transformation supposed to handle this?

llvm/trunk/test/CodeGen/X86/loop-blocks.ll