This is an archive of the discontinued LLVM Phabricator instance.

[LoopFusion] Add ability to fuse guarded loops
ClosedPublic

Authored by kbarton on Jul 30 2019, 11:37 AM.

Details

Summary

This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.

Event Timeline

kbarton created this revision.Jul 30 2019, 11:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2019, 11:37 AM
Whitney added inline comments.Jul 30 2019, 12:08 PM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
741

reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,NonIdenticalGuards);

761

reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1, NonEmptyExitBlock);

768

reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1, NonEmptyGuardBlock);

1057

what if FC0 guard first successor is FC0 Loop, but FC1 guard second successor is FC1 Loop?

1089

Do we need to assert FC.ExitBlock != nullptr?

kbarton marked 6 inline comments as done.Jul 30 2019, 12:29 PM
kbarton added inline comments.
llvm/lib/Transforms/Scalar/LoopFuse.cpp
1057

Then this check will fail.
This isn't meant to be exhaustive, but a first pass. If we run into situations where the guards are logically equivalent, and not identical, then we can extend this at that point.

1089

This should never be null, otherwise the FusionCandidate is not valid. However, it doesn't hurt to check just to be sure.
I've added a similar check to isEmptyPreheader above as well.

kbarton updated this revision to Diff 212411.Jul 30 2019, 12:30 PM
kbarton marked an inline comment as done.
  • Address review comments.
Whitney added inline comments.Jul 30 2019, 12:37 PM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
1057

I am more worry about correctness. When the two compare instructions are the same, but FC0 guard first successor is FC0 Loop, but FC1 guard second successor is FC1 Loop, then we may incorrectly fuse the two loops?

kbarton marked an inline comment as done.Jul 30 2019, 2:15 PM
kbarton added inline comments.
llvm/lib/Transforms/Scalar/LoopFuse.cpp
1057

Oh, I see what you mean. The guards need to be identical, and the position of the in-loop successor and out-of-loop successors also needs to be the same. That's a good catch. I'll update the patch to test this now.

kbarton updated this revision to Diff 212449.Jul 30 2019, 3:05 PM
  • Improve haveIdenticalGuards method by ensuring the guards gave the same flow into/around the loop in addition to identical compare instructions.
Whitney added inline comments.Jul 31 2019, 2:57 PM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
257

I am thinking if we should have this function in Loop class.

1034

whether the non loop successor of the \p FC0 guard branch is the entry block of \p FC1.

1324

Do you think something like this is more precise? The exit block successor for the latch of FC0 is updated to be the header of FC1 and the non exit block successor of the latch of FC1 is updated to be the header of FC0....

1547

reportLoopFusion<OptimizationRemark>(*FC0, *FC1, FuseCounter);

kbarton updated this revision to Diff 213627.Aug 6 2019, 8:55 AM
  • Remove (redundant) increments of FuseCounter statistic.
kbarton updated this revision to Diff 213733.Aug 6 2019, 3:12 PM
kbarton marked 6 inline comments as done.
  • Addressed review comments.

Is there a long-term plan to avoid the code duplication? E.g. refactoring/keeping only the fuseGuardedLoops variant when loop guards become a normal form? One could also add some preprocessing that makes the two loop guards guard both loops, then use the standard fuse code.

llvm/lib/Transforms/Scalar/LoopFuse.cpp
1406

[typo] "just just"

These internal remarks don't seem to be confident in what the code is doing.

Sometimes some ASCII-art might clear up which blocks are connected to where.

kbarton marked an inline comment as done.Aug 8 2019, 11:37 AM
kbarton added inline comments.
llvm/lib/Transforms/Scalar/LoopFuse.cpp
257

Good idea.
I've created https://reviews.llvm.org/D65958 to add this.
Once it lands, I can update this code to use the new method in the loop class (that can be done independent of this review, IMO).

1547

This was actually a mistake. It is counted in the main loop and should not be counted here.

kbarton updated this revision to Diff 214204.Aug 8 2019, 11:56 AM
  • Address review comments.
  • Improve haveIdenticalGuards method by ensuring the guards gave the same flow into/around the loop in addition to identical compare instructions.
  • Remove (redundant) increments of FuseCounter statistic.
  • Addressed review comments.
  • Remove old, out of date comments in fuseGuardedLoops method.
kbarton marked 2 inline comments as done.Aug 8 2019, 12:00 PM

Is there a long-term plan to avoid the code duplication? E.g. refactoring/keeping only the fuseGuardedLoops variant when loop guards become a normal form? One could also add some preprocessing that makes the two loop guards guard both loops, then use the standard fuse code.

Yes, the intention is to clean this up in subsequent patches.
The next patch will restrict fusion to only fuse rotated loops, at which point the previous performFusion method can be removed and we can consolidate on fuseGuardedLoops (maybe rename fuseGuardedLoops to performFusion, but still consolidate on a single method and remove the duplicate code).

llvm/lib/Transforms/Scalar/LoopFuse.cpp
1406

These comments were old and I forgot to remove them. I've updated them now to reflect what is actually being done.

etiotto added inline comments.Aug 13 2019, 7:24 AM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
166

I think it would be better to pass a const reference to the DominatorTree and PostDominatorTree given that they should never be nullptr ?

172

It would be better to initialize GuardBranch in the init. list: GuardBranch(L->getLoopGuardBranch()) and have getLoopGuardBranch() return nullptr if the loop guard cannot be identified for whatever reason.

251

assert GuarbBranch->isConditional()

kbarton updated this revision to Diff 214860.Aug 13 2019, 9:47 AM
kbarton marked 4 inline comments as done.
  • Assert guard branch is conditional in getNonLoopBlock.
kbarton marked an inline comment as done.Aug 13 2019, 9:48 AM
kbarton added inline comments.
llvm/lib/Transforms/Scalar/LoopFuse.cpp
166

I think that should be fine. However, it isn't related to this patch, so I'd prefer to change this in a separate patch.

172

getLoopGuardBranch only works on rotated loops (it will assert if called on a non-rotated loop).

251

This is guaranteed by the getGuardBranch routine already.
I can add it here too, though, to make sure that doesn't change over time and break assumptions here.

Whitney added inline comments.Aug 13 2019, 9:56 AM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
172

FYI https://reviews.llvm.org/D66084 Remove asserts in getLoopGuardBranch.

kbarton marked an inline comment as done.Aug 29 2019, 2:39 PM

Are there any other comments on this?
If not, I'd like to get this committed and start working on the next patch.

Are there any other comments on this?
If not, I'd like to get this committed and start working on the next patch.

I did a sweep through and didn't spot anything, but can't claim to be deeply familiar with this code.

So, LGTM, but give it a day or two to see if anyone else wants to chime in.

llvm/lib/Transforms/Scalar/LoopFuse.cpp
1085

This check is awfully restrictive, and can probably be easily generalized. I'm expecting that's your intent with future patches, so let's discuss there.

kbarton marked an inline comment as done.Aug 30 2019, 10:18 AM
kbarton added inline comments.
llvm/lib/Transforms/Scalar/LoopFuse.cpp
1085

Yes, once the ability to move code from between loops has been added, this restriction can be relaxed or removed altogether.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 26 2019, 2:43 PM
This revision was automatically updated to reflect the committed changes.