This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 35828 Build 35827: arc lint + arc unit
Event Timeline
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
741 | reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,NonIdenticalGuards); | |
762 | reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1, NonEmptyExitBlock); | |
769 | reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1, NonEmptyGuardBlock); | |
1060 | what if FC0 guard first successor is FC0 Loop, but FC1 guard second successor is FC1 Loop? | |
1093 | Do we need to assert FC.ExitBlock != nullptr? |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1060 | Then this check will fail. | |
1093 | This should never be null, otherwise the FusionCandidate is not valid. However, it doesn't hurt to check just to be sure. |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1060 | I am more worry about correctness. When the two compare instructions are the same, but FC0 guard first successor is FC0 Loop, but FC1 guard second successor is FC1 Loop, then we may incorrectly fuse the two loops? |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1060 | Oh, I see what you mean. The guards need to be identical, and the position of the in-loop successor and out-of-loop successors also needs to be the same. That's a good catch. I'll update the patch to test this now. |
- Improve haveIdenticalGuards method by ensuring the guards gave the same flow into/around the loop in addition to identical compare instructions.
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
257 | I am thinking if we should have this function in Loop class. | |
1037 | whether the non loop successor of the \p FC0 guard branch is the entry block of \p FC1. | |
1329 | Do you think something like this is more precise? The exit block successor for the latch of FC0 is updated to be the header of FC1 and the non exit block successor of the latch of FC1 is updated to be the header of FC0.... | |
1552 | reportLoopFusion<OptimizationRemark>(*FC0, *FC1, FuseCounter); |
Is there a long-term plan to avoid the code duplication? E.g. refactoring/keeping only the fuseGuardedLoops variant when loop guards become a normal form? One could also add some preprocessing that makes the two loop guards guard both loops, then use the standard fuse code.
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1411 | [typo] "just just" These internal remarks don't seem to be confident in what the code is doing. Sometimes some ASCII-art might clear up which blocks are connected to where. |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
257 | Good idea. | |
1552 | This was actually a mistake. It is counted in the main loop and should not be counted here. |
- Address review comments.
- Improve haveIdenticalGuards method by ensuring the guards gave the same flow into/around the loop in addition to identical compare instructions.
- Remove (redundant) increments of FuseCounter statistic.
- Addressed review comments.
- Remove old, out of date comments in fuseGuardedLoops method.
Yes, the intention is to clean this up in subsequent patches.
The next patch will restrict fusion to only fuse rotated loops, at which point the previous performFusion method can be removed and we can consolidate on fuseGuardedLoops (maybe rename fuseGuardedLoops to performFusion, but still consolidate on a single method and remove the duplicate code).
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1411 | These comments were old and I forgot to remove them. I've updated them now to reflect what is actually being done. |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
166 | I think it would be better to pass a const reference to the DominatorTree and PostDominatorTree given that they should never be nullptr ? | |
172 | It would be better to initialize GuardBranch in the init. list: GuardBranch(L->getLoopGuardBranch()) and have getLoopGuardBranch() return nullptr if the loop guard cannot be identified for whatever reason. | |
251 | assert GuarbBranch->isConditional() |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
166 | I think that should be fine. However, it isn't related to this patch, so I'd prefer to change this in a separate patch. | |
172 | getLoopGuardBranch only works on rotated loops (it will assert if called on a non-rotated loop). | |
251 | This is guaranteed by the getGuardBranch routine already. |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
172 | FYI https://reviews.llvm.org/D66084 Remove asserts in getLoopGuardBranch. |
Are there any other comments on this?
If not, I'd like to get this committed and start working on the next patch.
I did a sweep through and didn't spot anything, but can't claim to be deeply familiar with this code.
So, LGTM, but give it a day or two to see if anyone else wants to chime in.
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1088 | This check is awfully restrictive, and can probably be easily generalized. I'm expecting that's your intent with future patches, so let's discuss there. |
llvm/lib/Transforms/Scalar/LoopFuse.cpp | ||
---|---|---|
1088 | Yes, once the ability to move code from between loops has been added, this restriction can be relaxed or removed altogether. |
I think it would be better to pass a const reference to the DominatorTree and PostDominatorTree given that they should never be nullptr ?