This is an archive of the discontinued LLVM Phabricator instance.

[LoopFusion] Move instructions from FC1.GuardBlock to FC0.GuardBlock and from FC0.ExitBlock to FC1.ExitBlock when proven safe.
ClosedPublic

Authored by Whitney on Jan 29 2020, 9:43 AM.

Details

Summary

Currently LoopFusion give up when the second loop nest guard
block or the first loop nest exit block is not empty. For example:

if (0 < N) {
  for (int i = 0; i < N; ++i) {}
  x+=1;
}
y+=1;
if (0 < N) {
  for (int i = 0; i < N; ++i) {}
}

The above example should be safe to fuse.
This PR moves instructions in FC1 guard block (e.g. y+=1;) to
FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to
FC1 exit block, which then LoopFusion is able to fuse them.

Diff Detail

Event Timeline

Whitney created this revision.Jan 29 2020, 9:43 AM
jdoerfert added inline comments.Jan 29 2020, 12:15 PM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
766

Do we still need these isEmptyXXXBlock methods?

779

What happens now if there is a FC1 guard branch with non-movable instructions but no fc0 guard branch. I think we now miss that case and do sth? Could you add a test if we don't have one?

Whitney updated this revision to Diff 241277.Jan 29 2020, 1:39 PM
Whitney marked 3 inline comments as done.

Removed isEmptyXXXBlock methods.

llvm/lib/Transforms/Scalar/LoopFuse.cpp
766

Good catch!

779

I think that's not allowed in https://reviews.llvm.org/D71569.

jdoerfert added inline comments.Jan 29 2020, 3:24 PM
llvm/lib/Transforms/Scalar/LoopFuse.cpp
779

Fair, then you didn't need to change FC1->GuardBranch && into FC0->GuardBranch && FC1->GuardBranch &&, right? At least that is what confused me here.
Maybe we do an outer if (FC0->GuardBranch) {...} and an assertion that both are set or none is? That makes the two conditionals also simpler as they only check isSafeToMoveBefore.

Whitney updated this revision to Diff 241336.Jan 29 2020, 6:40 PM
Whitney marked 2 inline comments as done.

Addressed latest review comment.

jdoerfert accepted this revision.Jan 29 2020, 7:30 PM

LGTM with the caveat that the last change didn't do what I hoped for, see below.

llvm/lib/Transforms/Scalar/LoopFuse.cpp
779

Now you check FC1->GuardBranch twice. What I meant was:

if (FC0->GuardBranch) {
           assert(FC1->GuardBranch && "Expecting valid FC1 guard branch");
            if (!isSafeToMoveBefore(*FC0->ExitBlock,
                                 *FC1->ExitBlock->getFirstNonPHIOrDbg(), DT,
                                 PDT, DI)) {
           LLVM_DEBUG(dbgs() << "Fusion candidate contains unsafe "
                                "instructions in exit block. Not fusing.\n");


           reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
                                                      NonEmptyExitBlock);
           continue;
         }

           if (!isSafeToMoveBefore(
                   *FC1->GuardBranch->getParent(),
                   *FC0->GuardBranch->getParent()->getTerminator(), DT, PDT,
                   DI)) {
             LLVM_DEBUG(dbgs()
                        << "Fusion candidate contains unsafe "
                           "instructions in guard block. Not fusing.\n");
             reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
                                                        NonEmptyGuardBlock);
             continue;
           }
    }
This revision is now accepted and ready to land.Jan 29 2020, 7:30 PM
Whitney updated this revision to Diff 241342.Jan 29 2020, 8:14 PM
Whitney marked an inline comment as done.

Addressed review comment.

This revision was automatically updated to reflect the committed changes.

It seems isSafeToMoveBefore() on basic blocks is quite limited. It doesn't handle cases like

for.body3.lr.ph: ; preds = %for.cond1.preheader

%mul4 = mul nsw i32 %j.0123, %N, !dbg !14
%add20 = add nsw i32 %mul4, %j.0123, !dbg !14
%arrayidx21 = getelementptr inbounds float, float* %A, i32 %add20, !dbg !14
%arrayidx21.promoted = load float, float* %arrayidx21, align 4, !dbg !15, !tbaa !16

It stops at '%arrayidx21 = getelementptr inbounds float, float* %A, i32 %add20, !dbg !14', which uses 'add20'.