Page MenuHomePhabricator

[LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops
Needs ReviewPublic

Authored by congzhe on May 11 2021, 7:34 PM.

Details

Reviewers
Whitney
bmahjour
Summary

This is a fix in legality check.

We already know that we need to check whether lcssa phis are supported in inner loop exit block or in outer loop exit block, and we have logic to check them already. Presumably the inner loop latch does not have lcssa phis and there is no code that deals with lcssa phis in the inner loop latch.

However, that assumption is not true, when we have loops with more than two-level nesting. Take a triply nested loop as an example (which is provided as the test case in this patch). Suppose we do interchange two times for the innermost/middle loop, and the (new) middle/outermost loop. I'll use the following terminologies:

The old innermost/middle/outermost header/latch -> the very initial loop nest.
The intermediate innermost/middle/outermost header/latch -> the loop nest after 1st interchange.
The final innermost/middle/outermost header/latch -> the loop nest after 2nd interchange.

If the old innermost latch uses a value defined in the old middle header, then after 1st interchange, it becomes that the intermediate middle latch uses a value defined in the intermediate inner header. This is exactly when there is an lcssa phi node in "BasicBlock* InnerLooplatch". Note that since we are in the stage of doing the 2nd interchange, "BasicBlock* InnerLooplatch" is the intermediate middle latch.

This patch added a function areInnerLoopLatchPHIsSupported() to check the legality in InnerLoopLatch (just to clarify again, currently "BasicBlock* InnerLooplatch" is the intermediate middle latch). Note that the intermediate middle latch will become the final outermost latch, and in certain cases the values in the intermediate innermost loop will not be available in the final outermost latch, e.g., when the old outermost header can jump directly to the old outermost latch.

As an example, for the test case provided in this patch, it will run the 1st interchange without any problem, but in the 2nd interchange it will crash or hit assertion errors with the current trunk. The legality check added in this patch catches the IR after the 1st interchange and bail out the 2nd interchange.

Diff Detail

Event Timeline

congzhe created this revision.May 11 2021, 7:34 PM
congzhe requested review of this revision.May 11 2021, 7:34 PM
congzhe edited the summary of this revision. (Show Details)May 11 2021, 7:41 PM
congzhe updated this revision to Diff 344651.May 11 2021, 9:28 PM
Whitney added inline comments.Sun, May 30, 9:44 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1050

Can user of PHI be not an instruction? If not, change dyn_cast to cast.

llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

Is it possible to show the problem with input IR that the innermost and the middle loop is already interchanged?

congzhe added inline comments.Sun, May 30, 3:58 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1050

Thanks, will update to cast.

llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

Thanks for the comment! The difficulty is that loop interchange works on loops ordered from the innermost loop toward the outermost loop. Whenever we fail the legality check for a pair of loops, we just bail out the entire pass not trying to interchange the rest of loop nest. The pass does not "skip over" the first pair of candidate loops and go directly to the next one.

With the input IR that the innermost and the middle loop is already interchanged, when the pass starts working on this loop nest, it would do one of the following:

  1. The pass either determines it does not want to interchange the (new) innermost and the (new) middle loop likely due to profitability issue and will just bail out, not interchanging anything.
  1. (Even if we manually change the loop access pattern in the IR to make it pass profitability), it would interchange the the (new) innermost and the (new) middle loop which would result in a new IR that does not expose the problem we would like to expose.

Therefore I thought the current test case might be a in a sense a straightforward one to catch the problem.

  • With the trunk code, it run the 1st interchange and crashes on the 2nd interchange.
  • With this patch, it run the 1st interchange and bails out the 2nd interchange.

I'm wondering if what I described above makes sense to you? Please correct me if I missed something.

Whitney added inline comments.Sun, May 30, 5:10 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

When I run this test case with my opt, I got:

Loops are legal to interchange
Loops interchanged.
Not legal because of current transform limitation
Not interchanging loops. Cannot prove legality.

Can you please check with the latest trunk code, and see if you still see a crash on the 2nd interchange?
The commit-hash I tried with is 1e344ce4f3fac4beb5e23e04dc3dd59398125956.

congzhe added inline comments.Sun, May 30, 6:05 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

That's right, the latest trunk code does not crash because of my recent patch: https://reviews.llvm.org/D101305. That patch makes the pass bail out when it detects triangular loops, however, it also catches this IR accidently because SCEV fails to detect an lcssa variable is loop invariant. So the trunk code bails out and just hides the problem described in this patch.

I can modify my other patch by adding some logic that deals with lcssa phis when calling SCEV, which will expose the crash again. Essentially what this means is that at the end of funciton isLoopStructureUnderstood(), I can simply change this piece of code

const SCEV *S = SE->getSCEV(Right);
    if (!SE->isLoopInvariant(S, OuterLoop))
      return false;

to

const SCEV *S = SE->getSCEV(followLCSSA(Right));
    if (!SE->isLoopInvariant(S, OuterLoop))
      return false;

If needed I can add that change to the current patch.

congzhe added inline comments.Sun, May 30, 6:58 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

Just to be more clear: in order to reproduce the crash, we can either checkout to a repo before my other patch (https://reviews.llvm.org/D101305) is committed, or do the change as described above (moving followLCSSA() to before isLoopStructureUnderstood() is also needed).

Also for the test case please add -loop-interchange-threshold=-100 to enable transformation for the 2nd interchange.

The problem is discovered from a large benchmark with multi-level loop nests. I tried to expose it through minimal IR but I apologize if it still looks not convenient enough to reproduce.

Whitney added inline comments.Sun, May 30, 7:53 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

From modifying your original test case, we can show the same problem with an easier logic. Instead of 2 steps and 3 level nested loop, we can show with only attempting to interchange once with a 2 level nested loop.
Can you simplify all the comments/descriptions and test case?

@a = common global i32 0, align 4
@d = common dso_local local_unnamed_addr global [1 x [6 x i32]] zeroinitializer, align 4

define void @innermost_latch_uses_values_in_middle_header() {
entry:
  %0 = load i32, i32* @a, align 4
  %b = add i32 80, 1
  br label %outermost.header

outermost.header:                                 ; preds = %outermost.latch, %entry
  %indvar.outermost = phi i32 [ 10, %entry ], [ %indvar.outermost.next, %outermost.latch ]
  %tobool71.i = icmp eq i32 %0, 0
  br i1 %tobool71.i, label %innermost.header.preheader, label %outermost.latch

innermost.header.preheader:                       ; preds = %outermost.header
  br label %innermost.header

innermost.header:                                 ; preds = %innermost.header.preheader, %innermost.latch.split
  %indvar.innermost = phi i64 [ %1, %innermost.latch.split ], [ 4, %innermost.header.preheader ]
  %indvar.middle.wide = zext i32 %b to i64 ; a def in the middle header
  %arrayidx9.i = getelementptr inbounds [1 x [6 x i32]], [1 x [6 x i32]]* @d, i64 0, i64 %indvar.innermost, i64 %indvar.innermost
  store i32 0, i32* %arrayidx9.i, align 4
  br label %innermost.latch

innermost.latch:                                  ; preds = %innermost.body
  %indvar.innermost.next = add nsw i64 %indvar.innermost, 1
  %tobool5.i = icmp eq i64 %indvar.innermost.next, %indvar.middle.wide
  br label %innermost.latch.split

innermost.latch.split:                            ; preds = %middle.latch
  %indvar.middle.wide.lcssa = phi i64 [ %indvar.middle.wide, %innermost.latch ]
  %1 = add nsw i64 %indvar.innermost, 1
  %2 = icmp eq i64 %1, %indvar.middle.wide.lcssa
  br i1 %2, label %outermost.latch.loopexit, label %innermost.header

outermost.latch.loopexit:                         ; preds = %innermost.latch.split
  br label %outermost.latch

outermost.latch:                                  ; preds = %outermost.latch.loopexit, %outermost.header
  %indvar.outermost.next = add nsw i32 %indvar.outermost, -5
  %tobool.i = icmp eq i32 %indvar.outermost.next, 0
  br i1 %tobool.i, label %outermost.exit, label %outermost.header

outermost.exit:                                   ; preds = %outermost.latch
  ret void
}

And it looks to me that we could replace all uses of %indvar.middle.wide.lcssa with %indvar.middle.wide and remove %indvar.middle.wide.lcssa to continue interchange. Have you thought about that?

Whitney added inline comments.Sun, May 30, 10:30 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

Another idea is to clone the incoming values of the lcssa phi, as we know it is loop invariance, or else it would bail out by isLoopStructureUnderstood.

congzhe added inline comments.Mon, Jun 7, 5:30 PM
llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
17

@Whitney My apologies for not being able to reply sooner.

Thanks a lot for working out this simpler test case! It does expose the same problem, which is that some variables used in the new outer latch are not always available after interchange.

A minor concern for this test is that, usually %indvar.middle.wide.lcssa in the original inner latch is not supposed to exist since its incoming value is defined in the same loop whereas an lcssa phi is supposed to have incoming values from a loop that is "one-level inner" in the loop nest. So I guess in reality we would not see such patterns as shown in the simpler test, and I completely agree with you that in this case we can just replace all uses of %indvar.middle.wide.lcssa with %indvar.middle.wide so that interchange would not cause any problem.

Originally I used a triply-nested loop as test because the problematic lcssa phi in the intermediate middle latch that has incoming values from the intermediate inner header is a valid lcssa phi and should exist. The corresponding IR (the provided test after 1st interchange) is shown below, where we see the lcssa phi %indvar.middle.wide.lcssa in innermost.latch.split (which is the intermediate middle latch, and acts as the inner loop latch when doing 2nd interchange) is actually valid and should be there. After interchange innermost.latch.split would become the final outermost latch, and its incoming value, i.e., %indvar.middle.wide is not always available at the final outermost latch. If we follow the edge directly from the final outermost header to the final outermost latch, %indvar.middle.wide would not be available.

define void @innermost_latch_uses_values_in_middle_header_after_first_interchange() {
entry:
  %0 = load i32, i32* @a, align 4
  %b = add i32 80, 1
  br label %outermost.header

outermost.header:                                 ; preds = %outermost.latch, %entry
  %indvar.outermost = phi i32 [ 10, %entry ], [ %indvar.outermost.next, %outermost.latch ]
  %tobool71.i = icmp eq i32 %0, 0
  br i1 %tobool71.i, label %innermost.header.preheader, label %outermost.latch

middle.header.preheader:                          ; preds = %innermost.header
  br label %middle.header

middle.header:                                    ; preds = %middle.header.preheader, %middle.latch
  %indvar.middle = phi i64 [ %indvar.middle.next, %middle.latch ], [ 4, %middle.header.preheader ]
  %indvar.middle.wide = zext i32 %b to i64
  br label %innermost.header.split

innermost.header.preheader:                       ; preds = %outermost.header
  br label %innermost.header

innermost.header:                                 ; preds = %innermost.header.preheader, %innermost.latch.split
  %indvar.innermost = phi i64 [ %1, %innermost.latch.split ], [ 4, %innermost.header.preheader ]
  br label %middle.header.preheader

innermost.header.split:                           ; preds = %middle.header
  br label %innermost.body

innermost.body:                                   ; preds = %innermost.header.split
  %arrayidx9.i = getelementptr inbounds [1 x [6 x i32]], [1 x [6 x i32]]* @d, i64 0, i64 %indvar.innermost, i64 %indvar.middle
  store i32 0, i32* %arrayidx9.i, align 4
  br label %innermost.latch

innermost.latch:                                  ; preds = %innermost.body
  %indvar.innermost.next = add nsw i64 %indvar.innermost, 1
  %tobool5.i = icmp eq i64 %indvar.innermost.next, %indvar.middle.wide
  br label %middle.latch

innermost.latch.split:                            ; preds = %middle.latch
  %indvar.middle.wide.lcssa = phi i64 [ %indvar.middle.wide, %middle.latch ]
  %1 = add nsw i64 %indvar.innermost, 1
  %2 = icmp eq i64 %1, %indvar.middle.wide.lcssa
  br i1 %2, label %outermost.latch.loopexit, label %innermost.header

middle.latch:                                     ; preds = %innermost.latch
  %indvar.middle.next = add nsw i64 %indvar.middle, -1
  %tobool2.i = icmp eq i64 %indvar.middle.next, 0
  br i1 %tobool2.i, label %innermost.latch.split, label %middle.header

outermost.latch.loopexit:                         ; preds = %innermost.latch.split
  br label %outermost.latch

outermost.latch:                                  ; preds = %outermost.latch.loopexit, %outermost.header
  %indvar.outermost.next = add nsw i32 %indvar.outermost, -5
  %tobool.i = icmp eq i32 %indvar.outermost.next, 0
  br i1 %tobool.i, label %outermost.exit, label %outermost.header

outermost.exit:                                   ; preds = %outermost.latch
  ret void
}

The key point here is that usually lcssa phis are not supposed to exist in loop latches but are supposed to exist in loop exist blocks. However in scenarios I described in this patch, lcssa phis do exist in loop latches and they are perfectly valid lcssa phis. Therefore the patch aims at dealing with that scenario.

Because of my recent update in isLoopStructureUnderstood, that problem does not cause segmentation fault for now but just bails from isLoopStructureUnderstood. This is because of insufficient capability of SCEV to determine %indvar.middle.wide to be loop invariant. If SCEV is improved in the future such that we would not bail from isLoopStructureUnderstood, the segmentation fault would appear again.

I'm wondering if what I described above looks clear to you? In terms of moving forward, I'm leaning toward sticking to the test case I provided (i.e., keeping the patch as it is) because that is something that occurs in reality. But if you believe we could use your simplified test case and update all the comments/descriptions correspondingly, I could do that as well.

congzhe updated this revision to Diff 350464.Mon, Jun 7, 5:39 PM

Addressed comments, updated from dyn_cast<Instruction>(U) to cast<Instruction>(U).