This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] fix tightlyNested() in LoopInterchange legality
ClosedPublic

Authored by congzhe on Mar 9 2021, 7:24 AM.

Details

Summary

This is yet another attempt to fix tightlyNested().

Add checks in tightlyNested() for the inner loop exit block,
such that 1) if there is control-flow divergence in between the inner
loop exit block and the outer loop latch, or 2) if the inner loop exit
block contains unsafe instructions, tightlyNested() returns false.

The reasoning behind is that after interchange, the original inner loop
exit block, which was part of the outer loop, would be put into the new
inner loop, and will be executed different number of times before and
after interchange. Thus it should be dealt with appropriately.

For now tests are not provided, and the tests provided in the following patches
https://reviews.llvm.org/D96708 and https://reviews.llvm.org/D91682
are appropriate tests, where tightlyNested() previously returns true and will return
false after this patch.

Apart from that, this patch does not fail any of the existing regression tests.

This test comes from https://reviews.llvm.org/D96708:

#include <stdio.h>
static   long  int i,j,k=3,m=3;
unsigned long  int x[50];
long double a[50][50];

int main() {

for (i=0; i<50; i++) for (j=0; j<50; j++) a[i][j]=i; for (i=0; i<50; i++) x[i]=1;

j=0;
for(i=40;i>1;i--) {
  k=20;
  while(k++<40) x[k]=x[j]%40;
  a[i][i]+=a[i][j]+a[j][i];
  for(m=9;;) if (--m<5) break;
}

for (i=0; i<50; i++) for (j=0; j<50; j++)
  printf("a:%ld %5.5Le \n",i,a[i][j]);

  return 0;
}

This test comes from https://reviews.llvm.org/D91682:

 char a;
 int b[][2];
 char c[][9];
 int h() {
   for (; f <= 2; f++) {
     for (int j = 0; j <= 2; j++) {
       b[j][1] = c[j][f];
     }
     if (a)
   }
   return 0;
}

Diff Detail

Event Timeline

congzhe created this revision.Mar 9 2021, 7:24 AM
congzhe requested review of this revision.Mar 9 2021, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2021, 7:24 AM
congzhe edited the summary of this revision. (Show Details)Mar 9 2021, 7:33 AM
congzhe edited the summary of this revision. (Show Details)
congzhe edited the summary of this revision. (Show Details)
congzhe edited the summary of this revision. (Show Details)Mar 9 2021, 7:43 AM
congzhe added reviewers: fhahn, Whitney, masakazu.ueno, geng.
congzhe edited the summary of this revision. (Show Details)
congzhe edited the summary of this revision. (Show Details)
congzhe edited the summary of this revision. (Show Details)
congzhe updated this revision to Diff 329349.Mar 9 2021, 7:59 AM
dancgr added a subscriber: dancgr.Mar 9 2021, 8:10 AM
TaWeiTu added a subscriber: TaWeiTu.Mar 9 2021, 8:34 AM
HLJ2009 added a subscriber: HLJ2009.Mar 9 2021, 6:27 PM
congzhe added a comment.EditedMar 9 2021, 9:54 PM

An update: the current trunk code (without this patch) fails the following test (crashes as follows):

void updateSuccessor(llvm::BranchInst*, llvm::BasicBlock*, llvm::BasicBlock*, std::vector<llvm::cfg::Update<llvm::BasicBlock*>, std::allocator<llvm::cfg::Update<llvm::BasicBlock*> > >&, bool): Assertion 'Changed && "Expected a successor to be updated"' failed.

@A = common global [100 x [100 x i64]] zeroinitializer
@B = common global [100 x i64] zeroinitializer

;;  for(int i=0;i<100;i++)
;;    for(int j=0;j<100;j++)
;;      A[j][i] = A[j][i]+k;

; Crashes with the newest trunk code
define void @crash(i64 %k, i64 %N, i64 signext %ny) {
entry:
  br label %for1.header

for1.header:
  %j23 = phi i64 [ 0, %entry ], [ %j.next24, %for1.inc10 ]
  %cmp21 = icmp slt i64 0, %ny
  br label %singleSucc

singleSucc:
  br i1 %cmp21, label %preheader.j, label %for1.inc10

preheader.j:  
  br label %for2

for2:
  %j = phi i64 [ %j.next, %for2 ], [ 0, %preheader.j ]
  %arrayidx5 = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* @A, i64 0, i64 %j, i64 %j23
  %lv = load i64, i64* %arrayidx5
  %add = add nsw i64 %lv, %k
  store i64 %add, i64* %arrayidx5
  %j.next = add nuw nsw i64 %j, 1
  %exitcond = icmp eq i64 %j, 99
  br i1 %exitcond, label %for1.inc10, label %for2

for1.inc10:
  %j.next24 = add nuw nsw i64 %j23, 1
  %exitcond26 = icmp eq i64 %j23, 99
  br i1 %exitcond26, label %for.end12, label %for1.header

for.end12:
  ret void
}

The check from LoopNest::checkLoopsStructure() is actually more comprehensive than what was done for tightlyNested() in LoopInterchange.cpp, for example checkLoopsStructure() also considers the inner loop guard blocks, and some other scenarios. However, the current transformation in LoopInterchange may not be able to handle those complicated scenarios. With the previous check in tightlyNested() (as below) which is simpler, it would not pass legality and would not enter transformation, hence it would not crash.

// A perfectly nested loop will not have any branch in between the outer and
 // inner block i.e. outer header will branch to either inner preheader and
 // outerloop latch.
 BranchInst *OuterLoopHeaderBI =
     dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());
 if (!OuterLoopHeaderBI)
   return false;

 for (BasicBlock *Succ : successors(OuterLoopHeaderBI))
   if (Succ != InnerLoopPreHeader && Succ != InnerLoop->getHeader() &&
       Succ != OuterLoopLatch)
     return false;
congzhe edited the summary of this revision. (Show Details)Mar 9 2021, 10:16 PM

Thanks for fixing it. Is the crashing test case an existing test case, or will be added in D96708 or D91682? If not, can you please also include that test case in this patch?

congzhe updated this revision to Diff 329684.Mar 10 2021, 9:06 AM

Updated the patch.

Added the abovementioned test case to "not-interchanged-tightly-nested.ll", and made appropriate updates in the patch to handle this new test cases. The updates are as follows.

As mentioned above, LoopNest::checkLoopsStructure() is a bit too comprehensive and includes scenarios (e.g., the new test case) where loop interchange transformation cannot handle, so I used the previous check from tightlyNested() instead, which would fail "tightlyNest" check and would not enter transformation, hence would not crash.

All other regression test cases are ok, no failure observed.

Thanks for fixing it. Is the crashing test case an existing test case, or will be added in D96708 or D91682? If not, can you please also include that test case in this patch?

Thanks for the reply! The crashing test case is not an existing test case, and I just added it to this patch. To summarize, three test cases can be handled by this patch which could not be handled by trunk code: one is what I just added, the other two are from D96708 and D91682 as I included in the summary section of this patch.

The patch now revert part of https://reviews.llvm.org/rGdf9158c9a45a6902c2b0394f9bd6512e3e441f31, which leave some changes behind.
Given that the change cause a test case crash, @TaWeiTu can you please revert the change?

The patch now revert part of https://reviews.llvm.org/rGdf9158c9a45a6902c2b0394f9bd6512e3e441f31, which leave some changes behind.
Given that the change cause a test case crash, @TaWeiTu can you please revert the change?

Sure, reverted.

congzhe updated this revision to Diff 329817.Mar 10 2021, 6:41 PM

Updated the patch based on the latest trunk code.

The patch now revert part of https://reviews.llvm.org/rGdf9158c9a45a6902c2b0394f9bd6512e3e441f31, which leave some changes behind.
Given that the change cause a test case crash, @TaWeiTu can you please revert the change?

Sure, reverted.

Thanks a lot for the update!

I updated my patch accordingly based on the latest trunk code. Now we do not have pr48113.ll which was in D97290. If necessary maybe @TaWeiTu or I could add it back. pr48113.ll is handled in this patch.

Hi @congzhe, thanks for the fix!

I think it'd probably be better to keep the LoopNest::checkLoopsStructure and add the additional checks of branch instructions for now.
By doing so, future changes to LoopNest will also apply in LoopInterchange, which is what we are trying to achieve I think (by making the definition of perfectly-nested loop in sync).
The check can be removed in the future, when LoopNest is eventually taught to correctly identify guard branches (singleSucc is currently considered to be the guard branch of the inner loop, but the loops should only be considered perfectly-nested if the guard branch corresponds to the one added by LoopRotate IMO, which is not true in this case).

What do you think?

Hi @congzhe, thanks for the fix!

I think it'd probably be better to keep the LoopNest::checkLoopsStructure and add the additional checks of branch instructions for now.
By doing so, future changes to LoopNest will also apply in LoopInterchange, which is what we are trying to achieve I think (by making the definition of perfectly-nested loop in sync).
The check can be removed in the future, when LoopNest is eventually taught to correctly identify guard branches (singleSucc is currently considered to be the guard branch of the inner loop, but the loops should only be considered perfectly-nested if the guard branch corresponds to the one added by LoopRotate IMO, which is not true in this case).

What do you think?

Thanks for the reply! I see your point.

I am just not entirely sure whether we should keep LoopNest::checkLoopsStructure() in tightlyNested() in loop interchange. As I described above, LoopNest::checkLoopsStructure() does comprehensive checks and some of the scenarios that it considered may not be applicable to loop interchange, because the transformation phase in loop interchange is not comprehensive enough to handle all those scenarios. The scenario with the inner loop guard issue as I added in this patch is only one of those, I can think of other scenarios that can pass LoopNest::checkLoopsStructure() but fails loop interchange transformation phase. I have listed two of them below.

Therefore even we handle the inner loop guard appropriately, there's other scenarios that would fail if we do keep LoopNest::checkLoopsStructure(). Essentially I think from practical point of view, it does make sense that tightlyNest() is not entirely the same as LoopNest::checkLoopsStructure(), since tightlyNest() is specifically tailored for loop interchange hence has its own logic in checking the loop structure.

Scenario 1:

@A = common global [100 x [100 x i64]] zeroinitializer
@B = common global [100 x i64] zeroinitializer

define void @crash(i64 %k, i64 %N, i64 signext %ny) {
entry:
  br label %for1.header

for1.header:
  %j23 = phi i64 [ 0, %entry ], [ %j.next24, %for1.inc10 ]
  %cmp21 = icmp slt i64 0, %ny
  br i1 %cmp21, label %preheader.j, label %for.j.end

preheader.j:  
  br label %for2

for2:
  %j = phi i64 [ %j.next, %for.j.inc ], [ 0, %preheader.j ]
  %arrayidx5 = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* @A, i64 0, i64 %j, i64 %j23
  %lv = load i64, i64* %arrayidx5
  %add = add nsw i64 %lv, %k
  store i64 %add, i64* %arrayidx5
  br label %for.j.inc

for.j.inc:                                       
  %j.next = add nuw nsw i64 %j, 1
  %exitcond = icmp eq i64 %j, 99
  br i1 %exitcond, label %for2, label %for.j.end_crit_edge

for.j.end_crit_edge:                              
  %split = phi i64 [ %add, %for.j.inc ]
  br label %for.j.end

for.j.end:                                        
  ;%res.1.lcssa = phi i64 [ %split, %for.j.end_crit_edge ], [ %k, %for1.header ]
  br label %for1.inc10

for1.inc10:
  %j.next24 = add nuw nsw i64 %j23, 1
  %exitcond26 = icmp eq i64 %j23, 99
  br i1 %exitcond26, label %for.end12, label %for1.header

for.end12:
  ret void
}

Scenario 2

@A = common global [100 x [100 x i64]] zeroinitializer
@B = common global [100 x i64] zeroinitializer

define void @crash(i64 %k, i64 %N, i64 signext %ny) {
entry:
  br label %for1.header

for1.header:
  %j23 = phi i64 [ 0, %entry ], [ %j.next24, %for1.inc10 ]    
  br label %singleSucc

singleSucc:
  br label %preheader.j

preheader.j:  
  br label %preheader.j2

preheader.j2:  
  br label %preheader.j3

preheader.j3:  
  br label %for2

for2:
  %j = phi i64 [ %j.next, %for2 ], [ 0, %preheader.j3 ]
  %arrayidx5 = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* @A, i64 0, i64 %j, i64 %j23
  %lv = load i64, i64* %arrayidx5
  %add = add nsw i64 %lv, %k
  store i64 %add, i64* %arrayidx5
  %j.next = add nuw nsw i64 %j, 1
  %exitcond = icmp eq i64 %j, 99
  br i1 %exitcond, label %for1.inc10, label %for2

for1.inc10:
  %j.next24 = add nuw nsw i64 %j23, 1
  %exitcond26 = icmp eq i64 %j23, 99
  br i1 %exitcond26, label %for.end12, label %for1.header

for.end12:
  ret void
}

I am fine with this change now, but in the future we should try to

  1. Use utilities in LoopNest as much as possible
  2. Loosen the definition perfect loop nest in LoopNest
  3. Make the transformation more generic to handle more cases, e.g. the newly added test interchange_07.
llvm/test/Transforms/LoopInterchange/not-interchanged-tightly-nested.ll
12 ↗(On Diff #329817)

can we reuse the existing global arrays?

108 ↗(On Diff #329817)

Please add comment for why it is not considered tightly nested.
Is it because we expect the outer loop header to have a conditional branch?

congzhe updated this revision to Diff 332847.Mar 23 2021, 7:08 PM

Updated the patch according to the comments.

Whitney accepted this revision.Mar 23 2021, 7:09 PM
This revision is now accepted and ready to land.Mar 23 2021, 7:09 PM

I am fine with this change now, but in the future we should try to

  1. Use utilities in LoopNest as much as possible
  2. Loosen the definition perfect loop nest in LoopNest
  3. Make the transformation more generic to handle more cases, e.g. the newly added test interchange_07.

Thanks for the review! I've updated the patch accordingly.

llvm/test/Transforms/LoopInterchange/not-interchanged-tightly-nested.ll
12 ↗(On Diff #329817)

Thanks, now removed the newly added array E and reused the existing array A.

108 ↗(On Diff #329817)

Thanks, comments added accordingly.
It is because the outer loop header does not branch to the inner loop preheader, or the
inner loop header, or the outer loop latch. If we did loop interchange on interchange_07, it would crash since the transformation phase currently cannot properly handle such an IR.