Page MenuHomePhabricator

[LoopReroll] Fix rerolling loop with extra instructions
Needs ReviewPublic

Authored by kawashima-fj on Sep 28 2020, 12:25 AM.

Details

Summary

Fixes PR47627

This fix suppresses rerolling a loop which has an unrerollable
instruction.

Sample IR for the explanation below:

define void @foo([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]

  ; unrerollable instructions
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %indvar, 20
  %plus10 = add nuw nsw i64 %indvar, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  %indvar.next = add nuw nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}

In the loop rerolling pass, %indvar and %indvar.next are appended
to the LoopIncs vector in the LoopReroll::DAGRootTracker::findRoots
function.

Before this fix, two instructions with unrerollable instructions
comment above are marked as IL_All at the end of the
LoopReroll::DAGRootTracker::collectUsedInstructions function,
as well as instructions with extra simple arithmetic operations
comment and loop-increment and latch comment. It is incorrect
because IL_All means that the instruction should be executed in all
iterations of the rerolled loop but the store instruction should
not.

This fix rejects instructions which may have side effects and don't
belong to def-use chains of any root instructions and reductions.

See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.

Diff Detail

Event Timeline

kawashima-fj created this revision.Sep 28 2020, 12:25 AM
kawashima-fj requested review of this revision.Sep 28 2020, 12:25 AM

clang-format reported a format error DenseSet<Instruction*> -> DenseSet<Instruction *> (space before *). However, other three places in the same function (and many places in other functions) have the same format. Should I change all places? only my change?, or keep all them unchanged?

Ping.
I know many of LLVM developers are busy with the developer's meeting now. When someone has time, please review.

Ping.
Can anyone review this fix?

Thanks for all the effort you put into this and sorry you had to wait so long for somebody to look at this.

I don't think the problem here is the LoopIncs. The unrerollable instructions can as well depend on %iv.next. You patch would still reroll the following:

define void @unrerollable_simple([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]

  ; unrerollable instructions
  %iv.next = add nuw nsw i64 %iv, 1
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %iv.next, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %iv, 20
  %plus10 = add nuw nsw i64 %iv, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  ;%iv.next = add nuw nsw i64 %iv, 1
  %exitcond = icmp eq i64 %iv.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}

To me the problem looks like that the unrerollable store does not belong to any root set and is skipped during verification.

clang-format reported a format error DenseSet<Instruction*> -> DenseSet<Instruction *> (space before *). However, other three places in the same function (and many places in other functions) have the same format. Should I change all places? only my change?, or keep all them unchanged?

Use the clang-format suggestion. To goal is to eventually converge to a completely clang-formatted codebase.

llvm/lib/Transforms/Scalar/LoopRerollPass.cpp
1076

[style] LLVM's coding style does not make use of Almost-Always-Auto.

llvm/test/Transforms/LoopReroll/extra_instr.ll
13

[nit] There's usually a space before CHECK

@Meinersbur Thanks for your review. I'll update the patch.

kawashima-fj edited the summary of this revision. (Show Details)

@Meinersbur I'm sorry for being late. I updated the patch. Please review again.