This is an archive of the discontinued LLVM Phabricator instance.

Loop Strength Reduce - Optimize unused IVs to final values in the exit block with SCEV
ClosedPublic

Authored by syzaara on Feb 2 2022, 9:21 AM.

Details

Summary

Loop strength reduce sometimes optimizes away all uses of an induction variable from a loop but leaves the IV increments. When the only remaining use of the IV is the PHI in the exit block, this patch will use SCEV to replace the exit block PHI with the final value of the IV to skip the updates in each loop iteration.

Diff Detail

Event Timeline

syzaara created this revision.Feb 2 2022, 9:21 AM
syzaara requested review of this revision.Feb 2 2022, 9:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2022, 9:21 AM

This is normally a IndVar's problem, but i guess LSR runs so late that there is no IndVar's scheduled after it?
Do you have a phase-ordering test showing this missing simplification?

syzaara updated this revision to Diff 405326.Feb 2 2022, 9:53 AM

This is normally a IndVar's problem, but i guess LSR runs so late that there is no IndVar's scheduled after it?
Do you have a phase-ordering test showing this missing simplification?

Are you referring to IndVarSimplify? Yes, it looks LSR is run after IndVarSimplify.
I thought to implement it in LSR because LSR is the one that was responsible for optimizing away the uses of the IV but leaving behind the single use of increments in the loop.
Here is an example testcase:

int do_work(int * a, int m, int n, int x, int* b) {
   int i,j;
   for( j = 0 ; j < m ; j++ ) {
      for( i = 0 ; i < n ; i++ )
      {
         a[i] += x*b[i];
      }
   }
   return x;
}

Before LSR, we have:

for.body3.us.prol:                                ; preds = %for.body3.us.prol.preheader, %for.body3.us.prol
 %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body3.us.prol ], [ %indvars.iv.ph, %for.body3.us.prol.preheader ]
  %prol.iter = phi i64 [ %prol.iter.next, %for.body3.us.prol ], [ 0, %for.body3.us.prol.preheader ]
  %arrayidx.us.prol = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.prol <------
  %30 = load i32, i32* %arrayidx.us.prol, align 4, !tbaa !3
  %mul.us.prol = mul nsw i32 %30, %x
  %arrayidx5.us.prol = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.prol <------
  %31 = load i32, i32* %arrayidx5.us.prol, align 4, !tbaa !3
  %add.us.prol = add nsw i32 %31, %mul.us.prol
  store i32 %add.us.prol, i32* %arrayidx5.us.prol, align 4, !tbaa !3
  %indvars.iv.next.prol = add nuw nsw i64 %indvars.iv.prol, 1 <------
  %prol.iter.next = add i64 %prol.iter, 1
  %prol.iter.cmp.not = icmp eq i64 %prol.iter.next, %xtraiter
  br i1 %prol.iter.cmp.not, label %for.body3.us.prol.loopexit.loopexit, label %for.body3.us.prol, !llvm.loop !15

After LSR, all uses of %indvars.iv.prol are removed:

for.body3.us.prol:                                ; preds = %for.body3.us.prol.preheader, %for.body3.us.prol
  %lsr.iv120 = phi i64 [ %30, %for.body3.us.prol.preheader ], [ %lsr.iv.next, %for.body3.us.prol ]
  %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body3.us.prol ], [ %indvars.iv.ph, %for.body3.us.prol.preheader ]
  %prol.iter = phi i64 [ %prol.iter.next, %for.body3.us.prol ], [ 0, %for.body3.us.prol.preheader ]
  %uglygep125 = getelementptr i8, i8* %b124, i64 %lsr.iv120 <------
  %uglygep125126 = bitcast i8* %uglygep125 to i32*
  %31 = load i32, i32* %uglygep125126, align 4, !tbaa !3
  %mul.us.prol = mul nsw i32 %31, %x
  %uglygep122 = getelementptr i8, i8* %a121, i64 %lsr.iv120 <------
  %uglygep122123 = bitcast i8* %uglygep122 to i32*
  %32 = load i32, i32* %uglygep122123, align 4, !tbaa !3
  %add.us.prol = add nsw i32 %32, %mul.us.prol
  store i32 %add.us.prol, i32* %uglygep122123, align 4, !tbaa !3
  %indvars.iv.next.prol = add nuw nsw i64 %indvars.iv.prol, 1
  %prol.iter.next = add i64 %prol.iter, 1
  %lsr.iv.next = add nuw nsw i64 %lsr.iv120, 4
  %prol.iter.cmp.not = icmp eq i64 %prol.iter.next, %xtraiter
  br i1 %prol.iter.cmp.not, label %for.body3.us.prol.loopexit.loopexit, label %for.body3.us.prol, !llvm.loop !15

But there is no follow on to remove unused %indvars.iv.next.prol within the loop.

This is normally a IndVar's problem, but i guess LSR runs so late that there is no IndVar's scheduled after it?
Do you have a phase-ordering test showing this missing simplification?

I verified that adding IndVarSimplifyPass after LSR will also remove the unused IV from the example testcase.

syzaara added a subscriber: nikic.Feb 15 2022, 8:29 AM

@nikic Hi Nikita, could you help me with assessing the compile time impact of adding the IndVarSimplifyPass after LoopStrengthReduce?

nikic added a comment.Feb 15 2022, 8:58 AM

@syzaara Running IndVars after LSR doesn't really make sense. IndVars is a canonicalization pass, while LSR is a target-specific optimization pass. Running IndVars after LSR would break the very specific IV structures generated by LSR, or at least that's my understanding.

Is it possible to abstract the portion of the code in IndVarSimplify to simplify this code pattern, and call that function instead of creating your own ReplaceExitPHIsWithFinalVal?

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5621

continue can change to break as all Phis in the same block should have the same number of incoming values.

5641

In what situation, that IV only has a single use (IV->getNumUses() != 1) and one of the incoming values of IV is IVNext, but U != IVNext?

llvm/test/Transforms/LoopStrengthReduce/remove_scev_indvars.ll
23

Do we want to also handle the case %iv.prol.lcssa = phi i64 [ %indvars.iv.prol, %for.body ]?

Is it possible to abstract the portion of the code in IndVarSimplify to simplify this code pattern, and call that function instead of creating your own ReplaceExitPHIsWithFinalVal?

IndVarSimply calls rewriteLoopExitValues from LoopUtils to do this optimization. I tried to call the same function in LoopStrengthReduce, however rewriteLoopExitValues asserts that it requires: L->isRecursivelyLCSSAForm(*DT, *LI).
When calling rewriteLoopExitValues from LoopStrengthReduce, this assert is not met.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5641

There seems to be cases like this in the lit tests. Here an example from llvm/test/CodeGen/PowerPC/common-chain.ll

Exit block includes:

%add23.3.lcssa = phi i64 [ %add23.3, %for.body ]

So %add23.3 is what we are looking to replace.

for.body include these instructions:

%inc.addr.050 = phi i64 [ %inc4, %for.body.preheader.new ], [ %add23.3, %for.body ]
%add23 = add nsw i64 %inc.addr.050, %inc4
%add23.3 = add i64 %add23.2, %inc4

The IV here is: %inc.addr.050 and it has only one use of:
%add23 = add nsw i64 %inc.addr.050, %inc4
which is not the same as IVNext: %add23.3

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 8:37 AM
syzaara updated this revision to Diff 412737.Mar 3 2022, 8:40 AM

IndVarSimply calls rewriteLoopExitValues from LoopUtils to do this optimization. I tried to call the same function in LoopStrengthReduce, however rewriteLoopExitValues asserts that it requires: L->isRecursivelyLCSSAForm(*DT, *LI).

When calling rewriteLoopExitValues from LoopStrengthReduce, this assert is not met.

I am confused, is the assert from rewriteLoopExitValues met or not when calling from LoopStrengthReduce? Does LoopStrengthReduce currently rely on LCSSA form? Is it only become not LCSSA after LoopStrengthReduce changes the code?
Can we first invoke formLCSSA to form LCSSA before calling rewriteLoopExitValues?

IndVarSimply calls rewriteLoopExitValues from LoopUtils to do this optimization. I tried to call the same function in LoopStrengthReduce, however rewriteLoopExitValues asserts that it requires: L->isRecursivelyLCSSAForm(*DT, *LI).

When calling rewriteLoopExitValues from LoopStrengthReduce, this assert is not met.

I am confused, is the assert from rewriteLoopExitValues met or not when calling from LoopStrengthReduce? Does LoopStrengthReduce currently rely on LCSSA form? Is it only become not LCSSA after LoopStrengthReduce changes the code?
Can we first invoke formLCSSA to form LCSSA before calling rewriteLoopExitValues?

There were cases in lit and lnt where we would fail due to this assert. I have modified the patch to use rewriteLoopExitValues if the loop returns true for isRecursivelyLCSSAForm where there is only one use of the exit value aside from the use in the exit phi.

syzaara updated this revision to Diff 414174.Mar 9 2022, 11:42 AM

Updated to use rewriteLoopExitValues from LoopUtils.

Whitney added inline comments.Mar 11 2022, 2:01 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5605

make it static

5606

I would just call it ExitBB or ExitBlock.

5618

you can simply return (IVNext->getNumUses() == 2);, and you don't really need a loop.

syzaara updated this revision to Diff 418671.Mar 28 2022, 12:01 PM
Whitney accepted this revision.Apr 1 2022, 7:02 AM
This revision is now accepted and ready to land.Apr 1 2022, 7:02 AM
This revision was landed with ongoing or failed builds.Apr 8 2022, 8:18 AM
This revision was automatically updated to reflect the committed changes.
uabelho added a subscriber: uabelho.EditedMay 17 2022, 4:04 AM

Hi,

The following starts crashing with this patch:

llc -O1 -o /dev/null bbi-69670_x86.ll

I get:

llc: ../lib/Transforms/Utils/LoopUtils.cpp:1385: int llvm::rewriteLoopExitValues(llvm::Loop *, llvm::LoopInfo *, llvm::TargetLibraryInfo *, llvm::ScalarEvolution *, const llvm::TargetTransformInfo *, llvm::SCEVExpander &, llvm::DominatorTree *, llvm::ReplaceExitVal, SmallVector<llvm::WeakTrackingVH, 16> &): Assertion `EVL->contains(L) && "LCSSA breach detected!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/llc -O1 -o /dev/null bbi-69670_x86.ll
1.	Running pass 'Function Pass Manager' on module 'bbi-69670_x86.ll'.
2.	Running pass 'Loop Pass Manager' on function '@func_1'
3.	Running pass 'Loop Strength Reduction' on basic block '%for.cond6418'
 #0 0x0000000002abded3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/llc+0x2abded3)
 #1 0x0000000002abbb4e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/llc+0x2abbb4e)
 #2 0x0000000002abe256 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fcb1f00e630 __restore_rt sigaction.c:0:0
 #4 0x00007fcb1c755387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007fcb1c756a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007fcb1c74e1a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007fcb1c74e252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000000002b80be3 llvm::rewriteLoopExitValues(llvm::Loop*, llvm::LoopInfo*, llvm::TargetLibraryInfo*, llvm::ScalarEvolution*, llvm::TargetTransformInfo const*, llvm::SCEVExpander&, llvm::DominatorTree*, llvm::ReplaceExitVal, llvm::SmallVector<llvm::WeakTrackingVH, 16u>&) (../../main-github/llvm/build-all/bin/llc+0x2b80be3)
 #9 0x00000000024c0a1e ReduceLoopStrength(llvm::Loop*, llvm::IVUsers&, llvm::ScalarEvolution&, llvm::DominatorTree&, llvm::LoopInfo&, llvm::TargetTransformInfo const&, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::MemorySSA*) LoopStrengthReduce.cpp:0:0
#10 0x00000000024ec121 (anonymous namespace)::LoopStrengthReduce::runOnLoop(llvm::Loop*, llvm::LPPassManager&) LoopStrengthReduce.cpp:0:0
#11 0x0000000001b2d17b llvm::LPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x1b2d17b)
#12 0x000000000230895f llvm::FPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x230895f)
#13 0x000000000230f398 llvm::FPPassManager::runOnModule(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x230f398)
#14 0x0000000002308f2d llvm::legacy::PassManagerImpl::run(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x2308f2d)
#15 0x000000000074a073 main (../../main-github/llvm/build-all/bin/llc+0x74a073)
#16 0x00007fcb1c741555 __libc_start_main (/lib64/libc.so.6+0x22555)
#17 0x0000000000747670 _start (../../main-github/llvm/build-all/bin/llc+0x747670)
Abort

I wouldn't be surprised if it's the dead bb jumping to non-dead code that trips up something:

for.cond6403:                                     ; preds = %dead, %cont5825
  %1 = phi i32 [ %.lcssa221, %dead ], [ 0, %cont5825 ]
  br label %for.cond6418
[...]
dead:                                             ; No predecessors!
  br label %for.cond6403

I wrote
https://github.com/llvm/llvm-project/issues/55529
about the crash.

Hi,

The following starts crashing with this patch:

llc -O1 -o /dev/null bbi-69670_x86.ll

I get:

llc: ../lib/Transforms/Utils/LoopUtils.cpp:1385: int llvm::rewriteLoopExitValues(llvm::Loop *, llvm::LoopInfo *, llvm::TargetLibraryInfo *, llvm::ScalarEvolution *, const llvm::TargetTransformInfo *, llvm::SCEVExpander &, llvm::DominatorTree *, llvm::ReplaceExitVal, SmallVector<llvm::WeakTrackingVH, 16> &): Assertion `EVL->contains(L) && "LCSSA breach detected!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/llc -O1 -o /dev/null bbi-69670_x86.ll
1.	Running pass 'Function Pass Manager' on module 'bbi-69670_x86.ll'.
2.	Running pass 'Loop Pass Manager' on function '@func_1'
3.	Running pass 'Loop Strength Reduction' on basic block '%for.cond6418'
 #0 0x0000000002abded3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/llc+0x2abded3)
 #1 0x0000000002abbb4e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/llc+0x2abbb4e)
 #2 0x0000000002abe256 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fcb1f00e630 __restore_rt sigaction.c:0:0
 #4 0x00007fcb1c755387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007fcb1c756a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007fcb1c74e1a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007fcb1c74e252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000000002b80be3 llvm::rewriteLoopExitValues(llvm::Loop*, llvm::LoopInfo*, llvm::TargetLibraryInfo*, llvm::ScalarEvolution*, llvm::TargetTransformInfo const*, llvm::SCEVExpander&, llvm::DominatorTree*, llvm::ReplaceExitVal, llvm::SmallVector<llvm::WeakTrackingVH, 16u>&) (../../main-github/llvm/build-all/bin/llc+0x2b80be3)
 #9 0x00000000024c0a1e ReduceLoopStrength(llvm::Loop*, llvm::IVUsers&, llvm::ScalarEvolution&, llvm::DominatorTree&, llvm::LoopInfo&, llvm::TargetTransformInfo const&, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::MemorySSA*) LoopStrengthReduce.cpp:0:0
#10 0x00000000024ec121 (anonymous namespace)::LoopStrengthReduce::runOnLoop(llvm::Loop*, llvm::LPPassManager&) LoopStrengthReduce.cpp:0:0
#11 0x0000000001b2d17b llvm::LPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x1b2d17b)
#12 0x000000000230895f llvm::FPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x230895f)
#13 0x000000000230f398 llvm::FPPassManager::runOnModule(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x230f398)
#14 0x0000000002308f2d llvm::legacy::PassManagerImpl::run(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x2308f2d)
#15 0x000000000074a073 main (../../main-github/llvm/build-all/bin/llc+0x74a073)
#16 0x00007fcb1c741555 __libc_start_main (/lib64/libc.so.6+0x22555)
#17 0x0000000000747670 _start (../../main-github/llvm/build-all/bin/llc+0x747670)
Abort

I wouldn't be surprised if it's the dead bb jumping to non-dead code that trips up something:

for.cond6403:                                     ; preds = %dead, %cont5825
  %1 = phi i32 [ %.lcssa221, %dead ], [ 0, %cont5825 ]
  br label %for.cond6418
[...]
dead:                                             ; No predecessors!
  br label %for.cond6403

I wrote
https://github.com/llvm/llvm-project/issues/55529
about the crash.

Hi, thanks for letting me know. I will investigate further.