Loop strength reduce sometimes optimizes away all uses of an induction variable from a loop but leaves the IV increments. When the only remaining use of the IV is the PHI in the exit block, this patch will use SCEV to replace the exit block PHI with the final value of the IV to skip the updates in each loop iteration.
Details
Diff Detail
Unit Tests
Event Timeline
This is normally a IndVar's problem, but i guess LSR runs so late that there is no IndVar's scheduled after it?
Do you have a phase-ordering test showing this missing simplification?
Are you referring to IndVarSimplify? Yes, it looks LSR is run after IndVarSimplify.
I thought to implement it in LSR because LSR is the one that was responsible for optimizing away the uses of the IV but leaving behind the single use of increments in the loop.
Here is an example testcase:
int do_work(int * a, int m, int n, int x, int* b) { int i,j; for( j = 0 ; j < m ; j++ ) { for( i = 0 ; i < n ; i++ ) { a[i] += x*b[i]; } } return x; }
Before LSR, we have:
for.body3.us.prol: ; preds = %for.body3.us.prol.preheader, %for.body3.us.prol %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body3.us.prol ], [ %indvars.iv.ph, %for.body3.us.prol.preheader ] %prol.iter = phi i64 [ %prol.iter.next, %for.body3.us.prol ], [ 0, %for.body3.us.prol.preheader ] %arrayidx.us.prol = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.prol <------ %30 = load i32, i32* %arrayidx.us.prol, align 4, !tbaa !3 %mul.us.prol = mul nsw i32 %30, %x %arrayidx5.us.prol = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.prol <------ %31 = load i32, i32* %arrayidx5.us.prol, align 4, !tbaa !3 %add.us.prol = add nsw i32 %31, %mul.us.prol store i32 %add.us.prol, i32* %arrayidx5.us.prol, align 4, !tbaa !3 %indvars.iv.next.prol = add nuw nsw i64 %indvars.iv.prol, 1 <------ %prol.iter.next = add i64 %prol.iter, 1 %prol.iter.cmp.not = icmp eq i64 %prol.iter.next, %xtraiter br i1 %prol.iter.cmp.not, label %for.body3.us.prol.loopexit.loopexit, label %for.body3.us.prol, !llvm.loop !15
After LSR, all uses of %indvars.iv.prol are removed:
for.body3.us.prol: ; preds = %for.body3.us.prol.preheader, %for.body3.us.prol %lsr.iv120 = phi i64 [ %30, %for.body3.us.prol.preheader ], [ %lsr.iv.next, %for.body3.us.prol ] %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body3.us.prol ], [ %indvars.iv.ph, %for.body3.us.prol.preheader ] %prol.iter = phi i64 [ %prol.iter.next, %for.body3.us.prol ], [ 0, %for.body3.us.prol.preheader ] %uglygep125 = getelementptr i8, i8* %b124, i64 %lsr.iv120 <------ %uglygep125126 = bitcast i8* %uglygep125 to i32* %31 = load i32, i32* %uglygep125126, align 4, !tbaa !3 %mul.us.prol = mul nsw i32 %31, %x %uglygep122 = getelementptr i8, i8* %a121, i64 %lsr.iv120 <------ %uglygep122123 = bitcast i8* %uglygep122 to i32* %32 = load i32, i32* %uglygep122123, align 4, !tbaa !3 %add.us.prol = add nsw i32 %32, %mul.us.prol store i32 %add.us.prol, i32* %uglygep122123, align 4, !tbaa !3 %indvars.iv.next.prol = add nuw nsw i64 %indvars.iv.prol, 1 %prol.iter.next = add i64 %prol.iter, 1 %lsr.iv.next = add nuw nsw i64 %lsr.iv120, 4 %prol.iter.cmp.not = icmp eq i64 %prol.iter.next, %xtraiter br i1 %prol.iter.cmp.not, label %for.body3.us.prol.loopexit.loopexit, label %for.body3.us.prol, !llvm.loop !15
But there is no follow on to remove unused %indvars.iv.next.prol within the loop.
I verified that adding IndVarSimplifyPass after LSR will also remove the unused IV from the example testcase.
@nikic Hi Nikita, could you help me with assessing the compile time impact of adding the IndVarSimplifyPass after LoopStrengthReduce?
@syzaara Running IndVars after LSR doesn't really make sense. IndVars is a canonicalization pass, while LSR is a target-specific optimization pass. Running IndVars after LSR would break the very specific IV structures generated by LSR, or at least that's my understanding.
Is it possible to abstract the portion of the code in IndVarSimplify to simplify this code pattern, and call that function instead of creating your own ReplaceExitPHIsWithFinalVal?
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp | ||
---|---|---|
5621 | continue can change to break as all Phis in the same block should have the same number of incoming values. | |
5641 | In what situation, that IV only has a single use (IV->getNumUses() != 1) and one of the incoming values of IV is IVNext, but U != IVNext? | |
llvm/test/Transforms/LoopStrengthReduce/remove_scev_indvars.ll | ||
23 | Do we want to also handle the case %iv.prol.lcssa = phi i64 [ %indvars.iv.prol, %for.body ]? |
IndVarSimply calls rewriteLoopExitValues from LoopUtils to do this optimization. I tried to call the same function in LoopStrengthReduce, however rewriteLoopExitValues asserts that it requires: L->isRecursivelyLCSSAForm(*DT, *LI).
When calling rewriteLoopExitValues from LoopStrengthReduce, this assert is not met.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp | ||
---|---|---|
5641 | There seems to be cases like this in the lit tests. Here an example from llvm/test/CodeGen/PowerPC/common-chain.ll Exit block includes: %add23.3.lcssa = phi i64 [ %add23.3, %for.body ] So %add23.3 is what we are looking to replace. for.body include these instructions: %inc.addr.050 = phi i64 [ %inc4, %for.body.preheader.new ], [ %add23.3, %for.body ] %add23 = add nsw i64 %inc.addr.050, %inc4 %add23.3 = add i64 %add23.2, %inc4 The IV here is: %inc.addr.050 and it has only one use of: |
IndVarSimply calls rewriteLoopExitValues from LoopUtils to do this optimization. I tried to call the same function in LoopStrengthReduce, however rewriteLoopExitValues asserts that it requires: L->isRecursivelyLCSSAForm(*DT, *LI).
When calling rewriteLoopExitValues from LoopStrengthReduce, this assert is not met.
I am confused, is the assert from rewriteLoopExitValues met or not when calling from LoopStrengthReduce? Does LoopStrengthReduce currently rely on LCSSA form? Is it only become not LCSSA after LoopStrengthReduce changes the code?
Can we first invoke formLCSSA to form LCSSA before calling rewriteLoopExitValues?
There were cases in lit and lnt where we would fail due to this assert. I have modified the patch to use rewriteLoopExitValues if the loop returns true for isRecursivelyLCSSAForm where there is only one use of the exit value aside from the use in the exit phi.
Hi,
The following starts crashing with this patch:
llc -O1 -o /dev/null bbi-69670_x86.ll
I get:
llc: ../lib/Transforms/Utils/LoopUtils.cpp:1385: int llvm::rewriteLoopExitValues(llvm::Loop *, llvm::LoopInfo *, llvm::TargetLibraryInfo *, llvm::ScalarEvolution *, const llvm::TargetTransformInfo *, llvm::SCEVExpander &, llvm::DominatorTree *, llvm::ReplaceExitVal, SmallVector<llvm::WeakTrackingVH, 16> &): Assertion `EVL->contains(L) && "LCSSA breach detected!"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: ../../main-github/llvm/build-all/bin/llc -O1 -o /dev/null bbi-69670_x86.ll 1. Running pass 'Function Pass Manager' on module 'bbi-69670_x86.ll'. 2. Running pass 'Loop Pass Manager' on function '@func_1' 3. Running pass 'Loop Strength Reduction' on basic block '%for.cond6418' #0 0x0000000002abded3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/llc+0x2abded3) #1 0x0000000002abbb4e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/llc+0x2abbb4e) #2 0x0000000002abe256 SignalHandler(int) Signals.cpp:0:0 #3 0x00007fcb1f00e630 __restore_rt sigaction.c:0:0 #4 0x00007fcb1c755387 raise (/lib64/libc.so.6+0x36387) #5 0x00007fcb1c756a78 abort (/lib64/libc.so.6+0x37a78) #6 0x00007fcb1c74e1a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6) #7 0x00007fcb1c74e252 (/lib64/libc.so.6+0x2f252) #8 0x0000000002b80be3 llvm::rewriteLoopExitValues(llvm::Loop*, llvm::LoopInfo*, llvm::TargetLibraryInfo*, llvm::ScalarEvolution*, llvm::TargetTransformInfo const*, llvm::SCEVExpander&, llvm::DominatorTree*, llvm::ReplaceExitVal, llvm::SmallVector<llvm::WeakTrackingVH, 16u>&) (../../main-github/llvm/build-all/bin/llc+0x2b80be3) #9 0x00000000024c0a1e ReduceLoopStrength(llvm::Loop*, llvm::IVUsers&, llvm::ScalarEvolution&, llvm::DominatorTree&, llvm::LoopInfo&, llvm::TargetTransformInfo const&, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::MemorySSA*) LoopStrengthReduce.cpp:0:0 #10 0x00000000024ec121 (anonymous namespace)::LoopStrengthReduce::runOnLoop(llvm::Loop*, llvm::LPPassManager&) LoopStrengthReduce.cpp:0:0 #11 0x0000000001b2d17b llvm::LPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x1b2d17b) #12 0x000000000230895f llvm::FPPassManager::runOnFunction(llvm::Function&) (../../main-github/llvm/build-all/bin/llc+0x230895f) #13 0x000000000230f398 llvm::FPPassManager::runOnModule(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x230f398) #14 0x0000000002308f2d llvm::legacy::PassManagerImpl::run(llvm::Module&) (../../main-github/llvm/build-all/bin/llc+0x2308f2d) #15 0x000000000074a073 main (../../main-github/llvm/build-all/bin/llc+0x74a073) #16 0x00007fcb1c741555 __libc_start_main (/lib64/libc.so.6+0x22555) #17 0x0000000000747670 _start (../../main-github/llvm/build-all/bin/llc+0x747670) Abort
I wouldn't be surprised if it's the dead bb jumping to non-dead code that trips up something:
for.cond6403: ; preds = %dead, %cont5825 %1 = phi i32 [ %.lcssa221, %dead ], [ 0, %cont5825 ] br label %for.cond6418 [...] dead: ; No predecessors! br label %for.cond6403
I wrote
https://github.com/llvm/llvm-project/issues/55529
about the crash.
make it static