This is an archive of the discontinued LLVM Phabricator instance.

Differential D105723

[LSR] Do not hoist IV if it is not post increment case. PR43678
Abandoned, Public

Authored by skatkov on Jul 9 2021, 12:29 PM.

Details

Reviewers

efriedma
mkazantsev
ebrevnov
reames
qcolombet
atrick

Summary

Hoisting an IV increment in the SCEV expander may produce invalid SSA form when LSR plans to update that increment:
the other pieces of the LSR formula are expanded at the old position, so replacing an operand of the hoisted
instruction creates a use that its definition does not dominate.

Diff Detail

Unit Tests: Failed

Time	Test
3,170 ms	x64 debian > libarcher.barrier::barrier.c
3,170 ms	x64 debian > libarcher.critical::critical.c
3,040 ms	x64 debian > libarcher.critical::lock-nested.c
3,490 ms	x64 debian > libarcher.races::critical-unrelated.c
3,400 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
(14 failed in total)

Event Timeline

skatkov created this revision. Jul 9 2021, 12:29 PM

Herald added subscribers: javed.absar, hiraditya. Jul 9 2021, 12:29 PM

skatkov requested review of this revision. Jul 9 2021, 12:29 PM

Herald added a project: Restricted Project. Jul 9 2021, 12:29 PM

Harbormaster completed remote builds in B113259: Diff 357594. Jul 9 2021, 1:14 PM

post.kadirselcuk added a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt". Jul 10 2021, 5:55 PM

post.kadirselcuk added a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. Jul 10 2021, 8:06 PM

craig.topper removed a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. Jul 10 2021, 9:47 PM

Some additional information to help reviewers; in other words, why I think this is a good fix.

LSR, in LSRInstance::OptimizeLoopTermCond(), sets the IVIncInsertPos variable to the common dominator of all conditions related to post-increment IVs.
Say we have a loop exit with an IV check that is not in the latch, while the increment of this IV is located in the latch. The optimization will try to hoist the IV increment closer to the check to increase the chances of using the same physical register for the increment and the check.

Later, for each IV, LSR adjusts the insert position in LSRInstance::AdjustInsertPositionForExpand, and there the insertion position is adjusted taking into account whether the IV is post-increment or not. So for a post-increment IV, all pieces of the formula will be expanded before IVIncInsertPos.

However, if the IV is not post-incremented, the insertion position will not be adjusted.

Later, in the SCEV expander, there is another attempt to hoist the IV increment (this is the part touched by the patch), and it does not take into account whether the current IV is post-increment or not.
It is based only on the IVIncInsertPos variable, which is set once for all IVs processed in LSR.

As a result, if the current IV processed by the SCEV expander is not post-incremented (but there is another IV with post-increment), the IV increment expansion will be hoisted while the other parts of the formula are not. This causes the bug.
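
The mismatch between the two placement decisions can be modelled at block granularity. The following is only a toy sketch, not LLVM code: `Block`, `Fixup` and the helper functions are hypothetical names, and it assumes a loop with just two blocks in which the header dominates the latch.

```cpp
#include <cassert>

// Toy model (not LLVM code): a loop with two blocks, where the header
// dominates the latch. All names here are hypothetical.
enum class Block { Header, Latch };

// Block-level dominance for this two-block loop.
inline bool dominates(Block Def, Block Use) {
  return Def == Block::Header || Use == Block::Latch;
}

struct Fixup {
  bool IsPostInc;  // does LSR treat this use as post-increment?
  Block UserBlock; // block where the IV increment originally lives
};

// Models LSR's per-fixup decision: only post-increment uses have their
// formula expanded at the shared IVIncInsertPos.
inline Block formulaBlock(const Fixup &F, Block IVIncInsertPos) {
  return F.IsPostInc ? IVIncInsertPos : F.UserBlock;
}

// Models the expander's hoisting: it moves the increment to the shared
// IVIncInsertPos without asking whether the use is post-increment.
inline Block incrementBlockAfterHoist(const Fixup &, Block IVIncInsertPos) {
  return IVIncInsertPos;
}

// The rewritten increment uses the expanded formula, so the formula's block
// must dominate the increment's block.
inline bool ssaStillValid(const Fixup &F, Block IVIncInsertPos) {
  return dominates(formulaBlock(F, IVIncInsertPos),
                   incrementBlockAfterHoist(F, IVIncInsertPos));
}
```

With IVIncInsertPos in the header, `ssaStillValid` is false for a non-post-increment fixup whose increment sits in the latch, and true for a post-increment one.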

Hi,

I am not sure I understand the issue here.
Could you paste the IR with the wrong transformation (before this patch) for one of the test (e.g., the smaller one)?

Some high-level comments:

You can put more than one function in a test if you want to avoid creating two different files.
Could you rename the test files to something more human-friendly, so that we know what is tested just by looking at the filename? (You can put the PR number in a comment in the test.)

Cheers,
-Quentin

llvm/test/Transforms/LoopStrengthReduce/pr43678-2.ll
Line 4 (on Diff #357594): Could you be more explicit about what this test is exercising? E.g., what IV causes the problem and how it was wrongly transformed (or what it looks like to generate something correct.)
Line 5 (on Diff #357594): Could you add more check lines? As is, we only check that we don't crash. You can use `update_test_checks.py`
llvm/test/Transforms/LoopStrengthReduce/pr43678.ll
Line 5 (on Diff #357594): Same comments about the check lines and the explanation of what is being tested here.

In D105723#2874536, @qcolombet wrote:

Hi,

I am not sure I understand the issue here.
Could you paste the IR with the wrong transformation (before this patch) for one of the test (e.g., the smaller one)?

Hi Quentin, first of all, thank you very much for looking into this. I'll update the tests soon.

Here is the buggy output for the first case for the current state of LSR:

*** IR Dump After Loop Strength Reduction (loop-reduce) ***
; Preheader:
bb:
  %tmp = bitcast i8* null to i32*
  %tmp1 = load i32, i32* %tmp, align 4
  %tmp2 = bitcast i8* null to i32*
  %tmp3 = load i32, i32* %tmp2, align 4
  br label %bb6

; Loop:
bb6:                                              ; preds = %bb12, %bb
  %lsr.iv = phi i64 [ %lsr.iv.next, %bb12 ], [ -1, %bb ]
  %tmp8 = phi i32 [ %tmp16, %bb12 ], [ %tmp3, %bb ]
  %lsr.iv.next = add nsw i64 %lsr.iv, 1
  %tmp14 = add i32 %tmp8, %tmp1
  %tmp16 = add i32 %0, 1
  %tmp10 = icmp ult i64 %lsr.iv.next, 1048576
  br i1 %tmp10, label %bb12, label %bb11

bb12:                                             ; preds = %bb6
  %0 = add i32 %tmp1, %tmp8
  %tmp15 = select i1 false, i32 %tmp14, i32 %tmp8
  %tmp17 = fcmp olt double 0.000000e+00, 2.270000e+02
  br i1 %tmp17, label %bb6, label %bb4

; Exit blocks
bb11:                                             ; preds = %bb6
  unreachable

bb4:                                              ; preds = %bb12
  %tmp5 = sext i32 %tmp16 to i64
  unreachable
Instruction does not dominate all uses!
  %0 = add i32 %tmp1, %tmp8
  %tmp16 = add i32 %0, 1
in function test

The critical pieces:
%tmp7 is a post-increment induction variable.
%tmp8 is another IV whose increment value, %tmp16, LSR wants to update.

Due to the bug, %tmp16 is hoisted to the header because there is a post-increment IV, %tmp7, and we try to hoist all IV increments before the condition of that IV.
The generated instruction %0 is still emitted in the backedge block, because the insert position for the formula knows that %tmp8 is not a post-increment IV and so does not need to be hoisted to the header.
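
The verifier failure in the dump can be reproduced with a very small dominance check. This is a toy model under stated assumptions, not LLVM's verifier: `ToyInst`, `ToyFunction` and the helpers are hypothetical names, only bb6 and bb12 are modelled (with bb6 dominating bb12), and phi and preheader operands are left out. The "unhoisted" layout is an illustrative placement in which the increment stays in bb12 after %0.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (not LLVM code). Block 0 is bb6 (header), block 1 is bb12
// (backedge block); bb6 dominates bb12. All names are hypothetical.
struct ToyInst {
  int Block;                         // 0 = bb6, 1 = bb12
  int Index;                         // position inside the block
  std::vector<std::string> Operands; // names of modelled operands
};

using ToyFunction = std::map<std::string, ToyInst>;

// True if Def is available at the (non-phi) instruction Use.
inline bool defDominatesUse(const ToyInst &Def, const ToyInst &Use) {
  if (Def.Block == Use.Block)
    return Def.Index < Use.Index;
  return Def.Block < Use.Block; // only bb6 -> bb12 in this model
}

// Names of instructions that use a value which does not dominate them.
inline std::vector<std::string> brokenUsers(const ToyFunction &F) {
  std::vector<std::string> Broken;
  for (const auto &Entry : F) {
    for (const std::string &Op : Entry.second.Operands) {
      auto It = F.find(Op);
      if (It != F.end() && !defDominatesUse(It->second, Entry.second)) {
        Broken.push_back(Entry.first);
        break;
      }
    }
  }
  return Broken;
}

// Placement taken from the dump: %tmp16 was hoisted into bb6, %0 is in bb12.
inline ToyFunction buggyLayout() {
  return {
      {"%tmp14", ToyInst{0, 1, {}}},     // operands omitted (phi, preheader)
      {"%tmp16", ToyInst{0, 2, {"%0"}}}, // hoisted increment, uses %0
      {"%0", ToyInst{1, 0, {}}},         // expanded formula piece
  };
}

// Illustrative placement: the increment stays in bb12, after %0.
inline ToyFunction unhoistedLayout() {
  ToyFunction F = buggyLayout();
  F["%tmp16"] = ToyInst{1, 1, {"%0"}};
  return F;
}
```

In this model `brokenUsers(buggyLayout())` reports only `%tmp16`, matching the "Instruction does not dominate all uses!" message, while `brokenUsers(unhoistedLayout())` is empty.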

test update.

Harbormaster completed remote builds in B113921: Diff 358516. Jul 13 2021, 11:00 PM

I spent way too much time looking at this on Tuesday. Let me summarize my findings.

The basic issue we're running into is that while attempting to expand an LSR formula corresponding to an operand of an IV increment, SCEVExpander goes looking for a profitable expansion in terms of existing IVs. While doing that, it tries to optimize the IVs it finds by hoisting expressions involved in those IVs (to the IVIncInsertPt, but which point doesn't really matter). This movement is itself entirely legal, but it creates a very surprising effect. In this case, the IV being hoisted is exactly the one that LSR was in the middle of expanding a formula for an operand of.

The very surprising consequence is that calling LSRInstance::Expand on an operand of some instruction can return a Value which does not dominate that instruction. It dominates the location that instruction *used to be at*, but the instruction itself may have been hoisted. LSR clearly does not expect this.

SCEVExpander has a very delicate interlock with LSR. LSR puts the expander into both a "literal mode", and a special "LSR mode". LSR expects to be able to cache information about SCEVs and instructions, and have only its own rewrites invalidate them. This seems like a case where SCEVExpander is violating the implicit contract.

Note that nothing in the description above is specific to post-increment IVs. I think the fact we're happening to see this with post-inc is an artifact. Or at least, I don't currently understand why pre-incs couldn't see the same issue.

In this case, simply disabling hoisting entirely in LSRMode appears to work, and doesn't appear to cause any regressions in the test suite. On the surface, that seems like a reasonable fix.

The point I got stuck was trying to reason about how having SCEVExpander return a Value for the operand of some instruction which no longer dominates said instruction doesn't break users other than LSR as well. I'd just started thinking that through when I got distracted and haven't come to a conclusion on that question yet.

In terms of incrementalism, I'd suggest revising this patch to simply disable hoisting in LSRMode. Doing that appears to work, is at least slightly less narrow in fixing a symptom, and is a reasonable step forward. I'd LGTM that. I could also maybe be convinced to LGTM the current patch if you can make a good argument for why this is actually specific to post-inc IVs.

llvm/test/Transforms/LoopStrengthReduce/pr43678.ll
Line 1 (on Diff #357594): Please combine these two files into one, and auto-gen the result so that the output gets checked.

Hi Philip, first of all, thank you for looking into this as well and spending time digging into this issue.
I completely agree with your root cause analysis. Thank you for writing it up so clearly.

WRT post-increment: digging... My feeling is that with post-inc we can get such a case. I'll try to write a test with post-inc or prove that it is impossible.
What worries me about disabling hoisting in LSR mode in the SCEV expander is that the hoisting is done under an LSR-mode check. So this was done intentionally...

OK, after some additional investigation, it looks like my patch effectively disabled hoisting altogether.
First of all, in the places I updated we are trying to hoist the loop increment use of an IV.

And it seems a loop increment use cannot be marked as a post-inc use.
There are two places in the code where a use is marked as post-inc:

In IVUsers::AddUsersImpl, it is marked when IVUseShouldUsePostIncValue returns true, and the first check in that function is

if (L->contains(User))
  return false;

This means that for an IV increment (which is always located in the loop) it always returns false.

void IVStrideUse::transformToPostInc(const Loop *L) {
  PostIncLoops.insert(L);
}

The only use of this function is CondUse->transformToPostInc(L); in LSRInstance::OptimizeLoopTermCond, so what gets marked is the condition, which is not an increment of the IV.

So an increment of an IV cannot be marked as a post-inc use, and my patch just completely disables hoisting...
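
The argument about the first check can be made concrete with a toy model. This is a sketch, not the LLVM implementation: `ToyLoop` and `mayUsePostIncValue` are hypothetical names, the loop is reduced to a set of instruction names, and only the `L->contains(User)` check quoted above is modelled. Whatever the real function checks afterwards is replaced here by a plain "maybe".

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model (not LLVM code). A loop is just the set of names of the
// instructions it contains; all names are hypothetical.
struct ToyLoop {
  std::set<std::string> Insts;
  bool contains(const std::string &I) const { return Insts.count(I) != 0; }
};

// Models only the first check quoted above: a user inside the loop is never
// given the post-increment value. The remaining checks of the real function
// are not modelled; this stand-in answers "maybe" (true) for any user
// outside the loop.
inline bool mayUsePostIncValue(const ToyLoop &L, const std::string &User) {
  if (L.contains(User))
    return false;
  return true;
}
```

Since an IV increment is by construction an instruction inside the loop, it always takes the early `return false` path in this model, which is the point made above.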

I will try to look into the history to find the reason why this hoisting was added. If we want to complete it, we could probably check whether there is an IV increment user that LSR plans to update and forbid hoisting of such an increment. But that would add complexity and more interaction between LSR and the expander.

Adding Andrew, who seems to have fixed similar issues in 2012 (!!!),
based on commit c908b43d9fe12c633216458e992a1ee375d9910c.

An attempt to make a less intrusive fix.
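
The idea behind this revision (see Diff 359215) is that LSR names the instruction it is about to rewrite, and the expander refuses to hoist exactly that instruction. The sketch below is a simplified toy model, not the patch itself: `ToyInst` and `ToyExpander` are hypothetical, blocks are plain integers, and only the member and method names `IVIncProhibitHoisting`, `setIVIncProhibitHoisting` and `hoistIVInc` mirror the diff.

```cpp
#include <cassert>

// Toy model (not LLVM code); only the member and method names mirror the
// diff, everything else is hypothetical.
struct ToyInst {
  int Block; // block the instruction currently lives in
};

struct ToyExpander {
  const ToyInst *IVIncProhibitHoisting = nullptr;

  // Called with an instruction before expansion and with no argument
  // afterwards to reset the expander.
  void setIVIncProhibitHoisting(const ToyInst *I = nullptr) {
    IVIncProhibitHoisting = I;
  }

  // Refuses to move the protected instruction; otherwise moves IncV into
  // InsertBlock. Returns whether the move was allowed.
  bool hoistIVInc(ToyInst &IncV, int InsertBlock) {
    if (&IncV == IVIncProhibitHoisting)
      return false;
    IncV.Block = InsertBlock;
    return true;
  }
};
```

While the instruction is registered, `hoistIVInc` leaves it in place; once the expander is reset the same call moves it, which is the behaviour the patch wants outside of the fixup being rewritten.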

Harbormaster completed remote builds in B114418: Diff 359215. Jul 15 2021, 10:45 PM

I've posted https://reviews.llvm.org/D106178 as an alternative. Unless someone strongly objects, I plan to simply remove all the buggy and untested code.

efriedma removed a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt". Jul 17 2021, 3:02 PM

In D105723#2884101, @reames wrote:

I've posted https://reviews.llvm.org/D106178 as an alternative. Unless someone strongly objects, I plan to simply remove all the buggy and untested code.

I'm fine with that, but my limited knowledge of LSR makes me uncomfortable approving the complete removal of the optimization.
But in general I agree with all your notes here and in D106178.

Hi all,

Given Philip (@reames)'s findings, it seems reasonable to me to go with what Philip is suggesting: removing the code that violates the contract between LSR and SCEVExpander.

Cheers,
-Quentin

In D105723#2894209, @qcolombet wrote:

Hi all,

Given Philip (@reames)'s findings, it seems reasonable to me to go with what Philip is suggesting: removing the code that violates the contract between LSR and SCEVExpander.

Cheers,
-Quentin

Given that, can you approve Philip's patch?

Given that, can you approve Philip's patch?

Done!

reames mentioned this in rG982da7a20c40: [SCEVExpander] Stop hoisting IR when reusing phis. Aug 17 2021, 9:38 AM

Abandon in favor of https://reviews.llvm.org/D106178

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h	10 lines
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp	5 lines
llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp	5 lines
llvm/test/Transforms/LoopStrengthReduce/wrong-hoisting-iv.ll	247 lines

Diff 359215

llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h

@@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
   /// When this is non-null, addrecs expanded in the loop it indicates should
   /// be inserted with increments at IVIncInsertPos.
   const Loop *IVIncInsertLoop;

   /// When expanding addrecs in the IVIncInsertLoop loop, insert the IV
   /// increment at this position.
   Instruction *IVIncInsertPos;

+  // If this is non-null, do not hoist this instruction while hoisting IV
+  // increment.
+  Instruction *IVIncProhibitHoisting;
+
   /// Phis that complete an IV chain. Reuse
   DenseSet<AssertingVH<PHINode>> ChainedPhis;

   /// When true, SCEVExpander tries to expand expressions in "canonical" form.
   /// When false, expressions are expanded in a more literal form.
   ///
   /// In "canonical" form addrecs are expanded as arithmetic based on a
   /// canonical induction variable. Note that CanonicalMode doesn't guarantee
@@ #endif
   /// Set the current IV increment loop and position.
   void setIVIncInsertPos(const Loop *L, Instruction *Pos) {
     assert(!CanonicalMode &&
            "IV increment positions are not supported in CanonicalMode");
     IVIncInsertLoop = L;
     IVIncInsertPos = Pos;
   }

+  void setIVIncProhibitHoisting(Instruction *I = nullptr) {
+    assert(!CanonicalMode &&
+           "IV increment hoisting is not supported in CanonicalMode");
+    IVIncProhibitHoisting = I;
+  }
+
   /// Enable post-inc expansion for addrecs referring to the given
   /// loops. Post-inc expansion is only supported in non-canonical mode.
   void setPostInc(const PostIncLoopSet &L) {
     assert(!CanonicalMode &&
            "Post-inc expansion is not supported in CanonicalMode");
     PostIncLoops = L;
   }

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ Value *LSRInstance::Expand(const LSRUse &LU, const LSRFixup &LF,
   // which will dominate the result.
   IP = AdjustInsertPositionForExpand(IP, LF, LU, Rewriter);
   Rewriter.setInsertPoint(&*IP);

   // Inform the Rewriter if we have a post-increment use, so that it can
   // perform an advantageous expansion.
   Rewriter.setPostInc(LF.PostIncLoops);

+  // Inform the Rewriter that we will need to update the instruction and it
+  // should not hoist it.
+  Rewriter.setIVIncProhibitHoisting(LF.UserInst);
+
   // This is the type that the user actually needs.
   Type *OpTy = LF.OperandValToReplace->getType();
   // This will be the type that we'll initially expand to.
   Type *Ty = F.getType();
   if (!Ty)
     // No type known; just expand directly to the ultimate type.
     Ty = OpTy;
   else if (SE.getEffectiveSCEVType(Ty) == SE.getEffectiveSCEVType(OpTy))
@@ Value *LSRInstance::Expand(const LSRUse &LU, const LSRFixup &LF,
   // Emit instructions summing all the operands.
   const SCEV *FullS = Ops.empty() ?
                       SE.getConstant(IntTy, 0) :
                       SE.getAddExpr(Ops);
   Value *FullV = Rewriter.expandCodeFor(FullS, Ty);

   // We're done expanding now, so reset the rewriter.
   Rewriter.clearPostInc();
+  Rewriter.setIVIncProhibitHoisting();

   // An ICmpZero Formula represents an ICmp which we're handling as a
   // comparison against zero. Now that we've expanded an expression for that
   // form, update the ICmp's other operand.
   if (LU.Kind == LSRUse::ICmpZero) {
     ICmpInst *CI = cast<ICmpInst>(LF.UserInst);
     if (auto *OperandIsInstr = dyn_cast<Instruction>(CI->getOperand(1)))
       DeadInsts.emplace_back(OperandIsInstr);

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp

@@ for (auto *InsertPtGuard : InsertPointGuards)
     if (InsertPtGuard->GetInsertPoint() == It)
       InsertPtGuard->SetInsertPoint(NewInsertPt);
 }

 /// hoistStep - Attempt to hoist a simple IV increment above InsertPos to make
 /// it available to other uses in this loop. Recursively hoist any operands,
 /// until we reach a value that dominates InsertPos.
 bool SCEVExpander::hoistIVInc(Instruction *IncV, Instruction *InsertPos) {
+  if (IncV == IVIncProhibitHoisting)
+    return false;
+
   if (SE.DT.dominates(IncV, InsertPos))
     return true;

   // InsertPos must itself dominate IncV so that IncV's new position satisfies
   // its existing users.
   if (isa<PHINode>(InsertPos) ||
       !SE.DT.dominates(InsertPos->getParent(), IncV->getParent()))
     return false;
@@ for (PHINode &PN : L->getHeader()->phis()) {
         IncV = TempIncV;
         TruncTy = SE.getEffectiveSCEVType(Normalized->getType());
       }
     }

     if (AddRecPhiMatch) {
       // Potentially, move the increment. We have made sure in
       // isExpandedAddRecExprPHI or hoistIVInc that this is possible.
-      if (L == IVIncInsertLoop)
+      if (L == IVIncInsertLoop && IncV != IVIncProhibitHoisting)
         hoistBeforePos(&SE.DT, IncV, IVIncInsertPos, AddRecPhiMatch);

       // Ok, the add recurrence looks usable.
       // Remember this PHI, even in post-inc mode.
       InsertedValues.insert(AddRecPhiMatch);
       // Remember the increment.
       rememberInstruction(IncV);
       // Those values were not actually inserted but re-used.

llvm/test/Transforms/LoopStrengthReduce/wrong-hoisting-iv.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -loop-reduce < %s | FileCheck %s

				; These are regression tests for PR43678.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				; Test checks that LSR does not hoist increment of %val9 while expanding the other pieces of formula
				; to original place in backedge causing incorrect SSA form.
				define void @test1() {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[VAL:%.*]] = load i32, i32 addrspace(3)* undef, align 4
				; CHECK-NEXT: [[VAL1:%.*]] = add i32 undef, 12
				; CHECK-NEXT: [[VAL2:%.*]] = trunc i64 undef to i32
				; CHECK-NEXT: [[VAL3:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[VAL4:%.*]] = sub i32 [[VAL]], [[VAL3]]
				; CHECK-NEXT: [[VAL5:%.*]] = ashr i32 undef, undef
				; CHECK-NEXT: [[VAL6:%.*]] = sub i32 [[VAL4]], [[VAL5]]
				; CHECK-NEXT: [[TMP0:%.*]] = mul i32 [[VAL]], 7
				; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[VAL3]], 7
				; CHECK-NEXT: [[TMP2:%.*]] = sub i32 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = mul i32 [[VAL5]], 7
				; CHECK-NEXT: [[TMP4:%.*]] = sub i32 [[TMP2]], [[TMP3]]
				; CHECK-NEXT: [[TMP5:%.*]] = shl i32 [[VAL6]], 3
				; CHECK-NEXT: br label [[BB7:%.*]]
				; CHECK: bb7:
				; CHECK-NEXT: [[LSR_IV1:%.*]] = phi i32 [ [[LSR_IV_NEXT2:%.*]], [[BB32:%.*]] ], [ 0, [[BB:%.*]] ]
				; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[BB32]] ], [ -8, [[BB]] ]
				; CHECK-NEXT: [[LSR_IV_NEXT]] = add nsw i64 [[LSR_IV]], 8
				; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i32 [[LSR_IV1]], [[TMP5]]
				; CHECK-NEXT: [[VAL10:%.*]] = icmp ult i64 [[LSR_IV_NEXT]], 65536
				; CHECK-NEXT: br i1 [[VAL10]], label [[BB12:%.*]], label [[BB11:%.*]]
				; CHECK: bb11:
				; CHECK-NEXT: unreachable
				; CHECK: bb12:
				; CHECK-NEXT: [[VAL14:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL14]], label [[BB17:%.*]], label [[BB12_BB15SPLITSPLITSPLITSPLITSPLIT_CRIT_EDGE:%.*]]
				; CHECK: bb15splitsplitsplitsplitsplitsplit:
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLITSPLITSPLIT:%.*]]
				; CHECK: bb12.bb15splitsplitsplitsplitsplit_crit_edge:
				; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[VAL6]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLITSPLITSPLIT]]
				; CHECK: bb15splitsplitsplitsplitsplit:
				; CHECK-NEXT: [[VAL16_PH_PH_PH_PH_PH:%.*]] = phi i32 [ [[TMP6]], [[BB12_BB15SPLITSPLITSPLITSPLITSPLIT_CRIT_EDGE]] ], [ [[VAL35:%.*]], [[BB15SPLITSPLITSPLITSPLITSPLITSPLIT:%.*]] ]
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLITSPLIT:%.*]]
				; CHECK: bb17.bb15splitsplitsplitsplit_crit_edge:
				; CHECK-NEXT: [[TMP7:%.*]] = shl i32 [[VAL]], 1
				; CHECK-NEXT: [[TMP8:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[TMP9:%.*]] = shl i32 [[TMP8]], 1
				; CHECK-NEXT: [[TMP10:%.*]] = sub i32 [[TMP7]], [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = shl i32 [[VAL5]], 1
				; CHECK-NEXT: [[TMP12:%.*]] = sub i32 [[TMP10]], [[TMP11]]
				; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[TMP12]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLITSPLIT]]
				; CHECK: bb15splitsplitsplitsplit:
				; CHECK-NEXT: [[VAL16_PH_PH_PH_PH:%.*]] = phi i32 [ [[TMP13]], [[BB17_BB15SPLITSPLITSPLITSPLIT_CRIT_EDGE:%.*]] ], [ [[VAL16_PH_PH_PH_PH_PH]], [[BB15SPLITSPLITSPLITSPLITSPLIT]] ]
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLIT:%.*]]
				; CHECK: bb20.bb15splitsplitsplit_crit_edge:
				; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[VAL]], 3
				; CHECK-NEXT: [[TMP15:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[TMP16:%.*]] = mul i32 [[TMP15]], 3
				; CHECK-NEXT: [[TMP17:%.*]] = sub i32 [[TMP14]], [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = mul i32 [[VAL5]], 3
				; CHECK-NEXT: [[TMP19:%.*]] = sub i32 [[TMP17]], [[TMP18]]
				; CHECK-NEXT: [[TMP20:%.*]] = add i32 [[TMP19]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15SPLITSPLITSPLIT]]
				; CHECK: bb15splitsplitsplit:
				; CHECK-NEXT: [[VAL16_PH_PH_PH:%.*]] = phi i32 [ [[TMP20]], [[BB20_BB15SPLITSPLITSPLIT_CRIT_EDGE:%.*]] ], [ [[VAL16_PH_PH_PH_PH]], [[BB15SPLITSPLITSPLITSPLIT]] ]
				; CHECK-NEXT: br label [[BB15SPLITSPLIT:%.*]]
				; CHECK: bb23.bb15splitsplit_crit_edge:
				; CHECK-NEXT: [[TMP21:%.*]] = shl i32 [[VAL]], 2
				; CHECK-NEXT: [[TMP22:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[TMP23:%.*]] = shl i32 [[TMP22]], 2
				; CHECK-NEXT: [[TMP24:%.*]] = sub i32 [[TMP21]], [[TMP23]]
				; CHECK-NEXT: [[TMP25:%.*]] = shl i32 [[VAL5]], 2
				; CHECK-NEXT: [[TMP26:%.*]] = sub i32 [[TMP24]], [[TMP25]]
				; CHECK-NEXT: [[TMP27:%.*]] = add i32 [[TMP26]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15SPLITSPLIT]]
				; CHECK: bb15splitsplit:
				; CHECK-NEXT: [[VAL16_PH_PH:%.*]] = phi i32 [ [[TMP27]], [[BB23_BB15SPLITSPLIT_CRIT_EDGE:%.*]] ], [ [[VAL16_PH_PH_PH]], [[BB15SPLITSPLITSPLIT]] ]
				; CHECK-NEXT: br label [[BB15SPLIT:%.*]]
				; CHECK: bb26.bb15split_crit_edge:
				; CHECK-NEXT: [[TMP28:%.*]] = mul i32 [[VAL]], 5
				; CHECK-NEXT: [[TMP29:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[TMP30:%.*]] = mul i32 [[TMP29]], 5
				; CHECK-NEXT: [[TMP31:%.*]] = sub i32 [[TMP28]], [[TMP30]]
				; CHECK-NEXT: [[TMP32:%.*]] = mul i32 [[VAL5]], 5
				; CHECK-NEXT: [[TMP33:%.*]] = sub i32 [[TMP31]], [[TMP32]]
				; CHECK-NEXT: [[TMP34:%.*]] = add i32 [[TMP33]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15SPLIT]]
				; CHECK: bb15split:
				; CHECK-NEXT: [[VAL16_PH:%.*]] = phi i32 [ [[TMP34]], [[BB26_BB15SPLIT_CRIT_EDGE:%.*]] ], [ [[VAL16_PH_PH]], [[BB15SPLITSPLIT]] ]
				; CHECK-NEXT: br label [[BB15:%.*]]
				; CHECK: bb29.bb15_crit_edge:
				; CHECK-NEXT: [[TMP35:%.*]] = mul i32 [[VAL]], 6
				; CHECK-NEXT: [[TMP36:%.*]] = mul i32 [[VAL1]], [[VAL2]]
				; CHECK-NEXT: [[TMP37:%.*]] = mul i32 [[TMP36]], 6
				; CHECK-NEXT: [[TMP38:%.*]] = sub i32 [[TMP35]], [[TMP37]]
				; CHECK-NEXT: [[TMP39:%.*]] = mul i32 [[VAL5]], 6
				; CHECK-NEXT: [[TMP40:%.*]] = sub i32 [[TMP38]], [[TMP39]]
				; CHECK-NEXT: [[TMP41:%.*]] = add i32 [[TMP40]], [[LSR_IV1]]
				; CHECK-NEXT: br label [[BB15]]
				; CHECK: bb15:
				; CHECK-NEXT: [[VAL16:%.*]] = phi i32 [ [[TMP41]], [[BB29_BB15_CRIT_EDGE:%.*]] ], [ [[VAL16_PH]], [[BB15SPLIT]] ]
				; CHECK-NEXT: call void @widget() [ "deopt"(i32 [[VAL16]], i32 3, i32 [[VAL]]) ]
				; CHECK-NEXT: unreachable
				; CHECK: bb17:
				; CHECK-NEXT: [[VAL19:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL19]], label [[BB20:%.*]], label [[BB17_BB15SPLITSPLITSPLITSPLIT_CRIT_EDGE]]
				; CHECK: bb20:
				; CHECK-NEXT: [[VAL22:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL22]], label [[BB23:%.*]], label [[BB20_BB15SPLITSPLITSPLIT_CRIT_EDGE]]
				; CHECK: bb23:
				; CHECK-NEXT: [[VAL25:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL25]], label [[BB26:%.*]], label [[BB23_BB15SPLITSPLIT_CRIT_EDGE]]
				; CHECK: bb26:
				; CHECK-NEXT: [[VAL28:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL28]], label [[BB29:%.*]], label [[BB26_BB15SPLIT_CRIT_EDGE]]
				; CHECK: bb29:
				; CHECK-NEXT: [[VAL31:%.*]] = icmp slt i32 undef, undef
				; CHECK-NEXT: br i1 [[VAL31]], label [[BB32]], label [[BB29_BB15_CRIT_EDGE]]
				; CHECK: bb32:
				; CHECK-NEXT: [[TMP42:%.*]] = add i32 [[TMP4]], [[LSR_IV1]]
				; CHECK-NEXT: [[VAL35]] = add i32 [[TMP42]], [[VAL6]]
				; CHECK-NEXT: br i1 false, label [[BB7]], label [[BB15SPLITSPLITSPLITSPLITSPLITSPLIT]]
				;
				bb:
				%val = load i32, i32 addrspace(3)* undef, align 4
				%val1 = add i32 undef, 12
				%val2 = trunc i64 undef to i32
				%val3 = mul i32 %val1, %val2
				%val4 = sub i32 %val, %val3
				%val5 = ashr i32 undef, undef
				%val6 = sub i32 %val4, %val5
				br label %bb7

				bb7: ; preds = %bb32, %bb
				%val8 = phi i64 [ 0, %bb ], [ %val34, %bb32 ]
				%val9 = phi i32 [ 0, %bb ], [ %val35, %bb32 ]
				%val10 = icmp ult i64 %val8, 65536
				br i1 %val10, label %bb12, label %bb11

				bb11: ; preds = %bb7
				unreachable

				bb12: ; preds = %bb7
				%val13 = add i32 %val9, %val6
				%val14 = icmp slt i32 undef, undef
				br i1 %val14, label %bb17, label %bb15

				bb15: ; preds = %bb32, %bb29, %bb26, %bb23, %bb20, %bb17, %bb12
				%val16 = phi i32 [ %val35, %bb32 ], [ %val30, %bb29 ], [ %val27, %bb26 ], [ %val24, %bb23 ], [ %val21, %bb20 ], [ %val18, %bb17 ], [ %val13, %bb12 ]
				call void @widget() [ "deopt"(i32 %val16, i32 3, i32 %val) ]
				unreachable

				bb17: ; preds = %bb12
				%val18 = add i32 %val13, %val6
				%val19 = icmp slt i32 undef, undef
				br i1 %val19, label %bb20, label %bb15

				bb20: ; preds = %bb17
				%val21 = add i32 %val18, %val6
				%val22 = icmp slt i32 undef, undef
				br i1 %val22, label %bb23, label %bb15

				bb23: ; preds = %bb20
				%val24 = add i32 %val21, %val6
				%val25 = icmp slt i32 undef, undef
				br i1 %val25, label %bb26, label %bb15

				bb26: ; preds = %bb23
				%val27 = add i32 %val24, %val6
				%val28 = icmp slt i32 undef, undef
				br i1 %val28, label %bb29, label %bb15

				bb29: ; preds = %bb26
				%val30 = add i32 %val27, %val6
				%val31 = icmp slt i32 undef, undef
				br i1 %val31, label %bb32, label %bb15

				bb32: ; preds = %bb29
				%val33 = add i32 %val30, %val6
				%val34 = add nuw nsw i64 %val8, 8
				%val35 = add i32 %val33, %val6
				br i1 false, label %bb7, label %bb15
				}

				; This test checks that LSR does not hoist the increment of %val8 while expanding the other pieces of the formula
				; at the original place in the backedge, which would result in incorrect SSA form.
				define void @test2() {
				; CHECK-LABEL: @test2(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[VAL:%.*]] = bitcast i8* null to i32*
				; CHECK-NEXT: [[VAL1:%.*]] = load i32, i32* [[VAL]], align 4
				; CHECK-NEXT: [[VAL2:%.*]] = bitcast i8* null to i32*
				; CHECK-NEXT: [[VAL3:%.*]] = load i32, i32* [[VAL2]], align 4
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[VAL1]], 1
				; CHECK-NEXT: br label [[BB6:%.*]]
				; CHECK: bb4:
				; CHECK-NEXT: [[VAL5:%.*]] = sext i32 [[VAL16:%.*]] to i64
				; CHECK-NEXT: unreachable
				; CHECK: bb6:
				; CHECK-NEXT: [[LSR_IV1:%.*]] = phi i32 [ [[LSR_IV_NEXT2:%.*]], [[BB12:%.*]] ], [ [[VAL3]], [[BB:%.*]] ]
				; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT:%.*]], [[BB12]] ], [ -1, [[BB]] ]
				; CHECK-NEXT: [[LSR_IV_NEXT]] = add nsw i64 [[LSR_IV]], 1
				; CHECK-NEXT: [[LSR_IV_NEXT2]] = add i32 [[LSR_IV1]], [[TMP0]]
				; CHECK-NEXT: [[VAL10:%.*]] = icmp ult i64 [[LSR_IV_NEXT]], 1048576
				; CHECK-NEXT: br i1 [[VAL10]], label [[BB12]], label [[BB11:%.*]]
				; CHECK: bb11:
				; CHECK-NEXT: unreachable
				; CHECK: bb12:
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[VAL1]], [[LSR_IV1]]
				; CHECK-NEXT: [[VAL16]] = add i32 [[TMP1]], 1
				; CHECK-NEXT: [[VAL17:%.*]] = fcmp olt double 0.000000e+00, 2.270000e+02
				; CHECK-NEXT: br i1 [[VAL17]], label [[BB6]], label [[BB4:%.*]]
				;
				bb:
				%val = bitcast i8* null to i32*
				%val1 = load i32, i32* %val, align 4
				%val2 = bitcast i8* null to i32*
				%val3 = load i32, i32* %val2, align 4
				br label %bb6

				bb4: ; preds = %bb12
				%val5 = sext i32 %val16 to i64
				unreachable

				bb6: ; preds = %bb12, %bb
				%val7 = phi i64 [ %val9, %bb12 ], [ 0, %bb ]
				%val8 = phi i32 [ %val16, %bb12 ], [ %val3, %bb ]
				%val9 = add nuw nsw i64 %val7, 1
				%val10 = icmp ult i64 %val7, 1048576
				br i1 %val10, label %bb12, label %bb11

				bb11: ; preds = %bb6
				unreachable

				bb12: ; preds = %bb6
				%val13 = select i1 false, i32 0, i32 %val8
				%val14 = add i32 %val8, %val1
				%val15 = select i1 false, i32 %val14, i32 %val13
				%val16 = add i32 %val14, 1
				%val17 = fcmp olt double 0.000000e+00, 2.270000e+02
				br i1 %val17, label %bb6, label %bb4
				}

				declare void @widget()

Revision Contents

Diff 359215

llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp

llvm/test/Transforms/LoopStrengthReduce/wrong-hoisting-iv.ll