Commits

rG33b2c88fa822: [LoopFlatten] Widen IV, support ZExt.

Summary

I disabled the widening in fa5cb4b because it ran into an assert. I had forgotten that an extend can also be a zero-extend; this patch adds support for that.

Event Timeline

SjoerdMeijer created this revision. Nov 18 2020, 2:07 AM

Herald added a project: Restricted Project. Nov 18 2020, 2:07 AM

Herald added a subscriber: hiraditya.

SjoerdMeijer requested review of this revision. Nov 18 2020, 2:07 AM

dmgreen added inline comments. Nov 18 2020, 11:06 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
342	dyn_cast -> isa
343	dyn_cast -> cast
552	Some of the formatting is still a little off here.
563	I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ? Can you add a test where it (the o*I+i value) is not zext or sext?

SjoerdMeijer added inline comments. Nov 18 2020, 1:20 PM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	> I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ?

After widening we have, e.g., this pattern:

%indvar = phi i64 [ %indvar.next, %for.body3.us ], [ 0, %for.cond1.preheader.us ]
%3 = trunc i64 %indvar to i32
%add.us = add i32 %3, %mul.us
%idxprom.us = zext i32 %add.us to i64

The linear IV user in this example is:

%add.us = add i32 %3, %mul.us

We don't want to replace this `%add.us` value, which is an i32 value, because it would be replaced by OuterInductionPhi, which is an i64 value after widening. This was the assertion I was talking about. After widening, the value that we should be replacing is the zext user, which is `%idxprom.us` in this case. After widening, we have this IV -> Trunc -> LinearIV -> Ext pattern, which is what we are matching here. I will add a comment to clarify this.

> Can you add a test where it (the o*I+i value) is not zext or sext?

I think these cases are present in the original test test/Transforms/LoopFlatten/loop-flatten.ll.

Fixed casts, formatting and added some comments.

dmgreen added inline comments. Nov 18 2020, 2:11 PM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	Hmm. But, don't we start with something that looks like:

for i32 outer = 0..n
for i32 inner = 0..m
 use(outer * m + inner)

We widen the IV's so they become:

for i64 outer = 0..zext(n)
for i64 inner = 0..zext(m)
 use(trunc(outer) * m + trunc(inner))

And we want to replace that with a single:

for i64 outer = 0..zext(n)*zext(m)
 use(trunc(outer))

We have not proved that the original did not overflow, so if it does we need to use the original truncated i32 value, not the i64 version of it directly.
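The overflow argument above can be checked for a single iteration outside LLVM. The following is a minimal sketch in plain C++, assuming unsigned 32-bit and 64-bit integers as stand-ins for i32 and i64; the helper names (`narrowLinearIV`, `flatWideIV`, `trunc32`, `checkIteration`) are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Value the original i32 loop nest passes to use(): computed in 32 bits,
// so it wraps modulo 2^32 (unsigned arithmetic wraps by definition).
uint32_t narrowLinearIV(uint32_t outer, uint32_t inner, uint32_t m) {
  return outer * m + inner;
}

// Value of the flattened i64 induction variable at the same iteration:
// outer * m + inner computed in 64 bits, which cannot wrap for 32-bit inputs.
uint64_t flatWideIV(uint32_t outer, uint32_t inner, uint32_t m) {
  return static_cast<uint64_t>(outer) * m + inner;
}

// Model of `trunc i64 ... to i32`: keep the low 32 bits.
uint32_t trunc32(uint64_t v) { return static_cast<uint32_t>(v); }

// The truncated wide IV always agrees with the original narrow value,
// whether or not the narrow computation wrapped.
void checkIteration(uint32_t outer, uint32_t inner, uint32_t m) {
  assert(trunc32(flatWideIV(outer, inner, m)) ==
         narrowLinearIV(outer, inner, m));
}
```

With `outer = 69999`, `inner = 69999` and `m = 70000`, the wide value exceeds 2^32, so it differs from the narrow one unless it is truncated first. This is the case in which handing the i64 value to the user directly would change behaviour.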

xbolva00 added a subscriber: xbolva00. Nov 18 2020, 2:14 PM

xbolva00 added inline comments.

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
553	isa?

SjoerdMeijer added inline comments. Nov 18 2020, 3:13 PM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	Do you mean overflow in the original `outer * m + inner` expression? Is that relevant? I will check when I am back at my desk, but I think after widening we have use(outer), without the trunc.

dmgreen added inline comments. Nov 19 2020, 1:05 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	Hmm. But Use in this case is an i32. We need to do something to outer (an i64) to get it back to an i32. This patch seems to be assuming that Use will be either a sext or a zext to the widened type. I think if it's not, it will still hit the assert (and if it is, it would use the wrong value once `outer * m + inner` doesn't fit into the smaller type).

SjoerdMeijer added inline comments. Nov 19 2020, 2:00 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	This pass is very restrictive in what it currently supports; it is pattern matching very specific patterns. If there are other users this pass will bail, for example:

Found use of inner induction variable:
Did not match expected pattern, bailing

or

Found use of outer induction variable
Did not match expected pattern, bailing

The assumptions are covered with checks. And when it comes to replacing values, we are safe because we are only replacing values in the loop update which have been widened.

Thanks, fixed isa, looks like I keep forgetting about that.

dmgreen added inline comments. Nov 19 2020, 4:33 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
563	What happens in the zext test if it is changed from:

%arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us

to

%arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

Also, consider a simpler example where we have:

for i8 outer = 0..n
for i8 inner = 0..m
 use(i32 zext(outer * m + inner))

If we widen the IVs to i32, for example, the call to use() should still get a value in [0..255], even if n*m is higher than that (and so outer * m + inner overflows). It would wrap in the original, and still needs to wrap in the final version. For i32->i64 the values will be a lot higher, but the same principle applies.

I'm guessing that if it was a trunc(outer * m + inner) instead, the sext(trunc(..)) in the original case would not naturally simplify nicely?
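The i8 example is small enough to run exhaustively. The sketch below is plain C++, not LLVM code; it assumes `uint8_t` models the i8 induction variables and `uint32_t` the widened ones, and the function names `originalNestI8` and `flattenedI8` are made up for illustration. It records every value passed to `use()` by the original nest and by a flattened loop, with and without the trunc.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original nest: i8 induction variables, use(i32 zext(outer * m + inner)).
std::vector<uint32_t> originalNestI8(uint8_t n, uint8_t m) {
  std::vector<uint32_t> seen;
  for (uint8_t outer = 0; outer < n; ++outer)
    for (uint8_t inner = 0; inner < m; ++inner) {
      // The cast models i8 arithmetic: the result wraps modulo 256.
      uint8_t linear = static_cast<uint8_t>(outer * m + inner);
      seen.push_back(linear); // zext i8 -> i32
    }
  return seen;
}

// Flattened loop over a single i32 induction variable. With truncFirst the
// use sees zext(trunc(flat)); without it the use sees the wide IV directly.
std::vector<uint32_t> flattenedI8(uint8_t n, uint8_t m, bool truncFirst) {
  std::vector<uint32_t> seen;
  const uint32_t tripCount = static_cast<uint32_t>(n) * m;
  for (uint32_t flat = 0; flat < tripCount; ++flat) {
    if (truncFirst)
      seen.push_back(static_cast<uint8_t>(flat)); // trunc to i8, then zext
    else
      seen.push_back(flat); // wide value used as-is
  }
  return seen;
}
```

For `n = m = 20` the nest runs 400 iterations, so the i8 expression wraps. The truncating variant reproduces the original sequence exactly, while the variant without the trunc passes values above 255 from iteration 256 onwards.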

SjoerdMeijer added inline comments. Nov 19 2020, 5:26 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp

563

> What happens in the zext test if it is changed from:
>
> %arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us
>
> to
>
> %arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

We will end up with:

%indvar = phi i64 [ 0, %for.cond1.preheader.us ]
%3 = trunc i64 %indvar to i32
%add.us = add i32 %3, %mul.us
%idxprom.us = zext i32 %add.us to i64
%arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

if this is the snippet you're interested in, because:

 Replacing:   %idxprom.us = zext i32 %add.us to i64
with:        %indvar1 = phi i64 [ %indvar.next2, %for.cond1.for.inc7_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]

Which leaves %idxprom.us dead.

> Also, consider a simpler example where we have:
>
> for i8 outer = 0..n
> for i8 inner = 0..m
>  use(i32 zext(outer * m + inner))

Like I said, the pass is very restrictive, and we match very specific patterns. The ZExts are in the way here: we don't recognise the increment, and we bail.

Sorry Dave, think I took a wrong exit somewhere, but hopefully back on track now.
I took your example and have added that as a test case. That was actually also asserting because of values with different types, another confirmation that something was wrong. Anyway, when we replace values, we now replace that with trunc(outerPhi) if the phi has been widened, as you suggested earlier I think. Let me know what you think while I double check and run some more code.

dmgreen added inline comments. Nov 20 2020, 1:18 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
551	Is it better to introduce a new trunc of FI.OuterInductionPHI to the correct bitwidth? I'm a little worried that this is just finding _some_ trunc, not necessarily one that it should. It may introduce more truncs, but they should get cleared up. It would also prevent using something that did not dominate. Maybe it is fine like this, if it is known that the widening will have introduced a trunc. Can it at least check that the type of the trunc is correct? And add a comment saying it should have been added by widening.
566	Finding the `Value *OuterValue = FI.OuterInductionPHI; if (...` can be outside of the loop.

SjoerdMeijer added inline comments. Nov 20 2020, 8:01 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
551	Since we promote the IV, there has to be a trunc back to its users. What I see is that there is 1 trunc instruction, and then different zext instructions of the IV value that go to different users if they have different types. This means there is 1 trunc instruction, but you're right that this is not the whole story and something is missing at the moment. So, we will need a generic way to map the different values to the different users, if there are any. I think, for now, I will add a check a bit earlier in the pipeline to bail if we find more than 1 zext user of that trunc. That won't be optimal, but I am keen to start somewhere with this. And of course this patch in its current shape still runs into an assert; I just tried it out with different trunc users, slightly modifying your example case:

void test(char n, char m) {
  for(char i = 0; i < n; i++)
    for(char j = 0; j < m; j++) {
      char x = i*m+j;
      use_32(x);
      use_16(x);
    }
}

Insert a trunc of the outer loop IV for each use, and use that to replace values. Added various tests.
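The idea of giving each use its own trunc of the wide IV can be modelled on the two-user example from the previous comment. This is only a sketch in plain C++ under stated assumptions: `char` is treated as a signed 8-bit value, the wide IV is 32 bits, and `sext8`, `originalTwoUsers` and `flattenedTwoUsers` are hypothetical names. It does not reproduce the code that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sign-extend an 8-bit pattern, written out by hand so the sketch does not
// rely on implementation-defined narrowing conversions.
int32_t sext8(uint8_t v) {
  return v >= 128 ? static_cast<int32_t>(v) - 256 : static_cast<int32_t>(v);
}

// One entry per iteration: (argument of use_32, argument of use_16).
using Uses = std::vector<std::pair<int32_t, int16_t>>;

// Original nest: char x = i*m+j; use_32(x); use_16(x);
Uses originalTwoUsers(uint8_t n, uint8_t m) {
  Uses seen;
  for (uint8_t i = 0; i < n; ++i)
    for (uint8_t j = 0; j < m; ++j) {
      uint8_t x = static_cast<uint8_t>(i * m + j); // 8-bit pattern of i*m+j
      seen.emplace_back(sext8(x), static_cast<int16_t>(sext8(x)));
    }
  return seen;
}

// Flattened loop: each use truncates the wide IV itself, then extends the
// truncated value to the width that particular user expects.
Uses flattenedTwoUsers(uint8_t n, uint8_t m) {
  Uses seen;
  const int32_t tripCount = static_cast<int32_t>(n) * m;
  for (int32_t flat = 0; flat < tripCount; ++flat) {
    uint8_t forUse32 = static_cast<uint8_t>(flat); // trunc for use_32
    uint8_t forUse16 = static_cast<uint8_t>(flat); // separate trunc for use_16
    seen.emplace_back(sext8(forUse32), static_cast<int16_t>(sext8(forUse16)));
  }
  return seen;
}
```

With `n = m = 12` the 8-bit value wraps into negative numbers after iteration 127, and both versions agree on every iteration. The duplicate truncs are redundant here; the review comment above expects such extra truncs to be cleaned up by later passes.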

Thanks. LGTM

This revision is now accepted and ready to land. Nov 23 2020, 12:22 AM

Closed by commit rG33b2c88fa822: [LoopFlatten] Widen IV, support ZExt. (authored by SjoerdMeijer). Nov 23 2020, 12:58 AM

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG33b2c88fa822: [LoopFlatten] Widen IV, support ZExt.

Diff 306010

llvm/lib/Transforms/Scalar/LoopFlatten.cpp

struct FlattenInfo {
Value *OuterLimit = nullptr;		Value *OuterLimit = nullptr;
BinaryOperator *InnerIncrement = nullptr;		BinaryOperator *InnerIncrement = nullptr;
BinaryOperator *OuterIncrement = nullptr;		BinaryOperator *OuterIncrement = nullptr;
BranchInst *InnerBranch = nullptr;		BranchInst *InnerBranch = nullptr;
BranchInst *OuterBranch = nullptr;		BranchInst *OuterBranch = nullptr;
SmallPtrSet<Value *, 4> LinearIVUses;		SmallPtrSet<Value *, 4> LinearIVUses;
SmallPtrSet<PHINode *, 4> InnerPHIsToTransform;		SmallPtrSet<PHINode *, 4> InnerPHIsToTransform;

		// Whether this holds the flatten info before or after widening.
		bool Widened = false;

FlattenInfo(Loop *OL, Loop *IL) : OuterLoop(OL), InnerLoop(IL) {};		FlattenInfo(Loop *OL, Loop *IL) : OuterLoop(OL), InnerLoop(IL) {};
};		};

// Finds the induction variable, increment and limit for a simple loop that we		// Finds the induction variable, increment and limit for a simple loop that we
// can flatten.		// can flatten.
static bool findLoopComponents(		static bool findLoopComponents(
Loop *L, SmallPtrSetImpl<Instruction *> &IterationInstructions,		Loop *L, SmallPtrSetImpl<Instruction *> &IterationInstructions,
PHINode *&InductionPHI, Value *&Limit, BinaryOperator *&Increment,		PHINode *&InductionPHI, Value *&Limit, BinaryOperator *&Increment,
static bool checkIVUsers(struct FlattenInfo &FI) {
//		//
// (OuterPHI * InnerLimit) + InnerPHI		// (OuterPHI * InnerLimit) + InnerPHI
//		//
// Any uses of the induction variables not matching that pattern would		// Any uses of the induction variables not matching that pattern would
// require a div/mod to reconstruct in the flattened loop, so the		// require a div/mod to reconstruct in the flattened loop, so the
// transformation wouldn't be profitable.		// transformation wouldn't be profitable.

Value *InnerLimit = FI.InnerLimit;		Value *InnerLimit = FI.InnerLimit;
if (auto *I = dyn_cast<SExtInst>(InnerLimit))		if (FI.Widened &&
InnerLimit = I->getOperand(0);		(dyn_cast<SExtInst>(InnerLimit) || dyn_cast<ZExtInst>(InnerLimit)))
		InnerLimit = dyn_cast<Instruction>(InnerLimit)->getOperand(0);

// Check that all uses of the inner loop's induction variable match the		// Check that all uses of the inner loop's induction variable match the
// expected pattern, recording the uses of the outer IV.		// expected pattern, recording the uses of the outer IV.
SmallPtrSet<Value *, 4> ValidOuterPHIUses;		SmallPtrSet<Value *, 4> ValidOuterPHIUses;
for (User *U : FI.InnerInductionPHI->users()) {		for (User *U : FI.InnerInductionPHI->users()) {
if (U == FI.InnerIncrement)		if (U == FI.InnerIncrement)
continue;		continue;

static bool DoFlattenLoopPair(struct FlattenInfo &FI, DominatorTree *DT,

// Replace the inner loop backedge with an unconditional branch to the exit.		// Replace the inner loop backedge with an unconditional branch to the exit.
BasicBlock *InnerExitBlock = FI.InnerLoop->getExitBlock();		BasicBlock *InnerExitBlock = FI.InnerLoop->getExitBlock();
BasicBlock *InnerExitingBlock = FI.InnerLoop->getExitingBlock();		BasicBlock *InnerExitingBlock = FI.InnerLoop->getExitingBlock();
InnerExitingBlock->getTerminator()->eraseFromParent();		InnerExitingBlock->getTerminator()->eraseFromParent();
BranchInst::Create(InnerExitBlock, InnerExitingBlock);		BranchInst::Create(InnerExitBlock, InnerExitingBlock);
DT->deleteEdge(InnerExitingBlock, FI.InnerLoop->getHeader());		DT->deleteEdge(InnerExitingBlock, FI.InnerLoop->getHeader());

auto HasSExtUser = [] (Value *V) -> Value * {		auto HasSZExtUser = [] (Value *V) -> Value * {
for (User *U : V->users() )		for (User *U : V->users() )
if (dyn_cast<SExtInst>(U))		if (dyn_cast<SExtInst>(U) || dyn_cast<ZExtInst>(U))
return U;		return U;
return nullptr;		return nullptr;
};		};

// Replace all uses of the polynomial calculated from the two induction		// Replace all uses of the polynomial calculated from the two induction
// variables with the one new one.		// variables with the one new one.
for (Value *V : FI.LinearIVUses) {		for (Value *V : FI.LinearIVUses) {
// If the induction variable has been widened, look through the SExt.		// If the induction variable has been widened, look through the SExt/ZExt.
if (Value *U = HasSExtUser(V))		if (FI.Widened)
		if (Value *U = HasSZExtUser(V))
V = U;		V = U;
V->replaceAllUsesWith(FI.OuterInductionPHI);		V->replaceAllUsesWith(FI.OuterInductionPHI);
}		}

// Tell LoopInfo, SCEV and the pass manager that the inner loop has been		// Tell LoopInfo, SCEV and the pass manager that the inner loop has been
// deleted, and any information that have about the outer loop invalidated.		// deleted, and any information that have about the outer loop invalidated.
SE->forgetLoop(FI.OuterLoop);		SE->forgetLoop(FI.OuterLoop);
SE->forgetLoop(FI.InnerLoop);		SE->forgetLoop(FI.InnerLoop);
LI->erase(FI.InnerLoop);		LI->erase(FI.InnerLoop);
return true;		return true;
}		}
PHINode *WidePhi = createWideIV(WideIVs[i], LI, SE, Rewriter, DT, DeadInsts,
true /* UsePostIncrementRanges */);		true /* UsePostIncrementRanges */);
if (!WidePhi)		if (!WidePhi)
return false;		return false;
LLVM_DEBUG(dbgs() << "Created wide phi: "; WidePhi->dump());		LLVM_DEBUG(dbgs() << "Created wide phi: "; WidePhi->dump());
LLVM_DEBUG(dbgs() << "Deleting old phi: "; WideIVs[i].NarrowIV->dump());		LLVM_DEBUG(dbgs() << "Deleting old phi: "; WideIVs[i].NarrowIV->dump());
RecursivelyDeleteDeadPHINode(WideIVs[i].NarrowIV);		RecursivelyDeleteDeadPHINode(WideIVs[i].NarrowIV);
}		}
// After widening, rediscover all the loop components.		// After widening, rediscover all the loop components.
		assert(Widened && "Widenend IV expected");
		FI.Widened = true;
return CanFlattenLoopPair(FI, DT, LI, SE, AC, TTI);		return CanFlattenLoopPair(FI, DT, LI, SE, AC, TTI);
}		}

static bool FlattenLoopPair(struct FlattenInfo &FI, DominatorTree *DT,		static bool FlattenLoopPair(struct FlattenInfo &FI, DominatorTree *DT,
LoopInfo *LI, ScalarEvolution *SE,		LoopInfo *LI, ScalarEvolution *SE,
AssumptionCache *AC,		AssumptionCache *AC,
const TargetTransformInfo *TTI) {		const TargetTransformInfo *TTI) {
LLVM_DEBUG(		LLVM_DEBUG(

llvm/test/Transforms/LoopFlatten/widen-iv.ll

for.cond1.for.cond.cleanup3_crit_edge.us:
%inc6.us = add nuw nsw i32 %i.018.us, 1		%inc6.us = add nuw nsw i32 %i.018.us, 1
%cmp.us = icmp slt i32 %inc6.us, %N		%cmp.us = icmp slt i32 %inc6.us, %N
br i1 %cmp.us, label %for.cond1.preheader.us, label %for.cond.cleanup		br i1 %cmp.us, label %for.cond1.preheader.us, label %for.cond.cleanup

for.cond.cleanup:		for.cond.cleanup:
ret void		ret void
}		}

		define void @zext(i32 %N, i16* nocapture %A, i16 %val) {
		; CHECK-LABEL: @zext(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CMP20_NOT:%.*]] = icmp eq i32 [[N:%.*]], 0
		; CHECK-NEXT: br i1 [[CMP20_NOT]], label [[FOR_END9:%.*]], label [[FOR_COND1_PREHEADER_US_PREHEADER:%.*]]
		; CHECK: for.cond1.preheader.us.preheader:
		; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[N]] to i64
		; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[N]] to i64
		; CHECK-NEXT: [[FLATTEN_TRIPCOUNT:%.*]] = mul i64 [[TMP0]], [[TMP1]]
		; CHECK-NEXT: br label [[FOR_COND1_PREHEADER_US:%.*]]
		; CHECK: for.cond1.preheader.us:
		; CHECK-NEXT: [[INDVAR1:%.*]] = phi i64 [ [[INDVAR_NEXT2:%.*]], [[FOR_COND1_FOR_INC7_CRIT_EDGE_US:%.*]] ], [ 0, [[FOR_COND1_PREHEADER_US_PREHEADER]] ]
		; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVAR1]] to i32
		; CHECK-NEXT: [[MUL_US:%.*]] = mul i32 [[TMP2]], [[N]]
		; CHECK-NEXT: br label [[FOR_BODY3_US:%.*]]
		; CHECK: for.body3.us:
		; CHECK-NEXT: [[INDVAR:%.*]] = phi i64 [ 0, [[FOR_COND1_PREHEADER_US]] ]
		; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[INDVAR]] to i32
		; CHECK-NEXT: [[ADD_US:%.*]] = add i32 [[TMP3]], [[MUL_US]]
		; CHECK-NEXT: [[IDXPROM_US:%.*]] = zext i32 [[ADD_US]] to i64
		; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 [[INDVAR1]]
		; CHECK-NEXT: [[TMP4:%.*]] = load i16, i16* [[ARRAYIDX_US]], align 2
		; CHECK-NEXT: [[ADD5_US:%.*]] = add i16 [[TMP4]], [[VAL:%.*]]
		; CHECK-NEXT: store i16 [[ADD5_US]], i16* [[ARRAYIDX_US]], align 2
		; CHECK-NEXT: [[INDVAR_NEXT:%.*]] = add i64 [[INDVAR]], 1
		; CHECK-NEXT: [[CMP2_US:%.*]] = icmp ult i64 [[INDVAR_NEXT]], [[TMP0]]
		; CHECK-NEXT: br label [[FOR_COND1_FOR_INC7_CRIT_EDGE_US]]
		; CHECK: for.cond1.for.inc7_crit_edge.us:
		; CHECK-NEXT: [[INDVAR_NEXT2]] = add i64 [[INDVAR1]], 1
		; CHECK-NEXT: [[CMP_US:%.*]] = icmp ult i64 [[INDVAR_NEXT2]], [[FLATTEN_TRIPCOUNT]]
		; CHECK-NEXT: br i1 [[CMP_US]], label [[FOR_COND1_PREHEADER_US]], label [[FOR_END9_LOOPEXIT:%.*]]
		; CHECK: for.end9.loopexit:
		; CHECK-NEXT: br label [[FOR_END9]]
		; CHECK: for.end9:
		; CHECK-NEXT: ret void
		;
		; DONTWIDEN-LABEL: @zext(
		; DONTWIDEN-NEXT: entry:
		; DONTWIDEN-NEXT: [[CMP20_NOT:%.*]] = icmp eq i32 [[N:%.*]], 0
		; DONTWIDEN-NEXT: br i1 [[CMP20_NOT]], label [[FOR_END9:%.*]], label [[FOR_COND1_PREHEADER_US_PREHEADER:%.*]]
		; DONTWIDEN: for.cond1.preheader.us.preheader:
		; DONTWIDEN-NEXT: br label [[FOR_COND1_PREHEADER_US:%.*]]
		; DONTWIDEN: for.cond1.preheader.us:
		; DONTWIDEN-NEXT: [[I_021_US:%.*]] = phi i32 [ [[INC8_US:%.*]], [[FOR_COND1_FOR_INC7_CRIT_EDGE_US:%.*]] ], [ 0, [[FOR_COND1_PREHEADER_US_PREHEADER]] ]
		; DONTWIDEN-NEXT: [[MUL_US:%.*]] = mul i32 [[I_021_US]], [[N]]
		; DONTWIDEN-NEXT: br label [[FOR_BODY3_US:%.*]]
		; DONTWIDEN: for.body3.us:
		; DONTWIDEN-NEXT: [[J_019_US:%.*]] = phi i32 [ 0, [[FOR_COND1_PREHEADER_US]] ], [ [[INC_US:%.*]], [[FOR_BODY3_US]] ]
		; DONTWIDEN-NEXT: [[ADD_US:%.*]] = add i32 [[J_019_US]], [[MUL_US]]
		; DONTWIDEN-NEXT: [[IDXPROM_US:%.*]] = zext i32 [[ADD_US]] to i64
		; DONTWIDEN-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i16, i16* [[A:%.*]], i64 [[IDXPROM_US]]
		; DONTWIDEN-NEXT: [[TMP0:%.*]] = load i16, i16* [[ARRAYIDX_US]], align 2
		; DONTWIDEN-NEXT: [[ADD5_US:%.*]] = add i16 [[TMP0]], [[VAL:%.*]]
		; DONTWIDEN-NEXT: store i16 [[ADD5_US]], i16* [[ARRAYIDX_US]], align 2
		; DONTWIDEN-NEXT: [[INC_US]] = add nuw i32 [[J_019_US]], 1
		; DONTWIDEN-NEXT: [[CMP2_US:%.*]] = icmp ult i32 [[INC_US]], [[N]]
		; DONTWIDEN-NEXT: br i1 [[CMP2_US]], label [[FOR_BODY3_US]], label [[FOR_COND1_FOR_INC7_CRIT_EDGE_US]]
		; DONTWIDEN: for.cond1.for.inc7_crit_edge.us:
		; DONTWIDEN-NEXT: [[INC8_US]] = add i32 [[I_021_US]], 1
		; DONTWIDEN-NEXT: [[CMP_US:%.*]] = icmp ult i32 [[INC8_US]], [[N]]
		; DONTWIDEN-NEXT: br i1 [[CMP_US]], label [[FOR_COND1_PREHEADER_US]], label [[FOR_END9_LOOPEXIT:%.*]]
		; DONTWIDEN: for.end9.loopexit:
		; DONTWIDEN-NEXT: br label [[FOR_END9]]
		; DONTWIDEN: for.end9:
		; DONTWIDEN-NEXT: ret void
		;
		entry:
		%cmp20.not = icmp eq i32 %N, 0
		br i1 %cmp20.not, label %for.end9, label %for.cond1.preheader.us.preheader

		for.cond1.preheader.us.preheader:
		br label %for.cond1.preheader.us

		for.cond1.preheader.us:
		%i.021.us = phi i32 [ %inc8.us, %for.cond1.for.inc7_crit_edge.us ], [ 0, %for.cond1.preheader.us.preheader ]
		%mul.us = mul i32 %i.021.us, %N
		br label %for.body3.us

		for.body3.us:
		%j.019.us = phi i32 [ 0, %for.cond1.preheader.us ], [ %inc.us, %for.body3.us ]
		%add.us = add i32 %j.019.us, %mul.us
		%idxprom.us = zext i32 %add.us to i64
		%arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us
		%0 = load i16, i16* %arrayidx.us, align 2
		%add5.us = add i16 %0, %val
		store i16 %add5.us, i16* %arrayidx.us, align 2
		%inc.us = add nuw i32 %j.019.us, 1
		%cmp2.us = icmp ult i32 %inc.us, %N
		br i1 %cmp2.us, label %for.body3.us, label %for.cond1.for.inc7_crit_edge.us

		for.cond1.for.inc7_crit_edge.us:
		%inc8.us = add i32 %i.021.us, 1
		%cmp.us = icmp ult i32 %inc8.us, %N
		br i1 %cmp.us, label %for.cond1.preheader.us, label %for.end9.loopexit

		for.end9.loopexit:
		br label %for.end9

		for.end9:
		ret void
		}



declare dso_local void @f(i32* %0) local_unnamed_addr #1		declare dso_local void @f(i32* %0) local_unnamed_addr #1

This is an archive of the discontinued LLVM Phabricator instance.

[LoopFlatten] Widen IV, cont'd
ClosedPublic
