This is an archive of the discontinued LLVM Phabricator instance.

Fix incorrect expand of non-linear addrecs
Abandoned, Public

Authored by apilipenko on May 28 2019, 5:27 PM.

Details

Reviewers

reames
wristow
loladiro
sanjoy
kparzysz
rtereshin

Summary

We have a problem in SCEVExpander around the expansion of non-linear addrecs. When expanding an addrec, SCEVExpander emits an expression that computes the addrec at a given iteration number (using SCEVAddRecExpr::evaluateAtIteration). As the iteration number it uses the canonical induction variable of the loop. The canonical IV here is an IV that starts at 0 and is incremented by 1 on every iteration, {0,+,1}. When the loop doesn't have a canonical IV, SCEVExpander inserts one, using the type of the addrec being expanded as the type of the canonical IV. This is not always correct.

If the addrec to expand is a linear addrec {start,+,step}, the expression to compute the value at the given iteration i is:

(start + i * step) mod MaxType,

where MaxType is the number of distinct values in the type of the addrec (2^W for a W-bit type, e.g. 256 for i8).

A canonical IV of the same type as the addrec corresponds to (i mod MaxType). Using this IV as the iteration number i works fine for linear addrecs:

(start + i * step) mod MaxType = ((start mod MaxType) + (i mod MaxType) * (step mod MaxType)) mod MaxType

This is because mod commutes with + and *, so we can sink mod (truncation) down to the operands.
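The linear case can be checked with a small standalone sketch. It is not code from the patch; the function names `linearWide` and `linearNarrow` are hypothetical, and the i8 addrec is modelled with `std::uint8_t` while the "unbounded" iteration number is modelled with a 64-bit integer.

```cpp
#include <cstdint>

// Reference value of an i8 addrec {start,+,step} at iteration i:
// computed in 64 bits and truncated to 8 bits once, at the end.
inline std::uint8_t linearWide(std::uint64_t start, std::uint64_t step,
                               std::uint64_t i) {
  return static_cast<std::uint8_t>(start + i * step);
}

// Same value computed from operands that were truncated to i8 first,
// which models feeding an i8 canonical IV into the expression.
inline std::uint8_t linearNarrow(std::uint64_t start, std::uint64_t step,
                                 std::uint64_t i) {
  const std::uint8_t s = static_cast<std::uint8_t>(start);
  const std::uint8_t t = static_cast<std::uint8_t>(step);
  const std::uint8_t n = static_cast<std::uint8_t>(i);
  // The operands promote to int here, so the sum cannot overflow
  // before the final truncation back to 8 bits.
  return static_cast<std::uint8_t>(s + n * t);
}
```

Both functions agree for every input, because truncation distributes over addition and multiplication; for example, `linearWide(3, 5, 1000)` and `linearNarrow(3, 5, 1000)` both yield 139.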

But it's not correct for non-linear addrecs because the expression to compute the value on the given iteration involves division (see BinomialCoefficient function in ScalarEvolution.cpp).

For example, look at the scev-expand-canonical-iv-type.ll test in this patch. In this example, loop-vectorize expands the i8 {0,+,2,+,1} addrec in a loop without a canonical IV. It inserts a canonical IV of type i8 and uses it in the expression that computes the value of the addrec at the given iteration.

The expression is:

((i * (i - 1)) /u 2 + 2 * i) mod 256

Using an i8 canonical IV effectively turns it into:

((i mod 256) * ((i mod 256) - 1)) /u 2 + 2 * (i mod 256)

This is not equal to the original expression, because mod and division (truncation and lshr) don't commute. In this case we need to use a canonical IV of a wider type. For the exact SCEVs of this example, see [1] below.
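The mismatch can be reproduced outside of LLVM with a minimal sketch. It is an illustration only: `exactValue` and `valueWithIV` are hypothetical names, and the 9-bit mask models the i9 multiplication that appears in the debug output [1].

```cpp
#include <cstdint>

// Reference value of the i8 addrec {0,+,2,+,1} at iteration i, i.e.
// 2*C(i,1) + 1*C(i,2) reduced mod 256. Computed in 64 bits, which is
// exact for the small iteration numbers used here.
inline std::uint8_t exactValue(std::uint64_t i) {
  return static_cast<std::uint8_t>(i * (i - 1) / 2 + 2 * i);
}

// The same closed form, but the iteration number is first truncated to
// ivBits bits, modelling a canonical IV of that width. The product is
// reduced to 9 bits before the division, as in the emitted expression.
inline std::uint8_t valueWithIV(std::uint64_t i, unsigned ivBits) {
  const std::uint64_t iv = i & ((std::uint64_t{1} << ivBits) - 1);
  const std::uint64_t prod = (iv * (iv - 1)) & 0x1FF;  // i9 multiply
  return static_cast<std::uint8_t>(prod / 2 + 2 * iv);
}
```

At iteration 256 the reference value is 128, while `valueWithIV(256, 8)` gives 0 because the i8 IV has already wrapped to 0. With a 9-bit IV, `valueWithIV(256, 9)` gives 128 again.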

SCEVExpander needs to be aware that expanding an addrec might require a canonical IV of a type wider than the addrec. This patch fixes the issue by introducing the SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration method and using it in SCEVExpander.

This patch can be split into three distinct changes. I plan to split them when integrating, but I am posting the initial review with all three together for better context.

1. Prepare SCEVExpander::visitAddRecExpr to use a CanonicalIV wider than the addrec.
2. Introduce SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration and MinIterationWidthForBinomialCoefficient, with an assertion in BinomialCoefficient.
3. Use SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration in SCEVExpander::visitAddRecExpr to compute the type of the canonical IV.
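The width computation in this patch can be sketched standalone as follows. This mirrors the W + T rule used by MinIterationWidthForBinomialCoefficient in the diff, but it works on plain integers rather than LLVM types; the names `factorsOfTwo` and `minIterationWidth` are hypothetical, and `order` stands for getNumOperands() - 1.

```cpp
#include <cassert>

// Number of times 2 divides k (k must be non-zero).
inline unsigned factorsOfTwo(unsigned k) {
  assert(k != 0 && "factorsOfTwo is undefined for 0");
  unsigned n = 0;
  while (k % 2 == 0) {
    k /= 2;
    ++n;
  }
  return n;
}

// Minimum bit width of the iteration number needed to evaluate an addrec
// of the given order whose result type has resultBits bits. Affine
// addrecs (order 1) need no extra bits; otherwise one extra bit is needed
// per factor of two in order!, since those bits are shifted out.
inline unsigned minIterationWidth(unsigned order, unsigned resultBits) {
  if (order <= 1)
    return resultBits;
  unsigned t = 1;  // accounts for the factor of two in 2!
  for (unsigned i = 3; i <= order; ++i)
    t += factorsOfTwo(i);
  return resultBits + t;
}
```

For the i8 {0,+,2,+,1} addrec from the summary, `minIterationWidth(2, 8)` yields 9, which matches the i9 canonical IV in the debug output after the fix.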

This patch is currently going through our internal fuzzing and performance testing.

[1] Debug output from evaluateAtIteration before the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i8 %indvar
evaluateAtIteration result = ((trunc i9 (((zext i8 (-1 + %indvar) to i9) * (zext i8 %indvar to i9)) /u 2) to i8) + (2 * %indvar))

After the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i9 %indvar
evaluateAtIteration result = ((trunc i9 (((-1 + %indvar) * %indvar) /u 2) to i8) + (2 * (trunc i9 %indvar to i8)))

Event Timeline

apilipenko created this revision. May 28 2019, 5:27 PM

Herald added a project: Restricted Project. May 28 2019, 5:27 PM

Herald added a subscriber: javed.absar.

apilipenko edited the summary of this revision. May 28 2019, 5:27 PM

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe along side your assert to see if we're missing any other cases - and then spend more time considering your full fix.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

p.s. Please go ahead and land your preparatory changes (1). The context was helpful, but we should get those out of the way to simplify the rest of the review. (This is mostly just factoring out CanonicalIVType right?)

lib/Analysis/ScalarEvolutionExpander.cpp
1486	Removing this entirely seems like overkill. Maybe either add a restriction to a) affine addrecs or b) the bitwidth of the existing canonical IV is sufficient?
1568	Not sure you actually want getUnknown? Did you mean getSCEV?

In D62563#1521477, @reames wrote:

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe along side your assert to see if we're missing any other cases - and then spend more time considering your full fix.

Yes, evaluateAtIteration can return SCEVCouldNotCompute, but it doesn't seem like SCEVExpander is ready for that. What we can do instead is to use expandAddRecExprLiterally for non-affine addrecs.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

evaluateAtIteration is already emitting i9 types for high-order binomial coefficients. See the debug output in the review description. But I guess you are right that we don't want to introduce canonical IVs of non-legal types. If we need to bail out for some types, the bailout would be expandAddRecExprLiterally.

In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.
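The policy discussed in this exchange (round the required width up to a legal type, and bail out when none fits) could look roughly like the sketch below. This is an assumption-laden illustration, not code from the patch: the function name `roundUpToLegalWidth` is hypothetical, and the fixed list of legal widths is assumed here, whereas in LLVM legality depends on the target.

```cpp
// Returns the smallest assumed-legal integer width that can hold minBits
// bits, or 0 if none can. A result of 0 signals that the caller should
// bail out (per the discussion, by expanding the addrec literally).
inline unsigned roundUpToLegalWidth(unsigned minBits) {
  // Assumed set of legal widths; real targets may differ.
  const unsigned legalWidths[] = {8, 16, 32, 64};
  for (unsigned w : legalWidths)
    if (minBits <= w)
      return w;
  return 0;
}
```

Under these assumptions a required width of 9 bits rounds up to 16, and a required width of 65 bits yields 0, i.e. the bail-out case.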

In D62563#1522063, @apilipenko wrote:

...
In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.

Artur and I spoke offline. The workaround suggestion I made doesn't appear to really work well, but Artur is going to move forward with the expandAddRecExprLiterally idea when we'd overflow the IV type. This seems to be a practical fix for non-affine IVs.

Fall back to non-canonical mode for non-affine addrecs.

apilipenko updated this revision to Diff 202768. Jun 3 2019, 11:52 AM

reames requested changes to this revision. Jun 3 2019, 2:04 PM

reames added inline comments.

include/llvm/Analysis/ScalarEvolutionExpressions.h
347	Does this mean that evaluateAtIteration may return an incorrect result? If so, I'd very much like to see an assert which trips on that usage.
lib/Analysis/ScalarEvolutionExpander.cpp
1486	How about only falling back to literal expansion for non-affine addrecs which actually need the wider expansion?

This revision now requires changes to proceed. Jun 3 2019, 2:04 PM

https://bugs.llvm.org/show_bug.cgi?id=42384 is failing due to the same problem.

Finished by @ebrevnov as D65276.

Revision Contents

Path (Size)
include/llvm/Analysis/ScalarEvolutionExpressions.h (6 lines)
lib/Analysis/ScalarEvolution.cpp (21 lines)
lib/Analysis/ScalarEvolutionExpander.cpp (52 lines)
test/Analysis/ScalarEvolution/scev-expand-canonical-iv-type.ll (56 lines)
test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll (69 lines)
unittests/Analysis/ScalarEvolutionTest.cpp (229 lines)

Diff 201795

include/llvm/Analysis/ScalarEvolutionExpressions.h

[338 lines not shown]
void setNoWrapFlags(NoWrapFlags Flags) {
if (Flags & (FlagNUW | FlagNSW))		if (Flags & (FlagNUW | FlagNSW))
Flags = ScalarEvolution::setFlags(Flags, FlagNW);		Flags = ScalarEvolution::setFlags(Flags, FlagNW);
SubclassData |= Flags;		SubclassData |= Flags;
}		}

/// Return the value of this chain of recurrences at the specified		/// Return the value of this chain of recurrences at the specified
/// iteration number.		/// iteration number.
const SCEV *evaluateAtIteration(const SCEV *It, ScalarEvolution &SE) const;		const SCEV *evaluateAtIteration(const SCEV *It, ScalarEvolution &SE) const;

		reames (inline comment): Does this mean that evaluateAtIteration may return an incorrect result? If so, I'd very much like to see an assert which trips on that usage.
		/// Return the minimum bitwidth of the iteration expr to compute this
		/// AddRec at the given iteration without overflow. For affine AddRecs its
		/// the same as the AddRec type width, for non-affine it might be wider
		/// that the AddRec type width.
		unsigned minIterationWidthForEvaluateAtIteration(ScalarEvolution &SE) const;

/// Return the number of iterations of this loop that produce		/// Return the number of iterations of this loop that produce
/// values in the specified constant range. Another way of		/// values in the specified constant range. Another way of
/// looking at this is that it returns the first iteration number		/// looking at this is that it returns the first iteration number
/// where the value is not in the condition, thus computing the		/// where the value is not in the condition, thus computing the
/// exit count. If the iteration count can't be computed, an		/// exit count. If the iteration count can't be computed, an
/// instance of SCEVCouldNotCompute is returned.		/// instance of SCEVCouldNotCompute is returned.
const SCEV *getNumIterationsInRange(const ConstantRange &Range,		const SCEV *getNumIterationsInRange(const ConstantRange &Range,
ScalarEvolution &SE) const;		ScalarEvolution &SE) const;
[499 lines not shown]

lib/Analysis/ScalarEvolution.cpp

[1,102 lines not shown]
};		};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Simple SCEV method implementations		// Simple SCEV method implementations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		static unsigned MinIterationWidthForBinomialCoefficient(unsigned K,
		Type *ResultTy,
		ScalarEvolution &SE) {
		unsigned W = SE.getTypeSizeInBits(ResultTy);
		unsigned T = 1;
		for (unsigned i = 3; i <= K; ++i)
		T += APInt(W, i).countTrailingZeros();
		return W + T;
		}

/// Compute BC(It, K). The result has width W. Assume, K > 0.		/// Compute BC(It, K). The result has width W. Assume, K > 0.
static const SCEV *BinomialCoefficient(const SCEV *It, unsigned K,		static const SCEV *BinomialCoefficient(const SCEV *It, unsigned K,
ScalarEvolution &SE,		ScalarEvolution &SE,
Type *ResultTy) {		Type *ResultTy) {
// Handle the simplest case efficiently.		// Handle the simplest case efficiently.
if (K == 1)		if (K == 1)
return SE.getTruncateOrZeroExtend(It, ResultTy);		return SE.getTruncateOrZeroExtend(It, ResultTy);

[64 lines not shown]
for (unsigned i = 3; i <= K; ++i) {
unsigned TwoFactors = Mult.countTrailingZeros();		unsigned TwoFactors = Mult.countTrailingZeros();
T += TwoFactors;		T += TwoFactors;
Mult.lshrInPlace(TwoFactors);		Mult.lshrInPlace(TwoFactors);
OddFactorial *= Mult;		OddFactorial *= Mult;
}		}

// We need at least W + T bits for the multiplication step		// We need at least W + T bits for the multiplication step
unsigned CalculationBits = W + T;		unsigned CalculationBits = W + T;
		assert(MinIterationWidthForBinomialCoefficient(K, ResultTy, SE) ==
		CalculationBits &&
		"must be the same!");

// Calculate 2^T, at width T+W.		// Calculate 2^T, at width T+W.
APInt DivFactor = APInt::getOneBitSet(CalculationBits, T);		APInt DivFactor = APInt::getOneBitSet(CalculationBits, T);

// Calculate the multiplicative inverse of K! / 2^T;		// Calculate the multiplicative inverse of K! / 2^T;
// this multiplication factor will perform the exact division by		// this multiplication factor will perform the exact division by
// K! / 2^T.		// K! / 2^T.
APInt Mod = APInt::getSignedMinValue(W+1);		APInt Mod = APInt::getSignedMinValue(W+1);
[39 lines not shown]
for (unsigned i = 1, e = getNumOperands(); i != e; ++i) {
if (isa<SCEVCouldNotCompute>(Coeff))		if (isa<SCEVCouldNotCompute>(Coeff))
return Coeff;		return Coeff;

Result = SE.getAddExpr(Result, SE.getMulExpr(getOperand(i), Coeff));		Result = SE.getAddExpr(Result, SE.getMulExpr(getOperand(i), Coeff));
}		}
return Result;		return Result;
}		}

		unsigned SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration(
		ScalarEvolution &SE) const {
		if (isAffine())
		return SE.getTypeSizeInBits(getType());
		return MinIterationWidthForBinomialCoefficient(getNumOperands() - 1,
		getType(), SE);
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SCEV Expression folder implementations		// SCEV Expression folder implementations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,		const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,
unsigned Depth) {		unsigned Depth) {
assert(getTypeSizeInBits(Op->getType()) > getTypeSizeInBits(Ty) &&		assert(getTypeSizeInBits(Op->getType()) > getTypeSizeInBits(Ty) &&
"This is not a truncating conversion!");		"This is not a truncating conversion!");
[9,991 lines not shown]

lib/Analysis/ScalarEvolutionExpander.cpp

[1,470 lines not shown]
Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) {
return Result;		return Result;
}		}

Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {		Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {
if (!CanonicalMode) return expandAddRecExprLiterally(S);		if (!CanonicalMode) return expandAddRecExprLiterally(S);

Type *Ty = SE.getEffectiveSCEVType(S->getType());		Type *Ty = SE.getEffectiveSCEVType(S->getType());
const Loop *L = S->getLoop();		const Loop *L = S->getLoop();

		unsigned MinCanonicalIVWidth = S->minIterationWidthForEvaluateAtIteration(SE);

// First check for an existing canonical IV in a suitable type.		// First check for an existing canonical IV in a suitable type.
PHINode *CanonicalIV = nullptr;		PHINode *CanonicalIV = nullptr;
if (PHINode *PN = L->getCanonicalInductionVariable())		if (PHINode *PN = L->getCanonicalInductionVariable())
if (SE.getTypeSizeInBits(PN->getType()) >= SE.getTypeSizeInBits(Ty))		if (SE.getTypeSizeInBits(PN->getType()) >= MinCanonicalIVWidth)
CanonicalIV = PN;		CanonicalIV = PN;
		reames (inline comment): How about only falling back to literal expansion for non-affine addrecs which actually need the wider expansion?

// Rewrite an AddRec in terms of the canonical induction variable, if
reames (inline comment): Removing this entirely seems like overkill. Maybe either add a restriction to a) affine addrecs or b) the bitwidth of the existing canonical IV is sufficient?
// its type is more narrow.
if (CanonicalIV &&
SE.getTypeSizeInBits(CanonicalIV->getType()) >
SE.getTypeSizeInBits(Ty)) {
SmallVector<const SCEV *, 4> NewOps(S->getNumOperands());
for (unsigned i = 0, e = S->getNumOperands(); i != e; ++i)
NewOps[i] = SE.getAnyExtendExpr(S->op_begin()[i], CanonicalIV->getType());
Value *V = expand(SE.getAddRecExpr(NewOps, S->getLoop(),
S->getNoWrapFlags(SCEV::FlagNW)));
BasicBlock::iterator NewInsertPt =
findInsertPointAfter(cast<Instruction>(V), Builder.GetInsertBlock());
V = expandCodeFor(SE.getTruncateExpr(SE.getUnknown(V), Ty), nullptr,
&*NewInsertPt);
return V;
}

// {X,+,F} --> X + {0,+,F}		// {X,+,F} --> X + {0,+,F}
if (!S->getStart()->isZero()) {		if (!S->getStart()->isZero()) {
SmallVector<const SCEV *, 4> NewOps(S->op_begin(), S->op_end());		SmallVector<const SCEV *, 4> NewOps(S->op_begin(), S->op_end());
NewOps[0] = SE.getConstant(Ty, 0);		NewOps[0] = SE.getConstant(Ty, 0);
const SCEV *Rest = SE.getAddRecExpr(NewOps, L,		const SCEV *Rest = SE.getAddRecExpr(NewOps, L,
S->getNoWrapFlags(SCEV::FlagNW));		S->getNoWrapFlags(SCEV::FlagNW));

// Turn things like ptrtoint+arithmetic+inttoptr into GEP. See the		// Turn things like ptrtoint+arithmetic+inttoptr into GEP. See the
[24 lines not shown]
Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {
}		}

// If we don't yet have a canonical IV, create one.		// If we don't yet have a canonical IV, create one.
if (!CanonicalIV) {		if (!CanonicalIV) {
// Create and insert the PHI node for the induction variable in the		// Create and insert the PHI node for the induction variable in the
// specified loop.		// specified loop.
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
pred_iterator HPB = pred_begin(Header), HPE = pred_end(Header);		pred_iterator HPB = pred_begin(Header), HPE = pred_end(Header);
CanonicalIV = PHINode::Create(Ty, std::distance(HPB, HPE), "indvar",		auto *CanonicalIVType =
&Header->front());		IntegerType::get(SE.getContext(), MinCanonicalIVWidth);
		CanonicalIV = PHINode::Create(CanonicalIVType, std::distance(HPB, HPE),
		"indvar", &Header->front());
rememberInstruction(CanonicalIV);		rememberInstruction(CanonicalIV);

SmallSet<BasicBlock *, 4> PredSeen;		SmallSet<BasicBlock *, 4> PredSeen;
Constant *One = ConstantInt::get(Ty, 1);		Constant *One = ConstantInt::get(CanonicalIVType, 1);
for (pred_iterator HPI = HPB; HPI != HPE; ++HPI) {		for (pred_iterator HPI = HPB; HPI != HPE; ++HPI) {
BasicBlock *HP = *HPI;		BasicBlock *HP = *HPI;
if (!PredSeen.insert(HP).second) {		if (!PredSeen.insert(HP).second) {
// There must be an incoming value for each predecessor, even the		// There must be an incoming value for each predecessor, even the
// duplicates!		// duplicates!
CanonicalIV->addIncoming(CanonicalIV->getIncomingValueForBlock(HP), HP);		CanonicalIV->addIncoming(CanonicalIV->getIncomingValueForBlock(HP), HP);
continue;		continue;
}		}

if (L->contains(HP)) {		if (L->contains(HP)) {
// Insert a unit add instruction right before the terminator		// Insert a unit add instruction right before the terminator
// corresponding to the back-edge.		// corresponding to the back-edge.
Instruction *Add = BinaryOperator::CreateAdd(CanonicalIV, One,		Instruction *Add = BinaryOperator::CreateAdd(CanonicalIV, One,
"indvar.next",		"indvar.next",
HP->getTerminator());		HP->getTerminator());
Add->setDebugLoc(HP->getTerminator()->getDebugLoc());		Add->setDebugLoc(HP->getTerminator()->getDebugLoc());
rememberInstruction(Add);		rememberInstruction(Add);
CanonicalIV->addIncoming(Add, HP);		CanonicalIV->addIncoming(Add, HP);
} else {		} else {
CanonicalIV->addIncoming(Constant::getNullValue(Ty), HP);		CanonicalIV->addIncoming(Constant::getNullValue(CanonicalIVType), HP);
}		}
}		}
}		}

// {0,+,1} --> Insert a canonical induction variable into the loop!		// {0,+,1} --> Return the canonical induction variable
if (S->isAffine() && S->getOperand(1)->isOne()) {		if (S->isAffine() && S->getOperand(1)->isOne()) {
assert(Ty == SE.getEffectiveSCEVType(CanonicalIV->getType()) &&		if (Ty == SE.getEffectiveSCEVType(CanonicalIV->getType()))
"IVs with types different from the canonical IV should "
"already have been handled!");
return CanonicalIV;		return CanonicalIV;

		assert(SE.getTypeSizeInBits(CanonicalIV->getType()) >=
		SE.getTypeSizeInBits(Ty) &&
		"Canonical IV can't be more narrow than the expr!");
		return expand(SE.getTruncateExpr(SE.getUnknown(CanonicalIV), Ty));
		reames (inline comment): Not sure you actually want getUnknown? Did you mean getSCEV?
}		}

// {0,+,F} --> {0,+,1} * F		// {0,+,F} --> {0,+,1} * F

// If this is a simple linear addrec, emit it now as a special case.		// If this is a simple linear addrec, emit it now as a special case.
if (S->isAffine()) // {0,+,F} --> i*F		if (S->isAffine()) // {0,+,F} --> i*F
return		return
expand(SE.getTruncateOrNoop(		expand(SE.getTruncateOrNoop(
SE.getMulExpr(SE.getUnknown(CanonicalIV),		SE.getMulExpr(SE.getUnknown(CanonicalIV),
SE.getNoopOrAnyExtend(S->getOperand(1),		SE.getNoopOrAnyExtend(S->getOperand(1),
CanonicalIV->getType())),		CanonicalIV->getType())),
Ty));		Ty));

// If this is a chain of recurrences, turn it into a closed form, using the		// If this is a chain of recurrences, turn it into a closed form, using the
// folders, then expandCodeFor the closed form. This allows the folders to		// folders, then expandCodeFor the closed form. This allows the folders to
// simplify the expression without having to build a bunch of special code		// simplify the expression without having to build a bunch of special code
// into this folder.		// into this folder.
const SCEV *IH = SE.getUnknown(CanonicalIV); // Get I as a "symbolic" SCEV.		const SCEV *IH = SE.getUnknown(CanonicalIV); // Get I as a "symbolic" SCEV.

// Promote S up to the canonical IV type, if the cast is foldable.		const SCEV *V = cast<SCEVAddRecExpr>(S)->evaluateAtIteration(IH, SE);
const SCEV *NewS = S;
const SCEV *Ext = SE.getNoopOrAnyExtend(S, CanonicalIV->getType());
if (isa<SCEVAddRecExpr>(Ext))
NewS = Ext;

const SCEV *V = cast<SCEVAddRecExpr>(NewS)->evaluateAtIteration(IH, SE);
//cerr << "Evaluated: " << this << "\n to: " << V << "\n";		//cerr << "Evaluated: " << this << "\n to: " << V << "\n";

// Truncate the result down to the original type, if needed.		// Truncate the result down to the original type, if needed.
const SCEV *T = SE.getTruncateOrNoop(V, Ty);		const SCEV *T = SE.getTruncateOrNoop(V, Ty);
return expand(T);		return expand(T);
}		}

Value *SCEVExpander::visitTruncateExpr(const SCEVTruncateExpr *S) {		Value *SCEVExpander::visitTruncateExpr(const SCEVTruncateExpr *S) {
[819 lines not shown]

test/Analysis/ScalarEvolution/scev-expand-canonical-iv-type.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -S | FileCheck %s

				target triple = "x86_64-unknown-linux-gnu"

				; This test verifies that the SCEVExpander emits a canonical IV of a proper type
				; for non-linear addrec expansion. In this test case loop-vectorize expands
				; i8 {0,+,2,+,1} addrec in the outer loop. The expression to compute the addrec
				; at a given iteration involves division by 2 (via shr). In order to compute
				; this expression without overflow a canonical IV of the wider type i9 should be
				; used.
				;
				; See also:
				; SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration,
				; ScalarEvolutionsTest::SCEVExpandInsertCanonicalIV.

				define i32 @test(i8* %p) {
				; CHECK-LABEL: @test
				bb:
				br label %outer_loop

				outer_loop:
				; Check that the canonical IV inserted by expander is of i9 type
				; CHECK: outer_loop:
				; CHECK: %indvar = phi i9 [ %indvar.next, %outer_loop_cont ], [ 0, %bb ]
				%tmp4 = phi i32 [ 0, %bb ], [ %tmp8, %outer_loop_cont ]
				%tmp5 = phi i32 [ 0, %bb ], [ %tmp13, %outer_loop_cont ]
				%tmp7 = phi i32 [ 1, %bb ], [ %tmp16, %outer_loop_cont ]
				%tmp8 = add i32 %tmp7, %tmp4
				%tmp9 = trunc i32 %tmp8 to i8
				%tmp10 = load i8, i8* %p, align 8
				br label %inner_loop

				inner_loop:
				%tmp19 = phi i8 [ %tmp10, %outer_loop ], [ %tmp23, %inner_loop ]
				%tmp20 = phi i32 [ %tmp5, %outer_loop ], [ %tmp22, %inner_loop ]
				%tmp21 = phi i32 [ 1, %outer_loop ], [ %tmp24, %inner_loop ]
				%tmp22 = add i32 %tmp20, 1
				%tmp23 = add i8 %tmp19, %tmp9
				%tmp24 = add nuw nsw i32 %tmp21, 1
				%tmp25 = icmp ugt i32 %tmp21, 75
				br i1 %tmp25, label %outer_loop_cont, label %inner_loop

				outer_loop_cont:
				%tmp12 = phi i8 [ %tmp19, %inner_loop ]
				%tmp13 = phi i32 [ %tmp22, %inner_loop ]
				%tmp14 = phi i8 [ %tmp23, %inner_loop ]
				store i8 %tmp14, i8* %p, align 8
				%tmp15 = sext i8 %tmp12 to i32
				%tmp16 = add nuw nsw i32 %tmp7, 1
				%tmp17 = icmp ugt i32 %tmp7, 256
				br i1 %tmp17, label %exit, label %outer_loop

				exit:
				%tmp2 = phi i32 [ %tmp15, %outer_loop_cont ]
				ret i32 %tmp2
				}
				No newline at end of file

test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll

	[22 lines not shown]
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP27:%.*]] = icmp sgt i32 [[M:%.*]], 0			; CHECK-NEXT: [[CMP27:%.*]] = icmp sgt i32 [[M:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY3_LR_PH_US_PREHEADER:%.*]], label [[FOR_END15:%.*]]			; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY3_LR_PH_US_PREHEADER:%.*]], label [[FOR_END15:%.*]]
	; CHECK: for.body3.lr.ph.us.preheader:			; CHECK: for.body3.lr.ph.us.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[M]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[M]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = zext i32 [[K:%.*]] to i64
	; CHECK-NEXT: br label [[FOR_BODY3_LR_PH_US:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3_LR_PH_US:%.*]]
	; CHECK: for.end.us:			; CHECK: for.end.us:
	; CHECK-NEXT: [[ARRAYIDX9_US:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDVARS_IV33:%.*]]			; CHECK-NEXT: [[ARRAYIDX9_US:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[INDVARS_IV33:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX9_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[ARRAYIDX9_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[ADD10_US:%.*]] = add nsw i32 [[TMP4]], 3			; CHECK-NEXT: [[ADD10_US:%.*]] = add nsw i32 [[TMP3]], 3
	; CHECK-NEXT: store i32 [[ADD10_US]], i32* [[ARRAYIDX9_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: store i32 [[ADD10_US]], i32* [[ARRAYIDX9_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[INDVARS_IV_NEXT34:%.*]] = add i64 [[INDVARS_IV33]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT34:%.*]] = add i64 [[INDVARS_IV33]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV35:%.*]] = trunc i64 [[INDVARS_IV_NEXT34]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV35:%.*]] = trunc i64 [[INDVARS_IV_NEXT34]] to i32
	; CHECK-NEXT: [[EXITCOND36:%.*]] = icmp eq i32 [[LFTR_WIDEIV35]], [[M]]			; CHECK-NEXT: [[EXITCOND36:%.*]] = icmp eq i32 [[LFTR_WIDEIV35]], [[M]]
	; CHECK-NEXT: br i1 [[EXITCOND36]], label [[FOR_END15_LOOPEXIT:%.*]], label [[FOR_BODY3_LR_PH_US]], !llvm.loop !2			; CHECK-NEXT: br i1 [[EXITCOND36]], label [[FOR_END15_LOOPEXIT:%.*]], label [[FOR_BODY3_LR_PH_US]], !llvm.loop !2
	; CHECK: for.body3.us:			; CHECK: for.body3.us:
	; CHECK-NEXT: [[INDVARS_IV29:%.*]] = phi i64 [ [[BC_RESUME_VAL:%.*]], [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT30:%.*]], [[FOR_BODY3_US:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV29:%.*]] = phi i64 [ [[BC_RESUME_VAL:%.*]], [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT30:%.*]], [[FOR_BODY3_US:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV29]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV29]] to i32
	; CHECK-NEXT: [[ADD4_US:%.*]] = add i32 [[ADD_US:%.*]], [[TMP5]]			; CHECK-NEXT: [[ADD4_US:%.*]] = add i32 [[ADD_US:%.*]], [[TMP4]]
	; CHECK-NEXT: [[IDXPROM_US:%.*]] = sext i32 [[ADD4_US]] to i64			; CHECK-NEXT: [[IDXPROM_US:%.*]] = sext i32 [[ADD4_US]] to i64
	; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[IDXPROM_US]]			; CHECK-NEXT: [[ARRAYIDX_US:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[IDXPROM_US]]
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[ADD5_US:%.*]] = add nsw i32 [[TMP6]], 1			; CHECK-NEXT: [[ADD5_US:%.*]] = add nsw i32 [[TMP5]], 1
	; CHECK-NEXT: store i32 [[ADD5_US]], i32* [[ARRAYIDX7_US:%.*]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: store i32 [[ADD5_US]], i32* [[ARRAYIDX7_US:%.*]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[INDVARS_IV_NEXT30]] = add i64 [[INDVARS_IV29]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT30]] = add i64 [[INDVARS_IV29]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV31:%.*]] = trunc i64 [[INDVARS_IV_NEXT30]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV31:%.*]] = trunc i64 [[INDVARS_IV_NEXT30]] to i32
	; CHECK-NEXT: [[EXITCOND32:%.*]] = icmp eq i32 [[LFTR_WIDEIV31]], [[M]]			; CHECK-NEXT: [[EXITCOND32:%.*]] = icmp eq i32 [[LFTR_WIDEIV31]], [[M]]
	; CHECK-NEXT: br i1 [[EXITCOND32]], label [[FOR_END_US:%.*]], label [[FOR_BODY3_US]], !llvm.loop !3			; CHECK-NEXT: br i1 [[EXITCOND32]], label [[FOR_END_US:%.*]], label [[FOR_BODY3_US]], !llvm.loop !3
	; CHECK: for.body3.lr.ph.us:			; CHECK: for.body3.lr.ph.us:
	; CHECK-NEXT: [[INDVARS_IV33]] = phi i64 [ [[INDVARS_IV_NEXT34]], [[FOR_END_US]] ], [ 0, [[FOR_BODY3_LR_PH_US_PREHEADER]] ]			; CHECK-NEXT: [[INDVARS_IV33]] = phi i64 [ [[INDVARS_IV_NEXT34]], [[FOR_END_US]] ], [ 0, [[FOR_BODY3_LR_PH_US_PREHEADER]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP3]], [[INDVARS_IV33]]			; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[INDVARS_IV33]] to i32
	; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[TMP7]] to i32			; CHECK-NEXT: [[TMP7:%.*]] = add i32 [[K:%.*]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = trunc i64 [[INDVARS_IV33]] to i32			; CHECK-NEXT: [[TMP8:%.*]] = trunc i64 [[INDVARS_IV33]] to i32
	; CHECK-NEXT: [[ADD_US]] = add i32 [[TMP9]], [[K]]			; CHECK-NEXT: [[ADD_US]] = add i32 [[TMP8]], [[K]]
	; CHECK-NEXT: [[ARRAYIDX7_US]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV33]]			; CHECK-NEXT: [[ARRAYIDX7_US]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV33]]
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH]], label [[VECTOR_SCEVCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH]], label [[VECTOR_SCEVCHECK:%.*]]
	; CHECK: vector.scevcheck:			; CHECK: vector.scevcheck:
	; CHECK-NEXT: [[MUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 1, i32 [[TMP0]])			; CHECK-NEXT: [[MUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 1, i32 [[TMP0]])
	; CHECK-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL]], 0			; CHECK-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL]], 0
	; CHECK-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL]], 1			; CHECK-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL]], 1
	; CHECK-NEXT: [[TMP10:%.*]] = add i32 [[TMP8]], [[MUL_RESULT]]			; CHECK-NEXT: [[TMP9:%.*]] = add i32 [[TMP7]], [[MUL_RESULT]]
	; CHECK-NEXT: [[TMP11:%.*]] = sub i32 [[TMP8]], [[MUL_RESULT]]			; CHECK-NEXT: [[TMP10:%.*]] = sub i32 [[TMP7]], [[MUL_RESULT]]
	; CHECK-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP7]]
	; CHECK-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP10]], [[TMP8]]			; CHECK-NEXT: [[TMP12:%.*]] = icmp slt i32 [[TMP9]], [[TMP7]]
	; CHECK-NEXT: [[TMP14:%.*]] = select i1 false, i1 [[TMP12]], i1 [[TMP13]]			; CHECK-NEXT: [[TMP13:%.*]] = select i1 false, i1 [[TMP11]], i1 [[TMP12]]
	; CHECK-NEXT: [[TMP15:%.*]] = or i1 [[TMP14]], [[MUL_OVERFLOW]]			; CHECK-NEXT: [[TMP14:%.*]] = or i1 [[TMP13]], [[MUL_OVERFLOW]]
	; CHECK-NEXT: [[TMP16:%.*]] = or i1 false, [[TMP15]]			; CHECK-NEXT: [[TMP15:%.*]] = or i1 false, [[TMP14]]
	; CHECK-NEXT: br i1 [[TMP16]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[TMP15]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP17:%.*]] = trunc i64 [[INDEX]] to i32			; CHECK-NEXT: [[TMP16:%.*]] = trunc i64 [[INDEX]] to i32
	; CHECK-NEXT: [[TMP18:%.*]] = add i32 [[TMP17]], 0			; CHECK-NEXT: [[TMP17:%.*]] = add i32 [[TMP16]], 0
	; CHECK-NEXT: [[TMP19:%.*]] = add i32 [[ADD_US]], [[TMP18]]			; CHECK-NEXT: [[TMP18:%.*]] = add i32 [[ADD_US]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = sext i32 [[TMP19]] to i64			; CHECK-NEXT: [[TMP19:%.*]] = sext i32 [[TMP18]] to i64
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP20]]			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP19]]
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i32, i32* [[TMP21]], i32 0			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP20]], i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast i32* [[TMP22]] to <4 x i32>*			; CHECK-NEXT: [[TMP22:%.*]] = bitcast i32* [[TMP21]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP23]], align 4			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP22]], align 4
	; CHECK-NEXT: [[TMP24:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP24]], i32 0			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP23]], i32 0
				; CHECK-NEXT: store i32 [[TMP24]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
				; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP23]], i32 1
	; CHECK-NEXT: store i32 [[TMP25]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: store i32 [[TMP25]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP24]], i32 1			; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP23]], i32 2
	; CHECK-NEXT: store i32 [[TMP26]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: store i32 [[TMP26]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i32> [[TMP24]], i32 2			; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
	; CHECK-NEXT: store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0			; CHECK-NEXT: store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i32> [[TMP24]], i32 3
	; CHECK-NEXT: store i32 [[TMP28]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5			; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_US]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_US]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY3_LR_PH_US]] ], [ 0, [[VECTOR_SCEVCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY3_LR_PH_US]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY3_US]]			; CHECK-NEXT: br label [[FOR_BODY3_US]]
	; CHECK: for.end15.loopexit:			; CHECK: for.end15.loopexit:
	; CHECK-NEXT: br label [[FOR_END15]]			; CHECK-NEXT: br label [[FOR_END15]]

unittests/Analysis/ScalarEvolutionTest.cpp

TEST_F(ScalarEvolutionsTest, SCEVComputeExpressionSize) {		TEST_F(ScalarEvolutionsTest, SCEVComputeExpressionSize) {
const SCEV *S2S = SE.getSCEV(S2);		const SCEV *S2S = SE.getSCEV(S2);
EXPECT_EQ(AS->getExpressionSize(), 1u);		EXPECT_EQ(AS->getExpressionSize(), 1u);
EXPECT_EQ(BS->getExpressionSize(), 1u);		EXPECT_EQ(BS->getExpressionSize(), 1u);
EXPECT_EQ(CS->getExpressionSize(), 1u);		EXPECT_EQ(CS->getExpressionSize(), 1u);
EXPECT_EQ(S1S->getExpressionSize(), 3u);		EXPECT_EQ(S1S->getExpressionSize(), 3u);
EXPECT_EQ(S2S->getExpressionSize(), 5u);		EXPECT_EQ(S2S->getExpressionSize(), 5u);
}		}

		/// This test verifies that the SCEVExpander emits a canonical IV of a proper
		/// type while expanding non-linear addrecs.
		///
		/// See also:
		/// SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration
		/// scev-expand-canonical-iv-type.ll
		TEST_F(ScalarEvolutionsTest, SCEVExpandInsertCanonicalIV) {
		LLVMContext C;
		SMDiagnostic Err;

		// Expand the addrec produced by GetAddRec into a loop without a canonical IV.
		// SCEVExpander will produce one to expand the addrec. Check that the type of
		// the inserted IV is equal to ExpectedCanonicalIVWidth.
		auto TestNoCanonicalIV = [&](
		std::function<const SCEV *(ScalarEvolution &SE, Loop *L)> GetAddRec,
		unsigned ExpectedCanonicalIVWidth) {
		std::unique_ptr<Module> M =
		parseAssemblyString("define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");
		auto *Loop = LI.getLoopFor(I.getParent());
		EXPECT_FALSE(Loop->getCanonicalInductionVariable());

		auto *AR = GetAddRec(SE, Loop);
		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Exp.expandCodeFor(AR, nullptr, InsertAt);
		PHINode *CanonicalIV = Loop->getCanonicalInductionVariable();
		unsigned CanonicalIVBitWidth =
		cast<IntegerType>(CanonicalIV->getType())->getBitWidth();
		EXPECT_EQ(CanonicalIVBitWidth, ExpectedCanonicalIVWidth);
		});
		};

		// Expand the addrec produced by GetAddRec into a loop with a canonical IV
		// which is narrower than ExpectedCanonicalIVWidth.
		// SCEVExpander will produce a canonical IV of a wider type to expand the
		// addrec. Check that the type of the inserted IV is equal to
		// ExpectedCanonicalIVWidth.
		auto TestNarrowCanonicalIV = [&](
		std::function<const SCEV *(ScalarEvolution &SE, Loop *L)> GetAddRec,
		unsigned ExpectedCanonicalIVWidth) {
		std::unique_ptr<Module> M = parseAssemblyString(
		"define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %canonical.iv = phi i8 [ 0, %entry ], [ %canonical.iv.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %canonical.iv.inc = add i8 %canonical.iv, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");

		auto *LoopHeaderBB = I.getParent();
		auto *Loop = LI.getLoopFor(LoopHeaderBB);
		PHINode *CanonicalIV = Loop->getCanonicalInductionVariable();
		EXPECT_EQ(CanonicalIV, &GetInstByName(F, "canonical.iv"));

		unsigned CanonicalIVBitWidth =
		cast<IntegerType>(CanonicalIV->getType())->getBitWidth();
		EXPECT_LT(CanonicalIVBitWidth, ExpectedCanonicalIVWidth);

		auto *AR = GetAddRec(SE, Loop);
		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Exp.expandCodeFor(AR, nullptr, InsertAt);

		// Loop over all of the PHI nodes, looking for the new canonical indvar.
		PHINode *NewCanonicalIV = nullptr;
		for (BasicBlock::iterator i = LoopHeaderBB->begin(); isa<PHINode>(i);
		++i) {
		PHINode *PN = cast<PHINode>(i);
		if (PN == &I || PN == CanonicalIV)
		continue;
		EXPECT_FALSE(NewCanonicalIV);
		NewCanonicalIV = PN;
		}

		// Check that NewCanonicalIV is a canonical IV, i.e. {0,+,1}
		BasicBlock *Incoming = nullptr, *Backedge = nullptr;
		EXPECT_TRUE(Loop->getIncomingAndBackEdge(Incoming, Backedge));
		auto *Start = NewCanonicalIV->getIncomingValueForBlock(Incoming);
		EXPECT_TRUE(isa<ConstantInt>(Start));
		EXPECT_TRUE(dyn_cast<ConstantInt>(Start)->isZero());
		auto *Next = NewCanonicalIV->getIncomingValueForBlock(Backedge);
		EXPECT_TRUE(isa<BinaryOperator>(Next));
		auto *NextBinOp = dyn_cast<BinaryOperator>(Next);
		EXPECT_EQ(NextBinOp->getOpcode(), Instruction::Add);
		EXPECT_EQ(NextBinOp->getOperand(0), NewCanonicalIV);
		auto *Step = NextBinOp->getOperand(1);
		EXPECT_TRUE(isa<ConstantInt>(Step));
		EXPECT_TRUE(dyn_cast<ConstantInt>(Step)->isOne());

		unsigned NewCanonicalIVBitWidth =
		cast<IntegerType>(NewCanonicalIV->getType())->getBitWidth();
		EXPECT_EQ(NewCanonicalIVBitWidth, ExpectedCanonicalIVWidth);
		});
		};

		// Expand the addrec produced by GetAddRec into a loop with a canonical IV
		// of width ExpectedCanonicalIVWidth.
		// To expand the addrec SCEVExpander should use the existing canonical IV.
		auto TestMatchingCanonicalIV = [&](
		std::function<const SCEV *(ScalarEvolution &SE, Loop *L)> GetAddRec,
		unsigned ExpectedCanonicalIVWidth) {
		auto ExpectedCanonicalIVTypeStr =
		"i" + std::to_string(ExpectedCanonicalIVWidth);
		std::unique_ptr<Module> M = parseAssemblyString(
		"define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %canonical.iv = phi " + ExpectedCanonicalIVTypeStr +
		" [ 0, %entry ], [ %canonical.iv.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %canonical.iv.inc = add " + ExpectedCanonicalIVTypeStr +
		" %canonical.iv, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");
		auto &CanonicalIV = GetInstByName(F, "canonical.iv");

		auto *LoopHeaderBB = I.getParent();
		auto *Loop = LI.getLoopFor(LoopHeaderBB);
		EXPECT_EQ(&CanonicalIV, Loop->getCanonicalInductionVariable());
		unsigned CanonicalIVBitWidth =
		cast<IntegerType>(CanonicalIV.getType())->getBitWidth();
		EXPECT_EQ(CanonicalIVBitWidth, ExpectedCanonicalIVWidth);

		auto *AR = GetAddRec(SE, Loop);
		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Exp.expandCodeFor(AR, nullptr, InsertAt);

		// Loop over all of the PHI nodes, looking if a new canonical indvar was
		// introduced.
		PHINode *NewCanonicalIV = nullptr;
		for (BasicBlock::iterator i = LoopHeaderBB->begin(); isa<PHINode>(i);
		++i) {
		PHINode *PN = cast<PHINode>(i);
		if (PN == &I || PN == &CanonicalIV)
		continue;
		NewCanonicalIV = PN;
		}
		EXPECT_FALSE(NewCanonicalIV);
		});
		};

		unsigned ARBitwidth = 16;
		Type *ARType = IntegerType::get(C, ARBitwidth);

		// Expand {5,+,1}, expect ARBitwidth canonical IV width
		auto GetAR2 = [&](ScalarEvolution &SE, Loop *L) -> const SCEV * {
		return SE.getAddRecExpr(SE.getConstant(APInt(ARBitwidth, 5)),
		SE.getOne(ARType), L, SCEV::FlagAnyWrap);
		};
		TestNoCanonicalIV(GetAR2, ARBitwidth);
		TestNarrowCanonicalIV(GetAR2, ARBitwidth);
		TestMatchingCanonicalIV(GetAR2, ARBitwidth);

		// Expand {5,+,1,+,1}, expect ARBitwidth+1 canonical IV width
		auto GetAR3 = [&](ScalarEvolution &SE, Loop *L) -> const SCEV * {
		SmallVector<const SCEV *, 3> Ops = {SE.getConstant(APInt(ARBitwidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType)};
		return SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap);
		};
		TestNoCanonicalIV(GetAR3, ARBitwidth + 1);
		TestNarrowCanonicalIV(GetAR3, ARBitwidth + 1);
		TestMatchingCanonicalIV(GetAR3, ARBitwidth + 1);

		// Expand {5,+,1,+,1,+,1}, expect ARBitwidth+1 canonical IV width
		auto GetAR4 = [&](ScalarEvolution &SE, Loop *L) -> const SCEV * {
		SmallVector<const SCEV *, 3> Ops = {SE.getConstant(APInt(ARBitwidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType),
		SE.getOne(ARType)};
		return SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap);
		};
		TestNoCanonicalIV(GetAR4, ARBitwidth + 1);
		TestNarrowCanonicalIV(GetAR4, ARBitwidth + 1);
		TestMatchingCanonicalIV(GetAR4, ARBitwidth + 1);

		// Expand {5,+,1,+,1,+,1,+,1}, expect ARBitwidth+3 canonical IV width
		auto GetAR5 = [&](ScalarEvolution &SE, Loop *L) -> const SCEV * {
		SmallVector<const SCEV *, 3> Ops = {SE.getConstant(APInt(ARBitwidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType),
		SE.getOne(ARType), SE.getOne(ARType)};
		return SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap);
		};
		TestNoCanonicalIV(GetAR5, ARBitwidth + 3);
		TestNarrowCanonicalIV(GetAR5, ARBitwidth + 3);
		TestMatchingCanonicalIV(GetAR5, ARBitwidth + 3);
		}

} // end anonymous namespace		} // end anonymous namespace
} // end namespace llvm		} // end namespace llvm
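
The widths expected by SCEVExpandInsertCanonicalIV (16, 17, 17 and 19 bits for 16-bit addrecs with 2, 3, 4 and 5 operands) are consistent with the rule "addrec width plus the number of factors of 2 in K!", where K is the highest binomial coefficient index used by evaluateAtIteration. The sketch below is an assumption about that rule, not the patch's code: minIterationWidth and factorsOfTwoInFactorial are hypothetical names chosen for illustration, and the real SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration may be implemented differently.

```cpp
#include <cassert>

// Number of factors of 2 in K! (Legendre's formula for the prime 2):
// floor(K/2) + floor(K/4) + floor(K/8) + ...
constexpr unsigned factorsOfTwoInFactorial(unsigned K) {
  unsigned T = 0;
  for (unsigned P = K / 2; P != 0; P /= 2)
    T += P;
  return T;
}

// Hypothetical stand-in for the width query added by the patch.
// An addrec with NumOperands operands {a0,+,a1,+,...,+,aK} has K = NumOperands - 1,
// and evaluating it needs binomial coefficients up to C(i, K). Dividing by K!
// loses the factors of 2 in K!, so the iteration number is assumed to need
// that many extra bits on top of the addrec's own width.
unsigned minIterationWidth(unsigned AddRecWidth, unsigned NumOperands) {
  assert(NumOperands >= 2 && "an addrec has a start and at least one step");
  const unsigned MaxK = NumOperands - 1;
  return AddRecWidth + factorsOfTwoInFactorial(MaxK);
}
```

Under this assumption a linear addrec needs no extra bits, quadratic and cubic addrecs need one (2! = 2 and 3! = 6 each contain a single factor of 2), and a quartic addrec needs three (4! = 24 = 2^3 * 3), which matches the ARBitwidth, ARBitwidth + 1, ARBitwidth + 1 and ARBitwidth + 3 expectations in the test.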

Revision Contents

Diff 201795

include/llvm/Analysis/ScalarEvolutionExpressions.h

lib/Analysis/ScalarEvolution.cpp

lib/Analysis/ScalarEvolutionExpander.cpp

test/Analysis/ScalarEvolution/scev-expand-canonical-iv-type.ll

test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll

unittests/Analysis/ScalarEvolutionTest.cpp
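
The miscompile described in the summary for the i8 addrec {0,+,2,+,1} can be reproduced with plain integer arithmetic. The following is a minimal sketch, not code from the patch: evalQuadratic is a hypothetical helper that evaluates ((i * (i - 1)) /u 2 + 2 * i) mod 256 while keeping the iteration number and the product in an IVBits-wide type, which is an assumption about how the expanded code behaves.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate the i8 addrec {0,+,2,+,1} at iteration Iter, modelling a canonical
// IV that is IVBits wide: the iteration number and the product i * (i - 1)
// wrap at IVBits before the unsigned division by 2, and the final result is
// truncated to 8 bits.
uint8_t evalQuadratic(uint64_t Iter, unsigned IVBits) {
  assert(IVBits >= 1 && IVBits <= 32 && "keeps the product within 64 bits");
  const uint64_t Mask = (uint64_t(1) << IVBits) - 1;
  const uint64_t I = Iter & Mask;                 // canonical IV value
  const uint64_t Product = (I * (I - 1)) & Mask;  // wraps at IVBits
  const uint64_t Binom = Product / 2;             // unsigned division
  return static_cast<uint8_t>(Binom + 2 * I);     // truncate to i8
}
```

With an 8-bit IV the product 17 * 16 = 272 wraps to 16 before the division, so iteration 17 yields 42 instead of the correct 170. With a 9-bit IV the division sees enough bits and the result agrees with a 32-bit reference computation, both at iteration 17 and after the 8-bit IV would have wrapped (iteration 256 gives 128).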
