This is an archive of the discontinued LLVM Phabricator instance.

Fix incorrect expand of non-linear addrecs
AbandonedPublic

Authored by apilipenko on May 28 2019, 5:27 PM.

Details

Reviewers

reames
wristow
loladiro
sanjoy
kparzysz
rtereshin

Summary

We have a problem in SCEVExpander around the expansion of non-linear addrecs. When expanding an addrec, SCEVExpander emits an expression that computes the addrec at a given iteration number (using SCEVAddRecExpr::evaluateAtIteration). For the iteration number it uses the canonical induction variable of the loop, that is, an IV that starts at 0 and is incremented by 1 on every iteration: {0,+,1}. When the loop doesn't have a canonical IV, SCEVExpander inserts one, using the type of the addrec being expanded as the type of the canonical IV. This is not always correct.

If the addrec to expand is a linear addrec {start,+,step}, the expression to compute the value at the given iteration i is:

(start + i * step) mod MaxType,

where MaxType is 2^W for a W-bit addrec type (256 for i8), i.e. the arithmetic wraps in the type of the addrec.

A canonical IV of the same type as addrec corresponds to (i mod MaxType). Using this IV as the iteration number i works fine for linear addrecs:

(start + i * step) mod MaxType = ((start mod MaxType) + (i mod MaxType) * (step mod MaxType)) mod MaxType

This is because mod commutes with + and *, so we can sink mod (truncation) down to the operands.
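The linear case can be checked with a small standalone sketch. It is not LLVM code: the function names are made up for illustration, the addrec type is assumed to be i8, and a 64-bit counter stands in for the "true" iteration number.

```cpp
#include <cstdint>

// Value of the i8 addrec {start,+,step} at iteration i, computed two ways.

// Wide: do the arithmetic on the full 64-bit iteration number and truncate
// only at the very end.
uint8_t linearWide(uint8_t start, uint8_t step, uint64_t i) {
  return static_cast<uint8_t>(start + i * step);
}

// Narrow: truncate the iteration number to i8 first (this models an i8
// canonical IV), then truncate the result to i8 again.
uint8_t linearNarrow(uint8_t start, uint8_t step, uint64_t i) {
  uint8_t iv = static_cast<uint8_t>(i);
  return static_cast<uint8_t>(start + iv * step);
}

// Returns true when both forms agree for every iteration in [0, limit).
bool linearFormsAgree(uint8_t start, uint8_t step, uint64_t limit) {
  for (uint64_t i = 0; i < limit; ++i)
    if (linearWide(start, step, i) != linearNarrow(start, step, i))
      return false;
  return true;
}
```

Both forms agree even after the i8 IV has wrapped several times, which is the property the affine expansion relies on.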

But it's not correct for non-linear addrecs because the expression to compute the value on the given iteration involves division (see BinomialCoefficient function in ScalarEvolution.cpp).

For example, look at the scev-expand-canonical-iv-type.ll test in this patch. In this example, loop-vectorize expands the i8 addrec {0,+,2,+,1} in a loop without a canonical IV. It inserts a canonical IV of type i8 and uses it in the expression that computes the value of the addrec at the given iteration.

The expression is:

((i * (i - 1)) /u 2 + 2 * i) mod 256
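As a sanity check on this closed form, the sketch below steps the i8 addrec {0,+,2,+,1} literally and compares it with the formula evaluated in 64 bits. It is a standalone model rather than SCEV code, and both function names are hypothetical.

```cpp
#include <cstdint>

// Literal evaluation of the i8 addrec {0,+,2,+,1}: carry the value and its
// current step through the loop, wrapping modulo 256 like i8 arithmetic.
uint8_t steppedValue(uint64_t iterations) {
  uint8_t value = 0; // start
  uint8_t step = 2;  // first-order step
  for (uint64_t n = 0; n < iterations; ++n) {
    value = static_cast<uint8_t>(value + step);
    step = static_cast<uint8_t>(step + 1); // second-order step is 1
  }
  return value;
}

// Closed form from the summary, evaluated in 64 bits so that nothing is
// truncated before the final "mod 256".
uint8_t closedForm(uint64_t i) {
  return static_cast<uint8_t>((i * (i - 1)) / 2 + 2 * i);
}
```

The two agree for every iteration count as long as the closed form sees the untruncated iteration number.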

Using an i8 canonical IV effectively turns it into:

((i mod 256) * ((i mod 256) - 1)) /u 2 + 2 * (i mod 256)

This is not equal to the original expression, because mod and division (truncation and lshr) don't commute. In this case we need to use a canonical IV of a wider type. For the exact SCEVs of this example see below [1].
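The mismatch can be reproduced outside of LLVM. The sketch below models the i8 {0,+,2,+,1} expansion with an i8 and with an i9 iteration number, following the shape of the SCEVs in [1]; the helper names are invented for illustration and plain integers with masks stand in for i8/i9.

```cpp
#include <cstdint>

// Reference: evaluate the closed form on the untruncated iteration number.
uint8_t quadraticWide(uint64_t i) {
  return static_cast<uint8_t>((i * (i - 1)) / 2 + 2 * i);
}

// Models the expansion before the fix: the iteration number is an i8
// canonical IV, zero-extended to i9 only for the multiply and the divide.
uint8_t quadraticWithI8IV(uint64_t i) {
  uint32_t iv = i & 0xFF;                     // i8 canonical IV
  uint32_t ivMinus1 = (iv - 1) & 0xFF;        // i8 subtraction wraps
  uint32_t product = (ivMinus1 * iv) & 0x1FF; // i9 multiply
  return static_cast<uint8_t>(product / 2 + 2 * iv);
}

// Models the expansion after the fix: an i9 canonical IV.
uint8_t quadraticWithI9IV(uint64_t i) {
  uint32_t iv = i & 0x1FF;                    // i9 canonical IV
  uint32_t product = ((iv - 1) * iv) & 0x1FF; // i9 arithmetic wraps mod 512
  return static_cast<uint8_t>(product / 2 + 2 * iv);
}
```

Under this model the i8 version matches the reference while i is below 256 and diverges once the IV wraps (at i = 256 it yields 0 where the reference yields 128), whereas the i9 version keeps matching.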

SCEVExpander needs to be aware that expansion of an addrec might need a canonical IV of a type wider than the addrec. This patch fixes the issue by introducing the SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration method and using it in SCEVExpander.

This patch can be split into three distinct changes. I plan to split them when integrating, but am posting the initial review with all three together for better context.

1. Prepare SCEVExpander::visitAddRecExpr to use a CanonicalIV wider than the addrec.
2. Introduce SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration and MinIterationWidthForBinomialCoefficient, with an assertion in BinomialCoefficient.
3. Use SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration in SCEVExpander::visitAddRecExpr to compute the type of the canonical IV.
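The width rule behind minIterationWidthForEvaluateAtIteration can be modelled without any LLVM types. The sketch below mirrors the W + T computation from the diff, using plain unsigned integers in place of Type and APInt; the function names and the numOperands parameter are assumptions made for this illustration.

```cpp
// Number of trailing zero bits of x (0 for x == 0 in this sketch).
unsigned trailingZeros(unsigned x) {
  unsigned n = 0;
  while (x != 0 && (x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Minimum iteration-number width for an addrec with numOperands operands
// whose type is typeWidth bits wide. T counts the factors of two in K!:
// it starts at 1 for the factor 2 and adds those contributed by 3..K.
unsigned minIterationWidth(unsigned numOperands, unsigned typeWidth) {
  if (numOperands <= 2) // affine {start,+,step}: the addrec width suffices
    return typeWidth;
  unsigned K = numOperands - 1; // highest binomial coefficient order
  unsigned T = 1;
  for (unsigned i = 3; i <= K; ++i)
    T += trailingZeros(i);
  return typeWidth + T;
}
```

For the i8 {0,+,2,+,1} example this gives 9 bits, and for an i64 quadratic addrec it gives 65, matching the i9 and i65 widths mentioned in this review.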

This patch is currently going through our internal fuzzing and performance testing.

[1] Debug output from evaluateAtIteration before the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i8 %indvar
evaluateAtIteration result = ((trunc i9 (((zext i8 (-1 + %indvar) to i9) * (zext i8 %indvar to i9)) /u 2) to i8) + (2 * %indvar))

After the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i9 %indvar
evaluateAtIteration result = ((trunc i9 (((-1 + %indvar) * %indvar) /u 2) to i8) + (2 * (trunc i9 %indvar to i8)))

Diff Detail

Event Timeline

apilipenko created this revision. May 28 2019, 5:27 PM

Herald added a project: Restricted Project. May 28 2019, 5:27 PM

Herald added a subscriber: javed.absar.

apilipenko edited the summary of this revision. May 28 2019, 5:27 PM

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe alongside your assert to see if we're missing any other cases - and then spend more time considering your full fix.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

p.s. Please go ahead and land your preparatory changes (1). The context was helpful, but we should get those out of the way to simplify the rest of the review. (This is mostly just factoring out CanonicalIVType right?)

lib/Analysis/ScalarEvolutionExpander.cpp
Line 1486: Removing this entirely seems like overkill. Maybe either add a restriction to a) affine addrecs or b) the bitwidth of the existing canonical IV is sufficient?
Line 1596: Not sure you actually want getUnknown? Did you mean getSCEV?
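The "round up to the nearest legal type, bail out for i65" idea from this comment could look roughly like the sketch below. It is purely illustrative: roundUpToLegalWidth is a hypothetical helper, and it assumes that power-of-two widths from 8 up to maxLegalWidth are the legal ones, which in LLVM would really be decided by the target's DataLayout.

```cpp
#include <optional>

// Hypothetical helper: round a required bit width up to the next power of
// two (starting at 8) and give up when that exceeds maxLegalWidth.
// Assumption: maxLegalWidth is itself a power of two.
std::optional<unsigned> roundUpToLegalWidth(unsigned requiredBits,
                                            unsigned maxLegalWidth = 64) {
  unsigned width = 8;
  while (width < requiredBits && width < maxLegalWidth)
    width *= 2;
  if (width < requiredBits || width > maxLegalWidth)
    return std::nullopt; // e.g. i65: the caller should bail out instead
  return width;
}
```

With these assumptions a 9-bit requirement becomes 16 bits, while a 65-bit requirement yields no width at all, so the caller would have to choose another expansion strategy.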

In D62563#1521477, @reames wrote:

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe alongside your assert to see if we're missing any other cases - and then spend more time considering your full fix.

Yes, evaluateAtIteration can return SCEVCouldNotCompute, but it doesn't seem like SCEVExpander is ready for that. What we can do instead is to use expandAddRecExprLiterally for non-affine addrecs.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

evaluateAtIteration is already emitting i9 types for high-order binomial coefficients. See the debug output in the review description. But I guess you are right that we don't want to introduce canonical IVs of non-legal types. If we need to bail out for some types, the bail-out would be expandAddRecExprLiterally.

In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.

In D62563#1522063, @apilipenko wrote:

...
In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.

Artur and I spoke offline. The workaround suggestion I made doesn't appear to really work well, but Artur is going to move forward with the expandAddRecExprLiterally idea when we'd overflow the IV type. This seems to be a practical fix for non-affine IVs.

Fall back to non-canonical mode for non-affine addrecs.

apilipenko updated this revision to Diff 202768. Jun 3 2019, 11:52 AM

reames requested changes to this revision. Jun 3 2019, 2:04 PM

reames added inline comments.

include/llvm/Analysis/ScalarEvolutionExpressions.h
Line 347: Does this mean that evaluateAtIteration may return an incorrect result? If so, I'd very much like to see an assert which trips on that usage.
lib/Analysis/ScalarEvolutionExpander.cpp
Line 1486: How about only falling back to literal expansion for non-affine addrecs which actually need the wider expansion?

This revision now requires changes to proceed. Jun 3 2019, 2:04 PM
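The narrower fallback suggested in the inline comment on line 1486 might be structured as in the sketch below. This is not what Diff 202768 does (it falls back for every non-affine addrec); ExpansionMode and chooseExpansionMode are hypothetical names, and the two width parameters are assumed to be supplied by the caller.

```cpp
enum class ExpansionMode { Canonical, Literal };

// Hypothetical decision helper. requiredIVBits stands for what
// minIterationWidthForEvaluateAtIteration would report; availableIVBits is
// the widest canonical IV the expander is willing to use or create.
ExpansionMode chooseExpansionMode(bool canonicalMode, bool isAffine,
                                  unsigned requiredIVBits,
                                  unsigned availableIVBits) {
  if (!canonicalMode)
    return ExpansionMode::Literal;
  if (isAffine)
    return ExpansionMode::Canonical; // addrec width is always enough
  // Non-affine: stay canonical only when a wide enough IV is acceptable.
  return requiredIVBits <= availableIVBits ? ExpansionMode::Canonical
                                           : ExpansionMode::Literal;
}
```

Under this scheme an i8 quadratic addrec would still be expanded canonically when a 16-bit IV is acceptable, and only fall back to literal expansion when at most 8 bits are available.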

https://bugs.llvm.org/show_bug.cgi?id=42384 is failing due to the same problem.

Finished by @ebrevnov as D65276.

Revision Contents

Path                                                        Size
include/llvm/Analysis/ScalarEvolutionExpander.h             10 lines
include/llvm/Analysis/ScalarEvolutionExpressions.h          10 lines
lib/Analysis/ScalarEvolution.cpp                            21 lines
lib/Analysis/ScalarEvolutionExpander.cpp                    20 lines
test/Analysis/ScalarEvolution/scev-expander-non-affine.ll   57 lines
unittests/Analysis/ScalarEvolutionTest.cpp                  184 lines

Diff 202768

include/llvm/Analysis/ScalarEvolutionExpander.h

[... 71 lines not shown; context: class SCEVExpander : public SCEVVisitor<SCEVExpander, Value*> { ...]

     /// When expanding addrecs in the IVIncInsertLoop loop, insert the IV
     /// increment at this position.
     Instruction *IVIncInsertPos;

     /// Phis that complete an IV chain. Reuse
     DenseSet<AssertingVH<PHINode>> ChainedPhis;

-    /// When true, expressions are expanded in "canonical" form. In particular,
-    /// addrecs are expanded as arithmetic based on a canonical induction
-    /// variable. When false, expression are expanded in a more literal form.
+    /// When true, SCEVExpander tries to expand expressions in "canonical" form.
+    /// When false, expressions are expanded in a more literal form.
+    ///
+    /// In "canonical" form addrecs are expanded as arithmetic based on a
+    /// canonical induction variable. Note that CanonicalMode doesn't guarantee
+    /// that all expressions are expanded in "canonical" form. For some
+    /// expressions literal mode can be preferred.
     bool CanonicalMode;

     /// When invoked from LSR, the expander is in "strength reduction" mode. The
     /// only difference is that phi's are only reused if they are already in
     /// "expanded" form.
     bool LSRMode;

     typedef IRBuilder<TargetFolder> BuilderType;
[... 313 lines not shown ...]

include/llvm/Analysis/ScalarEvolutionExpressions.h

[... 337 lines not shown; context: public: ...]
     void setNoWrapFlags(NoWrapFlags Flags) {
       if (Flags & (FlagNUW | FlagNSW))
         Flags = ScalarEvolution::setFlags(Flags, FlagNW);
       SubclassData |= Flags;
     }

     /// Return the value of this chain of recurrences at the specified
     /// iteration number.
+    ///
+    /// Note that for the resulting expression to be correct the iteration
+    /// number can't be narrower than what
+    /// minIterationWidthForEvaluateAtIteration returns.
     const SCEV *evaluateAtIteration(const SCEV *It, ScalarEvolution &SE) const;

+    /// Return the minimum bitwidth of the iteration expr to compute this
+    /// AddRec at the given iteration without overflow. For affine AddRecs its
+    /// the same as the AddRec type width, for non-affine it might be wider
+    /// that the AddRec type width.
+    unsigned minIterationWidthForEvaluateAtIteration(ScalarEvolution &SE) const;
+
     /// Return the number of iterations of this loop that produce
     /// values in the specified constant range. Another way of
     /// looking at this is that it returns the first iteration number
     /// where the value is not in the condition, thus computing the
     /// exit count. If the iteration count can't be computed, an
     /// instance of SCEVCouldNotCompute is returned.
     const SCEV *getNumIterationsInRange(const ConstantRange &Range,
                                         ScalarEvolution &SE) const;
[... 499 lines not shown ...]

lib/Analysis/ScalarEvolution.cpp

[... 1,102 lines not shown ...]
 };

 } // end anonymous namespace

 //===----------------------------------------------------------------------===//
 // Simple SCEV method implementations
 //===----------------------------------------------------------------------===//

+static unsigned MinIterationWidthForBinomialCoefficient(unsigned K,
+                                                        Type *ResultTy,
+                                                        ScalarEvolution &SE) {
+  unsigned W = SE.getTypeSizeInBits(ResultTy);
+  unsigned T = 1;
+  for (unsigned i = 3; i <= K; ++i)
+    T += APInt(W, i).countTrailingZeros();
+  return W + T;
+}
+
 /// Compute BC(It, K). The result has width W. Assume, K > 0.
 static const SCEV *BinomialCoefficient(const SCEV *It, unsigned K,
                                        ScalarEvolution &SE,
                                        Type *ResultTy) {
   // Handle the simplest case efficiently.
   if (K == 1)
     return SE.getTruncateOrZeroExtend(It, ResultTy);

[... 64 lines not shown; context: for (unsigned i = 3; i <= K; ++i) { ...]
     unsigned TwoFactors = Mult.countTrailingZeros();
     T += TwoFactors;
     Mult.lshrInPlace(TwoFactors);
     OddFactorial *= Mult;
   }

   // We need at least W + T bits for the multiplication step
   unsigned CalculationBits = W + T;
+  assert(MinIterationWidthForBinomialCoefficient(K, ResultTy, SE) ==
+             CalculationBits &&
+         "must be the same!");

   // Calculate 2^T, at width T+W.
   APInt DivFactor = APInt::getOneBitSet(CalculationBits, T);

   // Calculate the multiplicative inverse of K! / 2^T;
   // this multiplication factor will perform the exact division by
   // K! / 2^T.
   APInt Mod = APInt::getSignedMinValue(W+1);
[... 39 lines not shown; context: for (unsigned i = 1, e = getNumOperands(); i != e; ++i) { ...]
     if (isa<SCEVCouldNotCompute>(Coeff))
       return Coeff;

     Result = SE.getAddExpr(Result, SE.getMulExpr(getOperand(i), Coeff));
   }
   return Result;
 }

+unsigned SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration(
+    ScalarEvolution &SE) const {
+  if (isAffine())
+    return SE.getTypeSizeInBits(getType());
+  return MinIterationWidthForBinomialCoefficient(getNumOperands() - 1,
+                                                 getType(), SE);
+}
+
 //===----------------------------------------------------------------------===//
 // SCEV Expression folder implementations
 //===----------------------------------------------------------------------===//

 const SCEV *ScalarEvolution::getTruncateExpr(const SCEV *Op, Type *Ty,
                                              unsigned Depth) {
   assert(getTypeSizeInBits(Op->getType()) > getTypeSizeInBits(Ty) &&
          "This is not a truncating conversion!");
[... 9,991 lines not shown ...]

lib/Analysis/ScalarEvolutionExpander.cpp

[... 1,466 lines not shown; context: if (PointerType *PTy = dyn_cast<PointerType>(ExpandTy)) { ...]
       rememberInstruction(Result);
     }
   }

   return Result;
 }

 Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {
-  if (!CanonicalMode) return expandAddRecExprLiterally(S);
+  // In canonical mode we compute the addrec as an expression of a canonical IV
+  // using evaluateAtIteration and expand the resulting SCEV expression. This
+  // way we avoid introducing new IVs to carry on the comutation of the addrec
+  // throughout the loop.
+  //
+  // For non-affine addrecs evaluateAtIteration might need a canonical IV of a
+  // type wider than the addrec itself (see SCEVAddRecExpr::
+  // minIterationWidthForEvaluateAtIteration). Emitting a canonical IV of the
+  // proper type might produce non-legal types, for example expanding an i64
+  // {0,+,2,+,1} addrec would need an i65 canonical IV. To avoid this just fall
+  // back to non-canonical mode for non-affine addrecs.
+  if (!CanonicalMode || !S->isAffine())
+    return expandAddRecExprLiterally(S);

   Type *Ty = SE.getEffectiveSCEVType(S->getType());
   const Loop *L = S->getLoop();

   // First check for an existing canonical IV in a suitable type.
   PHINode *CanonicalIV = nullptr;
   if (PHINode *PN = L->getCanonicalInductionVariable())
     if (SE.getTypeSizeInBits(PN->getType()) >= SE.getTypeSizeInBits(Ty))
       CanonicalIV = PN;

   // Rewrite an AddRec in terms of the canonical induction variable, if
   // its type is more narrow.
   if (CanonicalIV &&
       SE.getTypeSizeInBits(CanonicalIV->getType()) >
       SE.getTypeSizeInBits(Ty)) {
     SmallVector<const SCEV *, 4> NewOps(S->getNumOperands());
     for (unsigned i = 0, e = S->getNumOperands(); i != e; ++i)
       NewOps[i] = SE.getAnyExtendExpr(S->op_begin()[i], CanonicalIV->getType());
     Value *V = expand(SE.getAddRecExpr(NewOps, S->getLoop(),
[... 81 lines not shown; context: assert(Ty == SE.getEffectiveSCEVType(CanonicalIV->getType()) && ...]
            "IVs with types different from the canonical IV should "
            "already have been handled!");
     return CanonicalIV;
   }

   // {0,+,F} --> {0,+,1} * F

   // If this is a simple linear addrec, emit it now as a special case.
   if (S->isAffine()) // {0,+,F} --> i*F
     return
       expand(SE.getTruncateOrNoop(
         SE.getMulExpr(SE.getUnknown(CanonicalIV),
                       SE.getNoopOrAnyExtend(S->getOperand(1),
                                             CanonicalIV->getType())),
         Ty));

   // If this is a chain of recurrences, turn it into a closed form, using the
   // folders, then expandCodeFor the closed form. This allows the folders to
   // simplify the expression without having to build a bunch of special code
   // into this folder.
   const SCEV *IH = SE.getUnknown(CanonicalIV); // Get I as a "symbolic" SCEV.
+  // evaluateAtIteration for non-affine addrecs needs canonical IV of a wider
+  // type. For such addrecs we fall back to non-canonical (literal) expansion
+  // mode.
+  assert(SE.getTypeSizeInBits(IH->getType()) >=
+             S->minIterationWidthForEvaluateAtIteration(SE) &&
+         "Can't use this canonical IV for evaluateAtIteration");

   // Promote S up to the canonical IV type, if the cast is foldable.
   const SCEV *NewS = S;
   const SCEV *Ext = SE.getNoopOrAnyExtend(S, CanonicalIV->getType());
   if (isa<SCEVAddRecExpr>(Ext))
     NewS = Ext;

   const SCEV *V = cast<SCEVAddRecExpr>(NewS)->evaluateAtIteration(IH, SE);
[... 827 lines not shown ...]

test/Analysis/ScalarEvolution/scev-expander-non-affine.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -S | FileCheck %s

				target triple = "x86_64-unknown-linux-gnu"

				; This test verifies that SCEVExpander correctly expands non-affine addrecs in
				; "CanonicalMode". Expanding non-affine addrecs in canonical mode takes a
				; canonical IV of a type wider than the type of the addrec itself. See
				; SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration comment.
				; Currently, SCEVExpander just falls back to literal mode for non-affine
				; addrecs.
				;
				; In this test case loop-vectorize expands i8 {0,+,2,+,1} addrec in the outer
				; loop. The test checks that after the transform outer_loop has an IV literally
				; representing the addrec.

				define i32 @test(i8* %p) {
				; CHECK-LABEL: @test
				bb:
				br label %outer_loop

				outer_loop:
				; CHECK: outer_loop:
				; CHECK: %induction.iv = phi i8 [ %induction.iv.next, %outer_loop_cont ], [ 1, %bb ]
				%tmp4 = phi i32 [ 0, %bb ], [ %tmp8, %outer_loop_cont ]
				%tmp5 = phi i32 [ 0, %bb ], [ %tmp13, %outer_loop_cont ]
				%tmp7 = phi i32 [ 1, %bb ], [ %tmp16, %outer_loop_cont ]
				%tmp8 = add i32 %tmp7, %tmp4
				%tmp9 = trunc i32 %tmp8 to i8
				%tmp10 = load i8, i8* %p, align 8
				br label %inner_loop

				inner_loop:
				%tmp19 = phi i8 [ %tmp10, %outer_loop ], [ %tmp23, %inner_loop ]
				%tmp20 = phi i32 [ %tmp5, %outer_loop ], [ %tmp22, %inner_loop ]
				%tmp21 = phi i32 [ 1, %outer_loop ], [ %tmp24, %inner_loop ]
				%tmp22 = add i32 %tmp20, 1
				%tmp23 = add i8 %tmp19, %tmp9
				%tmp24 = add nuw nsw i32 %tmp21, 1
				%tmp25 = icmp ugt i32 %tmp21, 75
				br i1 %tmp25, label %outer_loop_cont, label %inner_loop

				outer_loop_cont:
				; CHECK: outer_loop_cont:
				; CHECK: %indvar.next = add i8 %indvar, 1
				%tmp12 = phi i8 [ %tmp19, %inner_loop ]
				%tmp13 = phi i32 [ %tmp22, %inner_loop ]
				%tmp14 = phi i8 [ %tmp23, %inner_loop ]
				store i8 %tmp14, i8* %p, align 8
				%tmp15 = sext i8 %tmp12 to i32
				%tmp16 = add nuw nsw i32 %tmp7, 1
				%tmp17 = icmp ugt i32 %tmp7, 256
				br i1 %tmp17, label %exit, label %outer_loop

				exit:
				%tmp2 = phi i32 [ %tmp15, %outer_loop_cont ]
				ret i32 %tmp2
				}
				No newline at end of file

unittests/Analysis/ScalarEvolutionTest.cpp

[... 1,631 lines not shown; context: auto GetAR2 = [&](ScalarEvolution &SE, Loop *L) -> const SCEV * { ...]
    return SE.getAddRecExpr(SE.getConstant(APInt(ARBitWidth, 5)),
                            SE.getOne(ARType), L, SCEV::FlagAnyWrap);
  };
  TestNoCanonicalIV(GetAR2);
  TestNarrowCanonicalIV(GetAR2);
  TestMatchingCanonicalIV(GetAR2, ARBitWidth);
}

		// Test expansion of non-affine addrecs in CanonicalMode.
		// Expanding non-affine addrecs in canonical mode takes a canonical IV of a
		// type wider than the type of the addrec itself. See SCEVAddRecExpr::
		// minIterationWidthForEvaluateAtIteration comment. Currently, SCEVExpander
		// just falls back to literal mode for non-affine addrecs.
		TEST_F(ScalarEvolutionsTest, SCEVExpandNonAffineAddRec) {
		LLVMContext C;
		SMDiagnostic Err;

		// Expand the addrec produced by GetAddRec into a loop without a canonical IV.
		auto TestNoCanonicalIV = [&](std::function<const SCEVAddRecExpr *(
		ScalarEvolution & SE, Loop * L)> GetAddRec) {
		std::unique_ptr<Module> M =
		parseAssemblyString("define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");
		auto *Loop = LI.getLoopFor(I.getParent());
		EXPECT_FALSE(Loop->getCanonicalInductionVariable());

		auto *AR = GetAddRec(SE, Loop);
		EXPECT_FALSE(AR->isAffine());

		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Value *V = Exp.expandCodeFor(AR, nullptr, InsertAt);
		auto *ExpandedAR = SE.getSCEV(V);
		// Check that the expansion happened literally.
		EXPECT_EQ(AR, ExpandedAR);
		});
		};

		// Expand the addrec produced by GetAddRec into a loop with a canonical IV
		// which is narrower than addrec type.
		auto TestNarrowCanonicalIV = [&](
		std::function<const SCEVAddRecExpr *(ScalarEvolution & SE, Loop * L)>
		GetAddRec) {
		std::unique_ptr<Module> M = parseAssemblyString(
		"define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %canonical.iv = phi i8 [ 0, %entry ], [ %canonical.iv.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %canonical.iv.inc = add i8 %canonical.iv, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");

		auto *LoopHeaderBB = I.getParent();
		auto *Loop = LI.getLoopFor(LoopHeaderBB);
		PHINode *CanonicalIV = Loop->getCanonicalInductionVariable();
		EXPECT_EQ(CanonicalIV, &GetInstByName(F, "canonical.iv"));

		auto *AR = GetAddRec(SE, Loop);
		EXPECT_FALSE(AR->isAffine());

		unsigned ExpectedCanonicalIVWidth = SE.getTypeSizeInBits(AR->getType());
		unsigned CanonicalIVBitWidth =
		cast<IntegerType>(CanonicalIV->getType())->getBitWidth();
		EXPECT_LT(CanonicalIVBitWidth, ExpectedCanonicalIVWidth);

		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Value *V = Exp.expandCodeFor(AR, nullptr, InsertAt);
		auto *ExpandedAR = SE.getSCEV(V);
		// Check that the expansion happened literally.
		EXPECT_EQ(AR, ExpandedAR);
		});
		};

		// Expand the addrec produced by GetAddRec into a loop with a canonical IV
		// of addrec width.
		auto TestMatchingCanonicalIV = [&](
		std::function<const SCEVAddRecExpr *(ScalarEvolution & SE, Loop * L)>
		GetAddRec,
		unsigned ARBitWidth) {
		auto ARBitWidthTypeStr = "i" + std::to_string(ARBitWidth);
		std::unique_ptr<Module> M = parseAssemblyString(
		"define i32 @test(i32 %limit) { "
		"entry: "
		" br label %loop "
		"loop: "
		" %i = phi i32 [ 1, %entry ], [ %i.inc, %loop ] "
		" %canonical.iv = phi " + ARBitWidthTypeStr +
		" [ 0, %entry ], [ %canonical.iv.inc, %loop ] "
		" %i.inc = add nsw i32 %i, 1 "
		" %canonical.iv.inc = add " + ARBitWidthTypeStr +
		" %canonical.iv, 1 "
		" %cont = icmp slt i32 %i.inc, %limit "
		" br i1 %cont, label %loop, label %exit "
		"exit: "
		" ret i32 %i.inc "
		"}",
		Err, C);

		assert(M && "Could not parse module?");
		assert(!verifyModule(*M) && "Must have been well formed!");

		runWithSE(*M, "test", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
		auto &I = GetInstByName(F, "i");
		auto &CanonicalIV = GetInstByName(F, "canonical.iv");

		auto *LoopHeaderBB = I.getParent();
		auto *Loop = LI.getLoopFor(LoopHeaderBB);
		EXPECT_EQ(&CanonicalIV, Loop->getCanonicalInductionVariable());
		unsigned CanonicalIVBitWidth =
		cast<IntegerType>(CanonicalIV.getType())->getBitWidth();

		auto *AR = GetAddRec(SE, Loop);
		EXPECT_FALSE(AR->isAffine());
		EXPECT_EQ(ARBitWidth, SE.getTypeSizeInBits(AR->getType()));
		EXPECT_EQ(CanonicalIVBitWidth, ARBitWidth);

		SCEVExpander Exp(SE, M->getDataLayout(), "expander");
		auto *InsertAt = I.getNextNode();
		Value *V = Exp.expandCodeFor(AR, nullptr, InsertAt);
		auto *ExpandedAR = SE.getSCEV(V);
		// Check that the expansion happened literally.
		EXPECT_EQ(AR, ExpandedAR);
		});
		};

		unsigned ARBitWidth = 16;
		Type *ARType = IntegerType::get(C, ARBitWidth);

		// Expand {5,+,1,+,1}
		auto GetAR3 = [&](ScalarEvolution &SE, Loop *L) -> const SCEVAddRecExpr * {
		SmallVector<const SCEV *, 3> Ops = {SE.getConstant(APInt(ARBitWidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType)};
		return cast<SCEVAddRecExpr>(SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap));
		};
		TestNoCanonicalIV(GetAR3);
		TestNarrowCanonicalIV(GetAR3);
		TestMatchingCanonicalIV(GetAR3, ARBitWidth);

		// Expand {5,+,1,+,1,+,1}
		auto GetAR4 = [&](ScalarEvolution &SE, Loop *L) -> const SCEVAddRecExpr * {
		SmallVector<const SCEV *, 4> Ops = {SE.getConstant(APInt(ARBitWidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType),
		SE.getOne(ARType)};
		return cast<SCEVAddRecExpr>(SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap));
		};
		TestNoCanonicalIV(GetAR4);
		TestNarrowCanonicalIV(GetAR4);
		TestMatchingCanonicalIV(GetAR4, ARBitWidth);

		// Expand {5,+,1,+,1,+,1,+,1}
		auto GetAR5 = [&](ScalarEvolution &SE, Loop *L) -> const SCEVAddRecExpr * {
		SmallVector<const SCEV *, 5> Ops = {SE.getConstant(APInt(ARBitWidth, 5)),
		SE.getOne(ARType), SE.getOne(ARType),
		SE.getOne(ARType), SE.getOne(ARType)};
		return cast<SCEVAddRecExpr>(SE.getAddRecExpr(Ops, L, SCEV::FlagAnyWrap));
		};
		TestNoCanonicalIV(GetAR5);
		TestNarrowCanonicalIV(GetAR5);
		TestMatchingCanonicalIV(GetAR5, ARBitWidth);
		}


} // end anonymous namespace
} // end namespace llvm
