This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Analysis/ScalarEvolution.cpp
1775 ↗	(On Diff #151987)	Minor: why `C` instead of `B`?
12175 ↗	(On Diff #151987)	For reference it would be nice to add a short comment describing the pattern you're matching (like `Match A - (A/B)*B`).
12178 ↗	(On Diff #151987)	(Here and below) LLVM style avoids braces around single statement blocks.
12186 ↗	(On Diff #151987)	I think the right term is "multiplicand", though I'd be fine with `MatchMulOperands` too. :)
12195 ↗	(On Diff #151987)	Do you have a test case that breaks if we remove `HasEqualValue` and instead just check for pointer equality? If not, I'd suggest removing it for now.
12212 ↗	(On Diff #151987)	SCEV sorts operands by complexity and I think in your case of `A - (A/B)*B` the `A` part should always be less complex than `(A/B)*B`. Can you check if this is the case in your test cases? If yes, we can probably drop one of the calls to `MatchAddends`.
llvm/test/Analysis/ScalarEvolution/trip-count14.ll
38 ↗	(On Diff #151987)	Just picking at random -- this one doesn't look obviously correct to me -- I don't immediately see a reason why `%n` can't be `SIGNED_INT32_MIN` (if `%n` is `SIGNED_INT32_MIN` then `%n * -1` sign overflows).
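To make the overflow concern concrete, here is a minimal C++ sketch (an illustration, not code from the patch). The helper name `negationOverflowsI32` is hypothetical; it widens to 64 bits so that the check itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true when -n (equivalently n * -1) is not
// representable as a signed 32-bit integer. The negation is computed in
// 64 bits, so this function never overflows.
bool negationOverflowsI32(std::int32_t n) {
  const std::int64_t negated = -static_cast<std::int64_t>(n);
  return negated > std::numeric_limits<std::int32_t>::max() ||
         negated < std::numeric_limits<std::int32_t>::min();
}
```

Only the minimum 32-bit value triggers the overflow; every other input negates safely.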

This revision now requires changes to proceed. Jun 19 2018, 4:07 PM

I have been working on a related issue, but my strategy is different:

When we reduce AddExpr(zext(X), Y, Z), I ask for X' = getAnyExtendExpr(X) (which usually does not involve any zext/sext/trunc) and try to look for a matching (A*X'/A) in Y or Z.
If I find it, then I replace the zext(X) by X' and remove the matching (A*X'/A).
That is, it is equivalent to replacing zext(X) by X'-(X'/A*A) and having the subtracted part cancel out with one of the other elements of the addition.
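The identity behind this idea can be checked with a small C++ sketch. It assumes that A is 2^N, where N is the bit width of X; the comment does not state this explicitly, so treat it as an interpretation. The function names are hypothetical, and an 8-bit X inside a 32-bit X' is used as the example.

```cpp
#include <cassert>
#include <cstdint>

// Models zext(X): keep the low 8 bits of a 32-bit value whose upper 24 bits
// are arbitrary (that is, an "any-extended" X').
std::uint32_t zextLow8(std::uint32_t anyExtended) {
  return anyExtended & 0xFFu;
}

// The same value written as X' - (X'/A)*A, assuming A = 2^8.
std::uint32_t viaDivMul(std::uint32_t anyExtended) {
  const std::uint32_t A = 256u;
  return anyExtended - (anyExtended / A) * A;
}
```

Both functions agree for every input, which is why a matching (X'/A)*A term elsewhere in the sum could cancel the subtracted part.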

Updated based on comments.

Harbormaster completed remote builds in B19492: Diff 152014. Jun 19 2018, 7:07 PM

timshen marked an inline comment as done. Jun 19 2018, 7:07 PM

In D48338#1137177, @alexandre.isoard wrote:

I have been working on a related issue, but my strategy is different:

When we reduce AddExpr(zext(X), Y, Z), I ask for X' = getAnyExtendExpr(X) (which usually does not involve any zext/sext/trunc) and try to look for a matching (A*X'/A) in Y or Z.
If I find it, then I replace the zext(X) by X' and remove the matching (A*X'/A).
That is, it is equivalent to replacing zext(X) by X'-(X'/A*A) and having the subtracted part cancel out with one of the other elements of the addition.

In my use case, the zext is introduced by LLVM IR. In that case, ScalarEvolution::createSCEV will just call getZeroExtendExpr. I'm not sure if we can use getAnyExtendExpr there.

Ah, you are right, I misinterpreted the purpose.

On that note, why do we want to sink the zext? Wouldn't it be detrimental, as it will generate operators of greater complexity once we code-generate the SCEV?
Shouldn't we lift them instead?

That is:

%x = zext i32 %a to i64
%y = zext i32 %b to i64
%div = udiv i64 %x, %y

Expressed as: zext i32 (%a /u %b) to i64
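Both forms compute the same value, which a short C++ sketch can confirm. The function names are hypothetical, and `b` is assumed to be non-zero, as division by zero is undefined in both C++ and LLVM IR.

```cpp
#include <cassert>
#include <cstdint>

// "Lifted" form: divide in 32 bits, then zero-extend the quotient.
std::uint64_t divThenZext(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint64_t>(a / b);
}

// "Sunk" form: zero-extend both operands, then divide in 64 bits.
std::uint64_t zextThenDiv(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint64_t>(a) / static_cast<std::uint64_t>(b);
}
```

The two agree because an unsigned quotient never exceeds the dividend, so it always fits in the narrow type. The question raised here is therefore only about which form is preferable, not about correctness.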

In D48338#1137326, @alexandre.isoard wrote:
Ah, you are right, I misinterpreted the purpose.

On that note, why do we want to sink the zext? Wouldn't it be detrimental, as it will generate operators of greater complexity once we code-generate the SCEV?
Shouldn't we lift them instead?

That is:
%x = zext i32 %a to i64
%y = zext i32 %b to i64
%div = udiv i64 %x, %y
Expressed as: zext i32 (%a /u %b) to i64

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

In D48338#1137890, @timshen wrote:

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

I don't think those transformations are legal in the general case. But I would go for the expression that has the minimum bit-width of operators in general (that is, sink trunc but lift sext and zext).
Could you explain why your form is preferable (you may have good reasons I am missing)? I have a feeling the true issue is that we have trouble manipulating zext/sext/trunc and that is probably why we have some heuristic to not duplicate them too much.

I have a patch that improves the mobility of those on AddExpr, but I am not 100% sure of its correctness.

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Would that go towards your goal?

In D48338#1137930, @alexandre.isoard wrote:

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Actually that is incorrect but that's not a *classic* SCEV (it is an "Exits" value) and the loop trip count estimation changed, so the value changed...

In D48338#1137930, @alexandre.isoard wrote:

In D48338#1137890, @timshen wrote:

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

I don't think those transformations are legal in the general case. But I would go for the expression that has the minimum bit-width of operators in general (that is, sink trunc but lift sext and zext).

True.

Could you explain why your form is preferable (you may have good reasons I am missing)? I have a feeling the true issue is that we have trouble manipulating zext/sext/trunc and that is probably why we have some heuristic to not duplicate them too much.

There already exist transformations for zext((A + B + ...)<nuw>) --> (zext(A) + zext(B) + ...)<nuw> and zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>. I wanted to be consistent. The true issue, from my perspective, is that a lot of pattern matching ignores potential zexts, so we want to push the zexts away (either to the root or to the leaves) so that patterns in the middle of the tree are not disturbed.
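The role of the <nuw> flag in the add case can be illustrated with a small C++ sketch using 8-bit operands widened to 32 bits. This is an illustration only; the function names are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// zext(A + B): the add happens in 8 bits (wrapping modulo 256), then widens.
std::uint32_t addThenZext(std::uint8_t a, std::uint8_t b) {
  const std::uint8_t narrowSum = static_cast<std::uint8_t>(a + b);
  return narrowSum;
}

// zext(A) + zext(B): both operands are widened before the add.
std::uint32_t zextThenAdd(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
}

// True when the 8-bit add does not wrap, i.e. when <nuw> would hold.
bool addHasNoUnsignedWrap(std::uint8_t a, std::uint8_t b) {
  return static_cast<unsigned>(a) + static_cast<unsigned>(b) <= 0xFFu;
}
```

With a = 100 and b = 100 the add does not wrap and both forms give 200. With a = 200 and b = 100 the narrow add wraps to 44 while the wide add gives 300, so the rewrite is only valid when the no-wrap condition holds.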

I have a patch that improves the mobility of those on AddExpr, but I am not 100% sure of its correctness.

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Would that go towards your goal?

I'm specifically looking at the URem pattern, zext(A-A/B*B). Nothing involves trunc.

That might be related. For instance, expressions such as:

((zext i3 {0,+,1}<%bb> to i64) + (8 * ({0,+,1}<nuw><nsw><%bb> /u 8)) + %a)

get simplified into:

{%a,+,1}<nw><%bb>
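The arithmetic behind that simplification can be sketched in C++, with `i` standing for the value of {0,+,1} on a given iteration. The function names are hypothetical, and the low three bits of `i` model the zext of the i3 recurrence.

```cpp
#include <cassert>
#include <cstdint>

// (zext i3 {0,+,1} to i64) + 8 * ({0,+,1} /u 8) + a, evaluated at iteration i.
std::uint64_t unsimplified(std::uint64_t i, std::uint64_t a) {
  const std::uint64_t low3 = i & 7u; // the i3 counter wraps every 8 iterations
  return low3 + 8u * (i / 8u) + a;
}

// {a,+,1}, evaluated at iteration i.
std::uint64_t simplified(std::uint64_t i, std::uint64_t a) {
  return a + i;
}
```

The two agree because (i mod 8) + 8 * (i / 8) is i itself, which is the same remainder identity read in the other direction.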

sanjoy accepted this revision. Jun 20 2018, 6:38 PM

sanjoy added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
12178 ↗	(On Diff #152014)	LLVM style is putting the `return false;` on a different line (though you'll probably find violations in this source file). The checked in `.clang-format` should DTRT though.
12185 ↗	(On Diff #152014)	Might be clearer to call this `MatchURemWithDivisor` and call the argument `Divisor` or `D`.

This revision is now accepted and ready to land. Jun 20 2018, 6:38 PM

In D48338#1138613, @alexandre.isoard wrote:
That might be related. For instance, expressions such as:
((zext i3 {0,+,1}<%bb> to i64) + (8 * ({0,+,1}<nuw><nsw><%bb> /u 8)) + %a)
get simplified into:
{%a,+,1}<nw><%bb>

I don't see the pattern I'm trying to match.

Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following:

%11 = udiv i32 %10, 112
%12 = mul i32 %11, 112
%13 = sub i32 %10, %12
%14 = urem i32 %11, 112
%15 = udiv i32 %10, 12544
%16 = zext i32 %15 to i64
%17 = zext i32 %14 to i64
%18 = zext i32 %13 to i64
%19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3

The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there are no zexts and everything is i32. This patch makes it work with zexts.
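A small C++ sketch shows why the three computed indices recombine into the flat index. It mirrors only the integer arithmetic of the IR above (the trailing %3 index and the pointer are left out), and the function name `recombine` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Splits a flat index the way the IR above does, then recombines the pieces
// using the strides of the two [112 x ...] dimensions.
std::uint64_t recombine(std::uint32_t flat) {
  const std::uint32_t q112 = flat / 112u;        // %11
  const std::uint32_t inner = flat - q112 * 112u; // %13, i.e. flat % 112
  const std::uint32_t middle = q112 % 112u;       // %14
  const std::uint32_t outer = flat / 12544u;      // %15, 12544 = 112 * 112
  return static_cast<std::uint64_t>(outer) * 12544u +
         static_cast<std::uint64_t>(middle) * 112u +
         static_cast<std::uint64_t>(inner);
}
```

For every input, `recombine(flat)` returns `flat`. Proving that symbolically requires the analysis to see through the zexts on %13, %14 and %15, which is what the patch is after.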

Formatting.

Harbormaster completed remote builds in B19558: Diff 152214. Jun 20 2018, 6:50 PM

timshen marked an inline comment as done. Jun 20 2018, 6:51 PM

timshen added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
12185 ↗	(On Diff #152014)	I'll keep the variable name "B", as it's consistent throughout the surrounding comments

Closed by commit rL335197: [SCEV] Improve zext(A /u B) and zext(A % B) (authored by timshen). Jun 20 2018, 6:53 PM

This revision was automatically updated to reflect the committed changes.

timshen marked an inline comment as done.

In D48338#1138675, @timshen wrote:
Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following:
%11 = udiv i32 %10, 112
%12 = mul i32 %11, 112
%13 = sub i32 %10, %12
%14 = urem i32 %11, 112
%15 = udiv i32 %10, 12544
%16 = zext i32 %15 to i64
%17 = zext i32 %14 to i64
%18 = zext i32 %13 to i64
%19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3
The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there are no zexts and everything is i32. This patch makes it work with zexts.

I see. Does it work with 128 instead of 112? urem with powers of two generates trunc and zext.

My issue is that this patch:

- goes in the reverse direction of what I need to simplify trunc/zext in general (but maybe that means my approach is not the right approach)
- increases the computational complexity of expressions (I suspect that will make us generate worse code, but it is probably negligible on CPU unless we do vectorization)

But some of our reduction rules were already going in the reverse direction (I had to change some of those to have smaller expressions in the end), so I guess my objective is not aligned with our SCEV reduction rules (which I am not sure have been "designed").

Personally I don't have strong opinions on how to transform SCEV expressions, as long as it doesn't end up like another InstCombine or DAGCombiner. :) If you have a high-level design discussion in mind, maybe shoot an email to llvm-dev?

I agree with Tim that this is best discussed on llvm-dev, but SCEV generally tries to push zext and sext towards the expression leaves (as Tim already pointed out, I noticed after typing this :) ). The canonical example of this is add recurrences -- SCEV tries very hard to transform zext{A,+,B} to {zext A,+,zext B}.

increase the computational complexity of expressions (I suspect that will make us generate worse code but probably negligible on CPU unless we do vectorization)

I think it is somewhat "off topic" to discuss the computational complexity of SCEV expressions -- SCEV expressions are primarily there to aid analysis, not to compute. Maybe what you're looking for is more smarts in SCEVExpander?

timshen mentioned this in D48453: [SCEV] Re-apply r335197 (with Polly fixes). Jun 21 2018, 1:49 PM

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/ScalarEvolution.h	4 lines
llvm/trunk/lib/Analysis/ScalarEvolution.cpp	54 lines
llvm/trunk/test/Analysis/ScalarEvolution/zext-divrem.ll	42 lines

Diff 152215

llvm/trunk/include/llvm/Analysis/ScalarEvolution.h

[unchanged lines elided; context: private:]
   /// A loop is considered "used" by an expression if it contains
   /// an add rec on said loop.
   void getUsedLoops(const SCEV *S, SmallPtrSetImpl<const Loop *> &LoopsUsed);

   /// Find all of the loops transitively used in \p S, and update \c LoopUsers
   /// accordingly.
   void addToLoopUseLists(const SCEV *S);

+  /// Try to match the pattern generated by getURemExpr(A, B). If successful,
+  /// Assign A and B to LHS and RHS, respectively.
+  bool matchURem(const SCEV *Expr, const SCEV *&LHS, const SCEV *&RHS);
+
   FoldingSet<SCEV> UniqueSCEVs;
   FoldingSet<SCEVPredicate> UniquePreds;
   BumpPtrAllocator SCEVAllocator;

   /// This maps loops to a list of SCEV expressions that (transitively) use said
   /// loop.
   DenseMap<const Loop *, SmallVector<const SCEV *, 4>> LoopUsers;

[unchanged lines elided]

llvm/trunk/lib/Analysis/ScalarEvolution.cpp

[unchanged lines elided; context: if (AR->isAffine()) {]
       if (proveNoWrapByVaryingStart<SCEVZeroExtendExpr>(Start, Step, L)) {
         const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNUW);
         return getAddRecExpr(
             getExtendAddRecStart<SCEVZeroExtendExpr>(AR, Ty, this, Depth + 1),
             getZeroExtendExpr(Step, Ty, Depth + 1), L, AR->getNoWrapFlags());
       }
     }

+  // zext(A % B) --> zext(A) % zext(B)
+  {
+    const SCEV *LHS;
+    const SCEV *RHS;
+    if (matchURem(Op, LHS, RHS))
+      return getURemExpr(getZeroExtendExpr(LHS, Ty, Depth + 1),
+                         getZeroExtendExpr(RHS, Ty, Depth + 1));
+  }
+
+  // zext(A / B) --> zext(A) / zext(B).
+  if (auto *Div = dyn_cast<SCEVUDivExpr>(Op))
+    return getUDivExpr(getZeroExtendExpr(Div->getLHS(), Ty, Depth + 1),
+                       getZeroExtendExpr(Div->getRHS(), Ty, Depth + 1));
+
   if (auto *SA = dyn_cast<SCEVAddExpr>(Op)) {
     // zext((A + B + ...)<nuw>) --> (zext(A) + zext(B) + ...)<nuw>
     if (SA->hasNoUnsignedWrap()) {
       // If the addition does not unsign overflow then we can, by definition,
       // commute the zero extension with the addition operation.
       SmallVector<const SCEV *, 4> Ops;
       for (const auto *Op : SA->operands())
         Ops.push_back(getZeroExtendExpr(Op, Ty, Depth + 1));
[unchanged lines elided; context: for (auto &I : *BB) {]
       if (II->second.second == Expr)
         continue;

       OS.indent(Depth) << "[PSE]" << I << ":\n";
       OS.indent(Depth + 2) << *Expr << "\n";
       OS.indent(Depth + 2) << "--> " << *II->second.second << "\n";
     }
 }
+
+// Match the mathematical pattern A - (A / B) * B, where A and B can be
+// arbitrary expressions.
+// It's not always easy, as A and B can be folded (imagine A is X / 2, and B is
+// 4, A / B becomes X / 8).
+bool ScalarEvolution::matchURem(const SCEV *Expr, const SCEV *&LHS,
+                                const SCEV *&RHS) {
+  const auto *Add = dyn_cast<SCEVAddExpr>(Expr);
+  if (Add == nullptr || Add->getNumOperands() != 2)
+    return false;
+
+  const SCEV *A = Add->getOperand(1);
+  const auto *Mul = dyn_cast<SCEVMulExpr>(Add->getOperand(0));
+
+  if (Mul == nullptr)
+    return false;
+
+  const auto MatchURemWithDivisor = [&](const SCEV *B) {
+    // (SomeExpr + (-(SomeExpr / B) * B)).
+    if (Expr == getURemExpr(A, B)) {
+      LHS = A;
+      RHS = B;
+      return true;
+    }
+    return false;
+  };
+
+  // (SomeExpr + (-1 * (SomeExpr / B) * B)).
+  if (Mul->getNumOperands() == 3 && isa<SCEVConstant>(Mul->getOperand(0)))
+    return MatchURemWithDivisor(Mul->getOperand(1)) ||
+           MatchURemWithDivisor(Mul->getOperand(2));
+
+  // (SomeExpr + ((-SomeExpr / B) * B)) or (SomeExpr + ((SomeExpr / B) * -B)).
+  if (Mul->getNumOperands() == 2)
+    return MatchURemWithDivisor(Mul->getOperand(1)) ||
+           MatchURemWithDivisor(Mul->getOperand(0)) ||
+           MatchURemWithDivisor(getNegativeSCEV(Mul->getOperand(1))) ||
+           MatchURemWithDivisor(getNegativeSCEV(Mul->getOperand(0)));
+  return false;
+};
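As a rough illustration of the matching idea, here is a self-contained C++ sketch over a toy expression tree. It is not LLVM's implementation: `ToyExpr`, `var`, `bin`, `sameExpr` and `matchURemToy` are hypothetical names, and structural comparison stands in for the pointer comparison that matchURem performs against a rebuilt getURemExpr(A, B). The toy also keeps an explicit subtraction node, whereas the committed code matches an add of a negated multiply.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree for illustration only; these are NOT LLVM's SCEV classes.
struct ToyExpr {
  enum Kind { Var, Sub, Mul, UDiv };
  Kind kind;
  std::string name;                  // meaningful only for Var
  std::shared_ptr<ToyExpr> lhs, rhs; // meaningful only for binary nodes
};
using ToyRef = std::shared_ptr<ToyExpr>;

ToyRef var(std::string name) {
  return std::make_shared<ToyExpr>(
      ToyExpr{ToyExpr::Var, std::move(name), nullptr, nullptr});
}

ToyRef bin(ToyExpr::Kind kind, ToyRef lhs, ToyRef rhs) {
  return std::make_shared<ToyExpr>(
      ToyExpr{kind, std::string(), std::move(lhs), std::move(rhs)});
}

// Structural equality; stands in for comparing uniqued expression pointers.
bool sameExpr(const ToyRef &a, const ToyRef &b) {
  if (a->kind != b->kind)
    return false;
  if (a->kind == ToyExpr::Var)
    return a->name == b->name;
  return sameExpr(a->lhs, b->lhs) && sameExpr(a->rhs, b->rhs);
}

// Matches A - (A / B) * B, accepting either operand order of the multiply.
// On success, stores A in lhs and B in rhs.
bool matchURemToy(const ToyRef &expr, ToyRef &lhs, ToyRef &rhs) {
  if (expr->kind != ToyExpr::Sub || expr->rhs->kind != ToyExpr::Mul)
    return false;
  const ToyRef a = expr->lhs;
  const ToyRef mul = expr->rhs;
  const auto tryOrder = [&](const ToyRef &div, const ToyRef &b) {
    if (div->kind != ToyExpr::UDiv)
      return false;
    if (!sameExpr(div->lhs, a) || !sameExpr(div->rhs, b))
      return false;
    lhs = a;
    rhs = b;
    return true;
  };
  return tryOrder(mul->lhs, mul->rhs) || tryOrder(mul->rhs, mul->lhs);
}
```

Trying both operand orders loosely mirrors the way the patch calls MatchURemWithDivisor on each multiply operand. The toy does not model operand folding or negated operands, which the real code handles by rebuilding the canonical urem expression and comparing.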

llvm/trunk/test/Analysis/ScalarEvolution/zext-divrem.ll

+; RUN: opt -analyze -scalar-evolution -S < %s | FileCheck %s
+
+define i64 @test1(i32 %a, i32 %b) {
+; CHECK-LABEL: @test1
+  %div = udiv i32 %a, %b
+  %zext = zext i32 %div to i64
+; CHECK: %zext
+; CHECK-NEXT: --> ((zext i32 %a to i64) /u (zext i32 %b to i64))
+  ret i64 %zext
+}
+
+define i64 @test2(i32 %a, i32 %b) {
+; CHECK-LABEL: @test2
+  %rem = urem i32 %a, %b
+  %zext = zext i32 %rem to i64
+; CHECK: %zext
+; CHECK-NEXT: --> ((zext i32 %a to i64) + (-1 * (zext i32 %b to i64) * ((zext i32 %a to i64) /u (zext i32 %b to i64))))
+  ret i64 %zext
+}
+
+define i64 @test3(i32 %a, i32 %b) {
+; CHECK-LABEL: @test3
+  %div = udiv i32 %a, %b
+  %mul = mul i32 %div, %b
+  %sub = sub i32 %a, %mul
+  %zext = zext i32 %sub to i64
+; CHECK: %zext
+; CHECK-NEXT: --> ((zext i32 %a to i64) + (-1 * (zext i32 %b to i64) * ((zext i32 %a to i64) /u (zext i32 %b to i64))))
+  ret i64 %zext
+}
+
+define i64 @test4(i32 %t) {
+; CHECK-LABEL: @test4
+  %a = udiv i32 %t, 2
+  %div = udiv i32 %t, 112
+  %mul = mul i32 %div, 56
+  %sub = sub i32 %a, %mul
+  %zext = zext i32 %sub to i64
+; CHECK: %zext
+; CHECK-NEXT: --> ((-56 * ((zext i32 %t to i64) /u 112)) + ((zext i32 %t to i64) /u 2))
+  ret i64 %zext
+}
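The @test4 case appears to exercise the "folded" situation mentioned in the matchURem comment: with A = %t /u 2 and B = 56, the quotient A /u B folds to %t /u 112. A minimal C++ sketch of that arithmetic (function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors %sub in @test4: (t / 2) - (t / 112) * 56.
std::uint32_t test4Sub(std::uint32_t t) {
  const std::uint32_t a = t / 2u;
  return a - (t / 112u) * 56u;
}

// The remainder it is expected to equal: (t / 2) % 56.
std::uint32_t remainderForm(std::uint32_t t) {
  return (t / 2u) % 56u;
}
```

The two agree for every `t` because nested unsigned divisions compose: (t / 2) / 56 equals t / 112.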

[SCEV] Improve zext(A /u B) and zext(A % B) (Closed, Public)
