Page MenuHomePhabricator

[SCEV] Improve zext(A /u B) and zext(A % B)
ClosedPublic

Authored by timshen on Jun 19 2018, 3:24 PM.

Details

Summary

Try to match udiv and urem patterns, and sink zext down to the leaves.

I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.

Diff Detail

Repository
rL LLVM

Event Timeline

timshen created this revision.Jun 19 2018, 3:24 PM
sanjoy requested changes to this revision.Jun 19 2018, 4:07 PM
sanjoy added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
1775 ↗(On Diff #151987)

Minor: why C instead of B?

12175 ↗(On Diff #151987)

For reference it would be nice to add a short comment describing the pattern you're matching (like Match A - (A/B)*B).

12178 ↗(On Diff #151987)

(Here and below) LLVM style avoids braces around single statement blocks.

12186 ↗(On Diff #151987)

I think the right term is "multiplicand", though I'd be fine with MatchMulOperands too. :)

12195 ↗(On Diff #151987)

Do you have a test case that breaks if we remove HasEqualValue and instead just check for pointer equality? If not, I'd suggest removing it for now.

12212 ↗(On Diff #151987)

SCEV sorts operands by complexity and I think in your case of A - (A/B)*B the A part should always be less complex than (A/B)*B. Can you check if this is the case in your test cases? If yes, we can probably drop one of the calls to MatchAddends.

llvm/test/Analysis/ScalarEvolution/trip-count14.ll
38 ↗(On Diff #151987)

Just picking at random -- this one doesn't look obviously correct to me -- I don't immediately see a reason why %n can't be SIGNED_INT32_MIN (if %n is SIGNED_INT32_MIN then %n * -1 sign overflows).

This revision now requires changes to proceed.Jun 19 2018, 4:07 PM

I have been working on related issue but my strategy is different:

When we reduce AddExpr(zext(X), Y, Z) I ask for X' = getAnyExtendExpr(X) (which do not involve any zext/sext/trunc usually and try to look for a matching (A*X'/A) in Y or Z.
If I find it, then I replace the zext(X) by X' and remove the matching (A*X'/A).
That is, it is equivalent to replace zext(X) by X'-(X'/A*A) and have the subtract part cancel out with one of the other element of the addition.

timshen updated this revision to Diff 152014.Jun 19 2018, 7:07 PM
timshen marked 5 inline comments as done.

Updated based on comments.

timshen marked an inline comment as done.Jun 19 2018, 7:07 PM

I have been working on related issue but my strategy is different:

When we reduce AddExpr(zext(X), Y, Z) I ask for X' = getAnyExtendExpr(X) (which do not involve any zext/sext/trunc usually and try to look for a matching (A*X'/A) in Y or Z.
If I find it, then I replace the zext(X) by X' and remove the matching (A*X'/A).
That is, it is equivalent to replace zext(X) by X'-(X'/A*A) and have the subtract part cancel out with one of the other element of the addition.

In my use case, the zext is introduced by LLVM IR. In that case, ScalarEvolution::createSCEV will just call getZeroExtendExpr. I'm not sure if we can use getAnyExtendExpr there.

Ah, you are right, I misinterpreted the purpose.

On that note, why do we want to sink the zext? Wouldn't it be detrimental as it will generate operators of bigger complexity once we code-generate the SCEV.
Shouldn't we lift them instead?

That is:

%x = zext i32 %a to i64
%y = zext i32 %b to i64
%div = udiv i64 %x, %y

Expressed as: zext i32 (%a /u %b) to i64

Ah, you are right, I misinterpreted the purpose.

On that note, why do we want to sink the zext? Wouldn't it be detrimental as it will generate operators of bigger complexity once we code-generate the SCEV.
Shouldn't we lift them instead?

That is:

%x = zext i32 %a to i64
%y = zext i32 %b to i64
%div = udiv i64 %x, %y

Expressed as: zext i32 (%a /u %b) to i64

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

I don't think those transformations are legal in the general case. But I would go for the expression that has the minimum bit-width of operators in general (that is, sink trunc but lift sext and zext).
Could you explain why your form is preferable (you may have good reasons I am missing)? I have a feeling the true issue is that we have trouble manipulating zext/sext/trunc and that is probably why we have some heuristic to not duplicate them too much.

I have a patch that improve the mobility of those on AddExpr, but I am not 100% sure of its correctness.

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Would that go towards your goal?

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Actually that is incorrect but that's not a *classic* SCEV (it is an "Exits" value) and the loop trip count estimation changed, so the value changed...

What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)).

I don't think those transformations are legal in the general case. But I would go for the expression that has the minimum bit-width of operators in general (that is, sink trunc but lift sext and zext).

True.

Could you explain why your form is preferable (you may have good reasons I am missing)? I have a feeling the true issue is that we have trouble manipulating zext/sext/trunc and that is probably why we have some heuristic to not duplicate them too much.

There already exists transformation for zext((A + B + ...)<nuw>) --> (zext(A) + zext(B) + ...)<nuw> and zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>. I was to be consistent. The true is from my perspective is that a lot of pattern matching ignores potential zexts, therefore we want to push the zexts away (either to the root or to the leaves) so that pattern in the middle of the tree is not disturbed.

I have a patch that improve the mobility of those on AddExpr, but I am not 100% sure of its correctness.

It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct).

Would that go towards your goal?

I'm specifically looking at the URem pattern, zext(A-A/B*B). Nothing involves trunc.

That might be related, for instance, such expressions:

((zext i3 {0,+,1}<%bb> to i64) + (8 * ({0,+,1}<nuw><nsw><%bb> /u 8)) + %a)

get simplified into:

{%a,+,1}<nw><%bb>
sanjoy accepted this revision.Jun 20 2018, 6:38 PM
sanjoy added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
12178 ↗(On Diff #152014)

LLVM style is putting the return false; on a different line (though you'll probably find violations in this source file).

The checked in .clang-format should DTRT though.

12185 ↗(On Diff #152014)

Might be clearer to call this MatchURemWithDivisor and call the argument Divisor or D.

This revision is now accepted and ready to land.Jun 20 2018, 6:38 PM

That might be related, for instance, such expressions:

((zext i3 {0,+,1}<%bb> to i64) + (8 * ({0,+,1}<nuw><nsw><%bb> /u 8)) + %a)

get simplified into:

{%a,+,1}<nw><%bb>

I don't see the pattern I'm trying to match.

Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following:

%11 = udiv i32 %10, 112
%12 = mul i32 %11, 112
%13 = sub i32 %10, %12
%14 = urem i32 %11, 112
%15 = udiv i32 %10, 12544
%16 = zext i32 %15 to i64
%17 = zext i32 %14 to i64
%18 = zext i32 %13 to i64
%19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3

The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there is no zexts and everything is i32. This patch makes it work with zexts.

timshen updated this revision to Diff 152214.Jun 20 2018, 6:50 PM
timshen marked an inline comment as done.

Formatting.

timshen marked an inline comment as done.Jun 20 2018, 6:51 PM
timshen added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
12185 ↗(On Diff #152014)

I'll keep the variable name "B", as it's consistent throughout the surrounding comments

This revision was automatically updated to reflect the committed changes.
timshen marked an inline comment as done.

Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following:

%11 = udiv i32 %10, 112
%12 = mul i32 %11, 112
%13 = sub i32 %10, %12
%14 = urem i32 %11, 112
%15 = udiv i32 %10, 12544
%16 = zext i32 %15 to i64
%17 = zext i32 %14 to i64
%18 = zext i32 %13 to i64
%19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3

The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there is no zexts and everything is i32. This patch makes it work with zexts.

I see. Does it work with 128 instead of 112? urem with powers of two generate trunc and zext.

My issue is that this patch goes:

  • in the reverse direction of what I need to simplify trunc/zext in general (but maybe that means my approach is not the right approach)
  • increase the computational complexity of expressions (I suspect that will make us generate worse code but probably negligible on CPU unless we do vectorization)

But, some of our reduction rules were already going in the reverse direction (I had to change some of those to have smaller expressions in the end), so I guess my objective is not aligned with our SCEV reduction rule (which I am not sure have been "designed").

Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following:

%11 = udiv i32 %10, 112
%12 = mul i32 %11, 112
%13 = sub i32 %10, %12
%14 = urem i32 %11, 112
%15 = udiv i32 %10, 12544
%16 = zext i32 %15 to i64
%17 = zext i32 %14 to i64
%18 = zext i32 %13 to i64
%19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3

The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there is no zexts and everything is i32. This patch makes it work with zexts.

I see. Does it work with 128 instead of 112? urem with powers of two generate trunc and zext.

My issue is that this patch goes:

  • in the reverse direction of what I need to simplify trunc/zext in general (but maybe that means my approach is not the right approach)
  • increase the computational complexity of expressions (I suspect that will make us generate worse code but probably negligible on CPU unless we do vectorization)

But, some of our reduction rules were already going in the reverse direction (I had to change some of those to have smaller expressions in the end), so I guess my objective is not aligned with our SCEV reduction rule (which I am not sure have been "designed").

Personally I don't have strong opinions on how to transform SCEV expression, as long as it doesn't end up like another InstCombine or DAGCombiner. :) If you have high-level design discussion, maybe shoot an email to llvm-dev?

I agree with Tim that this is best discussed on llvm-dev, but SCEV generally tries to push zext and sext towards the expression leaves (as Tim already pointed out, I noticed after typing this :) ). The canonical example of this is add recurrences -- SCEV tries very hard to transform zext{A,+,B} to {zext A,+,zext B}.

increase the computational complexity of expressions (I suspect that will make us generate worse code but probably negligible on CPU unless we do vectorization)

I think it is somewhat "off topic" to discussion the computational complexity of SCEV expressions -- SCEV expressions are primarily there to aid analysis, not to compute. Maybe what you're looking for is more smarts in SCEVExpander?