Try to match udiv and urem patterns, and sink zext down to the leaves.
I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.
Differential D48338
[SCEV] Improve zext(A /u B) and zext(A % B) timshen on Jun 19 2018, 3:24 PM. Authored by
Details Try to match udiv and urem patterns, and sink zext down to the leaves. I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.
Diff Detail
Event Timeline
Comment Actions I have been working on related issue but my strategy is different: When we reduce AddExpr(zext(X), Y, Z) I ask for X' = getAnyExtendExpr(X) (which do not involve any zext/sext/trunc usually and try to look for a matching (A*X'/A) in Y or Z. Comment Actions In my use case, the zext is introduced by LLVM IR. In that case, ScalarEvolution::createSCEV will just call getZeroExtendExpr. I'm not sure if we can use getAnyExtendExpr there. Comment Actions Ah, you are right, I misinterpreted the purpose. On that note, why do we want to sink the zext? Wouldn't it be detrimental as it will generate operators of bigger complexity once we code-generate the SCEV. That is: %x = zext i32 %a to i64 %y = zext i32 %b to i64 %div = udiv i64 %x, %y Expressed as: zext i32 (%a /u %b) to i64 Comment Actions What about zext(%a + %b) + %c? I think zext(%a) + zext(%b) + %c is in a much better state than, idk, zext(%a + %b + trunc(%c)). Comment Actions I don't think those transformations are legal in the general case. But I would go for the expression that has the minimum bit-width of operators in general (that is, sink trunc but lift sext and zext). I have a patch that improve the mobility of those on AddExpr, but I am not 100% sure of its correctness. It transforms: (sext i57 (199 * (trunc i64 (-1 + (2780916192016515319 * %n)) to i57)) to i64) into (sext i57 (-199 + (trunc i64 %n to i57)) to i64) (not sure if that is correct). Would that go towards your goal? Comment Actions Actually that is incorrect but that's not a *classic* SCEV (it is an "Exits" value) and the loop trip count estimation changed, so the value changed... Comment Actions True.
There already exists transformation for zext((A + B + ...)<nuw>) --> (zext(A) + zext(B) + ...)<nuw> and zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>. I was to be consistent. The true is from my perspective is that a lot of pattern matching ignores potential zexts, therefore we want to push the zexts away (either to the root or to the leaves) so that pattern in the middle of the tree is not disturbed.
I'm specifically looking at the URem pattern, zext(A-A/B*B). Nothing involves trunc. Comment Actions That might be related, for instance, such expressions: ((zext i3 {0,+,1}<%bb> to i64) + (8 * ({0,+,1}<nuw><nsw><%bb> /u 8)) + %a) get simplified into: {%a,+,1}<nw><%bb>
Comment Actions I don't see the pattern I'm trying to match. Though I can't easily reduce to a test case, the code I'm specifically looking at looks like the following: %11 = udiv i32 %10, 112 %12 = mul i32 %11, 112 %13 = sub i32 %10, %12 %14 = urem i32 %11, 112 %15 = udiv i32 %10, 12544 %16 = zext i32 %15 to i64 %17 = zext i32 %14 to i64 %18 = zext i32 %13 to i64 %19 = getelementptr inbounds [128 x [112 x [112 x [64 x float]]]], [128 x [112 x [112 x [64 x float]]]] addrspace(1)* %ptr, i64 0, i64 %16, i64 %17, i64 %18, i64 %3 The idea is that %10 is a flat index of %ptr, and the whole GEP should be equivalent to (in C) &%ptr[%10]. This already works for a case where there is no zexts and everything is i32. This patch makes it work with zexts.
Comment Actions I see. Does it work with 128 instead of 112? urem with powers of two generate trunc and zext. My issue is that this patch goes:
But, some of our reduction rules were already going in the reverse direction (I had to change some of those to have smaller expressions in the end), so I guess my objective is not aligned with our SCEV reduction rule (which I am not sure have been "designed"). Comment Actions Personally I don't have strong opinions on how to transform SCEV expression, as long as it doesn't end up like another InstCombine or DAGCombiner. :) If you have high-level design discussion, maybe shoot an email to llvm-dev? Comment Actions I agree with Tim that this is best discussed on llvm-dev, but SCEV generally tries to push zext and sext towards the expression leaves (as Tim already pointed out, I noticed after typing this :) ). The canonical example of this is add recurrences -- SCEV tries very hard to transform zext{A,+,B} to {zext A,+,zext B}.
I think it is somewhat "off topic" to discussion the computational complexity of SCEV expressions -- SCEV expressions are primarily there to aid analysis, not to compute. Maybe what you're looking for is more smarts in SCEVExpander? |
Minor: why C instead of B?