[SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr tripletClosedPublic

Authored by vedant-amd on Jun 6 2023, 8:14 AM.

Details

Reviewers
 fhahn nikic mkazantsev efriedma
Commits
rG5a9a02f67b77: [SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet
Summary

%x = shl i64 %w, n
%y = add i64 %x, c
%z = ashr i64 %y, m

The above given instruction triplet is seen many times in the generated
LLVM IR, but SCEV model is not able to compute the SCEV value of AShr
instruction in this case.

This patch models the two cases of the above instruction pattern using
the following expression:

> sext(add(mul(trunc(w), 2^(n-m)), c >> m))

1. when n = m the expression reduces to sext(add(trunc(w), c >> n))

as n-m=0, and multiplying with 2^0 gives the same result.

1. when n > m the expression works as given above.

It also adds several unittest to verify that SCEV is able to compute
the value.

Comparing the snippets of the result of SCEV analysis:

• SCEV of ashr before change ----------------------------

%idxprom = ashr exact i64 %sext, 32

```-->  %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648)
Exits: 8                LoopDispositions: { %for.body: Variant }```
• SCEV of ashr after change ---------------------------

%idxprom = ashr exact i64 %sext, 32

```-->  {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9)
Exits: 8                LoopDispositions: { %for.body: Computable }```

LoopDisposition of the given SCEV was LoopVariant before, after adding
the new way to model the instruction, the LoopDisposition becomes
LoopComputable as it is able to compute the SCEV of the instruction.

Diff Detail

Event Timeline

vedant-amd created this revision.Jun 6 2023, 8:14 AM
Herald added a project: Restricted Project. Jun 6 2023, 8:14 AM
vedant-amd requested review of this revision.Jun 6 2023, 8:14 AM
Herald added a project: Restricted Project. Jun 6 2023, 8:14 AM
vedant-amd edited the summary of this revision. (Show Details)Jun 6 2023, 8:19 AM
vedant-amd edited the summary of this revision. (Show Details)
vedant-amd edited the summary of this revision. (Show Details)
vedant-amd edited the summary of this revision. (Show Details)
vedant-amd added a comment.EditedJun 6 2023, 8:31 AM

Hey @fhahn @nikic @mkazantsev can you please review this patch, I stumbled upon this while working on a bug in LSR, this patch does fail one of the unit test-case (scaling-factor-incompat-type.ll) in LSR. I believe that might be an effect of SCEV of this AShr pattern being computed, as LSR generally does do some optimizations based on the Shl + Add + AShr instructions.

Hey @fhahn @nikic @mkazantsev can you please review this patch, I stumbled upon this while working on a bug in LSR, this patch does fail one of the unit test-case (scaling-factor-incompat-type.ll) in LSR. I believe that might be an effect of SCEV of this AShr pattern being computed, as LSR generally does do some optimizations based on the Shl + Add + AShr instructions.

ping ! Could you please take a look at this patch once.

llvm/lib/Analysis/ScalarEvolution.cpp
7882

Is there some reason to expect that "c" fits into an int64_t?

7883

I guess the reason AddOperandCI has to be a constant is that this shift needs to constant-fold?

7886

This could be extended to cases where n is greater than m? You can skip that for the initial patch, of course.

I don't really see any reason to restrict the shift amounts like this; the transform is pretty restricted even without that. What effect are you worried about?

38

Try to avoid "undef" in testcases where it isn't relevant. I don't think it really has much effect here, but still.

llvm/lib/Analysis/ScalarEvolution.cpp
7882

Right, I need to add a check like this I guess ?

`if (CI->getValue().uge(BitWidth))`
7883

Theoretically, it doesn't have to be a constant. But, I don't see where such a IR will be emitted. I can handle this case in a future patch.

This IR is usually emitted for the following C code. InstCombine does the optimization.

```int64_t a = 10;
int32_t b = a - 1;
printf("%d", arr[b]);```
vedant-amd marked an inline comment as not done.Jun 8 2023, 9:40 PM
llvm/lib/Analysis/ScalarEvolution.cpp
7882

That doesn't look like the right check.

If it were necessary, you could use isIntN or something like that. But you should just be able to just do the math in APInt: AddOperandCI->getValue().ashr(AShrAmt) or something like that.

vedant-amd updated this revision to Diff 529822.Jun 8 2023, 9:55 PM

Updated unit test to remove ret undefs

vedant-amd marked an inline comment as done.Jun 8 2023, 9:58 PM
38

Sure, will keep this in mind.

vedant-amd marked an inline comment as done.Jun 8 2023, 9:58 PM
llvm/lib/Analysis/ScalarEvolution.cpp
7886

This could be extended to cases where n is greater than m? You can skip that for the initial patch, of course.

I could do that, but the original goal was to handle sext(trunc) expression that are expanded into these statements by instcombine.

I don't really see any reason to restrict the shift amounts like this; the transform is pretty restricted even without that. What effect are you worried about?

We only wanted to support the data types supported by C/C++, and also since these instr are transformed from sext(trunc) it makes sense to just support standard integer types.

llvm/lib/Analysis/ScalarEvolution.cpp
7883

I guess the reason AddOperandCI has to be a constant is that this shift needs to constant-fold?

This can be implemented for non-constants as well, maybe in a future patch. But, it involves coming up with complex SCEV expression involving div.

Added check to make sure AddConstant isn't nullptr and few other changes.

vedant-amd marked 3 inline comments as done.Jun 8 2023, 10:48 PM
llvm/lib/Analysis/ScalarEvolution.cpp
7882

Hey @efriedma , I have made the requested changes (and replied to the questions). Please let me know if anything more needs to be done. Thanks !

updated using clang-format

nikic added a comment.Jun 9 2023, 12:44 AM

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

llvm/lib/Analysis/ScalarEvolution.cpp
7871–7900

We already have code to handle the general shl+ashr pattern here, including for the case where n and m are not the same. We should reuse the same code. For your pattern, we'd just have an extra add at the start.

llvm/lib/Analysis/ScalarEvolution.cpp
7871–7900

That could be done, but it would be a major refactoring. I will post a patch anyways.

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Yup, that's correct. This canonicalization is used several times in LSR, so having it's SCEV might be beneficial.

vedant-amd updated this revision to Diff 529876.Jun 9 2023, 2:52 AM

Refactored code to reuse ashr + shl handling code

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Hey @nikic, I did refactor the change, please take a look, all the SCEV test cases do pass with it. It look correct functionally.

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Hey @nikic, I did refactor the change, please take a look, all the SCEV test cases do pass with it. It look correct functionally.

ping ! could you please give a +1 ? @nikic @efriedma

xbolva00 added a subscriber: xbolva00.EditedJun 11 2023, 4:35 AM

You should update checks in loopstrengthreduce::scaling-factor-incompat-type.ll

You should update checks in loopstrengthreduce::scaling-factor-incompat-type.ll

I was going to send another patch for that, testcase failure. Sure, will do so.

Ping ! I didn't add the LSR failing testcase fix here, because I honestly don't know if it uncovered a bug, or it's a valid transformation in LSR.

You have to update that file anyway. Pre-commit builds are clearly failing.

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

nikic added a comment.Jun 13 2023, 6:07 AM

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

No, you need to regenerate the test by running the update_test_checks.py script.

llvm/lib/Analysis/ScalarEvolution.cpp
7884

There should be no restriction on the shift amount.

7888

We should verify that the add constant has lowest n bits unset, to make reassociation valid. It doesn't matter if n==m, but it matters if n!=m.

7898

You can keep the truncate in the comment code by making this trunc(add()) instead of add(trunc()). Unless I'm missing something, they're equivalent in this context.

7911

Stray newline

7

You can test these cases with basic patterns using just the shl+add+ashr sequence, without a loop (though it's also ok to keep a loop test as a motivating case). Please also test the case where the shift amounts are different.

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

No, you need to regenerate the test by running the update_test_checks.py script.

Will do so, I was not sure if the testcase will be still valid after the updated CHECKs.

updated failing testcase in LSR

vedant-amd marked an inline comment as done.Jun 13 2023, 11:12 PM

removed restriction on shift amount and the corresponding test case

vedant-amd marked an inline comment as done.Jun 13 2023, 11:19 PM
vedant-amd marked an inline comment as done.
llvm/lib/Analysis/ScalarEvolution.cpp
7888

Do you mean to say check for unset bits from nth bit to the mth bit ? because anything lower than mth bit will anyways be irrelevant once we right shift by m amount.

vedant-amd marked an inline comment as not done.Jun 14 2023, 12:20 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7888

I tried to understand this, I am not able to make sense of it. Can you explain further what you mean to say ?

updated code to shift right the AddConstant by ShlAmt and flip add, trunc exprs

llvm/lib/Analysis/ScalarEvolution.cpp
7898

So, it should look like this ?

`AddTruncateExpr = getTruncateExpr(getAddExpr(ShlOp0SCEV, AddConstant), TruncTy);`

updated code to shift right the AddConstant by ShlAmt and flip add, trunc exprs

vedant-amd added a comment.EditedJul 31 2023, 2:50 AM

ping !! @nikic can you take a look at this, thanks !

Updated LSR testcase and added new testcases for AShr SCEV model

Hey @nikic and @efriedma Sorry for the ping again, but I have addressed all the comments (added the testcases as well), can you please review once again, it's been stuck since a long time. Thanks for your time.

12 ↗(On Diff #545909)

How is this equivalent? Say %a is zero; the original function returns 1, this SCEV expression returns 0. (I think maybe the "ashr" of the constant is shifting by 10, instead of shifting by 8?)

12 ↗(On Diff #545909)

Yeah, the SCEV is wrong. I will fix this. This is what is should have been, but it's a bit opposite.

```= (a*2^10 + 256)/2*8
= a*4 + 1```

So, we need SCEV like this:

2^(shl_amt - ashr_amt) * a + c >> ashr_amt

I have updated my code, here's the correct SCEV:

(1 + (sext i56 (4 * (trunc i64 %a to i56)) to i64))<nuw><nsw> U: [1,-2) S: [-36028797018963967,36028797018963966)

Does it seem correct ?

vedant-amd updated this revision to Diff 546769.Aug 3 2023, 2:37 AM

Fixed issue with generated SCEV for ShlAmt > AshrAmt

Hey @efriedma Thanks for spotting the bug, I updated the SCEV for case when m > n, it seems to be correct now. Can you please review the patch, thanks !

CC: @nikic

12 ↗(On Diff #545909)

I suspect the updated code isn't right... can you add an example with a larger operand to the add? The add needs to happen before the sign-extend, but in this specific example, that doesn't matter because the two expressions are equivalent.

12 ↗(On Diff #545909)

I will add one, but I believe we can move the add out of the sext as the add constant is already shifted right by Ashr amount.

12 ↗(On Diff #545909)

Also, I am confused about one thing, the Type of MulExpr turns out to be i56 and that of the addConstant is i64, I am able to add them before the Sign extend. Adding the m > n part to this ashr model seems to buggy at this point, as I had proposed can this patch just address the case where m=n ? and I submit, the m > n thing later with thorough thinking. But, I think @nikic wants them in the same patch.

12 ↗(On Diff #545909)

Adding to the previous comment, the Ashr + Add + Shl with shlamt > ashramt quite rarely occur in LLVM IR. But, the case where the shlamt = ashramt is quite common in LLVM IR, notably because instcombine like passes do the transformation. I propose that this patch just address the latter and I send a patch for the m > n case at a later point. This patch is stuck from a long time, and just taking care of m=n case won't negatively affect as it'll safely exit in the m > n case.

I'm not sure why you're having so much trouble with the unequal shift math. The case of unequal shift amount is really just treating the shl as two shifts: one shift by an arbitrary amount, followed by one shift with shift amount equal to the ashr.

That said I'd be okay with a patch that bailed out on that case.

Update the SCEV code and add a new testcase

I'm not sure why you're having so much trouble with the unequal shift math. The case of unequal shift amount is really just treating the shl as two shifts: one shift by an arbitrary amount, followed by one shift with shift amount equal to the ashr.

I am having trouble with keeping the addition part inside the SextExpr thing, because it always returns back i64 int, but now came up with a better implementation. It seems correct to me, please take a look.

That said I'd be okay with a patch that bailed out on that case.

I'm not sure why you're having so much trouble with the unequal shift math. The case of unequal shift amount is really just treating the shl as two shifts: one shift by an arbitrary amount, followed by one shift with shift amount equal to the ashr.

That said I'd be okay with a patch that bailed out on that case.

updated with clang-format

added empty line at end of file in a test case

@efriedma I have fixed the SCEV part, it mostly seems correct. Also added the testcase with large add constant. Please take a look once ! thanks !

The revised logic makes sense.

llvm/lib/Analysis/ScalarEvolution.cpp
7899

Instead of checking if (ShlAmt > AShrAmt) here, can you just unconditionally do AddTruncateExpr = getTruncateExpr(ShlOp0SCEV, TruncTy);, then add an if (L->getOpcode() != Instruction::Shl) check to the equal shift amount case?

Or better, just unify the ShlAmt > AShrAmt and ShlAmt == AShrAmt cases; the logic for the ShlAmt > AShrAmt case should just work for the ShlAmt == AShrAmt case (it ends up multiplying by a constant 1, which simplifies to exactly the same thing as the existing ShlAmt == AShrAmt code).

vedant-amd marked an inline comment as done.

Simplified expression handling code, and cleaned up other things

llvm/lib/Analysis/ScalarEvolution.cpp
7899

I have implemented the second suggestion, refactored and cleaned up the comments. Now the code looks in good shape. Please give a final review for the same.

This revision is now accepted and ready to land.Aug 24 2023, 9:51 AM
vedant-amd edited the summary of this revision. (Show Details)

Updated the commit message

This revision was landed with ongoing or failed builds.Aug 24 2023, 10:47 PM
This revision was automatically updated to reflect the committed changes.