This is an archive of the discontinued LLVM Phabricator instance.

Differential D152278

[SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet
ClosedPublic

Authored by vedant-amd on Jun 6 2023, 8:14 AM.

Details

Reviewers

fhahn
nikic
mkazantsev
efriedma

Commits

rG5a9a02f67b77: [SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet

Summary

%x = shl i64 %w, n
%y = add i64 %x, c
%z = ashr i64 %y, m

The instruction triplet above appears frequently in generated LLVM IR,
but SCEV cannot currently compute a SCEV expression for the AShr
instruction in this pattern.

This patch models the two cases of the above instruction pattern using
the following expression:

> sext(add(mul(trunc(w), 2^(n-m)), c >> m))

When n = m, n - m = 0 and multiplying by 2^0 = 1 changes nothing, so the
expression reduces to sext(add(trunc(w), c >> n)).

When n > m, the expression applies as given above.
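
The identity behind this model can be checked with plain 64-bit integer arithmetic. The sketch below is not LLVM code: `ashr64`, `sextFromBits`, `irTriplet` and `scevModel` are hypothetical helper names, the width is fixed at 64 bits, and the model is evaluated in (64 - m)-bit wrapping arithmetic as the formula above suggests. It assumes n >= m and both shift amounts below 64.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 64-bit value, written so that it never
// right-shifts a negative number (0 <= m < 64).
std::int64_t ashr64(std::int64_t v, unsigned m) {
  return v < 0 ? ~(~v >> m) : (v >> m);
}

// Sign-extend the low `bits` bits of v to 64 bits (1 <= bits <= 64).
// The unsigned-to-signed casts assume two's-complement wraparound, which
// C++20 guarantees and mainstream compilers have always provided.
std::int64_t sextFromBits(std::uint64_t v, unsigned bits) {
  if (bits == 64)
    return static_cast<std::int64_t>(v);
  const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
  return static_cast<std::int64_t>(((v & mask) ^ sign) - sign);
}

// The IR triplet evaluated directly: ashr(add(shl(w, n), c), m) on i64.
std::int64_t irTriplet(std::int64_t w, unsigned n, std::int64_t c, unsigned m) {
  const std::uint64_t x = static_cast<std::uint64_t>(w) << n;  // shl
  const std::uint64_t y = x + static_cast<std::uint64_t>(c);   // add
  return ashr64(static_cast<std::int64_t>(y), m);              // ashr
}

// The model: sext(add(mul(trunc(w), 2^(n-m)), c >> m)).
// The truncation to (64 - m) bits is applied inside sextFromBits.
std::int64_t scevModel(std::int64_t w, unsigned n, std::int64_t c, unsigned m) {
  const std::uint64_t mul = static_cast<std::uint64_t>(w) << (n - m);
  const std::uint64_t sum = mul + static_cast<std::uint64_t>(ashr64(c, m));
  return sextFromBits(sum, 64 - m);
}
```

With n = m = 32 and c = -(1 << 32), both functions map w = 5 to 4. With n = 10, m = 8 and c = 256, both map w = -3 to -11.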

It also adds several unit tests to verify that SCEV is able to compute
the value.

$ opt sext-add-inreg.ll -passes="print<scalar-evolution>"

Comparing the snippets of the result of SCEV analysis:

SCEV of ashr before change ----------------------------

%idxprom = ashr exact i64 %sext, 32

-->  %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648)
Exits: 8                LoopDispositions: { %for.body: Variant }

SCEV of ashr after change ---------------------------

%idxprom = ashr exact i64 %sext, 32

-->  {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9)
Exits: 8                LoopDispositions: { %for.body: Computable }

Before the change, the LoopDisposition of the given SCEV was LoopVariant.
With the new model, SCEV can compute the expression for the instruction,
so the LoopDisposition becomes LoopComputable.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vedant-amd created this revision.Jun 6 2023, 8:14 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 8:14 AM

Herald added subscribers: javed.absar, hiraditya. · View Herald Transcript

vedant-amd requested review of this revision.Jun 6 2023, 8:14 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 8:14 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

vedant-amd added reviewers: fhahn, nikic, mkazantsev.Jun 6 2023, 8:17 AM

Herald added a subscriber: StephenFan. · View Herald TranscriptJun 6 2023, 8:17 AM

vedant-amd edited the summary of this revision. (Show Details)Jun 6 2023, 8:19 AM

vedant-amd edited the summary of this revision. (Show Details)

sekharbvrs added a subscriber: sekharbvrs.Jun 6 2023, 8:27 AM

Hey @fhahn @nikic @mkazantsev, can you please review this patch? I stumbled upon this while working on a bug in LSR. This patch does fail one of the unit tests (scaling-factor-incompat-type.ll) in LSR. I believe that might be an effect of the SCEV of this AShr pattern now being computed, as LSR generally does some optimizations based on the Shl + Add + AShr instructions.

Harbormaster completed remote builds in B236958: Diff 528874.Jun 6 2023, 9:37 AM

In D152278#4399751, @vedant-amd wrote:

Hey @fhahn @nikic @mkazantsev, can you please review this patch? I stumbled upon this while working on a bug in LSR. This patch does fail one of the unit tests (scaling-factor-incompat-type.ll) in LSR. I believe that might be an effect of the SCEV of this AShr pattern now being computed, as LSR generally does some optimizations based on the Shl + Add + AShr instructions.

ping ! Could you please take a look at this patch once.

efriedma added a subscriber: efriedma.Jun 8 2023, 1:36 PM

efriedma added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
7882	Is there some reason to expect that "c" fits into an int64_t?
7883	I guess the reason AddOperandCI has to be a constant is that this shift needs to constant-fold?
7886	This could be extended to cases where n is greater than m? You can skip that for the initial patch, of course. I don't really see any reason to restrict the shift amounts like this; the transform is pretty restricted even without that. What effect are you worried about?
llvm/test/Analysis/ScalarEvolution/sext-add-inreg.ll
38	Try to avoid "undef" in testcases where it isn't relevant. I don't think it really has much effect here, but still.

vedant-amd added inline comments.Jun 8 2023, 9:40 PM

llvm/lib/Analysis/ScalarEvolution.cpp
7882	Right, I need to add a check like this I guess ? if (CI->getValue().uge(BitWidth))
7883	Theoretically, it doesn't have to be a constant. But, I don't see where such a IR will be emitted. I can handle this case in a future patch. This IR is usually emitted for the following C code. InstCombine does the optimization. int64_t a = 10; int32_t b = a - 1; printf("%d", arr[b]);
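
The C snippet in the comment above narrows `a - 1` to 32 bits and widens it again for indexing. As a rough illustration of why that corresponds to the shl + add + ashr triplet with n = m = 32 and c = -(1 << 32) (the constants used in sext-add-inreg.ll), the sketch below compares the two forms. `indexViaCasts` and `indexViaShifts` are hypothetical names, and the narrowing cast is assumed to wrap modulo 2^32, which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// What the C source computes: narrow (a - 1) to 32 bits, then widen it
// again for use as an array index. Assumes a != INT64_MIN so that a - 1
// does not overflow.
std::int64_t indexViaCasts(std::int64_t a) {
  const std::int32_t b = static_cast<std::int32_t>(a - 1);
  return static_cast<std::int64_t>(b);
}

// The shift form: ashr(add(shl(a, 32), -(1 << 32)), 32).
std::int64_t indexViaShifts(std::int64_t a) {
  const std::uint64_t x = static_cast<std::uint64_t>(a) << 32;  // shl
  const std::uint64_t y = x - (std::uint64_t{1} << 32);         // add -(1 << 32)
  const std::int64_t s = static_cast<std::int64_t>(y);
  return s < 0 ? ~(~s >> 32) : (s >> 32);                       // ashr
}
```

Both functions return 9 for a = 10, and both wrap to -2147483648 for a = 2147483649, where `a - 1` no longer fits in 32 bits.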

vedant-amd marked an inline comment as not done.Jun 8 2023, 9:40 PM

efriedma added inline comments.Jun 8 2023, 9:52 PM

llvm/lib/Analysis/ScalarEvolution.cpp
7882	That doesn't look like the right check. If it were necessary, you could use isIntN or something like that. But you should just be able to just do the math in APInt: `AddOperandCI->getValue().ashr(AShrAmt)` or something like that.

Updated unit test to remove ret undefs

vedant-amd marked an inline comment as done.Jun 8 2023, 9:58 PM

vedant-amd added inline comments.

llvm/test/Analysis/ScalarEvolution/sext-add-inreg.ll
38	Sure, will keep this in mind.

vedant-amd marked an inline comment as done.Jun 8 2023, 9:58 PM

vedant-amd added inline comments.Jun 8 2023, 10:21 PM

llvm/lib/Analysis/ScalarEvolution.cpp
7886	> This could be extended to cases where n is greater than m? You can skip that for the initial patch, of course.
I could do that, but the original goal was to handle sext(trunc) expressions that are expanded into these statements by InstCombine.
> I don't really see any reason to restrict the shift amounts like this; the transform is pretty restricted even without that. What effect are you worried about?
We only wanted to support the data types supported by C/C++, and since these instructions are transformed from sext(trunc), it makes sense to just support standard integer types.

vedant-amd added inline comments.Jun 8 2023, 10:29 PM

llvm/lib/Analysis/ScalarEvolution.cpp
7883	> I guess the reason AddOperandCI has to be a constant is that this shift needs to constant-fold?
This can be implemented for non-constants as well, maybe in a future patch. But it involves coming up with a complex SCEV expression involving div.

Added check to make sure AddConstant isn't nullptr and few other changes.

vedant-amd marked 3 inline comments as done.Jun 8 2023, 10:48 PM

vedant-amd added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
7882	I have made the following change, please take a look once.

Hey @efriedma , I have made the requested changes (and replied to the questions). Please let me know if anything more needs to be done. Thanks !

vedant-amd updated this revision to Diff 529831.Jun 8 2023, 10:55 PM

updated using clang-format

Harbormaster completed remote builds in B237668: Diff 529831.Jun 8 2023, 11:41 PM

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

llvm/lib/Analysis/ScalarEvolution.cpp
7871–7901	We already have code to handle the general shl+ashr pattern here, including for the case where n and m are not the same. We should reuse the same code. For your pattern, we'd just have an extra add at the start.

vedant-amd added inline comments.Jun 9 2023, 1:35 AM

llvm/lib/Analysis/ScalarEvolution.cpp
7871–7901	That could be done, but it would be a major refactoring. I will post a patch anyways.

In D152278#4407841, @nikic wrote:

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Yup, that's correct. This canonicalization is used several times in LSR, so having its SCEV might be beneficial.

Refactored code to reuse ashr + shl handling code

In D152278#4407841, @nikic wrote:

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Hey @nikic, I did refactor the change, please take a look; all the SCEV test cases pass with it. It looks functionally correct.

Harbormaster completed remote builds in B237703: Diff 529876.Jun 9 2023, 3:28 AM

In D152278#4408118, @vedant-amd wrote:

In D152278#4407841, @nikic wrote:

I assume the motivation for this is the following InstCombine transform? https://llvm.godbolt.org/z/zd63TWKbW I don't think we can change that canonicalization, so undoing it in SCEV is fine.

Hey @nikic, I did refactor the change, please take a look; all the SCEV test cases pass with it. It looks functionally correct.

ping ! could you please give a +1 ? @nikic @efriedma

You should update checks in loopstrengthreduce::scaling-factor-incompat-type.ll

In D152278#4411959, @xbolva00 wrote:

You should update checks in loopstrengthreduce::scaling-factor-incompat-type.ll

I was going to send another patch for that, testcase failure. Sure, will do so.

Ping ! I didn't add the LSR failing testcase fix here, because I honestly don't know if it uncovered a bug, or it's a valid transformation in LSR.

You have to update that file anyway. Pre-commit builds are clearly failing.

In D152278#4416367, @xbolva00 wrote:

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

In D152278#4416394, @vedant-amd wrote:

In D152278#4416367, @xbolva00 wrote:

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

No, you need to regenerate the test by running the update_test_checks.py script.

llvm/lib/Analysis/ScalarEvolution.cpp
7884	There should be no restriction on the shift amount.
7888	We should verify that the add constant has lowest n bits unset, to make reassociation valid. It doesn't matter if n==m, but it matters if n!=m.
7898	You can keep the truncate in the comment code by making this trunc(add()) instead of add(trunc()). Unless I'm missing something, they're equivalent in this context.
7912	Stray newline
llvm/test/Analysis/ScalarEvolution/sext-add-inreg.ll
7	You can test these cases with basic patterns using just the shl+add+ashr sequence, without a loop (though it's also ok to keep a loop test as a motivating case). Please also test the case where the shift amounts are different.

In D152278#4417345, @nikic wrote:

In D152278#4416394, @vedant-amd wrote:

In D152278#4416367, @xbolva00 wrote:

You have to update that file anyway. Pre-commit builds are clearly failing.

Right, is it fine if I mark the test case as XFAIL for now ?

No, you need to regenerate the test by running the update_test_checks.py script.

Will do so, I was not sure if the testcase will be still valid after the updated CHECKs.

updated failing testcase in LSR

vedant-amd marked an inline comment as done.Jun 13 2023, 11:12 PM

removed restriction on shift amount and the corresponding test case

vedant-amd marked an inline comment as done.Jun 13 2023, 11:19 PM

vedant-amd marked an inline comment as done.

vedant-amd added inline comments.Jun 13 2023, 11:54 PM

llvm/lib/Analysis/ScalarEvolution.cpp
7888	Do you mean to say check for unset bits from nth bit to the mth bit ? because anything lower than mth bit will anyways be irrelevant once we right shift by m amount.

Harbormaster completed remote builds in B238707: Diff 531192.Jun 14 2023, 12:11 AM

vedant-amd marked an inline comment as not done.Jun 14 2023, 12:20 AM

vedant-amd added inline comments.Jun 14 2023, 4:49 AM

llvm/lib/Analysis/ScalarEvolution.cpp
7888	I tried to understand this, I am not able to make sense of it. Can you explain further what you mean to say ?

updated code to shift right the AddConstant by ShlAmt and flip add, trunc exprs

llvm/lib/Analysis/ScalarEvolution.cpp
7898	So, it should look like this ? AddTruncateExpr = getTruncateExpr(getAddExpr(ShlOp0SCEV, AddConstant), TruncTy);

Harbormaster completed remote builds in B238755: Diff 531267.Jun 14 2023, 6:05 AM

In D152278#4420657, @vedant-amd wrote:

updated code to shift right the AddConstant by ShlAmt and flip add, trunc exprs

ping !! I have addressed majority of the comments.

ping !! @nikic can you take a look at this, thanks !

Rebase

Harbormaster completed remote builds in B249141: Diff 545560.Jul 31 2023, 4:40 AM

Updated LSR testcase and added new testcases for AShr SCEV model

Hey @nikic and @efriedma Sorry for the ping again, but I have addressed all the comments (added the testcases as well), can you please review once again, it's been stuck since a long time. Thanks for your time.

Harbormaster completed remote builds in B249385: Diff 545909.Jul 31 2023, 9:05 PM

efriedma added inline comments.Aug 1 2023, 2:09 PM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	How is this equivalent? Say %a is zero; the original function returns 1, this SCEV expression returns 0. (I think maybe the "ashr" of the constant is shifting by 10, instead of shifting by 8?)
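
The mismatch described above can be reproduced with ordinary integer arithmetic for the test's constants (shl by 10, add 256, ashr by 8). The sketch below is only an illustration of the reviewer's guess, not the code from any diff: `directResult`, `constantShiftedByShl` and `constantShiftedByAshr` are hypothetical names, and the expressions are evaluated at 56 bits because 64 - 8 = 56.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 56 bits of v to 64 bits (two's-complement wraparound
// is assumed for the final cast; C++20 guarantees it).
std::int64_t sext56(std::uint64_t v) {
  const std::uint64_t mask = (std::uint64_t{1} << 56) - 1;
  const std::uint64_t sign = std::uint64_t{1} << 55;
  return static_cast<std::int64_t>(((v & mask) ^ sign) - sign);
}

// ashr(add(shl(a, 10), 256), 8) evaluated directly.
std::int64_t directResult(std::int64_t a) {
  const std::uint64_t y = (static_cast<std::uint64_t>(a) << 10) + 256;
  const std::int64_t s = static_cast<std::int64_t>(y);
  return s < 0 ? ~(~s >> 8) : (s >> 8);
}

// Guessed bug: the constant is shifted by the shl amount (10).
// 256 >> 10 == 0, so the "+ 1" disappears.
std::int64_t constantShiftedByShl(std::int64_t a) {
  return sext56((static_cast<std::uint64_t>(a) << 2) + (256 >> 10));
}

// Constant shifted by the ashr amount (8): 256 >> 8 == 1.
std::int64_t constantShiftedByAshr(std::int64_t a) {
  return sext56((static_cast<std::uint64_t>(a) << 2) + (256 >> 8));
}
```

For a = 0, `directResult` and `constantShiftedByAshr` give 1 while `constantShiftedByShl` gives 0, matching the discrepancy pointed out in the comment.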

vedant-amd added inline comments.Aug 3 2023, 1:03 AM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	Yeah, the SCEV is wrong. I will fix this. This is what it should have been, but it's a bit opposite: (a * 2^10 + 256) / 2^8 = a * 4 + 1. So, we need a SCEV like this: 2^(shl_amt - ashr_amt) * a + (c >> ashr_amt). I have updated my code, here's the correct SCEV: (1 + (sext i56 (4 * (trunc i64 %a to i56)) to i64))<nuw><nsw> U: [1,-2) S: [-36028797018963967,36028797018963966) Does it seem correct?

Fixed issue with generated SCEV for ShlAmt > AshrAmt

Hey @efriedma, thanks for spotting the bug. I updated the SCEV for the case when n > m (ShlAmt > AShrAmt); it seems to be correct now. Can you please review the patch? Thanks!

CC: @nikic

Harbormaster completed remote builds in B249999: Diff 546769.Aug 3 2023, 5:11 AM

efriedma added inline comments.Aug 3 2023, 8:43 AM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	I suspect the updated code isn't right... can you add an example with a larger operand to the add? The add needs to happen before the sign-extend, but in this specific example, that doesn't matter because the two expressions are equivalent.
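
To make the concern above concrete, the sketch below compares placing the add inside versus outside the sign-extend for the shl-by-10 / ashr-by-8 shape. It is an illustration under assumptions, not code from the patch: the helper names are hypothetical, and the inputs in the usage note are chosen so that the 56-bit sum overflows.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right by 8 without right-shifting a negative value.
std::int64_t ashr8(std::int64_t v) {
  return v < 0 ? ~(~v >> 8) : (v >> 8);
}

// Sign-extend the low 56 bits of v to 64 bits (two's-complement wraparound
// is assumed for the final cast; C++20 guarantees it).
std::int64_t sext56(std::uint64_t v) {
  const std::uint64_t mask = (std::uint64_t{1} << 56) - 1;
  const std::uint64_t sign = std::uint64_t{1} << 55;
  return static_cast<std::int64_t>(((v & mask) ^ sign) - sign);
}

// ashr(add(shl(a, 10), c), 8) evaluated directly.
std::int64_t directShiftResult(std::int64_t a, std::int64_t c) {
  const std::uint64_t y =
      (static_cast<std::uint64_t>(a) << 10) + static_cast<std::uint64_t>(c);
  return ashr8(static_cast<std::int64_t>(y));
}

// Add performed at 56 bits, before the sign-extend.
std::int64_t addInsideSext(std::int64_t a, std::int64_t c) {
  const std::uint64_t mul = static_cast<std::uint64_t>(a) << 2;
  return sext56(mul + static_cast<std::uint64_t>(ashr8(c)));
}

// Add performed at 64 bits, after the sign-extend.
std::int64_t addOutsideSext(std::int64_t a, std::int64_t c) {
  return sext56(static_cast<std::uint64_t>(a) << 2) + ashr8(c);
}
```

For small inputs such as a = 0, c = 256, all three agree. For a = 2^52 and c = 2^62, the direct result and `addInsideSext` both give -2^55, while `addOutsideSext` gives +2^55.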

vedant-amd added inline comments.Aug 4 2023, 1:18 AM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	I will add one, but I believe we can move the add out of the sext as the add constant is already shifted right by Ashr amount.

vedant-amd added inline comments.Aug 4 2023, 1:23 AM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	Also, I am confused about one thing: the type of MulExpr turns out to be i56 and that of the AddConstant is i64, and I am able to add them before the sign extend. Adding the n > m part to this ashr model seems to be buggy at this point. As I had proposed, can this patch just address the case where m = n, and I submit the n > m case later after more thorough thinking? But I think @nikic wants them in the same patch.

vedant-amd added inline comments.Aug 4 2023, 1:38 AM

llvm/test/Analysis/ScalarEvolution/sext-add-inreg-unequal.ll
12 ↗	(On Diff #545909)	Adding to the previous comment, the AShr + Add + Shl pattern with shlamt > ashramt occurs quite rarely in LLVM IR. But the case where shlamt = ashramt is quite common, notably because passes like InstCombine do the transformation. I propose that this patch just address the latter, and I send a patch for the n > m case at a later point. This patch has been stuck for a long time, and handling only the m = n case won't have a negative effect, as it'll safely bail out in the n > m case. Please let me know your thoughts about this, thanks! cc: @nikic

ping ping !!

I'm not sure why you're having so much trouble with the unequal shift math. The case of unequal shift amount is really just treating the shl as two shifts: one shift by an arbitrary amount, followed by one shift with shift amount equal to the ashr.

That said I'd be okay with a patch that bailed out on that case.
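
The "two shifts" view described above can be sketched in plain integer arithmetic: a shl by n is a shl by (n - m) followed by a shl by m, and the trailing shl-by-m / ashr-by-m pair is a sign-extend of the low (64 - m) bits. The helper names below are hypothetical, the width is fixed at 64 bits, and m <= n < 64 is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of v to 64 bits (1 <= bits <= 64).
// Two's-complement wraparound is assumed for the casts (guaranteed by C++20).
std::int64_t sextLowBits(std::uint64_t v, unsigned bits) {
  if (bits == 64)
    return static_cast<std::int64_t>(v);
  const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
  return static_cast<std::int64_t>(((v & mask) ^ sign) - sign);
}

// ashr(shl(w, n), m) evaluated directly.
std::int64_t shlThenAshr(std::int64_t w, unsigned n, unsigned m) {
  const std::int64_t s =
      static_cast<std::int64_t>(static_cast<std::uint64_t>(w) << n);
  return s < 0 ? ~(~s >> m) : (s >> m);
}

// Split form: shift left by the difference, then sign-extend in register.
std::int64_t splitShift(std::int64_t w, unsigned n, unsigned m) {
  return sextLowBits(static_cast<std::uint64_t>(w) << (n - m), 64 - m);
}
```

With n = 10 and m = 8, both functions map w = -3 to -12. With n = m the left shift by the difference is a no-op and only the sign-extend remains.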

Update the SCEV code and add a new testcase

In D152278#4590846, @efriedma wrote:

I'm not sure why you're having so much trouble with the unequal shift math. The case of unequal shift amount is really just treating the shl as two shifts: one shift by an arbitrary amount, followed by one shift with shift amount equal to the ashr.

I am having trouble with keeping the addition part inside the SextExpr thing, because it always returns back i64 int, but now came up with a better implementation. It seems correct to me, please take a look.

That said I'd be okay with a patch that bailed out on that case.

updated with clang-format

added empty line at end of file in a test case

@efriedma I have fixed the SCEV part, it mostly seems correct. Also added the testcase with large add constant. Please take a look once ! thanks !

Harbormaster completed remote builds in B252892: Diff 550674.Aug 16 2023, 5:57 AM

ping !

Rebased

Harbormaster completed remote builds in B254054: Diff 552306.Aug 22 2023, 4:45 AM

The revised logic makes sense.

llvm/lib/Analysis/ScalarEvolution.cpp
7899	Instead of checking `if (ShlAmt > AShrAmt)` here, can you just unconditionally do `AddTruncateExpr = getTruncateExpr(ShlOp0SCEV, TruncTy);`, then add an `if (L->getOpcode() != Instruction::Shl)` check to the equal shift amount case? Or better, just unify the `ShlAmt > AShrAmt` and `ShlAmt == AShrAmt` cases; the logic for the `ShlAmt > AShrAmt` case should just work for the `ShlAmt == AShrAmt` case (it ends up multiplying by a constant 1, which simplifies to exactly the same thing as the existing `ShlAmt == AShrAmt` code).

Simplified expression handling code, and cleaned up other things

llvm/lib/Analysis/ScalarEvolution.cpp
7899	I have implemented the second suggestion, refactored and cleaned up the comments. Now the code looks in good shape. Please give a final review for the same.

Harbormaster completed remote builds in B254556: Diff 553018.Aug 24 2023, 1:15 AM

LGTM

This revision is now accepted and ready to land.Aug 24 2023, 9:51 AM

Updated the commit message

This revision was landed with ongoing or failed builds.Aug 24 2023, 10:47 PM

Closed by commit rG5a9a02f67b77: [SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet (authored by vedant-amd). · Explain Why

This revision was automatically updated to reflect the committed changes.

vedant-amd added a commit: rG5a9a02f67b77: [SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet.

Harbormaster completed remote builds in B254800: Diff 553365.Aug 24 2023, 11:24 PM

Revision Contents

Path	Size
llvm/lib/Analysis/ScalarEvolution.cpp	47 lines
llvm/test/Analysis/ScalarEvolution/sext-add-inreg.ll	98 lines
llvm/test/Transforms/LoopStrengthReduce/scaling-factor-incompat-type.ll	15 lines

Diff 531173

llvm/lib/Analysis/ScalarEvolution.cpp

... (7,862 lines not shown) ...	case Instruction::AShr: {

if (CI->isZero())		if (CI->isZero())
return getSCEV(BO->LHS); // shift by zero --> noop		return getSCEV(BO->LHS); // shift by zero --> noop

uint64_t AShrAmt = CI->getZExtValue();		uint64_t AShrAmt = CI->getZExtValue();
Type *TruncTy = IntegerType::get(getContext(), BitWidth - AShrAmt);		Type *TruncTy = IntegerType::get(getContext(), BitWidth - AShrAmt);

Operator *L = dyn_cast<Operator>(BO->LHS);		Operator *L = dyn_cast<Operator>(BO->LHS);
if (L && L->getOpcode() == Instruction::Shl) {		const SCEV *AddTruncateExpr = nullptr;
		ConstantInt *ShlAmtCI = nullptr;

		if (L && L->getOpcode() == Instruction::Add) {
		// X = Shl A, n
		// Y = Add X, c
		// Z = AShr Y, m
		// n, c and m are constants.

		Operator *LShift = dyn_cast<Operator>(L->getOperand(0));
		ConstantInt *AddOperandCI = dyn_cast<ConstantInt>(L->getOperand(1));
		if (LShift && LShift->getOpcode() == Instruction::Shl) {
		if (AddOperandCI &&
		(AShrAmt == 32 || AShrAmt == 48 || AShrAmt == 56)) {
		// since we truncate to TruncTy, the AddConstant should be of the same
		// type, so create a new Constant with type same as TruncTy. Also, the
		// Add constant should be shifted right by AShr amount.
		APInt AddOperand = AddOperandCI->getValue().ashr(AShrAmt);
		const SCEV *AddConstant =
		getConstant(AddOperand.trunc(TruncTy->getIntegerBitWidth()));

		ShlAmtCI = dyn_cast<ConstantInt>(LShift->getOperand(1));
		const SCEV *ShlOp0SCEV = getSCEV(LShift->getOperand(0));
		// we model the expression as sext(add(trunc(A), c << n)), since the
		// sext(trunc) part is already handled below, we create a
		// AddExpr(TruncExp) which will be used later.
		AddTruncateExpr =
		getAddExpr(getTruncateExpr(ShlOp0SCEV, TruncTy), AddConstant);
		}
		}
		} else if (L && L->getOpcode() == Instruction::Shl) {
// X = Shl A, n		// X = Shl A, n
// Y = AShr X, m		// Y = AShr X, m
// Both n and m are constant.		// Both n and m are constant.

		ShlAmtCI = dyn_cast<ConstantInt>(L->getOperand(1));
const SCEV *ShlOp0SCEV = getSCEV(L->getOperand(0));		const SCEV *ShlOp0SCEV = getSCEV(L->getOperand(0));
if (L->getOperand(1) == BO->RHS)		AddTruncateExpr = getTruncateExpr(ShlOp0SCEV, TruncTy);
		}

		if (AddTruncateExpr && ShlAmtCI) {

		if (ShlAmtCI == CI)
// For a two-shift sext-inreg, i.e. n = m,		// For a two-shift sext-inreg, i.e. n = m,
// use sext(trunc(x)) as the SCEV expression.		// use sext(trunc(x)) as the SCEV expression.
return getSignExtendExpr(		return getSignExtendExpr(AddTruncateExpr, OuterTy);
getTruncateExpr(ShlOp0SCEV, TruncTy), OuterTy);

ConstantInt *ShlAmtCI = dyn_cast<ConstantInt>(L->getOperand(1));
if (ShlAmtCI && ShlAmtCI->getValue().ult(BitWidth)) {		if (ShlAmtCI && ShlAmtCI->getValue().ult(BitWidth)) {
uint64_t ShlAmt = ShlAmtCI->getZExtValue();		uint64_t ShlAmt = ShlAmtCI->getZExtValue();
if (ShlAmt > AShrAmt) {		if (ShlAmt > AShrAmt) {
// When n > m, use sext(mul(trunc(x), 2^(n-m)))) as the SCEV		// When n > m, use sext(mul(trunc(x), 2^(n-m)))) as the SCEV
// expression. We already checked that ShlAmt < BitWidth, so		// expression. We already checked that ShlAmt < BitWidth, so
// the multiplier, 1 << (ShlAmt - AShrAmt), fits into TruncTy as		// the multiplier, 1 << (ShlAmt - AShrAmt), fits into TruncTy as
// ShlAmt - AShrAmt < Amt.		// ShlAmt - AShrAmt < Amt.
APInt Mul = APInt::getOneBitSet(BitWidth - AShrAmt,		APInt Mul = APInt::getOneBitSet(BitWidth - AShrAmt,
ShlAmt - AShrAmt);		ShlAmt - AShrAmt);
return getSignExtendExpr(		return getSignExtendExpr(
getMulExpr(getTruncateExpr(ShlOp0SCEV, TruncTy),		getMulExpr(AddTruncateExpr, getConstant(Mul)), OuterTy);
getConstant(Mul)), OuterTy);
}		}
}		}
}		}
break;		break;
}		}
}		}
}		}

... (7,529 lines not shown) ...

llvm/test/Analysis/ScalarEvolution/sext-add-inreg.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 2
				; RUN: opt < %s -disable-output "-passes=print<scalar-evolution>" 2>&1 | FileCheck %s

				@.str = private unnamed_addr constant [3 x i8] c"%x\00", align 1

				define dso_local i32 @test(ptr nocapture noundef readonly %x) {
				; CHECK-LABEL: 'test'
				; CHECK-NEXT: Classifying expressions for: @test
				; CHECK-NEXT: %i.03 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
				; CHECK-NEXT: --> {1,+,1}<nuw><nsw><%for.body> U: [1,10) S: [1,10) Exits: 9 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %conv = shl nuw nsw i64 %i.03, 32
				; CHECK-NEXT: --> {4294967296,+,4294967296}<nuw><nsw><%for.body> U: [4294967296,38654705665) S: [4294967296,38654705665) Exits: 38654705664 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %sext = add nsw i64 %conv, -4294967296
				; CHECK-NEXT: --> {0,+,4294967296}<nuw><nsw><%for.body> U: [0,34359738369) S: [0,34359738369) Exits: 34359738368 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %idxprom = ashr exact i64 %sext, 32
				; CHECK-NEXT: --> {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9) Exits: 8 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %x, i64 %idxprom
				; CHECK-NEXT: --> {%x,+,4}<nuw><%for.body> U: full-set S: full-set Exits: (32 + %x) LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %0 = load i32, ptr %arrayidx, align 4
				; CHECK-NEXT: --> %0 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %0)
				; CHECK-NEXT: --> %call U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %inc = add nuw nsw i64 %i.03, 1
				; CHECK-NEXT: --> {2,+,1}<nuw><nsw><%for.body> U: [2,11) S: [2,11) Exits: 10 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: Determining loop execution counts for: @test
				; CHECK-NEXT: Loop %for.body: backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is 8
				; CHECK-NEXT: Predicates:
				; CHECK: Loop %for.body: Trip multiple is 9
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret i32 0

				efriedma (inline comment, Done): Try to avoid "undef" in testcases where it isn't relevant. I don't think it really has much effect here, but still.
				vedant-amd (author, inline comment, Done): Sure, will keep this in mind.
				for.body: ; preds = %entry, %for.body
				%i.03 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
				%conv = shl nuw nsw i64 %i.03, 32
				%sext = add nsw i64 %conv, -4294967296
				%idxprom = ashr exact i64 %sext, 32
				%arrayidx = getelementptr inbounds i32, ptr %x, i64 %idxprom
				%0 = load i32, ptr %arrayidx, align 4
				%call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %0)
				%inc = add nuw nsw i64 %i.03, 1
				%exitcond.not = icmp eq i64 %inc, 10
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define dso_local i32 @test_fail(ptr nocapture noundef readonly %x) {
				; CHECK-LABEL: 'test_fail'
				; CHECK-NEXT: Classifying expressions for: @test_fail
				; CHECK-NEXT: %i.03 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
				; CHECK-NEXT: --> {1,+,1}<nuw><nsw><%for.body> U: [1,10) S: [1,10) Exits: 9 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %conv = shl nuw nsw i64 %i.03, 34
				; CHECK-NEXT: --> {17179869184,+,17179869184}<nuw><nsw><%for.body> U: [17179869184,154618822657) S: [17179869184,154618822657) Exits: 154618822656 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %sext = add nsw i64 %conv, -4294967296
				; CHECK-NEXT: --> {12884901888,+,17179869184}<nuw><nsw><%for.body> U: [12884901888,150323855361) S: [12884901888,150323855361) Exits: 150323855360 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: %idxprom = ashr exact i64 %sext, 34
				; CHECK-NEXT: --> %idxprom U: [-536870912,536870912) S: [-536870912,536870912) Exits: 8 LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %x, i64 %idxprom
				; CHECK-NEXT: --> ((4 * %idxprom)<nsw> + %x) U: full-set S: full-set Exits: (32 + %x) LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %0 = load i32, ptr %arrayidx, align 4
				; CHECK-NEXT: --> %0 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %0)
				; CHECK-NEXT: --> %call U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %for.body: Variant }
				; CHECK-NEXT: %inc = add nuw nsw i64 %i.03, 1
				; CHECK-NEXT: --> {2,+,1}<nuw><nsw><%for.body> U: [2,11) S: [2,11) Exits: 10 LoopDispositions: { %for.body: Computable }
				; CHECK-NEXT: Determining loop execution counts for: @test_fail
				; CHECK-NEXT: Loop %for.body: backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is 8
				; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is 8
				; CHECK-NEXT: Predicates:
				; CHECK: Loop %for.body: Trip multiple is 9
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret i32 0

				for.body: ; preds = %entry, %for.body
				%i.03 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
				%conv = shl nuw nsw i64 %i.03, 34
				%sext = add nsw i64 %conv, -4294967296
				%idxprom = ashr exact i64 %sext, 34
				%arrayidx = getelementptr inbounds i32, ptr %x, i64 %idxprom
				%0 = load i32, ptr %arrayidx, align 4
				%call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %0)
				%inc = add nuw nsw i64 %i.03, 1
				%exitcond.not = icmp eq i64 %inc, 10
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				declare noundef i32 @printf(ptr nocapture noundef readonly, ...)
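
The exit values in the two functions above can be cross-checked by hand. The following is a minimal C++ sketch, not part of the patch, that evaluates the shl/add/ashr triplet directly on 64-bit values and compares it with the modeled form sext(add(mul(trunc(w), 2^(n-m)), c >> m)) from the summary. The helper names (`ashr64`, `sext`, `direct`, `modeled`, `agrees`, `checkTestValues`) are made up for illustration, and the sketch assumes 0 <= m <= n < 64. It only checks the arithmetic; it says nothing about which cases ScalarEvolution actually recognizes.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 64-bit pattern, written with unsigned
// operations only. Requires m < 64.
static uint64_t ashr64(uint64_t v, unsigned m) {
  uint64_t shifted = v >> m;
  if (m != 0 && (v >> 63) != 0)          // negative input: fill with ones
    shifted |= ~uint64_t{0} << (64 - m);
  return shifted;
}

// Sign-extend the low `bits` bits of v to 64 bits. Requires 1 <= bits <= 64.
static uint64_t sext(uint64_t v, unsigned bits) {
  if (bits == 64)
    return v;
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  const uint64_t sign = uint64_t{1} << (bits - 1);
  v &= mask;                             // this is the trunc step
  return (v ^ sign) - sign;              // wraps modulo 2^64 as intended
}

// ashr(add(shl(w, n), c), m), evaluated directly on i64.
static uint64_t direct(uint64_t w, unsigned n, uint64_t c, unsigned m) {
  return ashr64((w << n) + c, m);
}

// sext(add(mul(trunc(w), 2^(n-m)), c >> m)), where the inner arithmetic is
// taken modulo 2^(64-m). Computing in 64 bits and truncating at the end
// gives the same low bits as truncating w first.
static uint64_t modeled(uint64_t w, unsigned n, uint64_t c, unsigned m) {
  const uint64_t sum = w * (uint64_t{1} << (n - m)) + ashr64(c, m);
  return sext(sum, 64 - m);
}

// True when both evaluations agree for the given operands.
static bool agrees(uint64_t w, unsigned n, uint64_t c, unsigned m) {
  return direct(w, n, c, m) == modeled(w, n, c, m);
}

// Replays the induction variable range of @test (shift 32) and
// @test_fail (shift 34): %i.03 runs from 1 to 9 and c is -4294967296.
static void checkTestValues() {
  const uint64_t c = static_cast<uint64_t>(-4294967296LL);
  for (uint64_t i = 1; i <= 9; ++i) {
    assert(direct(i, 32, c, 32) == i - 1);
    assert(direct(i, 34, c, 34) == i - 1);
    assert(agrees(i, 32, c, 32));
    assert(agrees(i, 34, c, 34));
  }
}
```

Calling `checkTestValues()` walks %idxprom through 0 to 8 for both shift amounts, which is consistent with the `Exits: 8` value reported for %idxprom in both functions.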

llvm/test/Transforms/LoopStrengthReduce/scaling-factor-incompat-type.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; Check that it doesn't crash by generating formula with zero in base register			; Check that it doesn't crash by generating formula with zero in base register
	; when one of the IV factors does't fit (2^32 in this test) the formula type			; when one of the IV factors does't fit (2^32 in this test) the formula type
	; see pr42770			; see pr42770
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-reduce -S | FileCheck %s			; RUN: opt < %s -loop-reduce -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB4:%.*]]			; CHECK-NEXT: br label [[BB4:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[T3:%.*]] = ashr i64 [[LSR_IV_NEXT:%.*]], 32			; CHECK-NEXT: [[T:%.*]] = shl i64 [[T14:%.*]], 32
				; CHECK-NEXT: [[T2:%.*]] = add i64 [[T]], 1
				; CHECK-NEXT: [[T3:%.*]] = ashr i64 [[T2]], 32
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[LSR_IV1:%.*]] = phi i16 [ [[LSR_IV_NEXT2:%.*]], [[BB13:%.*]] ], [ 6, [[BB:%.*]] ]			; CHECK-NEXT: [[LSR_IV:%.*]] = phi i16 [ [[LSR_IV_NEXT:%.*]], [[BB13:%.*]] ], [ 6, [[BB:%.*]] ]
	; CHECK-NEXT: [[LSR_IV:%.*]] = phi i64 [ [[LSR_IV_NEXT]], [[BB13]] ], [ 8589934593, [[BB]] ]			; CHECK-NEXT: [[T5:%.*]] = phi i64 [ 2, [[BB]] ], [ [[T14]], [[BB13]] ]
	; CHECK-NEXT: [[LSR_IV_NEXT]] = add nuw nsw i64 [[LSR_IV]], 25769803776			; CHECK-NEXT: [[LSR_IV_NEXT]] = add nuw nsw i16 [[LSR_IV]], 6
	; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i16 [[LSR_IV1]], 6			; CHECK-NEXT: [[T14]] = add nuw nsw i64 [[T5]], 6
	; CHECK-NEXT: [[T10:%.*]] = icmp eq i16 1, 0			; CHECK-NEXT: [[T10:%.*]] = icmp eq i16 1, 0
	; CHECK-NEXT: br i1 [[T10]], label [[BB11:%.*]], label [[BB13]]			; CHECK-NEXT: br i1 [[T10]], label [[BB11:%.*]], label [[BB13]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: [[T12:%.*]] = udiv i16 1, [[LSR_IV1]]			; CHECK-NEXT: [[T12:%.*]] = udiv i16 1, [[LSR_IV]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: bb13:			; CHECK: bb13:
	; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB4]]			; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB4]]
	;			;
	bb:			bb:
	br label %bb4			br label %bb4
	bb1: ; preds = %bb13			bb1: ; preds = %bb13
	%t = shl i64 %t14, 32			%t = shl i64 %t14, 32