This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
- llvm/test/Transforms/InstCombine/and.ll
- llvm/test/Transforms/InstCombine/icmp-and-shift.ll

Differential D126617

[InstCombine] Optimize shl+lshr+and conversion pattern
ClosedPublic

Authored by bcl5980 on May 29 2022, 9:46 AM.


Details

Reviewers

spatel
RKSimon
lebedev.ri
craig.topper
nikic

Commits

rGde7a6ae1ffc3: [InstCombine] Optimize shl+lshr+and conversion pattern

Summary

If C1 and C3 are powers of 2 and Log2(C3)+C2 < BitWidth:

((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739
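As a quick complement to the Alive2 proof, the following C++ sketch (not part of the patch; all helper names are hypothetical) brute-forces the fold on 16-bit values for every power-of-2 C1 and C3 and every C2 satisfying Log2(C3)+C2 < BitWidth. It only tests X < 16, because in LLVM IR a `shl` by an amount of at least the bit width produces poison, so those inputs impose no constraint.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a nonzero power of two (equal to cttz for pow2 values).
static unsigned log2Pow2(uint16_t v) {
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// Original sequence on i16: ((C1 << X) >> C2) & C3, for X < 16.
static uint16_t original(uint16_t c1, unsigned x, unsigned c2, uint16_t c3) {
  uint16_t shl = static_cast<uint16_t>(c1 << x); // truncate like an i16 shl
  return static_cast<uint16_t>((shl >> c2) & c3);
}

// Folded form: X == (Log2(C3) + C2 - Log2(C1)) ? C3 : 0.
// A negative compare constant never matches any X in [0, 16).
static uint16_t folded(uint16_t c1, unsigned x, unsigned c2, uint16_t c3) {
  int cmpC = static_cast<int>(log2Pow2(c3) + c2) - static_cast<int>(log2Pow2(c1));
  return static_cast<int>(x) == cmpC ? c3 : 0;
}

// Compare both forms for all pow2 C1/C3, all C2 with Log2(C3) + C2 < 16,
// and all X < 16. Returns true if no mismatch is found.
static bool foldHoldsForAllPow2() {
  for (unsigned i = 0; i < 16; ++i)
    for (unsigned j = 0; j < 16; ++j)
      for (unsigned c2 = 0; j + c2 < 16; ++c2) {
        uint16_t c1 = static_cast<uint16_t>(1u << i);
        uint16_t c3 = static_cast<uint16_t>(1u << j);
        for (unsigned x = 0; x < 16; ++x)
          if (original(c1, x, c2, c3) != folded(c1, x, c2, c3))
            return false;
      }
  return true;
}
```

With the constants from the `@shl_lshr_pow2_const_case1` test (C1 = 4, C2 = 6, C3 = 8), the compare constant is 3 + 6 - 2 = 7. Dropping the Log2(C3)+C2 < BitWidth precondition breaks the fold: with C1 = 2, C2 = 1, C3 = 0x8000 and X = 15, the original sequence yields 0 (the bit is shifted out of the i16), but the select would return 0x8000.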

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bcl5980 created this revision. May 29 2022, 9:46 AM

Herald added a project: Restricted Project. May 29 2022, 9:46 AM

Herald added subscribers: StephenFan, hiraditya.

bcl5980 requested review of this revision. May 29 2022, 9:46 AM

Herald added a project: Restricted Project. May 29 2022, 9:46 AM

Herald added a subscriber: llvm-commits.

bcl5980 edited the summary of this revision. May 29 2022, 9:47 AM

rebase with more tests diff

Harbormaster completed remote builds in B166826: Diff 432798. May 29 2022, 10:47 AM

spatel edited the summary of this revision. May 30 2022, 4:37 AM

bcl5980 mentioned this in D126591: [InstCombine] Optimise shift+and+boolean conversion pattern to simple comparison. May 30 2022, 8:18 AM

spatel added inline comments. May 30 2022, 12:55 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1935	This pattern with 2 shifts in the same direction should not exist after: a0c3c60728ee5bc7
llvm/test/Transforms/InstCombine/and.ll
1954–1955	We had not reduced shifts as much as possible in this test and several others:

rebase code
remove lshr+lshr pattern
add a new transform to make shift+and have higher priority

bcl5980 edited the summary of this revision. May 30 2022, 10:02 PM

update comments

bcl5980 marked 2 inline comments as done. May 30 2022, 10:11 PM

Harbormaster completed remote builds in B166986: Diff 433014. May 30 2022, 10:44 PM

I still think we should split this patch up as 2 independent transforms.

The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
https://alive2.llvm.org/ce/z/SNmj5M

> In D126617#3547120, @spatel wrote:
>
> I still think we should split this patch up as 2 independent transforms.
>
> The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
> https://alive2.llvm.org/ce/z/SNmj5M

Thanks for pointing this out. Is this the transform you want?
https://alive2.llvm.org/ce/z/-C8L9U
If so, I will send a new patch for it.

bcl5980 updated this revision to Diff 433088. May 31 2022, 8:15 AM

bcl5980 edited the summary of this revision.

Harbormaster completed remote builds in B167046: Diff 433088. May 31 2022, 9:07 AM

> In D126617#3547436, @bcl5980 wrote:
>
> Thanks for pointing this out. Is this the transform you want?
> https://alive2.llvm.org/ce/z/-C8L9U
> If so, I will send a new patch for it.

Yes, the first pre-condition looks correct. We don't actually care what the final instruction in the sequence is; it just has to remove demand of the high bits. The last instruction could be a trunc, for example, so we should have tests with that too:
https://alive2.llvm.org/ce/z/ZCgqj5

We already look for that pattern in InstCombine's demanded bits. So I think we just need to add a transform like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 278db05f65d1..c0d92fc27bb6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -630,6 +630,18 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
             ComputeNumSignBits(I->getOperand(0), Depth + 1, CxtI);
         if (SignBits >= NumHiDemandedBits)
           return I->getOperand(0);
+
+        // If we can pre-shift a left-shifted constant to the right without
+        // losing any low bits (we already know we don't demand the high bits):
+        // (C << X) >> SA --> (C >> SA) << X
+        Value *X;
+        const APInt *C;
+        if (match(I->getOperand(0), m_Shl(m_APInt(C), m_Value(X))) &&
+            C->countTrailingZeros() >= ShiftAmt) {
+          Constant *ShiftC = ConstantInt::get(I->getType(), C->lshr(ShiftAmt));
+          Instruction *Shl = BinaryOperator::CreateShl(ShiftC, X);
+          return InsertNewInstWith(Shl, *I);
+        }
       }
 
       // Unsigned shift right.
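The identity behind this suggested demanded-bits transform can be checked exhaustively on small integers. The C++ sketch below (hypothetical helper names, 8-bit values, not LLVM code) compares `(C << X) >> SA` with `(C >> SA) << X` only on the low `8 - SA` bits, modeling the diff's stated assumption that the high bits of the `lshr` are not demanded, and only for constants with `cttz(C) >= SA`.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros of an 8-bit value (returns 8 for zero).
static unsigned cttz8(uint8_t v) {
  unsigned n = 0;
  while (n < 8 && !(v & (1u << n)))
    ++n;
  return n;
}

// Returns true if (C << X) >> SA and (C >> SA) << X agree on every demanded
// bit for all X < 8, assuming only the low (8 - SA) bits are demanded.
static bool reassociationHolds(uint8_t c, unsigned sa) {
  uint8_t demanded = static_cast<uint8_t>(0xFFu >> sa);
  for (unsigned x = 0; x < 8; ++x) {
    uint8_t before = static_cast<uint8_t>(static_cast<uint8_t>(c << x) >> sa);
    uint8_t after = static_cast<uint8_t>(static_cast<uint8_t>(c >> sa) << x);
    if ((before & demanded) != (after & demanded))
      return false;
  }
  return true;
}

// Check every constant the transform would accept, i.e. cttz(C) >= SA.
static bool reassociationHoldsForAll() {
  for (unsigned sa = 0; sa < 8; ++sa)
    for (unsigned c = 0; c < 256; ++c)
      if (cttz8(static_cast<uint8_t>(c)) >= sa &&
          !reassociationHolds(static_cast<uint8_t>(c), sa))
        return false;
  return true;
}
```

The `cttz(C) >= SA` guard is what makes pre-shifting lossless: for C = 1 and SA = 1, `(1 << 1) >> 1` is 1 while `(1 >> 1) << 1` is 0, and bit 0 is demanded.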

spatel added inline comments. May 31 2022, 11:54 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1911–1913	What happens if we reduce the pattern to: https://alive2.llvm.org/ce/z/7snGRd That's the same transform that I suggested in D126591, but invert the shift direction (lshr instead of shl).

> In D126617#3547933, @spatel wrote:
>
> Yes, the first pre-condition looks correct. We don't actually care what the final instruction in the sequence is; it just has to remove demand of the high bits. The last instruction could be a trunc, for example, so we should have tests with that too:
> https://alive2.llvm.org/ce/z/ZCgqj5
>
> We already look for that pattern in InstCombine's demanded bits. So I think we just need to add a transform like this:
> [...]

Wow, that's cool. A very general solution.

bcl5980 updated this revision to Diff 433278. May 31 2022, 9:19 PM

Harbormaster completed remote builds in B167189: Diff 433278. May 31 2022, 9:19 PM

bcl5980 added inline comments. May 31 2022, 9:31 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1911–1913	The latest version is based on your suggestion: https://alive2.llvm.org/ce/z/7snGRd https://alive2.llvm.org/ce/z/jA_tNb I'm still unsure whether we should transform shift+and to cmp+select. Generally, most high-end CPUs should prefer shift+and, because they have fewer execution ports for cmp than for shift/and. On the other hand, cmp can take the constant as an immediate operand, while the shift may need an extra mov instruction on some mainstream backends. One other question concerns whether this transform can fix the case @shl_lshr_pow2_const. Can you help me decide which approach to take?

spatel added inline comments. Jun 2 2022, 8:09 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1911–1913	Codegen for any particular target is not the main concern here. The backend should be able to invert the transforms that we make here if that is profitable. We partially demonstrated that with the assembly examples in the other patch. You can try similar experiments for these patterns. I looked at `@shl_lshr_pow2_const` for a while, and I don't see a very good generalization. We can add the larger pattern match for `and(shift(shift))`, or we can treat that as a special-case of demanding one bit only. If we view it as another demanded bits problem, then we could improve something like this: https://alive2.llvm.org/ce/z/3oDagP (but I don't have any evidence of that being important)
llvm/test/Transforms/InstCombine/and.ll
1776–1777	This diff does not exist with the current test on "main", right? Is this review baselined against another patch?

Sorry for the late response. Based on the discussion I have some questions:

The demanded bits fix: I think it is a safe and independent change. Should I check in this part first? Or, @spatel, can you commit it, since you are its author?

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 278db05f65d1..c0d92fc27bb6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -630,6 +630,18 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
             ComputeNumSignBits(I->getOperand(0), Depth + 1, CxtI);
         if (SignBits >= NumHiDemandedBits)
           return I->getOperand(0);
+
+        // If we can pre-shift a left-shifted constant to the right without
+        // losing any low bits (we already know we don't demand the high bits):
+        // (C << X) >> SA --> (C >> SA) << X
+        Value *X;
+        const APInt *C;
+        if (match(I->getOperand(0), m_Shl(m_APInt(C), m_Value(X))) &&
+            C->countTrailingZeros() >= ShiftAmt) {
+          Constant *ShiftC = ConstantInt::get(I->getType(), C->lshr(ShiftAmt));
+          Instruction *Shl = BinaryOperator::CreateShl(ShiftC, X);
+          return InsertNewInstWith(Shl, *I);
+        }
       }

Should we transform 'shift+and' to 'icmp+select'? I don't think it is easy for the backend to invert this transform: it can only do so if it knows that the input value is less than BitWidth. So I prefer to do this only when it saves instructions.

How should we fix @shl_lshr_pow2_const? I think the current review adds the larger pattern match for and(shift(shift)). Sorry, I don't know how to fix it based on demanded bits. I would be grateful if you could explain the solution in more detail.

Do we still need D126591 after we fix @shl_lshr_pow2_const? There are still some patterns we can't cover, for example:

 iff (C1 is pow2) & (((C2 & ~(C1-1)) + C1) is pow2) & (C1 < C2):
((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/JQYFnn
((C1 << X) & C2) != 0 -> X  < (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/BnyEmk
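To make this remaining pattern concrete, here is a C++ brute-force sketch of the `== 0` form on 8-bit values (hypothetical helper names; not the D126591 code). As an assumption, it evaluates the precondition and Log2(C2+C1) in 32-bit unsigned arithmetic rather than in i8, where the sum could wrap and would then simply fail the power-of-2 check. It only tests X < 8; the `!= 0` form is the negation, so it is covered as well.

```cpp
#include <cassert>
#include <cstdint>

static bool isPow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// floor(log2(v)) for v > 0.
static unsigned floorLog2(uint32_t v) {
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// (C1 is pow2) & (((C2 & ~(C1-1)) + C1) is pow2) & (C1 < C2)
static bool precondition(uint32_t c1, uint32_t c2) {
  return isPow2(c1) && isPow2((c2 & ~(c1 - 1)) + c1) && c1 < c2;
}

// Checks ((C1 << X) & C2) == 0  <=>  X >= Log2(C2+C1) - Log2(C1) for X < 8.
static bool icmpFoldHolds(uint32_t c1, uint32_t c2) {
  unsigned threshold = floorLog2(c2 + c1) - floorLog2(c1);
  for (unsigned x = 0; x < 8; ++x) {
    uint8_t shl = static_cast<uint8_t>(c1 << x); // i8 shl truncates
    bool isZero = (shl & c2) == 0;
    if (isZero != (x >= threshold))
      return false;
  }
  return true;
}

// Every pow2 C1 and every nonzero 8-bit C2 that satisfies the precondition.
static bool icmpFoldHoldsForAll() {
  for (uint32_t i = 0; i < 8; ++i)
    for (uint32_t c2 = 1; c2 < 256; ++c2) {
      uint32_t c1 = 1u << i;
      if (precondition(c1, c2) && !icmpFoldHolds(c1, c2))
        return false;
    }
  return true;
}
```

For example, C1 = 2 and C2 = 14 give a threshold of Log2(16) - Log2(2) = 3. With C2 = 10 the precondition fails ((10 & ~1) + 2 = 12 is not a power of 2), and so does the fold: at X = 1, (4 & 10) == 0 even though 1 < 2.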

llvm/test/Transforms/InstCombine/and.ll
1776–1777	Sorry, this is based on Diff 3. I will rebase the review on main.

update based on main branch and revert the code back to diff5

Harbormaster completed remote builds in B167981: Diff 434380. Jun 5 2022, 9:19 PM

spatel mentioned this in D127122: [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits. Jun 6 2022, 8:49 AM

> In D126617#3559375, @bcl5980 wrote:
>
> the demanded bits fix. I think it is a safe and independent change.

OK, let's try to make improvements in small steps; one part for demanded bits is here:
D127122

bcl5980 edited the summary of this revision. Jun 6 2022, 10:13 PM

bcl5980 edited the summary of this revision. Jun 6 2022, 10:44 PM

spatel mentioned this in rG82040d414b3c: [InstCombine] reduce right-shift-of-left-shifted constant via demanded bits. Jun 7 2022, 10:30 AM

spatel added inline comments. Jun 7 2022, 1:36 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1914	Use `m_Power2(C1)`?
1918	Do we really need both conditions? I removed one assumption, and it still shows as correct: https://alive2.llvm.org/ce/z/nUAXL9

bcl5980 added inline comments. Jun 7 2022, 6:40 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1918	Yeah, we can remove the condition cttz(C1) < cttz(C)+C2. Actually, if cttz(C1) >= cttz(C)+C2, it will fall into D127122. So the question remains which pattern we should use by default, 'shift+and' or 'icmp+select'. The 'shift+and' pattern preserves the information that x is less than the bit width. 'icmp+select' can help handle the shift+and+xor case, and icmp can handle lshr and shl at the same time. For now I keep 'shift+and' whenever possible, but if we prefer icmp+select I can remove the condition.

bcl5980 added inline comments. Jun 7 2022, 6:51 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1918	Yeah, we can remove the condition cttz(C1) < cttz(C)+C2. Actually, if cttz(C1) >= cttz(C)+C2, it will fall into D127122. Sorry, that comment was wrong: if cttz(C1) >= cttz(C)+C2, the result is always 0. The previous comment applies when C2 <= cttz(C1) < cttz(C)+C2. I will remove cttz(C1) < cttz(C)+C2 to make the code simpler.

Use m_Power2 to match C1
Remove condition Log2(C1) < Log2(C3)+C2
Add one more test case when C2 < Log2(C1) < Log2(C3)+C2

Harbormaster completed remote builds in B168465: Diff 435028. Jun 7 2022, 8:09 PM

> In D126617#3565340, @bcl5980 wrote:
>
> Use m_Power2 to match C1
>
> Remove condition Log2(C1) < Log2(C3)+C2

Update the Alive2 proof in the patch description so it matches the new code.
Will you add the pattern where the shift order is reversed in another patch? ( https://alive2.llvm.org/ce/z/fNdbfZ )
You can put a TODO comment with the code in this patch, so we know it should be added for symmetry.
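The reversed-shift pattern recorded as a TODO in the final code, `((C1 >> X) << C2) & C3 -> X == (cttz(C1)+C2-cttz(C3)) ? C3 : 0` iff C1 and C3 are powers of 2 and Log2(C3) >= C2, can be brute-forced on 16-bit values. This C++ sketch uses hypothetical helper names and is only a check of the formula, not an LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a nonzero power of two.
static unsigned log2Pow2(uint16_t v) {
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// ((C1 >> X) << C2) & C3 on i16, for X < 16.
static uint16_t originalRev(uint16_t c1, unsigned x, unsigned c2, uint16_t c3) {
  uint16_t shl = static_cast<uint16_t>((c1 >> x) << c2); // truncate like i16
  return static_cast<uint16_t>(shl & c3);
}

// X == (Log2(C1) + C2 - Log2(C3)) ? C3 : 0
static uint16_t foldedRev(uint16_t c1, unsigned x, unsigned c2, uint16_t c3) {
  int cmpC = static_cast<int>(log2Pow2(c1) + c2) - static_cast<int>(log2Pow2(c3));
  return static_cast<int>(x) == cmpC ? c3 : 0;
}

// All pow2 C1/C3, all C2 <= Log2(C3), all X < 16.
static bool symmetricFoldHolds() {
  for (unsigned i = 0; i < 16; ++i)
    for (unsigned j = 0; j < 16; ++j)
      for (unsigned c2 = 0; c2 <= j; ++c2) {
        uint16_t c1 = static_cast<uint16_t>(1u << i);
        uint16_t c3 = static_cast<uint16_t>(1u << j);
        for (unsigned x = 0; x < 16; ++x)
          if (originalRev(c1, x, c2, c3) != foldedRev(c1, x, c2, c3))
            return false;
      }
  return true;
}
```

The Log2(C3) >= C2 condition is needed: with C1 = 1, C2 = 2, C3 = 2 and X = 1, `(1 >> 1) << 2` is 0, but the select would return 2.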

llvm/test/Transforms/InstCombine/and.ll
1777	Do we have a negative test with "cttz(ShlC) > LShrC"? If not, please add that. If the test is already here, then add a comment like that, so we know the purpose of the test.
1792	This test is already optimized with D127122 ? It's fine to add another test, but please pre-commit before this patch, so we know how this patch alone is changing the tests.

bcl5980 edited the summary of this revision. Jun 8 2022, 7:26 PM

bcl5980 mentioned this in rG226c564329e2: [InstCombine] Add vector tests for shl+lshr+and transforms; NFC. Jun 8 2022, 8:14 PM

rebase with new tests.

add TODO for symmetrical case

Harbormaster completed remote builds in B168738: Diff 435416. Jun 8 2022, 9:22 PM

try to support non-uniform case.

Harbormaster completed remote builds in B168743: Diff 435422. Jun 8 2022, 10:36 PM

LGTM

If I'm seeing it correctly, this will alter D126591 or possibly make it unnecessary. I recommend implementing the symmetric TODO pattern for this patch as the next patch, and then we can see what remains.

This revision is now accepted and ready to land. Jun 9 2022, 9:02 AM

This revision was landed with ongoing or failed builds. Jun 9 2022, 6:58 PM

Closed by commit rGde7a6ae1ffc3: [InstCombine] Optimize shl+lshr+and conversion pattern (authored by bcl5980).

This revision was automatically updated to reflect the committed changes.

bcl5980 added a commit: rGde7a6ae1ffc3: [InstCombine] Optimize shl+lshr+and conversion pattern.

spatel mentioned this in D127801: [InstCombine] convert mask and shift of power-of-2 to cmp+select. Jun 14 2022, 2:45 PM

spatel mentioned this in rGbfde8619355a: [InstCombine] convert mask and shift of power-of-2 to cmp+select. Jun 17 2022, 7:52 AM

Revision Contents

- llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (24 lines)
- llvm/test/Transforms/InstCombine/and.ll (28 lines)
- llvm/test/Transforms/InstCombine/icmp-and-shift.ll (26 lines)

Diff 435762

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

(1,898 lines not shown)
           C->isMask(X->getType()->getScalarSizeInBits())) {
         Y = BO->getOperand(0);
         Value *TrY = Builder.CreateTrunc(Y, X->getType(), Y->getName() + ".tr");
         Value *NewBO =
             Builder.CreateBinOp(BOpcode, TrY, X, BO->getName() + ".narrow");
         return new ZExtInst(NewBO, Ty);
       }
     }
 
+    Constant *C1, *C2;
+    const APInt *C3 = C;
+    Value *X;
+    if (C3->isPowerOf2() &&
+        match(Op0, m_OneUse(m_LShr(m_Shl(m_ImmConstant(C1), m_Value(X)),
+                                   m_ImmConstant(C2)))) &&
+        match(C1, m_Power2())) {
+      Constant *Log2C1 = ConstantExpr::getExactLogBase2(C1);
+      Constant *Log2C3 = ConstantInt::get(Ty, C3->countTrailingZeros());
+      Constant *LshrC = ConstantExpr::getAdd(C2, Log2C3);
+      KnownBits KnownLShrc = computeKnownBits(LshrC, 0, nullptr);
+      if (KnownLShrc.getMaxValue().ult(Width)) {
+        // iff C1,C3 is pow2 and C2 + cttz(C3) < BitWidth:
+        // ((C1 << X) >> C2) & C3 -> X == (cttz(C3)+C2-cttz(C1)) ? C3 : 0
+        Constant *CmpC = ConstantExpr::getSub(LshrC, Log2C1);
+        Value *Cmp = Builder.CreateICmpEQ(X, CmpC);
+        return SelectInst::Create(Cmp, ConstantInt::get(Ty, *C3),
+                                  ConstantInt::getNullValue(Ty));
+      }
+      // TODO: Symmetrical case
+      // iff C1,C3 is pow2 and Log2(C3) >= C2:
+      // ((C1 >> X) << C2) & C3 -> X == (cttz(C1)+C2-cttz(C3)) ? C3 : 0
+    }
   }
 
   if (match(&I, m_And(m_OneUse(m_Shl(m_ZExt(m_Value(X)), m_Value(Y))),
                       m_SignMask())) &&
       match(Y, m_SpecificInt_ICMP(
                    ICmpInst::Predicate::ICMP_EQ,
                    APInt(Ty->getScalarSizeInBits(),
                          Ty->getScalarSizeInBits() -
                              X->getType()->getScalarSizeInBits())))) {
     auto *SExt = Builder.CreateSExt(X, Ty, X->getName() + ".signext");
     auto *SanitizedSignMask = cast<Constant>(Op1);
     // We must be careful with the undef elements of the sign bit mask, however:
     // the mask elt can be undef iff the shift amount for that lane was undef,
(1,821 lines not shown)

llvm/test/Transforms/InstCombine/and.ll

(1,767 lines not shown)
 ; CHECK-NEXT:    [[R:%.*]] = and i8 [[NOT]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i8 [[R]]
 ;
   %sign = lshr i8 %x, 7
   %not = xor i8 %sign, -1
   %r = and i8 %not, %y
   ret i8 %r
 }

 ; CTTZ(ShlC) < LShrC
 define i16 @shl_lshr_pow2_const_case1(i16 %x) {
 ; CHECK-LABEL: @shl_lshr_pow2_const_case1(
-; CHECK-NEXT:    [[SHL:%.*]] = shl i16 4, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr i16 [[SHL]], 6
-; CHECK-NEXT:    [[R:%.*]] = and i16 [[LSHR]], 8
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i16 [[X:%.*]], 7
+; CHECK-NEXT:    [[R:%.*]] = select i1 [[TMP1]], i16 8, i16 0
 ; CHECK-NEXT:    ret i16 [[R]]
 ;
   %shl = shl i16 4, %x
   %lshr = lshr i16 %shl, 6
   %r = and i16 %lshr, 8
   ret i16 %r
 }

 define <3 x i16> @shl_lshr_pow2_const_case1_uniform_vec(<3 x i16> %x) {
 ; CHECK-LABEL: @shl_lshr_pow2_const_case1_uniform_vec(
-; CHECK-NEXT:    [[SHL:%.*]] = shl <3 x i16> <i16 4, i16 4, i16 4>, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr <3 x i16> [[SHL]], <i16 6, i16 6, i16 6>
-; CHECK-NEXT:    [[R:%.*]] = and <3 x i16> [[LSHR]], <i16 8, i16 8, i16 8>
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq <3 x i16> [[X:%.*]], <i16 7, i16 7, i16 7>
+; CHECK-NEXT:    [[R:%.*]] = select <3 x i1> [[TMP1]], <3 x i16> <i16 8, i16 8, i16 8>, <3 x i16> zeroinitializer
 ; CHECK-NEXT:    ret <3 x i16> [[R]]
 ;
   %shl = shl <3 x i16> <i16 4, i16 4, i16 4>, %x
   %lshr = lshr <3 x i16> %shl, <i16 6, i16 6, i16 6>
   %r = and <3 x i16> %lshr, <i16 8, i16 8, i16 8>
   ret <3 x i16> %r
 }

 define <3 x i16> @shl_lshr_pow2_const_case1_non_uniform_vec(<3 x i16> %x) {
 ; CHECK-LABEL: @shl_lshr_pow2_const_case1_non_uniform_vec(
-; CHECK-NEXT:    [[SHL:%.*]] = shl <3 x i16> <i16 16, i16 8, i16 4>, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr <3 x i16> [[SHL]], <i16 5, i16 4, i16 3>
-; CHECK-NEXT:    [[R:%.*]] = and <3 x i16> [[LSHR]], <i16 8, i16 16, i16 4>
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq <3 x i16> [[X:%.*]], <i16 7, i16 6, i16 1>
+; CHECK-NEXT:    [[R:%.*]] = select <3 x i1> [[TMP1]], <3 x i16> <i16 8, i16 8, i16 8>, <3 x i16> zeroinitializer
 ; CHECK-NEXT:    ret <3 x i16> [[R]]
 ;
-  %shl = shl <3 x i16> <i16 16, i16 8, i16 4>, %x
-  %lshr = lshr <3 x i16> %shl, <i16 5, i16 4, i16 3>
-  %r = and <3 x i16> %lshr, <i16 8, i16 16, i16 4>
+  %shl = shl <3 x i16> <i16 2, i16 8, i16 32>, %x
+  %lshr = lshr <3 x i16> %shl, <i16 5, i16 6, i16 3>
+  %r = and <3 x i16> %lshr, <i16 8, i16 8, i16 8>
   ret <3 x i16> %r
 }

 define <3 x i16> @shl_lshr_pow2_const_case1_undef1_vec(<3 x i16> %x) {
 ; CHECK-LABEL: @shl_lshr_pow2_const_case1_undef1_vec(
-; CHECK-NEXT:    [[SHL:%.*]] = shl <3 x i16> <i16 undef, i16 16, i16 16>, [[X:%.*]]
-; CHECK-NEXT:    [[LSHR:%.*]] = lshr <3 x i16> [[SHL]], <i16 5, i16 5, i16 5>
-; CHECK-NEXT:    [[R:%.*]] = and <3 x i16> [[LSHR]], <i16 8, i16 8, i16 8>
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq <3 x i16> [[X:%.*]], <i16 8, i16 4, i16 4>
+; CHECK-NEXT:    [[R:%.*]] = select <3 x i1> [[TMP1]], <3 x i16> <i16 8, i16 8, i16 8>, <3 x i16> zeroinitializer
 ; CHECK-NEXT:    ret <3 x i16> [[R]]
 ;
   %shl = shl <3 x i16> <i16 undef, i16 16, i16 16>, %x
   %lshr = lshr <3 x i16> %shl, <i16 5, i16 5, i16 5>
   %r = and <3 x i16> %lshr, <i16 8, i16 8, i16 8>
   ret <3 x i16> %r
 }

(32 lines not shown)
 ; CHECK-NEXT:    ret i16 [[R]]
 ;
   %shl = shl i16 16, %x
   %lshr = lshr i16 %shl, 3
   %r = and i16 %lshr, 8
   ret i16 %r
 }

+; TODO: this pattern can be transform to icmp+select
 define i16 @shl_lshr_pow2_not_const_case2(i16 %x) {
 ; CHECK-LABEL: @shl_lshr_pow2_not_const_case2(
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl i16 2, [[X:%.*]]
 ; CHECK-NEXT:    [[AND:%.*]] = and i16 [[TMP1]], 8
 ; CHECK-NEXT:    [[R:%.*]] = xor i16 [[AND]], 8
 ; CHECK-NEXT:    ret i16 [[R]]
 ;
   %shl = shl i16 16, %x
(69 lines not shown)
 }

 define i16 @lshr_lshr_pow2_const(i16 %x) {
 ; CHECK-LABEL: @lshr_lshr_pow2_const(
 ; CHECK-NEXT:    [[LSHR2:%.*]] = lshr i16 32, [[X:%.*]]
 ; CHECK-NEXT:    [[R:%.*]] = and i16 [[LSHR2]], 4
 ; CHECK-NEXT:    ret i16 [[R]]
 ;
   %lshr1 = lshr i16 2048, %x
   %lshr2 = lshr i16 %lshr1, 6
   %r = and i16 %lshr2, 4
   ret i16 %r
 }

 define i16 @lshr_lshr_pow2_const_negative_oneuse(i16 %x) {
 ; CHECK-LABEL: @lshr_lshr_pow2_const_negative_oneuse(
 ; CHECK-NEXT:    [[LSHR2:%.*]] = lshr i16 32, [[X:%.*]]
 ; CHECK-NEXT:    call void @use16(i16 [[LSHR2]])
(43 lines not shown)
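The compare constants expected by these tests follow directly from the summary formula, evaluated lane by lane for vectors (the patch builds them with `ConstantExpr::getExactLogBase2`, `getAdd`, and `getSub`). The following C++ sketch, with a hypothetical `cmpConstants` helper, reproduces that per-lane arithmetic for pow2 lanes only; it says nothing about how undef lanes are handled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count trailing zeros of a 16-bit value (returns 16 for zero).
static unsigned cttz16(uint16_t v) {
  unsigned n = 0;
  while (n < 16 && !(v & (1u << n)))
    ++n;
  return n;
}

// Hypothetical helper: per-lane compare constant cttz(C3) + C2 - cttz(C1),
// assuming every lane of C1 and C3 is a power of 2.
template <std::size_t N>
std::array<int, N> cmpConstants(const std::array<uint16_t, N> &c1,
                                const std::array<unsigned, N> &c2,
                                const std::array<uint16_t, N> &c3) {
  std::array<int, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = static_cast<int>(cttz16(c3[i]) + c2[i]) -
             static_cast<int>(cttz16(c1[i]));
  return out;
}
```

For `@shl_lshr_pow2_const_case1_non_uniform_vec` (C1 = <2, 8, 32>, C2 = <5, 6, 3>, C3 = <8, 8, 8>) this gives <7, 6, 1>, matching the new CHECK line; the two defined lanes of the undef test (C1 = 16, C2 = 5, C3 = 8) give 4.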

llvm/test/Transforms/InstCombine/icmp-and-shift.ll

(51 lines not shown)
   %and = and <2 x i32> %shl, <i32 16, i32 16>
   %cmp = icmp ne <2 x i32> %and, <i32 0, i32 0>
   %conv = zext <2 x i1> %cmp to <2 x i32>
   ret <2 x i32> %conv
 }

 define i32 @icmp_eq_and_pow2_shl_pow2(i32 %0) {
 ; CHECK-LABEL: @icmp_eq_and_pow2_shl_pow2(
-; CHECK-NEXT:    [[SHL:%.*]] = shl i32 2, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = lshr i32 [[SHL]], 4
-; CHECK-NEXT:    [[AND_LOBIT:%.*]] = and i32 [[AND]], 1
-; CHECK-NEXT:    [[TMP2:%.*]] = xor i32 [[AND_LOBIT]], 1
-; CHECK-NEXT:    ret i32 [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i32 [[TMP0:%.*]], 3
+; CHECK-NEXT:    [[TMP3:%.*]] = zext i1 [[TMP2]] to i32
+; CHECK-NEXT:    ret i32 [[TMP3]]
 ;
   %shl = shl i32 2, %0
   %and = and i32 %shl, 16
   %cmp = icmp eq i32 %and, 0
   %conv = zext i1 %cmp to i32
   ret i32 %conv
 }

 define <2 x i32> @icmp_eq_and_pow2_shl_pow2_vec(<2 x i32> %0) {
 ; CHECK-LABEL: @icmp_eq_and_pow2_shl_pow2_vec(
-; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i32> <i32 4, i32 4>, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = lshr <2 x i32> [[SHL]], <i32 4, i32 4>
-; CHECK-NEXT:    [[AND_LOBIT:%.*]] = and <2 x i32> [[AND]], <i32 1, i32 1>
-; CHECK-NEXT:    [[TMP2:%.*]] = xor <2 x i32> [[AND_LOBIT]], <i32 1, i32 1>
-; CHECK-NEXT:    ret <2 x i32> [[TMP2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne <2 x i32> [[TMP0:%.*]], <i32 2, i32 2>
+; CHECK-NEXT:    [[TMP3:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>
+; CHECK-NEXT:    ret <2 x i32> [[TMP3]]
 ;
   %shl = shl <2 x i32> <i32 4, i32 4>, %0
   %and = and <2 x i32> %shl, <i32 16, i32 16>
   %cmp = icmp eq <2 x i32> %and, <i32 0, i32 0>
   %conv = zext <2 x i1> %cmp to <2 x i32>
   ret <2 x i32> %conv
 }

 define i32 @icmp_ne_and_pow2_shl_pow2(i32 %0) {
 ; CHECK-LABEL: @icmp_ne_and_pow2_shl_pow2(
-; CHECK-NEXT:    [[SHL:%.*]] = shl i32 2, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = lshr i32 [[SHL]], 4
-; CHECK-NEXT:    [[AND_LOBIT:%.*]] = and i32 [[AND]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[TMP0:%.*]], 3
+; CHECK-NEXT:    [[AND_LOBIT:%.*]] = zext i1 [[TMP2]] to i32
 ; CHECK-NEXT:    ret i32 [[AND_LOBIT]]
 ;
   %shl = shl i32 2, %0
   %and = and i32 %shl, 16
   %cmp = icmp ne i32 %and, 0
   %conv = zext i1 %cmp to i32
   ret i32 %conv
 }

 define <2 x i32> @icmp_ne_and_pow2_shl_pow2_vec(<2 x i32> %0) {
 ; CHECK-LABEL: @icmp_ne_and_pow2_shl_pow2_vec(
-; CHECK-NEXT:    [[SHL:%.*]] = shl <2 x i32> <i32 4, i32 4>, [[TMP0:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = lshr <2 x i32> [[SHL]], <i32 4, i32 4>
-; CHECK-NEXT:    [[AND_LOBIT:%.*]] = and <2 x i32> [[AND]], <i32 1, i32 1>
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq <2 x i32> [[TMP0:%.*]], <i32 2, i32 2>
+; CHECK-NEXT:    [[AND_LOBIT:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>
 ; CHECK-NEXT:    ret <2 x i32> [[AND_LOBIT]]
 ;
   %shl = shl <2 x i32> <i32 4, i32 4>, %0
   %and = and <2 x i32> %shl, <i32 16, i32 16>
   %cmp = icmp ne <2 x i32> %and, <i32 0, i32 0>
   %conv = zext <2 x i1> %cmp to <2 x i32>
   ret <2 x i32> %conv
 }
(339 lines not shown)
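Why `@icmp_eq_and_pow2_shl_pow2` now folds to `icmp ne %0, 3`: the old CHECK lines show that InstCombine had already rewritten `(2 << x) & 16` into the `((2 << x) >> 4) & 1` shape, which this patch matches with C1 = 2, C2 = 4, C3 = 1, giving a compare constant of 0 + 4 - 1 = 3. The C++ sketch below (hypothetical function names, 32-bit values, X < 32) checks that the source pattern, the old CHECK sequence, and the new CHECK sequence all agree.

```cpp
#include <cassert>
#include <cstdint>

// Source pattern of the test: zext(icmp eq ((2 << x) & 16), 0).
static uint32_t sourceForm(uint32_t x) {
  return ((2u << x) & 16u) == 0 ? 1u : 0u;
}

// Sequence from the old CHECK lines: (((2 << x) >> 4) & 1) ^ 1.
static uint32_t oldCheckForm(uint32_t x) {
  return (((2u << x) >> 4) & 1u) ^ 1u;
}

// Sequence from the new CHECK lines: zext(icmp ne x, 3).
static uint32_t newCheckForm(uint32_t x) { return x != 3 ? 1u : 0u; }

// Compare all three forms for every in-range shift amount.
static bool allFormsAgree() {
  for (uint32_t x = 0; x < 32; ++x)
    if (sourceForm(x) != oldCheckForm(x) || sourceForm(x) != newCheckForm(x))
      return false;
  return true;
}
```

Only X = 3 moves the single set bit of `2 << x` onto bit 4, which is why the result is 0 exactly when X == 3.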