This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Optimize shl+lshr+and conversion pattern
ClosedPublic

Authored by bcl5980 on May 29 2022, 9:46 AM.

Details

Summary

if C1 and C3 are pow2 and Log2(C3)+C2 < BitWidth:

((C1 << X) >> C2) & C3 -> X == (Log2(C3)+C2-Log2(C1)) ? C3 : 0;

https://alive2.llvm.org/ce/z/Pus5bd

Fix issue https://github.com/llvm/llvm-project/issues/55739

Diff Detail

Event Timeline

bcl5980 created this revision.May 29 2022, 9:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2022, 9:46 AM
bcl5980 requested review of this revision.May 29 2022, 9:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2022, 9:46 AM
bcl5980 edited the summary of this revision. (Show Details)May 29 2022, 9:47 AM
bcl5980 updated this revision to Diff 432798.May 29 2022, 10:13 AM

rebase with more tests diff

spatel edited the summary of this revision. (Show Details)May 30 2022, 4:37 AM
spatel added inline comments.May 30 2022, 12:55 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1916

This pattern with 2 shifts in the same direction should not exist after:
a0c3c60728ee5bc7

llvm/test/Transforms/InstCombine/and.ll
1731–1732

We had not reduced shifts as much as possible in this test and several others:

bcl5980 updated this revision to Diff 433013.May 30 2022, 10:02 PM
bcl5980 retitled this revision from [InstCombine] Optimize shift+lshr+and conversion pattern to simple comparison. to [InstCombine] Optimize shl+lshr+and conversion pattern.
bcl5980 edited the summary of this revision. (Show Details)
  1. rebase code
  2. remove lshr+lshr pattern
  3. add a new transform to make shift+and have higher priority
bcl5980 edited the summary of this revision. (Show Details)May 30 2022, 10:02 PM
bcl5980 updated this revision to Diff 433014.May 30 2022, 10:08 PM
bcl5980 edited the summary of this revision. (Show Details)

update comments

bcl5980 marked 2 inline comments as done.May 30 2022, 10:11 PM

I still think we should split this patch up as 2 independent transforms.

The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
https://alive2.llvm.org/ce/z/SNmj5M

bcl5980 added a comment.EditedMay 31 2022, 8:06 AM

I still think we should split this patch up as 2 independent transforms.

The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
https://alive2.llvm.org/ce/z/SNmj5M

Thanks for the mention. Is this transform you want ?
https://alive2.llvm.org/ce/z/-C8L9U
If yes, I will send a new patch to do this.

bcl5980 updated this revision to Diff 433088.May 31 2022, 8:15 AM
bcl5980 edited the summary of this revision. (Show Details)

I still think we should split this patch up as 2 independent transforms.

The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
https://alive2.llvm.org/ce/z/SNmj5M

Thanks for the mention. Is this transform you want ?
https://alive2.llvm.org/ce/z/-C8L9U
If yes, I will send a new patch to do this.

Yes, the first pre-condition looks correct. We don't actually care what the final instruction in the sequence is - it just has to remove demand of the high bits. The last instruction could be a trunc for example, so we should have tests with that too:
https://alive2.llvm.org/ce/z/ZCgqj5

We already look for that pattern in InstCombine's demanded bits. So I think we just need to add a transform like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 278db05f65d1..c0d92fc27bb6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -630,6 +630,18 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
             ComputeNumSignBits(I->getOperand(0), Depth + 1, CxtI);
         if (SignBits >= NumHiDemandedBits)
           return I->getOperand(0);
+
+        // If we can pre-shift a left-shifted constant to the right without
+        // losing any low bits (we already know we don't demand the high bits):
+        // (C << X) >> SA --> (C >> SA) << X
+        Value *X;
+        const APInt *C;
+        if (match(I->getOperand(0), m_Shl(m_APInt(C), m_Value(X))) &&
+            C->countTrailingZeros() >= ShiftAmt) {
+          Constant *ShiftC = ConstantInt::get(I->getType(), C->lshr(ShiftAmt));
+          Instruction *Shl = BinaryOperator::CreateShl(ShiftC, X);
+          return InsertNewInstWith(Shl, *I);
+        }
       }
 
       // Unsigned shift right.
spatel added inline comments.May 31 2022, 11:54 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1892–1894

What happens if we reduce the pattern to:
https://alive2.llvm.org/ce/z/7snGRd

That's the same transform that I suggested in D126591, but invert the shift direction (lshr instead of shl).

I still think we should split this patch up as 2 independent transforms.

The opposite shifts transform doesn't seem like it should be a power-of-2-mask transform. Can we handle that using demanded bits instead? Double-check (you can pre-commit more tests as needed), but I don't think this patch will handle these related folds:
https://alive2.llvm.org/ce/z/SNmj5M

Thanks for the mention. Is this transform you want ?
https://alive2.llvm.org/ce/z/-C8L9U
If yes, I will send a new patch to do this.

Yes, the first pre-condition looks correct. We don't actually care what the final instruction in the sequence is - it just has to remove demand of the high bits. The last instruction could be a trunc for example, so we should have tests with that too:
https://alive2.llvm.org/ce/z/ZCgqj5

We already look for that pattern in InstCombine's demanded bits. So I think we just need to add a transform like this:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 278db05f65d1..c0d92fc27bb6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -630,6 +630,18 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
             ComputeNumSignBits(I->getOperand(0), Depth + 1, CxtI);
         if (SignBits >= NumHiDemandedBits)
           return I->getOperand(0);
+
+        // If we can pre-shift a left-shifted constant to the right without
+        // losing any low bits (we already know we don't demand the high bits):
+        // (C << X) >> SA --> (C >> SA) << X
+        Value *X;
+        const APInt *C;
+        if (match(I->getOperand(0), m_Shl(m_APInt(C), m_Value(X))) &&
+            C->countTrailingZeros() >= ShiftAmt) {
+          Constant *ShiftC = ConstantInt::get(I->getType(), C->lshr(ShiftAmt));
+          Instruction *Shl = BinaryOperator::CreateShl(ShiftC, X);
+          return InsertNewInstWith(Shl, *I);
+        }
       }
 
       // Unsigned shift right.

Wow, that's cool. A very general solution.

bcl5980 updated this revision to Diff 433278.May 31 2022, 9:19 PM
bcl5980 added inline comments.May 31 2022, 9:31 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1892–1894

The latest version is based on your suggetion:
https://alive2.llvm.org/ce/z/7snGRd
https://alive2.llvm.org/ce/z/jA_tNb

I'm still worry if we can transform shift+and to cmp+select.
Generally most highend cpu should prefer shift+and because the cmp instruction ports is less than shift/and.
But in cmp instruction the immediate value can be imm operation but shift may need extra mov instruction on some mainstream backend.
One other question is this transform can fix the case @shl_lshr_pow2_const.
Can you help to review which way should I do ?

spatel added inline comments.Jun 2 2022, 8:09 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1892–1894

Codegen for any particular target is not the main concern here. The backend should be able to invert the transforms that we make here if that is profitable.

We partially demonstrated that with the assembly examples in the other patch. You can try similar experiments for these patterns.

I looked at @shl_lshr_pow2_const for a while, and I don't see a very good generalization. We can add the larger pattern match for and(shift(shift)), or we can treat that as a special-case of demanding one bit only.

If we view it as another demanded bits problem, then we could improve something like this:
https://alive2.llvm.org/ce/z/3oDagP
(but I don't have any evidence of that being important)

llvm/test/Transforms/InstCombine/and.ll
1625

This diff does not exist with the current test on "main", right? Is this review baselined against another patch?

bcl5980 added a comment.EditedJun 5 2022, 8:02 PM

Sorry for the late response. Based on the discussion I have some questions:

  1. the demanded bits fix. I think it is a safe and independent change. Should I check in this part first ? Or @spatel can you help to do this as the author is you.
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 278db05f65d1..c0d92fc27bb6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -630,6 +630,18 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Value *V, APInt DemandedMask,
             ComputeNumSignBits(I->getOperand(0), Depth + 1, CxtI);
         if (SignBits >= NumHiDemandedBits)
           return I->getOperand(0);
+
+        // If we can pre-shift a left-shifted constant to the right without
+        // losing any low bits (we already know we don't demand the high bits):
+        // (C << X) >> SA --> (C >> SA) << X
+        Value *X;
+        const APInt *C;
+        if (match(I->getOperand(0), m_Shl(m_APInt(C), m_Value(X))) &&
+            C->countTrailingZeros() >= ShiftAmt) {
+          Constant *ShiftC = ConstantInt::get(I->getType(), C->lshr(ShiftAmt));
+          Instruction *Shl = BinaryOperator::CreateShl(ShiftC, X);
+          return InsertNewInstWith(Shl, *I);
+        }
       }
  1. Should we transform 'shift+and' to 'icmp+select'? I think backend is not easy to invert the transform. We need to make sure input value is less than BitWidth then can invert. So I prefer to do this only when we can save instructions.
  1. How to fix @shl_lshr_pow2_const? I think current review is add the larger pattern match for and(shift(shift)). I'm sorry I don't know how to fix based on demanded bits? It will be grateful if you can help to teach me the detail solution?
  1. Do we need D126591 after we fix @shl_lshr_pow2_const? There are still have some patterns we can't cover. For example the pattern:
 iff (C1 is pow2) & ((C2 & ~(C1-1)) + C1) is pow2) & (C1 < C2):
((C1 << X) & C2) == 0 -> X >= (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/JQYFnn
((C1 << X) & C2) != 0 -> X  < (Log2(C2+C1) - Log2(C1)); https://alive2.llvm.org/ce/z/BnyEmk
llvm/test/Transforms/InstCombine/and.ll
1625

I'm sorry this is based on Diff3. I will rebase the review based on main.

bcl5980 updated this revision to Diff 434380.EditedJun 5 2022, 8:38 PM

update based on main branch and revert the code back to diff5

spatel added a comment.Jun 6 2022, 8:59 AM
  1. the demanded bits fix. I think it is a safe and independent change.

Ok, let's try to make improvements in small steps; one part for demanded bits is here:
D127122

bcl5980 edited the summary of this revision. (Show Details)Jun 6 2022, 10:13 PM
bcl5980 edited the summary of this revision. (Show Details)Jun 6 2022, 10:44 PM
spatel added inline comments.Jun 7 2022, 1:36 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1895

Use m_Power2(C1) ?

1899

Do we really need both conditions? I removed one assumption, and it still shows as correct:
https://alive2.llvm.org/ce/z/nUAXL9

bcl5980 added inline comments.Jun 7 2022, 6:40 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1899

Yeah, we can remove the condition cttz(C1) < cttz(C)+C2.
Actually if cttz(C1) >= cttz(C)+C2, it will fall into D127122.
So still the question which pattern we should use by default, 'shift+and' or 'icmp+select'?
'shift+and' pattern can remain the information x is limited by bit width.
'icmp+select' can help to handle shift+and+xor case, and icmp can handle lshr, shl at the same time.
For now what I do is keep 'shift+and' ASAP but if we prefer icmp+select I can remove the condition.

bcl5980 added inline comments.Jun 7 2022, 6:51 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1899

Yeah, we can remove the condition cttz(C1) < cttz(C)+C2.
Actually if cttz(C1) >= cttz(C)+C2, it will fall into D127122.

Sorry for the wrong comments, actually if cttz(C1) >= cttz(C)+C2 it will always return 0.
Last comment works when C2 <= cttz(C1) < cttz(C)+C2.
I will remove cttz(C1) < cttz(C)+C2 to make the code easier.

bcl5980 updated this revision to Diff 435028.Jun 7 2022, 7:34 PM
bcl5980 edited the summary of this revision. (Show Details)
  1. Use m_Power2 to match C1
  2. Remove condition Log2(C1) < Log2(C3)+C2
  3. Add one more test case when C2 < Log2(C1) < Log2(C3)+C2
  1. Use m_Power2 to match C1
  2. Remove condition Log2(C1) < Log2(C3)+C2

Update the Alive2 proof in the patch description, so it matches the new code.
Will you add the pattern where the shift order is reversed in another patch? ( https://alive2.llvm.org/ce/z/fNdbfZ )
You can put a TODO comment with the code in this patch, so we know it is should be added for symmetry.

llvm/test/Transforms/InstCombine/and.ll
1625

Do we have a negative test with "cttz(ShlC) > LShrC"? If not, please add that. If the test is already here, then add a comment like that, so we know the purpose of the test.

1638

This test is already optimized with D127122 ? It's fine to add another test, but please pre-commit before this patch, so we know how this patch alone is changing the tests.

bcl5980 edited the summary of this revision. (Show Details)Jun 8 2022, 7:26 PM
bcl5980 updated this revision to Diff 435413.Jun 8 2022, 8:29 PM

rebase with new tests.

bcl5980 updated this revision to Diff 435416.Jun 8 2022, 8:47 PM

add to do for Symmetrical case

bcl5980 updated this revision to Diff 435422.Jun 8 2022, 10:07 PM

try support non-uniform case.

spatel accepted this revision.Jun 9 2022, 9:02 AM

LGTM

If I'm seeing it correctly, this will alter D126591 or possibly make it unnecessary. I recommend implementing the symmetric TODO pattern for this patch as the next patch, and then we can see what remains.

This revision is now accepted and ready to land.Jun 9 2022, 9:02 AM
This revision was landed with ongoing or failed builds.Jun 9 2022, 6:58 PM
This revision was automatically updated to reflect the committed changes.