This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
ClosedPublic

Authored by lebedev.ri on Sep 17 2019, 1:11 PM.

Details

Summary

And this is finally the interesting part of that fold!

If we have a pattern (x & (~(-1 << maskNbits))) << shiftNbits,
we already have a fold that will drop the & (~(-1 << maskNbits))
mask iff (maskNbits+shiftNbits) u>= bitwidth(x).
But that is needlessly restrictive; there is a more general fold here:

In this pattern, (maskNbits+shiftNbits) is the number of low bits
of x that can survive into the final value.
So even if (maskNbits+shiftNbits) u< bitwidth(x), we can still
fold; we just need to apply a constant mask afterwards:

Name: a, normal+mask
  %onebit = shl i32 -1, C1
  %mask = xor i32 %onebit, -1
  %masked = and i32 %mask, %x
  %r = shl i32 %masked, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = add i32 C1, C2
  %n2 = zext i32 %n1 to i64
  %n3 = shl i64 -1, %n2
  %n4 = xor i64 %n3, -1
  %n5 = trunc i64 %n4 to i32
  %r = and i32 %n0, %n5

https://rise4fun.com/Alive/F5R
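
As a standalone sanity check of the arithmetic, here is one scalar instance of the fold in C++ (a sketch, not part of the patch; the constants are arbitrary):

  #include <cassert>
  #include <cstdint>

  int main() {
    // Pattern a: (x & ~(-1 << C1)) << C2, with C1 + C2 < 32,
    // so the result still needs a constant mask after the fold.
    const uint32_t x = 0xDEADBEEF;
    const uint32_t C1 = 20, C2 = 8;

    uint32_t before = (x & ~(uint32_t(-1) << C1)) << C2;

    // New form: shift first, compute the mask in a type twice as wide,
    // then truncate it back and apply it.
    uint64_t wideMask = ~(uint64_t(-1) << (C1 + C2));
    uint32_t after = (x << C2) & uint32_t(wideMask);

    assert(before == after); // both are 0x0DBEEF00
    return 0;
  }

For lanes where C1 + C2 is u>= 32, the wide mask truncates back to all-ones, i.e. such lanes need no extra masking at all.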

Naturally, the old %masked will have to be one-use.
A similar fold exists for patterns c, d, e; I will post that patch later.

https://bugs.llvm.org/show_bug.cgi?id=42563

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Sep 17 2019, 1:11 PM

Upload correct diff with no noise in the tests.

lebedev.ri edited the summary of this revision.

Rebased, NFC.

spatel added inline comments.Sep 19 2019, 8:42 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Is there a test showing that we need this ext+trunc complexity?

See if I've botched this Alive somehow, but the simpler constant mask appears to work:
https://rise4fun.com/Alive/ArQC

lebedev.ri added inline comments.Sep 19 2019, 9:19 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Hmm. The reason I went with ext/trunc is: https://rise4fun.com/Alive/o5l
In your example Alive does not complain because those are constants, so somehow the usual poison rules don't apply?
Are we sure this is the correct behavior and not an Alive limitation?

lebedev.ri marked an inline comment as done.Sep 19 2019, 9:20 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

I.e., I don't think I checked what happens if ConstantExpr::getShl() is called with such an out-of-bounds shift amount. Does it handle it correctly, or will UBSan complain, etc.?

lebedev.ri marked 3 inline comments as done.Sep 19 2019, 10:33 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Tried it. No, we can't do this; it defeats the whole point of losslessly handling the lanes that need no extra masking.

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 3f466495c5e..8db01b4d4bd 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -171,21 +171,10 @@ dropRedundantMaskingOfLeftShiftInput(BinaryOperator *OuterShift,
       // But for a mask we need to get rid of old masking instruction.
       if (!Masked->hasOneUse())
         return nullptr; // Else we can't perform the fold.
-      // We should produce compute the mask in wider type, and truncate later!
-      // Get type twice as wide element-wise (same number of elements!).
-      Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
-      Type *ExtendedTy =
-          Ty->isVectorTy()
-              ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
-              : ExtendedScalarTy;
-      auto *ExtendedSumOfShAmts =
-          ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
       // And compute the mask as usual: ~(-1 << (SumOfShAmts))
-      auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
-      auto *ExtendedInvertedMask =
-          ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
-      auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
-      NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
+      auto *AllOnes = ConstantExpr::getAllOnesValue(Ty);
+      auto *InvertedMask = ConstantExpr::getShl(AllOnes, SumOfShAmts);
+      NewMask = ConstantExpr::getNot(InvertedMask);
     } else
       NewMask = nullptr; // No mask needed.
     // All good, we can do this fold.
diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
index 5445275ad1c..235e152d2fe 100644
--- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
+++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
@@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T2]])
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T4]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
-; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
+; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
 ; CHECK-NEXT:    ret <8 x i32> [[T5]]
 ;
   %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
index 6165b579661..0a7c0e5d030 100644
--- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
+++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
@@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T2]])
 ; CHECK-NEXT:    call void @use8xi32(<8 x i32> [[T4]])
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
-; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
+; CHECK-NEXT:    [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
 ; CHECK-NEXT:    ret <8 x i32> [[T5]]
 ;
   %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
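
In other words, the lanes that change from -1 to undef in the test diff above are exactly those where the sum of shift amounts is u>= the bit width: in the original type the replacement mask for such a lane would need an over-wide shl, while in the doubled width it folds cleanly and truncates back to all-ones. A rough standalone C++ illustration of the mask computation (not part of the patch):

  #include <cstdint>
  #include <cstdio>

  int main() {
    // Compute the replacement mask the way the patch does: in a type
    // twice as wide as i32, then truncate back down.
    const uint32_t sums[] = {28, 31, 32, 33, 40}; // candidate SumOfShAmts values
    for (uint32_t sum : sums) {
      uint64_t wide = ~(uint64_t(-1) << sum); // shift amount is always < 64 here
      uint32_t mask = uint32_t(wide);         // trunc i64 -> i32
      // For sum >= 32 this prints 0xFFFFFFFF ("keep every bit"); computing
      // ~(-1 << sum) directly in 32 bits would need a shift by >= the bit
      // width, which is UB in C++ and poison in LLVM IR.
      printf("sum=%2u mask=0x%08X\n", (unsigned)sum, (unsigned)mask);
    }
    return 0;
  }
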
spatel accepted this revision.Sep 20 2019, 10:42 AM

LGTM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Ok, thanks for checking. Can we make the reasoning clearer in the code comment? Something like:

// The mask must be computed in a type twice as wide to ensure
// that no bits are lost if the sum-of-shifts is wider than the base type.
This revision is now accepted and ready to land.Sep 20 2019, 10:42 AM

LGTM

Thank you for the review.

I'm kinda worried about the single-use check there.
I don't know for a fact that it is "bad", but I suspect it may be.
But that is something for later.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
176–180 ↗(On Diff #220712)

Yes, I will improve the comment here.

@spatel will you want to review the sibling patch D67725, or should I "self-review" it, since it's essentially identical to this one?

@spatel will you want to review the sibling patch D67725, or should I "self-review" it, since it's essentially identical to this one?

Sorry for the delay - I'll take a look today.

This revision was automatically updated to reflect the committed changes.