Generalize the negated pow2 folds to support vectors. It looks like some other shl(add(x,c1),c2) -> add(shl(x,c2),c3) style folds are conflicting, but they only affect scalars / non-uniform vectors — I'll address this in a future patch.
Details
Diff Detail
Unit Tests
Event Timeline
Aha, good catch.
But this is not enough of a generalization:
diff diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp index 172b304b776..06a0fcbf0fa 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp @@ -233,29 +233,11 @@ Instruction *InstCombinerImpl::visitMul(BinaryOperator &I) { } } - if (ConstantInt *CI = dyn_cast<ConstantInt>(Op1)) { - // (Y - X) * (-(2**n)) -> (X - Y) * (2**n), for positive nonzero n - // (Y + const) * (-(2**n)) -> (-constY) * (2**n), for positive nonzero n - // The "* (2**n)" thus becomes a potential shifting opportunity. - { - const APInt & Val = CI->getValue(); - const APInt &PosVal = Val.abs(); - if (Val.isNegative() && PosVal.isPowerOf2()) { - Value *X = nullptr, *Y = nullptr; - if (Op0->hasOneUse()) { - ConstantInt *C1; - Value *Sub = nullptr; - if (match(Op0, m_Sub(m_Value(Y), m_Value(X)))) - Sub = Builder.CreateSub(X, Y, "suba"); - else if (match(Op0, m_Add(m_Value(Y), m_ConstantInt(C1)))) - Sub = Builder.CreateSub(Builder.CreateNeg(C1), Y, "subc"); - if (Sub) - return - BinaryOperator::CreateMul(Sub, - ConstantInt::get(Y->getType(), PosVal)); - } - } - } + if (match(Op1, m_NegatedPower2())) { + // Interpret X * (-1<<C) as (-X) * (1<<C) and try to sink the negation. + if (Value *NegOp0 = Negator::Negate(/*IsNegation*/ false, Op0, *this)) + return BinaryOperator::CreateMul( + NegOp0, ConstantExpr::getNeg(cast<Constant>(Op1)), I.getName()); } if (Instruction *FoldedMul = foldBinOpIntoSelectOrPhi(I))
We miss a similar fold for `sdiv exact %x, -1<<C --> -(ashr exact %x, C)`.
I can take over this patch if you prefer.
If you're offering - that'd be great, thank you! I randomly came across the fact that we match this in the DAG but not in IR.
Also, I'm rather disappointed by how often we have code that matches only m_ConstantInt where m_APInt (which would match uniform vectors with no undefs) or a full m_Constant could be used (possibly in conjunction with ConstantExpr) — migrating those sites feels like something we should be aiming for.
llvm/test/Transforms/InstCombine/mul.ll | ||
---|---|---|
976 | And another example that we need something like Negator-driven hoisting. |
llvm/test/Transforms/InstCombine/mul.ll | ||
---|---|---|
976 | Also, there's some very interesting fold pirouetting going on there. INSTCOMBINE ITERATION #1 on muladd2 IC: ADD: ret i32 %mul IC: ADD: %mul = mul i32 %add, -4 IC: ADD: %add = add i32 %a0, 16 IC: Visiting: %add = add i32 %a0, 16 IC: Visiting: %mul = mul i32 %add, -4 Negator: attempting to sink negation into %add = add i32 %a0, 16 Negator: successfully sunk negation into %add = add i32 %a0, 16 NEW: %add.neg = sub i32 -16, %a0 Negator: Propagating 1 instrs to InstCombine IC: ADD DEFERRED: %add.neg = sub i32 -16, %a0 IC: Old = %mul = mul i32 %add, -4 New = %mul = mul i32 %add.neg, 4 IC: ADD: %mul = mul i32 %add.neg, 4 IC: ERASE %1 = mul i32 %add, -4 IC: ADD DEFERRED: %add = add i32 %a0, 16 IC: ERASE %add = add i32 %a0, 16 IC: ADD: %add.neg = sub i32 -16, %a0 IC: Visiting: %add.neg = sub i32 -16, %a0 Negator: attempting to sink negation into i32 %a0 Negator: failed to sink negation into i32 %a0 IC: Visiting: %mul = mul i32 %add.neg, 4 IC: Old = %mul = mul i32 %add.neg, 4 New = <badref> = shl i32 %add.neg, 2 IC: ADD: %mul = shl i32 %add.neg, 2 IC: ERASE %1 = mul i32 %add.neg, 4 IC: ADD DEFERRED: %add.neg = sub i32 -16, %a0 IC: ADD: %add.neg = sub i32 -16, %a0 IC: Visiting: %add.neg = sub i32 -16, %a0 Negator: attempting to sink negation into i32 %a0 Negator: failed to sink negation into i32 %a0 IC: Visiting: %mul = shl i32 %add.neg, 2 IC: ADD DEFERRED: %add.neg = sub i32 1073741808, %a0 IC: Mod = %mul = shl i32 %add.neg, 2 New = %mul = shl i32 %add.neg, 2 IC: ADD: %mul = shl i32 %add.neg, 2 IC: ADD: %add.neg = sub i32 1073741808, %a0 IC: Visiting: %add.neg = sub i32 1073741808, %a0 Negator: attempting to sink negation into i32 %a0 Negator: failed to sink negation into i32 %a0 IC: Visiting: %mul = shl i32 %add.neg, 2 IC: ADD DEFERRED: %1 = shl i32 %a0, 2 IC: Old = %mul = shl i32 %1, 2 New = <badref> = sub i32 -64, %add.neg IC: ADD: %mul = sub i32 -64, %add.neg IC: ERASE %2 = shl i32 %1, 2 IC: ADD DEFERRED: %1 = sub i32 1073741808, %a0 IC: ERASE %1 = sub i32 
1073741808, %a0 IC: ADD: %add.neg = shl i32 %a0, 2 IC: Visiting: %add.neg = shl i32 %a0, 2 IC: Visiting: %mul = sub i32 -64, %add.neg Negator: attempting to sink negation into %add.neg = shl i32 %a0, 2 Negator: successfully sunk negation into %add.neg = shl i32 %a0, 2 NEW: %add.neg.neg = mul i32 %a0, -4 Negator: Propagating 1 instrs to InstCombine IC: ADD DEFERRED: %add.neg.neg = mul i32 %a0, -4 IC: Old = %mul = sub i32 -64, %add.neg New = <badref> = add i32 %add.neg.neg, -64 IC: ADD: %mul = add i32 %add.neg.neg, -64 IC: ERASE %1 = sub i32 -64, %add.neg IC: ADD DEFERRED: %add.neg = shl i32 %a0, 2 IC: ERASE %add.neg = shl i32 %a0, 2 IC: ADD: %add.neg.neg = mul i32 %a0, -4 IC: Visiting: %add.neg.neg = mul i32 %a0, -4 Negator: attempting to sink negation into i32 %a0 Negator: failed to sink negation into i32 %a0 IC: Visiting: %mul = add i32 %add.neg.neg, -64 IC: Visiting: ret i32 %mul INSTCOMBINE ITERATION #2 on muladd2 IC: ADD: ret i32 %mul IC: ADD: %mul = add i32 %add.neg.neg, -64 IC: ADD: %add.neg.neg = mul i32 %a0, -4 IC: Visiting: %add.neg.neg = mul i32 %a0, -4 Negator: attempting to sink negation into i32 %a0 Negator: failed to sink negation into i32 %a0 IC: Visiting: %mul = add i32 %add.neg.neg, -64 IC: Visiting: ret i32 %mul |
(Just a heads-up, in case someone else runs into the same problem...)
I've noticed some regressions downstream after this patch. Haven't debugged/reduced it completely yet, but at least in one case I suspect that the problem actually might be in load-store-vectorizer.
Used to get something like this before load-store-vectorizer:
vector.body: ; preds = %vector.body, %vector.ph %index = phi i16 [ 0, %vector.ph ], [ %index.next, %vector.body ], !dbg !17 %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %53, %vector.body ] %vec.phi72 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %57, %vector.body ] %3 = shl i16 %index, 1 %next.gep = getelementptr i16, i16* %add.ptr, i16 %3 %4 = or i16 %3, 2 %next.gep73 = getelementptr i16, i16* %add.ptr, i16 %4 %5 = or i16 %3, 4 %next.gep74 = getelementptr i16, i16* %add.ptr, i16 %5 %6 = or i16 %3, 6 %next.gep75 = getelementptr i16, i16* %add.ptr, i16 %6 %7 = mul i16 %index, -2 %next.gep76 = getelementptr i16, i16* %incdec.ptr.us, i16 %7 %8 = or i16 %index, 1 %9 = mul i16 %8, -2 %next.gep77 = getelementptr i16, i16* %incdec.ptr.us, i16 %9 %10 = or i16 %index, 2 %11 = mul i16 %10, -2 %next.gep78 = getelementptr i16, i16* %incdec.ptr.us, i16 %11 %12 = or i16 %index, 3 %13 = mul i16 %12, -2 %next.gep79 = getelementptr i16, i16* %incdec.ptr.us, i16 %13 %next.gep84 = getelementptr i16, i16* %h, i16 %3 %next.gep85 = getelementptr i16, i16* %h, i16 %4 %next.gep86 = getelementptr i16, i16* %h, i16 %5 %next.gep87 = getelementptr i16, i16* %h, i16 %6 %next.gep92 = getelementptr i16, i16* %x.addr.051.us, i16 %7 %next.gep93 = getelementptr i16, i16* %x.addr.051.us, i16 %9 %next.gep94 = getelementptr i16, i16* %x.addr.051.us, i16 %11 %next.gep95 = getelementptr i16, i16* %x.addr.051.us, i16 %13 %14 = load i16, i16* %next.gep, align 1, !dbg !11, !tbaa !18 %15 = load i16, i16* %next.gep73, align 1, !dbg !11, !tbaa !18 %16 = load i16, i16* %next.gep74, align 1, !dbg !11, !tbaa !18 %17 = load i16, i16* %next.gep75, align 1, !dbg !11, !tbaa !18 %18 = insertelement <4 x i16> undef, i16 %14, i32 0, !dbg !11 %19 = insertelement <4 x i16> %18, i16 %15, i32 1, !dbg !11 %20 = insertelement <4 x i16> %19, i16 %16, i32 2, !dbg !11 %21 = insertelement <4 x i16> %20, i16 %17, i32 3, !dbg !11 %22 = load i16, i16* %next.gep84, align 1, !dbg !11, !tbaa !18 %23 = load i16, 
i16* %next.gep85, align 1, !dbg !11, !tbaa !18 %24 = load i16, i16* %next.gep86, align 1, !dbg !11, !tbaa !18 %25 = load i16, i16* %next.gep87, align 1, !dbg !11, !tbaa !18 %26 = insertelement <4 x i16> undef, i16 %22, i32 0, !dbg !11 %27 = insertelement <4 x i16> %26, i16 %23, i32 1, !dbg !11 %28 = insertelement <4 x i16> %27, i16 %24, i32 2, !dbg !11 %29 = insertelement <4 x i16> %28, i16 %25, i32 3, !dbg !11 %30 = load i16, i16* %next.gep76, align 1, !dbg !11, !tbaa !18 %31 = load i16, i16* %next.gep77, align 1, !dbg !11, !tbaa !18 %32 = load i16, i16* %next.gep78, align 1, !dbg !11, !tbaa !18 %33 = load i16, i16* %next.gep79, align 1, !dbg !11, !tbaa !18 %34 = insertelement <4 x i16> undef, i16 %30, i32 0, !dbg !11 %35 = insertelement <4 x i16> %34, i16 %31, i32 1, !dbg !11 %36 = insertelement <4 x i16> %35, i16 %32, i32 2, !dbg !11 %37 = insertelement <4 x i16> %36, i16 %33, i32 3, !dbg !11 %38 = load i16, i16* %next.gep92, align 1, !dbg !11, !tbaa !18 %39 = load i16, i16* %next.gep93, align 1, !dbg !11, !tbaa !18 %40 = load i16, i16* %next.gep94, align 1, !dbg !11, !tbaa !18 %41 = load i16, i16* %next.gep95, align 1, !dbg !11, !tbaa !18 %42 = insertelement <4 x i16> undef, i16 %38, i32 0, !dbg !17 %43 = insertelement <4 x i16> %42, i16 %39, i32 1, !dbg !17 %44 = insertelement <4 x i16> %43, i16 %40, i32 2, !dbg !17 %45 = insertelement <4 x i16> %44, i16 %41, i32 3, !dbg !17 %46 = sext <4 x i16> %29 to <4 x i32>, !dbg !17 %47 = sext <4 x i16> %45 to <4 x i32>, !dbg !17 %48 = mul nsw <4 x i32> %47, %46, !dbg !22 %49 = sext <4 x i16> %21 to <4 x i32>, !dbg !17 %50 = sext <4 x i16> %37 to <4 x i32>, !dbg !17 %51 = mul nsw <4 x i32> %50, %49, !dbg !23 %52 = sub <4 x i32> %vec.phi, %51, !dbg !24 %53 = add <4 x i32> %52, %48, !dbg !25 %54 = mul nsw <4 x i32> %50, %46, !dbg !26 %55 = add <4 x i32> %54, %vec.phi72, !dbg !27 %56 = mul nsw <4 x i32> %47, %49, !dbg !28 %57 = add <4 x i32> %55, %56, !dbg !29 %index.next = add i16 %index, 4, !dbg !17 %58 = icmp eq i16 
%index.next, %n.vec, !dbg !17 br i1 %58, label %middle.block, label %vector.body, !dbg !17, !llvm.loop !30
But now we see this instead:
vector.body: ; preds = %vector.body, %vector.ph %index = phi i16 [ 0, %vector.ph ], [ %index.next, %vector.body ], !dbg !17 %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %50, %vector.body ] %vec.phi72 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %54, %vector.body ] %3 = shl i16 %index, 1 %next.gep = getelementptr i16, i16* %add.ptr, i16 %3 %4 = or i16 %3, 2 %next.gep73 = getelementptr i16, i16* %add.ptr, i16 %4 %5 = or i16 %3, 4 %next.gep74 = getelementptr i16, i16* %add.ptr, i16 %5 %6 = or i16 %3, 6 %next.gep75 = getelementptr i16, i16* %add.ptr, i16 %6 %7 = mul i16 %index, -2 %next.gep76 = getelementptr i16, i16* %incdec.ptr.us, i16 %7 %8 = xor i16 %3, -2 %next.gep77 = getelementptr i16, i16* %incdec.ptr.us, i16 %8 %9 = add i16 %7, -4 %next.gep78 = getelementptr i16, i16* %incdec.ptr.us, i16 %9 %10 = add i16 %7, -6 %next.gep79 = getelementptr i16, i16* %incdec.ptr.us, i16 %10 %next.gep84 = getelementptr i16, i16* %h, i16 %3 %next.gep85 = getelementptr i16, i16* %h, i16 %4 %next.gep86 = getelementptr i16, i16* %h, i16 %5 %next.gep87 = getelementptr i16, i16* %h, i16 %6 %next.gep92 = getelementptr i16, i16* %x.addr.051.us, i16 %7 %next.gep93 = getelementptr i16, i16* %x.addr.051.us, i16 %8 %next.gep94 = getelementptr i16, i16* %x.addr.051.us, i16 %9 %next.gep95 = getelementptr i16, i16* %x.addr.051.us, i16 %10 %11 = load i16, i16* %next.gep, align 1, !dbg !11, !tbaa !18 %12 = load i16, i16* %next.gep73, align 1, !dbg !11, !tbaa !18 %13 = load i16, i16* %next.gep74, align 1, !dbg !11, !tbaa !18 %14 = load i16, i16* %next.gep75, align 1, !dbg !11, !tbaa !18 %15 = insertelement <4 x i16> undef, i16 %11, i32 0, !dbg !11 %16 = insertelement <4 x i16> %15, i16 %12, i32 1, !dbg !11 %17 = insertelement <4 x i16> %16, i16 %13, i32 2, !dbg !11 %18 = insertelement <4 x i16> %17, i16 %14, i32 3, !dbg !11 %19 = load i16, i16* %next.gep84, align 1, !dbg !11, !tbaa !18 %20 = load i16, i16* %next.gep85, align 1, !dbg !11, !tbaa !18 %21 = load i16, i16* 
%next.gep86, align 1, !dbg !11, !tbaa !18 %22 = load i16, i16* %next.gep87, align 1, !dbg !11, !tbaa !18 %23 = insertelement <4 x i16> undef, i16 %19, i32 0, !dbg !11 %24 = insertelement <4 x i16> %23, i16 %20, i32 1, !dbg !11 %25 = insertelement <4 x i16> %24, i16 %21, i32 2, !dbg !11 %26 = insertelement <4 x i16> %25, i16 %22, i32 3, !dbg !11 %27 = load i16, i16* %next.gep76, align 1, !dbg !11, !tbaa !18 %28 = load i16, i16* %next.gep77, align 1, !dbg !11, !tbaa !18 %29 = load i16, i16* %next.gep78, align 1, !dbg !11, !tbaa !18 %30 = load i16, i16* %next.gep79, align 1, !dbg !11, !tbaa !18 %31 = insertelement <4 x i16> undef, i16 %27, i32 0, !dbg !11 %32 = insertelement <4 x i16> %31, i16 %28, i32 1, !dbg !11 %33 = insertelement <4 x i16> %32, i16 %29, i32 2, !dbg !11 %34 = insertelement <4 x i16> %33, i16 %30, i32 3, !dbg !11 %35 = load i16, i16* %next.gep92, align 1, !dbg !11, !tbaa !18 %36 = load i16, i16* %next.gep93, align 1, !dbg !11, !tbaa !18 %37 = load i16, i16* %next.gep94, align 1, !dbg !11, !tbaa !18 %38 = load i16, i16* %next.gep95, align 1, !dbg !11, !tbaa !18 %39 = insertelement <4 x i16> undef, i16 %35, i32 0, !dbg !17 %40 = insertelement <4 x i16> %39, i16 %36, i32 1, !dbg !17 %41 = insertelement <4 x i16> %40, i16 %37, i32 2, !dbg !17 %42 = insertelement <4 x i16> %41, i16 %38, i32 3, !dbg !17 %43 = sext <4 x i16> %26 to <4 x i32>, !dbg !17 %44 = sext <4 x i16> %42 to <4 x i32>, !dbg !17 %45 = mul nsw <4 x i32> %44, %43, !dbg !22 %46 = sext <4 x i16> %18 to <4 x i32>, !dbg !17 %47 = sext <4 x i16> %34 to <4 x i32>, !dbg !17 %48 = mul nsw <4 x i32> %47, %46, !dbg !23 %49 = sub <4 x i32> %vec.phi, %48, !dbg !24 %50 = add <4 x i32> %49, %45, !dbg !25 %51 = mul nsw <4 x i32> %47, %43, !dbg !26 %52 = add <4 x i32> %51, %vec.phi72, !dbg !27 %53 = mul nsw <4 x i32> %44, %46, !dbg !28 %54 = add <4 x i32> %52, %53, !dbg !29 %index.next = add i16 %index, 4, !dbg !17 %55 = icmp eq i16 %index.next, %n.vec, !dbg !17 br i1 %55, label %middle.block, label 
%vector.body, !dbg !17, !llvm.loop !30
The old input to load-store-vectorizer resulted in two <8 x i16> loads, but the new input gives one <8 x i16> load, two <2 x i16> loads and one <4 x i16> load. So for some reason LSV fails to detect that the loads are consecutive given the new IR. At least that is my current theory.
And another example that we need something like Negator-driven hoisting.