This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegalOrCustom directly instead.
ClosedPublic

Authored by amyk on May 23 2020, 6:36 PM.

Details

Summary

MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and
uses isOperationLegalOrCustom instead.

Diff Detail

Event Timeline

craig.topper created this revision.May 23 2020, 6:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2020, 6:36 PM
spatel added inline comments.May 24 2020, 9:57 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4121

Does it cause trouble to use isOperationLegalOrCustom(ISD::MULHS, VT)?

Use isOperationLegalOrCustom. No in tree targets use Custom though for scalars though.

amyk added a comment.May 25 2020, 10:33 AM

Hi! I should have made it more clear in the revision, but my intention for adding the isMulhCheaperThanMulShift TLI hook was also for it to be used with the DAG Combine in https://reviews.llvm.org/D78272.

That patch has the DAG Combine in PPC only right now. After receiving some comments on it, I was thinking of putting this DAG combine into the target independent code instead. I was only going to enable it only on PPC first by calling the isMulhCheaperThanMulShift TLI hook in the combine since only PPC has an implementation of the function. I see that without the TLI hook, making the DAG combine target independent would allow it to run on all targets, which would result in some LIT failures on other targets.

Do you have any thoughts regarding this? I hope I have my made intentions more clear.

Hi! I should have made it more clear in the revision, but my intention for adding the isMulhCheaperThanMulShift TLI hook was also for it to be used with the DAG Combine in https://reviews.llvm.org/D78272.

That patch has the DAG Combine in PPC only right now. After receiving some comments on it, I was thinking of putting this DAG combine into the target independent code instead. I was only going to enable it only on PPC first by calling the isMulhCheaperThanMulShift TLI hook in the combine since only PPC has an implementation of the function. I see that without the TLI hook, making the DAG combine target independent would allow it to run on all targets, which would result in some LIT failures on other targets.

Do you have any thoughts regarding this? I hope I have my made intentions more clear.

Thanks for the clarification. I'll go ahead and extract the AArch64 change here since it should be correct regardless. I'll hold on to this until after D78272 is moved to DAG combine and we can revisit.

craig.topper planned changes to this revision.May 26 2020, 12:02 AM
amyk added a comment.May 26 2020, 11:09 AM

Thanks for the clarification. I'll go ahead and extract the AArch64 change here since it should be correct regardless. I'll hold on to this until after D78272 is moved to DAG combine and we can revisit.

I've updated D78272 (moving the transformation into DAGCombiner.cpp). Thanks for getting back to me regarding this.

@craig.topper @amyk @nemanjai Hello!
Do we want to keep the isMulhCheaperThanMulShift hook, or change it to a isOperationLegalOrCustom call, like most other transforms would use? If the hook is useful on some targets I can override it under MVE simply enough, let me know.

@craig.topper @amyk @nemanjai Hello!
Do we want to keep the isMulhCheaperThanMulShift hook, or change it to a isOperationLegalOrCustom call, like most other transforms would use? If the hook is useful on some targets I can override it under MVE simply enough, let me know.

I would like to see it removed. I'm not sure I understand the PPC implementation. The hook checks isPPC64 and then checks isOperationLegal. Why do we need the extra check for isPPC64, why isn't operation legality sufficient?

amyk added a comment.Sep 24 2020, 3:47 PM

So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within combineShiftToMULH, there is a check to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:

@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)

I only see one LIT failure now:

Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll

So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within combineShiftToMULH, there is a check to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:

@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)

I only see one LIT failure now:

Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll

Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

amyk added a comment.Sep 30 2020, 2:52 PM

So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within combineShiftToMULH, there is a check to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:

@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)

I only see one LIT failure now:

Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll

Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for MVT::i32 and was being produced previous to introducing my DAGCombine.
We actually already have a test in our backend (llvm/test/CodeGen/PowerPC/mulhs.ll) that shows i32 mulhs is being produced. And since the mulh is legal, my combine will run and produce mulhs in this test, as well.
Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove isMulhCheaperThanMulShift.

The only other failure I see is for CodeGen/X86/pmulh.ll. I have the following changes for the test case, however I think it would be great to have your input on it.

diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
index c03d0190714e..36d6137c9251 100644
--- a/llvm/test/CodeGen/X86/pmulh.ll
+++ b/llvm/test/CodeGen/X86/pmulh.ll
@@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
 ; SSE2-LABEL: mulhsw_v8i16_ashr:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
+; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
+; SSE2-NEXT:    psrad $16, %xmm2
 ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
-; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
-; SSE2-NEXT:    psrad $16, %xmm0
 ; SSE2-NEXT:    psrad $16, %xmm1
+; SSE2-NEXT:    movdqa %xmm2, %xmm0
 ; SSE2-NEXT:    retq
 ;
 ; SSE41-LABEL: mulhsw_v8i16_ashr:

Also, would you like me to take over this revision, update it to get it reviewed/committed?

So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within combineShiftToMULH, there is a check to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:

@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)

I only see one LIT failure now:

Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll

Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for MVT::i32 and was being produced previous to introducing my DAGCombine.
We actually already have a test in our backend (llvm/test/CodeGen/PowerPC/mulhs.ll) that shows i32 mulhs is being produced. And since the mulh is legal, my combine will run and produce mulhs in this test, as well.
Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove isMulhCheaperThanMulShift.

The only other failure I see is for CodeGen/X86/pmulh.ll. I have the following changes for the test case, however I think it would be great to have your input on it.

diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
index c03d0190714e..36d6137c9251 100644
--- a/llvm/test/CodeGen/X86/pmulh.ll
+++ b/llvm/test/CodeGen/X86/pmulh.ll
@@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
 ; SSE2-LABEL: mulhsw_v8i16_ashr:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
+; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
+; SSE2-NEXT:    psrad $16, %xmm2
 ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
-; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
-; SSE2-NEXT:    psrad $16, %xmm0
 ; SSE2-NEXT:    psrad $16, %xmm1
+; SSE2-NEXT:    movdqa %xmm2, %xmm0
 ; SSE2-NEXT:    retq
 ;
 ; SSE41-LABEL: mulhsw_v8i16_ashr:

Also, would you like me to take over this revision, update it to get it reviewed/committed?

You can take over the review. The mulhsw_v8i16_ashr change is a bit unfortunate. It looks like the punpck is ending up with an undef input that is getting assigned to an abitrary register during register allocation and introducing a false dependency. I think we're just getting lucky with how it was being scheduled before. So I think its fine.

amyk commandeered this revision.Oct 13 2020, 9:17 AM
amyk edited reviewers, added: craig.topper; removed: amyk.
amyk updated this revision to Diff 297904.Oct 13 2020, 10:40 AM
amyk retitled this revision from [DAGCombiner][PowerPC][AArch64] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegal directly instead. to [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegalOrCustom directly instead..
amyk edited the summary of this revision. (Show Details)
  • Update the patch with the latest master.
  • Remove isMulhCheaperThanMulShift TLI Hook from all code segments.
  • Update test/CodeGen/X86/pmulh.ll.
dmgreen accepted this revision.Oct 14 2020, 12:53 PM

Thanks for doing this. It looks good to me, if the X86 change is not too egregious.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7915

This formatting is a touch off.

This revision is now accepted and ready to land.Oct 14 2020, 12:53 PM
craig.topper added inline comments.Oct 14 2020, 1:16 PM
llvm/test/CodeGen/X86/pmulh.ll
492 ↗(On Diff #297904)

I think we were just getting lucky before. The xmm2 source here is in machine IR as undef or implicit_def and its tied to the def. Same for the xmm1 on the punpckhwd instruction.

It was like that in the original code too. The original code managed to get scheduled better, but I think it was just luck with the node numbering in the DAG.

So I'm ok with it.

amyk added a comment.Oct 15 2020, 8:56 AM

Thanks @dmgreen and @craig.topper for reviewing. :-)
I will fix the indentation in DAGCombiner on the commit, if that's okay.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7915

Oops, you're right. I can fix that on the commit if it's okay.

This revision was landed with ongoing or failed builds.Oct 19 2020, 10:23 AM
This revision was automatically updated to reflect the committed changes.
piotr added a subscriber: piotr.Oct 20 2020, 8:08 AM

Hi! This commit causes problems for AMDGPU backend - see attached file

. Any ideas before I start investigating this in detail?

LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>

t42: i16 = truncate t67
  t67: i32 = add t66, t28
    t66: i32 = add t37, t34
      t37: i32 = shl nuw nsw t12, Constant:i32<13>
        t12: i32,ch = CopyFromReg t0, Register:i32 %5
          t11: i32 = Register %5
        t36: i32 = Constant<13>
      t34: i32 = shl nuw nsw t10, Constant:i32<7>
        t10: i32,ch = CopyFromReg t0, Register:i32 %4
          t9: i32 = Register %4
        t26: i32 = Constant<7>
    t28: i32 = add t16, t27
      t16: i32,ch = CopyFromReg t0, Register:i32 %7
        t15: i32 = Register %7
      t27: i32 = shl t8, Constant:i32<7>
        t8: i32,ch = CopyFromReg t0, Register:i32 %3
          t7: i32 = Register %3
        t26: i32 = Constant<7>
t52: i16 = Constant<-32509>

Hello. You probably need a to mark MULH's as expand:

setOperationAction(ISD::MULHU, MVT::i16, Expand);
setOperationAction(ISD::MULHS, MVT::i16, Expand);

(Or if there is a suitable instruction, it can be lowered to that, but the fix above is probably best in the sort term.)

I see this is done for i64's and vectors, but not for other types.

amyk added a comment.Oct 20 2020, 10:14 AM

Hi! This commit causes problems for AMDGPU backend - see attached file

. Any ideas before I start investigating this in detail?

LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>

t42: i16 = truncate t67
  t67: i32 = add t66, t28
    t66: i32 = add t37, t34
      t37: i32 = shl nuw nsw t12, Constant:i32<13>
        t12: i32,ch = CopyFromReg t0, Register:i32 %5
          t11: i32 = Register %5
        t36: i32 = Constant<13>
      t34: i32 = shl nuw nsw t10, Constant:i32<7>
        t10: i32,ch = CopyFromReg t0, Register:i32 %4
          t9: i32 = Register %4
        t26: i32 = Constant<7>
    t28: i32 = add t16, t27
      t16: i32,ch = CopyFromReg t0, Register:i32 %7
        t15: i32 = Register %7
      t27: i32 = shl t8, Constant:i32<7>
        t8: i32,ch = CopyFromReg t0, Register:i32 %3
          t7: i32 = Register %3
        t26: i32 = Constant<7>
t52: i16 = Constant<-32509>

Hi, I agree with the suggestion by @dmgreen. Could you see if that works?

piotr added a comment.Oct 21 2020, 2:53 AM

Hi! This commit causes problems for AMDGPU backend - see attached file

. Any ideas before I start investigating this in detail?

LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>

t42: i16 = truncate t67
  t67: i32 = add t66, t28
    t66: i32 = add t37, t34
      t37: i32 = shl nuw nsw t12, Constant:i32<13>
        t12: i32,ch = CopyFromReg t0, Register:i32 %5
          t11: i32 = Register %5
        t36: i32 = Constant<13>
      t34: i32 = shl nuw nsw t10, Constant:i32<7>
        t10: i32,ch = CopyFromReg t0, Register:i32 %4
          t9: i32 = Register %4
        t26: i32 = Constant<7>
    t28: i32 = add t16, t27
      t16: i32,ch = CopyFromReg t0, Register:i32 %7
        t15: i32 = Register %7
      t27: i32 = shl t8, Constant:i32<7>
        t8: i32,ch = CopyFromReg t0, Register:i32 %3
          t7: i32 = Register %3
        t26: i32 = Constant<7>
t52: i16 = Constant<-32509>

Hi, I agree with the suggestion by @dmgreen. Could you see if that works?

Thanks for the suggestion - will try that.