This is an archive of the discontinued LLVM Phabricator instance.

Differential D80485

[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegalOrCustom directly instead.
Closed · Public

Authored by amyk on May 23 2020, 6:36 PM.

Details

Reviewers

RKSimon
nemanjai
spatel
efriedma
craig.topper
dmgreen

Commits

rG6a946fd06fa0: [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use…

Summary

MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and
uses isOperationLegalOrCustom instead.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. May 23 2020, 6:36 PM

Herald added a project: Restricted Project. May 23 2020, 6:36 PM

Herald added subscribers: ecnelises, danielkiss, shchenz and 3 others.

Harbormaster failed remote builds in B57739: Diff 265903! May 23 2020, 7:38 PM

Herald added a subscriber: wuzish. May 23 2020, 7:38 PM

spatel added inline comments. May 24 2020, 9:57 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4293–4294	Does it cause trouble to use isOperationLegalOrCustom(ISD::MULHS, VT)?

Use isOperationLegalOrCustom. No in-tree targets use Custom for scalars, though.

Hi! I should have made this clearer in the revision, but I also intended the isMulhCheaperThanMulShift TLI hook to be used with the DAG combine in https://reviews.llvm.org/D78272.

That patch has the DAG combine in PPC only right now. After receiving some comments on it, I was thinking of putting this DAG combine into the target-independent code instead. I was going to enable it only on PPC at first by calling the isMulhCheaperThanMulShift TLI hook in the combine, since only PPC has an implementation of the function. I see that without the TLI hook, making the DAG combine target independent would allow it to run on all targets, which would result in some LIT failures on other targets.

Do you have any thoughts regarding this? I hope I have made my intentions clearer.

In D80485#2053574, @amyk wrote:

Hi! I should have made this clearer in the revision, but I also intended the isMulhCheaperThanMulShift TLI hook to be used with the DAG combine in https://reviews.llvm.org/D78272.

That patch has the DAG combine in PPC only right now. After receiving some comments on it, I was thinking of putting this DAG combine into the target-independent code instead. I was going to enable it only on PPC at first by calling the isMulhCheaperThanMulShift TLI hook in the combine, since only PPC has an implementation of the function. I see that without the TLI hook, making the DAG combine target independent would allow it to run on all targets, which would result in some LIT failures on other targets.

Do you have any thoughts regarding this? I hope I have made my intentions clearer.

Thanks for the clarification. I'll go ahead and extract the AArch64 change here since it should be correct regardless. I'll hold on to this until after D78272 is moved to DAG combine and we can revisit.

craig.topper planned changes to this revision. May 26 2020, 12:02 AM

craig.topper mentioned this in rG80cc43b420a8: [AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal. May 26 2020, 1:02 AM

In D80485#2054050, @craig.topper wrote:

Thanks for the clarification. I'll go ahead and extract the AArch64 change here since it should be correct regardless. I'll hold on to this until after D78272 is moved to DAG combine and we can revisit.

I've updated D78272 (moving the transformation into DAGCombiner.cpp). Thanks for getting back to me regarding this.

@craig.topper @amyk @nemanjai Hello!
Do we want to keep the isMulhCheaperThanMulShift hook, or change it to an isOperationLegalOrCustom call, like most other transforms would use? If the hook is useful on some targets, I can override it under MVE simply enough. Let me know.

Herald added a subscriber: steven.zhang. Sep 21 2020, 3:29 AM

In D80485#2284943, @dmgreen wrote:

@craig.topper @amyk @nemanjai Hello!
Do we want to keep the isMulhCheaperThanMulShift hook, or change it to an isOperationLegalOrCustom call, like most other transforms would use? If the hook is useful on some targets, I can override it under MVE simply enough. Let me know.

I would like to see it removed. I'm not sure I understand the PPC implementation. The hook checks isPPC64 and then checks isOperationLegal. Why do we need the extra check for isPPC64? Why isn't operation legality sufficient?

So, my goal was to introduce a DAG combine that turns multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something in target-independent code, I thought it would be better to enable it only on PowerPC at first, since I was not sure whether other targets were interested in this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift), which targets that wish to combine multiply+shifts into mulh could implement. Within combineShiftToMULH, there is a call to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:

@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)

I only see one LIT failure now:

Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll
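
The rewrite that combineShiftToMULH performs can be illustrated outside LLVM. The following is a minimal standalone sketch, not code from this patch: the function names are hypothetical, and 16-bit operands are chosen only for illustration. It models the two matched patterns (extend both operands to twice the width, multiply, shift right by the narrow width), whose result a single MULHS or MULHU node would produce directly, plus the two shape checks visible in the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Models (sra (mul (sext a), (sext b)), 16) truncated back to i16:
// the signed pattern that the combine rewrites to one MULHS node.
int16_t mulShiftSigned16(int16_t a, int16_t b) {
  int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  // >> on a negative value is an arithmetic shift on mainstream compilers
  // (and is guaranteed to be one since C++20).
  return static_cast<int16_t>(wide >> 16);
}

// Models (srl (mul (zext a), (zext b)), 16) truncated back to i16:
// the unsigned pattern that the combine rewrites to one MULHU node.
uint16_t mulShiftUnsigned16(uint16_t a, uint16_t b) {
  uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
  return static_cast<uint16_t>(wide >> 16);
}

// Mirrors the two structural checks in the diff: the wide type must be
// exactly twice the narrow type, and the shift amount must equal the
// narrow type's width.
bool shapeMatches(unsigned wideBits, unsigned narrowBits, unsigned shiftAmt) {
  return wideBits == 2 * narrowBits && shiftAmt == narrowBits;
}
```

For example, `mulShiftSigned16(-32768, -32768)` yields 16384, the upper 16 bits of the 32-bit product 0x40000000.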

In D80485#2293942, @amyk wrote:
So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).

Since I was implementing something in target-independent code, I thought it would be better to enable it only on PowerPC at first, since I was not sure whether other targets were interested in this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift), which targets that wish to combine multiply+shifts into mulh could implement. Within combineShiftToMULH, there is a call to isMulhCheaperThanMulShift.

If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:
@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   if (NarrowVT !=  RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
@@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Only transform into mulh if mulh for the narrow type is cheaper than
+  // a multiply followed by a shift. This should also check if mulh is
+  // legal for NarrowVT on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
I only see one LIT failure now:
Failed Tests (1):
  LLVM :: CodeGen/X86/pmulh.ll

Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

In D80485#2293955, @craig.topper wrote:
Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?

I've looked into this. In 32-bit mode, the mulh nodes are only legal for MVT::i32, and they were already being produced before I introduced my DAG combine.
We already have a test in our backend (llvm/test/CodeGen/PowerPC/mulhs.ll) that shows i32 mulhs being produced. Since the mulh is legal, my combine will run and produce mulhs in this test as well.
Other types in 32-bit mode aren't legal, so the combine won't run on them. Thus, in terms of PPC, I think it's probably fine to remove isMulhCheaperThanMulShift.
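
The PPC32 observation can be made concrete with a toy model. This is only a sketch under assumptions taken from the thread: the removed hook required isPPC64 in addition to legality, mulh is legal for i32 in both modes, and i64 mulh is legal only on 64-bit PowerPC. `ToyVT`, `ToyPPC`, `isMulhLegal`, `oldHook` and `newCheck` are hypothetical names, not LLVM APIs, and treating i16 as not legal is an assumption made for illustration.

```cpp
#include <cassert>

// Toy stand-ins for LLVM's value types and the PowerPC subtarget.
enum class ToyVT { i16, i32, i64 };
struct ToyPPC {
  bool IsPPC64;
};

// Legality of MULHS/MULHU as described in the thread (assumed, not queried
// from a real TargetLowering object).
bool isMulhLegal(const ToyPPC &ST, ToyVT Ty) {
  if (Ty == ToyVT::i32)
    return true; // words: legal in both 32-bit and 64-bit mode
  if (Ty == ToyVT::i64)
    return ST.IsPPC64; // doublewords: only in 64-bit mode
  return false; // assumption: narrower types are not legal
}

// Shape of the removed hook: an isPPC64 check followed by a legality check.
bool oldHook(const ToyPPC &ST, ToyVT Ty) {
  return ST.IsPPC64 && isMulhLegal(ST, Ty);
}

// Shape of the replacement: legality alone decides.
bool newCheck(const ToyPPC &ST, ToyVT Ty) { return isMulhLegal(ST, Ty); }
```

Under these assumptions the two predicates disagree in exactly one case, i32 in 32-bit mode, which is the case that the existing mulhs.ll test is said to cover.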

The only other failure I see is in CodeGen/X86/pmulh.ll. I have the following changes for the test case, but it would be great to have your input on them.

diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
index c03d0190714e..36d6137c9251 100644
--- a/llvm/test/CodeGen/X86/pmulh.ll
+++ b/llvm/test/CodeGen/X86/pmulh.ll
@@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
 ; SSE2-LABEL: mulhsw_v8i16_ashr:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
+; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
+; SSE2-NEXT:    psrad $16, %xmm2
 ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
-; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
-; SSE2-NEXT:    psrad $16, %xmm0
 ; SSE2-NEXT:    psrad $16, %xmm1
+; SSE2-NEXT:    movdqa %xmm2, %xmm0
 ; SSE2-NEXT:    retq
 ;
 ; SSE41-LABEL: mulhsw_v8i16_ashr:

Also, would you like me to take over this revision and update it so it can be reviewed and committed?

In D80485#2304761, @amyk wrote:
I've looked into this. In 32-bit mode, the mulh nodes are only legal for MVT::i32, and they were already being produced before I introduced my DAG combine.
We already have a test in our backend (llvm/test/CodeGen/PowerPC/mulhs.ll) that shows i32 mulhs being produced. Since the mulh is legal, my combine will run and produce mulhs in this test as well.
Other types in 32-bit mode aren't legal, so the combine won't run on them. Thus, in terms of PPC, I think it's probably fine to remove isMulhCheaperThanMulShift.

The only other failure I see is in CodeGen/X86/pmulh.ll. I have the following changes for the test case, but it would be great to have your input on them.
diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll
index c03d0190714e..36d6137c9251 100644
--- a/llvm/test/CodeGen/X86/pmulh.ll
+++ b/llvm/test/CodeGen/X86/pmulh.ll
@@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
 ; SSE2-LABEL: mulhsw_v8i16_ashr:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    pmulhw %xmm1, %xmm0
+; SSE2-NEXT:    punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
+; SSE2-NEXT:    psrad $16, %xmm2
 ; SSE2-NEXT:    punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
-; SSE2-NEXT:    punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
-; SSE2-NEXT:    psrad $16, %xmm0
 ; SSE2-NEXT:    psrad $16, %xmm1
+; SSE2-NEXT:    movdqa %xmm2, %xmm0
 ; SSE2-NEXT:    retq
 ;
 ; SSE41-LABEL: mulhsw_v8i16_ashr:
Also, would you like me to take over this revision and update it so it can be reviewed and committed?

You can take over the review. The mulhsw_v8i16_ashr change is a bit unfortunate. It looks like the punpck is ending up with an undef input that is getting assigned to an arbitrary register during register allocation and introducing a false dependency. I think we were just getting lucky with how it was being scheduled before, so I think it's fine.

amyk commandeered this revision. Oct 13 2020, 9:17 AM

amyk edited reviewers, added: craig.topper; removed: amyk.

Herald added a subscriber: pengfei. Oct 13 2020, 9:17 AM

Update the patch with the latest master.
Remove isMulhCheaperThanMulShift TLI Hook from all code segments.
Update test/CodeGen/X86/pmulh.ll.

Harbormaster completed remote builds in B74959: Diff 297904. Oct 13 2020, 11:13 AM

Thanks for doing this. It looks good to me, if the X86 change is not too egregious.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8220	This formatting is a touch off.

This revision is now accepted and ready to land. Oct 14 2020, 12:53 PM

craig.topper added inline comments. Oct 14 2020, 1:16 PM

llvm/test/CodeGen/X86/pmulh.ll
492	I think we were just getting lucky before. The xmm2 source here is in machine IR as undef or implicit_def, and it's tied to the def. The same goes for the xmm1 on the punpckhwd instruction. It was like that in the original code too. The original code managed to get scheduled better, but I think that was just luck with the node numbering in the DAG. So I'm OK with it.

Thanks @dmgreen and @craig.topper for reviewing. :-)
I will fix the indentation in DAGCombiner on the commit, if that's okay.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8220	Oops, you're right. I can fix that on the commit if it's okay.

This revision was landed with ongoing or failed builds. Oct 19 2020, 10:23 AM

Closed by commit rG6a946fd06fa0: [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use… (authored by amyk).

This revision was automatically updated to reflect the committed changes.

amyk added a commit: rG6a946fd06fa0: [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use….

Hi! This commit causes problems for the AMDGPU backend - see the attached file repro.ll (1 KB). Any ideas before I start investigating this in detail?

LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>

t42: i16 = truncate t67
  t67: i32 = add t66, t28
    t66: i32 = add t37, t34
      t37: i32 = shl nuw nsw t12, Constant:i32<13>
        t12: i32,ch = CopyFromReg t0, Register:i32 %5
          t11: i32 = Register %5
        t36: i32 = Constant<13>
      t34: i32 = shl nuw nsw t10, Constant:i32<7>
        t10: i32,ch = CopyFromReg t0, Register:i32 %4
          t9: i32 = Register %4
        t26: i32 = Constant<7>
    t28: i32 = add t16, t27
      t16: i32,ch = CopyFromReg t0, Register:i32 %7
        t15: i32 = Register %7
      t27: i32 = shl t8, Constant:i32<7>
        t8: i32,ch = CopyFromReg t0, Register:i32 %3
          t7: i32 = Register %3
        t26: i32 = Constant<7>
t52: i16 = Constant<-32509>

Hello. You probably need to mark the MULHs as Expand:

setOperationAction(ISD::MULHU, MVT::i16, Expand);
setOperationAction(ISD::MULHS, MVT::i16, Expand);

(Or, if there is a suitable instruction, it can be lowered to that, but the fix above is probably best in the short term.)

I see this is done for i64's and vectors, but not for other types.
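
A toy model of the operation-action table may help show why the combine now fires on a target that never configured i16 MULH. This is a simplified sketch, not LLVM's implementation: `ToyLowering` and its enums are hypothetical, and the real isOperationLegalOrCustom also checks that the type itself is legal. The assumption being illustrated is that an action a target never sets is treated as Legal.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, reduced versions of the action, opcode and type enums.
enum class Action { Legal, Expand, Custom };
enum class Op { MULHS, MULHU };
enum class Ty { i16, i32, i64 };

class ToyLowering {
  std::map<std::pair<Op, Ty>, Action> Actions;

public:
  void setOperationAction(Op O, Ty T, Action A) { Actions[{O, T}] = A; }

  // Assumption: an entry that was never set reads back as Legal.
  Action getOperationAction(Op O, Ty T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }

  // Simplified: only the action is inspected, not type legality.
  bool isOperationLegalOrCustom(Op O, Ty T) const {
    Action A = getOperationAction(O, T);
    return A == Action::Legal || A == Action::Custom;
  }
};
```

In this model, a freshly constructed `ToyLowering` reports i16 MULHS as legal, so a combine gated only on that query would create the node; after `setOperationAction(Op::MULHS, Ty::i16, Action::Expand)` the query returns false and the combine would be skipped.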

In D80485#2341858, @piotr wrote:

Hi! This commit causes problems for the AMDGPU backend - see the attached file repro.ll (1 KB). Any ideas before I start investigating this in detail?

LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>

t42: i16 = truncate t67
  t67: i32 = add t66, t28
    t66: i32 = add t37, t34
      t37: i32 = shl nuw nsw t12, Constant:i32<13>
        t12: i32,ch = CopyFromReg t0, Register:i32 %5
          t11: i32 = Register %5
        t36: i32 = Constant<13>
      t34: i32 = shl nuw nsw t10, Constant:i32<7>
        t10: i32,ch = CopyFromReg t0, Register:i32 %4
          t9: i32 = Register %4
        t26: i32 = Constant<7>
    t28: i32 = add t16, t27
      t16: i32,ch = CopyFromReg t0, Register:i32 %7
        t15: i32 = Register %7
      t27: i32 = shl t8, Constant:i32<7>
        t8: i32,ch = CopyFromReg t0, Register:i32 %3
          t7: i32 = Register %3
        t26: i32 = Constant<7>
t52: i16 = Constant<-32509>

Hi, I agree with the suggestion by @dmgreen. Could you see if that works?

In D80485#2342195, @amyk wrote:
Hi, I agree with the suggestion by @dmgreen. Could you see if that works?

Thanks for the suggestion - will try that.

piotr mentioned this in D89965: [AMDGPU] Fix expansion of i16 MULH. Oct 22 2020, 7:56 AM

piotr mentioned this in rG7ae0033ca881: [AMDGPU] Fix expansion of i16 MULH. Oct 22 2020, 8:06 AM

Revision Contents

Path                                             Size
llvm/include/llvm/CodeGen/TargetLowering.h       4 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    16 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h        5 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp      10 lines
llvm/test/CodeGen/X86/pmulh.ll                   5 lines

Diff 299097

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ #include "llvm/IR/ConstrainedOps.def"
   unsigned getMinimumJumpTableDensity(bool OptForSize) const;

   /// Return upper limit for number of entries in a jump table.
   /// Zero if no limit.
   unsigned getMaximumJumpTableSize() const;

   virtual bool isJumpTableRelative() const;

-  /// Return true if a mulh[s|u] node for a specific type is cheaper than
-  /// a multiply followed by a shift. This is false by default.
-  virtual bool isMulhCheaperThanMulShift(EVT Type) const { return false; }
-
   /// If a physical register, this specifies the register that
   /// llvm.savestack/llvm.restorestack should save and restore.
   unsigned getStackPointerRegisterToSaveRestore() const {
     return StackPointerRegisterToSaveRestore;
   }

   /// If a physical register, this returns the register that receives the
   /// exception address on entry to an EH pad.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ ... @@ return DAG.getNode(ISD::SRA, DL, N0.getValueType(), N0,
                                        getShiftAmountTy(N0.getValueType())));

   // fold (mulhs x, undef) -> 0
   if (N0.isUndef() || N1.isUndef())
     return DAG.getConstant(0, DL, VT);

   // If the type twice as wide is legal, transform the mulhs to a wider multiply
   // plus a shift.
-  if (!TLI.isMulhCheaperThanMulShift(VT) && VT.isSimple() && !VT.isVector()) {
+  if (!TLI.isOperationLegalOrCustom(ISD::MULHS, VT) && VT.isSimple() &&
+      !VT.isVector()) {
     MVT Simple = VT.getSimpleVT();
     unsigned SimpleSize = Simple.getSizeInBits();
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), SimpleSize*2);
     if (TLI.isOperationLegal(ISD::MUL, NewVT)) {
       N0 = DAG.getNode(ISD::SIGN_EXTEND, DL, NewVT, N0);
       N1 = DAG.getNode(ISD::SIGN_EXTEND, DL, NewVT, N1);
       N1 = DAG.getNode(ISD::MUL, DL, NewVT, N0, N1);
       N1 = DAG.getNode(ISD::SRL, DL, NewVT, N1,

spatel: Does it cause trouble to use isOperationLegalOrCustom(ISD::MULHS, VT)?

@@ ... @@ SDValue SRLAmt = DAG.getNode(
         ISD::SUB, DL, VT, DAG.getConstant(NumEltBits, DL, VT), LogBase2);
     EVT ShiftVT = getShiftAmountTy(N0.getValueType());
     SDValue Trunc = DAG.getZExtOrTrunc(SRLAmt, DL, ShiftVT);
     return DAG.getNode(ISD::SRL, DL, VT, N0, Trunc);
   }

   // If the type twice as wide is legal, transform the mulhu to a wider multiply
   // plus a shift.
-  if (!TLI.isMulhCheaperThanMulShift(VT) && VT.isSimple() && !VT.isVector()) {
+  if (!TLI.isOperationLegalOrCustom(ISD::MULHU, VT) && VT.isSimple() &&
+      !VT.isVector()) {
     MVT Simple = VT.getSimpleVT();
     unsigned SimpleSize = Simple.getSizeInBits();
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), SimpleSize*2);
     if (TLI.isOperationLegal(ISD::MUL, NewVT)) {
       N0 = DAG.getNode(ISD::ZERO_EXTEND, DL, NewVT, N0);
       N1 = DAG.getNode(ISD::ZERO_EXTEND, DL, NewVT, N1);
       N1 = DAG.getNode(ISD::MUL, DL, NewVT, N0, N1);
       N1 = DAG.getNode(ISD::SRL, DL, NewVT, N1,
@@ ... @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   assert((WideVT1 == WideVT2) &&
          "Cannot have a multiply node with two different operand types.");

   EVT NarrowVT = LeftOp.getOperand(0).getValueType();
   // Check that the two extend nodes are the same type.
   if (NarrowVT != RightOp.getOperand(0).getValueType())
     return SDValue();

-  // Only transform into mulh if mulh for the narrow type is cheaper than
-  // a multiply followed by a shift. This should also check if mulh is
-  // legal for NarrowVT on the target.
-  if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
-      return SDValue();
-
   // Proceed with the transformation if the wide type is twice as large
   // as the narrow type.
   unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
   if (WideVT1.getScalarSizeInBits() != 2 * NarrowVTSize)
     return SDValue();

   // Check the shift amount with the narrow type size.
   // Proceed with the transformation if the shift amount is the width
   // of the narrow type.
   unsigned ShiftAmt = ShiftAmtSrc->getZExtValue();
   if (ShiftAmt != NarrowVTSize)
     return SDValue();

   // If the operation feeding into the MUL is a sign extend (sext),
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

+  // Combine to mulh if mulh is legal/custom for the narrow type on the target.
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+      return SDValue();
+
   SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                                RightOp.getOperand(0));
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
                                      : DAG.getZExtOrTrunc(Result, DL, WideVT1));
 }

 SDValue DAGCombiner::visitSRA(SDNode *N) {
   SDValue N0 = N->getOperand(0);

dmgreen: This formatting is a touch off.
amyk: Oops, you're right. I can fix that on the commit if it's okay.

llvm/lib/Target/PowerPC/PPCISelLowering.h

public:
Register		Register
getExceptionPointerRegister(const Constant *PersonalityFn) const override;		getExceptionPointerRegister(const Constant *PersonalityFn) const override;

/// If a physical register, this returns the register that receives the		/// If a physical register, this returns the register that receives the
/// exception typeid on entry to a landing pad.		/// exception typeid on entry to a landing pad.
Register		Register
getExceptionSelectorRegister(const Constant *PersonalityFn) const override;		getExceptionSelectorRegister(const Constant *PersonalityFn) const override;

/// isMulhCheaperThanMulShift - Return true if a mulh[s|u] node for a
/// specific type is cheaper than a multiply followed by a shift.
/// This is true for words and doublewords on 64-bit PowerPC.
bool isMulhCheaperThanMulShift(EVT Type) const override;

/// Override to support customized stack guard loading.		/// Override to support customized stack guard loading.
bool useLoadStackGuardNode() const override;		bool useLoadStackGuardNode() const override;
void insertSSPDeclarations(Module &M) const override;		void insertSSPDeclarations(Module &M) const override;

bool isFPImmLegal(const APFloat &Imm, EVT VT,		bool isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const override;		bool ForCodeSize) const override;

unsigned getJumpTableEncoding() const override;		unsigned getJumpTableEncoding() const override;

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

	bool PPCTargetLowering::hasSPE() const {			bool PPCTargetLowering::hasSPE() const {
	return Subtarget.hasSPE();			return Subtarget.hasSPE();
	}			}

	bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {			bool PPCTargetLowering::preferIncOfAddToSubOfNot(EVT VT) const {
	return VT.isScalarInteger();			return VT.isScalarInteger();
	}			}

	/// isMulhCheaperThanMulShift - Return true if a mulh[s|u] node for a specific
	/// type is cheaper than a multiply followed by a shift.
	/// This is true for words and doublewords on 64-bit PowerPC.
	bool PPCTargetLowering::isMulhCheaperThanMulShift(EVT Type) const {
	if (Subtarget.isPPC64() && (isOperationLegal(ISD::MULHS, Type) ||
	isOperationLegal(ISD::MULHU, Type)))
	return true;
	return TargetLowering::isMulhCheaperThanMulShift(Type);
	}

	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {			const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
	switch ((PPCISD::NodeType)Opcode) {			switch ((PPCISD::NodeType)Opcode) {
	case PPCISD::FIRST_NUMBER: break;			case PPCISD::FIRST_NUMBER: break;
	case PPCISD::FSEL: return "PPCISD::FSEL";			case PPCISD::FSEL: return "PPCISD::FSEL";
	case PPCISD::XSMAXCDP: return "PPCISD::XSMAXCDP";			case PPCISD::XSMAXCDP: return "PPCISD::XSMAXCDP";
	case PPCISD::XSMINCDP: return "PPCISD::XSMINCDP";			case PPCISD::XSMINCDP: return "PPCISD::XSMINCDP";
	case PPCISD::FCFID: return "PPCISD::FCFID";			case PPCISD::FCFID: return "PPCISD::FCFID";
	case PPCISD::FCFIDU: return "PPCISD::FCFIDU";			case PPCISD::FCFIDU: return "PPCISD::FCFIDU";

llvm/test/CodeGen/X86/pmulh.ll

; AVX-NEXT: retq
%d = lshr <8 x i32> %c, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		%d = lshr <8 x i32> %c, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
ret <8 x i32> %d		ret <8 x i32> %d
}		}

define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {		define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) {
; SSE2-LABEL: mulhsw_v8i16_ashr:		; SSE2-LABEL: mulhsw_v8i16_ashr:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: pmulhw %xmm1, %xmm0		; SSE2-NEXT: pmulhw %xmm1, %xmm0
		; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3]
		craig.topper: I think we were just getting lucky before. The xmm2 source here is in machine IR as undef or implicit_def and it's tied to the def. Same for the xmm1 on the punpckhwd instruction. It was like that in the original code too. The original code managed to get scheduled better, but I think it was just luck with the node numbering in the DAG. So I'm ok with it.
		; SSE2-NEXT: psrad $16, %xmm2
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSE2-NEXT: psrad $16, %xmm0
; SSE2-NEXT: psrad $16, %xmm1		; SSE2-NEXT: psrad $16, %xmm1
		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: mulhsw_v8i16_ashr:		; SSE41-LABEL: mulhsw_v8i16_ashr:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: pmulhw %xmm1, %xmm0		; SSE41-NEXT: pmulhw %xmm1, %xmm0
; SSE41-NEXT: pmovsxwd %xmm0, %xmm2		; SSE41-NEXT: pmovsxwd %xmm0, %xmm2
; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]		; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
; SSE41-NEXT: pmovsxwd %xmm0, %xmm1		; SSE41-NEXT: pmovsxwd %xmm0, %xmm1

Revision Contents

Diff 299097

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/PowerPC/PPCISelLowering.h

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

llvm/test/CodeGen/X86/pmulh.ll
