MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and
uses isOperationLegalOrCustom instead.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4293–4294 | Does it cause trouble to use isOperationLegalOrCustom(ISD::MULHS, VT)? |
Use isOperationLegalOrCustom. No in tree targets use Custom though for scalars though.
Hi! I should have made it more clear in the revision, but my intention for adding the isMulhCheaperThanMulShift TLI hook was also for it to be used with the DAG Combine in https://reviews.llvm.org/D78272.
That patch has the DAG Combine in PPC only right now. After receiving some comments on it, I was thinking of putting this DAG combine into the target independent code instead. I was only going to enable it only on PPC first by calling the isMulhCheaperThanMulShift TLI hook in the combine since only PPC has an implementation of the function. I see that without the TLI hook, making the DAG combine target independent would allow it to run on all targets, which would result in some LIT failures on other targets.
Do you have any thoughts regarding this? I hope I have my made intentions more clear.
Thanks for the clarification. I'll go ahead and extract the AArch64 change here since it should be correct regardless. I'll hold on to this until after D78272 is moved to DAG combine and we can revisit.
I've updated D78272 (moving the transformation into DAGCombiner.cpp). Thanks for getting back to me regarding this.
@craig.topper @amyk @nemanjai Hello!
Do we want to keep the isMulhCheaperThanMulShift hook, or change it to a isOperationLegalOrCustom call, like most other transforms would use? If the hook is useful on some targets I can override it under MVE simply enough, let me know.
I would like to see it removed. I'm not sure I understand the PPC implementation. The hook checks isPPC64 and then checks isOperationLegal. Why do we need the extra check for isPPC64, why isn't operation legality sufficient?
So, my goal was to introduce a DAG combine to combine multiply+shifts into mulh. This is done in a function I introduced within DAGCombiner.cpp (called combineShiftToMULH).
Since I was implementing something that is in target independent code, I thought it may be better to enable it on PowerPC only first since I was not quite sure if other targets were interested it this and I was seeing many other LIT failures at the time.
Thus, I introduced the hook (isMulhCheaperThanMulShift) and targets who wish to combine multiply+shifts into mulh could implement that hook. Within combineShiftToMULH, there is a check to isMulhCheaperThanMulShift.
If we wanted to enable this for everyone, I think the check isOperationLegalOrCustom would probably be sufficient. I just tried this patch again, and made the following changes for combineShiftToMULH:
@@ -8099,12 +8101,6 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG, if (NarrowVT != RightOp.getOperand(0).getValueType()) return SDValue(); - // Only transform into mulh if mulh for the narrow type is cheaper than - // a multiply followed by a shift. This should also check if mulh is - // legal for NarrowVT on the target. - if (!TLI.isMulhCheaperThanMulShift(NarrowVT)) - return SDValue(); - // Proceed with the transformation if the wide type is twice as large // as the narrow type. unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits(); @@ -8122,6 +8118,12 @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG, // we use mulhs. Othewise, zero extends (zext) use mulhu. unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU; + // Only transform into mulh if mulh for the narrow type is cheaper than + // a multiply followed by a shift. This should also check if mulh is + // legal for NarrowVT on the target. + if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT)) + return SDValue(); + SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0), RightOp.getOperand(0)); return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
I only see one LIT failure now:
Failed Tests (1): LLVM :: CodeGen/X86/pmulh.ll
Would this now be enabled on PPC32 or whatever !PPC64 is called? Are there tests for that?
I've looked into this and on 32-bit mode, the mulh nodes are actually only legalized for MVT::i32 and was being produced previous to introducing my DAGCombine.
We actually already have a test in our backend (llvm/test/CodeGen/PowerPC/mulhs.ll) that shows i32 mulhs is being produced. And since the mulh is legal, my combine will run and produce mulhs in this test, as well.
Other types on 32-bit mode aren't legal, so the combine won't run on other types. Thus in terms of PPC, I think it's probably fine to remove isMulhCheaperThanMulShift.
The only other failure I see is for CodeGen/X86/pmulh.ll. I have the following changes for the test case, however I think it would be great to have your input on it.
diff --git a/llvm/test/CodeGen/X86/pmulh.ll b/llvm/test/CodeGen/X86/pmulh.ll index c03d0190714e..36d6137c9251 100644 --- a/llvm/test/CodeGen/X86/pmulh.ll +++ b/llvm/test/CodeGen/X86/pmulh.ll @@ -489,10 +489,11 @@ define <8 x i32> @mulhsw_v8i16_ashr(<8 x i16> %a, <8 x i16> %b) { ; SSE2-LABEL: mulhsw_v8i16_ashr: ; SSE2: # %bb.0: ; SSE2-NEXT: pmulhw %xmm1, %xmm0 +; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1],xmm2[2],xmm0[2],xmm2[3],xmm0[3] +; SSE2-NEXT: psrad $16, %xmm2 ; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7] -; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3] -; SSE2-NEXT: psrad $16, %xmm0 ; SSE2-NEXT: psrad $16, %xmm1 +; SSE2-NEXT: movdqa %xmm2, %xmm0 ; SSE2-NEXT: retq ; ; SSE41-LABEL: mulhsw_v8i16_ashr:
Also, would you like me to take over this revision, update it to get it reviewed/committed?
You can take over the review. The mulhsw_v8i16_ashr change is a bit unfortunate. It looks like the punpck is ending up with an undef input that is getting assigned to an abitrary register during register allocation and introducing a false dependency. I think we're just getting lucky with how it was being scheduled before. So I think its fine.
- Update the patch with the latest master.
- Remove isMulhCheaperThanMulShift TLI Hook from all code segments.
- Update test/CodeGen/X86/pmulh.ll.
Thanks for doing this. It looks good to me, if the X86 change is not too egregious.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
8220 | This formatting is a touch off. |
llvm/test/CodeGen/X86/pmulh.ll | ||
---|---|---|
492 | I think we were just getting lucky before. The xmm2 source here is in machine IR as undef or implicit_def and its tied to the def. Same for the xmm1 on the punpckhwd instruction. It was like that in the original code too. The original code managed to get scheduled better, but I think it was just luck with the node numbering in the DAG. So I'm ok with it. |
Thanks @dmgreen and @craig.topper for reviewing. :-)
I will fix the indentation in DAGCombiner on the commit, if that's okay.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
8220 | Oops, you're right. I can fix that on the commit if it's okay. |
Hi! This commit causes problems for AMDGPU backend - see attached file
. Any ideas before I start investigating this in detail?LLVM ERROR: Cannot select: t56: i16 = mulhs t42, Constant:i16<-32509>
t42: i16 = truncate t67 t67: i32 = add t66, t28 t66: i32 = add t37, t34 t37: i32 = shl nuw nsw t12, Constant:i32<13> t12: i32,ch = CopyFromReg t0, Register:i32 %5 t11: i32 = Register %5 t36: i32 = Constant<13> t34: i32 = shl nuw nsw t10, Constant:i32<7> t10: i32,ch = CopyFromReg t0, Register:i32 %4 t9: i32 = Register %4 t26: i32 = Constant<7> t28: i32 = add t16, t27 t16: i32,ch = CopyFromReg t0, Register:i32 %7 t15: i32 = Register %7 t27: i32 = shl t8, Constant:i32<7> t8: i32,ch = CopyFromReg t0, Register:i32 %3 t7: i32 = Register %3 t26: i32 = Constant<7> t52: i16 = Constant<-32509>
Hello. You probably need a to mark MULH's as expand:
setOperationAction(ISD::MULHU, MVT::i16, Expand); setOperationAction(ISD::MULHS, MVT::i16, Expand);
(Or if there is a suitable instruction, it can be lowered to that, but the fix above is probably best in the sort term.)
I see this is done for i64's and vectors, but not for other types.
Does it cause trouble to use isOperationLegalOrCustom(ISD::MULHS, VT)?