This is an archive of the discontinued LLVM Phabricator instance.

[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering.
ClosedPublic

Authored by craig.topper on Mar 11 2018, 2:57 PM.

Details

Summary

Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget.

I tried to use SplitBinaryOpsAndApply but the 512-bit path through that code assumes a destination VT with i16 or i8 elements since it checks BWI. But I don't need a BWI check here nor do I need any splitting for 512 bits. So I just made a AVX1 helper. I used SelectionDAG::SplitVector and SelectionDAG::SplitDestVT since those already know how to split in half. Since this is during DAG combine it doesn't need the BUILD_VECTOR check that the extractSubvector helper uses.

The test change is a regression, but its not strictly related to this change. There's a DAG combine in visitBUILD_VECTOR that only runs during the post type legalization DAG combine or the DAG combine after legalizing vector ops. But those DAG combines only run if the legal types or legal vector ops process changed the DAG. The DAG combine I've added here runs before type legalization, there are no illegal types, and no vector ops to legalize so the DAG combine in visitBUILD_VECTOR didn't get a chance to run. I've tried to fix the build_vector DAG combine to work before type legalization, but I've hit other issues that I'm trying to work through. Hopefully we can try to resolve that separately.

Diff Detail

Event Timeline

craig.topper created this revision.Mar 11 2018, 2:57 PM

Remove the MULDQ code from LowerMUL. The DAG combine should cover it now.

Added a flag to SplitBinaryOpsAndApply to control checking useBWIRegs or useAVX512Regs. I wanted to do it based on VT, but PMADDWD uses a vXi32 result VT and a vXi16 input VT.

Also removed some more code from LowerMUL that checked for hasDQI.

So what do you think we should do about this regression? We /could/ fix it via getFauxShuffleMask but there are probably better ways.

lib/Target/X86/X86ISelLowering.cpp
33208

PMULUDQBuilder ?

Rebase since simplify demanded bits went in. Still need to figure out how to fix the regression. Everything I've tried so far as introduce other regressions.

craig.topper planned changes to this revision.Mar 26 2018, 1:43 PM

Added a target independent DAG combine to create zero_extend from a build_vector of zexts from a contiguous range of an input vector. This more or less recovers the regression. I can split that change out of this patch if you want. I'll definitely split it for commit.

craig.topper added inline comments.Mar 27 2018, 5:53 PM
test/CodeGen/X86/mulvi32.ll
170–171

These shuffles are moving the high elements down so we can zero extend. The original code used a punpck with zero instead.

183–185

I think this is simplify demanded bits on pmuldq kicking in to remove the zeros going into elements 1 and 3. So they are effectively garbage.

RKSimon added inline comments.Mar 28 2018, 5:39 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15070

Its strange that we have reduceBuildVecExtToExtBuildVec for post-legalization and this for pre-legalization. Not exactly the same I know.

test/CodeGen/X86/combine-pmuldq.ll
7–8

This TODO can go

25–26

This TODO can go

Remove TODOs

craig.topper added inline comments.Mar 28 2018, 9:33 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15070

I tried to enable reduceBuildVecExtToExtBuildVec pre-legalization, but it was too aggressive and caused many other issues. The first DAG combine effectively runs on the DAG bottom up. So we could fold BUILD_VECTORS of zero_extends before the zero_extends are combined with loads or truncates before them. So we would need to try to detect those opportunities and prevent the combine or add more combines to do the same combines with the new build_vector.

RKSimon accepted this revision.Apr 7 2018, 6:22 AM

LGTM - with a few minors

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14981

You're setting the return type as uint64_t but returning -1 ?

15001

(Offset + i) for clarity

lib/Target/X86/X86ISelLowering.cpp
33208

PMULUDQBuilder ?

This revision is now accepted and ready to land.Apr 7 2018, 6:22 AM