This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
ClosedPublic

Authored by craig.topper on Nov 8 2018, 2:10 PM.

Details

Summary

I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

Unfortunately, this regressed some AArch64 tests because that DAG combine is now combining (extract_subvector (build_vector)) into a narrower build_vector. This caused it to stop matching an isel pattern that was looking for a multipy and extract_subvector to generate u/smull2. Perhaps the MULHS/MULHU lowering should pick some new AArchISD node that can be directly matched u/smull2?

X86 is showing some weird code too due to aggressive constant broadcast formation using GPR even when it doesn't remove the BUILD_VECTOR completely.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Nov 8 2018, 2:10 PM

If at all possible the X86 BROADCAST issue needs fixing first.

RKSimon added inline comments.Nov 9 2018, 11:21 AM
test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
669 ↗(On Diff #173224)

On AVX512 shouldn't these be using the broadcast load fold?

Handle AArch64 expansion with isel patterns instead of custom lowering. This prevents DAG combine from seeing the extract+build_vector opportunity.

Add a one use check to lowerShuffleVectorAsBroadcast in X86. This seems to reign in the aggressive broadcast formation. We now seem to be sharing constant pool entries a little better.

RKSimon added inline comments.Nov 12 2018, 3:33 AM
test/CodeGen/X86/combine-udiv.ll
726 ↗(On Diff #173525)

It's a little odd that this hasn't been simplified to a VPSHUFB (or at least removed the xmm2 zero register - VPPERM can generates its own zero elements) - at that point this wouldn't be XOP specific at all......

craig.topper added inline comments.Nov 12 2018, 10:35 AM
test/CodeGen/X86/combine-udiv.ll
726 ↗(On Diff #173525)

It looks like a packus was turned in vpperm before some zero_extend_vector_inreg and mul constant folding kicked in later in the same dag combine round. So at the time the VPPERM was created it wasn't known that it was zero. This all occurred on the DAG combine between vector op legalization and DAG legalization. This vpperm itself didn't get revisited after the input became 0 due to a bitcast. But the last DAG combine after legalize DAG should have revisited it. Should we have been able to recombine the VPPERM to a VPPERM with zero controls?

Rebase after r346697

craig.topper added inline comments.Nov 12 2018, 12:00 PM
test/CodeGen/X86/combine-udiv.ll
709 ↗(On Diff #173726)

This is zero extending a truncate implemented with packuswb which means the upper bits before the truncate were already zero. So that's pretty silly. I'll file a bug.

RKSimon added inline comments.Nov 13 2018, 3:21 AM
test/CodeGen/X86/combine-udiv.ll
726 ↗(On Diff #173525)

Yes either VPPERM with zero or just a VSHUFB (it'd be unary by then). The current codegen avoids the variable shuffles completely which is better still.

RKSimon accepted this revision.Nov 29 2018, 3:39 AM

LGTM

This revision is now accepted and ready to land.Nov 29 2018, 3:39 AM
This revision was automatically updated to reflect the committed changes.