This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
ClosedPublic

Authored by craig.topper on Nov 13 2018, 11:01 PM.

Details

Summary

With reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us.
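For readers unfamiliar with the narrowing, here is a minimal scalar sketch of the idea, assuming reduceVMULWidth rebuilds a 32-bit product from the low and high 16-bit halves that PMULLW and PMULHW produce per lane. The function names are hypothetical and are not taken from X86ISelLowering.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Low 16 bits of a signed 16x16 multiply (models one PMULLW lane).
inline uint16_t mulLo16(int16_t a, int16_t b) {
  uint32_t p = static_cast<uint32_t>(static_cast<int32_t>(a) *
                                     static_cast<int32_t>(b));
  return static_cast<uint16_t>(p & 0xFFFFu);
}

// High 16 bits of a signed 16x16 multiply (models one PMULHW lane).
inline uint16_t mulHi16(int16_t a, int16_t b) {
  uint32_t p = static_cast<uint32_t>(static_cast<int32_t>(a) *
                                     static_cast<int32_t>(b));
  return static_cast<uint16_t>(p >> 16);
}

// Interleave the two halves back into one 32-bit lane, which is what an
// unpack of the PMULLW and PMULHW results would do (hypothetical helper).
inline int32_t mulViaHalves(int16_t a, int16_t b) {
  uint32_t bits = (static_cast<uint32_t>(mulHi16(a, b)) << 16) |
                  static_cast<uint32_t>(mulLo16(a, b));
  return static_cast<int32_t>(bits);
}
```

The sketch only models the arithmetic. How many lanes are processed at once, and whether the vector is first padded to 128 bits, is the part this patch leaves to regular type legalization.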

In combineMulToPMADDWD, we can now handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we already have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs.
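A rough scalar model of why a PMADDWD can stand in for a 32-bit multiply, assuming the combine requires the upper 17 bits of every operand lane to be zero: the high i16 half of each lane is then zero, so the second product in each pair vanishes. All names are hypothetical, and padding the widened lanes with zero is an assumption of this sketch (real legalization may leave them undefined).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the value fits in 15 bits, so its low i16 half is non-negative
// and its high i16 half is zero.
inline bool upper17BitsZero(uint32_t x) { return (x >> 15) == 0; }

// One 32-bit output lane of PMADDWD: a0*b0 + a1*b1 on signed 16-bit inputs.
// Unsigned addition models the wrap-around of the hardware result.
inline uint32_t pmaddwdLane(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  uint32_t p0 = static_cast<uint32_t>(static_cast<int32_t>(a0) *
                                      static_cast<int32_t>(b0));
  uint32_t p1 = static_cast<uint32_t>(static_cast<int32_t>(a1) *
                                      static_cast<int32_t>(b1));
  return p0 + p1;
}

// Multiply two 32-bit lanes by viewing each as a pair of i16 halves.
inline uint32_t mulLaneViaPmaddwd(uint32_t x, uint32_t y) {
  assert(upper17BitsZero(x) && upper17BitsZero(y) &&
         "operands must have their upper 17 bits clear");
  int16_t xLo = static_cast<int16_t>(x & 0xFFFFu);
  int16_t xHi = static_cast<int16_t>(x >> 16);  // zero under the precondition
  int16_t yLo = static_cast<int16_t>(y & 0xFFFFu);
  int16_t yHi = static_cast<int16_t>(y >> 16);  // zero under the precondition
  return pmaddwdLane(xLo, xHi, yLo, yHi);
}

// Model of the v2i32 case: widen to four lanes, run the four-lane operation,
// and keep the two original lanes.
inline std::array<uint32_t, 2>
mulV2ViaWidenedPmaddwd(const std::array<uint32_t, 2> &a,
                       const std::array<uint32_t, 2> &b) {
  std::array<uint32_t, 4> wa{a[0], a[1], 0u, 0u};  // zero padding is assumed
  std::array<uint32_t, 4> wb{b[0], b[1], 0u, 0u};
  std::array<uint32_t, 4> wr{};
  for (std::size_t i = 0; i < wr.size(); ++i)
    wr[i] = mulLaneViaPmaddwd(wa[i], wb[i]);
  return {wr[0], wr[1]};
}
```

The input here is conceptually eight i16 elements while the output is four i32 elements, which may be why the shared widening code needed to accept different input and output VTs.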

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Nov 13 2018, 11:01 PM
RKSimon added inline comments. Nov 15 2018, 2:39 AM
lib/Target/X86/X86ISelLowering.cpp
26164 ↗(On Diff #174129)

Since you're updating the code, please can you add assert messages.

test/CodeGen/X86/shrink_vmul-widen.ll
61 ↗(On Diff #174129)

Another couple of instances where we might be better off doing PINSRW(PXOR); see PR31287.

70 ↗(On Diff #174129)

We're doing an extra shuffle here - is that going to be a problem?

1437 ↗(On Diff #174129)

Definite perf improvement here

Add assert messages. Going to look at the extra shuffle separately. I think we may need to try to reduceVMULWidth before combineMulToPMADDWD on pre-SSE4.1 targets. The mul_4xi8 test case in shrink_vmul.ll shows the same issue even without widening enabled.

RKSimon accepted this revision. Nov 15 2018, 10:54 AM

LGTM cheers

This revision is now accepted and ready to land. Nov 15 2018, 10:54 AM
This revision was automatically updated to reflect the committed changes.