This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
ClosedPublic

Authored by RKSimon on Nov 13 2016, 8:10 AM.

Details

Summary

vXi64 multiplication is lowered into 3 vpmuludq calls on the upper/lower 32-bit halves of the operands.

If any of these halves is known to be zero then we can remove the corresponding calls. Although there was isBuildVectorAllZeros code that attempted this, I don't think it ever worked (except perhaps for constant-folded cases, which no longer seem to be tested).
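To illustrate the arithmetic behind the summary, here is a minimal scalar sketch of one 64-bit lane. It is not the code from the patch: the helper names (pmuludq, mul64Generic, mul64Pruned) and the boolean "known zero" flags are hypothetical, and it only models the upper-half case.

```cpp
#include <cassert>
#include <cstdint>

// Models one lane of PMULUDQ: multiplies the low 32 bits of each
// operand and returns the full 64-bit product.
static uint64_t pmuludq(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// 64-bit multiply built from three 32x32->64 multiplies.
// The AHi*BHi term is omitted because it only affects bits 64 and up.
static uint64_t mul64Generic(uint64_t a, uint64_t b) {
  uint64_t loLo = pmuludq(a, b);        // ALo * BLo
  uint64_t loHi = pmuludq(a, b >> 32);  // ALo * BHi
  uint64_t hiLo = pmuludq(a >> 32, b);  // AHi * BLo
  return loLo + ((loHi + hiLo) << 32);
}

// Same result, but skips the multiplies whose upper-half operand is
// known to be zero. The flags are hypothetical stand-ins for whatever
// analysis proves the half is zero. numMuls (optional) receives the
// number of multiplies actually performed.
static uint64_t mul64Pruned(uint64_t a, uint64_t b, bool aHiZero,
                            bool bHiZero, int *numMuls) {
  // The flags must be truthful, otherwise the result would be wrong.
  assert(!aHiZero || (a >> 32) == 0);
  assert(!bHiZero || (b >> 32) == 0);

  uint64_t result = pmuludq(a, b);  // ALo * BLo is always needed here
  int muls = 1;
  if (!bHiZero) {
    result += pmuludq(a, b >> 32) << 32;  // ALo * BHi
    ++muls;
  }
  if (!aHiZero) {
    result += pmuludq(a >> 32, b) << 32;  // AHi * BLo
    ++muls;
  }
  if (numMuls)
    *numMuls = muls;
  return result;
}
```

When both upper halves are known zero (for example, both operands were zero-extended from 32 bits), only the single ALo * BLo multiply remains.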

This requires additional X86ISD support in computeKnownBitsForTargetNode. So far I've only added support for X86ISD::VZEXT (VPMOVZX*, which helps the AVX2+ cases); I can add further support (X86 target shuffles and bit shifts) in future commits to help the SSE2-AVX1 cases.
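As a rough illustration of why known-bits support for a zero-extending node helps, the sketch below uses a toy known-bits record for a single 64-bit lane. ToyKnownBits and both helpers are hypothetical and are not LLVM's API; they only show that zero extension makes every bit above the source width known zero.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a known-bits record (not LLVM's data structure).
struct ToyKnownBits {
  uint64_t zero;  // bits known to be 0
  uint64_t one;   // bits known to be 1
};

// Known bits of a value zero-extended from srcBits to 64 bits:
// whatever was known about the low srcBits is kept, and every bit
// above them becomes known zero.
static ToyKnownBits knownBitsAfterZext(ToyKnownBits src, unsigned srcBits) {
  assert(srcBits >= 1 && srcBits < 64);
  uint64_t srcMask = (1ull << srcBits) - 1;
  ToyKnownBits out;
  out.zero = (src.zero & srcMask) | ~srcMask;
  out.one = src.one & srcMask;
  return out;
}

// True if the upper 32 bits of the lane are all known zero, which is
// the condition that lets a multiply on that half be dropped.
static bool upperHalfKnownZero(ToyKnownBits k) {
  return (k.zero >> 32) == 0xFFFFFFFFull;
}
```

Even when nothing is known about the source value, a zero extension from 32 bits (or narrower) is enough for the upper-half check to succeed.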

Fix for PR30845

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 77747. Nov 13 2016, 8:10 AM
RKSimon retitled this revision to [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
craig.topper added inline comments. Nov 15 2016, 10:12 PM
lib/Target/X86/X86ISelLowering.cpp:19952

Should we pull these and the bit casts below into the blocks that use them?

RKSimon updated this revision to Diff 78158. Nov 16 2016, 3:01 AM

Updated based on Craig's feedback

craig.topper accepted this revision. Nov 16 2016, 7:51 PM
craig.topper edited edge metadata.

LGTM

This revision is now accepted and ready to land. Nov 16 2016, 7:51 PM
This revision was automatically updated to reflect the committed changes.