This is an archive of the discontinued LLVM Phabricator instance.

[X86] When lowering v32i8 MULHS/MULHU, shuffle after the PACKUS rather than before.
ClosedPublic

Authored by craig.topper on Aug 26 2018, 11:49 PM.

Details

Summary

We're using a 256-bit PACKUS to do the truncation, but that instruction operates on each 128-bit lane independently. Previously we shuffled first to rearrange the lanes, but that requires 2 shuffles. Instead, we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS.
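
To illustrate why a single shuffle after the pack is enough, here is a minimal Python sketch that models the lane behavior described above. It is not the LLVM lowering code: the helper names (`packus_128`, `vpackuswb_256`, `vpermq`) are hypothetical, the vectors are modeled as plain lists, and the quadword order `[0, 2, 1, 3]` is an assumption about the permutation needed to undo the per-lane interleaving.

```python
def packus_128(a, b):
    """Pack two lists of 8 signed 16-bit values into 16 bytes,
    saturating each value to the unsigned range 0..255."""
    def sat(x):
        return max(0, min(255, x))
    return [sat(x) for x in a] + [sat(x) for x in b]


def vpackuswb_256(a, b):
    """Model of a 256-bit PACKUSWB on two 16-element word vectors.
    Each 128-bit lane (8 words of each input) is packed independently."""
    return packus_128(a[0:8], b[0:8]) + packus_128(a[8:16], b[8:16])


def vpermq(v, order):
    """Model of VPERMQ on a 32-byte vector: reorder its four
    64-bit (8-byte) quadwords according to `order`."""
    quads = [v[i * 8:(i + 1) * 8] for i in range(4)]
    return [byte for q in order for byte in quads[q]]


# Hypothetical inputs: word results that should end up as bytes 0..15
# (from `lo`) followed by bytes 16..31 (from `hi`).
lo = list(range(0, 16))
hi = list(range(16, 32))

# The pack alone interleaves the inputs per lane:
# lo[0:8], hi[0:8], lo[8:16], hi[8:16].
packed = vpackuswb_256(lo, hi)

# One quadword permute after the pack restores the linear order.
fixed = vpermq(packed, [0, 2, 1, 3])
```

In this model, `packed` holds the quadwords in the order lo-low, hi-low, lo-high, hi-high, so swapping the two middle quadwords is all that is needed. Shuffling before the pack would instead have to rearrange both input vectors, which is where the two shuffles mentioned in the summary come from.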

Event Timeline

craig.topper created this revision. Aug 26 2018, 11:49 PM

Missed a test case

RKSimon accepted this revision. Aug 27 2018, 5:44 AM

LGTM - cheers

This revision is now accepted and ready to land. Aug 27 2018, 5:44 AM
This revision was automatically updated to reflect the committed changes.