This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Closed (Public)

Authored by craig.topper on Nov 30 2018, 11:39 AM.

Details

Summary

With SSE4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's three shuffles to extend one operand. The other operand is usually constant, since this lowering is mostly used by the division-by-constant optimization. Pre-SSE4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles, plus an xor to create the zero vector and a copy due to tied register constraints. That seems likely better than the three shuffles. With AVX the non-destructive three-operand encoding avoids the copy, so there it is clearly better.
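For readers less familiar with the two extension strategies, here is a minimal C sketch of a per-byte unsigned multiply-high (the v16i8 mulhu being lowered) written with SSE2 and SSE4.1 intrinsics. It is an illustration, not the code LLVM emits: the function names (`mulhu_u8_scalar`, `mul_hi_and_pack`, `mulhu_u8_unpack`, `mulhu_u8_pmovzx`) are hypothetical, it assumes GCC or Clang on x86-64 (for `__attribute__((target("sse4.1")))`), and the instruction names in the comments are what each intrinsic typically maps to, not a guarantee of the compiler's output.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: punpcklbw/punpckhbw, pmullw, psrlw, packuswb, pshufd */
#include <smmintrin.h>  /* SSE4.1: pmovzxbw (_mm_cvtepu8_epi16) */

/* Scalar reference: high byte of each unsigned 8-bit product. */
static void mulhu_u8_scalar(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(((unsigned)a[i] * b[i]) >> 8);
}

/* Shared tail: 16-bit multiply, keep the high byte, narrow back to v16i8. */
static __m128i mul_hi_and_pack(__m128i a_lo, __m128i a_hi, __m128i b_lo, __m128i b_hi) {
    /* A u8*u8 product fits in 16 bits, so pmullw's low half is the full product. */
    __m128i p_lo = _mm_srli_epi16(_mm_mullo_epi16(a_lo, b_lo), 8);
    __m128i p_hi = _mm_srli_epi16(_mm_mullo_epi16(a_hi, b_hi), 8);
    /* Every lane is now <= 255, so packuswb's unsigned saturation never triggers. */
    return _mm_packus_epi16(p_lo, p_hi);
}

/* Pre-SSE4.1 extension: interleave each operand with a zero vector. */
static __m128i mulhu_u8_unpack(__m128i a, __m128i b) {
    const __m128i zero = _mm_setzero_si128();             /* typically a pxor */
    return mul_hi_and_pack(_mm_unpacklo_epi8(a, zero),   /* bytes 0..7  -> u16 */
                           _mm_unpackhi_epi8(a, zero),   /* bytes 8..15 -> u16 */
                           _mm_unpacklo_epi8(b, zero),
                           _mm_unpackhi_epi8(b, zero));
}

/* SSE4.1 extension: pmovzxbw on the low half; pshufd first moves dwords 2,3
   (bytes 8..15) into positions 0,1 so pmovzxbw can widen the high half.
   Only call this after checking that the CPU supports SSE4.1. */
__attribute__((target("sse4.1")))
static __m128i mulhu_u8_pmovzx(__m128i a, __m128i b) {
    return mul_hi_and_pack(_mm_cvtepu8_epi16(a),
                           _mm_cvtepu8_epi16(_mm_shuffle_epi32(a, _MM_SHUFFLE(3, 2, 3, 2))),
                           _mm_cvtepu8_epi16(b),
                           _mm_cvtepu8_epi16(_mm_shuffle_epi32(b, _MM_SHUFFLE(3, 2, 3, 2))));
}
```

Both variants share the same multiply, shift, and `packuswb` tail; they differ only in how each operand is widened, which is the cost the summary compares. When the second operand is a division-by-constant multiplier, its widened halves can usually be folded into constants, so only the variable operand pays for the extension.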

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Nov 30 2018, 11:39 AM
RKSimon accepted this revision. Dec 1 2018, 5:17 AM

LGTM

This revision is now accepted and ready to land. Dec 1 2018, 5:17 AM
This revision was automatically updated to reflect the committed changes.