This is an archive of the discontinued LLVM Phabricator instance.

Remove SRAs from v16i8 multiply lowering on sse2 targets
ClosedPublic

Authored by craig.topper on Mar 8 2018, 11:10 AM.

Details

Summary

Previously we unpacked the even bytes of each input into the high byte of 16-bit elements, then did a v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and masked each result to zero its upper 8 bits. The same was done for the odd bytes. The results were then packed together with packuswb.

Since we mask each multiply result element to 8 bits, and those 8 bits are determined only by the lower 8 bits of each input, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. gcc does the same.
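The argument above can be checked with a small scalar model. This is only an illustrative sketch, not code from the patch: the function names and the garbage patterns are made up for the example, and each function models a single 16-bit lane rather than the actual vector instructions.

```cpp
#include <cstdint>

// Old scheme (modelled on one lane): the upper byte of each 16-bit
// element holds sign bits of the 8-bit input before the multiply.
uint8_t mulLowByteSignExtended(uint8_t a, uint8_t b) {
  uint16_t wa = static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(a)));
  uint16_t wb = static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(b)));
  // Widen to 32 bits so the multiply cannot overflow a signed int.
  return static_cast<uint8_t>((static_cast<uint32_t>(wa) * wb) & 0xFF);
}

// New scheme (modelled on one lane): the upper byte of each 16-bit
// element is arbitrary garbage (hiA, hiB are hypothetical parameters).
uint8_t mulLowByteWithGarbage(uint8_t a, uint8_t b, uint8_t hiA, uint8_t hiB) {
  uint16_t wa = static_cast<uint16_t>((static_cast<uint16_t>(hiA) << 8) | a);
  uint16_t wb = static_cast<uint16_t>((static_cast<uint16_t>(hiB) << 8) | b);
  return static_cast<uint8_t>((static_cast<uint32_t>(wa) * wb) & 0xFF);
}

// Exhaustively compare both schemes for every pair of byte inputs and a
// few arbitrary upper-byte patterns.
bool lowByteIgnoresUpperBits() {
  const uint8_t garbage[] = {0x00, 0xFF, 0xA5, 0x3C};
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      for (uint8_t hiA : garbage)
        for (uint8_t hiB : garbage)
          if (mulLowByteSignExtended(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
              mulLowByteWithGarbage(static_cast<uint8_t>(a), static_cast<uint8_t>(b), hiA, hiB))
            return false;
  return true;
}
```

The check holds because any contribution from the upper byte of either input is multiplied by at least 2^8, so it can only affect bits 8 and above of the product, which the mask discards.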

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Mar 8 2018, 11:10 AM
craig.topper edited the summary of this revision.
RKSimon accepted this revision. Mar 8 2018, 4:59 PM

LGTM

lib/Target/X86/X86ISelLowering.cpp
22322 ↗(On Diff #137618)

going to mask

22342 ↗(On Diff #137618)

going to mask

This revision is now accepted and ready to land. Mar 8 2018, 4:59 PM
This revision was automatically updated to reflect the committed changes.