This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
ClosedPublic

Authored by RKSimon on Apr 6 2015, 6:07 AM.

Details

Summary

This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into an XMM register, instead of merging pairs of i8 scalars into an i16 and inserting each pair with the SSE2 PINSRW instruction.

This also allows byte loads to be folded into the insertion and reduces scalar register usage.
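
As a rough illustration of the two strategies described above, here is a scalar C++ model. It is only a sketch and not the code from the patch: the type `V16i8` and the helpers `insertWord`, `insertByte`, `buildViaWordInserts`, and `buildViaByteInserts` are hypothetical names invented for this example, and the register is modelled as a plain array of 16 bytes rather than with real intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 128-bit XMM register viewed as 16 byte lanes.
using V16i8 = std::array<std::uint8_t, 16>;

// Models PINSRW (SSE2): overwrite 16-bit lane `lane` (0..7) with `value`.
// x86 is little-endian, so the low byte of the word lands in the lower byte lane.
inline void insertWord(V16i8 &v, std::size_t lane, std::uint16_t value) {
  assert(lane < 8);
  v[2 * lane] = static_cast<std::uint8_t>(value & 0xFF);
  v[2 * lane + 1] = static_cast<std::uint8_t>(value >> 8);
}

// Models PINSRB (SSE4.1): overwrite byte lane `lane` (0..15) with `value`.
inline void insertByte(V16i8 &v, std::size_t lane, std::uint8_t value) {
  assert(lane < 16);
  v[lane] = value;
}

// SSE2-style strategy: merge each pair of i8 scalars into an i16 with extra
// scalar work (shift + or), then perform 8 word insertions.
inline V16i8 buildViaWordInserts(const std::array<std::uint8_t, 16> &scalars) {
  V16i8 v{};
  for (std::size_t i = 0; i < 8; ++i) {
    const std::uint16_t merged = static_cast<std::uint16_t>(
        scalars[2 * i] | (scalars[2 * i + 1] << 8));
    insertWord(v, i, merged);
  }
  return v;
}

// SSE4.1-style strategy: 16 direct byte insertions, with no scalar merging.
inline V16i8 buildViaByteInserts(const std::array<std::uint8_t, 16> &scalars) {
  V16i8 v{};
  for (std::size_t i = 0; i < 16; ++i) {
    insertByte(v, i, scalars[i]);
  }
  return v;
}
```

Both builders produce the same vector; the difference the summary points at is the extra scalar shift-and-or work (and the temporary scalar registers it needs) in the word-based path, which the byte-based path avoids.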

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 23270. Apr 6 2015, 6:07 AM
RKSimon retitled this revision to [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets.
RKSimon updated this object.
RKSimon edited the test plan for this revision.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
qcolombet accepted this revision. Apr 6 2015, 10:45 AM
qcolombet edited edge metadata.

Hi Simon,

Nice catch!

LGTM.

Cheers,
-Quentin

This revision is now accepted and ready to land. Apr 6 2015, 10:45 AM
This revision was automatically updated to reflect the committed changes.