This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions
ClosedPublic

Authored by RKSimon on May 30 2015, 12:22 PM.

Details

Summary

This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64 bits of the source vectors, leave the upper 64 bits of the result vector undefined, and don't have VEX-encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it typically wasn't worth using these instructions for v2i64 or v4i32 vector shuffles, although they are capable of it.
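For readers unfamiliar with the two instructions, here is a minimal scalar sketch of what the immediate forms do to the lower 64 bits. It is an illustration only, not code from the patch: the names `fieldMask`, `modelExtrq` and `modelInsertq` are hypothetical, and the model ignores the upper 64 bits entirely because the hardware leaves them undefined. It assumes the usual encoding in which the length and index are 6-bit fields and a length of 0 means 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with the low `len` bits set (len in 1..64).
inline uint64_t fieldMask(unsigned len) {
  return len >= 64 ? ~0ULL : ((1ULL << len) - 1);
}

// Scalar model of EXTRQ (immediate form) on the low 64 bits of the source:
// take `len` bits starting at bit `idx` and zero-extend them.
inline uint64_t modelExtrq(uint64_t src, unsigned len, unsigned idx) {
  len &= 63;                 // only the low 6 bits of each immediate are used
  idx &= 63;
  if (len == 0) len = 64;    // a length of 0 encodes a full 64-bit field
  assert(idx + len <= 64 && "result is undefined on real hardware");
  return (src >> idx) & fieldMask(len);
}

// Scalar model of INSERTQ (immediate form): write the low `len` bits of
// `src` into `dst` at bit `idx`, leaving the other low-64 bits of `dst` alone.
inline uint64_t modelInsertq(uint64_t dst, uint64_t src,
                             unsigned len, unsigned idx) {
  len &= 63;
  idx &= 63;
  if (len == 0) len = 64;
  assert(idx + len <= 64 && "result is undefined on real hardware");
  const uint64_t mask = fieldMask(len) << idx;
  return (dst & ~mask) | ((src << idx) & mask);
}
```

In this model, zero-extending 16-bit lane 1 of a v8i16 corresponds to `modelExtrq(lo64, 16, 16)`, and inserting a byte at byte position 1 corresponds to `modelInsertq(dstLo64, srcLo64, 8, 8)`.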

As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and it's more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
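To illustrate the byte-sized decode case, the sketch below turns an EXTRQ length/index pair into a 16-entry byte shuffle mask. It is a standalone approximation under stated assumptions, not the decode routine added by the patch: `decodeExtrqMask`, `kZero` and `kUndef` are hypothetical names, and the sentinel values are chosen only for this example. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical sentinels for this sketch (not taken from the patch).
constexpr int kUndef = -1;  // byte is undefined
constexpr int kZero = -2;   // byte is known to be zero

// Decode EXTRQ (immediate form) as a v16i8 shuffle mask. Returns
// std::nullopt when the bit field is not byte-aligned, since such an
// extraction cannot be described as a byte shuffle.
inline std::optional<std::vector<int>> decodeExtrqMask(unsigned len,
                                                       unsigned idx) {
  len &= 63;  // only the low 6 bits of each immediate are used
  idx &= 63;
  if (len % 8 != 0 || idx % 8 != 0)
    return std::nullopt;
  if (len == 0) len = 64;  // a length of 0 encodes a full 64-bit field

  // A field that runs past bit 63 gives an undefined result.
  if (idx + len > 64)
    return std::vector<int>(16, kUndef);

  const unsigned lenBytes = len / 8;
  const unsigned idxBytes = idx / 8;
  std::vector<int> mask;
  mask.reserve(16);
  for (unsigned i = 0; i < lenBytes; ++i)
    mask.push_back(static_cast<int>(idxBytes + i));  // extracted bytes
  for (unsigned i = lenBytes; i < 8; ++i)
    mask.push_back(kZero);                           // zero-extended part
  for (unsigned i = 8; i < 16; ++i)
    mask.push_back(kUndef);                          // upper 64 bits
  assert(mask.size() == 16);
  return mask;
}
```

For example, `decodeExtrqMask(16, 16)` describes the zero extension of 16-bit lane 1: bytes 2 and 3 move to the bottom, the next six bytes are zero, and the upper eight bytes are undefined. A call such as `decodeExtrqMask(12, 0)` returns `std::nullopt` because a 12-bit field is not a whole number of bytes.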

From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.

As with any AMD-only instructions - if you have experience with these, please consider reviewing as we need all the help we can get ;-)

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 26848. May 30 2015, 12:22 PM
RKSimon retitled this revision from to [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions.
RKSimon updated this object.
RKSimon edited the test plan for this revision.
RKSimon added reviewers: andreadb, spatel, chandlerc.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
spatel accepted this revision. Jun 30 2015, 3:32 PM
spatel edited edge metadata.

LGTM (although I've never used extrq/insertq).

For the test file, would you please specify the CPU attributes directly rather than using btver1 / btver2? So -mattr=sse4a and -mattr=sse4a,sse4.2 (or avx)?

This revision is now accepted and ready to land. Jun 30 2015, 3:32 PM
This revision was automatically updated to reflect the committed changes.

> For the test file, would you please specify the CPU attributes directly rather than using btver1 / btver2? So -mattr=sse4a and -mattr=sse4a,sse4.2 (or avx)?

Thanks Sanjay, I've committed with your requested mattr fix.