This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Added support for SSE3 lane duplication shuffle instructions
ClosedPublic

Authored by RKSimon on Jan 18 2015, 3:47 AM.

Details

Summary

This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. Their main benefit is that many single-source shuffles no longer need dual-source instructions such as SHUFPD/SHUFPS, which avoids extra moves and allows more load folds.
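For readers unfamiliar with these instructions, the 128-bit element patterns they produce can be sketched as below. This is only an illustration of the matching idea, not the code from the patch: `matchesPattern` and `classifyDupShuffle` are hypothetical names, and an undef mask element is assumed to be represented as -1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): true if every defined mask
// element equals the expected one. A negative element is treated as undef
// and matches anything.
static bool matchesPattern(const std::vector<int> &Mask,
                           const std::vector<int> &Expected) {
  if (Mask.size() != Expected.size())
    return false;
  for (std::size_t i = 0; i != Mask.size(); ++i)
    if (Mask[i] >= 0 && Mask[i] != Expected[i])
      return false;
  return true;
}

// Hypothetical classifier for 128-bit single-source shuffle masks.
// Returns the mnemonic of the matching SSE3 duplication instruction,
// or an empty string if none of the three patterns applies.
std::string classifyDupShuffle(const std::vector<int> &Mask) {
  if (matchesPattern(Mask, {0, 0}))       // v2f64: duplicate the low double
    return "movddup";
  if (matchesPattern(Mask, {0, 0, 2, 2})) // v4f32: duplicate even elements
    return "movsldup";
  if (matchesPattern(Mask, {1, 1, 3, 3})) // v4f32: duplicate odd elements
    return "movshdup";
  return "";
}
```

Because each pattern reads only one input vector, a shuffle that matches can be emitted as a single unary instruction rather than a SHUFPD/SHUFPS with the same register passed twice.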

Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad, which crashed on single-operand shuffle instructions. It also required fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.

I also added a shuffle mask decoder for MOVDDUP.
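As a rough sketch of what such a decoder computes (the patch's own decoder is not shown on this page, so the function name `decodeMovddupMask` and its signature are assumptions), MOVDDUP duplicates the low 64 bits of each 128-bit lane:

```cpp
#include <vector>

// Hypothetical standalone decoder (not the patch's implementation).
// VectorBits: total vector width, assumed to be a multiple of 128.
// ScalarBits: element width, assumed to divide 64.
// Returns the shuffle mask equivalent to MOVDDUP for that vector type.
std::vector<int> decodeMovddupMask(unsigned VectorBits, unsigned ScalarBits) {
  std::vector<int> Mask;
  unsigned NumElts = VectorBits / ScalarBits;
  unsigned NumLanes = VectorBits / 128;
  unsigned EltsPerLane = NumElts / NumLanes;
  unsigned EltsPer64 = 64 / ScalarBits;

  // For every 128-bit lane, emit the indices of its low 64-bit half twice.
  for (unsigned Lane = 0; Lane < NumElts; Lane += EltsPerLane)
    for (unsigned i = 0; i < EltsPerLane; i += EltsPer64)
      for (unsigned s = 0; s != EltsPer64; ++s)
        Mask.push_back(static_cast<int>(Lane + s));
  return Mask;
}
```

Under these assumptions, a 128-bit vector of doubles decodes to the mask {0, 0} and a 256-bit vector of doubles to {0, 0, 2, 2}.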

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 18358. Jan 18 2015, 3:47 AM
RKSimon retitled this revision from to [X86][SSE] Added support for SSE3 lane duplication shuffle instructions.
RKSimon updated this object.
RKSimon edited the test plan for this revision.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
chandlerc edited edge metadata. Jan 18 2015, 4:46 PM

This looks really good with only a minor ordering issue between movddup and broadcast which I commented on below.

test/CodeGen/X86/vector-shuffle-512-v8.ll
633 ↗(On Diff #18358)

I would expect vbroadcast to be generally a better choice than vmovddup here.

RKSimon updated this revision to Diff 18391. Jan 19 2015, 7:40 AM
RKSimon edited edge metadata.

Thanks Chandler - I've moved the MOVDDUP match after the BROADCAST match to fix this.

RKSimon updated this revision to Diff 18527. Jan 21 2015, 9:20 AM

That ping was a little premature! The movddup instruction was sometimes being replaced with unpcklpd during pattern matching because the tablegen patterns hadn't been completed (there was a FIXME in X86InstrSSE.td). I've updated the patch with a fix to the pattern and removed the FIXME.

qcolombet accepted this revision. Jan 21 2015, 10:44 AM
qcolombet edited edge metadata.

Hi Simon,

LGTM, but I would rather have two separate commits:

  1. Decoding of the mask for the comments.
  2. Matching of the DUP instructions.

This is not an absolute requirement though :).

Thanks for the patch!
-Quentin

This revision is now accepted and ready to land. Jan 21 2015, 10:44 AM
This revision was automatically updated to reflect the committed changes.

Thanks Quentin - splitting the patch was trivial (rL226705 and rL226716)