This is an archive of the discontinued LLVM Phabricator instance.

SSE pslldq/psrldq shuffle mask decodes
ClosedPublic

Authored by RKSimon on Oct 3 2014, 3:52 AM.

Details

Summary

Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq byte shift instructions.

Added comments checks to a number of vector insert/shuffle tests (we don't have that many uses cases of slldq/srldq at the moment).

Diff Detail

Event Timeline

RKSimon updated this revision to Diff 14371.Oct 3 2014, 3:52 AM
RKSimon retitled this revision from to SSE pslldq/psrldq shuffle mask decodes.
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: chandlerc, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
chandlerc added inline comments.Oct 5 2014, 4:25 PM
lib/Target/X86/Utils/X86ShuffleDecode.cpp
83–84

I wouldn't decode this in terms of the vector element type. The mask should be a byte-vector mask in all cases, and VT should just provide the size of the vector (128-bit, 256-bit, 512-bit, whatever).

RKSimon added inline comments.Oct 6 2014, 3:25 AM
lib/Target/X86/Utils/X86ShuffleDecode.cpp
83–84

Yes I'd be happy to do that - I just followed the pattern from palignr. This does mean that the computed ShuffleMask count might not match the number of elements in the MVT. Would it be better to change the MVT VT argument to unsigned VectorSizeInBytes to make that clear?

RKSimon updated this revision to Diff 14453.Oct 6 2014, 8:22 AM

The decoder always creates a byte-vector shuffle mask. I've altered the arguments to (hopefully) make this clear.

Rebased tests.

Ping.

Changes to the shuffle tests in the latest rebase have resulted in the psrldq test cases disappearing. This will be fixed with the followup patch to this that improves byte shift/rotate shuffles on pre-SSSE3 targets.

Hi Simon,

Could you add test cases with the ymm versions?

Those are the interesting ones :).

Thanks,
-Quentin

Hi Quentin,

Could you add test cases with the ymm versions?

Chandler can confirm but I don't think we have anything that creates vec256 byte rotates/shifts yet (and http://reviews.llvm.org/D5699 doesn't help either - it only improves the vec128 case) - if you wish I can remove the ymm support from X86InstComments.cpp to make this clear and add it back in a future patch that adds better support?

Simon.

RKSimon updated this revision to Diff 14777.Oct 12 2014, 10:01 AM

Updated test cases for SSE2/AVX1/AVX2 intrinsics builtins to include the pslldq/psrldq shuffle mask decode in the comments

chandlerc added inline comments.Oct 12 2014, 6:24 PM
lib/Target/X86/InstPrinter/X86InstComments.cpp
207

This code doesn't really make sense. If you know the exact type you know the exact size, just pass 128.

But what I meant with my comment was still to pass the MVT down, but to do the getSizeInBits query inside the decode routine.

lib/Target/X86/Utils/X86ShuffleDecode.cpp
87

No need for braces aroun the outer loop.

The above loop also uses int and < which is my personal preference for these loops.

test/CodeGen/X86/vector-shuffle-128-v8.ll
1403

No need to give the operands here when checking the comment. It looks like all of the new comments in this file need the same treatment.

RKSimon updated this revision to Diff 14801.Oct 13 2014, 7:33 AM

Chandler's recommendations: re-added vector type to decode function arguments, removed excess braces, for-loop comparison tidy up; removed operands in test cases with comments

chandlerc added inline comments.Oct 13 2014, 11:06 AM
test/CodeGen/X86/avx-intrinsics-x86.ll
461 ↗(On Diff #14801)

This doesn't look right... this is the identity mask?

461 ↗(On Diff #14801)

oh, this is bit shifting? what does this even mean? I'm not really sure how to interpret this test.

RKSimon added inline comments.Oct 14 2014, 1:22 AM
test/CodeGen/X86/avx-intrinsics-x86.ll
461 ↗(On Diff #14801)

Yes its a bit shifting representation (you'll notice in D5699 I have to mutliply byte shifts by 8) - the lower 3 bits of the shift are lost (and ignored). The test was there already - I just added the decode comment - I agree it doesn't look that useful at first glance but it has use as an edge test and it does demonstrate that the decode works for zero shifts....

chandlerc accepted this revision.Oct 14 2014, 2:04 AM
chandlerc edited edge metadata.
chandlerc added inline comments.
test/CodeGen/X86/avx-intrinsics-x86.ll
461 ↗(On Diff #14801)

Wow. What a broken intrinsic. Anyways, sure.

This revision is now accepted and ready to land.Oct 14 2014, 2:04 AM

And to be more explicit, LGTM, please commit. =]

RKSimon closed this revision.Oct 14 2014, 3:41 PM
RKSimon updated this revision to Diff 14900.

Closed by commit rL219738 (authored by @RKSimon).