Page MenuHomePhabricator

[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).

Authored by m_zuckerman on Jul 25 2017, 3:24 AM.



This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Diff Detail

Event Timeline

m_zuckerman created this revision.Jul 25 2017, 3:24 AM
m_zuckerman retitled this revision from [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess. to [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4)..

Small modification on the function name

guyblank added inline comments.Jul 26 2017, 5:05 AM

the comment is a bit misleading since it depends on whether we are loading or storing. can you expand the comment to state exactly what is supported?


you can use ShuffleElemSize


you can early exit at the beginning of the function on !Subtarget.hasAVX()


so this check isn't needed in the previous if?

Can you try to simplify this function? its getting overly complicated IMO.


maybe pass NumSubVecElems to interleave8bitStride4 and have it calculate the MVTs ?

m_zuckerman marked 4 inline comments as done.
Farhana added inline comments.Jul 31 2017, 11:18 AM

Can you please compress this into one line? Otherwise, it looks good to me.

m_zuckerman marked an inline comment as done.
DavidKreitzer added inline comments.Aug 1 2017, 12:17 PM

Can you try to simplify this function? its getting overly complicated IMO.

I would personally be fine with doing that in an NFC follow-up patch.

Otherwise, this LGTM.

guyblank accepted this revision.Aug 1 2017, 11:20 PM

LGTM with one minor comment.


you can use MVT::getVectorVT() directly, without creating the Type first.

This revision is now accepted and ready to land.Aug 1 2017, 11:20 PM
This revision was automatically updated to reflect the committed changes.