This is an archive of the discontinued LLVM Phabricator instance.

[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
ClosedPublic

Authored by m_zuckerman on Jul 25 2017, 3:24 AM.

Details

Summary

This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Diff Detail

Event Timeline

m_zuckerman created this revision.Jul 25 2017, 3:24 AM
m_zuckerman retitled this revision from [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess. to [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4)..

Small modification on the function name

guyblank added inline comments.Jul 26 2017, 5:05 AM
lib/Target/X86/X86InterleavedAccess.cpp
109

the comment is a bit misleading since it depends on whether we are loading or storing. can you expand the comment to state exactly what is supported?

129

you can use ShuffleElemSize

130

you can early exit at the beginning of the function on !Subtarget.hasAVX()

130

so this check isn't needed in the previous if?

Can you try to simplify this function? its getting overly complicated IMO.

371

maybe pass NumSubVecElems to interleave8bitStride4 and have it calculate the MVTs ?

m_zuckerman marked 4 inline comments as done.
Farhana added inline comments.Jul 31 2017, 11:18 AM
lib/Target/X86/X86InterleavedAccess.cpp
111

Can you please compress this into one line? Otherwise, it looks good to me.

m_zuckerman marked an inline comment as done.
DavidKreitzer added inline comments.Aug 1 2017, 12:17 PM
lib/Target/X86/X86InterleavedAccess.cpp
130

Can you try to simplify this function? its getting overly complicated IMO.

+1
I would personally be fine with doing that in an NFC follow-up patch.

Otherwise, this LGTM.

guyblank accepted this revision.Aug 1 2017, 11:20 PM

LGTM with one minor comment.

lib/Target/X86/X86InterleavedAccess.cpp
210

you can use MVT::getVectorVT() directly, without creating the Type first.

This revision is now accepted and ready to land.Aug 1 2017, 11:20 PM
This revision was automatically updated to reflect the committed changes.