This is an archive of the discontinued LLVM Phabricator instance.

[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
ClosedPublic

Authored by m_zuckerman on Jul 25 2017, 3:24 AM.

Details

Summary

This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Diff Detail

Repository
rL LLVM

Event Timeline

m_zuckerman created this revision.Jul 25 2017, 3:24 AM
m_zuckerman retitled this revision from [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess. to [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4)..

Small modification on the function name

guyblank added inline comments.Jul 26 2017, 5:05 AM
lib/Target/X86/X86InterleavedAccess.cpp
113 ↗(On Diff #108041)

the comment is a bit misleading since it depends on whether we are loading or storing. can you expand the comment to state exactly what is supported?

126 ↗(On Diff #108041)

you can use ShuffleElemSize

127 ↗(On Diff #108041)

you can early exit at the beginning of the function on !Subtarget.hasAVX()

132 ↗(On Diff #108041)

so this check isn't needed in the previous if?

Can you try to simplify this function? its getting overly complicated IMO.

366 ↗(On Diff #108041)

maybe pass NumSubVecElems to interleave8bitStride4 and have it calculate the MVTs ?

m_zuckerman marked 4 inline comments as done.
Farhana added inline comments.Jul 31 2017, 11:18 AM
lib/Target/X86/X86InterleavedAccess.cpp
111 ↗(On Diff #108656)

Can you please compress this into one line? Otherwise, it looks good to me.

m_zuckerman marked an inline comment as done.
DavidKreitzer added inline comments.Aug 1 2017, 12:17 PM
lib/Target/X86/X86InterleavedAccess.cpp
132 ↗(On Diff #108041)

Can you try to simplify this function? its getting overly complicated IMO.

+1
I would personally be fine with doing that in an NFC follow-up patch.

Otherwise, this LGTM.

guyblank accepted this revision.Aug 1 2017, 11:20 PM

LGTM with one minor comment.

lib/Target/X86/X86InterleavedAccess.cpp
210 ↗(On Diff #109139)

you can use MVT::getVectorVT() directly, without creating the Type first.

This revision is now accepted and ready to land.Aug 1 2017, 11:20 PM
This revision was automatically updated to reflect the committed changes.