Page MenuHomePhabricator

[X86][LLVM]Expanding Supports lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
ClosedPublic

Authored by m_zuckerman on Aug 21 2017, 7:34 AM.

Details

Summary

This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Diff Detail

Event Timeline

m_zuckerman created this revision.Aug 21 2017, 7:34 AM
zvi added inline comments.Aug 23 2017, 11:13 AM
lib/Target/X86/X86InterleavedAccess.cpp
172

NumOfLoads or NumLoads

350

Please check function name

353

drop the brackets?

353

Maybe swap names of variables Lane and LaneCount?

354

int i = 0, e = VF / Lane; i != e; ++i

363

wide -> width or size

367

The return value may indeed by used as a shift amount, but if this function returns the element count of the i-th group, maybe better name it something like 'getGroupElemCount'

368

ArrayRef<uint32_t> Mask

369

Initialize Index here

369

GroupWide -> GroupWidth

372

I might be missing something, but is group number 0 defined to be the group starting at the end of Mask?

397

Please check identation

398

Please document the arguments

399

std::max(VectorWidth / 128, 1)

403

identation

445

So the last argument is always passed with the value true to all calls of this function?

457

You could already create the Undef value instead of only its type here.

482

The second vector argument should be an undef, and the shuffle mask should select only elements from the first vector operand. This will eventually happen in InstCombine or DAGCombine, but why not avoid this compile-time overhead?

m_zuckerman marked 15 inline comments as done.Aug 28 2017, 6:29 AM
m_zuckerman added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
372

Group number zero ( According to the modulo operation) will always lie at the first element of the shuffle since we begin with zero.For example, think about a vector length 8 with stride 3. The shuffle sequence will look like that [0,3,6,1,4,7,2,5] so we don't know the last group but we can know what is the first element since we begin the shuffle from zero.

445

No, the default mode is false like row 438.

482

I want to reuse the createAlignMask function. The function according to two vectors and not one vector and one undef.

zvi added inline comments.Aug 28 2017, 8:01 AM
lib/Target/X86/X86InterleavedAccess.cpp
352

On second thought no need for the max() with assert(vectorSize >= 128)

354

these brackets can be dropped

355

Extra brackets

359

Please consider something like this:

// getGroupElemCount - returns the size in elements of group index 'Group' in mask 'Mask'.
// A mask contains exactly 'Stride' groups, where each group is a monotonically increasing sequence with stride 'Stride'.
// In these examples the interested group is marked between '[...]'
//  getGroupElemCount(3, 0, {0|3|6|1|4|7|[2|5]}) -> 2
//  getGroupElemCount(3, 1, {0|3|6|[1|4|7]|2|5}) -> 3
369

Mask.back() % Stride

372

I see in the examples above that the first group consists of elements that are the last indices of Mask, and the search starts from the last index.

m_zuckerman marked 3 inline comments as done.
m_zuckerman marked an inline comment as done.
m_zuckerman added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
352

You will need it for the VF8 since the VectoreSize is 64 and 64/128 is a zero and not 1.

guyblank added inline comments.Aug 29 2017, 11:47 AM
lib/Target/X86/X86InterleavedAccess.cpp
109–110

update comment please

133

should this be just for factor == 3 ?
if so, should the code above also change to be only for factor == 4 ?

173

can you explain what's happening here?

379

typo: vpaling -> vpalign

384

should be {DiffToJump, ..., DiffToJump + VF - 1} ?

390

typos, should be:
AlignBegin is a boolean that indicates the direction of the alignment

391

typo: rigth -> right
plus, on of these should be "left"
and maybe something ilke AlignDirection would be a better name?

guyblank added inline comments.Aug 29 2017, 10:48 PM
lib/Target/X86/X86InterleavedAccess.cpp
392

are you sure you need this function? can you use DecodePALIGNRMask ?

419

can this be an array?

428

++i, below as well

m_zuckerman marked 8 inline comments as done.
m_zuckerman marked an inline comment as done.Aug 30 2017, 5:12 AM
m_zuckerman added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
419

This way I think it's more readable

zvi added inline comments.Sep 4 2017, 2:03 AM
lib/Target/X86/X86InterleavedAccess.cpp
355

I meant these brackets: (LaneSize)

364

Please try to make this function more efficient. Given Stride and VF you can compute the offset of each first/last group member and compute the first/last of the successor group etc.

365

Please check identation

370

for (int i = 0, Count = 0, GroupNumber = 0, e = VF / LaneCount; i != e; ++i)

375

Size is not really needed as you can update SizeInfo on the fly

395

static

482

You can pass an argument to DecodePALIGNRMask saying whether the resulting shuffle mask is for a single source vector or two vector sources. It just so happens that it will take the same value of AlignDirection.

test/Transforms/InterleavedAccess/X86/interleavedLoad.ll
1–2

Sorry I noticed it only now:
Can you update this file with a AVX512 run and rebase the patch?

m_zuckerman marked 4 inline comments as done.

Update according to Zvi comments

zvi edited edge metadata.Sep 5 2017, 3:50 AM

Looks almost ready. Please update the test with the missing AVX512 RUN:

lib/Target/X86/X86InterleavedAccess.cpp
367

No need to define GroupSize at loop scope. it can be moved to loop body.

389

Please check formatting: need spaces around the '='

m_zuckerman marked 2 inline comments as done.
zvi added inline comments.Sep 5 2017, 11:21 PM
test/Transforms/InterleavedAccess/X86/interleavedLoad.ll
3

This run command has the wrong -mattr features set

m_zuckerman marked an inline comment as not done.
zvi accepted this revision.Sep 6 2017, 11:02 AM

LGTM

This revision is now accepted and ready to land.Sep 6 2017, 11:02 AM