This is an archive of the discontinued LLVM Phabricator instance.

Supported interleaved byte load-pattern of stride:4 VF(8, 16, 32).
Needs Revision
Public

Authored by Farhana on Jul 28 2017, 4:43 PM.

Details

Summary

Stride-4 interleaved byte loads with VF 8, 16, and 32 have similar data-access patterns. Therefore, we can write a single interleave function that generates the shuffle sequence for all of them, assuming codegen (CG) will generate optimal shuffle instructions for that sequence. The basic idea is to optimize them as 128-bit vectors so that we can use vpshuf*. Once all the interleaved elements have been shuffled in, we keep packing them until we have built the vectors of desired elements. Stores can be handled in a similar way.
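To make the data-access pattern concrete, here is a minimal scalar sketch of what a stride-4 de-interleave computes for a given VF. It models only the result of the shuffles, not the 128-bit vpshuf*-based lowering the patch implements. The names `strideMask`, `applyShuffle`, and `deinterleaveStride4` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the patch): builds the shuffle mask
// <Start, Start + Stride, ..., Start + (VF - 1) * Stride>.
std::vector<int> strideMask(unsigned Start, unsigned Stride, unsigned VF) {
  std::vector<int> Mask;
  Mask.reserve(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Start + I * Stride));
  return Mask;
}

// Scalar model of a single-source shuffle: Out[i] = Src[Mask[i]].
std::vector<uint8_t> applyShuffle(const std::vector<uint8_t> &Src,
                                  const std::vector<int> &Mask) {
  std::vector<uint8_t> Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask) {
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Src.size());
    Out.push_back(Src[static_cast<std::size_t>(Idx)]);
  }
  return Out;
}

// Splits a wide vector of 4 * VF interleaved bytes into the four member
// vectors of VF bytes each (member M holds bytes M, M + 4, M + 8, ...).
std::vector<std::vector<uint8_t>>
deinterleaveStride4(const std::vector<uint8_t> &Wide, unsigned VF) {
  assert(Wide.size() == 4u * VF && "expected exactly 4 * VF bytes");
  std::vector<std::vector<uint8_t>> Members;
  for (unsigned M = 0; M < 4; ++M)
    Members.push_back(applyShuffle(Wide, strideMask(M, 4, VF)));
  return Members;
}
```

The same three functions cover VF 8, 16, and 32 because only the mask length changes, which is the similarity the summary relies on.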

Currently, CG fails to generate optimal shuffle instructions in some cases, which I plan to fix in the next patch.

Event Timeline

Farhana created this revision. Jul 28 2017, 4:43 PM
Farhana edited the summary of this revision. Jul 31 2017, 12:32 PM
RKSimon edited edge metadata. Aug 1 2017, 12:00 PM

Submit the tests with current codegen so that the patch shows the diffs

lib/Analysis/VectorUtils.cpp
581

Assert that (BeginIndex + NumElements) <= VecTy->getNumElements()?
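A bounds check of the kind suggested here might look like the following standalone sketch. The function at this line of VectorUtils.cpp is not shown in the thread, so `sequentialMask` and its parameters are hypothetical stand-ins, with `VecNumElements` playing the role of `VecTy->getNumElements()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: selects NumElements consecutive lanes, starting at
// BeginIndex, from a source vector that has VecNumElements lanes.
std::vector<int> sequentialMask(unsigned BeginIndex, unsigned NumElements,
                                unsigned VecNumElements) {
  // The check proposed in the review: the requested range must fit.
  assert(BeginIndex + NumElements <= VecNumElements &&
         "requested lanes run past the end of the source vector");
  std::vector<int> Mask;
  Mask.reserve(NumElements);
  for (unsigned I = 0; I < NumElements; ++I)
    Mask.push_back(static_cast<int>(BeginIndex + I));
  return Mask;
}
```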

lib/Target/X86/X86ISelLowering.h
1453

VecLenInBits to make it clearer

1458
for (int i = 0, e = NumElts / NumEltsToUnpack; i < e; ++i) {

DavidKreitzer requested changes to this revision. Aug 1 2017, 12:53 PM

Hi Farhana,

I would recommend partitioning this change set into multiple smaller patches. We can discuss offline how best to do that.

Thanks,
Dave

This revision now requires changes to proceed. Aug 1 2017, 12:53 PM