

Supported interleaved byte load-pattern of stride:4 VF(8, 16, 32).
Needs Revision (Public)

Authored by Farhana on Jul 28 2017, 4:43 PM.



Stride-4 byte load patterns with VF = 8, 16 and 32 share a similar data-access pattern. We can therefore write a single interleave function that generates the shuffle sequence for all three, assuming codegen (CG) emits optimal shuffle instructions for it. The basic idea is to operate on 128-bit vectors so that we can use vpshuf*. Once all the interleaved elements have been shuffled into place, we keep packing them until we have built vectors of the desired elements. Stores can be handled in a similar way.
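The load-side idea can be modeled in plain C++ to check the element order. This is a scalar simulation under stated assumptions, not the patch's code: it assumes the wide load is split into 16-byte (128-bit) chunks, applies one fixed byte mask to every chunk the way vpshufb would, and then stands in for the packing (unpack) steps with a transpose of 4-byte groups across chunks. The names `kStride4Mask`, `pshufb128` and `deinterleaveStride4` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte mask applied to every 128-bit chunk: bytes 0,4,8,12 (stream 0), then
// 1,5,9,13 (stream 1), and so on, so each 4-byte group holds one stream.
constexpr std::array<uint8_t, 16> kStride4Mask = {0, 4, 8,  12, 1, 5, 9,  13,
                                                  2, 6, 10, 14, 3, 7, 11, 15};

// Scalar model of a single 128-bit pshufb. Every mask entry here is below 16,
// so the "zero this byte" bit (0x80) never comes into play.
std::array<uint8_t, 16> pshufb128(const std::array<uint8_t, 16>& Src,
                                  const std::array<uint8_t, 16>& Mask) {
  std::array<uint8_t, 16> Out{};
  for (int i = 0; i < 16; ++i) Out[i] = Src[Mask[i] & 0x0F];
  return Out;
}

// Deinterleaves 4 * VF bytes into 4 vectors of VF bytes each.
// VF must be 8, 16 or 32, giving VF / 4 chunks of 16 bytes.
std::array<std::vector<uint8_t>, 4> deinterleaveStride4(
    const std::vector<uint8_t>& In, int VF) {
  assert((VF == 8 || VF == 16 || VF == 32) &&
         In.size() == static_cast<std::size_t>(4 * VF));
  const int NumChunks = VF / 4;  // each chunk holds 4 elements of each stream
  std::array<std::vector<uint8_t>, 4> Streams;
  for (int c = 0; c < NumChunks; ++c) {
    std::array<uint8_t, 16> Chunk{};
    for (int b = 0; b < 16; ++b) Chunk[b] = In[16 * c + b];
    const std::array<uint8_t, 16> Shuffled = pshufb128(Chunk, kStride4Mask);
    // "Packing" step: 4-byte group j of chunk c is appended to stream j.
    for (int j = 0; j < 4; ++j)
      for (int k = 0; k < 4; ++k) Streams[j].push_back(Shuffled[4 * j + k]);
  }
  return Streams;
}
```

For an input where byte i holds the value i, stream j ends up holding 4*i + j at position i. On real hardware the packing would be a sequence of unpack/permute instructions whose exact form depends on VF and vector width; the model only checks that a per-chunk mask followed by a cross-chunk regrouping yields the right order.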

Currently, CG fails to generate optimal shuffle instructions in some cases; I plan to fix that in the next patch.


Event Timeline

Farhana created this revision. Jul 28 2017, 4:43 PM
Farhana edited the summary of this revision. Jul 31 2017, 12:32 PM
RKSimon edited edge metadata. Aug 1 2017, 12:00 PM

Submit the tests with the current codegen first, so that this patch shows the codegen diffs.


Should we assert that (BeginIndex + NumElements) <= VecTy->getNumElements()?
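A minimal sketch of where such a bounds check would sit. The helper `makeSubvectorMask` is hypothetical and not taken from the patch; to keep it standalone, it takes the source vector's element count as a plain `unsigned` parameter (`VecNumElements`) instead of calling `VecTy->getNumElements()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: builds the shuffle mask that extracts NumElements
// consecutive elements starting at BeginIndex from a vector of
// VecNumElements elements. The assert is the bounds check from the review.
std::vector<int> makeSubvectorMask(unsigned BeginIndex, unsigned NumElements,
                                   unsigned VecNumElements) {
  assert(BeginIndex + NumElements <= VecNumElements &&
         "Requested subvector runs past the end of the source vector");
  std::vector<int> Mask;
  Mask.reserve(NumElements);
  for (unsigned i = 0; i < NumElements; ++i)
    Mask.push_back(static_cast<int>(BeginIndex + i));
  return Mask;
}
```

Placing the assert at the top of the helper means an out-of-range request fails in debug builds before any mask is built.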


Rename this to VecLenInBits to make it clearer.

for (int i = 0, e = NumElts / NumEltsToUnpack; i < e; ++i) {
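The suggested loop header computes the trip count once and keeps it in `e`, which is the usual LLVM style. As a hedged illustration only, the hypothetical `makeGroupInterleaveMask` below uses that loop form to build a two-source mask that alternates groups of `NumEltsToUnpack` elements from A (indices 0..NumElts-1) and B (indices NumElts..2*NumElts-1). It is not the patch's unpack logic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: alternates groups of NumEltsToUnpack elements from
// source A (indices 0..NumElts-1) and source B (indices NumElts..2*NumElts-1).
std::vector<int> makeGroupInterleaveMask(int NumElts, int NumEltsToUnpack) {
  assert(NumEltsToUnpack > 0 && NumElts % NumEltsToUnpack == 0);
  std::vector<int> Mask;
  // The trip count is evaluated once and kept in 'e', as suggested above.
  for (int i = 0, e = NumElts / NumEltsToUnpack; i < e; ++i) {
    for (int j = 0; j < NumEltsToUnpack; ++j)
      Mask.push_back(i * NumEltsToUnpack + j);            // group i of A
    for (int j = 0; j < NumEltsToUnpack; ++j)
      Mask.push_back(NumElts + i * NumEltsToUnpack + j);  // group i of B
  }
  return Mask;
}
```

For example, `makeGroupInterleaveMask(4, 2)` yields {0, 1, 4, 5, 2, 3, 6, 7}.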
DavidKreitzer requested changes to this revision. Aug 1 2017, 12:53 PM

Hi Farhana,

I would recommend partitioning this change set into multiple smaller patches. We can discuss offline how best to do that.


This revision now requires changes to proceed. Aug 1 2017, 12:53 PM