Page MenuHomePhabricator

[PowerPC] Canonicalize shuffles on big endian targets as well

Authored by nemanjai on Apr 14 2021, 7:05 AM.



Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - althouth a different wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets.

Diff Detail

Event Timeline

nemanjai created this revision.Apr 14 2021, 7:05 AM
nemanjai requested review of this revision.Apr 14 2021, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2021, 7:05 AM
jsji accepted this revision as: jsji.Apr 14 2021, 3:33 PM
jsji added a subscriber: jsji.

LGTM with some nits.


Do we need to update the comments to be more specific about 8 bytes?


Is this affecting little endian as well? If so, any test that can show the behavior change?


unsigned ElemSizeInBytes?


Is it possible that getVectorNumElements is 1? Then we will overflow the NewMask here?


comments need update?


should be unsigned here.? getScalarSizeInBits returning unsigned.
Also ElemSizeInBits?


nit: Should we use local var instead? bool isLittleEndian = Subtarget.isLittleEndian();



How about something like:

   RHS = TheSplat;
   LHS = TheSplat;

Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
This revision is now accepted and ready to land.Apr 14 2021, 3:33 PM
pjeeva01 accepted this revision as: pjeeva01.Apr 16 2021, 5:07 AM


nemanjai added inline comments.Apr 16 2021, 8:31 AM

I will change it to: Unexpected size for permuted load on big endian target


In theory, yes. However, we cannot get any offset other than 0 here on LE.
SplatIdx can only be {0, 1} for 8-byte permutes and {0, 1, 2, 3} for 4 byte permutes.

If the splat and load are 8 bytes, SplatIdx must be 1 because otherwise we wouldn't be splatting the value we loaded.

If the splat and load are 4 bytes, SplatIdx must be 3 because otherwise we wouldn't be splatting the value we loaded.

So the conditional above will ensure that 1 - SplatIdx/3 - SplatIdx is 0 respectively.
On BE, this is not necessarily true because we don't adjust SplatIdx in the condition on line 9578. I could of course add the code up there to account for this but felt that it is clearer to just set the offset to zero when the load size and the splat element size are the same.


Sounds good.


I will add an assert for that.


I'll add a note about what this does on BE.

// On big endian targets, this is still useful for SCALAR_TO_VECTOR
// nodes with elements smaller than doubleword because all the ways
// of getting scalar data into a vector register put the value in the
// rightmost element of the left half of the vector.



Absolutely, will do. Thanks.


Makes sense.

This revision was landed with ongoing or failed builds.Tue, Apr 20, 5:29 AM
This revision was automatically updated to reflect the committed changes.