This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Canonicalize shuffles on big endian targets as well
ClosedPublic

Authored by nemanjai on Apr 14 2021, 7:05 AM.

Details

Summary

Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - althouth a different wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets.

Diff Detail

Event Timeline

nemanjai created this revision.Apr 14 2021, 7:05 AM
nemanjai requested review of this revision.Apr 14 2021, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2021, 7:05 AM
jsji accepted this revision as: jsji.Apr 14 2021, 3:33 PM
jsji added a subscriber: jsji.

LGTM with some nits.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9566

Do we need to update the comments to be more specific about 8 bytes?

9583

Is this affecting little endian as well? If so, any test that can show the behavior change?

14220

unsigned ElemSizeInBytes?

14252

Is it possible that getVectorNumElements is 1? Then we will overflow the NewMask here?

14265

comments need update?

14309

should be unsigned here.? getScalarSizeInBits returning unsigned.
Also ElemSizeInBits?

14349

nit: Should we use local var instead? bool isLittleEndian = Subtarget.isLittleEndian();

14420–14421

nit:

How about something like:

if(isLittleEndian)
   RHS = TheSplat;
else
   LHS = TheSplat;

Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
This revision is now accepted and ready to land.Apr 14 2021, 3:33 PM
pjeeva01 accepted this revision as: pjeeva01.Apr 16 2021, 5:07 AM

LGTM.

nemanjai added inline comments.Apr 16 2021, 8:31 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9566

I will change it to: Unexpected size for permuted load on big endian target

9583

In theory, yes. However, we cannot get any offset other than 0 here on LE.
SplatIdx can only be {0, 1} for 8-byte permutes and {0, 1, 2, 3} for 4 byte permutes.

If the splat and load are 8 bytes, SplatIdx must be 1 because otherwise we wouldn't be splatting the value we loaded.

If the splat and load are 4 bytes, SplatIdx must be 3 because otherwise we wouldn't be splatting the value we loaded.

So the conditional above will ensure that 1 - SplatIdx/3 - SplatIdx is 0 respectively.
On BE, this is not necessarily true because we don't adjust SplatIdx in the condition on line 9578. I could of course add the code up there to account for this but felt that it is clearer to just set the offset to zero when the load size and the splat element size are the same.

14220

Sounds good.

14252

I will add an assert for that.

14265

I'll add a note about what this does on BE.

14272–14276
// On big endian targets, this is still useful for SCALAR_TO_VECTOR
// nodes with elements smaller than doubleword because all the ways
// of getting scalar data into a vector register put the value in the
// rightmost element of the left half of the vector.
14309

OK.

14349

Absolutely, will do. Thanks.

14420–14421

Makes sense.

This revision was landed with ongoing or failed builds.Apr 20 2021, 5:29 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/PowerPC/fp-strict-round.ll