Page MenuHomePhabricator

[PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.

Authored by amyk on Jul 6 2020, 11:17 AM.



This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs.

We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if:

  • Element size is 4 bytes
  • The RHS is a constant vector (and constant splat of 4-bytes)
  • The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks:
<0, 4-7, 2, 4-7>
<4-7, 1, 4-7, 3>

Diff Detail

Event Timeline

amyk created this revision.Jul 6 2020, 11:17 AM
lei added a subscriber: lei.Jul 6 2020, 11:44 AM
lei added inline comments.



no need for the tmp SplatSize

if ((SplatBitSize / 8) > 4)

There see to be alot of extra, unnecessary () here... since all these are && I think alot of these can be removed.




I think you are missing:

  return SDValue();

nit: indentation?

lei added inline comments.Jul 6 2020, 12:40 PM

nit: Maybe

PowerPC ISA 3.1 specific type constraints.

nvm. This is how it's been done else where..



// ISA 3.1 specific PPCISD nodes.
amyk updated this revision to Diff 275803.Jul 6 2020, 12:58 PM

Address the following review comments:

  • formatting in
  • Remove extra ( )
  • Indentation in PPCISelLowering
  • Remove unnecessary variables
  • Add extra case to return SDValue()
amyk updated this revision to Diff 275809.Jul 6 2020, 1:01 PM

Address comment of adding more specific ISA 3.1 comments to

amyk marked 7 inline comments as done.Jul 6 2020, 1:21 PM
lei accepted this revision as: lei.Jul 6 2020, 1:51 PM


This revision is now accepted and ready to land.Jul 6 2020, 1:51 PM
anil9 added a subscriber: anil9.Jul 6 2020, 1:53 PM
anil9 added inline comments.

semantics the -> semantics of the


nit : the other ones seem to have a extra line with ///


nit :
/// otherwise return the default SDValue. ???

nemanjai requested changes to this revision.Jul 6 2020, 2:33 PM

Pretty close to the way it should be but the indices do need to flip so I have to request changes.


This comment is incorrect. The canonical type is v16i8.


Is there a reason we don't just define these this way above?
i.e. SDValue LHS = peekThroughBitcasts(SVN->getOperand(0));


You do not use IsBVNConstSplat anywhere except in the condition. You can just put the call in the condition
i.e. if (!BVN->isConstantSplat(...) || SplatBitSize > 32)


This can be folded into the above condition. I think it is reasonable to expect the reader to understand that 32 bits is 4 bytes (on PPC) so we don't need to divide by 8.

// Check that the shuffle mask matches the semantics of XXSPLTI32DX.
// The instruction splats a constant C into two words of the source vector
// producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }.
// Thus we check that the shuffle mask is the equivalent  of
// <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively.
// Note: the check above of isNByteElemShuffleMask() ensures that the bytes
// within each word are consecutive, so we only need to check the first byte.

We are after type legalization here, can you please use legal types (i.e. no MVT::i1).


This is backwards. On LE, the rightmost element is element zero. In this path, the constant goes into the most significant word of each doubleword. So your Index needs to flip in both places.


All lowering and combine functions return a default constructed SDValue when unsuccessful. There is no reason to call that out specifically.


Can we please use i32 rather than i1 as the latter could lead to issues (with using CRBIT registers which we really don't want to do).


The result of this shuffle is:
{ a[0], 566, a[2], 566 }
Which produces a vector register:

LE: [ 566  | a[2] | 566  | a[0] ] => xxsplti32dx vs34, 0, 566
BE: [ a[0] | 566  | a[2] | 566  ] => xxsplti32dx vs34, 1, 566

So it is backwards - similarly to all the test cases.

This revision now requires changes to proceed.Jul 6 2020, 2:33 PM
amyk updated this revision to Diff 275845.Jul 6 2020, 3:32 PM

Addressed comments from Nemanja:

  • various variable changes
  • update comment/documentation
  • corrected the index for the xxsplti32dx instruction for LE/BE
  • Updated the instruction to use u1imm instead of i1imm so the index in assembly can be 0/1, and this allows us the index to be i32 in the pattern.
nemanjai accepted this revision.Jul 6 2020, 4:18 PM

The remaining updates are straightforward so feel free to address my comments on the commit. LGTM otherwise.


Forgot to remove these?


If the splat is smaller than 32 bits, you need to replicate it.

// If the splat is narrower than 32-bits, we need to get the 32-bit value
// for XXSPLTI32DX.
unsigned SplatVal = APSplatValue.getZExtValue();
for (; SplatBitSize < 32; SplatBitSize <<= 1)
  SplatVal |= (SplatVal << SplatBitSize);

and then use SplatVal below when creating the XXSPLTI32DX node.
We also need a test case for this. Something like:

vector int test(vector int a) {
  unsigned Val = 0xABABABAB;
  a[0] = Val;
  a[2] = Val;
  return a;

This should give you a SplatBitSize == 8 and APSplatValue == 0xAB.

This revision is now accepted and ready to land.Jul 6 2020, 4:18 PM
amyk updated this revision to Diff 275861.Jul 6 2020, 6:24 PM

Address review comments from Nemanja:

  • update comments, variables
  • consider the case when the splat is smaller than 32-bits (and add associated test case)
This revision was automatically updated to reflect the committed changes.