This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.
ClosedPublic

Authored by amyk on Jul 6 2020, 11:17 AM.

Details

Summary

This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs.

We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if:

  • Element size is 4 bytes
  • The RHS is a constant vector (and constant splat of 4-bytes)
  • The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks:
<0, 4-7, 2, 4-7>
<4-7, 1, 4-7, 3>

Diff Detail

Event Timeline

amyk created this revision.Jul 6 2020, 11:17 AM
lei added a subscriber: lei.Jul 6 2020, 11:44 AM
lei added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9792

dl->DL

9825

no need for the tmp SplatSize

if ((SplatBitSize / 8) > 4)
9839

There see to be alot of extra, unnecessary () here... since all these are && I think alot of these can be removed.

9841

same.

9844

I think you are missing:

else
  return SDValue();
llvm/lib/Target/PowerPC/PPCInstrPrefix.td
65

nit: indentation?

lei added inline comments.Jul 6 2020, 12:40 PM
llvm/lib/Target/PowerPC/PPCInstrPrefix.td
62

nit: Maybe

PowerPC ISA 3.1 specific type constraints.
65

nvm. This is how it's been done else where..

67

nit:

// ISA 3.1 specific PPCISD nodes.
amyk updated this revision to Diff 275803.Jul 6 2020, 12:58 PM

Address the following review comments:

  • formatting in PPCInstrPrefix.td
  • Remove extra ( )
  • Indentation in PPCISelLowering
  • Remove unnecessary variables
  • Add extra case to return SDValue()
amyk updated this revision to Diff 275809.Jul 6 2020, 1:01 PM

Address comment of adding more specific ISA 3.1 comments to PPCInstrPrefix.td.

amyk marked 7 inline comments as done.Jul 6 2020, 1:21 PM
lei accepted this revision as: lei.Jul 6 2020, 1:51 PM

LGTM

This revision is now accepted and ready to land.Jul 6 2020, 1:51 PM
anil9 added a subscriber: anil9.Jul 6 2020, 1:53 PM
anil9 added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9827

semantics the -> semantics of the

llvm/lib/Target/PowerPC/PPCISelLowering.h
105

nit : the other ones seem to have a extra line with ///

1278

nit :
/// otherwise return the default SDValue. ???

nemanjai requested changes to this revision.Jul 6 2020, 2:33 PM

Pretty close to the way it should be but the indices do need to flip so I have to request changes.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9798

This comment is incorrect. The canonical type is v16i8.

9801

Is there a reason we don't just define these this way above?
i.e. SDValue LHS = peekThroughBitcasts(SVN->getOperand(0));

9817

You do not use IsBVNConstSplat anywhere except in the condition. You can just put the call in the condition
i.e. if (!BVN->isConstantSplat(...) || SplatBitSize > 32)

9824

This can be folded into the above condition. I think it is reasonable to expect the reader to understand that 32 bits is 4 bytes (on PPC) so we don't need to divide by 8.

9827
// Check that the shuffle mask matches the semantics of XXSPLTI32DX.
// The instruction splats a constant C into two words of the source vector
// producing { C, Unchanged, C, Unchanged } or { Unchanged, C, Unchanged, C }.
// Thus we check that the shuffle mask is the equivalent  of
// <0, [4-7], 2, [4-7]> or <[4-7], 1, [4-7], 3> respectively.
// Note: the check above of isNByteElemShuffleMask() ensures that the bytes
// within each word are consecutive, so we only need to check the first byte.
9839

We are after type legalization here, can you please use legal types (i.e. no MVT::i1).

9839

This is backwards. On LE, the rightmost element is element zero. In this path, the constant goes into the most significant word of each doubleword. So your Index needs to flip in both places.

llvm/lib/Target/PowerPC/PPCISelLowering.h
1278

All lowering and combine functions return a default constructed SDValue when unsuccessful. There is no reason to call that out specifically.

llvm/lib/Target/PowerPC/PPCInstrPrefix.td
754

Can we please use i32 rather than i1 as the latter could lead to issues (with using CRBIT registers which we really don't want to do).

llvm/test/CodeGen/PowerPC/p10-splatImm32.ll
21

The result of this shuffle is:
{ a[0], 566, a[2], 566 }
Which produces a vector register:

LE: [ 566  | a[2] | 566  | a[0] ] => xxsplti32dx vs34, 0, 566
BE: [ a[0] | 566  | a[2] | 566  ] => xxsplti32dx vs34, 1, 566

So it is backwards - similarly to all the test cases.

This revision now requires changes to proceed.Jul 6 2020, 2:33 PM
amyk updated this revision to Diff 275845.Jul 6 2020, 3:32 PM

Addressed comments from Nemanja:

  • various variable changes
  • update comment/documentation
  • corrected the index for the xxsplti32dx instruction for LE/BE
  • Updated the instruction to use u1imm instead of i1imm so the index in assembly can be 0/1, and this allows us the index to be i32 in the pattern.
nemanjai accepted this revision.Jul 6 2020, 4:18 PM

The remaining updates are straightforward so feel free to address my comments on the commit. LGTM otherwise.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9800

Forgot to remove these?

9840

If the splat is smaller than 32 bits, you need to replicate it.

// If the splat is narrower than 32-bits, we need to get the 32-bit value
// for XXSPLTI32DX.
unsigned SplatVal = APSplatValue.getZExtValue();
for (; SplatBitSize < 32; SplatBitSize <<= 1)
  SplatVal |= (SplatVal << SplatBitSize);

and then use SplatVal below when creating the XXSPLTI32DX node.
We also need a test case for this. Something like:

vector int test(vector int a) {
  unsigned Val = 0xABABABAB;
  a[0] = Val;
  a[2] = Val;
  return a;
}

This should give you a SplatBitSize == 8 and APSplatValue == 0xAB.

This revision is now accepted and ready to land.Jul 6 2020, 4:18 PM
amyk updated this revision to Diff 275861.Jul 6 2020, 6:24 PM

Address review comments from Nemanja:

  • update comments, variables
  • consider the case when the splat is smaller than 32-bits (and add associated test case)
This revision was automatically updated to reflect the committed changes.