Page MenuHomePhabricator

[PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm
ClosedPublic

Authored by maryammo on Sep 12 2022, 7:29 AM.

Details

Summary

vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.

Diff Detail

Event Timeline

maryammo created this revision.Sep 12 2022, 7:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 7:29 AM
maryammo requested review of this revision.Sep 12 2022, 7:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 7:29 AM
amyk added a subscriber: amyk.Sep 20 2022, 2:51 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10282

Minor nit.

10285

I assume this is meant to be deleted?

10285

Can we elaborate why we want V2 as the destination?

10293

nit: XSWAP -> XXSWAPD maybe to match the actual opcode?

10311

nit: IE->i.e.
Can we also put a space after each i.e. to view the illustration better?

10363

Might be good to pull isPPC64() into a separate variable like how you did for isLittleEndian.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
122

nit: Put SDT_PPCxxperm near the beginning of the file nearby the other SDT* definitions.

amyk added inline comments.Sep 26 2022, 9:32 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10285

Thanks for bringing this up with me again.
I think I misunderstood or read this incorrectly at the time, so you can probably disregard the comment regarding the elaboration.

maryammo updated this revision to Diff 462960.Sep 26 2022, 10:17 AM

Address review comments

amyk added inline comments.Oct 7 2022, 8:37 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10293

nit: Update the names?

10311

nit: Could we add a space after each i.e. to view the illustration better?

10357

Do we mean to get V2->getOperand(0) here?

10377

Isn't V1.getValueType() here just ValType?

maryammo added inline comments.Oct 12 2022, 12:49 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10377

V1.getValueType() is the current valueType for V1 while ValType is the original one. (there are possible bitcasts)

maryammo updated this revision to Diff 467893.Oct 14 2022, 12:59 PM

Address review comments

Thank you for your patience and sorry it took me so long to get to this.

I have a bunch of comments but most of them are not a big deal and a couple of them don't require action at all it's just something that's important to notice.
Overall I think that the patch makes sense and hopefully after this round of changes we will be ready to put it in.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10266

Is VPermMask used anymore other than the LLVM_DEBUG?

If it's not this line will cause a warning, or error with -Werror, when the LLVM_DEBUG is removed in some builds.

It looks to me like the whole for loop starting on line 10163 might be redundant as you do the same thing above and in LowerVPERM.
You may want to restructure the code so that you have

if (Subtarget.isISA3_0() && (V1->hasOneUse() || V2->hasOneUse())) {
  // Do the codegen with XXPERM here
  LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n");
  LLVM_DEBUG(SVOp->dump());
  LLVM_DEBUG(dbgs() << "With the following permute control vector:\n");
  LLVM_DEBUG(VPermMask.dump());
} else {
  // Do the codegen with VPERM here
  LLVM_DEBUG(dbgs() << "Emitting a XXPERM for the following shuffle:\n");
  LLVM_DEBUG(SVOp->dump());
  LLVM_DEBUG(dbgs() << "With the following permute control vector:\n");
  LLVM_DEBUG(XXPermMask.dump());
}

Or, at least get rid of the above code because it's not really needed. The debug info can come at the end of LowerVPERM.

10278

Aren't these lines the same as the lines 10177 - 10178 ?
If they are you probably don't need them.

10282

nit:
vector -> vectors

10286

nit:
Use EVT instead for auto.
Code is generally easier to read when we have the types spelled out.

10357

I don't think you have to do anything about this but more of a note to make sure it is taken into consideration. (Perhaps a comment in the code would be good.)
Not a big deal because this is debug info but in this case we could be overwriting the dl from the previous if statement if we have two swaps. I guess we want to use the location of the original swap which is fine and it doesn't matter which one we use.

10368

Does this get incremented twice for the same instruction?
Once here and once on line 10175 above.

10371

Oh, I see. The second half of the debug comment is down here.
It may be a good idea to move the two parts of the comment to the same place down here.

Also, the initial part of the comment :

Emitting a VPERM for the following shuffle:

May not be true as this may now be an XXPERM.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
91

Is it possible to add a type constraint for the last operand here?

SDTCisVT<3, v4i32>

Or is that going to cause issues elsewhere?

llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
8059

Interesting. Here we actually end up with an extra copy which is not what we want but it's because the xxperm feeds the return value and so the register allocation is constrained by the ABI. For this patch I think we can ignore this but we should make a note of it to fix it at a later date.

maryammo added inline comments.Nov 1 2022, 4:07 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10371

The code staring in line 10163 for LLVB_DEBUG uses

10371

The LLVM_DEBUG code block starting at line 10163 uses SVOp which we dont pass it to LowerVPERM function as it is not needed here, that is why we have it there, I plan to delete LLVM_DEBUG from here and keep it there. Please lemme know if you have a concern.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
91

The last one is SDTCisVT<2, v2f64> that has a different type constraint, are you suggesting to change it?

maryammo marked an inline comment as not done.Nov 1 2022, 4:08 PM
stefanp added inline comments.Nov 2 2022, 1:09 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10371

The only concern I have with moving this comment up is that we still put out the debug message:

Emitting a VPERM for the following shuffle:

when in fact this may not be what is going on. We may be emitting a XXPERM for the shuffle.
You can move everything down because SVOp is just a cast of Op :

ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);

You can either re-do the cast or you can pass SVOp instead of Op because at this point we are pretty much guaranteed that Op is a ShuffleVectorSDNode.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
91

No, I think that constraint is fine.
What I'm saying is that this SDTypeProfile<1, 3, has 1 output and 3 inputs.
Currently, the output and the first 2 inputs have a constraint but the last input doesn't have a constraint.

So, what I'm thinking of is:

--- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -88,7 +88,7 @@ def SDT_PPCst_vec_be : SDTypeProfile<0, 2, [
 
 def SDT_PPCxxperm : SDTypeProfile<1, 3, [
   SDTCisVT<0, v2f64>, SDTCisVT<1, v2f64>,
-  SDTCisVT<2, v2f64>]>;
+  SDTCisVT<2, v2f64>, SDTCisVT<3, v4i32>]>;
 //--------------------------- Custom PPC nodes -------------------------------//
 def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                         [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
@@ -4151,8 +4151,8 @@ def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
           (v8i16 (VSPLTHs 3, (LXSIHZX ForceXForm:$A)))>;
 def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
           (v16i8 (VSPLTBs 7, (LXSIBZX ForceXForm:$A)))>;
-def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, (v16i8 (bitconvert v4i32:$C)))),
-          (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>;
 def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, v4i32:$C)),
           (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>;
 } // HasVSX, HasP9Vector
maryammo added inline comments.Nov 2 2022, 5:13 PM
llvm/lib/Target/PowerPC/PPCInstrVSX.td
91

Such a change causes build failure which seems to be related to
def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, (v16i8 (bitconvert v4i32:$C)))),

(XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>;
stefanp added inline comments.Thu, Nov 3, 5:52 AM
llvm/lib/Target/PowerPC/PPCInstrVSX.td
91

Right which is why I don't think you need those two lines for that pattern.
The two patterns:

def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, (v16i8 (bitconvert v4i32:$C)))),
         (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>;
def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, v4i32:$C)),
         (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>;

do practically the same thing and I don't believe there is any use for the first one.

You can add the constraint and then remove the pattern that isn't used.

maryammo updated this revision to Diff 473070.Thu, Nov 3, 4:01 PM

Address review comments

stefanp accepted this revision.Tue, Nov 8, 12:12 PM

I think this looks good.
Thank you for addressing the comments!

LGTM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10197

nit:
big-endian-based -> big-endian based

This revision is now accepted and ready to land.Tue, Nov 8, 12:12 PM
stefanp requested changes to this revision.Tue, Nov 8, 6:03 PM

Sorry, I know that I had approved this before but it seems that the test p10-splatImm32-undef.ll starts failing with this patch.
It may just be that the test needs to be updated but please make sure that is all it is.

This revision now requires changes to proceed.Tue, Nov 8, 6:03 PM
maryammo updated this revision to Diff 475888.Wed, Nov 16, 12:15 PM

Unset the hasSideEffects for XXPERM

stefanp accepted this revision.Mon, Nov 21, 6:18 PM

Thank you for fixing that last test!
LGTM.

This revision is now accepted and ready to land.Mon, Nov 21, 6:18 PM