vperm instruction requires the data to be in the Altivec registers, if one of
the vector operands is not used after this vperm instruction then it can be
substituted by xxperm which doubles the number of available registers.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10279 | Minor nit. | |
10282 | I assume this is meant to be deleted? | |
10282 | Can we elaborate why we want V2 as the destination? | |
10290 | nit: XSWAP -> XXSWAPD maybe to match the actual opcode? | |
10308 | nit: IE->i.e. | |
10360 | Might be good to pull isPPC64() into a separate variable like how you did for isLittleEndian. | |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
122 | nit: Put SDT_PPCxxperm near the beginning of the file nearby the other SDT* definitions. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10282 | Thanks for bringing this up with me again. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10374 | V1.getValueType() is the current valueType for V1 while ValType is the original one. (there are possible bitcasts) |
Thank you for your patience and sorry it took me so long to get to this.
I have a bunch of comments but most of them are not a big deal and a couple of them don't require action at all it's just something that's important to notice.
Overall I think that the patch makes sense and hopefully after this round of changes we will be ready to put it in.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10263 | Is VPermMask used anymore other than the LLVM_DEBUG? If it's not this line will cause a warning, or error with -Werror, when the LLVM_DEBUG is removed in some builds. It looks to me like the whole for loop starting on line 10163 might be redundant as you do the same thing above and in LowerVPERM. if (Subtarget.isISA3_0() && (V1->hasOneUse() || V2->hasOneUse())) { // Do the codegen with XXPERM here LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n"); LLVM_DEBUG(SVOp->dump()); LLVM_DEBUG(dbgs() << "With the following permute control vector:\n"); LLVM_DEBUG(VPermMask.dump()); } else { // Do the codegen with VPERM here LLVM_DEBUG(dbgs() << "Emitting a XXPERM for the following shuffle:\n"); LLVM_DEBUG(SVOp->dump()); LLVM_DEBUG(dbgs() << "With the following permute control vector:\n"); LLVM_DEBUG(XXPermMask.dump()); } Or, at least get rid of the above code because it's not really needed. The debug info can come at the end of LowerVPERM. | |
10275 | Aren't these lines the same as the lines 10177 - 10178 ? | |
10279 | nit: | |
10283 | nit: | |
10354 | I don't think you have to do anything about this but more of a note to make sure it is taken into consideration. (Perhaps a comment in the code would be good.) | |
10365 | Does this get incremented twice for the same instruction? | |
10368 | Oh, I see. The second half of the debug comment is down here. Also, the initial part of the comment : Emitting a VPERM for the following shuffle: May not be true as this may now be an XXPERM. | |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
91 | Is it possible to add a type constraint for the last operand here? SDTCisVT<3, v4i32> Or is that going to cause issues elsewhere? | |
llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll | ||
8059 | Interesting. Here we actually end up with an extra copy which is not what we want but it's because the xxperm feeds the return value and so the register allocation is constrained by the ABI. For this patch I think we can ignore this but we should make a note of it to fix it at a later date. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10368 | The code staring in line 10163 for LLVB_DEBUG uses | |
10368 | The LLVM_DEBUG code block starting at line 10163 uses SVOp which we dont pass it to LowerVPERM function as it is not needed here, that is why we have it there, I plan to delete LLVM_DEBUG from here and keep it there. Please lemme know if you have a concern. | |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
91 | The last one is SDTCisVT<2, v2f64> that has a different type constraint, are you suggesting to change it? |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10368 | The only concern I have with moving this comment up is that we still put out the debug message: Emitting a VPERM for the following shuffle: when in fact this may not be what is going on. We may be emitting a XXPERM for the shuffle. ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op); You can either re-do the cast or you can pass SVOp instead of Op because at this point we are pretty much guaranteed that Op is a ShuffleVectorSDNode. | |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
91 | No, I think that constraint is fine. So, what I'm thinking of is: --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td @@ -88,7 +88,7 @@ def SDT_PPCst_vec_be : SDTypeProfile<0, 2, [ def SDT_PPCxxperm : SDTypeProfile<1, 3, [ SDTCisVT<0, v2f64>, SDTCisVT<1, v2f64>, - SDTCisVT<2, v2f64>]>; + SDTCisVT<2, v2f64>, SDTCisVT<3, v4i32>]>; //--------------------------- Custom PPC nodes -------------------------------// def PPClxvd2x : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x, [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>; @@ -4151,8 +4151,8 @@ def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)), (v8i16 (VSPLTHs 3, (LXSIHZX ForceXForm:$A)))>; def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)), (v16i8 (VSPLTBs 7, (LXSIBZX ForceXForm:$A)))>; -def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, (v16i8 (bitconvert v4i32:$C)))), - (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>; def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, v4i32:$C)), (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>; } // HasVSX, HasP9Vector |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
91 | Such a change causes build failure which seems to be related to (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>; |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
91 | Right which is why I don't think you need those two lines for that pattern. def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, (v16i8 (bitconvert v4i32:$C)))), (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>; def : Pat<(v2f64 (PPCxxperm v2f64:$XT, v2f64:$XB, v4i32:$C)), (XXPERM v2f64:$XT, v2f64:$XB, v4i32:$C)>; do practically the same thing and I don't believe there is any use for the first one. You can add the constraint and then remove the pattern that isn't used. |
I think this looks good.
Thank you for addressing the comments!
LGTM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10194 | nit: |
Sorry, I know that I had approved this before but it seems that the test p10-splatImm32-undef.ll starts failing with this patch.
It may just be that the test needs to be updated but please make sure that is all it is.
nit:
big-endian-based -> big-endian based