This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Improve codegen for vector loads using scalar_to_vector
ClosedPublic

Authored by amyk on Jul 4 2018, 12:51 PM.

Details

Summary

This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence.

Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize:

  • LXSD and LXSDX (via the DFLOADf64 and XFLOADf64 pseudo instructions respectively) for i64 and f64
  • LXSIWAX for i32 (sign extension to i64)
  • LXSIWZX for i32 and f64

Diff Detail

Event Timeline

amyk created this revision.Jul 4 2018, 12:51 PM

There's a notification from Phabricator that says:
This file was changed only by adding or removing whitespace. Show File Contents
for file lib/Target/PowerPC/PPCISelLowering.cpp. If this is a glitch with Phabricator, fine. Otherwise please remove any such unrelated changes.

nemanjai requested changes to this revision.Jul 5 2018, 4:16 AM

A number of the registers in the test cases you've added are mandated by the ABI. Rather than keeping track of which ones should be part of the test case, perhaps it's easiest to use utils/update_llc_test_checks.py to automatically generate the CHECK directives (if it works for all the RUN lines in your tests).

llvm/lib/Target/PowerPC/PPCInstrVSX.td
2993

Please refrain from comments such as this one. What you're implementing is an improvement over the old code as every patch presumably is. However saying this within the code begs the question "Improvement over what?"

2994

This code is within a block with let AddedComplexity = 400 already set. So no need to repeat that.

2996

This does not match the code. LIWZX is actually used for i32/f32/(zext i32 -> i64).

2997

This predicate is correct since the code does not require HasP9Vector. However, putting this into a block with Predicates = [HasP9Vector] set is confusing as it might lead the reader to falsely conclude that both predicates are required here. Please pull this code outside of any blocks with Predicates set.

3019

The v4i32/v4f32 patterns are incorrect. The LXSIWZX/LFIWZX that the output pattern will eventually expand to will produce a vector register with the value: 0|V|U|U (0 == zero, V == loaded value, U == undefined). However, the result of scalar_to_vector needs to be V|U|U|U. So this needs a shift using XXSLDWI to shift the value left by one word.

3195

Same nit about comments that talk about improvement.

3197

I think that instead of utilization of 64 bit registers you mean utilization of all 64 VSX registers. Also, here and everywhere that punctuation is missing - we use complete sentences for comments. This includes punctuation.

3200

Line too long.

llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
10

Why the change in indentation on these?

llvm/test/CodeGen/PowerPC/build-vector-tests.ll
113

This is bad. The test case is actually a load-and-splat. We were emitting a load-and-splat. Now we're emitting a load, swap. This is functionally incorrect.
I imagine we just have to rip out the following code from PPCISelLowering.cpp:

// If the source for the shuffle is a scalar_to_vector that came from a
// 32-bit load, it will have used LXVWSX so we don't need to splat again.
if (Subtarget.hasP9Vector() &&
    ((isLittleEndian && SplatIdx == 3) || 
     (!isLittleEndian && SplatIdx == 0))) {
  SDValue Src = V1.getOperand(0);
  if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&
      Src.getOperand(0).getOpcode() == ISD::LOAD &&
      Src.getOperand(0).hasOneUse())
    return V1;
}
This revision now requires changes to proceed.Jul 5 2018, 4:16 AM
amyk marked 6 inline comments as done.Jul 5 2018, 10:58 AM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
2994

It could be I mistakenly seen, but the last let AddedComplexity block I saw was on line 2645,

let AddedComplexity = 400, Predicates = [HasP9Vector] in {                        
  // Extra patterns expanding to vector Extract Word/Insert Word                    
  def : Pat<(v4i32 (int_ppc_vsx_xxinsertw v4i32:$A, v2i64:$B, imm:$IMM)),           
            (v4i32 (XXINSERTW $A, $B, imm:$IMM))>;                                  
  def : Pat<(v2i64 (int_ppc_vsx_xxextractuw v2i64:$A, imm:$IMM)),                   
            (v2i64 (COPY_TO_REGCLASS (XXEXTRACTUW $A, imm:$IMM), VSRC))>;           
  } // AddedComplexity = 400, HasP9Vector

so wouldn't I still need it?

2997

I think this is related to the comment about the let AddedComplexity block. Isn't this block is in its own Predicate set, as well (from what I can see)?

3019

Thank you for bringing that to my attention. I am currently working on this.

llvm/test/CodeGen/PowerPC/build-vector-tests.ll
113

Yes, thank you for pointing that out to me. Removing that code fragment does indeed produce a xxspltw after the load/permute. I will update PPCISelLowering.cpp to reflect this.

amyk updated this revision to Diff 155471.Jul 13 2018, 1:22 PM

This revision addresses:

  • Updates to the comments left on the initial patches, such as comment changes, indentation changes.
  • Changes to PPCISelLowering.cpp to emit load and splat behaviour in test cases.
  • Addition of the XXSLDWIs instruction, which serves as a single input register form of XXSLDWI. This instruction is used in the modification of the LXSIWZX/LFIWZX pattern and is needed in order to prevent emitting double the amount of original load instructions.
  • Updated patterns for the (v4i32 and v4f32 ) LXSIWZX/LFIWZX patterns on big endian, using the XXSLDWIs instruction (shifting left by 1 word immediate).

There are a few minor nits, but aside from those this LGTM.
Also, please make note of the fact that we need to fix sequences such as xxsldwi -> xxspltw and xxpermdi -> xxspltw in PPCMIPeephole.cpp.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
897

Nit: move this up to just below the definition of the base instruction (XXSLDWI).

2994

I don't think you're following the scope correctly, but it doesn't really matter. It doesn't hurt to have it multiple times.

In fact, it would probably be good to completely reorganize this file since it is huge and super difficult to follow whether you're in scope of some block. But that's of course a project for a separate patch.

2996

There is an extra space in the indentation. The z should line up under T.

llvm/test/CodeGen/PowerPC/swaps-le-6.ll
0–1

These were originally broken up onto multiple lines to avoid having lines that are too long. Now the lines are too long and they're broken up onto multiple lines. Please fix the column width.

nemanjai accepted this revision.Jul 17 2018, 5:52 AM

Forgot to approve the patch.

This revision is now accepted and ready to land.Jul 17 2018, 5:52 AM
amyk updated this revision to Diff 156255.Jul 19 2018, 6:33 AM

Updated the patch to address:

  • Adjusted column width for a RUN line in the test case
  • Moving the definition of the XXSLDWIs instruction underneath it's two input operand counterpart, XXSLDWI.
amyk updated this revision to Diff 157078.Jul 24 2018, 10:41 AM

Updated revision to remove irrelevant whitespaces.

syzaara added inline comments.Jul 24 2018, 2:12 PM
llvm/test/CodeGen/PowerPC/swaps-le-6.ll
23

these CHECKs now seem a lot more exhaustive than before, do we need to simplify these?

llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
67

Do you really need to add CHECKS for these kill statements?

syzaara added inline comments.Jul 24 2018, 2:22 PM
llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
11

These CHECK statements are too exhaustive and specific, please try to simplify them to only what is being tested.

amyk updated this revision to Diff 157752.Jul 27 2018, 1:28 PM

Further update of test cases prior to commit.

This revision was automatically updated to reflect the committed changes.