This is an archive of the discontinued LLVM Phabricator instance.

Teach LLVM about Power 9 D-Form VSX Instructions
AbandonedPublic

Authored by nemanjai on May 16 2016, 3:58 PM.

Details

Summary

This patch teaches LLVM how to generate the some of the Power 9 D-Form VSX instructions. Specifically, lxsd, lxssp, stxsd, and stxssp.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 57414.May 16 2016, 3:58 PM
tjablin retitled this revision from to Teach LLVM about Power 9 D-Form VSX Instructions.
tjablin updated this object.
tjablin added reviewers: kbarton, hfinkel, cycheng, nemanjai.
tjablin added a subscriber: llvm-commits.
nemanjai edited edge metadata.May 17 2016, 9:01 AM

This is going to need some additional work to restrict the register sets for all the instructions. Of course, these are scalar loads/stores but they're restricted to the upper 32 VSX registers (the VMX registers) so we can't use the full vsfrc/vssrc register classes.

cycheng added a reviewer: amehsan.EditedMay 20 2016, 1:33 AM

+ Ehsan

This is going to need some additional work to restrict the register sets for all the instructions. Of course, these are scalar loads/stores but they're restricted to the upper 32 VSX registers (the VMX registers) so we can't use the full vsfrc/vssrc register classes.

I missed Nemanjai's comment, I did some test and looks like we can use vfrc register calss instead:

  • assembler/dis-assembler:
static DecodeStatus DecodeVFRCRegisterClass(MCInst &Inst, uint64_t RegNo,
                                            uint64_t Address,
                                            const void *Decoder) {
  return decodeRegisterClass(Inst, RegNo, VSFRegs);
}

  void addRegVFRCOperands(MCInst &Inst, unsigned N) const {
    assert(N == 1 && "Invalid number of operands!");
    Inst.addOperand(MCOperand::createReg(VSFRegs[getVSReg()]));
  }
  • PPCInstrVSX.td
def PPCRegVFRCAsmOperand : AsmOperandClass {
  let Name = "RegVFRC"; let PredicateMethod = "isVSRegNumber";
}
def vfrc : RegisterOperand<VFRC> {
  let ParserMatchClass = PPCRegVFRCAsmOperand;
}

  // Load DWord
  def LXSD  : DSForm_1<57, 2, (outs vfrc:$vD), (ins memrix:$src),
                       "lxsd $vD, $src", IIC_LdStLFD, []>;
  // Load SP from src, convert it to DP, and place in dword[0]
  def LXSSP : DSForm_1<57, 3, (outs vfrc:$vD), (ins memrix:$src),
                       "lxssp $vD, $src", IIC_LdStLFD, []>;

  // Store DWord
  def STXSD  : DSForm_1<61, 2, (outs), (ins vfrc:$vS, memrix:$dst),
                        "stxsd $vS, $dst", IIC_LdStSTFD, []>;
  // Convert DP of dword[0] to SP, and Store to dst
  def STXSSP : DSForm_1<61, 3, (outs), (ins vfrc:$vS, memrix:$dst),
                        "stxssp $vS, $dst", IIC_LdStSTFD, []>;

  let AddedComplexity = 500 in {
    def : Pat<(f64 (load iaddr:$src)), (LXSD  iaddr:$src)>;
    def : Pat<(f32 (load iaddr:$src)), (COPY_TO_REGCLASS (LXSSP iaddr:$src), VFRC)>;
    def : Pat<(f64 (extloadf32 iaddr:$src)),
            (COPY_TO_REGCLASS (LXSSP iaddr:$src), VFRC)>;
    def : Pat<(store f64:$vS, iaddr:$dst), (STXSD $vS, iaddr:$dst)>;
    def : Pat<(store f32:$vS, iaddr:$dst), (STXSSP (COPY_TO_REGCLASS $vS, VFRC), iaddr:$dst)>;
  }
  • Test case result:
        lxsd 35, 8(3)
        lxsd 36, 16(3)
        lxsd 32, 24(3)
...

This is going to need some additional work to restrict the register sets for all the instructions. Of course, these are scalar loads/stores but they're restricted to the upper 32 VSX registers (the VMX registers) so we can't use the full vsfrc/vssrc register classes.

I missed Nemanjai's comment, I did some test and looks like we can use vfrc register calss instead:

Unfortunately, this still isn't quite the correct semantics. Although this will target the right physical registers, the encoding is wrong. These really are VR registers and have 5-bit fields in the encoding. Things like:

lxsd 35, 8(3)

are not likely to produce the desired results. These instructions need the VR register to be specified in the 0-31 range which will actually mean VSR 32-63.
As far as I can tell, the idea with these instructions is that we get scalar floating point values using the nice D-Form loads into the remaining VSR's (the FP D-Form loads can be used for VSRs 0-31).

I think perhaps the best way to handle these would be to define a new register class which will alias the VRRC registers, but has 64-bit spill size and can hold f64/f32.

Unfortunately, this still isn't quite the correct semantics. Although this will target the right physical registers, the encoding is wrong. These really are VR registers and have 5-bit fields in the encoding. Things like:

lxsd 35, 8(3)

are not likely to produce the desired results. These instructions need the VR register to be specified in the 0-31 range which will actually mean VSR 32-63.
As far as I can tell, the idea with these instructions is that we get scalar floating point values using the nice D-Form loads into the remaining VSR's (the FP D-Form loads can be used for VSRs 0-31).

I think perhaps the best way to handle these would be to define a new register class which will alias the VRRC registers, but has 64-bit spill size and can hold f64/f32.

Oh.. I see my mistake, thanks! I will fix this by defining a new register class.

tjablin updated this revision to Diff 59072.May 31 2016, 8:13 AM
tjablin edited edge metadata.

From CY:

This patch prevent from defining new register class for vsx that uses 64-bit altivec register, it reuses VFRC register class, and does required changes when printing assembly code.

VsxUseAltivecReg and VFRC are used in "PPCInstPrinter::printOperand", "PPCAsmPrinter::printOperand", in order to get correct register name for vsx that uses 64-bit altivec register.

hfinkel edited edge metadata.May 31 2016, 5:17 PM

From CY:

This patch prevent from defining new register class for vsx that uses 64-bit altivec register, it reuses VFRC register class, and does required changes when printing assembly code.

VsxUseAltivecReg and VFRC are used in "PPCInstPrinter::printOperand", "PPCAsmPrinter::printOperand", in order to get correct register name for vsx that uses 64-bit altivec register.

Thanks for working on this. I'd really like to see a unified solution here, both for this and for the high half of the VSX register file in general (i.e. using this same scheme to eliminate the VSRH registers).

Also, although I suggested using the instruction flag bits for this, I think we should be able to do this without them by looking at the operand descriptions's register class:

MI->getDesc().OpInfo[MO's index].RegClass == PPC::VFRCRegClassID
lib/Target/PowerPC/PPCInstrFormats.td
41 ↗(On Diff #59072)

Please spell VSX here in all caps (and in the flag name).

Thanks for working on this. I'd really like to see a unified solution here, both for this and for the high half of the VSX register file in general (i.e. using this same scheme to eliminate the VSRH registers).

Let me try again : )

So the unified solution, by my understanding, is: we should not define a new register class that is actually alias with existing register class, we can use customized instruction flags and custom c++ code to handle their difference part, e.g. name. Benefits I can image are: we don't have to teach backend that the two register class are the same, we can simplify register class hierarchy.

Our current register hierarchy:

  • FPR (64-bit float register): F0 - F31 (f0 - f31)
  • VF (64-bit VSX register): VF0 - VF31 (vs32 - vs63)
  • VR (128-bit Altivec register): V0 - V31 (v0 - v31), overlap with VF (vs32 - vs63)
  • VSRL (128-bit VSX register): VSL0 - VSL31 (vs0 - vs31), overlap with FPR (f0 - f31)
  • VSRH (128-bit VSX register): VSH0 - VSH31 (vs32 - vs63), overlap with VR (v0 - v31)

Our current register class:

  • F8RC: [f64] F0 - F31
  • VFRC: [f64] VF0 - VF31
  • VRRC: [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32] V0 - V31
  • VSLRC: [v4i32,v4f32,v2f64,v2i64] VSL0 - VSL31
  • VSHRC:[v4i32,v4f32,v2f64,v2i64] VSH0 - VSH31
  • VSRC: [v4i32,v4f32,v2f64,v2i64] VSLRC+VSHRC
  • VSFRC: [f64] F8RC, VFRC
  • VSSRC: [f32] F8RC, VFRC

I will remove VSRH, and below is my imaged modification:

In PPCRegisterInfo.td:

  • Eliminate all VSRH and related register definitions
class VSRH<VR SubReg, string n> : PPCReg<n> { ... }

foreach Index = 0-31 in {
  def VSH#Index : VSRH<!cast<VR>("V"#Index), "vs" # !add(Index, 32)>,
                  DwarfRegAlias<!cast<VR>("V"#Index)>;
}

def VSHRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
                          (add VSH2, VSH3, VSH4, VSH5, VSH0, VSH1, VSH6, VSH7, ... )>;
  • For VSRC, change VSHRC to VFRC
    • I can image that we will also eliminate some code in PPCVSXCopy.cpp, because we won't have this copy: "VSRC.VSHRC <---> VSFRC.VFRC"
def VSRC  : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
                          (add VSLRC, VFRC)>;
  • Change VF's definition:
foreach Index = 0-31 in {
  def VF#Index : VF<Index, "v" #Index>,
                 DwarfRegNum<[!add(Index, 77), !add(Index, 77)];
}
  • Add v2f64 to VRRC's supported types
  • Because we eliminate VSHRC, so we should add its v2f64 to VRRC
def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v2f64,v1i128,v4f32], 128,

In PPCISelLowering.cpp/PPCCallingConv.td

  • revert changes of llvm c9de9e60b90 (r205041)
change1:
        case MVT::v2f64:
        case MVT::v2i64:
          RC = &PPC::VSHRCRegClass; => VRRCRegClass;

change2:
        MF.addLiveIn(VSRH[VR_idx], &PPC::VSHRCRegClass) : => remove (4 places)

change3:
  remove:
    CCIfType<[v2f64, v2i64], CCAssignToReg<[VSH2]>>
    CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
             CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>

  add v2f64, v2i64 back to [v16i8, v8i16, v4i32, v4f32]

In PPCAsmPrinter.cpp/PPCInstPrinter.cpp

printOperand()
   if the register belongs to upper VSX registers (and now we use VFRC to represent it),
   we need translate it from v0-v31 to vs32-vs63

In PPCAsmParser.cpp/PPCDisassembler.cpp/PPCRegisterInfo.cpp/PPCVSXCopy.cpp

  • Just remove VSH related code
hfinkel added a subscriber: hfinkel.Jun 7 2016, 4:19 PM
  • Original Message -----

From: "Chuang-Yu Cheng" <cycheng@multicorewareinc.com>
To: tjablin@gmail.com, amehsan@ca.ibm.com, cycheng@multicorewareinc.com, "nemanja i ibm" <nemanja.i.ibm@gmail.com>,
hfinkel@anl.gov
Cc: llvm-commits@lists.llvm.org
Sent: Wednesday, June 1, 2016 2:58:55 AM
Subject: Re: [PATCH] D20310: Teach LLVM about Power 9 D-Form VSX Instructions

cycheng added a comment.

Thanks for working on this. I'd really like to see a unified
solution here, both for this and for the high half of the VSX
register file in general (i.e. using this same scheme to eliminate
the VSRH registers).

Let me try again : )

So the unified solution, by my understanding, is: we should not
define a new register class that is actually alias with existing
register class, we can use customized instruction flags and custom
c++ code to handle their difference part, e.g. name. Benefits I can
image are: we don't have to teach backend that the two register
class are the same, we can simplify register class hierarchy.

Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.

Thanks again,
Hal

Our current register hierarchy:

f31)

  • VF (64-bit VSX register): VF0 - VF31 (vs32 - vs63)
  • VR (128-bit Altivec register): V0 - V31 (v0 - v31), overlap with VF

(vs32 - vs63)

  • VSRL (128-bit VSX register): VSL0 - VSL31 (vs0 - vs31), overlap

with FPR (f0 - f31)

  • VSRH (128-bit VSX register): VSH0 - VSH31 (vs32 - vs63), overlap

with VR (v0 - v31)

Our current register class:

  • F8RC: [f64] F0 - http://reviews.llvm.org/F31
  • VFRC: [f64] VF0 - VF31
  • VRRC: [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32] V0 - V31
  • VSLRC: [v4i32,v4f32,v2f64,v2i64] VSL0 - VSL31
  • VSHRC:[v4i32,v4f32,v2f64,v2i64] VSH0 - VSH31
  • VSRC: [v4i32,v4f32,v2f64,v2i64] VSLRC+VSHRC
  • VSFRC: [f64] F8RC, VFRC
  • VSSRC: [f32] F8RC, VFRC

I will remove VSRH, and below is my imaged modification:

In PPCRegisterInfo.td:

  • Eliminate all VSRH and related register definitions

    class VSRH<VR SubReg, string n> : PPCReg<n> { ... }

    foreach Index = 0-31 in { def VSH#Index : VSRH<!cast<VR>("V"#Index), "vs" # !add(Index, 32)>, DwarfRegAlias<!cast<VR>("V"#Index)>; }

    def VSHRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSH2, VSH3, VSH4, VSH5, VSH0, VSH1, VSH6, VSH7, ... )>;
  • For VSRC, change VSHRC to VFRC
    • I can image that we will also eliminate some code in PPCVSXCopy.cpp, because we won't have this copy: "VSRC.VSHRC <---> VSFRC.VFRC"

      def VSRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSLRC, VFRC)>;
  • Change VF's definition:

    foreach Index = 0-31 in { def VF#Index : VF<Index, "v" #Index>, DwarfRegNum<[!add(Index, 77), !add(Index, 77)]; }
  • Add v2f64 to VRRC's supported types
  • Because we eliminate VSHRC, so we should add its v2f64 to VRRC

    def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v2f64,v1i128,v4f32], 128,

In PPCISelLowering.cpp/PPCCallingConv.td

  • revert changes of llvm c9de9e60b90 (r205041)

    change1: case MVT::v2f64: case MVT::v2i64: RC = &PPC::VSHRCRegClass; => VRRCRegClass;

    change2: MF.addLiveIn(VSRH[VR_idx], &PPC::VSHRCRegClass) : => remove (4 places)

    change3: remove: CCIfType<[v2f64, v2i64], CCAssignToReg<[VSH2]>> CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()", CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>

    add v2f64, v2i64 back to [v16i8, v8i16, v4i32, v4f32]

In PPCAsmPrinter.cpp/PPCInstPrinter.cpp

printOperand()
   if the register belongs to upper VSX registers (and now we use
   VFRC to represent it),
   we need translate it from v0-v31 to vs32-vs63

`In
PPCAsmParser.cpp/PPCDisassembler.cpp/PPCRegisterInfo.cpp/PPCVSXCopy.cpp`

  • Just remove VSH related code

http://reviews.llvm.org/D20310

Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.

Thanks again,
Hal

Hi Hal,

I’ve seen some benefits because of this elimination, when I was fixing test case failures, I found we could generate shorter code than before. So I thought it is a right direction to simplify register hierarchy.

I had update 7 codegen test cases, but I still had one test case failure, that was encoding checking, I will fix it soon.

The new VSRC is composed of (VSLRC, VRRC), so I have removed all VSH related def and use.

CY

Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.

Thanks again,
Hal

Hi Hal,

I’ve seen some benefits because of this elimination, when I was fixing test case failures, I found we could generate shorter code than before. So I thought it is a right direction to simplify register hierarchy.

I had update 7 codegen test cases, but I still had one test case failure, that was encoding checking, I will fix it soon.

The new VSRC is composed of (VSLRC, VRRC), so I have removed all VSH related def and use.

Okay, so you're saying that we generate even better code by eliminating VSHRC in addition to eliminating the register definitions themselves. In that case, I look forward to the updated patch :-)

Thanks again, Hal

CY

Eliminating VSHRC brought up a new issue for me, but I have fixed it. Tom will upload the new patch later (the patch passed all of my testing on Pwr8).

The issue was:
Because for now, VSRC = (VSLRC, VRRC), so when spill vs0-63 or vr0-31 to stack, backend was possible to generate such code:

STXVD2X %VSL11<kill>, %X31, %X0<kill>, %RM<imp-use>; mem:ST16[FixedStack17]
...
%V3<def> = LVX %X31, %X0<kill>; mem:LD16[FixedStack17]

We spill vs11 by STXVD2X, but we reload it to vr3 by LVX.

Such test case is rare, I hit the issue when I was running 453.povray

// in texture.cpp, InitTextureTable()
int i;
for (i = 0; i < 4096; i++)
{
  hashTable[i] = i;
}

This simple for loop was translated into more than 1,000 lines assembly code (by the way, I thought the inefficient code gen here was related to non-legal vector type legalization that Nemanjai is fixing)

A lot of vmx and vsx registers were used, altivec and vsx instructions were interleaved, then we hit this issue.

tjablin updated this revision to Diff 60689.Jun 14 2016, 8:03 AM
tjablin edited edge metadata.

Another update from CY.

Changes:

Remove VSHRC, use existing VRRC instead, so we simplify register class hierarchy
Now, VSRC = (VSLRC, VRRC)
Because VSX can use vs0-vs63 or v0-v31, so we add a custom flag "UseVSXReg" to distinguish it, asm printer and PPCMCCodeEmitter will use that flag to print correct name and get correct encoding
Because VSRC = (VSLRC, VRRC), so backend is possible to spill/reload a vector register with incompatible vector store/load instruction, i.e. stxvd2x/lxv or stxv/lxvd2x, so we track stack slot to make sure we use correct store/load instruction.
Fix broken test cases because of this change.
select-i1-vs-i1.ll
Now we generate bettter code for all v4f32 test cases

vsx-p8.ll
vsx.ll
Now we generate better code when we use '-fast-isel -O0', because we don't need unnecessary register copy.

Testing:

Passed 3-stage bootstrap testing
Passed 3-stage llvm/clang check-all
Passed Spec2006 (binaries built by stage-3 clang/llvm) with:
(a) -m64 -mcpu=power8 -O3
(b) -m64 -mcpu=power8 -O1
hfinkel added inline comments.Jun 14 2016, 8:36 AM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
451 ↗(On Diff #60689)

Can you please use something like this instead:

MI->getDesc().OpInfo[MO's index].RegClass == PPC::VFRCRegClassID

and then get rid of the UseVSXReg flag?

lib/Target/PowerPC/PPCMachineFunctionInfo.h
118 ↗(On Diff #60689)

Use a DenseMap. You don't need to iterate over the ordering.

amehsan edited edge metadata.Jun 14 2016, 11:57 AM

As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.

As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.

Done! Thanks!

Hi Hal,

Looks like we can eliminate my stack slot tracking by force using STXVD2X/LXVD2X when HasVSX is true. I have tested my full test-set, we only need to update 2 llc test cases.

By the way, Spec2006 config I used in testing:

  1. -m64 -O3 -mcpu=power8
  2. -m64 -O1 -mcpu=power8 -mno-vsx

CY

lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
451 ↗(On Diff #60689)
  1. I tried it, but I got failed, the reason was: RegClass return Super Register Class name, i.e. VSRC in our case, so when I get VSRC, I won't be able to know it is VRRC or VSLRC. And we only want to translate VRRC, not VSLRC.
const MCInstrDesc &MCID = MI->getDesc();

if (MCID.TSFlags & PPCII::UseVSXReg) {
  int RCID = MCID.OpInfo[OpNo].RegClass;

  if (RCID == PPC::VRRCRegClassID)
    Reg = PPC::VSX32 + (Reg - PPC::V0);
  else if (RCID == PPC::VFRCRegClassID)
    Reg = PPC::VSX32 + (Reg - PPC::VF0);
}
  1. Probably not! Because the information is not enough. When the input register is VRRC, I won't know whether it is Altivec instruction, or VSX instruction, I need a flag for this. (so I need to tag the information in PPCInstrVSX.td)

So when UseVSXReg == true:

  1. MI is a VSX
  2. MI uses VSX registers

when UseVSXReg == false:

  1. MI can be a VSX or Altivec, but they both use Altivec registers
tjablin updated this revision to Diff 62764.Jul 5 2016, 9:32 AM
tjablin edited edge metadata.

Posting for CY.

  • Add -verify-machineinstrs for new testcases
  • Eliminate stack slot tracking by force using STXVD2X/LXVD2X when HasVSX is true, also update two testcases, and add a new testcase to guard this new behavior
  • Unfortunately, we might be able to remove "UseVSXReg" because when RC is VRRC, we need this flag to know we should treat it as vs32-vs63 or v0-v31.

Does anybody have any comment on this patch?
Thanks!

nemanjai commandeered this revision.Jun 21 2017, 9:00 PM
nemanjai edited reviewers, added: tjablin; removed: nemanjai.
nemanjai edited edge metadata.

This no longer applies an can be abandoned. We've implemented these instructions in two separate revisions and we've eliminated the VSHRC register class in yet another revision. There are no aspects of this that remain to be implemented.

nemanjai abandoned this revision.Jun 21 2017, 9:00 PM

Abandoning as per the previous comment.