This patch teaches LLVM how to generate the some of the Power 9 D-Form VSX instructions. Specifically, lxsd, lxssp, stxsd, and stxssp.
Diff Detail
Event Timeline
This is going to need some additional work to restrict the register sets for all the instructions. Of course, these are scalar loads/stores but they're restricted to the upper 32 VSX registers (the VMX registers) so we can't use the full vsfrc/vssrc register classes.
I missed Nemanjai's comment, I did some test and looks like we can use vfrc register calss instead:
- assembler/dis-assembler:
static DecodeStatus DecodeVFRCRegisterClass(MCInst &Inst, uint64_t RegNo, uint64_t Address, const void *Decoder) { return decodeRegisterClass(Inst, RegNo, VSFRegs); } void addRegVFRCOperands(MCInst &Inst, unsigned N) const { assert(N == 1 && "Invalid number of operands!"); Inst.addOperand(MCOperand::createReg(VSFRegs[getVSReg()])); }
- PPCInstrVSX.td
def PPCRegVFRCAsmOperand : AsmOperandClass { let Name = "RegVFRC"; let PredicateMethod = "isVSRegNumber"; } def vfrc : RegisterOperand<VFRC> { let ParserMatchClass = PPCRegVFRCAsmOperand; } // Load DWord def LXSD : DSForm_1<57, 2, (outs vfrc:$vD), (ins memrix:$src), "lxsd $vD, $src", IIC_LdStLFD, []>; // Load SP from src, convert it to DP, and place in dword[0] def LXSSP : DSForm_1<57, 3, (outs vfrc:$vD), (ins memrix:$src), "lxssp $vD, $src", IIC_LdStLFD, []>; // Store DWord def STXSD : DSForm_1<61, 2, (outs), (ins vfrc:$vS, memrix:$dst), "stxsd $vS, $dst", IIC_LdStSTFD, []>; // Convert DP of dword[0] to SP, and Store to dst def STXSSP : DSForm_1<61, 3, (outs), (ins vfrc:$vS, memrix:$dst), "stxssp $vS, $dst", IIC_LdStSTFD, []>; let AddedComplexity = 500 in { def : Pat<(f64 (load iaddr:$src)), (LXSD iaddr:$src)>; def : Pat<(f32 (load iaddr:$src)), (COPY_TO_REGCLASS (LXSSP iaddr:$src), VFRC)>; def : Pat<(f64 (extloadf32 iaddr:$src)), (COPY_TO_REGCLASS (LXSSP iaddr:$src), VFRC)>; def : Pat<(store f64:$vS, iaddr:$dst), (STXSD $vS, iaddr:$dst)>; def : Pat<(store f32:$vS, iaddr:$dst), (STXSSP (COPY_TO_REGCLASS $vS, VFRC), iaddr:$dst)>; }
- Test case result:
lxsd 35, 8(3) lxsd 36, 16(3) lxsd 32, 24(3) ...
Unfortunately, this still isn't quite the correct semantics. Although this will target the right physical registers, the encoding is wrong. These really are VR registers and have 5-bit fields in the encoding. Things like:
lxsd 35, 8(3)
are not likely to produce the desired results. These instructions need the VR register to be specified in the 0-31 range which will actually mean VSR 32-63.
As far as I can tell, the idea with these instructions is that we get scalar floating point values using the nice D-Form loads into the remaining VSR's (the FP D-Form loads can be used for VSRs 0-31).
I think perhaps the best way to handle these would be to define a new register class which will alias the VRRC registers, but has 64-bit spill size and can hold f64/f32.
Unfortunately, this still isn't quite the correct semantics. Although this will target the right physical registers, the encoding is wrong. These really are VR registers and have 5-bit fields in the encoding. Things like:
lxsd 35, 8(3)are not likely to produce the desired results. These instructions need the VR register to be specified in the 0-31 range which will actually mean VSR 32-63.
As far as I can tell, the idea with these instructions is that we get scalar floating point values using the nice D-Form loads into the remaining VSR's (the FP D-Form loads can be used for VSRs 0-31).I think perhaps the best way to handle these would be to define a new register class which will alias the VRRC registers, but has 64-bit spill size and can hold f64/f32.
Oh.. I see my mistake, thanks! I will fix this by defining a new register class.
From CY:
This patch prevent from defining new register class for vsx that uses 64-bit altivec register, it reuses VFRC register class, and does required changes when printing assembly code.
VsxUseAltivecReg and VFRC are used in "PPCInstPrinter::printOperand", "PPCAsmPrinter::printOperand", in order to get correct register name for vsx that uses 64-bit altivec register.
Thanks for working on this. I'd really like to see a unified solution here, both for this and for the high half of the VSX register file in general (i.e. using this same scheme to eliminate the VSRH registers).
Also, although I suggested using the instruction flag bits for this, I think we should be able to do this without them by looking at the operand descriptions's register class:
MI->getDesc().OpInfo[MO's index].RegClass == PPC::VFRCRegClassID
lib/Target/PowerPC/PPCInstrFormats.td | ||
---|---|---|
41 | Please spell VSX here in all caps (and in the flag name). |
Thanks for working on this. I'd really like to see a unified solution here, both for this and for the high half of the VSX register file in general (i.e. using this same scheme to eliminate the VSRH registers).
Let me try again : )
So the unified solution, by my understanding, is: we should not define a new register class that is actually alias with existing register class, we can use customized instruction flags and custom c++ code to handle their difference part, e.g. name. Benefits I can image are: we don't have to teach backend that the two register class are the same, we can simplify register class hierarchy.
Our current register hierarchy:
- FPR (64-bit float register): F0 - F31 (f0 - f31)
- VF (64-bit VSX register): VF0 - VF31 (vs32 - vs63)
- VR (128-bit Altivec register): V0 - V31 (v0 - v31), overlap with VF (vs32 - vs63)
- VSRL (128-bit VSX register): VSL0 - VSL31 (vs0 - vs31), overlap with FPR (f0 - f31)
- VSRH (128-bit VSX register): VSH0 - VSH31 (vs32 - vs63), overlap with VR (v0 - v31)
Our current register class:
- F8RC: [f64] F0 - F31
- VFRC: [f64] VF0 - VF31
- VRRC: [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32] V0 - V31
- VSLRC: [v4i32,v4f32,v2f64,v2i64] VSL0 - VSL31
- VSHRC:[v4i32,v4f32,v2f64,v2i64] VSH0 - VSH31
- VSRC: [v4i32,v4f32,v2f64,v2i64] VSLRC+VSHRC
- VSFRC: [f64] F8RC, VFRC
- VSSRC: [f32] F8RC, VFRC
I will remove VSRH, and below is my imaged modification:
In PPCRegisterInfo.td:
- Eliminate all VSRH and related register definitions
class VSRH<VR SubReg, string n> : PPCReg<n> { ... } foreach Index = 0-31 in { def VSH#Index : VSRH<!cast<VR>("V"#Index), "vs" # !add(Index, 32)>, DwarfRegAlias<!cast<VR>("V"#Index)>; } def VSHRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSH2, VSH3, VSH4, VSH5, VSH0, VSH1, VSH6, VSH7, ... )>;
- For VSRC, change VSHRC to VFRC
- I can image that we will also eliminate some code in PPCVSXCopy.cpp, because we won't have this copy: "VSRC.VSHRC <---> VSFRC.VFRC"
def VSRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSLRC, VFRC)>;
- Change VF's definition:
foreach Index = 0-31 in { def VF#Index : VF<Index, "v" #Index>, DwarfRegNum<[!add(Index, 77), !add(Index, 77)]; }
- Add v2f64 to VRRC's supported types
- Because we eliminate VSHRC, so we should add its v2f64 to VRRC
def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v2f64,v1i128,v4f32], 128,
In PPCISelLowering.cpp/PPCCallingConv.td
- revert changes of llvm c9de9e60b90 (r205041)
change1: case MVT::v2f64: case MVT::v2i64: RC = &PPC::VSHRCRegClass; => VRRCRegClass; change2: MF.addLiveIn(VSRH[VR_idx], &PPC::VSHRCRegClass) : => remove (4 places) change3: remove: CCIfType<[v2f64, v2i64], CCAssignToReg<[VSH2]>> CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()", CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>> add v2f64, v2i64 back to [v16i8, v8i16, v4i32, v4f32]
In PPCAsmPrinter.cpp/PPCInstPrinter.cpp
printOperand() if the register belongs to upper VSX registers (and now we use VFRC to represent it), we need translate it from v0-v31 to vs32-vs63
In PPCAsmParser.cpp/PPCDisassembler.cpp/PPCRegisterInfo.cpp/PPCVSXCopy.cpp
- Just remove VSH related code
- Original Message -----
From: "Chuang-Yu Cheng" <cycheng@multicorewareinc.com>
To: tjablin@gmail.com, amehsan@ca.ibm.com, cycheng@multicorewareinc.com, "nemanja i ibm" <nemanja.i.ibm@gmail.com>,
hfinkel@anl.gov
Cc: llvm-commits@lists.llvm.org
Sent: Wednesday, June 1, 2016 2:58:55 AM
Subject: Re: [PATCH] D20310: Teach LLVM about Power 9 D-Form VSX Instructionscycheng added a comment.
Thanks for working on this. I'd really like to see a unified
solution here, both for this and for the high half of the VSX
register file in general (i.e. using this same scheme to eliminate
the VSRH registers).Let me try again : )
So the unified solution, by my understanding, is: we should not
define a new register class that is actually alias with existing
register class, we can use customized instruction flags and custom
c++ code to handle their difference part, e.g. name. Benefits I can
image are: we don't have to teach backend that the two register
class are the same, we can simplify register class hierarchy.
Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.
Thanks again,
Hal
Our current register hierarchy:
- FPR (64-bit float register): F0 - http://reviews.llvm.org/F31 (f0 -
f31)
- VF (64-bit VSX register): VF0 - VF31 (vs32 - vs63)
- VR (128-bit Altivec register): V0 - V31 (v0 - v31), overlap with VF
(vs32 - vs63)
- VSRL (128-bit VSX register): VSL0 - VSL31 (vs0 - vs31), overlap
with FPR (f0 - f31)
- VSRH (128-bit VSX register): VSH0 - VSH31 (vs32 - vs63), overlap
with VR (v0 - v31)
Our current register class:
- F8RC: [f64] F0 - http://reviews.llvm.org/F31
- VFRC: [f64] VF0 - VF31
- VRRC: [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32] V0 - V31
- VSLRC: [v4i32,v4f32,v2f64,v2i64] VSL0 - VSL31
- VSHRC:[v4i32,v4f32,v2f64,v2i64] VSH0 - VSH31
- VSRC: [v4i32,v4f32,v2f64,v2i64] VSLRC+VSHRC
- VSFRC: [f64] F8RC, VFRC
- VSSRC: [f32] F8RC, VFRC
I will remove VSRH, and below is my imaged modification:
In PPCRegisterInfo.td:
- Eliminate all VSRH and related register definitions
class VSRH<VR SubReg, string n> : PPCReg<n> { ... }
foreach Index = 0-31 in { def VSH#Index : VSRH<!cast<VR>("V"#Index), "vs" # !add(Index, 32)>, DwarfRegAlias<!cast<VR>("V"#Index)>; }
def VSHRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSH2, VSH3, VSH4, VSH5, VSH0, VSH1, VSH6, VSH7, ... )>;
- For VSRC, change VSHRC to VFRC
- I can image that we will also eliminate some code in PPCVSXCopy.cpp, because we won't have this copy: "VSRC.VSHRC <---> VSFRC.VFRC"
def VSRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128, (add VSLRC, VFRC)>;
- Change VF's definition:
foreach Index = 0-31 in { def VF#Index : VF<Index, "v" #Index>, DwarfRegNum<[!add(Index, 77), !add(Index, 77)]; }
- Add v2f64 to VRRC's supported types
- Because we eliminate VSHRC, so we should add its v2f64 to VRRC
def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v2f64,v1i128,v4f32], 128,In PPCISelLowering.cpp/PPCCallingConv.td
- revert changes of llvm c9de9e60b90 (r205041)
change1: case MVT::v2f64: case MVT::v2i64: RC = &PPC::VSHRCRegClass; => VRRCRegClass;
change2: MF.addLiveIn(VSRH[VR_idx], &PPC::VSHRCRegClass) : => remove (4 places)
change3: remove: CCIfType<[v2f64, v2i64], CCAssignToReg<[VSH2]>> CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()", CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>
add v2f64, v2i64 back to [v16i8, v8i16, v4i32, v4f32]In PPCAsmPrinter.cpp/PPCInstPrinter.cpp
printOperand() if the register belongs to upper VSX registers (and now we use VFRC to represent it), we need translate it from v0-v31 to vs32-vs63`In
PPCAsmParser.cpp/PPCDisassembler.cpp/PPCRegisterInfo.cpp/PPCVSXCopy.cpp`
- Just remove VSH related code
Yes, except that I'm not sure that we want to remove the register classes, just the register definitions themselves. This makes the change smaller, and also does not force us to add VSX-only data types to the Altivec register classes. In short, add VR0-31 directly to VSHRC. Does that make sense? Then you'll need some, but not all, of the changes you outline below.
Thanks again,
Hal
Hi Hal,
I’ve seen some benefits because of this elimination, when I was fixing test case failures, I found we could generate shorter code than before. So I thought it is a right direction to simplify register hierarchy.
I had update 7 codegen test cases, but I still had one test case failure, that was encoding checking, I will fix it soon.
The new VSRC is composed of (VSLRC, VRRC), so I have removed all VSH related def and use.
CY
Okay, so you're saying that we generate even better code by eliminating VSHRC in addition to eliminating the register definitions themselves. In that case, I look forward to the updated patch :-)
Thanks again, Hal
CY
Eliminating VSHRC brought up a new issue for me, but I have fixed it. Tom will upload the new patch later (the patch passed all of my testing on Pwr8).
The issue was:
Because for now, VSRC = (VSLRC, VRRC), so when spill vs0-63 or vr0-31 to stack, backend was possible to generate such code:
STXVD2X %VSL11<kill>, %X31, %X0<kill>, %RM<imp-use>; mem:ST16[FixedStack17] ... %V3<def> = LVX %X31, %X0<kill>; mem:LD16[FixedStack17]
We spill vs11 by STXVD2X, but we reload it to vr3 by LVX.
Such test case is rare, I hit the issue when I was running 453.povray
// in texture.cpp, InitTextureTable() int i; for (i = 0; i < 4096; i++) { hashTable[i] = i; }
This simple for loop was translated into more than 1,000 lines assembly code (by the way, I thought the inefficient code gen here was related to non-legal vector type legalization that Nemanjai is fixing)
A lot of vmx and vsx registers were used, altivec and vsx instructions were interleaved, then we hit this issue.
Another update from CY.
Changes:
Remove VSHRC, use existing VRRC instead, so we simplify register class hierarchy Now, VSRC = (VSLRC, VRRC) Because VSX can use vs0-vs63 or v0-v31, so we add a custom flag "UseVSXReg" to distinguish it, asm printer and PPCMCCodeEmitter will use that flag to print correct name and get correct encoding Because VSRC = (VSLRC, VRRC), so backend is possible to spill/reload a vector register with incompatible vector store/load instruction, i.e. stxvd2x/lxv or stxv/lxvd2x, so we track stack slot to make sure we use correct store/load instruction. Fix broken test cases because of this change. select-i1-vs-i1.ll Now we generate bettter code for all v4f32 test cases vsx-p8.ll vsx.ll Now we generate better code when we use '-fast-isel -O0', because we don't need unnecessary register copy.
Testing:
Passed 3-stage bootstrap testing Passed 3-stage llvm/clang check-all Passed Spec2006 (binaries built by stage-3 clang/llvm) with: (a) -m64 -mcpu=power8 -O3 (b) -m64 -mcpu=power8 -O1
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
451 | Can you please use something like this instead: MI->getDesc().OpInfo[MO's index].RegClass == PPC::VFRCRegClassID and then get rid of the UseVSXReg flag? | |
lib/Target/PowerPC/PPCMachineFunctionInfo.h | ||
118 ↗ | (On Diff #60689) | Use a DenseMap. You don't need to iterate over the ordering. |
As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.
Done! Thanks!
Hi Hal,
Looks like we can eliminate my stack slot tracking by force using STXVD2X/LXVD2X when HasVSX is true. I have tested my full test-set, we only need to update 2 llc test cases.
By the way, Spec2006 config I used in testing:
- -m64 -O3 -mcpu=power8
- -m64 -O1 -mcpu=power8 -mno-vsx
CY
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
451 |
const MCInstrDesc &MCID = MI->getDesc(); if (MCID.TSFlags & PPCII::UseVSXReg) { int RCID = MCID.OpInfo[OpNo].RegClass; if (RCID == PPC::VRRCRegClassID) Reg = PPC::VSX32 + (Reg - PPC::V0); else if (RCID == PPC::VFRCRegClassID) Reg = PPC::VSX32 + (Reg - PPC::VF0); }
So when UseVSXReg == true:
when UseVSXReg == false:
|
Posting for CY.
- Add -verify-machineinstrs for new testcases
- Eliminate stack slot tracking by force using STXVD2X/LXVD2X when HasVSX is true, also update two testcases, and add a new testcase to guard this new behavior
- Unfortunately, we might be able to remove "UseVSXReg" because when RC is VRRC, we need this flag to know we should treat it as vs32-vs63 or v0-v31.
This no longer applies an can be abandoned. We've implemented these instructions in two separate revisions and we've eliminated the VSHRC register class in yet another revision. There are no aspects of this that remain to be implemented.
Can you please use something like this instead:
and then get rid of the UseVSXReg flag?