This is an archive of the discontinued LLVM Phabricator instance.


Differential D34815

[Power9] Spill gprs to vector registers rather than stack
ClosedPublic

Authored by syzaara on Jun 29 2017, 9:12 AM.


Details

Reviewers

kbarton
nemanjai
lei
sfertile
jtony
inouehrs
stefanp
gyiu
hfinkel
echristo

Commits

rGfcd9697d72ee: [Power9] Spill gprs to vector registers rather than stack
rL313886: [Power9] Spill gprs to vector registers rather than stack

Summary

This patch updates register allocation so that GPRs can be spilled to vector registers rather than to the stack. It adds a new register class, GPFPRC, which is a superclass of G8RC and VSFRC; getLargestLegalSuperClass then returns GPFPRC when given G8RC. The patch also adds post-RA pseudo instructions (VSRSPILL_LD, VSRSPILL_ST), used in LoadRegFromStackSlot and StoreRegToStackSlot, for spilling a register of the new class. These are expanded after register allocation to either a scalar or a vector load or store, depending on the allocated register.
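The expansion step described in the summary can be sketched outside of LLVM as a small lookup. This is a toy model, not the patch's code: `RegKind`, `SpillPseudo`, and `expandSpillPseudo` are hypothetical names, and the opcode pairs are the ones that appear in the PPCInstrInfo.cpp diff of this revision.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins (not LLVM types) for the two kinds of physical register a
// GPFPRC value can be assigned to by the register allocator.
enum class RegKind { GPR, VSR };

// The spill pseudos from the patch: D-form (reg+imm) and X-form (reg+reg).
enum class SpillPseudo { VSRSPILL_LD, VSRSPILL_ST, VSRSPILL_LDX, VSRSPILL_STX };

// Return the name of the real opcode that replaces the pseudo once the
// allocated register is known.
std::string expandSpillPseudo(SpillPseudo P, RegKind Allocated) {
  const bool IsVSR = (Allocated == RegKind::VSR);
  switch (P) {
  case SpillPseudo::VSRSPILL_LD:  return IsVSR ? "DFLOADf64"  : "LD";
  case SpillPseudo::VSRSPILL_ST:  return IsVSR ? "DFSTOREf64" : "STD";
  case SpillPseudo::VSRSPILL_LDX: return IsVSR ? "LXSDX"      : "LDX";
  case SpillPseudo::VSRSPILL_STX: return IsVSR ? "STXSDX"     : "STDX";
  }
  assert(false && "unknown spill pseudo");
  return "";
}
```

For example, `expandSpillPseudo(SpillPseudo::VSRSPILL_ST, RegKind::GPR)` yields `"STD"`, the ordinary 8-byte GPR store.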


Event Timeline

syzaara created this revision. Jun 29 2017, 9:12 AM

Herald added a subscriber: qcolombet. Jun 29 2017, 9:12 AM

Looks interesting.
This patch potentially increases the number of VSR saves and restores in the function prologue/epilogue (depending on which VSR is selected for spilling). Is my understanding correct?

lib/Target/PowerPC/PPCInstrInfo.cpp
52	I find this name somewhat misleading, since the MTVSR instruction may be used for other purposes. NumGPRtoVSRSpill or something similar?
test/CodeGen/PowerPC/gpr-vsr-spill2.ll
25	What is the intent of this complicated test case, which has no spills to a VSR?

inouehrs added inline comments. Jun 29 2017, 10:55 AM

test/CodeGen/PowerPC/gpr-vsr-spill.ll
19	Actually, I don't understand why we need a spill here. The inline asm clobbers all GPRs except r30 and r31, so why don't we just use r30 and r31 for %a and %b?

syzaara added inline comments. Jun 29 2017, 11:10 AM

test/CodeGen/PowerPC/gpr-vsr-spill2.ll
25	This case shows how a spill of the new register class is handled. Here a GPR was spilled to a GPFPR, and the new register was itself a GPR. We then needed to spill the new GPFPR using either a scalar store or a vector store, depending on the allocated register.

syzaara added inline comments. Jun 29 2017, 11:31 AM

test/CodeGen/PowerPC/gpr-vsr-spill.ll
19	Yes, but we need a register to save the result of the add. The result register used for the add is r30 and so one of the input parameters is spilled.
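The reasoning in this exchange comes down to a rough register count. The helper below is a hypothetical back-of-the-envelope model, not how LLVM's register allocator works: it assumes that every value live across the inline asm, plus the asm's result, needs one of the GPRs that the asm does not clobber.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of values that cannot stay in a GPR across the
// inline asm and therefore need a home elsewhere (stack slot or VSR).
int valuesForcedOutOfGPRs(int LiveAcrossAsm, int DefinedByAsm,
                          int UnclobberedGPRs) {
  assert(LiveAcrossAsm >= 0 && DefinedByAsm >= 0 && UnclobberedGPRs >= 0);
  return std::max(0, LiveAcrossAsm + DefinedByAsm - UnclobberedGPRs);
}
```

In the test, %a and %b are both used after the asm (two live values), the asm defines one result, and only r30 and r31 are unclobbered, so `valuesForcedOutOfGPRs(2, 1, 2)` is 1. That matches the single `mtvsrd` in the CHECK lines.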

hfinkel added inline comments. Jul 3 2017, 8:20 PM

test/CodeGen/PowerPC/gpr-vsr-spill2.ll
1	Having this as an IR-level test seems fragile. Could you make this into a (simpler) MIR test that shows the behavior?

nemanjai added inline comments. Jul 7 2017, 2:02 PM

lib/Target/PowerPC/PPCInstrInfo.cpp
1994	Well, this is a pseudo that requires being `expandPostRAPseudo()`-ed. Wouldn't we want to say `return expandPostRAPseudo(MI)` here?

Overall, I like the patch. Seems quite nice and simple. However, I'm really not a fan of the naming convention. It is not clear to me why someone would be expected to make the connection between "GPFPRC" and "VSRSPILL". I think those should use the same base name to make the connection clear. Furthermore, I don't really think you should convey what registers are in the register class, but what the register class is used for. Perhaps the class and the related artifacts should be something like SPILLTOVSRRC and SPILLTOVSR_LD, etc. Perhaps other reviewers can chime in here as well.

Also, I think an important opportunity is lost since we don't do this for GPRRC. Of course, it doesn't have to be done in this patch, but I think a comment including a FIXME indicating this limitation is in order. Then if this turns out to be a performance win, we can follow this up with a patch that handles the 32-bit registers as well.

Only spill to volatile VSRs, since spilling to non-volatile VSRs adds save/restore code to the prologue/epilogue and degrades performance.
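To make the volatile/non-volatile distinction concrete, here is a minimal sketch of a volatility check over VSX register numbers. It assumes the ELFv2 convention that f0-f13 and v0-v19 are volatile; `isVolatileVSR` is a hypothetical helper, and the register class in the patch may express this restriction differently.

```cpp
#include <cassert>

// VSX register numbers 0-63: VSR0-31 overlay the floating-point registers
// f0-f31, and VSR32-63 overlay the Altivec registers v0-v31.
// Assumption: ELFv2 calling convention (f0-f13 and v0-v19 are volatile).
bool isVolatileVSR(unsigned VSRNum) {
  assert(VSRNum < 64 && "VSX has 64 registers");
  if (VSRNum < 32)
    return VSRNum <= 13;       // f0-f13
  return (VSRNum - 32) <= 19;  // v0-v19
}
```

A spill target chosen only from registers for which this returns true never forces an extra callee-save in the prologue, which is the cost the note above is avoiding.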

syzaara added inline comments. Jul 31 2017, 1:11 PM

test/CodeGen/PowerPC/gpr-vsr-spill2.ll
1	I tried to create an MIR case from this, but the limitation with MIR tests identified in https://reviews.llvm.org/D33562, where MachineFunctionInfo is not saved/dumped when emitting .mir, leads to machine verifier errors. I tried changing the global vars to local vars to get around this limitation. However, doing that no longer reproduces the narrowed case, so I will leave this as an IR test.

Other than a few minor inline comments, this LGTM.

Perhaps some of the other reviewers want to chime in on this. Otherwise please address those nits and commit.

lib/Target/PowerPC/PPCInstrInfo.cpp
910	I think for most (all?) other conditions, we have the source first. Please stick to that convention here as well.
2002	Just a nit. The register you're spilling is the source and the stack slot you're spilling it to is the target. So calling it `TargetReg` is a bit misleading when it's a store. :)
lib/Target/PowerPC/PPCRegisterInfo.cpp
53	Nit: it's not actually called `gp8rc` but `g8rc` if I remember correctly.
323	`// For Power9 we allow the user to enable GPR to vector spills.` Since we don't currently enable it by default even on Power9.
326	Please add a check for ELFv2 ABI. We are allowing spills only to the volatile VSR's, so we want to enable this only on the ABI where the VSR's we've selected are actually volatile.
lib/Target/PowerPC/PPCRegisterInfo.td
308	`// Allow spilling GPR's into caller-saved VSR's.`
test/CodeGen/PowerPC/gpr-vsr-spill2.ll
2	As implemented, this test case doesn't really test anything meaningful. It really just tests that there's a reg-to-reg copy (implemented as a move-register) followed by a spill of the target register. The two could be separated by arbitrary amount of code (including redefinition of the register). Unless you can add more meaningful testing to this complicated test case, I would simply get rid of it.
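Taken together, the comments on PPCRegisterInfo.cpp ask for the inflation of G8RC to be gated on three things: Power9 vector support, the user option, and the ELFv2 ABI. A minimal sketch of such a gate follows, with `SubtargetModel`, `RegClass`, and `largestLegalSuperClass` as hypothetical stand-ins rather than LLVM's real types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the register classes involved.
enum class RegClass { G8RC, GPRC, GPFPRC };

// Hypothetical model of the subtarget queries the gate would need.
struct SubtargetModel {
  bool HasP9Vector;
  bool IsELFv2ABI;
};

// Inflate G8RC to the spill class only when every condition holds;
// otherwise leave the register class unchanged.
RegClass largestLegalSuperClass(RegClass RC, const SubtargetModel &ST,
                                bool EnableGPRToVecSpills) {
  if (ST.HasP9Vector && ST.IsELFv2ABI && EnableGPRToVecSpills &&
      RC == RegClass::G8RC)
    return RegClass::GPFPRC;
  return RC;
}
```

Note that GPRC is left alone in this sketch, which mirrors the limitation raised earlier in the review: 32-bit registers are not handled.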

Forgot to accept :).

This revision is now accepted and ready to land. Sep 18 2017, 6:30 AM

Addressed review comments.

Closed by commit rL313886: [Power9] Spill gprs to vector registers rather than stack (authored by syzaara). Sep 21 2017, 9:14 AM

This revision was automatically updated to reflect the committed changes.

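Revision Contents

Path	Size
lib/Target/PowerPC/PPCInstrInfo.cpp	67 lines
lib/Target/PowerPC/PPCInstrVSX.td	24 lines
lib/Target/PowerPC/PPCRegisterInfo.cpp	10 lines
lib/Target/PowerPC/PPCRegisterInfo.td	2 lines
test/CodeGen/PowerPC/gpr-vsr-spill.ll	24 lines
test/CodeGen/PowerPC/gpr-vsr-spill2.ll	129 lines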

Diff 104661

lib/Target/PowerPC/PPCInstrInfo.cpp

[40 lines not shown]
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "ppc-instr-info"		#define DEBUG_TYPE "ppc-instr-info"

#define GET_INSTRMAP_INFO		#define GET_INSTRMAP_INFO
#define GET_INSTRINFO_CTOR_DTOR		#define GET_INSTRINFO_CTOR_DTOR
#include "PPCGenInstrInfo.inc"		#include "PPCGenInstrInfo.inc"


		STATISTIC(StoreVSRSPILLVec, "Number of vector spills to stack of gpfprc");
		STATISTIC(StoreVSRSPILLGpr, "Number of gpr spills to stack of gpfprc");
		STATISTIC(NumMTVSR, "Number of gpr spills to gpfprc");

static cl::		static cl::
opt<bool> DisableCTRLoopAnal("disable-ppc-ctrloop-analysis", cl::Hidden,		opt<bool> DisableCTRLoopAnal("disable-ppc-ctrloop-analysis", cl::Hidden,
cl::desc("Disable analysis for CTR loops"));		cl::desc("Disable analysis for CTR loops"));

static cl::opt<bool> DisableCmpOpt("disable-ppc-cmp-opt",		static cl::opt<bool> DisableCmpOpt("disable-ppc-cmp-opt",
cl::desc("Disable compare instruction optimization"), cl::Hidden);		cl::desc("Disable compare instruction optimization"), cl::Hidden);

static cl::opt<bool> VSXSelfCopyCrash("crash-on-ppc-vsx-self-copy",		static cl::opt<bool> VSXSelfCopyCrash("crash-on-ppc-vsx-self-copy",
[218 lines not shown]	unsigned PPCInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
case PPC::RESTORE_CRBIT:		case PPC::RESTORE_CRBIT:
case PPC::LVX:		case PPC::LVX:
case PPC::LXVD2X:		case PPC::LXVD2X:
case PPC::LXVX:		case PPC::LXVX:
case PPC::QVLFDX:		case PPC::QVLFDX:
case PPC::QVLFSXs:		case PPC::QVLFSXs:
case PPC::QVLFDXb:		case PPC::QVLFDXb:
case PPC::RESTORE_VRSAVE:		case PPC::RESTORE_VRSAVE:
		case PPC::VSRSPILL_LD:
// Check for the operands added by addFrameReference (the immediate is the		// Check for the operands added by addFrameReference (the immediate is the
// offset which defaults to 0).		// offset which defaults to 0).
if (MI.getOperand(1).isImm() && !MI.getOperand(1).getImm() &&		if (MI.getOperand(1).isImm() && !MI.getOperand(1).getImm() &&
MI.getOperand(2).isFI()) {		MI.getOperand(2).isFI()) {
FrameIndex = MI.getOperand(2).getIndex();		FrameIndex = MI.getOperand(2).getIndex();
return MI.getOperand(0).getReg();		return MI.getOperand(0).getReg();
}		}
break;		break;
[14 lines not shown]	unsigned PPCInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
case PPC::SPILL_CRBIT:		case PPC::SPILL_CRBIT:
case PPC::STVX:		case PPC::STVX:
case PPC::STXVD2X:		case PPC::STXVD2X:
case PPC::STXVX:		case PPC::STXVX:
case PPC::QVSTFDX:		case PPC::QVSTFDX:
case PPC::QVSTFSXs:		case PPC::QVSTFSXs:
case PPC::QVSTFDXb:		case PPC::QVSTFDXb:
case PPC::SPILL_VRSAVE:		case PPC::SPILL_VRSAVE:
		case PPC::VSRSPILL_ST:
// Check for the operands added by addFrameReference (the immediate is the		// Check for the operands added by addFrameReference (the immediate is the
// offset which defaults to 0).		// offset which defaults to 0).
if (MI.getOperand(1).isImm() && !MI.getOperand(1).getImm() &&		if (MI.getOperand(1).isImm() && !MI.getOperand(1).getImm() &&
MI.getOperand(2).isFI()) {		MI.getOperand(2).isFI()) {
FrameIndex = MI.getOperand(2).getIndex();		FrameIndex = MI.getOperand(2).getIndex();
return MI.getOperand(0).getReg();		return MI.getOperand(0).getReg();
}		}
break;		break;
[568 lines not shown]	if (PPC::CRBITRCRegClass.contains(SrcReg) &&
BuildMI(MBB, I, DL, get(PPC::MFOCRF8), DestReg).addReg(SrcReg);		BuildMI(MBB, I, DL, get(PPC::MFOCRF8), DestReg).addReg(SrcReg);
getKillRegState(KillSrc);		getKillRegState(KillSrc);
return;		return;
} else if (PPC::CRRCRegClass.contains(SrcReg) &&		} else if (PPC::CRRCRegClass.contains(SrcReg) &&
PPC::GPRCRegClass.contains(DestReg)) {		PPC::GPRCRegClass.contains(DestReg)) {
BuildMI(MBB, I, DL, get(PPC::MFOCRF), DestReg).addReg(SrcReg);		BuildMI(MBB, I, DL, get(PPC::MFOCRF), DestReg).addReg(SrcReg);
getKillRegState(KillSrc);		getKillRegState(KillSrc);
return;		return;
		} else if (PPC::G8RCRegClass.contains(SrcReg) &&
		PPC::VSFRCRegClass.contains(DestReg)) {
		BuildMI(MBB, I, DL, get(PPC::MTVSRD), DestReg).addReg(SrcReg);
		NumMTVSR++;
		getKillRegState(KillSrc);
		return;
		} else if (PPC::G8RCRegClass.contains(DestReg) &&
		PPC::VSFRCRegClass.contains(SrcReg)) {
		BuildMI(MBB, I, DL, get(PPC::MFVSRD), DestReg).addReg(SrcReg);
		getKillRegState(KillSrc);
		return;
}		}

unsigned Opc;		unsigned Opc;
if (PPC::GPRCRegClass.contains(DestReg, SrcReg))		if (PPC::GPRCRegClass.contains(DestReg, SrcReg))
Opc = PPC::OR;		Opc = PPC::OR;
else if (PPC::G8RCRegClass.contains(DestReg, SrcReg))		else if (PPC::G8RCRegClass.contains(DestReg, SrcReg))
Opc = PPC::OR8;		Opc = PPC::OR8;
else if (PPC::F4RCRegClass.contains(DestReg, SrcReg))		else if (PPC::F4RCRegClass.contains(DestReg, SrcReg))
Opc = PPC::FMR;		Opc = PPC::FMR;
[127 lines not shown]	NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVSTFSXs))
FrameIdx));		FrameIdx));
NonRI = true;		NonRI = true;
} else if (PPC::QBRCRegClass.hasSubClassEq(RC)) {		} else if (PPC::QBRCRegClass.hasSubClassEq(RC)) {
NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVSTFDXb))		NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVSTFDXb))
.addReg(SrcReg,		.addReg(SrcReg,
getKillRegState(isKill)),		getKillRegState(isKill)),
FrameIdx));		FrameIdx));
NonRI = true;		NonRI = true;
		} else if (PPC::GPFPRCRegClass.hasSubClassEq(RC)) {
		NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::VSRSPILL_ST))
		.addReg(SrcReg,
		getKillRegState(isKill)),
		FrameIdx));
} else {		} else {
llvm_unreachable("Unknown regclass!");		llvm_unreachable("Unknown regclass!");
}		}

return false;		return false;
}		}

void		void
[105 lines not shown]	bool PPCInstrInfo::LoadRegFromStackSlot(MachineFunction &MF, const DebugLoc &DL,
} else if (PPC::QSRCRegClass.hasSubClassEq(RC)) {		} else if (PPC::QSRCRegClass.hasSubClassEq(RC)) {
NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVLFSXs), DestReg),		NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVLFSXs), DestReg),
FrameIdx));		FrameIdx));
NonRI = true;		NonRI = true;
} else if (PPC::QBRCRegClass.hasSubClassEq(RC)) {		} else if (PPC::QBRCRegClass.hasSubClassEq(RC)) {
NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVLFDXb), DestReg),		NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::QVLFDXb), DestReg),
FrameIdx));		FrameIdx));
NonRI = true;		NonRI = true;
		} else if (PPC::GPFPRCRegClass.hasSubClassEq(RC)) {
		NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::VSRSPILL_LD),
		DestReg), FrameIdx));
} else {		} else {
llvm_unreachable("Unknown regclass!");		llvm_unreachable("Unknown regclass!");
}		}

return false;		return false;
}		}

void		void
[787 lines not shown]	case PPC::DFSTOREf64: {
if ((TargetReg >= PPC::F0 && TargetReg <= PPC::F31) ||		if ((TargetReg >= PPC::F0 && TargetReg <= PPC::F31) ||
(TargetReg >= PPC::VSL0 && TargetReg <= PPC::VSL31))		(TargetReg >= PPC::VSL0 && TargetReg <= PPC::VSL31))
Opcode = LowerOpcode;		Opcode = LowerOpcode;
else		else
Opcode = UpperOpcode;		Opcode = UpperOpcode;
MI.setDesc(get(Opcode));		MI.setDesc(get(Opcode));
return true;		return true;
}		}
		case PPC::VSRSPILL_LD: {
		unsigned TargetReg = MI.getOperand(0).getReg();
		if (PPC::VSFRCRegClass.contains(TargetReg))
		MI.setDesc(get(PPC::DFLOADf64));
		else
		MI.setDesc(get(PPC::LD));
		return true;
		}
		case PPC::VSRSPILL_ST: {
		unsigned TargetReg = MI.getOperand(0).getReg();
		if (PPC::VSFRCRegClass.contains(TargetReg)) {
		StoreVSRSPILLVec++;
		MI.setDesc(get(PPC::DFSTOREf64));
		} else {
		StoreVSRSPILLGpr++;
		MI.setDesc(get(PPC::STD));
		}
		return true;
		}
		case PPC::VSRSPILL_LDX: {
		unsigned TargetReg = MI.getOperand(0).getReg();
		if (PPC::VSFRCRegClass.contains(TargetReg))
		MI.setDesc(get(PPC::LXSDX));
		else
		MI.setDesc(get(PPC::LDX));
		return true;
		}
		case PPC::VSRSPILL_STX: {
		unsigned TargetReg = MI.getOperand(0).getReg();
		if (PPC::VSFRCRegClass.contains(TargetReg)) {
		StoreVSRSPILLVec++;
		MI.setDesc(get(PPC::STXSDX));
		} else {
		StoreVSRSPILLGpr++;
		MI.setDesc(get(PPC::STDX));
		}
		return true;
		}

case PPC::CFENCE8: {		case PPC::CFENCE8: {
auto Val = MI.getOperand(0).getReg();		auto Val = MI.getOperand(0).getReg();
BuildMI(MBB, MI, DL, get(PPC::CMPW), PPC::CR7).addReg(Val).addReg(Val);		BuildMI(MBB, MI, DL, get(PPC::CMPW), PPC::CR7).addReg(Val).addReg(Val);
BuildMI(MBB, MI, DL, get(PPC::CTRL_DEP))		BuildMI(MBB, MI, DL, get(PPC::CTRL_DEP))
.addImm(PPC::PRED_NE_MINUS)		.addImm(PPC::PRED_NE_MINUS)
.addReg(PPC::CR7)		.addReg(PPC::CR7)
.addImm(1);		.addImm(1);
MI.setDesc(get(PPC::ISYNC));		MI.setDesc(get(PPC::ISYNC));
[17 lines not shown]

lib/Target/PowerPC/PPCInstrVSX.td

[41 lines not shown]

def PPCRegVSSRCAsmOperand : AsmOperandClass {		def PPCRegVSSRCAsmOperand : AsmOperandClass {
let Name = "RegVSSRC"; let PredicateMethod = "isVSRegNumber";		let Name = "RegVSSRC"; let PredicateMethod = "isVSRegNumber";
}		}
def vssrc : RegisterOperand<VSSRC> {		def vssrc : RegisterOperand<VSSRC> {
let ParserMatchClass = PPCRegVSSRCAsmOperand;		let ParserMatchClass = PPCRegVSSRCAsmOperand;
}		}

		def PPCRegGPFPRCAsmOperand : AsmOperandClass {
		let Name = "RegGPFPRC"; let PredicateMethod = "isVSRegNumber";
		}

		def gpfprc : RegisterOperand<GPFPRC> {
		let ParserMatchClass = PPCRegGPFPRCAsmOperand;
		}
// Little-endian-specific nodes.		// Little-endian-specific nodes.
def SDT_PPClxvd2x : SDTypeProfile<1, 1, [		def SDT_PPClxvd2x : SDTypeProfile<1, 1, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>		SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;		]>;
def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [		def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>		SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;		]>;
def SDT_PPCxxswapd : SDTypeProfile<1, 1, [		def SDT_PPCxxswapd : SDTypeProfile<1, 1, [
[2,640 lines not shown]	def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
[(store f64:$XT, iaddr:$dst)]>;		[(store f64:$XT, iaddr:$dst)]>;
}		}
def : Pat<(f64 (extloadf32 iaddr:$src)),		def : Pat<(f64 (extloadf32 iaddr:$src)),
(COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;		(COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;
def : Pat<(f32 (fpround (extloadf32 iaddr:$src))),		def : Pat<(f32 (fpround (extloadf32 iaddr:$src))),
(f32 (DFLOADf32 iaddr:$src))>;		(f32 (DFLOADf32 iaddr:$src))>;
} // end HasP9Vector, AddedComplexity		} // end HasP9Vector, AddedComplexity

		let Predicates = [HasP9Vector] in {
		let isPseudo = 1 in {
		let mayStore = 1 in {
		def VSRSPILL_STX : Pseudo<(outs), (ins gpfprc:$XT, memrr:$dst),
		"#VSRSPILL_STX", []>;
		def VSRSPILL_ST : Pseudo<(outs), (ins gpfprc:$XT, memrix:$dst),
		"#VSRSPILL_ST", []>;
		}
		let mayLoad = 1 in {
		def VSRSPILL_LDX : Pseudo<(outs gpfprc:$XT), (ins memrr:$src),
		"#VSRSPILL_LDX", []>;
		def VSRSPILL_LD : Pseudo<(outs gpfprc:$XT), (ins memrix:$src),
		"#VSRSPILL_LD", []>;

		}
		}
		}
// Integer extend helper dags 32 -> 64		// Integer extend helper dags 32 -> 64
def AnyExts {		def AnyExts {
dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);		dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);
dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);		dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);
dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);		dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);
dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);		dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);
}		}

[310 lines not shown]

lib/Target/PowerPC/PPCRegisterInfo.cpp

[44 lines not shown]
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "reginfo"		#define DEBUG_TYPE "reginfo"

#define GET_REGINFO_TARGET_DESC		#define GET_REGINFO_TARGET_DESC
#include "PPCGenRegisterInfo.inc"		#include "PPCGenRegisterInfo.inc"

static cl::opt<bool>		static cl::opt<bool>
EnableBasePointer("ppc-use-base-pointer", cl::Hidden, cl::init(true),		EnableBasePointer("ppc-use-base-pointer", cl::Hidden, cl::init(true),
cl::desc("Enable use of a base pointer for complex stack frames"));		cl::desc("Enable use of a base pointer for complex stack frames"));

static cl::opt<bool>		static cl::opt<bool>
AlwaysBasePointer("ppc-always-use-base-pointer", cl::Hidden, cl::init(false),		AlwaysBasePointer("ppc-always-use-base-pointer", cl::Hidden, cl::init(false),
cl::desc("Force the use of a base pointer in every function"));		cl::desc("Force the use of a base pointer in every function"));

		static cl::opt<bool>
		EnableGPRToVecSpills("ppc-enable-gpr-to-vsr-spills", cl::Hidden, cl::init(false),
		cl::desc("Enable spills from gpr to vsr rather than stack"));

PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)		PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)
: PPCGenRegisterInfo(TM.isPPC64() ? PPC::LR8 : PPC::LR,		: PPCGenRegisterInfo(TM.isPPC64() ? PPC::LR8 : PPC::LR,
TM.isPPC64() ? 0 : 1,		TM.isPPC64() ? 0 : 1,
TM.isPPC64() ? 0 : 1),		TM.isPPC64() ? 0 : 1),
TM(TM) {		TM(TM) {
ImmToIdxMap[PPC::LD] = PPC::LDX; ImmToIdxMap[PPC::STD] = PPC::STDX;		ImmToIdxMap[PPC::LD] = PPC::LDX; ImmToIdxMap[PPC::STD] = PPC::STDX;
ImmToIdxMap[PPC::LBZ] = PPC::LBZX; ImmToIdxMap[PPC::STB] = PPC::STBX;		ImmToIdxMap[PPC::LBZ] = PPC::LBZX; ImmToIdxMap[PPC::STB] = PPC::STBX;
ImmToIdxMap[PPC::LHZ] = PPC::LHZX; ImmToIdxMap[PPC::LHA] = PPC::LHAX;		ImmToIdxMap[PPC::LHZ] = PPC::LHZX; ImmToIdxMap[PPC::LHA] = PPC::LHAX;
[9 lines not shown]	PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)
ImmToIdxMap[PPC::LHZ8] = PPC::LHZX8; ImmToIdxMap[PPC::LWZ8] = PPC::LWZX8;		ImmToIdxMap[PPC::LHZ8] = PPC::LHZX8; ImmToIdxMap[PPC::LWZ8] = PPC::LWZX8;
ImmToIdxMap[PPC::STB8] = PPC::STBX8; ImmToIdxMap[PPC::STH8] = PPC::STHX8;		ImmToIdxMap[PPC::STB8] = PPC::STBX8; ImmToIdxMap[PPC::STH8] = PPC::STHX8;
ImmToIdxMap[PPC::STW8] = PPC::STWX8; ImmToIdxMap[PPC::STDU] = PPC::STDUX;		ImmToIdxMap[PPC::STW8] = PPC::STWX8; ImmToIdxMap[PPC::STDU] = PPC::STDUX;
ImmToIdxMap[PPC::ADDI8] = PPC::ADD8;		ImmToIdxMap[PPC::ADDI8] = PPC::ADD8;

// VSX		// VSX
ImmToIdxMap[PPC::DFLOADf32] = PPC::LXSSPX;		ImmToIdxMap[PPC::DFLOADf32] = PPC::LXSSPX;
ImmToIdxMap[PPC::DFLOADf64] = PPC::LXSDX;		ImmToIdxMap[PPC::DFLOADf64] = PPC::LXSDX;
		ImmToIdxMap[PPC::VSRSPILL_LD] = PPC::VSRSPILL_LDX;
		ImmToIdxMap[PPC::VSRSPILL_ST] = PPC::VSRSPILL_STX;
ImmToIdxMap[PPC::DFSTOREf32] = PPC::STXSSPX;		ImmToIdxMap[PPC::DFSTOREf32] = PPC::STXSSPX;
ImmToIdxMap[PPC::DFSTOREf64] = PPC::STXSDX;		ImmToIdxMap[PPC::DFSTOREf64] = PPC::STXSDX;
ImmToIdxMap[PPC::LXV] = PPC::LXVX;		ImmToIdxMap[PPC::LXV] = PPC::LXVX;
ImmToIdxMap[PPC::LXSD] = PPC::LXSDX;		ImmToIdxMap[PPC::LXSD] = PPC::LXSDX;
ImmToIdxMap[PPC::LXSSP] = PPC::LXSSPX;		ImmToIdxMap[PPC::LXSSP] = PPC::LXSSPX;
ImmToIdxMap[PPC::STXV] = PPC::STXVX;		ImmToIdxMap[PPC::STXV] = PPC::STXVX;
ImmToIdxMap[PPC::STXSD] = PPC::STXSDX;		ImmToIdxMap[PPC::STXSD] = PPC::STXSDX;
ImmToIdxMap[PPC::STXSSP] = PPC::STXSSPX;		ImmToIdxMap[PPC::STXSSP] = PPC::STXSSPX;
[216 lines not shown]
const TargetRegisterClass *		const TargetRegisterClass *
PPCRegisterInfo::getLargestLegalSuperClass(const TargetRegisterClass *RC,		PPCRegisterInfo::getLargestLegalSuperClass(const TargetRegisterClass *RC,
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();		const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
if (Subtarget.hasVSX()) {		if (Subtarget.hasVSX()) {
// With VSX, we can inflate various sub-register classes to the full VSX		// With VSX, we can inflate various sub-register classes to the full VSX
// register set.		// register set.

		// For pwr9 we enable gpr to vector spills
		if (Subtarget.hasP9Vector() && EnableGPRToVecSpills &&
		RC == &PPC::G8RCRegClass)
		return &PPC::GPFPRCRegClass;
if (RC == &PPC::F8RCRegClass)		if (RC == &PPC::F8RCRegClass)
return &PPC::VSFRCRegClass;		return &PPC::VSFRCRegClass;
else if (RC == &PPC::VRRCRegClass)		else if (RC == &PPC::VRRCRegClass)
return &PPC::VSRCRegClass;		return &PPC::VSRCRegClass;
else if (RC == &PPC::F4RCRegClass && Subtarget.hasP8Vector())		else if (RC == &PPC::F4RCRegClass && Subtarget.hasP8Vector())
return &PPC::VSSRCRegClass;		return &PPC::VSSRCRegClass;
}		}

[735 lines not shown]

lib/Target/PowerPC/PPCRegisterInfo.td

	[299 lines not shown]
	def VFRC : RegisterClass<"PPC", [f64], 64,			def VFRC : RegisterClass<"PPC", [f64], 64,
	(add VF2, VF3, VF4, VF5, VF0, VF1, VF6, VF7,			(add VF2, VF3, VF4, VF5, VF0, VF1, VF6, VF7,
	VF8, VF9, VF10, VF11, VF12, VF13, VF14,			VF8, VF9, VF10, VF11, VF12, VF13, VF14,
	VF15, VF16, VF17, VF18, VF19, VF31, VF30,			VF15, VF16, VF17, VF18, VF19, VF31, VF30,
	VF29, VF28, VF27, VF26, VF25, VF24, VF23,			VF29, VF28, VF27, VF26, VF25, VF24, VF23,
	VF22, VF21, VF20)>;			VF22, VF21, VF20)>;
	def VSFRC : RegisterClass<"PPC", [f64], 64, (add F8RC, VFRC)>;			def VSFRC : RegisterClass<"PPC", [f64], 64, (add F8RC, VFRC)>;

				def GPFPRC : RegisterClass<"PPC", [i64, f64], 64, (add G8RC, VSFRC)>;

	// Register class for single precision scalars in VSX registers			// Register class for single precision scalars in VSX registers
	def VSSRC : RegisterClass<"PPC", [f32], 32, (add VSFRC)>;			def VSSRC : RegisterClass<"PPC", [f32], 32, (add VSFRC)>;

	// For QPX			// For QPX
	def QFRC : RegisterClass<"PPC", [v4f64], 256, (add (sequence "QF%u", 0, 13),			def QFRC : RegisterClass<"PPC", [v4f64], 256, (add (sequence "QF%u", 0, 13),
	(sequence "QF%u", 31, 14))>;			(sequence "QF%u", 31, 14))>;
	def QSRC : RegisterClass<"PPC", [v4f32], 128, (add QFRC)>;			def QSRC : RegisterClass<"PPC", [v4f32], 128, (add QFRC)>;
	def QBRC : RegisterClass<"PPC", [v4i1], 256, (add QFRC)> {			def QBRC : RegisterClass<"PPC", [v4i1], 256, (add QFRC)> {
	[37 lines not shown]

test/CodeGen/PowerPC/gpr-vsr-spill.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mcpu=pwr9 -ppc-enable-gpr-to-vsr-spills < %s | FileCheck %s
				define signext i32 @foo(i32 signext %a, i32 signext %b) {
				entry:
				%cmp = icmp slt i32 %a, %b
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r0},~{r1},~{r2},~{r3},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29}"(i32 %a, i32 %b)
				%mul = mul nsw i32 %0, %a
				%add = add i32 %b, %a
				%tmp = add i32 %add, %mul
				br label %if.end

				if.end: ; preds = %if.then, %entry
				%e.0 = phi i32 [ %tmp, %if.then ], [ undef, %entry ]
				ret i32 %e.0
				; CHECK: @foo
				; CHECK: mr 31, 3
				; CHECK: mtvsrd 0, 4
				; CHECK: mffprd 30, 0
				; CHECK: add 30, 31, 30
				; CHECK: mffprd 3, 0
				; CHECK: add 3, 3, 31
				}

test/CodeGen/PowerPC/gpr-vsr-spill2.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mcpu=pwr9 -ppc-enable-gpr-to-vsr-spills < %s | FileCheck %s

				%struct.move_s = type { i32, i32, i32, i32, i32, i32 }

				@nodes = external local_unnamed_addr global i32, align 4
				@rootnodecount = external local_unnamed_addr global [512 x i32], align 4
				@pv_length = external local_unnamed_addr global [300 x i32], align 4
				@captures = external local_unnamed_addr global i32, align 4
				@cur_score = external local_unnamed_addr global i32, align 4
				@rootlosers = external local_unnamed_addr global [300 x i32], align 4

				declare void @gen(%struct.move_s*)

				declare zeroext i32 @in_check()

				declare void @make(%struct.move_s*, i32 signext)

				declare zeroext i32 @check_legal(%struct.move_s*, i32 signext, i32 signext)

				declare signext i32 @search(i32 signext, i32 signext, i32 signext, i32 signext)

				declare void @post_thinking(i32 signext)

				define void @search_root(%struct.move_s* noalias nocapture sret %agg.result, i32 signext %originalalpha, i32 signext %originalbeta, i32 signext %depth) {
				; CHECK: mr [[NEWREG:[0-9]+]], {{[0-9]+}}
				; CHECK: std [[NEWREG]], {{[0-9]+}}(1) # 8-byte Folded Spill

				entry:
				%moves = alloca [512 x %struct.move_s], align 4
				%move_ordering = alloca [512 x i32], align 4
				%call5 = tail call zeroext i32 @in_check()
				store i32 0, i32* undef, align 4, !tbaa !0
				br i1 undef, label %if.then17, label %if.else

				if.then17: ; preds = %entry
				call void @gen(%struct.move_s* nonnull undef)
				store i32 0, i32* @captures, align 4, !tbaa !0
				call void @make(%struct.move_s* nonnull undef, i32 signext undef)
				br label %if.end51

				if.else: ; preds = %entry
				%arrayidx50 = getelementptr inbounds [512 x %struct.move_s], [512 x %struct.move_s]* %moves, i64 0, i64 0
				call void @gen(%struct.move_s* nonnull %arrayidx50)
				br label %if.end51

				if.end51: ; preds = %if.else, %if.then17
				%arrayidx52.pre-phi = phi %struct.move_s* [ %arrayidx50, %if.else ], [ undef, %if.then17 ]
				br label %while.cond.outer

				while.cond.outer: ; preds = %if.end248, %if.end51
				%root_score.0.ph = phi i32 [ %root_score.0.ph, %if.end248 ], [ -1000000, %if.end51 ]
				%alpha.0.ph = phi i32 [ %root_score.0.ph, %if.end248 ], [ %originalalpha, %if.end51 ]
				br label %while.cond

				while.cond: ; preds = %while.body, %while.cond.outer
				%arrayidx.i.2 = getelementptr inbounds [512 x i32], [512 x i32]* %move_ordering, i64 0, i64 0
				%0 = load i32, i32* %arrayidx.i.2, align 4, !tbaa !3
				%cmp1.i.2 = icmp sgt i32 %0, 0
				%.best.022.i.2 = select i1 %cmp1.i.2, i32 %0, i32 0
				%indvars.iv.next.i.2 = or i64 0, 3
				%1 = load i32, i32* undef, align 4, !tbaa !3
				%cmp1.i.3 = icmp sgt i32 %1, %.best.022.i.2
				%2 = trunc i64 %indvars.iv.next.i.2 to i32
				%..3 = select i1 %cmp1.i.3, i32 %2, i32 0
				%.best.022.i.3 = select i1 %cmp1.i.3, i32 %1, i32 %.best.022.i.2
				%arrayidx.i.4 = getelementptr inbounds [512 x i32], [512 x i32]* %move_ordering, i64 0, i64 undef
				%3 = load i32, i32* %arrayidx.i.4, align 4, !tbaa !3
				%cmp1.i.4 = icmp sgt i32 %3, %.best.022.i.3
				%..4 = select i1 %cmp1.i.4, i32 undef, i32 %..3
				%.best.022.i.4 = select i1 %cmp1.i.4, i32 %3, i32 %.best.022.i.3
				%indvars.iv.next.i.4 = or i64 0, 5
				%4 = load i32, i32* undef, align 4, !tbaa !3
				%cmp1.i.5 = icmp sgt i32 %4, %.best.022.i.4
				%5 = trunc i64 %indvars.iv.next.i.4 to i32
				%..5 = select i1 %cmp1.i.5, i32 %5, i32 %..4
				%indvars.iv.next.i.5 = or i64 0, 6
				%arrayidx.i.6 = getelementptr inbounds [512 x i32], [512 x i32]* %move_ordering, i64 0, i64 %indvars.iv.next.i.5
				%6 = load i32, i32* %arrayidx.i.6, align 4, !tbaa !3
				%cmp1.i.6 = icmp sgt i32 %6, 0
				%7 = trunc i64 %indvars.iv.next.i.5 to i32
				%..6 = select i1 %cmp1.i.6, i32 %7, i32 %..5
				%.best.022.i.6 = select i1 %cmp1.i.6, i32 %6, i32 0
				%indvars.iv.next.i.6 = or i64 0, 7
				%8 = load i32, i32* undef, align 4, !tbaa !3
				%cmp1.i.7 = icmp sgt i32 %8, %.best.022.i.6
				%9 = trunc i64 %indvars.iv.next.i.6 to i32
				%..7 = select i1 %cmp1.i.7, i32 %9, i32 %..6
				%cmp4.i = icmp sgt i32 %..7, -1000000
				br i1 %cmp4.i, label %while.body, label %while.end

				while.body: ; preds = %while.cond
				store i32 -1000000, i32* undef, align 4, !tbaa !3
				%arrayidx60 = getelementptr inbounds [300 x i32], [300 x i32]* @rootlosers, i64 0, i64 undef
				%10 = load i32, i32* %arrayidx60, align 4, !tbaa !3
				%tobool61 = icmp eq i32 %10, 0
				%brmerge = or i1 %tobool61, false
				br i1 %brmerge, label %if.end66, label %while.cond

				if.end66: ; preds = %while.body
				%11 = load i32, i32* @nodes, align 4, !tbaa !3
				%call78 = call zeroext i32 @check_legal(%struct.move_s* nonnull %arrayidx52.pre-phi, i32 signext undef, i32 signext %call5)
				%tobool79 = icmp eq i32 %call78, 0
				br i1 %tobool79, label %if.end248, label %if.then80

				if.then80: ; preds = %if.end66
				%sub100 = sub nsw i32 0, %alpha.0.ph
				%call102 = call signext i32 @search(i32 signext undef, i32 signext %sub100, i32 signext undef, i32 signext 0)
				unreachable

				if.end248: ; preds = %if.end66
				store i32 %root_score.0.ph, i32* @cur_score, align 4, !tbaa !3
				%arrayidx416655 = getelementptr inbounds [300 x i32], [300 x i32]* @pv_length, i64 0, i64 undef
				%12 = load i32, i32* %arrayidx416655, align 4, !tbaa !3
				store i32 %12, i32* undef, align 4, !tbaa !3
				call void @post_thinking(i32 signext %root_score.0.ph)
				%sub448 = sub nsw i32 0, %11
				%arrayidx450 = getelementptr inbounds [512 x i32], [512 x i32]* @rootnodecount, i64 0, i64 undef
				store i32 %sub448, i32* %arrayidx450, align 4, !tbaa !3
				br label %while.cond.outer

				while.end: ; preds = %while.cond
				ret void
				}

				!0 = !{!1, !1, i64 0}
				!1 = !{!"omnipotent char", !2, i64 0}
				!2 = !{!"Simple C/C++ TBAA"}
				!3 = !{!4, !4, i64 0}
				!4 = !{!"int", !1, i64 0}
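
syzaara's reply on the CHECK lines above says the spilled GPFPR value is stored with either a scalar or a vector store, depending on which physical register it was allocated. The sketch below is a toy model of that decision, not code from the patch: every name in it (PhysRegKind, SpillStore, expandSpillStore, mnemonic) is hypothetical, and "stxsdx" is an assumed stand-in for the VSR-side store. Only "std" is taken from this test's CHECK line.

```cpp
// Toy model only: none of these names are LLVM APIs.

// Where the register allocator placed the value of the new register class.
enum class PhysRegKind { GPR, VSR };

// How the post-RA pseudo store would be expanded.
enum class SpillStore { Scalar, Vector };

// After register allocation the pseudo store knows its physical register,
// so the expansion can pick the store form from that register's kind.
constexpr SpillStore expandSpillStore(PhysRegKind Kind) {
  return Kind == PhysRegKind::GPR ? SpillStore::Scalar : SpillStore::Vector;
}

// Mnemonic printed for each form. "std" matches the CHECK line in this test;
// "stxsdx" is an assumption used purely for illustration.
constexpr const char *mnemonic(SpillStore S) {
  return S == SpillStore::Scalar ? "std" : "stxsdx";
}

// Compile-time checks of the two cases described in the reply.
static_assert(expandSpillStore(PhysRegKind::GPR) == SpillStore::Scalar,
              "a GPR-allocated value is spilled with a scalar store");
static_assert(expandSpillStore(PhysRegKind::VSR) == SpillStore::Vector,
              "a VSR-allocated value is spilled with a vector store");
```

In this test the new register ended up in a GPR, which is why the CHECK line expects a `std` to the stack rather than a vector store.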