This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
- llvm/test/CodeGen/PowerPC/knowCRBitSpill.ll

Differential D61754

[PowerPC] Custom lower known CR bit spills
Closed, Public

Authored by lei on May 9 2019, 12:38 PM.


Details

Reviewers

power-llvm-team
hfinkel
echristo
nemanjai
stefanp

Commits

rZORG66e73df85973: [PowerPC] Custom lower known CR bit spills
rZORGc1f2ea4ebec7: [PowerPC] Custom lower known CR bit spills
rG66e73df85973: [PowerPC] Custom lower known CR bit spills
rGc1f2ea4ebec7: [PowerPC] Custom lower known CR bit spills
rG22561972af47: [PowerPC] Custom lower known CR bit spills
rL360677: [PowerPC] Custom lower known CR bit spills

Summary

For spills of CR bits whose value is known (bits defined by CRSET or CRUNSET), it is more efficient to load the known value into a GPR and store it than to extract the bit from the condition register.

For example, this sequence is currently used to spill a CRUNSET:

crclr   4*cr5+lt
mfocrf  r3,4
rlwinm  r3,r3,20,0,0
stw     r3,132(r1)

This patch custom lowers it to:

li  r3,0
stw r3,132(r1)
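Why a known bit can be stored directly: the spill slot only ever holds the bit in the most significant position, because `rlwinm rA,rA,N,0,0` rotates CR bit N (IBM numbering, bit 0 = MSB; N is what `getEncodingValue(SrcReg)` supplies, 20 for `4*cr5+lt`) into bit 0 and clears the rest. A clear bit therefore spills as 0 and a set bit as 0x80000000, which is what `lis rX,-32768` produces. The C++ sketch below models this arithmetic on a 32-bit CR image. All function names are hypothetical, and it assumes `mfocrf` zeroes the non-selected CR fields (on hardware they are undefined, which does not matter here because the final mask discards them).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 32-bit word that a CR-bit spill stores.
// CR bits use IBM numbering: bit 0 is the most significant bit.

// Rotate a 32-bit word left by N bits, as rlwinm does.
static uint32_t rotl32(uint32_t X, unsigned N) {
  assert(N < 32 && "rotate amount must be in [0, 31]");
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}

// mfocrf with one field selected: copy CR field (CRBit / 4) into the same
// position of the GPR. Assumption: the other bits are modelled as zero.
static uint32_t mfocrf(uint32_t CR, unsigned CRBit) {
  uint32_t FieldMask = 0xF0000000u >> (4 * (CRBit / 4));
  return CR & FieldMask;
}

// rlwinm rA, rS, SH, 0, 0: rotate left by SH, keep only bit 0 (the MSB).
static uint32_t rlwinm_0_0(uint32_t RS, unsigned SH) {
  return rotl32(RS, SH) & 0x80000000u;
}

// Generic lowering: mfocrf, then rlwinm with SH equal to the CR bit number.
static uint32_t spillWordGeneric(uint32_t CR, unsigned CRBit) {
  return rlwinm_0_0(mfocrf(CR, CRBit), CRBit);
}

// li rD, SI: sign-extended 16-bit immediate (low 32 bits).
static uint32_t li(int16_t SI) {
  return static_cast<uint32_t>(static_cast<int32_t>(SI));
}

// lis rD, SI: 16-bit immediate placed in the upper halfword (low 32 bits).
static uint32_t lis(int16_t SI) {
  return static_cast<uint32_t>(static_cast<uint16_t>(SI)) << 16;
}

// A CR image with only CR bit CRBit (IBM numbering) set.
static uint32_t withCRBit(unsigned CRBit) { return 0x80000000u >> CRBit; }
```

On 64-bit targets the patch emits LI8/LIS8 instead; `lis` then sign-extends to 0xFFFFFFFF80000000, but STW8 stores only the low word, so the slot still receives 0x80000000.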

Diff Detail

Event Timeline

lei created this revision. May 9 2019, 12:38 PM

Herald added a project: Restricted Project. May 9 2019, 12:38 PM

Herald added subscribers: jsji, kbarton, hiraditya, qcolombet.

hfinkel added inline comments. May 9 2019, 2:39 PM

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	I think that this makes sense, but I'm a bit concerned that, without a cutoff, this makes the spilling process quadratic. Can you please add a cl::opt search cutoff for this?

lei marked an inline comment as done. May 10 2019, 12:47 PM

lei added inline comments.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	How about `--ppc-max-crbit-spill-dist` with an initial value of 20?

hfinkel added inline comments. May 10 2019, 1:25 PM

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	Sounds good. Make the initial value a bit larger, however. It's easy to have blocks with more than 20 instructions. I'd start with 100. Also, remember to skip the debug instructions when counting, so you don't end up with differences between the debugging-enabled and debugging-disabled cases.

lei marked an inline comment as done. May 10 2019, 3:33 PM

lei added inline comments.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	I didn't even think about debug instructions... how do I identify if an instruction is a debug instruction?

hfinkel added inline comments. May 13 2019, 8:56 AM

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	You can, I believe, call `Ins->isDebugInstr()`
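The search agreed on in this thread (a cutoff controlled by a `cl::opt`, with debug instructions excluded from the count) can be sketched outside LLVM as a bounded backward scan over a toy instruction list. `ToyInstr`, its fields, and `findCRBitDef` are hypothetical stand-ins for `MachineInstr`, `modifiesRegister()` and `isDebugInstr()`. This sketch skips debug instructions before applying the cutoff test, so adding or removing debug instructions cannot change which instruction is found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a MachineInstr.
struct ToyInstr {
  std::string Opcode; // e.g. "CRSET", "ADD", "DBG_VALUE", "SPILL_CRBIT"
  bool DefinesCRBit;  // plays the role of modifiesRegister(SrcReg, TRI)
  bool IsDebug;       // plays the role of isDebugInstr()
};

// Walk backwards from the spill at SpillIdx looking for the instruction that
// defines the spilled CR bit. The search gives up once it has passed MaxDist
// non-debug instructions without finding the definition. Returns the index of
// the definition, or SpillIdx when none is found (cutoff or top of block),
// which corresponds to the generic mfocrf/rlwinm lowering.
std::size_t findCRBitDef(const std::vector<ToyInstr> &Block,
                         std::size_t SpillIdx, unsigned MaxDist) {
  assert(SpillIdx < Block.size() && "spill must be inside the block");
  unsigned Dist = 0;
  for (std::size_t I = SpillIdx + 1; I-- > 0;) {
    const ToyInstr &Ins = Block[I];
    if (Ins.IsDebug)
      continue;          // Debug instructions are invisible to the search.
    if (Ins.DefinesCRBit)
      return I;          // Definition found.
    if (Dist == MaxDist)
      return SpillIdx;   // Cutoff reached.
    ++Dist;
  }
  return SpillIdx;       // Top of the block reached without a definition.
}
```

In the loop as posted in Diff 199318, the cutoff test runs before the debug check. When the distance has just reached the limit, a debug instruction is therefore still examined and ends the search, whereas the same block without debug information would examine the (possibly defining) instruction above it. Skipping debug instructions first, as in this sketch, avoids that edge case.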

Added option to specify cutoff for CR bit definition search with default of 100.

hfinkel added inline comments. May 13 2019, 12:56 PM

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	Please put spaces around the = here, and the = and == below (to match the general convention here).

address spacing issues.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
731	thx!

LGTM.

If this test case proves too fragile in the future, we might replace it with an MIR test.

This revision is now accepted and ready to land. May 13 2019, 3:10 PM

In D61754#1500658, @hfinkel wrote:

LGTM.

If this test case proves too fragile in the future, we might replace it with an MIR test.

Hi Hal, I actually tried to create an MIR test first but had a hard time. I was trying to get the MIR prior to my pass via `llc -stop-before=regalloc reduced.ll`, but it gave me `LLVM ERROR: "regalloc" pass is not registered.` I just stopped there as I didn't know how to continue after that... let me know if you have suggestions on how to proceed with this kind of issue and I'll try it next time. Thx!

Closed by commit rL360677: [PowerPC] Custom lower known CR bit spills (authored by lei). May 14 2019, 7:24 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp     71 lines
llvm/test/CodeGen/PowerPC/knowCRBitSpill.ll     131 lines

Diff 199318

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp

... (64 unchanged lines not shown) ...
 EnableGPRToVecSpills("ppc-enable-gpr-to-vsr-spills", cl::Hidden, cl::init(false),
          cl::desc("Enable spills from gpr to vsr rather than stack"));

 static cl::opt<bool>
 StackPtrConst("ppc-stack-ptr-caller-preserved",
               cl::desc("Consider R1 caller preserved so stack saves of "
                        "caller preserved registers can be LICM candidates"),
               cl::init(true), cl::Hidden);

+static cl::opt<unsigned>
+MaxCRBitSpillDist("ppc-max-crbit-spill-dist",
+                  cl::desc("Maximum search distance for definition of CR bit "
+                           "spill on ppc"),
+                  cl::Hidden, cl::init(100));

 static unsigned offsetMinAlignForOpcode(unsigned OpC);

 PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)
   : PPCGenRegisterInfo(TM.isPPC64() ? PPC::LR8 : PPC::LR,
                        TM.isPPC64() ? 0 : 1,
                        TM.isPPC64() ? 0 : 1),
     TM(TM) {
   ImmToIdxMap[PPC::LD] = PPC::LDX; ImmToIdxMap[PPC::STD] = PPC::STDX;
... (624 unchanged lines not shown) ...
 void PPCRegisterInfo::lowerCRBitSpilling(MachineBasicBlock::iterator II,
                                          unsigned FrameIndex) const {
   // Get the instruction.
   MachineInstr &MI = *II; // ; SPILL_CRBIT <SrcReg>, <offset>
   // Get the instruction's basic block.
   MachineBasicBlock &MBB = *MI.getParent();
   MachineFunction &MF = *MBB.getParent();
   const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
   const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
+  const TargetRegisterInfo* TRI = Subtarget.getRegisterInfo();
   DebugLoc dl = MI.getDebugLoc();

   bool LP64 = TM.isPPC64();
   const TargetRegisterClass *G8RC = &PPC::G8RCRegClass;
   const TargetRegisterClass *GPRC = &PPC::GPRCRegClass;

   unsigned Reg = MF.getRegInfo().createVirtualRegister(LP64 ? G8RC : GPRC);
   unsigned SrcReg = MI.getOperand(0).getReg();

+  // Search up the BB to find the definition of the CR bit.
+  MachineBasicBlock::reverse_iterator Ins;
+  unsigned CRBitSpillDistance = 0;
+  for (Ins = MI; Ins != MBB.rend(); Ins++) {
+    // Definition found.
+    if (Ins->modifiesRegister(SrcReg, TRI))
+      break;
+    // Unable to find CR bit definition within maximum search distance.
+    if (CRBitSpillDistance == MaxCRBitSpillDist) {
+      Ins = MI;
+      break;
+    }
+    // Skip debug instructions when counting CR bit spill distance.
+    if (!Ins->isDebugInstr())
+      CRBitSpillDistance++;
+  }
+
+  // Unable to find the definition of the CR bit in the MBB.
+  if (Ins == MBB.rend())
+    Ins = MI;
+
+  // There is no need to extract the CR bit if its value is already known.
+  switch (Ins->getOpcode()) {
+  case PPC::CRUNSET:
+    BuildMI(MBB, II, dl, TII.get(LP64 ? PPC::LI8 : PPC::LI), Reg)
+      .addImm(0);
+    break;
+  case PPC::CRSET:
+    BuildMI(MBB, II, dl, TII.get(LP64 ? PPC::LIS8 : PPC::LIS), Reg)
+      .addImm(-32768);
+    break;
+  default:
     // We need to move the CR field that contains the CR bit we are spilling.
     // The super register may not be explicitly defined (i.e. it can be defined
     // by a CR-logical that only defines the subreg) so we state that the CR
     // field is undef. Also, in order to preserve the kill flag on the CR bit,
     // we add it as an implicit use.
     BuildMI(MBB, II, dl, TII.get(LP64 ? PPC::MFOCRF8 : PPC::MFOCRF), Reg)
       .addReg(getCRFromCRBit(SrcReg), RegState::Undef)
       .addReg(SrcReg,
               RegState::Implicit | getKillRegState(MI.getOperand(0).isKill()));

-  // If the saved register wasn't CR0LT, shift the bits left so that the bit to
-  // store is the first one. Mask all but that bit.
+    // If the saved register wasn't CR0LT, shift the bits left so that the bit
+    // to store is the first one. Mask all but that bit.
     unsigned Reg1 = Reg;
     Reg = MF.getRegInfo().createVirtualRegister(LP64 ? G8RC : GPRC);

     // rlwinm rA, rA, ShiftBits, 0, 0.
     BuildMI(MBB, II, dl, TII.get(LP64 ? PPC::RLWINM8 : PPC::RLWINM), Reg)
       .addReg(Reg1, RegState::Kill)
       .addImm(getEncodingValue(SrcReg))
       .addImm(0).addImm(0);
+  }
   addFrameReference(BuildMI(MBB, II, dl, TII.get(LP64 ? PPC::STW8 : PPC::STW))
                     .addReg(Reg, RegState::Kill),
                     FrameIndex);

   // Discard the pseudo instruction.
   MBB.erase(II);
 }

... (467 unchanged lines not shown) ...
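To summarize the `switch` in the diff: once the backward search settles on an instruction (the definition, or the spill pseudo itself when no definition is found), that instruction's opcode alone selects the emitted sequence, and every path ends in the same 32-bit store. The sketch below is a hypothetical string-level model of that selection; the opcode names are taken from the diff, while the function name and the `std::string` representation are assumptions, not LLVM code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the opcode selection in lowerCRBitSpilling.
// DefOpcode is the opcode of the instruction the backward search stopped at;
// when no definition was found this is the spill pseudo itself.
std::vector<std::string> crBitSpillSequence(const std::string &DefOpcode,
                                            bool LP64) {
  assert(!DefOpcode.empty() && "expected an opcode name");
  std::vector<std::string> Seq;
  if (DefOpcode == "CRUNSET") {
    Seq.push_back(LP64 ? "LI8 0" : "LI 0");             // Bit known clear.
  } else if (DefOpcode == "CRSET") {
    Seq.push_back(LP64 ? "LIS8 -32768" : "LIS -32768"); // Bit known set.
  } else {
    // Value unknown: copy the CR field, then rotate/mask the bit into bit 0.
    Seq.push_back(LP64 ? "MFOCRF8" : "MFOCRF");
    Seq.push_back(LP64 ? "RLWINM8" : "RLWINM");
  }
  Seq.push_back(LP64 ? "STW8" : "STW"); // Every path stores one 32-bit word.
  return Seq;
}
```

The known-value paths replace two instructions (and the dependence on the CR field) with a single immediate load, which is the saving the test below checks for.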

llvm/test/CodeGen/PowerPC/knowCRBitSpill.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | FileCheck %s
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu \
				; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | FileCheck %s


				; For known CRBit spills, CRSET/CRUNSET, it is more efficient to just load and
				; spill the known value. These tests verify that for CRSET and CRUNSET spills
				; we do not extract the bit for spilling.

				%struct.anon = type { i32 }

				@b = common dso_local global %struct.anon* null, align 8
				@a = common dso_local global i64 0, align 8

				; Function Attrs: nounwind
				define dso_local signext i32 @spillCRSET(i32 signext %p1, i32 signext %p2) {
				; CHECK-LABEL: spillCRSET:
				; CHECK: # %bb.0: # %entry
				; CHECK: lis [[REG1:.*]], -32768
				; CHECK-DAG: creqv [[CREG:.*]]*cr5+lt, [[CREG]]*cr5+lt, [[CREG]]*cr5+lt
				; CHECK-NOT: mfocrf [[REG2:.*]], [[CREG]]
				; CHECK-NOT: rlwinm [[REG2]], [[REG2]]
				; CHECK: stw [[REG1]]
				; CHECK: .LBB0_1: # %redo_first_pass
				entry:
				%tobool = icmp eq i32 %p2, 0
				%tobool2 = icmp eq i32 %p1, 0
				br label %redo_first_pass

				redo_first_pass: ; preds = %for.end, %entry
				br i1 %tobool, label %if.end, label %if.then

				if.then: ; preds = %redo_first_pass
				%call = tail call signext i32 bitcast (i32 (...)* @fn2 to i32 ()*)() #2
				%tobool1 = icmp ne i32 %call, 0
				br label %if.end

				if.end: ; preds = %redo_first_pass, %if.then
				%c.1.off0 = phi i1 [ %tobool1, %if.then ], [ true, %redo_first_pass ]
				br i1 %tobool2, label %if.end4, label %if.then3

				if.then3: ; preds = %if.end
				%0 = load %struct.anon*, %struct.anon** @b, align 8
				%contains_i = getelementptr inbounds %struct.anon, %struct.anon* %0, i64 0, i32 0
				store i32 1, i32* %contains_i, align 4
				br label %if.end4

				if.end4: ; preds = %if.end, %if.then3
				tail call void asm sideeffect "#DO_NOTHING", "~{cr0},~{cr1},~{cr2},~{cr3},~{cr4},~{cr5},~{cr6},~{cr7}"()
				br i1 %c.1.off0, label %if.then6, label %if.end13

				if.then6: ; preds = %if.end4
				%1 = load i64, i64* @a, align 8
				%cmp21 = icmp eq i64 %1, 0
				br i1 %cmp21, label %if.end13, label %for.body

				for.body: ; preds = %if.then6, %for.body
				%s.122 = phi i64 [ %inc, %for.body ], [ 0, %if.then6 ]
				%call7 = tail call signext i32 bitcast (i32 (...)* @fn3 to i32 ()*)()
				%inc = add nuw i64 %s.122, 1
				%exitcond = icmp eq i64 %inc, %1
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				br i1 %cmp21, label %if.end13, label %redo_first_pass

				if.end13: ; preds = %if.then6, %for.end, %if.end4
				ret i32 0
				}

				%struct.p5rx = type { i32 }

				; Function Attrs: nounwind
				define dso_local signext i32 @spillCRUNSET(%struct.p5rx* readonly %p1, i32 signext %p2, i32 signext %p3) {
				; CHECK-LABEL: spillCRUNSET:
				; CHECK: # %bb.0: # %entry
				; CHECK-DAG: crxor [[CREG:.*]]*cr5+lt, [[CREG]]*cr5+lt, [[CREG]]*cr5+lt
				; CHECK-DAG: li [[REG1:.*]], 0
				; CHECK-NOT: mfocrf [[REG2:.*]], [[CREG]]
				; CHECK-NOT: rlwinm [[REG2]], [[REG2]]
				; CHECK: stw [[REG1]]
				; CHECK: .LBB1_1: # %redo_first_pass
				entry:
				%and = and i32 %p3, 128
				%tobool = icmp eq i32 %and, 0
				%tobool2 = icmp eq %struct.p5rx* %p1, null
				%sv_any = getelementptr inbounds %struct.p5rx, %struct.p5rx* %p1, i64 0, i32 0
				%tobool12 = icmp eq i32 %p2, 0
				br label %redo_first_pass

				redo_first_pass: ; preds = %if.end11, %entry
				%a.0.off0 = phi i1 [ false, %entry ], [ %a.1.off0, %if.end11 ]
				br i1 %tobool, label %if.end, label %if.then

				if.then: ; preds = %redo_first_pass
				%call = tail call signext i32 bitcast (i32 (...)* @fn2 to i32 ()*)()
				%tobool1 = icmp ne i32 %call, 0
				br label %if.end

				if.end: ; preds = %redo_first_pass, %if.then
				%a.1.off0 = phi i1 [ %tobool1, %if.then ], [ %a.0.off0, %redo_first_pass ]
				tail call void asm sideeffect "#DO_NOTHING", "~{cr0},~{cr1},~{cr2},~{cr3},~{cr4},~{cr5},~{cr6},~{cr7}"()
				br i1 %tobool2, label %if.end11, label %land.lhs.true

				land.lhs.true: ; preds = %if.end
				%call3 = tail call signext i32 bitcast (i32 (...)* @fn3 to i32 ()*)()
				%tobool4 = icmp eq i32 %call3, 0
				br i1 %tobool4, label %if.end11, label %land.lhs.true5

				land.lhs.true5: ; preds = %land.lhs.true
				%0 = load i32, i32* %sv_any, align 4
				%tobool6 = icmp eq i32 %0, 0
				%a.1.off0.not = xor i1 %a.1.off0, true
				%brmerge = or i1 %tobool6, %a.1.off0.not
				br i1 %brmerge, label %if.end11, label %if.then9

				if.then9: ; preds = %land.lhs.true5
				%call10 = tail call signext i32 bitcast (i32 (...)* @fn4 to i32 ()*)()
				br label %if.end11

				if.end11: ; preds = %land.lhs.true5, %land.lhs.true, %if.end, %if.then9
				br i1 %tobool12, label %if.end14, label %redo_first_pass

				if.end14: ; preds = %if.end11
				ret i32 0
				}

				declare signext i32 @fn2(...)
				declare signext i32 @fn3(...)
				declare signext i32 @fn4(...)