This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] moving vcc branch optimization into peephole
AbandonedPublic

Authored by cdevadas on Mar 3 2020, 12:11 AM.

Details

Reviewers
arsenm
rampitec
Summary

This optimization is presently included in the SIInsertSkips pass.
SIInsertSkips will soon go away. Before that happens, move this
specific optimization into an appropriate place.

Diff Detail

Event Timeline

cdevadas created this revision. Mar 3 2020, 12:11 AM
Herald added a project: Restricted Project. Mar 3 2020, 12:11 AM
arsenm added inline comments. Mar 3 2020, 11:26 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Should not need an extra run of this

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Register, and initialization isn't needed

cdevadas marked 2 inline comments as done. Mar 3 2020, 7:28 PM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

The peephole is invoked earlier during SSAOptimization. This extra run is required to optimize a pattern that is introduced later. The lit test multilevel-break.ll has a similar opportunity in function multi_if_break_loop.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Are you saying the initialization is not required?
SReg is not defined along all control-flow paths later.

arsenm added inline comments. Mar 6 2020, 10:43 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Where is the pattern introduced? Does this ever trigger in the initial run?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

If you use Register instead of unsigned, it default initializes to NoRegister/0

cdevadas marked 2 inline comments as done. Mar 7 2020, 9:02 AM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

The pattern is introduced by Basic Block Placement (see below: bb.2 and bb.5 are combined into bb.2).
The extra run is required to optimize it.

IR Dump before BB Placement:
multi_if_break_loop:
bb.2:
  successors: %bb.5
  liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
  renamable $sgpr8_sgpr9 = S_MOV_B64 0
  renamable $sgpr6_sgpr7 = S_MOV_B64 -1
  renamable $sgpr10_sgpr11 = S_MOV_B64 -1
  S_BRANCH %bb.5

bb.3:
  // Insns

bb.5:
  successors: %bb.6, %bb.8
  renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
  S_CBRANCH_VCCZ %bb.8, implicit $vcc

IR Dump after BB Placement:
multi_if_break_loop:
bb.2:
  successors: %bb.6, %bb.8
  liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3:0x0000000C, $sgpr0_sgpr1, $sgpr4_sgpr5
  renamable $sgpr8_sgpr9 = S_MOV_B64 0
  renamable $sgpr6_sgpr7 = S_MOV_B64 -1
  renamable $sgpr10_sgpr11 = S_MOV_B64 -1
  renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr10_sgpr11, implicit-def dead $scc
  S_CBRANCH_VCCZ %bb.8, implicit $vcc
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2182

Sure, will do that.

cdevadas updated this revision to Diff 249677. Mar 11 2020, 10:44 AM

Incorporated the suggestion.

If this can't work in SSA, then it shouldn't be done in PeepholeOptimizer.

I'm also noticing a few defects in the existing handling. If I disable the optimization in test/CodeGen/AMDGPU/multilevel-break.ll, the dead AND is actually left behind. I'm assuming this is because of the check for dead SCC, but this should be using LivePhysRegs to make sure SCC is not live out.

arsenm added inline comments. Mar 19 2020, 2:41 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Still has the extra pass run

cdevadas marked an inline comment as done. Mar 20 2020, 3:55 AM
cdevadas added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1017

Planning to introduce a late pass called 'SIPreEmitPeephole' to handle it. In general, that pass can handle any late optimization opportunities identified before code emission.

cdevadas abandoned this revision. Mar 24 2020, 9:22 AM

Abandoning this review.
This optimization should be handled late, after Basic Block Placement. A new review will be opened that handles it in a late pass.