This is an archive of the discontinued LLVM Phabricator instance.

Differential D149651

[UnreachableBlockElim] Don't remove LCSSA phi nodes
Needs Review · Public

Authored by foad on May 2 2023, 7:33 AM.

Details

Reviewers

resistor
nhaehnle
Petar.Avramovic
sameerds

Group Reviewers

Restricted Project

Summary

Revert svn r54432:

SDISel's constant branch folding can fold away self-loops, which doesn't result in any dead blocks, but
rather an incorrect phi input.  Add code to UnreachableMachineBlockElim to get rid of these entries.

The effect of the reversion is that single-input phi nodes, such as those
created by LCSSA, are no longer removed. Instead, phi nodes are only removed
if they started with multiple inputs but were reduced to a single input
because some of their predecessor blocks were unreachable.
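
As a rough illustration of the behaviour change described above, here is a minimal sketch in plain C++. It is a toy model, not the pass itself: `ToyPhi`, `ToyBlock`, `simplifyPhis` and `makeExample` are hypothetical names, and the model ignores register classes, subregisters and COPY insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a machine PHI: one (value, predecessor block id) pair per
// input. Hypothetical type, not an LLVM class.
struct ToyPhi {
  std::string Def;
  std::vector<std::pair<std::string, std::size_t>> Inputs;
};

// Toy stand-in for a MachineBasicBlock.
struct ToyBlock {
  bool Reachable = true;
  std::vector<ToyPhi> Phis;
};

// Drops phi inputs that come from unreachable blocks, then folds phis that are
// left with a single input. With FoldEverySingleInput set (the pre-patch
// behaviour in this model) every single-input phi is folded; without it (the
// post-patch behaviour) only phis that actually lost an input are folded.
// Returns the number of phis folded away.
std::size_t simplifyPhis(std::vector<ToyBlock> &Blocks,
                         bool FoldEverySingleInput) {
  std::size_t Folded = 0;
  for (ToyBlock &BB : Blocks) {
    if (!BB.Reachable)
      continue;
    std::vector<ToyPhi> Kept;
    for (ToyPhi &Phi : BB.Phis) {
      assert(!Phi.Inputs.empty() && "phi needs at least one input");
      const std::size_t Before = Phi.Inputs.size();
      std::vector<std::pair<std::string, std::size_t>> Live;
      for (const auto &In : Phi.Inputs)
        if (Blocks[In.second].Reachable)
          Live.push_back(In);
      Phi.Inputs = Live;
      const bool LostInput = Phi.Inputs.size() != Before;
      if (Phi.Inputs.size() == 1 && (FoldEverySingleInput || LostInput))
        ++Folded; // the real pass rewrites uses or emits a COPY here
      else
        Kept.push_back(Phi);
    }
    BB.Phis = Kept;
  }
  return Folded;
}

// Blocks: 0 = loop body, 1 = loop exit holding an LCSSA phi,
// 2 = unreachable block, 3 = join of blocks 1 and 2.
std::vector<ToyBlock> makeExample() {
  std::vector<ToyBlock> Blocks(4);
  Blocks[2].Reachable = false;
  Blocks[1].Phis.push_back({"%lcssa", {{"%x", 0}}});
  Blocks[3].Phis.push_back({"%join", {{"%lcssa", 1}, {"%dead", 2}}});
  return Blocks;
}
```

In this model, calling `simplifyPhis(Blocks, true)` on the example folds both phis, while `simplifyPhis(Blocks, false)` folds only `%join` and leaves the LCSSA phi in block 1 in place.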

This makes a significant difference on AMDGPU where we deliberately go
into LCSSA form before instruction selection. I have not noticed any
effect on other targets.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. May 2 2023, 7:33 AM

Herald added a project: Restricted Project. May 2 2023, 7:33 AM

Herald added subscribers: kosarev, StephenFan, kerbowa and 4 others.

foad requested review of this revision. May 2 2023, 7:33 AM

Herald added a project: Restricted Project. May 2 2023, 7:33 AM

Herald added a subscriber: llvm-commits.

SDISel's constant branch folding can fold away self-loops, which doesn't result in any dead blocks, but
rather an incorrect phi input. Add code to UnreachableMachineBlockElim to get rid of these entries.

This description from r54432 is pretty unclear to me. It sounds like maybe it was using UnreachableBlockElim to fix MIR that was broken by a previous pass? Anyway, reverting does not appear to cause any failures of that sort.

I forgot a FIXME: this patch currently causes CodeGen/AMDGPU/control-flow-optnone.ll to fail with *** Bad machine code: Kill missing from LiveVariables ***.

Add a small fix for AMDGPU SILowerControlFlow test failure.

Herald added a subscriber: arsenm. May 2 2023, 8:12 AM

foad added inline comments. May 2 2023, 8:13 AM

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗	(On Diff #518740)	I can submit this separately, but I guess I would have to come up with a MIR test case. I am not sure why the change to UnreachableBlockElim exposed this problem.

Harbormaster completed remote builds in B229442: Diff 518740. May 2 2023, 8:53 AM

Pierre-vh added a subscriber: Pierre-vh. May 2 2023, 11:42 PM

Pierre-vh added inline comments.

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗	(On Diff #518740)	No strong opinion on leaving this in this patch or splitting it up, but it would be nice to have a MIR testcase just for this fix IMO (with `-run-pass` or `-start-before`). Maybe you can quickly get one by using `-stop-before` + `llvm-reduce` & some manual cleanup on the test that was previously failing?

foad added inline comments. May 3 2023, 7:54 AM

llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗	(On Diff #518740)	It was too hard. I can dump the MIR after UnreachableBlockElim to generate a test case, but to show the bug I would need to run something like `llc -run-pass=livevars,phi-node-elimination,si-lower-control-flow`. But the LiveVariables pass adds a run of UnreachableBlockElim as a prerequisite, which is enough to mess up the test case. I think I'll just commit this separately as an obvious fix.

foad mentioned this in rG069f027e1e6b: [AMDGPU] Update LiveVariables in SILowerControlFlow. May 3 2023, 8:11 AM

Rebase.

Harbormaster completed remote builds in B229693: Diff 519082. May 3 2023, 9:02 AM

I guess you are observing a code generation bug on AMDGPU? Is it replacing a phi like `%1:vgpr = phi %0:sgpr` with `%1:vgpr = COPY %0:sgpr`? If that is the case, I don't think this is a root-cause fix.

If we have some MachineIR like below (bb1 is a self-loop):

 entry
  |
[ bb1  bb2
   \    /
     bb3

The unreachable block bb2 will be eliminated and the phi in bb3 will then be further simplified the same way as in the LCSSA case you are seeing.

I think the problem is that on AMDGPU we depend on the sgpr-to-vgpr COPY lowered from a phi being placed in the predecessor block, which is how PHIElimination lowers phis. This matters mainly because, when the predecessor block is inside a loop, a COPY in the predecessor block would execute very differently from a COPY in the successor block.
For this specific issue, I think we can teach this pass to insert the COPY in the predecessor block, as PHIElimination does. This should not hurt other targets, as this is the standard way to lower phis, and a COPY between coalescable register classes would be coalesced away later.

I guess you are observing a code generation bug on AMDGPU? Is it replacing a phi like `%1:vgpr = phi %0:sgpr` with `%1:vgpr = COPY %0:sgpr`? If that is the case, I don't think this is a root-cause fix.

Right, I was investigating codegen problems on AMDGPU, but this patch is not supposed to fix them all. The rationale for this patch is:

simplify the pass
revert a "fix" which is apparently not required
do not *deliberately* remove all LCSSA phi nodes

If we have some MachineIR like below (bb1 is a self-loop):
 entry
  |
[ bb1  bb2
   \    /
     bb3
The unreachable block bb2 will be eliminated and the phi in bb3 will then be further simplified the same way as in the LCSSA case you are seeing.

That is a good point. I guess this pass is still going to remove *some* LCSSA phi nodes.
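
The CFG from the comment can be modelled in a few lines of plain C++ to show why the phi in bb3 still ends up with a single input. This is only a sketch: `commentCfg`, `markReachable` and `livePreds` are hypothetical helpers, the block numbering (0 = entry, 1 = bb1, 2 = bb2, 3 = bb3) is an assumption made for the example, and the walk only approximates what `depth_first_ext` does in the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SuccLists = std::vector<std::vector<std::size_t>>;

// Successor lists for the example CFG:
// entry -> bb1, bb1 -> {bb1, bb3} (self-loop), bb2 -> bb3, bb3 has no successors.
SuccLists commentCfg() { return {{1}, {1, 3}, {3}, {}}; }

// Marks every block reachable from Entry with an iterative depth-first walk.
std::vector<bool> markReachable(const SuccLists &Succs, std::size_t Entry) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<std::size_t> Work{Entry};
  while (!Work.empty()) {
    const std::size_t BB = Work.back();
    Work.pop_back();
    if (Seen[BB])
      continue;
    Seen[BB] = true;
    for (std::size_t S : Succs[BB])
      Work.push_back(S);
  }
  return Seen;
}

// Returns the predecessors of Block that survive once unreachable blocks are
// deleted. A phi in Block keeps exactly one input per surviving predecessor.
std::vector<std::size_t> livePreds(const SuccLists &Succs,
                                   const std::vector<bool> &Reachable,
                                   std::size_t Block) {
  assert(Block < Succs.size() && "block id out of range");
  std::vector<std::size_t> Preds;
  for (std::size_t BB = 0; BB < Succs.size(); ++BB) {
    if (!Reachable[BB])
      continue;
    for (std::size_t S : Succs[BB]) {
      if (S == Block) {
        Preds.push_back(BB);
        break;
      }
    }
  }
  return Preds;
}
```

With `markReachable(commentCfg(), 0)`, bb2 is the only unreachable block, so `livePreds` for bb3 returns just bb1. A two-input phi in bb3 therefore drops to one input and is folded even with the patch applied, exactly like a single-input LCSSA phi would have been before it.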

I thought the deliberate LCSSA thing was just a hack for DAG divergence. If you're on MIR it's no longer necessary?

Revision Contents

Path	Size
llvm/lib/CodeGen/UnreachableBlockElim.cpp	82 lines
llvm/test/CodeGen/AMDGPU/vgpr-liverange-ir.ll	32 lines
llvm/test/CodeGen/AMDGPU/wave32.ll	40 lines

Diff 519082

llvm/lib/CodeGen/UnreachableBlockElim.cpp

	void UnreachableMachineBlockElim::getAnalysisUsage(AnalysisUsage &AU) const {			void UnreachableMachineBlockElim::getAnalysisUsage(AnalysisUsage &AU) const {
	AU.addPreserved<MachineLoopInfo>();			AU.addPreserved<MachineLoopInfo>();
	AU.addPreserved<MachineDominatorTree>();			AU.addPreserved<MachineDominatorTree>();
	MachineFunctionPass::getAnalysisUsage(AU);			MachineFunctionPass::getAnalysisUsage(AU);
	}			}

	bool UnreachableMachineBlockElim::runOnMachineFunction(MachineFunction &F) {			bool UnreachableMachineBlockElim::runOnMachineFunction(MachineFunction &F) {
	df_iterator_default_set<MachineBasicBlock*> Reachable;			df_iterator_default_set<MachineBasicBlock *> Reachable;
	bool ModifiedPHI = false;

	MachineDominatorTree *MDT = getAnalysisIfAvailable<MachineDominatorTree>();			MachineDominatorTree *MDT = getAnalysisIfAvailable<MachineDominatorTree>();
	MachineLoopInfo *MLI = getAnalysisIfAvailable<MachineLoopInfo>();			MachineLoopInfo *MLI = getAnalysisIfAvailable<MachineLoopInfo>();

	// Mark all reachable blocks.			// Mark all reachable blocks.
	for (MachineBasicBlock *BB : depth_first_ext(&F, Reachable))			for (MachineBasicBlock *BB : depth_first_ext(&F, Reachable))
	(void)BB/* Mark all reachable blocks */;			(void)BB/* Mark all reachable blocks */;

	// Loop over all dead blocks, remembering them and deleting all instructions			// Loop over all dead blocks, remembering them and deleting all instructions
	// in them.			// in them.
	std::vector<MachineBasicBlock*> DeadBlocks;			std::vector<MachineBasicBlock*> DeadBlocks;
	for (MachineBasicBlock &BB : F) {			for (MachineBasicBlock &BB : F) {
	// Test for deadness.			// Test for deadness.
	if (!Reachable.count(&BB)) {			if (!Reachable.count(&BB)) {
	DeadBlocks.push_back(&BB);			DeadBlocks.push_back(&BB);

	// Update dominator and loop info.			// Update dominator and loop info.
	if (MLI) MLI->removeBlock(&BB);			if (MLI) MLI->removeBlock(&BB);
	if (MDT && MDT->getNode(&BB)) MDT->eraseNode(&BB);			if (MDT && MDT->getNode(&BB)) MDT->eraseNode(&BB);

	while (BB.succ_begin() != BB.succ_end()) {			while (BB.succ_begin() != BB.succ_end()) {
	MachineBasicBlock* succ = *BB.succ_begin();			MachineBasicBlock *Succ = *BB.succ_begin();

	for (MachineInstr &Phi : succ->phis()) {			for (MachineInstr &Phi : make_early_inc_range(Succ->phis())) {
	for (unsigned i = Phi.getNumOperands() - 1; i >= 2; i -= 2) {			for (unsigned i = Phi.getNumOperands() - 1; i >= 2; i -= 2) {
	if (Phi.getOperand(i).isMBB() &&			if (Phi.getOperand(i).isMBB() &&
	Phi.getOperand(i).getMBB() == &BB) {			Phi.getOperand(i).getMBB() == &BB) {
	Phi.removeOperand(i);			Phi.removeOperand(i);
	Phi.removeOperand(i - 1);			Phi.removeOperand(i - 1);
	}			}
	}			}
	}

	BB.removeSuccessor(BB.succ_begin());
	}
	}
	}

	// Actually remove the blocks now.
	for (MachineBasicBlock *BB : DeadBlocks) {
	// Remove any call site information for calls in the block.
	for (auto &I : BB->instrs())
	if (I.shouldUpdateCallSiteInfo())
	BB->getParent()->eraseCallSiteInfo(&I);

	BB->eraseFromParent();
	}

	// Cleanup PHI nodes.
	for (MachineBasicBlock &BB : F) {
	// Prune unneeded PHI entries.
	SmallPtrSet<MachineBasicBlock*, 8> preds(BB.pred_begin(),
	BB.pred_end());
	for (MachineInstr &Phi : make_early_inc_range(BB.phis())) {
	for (unsigned i = Phi.getNumOperands() - 1; i >= 2; i -= 2) {
	if (!preds.count(Phi.getOperand(i).getMBB())) {
	Phi.removeOperand(i);
	Phi.removeOperand(i - 1);
	ModifiedPHI = true;
	}
	}

	if (Phi.getNumOperands() == 3) {			if (Phi.getNumOperands() == 3) {
	const MachineOperand &Input = Phi.getOperand(1);			const MachineOperand &Input = Phi.getOperand(1);
	const MachineOperand &Output = Phi.getOperand(0);			const MachineOperand &Output = Phi.getOperand(0);
	Register InputReg = Input.getReg();			Register InputReg = Input.getReg();
	Register OutputReg = Output.getReg();			Register OutputReg = Output.getReg();
	assert(Output.getSubReg() == 0 && "Cannot have output subregister");			assert(Output.getSubReg() == 0 && "Cannot have output subregister");
	ModifiedPHI = true;

	if (InputReg != OutputReg) {			if (InputReg != OutputReg) {
	MachineRegisterInfo &MRI = F.getRegInfo();			MachineRegisterInfo &MRI = F.getRegInfo();
	unsigned InputSub = Input.getSubReg();			unsigned InputSub = Input.getSubReg();
	if (InputSub == 0 &&			if (InputSub == 0 &&
	MRI.constrainRegClass(InputReg, MRI.getRegClass(OutputReg)) &&			MRI.constrainRegClass(InputReg, MRI.getRegClass(OutputReg)) &&
	!Input.isUndef()) {			!Input.isUndef()) {
	MRI.replaceRegWith(OutputReg, InputReg);			MRI.replaceRegWith(OutputReg, InputReg);
	} else {			} else {
	// The input register to the PHI has a subregister or it can't be			// The input register to the PHI has a subregister or it can't
	// constrained to the proper register class or it is undef:			// be constrained to the proper register class or it is undef:
	// insert a COPY instead of simply replacing the output			// insert a COPY instead of simply replacing the output
	// with the input.			// with the input.
	const TargetInstrInfo *TII = F.getSubtarget().getInstrInfo();			const TargetInstrInfo *TII = F.getSubtarget().getInstrInfo();
	BuildMI(BB, BB.getFirstNonPHI(), Phi.getDebugLoc(),			BuildMI(*Succ, Succ->getFirstNonPHI(), Phi.getDebugLoc(),
	TII->get(TargetOpcode::COPY), OutputReg)			TII->get(TargetOpcode::COPY), OutputReg)
	.addReg(InputReg, getRegState(Input), InputSub);			.addReg(InputReg, getRegState(Input), InputSub);
	}			}
	Phi.eraseFromParent();			Phi.eraseFromParent();
	}			}
	}			}
	}			}

				BB.removeSuccessor(BB.succ_begin());
				}
				}
				}

				// Actually remove the blocks now.
				for (MachineBasicBlock *BB : DeadBlocks) {
				// Remove any call site information for calls in the block.
				for (auto &I : BB->instrs())
				if (I.shouldUpdateCallSiteInfo())
				BB->getParent()->eraseCallSiteInfo(&I);

				BB->eraseFromParent();
	}			}

	F.RenumberBlocks();			F.RenumberBlocks();

	return (!DeadBlocks.empty() || ModifiedPHI);			return !DeadBlocks.empty();
	}			}

llvm/test/CodeGen/AMDGPU/vgpr-liverange-ir.ll

[...] define amdgpu_ps float @else3(i32 %z, float %v, i32 inreg %bound, i32 %x0) #0 {
; SI-NEXT: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: [[V_ADD_U32_e64_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 1, [[PHI6]], 0, implicit $exec		; SI-NEXT: [[V_ADD_U32_e64_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 1, [[PHI6]], 0, implicit $exec
; SI-NEXT: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 killed [[PHI]], 1, implicit-def dead $scc		; SI-NEXT: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 killed [[PHI]], 1, implicit-def dead $scc
; SI-NEXT: S_CMP_LT_I32 [[S_ADD_I32_]], [[COPY1]], implicit-def $scc		; SI-NEXT: S_CMP_LT_I32 [[S_ADD_I32_]], [[COPY1]], implicit-def $scc
; SI-NEXT: S_CBRANCH_SCC1 %bb.1, implicit killed $scc		; SI-NEXT: S_CBRANCH_SCC1 %bb.1, implicit killed $scc
; SI-NEXT: S_BRANCH %bb.6		; SI-NEXT: S_BRANCH %bb.6
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.6.for.end:		; SI-NEXT: bb.6.for.end:
; SI-NEXT: [[V_ADD_F32_e64_:%[0-9]+]]:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed [[PHI6]], 0, killed [[PHI5]], 0, 0, implicit $mode, implicit $exec		; SI-NEXT: [[PHI7:%[0-9]+]]:vgpr_32 = PHI [[PHI5]], %bb.5
		; SI-NEXT: [[PHI8:%[0-9]+]]:vgpr_32 = PHI [[PHI6]], %bb.5
		; SI-NEXT: [[V_ADD_F32_e64_:%[0-9]+]]:vgpr_32 = nofpexcept V_ADD_F32_e64 0, killed [[PHI8]], 0, killed [[PHI7]], 0, 0, implicit $mode, implicit $exec
; SI-NEXT: $vgpr0 = COPY killed [[V_ADD_F32_e64_]]		; SI-NEXT: $vgpr0 = COPY killed [[V_ADD_F32_e64_]]
; SI-NEXT: SI_RETURN_TO_EPILOG killed $vgpr0		; SI-NEXT: SI_RETURN_TO_EPILOG killed $vgpr0
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%i = phi i32 [ 0, %entry ], [ %inc, %if.end ]		%i = phi i32 [ 0, %entry ], [ %inc, %if.end ]
%x = phi i32 [ %x0, %entry ], [ %xinc, %if.end ]		%x = phi i32 [ %x0, %entry ], [ %xinc, %if.end ]
[...] define amdgpu_ps float @loop(i32 %z, float %v, i32 inreg %bound, ptr %extern_func, ptr %extern_func2) #0 {
; SI-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0		; SI-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0
; SI-NEXT: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_32 = V_CMP_GT_I32_e64 6, killed [[COPY5]], implicit $exec		; SI-NEXT: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_32 = V_CMP_GT_I32_e64 6, killed [[COPY5]], implicit $exec
; SI-NEXT: [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_BRANCH %bb.6		; SI-NEXT: S_BRANCH %bb.6
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.1.Flow:		; SI-NEXT: bb.1.Flow:
; SI-NEXT: successors: %bb.2(0x40000000), %bb.10(0x40000000)		; SI-NEXT: successors: %bb.2(0x40000000), %bb.10(0x40000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef [[COPY47:%[0-9]+]]:vgpr_32, %bb.0, %4, %bb.9		; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef %47:vgpr_32, %bb.0, %4, %bb.9
; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY4]], %bb.0, undef [[COPY49:%[0-9]+]]:vgpr_32, %bb.9		; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY4]], %bb.0, undef %49:vgpr_32, %bb.9
; SI-NEXT: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[COPY3]], %bb.0, undef [[COPY51:%[0-9]+]]:vgpr_32, %bb.9		; SI-NEXT: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[COPY3]], %bb.0, undef %51:vgpr_32, %bb.9
; SI-NEXT: [[PHI3:%[0-9]+]]:vgpr_32 = PHI [[COPY2]], %bb.0, undef [[COPY53:%[0-9]+]]:vgpr_32, %bb.9		; SI-NEXT: [[PHI3:%[0-9]+]]:vgpr_32 = PHI [[COPY2]], %bb.0, undef %53:vgpr_32, %bb.9
; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_BRANCH %bb.2		; SI-NEXT: S_BRANCH %bb.2
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.2.if:		; SI-NEXT: bb.2.if:
; SI-NEXT: successors: %bb.3(0x80000000)		; SI-NEXT: successors: %bb.3(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[PHI2]], %subreg.sub0, killed [[PHI3]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[PHI2]], %subreg.sub0, killed [[PHI3]], %subreg.sub1
; SI-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo		; SI-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.3:		; SI-NEXT: bb.3:
; SI-NEXT: successors: %bb.4(0x80000000)		; SI-NEXT: successors: %bb.4(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI4:%[0-9]+]]:vreg_64 = PHI undef [[COPY57:%[0-9]+]]:vreg_64, %bb.4, [[REG_SEQUENCE]], %bb.2		; SI-NEXT: [[PHI4:%[0-9]+]]:vreg_64 = PHI undef %55:vreg_64, %bb.4, [[REG_SEQUENCE]], %bb.2
; SI-NEXT: [[PHI5:%[0-9]+]]:vgpr_32 = PHI undef [[COPY59:%[0-9]+]]:vgpr_32, %bb.4, [[PHI1]], %bb.2		; SI-NEXT: [[PHI5:%[0-9]+]]:vgpr_32 = PHI undef %57:vgpr_32, %bb.4, [[PHI1]], %bb.2
; SI-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub0, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub0, implicit $exec
; SI-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub1, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub1, implicit $exec
; SI-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_1]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_1]], %subreg.sub1
; SI-NEXT: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], killed [[PHI4]], implicit $exec		; SI-NEXT: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], killed [[PHI4]], implicit $exec
; SI-NEXT: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_]], implicit-def $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_]], implicit-def $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.4:		; SI-NEXT: bb.4:
; SI-NEXT: successors: %bb.3(0x40000000), %bb.5(0x40000000)		; SI-NEXT: successors: %bb.3(0x40000000), %bb.5(0x40000000)
[...] define amdgpu_ps float @loop(i32 %z, float %v, i32 inreg %bound, ptr %extern_func, ptr %extern_func2) #0 {
; SI-NEXT: successors: %bb.7(0x80000000)		; SI-NEXT: successors: %bb.7(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[COPY1]], %subreg.sub0, killed [[COPY]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[COPY1]], %subreg.sub0, killed [[COPY]], %subreg.sub1
; SI-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo		; SI-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.7:		; SI-NEXT: bb.7:
; SI-NEXT: successors: %bb.8(0x80000000)		; SI-NEXT: successors: %bb.8(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI6:%[0-9]+]]:vreg_64 = PHI undef [[COPY59:%[0-9]+]]:vreg_64, %bb.8, [[REG_SEQUENCE2]], %bb.6		; SI-NEXT: [[PHI6:%[0-9]+]]:vreg_64 = PHI undef %59:vreg_64, %bb.8, [[REG_SEQUENCE2]], %bb.6
; SI-NEXT: [[PHI7:%[0-9]+]]:vgpr_32 = PHI undef [[COPY61:%[0-9]+]]:vgpr_32, %bb.8, [[COPY4]], %bb.6		; SI-NEXT: [[PHI7:%[0-9]+]]:vgpr_32 = PHI undef %61:vgpr_32, %bb.8, [[COPY4]], %bb.6
; SI-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI6]].sub0, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI6]].sub0, implicit $exec
; SI-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI6]].sub1, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI6]].sub1, implicit $exec
; SI-NEXT: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_2]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_3]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_2]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_3]], %subreg.sub1
; SI-NEXT: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE3]], killed [[PHI6]], implicit $exec		; SI-NEXT: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE3]], killed [[PHI6]], implicit $exec
; SI-NEXT: [[S_AND_SAVEEXEC_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_1]], implicit-def $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[S_AND_SAVEEXEC_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_1]], implicit-def $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.8:		; SI-NEXT: bb.8:
; SI-NEXT: successors: %bb.7(0x40000000), %bb.9(0x40000000)		; SI-NEXT: successors: %bb.7(0x40000000), %bb.9(0x40000000)
[...] define amdgpu_ps float @loop_with_use(i32 %z, float %v, i32 inreg %bound, ptr %extern_func, ptr %extern_func2) #0 {
; SI-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0		; SI-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0
; SI-NEXT: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_32 = V_CMP_GT_I32_e64 6, killed [[COPY5]], implicit $exec		; SI-NEXT: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_32 = V_CMP_GT_I32_e64 6, killed [[COPY5]], implicit $exec
; SI-NEXT: [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[SI_IF:%[0-9]+]]:sreg_32 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_BRANCH %bb.6		; SI-NEXT: S_BRANCH %bb.6
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.1.Flow:		; SI-NEXT: bb.1.Flow:
; SI-NEXT: successors: %bb.2(0x40000000), %bb.10(0x40000000)		; SI-NEXT: successors: %bb.2(0x40000000), %bb.10(0x40000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef [[COPY50:%[0-9]+]]:vgpr_32, %bb.0, %4, %bb.9		; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef %48:vgpr_32, %bb.0, %4, %bb.9
; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY3]], %bb.0, undef [[COPY52:%[0-9]+]]:vgpr_32, %bb.9		; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY3]], %bb.0, undef %50:vgpr_32, %bb.9
; SI-NEXT: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[COPY2]], %bb.0, undef [[COPY54:%[0-9]+]]:vgpr_32, %bb.9		; SI-NEXT: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[COPY2]], %bb.0, undef %52:vgpr_32, %bb.9
; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.10, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_BRANCH %bb.2		; SI-NEXT: S_BRANCH %bb.2
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.2.if:		; SI-NEXT: bb.2.if:
; SI-NEXT: successors: %bb.3(0x80000000)		; SI-NEXT: successors: %bb.3(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[PHI1]], %subreg.sub0, killed [[PHI2]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[PHI1]], %subreg.sub0, killed [[PHI2]], %subreg.sub1
; SI-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo		; SI-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.3:		; SI-NEXT: bb.3:
; SI-NEXT: successors: %bb.4(0x80000000)		; SI-NEXT: successors: %bb.4(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI3:%[0-9]+]]:vreg_64 = PHI undef [[COPY56:%[0-9]+]]:vreg_64, %bb.4, [[REG_SEQUENCE]], %bb.2		; SI-NEXT: [[PHI3:%[0-9]+]]:vreg_64 = PHI undef %54:vreg_64, %bb.4, [[REG_SEQUENCE]], %bb.2
; SI-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI3]].sub0, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI3]].sub0, implicit $exec
; SI-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI3]].sub1, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI3]].sub1, implicit $exec
; SI-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_1]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_1]], %subreg.sub1
; SI-NEXT: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], killed [[PHI3]], implicit $exec		; SI-NEXT: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE1]], killed [[PHI3]], implicit $exec
; SI-NEXT: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_]], implicit-def $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[S_AND_SAVEEXEC_B32_:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_]], implicit-def $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.4:		; SI-NEXT: bb.4:
; SI-NEXT: successors: %bb.3(0x40000000), %bb.5(0x40000000)		; SI-NEXT: successors: %bb.3(0x40000000), %bb.5(0x40000000)
[...] define amdgpu_ps float @loop_with_use(i32 %z, float %v, i32 inreg %bound, ptr %extern_func, ptr %extern_func2) #0 {
; SI-NEXT: successors: %bb.7(0x80000000)		; SI-NEXT: successors: %bb.7(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[COPY1]], %subreg.sub0, killed [[COPY]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE killed [[COPY1]], %subreg.sub0, killed [[COPY]], %subreg.sub1
; SI-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo		; SI-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_MOV_B32 $exec_lo
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.7:		; SI-NEXT: bb.7:
; SI-NEXT: successors: %bb.8(0x80000000)		; SI-NEXT: successors: %bb.8(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI4:%[0-9]+]]:vreg_64 = PHI undef [[COPY58:%[0-9]+]]:vreg_64, %bb.8, [[REG_SEQUENCE2]], %bb.6		; SI-NEXT: [[PHI4:%[0-9]+]]:vreg_64 = PHI undef %56:vreg_64, %bb.8, [[REG_SEQUENCE2]], %bb.6
; SI-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub0, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub0, implicit $exec
; SI-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub1, implicit $exec		; SI-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sgpr_32 = V_READFIRSTLANE_B32 [[PHI4]].sub1, implicit $exec
; SI-NEXT: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_2]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_3]], %subreg.sub1		; SI-NEXT: [[REG_SEQUENCE3:%[0-9]+]]:sgpr_64 = REG_SEQUENCE killed [[V_READFIRSTLANE_B32_2]], %subreg.sub0, killed [[V_READFIRSTLANE_B32_3]], %subreg.sub1
; SI-NEXT: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE3]], killed [[PHI4]], implicit $exec		; SI-NEXT: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_CMP_EQ_U64_e64 [[REG_SEQUENCE3]], killed [[PHI4]], implicit $exec
; SI-NEXT: [[S_AND_SAVEEXEC_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_1]], implicit-def $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[S_AND_SAVEEXEC_B32_1:%[0-9]+]]:sreg_32_xm0_xexec = S_AND_SAVEEXEC_B32 killed [[V_CMP_EQ_U64_e64_1]], implicit-def $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.8:		; SI-NEXT: bb.8:
; SI-NEXT: successors: %bb.7(0x40000000), %bb.9(0x40000000)		; SI-NEXT: successors: %bb.7(0x40000000), %bb.9(0x40000000)
[...] define amdgpu_kernel void @livevariables_update_missed_block(ptr addrspace(1) %src1) {
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI [[COPY1]](s32), %bb.0, undef %52:vgpr_32, %bb.6		; SI-NEXT: [[PHI:%[0-9]+]]:vgpr_32 = PHI [[COPY1]](s32), %bb.0, undef %52:vgpr_32, %bb.6
; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.7, implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: [[SI_ELSE:%[0-9]+]]:sreg_32 = SI_ELSE killed [[SI_IF]], %bb.7, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_BRANCH %bb.1		; SI-NEXT: S_BRANCH %bb.1
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.6.sw.bb18:		; SI-NEXT: bb.6.sw.bb18:
; SI-NEXT: successors: %bb.5(0x80000000)		; SI-NEXT: successors: %bb.5(0x80000000)
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI undef [[COPY38:%[0-9]+]]:vgpr_32, %bb.3, [[GLOBAL_LOAD_UBYTE1]], %bb.4		; SI-NEXT: [[PHI1:%[0-9]+]]:vgpr_32 = PHI undef %36:vgpr_32, %bb.3, [[GLOBAL_LOAD_UBYTE1]], %bb.4
; SI-NEXT: [[V_MOV_B2:%[0-9]+]]:vreg_64 = V_MOV_B64_PSEUDO 0, implicit $exec		; SI-NEXT: [[V_MOV_B2:%[0-9]+]]:vreg_64 = V_MOV_B64_PSEUDO 0, implicit $exec
; SI-NEXT: GLOBAL_STORE_BYTE killed [[V_MOV_B2]], killed [[PHI1]], 0, 0, implicit $exec :: (store (s8) into `ptr addrspace(1) null`, addrspace 1)		; SI-NEXT: GLOBAL_STORE_BYTE killed [[V_MOV_B2]], killed [[PHI1]], 0, 0, implicit $exec :: (store (s8) into `ptr addrspace(1) null`, addrspace 1)
; SI-NEXT: S_BRANCH %bb.5		; SI-NEXT: S_BRANCH %bb.5
; SI-NEXT: {{ $}}		; SI-NEXT: {{ $}}
; SI-NEXT: bb.7.UnifiedReturnBlock:		; SI-NEXT: bb.7.UnifiedReturnBlock:
; SI-NEXT: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec		; SI-NEXT: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
; SI-NEXT: S_ENDPGM 0		; SI-NEXT: S_ENDPGM 0
entry:		entry:

llvm/test/CodeGen/AMDGPU/wave32.ll

[...] ; GFX1064-NEXT: s_endpgm
ret void		ret void
}		}

define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) #0 {		define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) #0 {
; GFX1032-LABEL: test_loop_vcc:		; GFX1032-LABEL: test_loop_vcc:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_mov_b32 s0, exec_lo		; GFX1032-NEXT: s_mov_b32 s0, exec_lo
; GFX1032-NEXT: s_wqm_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_wqm_b32 exec_lo, exec_lo
		; GFX1032-NEXT: v_mov_b32_e32 v7, v3
		; GFX1032-NEXT: v_mov_b32_e32 v6, v2
		; GFX1032-NEXT: v_mov_b32_e32 v5, v1
		; GFX1032-NEXT: v_mov_b32_e32 v4, v0
; GFX1032-NEXT: v_mov_b32_e32 v8, 0		; GFX1032-NEXT: v_mov_b32_e32 v8, 0
; GFX1032-NEXT: s_branch .LBB33_2		; GFX1032-NEXT: s_branch .LBB33_2
; GFX1032-NEXT: .LBB33_1: ; %body		; GFX1032-NEXT: .LBB33_1: ; %body
; GFX1032-NEXT: ; in Loop: Header=BB33_2 Depth=1		; GFX1032-NEXT: ; in Loop: Header=BB33_2 Depth=1
; GFX1032-NEXT: image_sample v[0:3], v4, s[0:7], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_1D		; GFX1032-NEXT: image_sample v[4:7], v0, s[0:7], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_1D
; GFX1032-NEXT: v_add_f32_e32 v8, 2.0, v8		; GFX1032-NEXT: v_add_f32_e32 v8, 2.0, v8
; GFX1032-NEXT: s_cbranch_execz .LBB33_4		; GFX1032-NEXT: s_cbranch_execz .LBB33_4
; GFX1032-NEXT: .LBB33_2: ; %loop		; GFX1032-NEXT: .LBB33_2: ; %loop
; GFX1032-NEXT: ; =>This Inner Loop Header: Depth=1		; GFX1032-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX1032-NEXT: v_cmp_lt_f32_e32 vcc_lo, 0x40e00000, v8		; GFX1032-NEXT: v_cmp_lt_f32_e32 vcc_lo, 0x40e00000, v8
; GFX1032-NEXT: s_waitcnt vmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0)
; GFX1032-NEXT: v_mov_b32_e32 v7, v3		; GFX1032-NEXT: v_mov_b32_e32 v0, v4
; GFX1032-NEXT: v_mov_b32_e32 v6, v2		; GFX1032-NEXT: v_mov_b32_e32 v1, v5
; GFX1032-NEXT: v_mov_b32_e32 v5, v1		; GFX1032-NEXT: v_mov_b32_e32 v2, v6
; GFX1032-NEXT: v_mov_b32_e32 v4, v0		; GFX1032-NEXT: v_mov_b32_e32 v3, v7
; GFX1032-NEXT: s_cbranch_vccz .LBB33_1		; GFX1032-NEXT: s_cbranch_vccz .LBB33_1
; GFX1032-NEXT: ; %bb.3:		; GFX1032-NEXT: ; %bb.3:
; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3		; GFX1032-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6_vgpr7
; GFX1032-NEXT: ; implicit-def: $vgpr8		; GFX1032-NEXT: ; implicit-def: $vgpr8
; GFX1032-NEXT: .LBB33_4: ; %break		; GFX1032-NEXT: .LBB33_4: ; %break
; GFX1032-NEXT: s_and_b32 exec_lo, exec_lo, s0		; GFX1032-NEXT: s_and_b32 exec_lo, exec_lo, s0
; GFX1032-NEXT: s_waitcnt vmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0)
; GFX1032-NEXT: v_mov_b32_e32 v0, v4
; GFX1032-NEXT: v_mov_b32_e32 v1, v5
; GFX1032-NEXT: v_mov_b32_e32 v2, v6
; GFX1032-NEXT: v_mov_b32_e32 v3, v7
; GFX1032-NEXT: ; return to shader part epilog		; GFX1032-NEXT: ; return to shader part epilog
;		;
; GFX1064-LABEL: test_loop_vcc:		; GFX1064-LABEL: test_loop_vcc:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_mov_b64 s[0:1], exec		; GFX1064-NEXT: s_mov_b64 s[0:1], exec
; GFX1064-NEXT: s_wqm_b64 exec, exec		; GFX1064-NEXT: s_wqm_b64 exec, exec
		; GFX1064-NEXT: v_mov_b32_e32 v7, v3
		; GFX1064-NEXT: v_mov_b32_e32 v6, v2
		; GFX1064-NEXT: v_mov_b32_e32 v5, v1
		; GFX1064-NEXT: v_mov_b32_e32 v4, v0
; GFX1064-NEXT: v_mov_b32_e32 v8, 0		; GFX1064-NEXT: v_mov_b32_e32 v8, 0
; GFX1064-NEXT: s_branch .LBB33_2		; GFX1064-NEXT: s_branch .LBB33_2
; GFX1064-NEXT: .LBB33_1: ; %body		; GFX1064-NEXT: .LBB33_1: ; %body
; GFX1064-NEXT: ; in Loop: Header=BB33_2 Depth=1		; GFX1064-NEXT: ; in Loop: Header=BB33_2 Depth=1
; GFX1064-NEXT: image_sample v[0:3], v4, s[0:7], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_1D		; GFX1064-NEXT: image_sample v[4:7], v0, s[0:7], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_1D
; GFX1064-NEXT: v_add_f32_e32 v8, 2.0, v8		; GFX1064-NEXT: v_add_f32_e32 v8, 2.0, v8
; GFX1064-NEXT: s_cbranch_execz .LBB33_4		; GFX1064-NEXT: s_cbranch_execz .LBB33_4
; GFX1064-NEXT: .LBB33_2: ; %loop		; GFX1064-NEXT: .LBB33_2: ; %loop
; GFX1064-NEXT: ; =>This Inner Loop Header: Depth=1		; GFX1064-NEXT: ; =>This Inner Loop Header: Depth=1
; GFX1064-NEXT: v_cmp_lt_f32_e32 vcc, 0x40e00000, v8		; GFX1064-NEXT: v_cmp_lt_f32_e32 vcc, 0x40e00000, v8
; GFX1064-NEXT: s_waitcnt vmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0)
; GFX1064-NEXT: v_mov_b32_e32 v7, v3		; GFX1064-NEXT: v_mov_b32_e32 v0, v4
; GFX1064-NEXT: v_mov_b32_e32 v6, v2		; GFX1064-NEXT: v_mov_b32_e32 v1, v5
; GFX1064-NEXT: v_mov_b32_e32 v5, v1		; GFX1064-NEXT: v_mov_b32_e32 v2, v6
; GFX1064-NEXT: v_mov_b32_e32 v4, v0		; GFX1064-NEXT: v_mov_b32_e32 v3, v7
; GFX1064-NEXT: s_cbranch_vccz .LBB33_1		; GFX1064-NEXT: s_cbranch_vccz .LBB33_1
; GFX1064-NEXT: ; %bb.3:		; GFX1064-NEXT: ; %bb.3:
; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3		; GFX1064-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6_vgpr7
; GFX1064-NEXT: ; implicit-def: $vgpr8		; GFX1064-NEXT: ; implicit-def: $vgpr8
; GFX1064-NEXT: .LBB33_4: ; %break		; GFX1064-NEXT: .LBB33_4: ; %break
; GFX1064-NEXT: s_and_b64 exec, exec, s[0:1]		; GFX1064-NEXT: s_and_b64 exec, exec, s[0:1]
; GFX1064-NEXT: s_waitcnt vmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0)
; GFX1064-NEXT: v_mov_b32_e32 v0, v4
; GFX1064-NEXT: v_mov_b32_e32 v1, v5
; GFX1064-NEXT: v_mov_b32_e32 v2, v6
; GFX1064-NEXT: v_mov_b32_e32 v3, v7
; GFX1064-NEXT: ; return to shader part epilog		; GFX1064-NEXT: ; return to shader part epilog
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%ctr.iv = phi float [ 0.0, %entry ], [ %ctr.next, %body ]		%ctr.iv = phi float [ 0.0, %entry ], [ %ctr.next, %body ]
%c.iv = phi <4 x float> [ %in, %entry ], [ %c.next, %body ]		%c.iv = phi <4 x float> [ %in, %entry ], [ %c.next, %body ]
%cc = fcmp ogt float %ctr.iv, 7.0		%cc = fcmp ogt float %ctr.iv, 7.0