This is an archive of the discontinued LLVM Phabricator instance.

Differential D107677

Prevent machine licm if remattable with a vreg use
ClosedPublic

Authored by rampitec on Aug 6 2021, 4:14 PM.

Details

Reviewers

arsenm
efriedma
dmgreen

Commits

rGb9e433b02a77: Prevent machine licm if remattable with a vreg use

Summary

Check if a rematerializable instruction does not have any virtual
register uses. Even though it is rematerializable, RA might not
actually rematerialize it in this scenario. In that case we do not
want to hoist such an instruction out of the loop in the belief that
RA will sink it back if needed.

This already has an impact on the AMDGPU target, which does not check
for this condition in its isTriviallyReMaterializable implementation
and has instructions with virtual register uses enabled. The
other targets are not impacted at this point, although they will be
when D106408 lands.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Aug 6 2021, 4:14 PM

Herald added subscribers: foad, kerbowa, pengfei and 14 others. Aug 6 2021, 4:14 PM

rampitec requested review of this revision. Aug 6 2021, 4:14 PM

Herald added a project: Restricted Project. Aug 6 2021, 4:14 PM

Herald added subscribers: aheejin, wdng.

rampitec added a parent revision: D106408: Allow rematerialization of virtual reg uses. Aug 6 2021, 4:14 PM

Harbormaster completed remote builds in B118463: Diff 364907. Aug 6 2021, 4:53 PM

rampitec mentioned this in D106408: Allow rematerialization of virtual reg uses. Aug 6 2021, 4:53 PM

Moved the check into MachineLICM itself to avoid predication code in D106408.

rampitec edited the summary of this revision. Aug 9 2021, 11:24 AM

rampitec added a child revision: D106408: Allow rematerialization of virtual reg uses. Aug 9 2021, 11:26 AM

Harbormaster completed remote builds in B118708: Diff 365230. Aug 9 2021, 11:55 AM

This does affect the ARM backend; apparently at some point it obtained the ability to hoist VCTP instructions, which take a register use. I'm not sure if that really fits the definition of trivially rematerializable though, from the code comment on isTriviallyReMaterializable. (But I'm not sure that comment is up to date.)

Can you explain more why we don't want to hoist them out of loops?

In D107677#2936366, @dmgreen wrote:

> This does affect the ARM backend; apparently at some point it obtained the ability to hoist VCTP instructions, which take a register use. I'm not sure if that really fits the definition of trivially rematerializable though, from the code comment on isTriviallyReMaterializable. (But I'm not sure that comment is up to date.)

I do not see a failing test, but given this code, yes, it should hoist it now because it also does not check uses:

bool ARMBaseInstrInfo::isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                         AAResults *AA) const {
  // Try hard to rematerialize any VCTPs because if we spill P0, it will block
  // the tail predication conversion. This means that the element count
  // register has to be live for longer, but that has to be better than
  // spill/restore and VPT predication.
  return isVCTP(&MI) && !isPredicated(MI);
}

> Can you explain more why we don't want to hoist them out of loops?

This code in MachineLICMBase::IsProfitableToHoist() assumes that a trivially rematerializable instruction can always be rematerialized by RA if needed:

// Rematerializable instructions should always be hoisted since the register
// allocator can just pull them down again when needed.
if (TII->isTriviallyReMaterializable(MI, AA))
  return true;

So MachineLICM will always hoist such instructions even if that pushes register pressure too high. However, its assumption that a rematerializable instruction will be rematerialized if RA does not have enough registers is not true. The check in LiveRangeEdit::allUsesAvailableAt() ensures that a used register is available at the point of rematerialization. If it is not, RA will not try to extend the live range to that point, if only because that would increase register pressure. So in fact rematerialization will not happen as LICM expects. I.e. it would transform:

LOOP:
  %0 = DEF killed %1
  USE killed %0
  GOTO LOOP

into

%0 = DEF killed %1
LOOP:
  USE %0
  GOTO LOOP

Here DEF can be rematerialized before USE, but RA will not do it because %1 is killed at DEF and not available at USE. If DEF itself increases register pressure that is a problem.
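
As a rough illustration of the availability problem described above, here is a minimal self-contained sketch. It is not the LLVM implementation: ToyLiveRange and toyAllUsesAvailableAt are hypothetical names, slots are plain integers, and the real LiveRangeEdit::allUsesAvailableAt() works on live intervals and value numbers rather than on this simplified model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a virtual register's live range: the value is
// considered readable at every slot from Def through LastUse inclusive.
struct ToyLiveRange {
  int Def;
  int LastUse;

  bool liveAt(int Slot) const {
    assert(Def <= LastUse && "malformed toy live range");
    return Def <= Slot && Slot <= LastUse;
  }
};

// Returns true only if every virtual register read by the instruction is
// still live at the slot where the instruction would be re-created.
// An instruction with no register uses is trivially fine anywhere.
bool toyAllUsesAvailableAt(const std::vector<ToyLiveRange> &UseRanges,
                           int Slot) {
  for (const ToyLiveRange &LR : UseRanges)
    if (!LR.liveAt(Slot))
      return false;
  return true;
}
```

In the hoisted example, if %1 is defined at slot 0 and killed by DEF at slot 1, the check passes at slot 1 but fails at the slot of USE inside the loop, which corresponds to the case where RA declines to rematerialize.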

In the test AMDGPU/licm-regpressure.mir updated in this patch, MachineLICM hoists all V_CVT_F64_I32_e32 instructions, and that makes virtual registers %18 - %35 defined by these instructions live across the whole loop. With a check for such uses added, the logic of MachineLICMBase::IsProfitableToHoist proceeds further to the CanCauseHighRegPressure() check and only hoists the first 5 instructions (%18 - %22). The rest are kept in the loop because we have reached the register pressure limit. Instructions remaining in the loop only consume 1 register for all defs because each def is immediately killed.
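
The pressure argument can be sketched with a toy model. This is only an illustration under stated assumptions, not how MachineLICM tracks pressure: toyPeakLoopPressure is a hypothetical helper that counts only the results of the instructions in question, assumes every hoisted result stays live across the whole loop, and assumes results left in the loop are killed immediately and therefore share one slot. Everything else live in the loop is ignored, so the numbers are not directly comparable with the 24 VGPR budget mentioned in the test.

```cpp
#include <algorithm>
#include <cassert>

// Toy estimate of register units occupied inside the loop by the results of
// NumDefs similar instructions, of which NumHoisted were moved out of the
// loop. UnitsPerDef is the width of one result (e.g. 2 for a 64-bit value
// held in two 32-bit registers).
int toyPeakLoopPressure(int NumDefs, int NumHoisted, int UnitsPerDef) {
  assert(NumDefs >= 0 && NumHoisted >= 0 && UnitsPerDef >= 0);
  int Hoisted = std::min(NumHoisted, NumDefs);
  int InLoop = NumDefs - Hoisted;
  // Hoisted results are all live at once; in-loop results reuse one slot.
  return Hoisted * UnitsPerDef + (InLoop > 0 ? UnitsPerDef : 0);
}
```

With 18 two-unit results, the model gives 36 units when all 18 are hoisted and 12 units when only 5 are hoisted, which is the direction of the effect described above.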

I assume the same problem may happen with any instruction defining something including VCTP.

rampitec mentioned this in D107859: [AMDGPU] MachineLICM cannot hoist VALU. Aug 10 2021, 12:12 PM

In D107677#2936366, @dmgreen wrote:

> This does affect the ARM backend; apparently at some point it obtained the ability to hoist VCTP instructions, which take a register use. I'm not sure if that really fits the definition of trivially rematerializable though, from the code comment on isTriviallyReMaterializable. (But I'm not sure that comment is up to date.)

I am also looking at D87280, which enabled rematerialization of VCTP. It looks like the tests there want it to be rematerialized, effectively sunk into a loop body, and not hoisted. I am not sure if such hoisting (which will happen less frequently, if ever, after this patch) is considered a good thing. My impression from D87280 and its description is that it would not be desired.

In D107677#2938247, @rampitec wrote:

> In D107677#2936366, @dmgreen wrote:
>
> > This does affect the ARM backend; apparently at some point it obtained the ability to hoist VCTP instructions, which take a register use. I'm not sure if that really fits the definition of trivially rematerializable though, from the code comment on isTriviallyReMaterializable. (But I'm not sure that comment is up to date.)
>
> I am also looking at D87280, which enabled rematerialization of VCTP. It looks like the tests there want it to be rematerialized, effectively sunk into a loop body, and not hoisted. I am not sure if such hoisting (which will happen less frequently, if ever, after this patch) is considered a good thing. My impression from D87280 and its description is that it would not be desired.

Looking more into that, I do not think VCTP was ever hoisted pre-RA. It should not pass the MachineLoop::isLoopInvariant() check because of the physreg P0 def, unless P0 is dead.

I wouldn't worry too much about VCTPs. They are generated in two ways. One is from the vectorizer, in which case they are pretty much glued in place and cannot be moved much. They are also created from intrinsics, which are the cases where I was seeing them behave differently. Those will be rarer and still only used in certain situations, but will be more free to move. We have many more tests there than we would in the llvm tests, so it's not surprising that there are not any tests that change.

From those results, I would say that on average this change is flat (some improvements, some regressions, mostly balancing out), except for one case. That's a matrix multiply kernel that used to do this:

outerloop:
  VCTP r0
  ...
innerloop:
  LDR      r1,[sp,#0x74]       
  VLDRW.U32 q3,[r0],#16        
  VLDRW.U32 q2,[r1,q1,UXTW #2] 
  VRMLALVHA.S32 r6,r7,q3,q2    
  VLDRW.U32 q3,[r12],#16
  VRMLALVHA.S32 r4,r5,q3,q2
  VLDRW.U32 q3,[r11],#16
  VRMLALVHA.S32 r2,r9,q3,q2
  VLDRW.U32 q3,[r8],#16
  VRMLALVHA.S32 r10,r3,q3,q2
  LDR      r1,[sp,#0x70]
  VADD.I32 q1,q1,r1
  LE       lr,#innerloop
...

And now has a bad day:

outerloop:
...
innerloop:
  LDR      r9,[sp,#0x74]
  VLDRW.U32 q3,[r0],#16
  VLDRW.U32 q2,[r9,q1,UXTW #2]
  VRMLALVHA.S32 r12,r7,q3,q2
  VLDRW.U32 q3,[r6],#16
  VRMLALVHA.S32 r10,r5,q3,q2
  VLDRW.U32 q3,[r11],#16
  VRMLALVHA.S32 r4,r1,q3,q2
  VLDRW.U32 q3,[r8],#16
  VRMLALVHA.S32 r2,r3,q3,q2
  MOV      r9,r7
  MOV      r7,r5
  MOV      r5,r4
  MOV      r4,r2
  MOV      r2,r1
  MOV      r1,r3
  LDR      r3,[sp,#0x70]
  VADD.I32 q1,q1,r3
  MOV      r3,r1
  MOV      r1,r2
  MOV      r2,r4
  MOV      r4,r5
  MOV      r5,r7
  MOV      r7,r9
  LE       lr,#innerloop
...
  VCTP lr

I guess this patch can't be blamed for the register allocator going haywire :-)

Taking a step back, my understanding of "Trivial Rematerialization" comes from the definition above isTriviallyReMaterializable:

/// Return true if the instruction is trivially rematerializable, meaning it
/// has no side effects and requires no operands that aren't always available.
/// This means the only allowed uses are constants and unallocatable physical
/// registers so that the instructions result is independent of the place
/// in the function.
bool isTriviallyReMaterializable(const MachineInstr &MI,
                                 AAResults *AA = nullptr) const {

So they are expected to only have operands that are available everywhere in the program. That is what makes them trivial to rematerialize. It sounds like D105742 (and D87280, although I'm not sure it should have) has changed that definition to now include instructions that have virtual register uses. Non-trivial rematerialization, if you will. And it is now the responsibility of the caller to ensure that the virtual uses are valid at the point the instruction is being moved to. That is what D106396 was fixing, and what D106408 is extending. Does that sound about right so far?

If so can we update the docs to match the new behaviour? I'm not sure I would really count it as "trivially rematerializable" anymore, but I don't have a better name for it. From there moving the profitability check into Machine LICM sounds like a fine idea.

In D107677#2941270, @dmgreen wrote:

> And now has a bad day:
>
>   LE       lr,#innerloop
> ...
>   VCTP lr

Is that after this patch or after D106408? It looks more like rematerialization and not hoisting.

> Taking a step back, my understanding of "Trivial Rematerialization" comes from the definition above isTriviallyReMaterializable:
>
> /// Return true if the instruction is trivially rematerializable, meaning it
> /// has no side effects and requires no operands that aren't always available.
> /// This means the only allowed uses are constants and unallocatable physical
> /// registers so that the instructions result is independent of the place
> /// in the function.
> bool isTriviallyReMaterializable(const MachineInstr &MI,
>                                  AAResults *AA = nullptr) const {
>
> So they are expected to only have operands that are available everywhere in the program. That is what makes them trivial to rematerialize. It sounds like D105742 (and D87280, although I'm not sure it should have) has changed that definition to now include instructions that have virtual register uses. Non-trivial rematerialization, if you will. And it is now the responsibility of the caller to ensure that the virtual uses are valid at the point the instruction is being moved to. That is what D106396 was fixing, and what D106408 is extending. Does that sound about right so far?

Yes, this sounds right. Moreover, AMDGPU has been allowing it for years, just for a few instructions, namely moves.

> If so can we update the docs to match the new behaviour? I'm not sure I would really count it as "trivially rematerializable" anymore, but I don't have a better name for it. From there moving the profitability check into Machine LICM sounds like a fine idea.

Yes, I guess this description is not correct for all targets and will be completely incorrect after D106408. I suppose I need to update the comment along with D106408. How about this text?

/// Return true if the instruction is trivially rematerializable, meaning it
/// has no side effects. Uses of constants and unallocatable physical
/// registers are always trivial to rematerialize so that the instructions
/// result is independent of the place in the function. Uses of virtual
/// registers are allowed but it is caller's responsibility to ensure these
/// operands are valid at the point the instruction is being moved.
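
To make the "caller's responsibility" part concrete, here is a minimal sketch of a caller-side check in the spirit of the one this patch adds to MachineLICM. It is a self-contained model only: ToyOperand, ToyInstr and toyLicmTreatsAsRemat are hypothetical stand-ins, not LLVM classes or APIs, and TargetSaysRemat merely represents whatever the target hook would answer.

```cpp
#include <vector>

// Hypothetical, simplified operand: just the three properties the check
// looks at.
struct ToyOperand {
  bool IsReg;     // operand is a register rather than an immediate
  bool IsUse;     // operand is read (not defined) by the instruction
  bool IsVirtual; // register is a virtual register
};

// Hypothetical, simplified instruction.
struct ToyInstr {
  bool TargetSaysRemat; // stand-in for the target's rematerialization answer
  std::vector<ToyOperand> Operands;
};

// The caller accepts the target's answer only when the instruction reads no
// virtual register, so it never relies on RA re-creating an instruction
// whose inputs may no longer be available.
bool toyLicmTreatsAsRemat(const ToyInstr &MI) {
  if (!MI.TargetSaysRemat)
    return false;
  for (const ToyOperand &MO : MI.Operands)
    if (MO.IsReg && MO.IsUse && MO.IsVirtual)
      return false;
  return true;
}
```

An instruction that only defines a virtual register and reads an immediate passes this check, while one that reads a virtual register does not and falls through to the regular profitability logic.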

In D107677#2941270, @dmgreen wrote:

> If so can we update the docs to match the new behaviour? I'm not sure I would really count it as "trivially rematerializable" anymore, but I don't have a better name for it. From there moving the profitability check into Machine LICM sounds like a fine idea.

Updated comments in D106408.

In D107677#2941702, @rampitec wrote:

> In D107677#2941270, @dmgreen wrote:
>
> > And now has a bad day:
> >
> >   LE       lr,#innerloop
> > ...
> >   VCTP lr
>
> Is that after this patch or after D106408? It looks more like rematerialization and not hoisting.

Oh I meant

  VCTP r0
outerloop:

It's hoisting out of the outer loop. I don't consider the regressions to be the fault of this patch though.

If there are no other comments, LGTM

llvm/lib/CodeGen/MachineLICM.cpp
668	believe -> belief
1178–1179	since the register allocator -> providing the register allocator

This revision is now accepted and ready to land. Aug 16 2021, 11:40 AM

Fixed comments as suggested and rebased.

This revision was landed with ongoing or failed builds. Aug 16 2021, 12:18 PM

Closed by commit rGb9e433b02a77: Prevent machine licm if remattable with a vreg use (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rGb9e433b02a77: Prevent machine licm if remattable with a vreg use.

Harbormaster completed remote builds in B119759: Diff 366695. Aug 16 2021, 12:34 PM

rampitec mentioned this in rGc80d8a8ceabb: [AMDGPU] MachineLICM cannot hoist VALU. Oct 20 2021, 12:06 PM

Revision Contents

Path                                             Size
llvm/lib/CodeGen/MachineLICM.cpp                 28 lines
llvm/test/CodeGen/AMDGPU/licm-regpressure.mir    48 lines

Diff 366710

llvm/lib/CodeGen/MachineLICM.cpp

[224 lines not shown]
     bool CanCauseHighRegPressure(const DenseMap<unsigned, int> &Cost,
                                  bool Cheap);

     void UpdateBackTraceRegPressure(const MachineInstr *MI);

     bool IsProfitableToHoist(MachineInstr &MI);

     bool IsGuaranteedToExecute(MachineBasicBlock *BB);

+    bool isTriviallyReMaterializable(const MachineInstr &MI,
+                                     AAResults *AA) const;
+
     void EnterScope(MachineBasicBlock *MBB);

     void ExitScope(MachineBasicBlock *MBB);

     void ExitScopeIfDone(
         MachineDomTreeNode *Node,
         DenseMap<MachineDomTreeNode *, unsigned> &OpenChildren,
         DenseMap<MachineDomTreeNode *, MachineDomTreeNode *> &ParentMap);
[413 lines not shown; context: for (MachineBasicBlock *CurrentLoopExitingBlock : CurrentLoopExitingBlocks)]
         return false;
       }
   }

   SpeculationState = SpeculateFalse;
   return true;
 }

+/// Check if \p MI is trivially remateralizable and if it does not have any
+/// virtual register uses. Even though rematerializable RA might not actually
+/// rematerialize it in this scenario. In that case we do not want to hoist such
+/// instruction out of the loop in a belief RA will sink it back if needed.
+bool MachineLICMBase::isTriviallyReMaterializable(const MachineInstr &MI,
+                                                  AAResults *AA) const {
+  if (!TII->isTriviallyReMaterializable(MI, AA))
+    return false;
+
+  for (const MachineOperand &MO : MI.operands()) {
+    if (MO.isReg() && MO.isUse() && MO.getReg().isVirtual())
+      return false;
+  }
+
+  return true;
+}
+
 void MachineLICMBase::EnterScope(MachineBasicBlock *MBB) {
   LLVM_DEBUG(dbgs() << "Entering " << printMBBReference(*MBB) << '\n');

   // Remember livein register pressure.
   BackTrace.push_back(RegPressure);
 }

 void MachineLICMBase::ExitScope(MachineBasicBlock *MBB) {
[480 lines not shown; context: bool MachineLICMBase::IsProfitableToHoist(MachineInstr &MI) {]
   bool CheapInstr = IsCheapInstruction(MI);
   bool CreatesCopy = HasLoopPHIUse(&MI);

   // Don't hoist a cheap instruction if it would create a copy in the loop.
   if (CheapInstr && CreatesCopy) {
     LLVM_DEBUG(dbgs() << "Won't hoist cheap instr with loop PHI use: " << MI);
     return false;
   }

-  // Rematerializable instructions should always be hoisted since the register
-  // allocator can just pull them down again when needed.
-  if (TII->isTriviallyReMaterializable(MI, AA))
+  // Rematerializable instructions should always be hoisted providing the
+  // register allocator can just pull them down again when needed.
+  if (isTriviallyReMaterializable(MI, AA))
     return true;

   // FIXME: If there are long latency loop-invariant instructions inside the
   // loop at this point, why didn't the optimizer's LICM hoist them?
   for (unsigned i = 0, e = MI.getDesc().getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI.getOperand(i);
     if (!MO.isReg() || MO.isImplicit())
       continue;
[36 lines not shown; context: bool MachineLICMBase::IsProfitableToHoist(MachineInstr &MI) {]
   if (AvoidSpeculation &&
       (!IsGuaranteedToExecute(MI.getParent()) && !MayCSE(&MI))) {
     LLVM_DEBUG(dbgs() << "Won't speculate: " << MI);
     return false;
   }

   // High register pressure situation, only hoist if the instruction is going
   // to be remat'ed.
-  if (!TII->isTriviallyReMaterializable(MI, AA) &&
+  if (!isTriviallyReMaterializable(MI, AA) &&
       !MI.isDereferenceableInvariantLoad(AA)) {
     LLVM_DEBUG(dbgs() << "Can't remat / high reg-pressure: " << MI);
     return false;
   }

   return true;
 }

[286 lines not shown]

llvm/test/CodeGen/AMDGPU/licm-regpressure.mir

 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -run-pass machinelicm -o - %s | FileCheck -check-prefix=GCN %s

-# FIXME: MachineLICM hoists all V_CVT instructions out of the loop increasing
-# register pressure. VGPR budget at occupancy 10 is 24 vgprs.
+# MachineLICM shall limit hoisting of V_CVT instructions out of the loop keeping
+# register pressure within the budget. VGPR budget at occupancy 10 is 24 vgprs.

 ---
 name: test
 tracksRegLiveness: true
 body: |
   ; GCN-LABEL: name: test
   ; GCN: bb.0:
   ; GCN: successors: %bb.1(0x80000000)
[16 lines not shown; context: body: |]
   ; GCN: [[COPY15:%[0-9]+]]:vgpr_32 = COPY $vgpr0
   ; GCN: [[COPY16:%[0-9]+]]:vgpr_32 = COPY $vgpr0
   ; GCN: [[COPY17:%[0-9]+]]:vgpr_32 = COPY $vgpr0
   ; GCN: %18:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY]], implicit $mode, implicit $exec
   ; GCN: %19:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY1]], implicit $mode, implicit $exec
   ; GCN: %20:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY2]], implicit $mode, implicit $exec
   ; GCN: %21:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY3]], implicit $mode, implicit $exec
   ; GCN: %22:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY4]], implicit $mode, implicit $exec
+  ; GCN: bb.1:
+  ; GCN: successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+  ; GCN: liveins: $vcc
+  ; GCN: $vcc = S_AND_B64 $exec, $vcc, implicit-def $scc
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %18, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %19, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %20, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %21, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %22, implicit $exec
   ; GCN: %23:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY5]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %23, implicit $exec
   ; GCN: %24:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY6]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %24, implicit $exec
   ; GCN: %25:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY7]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %25, implicit $exec
   ; GCN: %26:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY8]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %26, implicit $exec
   ; GCN: %27:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY9]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %27, implicit $exec
   ; GCN: %28:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY10]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %28, implicit $exec
   ; GCN: %29:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY11]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %29, implicit $exec
   ; GCN: %30:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY12]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %30, implicit $exec
   ; GCN: %31:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY13]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %31, implicit $exec
   ; GCN: %32:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY14]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %32, implicit $exec
   ; GCN: %33:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY15]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %33, implicit $exec
   ; GCN: %34:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY16]], implicit $mode, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %34, implicit $exec
   ; GCN: %35:vreg_64 = nofpexcept V_CVT_F64_I32_e32 [[COPY17]], implicit $mode, implicit $exec
-  ; GCN: bb.1:
-  ; GCN: successors: %bb.2(0x04000000), %bb.1(0x7c000000)
-  ; GCN: liveins: $vcc
-  ; GCN: $vcc = S_AND_B64 $exec, $vcc, implicit-def $scc
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %18, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %19, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %20, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %21, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %22, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %23, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %24, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %25, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %26, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %27, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %28, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %29, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %30, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %31, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %32, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %33, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %34, implicit $exec
-  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, %35, implicit $exec
+  ; GCN: $vcc = V_CMP_EQ_U64_e64 $vcc, killed %35, implicit $exec
   ; GCN: S_CBRANCH_VCCNZ %bb.1, implicit $vcc
   ; GCN: S_BRANCH %bb.2
   ; GCN: bb.2:
   ; GCN: S_ENDPGM 0
   bb.0:
     successors: %bb.1(0x80000000)
     liveins: $vcc, $vgpr0

[67 lines not shown]