This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed a couple of SIFixWWMLiveness problems
Abandoned · Public

Authored by tpr on May 4 2018, 1:58 PM.


Details

Reviewers

nhaehnle
cwabbott

Summary

Where SIFixWWMLiveness adds dummy operands to EXIT_WWM, remove kill
marks from other uses of the same register to avoid the MIR becoming
invalid.

SIFixWWMLiveness's scheme for finding registers that might be live in
disabled lanes at the point of the WWM code involves finding defs that
reach the WWM code, and would be considered live if all defs are
considered partial defs.

That scheme could find false positives in defs that do not dominate the
WWM code, and thus do not really have a live value reaching it in any
lane. The two problems with adding a dummy operand for such a register
to EXIT_WWM are (a) it is over-conservative, and (b) it violates
MachineVerifier's check that all uses are fully defined, even if
sometimes by an IMPLICIT_DEF that arises from an undef phi input.

This commit fixes it by only considering defs that dominate the
EXIT_WWM.
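
The difference between "defs that reach" and "defs that dominate" can be seen on a toy CFG. The sketch below is not LLVM code: `Block`, `defsInPredecessors`, `defsInDominators`, and `makeDiamond` are hypothetical names, and the immediate dominators are written by hand rather than computed. It assumes a diamond in which only one arm defines a register and the WWM code sits in the join block.

```cpp
#include <set>
#include <vector>

// Toy CFG block (hypothetical, not an LLVM type): predecessor indices,
// immediate dominator (-1 for the entry block), and registers defined here.
struct Block {
  std::vector<int> Preds;
  int IDom;
  std::vector<unsigned> Defs;
};

// Reaching scheme: collect defs from every block that can reach Target,
// walking predecessors depth-first. Target itself is excluded.
std::set<unsigned> defsInPredecessors(const std::vector<Block> &CFG,
                                      int Target) {
  std::set<unsigned> Regs;
  std::vector<bool> Seen(CFG.size(), false);
  Seen[Target] = true;
  std::vector<int> Work(CFG[Target].Preds.begin(), CFG[Target].Preds.end());
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (Seen[B])
      continue;
    Seen[B] = true;
    Regs.insert(CFG[B].Defs.begin(), CFG[B].Defs.end());
    for (int P : CFG[B].Preds)
      Work.push_back(P);
  }
  return Regs;
}

// Dominating scheme: walk up the dominator tree only.
std::set<unsigned> defsInDominators(const std::vector<Block> &CFG,
                                    int Target) {
  std::set<unsigned> Regs;
  for (int B = CFG[Target].IDom; B != -1; B = CFG[B].IDom)
    Regs.insert(CFG[B].Defs.begin(), CFG[B].Defs.end());
  return Regs;
}

// Diamond: 0 -> {1, 2} -> 3. Register 7 is defined in the entry block,
// register 9 only in block 1.
std::vector<Block> makeDiamond() {
  return {
      {{}, -1, {7}},    // block 0 (entry)
      {{0}, 0, {9}},    // block 1
      {{0}, 0, {}},     // block 2
      {{1, 2}, 0, {}},  // block 3 (join, assumed to hold the WWM code)
  };
}
```

For block 3, the predecessor walk reports both registers 7 and 9, while the dominator walk reports only register 7. Register 9 is the kind of false positive the summary describes.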

Change-Id: I3d70f6138798c3aaa8b6bdf290d4eb85c7beb311

Diff Detail

Repository

rL LLVM

Build Status

Buildable 17730
Build 17730: arc lint + arc unit

Event Timeline

tpr created this revision. May 4 2018, 1:58 PM

Herald added subscribers: llvm-commits, t-tye, dstuttard and 5 others. May 4 2018, 1:58 PM

tpr added reviewers: nhaehnle, cwabbott. May 4 2018, 2:00 PM

Harbormaster completed remote builds in B17730: Diff 145289. May 4 2018, 2:00 PM

Sorry about the size of the test. It's the best that bugpoint could do.

cwabbott added inline comments. May 4 2018, 2:37 PM

lib/Target/AMDGPU/SIFixWWMLiveness.cpp
Line 40: Unfortunately, this won't work in general. Consider something like:

    while (non_uniform_condition) {
      BEGIN_WWM;
      bar = ...;
      EXIT_WWM;
      foo = ...;
    }
    ... = foo;

The definition of foo doesn't dominate the WWM instructions earlier in the loop body, but the use of foo can potentially "see" the result of many different iterations of the loop, since the loop trip count is non-uniform, and any WWM instructions will clobber everything but the last iteration. Hence we need to add an artificial interference with foo here. Of course, if you removed the use of foo outside the loop, then we wouldn't need to do anything... it's the actual use that is crucial here.

We also ran into the issue of SIFixWWMLiveness being too conservative (as well as the liveness issue) when enabling AMD_shader_ballot for radv, but I haven't been able to come up with a good solution for it. It seems that we have to treat loops, and registers live out of a loop, specially somehow.
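
The loop hazard in this comment can be reproduced with a small lane-level simulation. This is a toy model under stated assumptions, not AMDGPU or LLVM code: a wave has four lanes, `runLoop` and `ShareRegister` are hypothetical names, and "register allocation" is reduced to the choice of whether foo and bar occupy the same array.

```cpp
#include <array>

constexpr int NumLanes = 4;
using VGPR = std::array<int, NumLanes>;

// Runs the loop from the comment on a toy 4-lane wave and returns foo as
// seen after the loop. TripCount[L] is how many iterations lane L executes.
// If ShareRegister is true, foo and bar are given the same physical register.
VGPR runLoop(const std::array<int, NumLanes> &TripCount, bool ShareRegister) {
  VGPR RegA{};  // holds foo
  VGPR RegB{};  // holds bar unless the allocator reuses RegA
  VGPR &Foo = RegA;
  VGPR &Bar = ShareRegister ? RegA : RegB;

  for (int Iter = 0;; ++Iter) {
    // The wave keeps looping while any lane still has iterations left.
    bool AnyActive = false;
    for (int L = 0; L < NumLanes; ++L)
      AnyActive |= Iter < TripCount[L];
    if (!AnyActive)
      break;

    // BEGIN_WWM ... EXIT_WWM: bar is written in every lane, active or not.
    for (int L = 0; L < NumLanes; ++L)
      Bar[L] = -1;

    // foo = ...: an ordinary VALU write, only in lanes still in the loop.
    for (int L = 0; L < NumLanes; ++L)
      if (Iter < TripCount[L])
        Foo[L] = 100 + Iter;
  }
  return Foo;  // ... = foo
}
```

With trip counts {1, 2, 3, 3} and separate registers, each lane keeps the foo value from its own last iteration. With a shared register, the lanes that left the loop early have their foo overwritten by the WWM write in later iterations, which is why an interference between foo and the WWM values is needed even though the def of foo does not dominate the WWM code.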

For the liveness issue, maybe a better way to solve it would be to add a new ENTER_WWM pseudoinstruction similar to EXIT_WWM, and add a matching implicit def to the matching ENTER_WWM whenever we insert an implicit use on EXIT_WWM, and mark both of them as kills. After all, any affected registers only need to interfere with the instructions run in WWM, so that should help with code quality too. I'm not sure why I didn't do that in the first place.

Your suggestion of an ENTER_WWM with a def sounds like it would work.

You made a comment in this code that "this is a workaround anyways until LLVM gains the notion of predicated uses and definitions of variables". How do you see that working? Are you aware of anyone else thinking along those lines in the LLVM community? Something to think about for the future.

In D46470#1088984, @tpr wrote:

You made a comment in this code that "this is a workaround anyways until LLVM gains the notion of predicated uses and definitions of variables". How do you see that working? Are you aware of anyone else thinking along those lines in the LLVM community? Something to think about for the future.

I was thinking about some way to decorate instructions with abstract predicates with enough information to make them useful for register allocation. For example, you could compute the OR of two predicates, one predicate minus another, etc. We would compute these predicates per-block before lowering the control flow, and then make every instruction predicated and lower the control flow (preserving the predicates). Except that WWM instructions still wouldn't be predicated, of course. There's a lot of pre-existing literature on this sort of thing with classical (scalar) predicated architectures, for example "Strategies for Predicate-Aware Register Allocation" by Hoflehner is one overview. Actually implementing one of the schemes in that paper would require changes to core MC and would probably be quite invasive, though. There are some other potential users for it, like the new predicated AVX512 stuff, but I don't know if anyone else is thinking about it beyond some discussion with @nhaehnle.

Hi Connor

Thanks for the pointers on possible future directions. I was vaguely thinking about some scheme where a value in a live interval is marked with its predication in some way so two overlapping segments are not considered interfering if they have disjoint predications. But that is pretty intrusive.
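
The predicate operations and the "disjoint predications do not interfere" idea discussed in these two comments can be sketched as follows. Everything here is hypothetical: `Predicate`, `PredicatedRange`, `disjoint`, and `interferes` are illustrative names, a predicate is simplified to a bitmask of lanes, and nothing of this kind exists in LLVM's live-interval code.

```cpp
#include <cstdint>

// Toy abstract predicate: the set of lanes in which a value is defined,
// encoded as a bitmask.
struct Predicate {
  std::uint64_t Mask;
};

// OR of two predicates.
constexpr Predicate operator|(Predicate A, Predicate B) {
  return {A.Mask | B.Mask};
}

// One predicate minus another.
constexpr Predicate operator-(Predicate A, Predicate B) {
  return {A.Mask & ~B.Mask};
}

constexpr bool disjoint(Predicate A, Predicate B) {
  return (A.Mask & B.Mask) == 0;
}

// A value that is live over instruction indices [Start, End) in the lanes
// selected by Pred.
struct PredicatedRange {
  int Start;
  int End;
  Predicate Pred;
};

// Two values need different registers only if they overlap both in time
// and in lanes.
inline bool interferes(const PredicatedRange &A, const PredicatedRange &B) {
  bool OverlapInTime = A.Start < B.End && B.Start < A.End;
  return OverlapInTime && !disjoint(A.Pred, B.Pred);
}
```

Under this model a value defined in the "then" lanes and one defined in the "else" lanes can share a register even when their ranges overlap. A WWM value is not predicated, so it would carry the all-lanes predicate and interfere with both.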

Anyway, back to my present problem:

I have been trying to come up with a better way of handling WWM liveness based on examining the particular cases (which are essentially a multi-def value that is a lowered phi, and a single-def value defined at the bottom of a do-while loop that the WWM is inside).

However, a fundamental problem has occurred to me: however we synthesize liveness in WWM code, surely the register allocator could decide to split the register, and the inserted copies will be predicated and thus useless for our live-in-inactive-lanes liveness. The inactive lanes will rely on keeping the value in the original register, but the liveness info will no longer say that register is live in the WWM code, and the same register could be allocated to a WWM value.
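
The splitting concern can be illustrated with the same kind of lane-level toy model. This is a sketch under assumptions, not a description of what LLVM's allocator actually emits: `predicatedCopy`, `wwmWrite`, and `splitAroundWWM` are hypothetical names, the wave has four lanes, and the split is modelled as a copy out before the WWM region and a copy back after it.

```cpp
#include <array>
#include <cstdint>

constexpr int NumLanes = 4;
using VGPR = std::array<int, NumLanes>;

// An ordinary (predicated) copy: only lanes enabled in Exec are written.
void predicatedCopy(VGPR &Dst, const VGPR &Src, std::uint8_t Exec) {
  for (int L = 0; L < NumLanes; ++L)
    if (Exec & (1u << L))
      Dst[L] = Src[L];
}

// A WWM write: every lane is written regardless of Exec.
void wwmWrite(VGPR &Dst, int Value) { Dst.fill(Value); }

// Returns the value V as seen by all lanes after the WWM region, assuming
// the allocator split V around that region and then reused V's original
// register for a WWM temporary.
VGPR splitAroundWWM() {
  VGPR R0 = {10, 11, 12, 13};     // V, valid in all four lanes
  VGPR R1 = {};                   // target of the split copy
  const std::uint8_t Exec = 0x3;  // lanes 2 and 3 are currently inactive

  predicatedCopy(R1, R0, Exec);  // split: V moves to R1 in active lanes only
  wwmWrite(R0, -1);              // R0 looks free, so a WWM value lands in it
  predicatedCopy(R0, R1, Exec);  // V is copied back after the WWM region

  return R0;  // lanes 2 and 3 later become active and read V from here
}
```

In this model lanes 0 and 1 recover their values, but lanes 2 and 3 see the WWM temporary: their copy of V only ever existed in R0, and the split copies never touched it.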

Any ideas on that one?

Abandoned in favor of
https://reviews.llvm.org/D46756

Revision Contents

Path	Size
lib/Target/AMDGPU/SIFixWWMLiveness.cpp	35 lines
test/CodeGen/AMDGPU/wwm-implicit-operands.mir	404 lines

Diff 145289

lib/Target/AMDGPU/SIFixWWMLiveness.cpp

(30 lines not shown)
/// false, for which %vgpr0 is supposed to be 0. This pass adds an implicit use		/// false, for which %vgpr0 is supposed to be 0. This pass adds an implicit use
/// of %vgpr0 to the WWM instruction to make sure they aren't allocated to the		/// of %vgpr0 to the WWM instruction to make sure they aren't allocated to the
/// same register.		/// same register.
///		///
/// In general, we need to figure out what registers might have their inactive		/// In general, we need to figure out what registers might have their inactive
/// channels which are eventually used accidentally clobbered by a WWM		/// channels which are eventually used accidentally clobbered by a WWM
/// instruction. We approximate this using two conditions:		/// instruction. We approximate this using two conditions:
///		///
/// 1. A definition of the variable reaches the WWM instruction.		/// 1. A definition of the variable reaches the WWM instruction (and dominates
		/// it).
/// 2. The variable would be live at the WWM instruction if all its defs were		/// 2. The variable would be live at the WWM instruction if all its defs were
/// partial defs (i.e. considered as a use), ignoring normal uses.		/// partial defs (i.e. considered as a use), ignoring normal uses.
///		///
/// If a register matches both conditions, then we add an implicit use of it to		/// If a register matches both conditions, then we add an implicit use of it to
/// the WWM instruction. Condition #2 is the heart of the matter: every		/// the WWM instruction. Condition #2 is the heart of the matter: every
/// definition is really a partial definition, since every VALU instruction is		/// definition is really a partial definition, since every VALU instruction is
/// implicitly predicated. We can usually ignore this, but WWM forces us not		/// implicitly predicated. We can usually ignore this, but WWM forces us not
/// to. Condition #1 prevents false positives if the variable is undefined at		/// to. Condition #1 prevents false positives if the variable is undefined at
/// the WWM instruction anyways. This is overly conservative in certain cases,		/// the WWM instruction anyways. This is overly conservative in certain cases,
/// especially in uniform control flow, but this is a workaround anyways until		/// especially in uniform control flow, but this is a workaround anyways until
/// LLVM gains the notion of predicated uses and definitions of variables.		/// LLVM gains the notion of predicated uses and definitions of variables.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "llvm/ADT/DepthFirstIterator.h"		#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SparseBitVector.h"		#include "llvm/ADT/SparseBitVector.h"
#include "llvm/CodeGen/LiveIntervals.h"		#include "llvm/CodeGen/LiveIntervals.h"
		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"		#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "si-fix-wwm-liveness"		#define DEBUG_TYPE "si-fix-wwm-liveness"

namespace {		namespace {

class SIFixWWMLiveness : public MachineFunctionPass {		class SIFixWWMLiveness : public MachineFunctionPass {
private:		private:
		MachineDominatorTree *DomTree = nullptr;
LiveIntervals *LIS = nullptr;		LiveIntervals *LIS = nullptr;
const SIRegisterInfo *TRI;		const SIRegisterInfo *TRI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;

public:		public:
static char ID;		static char ID;

SIFixWWMLiveness() : MachineFunctionPass(ID) {		SIFixWWMLiveness() : MachineFunctionPass(ID) {
initializeSIFixWWMLivenessPass(*PassRegistry::getPassRegistry());		initializeSIFixWWMLivenessPass(*PassRegistry::getPassRegistry());
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

bool runOnWWMInstruction(MachineInstr &MI);		bool runOnWWMInstruction(MachineInstr &MI);

void addDefs(const MachineInstr &MI, SparseBitVector<> &set);		void addDefs(const MachineInstr &MI, SparseBitVector<> &set);

StringRef getPassName() const override { return "SI Fix WWM Liveness"; }		StringRef getPassName() const override { return "SI Fix WWM Liveness"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequiredID(MachineDominatorsID);
// Should preserve the same set that TwoAddressInstructions does.		// Should preserve the same set that TwoAddressInstructions does.
AU.addPreserved<SlotIndexes>();		AU.addPreserved<SlotIndexes>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
AU.addPreservedID(LiveVariablesID);		AU.addPreservedID(LiveVariablesID);
AU.addPreservedID(MachineLoopInfoID);		AU.addPreservedID(MachineLoopInfoID);
AU.addPreservedID(MachineDominatorsID);		AU.addPreservedID(MachineDominatorsID);
AU.setPreservesCFG();		AU.setPreservesCFG();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};

} // End anonymous namespace.		} // End anonymous namespace.

INITIALIZE_PASS(SIFixWWMLiveness, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(SIFixWWMLiveness, DEBUG_TYPE,
		"SI fix WWM liveness", false, false)
		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
		INITIALIZE_PASS_END(SIFixWWMLiveness, DEBUG_TYPE,
"SI fix WWM liveness", false, false)		"SI fix WWM liveness", false, false)

char SIFixWWMLiveness::ID = 0;		char SIFixWWMLiveness::ID = 0;

char &llvm::SIFixWWMLivenessID = SIFixWWMLiveness::ID;		char &llvm::SIFixWWMLivenessID = SIFixWWMLiveness::ID;

FunctionPass *llvm::createSIFixWWMLivenessPass() {		FunctionPass *llvm::createSIFixWWMLivenessPass() {
return new SIFixWWMLiveness();		return new SIFixWWMLiveness();
(25 lines not shown; the following hunk is inside bool SIFixWWMLiveness::runOnWWMInstruction(MachineInstr &WWM) {)
for (df_iterator<MachineBasicBlock *> I = ++df_begin(MBB),		for (df_iterator<MachineBasicBlock *> I = ++df_begin(MBB),
E = df_end(MBB);		E = df_end(MBB);
I != E; ++I) {		I != E; ++I) {
for (const MachineInstr &MI : **I) {		for (const MachineInstr &MI : **I) {
addDefs(MI, LiveOut);		addDefs(MI, LiveOut);
}		}
}		}

// Compute the registers that reach MI.		// Compute the registers that reach MI, and have some definition that dominates
		// it.
SparseBitVector<> Reachable;		SparseBitVector<> Reachable;

for (auto II = ++MachineBasicBlock::reverse_iterator(WWM), IE =		for (auto II = ++MachineBasicBlock::reverse_iterator(WWM), IE =
MBB->rend(); II != IE; ++II) {		MBB->rend(); II != IE; ++II) {
addDefs(*II, Reachable);		addDefs(*II, Reachable);
}		}

for (idf_iterator<MachineBasicBlock *> I = ++idf_begin(MBB),		for (auto Node = DomTree->getNode(MBB)->getIDom();
E = idf_end(MBB);		Node; Node = Node->getIDom()) {
I != E; ++I) {		MachineBasicBlock *DominatingMBB = Node->getBlock();
for (const MachineInstr &MI : **I) {		for (const MachineInstr &MI : *DominatingMBB) {
addDefs(MI, Reachable);		addDefs(MI, Reachable);
}		}
}		}

// find the intersection, and add implicit uses.		// find the intersection, and add implicit uses.
LiveOut &= Reachable;		LiveOut &= Reachable;

bool Modified = false;		bool Modified = false;
for (unsigned Reg : LiveOut) {		for (unsigned Reg : LiveOut) {
WWM.addOperand(MachineOperand::CreateReg(Reg, false, /*isImp=*/true));		WWM.addOperand(MachineOperand::CreateReg(Reg, false, /*isImp=*/true));
if (LIS) {		if (LIS) {
// FIXME: is there a better way to update the live interval?		// FIXME: is there a better way to update the live interval?
LIS->removeInterval(Reg);		LIS->removeInterval(Reg);
LIS->createAndComputeVirtRegInterval(Reg);		LIS->createAndComputeVirtRegInterval(Reg);
}		}
		// Also remove kill mark from uses.
		for (auto &RI : MRI->reg_instructions(Reg)) {
		for (unsigned i = 0, e = RI.getNumOperands(); i != e; ++i) {
		MachineOperand &MO = RI.getOperand(i);
		if (MO.isReg() && MO.getReg() == Reg) {
		if (MO.isKill())
		MO.setIsKill(false);
		else if (MO.isDead())
		MO.setIsDead(false);
		}
		}
		}
Modified = true;		Modified = true;
}		}

return Modified;		return Modified;
}		}

bool SIFixWWMLiveness::runOnMachineFunction(MachineFunction &MF) {		bool SIFixWWMLiveness::runOnMachineFunction(MachineFunction &MF) {
bool Modified = false;		bool Modified = false;

		DomTree = &getAnalysis<MachineDominatorTree>();
// This doesn't actually need LiveIntervals, but we can preserve them.		// This doesn't actually need LiveIntervals, but we can preserve them.
LIS = getAnalysisIfAvailable<LiveIntervals>();		LIS = getAnalysisIfAvailable<LiveIntervals>();

const SISubtarget &ST = MF.getSubtarget<SISubtarget>();		const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();

TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
(11 lines not shown)

test/CodeGen/AMDGPU/wwm-implicit-operands.mir

This file was added.

				# RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx802 -verify-machineinstrs -run-pass si-fix-wwm-liveness %s -o - >/dev/null

				# This tests for MIR still being valid after SIFixWWMLiveness. I had a couple of
				# problems with MIR being invalid because of the way it adds implicit operands
				# to the EXIT_WWM pseudo to (conservatively) model the way that a non-live
				# register may be live in other lanes, and thus clobbered by the WWM
				# instructions.

				---
				name: _amdgpu_vs_main
				alignment: 0
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				registers:
				- { id: 0, class: sreg_32_xm0, preferred-register: '' }
				- { id: 1, class: sreg_32_xm0, preferred-register: '' }
				- { id: 2, class: sreg_32_xm0, preferred-register: '' }
				- { id: 3, class: sreg_32_xm0, preferred-register: '' }
				- { id: 4, class: sreg_32_xm0, preferred-register: '' }
				- { id: 5, class: sreg_32_xm0, preferred-register: '' }
				- { id: 6, class: sreg_32_xm0, preferred-register: '' }
				- { id: 7, class: sreg_32_xm0, preferred-register: '' }
				- { id: 8, class: vgpr_32, preferred-register: '' }
				- { id: 9, class: vgpr_32, preferred-register: '' }
				- { id: 10, class: sreg_32_xm0, preferred-register: '' }
				- { id: 11, class: sreg_32_xm0, preferred-register: '' }
				- { id: 12, class: sreg_32_xm0, preferred-register: '' }
				- { id: 13, class: vgpr_32, preferred-register: '' }
				- { id: 14, class: sreg_32_xm0_xexec, preferred-register: '' }
				- { id: 15, class: sreg_128, preferred-register: '' }
				- { id: 16, class: sreg_32_xm0, preferred-register: '' }
				- { id: 17, class: sreg_32_xm0, preferred-register: '' }
				- { id: 18, class: sreg_32_xm0, preferred-register: '' }
				- { id: 19, class: sreg_64_xexec, preferred-register: '$vcc' }
				- { id: 20, class: vgpr_32, preferred-register: '' }
				- { id: 21, class: vgpr_32, preferred-register: '' }
				- { id: 22, class: vgpr_32, preferred-register: '' }
				- { id: 23, class: sreg_64, preferred-register: '$vcc' }
				- { id: 24, class: vgpr_32, preferred-register: '' }
				- { id: 25, class: sreg_32_xm0, preferred-register: '' }
				- { id: 26, class: sreg_32, preferred-register: '' }
				- { id: 27, class: sreg_64, preferred-register: '' }
				- { id: 28, class: sreg_64, preferred-register: '' }
				- { id: 29, class: sreg_64, preferred-register: '' }
				- { id: 30, class: sreg_64, preferred-register: '' }
				- { id: 31, class: sreg_64, preferred-register: '' }
				- { id: 32, class: sreg_64, preferred-register: '' }
				- { id: 33, class: sreg_64, preferred-register: '' }
				- { id: 34, class: sreg_64, preferred-register: '' }
				- { id: 35, class: sreg_64, preferred-register: '' }
				- { id: 36, class: sreg_64, preferred-register: '' }
				- { id: 37, class: sreg_64, preferred-register: '' }
				- { id: 38, class: sreg_64, preferred-register: '' }
				- { id: 39, class: sreg_64, preferred-register: '' }
				- { id: 40, class: sreg_64, preferred-register: '' }
				- { id: 41, class: sreg_64, preferred-register: '' }
				- { id: 42, class: sreg_64, preferred-register: '' }
				- { id: 43, class: vgpr_32, preferred-register: '' }
				- { id: 44, class: vgpr_32, preferred-register: '' }
				- { id: 45, class: vgpr_32, preferred-register: '' }
				- { id: 46, class: vgpr_32, preferred-register: '' }
				- { id: 47, class: sreg_64, preferred-register: '' }
				- { id: 48, class: vgpr_32, preferred-register: '' }
				liveins:
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 4294967295
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack:
				stack:
				constants:
				body: |
				bb.0:
				successors: %bb.2(0x40000000), %bb.1(0x40000000)

				S_CBRANCH_SCC1 %bb.2, implicit undef $scc
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2(0x80000000)


				bb.2:
				successors: %bb.4(0x40000000), %bb.3(0x40000000)

				S_CBRANCH_SCC1 %bb.4, implicit undef $scc
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4(0x80000000)


				bb.4:
				successors: %bb.5(0x40000000), %bb.7(0x40000000)

				S_CBRANCH_SCC1 %bb.7, implicit undef $scc
				S_BRANCH %bb.5

				bb.5:
				successors: %bb.7(0x40000000), %bb.6(0x40000000)

				S_CBRANCH_SCC1 %bb.7, implicit undef $scc
				S_BRANCH %bb.6

				bb.6:
				successors: %bb.7(0x80000000)


				bb.7:
				successors: %bb.8(0x40000000), %bb.23(0x40000000)

				S_CBRANCH_SCC1 %bb.23, implicit undef $scc
				S_BRANCH %bb.8

				bb.8:
				successors: %bb.9(0x40000000), %bb.13(0x40000000)

				S_CBRANCH_SCC1 %bb.13, implicit undef $scc
				S_BRANCH %bb.9

				bb.9:
				successors: %bb.12(0x40000000), %bb.10(0x40000000)

				S_CBRANCH_SCC1 %bb.12, implicit undef $scc
				S_BRANCH %bb.10

				bb.10:
				successors: %bb.11(0x40000000), %bb.12(0x40000000)

				S_CBRANCH_SCC1 %bb.12, implicit undef $scc
				S_BRANCH %bb.11

				bb.11:
				successors: %bb.13(0x40000000), %bb.12(0x40000000)

				S_CBRANCH_SCC1 %bb.12, implicit undef $scc
				S_BRANCH %bb.13

				bb.12:
				successors: %bb.13(0x80000000)


				bb.13:
				successors: %bb.14(0x40000000), %bb.52(0x40000000)

				S_CBRANCH_SCC1 %bb.52, implicit undef $scc
				S_BRANCH %bb.14

				bb.14:
				successors: %bb.15(0x40000000), %bb.52(0x40000000)

				S_CBRANCH_SCC1 %bb.52, implicit undef $scc

				bb.15:
				successors: %bb.52(0x40000000), %bb.16(0x40000000)

				S_CBRANCH_SCC1 %bb.52, implicit undef $scc

				bb.16:
				successors: %bb.17(0x30000000), %bb.18(0x50000000)

				%36:sreg_64 = S_AND_B64 $exec, 0, implicit-def dead $scc
				$vcc = COPY killed %36
				S_CBRANCH_VCCNZ %bb.18, implicit killed $vcc
				S_BRANCH %bb.17

				bb.17:
				successors: %bb.18(0x80000000)


				bb.18:
				successors: %bb.19(0x40000000), %bb.52(0x40000000)

				S_CBRANCH_SCC1 %bb.19, implicit undef $scc
				S_BRANCH %bb.52

				bb.19:
				successors: %bb.20(0x80000000)


				bb.20:
				successors: %bb.22(0x30000000), %bb.21(0x50000000)

				%38:sreg_64 = S_AND_B64 $exec, -1, implicit-def dead $scc
				$vcc = COPY killed %38
				S_CBRANCH_VCCNZ %bb.22, implicit killed $vcc
				S_BRANCH %bb.21

				bb.21:
				successors: %bb.52(0x80000000)

				S_BRANCH %bb.52

				bb.22:
				successors: %bb.52(0x80000000)

				S_BRANCH %bb.52

				bb.23:
				successors: %bb.24(0x40000000), %bb.29(0x40000000)

				S_CBRANCH_SCC1 %bb.29, implicit undef $scc
				S_BRANCH %bb.24

				bb.24:
				successors: %bb.25(0x80000000)

				early-clobber %6:sreg_32_xm0 = COPY undef %7:sreg_32_xm0, implicit $exec
				%48:vgpr_32 = IMPLICIT_DEF

				bb.25:
				successors: %bb.28(0x40000000), %bb.26(0x40000000)

				%47:sreg_64 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
				%45:vgpr_32 = COPY killed %48
				%9:vgpr_32 = COPY %6
				%8:vgpr_32 = DS_SWIZZLE_B32 killed %9, 4127, 0, implicit $exec
				early-clobber %43:vgpr_32 = COPY killed %8, implicit $exec, implicit $exec
				$exec = EXIT_WWM killed %47
				early-clobber %44:vgpr_32 = COPY killed %43, implicit $exec, implicit $exec
				%2:sreg_32_xm0 = V_READLANE_B32 killed %44, 63
				S_CBRANCH_SCC1 %bb.28, implicit undef $scc
				S_BRANCH %bb.26

				bb.26:
				successors: %bb.27(0x40000000), %bb.28(0x40000000)

				S_CBRANCH_SCC1 %bb.28, implicit undef $scc
				S_BRANCH %bb.27

				bb.27:
				successors: %bb.29(0x04000000), %bb.28(0x7c000000)

				S_CBRANCH_SCC1 %bb.28, implicit undef $scc
				S_BRANCH %bb.29

				bb.28:
				successors: %bb.25(0x7c000000), %bb.29(0x04000000)

				%14:sreg_32_xm0_xexec = S_BUFFER_LOAD_DWORD_SGPR undef %15:sreg_128, undef %16:sreg_32_xm0, 0 :: (dereferenceable invariant load 4)
				%18:sreg_32_xm0 = S_LSHR_B32 killed %14, 24, implicit-def dead $scc
				%20:vgpr_32 = COPY killed %18
				%19:sreg_64_xexec = V_CMP_GE_U32_e64 killed %2, killed %20, implicit $exec
				%22:vgpr_32, dead %23:sreg_64 = V_ADDC_U32_e64 0, killed %45, killed %19, implicit $exec
				V_CMP_NE_U32_e32 -1, %22, implicit-def $vcc, implicit $exec
				$vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				%48:vgpr_32 = COPY killed %22
				S_CBRANCH_VCCNZ %bb.25, implicit killed $vcc
				S_BRANCH %bb.29

				bb.29:
				successors: %bb.30(0x40000000), %bb.52(0x40000000)

				S_CBRANCH_SCC1 %bb.52, implicit undef $scc
				S_BRANCH %bb.30

				bb.30:
				successors: %bb.31(0x40000000), %bb.51(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.31

				bb.31:
				successors: %bb.51(0x40000000), %bb.32(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.32

				bb.32:
				successors: %bb.33(0x30000000), %bb.34(0x50000000)

				%28:sreg_64 = S_AND_B64 $exec, 0, implicit-def dead $scc
				$vcc = COPY killed %28
				S_CBRANCH_VCCNZ %bb.34, implicit killed $vcc
				S_BRANCH %bb.33

				bb.33:
				successors: %bb.34(0x80000000)


				bb.34:
				successors: %bb.36(0x40000000), %bb.35(0x40000000)

				S_CBRANCH_SCC1 %bb.36, implicit undef $scc
				S_BRANCH %bb.35

				bb.35:
				successors: %bb.51(0x40000000), %bb.43(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.43

				bb.36:
				successors: %bb.37(0x80000000)


				bb.37:
				successors: %bb.39(0x30000000), %bb.38(0x50000000)

				%30:sreg_64 = S_AND_B64 $exec, -1, implicit-def dead $scc
				$vcc = COPY killed %30
				S_CBRANCH_VCCNZ %bb.39, implicit killed $vcc
				S_BRANCH %bb.38

				bb.38:
				successors: %bb.41(0x40000000), %bb.42(0x40000000)

				%32:sreg_64 = S_AND_B64 $exec, 0, implicit-def dead $scc
				$vcc = COPY killed %32
				S_CBRANCH_VCCNZ %bb.41, implicit killed $vcc
				S_BRANCH %bb.42

				bb.39:
				successors: %bb.40(0x30000000), %bb.42(0x50000000)

				%34:sreg_64 = S_AND_B64 $exec, 0, implicit-def dead $scc
				$vcc = COPY killed %34
				S_CBRANCH_VCCNZ %bb.42, implicit killed $vcc
				S_BRANCH %bb.40

				bb.40:
				successors: %bb.51(0x40000000), %bb.43(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.43

				bb.41:
				successors: %bb.51(0x40000000), %bb.43(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.43

				bb.42:
				successors: %bb.43(0x80000000)


				bb.43:
				successors: %bb.44(0x40000000), %bb.51(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.44

				bb.44:
				successors: %bb.51(0x40000000), %bb.45(0x40000000)

				S_CBRANCH_SCC1 %bb.51, implicit undef $scc
				S_BRANCH %bb.45

				bb.45:
				successors: %bb.50(0x40000000), %bb.46(0x40000000)

				S_CBRANCH_SCC1 %bb.50, implicit undef $scc
				S_BRANCH %bb.46

				bb.46:
				successors: %bb.47(0x40000000), %bb.50(0x40000000)

				S_CBRANCH_SCC1 %bb.50, implicit undef $scc
				S_BRANCH %bb.47

				bb.47:
				successors: %bb.49(0x40000000), %bb.48(0x40000000)

				S_CBRANCH_SCC1 %bb.49, implicit undef $scc
				S_BRANCH %bb.48

				bb.48:
				successors: %bb.49(0x80000000)


				bb.49:
				successors: %bb.50(0x80000000)


				bb.50:
				successors: %bb.51(0x80000000)


				bb.51:
				successors: %bb.52(0x80000000)


				bb.52:
				S_ENDPGM

				...