This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][MISCHED] GCNBalancedSchedStrategy.
Needs Review · Public

Authored by alex-t on Aug 20 2023, 8:36 AM.

Details

Summary

The change implements a scheduling strategy that finds a reasonable trade-off between ILP and occupancy.
For that purpose, it computes a heuristic metric to decide whether the current schedule is worth keeping.
This is an attempt to use the idea from https://reviews.llvm.org/D139710 to replace the shouldRevertScheduling function. The approach avoids additional compile-time cost by computing the metric iteratively during scheduling. Unlike https://reviews.llvm.org/D139710, the heuristic is applied to all scheduling stages.

Diff Detail

Event Timeline

alex-t created this revision. Aug 20 2023, 8:36 AM
Herald added a project: Restricted Project. · View Herald Transcript Aug 20 2023, 8:36 AM
alex-t requested review of this revision. Aug 20 2023, 8:36 AM
Herald added subscribers: wangpc, wdng. · View Herald Transcript

Please bear with the rough state of this patch. This review mostly aims to drive the discussion and collect opinions regarding the balanced scheduling strategy.

alex-t edited the summary of this revision. (Show Details) Aug 20 2023, 8:45 AM
jrbyrnes added inline comments. Aug 21 2023, 9:40 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

I think this is a lower bound on what we have previously called ScheduleLength?

1487

Does this mean we always take the result from the MaxOccupancy stage? I wonder if there is a way to have an initial target metric for the MaxOcc stage?

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
498

I'm not sure I understand the purpose of UnclusteredHighRPReschedule stage in the context of a balanced scheduler.

alex-t updated this revision to Diff 552108. Edited · Aug 21 2023, 12:51 PM

Fixed the case when both the Top and Bot current cycles are 0. The schedule length is not really zero, but it does not make sense to assess such short MBBs.

alex-t added inline comments. Aug 21 2023, 12:57 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

Exactly. I realized that the ratio of total stall cycles to the total number of instructions reflects the metric better than the ratio of total stalls to the modeled length (i.e., total working cycles plus total stall cycles).

1487

The PrevMetric is zero only when we're in the OccInitialSchedule stage, because that stage is the first one. On each subsequent stage we compare the current stage metric with the previous best known one. Please note that we only record a stage metric if we don't revert the stage schedule.

jrbyrnes added inline comments. Aug 21 2023, 5:01 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

That makes sense -- though I would think we would still want to capture the latency as well. For example, a StallTotal of, say, 15 when the total instruction latency is 30 means something different than when the total instruction latency is 150 (even if the number of instructions is the same) -- the second schedule hides latency better and thus has better ILP.

alex-t added inline comments. Aug 22 2023, 5:02 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

Oops... I was wrong here.
SchedBoundary::bumpNode sets the boundary's current cycle taking the scheduled instruction's latency into account: the current cycle is set to the "ready cycle" of the recently scheduled instruction, so the latency is already counted.
Thus, Top.CurrCycle + Bot.CurrCycle does give us the total instruction latency.

jrbyrnes added inline comments. Aug 22 2023, 4:28 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

Yes, bumpNode resolves the required latency for the instruction currently being scheduled, so Top.CurrCycle includes the stalls required before issuing the instructions in the top-down portion of the schedule -- similarly for Bot.CurrCycle.

However, we don't know how many stalls are required between the instructions in the bottom-up portion and the top-down portion of the schedule. If none are required, then the sum of the CurrCycles is the ScheduleLength. However, if, for example, there is a dependency between the last node in TopDown and the last node in BottomUp, the stalls required to resolve that latency won't be accounted for in Top.CurrCycle + Bot.CurrCycle.

alex-t added inline comments. Aug 23 2023, 6:43 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1474

> if, for example, there is dependency between last node in TopDown and last node in BottomUp, the stalls required to resolve that latency won't be accounted for in Top.CurrCycle + Bot.CurrCycle.

Let's see what happens in this case. TopReadyCycle refers to the number of cycles necessary for the instruction's result to become available to its user. If the user is the topmost instruction scheduled bottom-up, this number has already been counted in Top.bumpNode() and Top.CurrCycle has been updated.

jrbyrnes added inline comments. Aug 23 2023, 6:09 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1487

I see -- thanks.

llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
499

We probably shouldn't allow occupancy drops in ClusteredLowOccupancyReschedule; otherwise we will need to rerun the phase.

alex-t updated this revision to Diff 553553. Aug 25 2023, 11:25 AM

Schedule length is now computed accounting for the possible gap between the Top and Bottom scheduling boundaries.

alex-t marked 2 inline comments as done. Aug 25 2023, 11:48 AM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
498

As I understand it, UnclusteredHighRPReschedule attempts to achieve better occupancy by removing most mutations, allowing looser instruction placement.
Since our metric aims for a balance between better occupancy and better ILP, we might accept the result of the UnclusteredHighRPReschedule phase if it managed to achieve better occupancy without a loss in ILP.

499

Okay, it may achieve better ILP while sacrificing occupancy, and the resulting metric would be the same or even better. Did you mean we should have a separate check that reverts ClusteredLowOccupancyReschedule if the occupancy has dropped?

jrbyrnes added inline comments. Aug 25 2023, 12:59 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1491

The FurthestPred won't always have the edge that creates the gap.

1500

TopDistance should be based on successor used in TheHighestInBot calculation.

unsigned TopToBotDistance = 0;
unsigned MaxGapSize = 0;
for (auto &Succ : TheFurthestPred->Succs) {
  if (!Succ.isAssignedRegDep())
    continue;
  unsigned TopDistance =
      M.computeInstrLatency(Succ.getSUnit()->getInstr()) +
      TheFurthestPred->TopReadyCycle;
  if (TopDistance <= Top.getReadyCycle())
    continue;
  unsigned BotReadyCycle = Succ.getSUnit()->BotReadyCycle;
  unsigned BotDistance = Bot.getCurrCycle() - BotReadyCycle;
  unsigned GapSize = TopDistance - BotDistance;
  if (GapSize > MaxGapSize) {
    MaxGapSize = GapSize;
    TopToBotDistance = GapSize;
  }
}
llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
498

The phase is only run for a subset of regions -- DAG.RegionsWithHighRP are in that subset. If we have high RP such that we are in danger of dropping occupancy, we reschedule the region without mutations to attempt to reduce RP and save our occupancy.

The meta heuristic used to determine which regions need rescheduling is based on register pressure and occupancy -- this doesn't seem in accordance with the spirit of a "balanced" scheduler.

499

Did you mean we should have a separate check that reverts the ClusteredLowOccupancyReschedule if the occupancy has been dropped?

Yes, something like that -- if ClusteredLowOccupancy drops the occupancy, then revert. The phase needs stable occupancy in order to achieve its purpose.

alex-t updated this revision to Diff 554427. Aug 29 2023, 10:56 AM
alex-t marked 2 inline comments as done.

In this version, I decided to abandon the iterative metric computation during scheduling,
because the bidirectional scheduling makes it impossible to predict the final total latency (schedule length).
A full DAG traversal is needed once the scheduling is done.

alex-t added inline comments. Aug 29 2023, 11:26 AM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
498

The problem is that we report that we are running out of registers as soon as the occupancy goes below 4, simply because the VGPR limit is 32. That is exactly why we initially started to look for the heuristic: we got a regression from bumping the occupancy from 3 to 4 while introducing huge latency.
Once again, nobody has proved that this "magic" occupancy of 4 is optimal for all cases.
The current "naive" objective function:

new_occupancy/old_occupancy * old_metric/new_metric

is just a starting point. It already takes into account the change in occupancy versus the change in ILP. The goal is to refine it to reflect our needs.

499

I would prefer to change the heuristic so that dropping occupancy leads to a low metric and the schedule is reverted automatically.

alex-t updated this revision to Diff 556666. Sep 13 2023, 8:07 AM

Changed the criteria for reverting the current stage schedule and for running the HighOccupancyRPStage on a region.

jrbyrnes added inline comments. Sep 13 2023, 5:11 PM
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1014

This means we will always prioritize not letting occupancy drop, perhaps at the expense of throwing away a good ILP schedule.

For this phase specifically, I think we should only do this check if we are using the MaxOccupancy strategy -- or perhaps implement computeScheduleMetric for the MaxOccupancy strategy and use it in place of GCNSchedStage::shouldRevertScheduling.

1040

Seems to me this phase should just not be run if !isRegionWithExcessRP()

alex-t marked 2 inline comments as done. Sep 14 2023, 1:33 PM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
1014

I am pretty sure this check is useless in this particular place and could be removed.

1040

Some regions had excess RP, and the UnclusteredHighRP stage was run to try to improve them. If it succeeded, they might no longer have excess RP. Unfortunately, in decreasing the RP they lost ILP and might have a worse metric; thus they would have been reverted. This check is here to avoid that.

alex-t updated this revision to Diff 556810. Sep 14 2023, 1:57 PM
alex-t marked 2 inline comments as done.

Unused and unnecessary code cleanup