This is an archive of the discontinued LLVM Phabricator instance.

Differential D55301

RegAlloc: Allow targets to split register allocation
ClosedPublic

Authored by arsenm on Dec 4 2018, 4:24 PM.

Details

Reviewers

MatzeB
qcolombet
rampitec
scott.linder

Summary

AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will need to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is that StackSlotColoring is currently no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.
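
The callback-based split described in the summary can be sketched in isolation. The following is a minimal, self-contained C++ mock and not the patch's code: `MockRegClass`, `MockVReg`, `MockFilter`, `runAllocator`, `countLeftover`, and the two filter functions are hypothetical names. The real callback in the patch (`RegClassFilterFunc`) takes `TargetRegisterInfo` and `TargetRegisterClass` references rather than a mock struct.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a target register class.
struct MockRegClass {
  bool IsSGPR;
};

// Hypothetical stand-in for a virtual register awaiting assignment.
struct MockVReg {
  MockRegClass RC;
  bool Assigned;
};

// Same idea as the patch's callback, but over the mock type.
using MockFilter = std::function<bool(const MockRegClass &)>;

static bool onlySGPRs(const MockRegClass &RC) { return RC.IsSGPR; }
static bool onlyVGPRs(const MockRegClass &RC) { return !RC.IsSGPR; }

// One "allocator run": assigns every unassigned vreg whose class the filter
// accepts and leaves the rest untouched. Returns how many it assigned.
static std::size_t runAllocator(std::vector<MockVReg> &VRegs,
                                const MockFilter &ShouldAllocate) {
  std::size_t NumAssigned = 0;
  for (MockVReg &VR : VRegs) {
    if (VR.Assigned || !ShouldAllocate(VR.RC))
      continue;
    VR.Assigned = true;
    ++NumAssigned;
  }
  return NumAssigned;
}

// Counts vregs that no run has handled yet.
static std::size_t countLeftover(const std::vector<MockVReg> &VRegs) {
  std::size_t N = 0;
  for (const MockVReg &VR : VRegs)
    if (!VR.Assigned)
      ++N;
  return N;
}
```

Calling `runAllocator` first with `onlySGPRs` and then with `onlyVGPRs` mirrors the two runs: after the first call the VGPR-class vregs are still unassigned, which is the "leftover virtual registers" situation that the intervening passes have to tolerate.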

Event Timeline

arsenm created this revision. Dec 4 2018, 4:24 PM

Herald added subscribers: tpr, mgorny, nhaehnle and 2 others. Dec 4 2018, 4:24 PM

arsenm added parent revisions: D55295: LiveIntervals: Add removePhysReg, D55238: MIR: Preserve incoming frame index numbers, D55287: VirtRegMap: Support partially allocated virtual registers, D55285: AMDGPU: Scavenge register instead of findUnusedReg, D55286: VirtRegMap: Add pass option to not clear virt regs, D55284: RegisterScavenger: Allow fail without spill, D55283: CodeGen: Refactor regallocator command line and target selection, D55282: CodeGen: Make RegAllocRegistry a template class. Dec 4 2018, 4:24 PM

arsenm added a child revision: D55333: VirtRegMap: Preserve LiveDebugVariables. Dec 5 2018, 9:06 AM

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when a S or V register would be split, they would eventually have constraints in that "super" class. Thus, inside of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

In D55301#1321550, @qcolombet wrote:

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when a S or V register would be split, they would eventually have constraints in that "super" class. Thus, inside of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

I'm not sure I follow this. These aren't spilled with ordinary copies. This uses cross lane instructions to read/write SGPRs into the individual lanes of a VGPR (i.e., 64 SGPRs can be spilled into one VGPR, one per lane of the wave). We also can't legally copy from V to S. Having virtual registers with the combined class doesn't really conceptually make sense for us either (and would probably break every single place that we need to consider these).
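
The lane-based spilling described in this reply can be illustrated with a small simulation. This is a sketch under assumptions: it models a VGPR as one 32-bit value per lane of a 64-lane wave, and `MockVGPR`, `writeLane`, `readLane`, and `vgprsNeededForSGPRSpills` are hypothetical helpers, not AMDGPU backend functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lanes per wave; assumed to be 64 for this sketch.
constexpr std::size_t WaveSize = 64;

// Hypothetical model of a VGPR: one 32-bit value per lane.
struct MockVGPR {
  std::array<std::uint32_t, WaveSize> Lanes{};
};

// Spill: store one scalar value into a single lane, leaving the other lanes
// of the VGPR untouched.
static void writeLane(MockVGPR &V, std::size_t Lane, std::uint32_t SGPRValue) {
  assert(Lane < WaveSize);
  V.Lanes[Lane] = SGPRValue;
}

// Restore: read the scalar value back out of that lane.
static std::uint32_t readLane(const MockVGPR &V, std::size_t Lane) {
  assert(Lane < WaveSize);
  return V.Lanes[Lane];
}

// With one spilled SGPR per lane, the number of VGPRs to reserve is the
// spill count divided by the wave size, rounded up.
static std::size_t vgprsNeededForSGPRSpills(std::size_t NumSpilledSGPRs) {
  return (NumSpilledSGPRs + WaveSize - 1) / WaveSize;
}
```

The rounding in `vgprsNeededForSGPRSpills` is why the exact SGPR spill count has to be known before VGPRs are allocated: 64 spills fit in one reserved VGPR under this model, while 65 need a second one.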

This also wouldn't allow us to change the set of reserved registers in the middle of allocation, which is part of the problem.

rampitec added inline comments. Dec 6 2018, 5:26 PM

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
75 ↗	(On Diff #176729)	!isSGPRClass() to catch [potentially] remaining strange register classes.
1064 ↗	(On Diff #176729)	You need to pass filter to PreRewrite as well.

I'm not sure I follow this. These aren't spilled with ordinary copies

I would expect that you could use ordinary copies + subreg here and do the proper expansion in the later expand post RA pass like every other copy.

This uses cross lane instructions to read/write SGPRs into the various lane VGPRs (i.e 64 SGPRs can be spilled to each lane in the wave's VGPR). We also can't legally copy from V to S. Having virtual registers with the combined class doesn't really conceptually make sense for us either (and would probably break every single place that we need to consider these)

That wouldn't appear anywhere other than tablegen. That's just something to tell RA that the biggest unconstrained class is V+S.

This also wouldn't allow us to change the set of reserved registers in the middle of allocation, which is part of the problem.

I missed that part, but I also don't get why this is a problem. IIRC we can always narrow the set of available registers for each virtual register.

Anyhow, the changes on the generic parts look mostly good to me. Comments inlined.

include/llvm/CodeGen/RegAllocCommon.h
23 ↗	(On Diff #176729)	Please add doxygen comment.
lib/CodeGen/RegAllocGreedy.cpp
615 ↗	(On Diff #176729)	Why do we need both constructors?
707 ↗	(On Diff #176729)	Removing this assert is worrisome. Why do we need that?
lib/CodeGen/TargetFrameLoweringImpl.cpp
21 ↗	(On Diff #176729)	Why do we need this change?

In D55301#1323324, @qcolombet wrote:

I'm not sure I follow this. These aren't spilled with ordinary copies

I would expect that you could use ordinary copies + subreg here and do the proper expansion in the later expand post RA pass like every other copy.

We don't model different lanes as subregisters, and trying to would be a pretty radical change. I can almost see a way to hack it to work, but it would involve adding an enormous number of new subregister indexes. Unless you mean using some kind of RMW copy (since the old single lane view of the register's value needs to be preserved)

Remove leftover changes and add comment

arsenm mentioned this in D54365: RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs. Jan 9 2019, 8:40 PM

arsenm mentioned this in D52010: RegAllocFast: Rewrite and improve.

ping

In D55301#1393065, @arsenm wrote:

ping

Pass the filter to PreRewrite.

Hi Matt,

Couple of nitpicks inline.
My only remaining concern is exposing ClearVirtRegs.

Cheers,
-Quentin

lib/CodeGen/RegAllocBase.cpp
179 ↗	(On Diff #178990)	For debugging purposes, add a DEBUG statement for each case.
lib/CodeGen/RegAllocFast.cpp
73 ↗	(On Diff #178990)	Should this be `const RegClassFilterFunc &` everywhere?
78 ↗	(On Diff #178990)	It feels dangerous to expose the ClearVirtRegs to me. Could we deduce what has to be cleared based on what we allocate instead of exposing this?
1350 ↗	(On Diff #178990)	Could we have just one createFastRegisterAllocator with default arguments? (Also ClearVirtReg should disappear per my other comment IMO).
lib/CodeGen/RegAllocGreedy.cpp
603 ↗	(On Diff #178990)	Ditto: Just one createXXX method.

Herald added a subscriber: jdoerfert. Feb 12 2019, 10:04 AM

arsenm marked 2 inline comments as done. Feb 13 2019, 9:18 AM

arsenm added inline comments.

lib/CodeGen/RegAllocFast.cpp
78 ↗	(On Diff #178990)	The problem is that something needs to set the NoVRegs property. The same parameter is added to createVirtRegRewriter, but fastregalloc does the assignment itself. I don't think this can be inferred, and the target needs to say when it's done allocating register classes. For example, it would be possible to have a degenerate function where all SGPRs are allocated in the first run, and there happen to be no VGPR vregs. Intervening passes may want to introduce new vregs to be taken care of by the later runs, but that won't work if the earlier pass decided to infer that all registers were taken care of.

arsenm marked an inline comment as done. Feb 13 2019, 9:22 AM

arsenm added inline comments.

lib/CodeGen/RegAllocFast.cpp
78 ↗	(On Diff #178990)	Actually I stopped creating new virtual registers at some point in the current implementation, but I still may want to do so in the future
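
The degenerate case raised in this thread can be made concrete with a toy model. This is only a sketch: `MockFunction`, `runExplicit`, `runInferring`, and `tryAddVGPRVReg` are hypothetical names, and the `NoVRegs` field merely imitates the idea of a "no virtual registers remain" property rather than the real LLVM machinery.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-function state: counts of unassigned vregs by kind, and
// a flag meaning "no virtual registers remain".
struct MockFunction {
  std::size_t UnassignedSGPRs;
  std::size_t UnassignedVGPRs;
  bool NoVRegs;
};

static void allocateKind(MockFunction &MF, bool HandleSGPRs) {
  if (HandleSGPRs)
    MF.UnassignedSGPRs = 0;
  else
    MF.UnassignedVGPRs = 0;
}

// Variant 1: the caller states explicitly whether this is the final run.
static void runExplicit(MockFunction &MF, bool HandleSGPRs,
                        bool ClearVirtRegs) {
  allocateKind(MF, HandleSGPRs);
  if (ClearVirtRegs)
    MF.NoVRegs = true;
}

// Variant 2: the run guesses it was final because nothing is left over.
static void runInferring(MockFunction &MF, bool HandleSGPRs) {
  allocateKind(MF, HandleSGPRs);
  if (MF.UnassignedSGPRs + MF.UnassignedVGPRs == 0)
    MF.NoVRegs = true;
}

// An intervening pass that wants to create a VGPR vreg for the later run.
// In this model that is only permitted while NoVRegs is still unset.
static bool tryAddVGPRVReg(MockFunction &MF) {
  if (MF.NoVRegs)
    return false;
  ++MF.UnassignedVGPRs;
  return true;
}
```

Starting from a function with SGPR vregs and no VGPR vregs, `runInferring` sets `NoVRegs` after the first run and `tryAddVGPRVReg` then fails, whereas `runExplicit` with `ClearVirtRegs` set to false leaves room for the later run.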

In D55301#1393260, @rampitec wrote:

In D55301#1393065, @arsenm wrote:

ping

Pass the filter to PreRewrite.

I'm not sure what good that would do as it doesn't do anything now

lib/CodeGen/RegAllocFast.cpp
1350 ↗	(On Diff #178990)	The RegAllocRegistry requires the type to be the no-argument function pass constructor. I could change that, but then all would have the ClearVirtRegs argument or not
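
The registry constraint mentioned here follows from a general C++ rule: default arguments are not part of a function's type, so a function declared with parameters cannot be stored in a pointer to a zero-argument function even if every parameter has a default. The sketch below shows the two-overload workaround with mock types; `MockPass`, `MockFilter`, `MockPassCtor`, `MockRegistryEntry`, and `createMockFastAllocator` are hypothetical names, not the LLVM declarations.

```cpp
#include <cassert>
#include <functional>

// Hypothetical pass object that records how it was constructed.
struct MockPass {
  bool UsesFilter;
  bool ClearVirtRegs;
};

using MockFilter = std::function<bool(int)>;

// The mock registry can only store a zero-argument factory.
using MockPassCtor = MockPass (*)();

// Overload used by the registry: no filter, clears vregs when done.
static MockPass createMockFastAllocator() { return MockPass{false, true}; }

// Overload used by a target that splits allocation into several runs.
static MockPass createMockFastAllocator(MockFilter F, bool ClearVirtRegs) {
  return MockPass{static_cast<bool>(F), ClearVirtRegs};
}

struct MockRegistryEntry {
  const char *Name;
  MockPassCtor Ctor;
};

// The pointer type of the Ctor member selects the zero-argument overload.
static const MockRegistryEntry FastEntry = {"fast", createMockFastAllocator};
```

A single factory with default arguments would have the two-parameter function type and could not initialize `Ctor`, which is the trade-off described in the comment above.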

Partially address comments.

This also probably needs some more test fixes, but the fast regalloc rewrite patches need rebasing first

arsenm mentioned this in D55295: LiveIntervals: Add removePhysReg. Feb 13 2019, 6:04 PM

Is this still alive?

Herald added a subscriber: kerbowa. Apr 20 2020, 1:02 PM

aditya_nandakumar added a subscriber: aditya_nandakumar. Apr 20 2020, 1:06 PM

In D55301#1993080, @qcolombet wrote:

Is this still alive?

Yes, but it depends on the fastregalloc rewrite patches (which I need to rebase the tests for, for the 100th time, which takes forever)

but it depends on the fastregalloc rewrite patches

Which ones?

In D55301#1993283, @qcolombet wrote:

but it depends on the fastregalloc rewrite patches

Which ones?

D54368 and D52010. I started rebasing the tests a few months ago but didn't finish; I think I got distracted by regressed loop spills vs. last time I rebased

Thanks for the pointers, I'll try to look into reviewing D52010 next week.

Rebase, fix AGPR handling

Herald added a project: Restricted Project. Dec 22 2020, 5:54 PM

Herald added subscribers: wenlei, hiraditya.

rampitec added inline comments. Dec 23 2020, 10:29 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1369	GCNRegBankReassign also works with SGPRs. Which means you need a pre-rewriter here, which needs to have a different subset of passes and an RC filter.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1152	It can be saddr with flat scratch. It seems it needs to be fixed in a separate patch first.

rampitec added inline comments. Dec 23 2020, 10:48 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1152	Never mind, this is one of SI_SPILL opcodes, not real instruction yet.

arsenm mentioned this in D96336: [AMDGPU] Save VGPR of whole wave when spilling. Feb 23 2021, 8:47 AM

Rebase

Herald added a subscriber: nikic. May 12 2021, 11:04 AM

Harbormaster completed remote builds in B104083: Diff 344878. May 12 2021, 11:04 AM

Since GCNRegBanksReassign is removed this is LGTM.

LGTM.
Disclaimer: I didn't really look at the AMDGPU changes.

This revision is now accepted and ready to land. May 12 2021, 1:43 PM

In D55301#2755345, @qcolombet wrote:

LGTM.
Disclaimer: I didn't really look at the AMDGPU changes.

I did. Thanks!

@arsenm it is a good idea to run PSDB before this change.

At long last, eebe841a47cbbd55bdcc32da943c92d18f88a5b8

Herald added a subscriber: foad. Jul 13 2021, 4:35 PM

lkail added a subscriber: lkail. Sep 14 2021, 5:08 AM

cdevadas mentioned this in rG8f9dd5e608c0: [AMDGPU] Vector register spill test cleanup (NFC). Apr 26 2022, 12:49 AM

Revision Contents

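Path	Size
llvm/include/llvm/CodeGen/Passes.h	6 lines
llvm/include/llvm/CodeGen/RegAllocCommon.h	32 lines
llvm/include/llvm/CodeGen/RegAllocRegistry.h	1 line
llvm/lib/CodeGen/LiveIntervals.cpp	7 lines
llvm/lib/CodeGen/RegAllocBase.h	11 lines
llvm/lib/CodeGen/RegAllocBase.cpp	18 lines
llvm/lib/CodeGen/RegAllocBasic.cpp	15 lines
llvm/lib/CodeGen/RegAllocFast.cpp	32 lines
llvm/lib/CodeGen/RegAllocGreedy.cpp	32 lines
llvm/lib/CodeGen/TargetPassConfig.cpp	4 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	195 lines
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	70 lines
llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp	52 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	22 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.h	8 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	54 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll	617 lines
llvm/test/CodeGen/AMDGPU/agpr-csr.ll	19 lines
llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir	2 lines
llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx90a.mir	2 lines
llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll	12 lines
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll	34 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll	60 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll	104 lines
llvm/test/CodeGen/AMDGPU/indirect-call.ll	466 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	31 lines
llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll	50 lines
llvm/test/CodeGen/AMDGPU/pei-build-spill.mir	16 lines
llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll	108 lines
llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll	234 lines
llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir	7 lines
llvm/test/CodeGen/AMDGPU/sibling-call.ll	20 lines
llvm/test/CodeGen/AMDGPU/spill-empty-live-interval.mir	2 lines
llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll	2 lines
llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll	9 lines
llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir	5 lines
llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll	31 lines
llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll	126 lines
llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir	4 lines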

Diff 344878

llvm/include/llvm/CodeGen/Passes.h

Show All 9 Lines
// passes provided by the LLVM backend.		// passes provided by the LLVM backend.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_PASSES_H		#ifndef LLVM_CODEGEN_PASSES_H
#define LLVM_CODEGEN_PASSES_H		#define LLVM_CODEGEN_PASSES_H

#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
		#include "llvm/CodeGen/RegAllocCommon.h"

#include <functional>		#include <functional>
#include <string>		#include <string>

namespace llvm {		namespace llvm {

class FunctionPass;		class FunctionPass;
class MachineFunction;		class MachineFunction;
class MachineFunctionPass;		class MachineFunctionPass;
▲ Show 20 Lines • Show All 138 Lines • ▼ Show 20 Lines	namespace llvm {

/// This pass perform post-ra machine sink for COPY instructions.		/// This pass perform post-ra machine sink for COPY instructions.
extern char &PostRAMachineSinkingID;		extern char &PostRAMachineSinkingID;

/// FastRegisterAllocation Pass - This pass register allocates as fast as		/// FastRegisterAllocation Pass - This pass register allocates as fast as
/// possible. It is best suited for debug code where live ranges are short.		/// possible. It is best suited for debug code where live ranges are short.
///		///
FunctionPass *createFastRegisterAllocator();		FunctionPass *createFastRegisterAllocator();
		FunctionPass *createFastRegisterAllocator(RegClassFilterFunc F,
		bool ClearVirtRegs);

/// BasicRegisterAllocation Pass - This pass implements a degenerate global		/// BasicRegisterAllocation Pass - This pass implements a degenerate global
/// register allocator using the basic regalloc framework.		/// register allocator using the basic regalloc framework.
///		///
FunctionPass *createBasicRegisterAllocator();		FunctionPass *createBasicRegisterAllocator();
		FunctionPass *createBasicRegisterAllocator(RegClassFilterFunc F);

/// Greedy register allocation pass - This pass implements a global register		/// Greedy register allocation pass - This pass implements a global register
/// allocator for optimized builds.		/// allocator for optimized builds.
///		///
FunctionPass *createGreedyRegisterAllocator();		FunctionPass *createGreedyRegisterAllocator();
		FunctionPass *createGreedyRegisterAllocator(RegClassFilterFunc F);

/// PBQPRegisterAllocation Pass - This pass implements the Partitioned Boolean		/// PBQPRegisterAllocation Pass - This pass implements the Partitioned Boolean
/// Quadratic Prograaming (PBQP) based register allocator.		/// Quadratic Prograaming (PBQP) based register allocator.
///		///
FunctionPass *createDefaultPBQPRegisterAllocator();		FunctionPass *createDefaultPBQPRegisterAllocator();

/// PrologEpilogCodeInserter - This pass inserts prolog and epilog code,		/// PrologEpilogCodeInserter - This pass inserts prolog and epilog code,
/// and eliminates abstract frame references.		/// and eliminates abstract frame references.
▲ Show 20 Lines • Show All 329 Lines • Show Last 20 Lines

llvm/include/llvm/CodeGen/RegAllocCommon.h

This file was added.

				//===- RegAllocCommon.h - Utilities shared between allocators ---- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_REGALLOCCOMMON_H
				#define LLVM_CODEGEN_REGALLOCCOMMON_H

				#include <functional>

				namespace llvm {

				class TargetRegisterClass;
				class TargetRegisterInfo;

				typedef std::function<bool(const TargetRegisterInfo &TRI,
				const TargetRegisterClass &RC)> RegClassFilterFunc;

				/// Default register class filter function for register allocation. All virtual
				/// registers should be allocated.
				static inline bool allocateAllRegClasses(const TargetRegisterInfo &,
				const TargetRegisterClass &) {
				return true;
				}

				}

				#endif // LLVM_CODEGEN_REGALLOCCOMMON_H

llvm/include/llvm/CodeGen/RegAllocRegistry.h

	//===- llvm/CodeGen/RegAllocRegistry.h --------------------------- C++ --===//			//===- llvm/CodeGen/RegAllocRegistry.h --------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file contains the implementation for register allocator function			// This file contains the implementation for register allocator function
	// pass registry (RegisterRegAlloc).			// pass registry (RegisterRegAlloc).
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CODEGEN_REGALLOCREGISTRY_H			#ifndef LLVM_CODEGEN_REGALLOCREGISTRY_H
	#define LLVM_CODEGEN_REGALLOCREGISTRY_H			#define LLVM_CODEGEN_REGALLOCREGISTRY_H

				#include "llvm/CodeGen/RegAllocCommon.h"
	#include "llvm/CodeGen/MachinePassRegistry.h"			#include "llvm/CodeGen/MachinePassRegistry.h"

	namespace llvm {			namespace llvm {

	class FunctionPass;			class FunctionPass;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

llvm/lib/CodeGen/LiveIntervals.cpp

Show First 20 Lines • Show All 707 Lines • ▼ Show 20 Lines	void LiveIntervals::addKillFlags(const VirtRegMap *VRM) {
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {		for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
Register Reg = Register::index2VirtReg(i);		Register Reg = Register::index2VirtReg(i);
if (MRI->reg_nodbg_empty(Reg))		if (MRI->reg_nodbg_empty(Reg))
continue;		continue;
const LiveInterval &LI = getInterval(Reg);		const LiveInterval &LI = getInterval(Reg);
if (LI.empty())		if (LI.empty())
continue;		continue;

		// Target may have not allocated this yet.
		Register PhysReg = VRM->getPhys(Reg);
		if (!PhysReg)
		continue;

// Find the regunit intervals for the assigned register. They may overlap		// Find the regunit intervals for the assigned register. They may overlap
// the virtual register live range, cancelling any kills.		// the virtual register live range, cancelling any kills.
RU.clear();		RU.clear();
for (MCRegUnitIterator Unit(VRM->getPhys(Reg), TRI); Unit.isValid();		for (MCRegUnitIterator Unit(PhysReg, TRI); Unit.isValid();
++Unit) {		++Unit) {
const LiveRange &RURange = getRegUnit(*Unit);		const LiveRange &RURange = getRegUnit(*Unit);
if (RURange.empty())		if (RURange.empty())
continue;		continue;
RU.push_back(std::make_pair(&RURange, RURange.find(LI.begin()->end)));		RU.push_back(std::make_pair(&RURange, RURange.find(LI.begin()->end)));
}		}
// Every instruction that kills Reg corresponds to a segment range end		// Every instruction that kills Reg corresponds to a segment range end
// point.		// point.
▲ Show 20 Lines • Show All 1,027 Lines • Show Last 20 Lines

llvm/lib/CodeGen/RegAllocBase.h

	Show All 31 Lines
	// quality trade-off without relying on a particular theoretical solver.			// quality trade-off without relying on a particular theoretical solver.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_CODEGEN_REGALLOCBASE_H			#ifndef LLVM_LIB_CODEGEN_REGALLOCBASE_H
	#define LLVM_LIB_CODEGEN_REGALLOCBASE_H			#define LLVM_LIB_CODEGEN_REGALLOCBASE_H

	#include "llvm/ADT/SmallPtrSet.h"			#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/CodeGen/RegAllocCommon.h"
	#include "llvm/CodeGen/RegisterClassInfo.h"			#include "llvm/CodeGen/RegisterClassInfo.h"

	namespace llvm {			namespace llvm {

	class LiveInterval;			class LiveInterval;
	class LiveIntervals;			class LiveIntervals;
	class LiveRegMatrix;			class LiveRegMatrix;
	class MachineInstr;			class MachineInstr;
	Show All 14 Lines

	protected:			protected:
	const TargetRegisterInfo *TRI = nullptr;			const TargetRegisterInfo *TRI = nullptr;
	MachineRegisterInfo *MRI = nullptr;			MachineRegisterInfo *MRI = nullptr;
	VirtRegMap *VRM = nullptr;			VirtRegMap *VRM = nullptr;
	LiveIntervals *LIS = nullptr;			LiveIntervals *LIS = nullptr;
	LiveRegMatrix *Matrix = nullptr;			LiveRegMatrix *Matrix = nullptr;
	RegisterClassInfo RegClassInfo;			RegisterClassInfo RegClassInfo;
				const RegClassFilterFunc ShouldAllocateClass;

	/// Inst which is a def of an original reg and whose defs are already all			/// Inst which is a def of an original reg and whose defs are already all
	/// dead after remat is saved in DeadRemats. The deletion of such inst is			/// dead after remat is saved in DeadRemats. The deletion of such inst is
	/// postponed till all the allocations are done, so its remat expr is			/// postponed till all the allocations are done, so its remat expr is
	/// always available for the remat of all the siblings of the original reg.			/// always available for the remat of all the siblings of the original reg.
	SmallPtrSet<MachineInstr *, 32> DeadRemats;			SmallPtrSet<MachineInstr *, 32> DeadRemats;

	RegAllocBase() = default;			RegAllocBase(const RegClassFilterFunc F = allocateAllRegClasses) :
				ShouldAllocateClass(F) {}

	virtual ~RegAllocBase() = default;			virtual ~RegAllocBase() = default;

	// A RegAlloc pass should call this before allocatePhysRegs.			// A RegAlloc pass should call this before allocatePhysRegs.
	void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);			void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);

	// The top-level driver. The output is a VirtRegMap that us updated with			// The top-level driver. The output is a VirtRegMap that us updated with
	// physical register assignments.			// physical register assignments.
	void allocatePhysRegs();			void allocatePhysRegs();

	// Include spiller post optimization and removing dead defs left because of			// Include spiller post optimization and removing dead defs left because of
	// rematerialization.			// rematerialization.
	virtual void postOptimization();			virtual void postOptimization();

	// Get a temporary reference to a Spiller instance.			// Get a temporary reference to a Spiller instance.
	virtual Spiller &spiller() = 0;			virtual Spiller &spiller() = 0;

	/// enqueue - Add VirtReg to the priority queue of unassigned registers.			/// enqueue - Add VirtReg to the priority queue of unassigned registers.
	virtual void enqueue(LiveInterval *LI) = 0;			virtual void enqueueImpl(LiveInterval *LI) = 0;

				/// enqueue - Add VirtReg to the priority queue of unassigned registers.
				void enqueue(LiveInterval *LI);

	/// dequeue - Return the next unassigned register, or NULL.			/// dequeue - Return the next unassigned register, or NULL.
	virtual LiveInterval *dequeue() = 0;			virtual LiveInterval *dequeue() = 0;

	// A RegAlloc pass should override this to provide the allocation heuristics.			// A RegAlloc pass should override this to provide the allocation heuristics.
	// Each call must guarantee forward progess by returning an available PhysReg			// Each call must guarantee forward progess by returning an available PhysReg
	// or new set of split live virtual registers. It is up to the splitter to			// or new set of split live virtual registers. It is up to the splitter to
	// converge quickly toward fully spilled live ranges.			// converge quickly toward fully spilled live ranges.
	Show All 21 Lines

llvm/lib/CodeGen/RegAllocBase.cpp

	Show First 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
	void RegAllocBase::postOptimization() {			void RegAllocBase::postOptimization() {
	spiller().postOptimization();			spiller().postOptimization();
	for (auto DeadInst : DeadRemats) {			for (auto DeadInst : DeadRemats) {
	LIS->RemoveMachineInstrFromMaps(*DeadInst);			LIS->RemoveMachineInstrFromMaps(*DeadInst);
	DeadInst->eraseFromParent();			DeadInst->eraseFromParent();
	}			}
	DeadRemats.clear();			DeadRemats.clear();
	}			}

				void RegAllocBase::enqueue(LiveInterval *LI) {
				const Register Reg = LI->reg();

				assert(Reg.isVirtual() && "Can only enqueue virtual registers");

				if (VRM->hasPhys(Reg))
				return;

				const TargetRegisterClass &RC = *MRI->getRegClass(Reg);
				if (ShouldAllocateClass(*TRI, RC)) {
				LLVM_DEBUG(dbgs() << "Enqueuing " << printReg(Reg, TRI) << '\n');
				enqueueImpl(LI);
				} else {
				LLVM_DEBUG(dbgs() << "Not enqueueing " << printReg(Reg, TRI)
				<< " in skipped register class\n");
				}
				}
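
The enqueue/enqueueImpl split above follows the non-virtual interface pattern: the base class owns the shared checks and derived allocators only supply the queueing policy. Below is a minimal standalone sketch of that pattern with mock types; `MockInterval`, `MockFilter`, `MockAllocBase`, and `MockFifoAlloc` are hypothetical names, and the mock filter inspects a plain flag rather than a `TargetRegisterClass`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a live interval of a virtual register.
struct MockInterval {
  unsigned Reg;
  bool IsSGPR;
  bool HasPhys; // already assigned a physical register
};

using MockFilter = std::function<bool(const MockInterval &)>;

class MockAllocBase {
public:
  explicit MockAllocBase(MockFilter F) : ShouldAllocate(std::move(F)) {}
  virtual ~MockAllocBase() = default;

  // Non-virtual entry point: every allocator gets the same early exits.
  void enqueue(MockInterval *LI) {
    if (LI->HasPhys)
      return; // handled by an earlier run
    if (ShouldAllocate(*LI))
      enqueueImpl(LI); // otherwise left for a later run
  }

protected:
  // Derived allocators only decide how accepted intervals are ordered.
  virtual void enqueueImpl(MockInterval *LI) = 0;

private:
  const MockFilter ShouldAllocate;
};

class MockFifoAlloc : public MockAllocBase {
public:
  using MockAllocBase::MockAllocBase;
  std::size_t queued() const { return Queue.size(); }

protected:
  void enqueueImpl(MockInterval *LI) override { Queue.push(LI); }

private:
  std::queue<MockInterval *> Queue;
};
```

Keeping the filter check in the base class means existing call sites that re-enqueue an interval do not need to know about the filter at all.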

llvm/lib/CodeGen/RegAllocBasic.cpp

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	class RABasic : public MachineFunctionPass,
// Scratch space. Allocated here to avoid repeated malloc calls in		// Scratch space. Allocated here to avoid repeated malloc calls in
// selectOrSplit().		// selectOrSplit().
BitVector UsableRegs;		BitVector UsableRegs;

bool LRE_CanEraseVirtReg(Register) override;		bool LRE_CanEraseVirtReg(Register) override;
void LRE_WillShrinkVirtReg(Register) override;		void LRE_WillShrinkVirtReg(Register) override;

public:		public:
RABasic();		RABasic(const RegClassFilterFunc F = allocateAllRegClasses);

/// Return the pass name.		/// Return the pass name.
StringRef getPassName() const override { return "Basic Register Allocator"; }		StringRef getPassName() const override { return "Basic Register Allocator"; }

/// RABasic analysis usage.		/// RABasic analysis usage.
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

void releaseMemory() override;		void releaseMemory() override;

Spiller &spiller() override { return *SpillerInstance; }		Spiller &spiller() override { return *SpillerInstance; }

void enqueue(LiveInterval *LI) override {		void enqueueImpl(LiveInterval *LI) override {
Queue.push(LI);		Queue.push(LI);
}		}

LiveInterval *dequeue() override {		LiveInterval *dequeue() override {
if (Queue.empty())		if (Queue.empty())
return nullptr;		return nullptr;
LiveInterval *LI = Queue.top();		LiveInterval *LI = Queue.top();
Queue.pop();		Queue.pop();
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	if (!VRM->hasPhys(VirtReg))
return;		return;

// Register is assigned, put it back on the queue for reassignment.		// Register is assigned, put it back on the queue for reassignment.
LiveInterval &LI = LIS->getInterval(VirtReg);		LiveInterval &LI = LIS->getInterval(VirtReg);
Matrix->unassign(LI);		Matrix->unassign(LI);
enqueue(&LI);		enqueue(&LI);
}		}

RABasic::RABasic(): MachineFunctionPass(ID) {		RABasic::RABasic(RegClassFilterFunc F):
		MachineFunctionPass(ID),
		RegAllocBase(F) {
}		}

void RABasic::getAnalysisUsage(AnalysisUsage &AU) const {		void RABasic::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addRequired<LiveIntervals>();		AU.addRequired<LiveIntervals>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	bool RABasic::runOnMachineFunction(MachineFunction &mf) {

// Diagnostic output before rewriting		// Diagnostic output before rewriting
LLVM_DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");		LLVM_DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");

releaseMemory();		releaseMemory();
return true;		return true;
}		}

FunctionPass* llvm::createBasicRegisterAllocator()		FunctionPass* llvm::createBasicRegisterAllocator() {
{
return new RABasic();		return new RABasic();
}		}

		FunctionPass* llvm::createBasicRegisterAllocator(RegClassFilterFunc F) {
		return new RABasic(F);
		}

llvm/lib/CodeGen/RegAllocFast.cpp

Show All 21 Lines
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		#include "llvm/CodeGen/RegAllocCommon.h"
#include "llvm/CodeGen/RegAllocRegistry.h"		#include "llvm/CodeGen/RegAllocRegistry.h"
#include "llvm/CodeGen/RegisterClassInfo.h"		#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/CodeGen/TargetInstrInfo.h"		#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"		#include "llvm/CodeGen/TargetOpcodes.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"		#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"		#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
Show All 26 Lines	static RegisterRegAlloc
fastRegAlloc("fast", "fast register allocator", createFastRegisterAllocator);		fastRegAlloc("fast", "fast register allocator", createFastRegisterAllocator);

namespace {		namespace {

class RegAllocFast : public MachineFunctionPass {		class RegAllocFast : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;

RegAllocFast() : MachineFunctionPass(ID), StackSlotForVirtReg(-1) {}		RegAllocFast(const RegClassFilterFunc F = allocateAllRegClasses,
		bool ClearVirtRegs_ = true) :
		MachineFunctionPass(ID),
		ShouldAllocateClass(F),
		StackSlotForVirtReg(-1),
		ClearVirtRegs(ClearVirtRegs_) {
		}

private:		private:
MachineFrameInfo *MFI;		MachineFrameInfo *MFI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
RegisterClassInfo RegClassInfo;		RegisterClassInfo RegClassInfo;
		const RegClassFilterFunc ShouldAllocateClass;

/// Basic block currently being allocated.		/// Basic block currently being allocated.
MachineBasicBlock *MBB;		MachineBasicBlock *MBB;

/// Maps virtual regs to the frame index where these values are spilled.		/// Maps virtual regs to the frame index where these values are spilled.
IndexedMap<int, VirtReg2IndexFunctor> StackSlotForVirtReg;		IndexedMap<int, VirtReg2IndexFunctor> StackSlotForVirtReg;

		bool ClearVirtRegs;

/// Everything we know about a live virtual register.		/// Everything we know about a live virtual register.
struct LiveReg {		struct LiveReg {
MachineInstr *LastUse = nullptr; ///< Last instr to use reg.		MachineInstr *LastUse = nullptr; ///< Last instr to use reg.
Register VirtReg; ///< Virtual register number.		Register VirtReg; ///< Virtual register number.
MCPhysReg PhysReg = 0; ///< Currently held here.		MCPhysReg PhysReg = 0; ///< Currently held here.
bool LiveOut = false; ///< Register is possibly live out.		bool LiveOut = false; ///< Register is possibly live out.
bool Reloaded = false; ///< Register was reloaded.		bool Reloaded = false; ///< Register was reloaded.
bool Error = false; ///< Could not allocate.		bool Error = false; ///< Could not allocate.
▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	public:
}		}

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoPHIs);		MachineFunctionProperties::Property::NoPHIs);
}		}

MachineFunctionProperties getSetProperties() const override {		MachineFunctionProperties getSetProperties() const override {
		if (ClearVirtRegs) {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoVRegs);		MachineFunctionProperties::Property::NoVRegs);
}		}

		return MachineFunctionProperties();
		}

MachineFunctionProperties getClearedProperties() const override {		MachineFunctionProperties getClearedProperties() const override {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::IsSSA);		MachineFunctionProperties::Property::IsSSA);
}		}

private:		private:
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

▲ Show 20 Lines • Show All 1,308 Lines • ▼ Show 20 Lines	bool RegAllocFast::runOnMachineFunction(MachineFunction &MF) {
LiveVirtRegs.setUniverse(NumVirtRegs);		LiveVirtRegs.setUniverse(NumVirtRegs);
MayLiveAcrossBlocks.clear();		MayLiveAcrossBlocks.clear();
MayLiveAcrossBlocks.resize(NumVirtRegs);		MayLiveAcrossBlocks.resize(NumVirtRegs);

// Loop over all of the basic blocks, eliminating virtual register references		// Loop over all of the basic blocks, eliminating virtual register references
for (MachineBasicBlock &MBB : MF)		for (MachineBasicBlock &MBB : MF)
allocateBasicBlock(MBB);		allocateBasicBlock(MBB);

		if (ClearVirtRegs) {
// All machine operands and other references to virtual registers have been		// All machine operands and other references to virtual registers have been
// replaced. Remove the virtual registers.		// replaced. Remove the virtual registers.
MRI->clearVirtRegs();		MRI->clearVirtRegs();
		}

StackSlotForVirtReg.clear();		StackSlotForVirtReg.clear();
LiveDbgValueMap.clear();		LiveDbgValueMap.clear();
return true;		return true;
}		}

FunctionPass *llvm::createFastRegisterAllocator() {		FunctionPass *llvm::createFastRegisterAllocator() {
return new RegAllocFast();		return new RegAllocFast();
}		}

		FunctionPass *llvm::createFastRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor, bool ClearVirtRegs) {
		return new RegAllocFast(Ftor, ClearVirtRegs);
		}
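
The `ClearVirtRegs` flag above controls whether a fast-allocator run drops the virtual registers and reports `NoVRegs`. The sketch below models why an earlier, filtered run has to leave them in place. `ToyFunction` and `runFastAlloc` are hypothetical names used only for illustration, and the model ignores everything the real pass does besides this bookkeeping.

```cpp
#include <vector>

// Hypothetical model of a function's virtual-register table.
struct ToyFunction {
  std::vector<int> VirtRegs; // Ids of virtual registers still present.
  bool NoVRegs = false;      // Models the NoVRegs machine-function property.
};

// Models one allocator run. IsMine selects the virtual registers this run
// handles. Only a run with ClearVirtRegs == true clears the table and sets
// NoVRegs; an earlier run keeps the skipped registers for a later run.
template <typename Pred>
void runFastAlloc(ToyFunction &F, Pred IsMine, bool ClearVirtRegs) {
  std::vector<int> Remaining;
  for (int VR : F.VirtRegs)
    if (!IsMine(VR))
      Remaining.push_back(VR); // Left for a later run.
  F.VirtRegs = Remaining;
  if (ClearVirtRegs) {
    F.VirtRegs.clear();
    F.NoVRegs = true;
  }
}
```

In this model the first run would pass `false` and the last run `true`, matching how `createFastSGPRRegisterAllocator` and `createFastVGPRRegisterAllocator` call the new overload later in this diff.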

llvm/lib/CodeGen/RegAllocGreedy.cpp

Show First 20 Lines • Show All 406 Lines • ▼ Show 20 Lines	#endif
/// Set of broken hints that may be reconciled later because of eviction.		/// Set of broken hints that may be reconciled later because of eviction.
SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;		SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;

/// The register cost values. This list will be recreated for each Machine		/// The register cost values. This list will be recreated for each Machine
/// Function		/// Function
ArrayRef<uint8_t> RegCosts;		ArrayRef<uint8_t> RegCosts;

public:		public:
RAGreedy();		RAGreedy(const RegClassFilterFunc F = allocateAllRegClasses);

/// Return the pass name.		/// Return the pass name.
StringRef getPassName() const override { return "Greedy Register Allocator"; }		StringRef getPassName() const override { return "Greedy Register Allocator"; }

/// RAGreedy analysis usage.		/// RAGreedy analysis usage.
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;
void releaseMemory() override;		void releaseMemory() override;
Spiller &spiller() override { return *SpillerInstance; }		Spiller &spiller() override { return *SpillerInstance; }
void enqueue(LiveInterval *LI) override;		void enqueueImpl(LiveInterval *LI) override;
LiveInterval *dequeue() override;		LiveInterval *dequeue() override;
MCRegister selectOrSplit(LiveInterval &,		MCRegister selectOrSplit(LiveInterval &,
SmallVectorImpl<Register> &) override;		SmallVectorImpl<Register> &) override;
void aboutToRemoveInterval(LiveInterval &) override;		void aboutToRemoveInterval(LiveInterval &) override;

/// Perform register allocation.		/// Perform register allocation.
bool runOnMachineFunction(MachineFunction &mf) override;		bool runOnMachineFunction(MachineFunction &mf) override;

▲ Show 20 Lines • Show All 198 Lines • ▼ Show 20 Lines
// Hysteresis to use when comparing floats.		// Hysteresis to use when comparing floats.
// This helps stabilize decisions based on float comparisons.		// This helps stabilize decisions based on float comparisons.
const float Hysteresis = (2007 / 2048.0f); // 0.97998046875		const float Hysteresis = (2007 / 2048.0f); // 0.97998046875

FunctionPass* llvm::createGreedyRegisterAllocator() {		FunctionPass* llvm::createGreedyRegisterAllocator() {
return new RAGreedy();		return new RAGreedy();
}		}

RAGreedy::RAGreedy(): MachineFunctionPass(ID) {		namespace llvm {
		FunctionPass* createGreedyRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor);

		}

		FunctionPass* llvm::createGreedyRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor) {
		return new RAGreedy(Ftor);
		}

		RAGreedy::RAGreedy(RegClassFilterFunc F):
		MachineFunctionPass(ID),
		RegAllocBase(F) {
}		}

void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const {		void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addPreserved<MachineBlockFrequencyInfo>();		AU.addPreserved<MachineBlockFrequencyInfo>();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
Show All 40 Lines

void RAGreedy::LRE_WillShrinkVirtReg(Register VirtReg) {		void RAGreedy::LRE_WillShrinkVirtReg(Register VirtReg) {
if (!VRM->hasPhys(VirtReg))		if (!VRM->hasPhys(VirtReg))
return;		return;

// Register is assigned, put it back on the queue for reassignment.		// Register is assigned, put it back on the queue for reassignment.
LiveInterval &LI = LIS->getInterval(VirtReg);		LiveInterval &LI = LIS->getInterval(VirtReg);
Matrix->unassign(LI);		Matrix->unassign(LI);
enqueue(&LI);		RegAllocBase::enqueue(&LI);
}		}

void RAGreedy::LRE_DidCloneVirtReg(Register New, Register Old) {		void RAGreedy::LRE_DidCloneVirtReg(Register New, Register Old) {
// Cloning a register we haven't even heard about yet? Just ignore it.		// Cloning a register we haven't even heard about yet? Just ignore it.
if (!ExtraRegInfo.inBounds(Old))		if (!ExtraRegInfo.inBounds(Old))
return;		return;

// LRE may clone a virtual register because dead code elimination causes it to		// LRE may clone a virtual register because dead code elimination causes it to
// be split into connected components. The new components are much smaller		// be split into connected components. The new components are much smaller
// than the original, so they should get a new chance at being assigned.		// than the original, so they should get a new chance at being assigned.
// same stage as the parent.		// same stage as the parent.
ExtraRegInfo[Old].Stage = RS_Assign;		ExtraRegInfo[Old].Stage = RS_Assign;
ExtraRegInfo.grow(New);		ExtraRegInfo.grow(New);
ExtraRegInfo[New] = ExtraRegInfo[Old];		ExtraRegInfo[New] = ExtraRegInfo[Old];
}		}

void RAGreedy::releaseMemory() {		void RAGreedy::releaseMemory() {
SpillerInstance.reset();		SpillerInstance.reset();
ExtraRegInfo.clear();		ExtraRegInfo.clear();
GlobalCand.clear();		GlobalCand.clear();
}		}

void RAGreedy::enqueue(LiveInterval *LI) { enqueue(Queue, LI); }		void RAGreedy::enqueueImpl(LiveInterval *LI) { enqueue(Queue, LI); }

void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {		void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {
// Prioritize live ranges by size, assigning larger ranges first.		// Prioritize live ranges by size, assigning larger ranges first.
// The queue holds (size, reg) pairs.		// The queue holds (size, reg) pairs.
const unsigned Size = LI->getSize();		const unsigned Size = LI->getSize();
const Register Reg = LI->reg();		const Register Reg = LI->reg();
assert(Reg.isVirtual() && "Can only enqueue virtual registers");		assert(Reg.isVirtual() && "Can only enqueue virtual registers");
unsigned Prio;		unsigned Prio;
▲ Show 20 Lines • Show All 2,203 Lines • ▼ Show 20 Lines	void RAGreedy::tryHintRecoloring(LiveInterval &VirtReg) {

do {		do {
Reg = RecoloringCandidates.pop_back_val();		Reg = RecoloringCandidates.pop_back_val();

// We cannot recolor physical register.		// We cannot recolor physical register.
if (Register::isPhysicalRegister(Reg))		if (Register::isPhysicalRegister(Reg))
continue;		continue;

assert(VRM->hasPhys(Reg) && "We have unallocated variable!!");		// This may be a skipped class
		if (!VRM->hasPhys(Reg)) {
		assert(!ShouldAllocateClass(TRI, MRI->getRegClass(Reg)) &&
		"We have an unallocated variable which should have been handled");
		continue;
		}

// Get the live interval mapped with this virtual register to be able		// Get the live interval mapped with this virtual register to be able
// to check for the interference with the new color.		// to check for the interference with the new color.
LiveInterval &LI = LIS->getInterval(Reg);		LiveInterval &LI = LIS->getInterval(Reg);
MCRegister CurrPhys = VRM->getPhys(Reg);		MCRegister CurrPhys = VRM->getPhys(Reg);
// Check that the new color matches the register class constraints and		// Check that the new color matches the register class constraints and
// that it is free for this live range.		// that it is free for this live range.
if (CurrPhys != PhysReg && (!MRI->getRegClass(Reg)->contains(PhysReg) ||		if (CurrPhys != PhysReg && (!MRI->getRegClass(Reg)->contains(PhysReg) ||
▲ Show 20 Lines • Show All 424 Lines • Show Last 20 Lines

llvm/lib/CodeGen/TargetPassConfig.cpp

Show First 20 Lines • Show All 1,310 Lines • ▼ Show 20 Lines	FunctionPass *TargetPassConfig::createRegAllocPass(bool Optimized) {
if (Ctor != useDefaultRegisterAllocator)		if (Ctor != useDefaultRegisterAllocator)
return Ctor();		return Ctor();

// With no -regalloc= override, ask the target for a regalloc pass.		// With no -regalloc= override, ask the target for a regalloc pass.
return createTargetRegisterAllocator(Optimized);		return createTargetRegisterAllocator(Optimized);
}		}

bool TargetPassConfig::addRegAssignAndRewriteFast() {		bool TargetPassConfig::addRegAssignAndRewriteFast() {
if (RegAlloc != &useDefaultRegisterAllocator &&		if (RegAlloc != (RegisterRegAlloc::FunctionPassCtor)&useDefaultRegisterAllocator &&
RegAlloc != &createFastRegisterAllocator)		RegAlloc != (RegisterRegAlloc::FunctionPassCtor)&createFastRegisterAllocator)
report_fatal_error("Must use fast (default) register allocator for unoptimized regalloc.");		report_fatal_error("Must use fast (default) register allocator for unoptimized regalloc.");

addPass(createRegAllocPass(false));		addPass(createRegAllocPass(false));

// Allow targets to change the register assignments after		// Allow targets to change the register assignments after
// fast register allocation.		// fast register allocation.
addPostFastRegAllocRewrite();		addPostFastRegAllocRewrite();
return true;		return true;
▲ Show 20 Lines • Show All 145 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show All 26 Lines
#include "TargetInfo/AMDGPUTargetInfo.h"		#include "TargetInfo/AMDGPUTargetInfo.h"
#include "llvm/Analysis/CGSCCPassManager.h"		#include "llvm/Analysis/CGSCCPassManager.h"
#include "llvm/CodeGen/GlobalISel/IRTranslator.h"		#include "llvm/CodeGen/GlobalISel/IRTranslator.h"
#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"		#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
#include "llvm/CodeGen/GlobalISel/Legalizer.h"		#include "llvm/CodeGen/GlobalISel/Legalizer.h"
#include "llvm/CodeGen/GlobalISel/Localizer.h"		#include "llvm/CodeGen/GlobalISel/Localizer.h"
#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"		#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"
#include "llvm/CodeGen/MIRParser/MIParser.h"		#include "llvm/CodeGen/MIRParser/MIParser.h"
		#include "llvm/CodeGen/Passes.h"
		#include "llvm/CodeGen/RegAllocRegistry.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Passes/PassBuilder.h"		#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"		#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"		#include "llvm/Transforms/IPO/GlobalDCE.h"
#include "llvm/Transforms/IPO/Internalize.h"		#include "llvm/Transforms/IPO/Internalize.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"		#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/InferAddressSpaces.h"		#include "llvm/Transforms/Scalar/InferAddressSpaces.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Utils/SimplifyLibCalls.h"		#include "llvm/Transforms/Utils/SimplifyLibCalls.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"

using namespace llvm;		using namespace llvm;

		namespace {
		class SGPRRegisterRegAlloc : public RegisterRegAllocBase<SGPRRegisterRegAlloc> {
		public:
		SGPRRegisterRegAlloc(const char *N, const char *D, FunctionPassCtor C)
		: RegisterRegAllocBase(N, D, C) {}
		};

		class VGPRRegisterRegAlloc : public RegisterRegAllocBase<VGPRRegisterRegAlloc> {
		public:
		VGPRRegisterRegAlloc(const char *N, const char *D, FunctionPassCtor C)
		: RegisterRegAllocBase(N, D, C) {}
		};

		static bool onlyAllocateSGPRs(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC) {
		return static_cast<const SIRegisterInfo &>(TRI).isSGPRClass(&RC);
		}

		static bool onlyAllocateVGPRs(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC) {
		return !static_cast<const SIRegisterInfo &>(TRI).isSGPRClass(&RC);
		}


		/// -{sgpr|vgpr}-regalloc=... command line option.
		static FunctionPass *useDefaultRegisterAllocator() { return nullptr; }

		/// A dummy default pass factory indicates whether the register allocator is
		/// overridden on the command line.
		static llvm::once_flag InitializeDefaultSGPRRegisterAllocatorFlag;
		static llvm::once_flag InitializeDefaultVGPRRegisterAllocatorFlag;

		static SGPRRegisterRegAlloc
		defaultSGPRRegAlloc("default",
		"pick SGPR register allocator based on -O option",
		useDefaultRegisterAllocator);

		static cl::opt<SGPRRegisterRegAlloc::FunctionPassCtor, false,
		RegisterPassParser<SGPRRegisterRegAlloc>>
		SGPRRegAlloc("sgpr-regalloc", cl::Hidden, cl::init(&useDefaultRegisterAllocator),
		cl::desc("Register allocator to use for SGPRs"));

		static cl::opt<VGPRRegisterRegAlloc::FunctionPassCtor, false,
		RegisterPassParser<VGPRRegisterRegAlloc>>
		VGPRRegAlloc("vgpr-regalloc", cl::Hidden, cl::init(&useDefaultRegisterAllocator),
		cl::desc("Register allocator to use for VGPRs"));


		static void initializeDefaultSGPRRegisterAllocatorOnce() {
		RegisterRegAlloc::FunctionPassCtor Ctor = SGPRRegisterRegAlloc::getDefault();

		if (!Ctor) {
		Ctor = SGPRRegAlloc;
		SGPRRegisterRegAlloc::setDefault(SGPRRegAlloc);
		}
		}

		static void initializeDefaultVGPRRegisterAllocatorOnce() {
		RegisterRegAlloc::FunctionPassCtor Ctor = VGPRRegisterRegAlloc::getDefault();

		if (!Ctor) {
		Ctor = VGPRRegAlloc;
		VGPRRegisterRegAlloc::setDefault(VGPRRegAlloc);
		}
		}

		static FunctionPass *createBasicSGPRRegisterAllocator() {
		return createBasicRegisterAllocator(onlyAllocateSGPRs);
		}

		static FunctionPass *createGreedySGPRRegisterAllocator() {
		return createGreedyRegisterAllocator(onlyAllocateSGPRs);
		}

		static FunctionPass *createFastSGPRRegisterAllocator() {
		return createFastRegisterAllocator(onlyAllocateSGPRs, false);
		}

		static FunctionPass *createBasicVGPRRegisterAllocator() {
		return createBasicRegisterAllocator(onlyAllocateVGPRs);
		}

		static FunctionPass *createGreedyVGPRRegisterAllocator() {
		return createGreedyRegisterAllocator(onlyAllocateVGPRs);
		}

		static FunctionPass *createFastVGPRRegisterAllocator() {
		return createFastRegisterAllocator(onlyAllocateVGPRs, true);
		}

		static SGPRRegisterRegAlloc basicRegAllocSGPR(
		"basic", "basic register allocator", createBasicSGPRRegisterAllocator);
		static SGPRRegisterRegAlloc greedyRegAllocSGPR(
		"greedy", "greedy register allocator", createGreedySGPRRegisterAllocator);

		static SGPRRegisterRegAlloc fastRegAllocSGPR(
		"fast", "fast register allocator", createFastSGPRRegisterAllocator);


		static VGPRRegisterRegAlloc basicRegAllocVGPR(
		"basic", "basic register allocator", createBasicVGPRRegisterAllocator);
		static VGPRRegisterRegAlloc greedyRegAllocVGPR(
		"greedy", "greedy register allocator", createGreedyVGPRRegisterAllocator);

		static VGPRRegisterRegAlloc fastRegAllocVGPR(
		"fast", "fast register allocator", createFastVGPRRegisterAllocator);
		}


static cl::opt<bool> EnableR600StructurizeCFG(		static cl::opt<bool> EnableR600StructurizeCFG(
"r600-ir-structurize",		"r600-ir-structurize",
cl::desc("Use StructurizeCFG IR pass"),		cl::desc("Use StructurizeCFG IR pass"),
cl::init(true));		cl::init(true));

static cl::opt<bool> EnableSROA(		static cl::opt<bool> EnableSROA(
"amdgpu-sroa",		"amdgpu-sroa",
cl::desc("Run SROA after promote alloca pass"),		cl::desc("Run SROA after promote alloca pass"),
▲ Show 20 Lines • Show All 748 Lines • ▼ Show 20 Lines	public:
void addPreLegalizeMachineIR() override;		void addPreLegalizeMachineIR() override;
bool addLegalizeMachineIR() override;		bool addLegalizeMachineIR() override;
void addPreRegBankSelect() override;		void addPreRegBankSelect() override;
bool addRegBankSelect() override;		bool addRegBankSelect() override;
void addPreGlobalInstructionSelect() override;		void addPreGlobalInstructionSelect() override;
bool addGlobalInstructionSelect() override;		bool addGlobalInstructionSelect() override;
void addFastRegAlloc() override;		void addFastRegAlloc() override;
void addOptimizedRegAlloc() override;		void addOptimizedRegAlloc() override;

		FunctionPass *createSGPRAllocPass(bool Optimized);
		FunctionPass *createVGPRAllocPass(bool Optimized);
		FunctionPass *createRegAllocPass(bool Optimized) override;

		bool addRegAssignAndRewriteFast() override;
		bool addRegAssignAndRewriteOptimized() override;

void addPreRegAlloc() override;		void addPreRegAlloc() override;
bool addPreRewrite() override;		bool addPreRewrite() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreSched2() override;		void addPreSched2() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};

} // end anonymous namespace		} // end anonymous namespace
▲ Show 20 Lines • Show All 354 Lines • ▼ Show 20 Lines
}		}

bool GCNPassConfig::addPreRewrite() {		bool GCNPassConfig::addPreRewrite() {
if (EnableRegReassign)		if (EnableRegReassign)
addPass(&GCNNSAReassignID);		addPass(&GCNNSAReassignID);
return true;		return true;
}		}

		FunctionPass *GCNPassConfig::createSGPRAllocPass(bool Optimized) {
		// Initialize the global default.
		llvm::call_once(InitializeDefaultSGPRRegisterAllocatorFlag,
		initializeDefaultSGPRRegisterAllocatorOnce);

		RegisterRegAlloc::FunctionPassCtor Ctor = SGPRRegisterRegAlloc::getDefault();
		if (Ctor != useDefaultRegisterAllocator)
		return Ctor();

		if (Optimized)
		return createGreedyRegisterAllocator(onlyAllocateSGPRs);

		return createFastRegisterAllocator(onlyAllocateSGPRs, false);
		}

		FunctionPass *GCNPassConfig::createVGPRAllocPass(bool Optimized) {
		// Initialize the global default.
		llvm::call_once(InitializeDefaultVGPRRegisterAllocatorFlag,
		initializeDefaultVGPRRegisterAllocatorOnce);

		RegisterRegAlloc::FunctionPassCtor Ctor = VGPRRegisterRegAlloc::getDefault();
		if (Ctor != useDefaultRegisterAllocator)
		return Ctor();

		if (Optimized)
		return createGreedyVGPRRegisterAllocator();

		return createFastVGPRRegisterAllocator();
		}

		FunctionPass *GCNPassConfig::createRegAllocPass(bool Optimized) {
		llvm_unreachable("should not be used");
		}

		static const char RegAllocOptNotSupportedMessage[] =
		"-regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc";

		bool GCNPassConfig::addRegAssignAndRewriteFast() {
		if (!usingDefaultRegAlloc())
		report_fatal_error(RegAllocOptNotSupportedMessage);

		addPass(createSGPRAllocPass(false));

		// Equivalent of PEI for SGPRs.
		addPass(&SILowerSGPRSpillsID);

		addPass(createVGPRAllocPass(false));
		return true;
		}

		bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
		if (!usingDefaultRegAlloc())
		report_fatal_error(RegAllocOptNotSupportedMessage);

		addPass(createSGPRAllocPass(true));

		// Commit allocated register changes. This is mostly necessary because too
		// many things rely on the use lists of the physical registers, such as the
		// verifier. This is only necessary with allocators which use LiveIntervals,
		// since FastRegAlloc does the replacements itself.
		addPass(createVirtRegRewriter(false));

		rampitec: GCNRegBankReassign also works with SGPRs. Which means you need a pre-rewriter here, which needs to have a different subset of passes and an RC filter.
		// Equivalent of PEI for SGPRs.
		addPass(&SILowerSGPRSpillsID);

		addPass(createVGPRAllocPass(true));

		addPreRewrite();
		addPass(&VirtRegRewriterID);

		return true;
		}

void GCNPassConfig::addPostRegAlloc() {		void GCNPassConfig::addPostRegAlloc() {
addPass(&SIFixVGPRCopiesID);		addPass(&SIFixVGPRCopiesID);
if (getOptLevel() > CodeGenOpt::None)		if (getOptLevel() > CodeGenOpt::None)
addPass(&SIOptimizeExecMaskingID);		addPass(&SIOptimizeExecMaskingID);
TargetPassConfig::addPostRegAlloc();		TargetPassConfig::addPostRegAlloc();

// Equivalent of PEI for SGPRs.
addPass(&SILowerSGPRSpillsID);
}		}

void GCNPassConfig::addPreSched2() {		void GCNPassConfig::addPreSched2() {
addPass(&SIPostRABundlerID);		addPass(&SIPostRABundlerID);
}		}

void GCNPassConfig::addPreEmitPass() {		void GCNPassConfig::addPreEmitPass() {
addPass(createSIMemoryLegalizerPass());		addPass(createSIMemoryLegalizerPass());
▲ Show 20 Lines • Show All 187 Lines • Show Last 20 Lines
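
For orientation, the order in which `addRegAssignAndRewriteOptimized` adds passes in this revision can be written out as a small standalone sketch. The function name and the string labels are hypothetical, not real pass IDs. Reading the `false` argument to `createVirtRegRewriter` as "do not clear virtual registers" is an assumption, since the diff does not name the parameter.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns labels for the passes in the order the
// optimized path adds them in this revision of the patch.
inline std::vector<std::string> optimizedRegAllocOrder(bool HasPreRewritePass) {
  std::vector<std::string> Order;
  Order.push_back("sgpr-regalloc");              // createSGPRAllocPass(true)
  Order.push_back("virtregrewriter-keep-vregs"); // createVirtRegRewriter(false)
  Order.push_back("si-lower-sgpr-spills");       // SILowerSGPRSpillsID
  Order.push_back("vgpr-regalloc");              // createVGPRAllocPass(true)
  if (HasPreRewritePass)
    Order.push_back("pre-rewrite");              // addPreRewrite()
  Order.push_back("virtregrewriter");            // VirtRegRewriterID
  return Order;
}
```

The inline review comment on this function concerns the point in this sequence right after the first rewriter: at this revision, pre-rewrite passes run only before the final rewriter.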

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

	Show All 14 Lines
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/RegisterScavenging.h"			#include "llvm/CodeGen/RegisterScavenging.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"

	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "frame-info"			#define DEBUG_TYPE "frame-info"

				static cl::opt<bool> EnableSpillVGPRToAGPR(
				"amdgpu-spill-vgpr-to-agpr",
				cl::desc("Enable spilling VGPRs to AGPRs"),
				cl::ReallyHidden,
				cl::init(true));

	// Find a scratch register that we can use in the prologue. We avoid using			// Find a scratch register that we can use in the prologue. We avoid using
	// callee-save registers since they may appear to be free when this is called			// callee-save registers since they may appear to be free when this is called
	// from canUseAsPrologue (during shrink wrapping), but then no longer be free			// from canUseAsPrologue (during shrink wrapping), but then no longer be free
	// when this is called from emitPrologue.			// when this is called from emitPrologue.
	static MCRegister findScratchNonCalleeSaveRegister(MachineRegisterInfo &MRI,			static MCRegister findScratchNonCalleeSaveRegister(MachineRegisterInfo &MRI,
	LivePhysRegs &LiveRegs,			LivePhysRegs &LiveRegs,
	const TargetRegisterClass &RC,			const TargetRegisterClass &RC,
	bool Unused = false) {			bool Unused = false) {
	▲ Show 20 Lines • Show All 1,089 Lines • ▼ Show 20 Lines
	}			}

	void SIFrameLowering::processFunctionBeforeFrameFinalized(			void SIFrameLowering::processFunctionBeforeFrameFinalized(
	MachineFunction &MF,			MachineFunction &MF,
	RegScavenger *RS) const {			RegScavenger *RS) const {
	MachineFrameInfo &MFI = MF.getFrameInfo();			MachineFrameInfo &MFI = MF.getFrameInfo();

	const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();			const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIInstrInfo *TII = ST.getInstrInfo();
	const SIRegisterInfo *TRI = ST.getRegisterInfo();			const SIRegisterInfo *TRI = ST.getRegisterInfo();
				MachineRegisterInfo &MRI = MF.getRegInfo();
	SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();			SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();

				const bool SpillVGPRToAGPR = ST.hasMAIInsts() && FuncInfo->hasSpilledVGPRs()
				&& EnableSpillVGPRToAGPR;

				if (SpillVGPRToAGPR) {
				// To track the spill frame indices handled in this pass.
				BitVector SpillFIs(MFI.getObjectIndexEnd(), false);

				bool SeenDbgInstr = false;

				for (MachineBasicBlock &MBB : MF) {
				MachineBasicBlock::iterator Next;
				for (auto I = MBB.begin(), E = MBB.end(); I != E; I = Next) {
				MachineInstr &MI = *I;
				Next = std::next(I);
				rampitec: It can be saddr with flat scratch. It seems it needs to be fixed in a separate patch first.
				rampitec: Never mind, this is one of SI_SPILL opcodes, not real instruction yet.

				if (MI.isDebugInstr())
				SeenDbgInstr = true;

				if (TII->isVGPRSpill(MI)) {
				// Try to eliminate stack used by VGPR spills before frame
				// finalization.
				unsigned FIOp = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
				AMDGPU::OpName::vaddr);
				int FI = MI.getOperand(FIOp).getIndex();
				Register VReg =
				TII->getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();
				if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,
				TRI->isAGPR(MRI, VReg))) {
				// FIXME: change to enterBasicBlockEnd()
				RS->enterBasicBlock(MBB);
				TRI->eliminateFrameIndex(MI, 0, FIOp, RS);
				SpillFIs.set(FI);
				continue;
				}
				}
				}
				}

				for (MachineBasicBlock &MBB : MF) {
				for (MCPhysReg Reg : FuncInfo->getVGPRSpillAGPRs())
				MBB.addLiveIn(Reg);

				for (MCPhysReg Reg : FuncInfo->getAGPRSpillVGPRs())
				MBB.addLiveIn(Reg);

				MBB.sortUniqueLiveIns();

				if (!SpillFIs.empty() && SeenDbgInstr) {
				// FIXME: The dead frame indices are replaced with a null register from
				// the debug value instructions. We should instead, update it with the
				// correct register value. But not sure the register value alone is
				for (MachineInstr &MI : MBB) {
				if (MI.isDebugValue() && MI.getOperand(0).isFI() &&
				SpillFIs[MI.getOperand(0).getIndex()]) {
				MI.getOperand(0).ChangeToRegister(Register(), false /*isDef*/);
				MI.getOperand(0).setIsDebug();
				}
				}
				}
				}
				}

	FuncInfo->removeDeadFrameIndices(MFI);			FuncInfo->removeDeadFrameIndices(MFI);
	assert(allSGPRSpillsAreDead(MF) &&			assert(allSGPRSpillsAreDead(MF) &&
	"SGPR spill should have been removed in SILowerSGPRSpills");			"SGPR spill should have been removed in SILowerSGPRSpills");

	// FIXME: The other checks should be redundant with allStackObjectsAreDead,			// FIXME: The other checks should be redundant with allStackObjectsAreDead,
	// but currently hasNonSpillStackObjects is set only from source			// but currently hasNonSpillStackObjects is set only from source
	// allocas. Stack temps produced from legalization are not counted currently.			// allocas. Stack temps produced from legalization are not counted currently.
	if (!allStackObjectsAreDead(MFI)) {			if (!allStackObjectsAreDead(MFI)) {
	▲ Show 20 Lines • Show All 225 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp

	Show All 25 Lines
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "si-lower-sgpr-spills"			#define DEBUG_TYPE "si-lower-sgpr-spills"

	using MBBVector = SmallVector<MachineBasicBlock *, 4>;			using MBBVector = SmallVector<MachineBasicBlock *, 4>;

	namespace {			namespace {

	static cl::opt<bool> EnableSpillVGPRToAGPR(
	"amdgpu-spill-vgpr-to-agpr",
	cl::desc("Enable spilling VGPRs to AGPRs"),
	cl::ReallyHidden,
	cl::init(true));

	class SILowerSGPRSpills : public MachineFunctionPass {			class SILowerSGPRSpills : public MachineFunctionPass {
	private:			private:
	const SIRegisterInfo *TRI = nullptr;			const SIRegisterInfo *TRI = nullptr;
	const SIInstrInfo *TII = nullptr;			const SIInstrInfo *TII = nullptr;
	VirtRegMap *VRM = nullptr;			VirtRegMap *VRM = nullptr;
	LiveIntervals *LIS = nullptr;			LiveIntervals *LIS = nullptr;

	// Save and Restore blocks of the current function. Typically there is a			// Save and Restore blocks of the current function. Typically there is a
	Show All 18 Lines
	};			};

	} // end anonymous namespace			} // end anonymous namespace

	char SILowerSGPRSpills::ID = 0;			char SILowerSGPRSpills::ID = 0;

	INITIALIZE_PASS_BEGIN(SILowerSGPRSpills, DEBUG_TYPE,			INITIALIZE_PASS_BEGIN(SILowerSGPRSpills, DEBUG_TYPE,
	"SI lower SGPR spill instructions", false, false)			"SI lower SGPR spill instructions", false, false)
				INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
	INITIALIZE_PASS_DEPENDENCY(VirtRegMap)			INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
	INITIALIZE_PASS_END(SILowerSGPRSpills, DEBUG_TYPE,			INITIALIZE_PASS_END(SILowerSGPRSpills, DEBUG_TYPE,
	"SI lower SGPR spill instructions", false, false)			"SI lower SGPR spill instructions", false, false)

	char &llvm::SILowerSGPRSpillsID = SILowerSGPRSpills::ID;			char &llvm::SILowerSGPRSpillsID = SILowerSGPRSpills::ID;

	/// Insert restore code for the callee-saved registers used in the function.			/// Insert restore code for the callee-saved registers used in the function.
	static void insertCSRSaves(MachineBasicBlock &SaveBlock,			static void insertCSRSaves(MachineBasicBlock &SaveBlock,
	▲ Show 20 Lines • Show All 208 Lines • ▼ Show 20 Lines
	}			}

	bool SILowerSGPRSpills::runOnMachineFunction(MachineFunction &MF) {			bool SILowerSGPRSpills::runOnMachineFunction(MachineFunction &MF) {
	const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();			const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
	TII = ST.getInstrInfo();			TII = ST.getInstrInfo();
	TRI = &TII->getRegisterInfo();			TRI = &TII->getRegisterInfo();

	VRM = getAnalysisIfAvailable<VirtRegMap>();			VRM = getAnalysisIfAvailable<VirtRegMap>();
				LIS = getAnalysisIfAvailable<LiveIntervals>();

	assert(SaveBlocks.empty() && RestoreBlocks.empty());			assert(SaveBlocks.empty() && RestoreBlocks.empty());

	// First, expose any CSR SGPR spills. This is mostly the same as what PEI			// First, expose any CSR SGPR spills. This is mostly the same as what PEI
	// does, but somewhat simpler.			// does, but somewhat simpler.
	calculateSaveRestoreBlocks(MF);			calculateSaveRestoreBlocks(MF);
	bool HasCSRs = spillCalleeSavedRegs(MF);			bool HasCSRs = spillCalleeSavedRegs(MF);

	MachineFrameInfo &MFI = MF.getFrameInfo();			MachineFrameInfo &MFI = MF.getFrameInfo();
	MachineRegisterInfo &MRI = MF.getRegInfo();			MachineRegisterInfo &MRI = MF.getRegInfo();
	SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();			SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();

	if (!MFI.hasStackObjects() && !HasCSRs) {			if (!MFI.hasStackObjects() && !HasCSRs) {
	SaveBlocks.clear();			SaveBlocks.clear();
	RestoreBlocks.clear();			RestoreBlocks.clear();
	if (FuncInfo->VGPRReservedForSGPRSpill) {			if (FuncInfo->VGPRReservedForSGPRSpill) {
	// Free the reserved VGPR for later possible use by frame lowering.			// Free the reserved VGPR for later possible use by frame lowering.
	FuncInfo->removeVGPRForSGPRSpill(FuncInfo->VGPRReservedForSGPRSpill, MF);			FuncInfo->removeVGPRForSGPRSpill(FuncInfo->VGPRReservedForSGPRSpill, MF);
	MRI.freezeReservedRegs(MF);			MRI.freezeReservedRegs(MF);
	}			}
	return false;			return false;
	}			}

	const bool SpillVGPRToAGPR = ST.hasMAIInsts() && FuncInfo->hasSpilledVGPRs()
	&& EnableSpillVGPRToAGPR;

	bool MadeChange = false;			bool MadeChange = false;

	const bool SpillToAGPR = EnableSpillVGPRToAGPR && ST.hasMAIInsts();
	std::unique_ptr<RegScavenger> RS;

	bool NewReservedRegs = false;			bool NewReservedRegs = false;

	// TODO: CSR VGPRs will never be spilled to AGPRs. These can probably be			// TODO: CSR VGPRs will never be spilled to AGPRs. These can probably be
	// handled as SpilledToReg in regular PrologEpilogInserter.			// handled as SpilledToReg in regular PrologEpilogInserter.
	const bool HasSGPRSpillToVGPR = TRI->spillSGPRToVGPR() &&			const bool HasSGPRSpillToVGPR = TRI->spillSGPRToVGPR() &&
	(HasCSRs || FuncInfo->hasSpilledSGPRs());			(HasCSRs || FuncInfo->hasSpilledSGPRs());
	if (HasSGPRSpillToVGPR || SpillVGPRToAGPR) {			if (HasSGPRSpillToVGPR) {
	// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs			// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs
	// are spilled to VGPRs, in which case we can eliminate the stack usage.			// are spilled to VGPRs, in which case we can eliminate the stack usage.
	//			//
	// This operates under the assumption that only other SGPR spills are users			// This operates under the assumption that only other SGPR spills are users
	// of the frame index.			// of the frame index.

	lowerShiftReservedVGPR(MF, ST);			lowerShiftReservedVGPR(MF, ST);

	// To track the spill frame indices handled in this pass.			// To track the spill frame indices handled in this pass.
	BitVector SpillFIs(MFI.getObjectIndexEnd(), false);			BitVector SpillFIs(MFI.getObjectIndexEnd(), false);

	for (MachineBasicBlock &MBB : MF) {			for (MachineBasicBlock &MBB : MF) {
	MachineBasicBlock::iterator Next;			MachineBasicBlock::iterator Next;
	for (auto I = MBB.begin(), E = MBB.end(); I != E; I = Next) {			for (auto I = MBB.begin(), E = MBB.end(); I != E; I = Next) {
	MachineInstr &MI = *I;			MachineInstr &MI = *I;
	Next = std::next(I);			Next = std::next(I);

	if (SpillToAGPR && TII->isVGPRSpill(MI)) {			if (!TII->isSGPRSpill(MI))
	// Try to eliminate stack used by VGPR spills before frame
	// finalization.
	unsigned FIOp = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
	AMDGPU::OpName::vaddr);
	int FI = MI.getOperand(FIOp).getIndex();
	Register VReg =
	TII->getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();
	if (FuncInfo->allocateVGPRSpillToAGPR(MF, FI,
	TRI->isAGPR(MRI, VReg))) {
	NewReservedRegs = true;
	if (!RS)
	RS.reset(new RegScavenger());

	// FIXME: change to enterBasicBlockEnd()
	RS->enterBasicBlock(MBB);
	TRI->eliminateFrameIndex(MI, 0, FIOp, RS.get());
	SpillFIs.set(FI);
	continue;
	}
	}

	if (!TII->isSGPRSpill(MI) || !TRI->spillSGPRToVGPR())
	continue;			continue;

	int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();			int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();
	assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);			assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
	if (FuncInfo->allocateSGPRSpillToVGPR(MF, FI)) {			if (FuncInfo->allocateSGPRSpillToVGPR(MF, FI)) {
	NewReservedRegs = true;			NewReservedRegs = true;
	bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(MI, FI, nullptr);			bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(MI, FI,
				nullptr, LIS);
	(void)Spilled;			(void)Spilled;
	assert(Spilled && "failed to spill SGPR to VGPR when allocated");			assert(Spilled && "failed to spill SGPR to VGPR when allocated");
	SpillFIs.set(FI);			SpillFIs.set(FI);
	}			}
	}			}
	}			}

				// FIXME: Adding to live-ins redundant with reserving registers.
	for (MachineBasicBlock &MBB : MF) {			for (MachineBasicBlock &MBB : MF) {
	for (auto SSpill : FuncInfo->getSGPRSpillVGPRs())			for (auto SSpill : FuncInfo->getSGPRSpillVGPRs())
	MBB.addLiveIn(SSpill.VGPR);			MBB.addLiveIn(SSpill.VGPR);

	for (MCPhysReg Reg : FuncInfo->getVGPRSpillAGPRs())
	MBB.addLiveIn(Reg);

	for (MCPhysReg Reg : FuncInfo->getAGPRSpillVGPRs())
	MBB.addLiveIn(Reg);

	MBB.sortUniqueLiveIns();			MBB.sortUniqueLiveIns();

	// FIXME: The dead frame indices are replaced with a null register from			// FIXME: The dead frame indices are replaced with a null register from
	// the debug value instructions. We should instead, update it with the			// the debug value instructions. We should instead, update it with the
	// correct register value. But not sure the register value alone is			// correct register value. But not sure the register value alone is
	// adequate to lower the DIExpression. It should be worked out later.			// adequate to lower the DIExpression. It should be worked out later.
	for (MachineInstr &MI : MBB) {			for (MachineInstr &MI : MBB) {
	if (MI.isDebugValue() && MI.getOperand(0).isFI() &&			if (MI.isDebugValue() && MI.getOperand(0).isFI() &&
	Show All 21 Lines
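
Note on the loop in runOnMachineFunction above: it stores Next = std::next(I) before touching MI, because lowering a spill erases the pseudo instruction that I points at. The sketch below shows the same idiom on a plain std::list. The names lowerSpills, "spill" and "writelane" are hypothetical stand-ins for illustration, not LLVM API.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy stand-in for a basic block: a linked list of instruction names.
// Lowering a "spill" inserts a replacement in front of it and then erases
// it, so the iterator to the current element becomes invalid.
inline int lowerSpills(std::list<std::string> &Block) {
  int NumLowered = 0;
  std::list<std::string>::iterator Next;
  for (auto I = Block.begin(), E = Block.end(); I != E; I = Next) {
    Next = std::next(I); // cache the successor before I can be invalidated
    if (*I != "spill")
      continue;
    Block.insert(I, "writelane"); // build the replacement before the pseudo
    Block.erase(I);               // I is dead now, Next is still valid
    ++NumLowered;
  }
  return NumLowered;
}
```

Advancing with ++I after the erase would be undefined behaviour for the list, which is presumably why the pass uses the cached iterator as well.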

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

//===- SIMachineFunctionInfo.cpp - SI Machine Function Info ---------------===//		//===- SIMachineFunctionInfo.cpp - SI Machine Function Info ---------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
		#include "AMDGPUSubtarget.h"
		#include "SIRegisterInfo.h"
		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
		#include "Utils/AMDGPUBaseInfo.h"
		#include "llvm/ADT/Optional.h"
		#include "llvm/CodeGen/LiveIntervals.h"
		#include "llvm/CodeGen/MachineBasicBlock.h"
		#include "llvm/CodeGen/MachineFrameInfo.h"
		#include "llvm/CodeGen/MachineFunction.h"
		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MIRParser/MIParser.h"		#include "llvm/CodeGen/MIRParser/MIParser.h"
		#include "llvm/IR/CallingConv.h"
		#include "llvm/IR/DiagnosticInfo.h"
		#include "llvm/IR/Function.h"
		#include <cassert>
		#include <vector>

#define MAX_LANES 64		#define MAX_LANES 64

using namespace llvm;		using namespace llvm;

SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)		SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)
: AMDGPUMachineFunction(MF),		: AMDGPUMachineFunction(MF),
PrivateSegmentBuffer(false),		PrivateSegmentBuffer(false),
▲ Show 20 Lines • Show All 287 Lines • ▼ Show 20 Lines	if (FuncInfo->VGPRReservedForSGPRSpill && NumVGPRSpillLanes < WaveSize) {
LaneVGPR = FuncInfo->VGPRReservedForSGPRSpill;		LaneVGPR = FuncInfo->VGPRReservedForSGPRSpill;
} else if (VGPRIndex == 0) {		} else if (VGPRIndex == 0) {
LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);		LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
if (LaneVGPR == AMDGPU::NoRegister) {		if (LaneVGPR == AMDGPU::NoRegister) {
// We have no VGPRs left for spilling SGPRs. Reset because we will not		// We have no VGPRs left for spilling SGPRs. Reset because we will not
// partially spill the SGPR to VGPRs.		// partially spill the SGPR to VGPRs.
SGPRToVGPRSpills.erase(FI);		SGPRToVGPRSpills.erase(FI);
NumVGPRSpillLanes -= I;		NumVGPRSpillLanes -= I;

		#if 0
		DiagnosticInfoResourceLimit DiagOutOfRegs(MF.getFunction(),
		"VGPRs for SGPR spilling",
		0, DS_Error);
		MF.getFunction().getContext().diagnose(DiagOutOfRegs);
		#endif
return false;		return false;
}		}

Optional<int> SpillFI;		Optional<int> SpillFI;
// We need to preserve inactive lanes, so always save, even caller-save		// We need to preserve inactive lanes, so always save, even caller-save
// registers.		// registers.
if (!isEntryFunction()) {		if (!isEntryFunction()) {
SpillFI = FrameInfo.CreateSpillStackObject(4, Align(4));		SpillFI = FrameInfo.CreateSpillStackObject(4, Align(4));
▲ Show 20 Lines • Show All 305 Lines • Show Last 20 Lines
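
Note on the hunk above: each 32-bit piece of a spilled SGPR takes one lane of a VGPR, and when no VGPR is left the partial assignment is undone (SGPRToVGPRSpills.erase(FI), NumVGPRSpillLanes -= I) so a slot is never half-spilled. A minimal sketch of that all-or-nothing bookkeeping follows. LaneAllocator and its fields are hypothetical, and handing back VGPRs taken during a failed call is an assumption of this toy model, not something the diff shows.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model, not the SIMachineFunctionInfo data structures.
struct LaneAllocator {
  unsigned WaveSize;          // lanes per VGPR (32 or 64 on real hardware)
  unsigned FreeVGPRs;         // VGPRs still available to hold spills
  unsigned NumSpillLanes = 0; // lanes handed out so far
  unsigned NextVGPR = 0;      // id given to the next VGPR that is taken
  // Frame index -> (VGPR id, lane) for every 32-bit piece.
  std::map<int, std::vector<std::pair<unsigned, unsigned>>> Spills;

  LaneAllocator(unsigned Wave, unsigned Free)
      : WaveSize(Wave), FreeVGPRs(Free) {}

  // Either every piece gets a lane, or the state is left as it was.
  bool allocate(int FI, unsigned NumPieces) {
    if (Spills.count(FI))
      return true; // this slot was already assigned lanes
    auto &Lanes = Spills[FI];
    unsigned TakenVGPRs = 0;
    for (unsigned I = 0; I < NumPieces; ++I, ++NumSpillLanes) {
      unsigned Lane = NumSpillLanes % WaveSize;
      if (Lane == 0) { // the current VGPR is full, a fresh one is needed
        if (FreeVGPRs == 0) {
          // Roll back: no partial spill of this slot.
          Spills.erase(FI);
          NumSpillLanes -= I;
          FreeVGPRs += TakenVGPRs; // assumption of this sketch
          NextVGPR -= TakenVGPRs;
          return false;
        }
        --FreeVGPRs;
        ++NextVGPR;
        ++TakenVGPRs;
      }
      Lanes.push_back({NextVGPR - 1, Lane});
    }
    return true;
  }
};
```

With WaveSize 4 and a single free VGPR, a 3-piece spill succeeds, a following 2-piece spill fails and leaves the lane counter unchanged, and a later 1-piece spill can still use the remaining lane.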

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

Show First 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	const TargetRegisterClass *getPointerRegClass(
const MachineFunction &MF, unsigned Kind = 0) const override;		const MachineFunction &MF, unsigned Kind = 0) const override;

void buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index, int Offset,		void buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index, int Offset,
bool IsLoad, bool IsKill = true) const;		bool IsLoad, bool IsKill = true) const;

void buildSGPRSpillLoadStore(SGPRSpillBuilder &SB, int Offset,		void buildSGPRSpillLoadStore(SGPRSpillBuilder &SB, int Offset,
int64_t VGPRLanes) const;		int64_t VGPRLanes) const;

/// If \p OnlyToVGPR is true, this will only succeed if this		/// If \p OnlyToVGPR is true, this will only succeed if this manages to find a
		/// free VGPR lane to spill.
bool spillSGPR(MachineBasicBlock::iterator MI,		bool spillSGPR(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS,		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr,
bool OnlyToVGPR = false) const;		bool OnlyToVGPR = false) const;

bool restoreSGPR(MachineBasicBlock::iterator MI,		bool restoreSGPR(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS,		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr,
bool OnlyToVGPR = false) const;		bool OnlyToVGPR = false) const;

void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,		void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
unsigned FIOperandNum,		unsigned FIOperandNum,
RegScavenger *RS) const override;		RegScavenger *RS) const override;

bool eliminateSGPRToVGPRSpillFrameIndex(MachineBasicBlock::iterator MI,		bool eliminateSGPRToVGPRSpillFrameIndex(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS) const;		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr) const;

StringRef getRegAsmName(MCRegister Reg) const override;		StringRef getRegAsmName(MCRegister Reg) const override;

// Pseudo regs are not allowed		// Pseudo regs are not allowed
unsigned getHWRegIndex(MCRegister Reg) const {		unsigned getHWRegIndex(MCRegister Reg) const {
return getEncodingValue(Reg) & 0xff;		return getEncodingValue(Reg) & 0xff;
}		}

▲ Show 20 Lines • Show All 223 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Show First 20 Lines • Show All 566 Lines • ▼ Show 20 Lines	if (hasBasePointer(MF)) {
reserveRegisterTuples(Reserved, BasePtrReg);		reserveRegisterTuples(Reserved, BasePtrReg);
assert(!isSubRegister(ScratchRSrcReg, BasePtrReg));		assert(!isSubRegister(ScratchRSrcReg, BasePtrReg));
}		}

for (auto Reg : MFI->WWMReservedRegs) {		for (auto Reg : MFI->WWMReservedRegs) {
reserveRegisterTuples(Reserved, Reg.first);		reserveRegisterTuples(Reserved, Reg.first);
}		}

		// Reserve VGPRs used for SGPR spilling.
		// Note we treat freezeReservedRegs unusually because we run register
		// allocation in two phases. It's OK to re-freeze with new registers for the
		// second run.
		#if 0
		for (auto &SpilledFI : MFI->sgpr_spill_vgprs()) {
		for (auto &SpilledVGPR : SpilledFI.second)
		reserveRegisterTuples(Reserved, SpilledVGPR.VGPR);
		}
		#endif

// FIXME: Stop using reserved registers for this.		// FIXME: Stop using reserved registers for this.
for (MCPhysReg Reg : MFI->getAGPRSpillVGPRs())		for (MCPhysReg Reg : MFI->getAGPRSpillVGPRs())
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);

for (MCPhysReg Reg : MFI->getVGPRSpillAGPRs())		for (MCPhysReg Reg : MFI->getVGPRSpillAGPRs())
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);

for (auto SSpill : MFI->getSGPRSpillVGPRs())		for (auto SSpill : MFI->getSGPRSpillVGPRs())
▲ Show 20 Lines • Show All 716 Lines • ▼ Show 20 Lines	if (IsLoad) {
// This only ever adds one VGPR spill		// This only ever adds one VGPR spill
SB.MFI.addToSpilledVGPRs(1);		SB.MFI.addToSpilledVGPRs(1);
}		}
}		}

bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,		bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,
int Index,		int Index,
RegScavenger *RS,		RegScavenger *RS,
		LiveIntervals *LIS,
bool OnlyToVGPR) const {		bool OnlyToVGPR) const {
SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);		SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);

ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills =		ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills =
SB.MFI.getSGPRToVGPRSpills(Index);		SB.MFI.getSGPRToVGPRSpills(Index);
bool SpillToVGPR = !VGPRSpills.empty();		bool SpillToVGPR = !VGPRSpills.empty();
if (OnlyToVGPR && !SpillToVGPR)		if (OnlyToVGPR && !SpillToVGPR)
return false;		return false;
Show All 13 Lines	for (unsigned i = 0, e = SB.NumSubRegs; i < e; ++i) {

// Mark the "old value of vgpr" input undef only if this is the first sgpr		// Mark the "old value of vgpr" input undef only if this is the first sgpr
// spill to this specific vgpr in the first basic block.		// spill to this specific vgpr in the first basic block.
auto MIB = BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_WRITELANE_B32),		auto MIB = BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_WRITELANE_B32),
Spill.VGPR)		Spill.VGPR)
.addReg(SubReg, getKillRegState(UseKill))		.addReg(SubReg, getKillRegState(UseKill))
.addImm(Spill.Lane)		.addImm(Spill.Lane)
.addReg(Spill.VGPR);		.addReg(Spill.VGPR);
		if (LIS) {
		if (i == 0)
		LIS->ReplaceMachineInstrInMaps(*MI, *MIB);
		else
		LIS->InsertMachineInstrInMaps(*MIB);
		}

if (i == 0 && SB.NumSubRegs > 1) {		if (i == 0 && SB.NumSubRegs > 1) {
// We may be spilling a super-register which is only partially defined,		// We may be spilling a super-register which is only partially defined,
// and need to ensure later spills think the value is defined.		// and need to ensure later spills think the value is defined.
MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);		MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);
}		}

if (SB.NumSubRegs > 1)		if (SB.NumSubRegs > 1)
Show All 27 Lines	for (unsigned Offset = 0; Offset < PVD.NumVGPRs; ++Offset) {
MachineInstrBuilder WriteLane =		MachineInstrBuilder WriteLane =
BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_WRITELANE_B32),		BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_WRITELANE_B32),
SB.TmpVGPR)		SB.TmpVGPR)
.addReg(SubReg, SubKillState)		.addReg(SubReg, SubKillState)
.addImm(i % PVD.PerVGPR)		.addImm(i % PVD.PerVGPR)
.addReg(SB.TmpVGPR, TmpVGPRFlags);		.addReg(SB.TmpVGPR, TmpVGPRFlags);
TmpVGPRFlags = 0;		TmpVGPRFlags = 0;

		if (LIS) {
		if (i == 0)
		LIS->ReplaceMachineInstrInMaps(*MI, *WriteLane);
		else
		LIS->InsertMachineInstrInMaps(*WriteLane);
		}

// There could be undef components of a spilled super register.		// There could be undef components of a spilled super register.
// TODO: Can we detect this and skip the spill?		// TODO: Can we detect this and skip the spill?
if (SB.NumSubRegs > 1) {		if (SB.NumSubRegs > 1) {
// The last implicit use of the SB.SuperReg carries the "Kill" flag.		// The last implicit use of the SB.SuperReg carries the "Kill" flag.
unsigned SuperKillState = 0;		unsigned SuperKillState = 0;
if (i + 1 == SB.NumSubRegs)		if (i + 1 == SB.NumSubRegs)
SuperKillState |= getKillRegState(SB.IsKill);		SuperKillState |= getKillRegState(SB.IsKill);
WriteLane.addReg(SB.SuperReg, RegState::Implicit | SuperKillState);		WriteLane.addReg(SB.SuperReg, RegState::Implicit | SuperKillState);
}		}
}		}

// Write out VGPR		// Write out VGPR
SB.readWriteTmpVGPR(Offset, /*IsLoad*/ false);		SB.readWriteTmpVGPR(Offset, /*IsLoad*/ false);
}		}

SB.restore();		SB.restore();
}		}

MI->eraseFromParent();		MI->eraseFromParent();
SB.MFI.addToSpilledSGPRs(SB.NumSubRegs);		SB.MFI.addToSpilledSGPRs(SB.NumSubRegs);

		if (LIS)
		LIS->removeAllRegUnitsForPhysReg(SB.SuperReg);

return true;		return true;
}		}

bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI,		bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI,
int Index,		int Index,
RegScavenger *RS,		RegScavenger *RS,
		LiveIntervals *LIS,
bool OnlyToVGPR) const {		bool OnlyToVGPR) const {
SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);		SGPRSpillBuilder SB(this, ST.getInstrInfo(), isWave32, MI, Index, RS);

ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills =		ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills =
SB.MFI.getSGPRToVGPRSpills(Index);		SB.MFI.getSGPRToVGPRSpills(Index);
bool SpillToVGPR = !VGPRSpills.empty();		bool SpillToVGPR = !VGPRSpills.empty();
if (OnlyToVGPR && !SpillToVGPR)		if (OnlyToVGPR && !SpillToVGPR)
return false;		return false;

if (SpillToVGPR) {		if (SpillToVGPR) {
for (unsigned i = 0, e = SB.NumSubRegs; i < e; ++i) {		for (unsigned i = 0, e = SB.NumSubRegs; i < e; ++i) {
Register SubReg =		Register SubReg =
SB.NumSubRegs == 1		SB.NumSubRegs == 1
? SB.SuperReg		? SB.SuperReg
: Register(getSubReg(SB.SuperReg, SB.SplitParts[i]));		: Register(getSubReg(SB.SuperReg, SB.SplitParts[i]));

SIMachineFunctionInfo::SpilledReg Spill = VGPRSpills[i];		SIMachineFunctionInfo::SpilledReg Spill = VGPRSpills[i];
auto MIB =		auto MIB =
BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_READLANE_B32), SubReg)		BuildMI(SB.MBB, MI, SB.DL, SB.TII.get(AMDGPU::V_READLANE_B32), SubReg)
.addReg(Spill.VGPR)		.addReg(Spill.VGPR)
.addImm(Spill.Lane);		.addImm(Spill.Lane);
if (SB.NumSubRegs > 1 && i == 0)		if (SB.NumSubRegs > 1 && i == 0)
MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);		MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);
		if (LIS) {
		if (i == e - 1)
		LIS->ReplaceMachineInstrInMaps(*MI, *MIB);
		else
		LIS->InsertMachineInstrInMaps(*MIB);
		}

}		}
} else {		} else {
SB.prepare();		SB.prepare();

// Per VGPR helper data		// Per VGPR helper data
auto PVD = SB.getPerVGPRData();		auto PVD = SB.getPerVGPRData();

for (unsigned Offset = 0; Offset < PVD.NumVGPRs; ++Offset) {		for (unsigned Offset = 0; Offset < PVD.NumVGPRs; ++Offset) {
Show All 11 Lines	for (unsigned Offset = 0; Offset < PVD.NumVGPRs; ++Offset) {

bool LastSubReg = (i + 1 == e);		bool LastSubReg = (i + 1 == e);
auto MIB = BuildMI(SB.MBB, MI, SB.DL,		auto MIB = BuildMI(SB.MBB, MI, SB.DL,
SB.TII.get(AMDGPU::V_READLANE_B32), SubReg)		SB.TII.get(AMDGPU::V_READLANE_B32), SubReg)
.addReg(SB.TmpVGPR, getKillRegState(LastSubReg))		.addReg(SB.TmpVGPR, getKillRegState(LastSubReg))
.addImm(i);		.addImm(i);
if (SB.NumSubRegs > 1 && i == 0)		if (SB.NumSubRegs > 1 && i == 0)
MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);		MIB.addReg(SB.SuperReg, RegState::ImplicitDefine);
		if (LIS) {
		if (i == e - 1)
		LIS->ReplaceMachineInstrInMaps(*MI, *MIB);
		else
		LIS->InsertMachineInstrInMaps(*MIB);
		}
}		}
}		}

SB.restore();		SB.restore();
}		}

MI->eraseFromParent();		MI->eraseFromParent();

		if (LIS)
		LIS->removeAllRegUnitsForPhysReg(SB.SuperReg);

return true;		return true;
}		}

/// Special case of eliminateFrameIndex. Returns true if the SGPR was spilled to		/// Special case of eliminateFrameIndex. Returns true if the SGPR was spilled to
/// a VGPR and the stack slot can be safely eliminated when all other users are		/// a VGPR and the stack slot can be safely eliminated when all other users are
/// handled.		/// handled.
bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex(		bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex(
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
int FI,		int FI,
RegScavenger *RS) const {		RegScavenger *RS,
		LiveIntervals *LIS) const {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_S1024_SAVE:		case AMDGPU::SI_SPILL_S1024_SAVE:
case AMDGPU::SI_SPILL_S512_SAVE:		case AMDGPU::SI_SPILL_S512_SAVE:
case AMDGPU::SI_SPILL_S256_SAVE:		case AMDGPU::SI_SPILL_S256_SAVE:
case AMDGPU::SI_SPILL_S192_SAVE:		case AMDGPU::SI_SPILL_S192_SAVE:
case AMDGPU::SI_SPILL_S160_SAVE:		case AMDGPU::SI_SPILL_S160_SAVE:
case AMDGPU::SI_SPILL_S128_SAVE:		case AMDGPU::SI_SPILL_S128_SAVE:
case AMDGPU::SI_SPILL_S96_SAVE:		case AMDGPU::SI_SPILL_S96_SAVE:
case AMDGPU::SI_SPILL_S64_SAVE:		case AMDGPU::SI_SPILL_S64_SAVE:
case AMDGPU::SI_SPILL_S32_SAVE:		case AMDGPU::SI_SPILL_S32_SAVE:
return spillSGPR(MI, FI, RS, true);		return spillSGPR(MI, FI, RS, LIS, true);
case AMDGPU::SI_SPILL_S1024_RESTORE:		case AMDGPU::SI_SPILL_S1024_RESTORE:
case AMDGPU::SI_SPILL_S512_RESTORE:		case AMDGPU::SI_SPILL_S512_RESTORE:
case AMDGPU::SI_SPILL_S256_RESTORE:		case AMDGPU::SI_SPILL_S256_RESTORE:
case AMDGPU::SI_SPILL_S192_RESTORE:		case AMDGPU::SI_SPILL_S192_RESTORE:
case AMDGPU::SI_SPILL_S160_RESTORE:		case AMDGPU::SI_SPILL_S160_RESTORE:
case AMDGPU::SI_SPILL_S128_RESTORE:		case AMDGPU::SI_SPILL_S128_RESTORE:
case AMDGPU::SI_SPILL_S96_RESTORE:		case AMDGPU::SI_SPILL_S96_RESTORE:
case AMDGPU::SI_SPILL_S64_RESTORE:		case AMDGPU::SI_SPILL_S64_RESTORE:
case AMDGPU::SI_SPILL_S32_RESTORE:		case AMDGPU::SI_SPILL_S32_RESTORE:
return restoreSGPR(MI, FI, RS, true);		return restoreSGPR(MI, FI, RS, LIS, true);
default:		default:
llvm_unreachable("not an SGPR spill instruction");		llvm_unreachable("not an SGPR spill instruction");
}		}
}		}

void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,		void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
int SPAdj, unsigned FIOperandNum,		int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {		RegScavenger *RS) const {
▲ Show 20 Lines • Show All 993 Lines • Show Last 20 Lines
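
Note on the LIS updates in spillSGPR and restoreSGPR above: one of the new lane instructions takes over the slot of the spill pseudo via ReplaceMachineInstrInMaps, and the remaining ones are registered with InsertMachineInstrInMaps, so the pseudo leaves no stale entry behind once it is erased. The sketch below is a toy model of that bookkeeping for the save path (piece 0 replaces). ToySlotMap and lowerPseudo are hypothetical names, and the real LiveIntervals assigns indexes by instruction position rather than from a counter.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the instruction-to-slot map kept by the analysis.
struct ToySlotMap {
  std::map<std::string, int> Slot; // instruction name -> slot number
  int NextSlot = 0;

  // Register a new instruction under a slot of its own.
  int insert(const std::string &New) {
    Slot[New] = NextSlot;
    return NextSlot++;
  }

  // Hand Old's slot to New and forget Old.
  int replace(const std::string &Old, const std::string &New) {
    int S = Slot.at(Old);
    Slot.erase(Old);
    Slot[New] = S;
    return S;
  }
};

// Lower one pseudo into NumPieces lane writes: piece 0 inherits the
// pseudo's slot, every later piece is inserted as a new entry.
inline void lowerPseudo(ToySlotMap &M, const std::string &Pseudo,
                        unsigned NumPieces) {
  for (unsigned I = 0; I < NumPieces; ++I) {
    std::string Name = Pseudo + ".lane" + std::to_string(I);
    if (I == 0)
      M.replace(Pseudo, Name);
    else
      M.insert(Name);
  }
}
```

The restore path in the diff does the replacement on the last piece (i == e - 1) instead of the first; the toy model would only need the condition changed to cover that case.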

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -mattr=-xnack -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -mattr=-xnack -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s

	; Check lowering of some large extractelement that use the stack			; Check lowering of some large extractelement that use the stack
	; instead of register indexing.			; instead of register indexing.

	define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {			define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {
	; GCN-LABEL: v_extract_v64i32_varidx:			; GCN-LABEL: v_extract_v64i32_varidx:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s6, s33			; GCN-NEXT: s_mov_b32 s6, s33
	; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0			; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0
	; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000			; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc
	; GCN-NEXT: s_movk_i32 s4, 0x80			; GCN-NEXT: s_movk_i32 s4, 0x80
	; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48
	; GCN-NEXT: s_mov_b32 s5, 0			; GCN-NEXT: s_mov_b32 s5, 0
	; GCN-NEXT: v_mov_b32_e32 v3, s4			; GCN-NEXT: v_mov_b32_e32 v3, s4
	; GCN-NEXT: v_mov_b32_e32 v4, s5			; GCN-NEXT: v_mov_b32_e32 v4, s5
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc
	; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off			; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off
	; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16			; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:48
	; GCN-NEXT: s_movk_i32 s4, 0xc0			; GCN-NEXT: s_movk_i32 s4, 0xc0
	; GCN-NEXT: v_mov_b32_e32 v6, s5			; GCN-NEXT: v_mov_b32_e32 v6, s5
	; GCN-NEXT: v_mov_b32_e32 v5, s4			; GCN-NEXT: v_mov_b32_e32 v5, s4
	; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5			; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
	; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc			; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
	; GCN-NEXT: v_and_b32_e32 v0, 63, v2			; GCN-NEXT: v_and_b32_e32 v0, 63, v2
	; GCN-NEXT: v_lshrrev_b32_e64 v1, 6, s33			; GCN-NEXT: v_lshrrev_b32_e64 v1, 6, s33
	; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GCN-NEXT: v_add_u32_e32 v1, 0x100, v1			; GCN-NEXT: v_add_u32_e32 v1, 0x100, v1
	; GCN-NEXT: v_add_u32_e32 v0, v1, v0			; GCN-NEXT: v_add_u32_e32 v0, v1, v0
	; GCN-NEXT: s_add_u32 s32, s32, 0x10000			; GCN-NEXT: s_add_u32 s32, s32, 0x10000
	; GCN-NEXT: s_sub_u32 s32, s32, 0x10000			; GCN-NEXT: s_sub_u32 s32, s32, 0x10000
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v36, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v37, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v38, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:664 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:668 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:672 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:676 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:680 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:684 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:688 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:692 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:696 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:700 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16			; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:32			; GCN-NEXT: global_load_dwordx4 v[48:51], v[60:61], off offset:32
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:48
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill
				; GCN-NEXT: global_load_dwordx4 v[60:63], v[60:61], off offset:48
	; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256			; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256
	; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260			; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260
	; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264			; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264
	; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268			; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268
	; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272			; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272
	; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276			; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276
	; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280			; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280
	; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284			; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284
	(24 unchanged lines omitted)
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:416			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:416
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:420			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:420
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:424			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:424
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:428			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:428
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:640 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:644 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:648 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:652 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:656 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:660 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:664 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:668 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:672 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:676 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:680 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:684 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:688 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:692 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:696 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:700 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v20			; GCN-NEXT: v_mov_b32_e32 v12, v20
	; GCN-NEXT: v_mov_b32_e32 v13, v21			; GCN-NEXT: v_mov_b32_e32 v13, v21
	; GCN-NEXT: v_mov_b32_e32 v14, v22			; GCN-NEXT: v_mov_b32_e32 v14, v22
	; GCN-NEXT: v_mov_b32_e32 v15, v23			; GCN-NEXT: v_mov_b32_e32 v15, v23
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432			; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436			; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440			; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444			; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v15			; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v13, v16			; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v14, v17			; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v15, v18			; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:496			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:500			; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:504			; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:508			; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
				; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:496
				; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:500
				; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:504
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 offset:508
	; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen			; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b32 s33, s6			; GCN-NEXT: s_mov_b32 s33, s6
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%vec = load <64 x i32>, <64 x i32> addrspace(1)* %ptr			%vec = load <64 x i32>, <64 x i32> addrspace(1)* %ptr
	%elt = extractelement <64 x i32> %vec, i32 %idx			%elt = extractelement <64 x i32> %vec, i32 %idx
	ret i32 %elt			ret i32 %elt
	}			}

	define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {			define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {
	; GCN-LABEL: v_extract_v128i16_varidx:			; GCN-LABEL: v_extract_v128i16_varidx:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s6, s33			; GCN-NEXT: s_mov_b32 s6, s33
	; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0			; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0
	; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000			; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc
	; GCN-NEXT: s_movk_i32 s4, 0x80			; GCN-NEXT: s_movk_i32 s4, 0x80
	; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48
	; GCN-NEXT: s_mov_b32 s5, 0			; GCN-NEXT: s_mov_b32 s5, 0
	; GCN-NEXT: v_mov_b32_e32 v3, s4			; GCN-NEXT: v_mov_b32_e32 v3, s4
	; GCN-NEXT: v_mov_b32_e32 v4, s5			; GCN-NEXT: v_mov_b32_e32 v4, s5
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc
	; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off			; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off
	; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16			; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:48
	; GCN-NEXT: s_movk_i32 s4, 0xc0			; GCN-NEXT: s_movk_i32 s4, 0xc0
	; GCN-NEXT: v_mov_b32_e32 v6, s5			; GCN-NEXT: v_mov_b32_e32 v6, s5
	; GCN-NEXT: v_mov_b32_e32 v5, s4			; GCN-NEXT: v_mov_b32_e32 v5, s4
	; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5			; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
	; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc			; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
	; GCN-NEXT: v_lshrrev_b32_e32 v0, 1, v2			; GCN-NEXT: v_lshrrev_b32_e32 v0, 1, v2
	; GCN-NEXT: v_and_b32_e32 v0, 63, v0			; GCN-NEXT: v_and_b32_e32 v0, 63, v0
	; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GCN-NEXT: v_and_b32_e32 v1, 1, v2			; GCN-NEXT: v_and_b32_e32 v1, 1, v2
	; GCN-NEXT: v_lshlrev_b32_e32 v1, 4, v1			; GCN-NEXT: v_lshlrev_b32_e32 v1, 4, v1
	; GCN-NEXT: s_add_u32 s32, s32, 0x10000			; GCN-NEXT: s_add_u32 s32, s32, 0x10000
	; GCN-NEXT: s_sub_u32 s32, s32, 0x10000			; GCN-NEXT: s_sub_u32 s32, s32, 0x10000
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v36, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v37, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v38, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:664 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:668 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:672 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:676 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:680 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:684 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:688 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:692 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:696 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:700 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16			; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:32			; GCN-NEXT: global_load_dwordx4 v[48:51], v[60:61], off offset:32
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:48
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill
				; GCN-NEXT: global_load_dwordx4 v[60:63], v[60:61], off offset:48
	; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256			; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256
	; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260			; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260
	; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264			; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264
	; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268			; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268
	; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272			; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272
	; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276			; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276
	; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280			; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280
	; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284			; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284
	[24 unchanged lines collapsed in the diff view]
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:416			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:416
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:420			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:420
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:424			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:424
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:428			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:428
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:640 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:644 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:648 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:652 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:656 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:660 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:664 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:668 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:672 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:676 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:680 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:684 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:688 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:692 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:696 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:700 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v20			; GCN-NEXT: v_mov_b32_e32 v12, v20
	; GCN-NEXT: v_mov_b32_e32 v13, v21			; GCN-NEXT: v_mov_b32_e32 v13, v21
	; GCN-NEXT: v_mov_b32_e32 v14, v22			; GCN-NEXT: v_mov_b32_e32 v14, v22
	; GCN-NEXT: v_mov_b32_e32 v15, v23			; GCN-NEXT: v_mov_b32_e32 v15, v23
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432			; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436			; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440			; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444			; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v15			; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v13, v16			; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v14, v17			; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v15, v18			; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:496			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:500			; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:504			; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:508			; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
	; GCN-NEXT: v_lshrrev_b32_e64 v15, 6, s33			; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:496
	; GCN-NEXT: v_add_u32_e32 v15, 0x100, v15			; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:500
	; GCN-NEXT: v_add_u32_e32 v0, v15, v0			; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:504
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 offset:508
				; GCN-NEXT: v_lshrrev_b32_e64 v11, 6, s33
				; GCN-NEXT: v_add_u32_e32 v11, 0x100, v11
				; GCN-NEXT: v_add_u32_e32 v0, v11, v0
	; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen			; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b32 s33, s6			; GCN-NEXT: s_mov_b32 s33, s6
	; GCN-NEXT: s_waitcnt vmcnt(14)			; GCN-NEXT: s_waitcnt vmcnt(16)
	; GCN-NEXT: v_lshrrev_b32_e32 v0, v1, v0			; GCN-NEXT: v_lshrrev_b32_e32 v0, v1, v0
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%vec = load <128 x i16>, <128 x i16> addrspace(1)* %ptr			%vec = load <128 x i16>, <128 x i16> addrspace(1)* %ptr
	%elt = extractelement <128 x i16> %vec, i32 %idx			%elt = extractelement <128 x i16> %vec, i32 %idx
	ret i16 %elt			ret i16 %elt
	}			}

	define i64 @v_extract_v32i64_varidx(<32 x i64> addrspace(1)* %ptr, i32 %idx) {			define i64 @v_extract_v32i64_varidx(<32 x i64> addrspace(1)* %ptr, i32 %idx) {
	; GCN-LABEL: v_extract_v32i64_varidx:			; GCN-LABEL: v_extract_v32i64_varidx:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s6, s33			; GCN-NEXT: s_mov_b32 s6, s33
	; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0			; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0
	; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000			; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, 0, v1, vcc
	; GCN-NEXT: s_movk_i32 s4, 0x80			; GCN-NEXT: s_movk_i32 s4, 0x80
	; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[8:11], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[12:15], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[56:59], v[3:4], off offset:48
	; GCN-NEXT: s_mov_b32 s5, 0			; GCN-NEXT: s_mov_b32 s5, 0
	; GCN-NEXT: v_mov_b32_e32 v3, s4			; GCN-NEXT: v_mov_b32_e32 v3, s4
	; GCN-NEXT: v_mov_b32_e32 v4, s5			; GCN-NEXT: v_mov_b32_e32 v4, s5
	; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3			; GCN-NEXT: v_add_co_u32_e32 v3, vcc, v0, v3
	; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc			; GCN-NEXT: v_addc_co_u32_e32 v4, vcc, v1, v4, vcc
	; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off			; GCN-NEXT: global_load_dwordx4 v[16:19], v[0:1], off
	; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16			; GCN-NEXT: global_load_dwordx4 v[20:23], v[0:1], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:48
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:64
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[40:43], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16			; GCN-NEXT: global_load_dwordx4 v[44:47], v[3:4], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:32			; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:32
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[3:4], off offset:48			; GCN-NEXT: global_load_dwordx4 v[48:51], v[3:4], off offset:48
	; GCN-NEXT: s_movk_i32 s4, 0xc0			; GCN-NEXT: s_movk_i32 s4, 0xc0
	; GCN-NEXT: v_mov_b32_e32 v6, s5			; GCN-NEXT: v_mov_b32_e32 v6, s5
	; GCN-NEXT: v_mov_b32_e32 v5, s4			; GCN-NEXT: v_mov_b32_e32 v5, s4
	; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5			; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
	; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc			; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
	; GCN-NEXT: v_and_b32_e32 v0, 31, v2			; GCN-NEXT: v_and_b32_e32 v0, 31, v2
	; GCN-NEXT: v_lshrrev_b32_e64 v2, 6, s33			; GCN-NEXT: v_lshrrev_b32_e64 v2, 6, s33
	; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; GCN-NEXT: v_add_u32_e32 v2, 0x100, v2			; GCN-NEXT: v_add_u32_e32 v2, 0x100, v2
	; GCN-NEXT: v_add_u32_e32 v1, v2, v0			; GCN-NEXT: v_add_u32_e32 v1, v2, v0
	; GCN-NEXT: s_add_u32 s32, s32, 0x10000			; GCN-NEXT: s_add_u32 s32, s32, 0x10000
	; GCN-NEXT: s_sub_u32 s32, s32, 0x10000			; GCN-NEXT: s_sub_u32 s32, s32, 0x10000
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v36, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v37, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v38, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:664 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:668 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:672 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:676 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:680 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:684 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:688 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:692 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:696 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:700 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16			; GCN-NEXT: global_load_dwordx4 v[4:7], v[60:61], off offset:16
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:32			; GCN-NEXT: global_load_dwordx4 v[48:51], v[60:61], off offset:32
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:576 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:580 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:584 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:588 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:592 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:596 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:600 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:604 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:608 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:612 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:616 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:620 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:624 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:628 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:632 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:636 ; 4-byte Folded Spill
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[60:61], off offset:48
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:536 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:540 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:544 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:548 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:552 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:556 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:560 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:564 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:568 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:572 ; 4-byte Folded Spill
				; GCN-NEXT: global_load_dwordx4 v[60:63], v[60:61], off offset:48
	; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256			; GCN-NEXT: buffer_store_dword v16, off, s[0:3], s33 offset:256
	; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260			; GCN-NEXT: buffer_store_dword v17, off, s[0:3], s33 offset:260
	; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264			; GCN-NEXT: buffer_store_dword v18, off, s[0:3], s33 offset:264
	; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268			; GCN-NEXT: buffer_store_dword v19, off, s[0:3], s33 offset:268
	; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272			; GCN-NEXT: buffer_store_dword v20, off, s[0:3], s33 offset:272
	; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276			; GCN-NEXT: buffer_store_dword v21, off, s[0:3], s33 offset:276
	; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280			; GCN-NEXT: buffer_store_dword v22, off, s[0:3], s33 offset:280
	; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284			; GCN-NEXT: buffer_store_dword v23, off, s[0:3], s33 offset:284
	; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368			; GCN-NEXT: buffer_store_dword v56, off, s[0:3], s33 offset:368
	; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372			; GCN-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:372
	; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376			; GCN-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:376
	; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380			; GCN-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:380
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:400
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:404
	; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408			; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s33 offset:408
	; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412			; GCN-NEXT: buffer_store_dword v47, off, s[0:3], s33 offset:412
	; GCN-NEXT: buffer_store_dword v48, off, s[0:3], s33 offset:416			; GCN-NEXT: buffer_store_dword v52, off, s[0:3], s33 offset:416
	; GCN-NEXT: buffer_store_dword v49, off, s[0:3], s33 offset:420			; GCN-NEXT: buffer_store_dword v53, off, s[0:3], s33 offset:420
	; GCN-NEXT: buffer_store_dword v50, off, s[0:3], s33 offset:424			; GCN-NEXT: buffer_store_dword v54, off, s[0:3], s33 offset:424
	; GCN-NEXT: buffer_store_dword v51, off, s[0:3], s33 offset:428			; GCN-NEXT: buffer_store_dword v55, off, s[0:3], s33 offset:428
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:640 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:644 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:648 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:652 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:656 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:660 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:664 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:668 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:672 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:676 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:680 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:684 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v19, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:688 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v20, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:692 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v21, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:696 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v22, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:700 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v23, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v20			; GCN-NEXT: v_mov_b32_e32 v12, v20
	; GCN-NEXT: v_mov_b32_e32 v13, v21			; GCN-NEXT: v_mov_b32_e32 v13, v21
	; GCN-NEXT: v_mov_b32_e32 v14, v22			; GCN-NEXT: v_mov_b32_e32 v14, v22
	; GCN-NEXT: v_mov_b32_e32 v15, v23			; GCN-NEXT: v_mov_b32_e32 v15, v23
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432			; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:432
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436			; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:436
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440			; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:440
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444			; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:444
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:448
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:452
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:456
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:460
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:464
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:468
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:472
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:476
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:576 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:580 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:584 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:588 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:592 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:596 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:600 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:604 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:608 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:612 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:616 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:620 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:624 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:628 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:632 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:636 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 offset:536 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v10, off, s[0:3], s33 offset:540 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v11, off, s[0:3], s33 offset:544 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v12, off, s[0:3], s33 offset:548 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v13, off, s[0:3], s33 offset:552 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v14, off, s[0:3], s33 offset:556 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v15, off, s[0:3], s33 offset:560 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v16, off, s[0:3], s33 offset:564 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 offset:568 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v18, off, s[0:3], s33 offset:572 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v12, v15			; GCN-NEXT: v_mov_b32_e32 v8, v11
	; GCN-NEXT: v_mov_b32_e32 v13, v16			; GCN-NEXT: v_mov_b32_e32 v9, v12
	; GCN-NEXT: v_mov_b32_e32 v14, v17			; GCN-NEXT: v_mov_b32_e32 v10, v13
	; GCN-NEXT: v_mov_b32_e32 v15, v18			; GCN-NEXT: v_mov_b32_e32 v11, v14
	; GCN-NEXT: buffer_store_dword v12, off, s[0:3], s33 offset:496			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:480
	; GCN-NEXT: buffer_store_dword v13, off, s[0:3], s33 offset:500			; GCN-NEXT: buffer_store_dword v9, off, s[0:3], s33 offset:484
	; GCN-NEXT: buffer_store_dword v14, off, s[0:3], s33 offset:504			; GCN-NEXT: buffer_store_dword v10, off, s[0:3], s33 offset:488
	; GCN-NEXT: buffer_store_dword v15, off, s[0:3], s33 offset:508			; GCN-NEXT: buffer_store_dword v11, off, s[0:3], s33 offset:492
				; GCN-NEXT: buffer_store_dword v60, off, s[0:3], s33 offset:496
				; GCN-NEXT: buffer_store_dword v61, off, s[0:3], s33 offset:500
				; GCN-NEXT: buffer_store_dword v62, off, s[0:3], s33 offset:504
				; GCN-NEXT: buffer_store_dword v63, off, s[0:3], s33 offset:508
	; GCN-NEXT: buffer_load_dword v0, v1, s[0:3], 0 offen			; GCN-NEXT: buffer_load_dword v0, v1, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen offset:4			; GCN-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen offset:4
	; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v63, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v62, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v61, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v60, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b32 s33, s6			; GCN-NEXT: s_mov_b32 s33, s6
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%vec = load <32 x i64>, <32 x i64> addrspace(1)* %ptr			%vec = load <32 x i64>, <32 x i64> addrspace(1)* %ptr
	%elt = extractelement <32 x i64> %vec, i32 %idx			%elt = extractelement <32 x i64> %vec, i32 %idx
	ret i64 %elt			ret i64 %elt
	}			}

llvm/test/CodeGen/AMDGPU/agpr-csr.ll

	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @func_areg_32() #0 {			define void @func_areg_32() #0 {
	call void asm sideeffect "; use agpr31", "~{a31}" ()			call void asm sideeffect "; use agpr31", "~{a31}" ()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_areg_33:			; GCN-LABEL: {{^}}func_areg_33:
	; GCN-NOT: a32			; GCN-NOT: a32
	; GFX90A: buffer_store_dword a32, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX90A: v_accvgpr_read_b32 v0, a32 ; Reload Reuse
	; GCN-NOT: a32			; GCN-NOT: a32
	; GCN: use agpr32			; GCN: use agpr32
	; GCN-NOT: a32			; GCN-NOT: a32
	; GFX90A: buffer_load_dword a32, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX90A: v_accvgpr_write_b32 a32, v0 ; Reload Reuse
	; GCN-NOT: a32			; GCN-NOT: a32
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @func_areg_33() #0 {			define void @func_areg_33() #0 {
	call void asm sideeffect "; use agpr32", "~{a32}" ()			call void asm sideeffect "; use agpr32", "~{a32}" ()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_areg_64:			; GCN-LABEL: {{^}}func_areg_64:
	; GFX908-NOT: buffer_			; GFX908-NOT: buffer_
	; GCN-NOT: v_accvgpr			; GCN-NOT: v_accvgpr
	; GFX90A: buffer_store_dword a63,			; GFX90A: v_accvgpr_read_b32 v0, a63 ; Reload Reuse
	; GCN: use agpr63			; GCN: use agpr63
	; GFX90A: buffer_load_dword a63,			; GFX90A: v_accvgpr_write_b32 a63, v0 ; Reload Reuse
	; GCN-NOT: v_accvgpr			; GCN-NOT: v_accvgpr
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @func_areg_64() #0 {			define void @func_areg_64() #0 {
	call void asm sideeffect "; use agpr63", "~{a63}" ()			call void asm sideeffect "; use agpr63", "~{a63}" ()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_areg_31_63:			; GCN-LABEL: {{^}}func_areg_31_63:
	; GFX908-NOT: buffer_			; GFX908-NOT: buffer_
	; GCN-NOT: v_accvgpr			; GFX908-NOT: v_accvgpr
	; GFX90A: buffer_store_dword a63,			; GFX908-NOT: buffer
				; GFX90A: v_accvgpr_read_b32 v0, a63 ; Reload Reuse
	; GCN: use agpr31, agpr63			; GCN: use agpr31, agpr63
	; GFX90A: buffer_load_dword a63,			; GFX90A: v_accvgpr_write_b32 a63, v0 ; Reload Reuse
	; GCN-NOT: buffer_			; GFX908-NOT: v_accvgpr
	; GCN-NOT: v_accvgpr			; GFX908-NOT: buffer
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @func_areg_31_63() #0 {			define void @func_areg_31_63() #0 {
	call void asm sideeffect "; use agpr31, agpr63", "~{a31},~{a63}" ()			call void asm sideeffect "; use agpr31, agpr63", "~{a31},~{a63}" ()
	ret void			ret void
	}			}

	declare void @func_unknown() #0			declare void @func_unknown() #0

llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir

	# RUN: llc -march=amdgcn -mcpu=gfx908 -start-before=greedy -stop-after=virtregrewriter -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN,GFX908 %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -start-before=greedy,0 -stop-after=virtregrewriter,1 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN,GFX908 %s

	---			---
	# GCN-LABEL: name: alloc_vgpr_64			# GCN-LABEL: name: alloc_vgpr_64
	# GFX908: $vgpr3_vgpr4 = GLOBAL_LOAD			# GFX908: $vgpr3_vgpr4 = GLOBAL_LOAD
	name: alloc_vgpr_64			name: alloc_vgpr_64
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$vgpr0_vgpr1' }			- { reg: '$vgpr0_vgpr1' }

llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx90a.mir

	# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=greedy -stop-after=virtregrewriter -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN,GFX90A %s			# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=greedy,0 -stop-after=virtregrewriter,1 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=GCN,GFX90A %s
	# Using the unaligned vector tuples are OK as long as they aren't used			# Using the unaligned vector tuples are OK as long as they aren't used
	# in a real instruction.			# in a real instruction.

	---			---
	# GCN-LABEL: name: alloc_vgpr_64			# GCN-LABEL: name: alloc_vgpr_64
	# GFX90A: $vgpr4_vgpr5 = GLOBAL_LOAD			# GFX90A: $vgpr4_vgpr5 = GLOBAL_LOAD
	name: alloc_vgpr_64			name: alloc_vgpr_64
	tracksRegLiveness: true			tracksRegLiveness: true

llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-vgpr-limit.ll

	; -enable-misched=false makes the register usage more predictable			; -enable-misched=false makes the register usage more predictable
	; -regalloc=fast just makes the test run faster			; -regalloc=fast just makes the test run faster
	; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX9			; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX9
	; RUN: llc -march=amdgcn -mcpu=gfx90a -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX90A			; RUN: llc -march=amdgcn -mcpu=gfx90a -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX90A
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1010 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false --sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10WGP-WAVE64
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE32			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE32
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE64			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+cumode,+wavefrontsize64 -amdgpu-function-calls=false -enable-misched=false -sgpr-regalloc=fast -vgpr-regalloc=fast < %s | FileCheck %s --check-prefixes=GCN,GFX10CU-WAVE64

	define internal void @use256vgprs() {			define internal void @use256vgprs() {
	%v0 = call i32 asm sideeffect "; def $0", "=v"()			%v0 = call i32 asm sideeffect "; def $0", "=v"()
	%v1 = call i32 asm sideeffect "; def $0", "=v"()			%v1 = call i32 asm sideeffect "; def $0", "=v"()
	%v2 = call i32 asm sideeffect "; def $0", "=v"()			%v2 = call i32 asm sideeffect "; def $0", "=v"()
	%v3 = call i32 asm sideeffect "; def $0", "=v"()			%v3 = call i32 asm sideeffect "; def $0", "=v"()
	%v4 = call i32 asm sideeffect "; def $0", "=v"()			%v4 = call i32 asm sideeffect "; def $0", "=v"()
	%v5 = call i32 asm sideeffect "; def $0", "=v"()			%v5 = call i32 asm sideeffect "; def $0", "=v"()

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

	; Use a copy to a free SGPR instead of introducing a second CSR VGPR.			; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
	; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:			; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:12 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-NEXT: v_writelane_b32 v1, s33, 63			; GCN-NEXT: v_writelane_b32 v0, s33, 63
	; GCN-COUNT-60: v_writelane_b32 v1			; GCN-COUNT-60: v_writelane_b32 v0
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN-COUNT-2: v_writelane_b32 v1			; GCN-COUNT-2: v_writelane_b32 v0
	; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill			; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
	; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8			; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8
	; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:8			; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:8
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: v_writelane_b32 v1			; GCN: v_writelane_b32 v0

	; MUBUF: s_add_u32 s32, s32, 0x400			; MUBUF: s_add_u32 s32, s32, 0x400
	; MUBUF: s_sub_u32 s32, s32, 0x400			; MUBUF: s_sub_u32 s32, s32, 0x400
	; FLATSCR: s_add_u32 s32, s32, 16			; FLATSCR: s_add_u32 s32, s32, 16
	; FLATSCR: s_sub_u32 s32, s32, 16			; FLATSCR: s_sub_u32 s32, s32, 16
	; GCN-NEXT: v_readlane_b32 s33, v1, 63			; GCN-NEXT: v_readlane_b32 s33, v0, 63
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @last_lane_vgpr_for_fp_csr() #1 {			define void @last_lane_vgpr_for_fp_csr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	[13 lines elided]

	; Use a copy to a free SGPR instead of introducing a second CSR VGPR.			; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
	; GCN-LABEL: {{^}}no_new_vgpr_for_fp_csr:			; GCN-LABEL: {{^}}no_new_vgpr_for_fp_csr:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:12 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-COUNT-62: v_writelane_b32 v1,			; GCN-COUNT-62: v_writelane_b32 v0,
	; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33			; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN: v_writelane_b32 v1,			; GCN: v_writelane_b32 v0,
	; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill			; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
	; MUBUF: buffer_store_dword			; MUBUF: buffer_store_dword
	; FLATSCR: scratch_store_dword			; FLATSCR: scratch_store_dword
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: v_writelane_b32 v1,			; GCN: v_writelane_b32 v0,
	; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload			; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
	; MUBUF: s_add_u32 s32, s32, 0x400			; MUBUF: s_add_u32 s32, s32, 0x400
	; FLATSCR: s_add_u32 s32, s32, 16			; FLATSCR: s_add_u32 s32, s32, 16
	; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v1			; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v0
	; MUBUF-NEXT: s_sub_u32 s32, s32, 0x400			; MUBUF-NEXT: s_sub_u32 s32, s32, 0x400
	; FLATSCR-NEXT: s_sub_u32 s32, s32, 16			; FLATSCR-NEXT: s_sub_u32 s32, s32, 16
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	[39 lines elided]
	}			}

	; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:			; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-NEXT: v_writelane_b32 v1, s33, 2			; GCN-NEXT: v_writelane_b32 v0, s33, 2
	; GCN-NEXT: v_writelane_b32 v1, s30, 0			; GCN-NEXT: v_writelane_b32 v0, s30, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0			; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
	; GCN: v_writelane_b32 v1, s31, 1			; GCN: v_writelane_b32 v0, s31, 1
	; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4			; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4
	; FLATSCR: scratch_store_dword off, [[ZERO]], s33 offset:4			; FLATSCR: scratch_store_dword off, [[ZERO]], s33 offset:4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; MUBUF: s_add_u32 s32, s32, 0x300			; MUBUF: s_add_u32 s32, s32, 0x300
	; MUBUF-NEXT: v_readlane_b32 s4, v1, 0			; MUBUF-NEXT: v_readlane_b32 s4, v0, 0
	; MUBUF-NEXT: v_readlane_b32 s5, v1, 1			; MUBUF-NEXT: v_readlane_b32 s5, v0, 1
	; FLATSCR: s_add_u32 s32, s32, 12			; FLATSCR: s_add_u32 s32, s32, 12
	; FLATSCR-NEXT: v_readlane_b32 s0, v1, 0			; FLATSCR-NEXT: v_readlane_b32 s0, v0, 0
	; FLATSCR-NEXT: v_readlane_b32 s1, v1, 1			; FLATSCR-NEXT: v_readlane_b32 s1, v0, 1
	; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300			; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300
	; FLATSCR-NEXT: s_sub_u32 s32, s32, 12			; FLATSCR-NEXT: s_sub_u32 s32, s32, 12
	; GCN-NEXT: v_readlane_b32 s33, v1, 2			; GCN-NEXT: v_readlane_b32 s33, v0, 2
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[4:5]			; MUBUF-NEXT: s_setpc_b64 s[4:5]
	; FLATSCR-NEXT: s_setpc_b64 s[0:1]			; FLATSCR-NEXT: s_setpc_b64 s[0:1]
	define void @no_unused_non_csr_sgpr_for_fp() #1 {			define void @no_unused_non_csr_sgpr_for_fp() #1 {
	[289 lines elided]

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

[2,864 lines elided]	; GFX10-NEXT: s_setpc_b64 s[4:5]
ret void		ret void
}		}

define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_external_i32_func_i32_imm:		; GFX9-LABEL: test_call_external_i32_func_i32_imm:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_writelane_b32 v42, s33, 2		; GFX9-NEXT: v_writelane_b32 v40, s33, 2
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: v_writelane_b32 v42, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: s_add_u32 s32, s32, 0x400		; GFX9-NEXT: s_add_u32 s32, s32, 0x400
; GFX9-NEXT: v_mov_b32_e32 v40, v0		; GFX9-NEXT: v_mov_b32_e32 v41, v0
; GFX9-NEXT: v_mov_b32_e32 v0, 42		; GFX9-NEXT: v_mov_b32_e32 v0, 42
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12
; GFX9-NEXT: v_writelane_b32 v42, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: v_mov_b32_e32 v41, v1		; GFX9-NEXT: v_mov_b32_e32 v42, v1
; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX9-NEXT: global_store_dword v[40:41], v0, off		; GFX9-NEXT: global_store_dword v[41:42], v0, off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s4, v42, 0		; GFX9-NEXT: v_readlane_b32 s4, v40, 0
; GFX9-NEXT: v_readlane_b32 s5, v42, 1		; GFX9-NEXT: v_readlane_b32 s5, v40, 1
; GFX9-NEXT: s_sub_u32 s32, s32, 0x400		; GFX9-NEXT: s_sub_u32 s32, s32, 0x400
; GFX9-NEXT: v_readlane_b32 s33, v42, 2		; GFX9-NEXT: v_readlane_b32 s33, v40, 2
; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1		; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[6:7]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[4:5]		; GFX9-NEXT: s_setpc_b64 s[4:5]
;		;
; GFX10-LABEL: test_call_external_i32_func_i32_imm:		; GFX10-LABEL: test_call_external_i32_func_i32_imm:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_or_saveexec_b32 s4, -1		; GFX10-NEXT: s_or_saveexec_b32 s4, -1
; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s4
; GFX10-NEXT: v_writelane_b32 v42, s33, 2		; GFX10-NEXT: v_writelane_b32 v40, s33, 2
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_add_u32 s32, s32, 0x200		; GFX10-NEXT: s_add_u32 s32, s32, 0x200
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: v_mov_b32_e32 v40, v0		; GFX10-NEXT: v_mov_b32_e32 v41, v0
; GFX10-NEXT: v_writelane_b32 v42, s30, 0		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: v_mov_b32_e32 v0, 42		; GFX10-NEXT: v_mov_b32_e32 v0, 42
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12
; GFX10-NEXT: v_mov_b32_e32 v41, v1		; GFX10-NEXT: v_mov_b32_e32 v42, v1
; GFX10-NEXT: v_writelane_b32 v42, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX10-NEXT: global_store_dword v[40:41], v0, off		; GFX10-NEXT: global_store_dword v[41:42], v0, off
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_clause 0x1		; GFX10-NEXT: s_clause 0x1
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33		; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
; GFX10-NEXT: v_readlane_b32 s4, v42, 0		; GFX10-NEXT: v_readlane_b32 s4, v40, 0
; GFX10-NEXT: v_readlane_b32 s5, v42, 1		; GFX10-NEXT: v_readlane_b32 s5, v40, 1
; GFX10-NEXT: s_sub_u32 s32, s32, 0x200		; GFX10-NEXT: s_sub_u32 s32, s32, 0x200
; GFX10-NEXT: v_readlane_b32 s33, v42, 2		; GFX10-NEXT: v_readlane_b32 s33, v40, 2
; GFX10-NEXT: s_or_saveexec_b32 s6, -1		; GFX10-NEXT: s_or_saveexec_b32 s6, -1
; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s6		; GFX10-NEXT: s_mov_b32 exec_lo, s6
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[4:5]		; GFX10-NEXT: s_setpc_b64 s[4:5]
%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)		%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
store volatile i32 %val, i32 addrspace(1)* %out		store volatile i32 %val, i32 addrspace(1)* %out
ret void		ret void
}		}
[3,502 lines elided]

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

[178 lines elided]	; GFX10-NEXT: s_setpc_b64 s[4:5]
ret void		ret void
}		}

define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {		define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:		; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_writelane_b32 v41, s33, 2		; GFX9-NEXT: v_writelane_b32 v40, s33, 2
; GFX9-NEXT: v_writelane_b32 v41, s30, 0		; GFX9-NEXT: v_writelane_b32 v40, s30, 0
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_add_u32 s32, s32, 0x400		; GFX9-NEXT: s_add_u32 s32, s32, 0x400
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def v31		; GFX9-NEXT: ; def v31
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX9-NEXT: v_writelane_b32 v41, s31, 1		; GFX9-NEXT: v_writelane_b32 v40, s31, 1
; GFX9-NEXT: v_mov_b32_e32 v40, v31		; GFX9-NEXT: v_mov_b32_e32 v41, v31
; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX9-NEXT: v_mov_b32_e32 v31, v40		; GFX9-NEXT: v_mov_b32_e32 v31, v41
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use v31		; GFX9-NEXT: ; use v31
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s4, v41, 0		; GFX9-NEXT: v_readlane_b32 s4, v40, 0
; GFX9-NEXT: v_readlane_b32 s5, v41, 1		; GFX9-NEXT: v_readlane_b32 s5, v40, 1
; GFX9-NEXT: s_sub_u32 s32, s32, 0x400		; GFX9-NEXT: s_sub_u32 s32, s32, 0x400
; GFX9-NEXT: v_readlane_b32 s33, v41, 2		; GFX9-NEXT: v_readlane_b32 s33, v40, 2
; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1		; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[6:7]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[4:5]		; GFX9-NEXT: s_setpc_b64 s[4:5]
;		;
; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:		; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_or_saveexec_b32 s4, -1		; GFX10-NEXT: s_or_saveexec_b32 s4, -1
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s4
; GFX10-NEXT: v_writelane_b32 v41, s33, 2		; GFX10-NEXT: v_writelane_b32 v40, s33, 2
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_add_u32 s32, s32, 0x200		; GFX10-NEXT: s_add_u32 s32, s32, 0x200
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def v31		; GFX10-NEXT: ; def v31
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_writelane_b32 v41, s30, 0		; GFX10-NEXT: v_writelane_b32 v40, s30, 0
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX10-NEXT: v_mov_b32_e32 v40, v31		; GFX10-NEXT: v_mov_b32_e32 v41, v31
; GFX10-NEXT: v_writelane_b32 v41, s31, 1		; GFX10-NEXT: v_writelane_b32 v40, s31, 1
; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX10-NEXT: v_mov_b32_e32 v31, v40		; GFX10-NEXT: v_mov_b32_e32 v31, v41
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use v31		; GFX10-NEXT: ; use v31
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX10-NEXT: v_readlane_b32 s4, v41, 0		; GFX10-NEXT: v_readlane_b32 s4, v40, 0
; GFX10-NEXT: v_readlane_b32 s5, v41, 1		; GFX10-NEXT: v_readlane_b32 s5, v40, 1
; GFX10-NEXT: s_sub_u32 s32, s32, 0x200		; GFX10-NEXT: s_sub_u32 s32, s32, 0x200
; GFX10-NEXT: v_readlane_b32 s33, v41, 2		; GFX10-NEXT: v_readlane_b32 s33, v40, 2
; GFX10-NEXT: s_or_saveexec_b32 s6, -1		; GFX10-NEXT: s_or_saveexec_b32 s6, -1
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s6		; GFX10-NEXT: s_mov_b32 exec_lo, s6
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[4:5]		; GFX10-NEXT: s_setpc_b64 s[4:5]
%v31 = call i32 asm sideeffect "; def $0", "={v31}"()		%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
call amdgpu_gfx void @external_void_func_void()		call amdgpu_gfx void @external_void_func_void()
call void asm sideeffect "; use $0", "{v31}"(i32 %v31)		call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
ret void		ret void
[498 lines elided]	; GFX10-NEXT: s_setpc_b64 s[4:5]
ret void		ret void
}		}

define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {		define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {
; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:		; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_writelane_b32 v41, s33, 3		; GFX9-NEXT: v_writelane_b32 v40, s33, 3
; GFX9-NEXT: v_writelane_b32 v41, s40, 0		; GFX9-NEXT: v_writelane_b32 v40, s40, 0
; GFX9-NEXT: v_writelane_b32 v41, s30, 1		; GFX9-NEXT: v_writelane_b32 v40, s30, 1
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_add_u32 s32, s32, 0x400		; GFX9-NEXT: s_add_u32 s32, s32, 0x400
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def s40		; GFX9-NEXT: ; def s40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; def v32		; GFX9-NEXT: ; def v32
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX9-NEXT: v_writelane_b32 v41, s31, 2		; GFX9-NEXT: v_writelane_b32 v40, s31, 2
; GFX9-NEXT: v_mov_b32_e32 v40, v32		; GFX9-NEXT: v_mov_b32_e32 v41, v32
; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use s40		; GFX9-NEXT: ; use s40
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: ;;#ASMSTART		; GFX9-NEXT: ;;#ASMSTART
; GFX9-NEXT: ; use v40		; GFX9-NEXT: ; use v41
; GFX9-NEXT: ;;#ASMEND		; GFX9-NEXT: ;;#ASMEND
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s4, v41, 1		; GFX9-NEXT: v_readlane_b32 s4, v40, 1
; GFX9-NEXT: v_readlane_b32 s5, v41, 2		; GFX9-NEXT: v_readlane_b32 s5, v40, 2
; GFX9-NEXT: v_readlane_b32 s40, v41, 0		; GFX9-NEXT: v_readlane_b32 s40, v40, 0
; GFX9-NEXT: s_sub_u32 s32, s32, 0x400		; GFX9-NEXT: s_sub_u32 s32, s32, 0x400
; GFX9-NEXT: v_readlane_b32 s33, v41, 3		; GFX9-NEXT: v_readlane_b32 s33, v40, 3
; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1		; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[6:7]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[4:5]		; GFX9-NEXT: s_setpc_b64 s[4:5]
;		;
; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:		; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_or_saveexec_b32 s4, -1		; GFX10-NEXT: s_or_saveexec_b32 s4, -1
; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s4		; GFX10-NEXT: s_mov_b32 exec_lo, s4
; GFX10-NEXT: v_writelane_b32 v41, s33, 3		; GFX10-NEXT: v_writelane_b32 v40, s33, 3
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_add_u32 s32, s32, 0x200		; GFX10-NEXT: s_add_u32 s32, s32, 0x200
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX10-NEXT: v_writelane_b32 v41, s40, 0		; GFX10-NEXT: v_writelane_b32 v40, s40, 0
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def s40		; GFX10-NEXT: ; def s40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; def v32		; GFX10-NEXT: ; def v32
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: v_mov_b32_e32 v40, v32		; GFX10-NEXT: v_mov_b32_e32 v41, v32
; GFX10-NEXT: v_writelane_b32 v41, s30, 1		; GFX10-NEXT: v_writelane_b32 v40, s30, 1
; GFX10-NEXT: v_writelane_b32 v41, s31, 2		; GFX10-NEXT: v_writelane_b32 v40, s31, 2
; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use s40		; GFX10-NEXT: ; use s40
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: ;;#ASMSTART		; GFX10-NEXT: ;;#ASMSTART
; GFX10-NEXT: ; use v40		; GFX10-NEXT: ; use v41
; GFX10-NEXT: ;;#ASMEND		; GFX10-NEXT: ;;#ASMEND
; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX10-NEXT: v_readlane_b32 s4, v41, 1		; GFX10-NEXT: v_readlane_b32 s4, v40, 1
; GFX10-NEXT: v_readlane_b32 s5, v41, 2		; GFX10-NEXT: v_readlane_b32 s5, v40, 2
; GFX10-NEXT: v_readlane_b32 s40, v41, 0		; GFX10-NEXT: v_readlane_b32 s40, v40, 0
; GFX10-NEXT: s_sub_u32 s32, s32, 0x200		; GFX10-NEXT: s_sub_u32 s32, s32, 0x200
; GFX10-NEXT: v_readlane_b32 s33, v41, 3		; GFX10-NEXT: v_readlane_b32 s33, v40, 3
; GFX10-NEXT: s_or_saveexec_b32 s6, -1		; GFX10-NEXT: s_or_saveexec_b32 s6, -1
; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GFX10-NEXT: s_waitcnt_depctr 0xffe3		; GFX10-NEXT: s_waitcnt_depctr 0xffe3
; GFX10-NEXT: s_mov_b32 exec_lo, s6		; GFX10-NEXT: s_mov_b32 exec_lo, s6
; GFX10-NEXT: s_waitcnt vmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[4:5]		; GFX10-NEXT: s_setpc_b64 s[4:5]
%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0		%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0		%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0
call amdgpu_gfx void @external_void_func_void()		call amdgpu_gfx void @external_void_func_void()
call void asm sideeffect "; use $0", "s"(i32 %s40) #0		call void asm sideeffect "; use $0", "s"(i32 %s40) #0
call void asm sideeffect "; use $0", "v"(i32 %v32) #0		call void asm sideeffect "; use $0", "v"(i32 %v32) #0
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind noinline }		attributes #1 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/indirect-call.ll

[196 lines elided]	; GCN-NEXT: s_endpgm
ret void		ret void
}		}

define void @test_indirect_call_vgpr_ptr(void()* %fptr) {		define void @test_indirect_call_vgpr_ptr(void()* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr:		; GCN-LABEL: test_indirect_call_vgpr_ptr:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1		; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[16:17]		; GCN-NEXT: s_mov_b64 exec, s[16:17]
; GCN-NEXT: v_writelane_b32 v43, s33, 17		; GCN-NEXT: v_writelane_b32 v40, s33, 17
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x800		; GCN-NEXT: s_add_u32 s32, s32, 0x800
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v43, s34, 0		; GCN-NEXT: v_writelane_b32 v40, s34, 0
; GCN-NEXT: v_writelane_b32 v43, s35, 1		; GCN-NEXT: v_writelane_b32 v40, s35, 1
; GCN-NEXT: v_writelane_b32 v43, s36, 2		; GCN-NEXT: v_writelane_b32 v40, s36, 2
; GCN-NEXT: v_writelane_b32 v43, s38, 3		; GCN-NEXT: v_writelane_b32 v40, s38, 3
; GCN-NEXT: v_writelane_b32 v43, s39, 4		; GCN-NEXT: v_writelane_b32 v40, s39, 4
; GCN-NEXT: v_writelane_b32 v43, s40, 5		; GCN-NEXT: v_writelane_b32 v40, s40, 5
; GCN-NEXT: v_writelane_b32 v43, s41, 6		; GCN-NEXT: v_writelane_b32 v40, s41, 6
; GCN-NEXT: v_writelane_b32 v43, s42, 7		; GCN-NEXT: v_writelane_b32 v40, s42, 7
; GCN-NEXT: v_writelane_b32 v43, s43, 8		; GCN-NEXT: v_writelane_b32 v40, s43, 8
; GCN-NEXT: v_writelane_b32 v43, s44, 9		; GCN-NEXT: v_writelane_b32 v40, s44, 9
; GCN-NEXT: v_writelane_b32 v43, s45, 10		; GCN-NEXT: v_writelane_b32 v40, s45, 10
; GCN-NEXT: v_writelane_b32 v43, s46, 11		; GCN-NEXT: v_writelane_b32 v40, s46, 11
; GCN-NEXT: v_writelane_b32 v43, s47, 12		; GCN-NEXT: v_writelane_b32 v40, s47, 12
; GCN-NEXT: v_writelane_b32 v43, s48, 13		; GCN-NEXT: v_writelane_b32 v40, s48, 13
; GCN-NEXT: v_writelane_b32 v43, s49, 14		; GCN-NEXT: v_writelane_b32 v40, s49, 14
; GCN-NEXT: v_writelane_b32 v43, s30, 15		; GCN-NEXT: v_writelane_b32 v40, s30, 15
; GCN-NEXT: v_writelane_b32 v43, s31, 16		; GCN-NEXT: v_writelane_b32 v40, s31, 16
; GCN-NEXT: v_mov_b32_e32 v40, v31		; GCN-NEXT: v_mov_b32_e32 v41, v31
; GCN-NEXT: s_mov_b32 s34, s14		; GCN-NEXT: s_mov_b32 s34, s14
; GCN-NEXT: s_mov_b32 s35, s13		; GCN-NEXT: s_mov_b32 s35, s13
; GCN-NEXT: s_mov_b32 s36, s12		; GCN-NEXT: s_mov_b32 s36, s12
; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]		; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]
; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]		; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]
; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]		; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]
; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]		; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]
; GCN-NEXT: v_mov_b32_e32 v42, v1		; GCN-NEXT: v_mov_b32_e32 v43, v1
; GCN-NEXT: v_mov_b32_e32 v41, v0		; GCN-NEXT: v_mov_b32_e32 v42, v0
; GCN-NEXT: s_mov_b64 s[46:47], exec		; GCN-NEXT: s_mov_b64 s[46:47], exec
; GCN-NEXT: BB2_1: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: BB2_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s16, v41		; GCN-NEXT: v_readfirstlane_b32 s16, v42
; GCN-NEXT: v_readfirstlane_b32 s17, v42		; GCN-NEXT: v_readfirstlane_b32 s17, v43
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[41:42]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[42:43]
; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc		; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc
; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]		; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]
; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]		; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]
; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]		; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]
; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]		; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]
; GCN-NEXT: s_mov_b32 s12, s36		; GCN-NEXT: s_mov_b32 s12, s36
; GCN-NEXT: s_mov_b32 s13, s35		; GCN-NEXT: s_mov_b32 s13, s35
; GCN-NEXT: s_mov_b32 s14, s34		; GCN-NEXT: s_mov_b32 s14, s34
; GCN-NEXT: v_mov_b32_e32 v31, v40		; GCN-NEXT: v_mov_b32_e32 v31, v41
; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]		; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]		; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]
; GCN-NEXT: s_cbranch_execnz BB2_1		; GCN-NEXT: s_cbranch_execnz BB2_1
; GCN-NEXT: ; %bb.2:		; GCN-NEXT: ; %bb.2:
; GCN-NEXT: s_mov_b64 exec, s[46:47]		; GCN-NEXT: s_mov_b64 exec, s[46:47]
; GCN-NEXT: v_readlane_b32 s4, v43, 15		; GCN-NEXT: v_readlane_b32 s4, v40, 15
; GCN-NEXT: v_readlane_b32 s5, v43, 16		; GCN-NEXT: v_readlane_b32 s5, v40, 16
; GCN-NEXT: v_readlane_b32 s49, v43, 14		; GCN-NEXT: v_readlane_b32 s49, v40, 14
; GCN-NEXT: v_readlane_b32 s48, v43, 13		; GCN-NEXT: v_readlane_b32 s48, v40, 13
; GCN-NEXT: v_readlane_b32 s47, v43, 12		; GCN-NEXT: v_readlane_b32 s47, v40, 12
; GCN-NEXT: v_readlane_b32 s46, v43, 11		; GCN-NEXT: v_readlane_b32 s46, v40, 11
; GCN-NEXT: v_readlane_b32 s45, v43, 10		; GCN-NEXT: v_readlane_b32 s45, v40, 10
; GCN-NEXT: v_readlane_b32 s44, v43, 9		; GCN-NEXT: v_readlane_b32 s44, v40, 9
; GCN-NEXT: v_readlane_b32 s43, v43, 8		; GCN-NEXT: v_readlane_b32 s43, v40, 8
; GCN-NEXT: v_readlane_b32 s42, v43, 7		; GCN-NEXT: v_readlane_b32 s42, v40, 7
; GCN-NEXT: v_readlane_b32 s41, v43, 6		; GCN-NEXT: v_readlane_b32 s41, v40, 6
; GCN-NEXT: v_readlane_b32 s40, v43, 5		; GCN-NEXT: v_readlane_b32 s40, v40, 5
; GCN-NEXT: v_readlane_b32 s39, v43, 4		; GCN-NEXT: v_readlane_b32 s39, v40, 4
; GCN-NEXT: v_readlane_b32 s38, v43, 3		; GCN-NEXT: v_readlane_b32 s38, v40, 3
; GCN-NEXT: v_readlane_b32 s36, v43, 2		; GCN-NEXT: v_readlane_b32 s36, v40, 2
; GCN-NEXT: v_readlane_b32 s35, v43, 1		; GCN-NEXT: v_readlane_b32 s35, v40, 1
; GCN-NEXT: v_readlane_b32 s34, v43, 0		; GCN-NEXT: v_readlane_b32 s34, v40, 0
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x800		; GCN-NEXT: s_sub_u32 s32, s32, 0x800
; GCN-NEXT: v_readlane_b32 s33, v43, 17		; GCN-NEXT: v_readlane_b32 s33, v40, 17
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
call void %fptr()		call void %fptr()
ret void		ret void
}		}

define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {		define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:		; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1		; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[16:17]		; GCN-NEXT: s_mov_b64 exec, s[16:17]
; GCN-NEXT: v_writelane_b32 v43, s33, 17		; GCN-NEXT: v_writelane_b32 v40, s33, 17
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x800		; GCN-NEXT: s_add_u32 s32, s32, 0x800
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v43, s34, 0		; GCN-NEXT: v_writelane_b32 v40, s34, 0
; GCN-NEXT: v_writelane_b32 v43, s35, 1		; GCN-NEXT: v_writelane_b32 v40, s35, 1
; GCN-NEXT: v_writelane_b32 v43, s36, 2		; GCN-NEXT: v_writelane_b32 v40, s36, 2
; GCN-NEXT: v_writelane_b32 v43, s38, 3		; GCN-NEXT: v_writelane_b32 v40, s38, 3
; GCN-NEXT: v_writelane_b32 v43, s39, 4		; GCN-NEXT: v_writelane_b32 v40, s39, 4
; GCN-NEXT: v_writelane_b32 v43, s40, 5		; GCN-NEXT: v_writelane_b32 v40, s40, 5
; GCN-NEXT: v_writelane_b32 v43, s41, 6		; GCN-NEXT: v_writelane_b32 v40, s41, 6
; GCN-NEXT: v_writelane_b32 v43, s42, 7		; GCN-NEXT: v_writelane_b32 v40, s42, 7
; GCN-NEXT: v_writelane_b32 v43, s43, 8		; GCN-NEXT: v_writelane_b32 v40, s43, 8
; GCN-NEXT: v_writelane_b32 v43, s44, 9		; GCN-NEXT: v_writelane_b32 v40, s44, 9
; GCN-NEXT: v_writelane_b32 v43, s45, 10		; GCN-NEXT: v_writelane_b32 v40, s45, 10
; GCN-NEXT: v_writelane_b32 v43, s46, 11		; GCN-NEXT: v_writelane_b32 v40, s46, 11
; GCN-NEXT: v_writelane_b32 v43, s47, 12		; GCN-NEXT: v_writelane_b32 v40, s47, 12
; GCN-NEXT: v_writelane_b32 v43, s48, 13		; GCN-NEXT: v_writelane_b32 v40, s48, 13
; GCN-NEXT: v_writelane_b32 v43, s49, 14		; GCN-NEXT: v_writelane_b32 v40, s49, 14
; GCN-NEXT: v_writelane_b32 v43, s30, 15		; GCN-NEXT: v_writelane_b32 v40, s30, 15
; GCN-NEXT: v_writelane_b32 v43, s31, 16		; GCN-NEXT: v_writelane_b32 v40, s31, 16
; GCN-NEXT: v_mov_b32_e32 v40, v31		; GCN-NEXT: v_mov_b32_e32 v41, v31
; GCN-NEXT: s_mov_b32 s34, s14		; GCN-NEXT: s_mov_b32 s34, s14
; GCN-NEXT: s_mov_b32 s35, s13		; GCN-NEXT: s_mov_b32 s35, s13
; GCN-NEXT: s_mov_b32 s36, s12		; GCN-NEXT: s_mov_b32 s36, s12
; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]		; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]
; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]		; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]
; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]		; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]
; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]		; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]
; GCN-NEXT: v_mov_b32_e32 v42, v1		; GCN-NEXT: v_mov_b32_e32 v43, v1
; GCN-NEXT: v_mov_b32_e32 v41, v0		; GCN-NEXT: v_mov_b32_e32 v42, v0
; GCN-NEXT: s_mov_b64 s[46:47], exec		; GCN-NEXT: s_mov_b64 s[46:47], exec
; GCN-NEXT: BB3_1: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: BB3_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s16, v41		; GCN-NEXT: v_readfirstlane_b32 s16, v42
; GCN-NEXT: v_readfirstlane_b32 s17, v42		; GCN-NEXT: v_readfirstlane_b32 s17, v43
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[41:42]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[42:43]
; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc		; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc
; GCN-NEXT: v_mov_b32_e32 v0, 0x7b		; GCN-NEXT: v_mov_b32_e32 v0, 0x7b
; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]		; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]
; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]		; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]
; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]		; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]
; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]		; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]
; GCN-NEXT: s_mov_b32 s12, s36		; GCN-NEXT: s_mov_b32 s12, s36
; GCN-NEXT: s_mov_b32 s13, s35		; GCN-NEXT: s_mov_b32 s13, s35
; GCN-NEXT: s_mov_b32 s14, s34		; GCN-NEXT: s_mov_b32 s14, s34
; GCN-NEXT: v_mov_b32_e32 v31, v40		; GCN-NEXT: v_mov_b32_e32 v31, v41
; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]		; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]		; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]
; GCN-NEXT: s_cbranch_execnz BB3_1		; GCN-NEXT: s_cbranch_execnz BB3_1
; GCN-NEXT: ; %bb.2:		; GCN-NEXT: ; %bb.2:
; GCN-NEXT: s_mov_b64 exec, s[46:47]		; GCN-NEXT: s_mov_b64 exec, s[46:47]
; GCN-NEXT: v_readlane_b32 s4, v43, 15		; GCN-NEXT: v_readlane_b32 s4, v40, 15
; GCN-NEXT: v_readlane_b32 s5, v43, 16		; GCN-NEXT: v_readlane_b32 s5, v40, 16
; GCN-NEXT: v_readlane_b32 s49, v43, 14		; GCN-NEXT: v_readlane_b32 s49, v40, 14
; GCN-NEXT: v_readlane_b32 s48, v43, 13		; GCN-NEXT: v_readlane_b32 s48, v40, 13
; GCN-NEXT: v_readlane_b32 s47, v43, 12		; GCN-NEXT: v_readlane_b32 s47, v40, 12
; GCN-NEXT: v_readlane_b32 s46, v43, 11		; GCN-NEXT: v_readlane_b32 s46, v40, 11
; GCN-NEXT: v_readlane_b32 s45, v43, 10		; GCN-NEXT: v_readlane_b32 s45, v40, 10
; GCN-NEXT: v_readlane_b32 s44, v43, 9		; GCN-NEXT: v_readlane_b32 s44, v40, 9
; GCN-NEXT: v_readlane_b32 s43, v43, 8		; GCN-NEXT: v_readlane_b32 s43, v40, 8
; GCN-NEXT: v_readlane_b32 s42, v43, 7		; GCN-NEXT: v_readlane_b32 s42, v40, 7
; GCN-NEXT: v_readlane_b32 s41, v43, 6		; GCN-NEXT: v_readlane_b32 s41, v40, 6
; GCN-NEXT: v_readlane_b32 s40, v43, 5		; GCN-NEXT: v_readlane_b32 s40, v40, 5
; GCN-NEXT: v_readlane_b32 s39, v43, 4		; GCN-NEXT: v_readlane_b32 s39, v40, 4
; GCN-NEXT: v_readlane_b32 s38, v43, 3		; GCN-NEXT: v_readlane_b32 s38, v40, 3
; GCN-NEXT: v_readlane_b32 s36, v43, 2		; GCN-NEXT: v_readlane_b32 s36, v40, 2
; GCN-NEXT: v_readlane_b32 s35, v43, 1		; GCN-NEXT: v_readlane_b32 s35, v40, 1
; GCN-NEXT: v_readlane_b32 s34, v43, 0		; GCN-NEXT: v_readlane_b32 s34, v40, 0
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x800		; GCN-NEXT: s_sub_u32 s32, s32, 0x800
; GCN-NEXT: v_readlane_b32 s33, v43, 17		; GCN-NEXT: v_readlane_b32 s33, v40, 17
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
call void %fptr(i32 123)		call void %fptr(i32 123)
ret void		ret void
}		}

define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {		define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:		; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1		; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[16:17]		; GCN-NEXT: s_mov_b64 exec, s[16:17]
; GCN-NEXT: v_writelane_b32 v43, s33, 17		; GCN-NEXT: v_writelane_b32 v40, s33, 17
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x800		; GCN-NEXT: s_add_u32 s32, s32, 0x800
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v43, s34, 0		; GCN-NEXT: v_writelane_b32 v40, s34, 0
; GCN-NEXT: v_writelane_b32 v43, s35, 1		; GCN-NEXT: v_writelane_b32 v40, s35, 1
; GCN-NEXT: v_writelane_b32 v43, s36, 2		; GCN-NEXT: v_writelane_b32 v40, s36, 2
; GCN-NEXT: v_writelane_b32 v43, s38, 3		; GCN-NEXT: v_writelane_b32 v40, s38, 3
; GCN-NEXT: v_writelane_b32 v43, s39, 4		; GCN-NEXT: v_writelane_b32 v40, s39, 4
; GCN-NEXT: v_writelane_b32 v43, s40, 5		; GCN-NEXT: v_writelane_b32 v40, s40, 5
; GCN-NEXT: v_writelane_b32 v43, s41, 6		; GCN-NEXT: v_writelane_b32 v40, s41, 6
; GCN-NEXT: v_writelane_b32 v43, s42, 7		; GCN-NEXT: v_writelane_b32 v40, s42, 7
; GCN-NEXT: v_writelane_b32 v43, s43, 8		; GCN-NEXT: v_writelane_b32 v40, s43, 8
; GCN-NEXT: v_writelane_b32 v43, s44, 9		; GCN-NEXT: v_writelane_b32 v40, s44, 9
; GCN-NEXT: v_writelane_b32 v43, s45, 10		; GCN-NEXT: v_writelane_b32 v40, s45, 10
; GCN-NEXT: v_writelane_b32 v43, s46, 11		; GCN-NEXT: v_writelane_b32 v40, s46, 11
; GCN-NEXT: v_writelane_b32 v43, s47, 12		; GCN-NEXT: v_writelane_b32 v40, s47, 12
; GCN-NEXT: v_writelane_b32 v43, s48, 13		; GCN-NEXT: v_writelane_b32 v40, s48, 13
; GCN-NEXT: v_writelane_b32 v43, s49, 14		; GCN-NEXT: v_writelane_b32 v40, s49, 14
; GCN-NEXT: v_writelane_b32 v43, s30, 15		; GCN-NEXT: v_writelane_b32 v40, s30, 15
; GCN-NEXT: v_writelane_b32 v43, s31, 16		; GCN-NEXT: v_writelane_b32 v40, s31, 16
; GCN-NEXT: v_mov_b32_e32 v40, v31		; GCN-NEXT: v_mov_b32_e32 v41, v31
; GCN-NEXT: s_mov_b32 s34, s14		; GCN-NEXT: s_mov_b32 s34, s14
; GCN-NEXT: s_mov_b32 s35, s13		; GCN-NEXT: s_mov_b32 s35, s13
; GCN-NEXT: s_mov_b32 s36, s12		; GCN-NEXT: s_mov_b32 s36, s12
; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]		; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]
; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]		; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]
; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]		; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]
; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]		; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]
; GCN-NEXT: v_mov_b32_e32 v42, v1		; GCN-NEXT: v_mov_b32_e32 v43, v1
; GCN-NEXT: v_mov_b32_e32 v41, v0		; GCN-NEXT: v_mov_b32_e32 v42, v0
; GCN-NEXT: s_mov_b64 s[46:47], exec		; GCN-NEXT: s_mov_b64 s[46:47], exec
; GCN-NEXT: BB4_1: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: BB4_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s16, v41		; GCN-NEXT: v_readfirstlane_b32 s16, v42
; GCN-NEXT: v_readfirstlane_b32 s17, v42		; GCN-NEXT: v_readfirstlane_b32 s17, v43
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[41:42]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[42:43]
; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc		; GCN-NEXT: s_and_saveexec_b64 s[48:49], vcc
; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]		; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]
; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]		; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]
; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]		; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]
; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]		; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]
; GCN-NEXT: s_mov_b32 s12, s36		; GCN-NEXT: s_mov_b32 s12, s36
; GCN-NEXT: s_mov_b32 s13, s35		; GCN-NEXT: s_mov_b32 s13, s35
; GCN-NEXT: s_mov_b32 s14, s34		; GCN-NEXT: s_mov_b32 s14, s34
; GCN-NEXT: v_mov_b32_e32 v31, v40		; GCN-NEXT: v_mov_b32_e32 v31, v41
; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]		; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]		; GCN-NEXT: s_xor_b64 exec, exec, s[48:49]
; GCN-NEXT: s_cbranch_execnz BB4_1		; GCN-NEXT: s_cbranch_execnz BB4_1
; GCN-NEXT: ; %bb.2:		; GCN-NEXT: ; %bb.2:
; GCN-NEXT: s_mov_b64 exec, s[46:47]		; GCN-NEXT: s_mov_b64 exec, s[46:47]
; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v0		; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v0
; GCN-NEXT: v_readlane_b32 s4, v43, 15		; GCN-NEXT: v_readlane_b32 s4, v40, 15
; GCN-NEXT: v_readlane_b32 s5, v43, 16		; GCN-NEXT: v_readlane_b32 s5, v40, 16
; GCN-NEXT: v_readlane_b32 s49, v43, 14		; GCN-NEXT: v_readlane_b32 s49, v40, 14
; GCN-NEXT: v_readlane_b32 s48, v43, 13		; GCN-NEXT: v_readlane_b32 s48, v40, 13
; GCN-NEXT: v_readlane_b32 s47, v43, 12		; GCN-NEXT: v_readlane_b32 s47, v40, 12
; GCN-NEXT: v_readlane_b32 s46, v43, 11		; GCN-NEXT: v_readlane_b32 s46, v40, 11
; GCN-NEXT: v_readlane_b32 s45, v43, 10		; GCN-NEXT: v_readlane_b32 s45, v40, 10
; GCN-NEXT: v_readlane_b32 s44, v43, 9		; GCN-NEXT: v_readlane_b32 s44, v40, 9
; GCN-NEXT: v_readlane_b32 s43, v43, 8		; GCN-NEXT: v_readlane_b32 s43, v40, 8
; GCN-NEXT: v_readlane_b32 s42, v43, 7		; GCN-NEXT: v_readlane_b32 s42, v40, 7
; GCN-NEXT: v_readlane_b32 s41, v43, 6		; GCN-NEXT: v_readlane_b32 s41, v40, 6
; GCN-NEXT: v_readlane_b32 s40, v43, 5		; GCN-NEXT: v_readlane_b32 s40, v40, 5
; GCN-NEXT: v_readlane_b32 s39, v43, 4		; GCN-NEXT: v_readlane_b32 s39, v40, 4
; GCN-NEXT: v_readlane_b32 s38, v43, 3		; GCN-NEXT: v_readlane_b32 s38, v40, 3
; GCN-NEXT: v_readlane_b32 s36, v43, 2		; GCN-NEXT: v_readlane_b32 s36, v40, 2
; GCN-NEXT: v_readlane_b32 s35, v43, 1		; GCN-NEXT: v_readlane_b32 s35, v40, 1
; GCN-NEXT: v_readlane_b32 s34, v43, 0		; GCN-NEXT: v_readlane_b32 s34, v40, 0
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x800		; GCN-NEXT: s_sub_u32 s32, s32, 0x800
; GCN-NEXT: v_readlane_b32 s33, v43, 17		; GCN-NEXT: v_readlane_b32 s33, v40, 17
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
%a = call i32 %fptr()		%a = call i32 %fptr()
%b = add i32 %a, 1		%b = add i32 %a, 1
ret i32 %b		ret i32 %b
}		}

define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {		define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:		; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:
; GCN: ; %bb.0: ; %bb0		; GCN: ; %bb.0: ; %bb0
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1		; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[16:17]		; GCN-NEXT: s_mov_b64 exec, s[16:17]
; GCN-NEXT: v_writelane_b32 v43, s33, 19		; GCN-NEXT: v_writelane_b32 v40, s33, 19
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x800		; GCN-NEXT: s_add_u32 s32, s32, 0x800
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v43, s34, 0		; GCN-NEXT: v_writelane_b32 v40, s34, 0
; GCN-NEXT: v_writelane_b32 v43, s35, 1		; GCN-NEXT: v_writelane_b32 v40, s35, 1
; GCN-NEXT: v_writelane_b32 v43, s36, 2		; GCN-NEXT: v_writelane_b32 v40, s36, 2
; GCN-NEXT: v_writelane_b32 v43, s38, 3		; GCN-NEXT: v_writelane_b32 v40, s38, 3
; GCN-NEXT: v_writelane_b32 v43, s39, 4		; GCN-NEXT: v_writelane_b32 v40, s39, 4
; GCN-NEXT: v_writelane_b32 v43, s40, 5		; GCN-NEXT: v_writelane_b32 v40, s40, 5
; GCN-NEXT: v_writelane_b32 v43, s41, 6		; GCN-NEXT: v_writelane_b32 v40, s41, 6
; GCN-NEXT: v_writelane_b32 v43, s42, 7		; GCN-NEXT: v_writelane_b32 v40, s42, 7
; GCN-NEXT: v_writelane_b32 v43, s43, 8		; GCN-NEXT: v_writelane_b32 v40, s43, 8
; GCN-NEXT: v_writelane_b32 v43, s44, 9		; GCN-NEXT: v_writelane_b32 v40, s44, 9
; GCN-NEXT: v_writelane_b32 v43, s45, 10		; GCN-NEXT: v_writelane_b32 v40, s45, 10
; GCN-NEXT: v_writelane_b32 v43, s46, 11		; GCN-NEXT: v_writelane_b32 v40, s46, 11
; GCN-NEXT: v_writelane_b32 v43, s47, 12		; GCN-NEXT: v_writelane_b32 v40, s47, 12
; GCN-NEXT: v_writelane_b32 v43, s48, 13		; GCN-NEXT: v_writelane_b32 v40, s48, 13
; GCN-NEXT: v_writelane_b32 v43, s49, 14		; GCN-NEXT: v_writelane_b32 v40, s49, 14
; GCN-NEXT: v_writelane_b32 v43, s50, 15		; GCN-NEXT: v_writelane_b32 v40, s50, 15
; GCN-NEXT: v_writelane_b32 v43, s51, 16		; GCN-NEXT: v_writelane_b32 v40, s51, 16
; GCN-NEXT: v_mov_b32_e32 v40, v31		; GCN-NEXT: v_mov_b32_e32 v41, v31
; GCN-NEXT: s_mov_b32 s34, s14		; GCN-NEXT: s_mov_b32 s34, s14
; GCN-NEXT: s_mov_b32 s35, s13		; GCN-NEXT: s_mov_b32 s35, s13
; GCN-NEXT: s_mov_b32 s36, s12		; GCN-NEXT: s_mov_b32 s36, s12
; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]		; GCN-NEXT: s_mov_b64 s[38:39], s[10:11]
; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]		; GCN-NEXT: s_mov_b64 s[40:41], s[8:9]
; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]		; GCN-NEXT: s_mov_b64 s[42:43], s[6:7]
; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]		; GCN-NEXT: s_mov_b64 s[44:45], s[4:5]
; GCN-NEXT: v_mov_b32_e32 v42, v1		; GCN-NEXT: v_mov_b32_e32 v43, v1
; GCN-NEXT: v_mov_b32_e32 v41, v0		; GCN-NEXT: v_mov_b32_e32 v42, v0
; GCN-NEXT: v_and_b32_e32 v0, 1, v2		; GCN-NEXT: v_and_b32_e32 v0, 1, v2
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
; GCN-NEXT: s_and_saveexec_b64 s[46:47], vcc		; GCN-NEXT: s_and_saveexec_b64 s[46:47], vcc
; GCN-NEXT: s_cbranch_execz BB5_4		; GCN-NEXT: s_cbranch_execz BB5_4
; GCN-NEXT: ; %bb.1: ; %bb1		; GCN-NEXT: ; %bb.1: ; %bb1
; GCN-NEXT: v_writelane_b32 v43, s30, 17		; GCN-NEXT: v_writelane_b32 v40, s30, 17
; GCN-NEXT: v_writelane_b32 v43, s31, 18		; GCN-NEXT: v_writelane_b32 v40, s31, 18
; GCN-NEXT: s_mov_b64 s[48:49], exec		; GCN-NEXT: s_mov_b64 s[48:49], exec
; GCN-NEXT: BB5_2: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: BB5_2: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s16, v41		; GCN-NEXT: v_readfirstlane_b32 s16, v42
; GCN-NEXT: v_readfirstlane_b32 s17, v42		; GCN-NEXT: v_readfirstlane_b32 s17, v43
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[41:42]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[16:17], v[42:43]
; GCN-NEXT: s_and_saveexec_b64 s[50:51], vcc		; GCN-NEXT: s_and_saveexec_b64 s[50:51], vcc
; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]		; GCN-NEXT: s_mov_b64 s[4:5], s[44:45]
; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]		; GCN-NEXT: s_mov_b64 s[6:7], s[42:43]
; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]		; GCN-NEXT: s_mov_b64 s[8:9], s[40:41]
; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]		; GCN-NEXT: s_mov_b64 s[10:11], s[38:39]
; GCN-NEXT: s_mov_b32 s12, s36		; GCN-NEXT: s_mov_b32 s12, s36
; GCN-NEXT: s_mov_b32 s13, s35		; GCN-NEXT: s_mov_b32 s13, s35
; GCN-NEXT: s_mov_b32 s14, s34		; GCN-NEXT: s_mov_b32 s14, s34
; GCN-NEXT: v_mov_b32_e32 v31, v40		; GCN-NEXT: v_mov_b32_e32 v31, v41
; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]		; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
; GCN-NEXT: s_xor_b64 exec, exec, s[50:51]		; GCN-NEXT: s_xor_b64 exec, exec, s[50:51]
; GCN-NEXT: s_cbranch_execnz BB5_2		; GCN-NEXT: s_cbranch_execnz BB5_2
; GCN-NEXT: ; %bb.3:		; GCN-NEXT: ; %bb.3:
; GCN-NEXT: s_mov_b64 exec, s[48:49]		; GCN-NEXT: s_mov_b64 exec, s[48:49]
; GCN-NEXT: v_readlane_b32 s30, v43, 17		; GCN-NEXT: v_readlane_b32 s30, v40, 17
; GCN-NEXT: v_readlane_b32 s31, v43, 18		; GCN-NEXT: v_readlane_b32 s31, v40, 18
; GCN-NEXT: BB5_4: ; %bb2		; GCN-NEXT: BB5_4: ; %bb2
; GCN-NEXT: s_or_b64 exec, exec, s[46:47]		; GCN-NEXT: s_or_b64 exec, exec, s[46:47]
; GCN-NEXT: v_readlane_b32 s51, v43, 16		; GCN-NEXT: v_readlane_b32 s51, v40, 16
; GCN-NEXT: v_readlane_b32 s50, v43, 15		; GCN-NEXT: v_readlane_b32 s50, v40, 15
; GCN-NEXT: v_readlane_b32 s49, v43, 14		; GCN-NEXT: v_readlane_b32 s49, v40, 14
; GCN-NEXT: v_readlane_b32 s48, v43, 13		; GCN-NEXT: v_readlane_b32 s48, v40, 13
; GCN-NEXT: v_readlane_b32 s47, v43, 12		; GCN-NEXT: v_readlane_b32 s47, v40, 12
; GCN-NEXT: v_readlane_b32 s46, v43, 11		; GCN-NEXT: v_readlane_b32 s46, v40, 11
; GCN-NEXT: v_readlane_b32 s45, v43, 10		; GCN-NEXT: v_readlane_b32 s45, v40, 10
; GCN-NEXT: v_readlane_b32 s44, v43, 9		; GCN-NEXT: v_readlane_b32 s44, v40, 9
; GCN-NEXT: v_readlane_b32 s43, v43, 8		; GCN-NEXT: v_readlane_b32 s43, v40, 8
; GCN-NEXT: v_readlane_b32 s42, v43, 7		; GCN-NEXT: v_readlane_b32 s42, v40, 7
; GCN-NEXT: v_readlane_b32 s41, v43, 6		; GCN-NEXT: v_readlane_b32 s41, v40, 6
; GCN-NEXT: v_readlane_b32 s40, v43, 5		; GCN-NEXT: v_readlane_b32 s40, v40, 5
; GCN-NEXT: v_readlane_b32 s39, v43, 4		; GCN-NEXT: v_readlane_b32 s39, v40, 4
; GCN-NEXT: v_readlane_b32 s38, v43, 3		; GCN-NEXT: v_readlane_b32 s38, v40, 3
; GCN-NEXT: v_readlane_b32 s36, v43, 2		; GCN-NEXT: v_readlane_b32 s36, v40, 2
; GCN-NEXT: v_readlane_b32 s35, v43, 1		; GCN-NEXT: v_readlane_b32 s35, v40, 1
; GCN-NEXT: v_readlane_b32 s34, v43, 0		; GCN-NEXT: v_readlane_b32 s34, v40, 0
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x800		; GCN-NEXT: s_sub_u32 s32, s32, 0x800
; GCN-NEXT: v_readlane_b32 s33, v43, 19		; GCN-NEXT: v_readlane_b32 s33, v40, 19
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb0:		bb0:
br i1 %cond, label %bb1, label %bb2		br i1 %cond, label %bb1, label %bb2

bb1:		bb1:
call void %fptr()		call void %fptr()
br label %bb2		br label %bb2

bb2:		bb2:
ret void		ret void
}		}

define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {		define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {
; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:		; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1		; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: v_writelane_b32 v42, s33, 6		; GCN-NEXT: v_writelane_b32 v40, s33, 6
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x400		; GCN-NEXT: s_add_u32 s32, s32, 0x400
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: v_writelane_b32 v42, s34, 0		; GCN-NEXT: v_writelane_b32 v40, s34, 0
; GCN-NEXT: v_writelane_b32 v42, s35, 1		; GCN-NEXT: v_writelane_b32 v40, s35, 1
; GCN-NEXT: v_writelane_b32 v42, s36, 2		; GCN-NEXT: v_writelane_b32 v40, s36, 2
; GCN-NEXT: v_writelane_b32 v42, s37, 3		; GCN-NEXT: v_writelane_b32 v40, s37, 3
; GCN-NEXT: v_writelane_b32 v42, s30, 4		; GCN-NEXT: v_writelane_b32 v40, s30, 4
; GCN-NEXT: v_writelane_b32 v42, s31, 5		; GCN-NEXT: v_writelane_b32 v40, s31, 5
; GCN-NEXT: v_mov_b32_e32 v41, v1		; GCN-NEXT: v_mov_b32_e32 v42, v1
; GCN-NEXT: v_mov_b32_e32 v40, v0		; GCN-NEXT: v_mov_b32_e32 v41, v0
; GCN-NEXT: s_mov_b64 s[34:35], exec		; GCN-NEXT: s_mov_b64 s[34:35], exec
; GCN-NEXT: BB6_1: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: BB6_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s6, v40		; GCN-NEXT: v_readfirstlane_b32 s6, v41
; GCN-NEXT: v_readfirstlane_b32 s7, v41		; GCN-NEXT: v_readfirstlane_b32 s7, v42
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[40:41]		; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[6:7], v[41:42]
; GCN-NEXT: s_and_saveexec_b64 s[36:37], vcc		; GCN-NEXT: s_and_saveexec_b64 s[36:37], vcc
; GCN-NEXT: s_movk_i32 s4, 0x7b		; GCN-NEXT: s_movk_i32 s4, 0x7b
; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]		; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]
; GCN-NEXT: s_xor_b64 exec, exec, s[36:37]		; GCN-NEXT: s_xor_b64 exec, exec, s[36:37]
; GCN-NEXT: s_cbranch_execnz BB6_1		; GCN-NEXT: s_cbranch_execnz BB6_1
; GCN-NEXT: ; %bb.2:		; GCN-NEXT: ; %bb.2:
; GCN-NEXT: s_mov_b64 exec, s[34:35]		; GCN-NEXT: s_mov_b64 exec, s[34:35]
; GCN-NEXT: v_readlane_b32 s4, v42, 4		; GCN-NEXT: v_readlane_b32 s4, v40, 4
; GCN-NEXT: v_readlane_b32 s5, v42, 5		; GCN-NEXT: v_readlane_b32 s5, v40, 5
; GCN-NEXT: v_readlane_b32 s37, v42, 3		; GCN-NEXT: v_readlane_b32 s37, v40, 3
; GCN-NEXT: v_readlane_b32 s36, v42, 2		; GCN-NEXT: v_readlane_b32 s36, v40, 2
; GCN-NEXT: v_readlane_b32 s35, v42, 1		; GCN-NEXT: v_readlane_b32 s35, v40, 1
; GCN-NEXT: v_readlane_b32 s34, v42, 0		; GCN-NEXT: v_readlane_b32 s34, v40, 0
; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x400		; GCN-NEXT: s_sub_u32 s32, s32, 0x400
; GCN-NEXT: v_readlane_b32 s33, v42, 6		; GCN-NEXT: v_readlane_b32 s33, v40, 6
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
call amdgpu_gfx void %fptr(i32 inreg 123)		call amdgpu_gfx void %fptr(i32 inreg 123)
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	; GCN-O0-NEXT: Slot index numbering			; GCN-O0-NEXT: Slot index numbering
	; GCN-O0-NEXT: Live Interval Analysis			; GCN-O0-NEXT: Live Interval Analysis
	; GCN-O0-NEXT: MachinePostDominator Tree Construction			; GCN-O0-NEXT: MachinePostDominator Tree Construction
	; GCN-O0-NEXT: SI Whole Quad Mode			; GCN-O0-NEXT: SI Whole Quad Mode
	; GCN-O0-NEXT: Virtual Register Map			; GCN-O0-NEXT: Virtual Register Map
	; GCN-O0-NEXT: Live Register Matrix			; GCN-O0-NEXT: Live Register Matrix
	; GCN-O0-NEXT: SI Pre-allocate WWM Registers			; GCN-O0-NEXT: SI Pre-allocate WWM Registers
	; GCN-O0-NEXT: Fast Register Allocator			; GCN-O0-NEXT: Fast Register Allocator
	; GCN-O0-NEXT: SI Fix VGPR copies
	; GCN-O0-NEXT: SI lower SGPR spill instructions			; GCN-O0-NEXT: SI lower SGPR spill instructions
				; GCN-O0-NEXT: Fast Register Allocator
				; GCN-O0-NEXT: SI Fix VGPR copies
	; GCN-O0-NEXT: Fixup Statepoint Caller Saved			; GCN-O0-NEXT: Fixup Statepoint Caller Saved
	; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O0-NEXT: Machine Optimization Remark Emitter			; GCN-O0-NEXT: Machine Optimization Remark Emitter
	; GCN-O0-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O0-NEXT: Prologue/Epilogue Insertion & Frame Finalization
	; GCN-O0-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O0-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O0-NEXT: SI post-RA bundler			; GCN-O0-NEXT: SI post-RA bundler
	; GCN-O0-NEXT: Insert fentry calls			; GCN-O0-NEXT: Insert fentry calls
	; GCN-O0-NEXT: Insert XRay ops			; GCN-O0-NEXT: Insert XRay ops
	; GCN-O1-NEXT: Live Stack Slot Analysis			; GCN-O1-NEXT: Live Stack Slot Analysis
	; GCN-O1-NEXT: Virtual Register Map			; GCN-O1-NEXT: Virtual Register Map
	; GCN-O1-NEXT: Live Register Matrix			; GCN-O1-NEXT: Live Register Matrix
	; GCN-O1-NEXT: Bundle Machine CFG Edges			; GCN-O1-NEXT: Bundle Machine CFG Edges
	; GCN-O1-NEXT: Spill Code Placement Analysis			; GCN-O1-NEXT: Spill Code Placement Analysis
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: Greedy Register Allocator			; GCN-O1-NEXT: Greedy Register Allocator
				; GCN-O1-NEXT: Virtual Register Rewriter
				; GCN-O1-NEXT: SI lower SGPR spill instructions
				; GCN-O1-NEXT: Virtual Register Map
				; GCN-O1-NEXT: Live Register Matrix
				; GCN-O1-NEXT: Machine Optimization Remark Emitter
				; GCN-O1-NEXT: Greedy Register Allocator
	; GCN-O1-NEXT: GCN NSA Reassign			; GCN-O1-NEXT: GCN NSA Reassign
	; GCN-O1-NEXT: Virtual Register Rewriter			; GCN-O1-NEXT: Virtual Register Rewriter
	; GCN-O1-NEXT: Stack Slot Coloring			; GCN-O1-NEXT: Stack Slot Coloring
	; GCN-O1-NEXT: Machine Copy Propagation Pass			; GCN-O1-NEXT: Machine Copy Propagation Pass
	; GCN-O1-NEXT: Machine Loop Invariant Code Motion			; GCN-O1-NEXT: Machine Loop Invariant Code Motion
	; GCN-O1-NEXT: SI Fix VGPR copies			; GCN-O1-NEXT: SI Fix VGPR copies
	; GCN-O1-NEXT: SI optimize exec mask operations			; GCN-O1-NEXT: SI optimize exec mask operations
	; GCN-O1-NEXT: SI lower SGPR spill instructions
	; GCN-O1-NEXT: Fixup Statepoint Caller Saved			; GCN-O1-NEXT: Fixup Statepoint Caller Saved
	; GCN-O1-NEXT: PostRA Machine Sink			; GCN-O1-NEXT: PostRA Machine Sink
	; GCN-O1-NEXT: MachineDominator Tree Construction			; GCN-O1-NEXT: MachineDominator Tree Construction
	; GCN-O1-NEXT: Machine Natural Loop Construction			; GCN-O1-NEXT: Machine Natural Loop Construction
	; GCN-O1-NEXT: Machine Block Frequency Analysis			; GCN-O1-NEXT: Machine Block Frequency Analysis
	; GCN-O1-NEXT: MachinePostDominator Tree Construction			; GCN-O1-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: Live Stack Slot Analysis			; GCN-O1-OPTS-NEXT: Live Stack Slot Analysis
	; GCN-O1-OPTS-NEXT: Virtual Register Map			; GCN-O1-OPTS-NEXT: Virtual Register Map
	; GCN-O1-OPTS-NEXT: Live Register Matrix			; GCN-O1-OPTS-NEXT: Live Register Matrix
	; GCN-O1-OPTS-NEXT: Bundle Machine CFG Edges			; GCN-O1-OPTS-NEXT: Bundle Machine CFG Edges
	; GCN-O1-OPTS-NEXT: Spill Code Placement Analysis			; GCN-O1-OPTS-NEXT: Spill Code Placement Analysis
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: Greedy Register Allocator			; GCN-O1-OPTS-NEXT: Greedy Register Allocator
				; GCN-O1-OPTS-NEXT: Virtual Register Rewriter
				; GCN-O1-OPTS-NEXT: SI lower SGPR spill instructions
				; GCN-O1-OPTS-NEXT: Virtual Register Map
				; GCN-O1-OPTS-NEXT: Live Register Matrix
				; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
				; GCN-O1-OPTS-NEXT: Greedy Register Allocator
	; GCN-O1-OPTS-NEXT: GCN NSA Reassign			; GCN-O1-OPTS-NEXT: GCN NSA Reassign
	; GCN-O1-OPTS-NEXT: Virtual Register Rewriter			; GCN-O1-OPTS-NEXT: Virtual Register Rewriter
	; GCN-O1-OPTS-NEXT: Stack Slot Coloring			; GCN-O1-OPTS-NEXT: Stack Slot Coloring
	; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass			; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass
	; GCN-O1-OPTS-NEXT: Machine Loop Invariant Code Motion			; GCN-O1-OPTS-NEXT: Machine Loop Invariant Code Motion
	; GCN-O1-OPTS-NEXT: SI Fix VGPR copies			; GCN-O1-OPTS-NEXT: SI Fix VGPR copies
	; GCN-O1-OPTS-NEXT: SI optimize exec mask operations			; GCN-O1-OPTS-NEXT: SI optimize exec mask operations
	; GCN-O1-OPTS-NEXT: SI lower SGPR spill instructions
	; GCN-O1-OPTS-NEXT: Fixup Statepoint Caller Saved			; GCN-O1-OPTS-NEXT: Fixup Statepoint Caller Saved
	; GCN-O1-OPTS-NEXT: PostRA Machine Sink			; GCN-O1-OPTS-NEXT: PostRA Machine Sink
	; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction
	; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction			; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction
	; GCN-O1-OPTS-NEXT: Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: Live Stack Slot Analysis			; GCN-O2-NEXT: Live Stack Slot Analysis
	; GCN-O2-NEXT: Virtual Register Map			; GCN-O2-NEXT: Virtual Register Map
	; GCN-O2-NEXT: Live Register Matrix			; GCN-O2-NEXT: Live Register Matrix
	; GCN-O2-NEXT: Bundle Machine CFG Edges			; GCN-O2-NEXT: Bundle Machine CFG Edges
	; GCN-O2-NEXT: Spill Code Placement Analysis			; GCN-O2-NEXT: Spill Code Placement Analysis
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: Greedy Register Allocator			; GCN-O2-NEXT: Greedy Register Allocator
				; GCN-O2-NEXT: Virtual Register Rewriter
				; GCN-O2-NEXT: SI lower SGPR spill instructions
				; GCN-O2-NEXT: Virtual Register Map
				; GCN-O2-NEXT: Live Register Matrix
				; GCN-O2-NEXT: Machine Optimization Remark Emitter
				; GCN-O2-NEXT: Greedy Register Allocator
	; GCN-O2-NEXT: GCN NSA Reassign			; GCN-O2-NEXT: GCN NSA Reassign
	; GCN-O2-NEXT: Virtual Register Rewriter			; GCN-O2-NEXT: Virtual Register Rewriter
	; GCN-O2-NEXT: Stack Slot Coloring			; GCN-O2-NEXT: Stack Slot Coloring
	; GCN-O2-NEXT: Machine Copy Propagation Pass			; GCN-O2-NEXT: Machine Copy Propagation Pass
	; GCN-O2-NEXT: Machine Loop Invariant Code Motion			; GCN-O2-NEXT: Machine Loop Invariant Code Motion
	; GCN-O2-NEXT: SI Fix VGPR copies			; GCN-O2-NEXT: SI Fix VGPR copies
	; GCN-O2-NEXT: SI optimize exec mask operations			; GCN-O2-NEXT: SI optimize exec mask operations
	; GCN-O2-NEXT: SI lower SGPR spill instructions
	; GCN-O2-NEXT: Fixup Statepoint Caller Saved			; GCN-O2-NEXT: Fixup Statepoint Caller Saved
	; GCN-O2-NEXT: PostRA Machine Sink			; GCN-O2-NEXT: PostRA Machine Sink
	; GCN-O2-NEXT: MachineDominator Tree Construction			; GCN-O2-NEXT: MachineDominator Tree Construction
	; GCN-O2-NEXT: Machine Natural Loop Construction			; GCN-O2-NEXT: Machine Natural Loop Construction
	; GCN-O2-NEXT: Machine Block Frequency Analysis			; GCN-O2-NEXT: Machine Block Frequency Analysis
	; GCN-O2-NEXT: MachinePostDominator Tree Construction			; GCN-O2-NEXT: MachinePostDominator Tree Construction
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: Live Stack Slot Analysis			; GCN-O3-NEXT: Live Stack Slot Analysis
	; GCN-O3-NEXT: Virtual Register Map			; GCN-O3-NEXT: Virtual Register Map
	; GCN-O3-NEXT: Live Register Matrix			; GCN-O3-NEXT: Live Register Matrix
	; GCN-O3-NEXT: Bundle Machine CFG Edges			; GCN-O3-NEXT: Bundle Machine CFG Edges
	; GCN-O3-NEXT: Spill Code Placement Analysis			; GCN-O3-NEXT: Spill Code Placement Analysis
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: Greedy Register Allocator			; GCN-O3-NEXT: Greedy Register Allocator
				; GCN-O3-NEXT: Virtual Register Rewriter
				; GCN-O3-NEXT: SI lower SGPR spill instructions
				; GCN-O3-NEXT: Virtual Register Map
				; GCN-O3-NEXT: Live Register Matrix
				; GCN-O3-NEXT: Machine Optimization Remark Emitter
				; GCN-O3-NEXT: Greedy Register Allocator
	; GCN-O3-NEXT: GCN NSA Reassign			; GCN-O3-NEXT: GCN NSA Reassign
	; GCN-O3-NEXT: Virtual Register Rewriter			; GCN-O3-NEXT: Virtual Register Rewriter
	; GCN-O3-NEXT: Stack Slot Coloring			; GCN-O3-NEXT: Stack Slot Coloring
	; GCN-O3-NEXT: Machine Copy Propagation Pass			; GCN-O3-NEXT: Machine Copy Propagation Pass
	; GCN-O3-NEXT: Machine Loop Invariant Code Motion			; GCN-O3-NEXT: Machine Loop Invariant Code Motion
	; GCN-O3-NEXT: SI Fix VGPR copies			; GCN-O3-NEXT: SI Fix VGPR copies
	; GCN-O3-NEXT: SI optimize exec mask operations			; GCN-O3-NEXT: SI optimize exec mask operations
	; GCN-O3-NEXT: SI lower SGPR spill instructions
	; GCN-O3-NEXT: Fixup Statepoint Caller Saved			; GCN-O3-NEXT: Fixup Statepoint Caller Saved
	; GCN-O3-NEXT: PostRA Machine Sink			; GCN-O3-NEXT: PostRA Machine Sink
	; GCN-O3-NEXT: MachineDominator Tree Construction			; GCN-O3-NEXT: MachineDominator Tree Construction
	; GCN-O3-NEXT: Machine Natural Loop Construction			; GCN-O3-NEXT: Machine Natural Loop Construction
	; GCN-O3-NEXT: Machine Block Frequency Analysis			; GCN-O3-NEXT: Machine Block Frequency Analysis
	; GCN-O3-NEXT: MachinePostDominator Tree Construction			; GCN-O3-NEXT: MachinePostDominator Tree Construction
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

; CHECK-NOT: mul i32
ret void		ret void
}		}

define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {		define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {
; GFX9-LABEL: slsr1_1:		; GFX9-LABEL: slsr1_1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GFX9-NEXT: s_mov_b64 exec, s[4:5]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_writelane_b32 v43, s33, 4		; GFX9-NEXT: v_writelane_b32 v40, s33, 4
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_add_u32 s32, s32, 0x800		; GFX9-NEXT: s_add_u32 s32, s32, 0x800
; GFX9-NEXT: v_writelane_b32 v43, s34, 0		; GFX9-NEXT: v_writelane_b32 v40, s34, 0
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12
; GFX9-NEXT: v_writelane_b32 v43, s35, 1		; GFX9-NEXT: v_writelane_b32 v40, s35, 1
; GFX9-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
; GFX9-NEXT: v_mov_b32_e32 v40, v1		; GFX9-NEXT: v_mov_b32_e32 v41, v1
; GFX9-NEXT: v_mov_b32_e32 v41, v0		; GFX9-NEXT: v_mov_b32_e32 v42, v0
; GFX9-NEXT: v_writelane_b32 v43, s30, 2		; GFX9-NEXT: v_writelane_b32 v40, s30, 2
; GFX9-NEXT: v_mul_u32_u24_e32 v0, v41, v40		; GFX9-NEXT: v_mul_u32_u24_e32 v0, v42, v41
; GFX9-NEXT: v_writelane_b32 v43, s31, 3		; GFX9-NEXT: v_writelane_b32 v40, s31, 3
; GFX9-NEXT: v_and_b32_e32 v42, 0xffffff, v40		; GFX9-NEXT: v_and_b32_e32 v43, 0xffffff, v41
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_mad_u32_u24 v40, v41, v40, v42		; GFX9-NEXT: v_mad_u32_u24 v41, v42, v41, v43
; GFX9-NEXT: v_mov_b32_e32 v0, v40		; GFX9-NEXT: v_mov_b32_e32 v0, v41
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: v_add_u32_e32 v0, v40, v42		; GFX9-NEXT: v_add_u32_e32 v0, v41, v43
; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
; GFX9-NEXT: v_readlane_b32 s4, v43, 2		; GFX9-NEXT: v_readlane_b32 s4, v40, 2
; GFX9-NEXT: v_readlane_b32 s5, v43, 3		; GFX9-NEXT: v_readlane_b32 s5, v40, 3
; GFX9-NEXT: v_readlane_b32 s35, v43, 1		; GFX9-NEXT: v_readlane_b32 s35, v40, 1
; GFX9-NEXT: v_readlane_b32 s34, v43, 0		; GFX9-NEXT: v_readlane_b32 s34, v40, 0
; GFX9-NEXT: s_sub_u32 s32, s32, 0x800		; GFX9-NEXT: s_sub_u32 s32, s32, 0x800
; GFX9-NEXT: v_readlane_b32 s33, v43, 4		; GFX9-NEXT: v_readlane_b32 s33, v40, 4
; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1		; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GFX9-NEXT: s_mov_b64 exec, s[6:7]		; GFX9-NEXT: s_mov_b64 exec, s[6:7]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[4:5]		; GFX9-NEXT: s_setpc_b64 s[4:5]
%b = and i32 %b.arg, 16777215		%b = and i32 %b.arg, 16777215
%s = and i32 %s.arg, 16777215		%s = and i32 %s.arg, 16777215

; CHECK-LABEL: @slsr1(		; CHECK-LABEL: @slsr1(
; foo(b * s);		; foo(b * s);

llvm/test/CodeGen/AMDGPU/pei-build-spill.mir


	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-spill-vgpr-to-agpr=0 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF %s
	# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck -check-prefix=MUBUF-V2A %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-spill-vgpr-to-agpr=1 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF-V2A %s
	# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-enable-flat-scratch -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-enable-flat-scratch -amdgpu-spill-vgpr-to-agpr=0 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR %s
	# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-enable-flat-scratch -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-V2A %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs -amdgpu-enable-flat-scratch -amdgpu-spill-vgpr-to-agpr=1 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-V2A %s
	# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF-GFX90A %s			# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-spill-vgpr-to-agpr=0 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF-GFX90A %s
	# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck -check-prefix=MUBUF-GFX90A-V2A %s			# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-spill-vgpr-to-agpr=1 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=MUBUF-GFX90A-V2A %s
	# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-enable-flat-scratch -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-GFX90A %s			# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-enable-flat-scratch -amdgpu-spill-vgpr-to-agpr=0 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-GFX90A %s
	# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-enable-flat-scratch -run-pass=si-lower-sgpr-spills,prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-GFX90A-V2A %s			# RUN: llc -march=amdgcn -mcpu=gfx90a -verify-machineinstrs -amdgpu-enable-flat-scratch -amdgpu-spill-vgpr-to-agpr=1 -run-pass=prologepilog -o - %s | FileCheck -check-prefix=FLATSCR-GFX90A-V2A %s

	---			---
	name: test_spill_v1			name: test_spill_v1
	tracksRegLiveness: true			tracksRegLiveness: true
	stack:			stack:
	- { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4 }			- { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4 }
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll

This file was added.

				; REQUIRES: asserts

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT %s
				; RUN: llc -sgpr-regalloc=greedy -vgpr-regalloc=greedy -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT %s

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=O0 %s

				; RUN: llc -vgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT-BASIC %s
				; RUN: llc -sgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=BASIC-DEFAULT %s
				; RUN: llc -sgpr-regalloc=basic -vgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=BASIC-BASIC %s

				; RUN: not --crash llc -regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=REGALLOC %s
				; RUN: not --crash llc -regalloc=fast -O0 -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=REGALLOC %s


				; REGALLOC: -regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc

				; DEFAULT: Greedy Register Allocator
				; DEFAULT-NEXT: Virtual Register Rewriter
				; DEFAULT-NEXT: SI lower SGPR spill instructions
				; DEFAULT-NEXT: Virtual Register Map
				; DEFAULT-NEXT: Live Register Matrix
				; DEFAULT-NEXT: Machine Optimization Remark Emitter
				; DEFAULT-NEXT: Greedy Register Allocator
				; DEFAULT-NEXT: GCN NSA Reassign
				; DEFAULT-NEXT: Virtual Register Rewriter
				; DEFAULT-NEXT: Stack Slot Coloring

				; O0: Fast Register Allocator
				; O0-NEXT: SI lower SGPR spill instructions
				; O0-NEXT: Fast Register Allocator
				; O0-NEXT: SI Fix VGPR copies




				; BASIC-DEFAULT: Debug Variable Analysis
				; BASIC-DEFAULT-NEXT: Live Stack Slot Analysis
				; BASIC-DEFAULT-NEXT: Machine Natural Loop Construction
				; BASIC-DEFAULT-NEXT: Machine Block Frequency Analysis
				; BASIC-DEFAULT-NEXT: Virtual Register Map
				; BASIC-DEFAULT-NEXT: Live Register Matrix
				; BASIC-DEFAULT-NEXT: Basic Register Allocator
				; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
				; BASIC-DEFAULT-NEXT: SI lower SGPR spill instructions
				; BASIC-DEFAULT-NEXT: Virtual Register Map
				; BASIC-DEFAULT-NEXT: Live Register Matrix
				; BASIC-DEFAULT-NEXT: Bundle Machine CFG Edges
				; BASIC-DEFAULT-NEXT: Spill Code Placement Analysis
				; BASIC-DEFAULT-NEXT: Lazy Machine Block Frequency Analysis
				; BASIC-DEFAULT-NEXT: Machine Optimization Remark Emitter
				; BASIC-DEFAULT-NEXT: Greedy Register Allocator
				; BASIC-DEFAULT-NEXT: GCN NSA Reassign
				; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
				; BASIC-DEFAULT-NEXT: Stack Slot Coloring



				; DEFAULT-BASIC: Greedy Register Allocator
				; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
				; DEFAULT-BASIC-NEXT: SI lower SGPR spill instructions
				; DEFAULT-BASIC-NEXT: Virtual Register Map
				; DEFAULT-BASIC-NEXT: Live Register Matrix
				; DEFAULT-BASIC-NEXT: Basic Register Allocator
				; DEFAULT-BASIC-NEXT: GCN NSA Reassign
				; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
				; DEFAULT-BASIC-NEXT: Stack Slot Coloring



				; BASIC-BASIC: Debug Variable Analysis
				; BASIC-BASIC-NEXT: Live Stack Slot Analysis
				; BASIC-BASIC-NEXT: Machine Natural Loop Construction
				; BASIC-BASIC-NEXT: Machine Block Frequency Analysis
				; BASIC-BASIC-NEXT: Virtual Register Map
				; BASIC-BASIC-NEXT: Live Register Matrix
				; BASIC-BASIC-NEXT: Basic Register Allocator
				; BASIC-BASIC-NEXT: Virtual Register Rewriter
				; BASIC-BASIC-NEXT: SI lower SGPR spill instructions
				; BASIC-BASIC-NEXT: Virtual Register Map
				; BASIC-BASIC-NEXT: Live Register Matrix
				; BASIC-BASIC-NEXT: Basic Register Allocator
				; BASIC-BASIC-NEXT: GCN NSA Reassign
				; BASIC-BASIC-NEXT: Virtual Register Rewriter
				; BASIC-BASIC-NEXT: Stack Slot Coloring


				declare void @bar()

				; Something with some CSR SGPR spills
				define void @foo() {
				call void asm sideeffect "; clobber", "~{s33}"()
				call void @bar()
				ret void
				}

				; Block live out spills with fast regalloc
				define amdgpu_kernel void @control_flow(i1 %cond) {
				%s33 = call i32 asm sideeffect "; clobber", "={s33}"()
				br i1 %cond, label %bb0, label %bb1

				bb0:
				call void asm sideeffect "; use %0", "s"(i32 %s33)
				br label %bb1

				bb1:
				ret void
				}

llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

				; The first 64 SGPR spills can go to a VGPR, but there isn't a second
				; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.

				define amdgpu_kernel void @partial_no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
				; GCN-LABEL: partial_no_vgprs_last_sgpr_spill:
				; GCN: ; %bb.0:
				; GCN-NEXT: s_add_u32 s0, s0, s7
				; GCN-NEXT: s_addc_u32 s1, s1, 0
				; GCN-NEXT: s_load_dword s4, s[4:5], 0x2
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; def s[8:23]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_writelane_b32 v23, s8, 0
				; GCN-NEXT: v_writelane_b32 v23, s9, 1
				; GCN-NEXT: v_writelane_b32 v23, s10, 2
				; GCN-NEXT: v_writelane_b32 v23, s11, 3
				; GCN-NEXT: v_writelane_b32 v23, s12, 4
				; GCN-NEXT: v_writelane_b32 v23, s13, 5
				; GCN-NEXT: v_writelane_b32 v23, s14, 6
				; GCN-NEXT: v_writelane_b32 v23, s15, 7
				; GCN-NEXT: v_writelane_b32 v23, s16, 8
				; GCN-NEXT: v_writelane_b32 v23, s17, 9
				; GCN-NEXT: v_writelane_b32 v23, s18, 10
				; GCN-NEXT: v_writelane_b32 v23, s19, 11
				; GCN-NEXT: v_writelane_b32 v23, s20, 12
				; GCN-NEXT: v_writelane_b32 v23, s21, 13
				; GCN-NEXT: v_writelane_b32 v23, s22, 14
				; GCN-NEXT: v_writelane_b32 v23, s23, 15
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; def s[8:23]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_writelane_b32 v23, s8, 16
				; GCN-NEXT: v_writelane_b32 v23, s9, 17
				; GCN-NEXT: v_writelane_b32 v23, s10, 18
				; GCN-NEXT: v_writelane_b32 v23, s11, 19
				; GCN-NEXT: v_writelane_b32 v23, s12, 20
				; GCN-NEXT: v_writelane_b32 v23, s13, 21
				; GCN-NEXT: v_writelane_b32 v23, s14, 22
				; GCN-NEXT: v_writelane_b32 v23, s15, 23
				; GCN-NEXT: v_writelane_b32 v23, s16, 24
				; GCN-NEXT: v_writelane_b32 v23, s17, 25
				; GCN-NEXT: v_writelane_b32 v23, s18, 26
				; GCN-NEXT: v_writelane_b32 v23, s19, 27
				; GCN-NEXT: v_writelane_b32 v23, s20, 28
				; GCN-NEXT: v_writelane_b32 v23, s21, 29
				; GCN-NEXT: v_writelane_b32 v23, s22, 30
				; GCN-NEXT: v_writelane_b32 v23, s23, 31
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; def s[8:23]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_writelane_b32 v23, s8, 32
				; GCN-NEXT: v_writelane_b32 v23, s9, 33
				; GCN-NEXT: v_writelane_b32 v23, s10, 34
				; GCN-NEXT: v_writelane_b32 v23, s11, 35
				; GCN-NEXT: v_writelane_b32 v23, s12, 36
				; GCN-NEXT: v_writelane_b32 v23, s13, 37
				; GCN-NEXT: v_writelane_b32 v23, s14, 38
				; GCN-NEXT: v_writelane_b32 v23, s15, 39
				; GCN-NEXT: v_writelane_b32 v23, s16, 40
				; GCN-NEXT: v_writelane_b32 v23, s17, 41
				; GCN-NEXT: v_writelane_b32 v23, s18, 42
				; GCN-NEXT: v_writelane_b32 v23, s19, 43
				; GCN-NEXT: v_writelane_b32 v23, s20, 44
				; GCN-NEXT: v_writelane_b32 v23, s21, 45
				; GCN-NEXT: v_writelane_b32 v23, s22, 46
				; GCN-NEXT: v_writelane_b32 v23, s23, 47
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; def s[8:23]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_writelane_b32 v23, s8, 48
				; GCN-NEXT: v_writelane_b32 v23, s9, 49
				; GCN-NEXT: v_writelane_b32 v23, s10, 50
				; GCN-NEXT: v_writelane_b32 v23, s11, 51
				; GCN-NEXT: v_writelane_b32 v23, s12, 52
				; GCN-NEXT: v_writelane_b32 v23, s13, 53
				; GCN-NEXT: v_writelane_b32 v23, s14, 54
				; GCN-NEXT: v_writelane_b32 v23, s15, 55
				; GCN-NEXT: v_writelane_b32 v23, s16, 56
				; GCN-NEXT: v_writelane_b32 v23, s17, 57
				; GCN-NEXT: v_writelane_b32 v23, s18, 58
				; GCN-NEXT: v_writelane_b32 v23, s19, 59
				; GCN-NEXT: v_writelane_b32 v23, s20, 60
				; GCN-NEXT: v_writelane_b32 v23, s21, 61
				; GCN-NEXT: v_writelane_b32 v23, s22, 62
				; GCN-NEXT: v_writelane_b32 v23, s23, 63
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; def s[6:7]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: s_mov_b64 s[8:9], exec
				; GCN-NEXT: s_mov_b64 exec, 3
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
				; GCN-NEXT: v_writelane_b32 v0, s6, 0
				; GCN-NEXT: v_writelane_b32 v0, s7, 1
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_mov_b64 exec, s[8:9]
				; GCN-NEXT: s_mov_b32 s5, 0
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_cmp_lg_u32 s4, s5
				; GCN-NEXT: s_cbranch_scc1 BB0_2
				; GCN-NEXT: ; %bb.1: ; %bb0
				; GCN-NEXT: v_readlane_b32 s4, v23, 0
				; GCN-NEXT: v_readlane_b32 s5, v23, 1
				; GCN-NEXT: v_readlane_b32 s6, v23, 2
				; GCN-NEXT: v_readlane_b32 s7, v23, 3
				; GCN-NEXT: v_readlane_b32 s8, v23, 4
				; GCN-NEXT: v_readlane_b32 s9, v23, 5
				; GCN-NEXT: v_readlane_b32 s10, v23, 6
				; GCN-NEXT: v_readlane_b32 s11, v23, 7
				; GCN-NEXT: v_readlane_b32 s12, v23, 8
				; GCN-NEXT: v_readlane_b32 s13, v23, 9
				; GCN-NEXT: v_readlane_b32 s14, v23, 10
				; GCN-NEXT: v_readlane_b32 s15, v23, 11
				; GCN-NEXT: v_readlane_b32 s16, v23, 12
				; GCN-NEXT: v_readlane_b32 s17, v23, 13
				; GCN-NEXT: v_readlane_b32 s18, v23, 14
				; GCN-NEXT: v_readlane_b32 s19, v23, 15
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; use s[4:19]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_readlane_b32 s4, v23, 16
				; GCN-NEXT: v_readlane_b32 s5, v23, 17
				; GCN-NEXT: v_readlane_b32 s6, v23, 18
				; GCN-NEXT: v_readlane_b32 s7, v23, 19
				; GCN-NEXT: v_readlane_b32 s8, v23, 20
				; GCN-NEXT: v_readlane_b32 s9, v23, 21
				; GCN-NEXT: v_readlane_b32 s10, v23, 22
				; GCN-NEXT: v_readlane_b32 s11, v23, 23
				; GCN-NEXT: v_readlane_b32 s12, v23, 24
				; GCN-NEXT: v_readlane_b32 s13, v23, 25
				; GCN-NEXT: v_readlane_b32 s14, v23, 26
				; GCN-NEXT: v_readlane_b32 s15, v23, 27
				; GCN-NEXT: v_readlane_b32 s16, v23, 28
				; GCN-NEXT: v_readlane_b32 s17, v23, 29
				; GCN-NEXT: v_readlane_b32 s18, v23, 30
				; GCN-NEXT: v_readlane_b32 s19, v23, 31
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; use s[4:19]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_readlane_b32 s4, v23, 32
				; GCN-NEXT: v_readlane_b32 s5, v23, 33
				; GCN-NEXT: v_readlane_b32 s6, v23, 34
				; GCN-NEXT: v_readlane_b32 s7, v23, 35
				; GCN-NEXT: v_readlane_b32 s8, v23, 36
				; GCN-NEXT: v_readlane_b32 s9, v23, 37
				; GCN-NEXT: v_readlane_b32 s10, v23, 38
				; GCN-NEXT: v_readlane_b32 s11, v23, 39
				; GCN-NEXT: v_readlane_b32 s12, v23, 40
				; GCN-NEXT: v_readlane_b32 s13, v23, 41
				; GCN-NEXT: v_readlane_b32 s14, v23, 42
				; GCN-NEXT: v_readlane_b32 s15, v23, 43
				; GCN-NEXT: v_readlane_b32 s16, v23, 44
				; GCN-NEXT: v_readlane_b32 s17, v23, 45
				; GCN-NEXT: v_readlane_b32 s18, v23, 46
				; GCN-NEXT: v_readlane_b32 s19, v23, 47
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; use s[4:19]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: v_readlane_b32 s8, v23, 48
				; GCN-NEXT: v_readlane_b32 s9, v23, 49
				; GCN-NEXT: v_readlane_b32 s10, v23, 50
				; GCN-NEXT: v_readlane_b32 s11, v23, 51
				; GCN-NEXT: v_readlane_b32 s12, v23, 52
				; GCN-NEXT: v_readlane_b32 s13, v23, 53
				; GCN-NEXT: v_readlane_b32 s14, v23, 54
				; GCN-NEXT: v_readlane_b32 s15, v23, 55
				; GCN-NEXT: v_readlane_b32 s16, v23, 56
				; GCN-NEXT: v_readlane_b32 s17, v23, 57
				; GCN-NEXT: v_readlane_b32 s18, v23, 58
				; GCN-NEXT: v_readlane_b32 s19, v23, 59
				; GCN-NEXT: v_readlane_b32 s20, v23, 60
				; GCN-NEXT: v_readlane_b32 s21, v23, 61
				; GCN-NEXT: v_readlane_b32 s22, v23, 62
				; GCN-NEXT: v_readlane_b32 s23, v23, 63
				; GCN-NEXT: s_mov_b64 s[6:7], exec
				; GCN-NEXT: s_mov_b64 exec, 3
				; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_readlane_b32 s4, v0, 0
				; GCN-NEXT: v_readlane_b32 s5, v0, 1
				; GCN-NEXT: buffer_load_dword v0, off, s[0:3], 0
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_mov_b64 exec, s[6:7]
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; use s[8:23]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: ;;#ASMSTART
				; GCN-NEXT: ; use s[4:5]
				; GCN-NEXT: ;;#ASMEND
				; GCN-NEXT: BB0_2: ; %ret
				; GCN-NEXT: s_endpgm
				call void asm sideeffect "", "~{v[0:7]}" () #0
				call void asm sideeffect "", "~{v[8:15]}" () #0
				call void asm sideeffect "", "~{v[16:19]}"() #0
				call void asm sideeffect "", "~{v[20:21]}"() #0
				call void asm sideeffect "", "~{v22}"() #0

				%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr3 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0
				%cmp = icmp eq i32 %in, 0
				br i1 %cmp, label %bb0, label %ret

				bb0:
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr3) #0
				call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
				br label %ret

				ret:
				ret void
				}

				attributes #0 = { nounwind }
				attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }

llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck -check-prefixes=SHARE,GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -run-pass=greedy,virtregrewriter,stack-slot-coloring -o - %s | FileCheck -check-prefixes=SHARE,GCN %s
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -start-before=greedy -stop-after=stack-slot-coloring -no-stack-slot-sharing -o - %s | FileCheck -check-prefixes=NOSHARE,GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -run-pass=greedy,virtregrewriter,stack-slot-coloring -no-stack-slot-sharing -o - %s | FileCheck -check-prefixes=NOSHARE,GCN %s

				# -run-pass is used to artificially avoid using split register allocation, which would avoid stressing StackSlotColoring.


	# Make sure that stack slot coloring doesn't try to merge frame			# Make sure that stack slot coloring doesn't try to merge frame
	# indexes used for SGPR spilling with those that aren't.			# indexes used for SGPR spilling with those that aren't.
	# Even when stack slot sharing was disabled, it was still moving the			# Even when stack slot sharing was disabled, it was still moving the
	# FI ID used for an SGPR spill to a normal frame index.			# FI ID used for an SGPR spill to a normal frame index.

	--- |			--- |


llvm/test/CodeGen/AMDGPU/sibling-call.ll

	entry:			entry:
	%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)			%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)
	ret i32 %ret			ret i32 %ret
	}			}

	; Have another non-tail in the function			; Have another non-tail in the function
	; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:			; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:
	; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1			; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword [[CSRV:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec			; GCN-NEXT: s_mov_b64 exec
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_add_u32 s32, s32, 0x400

	; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-DAG: v_writelane_b32 v42, s34, 0			; GCN-DAG: v_writelane_b32 [[CSRV]], s34, 0
	; GCN-DAG: v_writelane_b32 v42, s35, 1			; GCN-DAG: v_writelane_b32 [[CSRV]], s35, 1

	; GCN-DAG: s_getpc_b64 s[4:5]			; GCN-DAG: s_getpc_b64 s[4:5]
	; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4			; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4
	; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12			; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12


	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-DAG: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload

	; GCN: s_getpc_b64 s[4:5]			; GCN: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12

	; GCN-DAG: v_readlane_b32 s34, v42, 0			; GCN-DAG: v_readlane_b32 s34, [[CSRV]], 0
	; GCN-DAG: v_readlane_b32 s35, v42, 1			; GCN-DAG: v_readlane_b32 s35, [[CSRV]], 1

	; GCN: s_sub_u32 s32, s32, 0x400			; GCN: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s33,			; GCN-NEXT: v_readlane_b32 s33,
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
	entry:			entry:
	%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)			%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
	%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)			%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)
	ret i32 %ret			ret i32 %ret
	}			}

llvm/test/CodeGen/AMDGPU/spill-empty-live-interval.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-dce-in-ra=0 -verify-machineinstrs -stress-regalloc=1 -start-before=simple-register-coalescing -stop-after=greedy -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -amdgpu-dce-in-ra=0 -stress-regalloc=1 -start-before=simple-register-coalescing -stop-after=greedy,1 -o - %s | FileCheck %s
	# https://bugs.llvm.org/show_bug.cgi?id=33620			# https://bugs.llvm.org/show_bug.cgi?id=33620

	---			---
	# This would assert due to the empty live interval created for %9			# This would assert due to the empty live interval created for %9
	# on the last S_NOP with an undef subreg use.			# on the last S_NOP with an undef subreg use.

	# CHECK-LABEL: name: expecting_non_empty_interval			# CHECK-LABEL: name: expecting_non_empty_interval


llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

	; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX6 %s			; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX6 %s
	; RUN: llc -regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck --check-prefix=CHECK %s			; RUN: llc -sgpr-regalloc=basic -vgpr-regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck --check-prefix=CHECK %s
	; RUN: llc -march=amdgcn -mattr=-xnack -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX9-FLATSCR,FLATSCR %s			; RUN: llc -march=amdgcn -mattr=-xnack -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX9-FLATSCR,FLATSCR %s
	; RUN: llc -march=amdgcn -mcpu=gfx1030 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX10-FLATSCR,FLATSCR %s			; RUN: llc -march=amdgcn -mcpu=gfx1030 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX10-FLATSCR,FLATSCR %s
	;			;
	; There is something about Tonga that causes this test to spend a lot of time			; There is something about Tonga that causes this test to spend a lot of time
	; in the default register allocator.			; in the default register allocator.


	; When the offset of VGPR spills into scratch space gets too large, an additional SGPR			; When the offset of VGPR spills into scratch space gets too large, an additional SGPR

llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll

Show All 16 Lines	define void @spill_more_than_wavesize_csr_sgprs() {
,~{s75},~{s76},~{s77},~{s78},~{s79},~{s80},~{s81},~{s82}		,~{s75},~{s76},~{s77},~{s78},~{s79},~{s80},~{s81},~{s82}
,~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89},~{s90}		,~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89},~{s90}
,~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98}		,~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98}
,~{s99},~{s100},~{s101},~{s102}"()		,~{s99},~{s100},~{s101},~{s102}"()
ret void		ret void
}		}

; CHECK-LABEL: {{^}}spill_more_than_wavesize_csr_sgprs_with_stack_object:		; CHECK-LABEL: {{^}}spill_more_than_wavesize_csr_sgprs_with_stack_object:
; CHECK-DAG: v_writelane_b32 v1, s98, 63		; CHECK-DAG: v_writelane_b32 v0, s98, 63
; CHECK-DAG: v_writelane_b32 v2, s99, 0		; CHECK-DAG: v_writelane_b32 v1, s99, 0
; CHECK-NOT: dummy		; CHECK-NOT: dummy
; CHECK-DAG: v_readlane_b32 s99, v2, 0		; CHECK-DAG: v_readlane_b32 s99, v1, 0
; CHECK-DAG: v_readlane_b32 s98, v1, 63		; CHECK-DAG: v_readlane_b32 s98, v0, 63

define void @spill_more_than_wavesize_csr_sgprs_with_stack_object() {		define void @spill_more_than_wavesize_csr_sgprs_with_stack_object() {
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void asm sideeffect "",		call void asm sideeffect "",
"~{s35},~{s36},~{s37},~{s38},~{s39},~{s40},~{s41},~{s42}		"~{s35},~{s36},~{s37},~{s38},~{s39},~{s40},~{s41},~{s42}
,~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49},~{s50}		,~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49},~{s50}
,~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58}		,~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58}
,~{s59},~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66}		,~{s59},~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66}
,~{s67},~{s68},~{s69},~{s70},~{s71},~{s72},~{s73},~{s74}		,~{s67},~{s68},~{s69},~{s70},~{s71},~{s72},~{s73},~{s74}
,~{s75},~{s76},~{s77},~{s78},~{s79},~{s80},~{s81},~{s82}		,~{s75},~{s76},~{s77},~{s78},~{s79},~{s80},~{s81},~{s82}
,~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89},~{s90}		,~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89},~{s90}
,~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98}		,~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98}
,~{s99},~{s100},~{s101},~{s102}"()		,~{s99},~{s100},~{s101},~{s102}"()
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

	# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s			# Note we are NOT using the normal register allocator pipeline. We are
				# forcing allocating VGPRs and SGPRs at the same time.
				# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -run-pass=greedy,virtregrewriter,stack-slot-coloring -o - %s | FileCheck %s

	---			---

	# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}			# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}
	# CHECK: stack:			# CHECK: stack:
	# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: sgpr-spill,			# CHECK-NEXT: stack-id: sgpr-spill,

	# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.0, implicit $exec, implicit $sgpr32 :: (store 4 into %stack.0, addrspace 5)			# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.0, implicit $exec, implicit $sgpr32 :: (store 4 into %stack.0, addrspace 5)

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; SI-OPT-NEXT: [[TMP2:%.*]] = extractvalue { i1, i64 } [[TMP0]], 1			; SI-OPT-NEXT: [[TMP2:%.*]] = extractvalue { i1, i64 } [[TMP0]], 1
	; SI-OPT-NEXT: br i1 [[TMP1]], label [[BB6:%.]], label [[BB9_BB12_CRIT_EDGE:%.]]			; SI-OPT-NEXT: br i1 [[TMP1]], label [[BB6:%.]], label [[BB9_BB12_CRIT_EDGE:%.]]
	; SI-OPT: bb9.bb12_crit_edge:			; SI-OPT: bb9.bb12_crit_edge:
	; SI-OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 [[TMP2]])			; SI-OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 [[TMP2]])
	; SI-OPT-NEXT: br label [[BB12]]			; SI-OPT-NEXT: br label [[BB12]]
	; SI-OPT: bb12:			; SI-OPT: bb12:
	; SI-OPT-NEXT: store float 0.000000e+00, float addrspace(1)* null, align 8			; SI-OPT-NEXT: store float 0.000000e+00, float addrspace(1)* null, align 8
	; SI-OPT-NEXT: ret void			; SI-OPT-NEXT: ret void
	;
	bb:			bb:
	%tmp = load i32, i32 addrspace(1)* null, align 16			%tmp = load i32, i32 addrspace(1)* null, align 16
	%tmp1 = icmp slt i32 %tmp, 21			%tmp1 = icmp slt i32 %tmp, 21
	br i1 %tmp1, label %bb4, label %bb2			br i1 %tmp1, label %bb4, label %bb2

	bb2: ; preds = %bb			bb2: ; preds = %bb
	%tmp3 = icmp eq i32 %tmp, 21			%tmp3 = icmp eq i32 %tmp, 21
	br i1 %tmp3, label %bb12, label %bb9			br i1 %tmp3, label %bb12, label %bb9
	; SI-OPT: bb18:			; SI-OPT: bb18:
	; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4			; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4
	; SI-OPT-NEXT: br label [[BB2]]			; SI-OPT-NEXT: br label [[BB2]]
	;			;
	; GCN-LABEL: blam:			; GCN-LABEL: blam:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v43, s33, 4			; GCN-NEXT: v_writelane_b32 v40, s33, 4
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_add_u32 s32, s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v40, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v40, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v40, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s37, 3			; GCN-NEXT: v_writelane_b32 v40, s37, 3
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: v_mov_b32_e32 v2, 0			; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: v_and_b32_e32 v0, 0x3ff, v0			; GCN-NEXT: v_and_b32_e32 v0, 0x3ff, v0
	; GCN-NEXT: flat_load_dword v40, v[1:2]			; GCN-NEXT: flat_load_dword v41, v[1:2]
	; GCN-NEXT: v_mov_b32_e32 v42, 0			; GCN-NEXT: v_mov_b32_e32 v43, 0
	; GCN-NEXT: s_getpc_b64 s[36:37]			; GCN-NEXT: s_getpc_b64 s[36:37]
	; GCN-NEXT: s_add_u32 s36, s36, spam@rel32@lo+4			; GCN-NEXT: s_add_u32 s36, s36, spam@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s37, s37, spam@rel32@hi+12			; GCN-NEXT: s_addc_u32 s37, s37, spam@rel32@hi+12
	; GCN-NEXT: v_lshlrev_b32_e32 v41, 2, v0			; GCN-NEXT: v_lshlrev_b32_e32 v42, 2, v0
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_cmp_eq_f32_e64 s[34:35], 0, v40			; GCN-NEXT: v_cmp_eq_f32_e64 s[34:35], 0, v41
	; GCN-NEXT: s_branch BB1_3			; GCN-NEXT: s_branch BB1_3
	; GCN-NEXT: BB1_1: ; %bb10			; GCN-NEXT: BB1_1: ; %bb10
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000			; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-NEXT: BB1_2: ; %bb18			; GCN-NEXT: BB1_2: ; %bb18
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000			; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-NEXT: BB1_3: ; %bb2			; GCN-NEXT: BB1_3: ; %bb2
	; GCN-NEXT: ; =>This Loop Header: Depth=1			; GCN-NEXT: ; =>This Loop Header: Depth=1
	; GCN-NEXT: ; Child Loop BB1_4 Depth 2			; GCN-NEXT: ; Child Loop BB1_4 Depth 2
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: BB1_4: ; %bb2			; GCN-NEXT: BB1_4: ; %bb2
	; GCN-NEXT: ; Parent Loop BB1_3 Depth=1			; GCN-NEXT: ; Parent Loop BB1_3 Depth=1
	; GCN-NEXT: ; => This Inner Loop Header: Depth=2			; GCN-NEXT: ; => This Inner Loop Header: Depth=2
	; GCN-NEXT: flat_load_dword v0, v[41:42]			; GCN-NEXT: flat_load_dword v0, v[42:43]
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: buffer_store_dword v1, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v1, off, s[0:3], 0
	; GCN-NEXT: s_waitcnt vmcnt(1)			; GCN-NEXT: s_waitcnt vmcnt(1)
	; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 3, v0			; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 3, v0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc			; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execz BB1_6			; GCN-NEXT: s_cbranch_execz BB1_6
	; GCN-NEXT: ; %bb.5: ; %bb8			; GCN-NEXT: ; %bb.5: ; %bb8
	; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2			; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2
	; GCN-NEXT: s_cbranch_execnz BB1_10			; GCN-NEXT: s_cbranch_execnz BB1_10
	; GCN-NEXT: ; %bb.9: ; %bb16			; GCN-NEXT: ; %bb.9: ; %bb16
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000			; GCN-NEXT: v_mov_b32_e32 v0, 0x7fc00000
	; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GCN-NEXT: BB1_10: ; %bb17			; GCN-NEXT: BB1_10: ; %bb17
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], 0
	; GCN-NEXT: s_branch BB1_2			; GCN-NEXT: s_branch BB1_2
	bb:			bb:
	%tmp = load float, float* null, align 16			%tmp = load float, float* null, align 16
	br label %bb2			br label %bb2

	bb1: ; preds = %bb8, %bb6			bb1: ; preds = %bb8, %bb6
	br label %bb2			br label %bb2


llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	declare void @extern_func()			declare void @extern_func()

	define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be			; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs not be
	; preserved across the call and should get 8 scratch registers.			; preserved across the call and should get 8 scratch registers.

	; GFX9-LABEL: non_preserved_vgpr_tuple8:			; GFX9-LABEL: non_preserved_vgpr_tuple8:
	; GFX9: buffer_store_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill

	; GFX9: v_mov_b32_e32 v36, v16			; GFX9: v_mov_b32_e32 v36, v16
	; GFX9-NEXT: v_mov_b32_e32 v35, v15			; GFX9-NEXT: v_mov_b32_e32 v35, v15
	; GFX9-NEXT: v_mov_b32_e32 v34, v14			; GFX9-NEXT: v_mov_b32_e32 v34, v14
	; GFX9-NEXT: v_mov_b32_e32 v33, v13			; GFX9-NEXT: v_mov_b32_e32 v33, v13
	; GFX9-NEXT: v_mov_b32_e32 v32, v12			; GFX9-NEXT: v_mov_b32_e32 v32, v12
	; GFX9: ;;#ASMSTART			; GFX9: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9: image_gather4_c_b_cl v[40:43], v[32:39], s[4:11], s[4:7] dmask:0x1			; GFX9: image_gather4_c_b_cl v[41:44], v[32:39], s[4:11], s[4:7] dmask:0x1
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v44, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9: s_waitcnt lgkmcnt(0)			; GFX9: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]

	; GFX9: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX9: s_setpc_b64 s[4:5]			; GFX9: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: non_preserved_vgpr_tuple8:			; GFX10-LABEL: non_preserved_vgpr_tuple8:
	; GFX10: buffer_store_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX10: buffer_store_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill

	; GFX10: v_mov_b32_e32 v36, v16			; GFX10: v_mov_b32_e32 v36, v16
	; GFX10-NEXT: v_mov_b32_e32 v35, v15			; GFX10-NEXT: v_mov_b32_e32 v35, v15
	; GFX10-NEXT: v_mov_b32_e32 v34, v14			; GFX10-NEXT: v_mov_b32_e32 v34, v14
	; GFX10-NEXT: v_mov_b32_e32 v33, v13			; GFX10-NEXT: v_mov_b32_e32 v33, v13
	; GFX10-NEXT: v_mov_b32_e32 v32, v12			; GFX10-NEXT: v_mov_b32_e32 v32, v12

	; GFX10: ;;#ASMSTART			; GFX10: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND

	; GFX10: image_gather4_c_b_cl v[40:43], v[32:39], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10: image_gather4_c_b_cl v[41:44], v[32:39], s[4:11], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX10: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10: s_waitcnt lgkmcnt(0)			; GFX10: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]

	; GFX10: buffer_load_dword v43, off, s[0:3], s33			; GFX10: buffer_load_dword v44, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12

	; GFX10: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX10: s_setpc_b64 s[4:5]			; GFX10: s_setpc_b64 s[4:5]
	main_body:			main_body:
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0
	call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0			call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0
	call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0			call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0
	call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0			call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
	call void @extern_func()			call void @extern_func()
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define <4 x float> @call_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define <4 x float> @call_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs to be preserved			; The vgpr tuple8 operand in image_gather4_c_b_cl instruction needs to be preserved
	; across the call and should get allocated to 8 CSRs.			; across the call and should get allocated to 8 CSRs.
	; Only the lower 5 sub-registers of the tuple are preserved.			; Only the lower 5 sub-registers of the tuple are preserved.
	; The upper 3 sub-registers are unused.			; The upper 3 sub-registers are unused.

	; GFX9-LABEL: call_preserved_vgpr_tuple8:			; GFX9-LABEL: call_preserved_vgpr_tuple8:
	; GFX9: buffer_store_dword v56, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9: buffer_store_dword v56, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v57, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v58, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v59, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v60, off, s[0:3], s33 ; 4-byte Folded Spill

				; GFX9: v_mov_b32_e32 v60, v16
				; GFX9-NEXT: v_mov_b32_e32 v59, v15
				; GFX9-NEXT: v_mov_b32_e32 v58, v14
				; GFX9-NEXT: v_mov_b32_e32 v57, v13
				; GFX9-NEXT: v_mov_b32_e32 v56, v12

	; GFX9: v_mov_b32_e32 v44, v16			; GFX9: image_gather4_c_b_cl v[0:3], v[56:63], s[36:43], s[4:7] dmask:0x1
	; GFX9-NEXT: v_mov_b32_e32 v43, v15
	; GFX9-NEXT: v_mov_b32_e32 v42, v14
	; GFX9-NEXT: v_mov_b32_e32 v41, v13
	; GFX9-NEXT: v_mov_b32_e32 v40, v12

	; GFX9: image_gather4_c_b_cl v[0:3], v[40:47], s[36:43], s[4:7] dmask:0x1
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dwordx4 v[0:1], v[0:3], off			; GFX9-NEXT: global_store_dwordx4 v[0:1], v[0:3], off
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[40:47], s[36:43], s[4:7] dmask:0x1			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[56:63], s[36:43], s[4:7] dmask:0x1

	; GFX9: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9: buffer_load_dword v60, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v59, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v58, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v57, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v56, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload

	; GFX9: buffer_load_dword v56, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX9: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; GFX9: s_setpc_b64 s[4:5]			; GFX9: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: call_preserved_vgpr_tuple8:			; GFX10-LABEL: call_preserved_vgpr_tuple8:
	; GFX10: buffer_store_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX10: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX10: buffer_store_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX10: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill


	; GFX10: image_gather4_c_b_cl v[0:3], v[12:19], s[36:43], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10: image_gather4_c_b_cl v[0:3], v[12:19], s[36:43], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+12
	; GFX10-NEXT: v_mov_b32_e32 v40, v16			; GFX10-NEXT: v_mov_b32_e32 v41, v16
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v41, v15			; GFX10-NEXT: v_mov_b32_e32 v42, v15
	; GFX10-NEXT: v_mov_b32_e32 v42, v14			; GFX10-NEXT: v_mov_b32_e32 v43, v14
	; GFX10-NEXT: v_mov_b32_e32 v43, v13			; GFX10-NEXT: v_mov_b32_e32 v44, v13
	; GFX10-NEXT: v_mov_b32_e32 v44, v12			; GFX10-NEXT: v_mov_b32_e32 v45, v12
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off			; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v44, v43, v42, v41, v40], s[36:43], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v45, v44, v43, v42, v41], s[36:43], s[4:7] dmask:0x1 dim:SQ_RSRC_IMG_2D

	; GFX10: buffer_load_dword v44, off, s[0:3], s33			; GFX10: buffer_load_dword v45, off, s[0:3], s33{{$}}
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16
	; GFX10: buffer_load_dword v45, off, s[0:3], s32 offset:20			; GFX10: buffer_load_dword v40, off, s[0:3], s32 offset:20
	; GFX10: s_setpc_b64 s[4:5]			; GFX10: s_setpc_b64 s[4:5]
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
	store <4 x float> %v, <4 x float> addrspace(1)* undef			store <4 x float> %v, <4 x float> addrspace(1)* undef
	call void @extern_func()			call void @extern_func()
	%v1 = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)			%v1 = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> undef, <4 x i32> undef, i1 false, i32 0, i32 0)
	ret <4 x float> %v1			ret <4 x float> %v1
	}			}

	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 immarg, float, float, float, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 immarg, float, float, float, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #1

	attributes #0 = { nounwind writeonly }			attributes #0 = { nounwind writeonly }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -start-before=greedy -stop-after=virtregrewriter -verify-machineinstrs -o - %s | FileCheck %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -start-before=greedy,0 -stop-after=virtregrewriter,1 -verify-machineinstrs -o - %s | FileCheck %s

# The undef copy of %4 is allocated to $vgpr3, and the identity copy		# The undef copy of %4 is allocated to $vgpr3, and the identity copy
# was deleted, and $vgpr3 was considered undef. The code to replace		# was deleted, and $vgpr3 was considered undef. The code to replace
# the undef copy with a kill was incorrectly checking the dest		# the undef copy with a kill was incorrectly checking the dest
# operand, rather than the source.		# operand, rather than the source.

--- |		--- |
define amdgpu_kernel void @undef_identity_copy() {		define amdgpu_kernel void @undef_identity_copy() {
[15 unchanged lines not shown]
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'		scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
frameOffsetReg: '$sgpr95'		frameOffsetReg: '$sgpr95'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
body: |		body: |
bb.0:		bb.0:
; CHECK-LABEL: name: undef_identity_copy		; CHECK-LABEL: name: undef_identity_copy
; CHECK: renamable $vgpr40_vgpr41_vgpr42_vgpr43 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)		; CHECK: renamable $vgpr40_vgpr41_vgpr42_vgpr43 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)
; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc		; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc
; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95		; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95
; CHECK: $sgpr4 = COPY $sgpr95		; CHECK: $sgpr4 = COPY $sgpr95
; CHECK: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @foo, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4		; CHECK: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @foo, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4
; CHECK: ADJCALLSTACKDOWN 0, 4, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95		; CHECK: ADJCALLSTACKDOWN 0, 4, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95
; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @bar + 4, target-flags(amdgpu-rel32-hi) @bar + 4, implicit-def dead $scc		; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @bar + 4, target-flags(amdgpu-rel32-hi) @bar + 4, implicit-def dead $scc
; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95		; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95
; CHECK: $sgpr4 = COPY $sgpr95		; CHECK: $sgpr4 = COPY $sgpr95
; CHECK: $vgpr0 = COPY renamable $vgpr40		; CHECK: $vgpr0 = COPY renamable $vgpr40
; CHECK: $vgpr1 = COPY renamable $vgpr41		; CHECK: $vgpr1 = COPY renamable $vgpr41
[26 unchanged lines not shown]

Diff 344878
