This is an archive of the discontinued LLVM Phabricator instance.

Differential D55301

RegAlloc: Allow targets to split register allocation
ClosedPublic

Authored by arsenm on Dec 4 2018, 4:24 PM.

Details

Reviewers

MatzeB
qcolombet
rampitec
scott.linder

Summary

AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes were handled at the same time, this was problematic: we did not
know ahead of time how many VGPRs would need to be reserved to handle
the spilling. If no VGPRs were left for spilling, we would have to try
to spill to memory. If the spilled SGPRs were required for exec mask
manipulation, this is highly problematic because the lanes active at
the point of the spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible to make all VGPRs unavailable using
inline asm. Emit an error whenever an SGPR spill would require
memory.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
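
The callback mechanism described above can be sketched in isolation. This is a minimal illustration, not the patch itself: `RegInfo`, `RegClass`, `onlySGPRs`, `onlyVGPRs`, and `selectForRun` are hypothetical stand-ins, and only the callback shape (a function taking register info and a register class and returning `bool`) and the allocate-everything default mirror what the diff adds in RegAllocCommon.h.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for TargetRegisterInfo / TargetRegisterClass.
struct RegInfo {};
struct RegClass {
  bool IsSGPR;
};

// Same shape as the RegClassFilterFunc typedef in the diff.
using RegClassFilter = std::function<bool(const RegInfo &, const RegClass &)>;

// Default filter: handle every class (mirrors allocateAllRegClasses).
inline bool allocateAll(const RegInfo &, const RegClass &) { return true; }

// Hypothetical target filters, one per allocation run.
inline bool onlySGPRs(const RegInfo &, const RegClass &RC) { return RC.IsSGPR; }
inline bool onlyVGPRs(const RegInfo &, const RegClass &RC) { return !RC.IsSGPR; }

// Returns the indices of the virtual registers a single run would handle;
// everything else is left over for a later run.
inline std::vector<unsigned>
selectForRun(const std::vector<RegClass> &VRegClasses,
             const RegClassFilter &Filter) {
  assert(Filter && "filter must be callable");
  RegInfo TRI;
  std::vector<unsigned> Picked;
  for (unsigned I = 0; I < VRegClasses.size(); ++I)
    if (Filter(TRI, VRegClasses[I]))
      Picked.push_back(I);
  return Picked;
}
```

Running `selectForRun` once with `onlySGPRs` and once with `onlyVGPRs` partitions the virtual registers between the two runs, which is the property the split relies on.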

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage is that StackSlotColoring is currently no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQP is not currently supported, so this also
prevents using the unhandled allocator.

Event Timeline

arsenm created this revision. Dec 4 2018, 4:24 PM

Herald added subscribers: tpr, mgorny, nhaehnle and 2 others. Dec 4 2018, 4:24 PM

arsenm added parent revisions: D55295: LiveIntervals: Add removePhysReg, D55238: MIR: Preserve incoming frame index numbers, D55287: VirtRegMap: Support partially allocated virtual registers, D55285: AMDGPU: Scavenge register instead of findUnusedReg, D55286: VirtRegMap: Add pass option to not clear virt regs, D55284: RegisterScavenger: Allow fail without spill, D55283: CodeGen: Refactor regallocator command line and target selection, D55282: CodeGen: Make RegAllocRegistry a template class. Dec 4 2018, 4:24 PM

arsenm added a child revision: D55333: VirtRegMap: Preserve LiveDebugVariables. Dec 5 2018, 9:06 AM

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when an S or V register would be split, they would eventually have constraints in that "super" class. Thus, instead of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such an approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

In D55301#1321550, @qcolombet wrote:

Hi Matt,

Have you tried to use combined V+S register classes?
By describing such classes, when an S or V register would be split, they would eventually have constraints in that "super" class. Thus, instead of spilling, the splitting mechanism would naturally insert copies of the form [V|S] = copy V+S or V+S = copy [V|S], which seem to be what you are trying to achieve. The advantage of such an approach is that we would not have to effectively split the allocation.

Cheers,
-Quentin

I'm not sure I follow this. These aren't spilled with ordinary copies. This uses cross-lane instructions to read and write SGPRs to and from individual VGPR lanes (i.e., 64 SGPRs can be spilled into a single VGPR, one per lane of the wave). We also can't legally copy from V to S. Having virtual registers with the combined class doesn't really conceptually make sense for us either (and would probably break every single place that we need to consider these).

This also wouldn't allow us to change the set of reserved registers in the middle of allocation, which is part of the problem.
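
The lane arithmetic behind this reply can be sketched as follows. It is only an illustration under one assumption taken from the comment above: each spilled 32-bit SGPR value occupies one lane of a VGPR, and a wave has 64 lanes. The function names `vgprsNeededForSGPRSpills` and `spillSlot` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <utility>

// Number of VGPRs that must be reserved to hold NumSpilledSGPRs values,
// assuming one SGPR per lane and WaveSize lanes per VGPR.
inline unsigned vgprsNeededForSGPRSpills(unsigned NumSpilledSGPRs,
                                         unsigned WaveSize = 64) {
  assert(WaveSize > 0 && "a wave must have at least one lane");
  return (NumSpilledSGPRs + WaveSize - 1) / WaveSize; // round up
}

// Maps the Nth spilled SGPR to a (reserved VGPR index, lane) pair.
inline std::pair<unsigned, unsigned> spillSlot(unsigned SpillIndex,
                                               unsigned WaveSize = 64) {
  assert(WaveSize > 0 && "a wave must have at least one lane");
  return {SpillIndex / WaveSize, SpillIndex % WaveSize};
}
```

Because the count depends on how many SGPRs actually spill, it is only known once SGPR allocation has finished, which is why the reserved set cannot be fixed before a combined allocation run.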

rampitec added inline comments. Dec 6 2018, 5:26 PM

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
75	!isSGPRClass() to catch [potentially] remaining strange register classes.
1064	You need to pass filter to PreRewrite as well.

I'm not sure I follow this. These aren't spilled with ordinary copies

I would expect that you could use ordinary copies + subreg here and do the proper expansion in the later expand post RA pass like every other copy.

This uses cross-lane instructions to read and write SGPRs to and from individual VGPR lanes (i.e., 64 SGPRs can be spilled into a single VGPR, one per lane of the wave). We also can't legally copy from V to S. Having virtual registers with the combined class doesn't really conceptually make sense for us either (and would probably break every single place that we need to consider these).

That wouldn't appear anywhere other than TableGen. It's just something to tell RA that the biggest unconstrained class is V+S.

This also wouldn't allow us to change the set of reserved registers in the middle of allocation, which is part of the problem.

I missed that part, but I also don't get why this is a problem. IIRC we can always narrow the set of available registers for each virtual register.

Anyhow, the changes to the generic parts look mostly good to me. Comments inlined.

include/llvm/CodeGen/RegAllocCommon.h
23	Please add doxygen comment.
lib/CodeGen/RegAllocGreedy.cpp
615	Why do we need both constructors?
707	Removing this assert is worrisome. Why do we need that?
lib/CodeGen/TargetFrameLoweringImpl.cpp
21	Why do we need this change?

In D55301#1323324, @qcolombet wrote:

I'm not sure I follow this. These aren't spilled with ordinary copies

I would expect that you could use ordinary copies + subreg here and do the proper expansion in the later expand post RA pass like every other copy.

We don't model different lanes as subregisters, and trying to would be a pretty radical change. I can almost see a way to hack it to work, but it would involve adding an enormous number of new subregister indexes. Unless you mean using some kind of RMW copy (since the old single lane view of the register's value needs to be preserved)

Remove leftover changes and add comment

arsenm mentioned this in D54365: RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs. Jan 9 2019, 8:40 PM

arsenm mentioned this in D52010: RegAllocFast: Rewrite and improve.

ping

In D55301#1393065, @arsenm wrote:

ping

Pass the filter to PreRewrite.

Hi Matt,

Couple of nitpicks inline.
My only remaining concern is exposing ClearVirtRegs.

Cheers,
-Quentin

lib/CodeGen/RegAllocBase.cpp
179	For debugging purposes, add a DEBUG statement for each case.
lib/CodeGen/RegAllocFast.cpp
73–79	Should this be `const RegClassFilterFunc &` everywhere?
78	It feels dangerous to expose the ClearVirtRegs to me. Could we deduce what has to be cleared based on what we allocate instead of exposing this?
1350	Could we have just one createFastRegisterAllocator with default arguments? (Also ClearVirtReg should disappear per my other comment IMO).
lib/CodeGen/RegAllocGreedy.cpp
604	Ditto: Just one createXXX method.

Herald added a subscriber: jdoerfert. Feb 12 2019, 10:04 AM

arsenm marked 2 inline comments as done. Feb 13 2019, 9:18 AM

arsenm added inline comments.

lib/CodeGen/RegAllocFast.cpp
78	The problem is that something needs to set the NoVRegs property. The same parameter is added to createVirtRegRewriter, but fastregalloc does the assignment itself. I don't think this can be inferred, and the target needs to say when it's done allocating register classes. For example, it would be possible to have a degenerate function where all SGPRs are allocated in the first run, and there happen to be no VGPR vregs. Intervening passes may want to introduce new vregs to be taken care of by the later runs, but that won't work if the earlier pass decided to infer that all registers were taken care of.

arsenm marked an inline comment as done. Feb 13 2019, 9:22 AM

arsenm added inline comments.

lib/CodeGen/RegAllocFast.cpp
78	Actually I stopped creating new virtual registers at some point in the current implementation, but I still may want to do so in the future
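
The degenerate case raised in this exchange can be illustrated with a toy model. Everything here is hypothetical (`ToyVReg`, `ToyFunction`, `runAlloc`, `wouldInferDone`); `NoVRegs` is only a plain flag standing in for the property named in the comment. The sketch assumes the final run is the one that clears virtual registers and sets the flag.

```cpp
#include <cassert>
#include <vector>

struct ToyVReg {
  bool IsSGPR;
  bool Assigned;
};

struct ToyFunction {
  std::vector<ToyVReg> VRegs;
  bool NoVRegs = false; // stand-in for the NoVRegs property
};

// One allocation run. ClearVirtRegs is set explicitly by the caller and marks
// the final run; only then are virtual registers cleared and NoVRegs set.
inline void runAlloc(ToyFunction &F, bool HandleSGPRs, bool ClearVirtRegs) {
  for (ToyVReg &R : F.VRegs)
    if (R.IsSGPR == HandleSGPRs)
      R.Assigned = true;
  if (ClearVirtRegs) {
    for (const ToyVReg &R : F.VRegs)
      assert(R.Assigned && "final run left a virtual register unassigned");
    F.VRegs.clear();
    F.NoVRegs = true;
  }
}

// The alternative discussed above: infer completion from the absence of
// unassigned virtual registers.
inline bool wouldInferDone(const ToyFunction &F) {
  for (const ToyVReg &R : F.VRegs)
    if (!R.Assigned)
      return false;
  return true;
}
```

With a function that contains only SGPR virtual registers, `wouldInferDone` already returns true after the first run, so an inferring allocator would declare the function finished before an intervening pass had a chance to add a VGPR virtual register for the second run.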

In D55301#1393260, @rampitec wrote:

In D55301#1393065, @arsenm wrote:

ping

Pass the filter to PreRewrite.

I'm not sure what good that would do as it doesn't do anything now

lib/CodeGen/RegAllocFast.cpp
1350	The RegAllocRegistry requires the type to be the no-argument function pass constructor. I could change that, but then all would have the ClearVirtRegs argument or not
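
The constraint mentioned here can be sketched with a toy registry. The names `ToyPass`, `ToyPassCtor`, `ToyFilter`, `createToyFastAlloc`, `toyRegistry`, and `registerToyAlloc` are hypothetical and this is not the RegAllocRegistry implementation; the only assumption carried over from the comment is that the registry stores no-argument pass constructors.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

struct ToyPass {
  bool HasFilter;
  bool ClearVirtRegs;
};

// The registry only accepts constructors that take no arguments.
using ToyPassCtor = ToyPass *(*)();
using ToyFilter = std::function<bool(int)>;

// No-argument overload: the only one that can be registered.
inline ToyPass *createToyFastAlloc() { return new ToyPass{false, true}; }

// Parameterized overload: called directly by a target that splits allocation.
inline ToyPass *createToyFastAlloc(ToyFilter F, bool ClearVirtRegs) {
  return new ToyPass{static_cast<bool>(F), ClearVirtRegs};
}

inline std::map<std::string, ToyPassCtor> &toyRegistry() {
  static std::map<std::string, ToyPassCtor> Registry;
  return Registry;
}

inline void registerToyAlloc(const std::string &Name, ToyPassCtor Ctor) {
  assert(Ctor && "constructor must not be null");
  toyRegistry()[Name] = Ctor;
}
```

Passing the overloaded name to `registerToyAlloc` selects the no-argument overload because of the function pointer type, which is why a single creator with default arguments would not fit a registry of this shape.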

Partially address comments.

This also probably needs some more test fixes, but the fast regalloc rewrite patches need rebasing first

arsenm mentioned this in D55295: LiveIntervals: Add removePhysReg. Feb 13 2019, 6:04 PM

Is this still alive?

Herald added a subscriber: kerbowa. Apr 20 2020, 1:02 PM

aditya_nandakumar added a subscriber: aditya_nandakumar. Apr 20 2020, 1:06 PM

In D55301#1993080, @qcolombet wrote:

Is this still alive?

Yes, but it depends on the fastregalloc rewrite patches (which I need to rebase the tests for, for the 100th time, which takes forever)

but it depends on the fastregalloc rewrite patches

Which ones?

In D55301#1993283, @qcolombet wrote:

but it depends on the fastregalloc rewrite patches

Which ones?

D54368 and D52010. I started rebasing the tests a few months ago but didn't finish; I think I got distracted by regressed loop spills vs. last time I rebased

Thanks for the pointers, I'll try to look into reviewing D52010 next week.

Rebase, fix AGPR handling

Herald added a project: Restricted Project. Dec 22 2020, 5:54 PM

Herald added subscribers: wenlei, hiraditya.

rampitec added inline comments. Dec 23 2020, 10:29 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1217 ↗	(On Diff #313463)	GCNRegBankReassign also works with SGPRs. Which means you need a pre-rewriter here, which needs to have a different subset of passes and an RC filter.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1270 ↗	(On Diff #313463)	It can be saddr with flat scratch. It seems it needs to be fixed in a separate patch first.

rampitec added inline comments. Dec 23 2020, 10:48 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1270 ↗	(On Diff #313463)	Never mind, this is one of SI_SPILL opcodes, not real instruction yet.

arsenm mentioned this in D96336: [AMDGPU] Save VGPR of whole wave when spilling. Feb 23 2021, 8:47 AM

Rebase

Herald added a subscriber: nikic. May 12 2021, 11:04 AM

Harbormaster completed remote builds in B104083: Diff 344878. May 12 2021, 11:04 AM

Since GCNRegBankReassign has been removed, this LGTM.

LGTM.
Disclaimer: I didn't really look at the AMDGPU changes.

This revision is now accepted and ready to land. May 12 2021, 1:43 PM

In D55301#2755345, @qcolombet wrote:

LGTM.
Disclaimer: I didn't really look at the AMDGPU changes.

I did. Thanks!

@arsenm it is a good idea to run PSDB before this change.

At long last, eebe841a47cbbd55bdcc32da943c92d18f88a5b8

Herald added a subscriber: foad. Jul 13 2021, 4:35 PM

lkail added a subscriber: lkail. Sep 14 2021, 5:08 AM

cdevadas mentioned this in rG8f9dd5e608c0: [AMDGPU] Vector register spill test cleanup (NFC). Apr 26 2022, 12:49 AM

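Revision Contents

Path	Size
include/llvm/CodeGen/Passes.h	6 lines
include/llvm/CodeGen/RegAllocCommon.h	30 lines
include/llvm/CodeGen/RegAllocRegistry.h	1 line
include/llvm/CodeGen/TargetFrameLowering.h	1 line
lib/CodeGen/LiveIntervals.cpp	7 lines
lib/CodeGen/RegAllocBase.h	11 lines
lib/CodeGen/RegAllocBase.cpp	16 lines
lib/CodeGen/RegAllocBasic.cpp	15 lines
lib/CodeGen/RegAllocFast.cpp	32 lines
lib/CodeGen/RegAllocGreedy.cpp	39 lines
lib/CodeGen/TargetFrameLoweringImpl.cpp	1 line
lib/Target/AMDGPU/AMDGPU.h	3 lines
lib/Target/AMDGPU/AMDGPUCallingConv.td	5 lines
lib/Target/AMDGPU/AMDGPURegisterInfo.cpp	4 lines
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	196 lines
lib/Target/AMDGPU/CMakeLists.txt	1 line
lib/Target/AMDGPU/SIFrameLowering.h	3 lines
lib/Target/AMDGPU/SIFrameLowering.cpp	70 lines
lib/Target/AMDGPU/SILowerSGPRSpills.cpp	299 lines
lib/Target/AMDGPU/SIMachineFunctionInfo.h	26 lines
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	42 lines
lib/Target/AMDGPU/SIRegisterInfo.h	17 lines
lib/Target/AMDGPU/SIRegisterInfo.cpp	93 lines
test/CodeGen/AMDGPU/callee-frame-setup.ll	58 lines
test/CodeGen/AMDGPU/callee-special-input-sgprs.ll	2 lines
test/CodeGen/AMDGPU/callee-special-input-vgprs.ll	15 lines
test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll	4 lines
test/CodeGen/AMDGPU/debug-value2.ll	2 lines
test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll	187 lines
test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll	106 lines
test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll	189 lines
test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir	7 lines
test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll	41 lines
test/CodeGen/AMDGPU/sibling-call.ll	24 lines
test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll	6 lines
test/CodeGen/AMDGPU/spill-empty-live-interval.mir	2 lines
test/CodeGen/AMDGPU/spill-scavenge-offset.ll	2 lines
test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir	17 lines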

Diff 176729

include/llvm/CodeGen/Passes.h

Show All 9 Lines
// This file defines interfaces to access the target independent code generation		// This file defines interfaces to access the target independent code generation
// passes provided by the LLVM backend.		// passes provided by the LLVM backend.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_PASSES_H		#ifndef LLVM_CODEGEN_PASSES_H
#define LLVM_CODEGEN_PASSES_H		#define LLVM_CODEGEN_PASSES_H

		#include "llvm/CodeGen/RegAllocCommon.h"

#include <functional>		#include <functional>
#include <string>		#include <string>

namespace llvm {		namespace llvm {

class FunctionPass;		class FunctionPass;
class MachineFunction;		class MachineFunction;
class MachineFunctionPass;		class MachineFunctionPass;
▲ Show 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.

/// This pass perform post-ra machine sink for COPY instructions.		/// This pass perform post-ra machine sink for COPY instructions.
extern char &PostRAMachineSinkingID;		extern char &PostRAMachineSinkingID;

/// FastRegisterAllocation Pass - This pass register allocates as fast as		/// FastRegisterAllocation Pass - This pass register allocates as fast as
/// possible. It is best suited for debug code where live ranges are short.		/// possible. It is best suited for debug code where live ranges are short.
///		///
FunctionPass *createFastRegisterAllocator();		FunctionPass *createFastRegisterAllocator();
		FunctionPass *createFastRegisterAllocator(RegClassFilterFunc F,
		bool ClearVirtRegs);

/// BasicRegisterAllocation Pass - This pass implements a degenerate global		/// BasicRegisterAllocation Pass - This pass implements a degenerate global
/// register allocator using the basic regalloc framework.		/// register allocator using the basic regalloc framework.
///		///
FunctionPass *createBasicRegisterAllocator();		FunctionPass *createBasicRegisterAllocator();
		FunctionPass *createBasicRegisterAllocator(RegClassFilterFunc F);

/// Greedy register allocation pass - This pass implements a global register		/// Greedy register allocation pass - This pass implements a global register
/// allocator for optimized builds.		/// allocator for optimized builds.
///		///
FunctionPass *createGreedyRegisterAllocator();		FunctionPass *createGreedyRegisterAllocator();
		FunctionPass *createGreedyRegisterAllocator(RegClassFilterFunc F);

/// PBQPRegisterAllocation Pass - This pass implements the Partitioned Boolean		/// PBQPRegisterAllocation Pass - This pass implements the Partitioned Boolean
/// Quadratic Prograaming (PBQP) based register allocator.		/// Quadratic Prograaming (PBQP) based register allocator.
///		///
FunctionPass *createDefaultPBQPRegisterAllocator();		FunctionPass *createDefaultPBQPRegisterAllocator();

/// PrologEpilogCodeInserter - This pass inserts prolog and epilog code,		/// PrologEpilogCodeInserter - This pass inserts prolog and epilog code,
/// and eliminates abstract frame references.		/// and eliminates abstract frame references.
▲ Show 20 Lines • Show All 270 Lines • Show Last 20 Lines

include/llvm/CodeGen/RegAllocCommon.h

This file was added.

				//===- RegAllocCommon.h - Utilities shared between allocators ---- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_REGALLOCCOMMON_H
				#define LLVM_CODEGEN_REGALLOCCOMMON_H

				#include <functional>

				namespace llvm {

				class TargetRegisterClass;
				class TargetRegisterInfo;

				typedef std::function<bool(const TargetRegisterInfo &TRI,
				const TargetRegisterClass &RC)> RegClassFilterFunc;

				static inline bool allocateAllRegClasses(const TargetRegisterInfo &,
				const TargetRegisterClass &) {
				return true;
				}
				qcolombet (inline comment, done): Please add doxygen comment.

				}

				#endif // LLVM_CODEGEN_REGALLOCCOMMON_H

include/llvm/CodeGen/RegAllocRegistry.h

	Show All 9 Lines
	// This file contains the implementation for register allocator function			// This file contains the implementation for register allocator function
	// pass registry (RegisterRegAlloc).			// pass registry (RegisterRegAlloc).
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CODEGEN_REGALLOCREGISTRY_H			#ifndef LLVM_CODEGEN_REGALLOCREGISTRY_H
	#define LLVM_CODEGEN_REGALLOCREGISTRY_H			#define LLVM_CODEGEN_REGALLOCREGISTRY_H

				#include "llvm/CodeGen/RegAllocCommon.h"
	#include "llvm/CodeGen/MachinePassRegistry.h"			#include "llvm/CodeGen/MachinePassRegistry.h"

	namespace llvm {			namespace llvm {

	class FunctionPass;			class FunctionPass;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

include/llvm/CodeGen/TargetFrameLowering.h

	Show All 17 Lines
	#include <utility>			#include <utility>
	#include <vector>			#include <vector>

	namespace llvm {			namespace llvm {
	class BitVector;			class BitVector;
	class CalleeSavedInfo;			class CalleeSavedInfo;
	class MachineFunction;			class MachineFunction;
	class RegScavenger;			class RegScavenger;
				class VirtRegMap;

	/// Information about stack frame layout on the target. It holds the direction			/// Information about stack frame layout on the target. It holds the direction
	/// of stack growth, the known stack alignment on entry to each function, and			/// of stack growth, the known stack alignment on entry to each function, and
	/// the offset to the locals area.			/// the offset to the locals area.
	///			///
	/// The offset to the local area is the offset from the stack pointer on			/// The offset to the local area is the offset from the stack pointer on
	/// function entry to the first location where function data (local variables,			/// function entry to the first location where function data (local variables,
	/// spill locations) can be stored.			/// spill locations) can be stored.
	▲ Show 20 Lines • Show All 327 Lines • Show Last 20 Lines

lib/CodeGen/LiveIntervals.cpp

Show First 20 Lines • Show All 690 Lines • ▼ Show 20 Lines	void LiveIntervals::addKillFlags(const VirtRegMap *VRM) {
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {		for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
unsigned Reg = TargetRegisterInfo::index2VirtReg(i);		unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
if (MRI->reg_nodbg_empty(Reg))		if (MRI->reg_nodbg_empty(Reg))
continue;		continue;
const LiveInterval &LI = getInterval(Reg);		const LiveInterval &LI = getInterval(Reg);
if (LI.empty())		if (LI.empty())
continue;		continue;

		// Target may have not allocated this yet.
		unsigned PhysReg = VRM->getPhys(Reg);
		if (PhysReg == 0)
		continue;

// Find the regunit intervals for the assigned register. They may overlap		// Find the regunit intervals for the assigned register. They may overlap
// the virtual register live range, cancelling any kills.		// the virtual register live range, cancelling any kills.
RU.clear();		RU.clear();
for (MCRegUnitIterator Unit(VRM->getPhys(Reg), TRI); Unit.isValid();		for (MCRegUnitIterator Unit(PhysReg, TRI); Unit.isValid();
++Unit) {		++Unit) {
const LiveRange &RURange = getRegUnit(*Unit);		const LiveRange &RURange = getRegUnit(*Unit);
if (RURange.empty())		if (RURange.empty())
continue;		continue;
RU.push_back(std::make_pair(&RURange, RURange.find(LI.begin()->end)));		RU.push_back(std::make_pair(&RURange, RURange.find(LI.begin()->end)));
}		}

if (MRI->subRegLivenessEnabled()) {		if (MRI->subRegLivenessEnabled()) {
▲ Show 20 Lines • Show All 949 Lines • Show Last 20 Lines

lib/CodeGen/RegAllocBase.h

	Show All 32 Lines
	// quality trade-off without relying on a particular theoretical solver.			// quality trade-off without relying on a particular theoretical solver.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_CODEGEN_REGALLOCBASE_H			#ifndef LLVM_LIB_CODEGEN_REGALLOCBASE_H
	#define LLVM_LIB_CODEGEN_REGALLOCBASE_H			#define LLVM_LIB_CODEGEN_REGALLOCBASE_H

	#include "llvm/ADT/SmallPtrSet.h"			#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/CodeGen/RegAllocCommon.h"
	#include "llvm/CodeGen/RegisterClassInfo.h"			#include "llvm/CodeGen/RegisterClassInfo.h"

	namespace llvm {			namespace llvm {

	class LiveInterval;			class LiveInterval;
	class LiveIntervals;			class LiveIntervals;
	class LiveRegMatrix;			class LiveRegMatrix;
	class MachineInstr;			class MachineInstr;
	Show All 14 Lines

	protected:			protected:
	const TargetRegisterInfo *TRI = nullptr;			const TargetRegisterInfo *TRI = nullptr;
	MachineRegisterInfo *MRI = nullptr;			MachineRegisterInfo *MRI = nullptr;
	VirtRegMap *VRM = nullptr;			VirtRegMap *VRM = nullptr;
	LiveIntervals *LIS = nullptr;			LiveIntervals *LIS = nullptr;
	LiveRegMatrix *Matrix = nullptr;			LiveRegMatrix *Matrix = nullptr;
	RegisterClassInfo RegClassInfo;			RegisterClassInfo RegClassInfo;
				RegClassFilterFunc ShouldAllocateClass;

	/// Inst which is a def of an original reg and whose defs are already all			/// Inst which is a def of an original reg and whose defs are already all
	/// dead after remat is saved in DeadRemats. The deletion of such inst is			/// dead after remat is saved in DeadRemats. The deletion of such inst is
	/// postponed till all the allocations are done, so its remat expr is			/// postponed till all the allocations are done, so its remat expr is
	/// always available for the remat of all the siblings of the original reg.			/// always available for the remat of all the siblings of the original reg.
	SmallPtrSet<MachineInstr *, 32> DeadRemats;			SmallPtrSet<MachineInstr *, 32> DeadRemats;

	RegAllocBase() = default;			RegAllocBase(RegClassFilterFunc F = allocateAllRegClasses) :
				ShouldAllocateClass(F) {}

	virtual ~RegAllocBase() = default;			virtual ~RegAllocBase() = default;

	// A RegAlloc pass should call this before allocatePhysRegs.			// A RegAlloc pass should call this before allocatePhysRegs.
	void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);			void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);

	// The top-level driver. The output is a VirtRegMap that us updated with			// The top-level driver. The output is a VirtRegMap that us updated with
	// physical register assignments.			// physical register assignments.
	void allocatePhysRegs();			void allocatePhysRegs();

	// Include spiller post optimization and removing dead defs left because of			// Include spiller post optimization and removing dead defs left because of
	// rematerialization.			// rematerialization.
	virtual void postOptimization();			virtual void postOptimization();

	// Get a temporary reference to a Spiller instance.			// Get a temporary reference to a Spiller instance.
	virtual Spiller &spiller() = 0;			virtual Spiller &spiller() = 0;

	/// enqueue - Add VirtReg to the priority queue of unassigned registers.			/// enqueue - Add VirtReg to the priority queue of unassigned registers.
	virtual void enqueue(LiveInterval *LI) = 0;			virtual void enqueueImpl(LiveInterval *LI) = 0;

				/// enqueue - Add VirtReg to the priority queue of unassigned registers.
				void enqueue(LiveInterval *LI);

	/// dequeue - Return the next unassigned register, or NULL.			/// dequeue - Return the next unassigned register, or NULL.
	virtual LiveInterval *dequeue() = 0;			virtual LiveInterval *dequeue() = 0;

	// A RegAlloc pass should override this to provide the allocation heuristics.			// A RegAlloc pass should override this to provide the allocation heuristics.
	// Each call must guarantee forward progess by returning an available PhysReg			// Each call must guarantee forward progess by returning an available PhysReg
	// or new set of split live virtual registers. It is up to the splitter to			// or new set of split live virtual registers. It is up to the splitter to
	// converge quickly toward fully spilled live ranges.			// converge quickly toward fully spilled live ranges.
	Show All 21 Lines

lib/CodeGen/RegAllocBase.cpp

	Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines
	void RegAllocBase::postOptimization() {			void RegAllocBase::postOptimization() {
	spiller().postOptimization();			spiller().postOptimization();
	for (auto DeadInst : DeadRemats) {			for (auto DeadInst : DeadRemats) {
	LIS->RemoveMachineInstrFromMaps(*DeadInst);			LIS->RemoveMachineInstrFromMaps(*DeadInst);
	DeadInst->eraseFromParent();			DeadInst->eraseFromParent();
	}			}
	DeadRemats.clear();			DeadRemats.clear();
	}			}

				void RegAllocBase::enqueue(LiveInterval *LI) {
				const unsigned Reg = LI->reg;

				assert(TargetRegisterInfo::isVirtualRegister(Reg) &&
				"Can only enqueue virtual registers");

				const TargetRegisterClass &RC = *MRI->getRegClass(Reg);
				if (!ShouldAllocateClass(*TRI, RC))
				return;
				qcolombet (inline comment, done): For debugging purposes, add a DEBUG statement for each case.

				if (VRM->hasPhys(Reg))
				return;

				enqueueImpl(LI);
				}
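
The enqueue/enqueueImpl split in the diff above follows a common pattern: a non-virtual entry point applies the shared checks and then forwards to a virtual hook. A minimal sketch, assuming toy types (`ToyInterval`, `ToyAllocBase`, `ToyBasicAlloc`, and `markAssigned` are hypothetical and stand in for LiveInterval, RegAllocBase, RABasic, and the VirtRegMap lookup):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <utility>

struct ToyInterval {
  unsigned Reg;
  bool IsSGPR;
};

class ToyAllocBase {
  std::function<bool(const ToyInterval &)> ShouldAllocate;
  std::set<unsigned> Assigned; // stand-in for VRM->hasPhys(Reg)

public:
  explicit ToyAllocBase(std::function<bool(const ToyInterval &)> F)
      : ShouldAllocate(std::move(F)) {}
  virtual ~ToyAllocBase() = default;

  void markAssigned(unsigned Reg) { Assigned.insert(Reg); }

  // Non-virtual entry point: filter by class, skip registers that already
  // have an assignment, then hand over to the allocator-specific queue.
  void enqueue(const ToyInterval &LI) {
    assert(ShouldAllocate && "filter must be set");
    if (!ShouldAllocate(LI))
      return;
    if (Assigned.count(LI.Reg))
      return;
    enqueueImpl(LI);
  }

protected:
  virtual void enqueueImpl(const ToyInterval &LI) = 0;
};

class ToyBasicAlloc : public ToyAllocBase {
  std::queue<ToyInterval> Queue;

public:
  using ToyAllocBase::ToyAllocBase;
  std::size_t pending() const { return Queue.size(); }

protected:
  void enqueueImpl(const ToyInterval &LI) override { Queue.push(LI); }
};
```

Keeping the checks in the base class means each concrete allocator only has to rename its existing override, which matches the one-line change to RABasic in this diff.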

lib/CodeGen/RegAllocBasic.cpp

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	class RABasic : public MachineFunctionPass,
// Scratch space. Allocated here to avoid repeated malloc calls in		// Scratch space. Allocated here to avoid repeated malloc calls in
// selectOrSplit().		// selectOrSplit().
BitVector UsableRegs;		BitVector UsableRegs;

bool LRE_CanEraseVirtReg(unsigned) override;		bool LRE_CanEraseVirtReg(unsigned) override;
void LRE_WillShrinkVirtReg(unsigned) override;		void LRE_WillShrinkVirtReg(unsigned) override;

public:		public:
RABasic();		RABasic(RegClassFilterFunc F = allocateAllRegClasses);

/// Return the pass name.		/// Return the pass name.
StringRef getPassName() const override { return "Basic Register Allocator"; }		StringRef getPassName() const override { return "Basic Register Allocator"; }

/// RABasic analysis usage.		/// RABasic analysis usage.
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

void releaseMemory() override;		void releaseMemory() override;

Spiller &spiller() override { return *SpillerInstance; }		Spiller &spiller() override { return *SpillerInstance; }

void enqueue(LiveInterval *LI) override {		void enqueueImpl(LiveInterval *LI) override {
Queue.push(LI);		Queue.push(LI);
}		}

LiveInterval *dequeue() override {		LiveInterval *dequeue() override {
if (Queue.empty())		if (Queue.empty())
return nullptr;		return nullptr;
LiveInterval *LI = Queue.top();		LiveInterval *LI = Queue.top();
Queue.pop();		Queue.pop();
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	if (!VRM->hasPhys(VirtReg))
return;		return;

// Register is assigned, put it back on the queue for reassignment.		// Register is assigned, put it back on the queue for reassignment.
LiveInterval &LI = LIS->getInterval(VirtReg);		LiveInterval &LI = LIS->getInterval(VirtReg);
Matrix->unassign(LI);		Matrix->unassign(LI);
enqueue(&LI);		enqueue(&LI);
}		}

RABasic::RABasic(): MachineFunctionPass(ID) {		RABasic::RABasic(RegClassFilterFunc F):
		MachineFunctionPass(ID),
		RegAllocBase(F) {
}		}

void RABasic::getAnalysisUsage(AnalysisUsage &AU) const {		void RABasic::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addRequired<LiveIntervals>();		AU.addRequired<LiveIntervals>();
AU.addPreserved<LiveIntervals>();		AU.addPreserved<LiveIntervals>();
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	bool RABasic::runOnMachineFunction(MachineFunction &mf) {

// Diagnostic output before rewriting		// Diagnostic output before rewriting
LLVM_DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");		LLVM_DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");

releaseMemory();		releaseMemory();
return true;		return true;
}		}

FunctionPass* llvm::createBasicRegisterAllocator()		FunctionPass* llvm::createBasicRegisterAllocator() {
{
return new RABasic();		return new RABasic();
}		}

		FunctionPass* llvm::createBasicRegisterAllocator(RegClassFilterFunc F) {
		return new RABasic(F);
		}

lib/CodeGen/RegAllocFast.cpp

Show All 22 Lines
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		#include "llvm/CodeGen/RegAllocCommon.h"
#include "llvm/CodeGen/RegAllocRegistry.h"		#include "llvm/CodeGen/RegAllocRegistry.h"
#include "llvm/CodeGen/RegisterClassInfo.h"		#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/CodeGen/TargetInstrInfo.h"		#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"		#include "llvm/CodeGen/TargetOpcodes.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"		#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"		#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
[... 25 lines hidden ...]	static RegisterRegAlloc
fastRegAlloc("fast", "fast register allocator", createFastRegisterAllocator);		fastRegAlloc("fast", "fast register allocator", createFastRegisterAllocator);

namespace {		namespace {

class RegAllocFast : public MachineFunctionPass {		class RegAllocFast : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;

RegAllocFast() : MachineFunctionPass(ID), StackSlotForVirtReg(-1) {}		RegAllocFast(RegClassFilterFunc F = allocateAllRegClasses,
		bool ClearVirtRegs_ = true) :
		MachineFunctionPass(ID),
		ShouldAllocateClass(F),
		StackSlotForVirtReg(-1),
		ClearVirtRegs(ClearVirtRegs_) {
		qcolombet: It feels dangerous to me to expose ClearVirtRegs. Could we deduce what has to be cleared based on what we allocate instead of exposing this?
		arsenm (author): The problem is that something needs to set the NoVRegs property. The same parameter is added to createVirtRegRewriter, but fastregalloc does the assignment itself. I don't think this can be inferred; the target needs to say when it's done allocating register classes. For example, it would be possible to have a degenerate function where all SGPRs are allocated in the first run and there happen to be no VGPR vregs. Intervening passes may want to introduce new vregs to be taken care of by the later runs, but that won't work if the earlier pass decided to infer that all registers were taken care of.
		arsenm (author): Actually, I stopped creating new virtual registers at some point in the current implementation, but I may still want to do so in the future.
		}
		qcolombet: Should this be `const RegClassFilterFunc &` everywhere?

private:		private:
MachineFrameInfo *MFI;		MachineFrameInfo *MFI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
RegisterClassInfo RegClassInfo;		RegisterClassInfo RegClassInfo;
		RegClassFilterFunc ShouldAllocateClass;

/// Basic block currently being allocated.		/// Basic block currently being allocated.
MachineBasicBlock *MBB;		MachineBasicBlock *MBB;

/// Maps virtual regs to the frame index where these values are spilled.		/// Maps virtual regs to the frame index where these values are spilled.
IndexedMap<int, VirtReg2IndexFunctor> StackSlotForVirtReg;		IndexedMap<int, VirtReg2IndexFunctor> StackSlotForVirtReg;

		bool ClearVirtRegs;

/// Everything we know about a live virtual register.		/// Everything we know about a live virtual register.
struct LiveReg {		struct LiveReg {
MachineInstr *LastUse = nullptr; ///< Last instr to use reg.		MachineInstr *LastUse = nullptr; ///< Last instr to use reg.
unsigned VirtReg; ///< Virtual register number.		unsigned VirtReg; ///< Virtual register number.
MCPhysReg PhysReg = 0; ///< Currently held here.		MCPhysReg PhysReg = 0; ///< Currently held here.
bool LiveOut = false; ///< Register is possibly live out.		bool LiveOut = false; ///< Register is possibly live out.
bool Reloaded = false; ///< Register was reloaded.		bool Reloaded = false; ///< Register was reloaded.
bool Error = false; ///< Could not allocate.		bool Error = false; ///< Could not allocate.
[... 99 lines hidden ...]	public:
}		}

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoPHIs);		MachineFunctionProperties::Property::NoPHIs);
}		}

MachineFunctionProperties getSetProperties() const override {		MachineFunctionProperties getSetProperties() const override {
		if (ClearVirtRegs) {
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoVRegs);		MachineFunctionProperties::Property::NoVRegs);
}		}

		return MachineFunctionProperties();
		}

private:		private:
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

void allocateBasicBlock(MachineBasicBlock &MBB);		void allocateBasicBlock(MachineBasicBlock &MBB);
void allocateInstruction(MachineInstr &MI);		void allocateInstruction(MachineInstr &MI);
void handleDebugValue(MachineInstr &MI);		void handleDebugValue(MachineInstr &MI);
bool usePhysReg(MachineInstr &MI, MCPhysReg PhysReg);		bool usePhysReg(MachineInstr &MI, MCPhysReg PhysReg);
bool definePhysReg(MachineInstr &MI, MCPhysReg PhysReg);		bool definePhysReg(MachineInstr &MI, MCPhysReg PhysReg);
[... 1,097 lines hidden ...]	bool RegAllocFast::runOnMachineFunction(MachineFunction &MF) {
LiveVirtRegs.setUniverse(NumVirtRegs);		LiveVirtRegs.setUniverse(NumVirtRegs);
MayLiveAccrossBlocks.clear();		MayLiveAccrossBlocks.clear();
MayLiveAccrossBlocks.resize(NumVirtRegs);		MayLiveAccrossBlocks.resize(NumVirtRegs);

// Loop over all of the basic blocks, eliminating virtual register references		// Loop over all of the basic blocks, eliminating virtual register references
for (MachineBasicBlock &MBB : MF)		for (MachineBasicBlock &MBB : MF)
allocateBasicBlock(MBB);		allocateBasicBlock(MBB);

		if (ClearVirtRegs) {
// All machine operands and other references to virtual registers have been		// All machine operands and other references to virtual registers have been
// replaced. Remove the virtual registers.		// replaced. Remove the virtual registers.
MRI->clearVirtRegs();		MRI->clearVirtRegs();
		}

StackSlotForVirtReg.clear();		StackSlotForVirtReg.clear();
LiveDbgValueMap.clear();		LiveDbgValueMap.clear();
return true;		return true;
}		}

FunctionPass *llvm::createFastRegisterAllocator() {		FunctionPass *llvm::createFastRegisterAllocator() {
return new RegAllocFast();		return new RegAllocFast();
}		}

		FunctionPass *llvm::createFastRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor, bool ClearVirtRegs) {
		qcolombet: Could we have just one createFastRegisterAllocator with default arguments? (Also, ClearVirtRegs should disappear per my other comment, IMO.)
		arsenm (author): The RegAllocRegistry requires the type to be the no-argument function pass constructor. I could change that, but then all would have the ClearVirtRegs argument or not.
		return new RegAllocFast(Ftor, ClearVirtRegs);
		}
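
arsenm's point about why ClearVirtRegs cannot be inferred can be illustrated with a small standalone model. This is only a sketch and does not use the LLVM API: `RegClass`, `VReg`, `Function`, `runAllocator`, and `degenerateCaseNeedsExplicitFlag` are hypothetical stand-ins. The model assumes an allocation run assigns only the classes its filter accepts, and sets a NoVRegs-like flag only when the caller says this is the final run.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
enum class RegClass { SGPR, VGPR };

struct VReg {
  RegClass RC;
  bool Assigned;
};

struct Function {
  std::vector<VReg> VRegs;
  bool NoVRegs = false; // models the NoVRegs machine function property
};

using RegClassFilter = std::function<bool(RegClass)>;

// One allocation run: assign only vregs whose class passes the filter.
// The caller, not the run itself, decides whether this is the last run.
void runAllocator(Function &F, const RegClassFilter &ShouldAllocate,
                  bool ClearVirtRegs) {
  for (VReg &V : F.VRegs)
    if (!V.Assigned && ShouldAllocate(V.RC))
      V.Assigned = true;
  if (ClearVirtRegs) {
    F.VRegs.clear();
    F.NoVRegs = true;
  }
}

bool allAssigned(const Function &F) {
  for (const VReg &V : F.VRegs)
    if (!V.Assigned)
      return false;
  return true;
}

// Degenerate case from the review: only SGPR vregs exist, so after the SGPR
// run nothing is left unassigned, yet a later pass adds a VGPR vreg.
bool degenerateCaseNeedsExplicitFlag() {
  Function F;
  F.VRegs.push_back(VReg{RegClass::SGPR, false});

  runAllocator(F, [](RegClass RC) { return RC == RegClass::SGPR; },
               /*ClearVirtRegs=*/false);
  bool LookedFinished = allAssigned(F); // an inferring allocator would stop here

  F.VRegs.push_back(VReg{RegClass::VGPR, false}); // intervening pass adds a vreg
  bool StillOpen = !F.NoVRegs && !allAssigned(F);

  runAllocator(F, [](RegClass RC) { return RC != RegClass::SGPR; },
               /*ClearVirtRegs=*/true);
  return LookedFinished && StillOpen && F.NoVRegs && F.VRegs.empty();
}
```

In this model, had the first run inferred "everything is assigned, so clear", the vreg added between the two runs would have appeared after the NoVRegs flag was already set. Passing the flag explicitly keeps that decision with the caller that knows how many runs remain.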

lib/CodeGen/RegAllocGreedy.cpp

[... 408 lines hidden ...]	#endif
/// by a split candidate when choosing the best split candidate.		/// by a split candidate when choosing the best split candidate.
bool EnableAdvancedRASplitCost;		bool EnableAdvancedRASplitCost;

/// Set of broken hints that may be reconciled later because of eviction.		/// Set of broken hints that may be reconciled later because of eviction.
SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;		SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;

public:		public:
RAGreedy();		RAGreedy();
		RAGreedy(RegClassFilterFunc F);

/// Return the pass name.		/// Return the pass name.
StringRef getPassName() const override { return "Greedy Register Allocator"; }		StringRef getPassName() const override { return "Greedy Register Allocator"; }

/// RAGreedy analysis usage.		/// RAGreedy analysis usage.
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;
void releaseMemory() override;		void releaseMemory() override;
Spiller &spiller() override { return *SpillerInstance; }		Spiller &spiller() override { return *SpillerInstance; }
void enqueue(LiveInterval *LI) override;		void enqueueImpl(LiveInterval *LI) override;
LiveInterval *dequeue() override;		LiveInterval *dequeue() override;
unsigned selectOrSplit(LiveInterval&, SmallVectorImpl<unsigned>&) override;		unsigned selectOrSplit(LiveInterval&, SmallVectorImpl<unsigned>&) override;
void aboutToRemoveInterval(LiveInterval &) override;		void aboutToRemoveInterval(LiveInterval &) override;

/// Perform register allocation.		/// Perform register allocation.
bool runOnMachineFunction(MachineFunction &mf) override;		bool runOnMachineFunction(MachineFunction &mf) override;

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
[... 156 lines hidden ...]
// Hysteresis to use when comparing floats.		// Hysteresis to use when comparing floats.
// This helps stabilize decisions based on float comparisons.		// This helps stabilize decisions based on float comparisons.
const float Hysteresis = (2007 / 2048.0f); // 0.97998046875		const float Hysteresis = (2007 / 2048.0f); // 0.97998046875

FunctionPass* llvm::createGreedyRegisterAllocator() {		FunctionPass* llvm::createGreedyRegisterAllocator() {
return new RAGreedy();		return new RAGreedy();
}		}

RAGreedy::RAGreedy(): MachineFunctionPass(ID) {		namespace llvm {
		FunctionPass* createGreedyRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor);

		}
		qcolombet: Ditto: just one createXXX method.

		FunctionPass* llvm::createGreedyRegisterAllocator(
		std::function<bool(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC)> Ftor) {
		return new RAGreedy(Ftor);
		}

		RAGreedy::RAGreedy() :
		MachineFunctionPass(ID),
		RegAllocBase() {

		qcolombet: Why do we need both constructors?
		}

		RAGreedy::RAGreedy(RegClassFilterFunc F):
		MachineFunctionPass(ID),
		RegAllocBase(F) {
}		}

void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const {		void RAGreedy::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addPreserved<MachineBlockFrequencyInfo>();		AU.addPreserved<MachineBlockFrequencyInfo>();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
[... 40 lines hidden ...]

void RAGreedy::LRE_WillShrinkVirtReg(unsigned VirtReg) {		void RAGreedy::LRE_WillShrinkVirtReg(unsigned VirtReg) {
if (!VRM->hasPhys(VirtReg))		if (!VRM->hasPhys(VirtReg))
return;		return;

// Register is assigned, put it back on the queue for reassignment.		// Register is assigned, put it back on the queue for reassignment.
LiveInterval &LI = LIS->getInterval(VirtReg);		LiveInterval &LI = LIS->getInterval(VirtReg);
Matrix->unassign(LI);		Matrix->unassign(LI);
enqueue(&LI);		RegAllocBase::enqueue(&LI);
}		}

void RAGreedy::LRE_DidCloneVirtReg(unsigned New, unsigned Old) {		void RAGreedy::LRE_DidCloneVirtReg(unsigned New, unsigned Old) {
// Cloning a register we haven't even heard about yet? Just ignore it.		// Cloning a register we haven't even heard about yet? Just ignore it.
if (!ExtraRegInfo.inBounds(Old))		if (!ExtraRegInfo.inBounds(Old))
return;		return;

// LRE may clone a virtual register because dead code elimination causes it to		// LRE may clone a virtual register because dead code elimination causes it to
// be split into connected components. The new components are much smaller		// be split into connected components. The new components are much smaller
// than the original, so they should get a new chance at being assigned.		// than the original, so they should get a new chance at being assigned.
// same stage as the parent.		// same stage as the parent.
ExtraRegInfo[Old].Stage = RS_Assign;		ExtraRegInfo[Old].Stage = RS_Assign;
ExtraRegInfo.grow(New);		ExtraRegInfo.grow(New);
ExtraRegInfo[New] = ExtraRegInfo[Old];		ExtraRegInfo[New] = ExtraRegInfo[Old];
}		}

void RAGreedy::releaseMemory() {		void RAGreedy::releaseMemory() {
SpillerInstance.reset();		SpillerInstance.reset();
ExtraRegInfo.clear();		ExtraRegInfo.clear();
GlobalCand.clear();		GlobalCand.clear();
}		}

void RAGreedy::enqueue(LiveInterval *LI) { enqueue(Queue, LI); }		void RAGreedy::enqueueImpl(LiveInterval *LI) { enqueue(Queue, LI); }

void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {		void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {
// Prioritize live ranges by size, assigning larger ranges first.		// Prioritize live ranges by size, assigning larger ranges first.
// The queue holds (size, reg) pairs.		// The queue holds (size, reg) pairs.
const unsigned Size = LI->getSize();		const unsigned Size = LI->getSize();
const unsigned Reg = LI->reg;		const unsigned Reg = LI->reg;
assert(TargetRegisterInfo::isVirtualRegister(Reg) &&
"Can only enqueue virtual registers");
unsigned Prio;		unsigned Prio;
		qcolombet: Removing this assert is worrisome. Why do we need that?

ExtraRegInfo.grow(Reg);		ExtraRegInfo.grow(Reg);
if (ExtraRegInfo[Reg].Stage == RS_New)		if (ExtraRegInfo[Reg].Stage == RS_New)
ExtraRegInfo[Reg].Stage = RS_Assign;		ExtraRegInfo[Reg].Stage = RS_Assign;

if (ExtraRegInfo[Reg].Stage == RS_Split) {		if (ExtraRegInfo[Reg].Stage == RS_Split) {
// Unsplit ranges that couldn't be allocated immediately are deferred until		// Unsplit ranges that couldn't be allocated immediately are deferred until
// everything else has been allocated.		// everything else has been allocated.
[... 2,215 lines hidden ...]	void RAGreedy::tryHintRecoloring(LiveInterval &VirtReg) {

do {		do {
Reg = RecoloringCandidates.pop_back_val();		Reg = RecoloringCandidates.pop_back_val();

// We cannot recolor physical register.		// We cannot recolor physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg))		if (TargetRegisterInfo::isPhysicalRegister(Reg))
continue;		continue;

assert(VRM->hasPhys(Reg) && "We have unallocated variable!!");		// This may be a skipped class
		if (!VRM->hasPhys(Reg)) {
		assert(!ShouldAllocateClass(TRI, MRI->getRegClass(Reg)) &&
		"We have an unallocated variable which should have been handled");
		continue;
		}

// Get the live interval mapped with this virtual register to be able		// Get the live interval mapped with this virtual register to be able
// to check for the interference with the new color.		// to check for the interference with the new color.
LiveInterval &LI = LIS->getInterval(Reg);		LiveInterval &LI = LIS->getInterval(Reg);
unsigned CurrPhys = VRM->getPhys(Reg);		unsigned CurrPhys = VRM->getPhys(Reg);
// Check that the new color matches the register class constraints and		// Check that the new color matches the register class constraints and
// that it is free for this live range.		// that it is free for this live range.
if (CurrPhys != PhysReg && (!MRI->getRegClass(Reg)->contains(PhysReg) ||		if (CurrPhys != PhysReg && (!MRI->getRegClass(Reg)->contains(PhysReg) ||
[... 319 lines hidden ...]

lib/CodeGen/TargetFrameLoweringImpl.cpp

	[... 12 lines hidden ...]

	#include "llvm/ADT/BitVector.h"			#include "llvm/ADT/BitVector.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineFunction.h"			#include "llvm/CodeGen/MachineFunction.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/CodeGen/TargetFrameLowering.h"			#include "llvm/CodeGen/TargetFrameLowering.h"
	#include "llvm/CodeGen/TargetRegisterInfo.h"			#include "llvm/CodeGen/TargetRegisterInfo.h"
	#include "llvm/CodeGen/TargetSubtargetInfo.h"			#include "llvm/CodeGen/TargetSubtargetInfo.h"
				#include "llvm/CodeGen/VirtRegMap.h"
				qcolombet: Why do we need this change?
	#include "llvm/IR/Attributes.h"			#include "llvm/IR/Attributes.h"
	#include "llvm/IR/CallingConv.h"			#include "llvm/IR/CallingConv.h"
	#include "llvm/IR/Function.h"			#include "llvm/IR/Function.h"
	#include "llvm/MC/MCRegisterInfo.h"			#include "llvm/MC/MCRegisterInfo.h"
	#include "llvm/Support/Compiler.h"			#include "llvm/Support/Compiler.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
	#include "llvm/Target/TargetOptions.h"			#include "llvm/Target/TargetOptions.h"

	[... 99 lines hidden ...]
	int TargetFrameLowering::getInitialCFAOffset(const MachineFunction &MF) const {			int TargetFrameLowering::getInitialCFAOffset(const MachineFunction &MF) const {
	llvm_unreachable("getInitialCFAOffset() not implemented!");			llvm_unreachable("getInitialCFAOffset() not implemented!");
	}			}

	unsigned TargetFrameLowering::getInitialCFARegister(const MachineFunction &MF)			unsigned TargetFrameLowering::getInitialCFARegister(const MachineFunction &MF)
	const {			const {
	llvm_unreachable("getInitialCFARegister() not implemented!");			llvm_unreachable("getInitialCFARegister() not implemented!");
	}			}
	No newline at end of file			No newline at end of file

lib/Target/AMDGPU/AMDGPU.h

	[... 127 lines hidden ...]
	extern char &SIFixVGPRCopiesID;			extern char &SIFixVGPRCopiesID;

	void initializeSIFixupVectorISelPass(PassRegistry &);			void initializeSIFixupVectorISelPass(PassRegistry &);
	extern char &SIFixupVectorISelID;			extern char &SIFixupVectorISelID;

	void initializeSILowerI1CopiesPass(PassRegistry &);			void initializeSILowerI1CopiesPass(PassRegistry &);
	extern char &SILowerI1CopiesID;			extern char &SILowerI1CopiesID;

				void initializeSILowerSGPRSpillsPass(PassRegistry &);
				extern char &SILowerSGPRSpillsID;

	void initializeSILoadStoreOptimizerPass(PassRegistry &);			void initializeSILoadStoreOptimizerPass(PassRegistry &);
	extern char &SILoadStoreOptimizerID;			extern char &SILoadStoreOptimizerID;

	void initializeSIWholeQuadModePass(PassRegistry &);			void initializeSIWholeQuadModePass(PassRegistry &);
	extern char &SIWholeQuadModeID;			extern char &SIWholeQuadModeID;

	void initializeSILowerControlFlowPass(PassRegistry &);			void initializeSILowerControlFlowPass(PassRegistry &);
	extern char &SILowerControlFlowID;			extern char &SILowerControlFlowID;
	[... 147 lines hidden ...]

lib/Target/AMDGPU/AMDGPUCallingConv.td

	[... 91 lines hidden ...]
	def CSR_AMDGPU_VGPRs_32_255 : CalleeSavedRegs<			def CSR_AMDGPU_VGPRs_32_255 : CalleeSavedRegs<
	(sequence "VGPR%u", 32, 255)			(sequence "VGPR%u", 32, 255)
	>;			>;

	def CSR_AMDGPU_SGPRs_32_103 : CalleeSavedRegs<			def CSR_AMDGPU_SGPRs_32_103 : CalleeSavedRegs<
	(sequence "SGPR%u", 32, 103)			(sequence "SGPR%u", 32, 103)
	>;			>;

				// Just to get the regmask, not for calling convention purposes.
				def CSR_AMDGPU_AllVGPRs : CalleeSavedRegs<
				(sequence "VGPR%u", 0, 255)
				>;

	def CSR_AMDGPU_HighRegs : CalleeSavedRegs<			def CSR_AMDGPU_HighRegs : CalleeSavedRegs<
	(add CSR_AMDGPU_VGPRs_32_255, CSR_AMDGPU_SGPRs_32_103)			(add CSR_AMDGPU_VGPRs_32_255, CSR_AMDGPU_SGPRs_32_103)
	>;			>;

	// Calling convention for leaf functions			// Calling convention for leaf functions
	def CC_AMDGPU_Func : CallingConv<[			def CC_AMDGPU_Func : CallingConv<[
	CCIfByVal<CCPassByVal<4, 4>>,			CCIfByVal<CCPassByVal<4, 4>>,
	CCIfType<[i1], CCPromoteToType<i32>>,			CCIfType<[i1], CCPromoteToType<i32>>,
	[... 36 lines hidden ...]

lib/Target/AMDGPU/AMDGPURegisterInfo.cpp

[... 81 lines hidden ...]	default:
return nullptr;		return nullptr;
}		}
}		}

unsigned SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {		unsigned SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
return FuncInfo->getFrameOffsetReg();		return FuncInfo->getFrameOffsetReg();
}		}

		const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
		return CSR_AMDGPU_AllVGPRs_RegMask;
		}

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[... 25 lines hidden ...]
#include "GCNSchedStrategy.h"		#include "GCNSchedStrategy.h"
#include "R600MachineScheduler.h"		#include "R600MachineScheduler.h"
#include "SIMachineScheduler.h"		#include "SIMachineScheduler.h"
#include "llvm/CodeGen/GlobalISel/IRTranslator.h"		#include "llvm/CodeGen/GlobalISel/IRTranslator.h"
#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"		#include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
#include "llvm/CodeGen/GlobalISel/Legalizer.h"		#include "llvm/CodeGen/GlobalISel/Legalizer.h"
#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"		#include "llvm/CodeGen/GlobalISel/RegBankSelect.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
		#include "llvm/CodeGen/RegAllocRegistry.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetLoweringObjectFile.h"		#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"		#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"		#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Utils.h"		#include "llvm/Transforms/Utils.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <memory>		#include <memory>

using namespace llvm;		using namespace llvm;

		namespace {
		class SGPRRegisterRegAlloc : public RegisterRegAllocBase<SGPRRegisterRegAlloc> {
		public:
		SGPRRegisterRegAlloc(const char *N, const char *D, FunctionPassCtor C)
		: RegisterRegAllocBase(N, D, C) {}
		};

		class VGPRRegisterRegAlloc : public RegisterRegAllocBase<VGPRRegisterRegAlloc> {
		public:
		VGPRRegisterRegAlloc(const char *N, const char *D, FunctionPassCtor C)
		: RegisterRegAllocBase(N, D, C) {}
		};

		static bool onlyAllocateSGPRs(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC) {
		return static_cast<const SIRegisterInfo &>(TRI).isSGPRClass(&RC);
		}

		static bool onlyAllocateVGPRs(const TargetRegisterInfo &TRI,
		const TargetRegisterClass &RC) {
		return static_cast<const SIRegisterInfo &>(TRI).hasVGPRs(&RC);
		rampitec: !isSGPRClass() to catch [potentially] remaining strange register classes.
		}


		/// -{sgpr|vgpr}-regalloc=... command line option.
		static FunctionPass *useDefaultRegisterAllocator() { return nullptr; }

		/// A dummy default pass factory indicates whether the register allocator is
		/// overridden on the command line.
		static llvm::once_flag InitializeDefaultSGPRRegisterAllocatorFlag;
		static llvm::once_flag InitializeDefaultVGPRRegisterAllocatorFlag;

		static SGPRRegisterRegAlloc
		defaultSGPRRegAlloc("default",
		"pick SGPR register allocator based on -O option",
		useDefaultRegisterAllocator);

		static cl::opt<SGPRRegisterRegAlloc::FunctionPassCtor, false,
		RegisterPassParser<SGPRRegisterRegAlloc>>
		SGPRRegAlloc("sgpr-regalloc", cl::Hidden, cl::init(&useDefaultRegisterAllocator),
		cl::desc("Register allocator to use for SGPRs"));

		static cl::opt<VGPRRegisterRegAlloc::FunctionPassCtor, false,
		RegisterPassParser<VGPRRegisterRegAlloc>>
		VGPRRegAlloc("vgpr-regalloc", cl::Hidden, cl::init(&useDefaultRegisterAllocator),
		cl::desc("Register allocator to use for VGPRs"));


		static void initializeDefaultSGPRRegisterAllocatorOnce() {
		RegisterRegAlloc::FunctionPassCtor Ctor = SGPRRegisterRegAlloc::getDefault();

		if (!Ctor) {
		Ctor = SGPRRegAlloc;
		SGPRRegisterRegAlloc::setDefault(SGPRRegAlloc);
		}
		}

		static void initializeDefaultVGPRRegisterAllocatorOnce() {
		RegisterRegAlloc::FunctionPassCtor Ctor = VGPRRegisterRegAlloc::getDefault();

		if (!Ctor) {
		Ctor = VGPRRegAlloc;
		VGPRRegisterRegAlloc::setDefault(VGPRRegAlloc);
		}
		}

		static FunctionPass *createBasicSGPRRegisterAllocator() {
		return createBasicRegisterAllocator(onlyAllocateSGPRs);
		}

		static FunctionPass *createGreedySGPRRegisterAllocator() {
		return createGreedyRegisterAllocator(onlyAllocateSGPRs);
		}

		static FunctionPass *createFastSGPRRegisterAllocator() {
		return createFastRegisterAllocator(onlyAllocateSGPRs, false);
		}

		static FunctionPass *createBasicVGPRRegisterAllocator() {
		return createBasicRegisterAllocator(onlyAllocateVGPRs);
		}

		static FunctionPass *createGreedyVGPRRegisterAllocator() {
		return createGreedyRegisterAllocator(onlyAllocateVGPRs);
		}

		static FunctionPass *createFastVGPRRegisterAllocator() {
		return createFastRegisterAllocator(onlyAllocateVGPRs, true);
		}

		static SGPRRegisterRegAlloc basicRegAllocSGPR(
		"basic", "basic register allocator", createBasicSGPRRegisterAllocator);
		static SGPRRegisterRegAlloc greedyRegAllocSGPR(
		"greedy", "greedy register allocator", createGreedySGPRRegisterAllocator);

		static SGPRRegisterRegAlloc fastRegAllocSGPR(
		"fast", "fast register allocator", createFastSGPRRegisterAllocator);


		static VGPRRegisterRegAlloc basicRegAllocVGPR(
		"basic", "basic register allocator", createBasicVGPRRegisterAllocator);
		static VGPRRegisterRegAlloc greedyRegAllocVGPR(
		"greedy", "greedy register allocator", createGreedyVGPRRegisterAllocator);

		static VGPRRegisterRegAlloc fastRegAllocVGPR(
		"fast", "fast register allocator", createFastVGPRRegisterAllocator);
		}
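
The registrations above go through zero-argument thunks because, as arsenm notes in the RegAllocFast.cpp thread, the registry stores a no-argument pass constructor. The following standalone sketch shows that pattern together with rampitec's suggestion to write the second filter as a complement of the first. It is not LLVM code: `Pass`, `PassCtor`, `RegClass::Other`, the two registry functions, and `handledExactlyOnce` are all hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model; Other stands for an odd class that is neither bank.
enum class RegClass { SGPR, VGPR, Other };

using Filter = bool (*)(RegClass);

struct Pass {
  std::string Name;
  Filter ShouldAllocate;
};

// The registry can only store constructors that take no arguments.
using PassCtor = Pass (*)();

static bool onlySGPRs(RegClass RC) { return RC == RegClass::SGPR; }

// Written as a complement so that classes which are neither plain SGPR nor
// plain VGPR still land in the second run.
static bool everythingElse(RegClass RC) { return !onlySGPRs(RC); }

// Thunks bind the filter, so the registry never sees the extra argument.
static Pass createGreedySGPR() { return {"greedy", onlySGPRs}; }
static Pass createGreedyVGPR() { return {"greedy", everythingElse}; }

// One registry per bank, mirroring the two separate command line options.
static std::map<std::string, PassCtor> &sgprRegistry() {
  static std::map<std::string, PassCtor> R = {{"greedy", createGreedySGPR}};
  return R;
}

static std::map<std::string, PassCtor> &vgprRegistry() {
  static std::map<std::string, PassCtor> R = {{"greedy", createGreedyVGPR}};
  return R;
}

// True if exactly one of the two registered passes accepts RC.
bool handledExactlyOnce(RegClass RC) {
  Pass S = sgprRegistry().at("greedy")();
  Pass V = vgprRegistry().at("greedy")();
  return S.ShouldAllocate(RC) != V.ShouldAllocate(RC);
}
```

With a positive test for the second filter (for example "has VGPRs"), a class matching neither predicate would be skipped by both runs. The complement form makes the two filters a partition by construction.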


static cl::opt<bool> EnableR600StructurizeCFG(		static cl::opt<bool> EnableR600StructurizeCFG(
"r600-ir-structurize",		"r600-ir-structurize",
cl::desc("Use StructurizeCFG IR pass"),		cl::desc("Use StructurizeCFG IR pass"),
cl::init(true));		cl::init(true));

static cl::opt<bool> EnableSROA(		static cl::opt<bool> EnableSROA(
"amdgpu-sroa",		"amdgpu-sroa",
cl::desc("Run SROA after promote alloca pass"),		cl::desc("Run SROA after promote alloca pass"),
[... 98 lines hidden ...]	extern "C" void LLVMInitializeAMDGPUTarget() {
initializeR600ControlFlowFinalizerPass(*PR);		initializeR600ControlFlowFinalizerPass(*PR);
initializeR600PacketizerPass(*PR);		initializeR600PacketizerPass(*PR);
initializeR600ExpandSpecialInstrsPassPass(*PR);		initializeR600ExpandSpecialInstrsPassPass(*PR);
initializeR600VectorRegMergerPass(*PR);		initializeR600VectorRegMergerPass(*PR);
initializeGlobalISel(*PR);		initializeGlobalISel(*PR);
initializeAMDGPUDAGToDAGISelPass(*PR);		initializeAMDGPUDAGToDAGISelPass(*PR);
initializeGCNDPPCombinePass(*PR);		initializeGCNDPPCombinePass(*PR);
initializeSILowerI1CopiesPass(*PR);		initializeSILowerI1CopiesPass(*PR);
		initializeSILowerSGPRSpillsPass(*PR);
initializeSIFixSGPRCopiesPass(*PR);		initializeSIFixSGPRCopiesPass(*PR);
initializeSIFixVGPRCopiesPass(*PR);		initializeSIFixVGPRCopiesPass(*PR);
initializeSIFixupVectorISelPass(*PR);		initializeSIFixupVectorISelPass(*PR);
initializeSIFoldOperandsPass(*PR);		initializeSIFoldOperandsPass(*PR);
initializeSIPeepholeSDWAPass(*PR);		initializeSIPeepholeSDWAPass(*PR);
initializeSIShrinkInstructionsPass(*PR);		initializeSIShrinkInstructionsPass(*PR);
initializeSIOptimizeExecMaskingPreRAPass(*PR);		initializeSIOptimizeExecMaskingPreRAPass(*PR);
initializeSILoadStoreOptimizerPass(*PR);		initializeSILoadStoreOptimizerPass(*PR);
[... 391 lines hidden ...]	public:
bool addILPOpts() override;		bool addILPOpts() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addIRTranslator() override;		bool addIRTranslator() override;
bool addLegalizeMachineIR() override;		bool addLegalizeMachineIR() override;
bool addRegBankSelect() override;		bool addRegBankSelect() override;
bool addGlobalInstructionSelect() override;		bool addGlobalInstructionSelect() override;
void addFastRegAlloc() override;		void addFastRegAlloc() override;
void addOptimizedRegAlloc() override;		void addOptimizedRegAlloc() override;

		FunctionPass *createSGPRAllocPass(bool Optimized);
		FunctionPass *createVGPRAllocPass(bool Optimized);
		FunctionPass *createRegAllocPass(bool Optimized) override;

		bool addRegAssignmentFast() override;
		bool addRegAssignmentOptimized() override;

void addPreRegAlloc() override;		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreSched2() override;		void addPreSched2() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};

} // end anonymous namespace		} // end anonymous namespace

[... 298 lines hidden ...]	void GCNPassConfig::addOptimizedRegAlloc() {

// This must be run after SILowerControlFlow, since it needs to use the		// This must be run after SILowerControlFlow, since it needs to use the
// machine-level CFG, but before register allocation.		// machine-level CFG, but before register allocation.
insertPass(&SILowerControlFlowID, &SIFixWWMLivenessID, false);		insertPass(&SILowerControlFlowID, &SIFixWWMLivenessID, false);

TargetPassConfig::addOptimizedRegAlloc();		TargetPassConfig::addOptimizedRegAlloc();
}		}

		FunctionPass *GCNPassConfig::createSGPRAllocPass(bool Optimized) {
		// Initialize the global default.
		llvm::call_once(InitializeDefaultSGPRRegisterAllocatorFlag,
		initializeDefaultSGPRRegisterAllocatorOnce);

		RegisterRegAlloc::FunctionPassCtor Ctor = SGPRRegisterRegAlloc::getDefault();
		if (Ctor != useDefaultRegisterAllocator)
		return Ctor();

		if (Optimized)
		return createGreedyRegisterAllocator(onlyAllocateSGPRs);

		return createFastRegisterAllocator(onlyAllocateSGPRs, false);
		}

		FunctionPass *GCNPassConfig::createVGPRAllocPass(bool Optimized) {
		// Initialize the global default.
		llvm::call_once(InitializeDefaultVGPRRegisterAllocatorFlag,
		initializeDefaultVGPRRegisterAllocatorOnce);

		RegisterRegAlloc::FunctionPassCtor Ctor = VGPRRegisterRegAlloc::getDefault();
		if (Ctor != useDefaultRegisterAllocator)
		return Ctor();

		if (Optimized)
		return createGreedyVGPRRegisterAllocator();

		return createFastVGPRRegisterAllocator();
		}

		FunctionPass *GCNPassConfig::createRegAllocPass(bool Optimized) {
		llvm_unreachable("should not be used");
		}

		static const char RegAllocOptNotSupportedMessage[] =
		"-regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc";

		bool GCNPassConfig::addRegAssignmentFast() {
		if (!usingDefaultRegAlloc())
		report_fatal_error(RegAllocOptNotSupportedMessage);

		addPass(createSGPRAllocPass(false));

		// Equivalent of PEI for SGPRs.
		addPass(&SILowerSGPRSpillsID);

		addPass(createVGPRAllocPass(false));
		return true;
		}

		bool GCNPassConfig::addRegAssignmentOptimized() {
		if (!usingDefaultRegAlloc())
		report_fatal_error(RegAllocOptNotSupportedMessage);

		addPass(createSGPRAllocPass(true));

		addPreRewrite();
		rampitec: You need to pass filter to PreRewrite as well.

		// Commit allocated register changes. This is mostly necessary because too
		// many things rely on the use lists of the physical registers, such as the
		// verifier. This is only necessary with allocators which use LiveIntervals,
		// since FastRegAlloc does the replacements itself.
		addPass(createVirtRegRewriter(false));

		// Equivalent of PEI for SGPRs.
		addPass(&SILowerSGPRSpillsID);

		addPass(createVGPRAllocPass(true));

		addPreRewrite();
		addPass(&VirtRegRewriterID);

		addPass(&StackSlotColoringID);

		return true;
		}

void GCNPassConfig::addPostRegAlloc() {		void GCNPassConfig::addPostRegAlloc() {
addPass(&SIFixVGPRCopiesID);		addPass(&SIFixVGPRCopiesID);
if (getOptLevel() > CodeGenOpt::None)		if (getOptLevel() > CodeGenOpt::None)
addPass(&SIOptimizeExecMaskingID);		addPass(&SIOptimizeExecMaskingID);
TargetPassConfig::addPostRegAlloc();		TargetPassConfig::addPostRegAlloc();
}		}

void GCNPassConfig::addPreSched2() {		void GCNPassConfig::addPreSched2() {
(28 lines not shown)
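
The ordering set up by addRegAssignmentFast() and addRegAssignmentOptimized() above can be summarized with a small stand-alone model. This is only a sketch: the strings are descriptive labels invented for illustration, not real LLVM pass names, and the function buildSplitRegAllocPipeline is hypothetical. It shows only that SGPR spill lowering sits between the SGPR and VGPR allocation runs, and that the optimized path adds the rewrite steps and stack slot coloring.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the split pipeline. Each entry is a label, not an LLVM pass ID.
std::vector<std::string> buildSplitRegAllocPipeline(bool Optimized) {
  std::vector<std::string> Passes;

  // First run: allocate SGPRs only.
  Passes.push_back(Optimized ? "greedy (SGPRs only)" : "fast (SGPRs only)");
  if (Optimized) {
    // Allocators based on LiveIntervals need an explicit rewrite step.
    Passes.push_back("pre-rewrite");
    Passes.push_back("virtual register rewriter (after SGPRs)");
  }

  // SGPR spills are lowered before any VGPR is allocated.
  Passes.push_back("lower SGPR spills");

  // Second run: allocate VGPRs.
  Passes.push_back(Optimized ? "greedy (VGPRs only)" : "fast (VGPRs only)");
  if (Optimized) {
    Passes.push_back("pre-rewrite");
    Passes.push_back("virtual register rewriter");
    Passes.push_back("stack slot coloring");
  }
  return Passes;
}
```

In this model the fast path has three entries and the optimized path has eight; in both, "lower SGPR spills" comes immediately before the VGPR allocation run.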

lib/Target/AMDGPU/CMakeLists.txt

(103 lines not shown)	add_llvm_target(AMDGPUCodeGen
SIFrameLowering.cpp		SIFrameLowering.cpp
SIInsertSkips.cpp		SIInsertSkips.cpp
SIInsertWaitcnts.cpp		SIInsertWaitcnts.cpp
SIInstrInfo.cpp		SIInstrInfo.cpp
SIISelLowering.cpp		SIISelLowering.cpp
SILoadStoreOptimizer.cpp		SILoadStoreOptimizer.cpp
SILowerControlFlow.cpp		SILowerControlFlow.cpp
SILowerI1Copies.cpp		SILowerI1Copies.cpp
		SILowerSGPRSpills.cpp
SIMachineFunctionInfo.cpp		SIMachineFunctionInfo.cpp
SIMachineScheduler.cpp		SIMachineScheduler.cpp
SIMemoryLegalizer.cpp		SIMemoryLegalizer.cpp
SIOptimizeExecMasking.cpp		SIOptimizeExecMasking.cpp
SIOptimizeExecMaskingPreRA.cpp		SIOptimizeExecMaskingPreRA.cpp
SIPeepholeSDWA.cpp		SIPeepholeSDWA.cpp
SIRegisterInfo.cpp		SIRegisterInfo.cpp
SIShrinkInstructions.cpp		SIShrinkInstructions.cpp
(11 lines not shown)

lib/Target/AMDGPU/SIFrameLowering.h

(32 lines not shown)	public:
void emitEpilogue(MachineFunction &MF,		void emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const override;		MachineBasicBlock &MBB) const override;
int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;

void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,		void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
RegScavenger *RS = nullptr) const override;		RegScavenger *RS = nullptr) const override;

		void determineCalleeSavesSGPR(MachineFunction &MF, BitVector &SavedRegs,
		RegScavenger *RS = nullptr) const;

void processFunctionBeforeFrameFinalized(		void processFunctionBeforeFrameFinalized(
MachineFunction &MF,		MachineFunction &MF,
RegScavenger *RS = nullptr) const override;		RegScavenger *RS = nullptr) const override;

MachineBasicBlock::iterator		MachineBasicBlock::iterator
eliminateCallFramePseudoInstr(MachineFunction &MF,		eliminateCallFramePseudoInstr(MachineFunction &MF,
MachineBasicBlock &MBB,		MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const override;		MachineBasicBlock::iterator MI) const override;
(37 lines not shown)

lib/Target/AMDGPU/SIFrameLowering.cpp

(584 lines not shown)	void SIFrameLowering::emitPrologue(MachineFunction &MF,
}		}

if (RoundedSize != 0 && hasSP(MF)) {		if (RoundedSize != 0 && hasSP(MF)) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), StackPtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), StackPtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm(RoundedSize * ST.getWavefrontSize())		.addImm(RoundedSize * ST.getWavefrontSize())
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg
: FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())
continue;
TII->storeRegToStackSlot(MBB, MBBI, Reg.VGPR, true,
Reg.FI.getValue(), &AMDGPU::VGPR_32RegClass,
&TII->getRegisterInfo());
}
}		}

void SIFrameLowering::emitEpilogue(MachineFunction &MF,		void SIFrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
if (FuncInfo->isEntryFunction())		if (FuncInfo->isEntryFunction())
return;		return;

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();		MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();

for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg
: FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())
continue;
TII->loadRegFromStackSlot(MBB, MBBI, Reg.VGPR,
Reg.FI.getValue(), &AMDGPU::VGPR_32RegClass,
&TII->getRegisterInfo());
}

unsigned StackPtrReg = FuncInfo->getStackPtrOffsetReg();		unsigned StackPtrReg = FuncInfo->getStackPtrOffsetReg();
if (StackPtrReg == AMDGPU::NoRegister)		if (StackPtrReg == AMDGPU::NoRegister)
return;		return;

const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
uint32_t NumBytes = MFI.getStackSize();		uint32_t NumBytes = MFI.getStackSize();

DebugLoc DL;		DebugLoc DL;
(35 lines not shown)	void SIFrameLowering::processFunctionBeforeFrameFinalized(
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();

if (!MFI.hasStackObjects())		if (!MFI.hasStackObjects())
return;		return;

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo &TRI = TII->getRegisterInfo();		const SIRegisterInfo &TRI = TII->getRegisterInfo();
SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
bool AllSGPRSpilledToVGPRs = false;

if (TRI.spillSGPRToVGPR() && FuncInfo->hasSpilledSGPRs()) {
AllSGPRSpilledToVGPRs = true;

// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs
// are spilled to VGPRs, in which case we can eliminate the stack usage.
//
// XXX - This operates under the assumption that only other SGPR spills are
// users of the frame index. I'm not 100% sure this is correct. The
// StackColoring pass has a comment saying a future improvement would be to
// merge allocas with spill slots, but for now according to
// MachineFrameInfo isSpillSlot can't alias any other object.
for (MachineBasicBlock &MBB : MF) {
MachineBasicBlock::iterator Next;
for (auto I = MBB.begin(), E = MBB.end(); I != E; I = Next) {
MachineInstr &MI = *I;
Next = std::next(I);

if (TII->isSGPRSpill(MI)) {
int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();
assert(MFI.getStackID(FI) == SIStackID::SGPR_SPILL);
if (FuncInfo->allocateSGPRSpillToVGPR(MF, FI)) {
bool Spilled = TRI.eliminateSGPRToVGPRSpillFrameIndex(MI, FI, RS);
(void)Spilled;
assert(Spilled && "failed to spill SGPR to VGPR when allocated");
} else
AllSGPRSpilledToVGPRs = false;
}
}
}

FuncInfo->removeSGPRToVGPRFrameIndices(MFI);
}

// FIXME: The other checks should be redundant with allStackObjectsAreDead,		// FIXME: The other checks should be redundant with allStackObjectsAreDead,
// but currently hasNonSpillStackObjects is set only from source		// but currently hasNonSpillStackObjects is set only from source
// allocas. Stack temps produced from legalization are not counted currently.		// allocas. Stack temps produced from legalization are not counted currently.
if (FuncInfo->hasNonSpillStackObjects() \|\| FuncInfo->hasSpilledVGPRs() \|\|		if (!allStackObjectsAreDead(MFI)) {
!AllSGPRSpilledToVGPRs \|\| !allStackObjectsAreDead(MFI)) {
assert(RS && "RegScavenger required if spilling");		assert(RS && "RegScavenger required if spilling");

// We force this to be at offset 0 so no user object ever has 0 as an		// We force this to be at offset 0 so no user object ever has 0 as an
// address, so we may use 0 as an invalid pointer value. This is because		// address, so we may use 0 as an invalid pointer value. This is because
// LLVM assumes 0 is an invalid pointer in address space 0. Because alloca		// LLVM assumes 0 is an invalid pointer in address space 0. Because alloca
// is required to be address space 0, we are forced to accept this for		// is required to be address space 0, we are forced to accept this for
// now. Ideally we could have the stack in another address space with 0 as a		// now. Ideally we could have the stack in another address space with 0 as a
// valid pointer, and -1 as the null value.		// valid pointer, and -1 as the null value.
//		//
// This will also waste additional space when user stack objects require > 4		// This will also waste additional space when user stack objects require > 4
// byte alignment.		// byte alignment.
//		//
// The main cost here is losing the offset for addressing modes. However		// The main cost here is losing the offset for addressing modes. However
// this also ensures we shouldn't need a register for the offset when		// this also ensures we shouldn't need a register for the offset when
// emergency scavenging.		// emergency scavenging.
int ScavengeFI = MFI.CreateFixedObject(		int ScavengeFI = MFI.CreateFixedObject(
TRI.getSpillSize(AMDGPU::SGPR_32RegClass), 0, false);		TRI.getSpillSize(AMDGPU::SGPR_32RegClass), 0, false);
RS->addScavengingFrameIndex(ScavengeFI);		RS->addScavengingFrameIndex(ScavengeFI);
}		}
}		}

		// Only report VGPRs to generic code.
void SIFrameLowering::determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,		void SIFrameLowering::determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
RegScavenger *RS) const {		RegScavenger *RS) const {
TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);		TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();

		SavedRegs.clearBitsNotInMask(TRI->getAllVGPRRegMask());
		}

		void SIFrameLowering::determineCalleeSavesSGPR(MachineFunction &MF, BitVector &SavedRegs,
		RegScavenger *RS) const {
		TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();

// The SP is specifically managed and we don't want extra spills of it.		// The SP is specifically managed and we don't want extra spills of it.
SavedRegs.reset(MFI->getStackPtrOffsetReg());		SavedRegs.reset(MFI->getStackPtrOffsetReg());
		SavedRegs.clearBitsInMask(TRI->getAllVGPRRegMask());
}		}

MachineBasicBlock::iterator SIFrameLowering::eliminateCallFramePseudoInstr(		MachineBasicBlock::iterator SIFrameLowering::eliminateCallFramePseudoInstr(
MachineFunction &MF,		MachineFunction &MF,
MachineBasicBlock &MBB,		MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) const {		MachineBasicBlock::iterator I) const {
int64_t Amount = I->getOperand(0).getImm();		int64_t Amount = I->getOperand(0).getImm();
if (Amount == 0)		if (Amount == 0)
(86 lines not shown)
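
The two callee-save queries in this file, determineCalleeSaves() and determineCalleeSavesSGPR(), split one saved-register set into a VGPR part and a non-VGPR part by applying the same mask in opposite directions. The sketch below models that with std::bitset. The eight-register file, the mask values and the helper names keepOnlyVGPRs and dropVGPRs are assumptions made for illustration; the real code uses BitVector and additionally clears the stack pointer register in the SGPR variant.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical 8-register file; bit i set means register i must be saved.
constexpr std::size_t NumRegs = 8;
using RegSet = std::bitset<NumRegs>;

// Counterpart of clearBitsNotInMask(VGPRMask): keep only the VGPR bits.
RegSet keepOnlyVGPRs(RegSet Saved, RegSet VGPRMask) {
  return Saved & VGPRMask;
}

// Counterpart of clearBitsInMask(VGPRMask): keep everything except VGPRs.
RegSet dropVGPRs(RegSet Saved, RegSet VGPRMask) {
  return Saved & ~VGPRMask;
}
```

For any input, the two results are disjoint and their union is the original set, so every saved register is reported by exactly one of the two queries.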

lib/Target/AMDGPU/SILowerSGPRSpills.cpp

This file was added.

				//===-- SILowerSGPRSpills.cpp ---------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Handle SGPR spills. This pass takes the place of PrologEpilogInserter for all
				// SGPR spills, so must insert CSR SGPR spills as well as expand them.
				//
				// This pass must never create new SGPR virtual registers.
				//
				// FIXME: Must stop RegScavenger spills in later passes.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "AMDGPUSubtarget.h"
				#include "SIInstrInfo.h"
				#include "SIMachineFunctionInfo.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineOperand.h"
				#include "llvm/CodeGen/LiveIntervals.h"
				#include "llvm/CodeGen/VirtRegMap.h"
				#include "llvm/Target/TargetMachine.h"

				using namespace llvm;

				#define DEBUG_TYPE "si-lower-sgpr-spills"

				using MBBVector = SmallVector<MachineBasicBlock *, 4>;

				namespace {

				class SILowerSGPRSpills : public MachineFunctionPass {
				private:
				const SIRegisterInfo *TRI = nullptr;
				const SIInstrInfo *TII = nullptr;
				VirtRegMap *VRM = nullptr;
				LiveIntervals *LIS = nullptr;


				// Save and Restore blocks of the current function. Typically there is a
				// single save block, unless Windows EH funclets are involved.
				MBBVector SaveBlocks;
				MBBVector RestoreBlocks;

				public:
				static char ID;

				SILowerSGPRSpills() : MachineFunctionPass(ID) {}

				void calculateSaveRestoreBlocks(MachineFunction &MF);
				bool spillCalleeSavedRegs(MachineFunction &MF);

				bool runOnMachineFunction(MachineFunction &MF) override;

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				MachineFunctionPass::getAnalysisUsage(AU);
				}
				};

				} // end anonymous namespace

				char SILowerSGPRSpills::ID = 0;

				INITIALIZE_PASS_BEGIN(SILowerSGPRSpills, DEBUG_TYPE,
				"SI lower SGPR spill instructions", false, false)
				INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
				INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
				INITIALIZE_PASS_END(SILowerSGPRSpills, DEBUG_TYPE,
				"SI lower SGPR spill instructions", false, false)

				char &llvm::SILowerSGPRSpillsID = SILowerSGPRSpills::ID;

				/// Insert spill code for the callee-saved registers used in the function.
				static void insertCSRSaves(MachineBasicBlock &SaveBlock,
				ArrayRef<CalleeSavedInfo> CSI,
				LiveIntervals *LIS) {
				MachineFunction &MF = *SaveBlock.getParent();
				const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
				const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
				const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();

				MachineBasicBlock::iterator I = SaveBlock.begin();
				if (!TFI->spillCalleeSavedRegisters(SaveBlock, I, CSI, TRI)) {
				for (const CalleeSavedInfo &CS : CSI) {
				// Insert the spill to the stack frame.
				unsigned Reg = CS.getReg();

				MachineInstrSpan MIS(I);
				const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);

				TII.storeRegToStackSlot(SaveBlock, I, Reg, true, CS.getFrameIdx(), RC,
				TRI);

				if (LIS) {
				assert(std::distance(MIS.begin(), I) == 1);
				MachineInstr &Inst = *std::prev(I);

				LIS->InsertMachineInstrInMaps(Inst);
				LIS->removePhysReg(Reg);
				}
				}
				}
				}

				/// Insert restore code for the callee-saved registers used in the function.
				static void insertCSRRestores(MachineBasicBlock &RestoreBlock,
				std::vector<CalleeSavedInfo> &CSI,
				LiveIntervals *LIS) {
				MachineFunction &MF = *RestoreBlock.getParent();
				const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
				const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
				const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();

				// Restore all registers immediately before the return and any
				// terminators that precede it.
				MachineBasicBlock::iterator I = RestoreBlock.getFirstTerminator();

				// FIXME: Just emit the readlane/writelane directly
				if (!TFI->restoreCalleeSavedRegisters(RestoreBlock, I, CSI, TRI)) {
				for (const CalleeSavedInfo &CI : reverse(CSI)) {
				unsigned Reg = CI.getReg();
				const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);

				TII.loadRegFromStackSlot(RestoreBlock, I, Reg, CI.getFrameIdx(), RC, TRI);
				assert(I != RestoreBlock.begin() &&
				"loadRegFromStackSlot didn't insert any code!");
				// Insert in reverse order. loadRegFromStackSlot can insert
				// multiple instructions.

				if (LIS) {
				MachineInstr &Inst = *std::prev(I);
				LIS->InsertMachineInstrInMaps(Inst);
				LIS->removePhysReg(Reg);
				}
				}
				}
				}

				/// Compute the sets of entry and return blocks for saving and restoring
				/// callee-saved registers, and placing prolog and epilog code.
				void SILowerSGPRSpills::calculateSaveRestoreBlocks(MachineFunction &MF) {
				const MachineFrameInfo &MFI = MF.getFrameInfo();

				// Even when we do not change any CSR, we still want to insert the
				// prologue and epilogue of the function.
				// So set the save points for those.

				// Use the points found by shrink-wrapping, if any.
				if (MFI.getSavePoint()) {
				SaveBlocks.push_back(MFI.getSavePoint());
				assert(MFI.getRestorePoint() && "Both restore and save must be set");
				MachineBasicBlock *RestoreBlock = MFI.getRestorePoint();
				// If RestoreBlock does not have any successor and is not a return block
				// then the end point is unreachable and we do not need to insert any
				// epilogue.
				if (!RestoreBlock->succ_empty() \|\| RestoreBlock->isReturnBlock())
				RestoreBlocks.push_back(RestoreBlock);
				return;
				}

				// Save refs to entry and return blocks.
				SaveBlocks.push_back(&MF.front());
				for (MachineBasicBlock &MBB : MF) {
				if (MBB.isEHFuncletEntry())
				SaveBlocks.push_back(&MBB);
				if (MBB.isReturnBlock())
				RestoreBlocks.push_back(&MBB);
				}
				}

				bool SILowerSGPRSpills::spillCalleeSavedRegs(MachineFunction &MF) {
				const Function &F = MF.getFunction();
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIFrameLowering *TFI = ST.getFrameLowering();
				MachineFrameInfo &MFI = MF.getFrameInfo();
				RegScavenger *RS = nullptr;

				// Determine which of the registers in the callee save list should be saved.
				BitVector SavedRegs;
				TFI->determineCalleeSavesSGPR(MF, SavedRegs, RS);

				// Add the code to save and restore the callee saved registers.
				if (!F.hasFnAttribute(Attribute::Naked)) {
				MFI.setCalleeSavedInfoValid(true);

				std::vector<CalleeSavedInfo> CSI;
				const MCPhysReg *CSRegs = TRI->getCalleeSavedRegs(&MF);

				for (unsigned I = 0; CSRegs[I]; ++I) {
				unsigned Reg = CSRegs[I];
				if (SavedRegs.test(Reg)) {
				const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);

				int JunkFI = MFI.CreateStackObject(TRI->getSpillSize(*RC),
				TRI->getSpillAlignment(*RC),
				true);

				CSI.push_back(CalleeSavedInfo(Reg, JunkFI));
				}
				}

				if (!CSI.empty()) {
				for (MachineBasicBlock *SaveBlock : SaveBlocks)
				insertCSRSaves(*SaveBlock, CSI, LIS);

				for (MachineBasicBlock *RestoreBlock : RestoreBlocks)
				insertCSRRestores(*RestoreBlock, CSI, LIS);
				return true;
				}
				}

				return false;
				}

				bool SILowerSGPRSpills::runOnMachineFunction(MachineFunction &MF) {
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				TII = ST.getInstrInfo();
				TRI = &TII->getRegisterInfo();

				VRM = getAnalysisIfAvailable<VirtRegMap>();
				LIS = getAnalysisIfAvailable<LiveIntervals>();

				bool AllSGPRSpilledToVGPRs = false;

				assert(SaveBlocks.empty() && RestoreBlocks.empty());

				// First, expose any CSR SGPR spills. This is mostly the same as what PEI
				// does, but somewhat simpler.
				calculateSaveRestoreBlocks(MF);
				bool HasCSRs = spillCalleeSavedRegs(MF);

				MachineFrameInfo &MFI = MF.getFrameInfo();
				if (!MFI.hasStackObjects() && !HasCSRs) {
				SaveBlocks.clear();
				RestoreBlocks.clear();
				return false;
				}


				SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
				bool MadeChange = false;

				if (TRI->spillSGPRToVGPR() && (HasCSRs \|\| FuncInfo->hasSpilledSGPRs())) {
				AllSGPRSpilledToVGPRs = true;

				// Process all SGPR spills before frame offsets are finalized. Ideally SGPRs
				// are spilled to VGPRs, in which case we can eliminate the stack usage.
				//
				// This operates under the assumption that only other SGPR spills are users
				// of the frame index.
				for (MachineBasicBlock &MBB : MF) {
				MachineBasicBlock::iterator Next;
				for (auto I = MBB.begin(), E = MBB.end(); I != E; I = Next) {
				MachineInstr &MI = *I;
				Next = std::next(I);

				if (!TII->isSGPRSpill(MI))
				continue;

				int FI = TII->getNamedOperand(MI, AMDGPU::OpName::addr)->getIndex();
				assert(MFI.getStackID(FI) == SIStackID::SGPR_SPILL);
				if (FuncInfo->allocateSGPRSpillToVGPR(MF, FI, LIS)) {
				bool Spilled = TRI->eliminateSGPRToVGPRSpillFrameIndex(MI, FI, nullptr, LIS);
				(void)Spilled;
				assert(Spilled && "failed to spill SGPR to VGPR when allocated");
				} else
				AllSGPRSpilledToVGPRs = false;
				}
				}

				if (VRM) {
				// We created new virtual registers for the SGPR spills, so we need to grow
				// VirtRegMap
				VRM->grow();
				}

				FuncInfo->removeSGPRToVGPRFrameIndices(MFI);
				MadeChange = true;
				}

				// Re-freeze reserved registers, as we've added new VGPRs to reserve.
				if (MadeChange)
				MF.getRegInfo().freezeReservedRegs(MF);

				SaveBlocks.clear();
				RestoreBlocks.clear();

				return MadeChange;
				}
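
The spill-lowering loop in runOnMachineFunction() records Next = std::next(I) before it processes the current instruction. Presumably this is because lowering a spill may remove or replace that instruction, which would leave I unusable for advancing. The same pattern can be shown on a std::list; eraseNegatives below is a hypothetical stand-in for the pass, with "negative value" playing the role of "is an SGPR spill".

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Removes every negative element in place and returns how many were removed.
int eraseNegatives(std::list<int> &Values) {
  int Removed = 0;
  std::list<int>::iterator Next;
  for (auto I = Values.begin(), E = Values.end(); I != E; I = Next) {
    // Capture the successor first; erasing *I invalidates I but not Next.
    Next = std::next(I);

    if (*I >= 0)
      continue;

    Values.erase(I);
    ++Removed;
  }
  return Removed;
}
```

This relies on std::list keeping all other iterators valid across an erase, which is an assumption that mirrors the behavior of the instruction list being walked in the pass.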

lib/Target/AMDGPU/SIMachineFunctionInfo.h

(15 lines not shown)

#include "AMDGPUArgumentUsageInfo.h"		#include "AMDGPUArgumentUsageInfo.h"
#include "AMDGPUMachineFunction.h"		#include "AMDGPUMachineFunction.h"
#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/PseudoSourceValue.h"		#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/CodeGen/TargetInstrInfo.h"		#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include <array>		#include <array>
#include <cassert>		#include <cassert>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {

		class LiveIntervals;
class MachineFrameInfo;		class MachineFrameInfo;
class MachineFunction;		class MachineFunction;
class TargetRegisterClass;		class TargetRegisterClass;

class AMDGPUImagePseudoSourceValue : public PseudoSourceValue {		class AMDGPUImagePseudoSourceValue : public PseudoSourceValue {
public:		public:
// TODO: Is the img rsrc useful?		// TODO: Is the img rsrc useful?
explicit AMDGPUImagePseudoSourceValue(const TargetInstrInfo &TII) :		explicit AMDGPUImagePseudoSourceValue(const TargetInstrInfo &TII) :
(152 lines not shown)	struct SpilledReg {

SpilledReg() = default;		SpilledReg() = default;
SpilledReg(unsigned R, int L) : VGPR (R), Lane (L) {}		SpilledReg(unsigned R, int L) : VGPR (R), Lane (L) {}

bool hasLane() { return Lane != -1;}		bool hasLane() { return Lane != -1;}
bool hasReg() { return VGPR != 0;}		bool hasReg() { return VGPR != 0;}
};		};

struct SGPRSpillVGPRCSR {		using SGPRSpillMap = DenseMap<int, std::vector<SpilledReg>>;
// VGPR used for SGPR spills
unsigned VGPR;

// If the VGPR is a CSR, the stack slot used to save/restore it in the
// prolog/epilog.
Optional<int> FI;

SGPRSpillVGPRCSR(unsigned V, Optional<int> F) : VGPR(V), FI(F) {}
};

private:		private:
// SGPR->VGPR spilling support.		// SGPR->VGPR spilling support.
using SpillRegMask = std::pair<unsigned, unsigned>;		using SpillRegMask = std::pair<unsigned, unsigned>;

// Track VGPR + wave index for each subregister of the SGPR spilled to		// Track VGPR + wave index for each subregister of the SGPR spilled to
// frameindex key.		// frameindex key.
DenseMap<int, std::vector<SpilledReg>> SGPRToVGPRSpills;		SGPRSpillMap SGPRToVGPRSpills;
unsigned NumVGPRSpillLanes = 0;		unsigned NumVGPRSpillLanes = 0;
SmallVector<SGPRSpillVGPRCSR, 2> SpillVGPRs;		SmallVector<unsigned, 2> SpillVGPRs;

public:		public:
SIMachineFunctionInfo(const MachineFunction &MF);		SIMachineFunctionInfo(const MachineFunction &MF);

ArrayRef<SpilledReg> getSGPRToVGPRSpills(int FrameIndex) const {		ArrayRef<SpilledReg> getSGPRToVGPRSpills(int FrameIndex) const {
auto I = SGPRToVGPRSpills.find(FrameIndex);		auto I = SGPRToVGPRSpills.find(FrameIndex);
return (I == SGPRToVGPRSpills.end()) ?		return (I == SGPRToVGPRSpills.end()) ?
ArrayRef<SpilledReg>() : makeArrayRef(I->second);		ArrayRef<SpilledReg>() : makeArrayRef(I->second);
}		}

ArrayRef<SGPRSpillVGPRCSR> getSGPRSpillVGPRs() const {		iterator_range<SGPRSpillMap::const_iterator> sgpr_spill_vgprs() const {
		return SGPRToVGPRSpills;
		}

		ArrayRef<unsigned> getSGPRSpillVGPRs() const {
return SpillVGPRs;		return SpillVGPRs;
}		}

bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI);		bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI,
		LiveIntervals *LIS = nullptr);
void removeSGPRToVGPRFrameIndices(MachineFrameInfo &MFI);		void removeSGPRToVGPRFrameIndices(MachineFrameInfo &MFI);

bool hasCalculatedTID() const { return TIDReg != 0; };		bool hasCalculatedTID() const { return TIDReg != 0; };
unsigned getTIDReg() const { return TIDReg; };		unsigned getTIDReg() const { return TIDReg; };
void setTIDReg(unsigned Reg) { TIDReg = Reg; }		void setTIDReg(unsigned Reg) { TIDReg = Reg; }

unsigned getBytesInStackArgArea() const {		unsigned getBytesInStackArgArea() const {
return BytesInStackArgArea;		return BytesInStackArgArea;
(412 lines not shown)

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

	//===- SIMachineFunctionInfo.cpp - SI Machine Function Info ---------------===//			//===- SIMachineFunctionInfo.cpp - SI Machine Function Info ---------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "SIMachineFunctionInfo.h"			#include "SIMachineFunctionInfo.h"
	#include "AMDGPUArgumentUsageInfo.h"			#include "AMDGPUArgumentUsageInfo.h"
	#include "AMDGPUSubtarget.h"			#include "AMDGPUSubtarget.h"
	#include "SIRegisterInfo.h"			#include "SIRegisterInfo.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm/ADT/Optional.h"			#include "llvm/ADT/Optional.h"
				#include "llvm/CodeGen/LiveIntervals.h"
	#include "llvm/CodeGen/MachineBasicBlock.h"			#include "llvm/CodeGen/MachineBasicBlock.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineFunction.h"			#include "llvm/CodeGen/MachineFunction.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/IR/CallingConv.h"			#include "llvm/IR/CallingConv.h"
				#include "llvm/IR/DiagnosticInfo.h"
	#include "llvm/IR/Function.h"			#include "llvm/IR/Function.h"
	#include <cassert>			#include <cassert>
	#include <vector>			#include <vector>

	#define MAX_LANES 64			#define MAX_LANES 64

	using namespace llvm;			using namespace llvm;

	(195 lines not shown)

	unsigned SIMachineFunctionInfo::addImplicitBufferPtr(const SIRegisterInfo &TRI) {			unsigned SIMachineFunctionInfo::addImplicitBufferPtr(const SIRegisterInfo &TRI) {
	ArgInfo.ImplicitBufferPtr = ArgDescriptor::createRegister(TRI.getMatchingSuperReg(			ArgInfo.ImplicitBufferPtr = ArgDescriptor::createRegister(TRI.getMatchingSuperReg(
	getNextUserSGPR(), AMDGPU::sub0, &AMDGPU::SReg_64RegClass));			getNextUserSGPR(), AMDGPU::sub0, &AMDGPU::SReg_64RegClass));
	NumUserSGPRs += 2;			NumUserSGPRs += 2;
	return ArgInfo.ImplicitBufferPtr.getRegister();			return ArgInfo.ImplicitBufferPtr.getRegister();
	}			}

	static bool isCalleeSavedReg(const MCPhysReg *CSRegs, MCPhysReg Reg) {
	for (unsigned I = 0; CSRegs[I]; ++I) {
	if (CSRegs[I] == Reg)
	return true;
	}

	return false;
	}

	/// Reserve a slice of a VGPR to support spilling for FrameIndex \p FI.			/// Reserve a slice of a VGPR to support spilling for FrameIndex \p FI.
	bool SIMachineFunctionInfo::allocateSGPRSpillToVGPR(MachineFunction &MF,			bool SIMachineFunctionInfo::allocateSGPRSpillToVGPR(MachineFunction &MF,
	int FI) {			int FI,
				LiveIntervals *LIS) {
	std::vector<SpilledReg> &SpillLanes = SGPRToVGPRSpills[FI];			std::vector<SpilledReg> &SpillLanes = SGPRToVGPRSpills[FI];

	// This has already been allocated.			// This has already been allocated.
	if (!SpillLanes.empty())			if (!SpillLanes.empty())
	return true;			return true;

	const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();			const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIInstrInfo *TII = ST.getInstrInfo();
	const SIRegisterInfo *TRI = ST.getRegisterInfo();			const SIRegisterInfo *TRI = ST.getRegisterInfo();
	MachineFrameInfo &FrameInfo = MF.getFrameInfo();			MachineFrameInfo &FrameInfo = MF.getFrameInfo();
	MachineRegisterInfo &MRI = MF.getRegInfo();			MachineRegisterInfo &MRI = MF.getRegInfo();
	unsigned WaveSize = ST.getWavefrontSize();			unsigned WaveSize = ST.getWavefrontSize();

	unsigned Size = FrameInfo.getObjectSize(FI);			unsigned Size = FrameInfo.getObjectSize(FI);
	assert(Size >= 4 && Size <= 64 && "invalid sgpr spill size");			assert(Size >= 4 && Size <= 64 && "invalid sgpr spill size");
	assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");			assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");

	int NumLanes = Size / 4;			int NumLanes = Size / 4;

	const MCPhysReg *CSRegs = TRI->getCalleeSavedRegs(&MF);

	// Make sure to handle the case where a wide SGPR spill may span between two			// Make sure to handle the case where a wide SGPR spill may span between two
	// VGPRs.			// VGPRs.
	for (int I = 0; I < NumLanes; ++I, ++NumVGPRSpillLanes) {			for (int I = 0; I < NumLanes; ++I, ++NumVGPRSpillLanes) {
	unsigned LaneVGPR;			unsigned LaneVGPR;
	unsigned VGPRIndex = (NumVGPRSpillLanes % WaveSize);			unsigned VGPRIndex = (NumVGPRSpillLanes % WaveSize);

	if (VGPRIndex == 0) {			if (VGPRIndex == 0) {
	LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);			LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
	if (LaneVGPR == AMDGPU::NoRegister) {			if (LaneVGPR == AMDGPU::NoRegister) {
	// We have no VGPRs left for spilling SGPRs. Reset because we will not			// We have no VGPRs left for spilling SGPRs. Reset because we will not
	// partially spill the SGPR to VGPRs.			// partially spill the SGPR to VGPRs.
	SGPRToVGPRSpills.erase(FI);			SGPRToVGPRSpills.erase(FI);
	NumVGPRSpillLanes -= I;			NumVGPRSpillLanes -= I;

				DiagnosticInfoResourceLimit DiagOutOfRegs(MF.getFunction(),
				"VGPRs for SGPR spilling",
				0, DS_Error);
				MF.getFunction().getContext().diagnose(DiagOutOfRegs);
	return false;			return false;
	}			}

	Optional<int> CSRSpillFI;			MachineBasicBlock &EntryBB = MF.front();
	if ((FrameInfo.hasCalls() || !isEntryFunction()) && CSRegs &&
	isCalleeSavedReg(CSRegs, LaneVGPR)) {
	CSRSpillFI = FrameInfo.CreateSpillStackObject(4, 4);
	}

	SpillVGPRs.push_back(SGPRSpillVGPRCSR(LaneVGPR, CSRSpillFI));			MachineInstr *ImpDef
				= BuildMI(EntryBB, EntryBB.front(),
	// Add this register as live-in to all blocks to avoid machine verifer			DebugLoc(), TII->get(TargetOpcode::IMPLICIT_DEF), LaneVGPR);
	// complaining about use of an undefined physical register.			if (LIS)
	for (MachineBasicBlock &BB : MF)			LIS->InsertMachineInstrInMaps(*ImpDef);
	BB.addLiveIn(LaneVGPR);
				SpillVGPRs.push_back(LaneVGPR);
	} else {			} else {
	LaneVGPR = SpillVGPRs.back().VGPR;			LaneVGPR = SpillVGPRs.back();
	}			}

	SpillLanes.push_back(SpilledReg(LaneVGPR, VGPRIndex));			SpillLanes.push_back(SpilledReg(LaneVGPR, VGPRIndex));
	}			}

	return true;			return true;
	}			}

	void SIMachineFunctionInfo::removeSGPRToVGPRFrameIndices(MachineFrameInfo &MFI) {			void SIMachineFunctionInfo::removeSGPRToVGPRFrameIndices(MachineFrameInfo &MFI) {
	for (auto &R : SGPRToVGPRSpills)			for (auto &R : SGPRToVGPRSpills)
	MFI.RemoveStackObject(R.first);			MFI.RemoveStackObject(R.first);
	}			}


	/// \returns VGPR used for \p Dim' work item ID.			/// \returns VGPR used for \p Dim' work item ID.
	unsigned SIMachineFunctionInfo::getWorkItemIDVGPR(unsigned Dim) const {			unsigned SIMachineFunctionInfo::getWorkItemIDVGPR(unsigned Dim) const {
	switch (Dim) {			switch (Dim) {
	case 0:			case 0:
	assert(hasWorkItemIDX());			assert(hasWorkItemIDX());
	return AMDGPU::VGPR0;			return AMDGPU::VGPR0;
	case 1:			case 1:
	assert(hasWorkItemIDY());			assert(hasWorkItemIDY());
	[16 lines not shown]

lib/Target/AMDGPU/SIRegisterInfo.h

[97 lines not shown]	public:

bool isFrameOffsetLegal(const MachineInstr *MI, unsigned BaseReg,		bool isFrameOffsetLegal(const MachineInstr *MI, unsigned BaseReg,
int64_t Offset) const override;		int64_t Offset) const override;

const TargetRegisterClass *getPointerRegClass(		const TargetRegisterClass *getPointerRegClass(
const MachineFunction &MF, unsigned Kind = 0) const override;		const MachineFunction &MF, unsigned Kind = 0) const override;

/// If \p OnlyToVGPR is true, this will only succeed if this		/// If \p OnlyToVGPR is true, this will only succeed if this
		bool spillSGPRImpl(MachineBasicBlock::iterator MI,
		const DebugLoc &DL,
		unsigned Reg,
		bool IsKill,
		int Index,
		RegScavenger *RS,
		LiveIntervals *LIS = nullptr,
		bool OnlyToVGPR = false) const;

bool spillSGPR(MachineBasicBlock::iterator MI,		bool spillSGPR(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS,		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr,
bool OnlyToVGPR = false) const;		bool OnlyToVGPR = false) const;

bool restoreSGPR(MachineBasicBlock::iterator MI,		bool restoreSGPR(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS,		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr,
bool OnlyToVGPR = false) const;		bool OnlyToVGPR = false) const;

void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,		void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
unsigned FIOperandNum,		unsigned FIOperandNum,
RegScavenger *RS) const override;		RegScavenger *RS) const override;

bool eliminateSGPRToVGPRSpillFrameIndex(MachineBasicBlock::iterator MI,		bool eliminateSGPRToVGPRSpillFrameIndex(MachineBasicBlock::iterator MI,
int FI, RegScavenger *RS) const;		int FI, RegScavenger *RS,
		LiveIntervals *LIS = nullptr) const;

StringRef getRegAsmName(unsigned Reg) const override;		StringRef getRegAsmName(unsigned Reg) const override;

unsigned getHWRegIndex(unsigned Reg) const {		unsigned getHWRegIndex(unsigned Reg) const {
return getEncodingValue(Reg) & 0xff;		return getEncodingValue(Reg) & 0xff;
}		}

/// Return the 'base' register class for this register.		/// Return the 'base' register class for this register.
[55 lines not shown]	public:
bool opCanUseInlineConstant(unsigned OpType) const {		bool opCanUseInlineConstant(unsigned OpType) const {
return OpType >= AMDGPU::OPERAND_SRC_FIRST &&		return OpType >= AMDGPU::OPERAND_SRC_FIRST &&
OpType <= AMDGPU::OPERAND_SRC_LAST;		OpType <= AMDGPU::OPERAND_SRC_LAST;
}		}

unsigned findUnusedRegister(const MachineRegisterInfo &MRI,		unsigned findUnusedRegister(const MachineRegisterInfo &MRI,
const TargetRegisterClass *RC,		const TargetRegisterClass *RC,
const MachineFunction &MF) const;		const MachineFunction &MF) const;

unsigned getSGPRPressureSet() const { return SGPRSetID; };		unsigned getSGPRPressureSet() const { return SGPRSetID; };
unsigned getVGPRPressureSet() const { return VGPRSetID; };		unsigned getVGPRPressureSet() const { return VGPRSetID; };

const TargetRegisterClass *getRegClassForReg(const MachineRegisterInfo &MRI,		const TargetRegisterClass *getRegClassForReg(const MachineRegisterInfo &MRI,
unsigned Reg) const;		unsigned Reg) const;
bool isVGPR(const MachineRegisterInfo &MRI, unsigned Reg) const;		bool isVGPR(const MachineRegisterInfo &MRI, unsigned Reg) const;

bool isSGPRPressureSet(unsigned SetID) const {		bool isSGPRPressureSet(unsigned SetID) const {
[23 lines not shown]	public:
const int *getRegUnitPressureSets(unsigned RegUnit) const override;		const int *getRegUnitPressureSets(unsigned RegUnit) const override;

unsigned getReturnAddressReg(const MachineFunction &MF) const;		unsigned getReturnAddressReg(const MachineFunction &MF) const;

const TargetRegisterClass *		const TargetRegisterClass *
getConstrainedRegClassForOperand(const MachineOperand &MO,		getConstrainedRegClassForOperand(const MachineOperand &MO,
const MachineRegisterInfo &MRI) const override;		const MachineRegisterInfo &MRI) const override;

		const uint32_t *getAllVGPRRegMask() const;

private:		private:
void buildSpillLoadStore(MachineBasicBlock::iterator MI,		void buildSpillLoadStore(MachineBasicBlock::iterator MI,
unsigned LoadStoreOp,		unsigned LoadStoreOp,
int Index,		int Index,
unsigned ValueReg,		unsigned ValueReg,
bool ValueIsKill,		bool ValueIsKill,
unsigned ScratchRsrcReg,		unsigned ScratchRsrcReg,
unsigned ScratchOffsetReg,		unsigned ScratchOffsetReg,
int64_t InstrOffset,		int64_t InstrOffset,
MachineMemOperand *MMO,		MachineMemOperand *MMO,
RegScavenger *RS) const;		RegScavenger *RS) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/SIRegisterInfo.cpp

[12 lines not shown]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "AMDGPURegisterBankInfo.h"		#include "AMDGPURegisterBankInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
		#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/RegisterScavenging.h"		#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"

using namespace llvm;		using namespace llvm;

[188 lines not shown]	BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
}		}

unsigned FrameReg = MFI->getFrameOffsetReg();		unsigned FrameReg = MFI->getFrameOffsetReg();
if (FrameReg != AMDGPU::NoRegister) {		if (FrameReg != AMDGPU::NoRegister) {
reserveRegisterTuples(Reserved, FrameReg);		reserveRegisterTuples(Reserved, FrameReg);
assert(!isSubRegister(ScratchRSrcReg, FrameReg));		assert(!isSubRegister(ScratchRSrcReg, FrameReg));
}		}

		// Reserve VGPRs used for SGPR spilling.
		// Note we treat freezeReservedRegs unusually because we run register
		// allocation in two phases. It's OK to re-freeze with new registers for the
		// second run.
		for (auto &SpilledFI : MFI->sgpr_spill_vgprs()) {
		for (auto &SpilledVGPR : SpilledFI.second)
		reserveRegisterTuples(Reserved, SpilledVGPR.VGPR);
		}

return Reserved;		return Reserved;
}		}

bool SIRegisterInfo::requiresRegisterScavenging(const MachineFunction &Fn) const {		bool SIRegisterInfo::requiresRegisterScavenging(const MachineFunction &Fn) const {
const SIMachineFunctionInfo *Info = Fn.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = Fn.getInfo<SIMachineFunctionInfo>();
if (Info->isEntryFunction()) {		if (Info->isEntryFunction()) {
const MachineFrameInfo &MFI = Fn.getFrameInfo();		const MachineFrameInfo &MFI = Fn.getFrameInfo();
return MFI.hasStackObjects() || MFI.hasCalls();		return MFI.hasStackObjects() || MFI.hasCalls();
[399 lines not shown]	if (SuperRegSize % 8 == 0) {
return { 8, Store ? AMDGPU::S_BUFFER_STORE_DWORDX2_SGPR :		return { 8, Store ? AMDGPU::S_BUFFER_STORE_DWORDX2_SGPR :
AMDGPU::S_BUFFER_LOAD_DWORDX2_SGPR };		AMDGPU::S_BUFFER_LOAD_DWORDX2_SGPR };
}		}

return { 4, Store ? AMDGPU::S_BUFFER_STORE_DWORD_SGPR :		return { 4, Store ? AMDGPU::S_BUFFER_STORE_DWORD_SGPR :
AMDGPU::S_BUFFER_LOAD_DWORD_SGPR};		AMDGPU::S_BUFFER_LOAD_DWORD_SGPR};
}		}

bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,		bool SIRegisterInfo::spillSGPRImpl(MachineBasicBlock::iterator MI,
		const DebugLoc &DL,
		unsigned SuperReg,
		bool IsKill,
int Index,		int Index,
RegScavenger *RS,		RegScavenger *RS,
		LiveIntervals *LIS,
bool OnlyToVGPR) const {		bool OnlyToVGPR) const {
MachineBasicBlock *MBB = MI->getParent();		MachineBasicBlock *MBB = MI->getParent();
MachineFunction *MF = MBB->getParent();		MachineFunction *MF = MBB->getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
DenseSet<unsigned> SGPRSpillVGPRDefinedSet;		DenseSet<unsigned> SGPRSpillVGPRDefinedSet;

ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills		ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills
= MFI->getSGPRToVGPRSpills(Index);		= MFI->getSGPRToVGPRSpills(Index);
bool SpillToVGPR = !VGPRSpills.empty();		bool SpillToVGPR = !VGPRSpills.empty();
if (OnlyToVGPR && !SpillToVGPR)		if (OnlyToVGPR && !SpillToVGPR)
return false;		return false;

MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();

unsigned SuperReg = MI->getOperand(0).getReg();
bool IsKill = MI->getOperand(0).isKill();
const DebugLoc &DL = MI->getDebugLoc();

MachineFrameInfo &FrameInfo = MF->getFrameInfo();		MachineFrameInfo &FrameInfo = MF->getFrameInfo();


bool SpillToSMEM = spillSGPRToSMEM();		bool SpillToSMEM = spillSGPRToSMEM();
if (SpillToSMEM && OnlyToVGPR)		if (SpillToSMEM && OnlyToVGPR)
return false;		return false;

assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&		assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&
SuperReg != MFI->getFrameOffsetReg() &&		SuperReg != MFI->getFrameOffsetReg() &&
SuperReg != MFI->getScratchWaveOffsetReg()));		SuperReg != MFI->getScratchWaveOffsetReg()));

[67 lines not shown]	if (SpillToSMEM) {
.addMemOperand(MMO);		.addMemOperand(MMO);

continue;		continue;
}		}

if (SpillToVGPR) {		if (SpillToVGPR) {
SIMachineFunctionInfo::SpilledReg Spill = VGPRSpills[i];		SIMachineFunctionInfo::SpilledReg Spill = VGPRSpills[i];

// During SGPR spilling to VGPR, determine if the VGPR is defined. The
// only circumstance in which we say it is undefined is when it is the
// first spill to this VGPR in the first basic block.
bool VGPRDefined = true;
if (MBB == &MF->front())
VGPRDefined = !SGPRSpillVGPRDefinedSet.insert(Spill.VGPR).second;

// Mark the "old value of vgpr" input undef only if this is the first sgpr		// Mark the "old value of vgpr" input undef only if this is the first sgpr
// spill to this specific vgpr in the first basic block.		// spill to this specific vgpr in the first basic block.
BuildMI(*MBB, MI, DL,		MachineInstr *Writelane = BuildMI(*MBB, MI, DL,
TII->getMCOpcodeFromPseudo(AMDGPU::V_WRITELANE_B32),		TII->getMCOpcodeFromPseudo(AMDGPU::V_WRITELANE_B32),
Spill.VGPR)		Spill.VGPR)
.addReg(SubReg, getKillRegState(IsKill))		.addReg(SubReg, getKillRegState(IsKill))
.addImm(Spill.Lane)		.addImm(Spill.Lane)
.addReg(Spill.VGPR, VGPRDefined ? 0 : RegState::Undef);		.addReg(Spill.VGPR);

		if (LIS) {
		if (i == 0)
		LIS->ReplaceMachineInstrInMaps(*MI, *Writelane);
		else
		LIS->InsertMachineInstrInMaps(*Writelane);
		}

// FIXME: Since this spills to another register instead of an actual
// frame index, we should delete the frame index when all references to
// it are fixed.
} else {		} else {
// XXX - Can to VGPR spill fail for some subregisters but not others?		// XXX - Can to VGPR spill fail for some subregisters but not others?
if (OnlyToVGPR)		if (OnlyToVGPR)
return false;		return false;

// Spill SGPR to a frame index.		// Spill SGPR to a frame index.
// TODO: Should VI try to spill to VGPR and then spill to SMEM?		// TODO: Should VI try to spill to VGPR and then spill to SMEM?
unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
[30 lines not shown]	for (unsigned i = 0, e = NumSubRegs; i < e; ++i) {
}		}
}		}

if (M0CopyReg != AMDGPU::NoRegister) {		if (M0CopyReg != AMDGPU::NoRegister) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::M0)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::M0)
.addReg(M0CopyReg, RegState::Kill);		.addReg(M0CopyReg, RegState::Kill);
}		}

MI->eraseFromParent();
MFI->addToSpilledSGPRs(NumSubRegs);		MFI->addToSpilledSGPRs(NumSubRegs);

		if (LIS)
		LIS->removePhysReg(SuperReg);

return true;		return true;
}		}

		bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,
		int Index,
		RegScavenger *RS,
		LiveIntervals *LIS,
		bool OnlyToVGPR) const {

		unsigned SuperReg = MI->getOperand(0).getReg();
		bool IsKill = MI->getOperand(0).isKill();
		const DebugLoc &DL = MI->getDebugLoc();

		auto Ret = spillSGPRImpl(MI, DL, SuperReg, IsKill, Index,
		RS, LIS, OnlyToVGPR);
		MI->eraseFromParent();
		return Ret;
		}

bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI,		bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI,
int Index,		int Index,
RegScavenger *RS,		RegScavenger *RS,
		LiveIntervals *LIS,
bool OnlyToVGPR) const {		bool OnlyToVGPR) const {
MachineFunction *MF = MI->getParent()->getParent();		MachineFunction *MF = MI->getParent()->getParent();
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
MachineBasicBlock *MBB = MI->getParent();		MachineBasicBlock *MBB = MI->getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();

ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills		ArrayRef<SIMachineFunctionInfo::SpilledReg> VGPRSpills
= MFI->getSGPRToVGPRSpills(Index);		= MFI->getSGPRToVGPRSpills(Index);
[22 lines not shown]	if (RS->isRegUsed(AMDGPU::M0)) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), M0CopyReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), M0CopyReg)
.addReg(AMDGPU::M0);		.addReg(AMDGPU::M0);
}		}
}		}

unsigned EltSize = 4;		unsigned EltSize = 4;
unsigned ScalarLoadOp;		unsigned ScalarLoadOp;

const TargetRegisterClass *RC = getPhysRegClass(SuperReg);		const TargetRegisterClass *RC = getRegClassForReg(MRI, SuperReg);
if (SpillToSMEM && isSGPRClass(RC)) {		if (SpillToSMEM && isSGPRClass(RC)) {
// XXX - if private_element_size is larger than 4 it might be useful to be		// XXX - if private_element_size is larger than 4 it might be useful to be
// able to spill wider vmem spills.		// able to spill wider vmem spills.
std::tie(EltSize, ScalarLoadOp) =		std::tie(EltSize, ScalarLoadOp) =
getSpillEltSize(getRegSizeInBits(*RC) / 8, false);		getSpillEltSize(getRegSizeInBits(*RC) / 8, false);
}		}

ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);		ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);
[44 lines not shown]	if (SpillToVGPR) {
auto MIB =		auto MIB =
BuildMI(*MBB, MI, DL, TII->getMCOpcodeFromPseudo(AMDGPU::V_READLANE_B32),		BuildMI(*MBB, MI, DL, TII->getMCOpcodeFromPseudo(AMDGPU::V_READLANE_B32),
SubReg)		SubReg)
.addReg(Spill.VGPR)		.addReg(Spill.VGPR)
.addImm(Spill.Lane);		.addImm(Spill.Lane);

if (NumSubRegs > 1 && i == 0)		if (NumSubRegs > 1 && i == 0)
MIB.addReg(SuperReg, RegState::ImplicitDefine);		MIB.addReg(SuperReg, RegState::ImplicitDefine);

		if (LIS) {
		if (i == e - 1)
		LIS->ReplaceMachineInstrInMaps(*MI, *MIB);
		else
		LIS->InsertMachineInstrInMaps(*MIB);
		}

} else {		} else {
if (OnlyToVGPR)		if (OnlyToVGPR)
return false;		return false;

// Restore SGPR from a stack slot.		// Restore SGPR from a stack slot.
// FIXME: We should use S_LOAD_DWORD here for VI.		// FIXME: We should use S_LOAD_DWORD here for VI.
unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
unsigned Align = FrameInfo.getObjectAlignment(Index);		unsigned Align = FrameInfo.getObjectAlignment(Index);
[22 lines not shown]	bool SIRegisterInfo::restoreSGPR(MachineBasicBlock::iterator MI,
}		}

if (M0CopyReg != AMDGPU::NoRegister) {		if (M0CopyReg != AMDGPU::NoRegister) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::M0)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), AMDGPU::M0)
.addReg(M0CopyReg, RegState::Kill);		.addReg(M0CopyReg, RegState::Kill);
}		}

MI->eraseFromParent();		MI->eraseFromParent();

		if (LIS)
		LIS->removePhysReg(SuperReg);

return true;		return true;
}		}

/// Special case of eliminateFrameIndex. Returns true if the SGPR was spilled to		/// Special case of eliminateFrameIndex. Returns true if the SGPR was spilled to
/// a VGPR and the stack slot can be safely eliminated when all other users are		/// a VGPR and the stack slot can be safely eliminated when all other users are
/// handled.		/// handled.
bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex(		bool SIRegisterInfo::eliminateSGPRToVGPRSpillFrameIndex(
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
int FI,		int FI,
RegScavenger *RS) const {		RegScavenger *RS,
		LiveIntervals *LIS) const {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_S512_SAVE:		case AMDGPU::SI_SPILL_S512_SAVE:
case AMDGPU::SI_SPILL_S256_SAVE:		case AMDGPU::SI_SPILL_S256_SAVE:
case AMDGPU::SI_SPILL_S128_SAVE:		case AMDGPU::SI_SPILL_S128_SAVE:
case AMDGPU::SI_SPILL_S64_SAVE:		case AMDGPU::SI_SPILL_S64_SAVE:
case AMDGPU::SI_SPILL_S32_SAVE:		case AMDGPU::SI_SPILL_S32_SAVE:
return spillSGPR(MI, FI, RS, true);		return spillSGPR(MI, FI, RS, LIS, true);
case AMDGPU::SI_SPILL_S512_RESTORE:		case AMDGPU::SI_SPILL_S512_RESTORE:
case AMDGPU::SI_SPILL_S256_RESTORE:		case AMDGPU::SI_SPILL_S256_RESTORE:
case AMDGPU::SI_SPILL_S128_RESTORE:		case AMDGPU::SI_SPILL_S128_RESTORE:
case AMDGPU::SI_SPILL_S64_RESTORE:		case AMDGPU::SI_SPILL_S64_RESTORE:
case AMDGPU::SI_SPILL_S32_RESTORE:		case AMDGPU::SI_SPILL_S32_RESTORE:
return restoreSGPR(MI, FI, RS, true);		return restoreSGPR(MI, FI, RS, LIS, true);
default:		default:
llvm_unreachable("not an SGPR spill instruction");		llvm_unreachable("not an SGPR spill instruction");
}		}
}		}

void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,		void SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
int SPAdj, unsigned FIOperandNum,		int SPAdj, unsigned FIOperandNum,
RegScavenger *RS) const {		RegScavenger *RS) const {
[79 lines not shown]	default: {
if (!IsMUBUF &&		if (!IsMUBUF &&
MFI->getFrameOffsetReg() != MFI->getScratchWaveOffsetReg()) {		MFI->getFrameOffsetReg() != MFI->getScratchWaveOffsetReg()) {
// Convert to an absolute stack address by finding the offset from the		// Convert to an absolute stack address by finding the offset from the
// scratch wave base and scaling by the wave size.		// scratch wave base and scaling by the wave size.
//		//
// In an entry function/kernel the stack address is already the		// In an entry function/kernel the stack address is already the
// absolute address relative to the scratch wave offset.		// absolute address relative to the scratch wave offset.

		// FIXME: We really need to guarantee this can never require a spill,
		// since SGPR spills are assumed to be all handled already during PEI.
unsigned DiffReg		unsigned DiffReg
= MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);		= MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);

bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;		bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;
unsigned ResultReg = IsCopy ?		unsigned ResultReg = IsCopy ?
MI->getOperand(0).getReg() :		MI->getOperand(0).getReg() :
MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);

[518 lines not shown]

test/CodeGen/AMDGPU/callee-frame-setup.ll

[31 lines not shown]	define void @callee_with_stack() #0 {
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack_and_call:		; GCN-LABEL: {{^}}callee_with_stack_and_call:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN: s_mov_b32 s5, s32		; GCN: s_mov_b32 s5, s32
; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:8		; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:4

; GCN-DAG: v_writelane_b32 v32, s33,		; GCN-DAG: v_writelane_b32 v32, s33,
; GCN-DAG: v_writelane_b32 v32, s34,		; GCN-DAG: v_writelane_b32 v32, s34,
; GCN-DAG: v_writelane_b32 v32, s35,		; GCN-DAG: v_writelane_b32 v32, s35,
; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}		; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}
; GCN-DAG: v_mov_b32_e32 v0, 0{{$}}		; GCN-DAG: v_mov_b32_e32 v0, 0{{$}}
; GCN-DAG: buffer_store_dword v0, off, s[0:3], s5 offset:4{{$}}		; GCN-DAG: buffer_store_dword v0, off, s[0:3], s5 offset:8{{$}}
; GCN-DAG: s_mov_b32 s33, s5		; GCN-DAG: s_mov_b32 s33, s5


; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN: s_mov_b32 s5, s33		; GCN: s_mov_b32 s5, s33
; GCN-DAG: v_readlane_b32 s35,		; GCN-DAG: v_readlane_b32 s35,
; GCN-DAG: v_readlane_b32 s34,		; GCN-DAG: v_readlane_b32 s34,
; GCN-DAG: v_readlane_b32 s33,		; GCN-DAG: v_readlane_b32 s33,
; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:8		; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:4
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_and_call() #0 {		define void @callee_with_stack_and_call() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}
[53 lines not shown]	define void @callee_func_sgpr_spill_no_calls(i32 %in) #0 {
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr3) #0		call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr3) #0
call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0		call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0
ret void		ret void
}		}

		; We can use a non-csr in a leaf function.

		; GCN-LABEL: {{^}}callee_func_sgpr_spill_no_calls_low_regs:
		; GCN-NOT: buffer_store_dword
		; GCN: v_writelane_b32 v8,
		; GCN: v_readlane_b32 s{{[0-9]+}}, v8
		; GCN-NOT: buffer_load_dword
		define void @callee_func_sgpr_spill_no_calls_low_regs(i32 %in) #0 {
		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0

		%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0

		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
		call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr3) #0
		call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0
		ret void
		}

		; GCN-LABEL: {{^}}callee_func_sgpr_spill_calls_low_regs:
		; GCN: buffer_store_dword v32
		; GCN: v_writelane_b32 v32,
		; GCN: v_readlane_b32 s{{[0-9]+}}, v32
		; GCN: buffer_load_dword v32
		define void @callee_func_sgpr_spill_calls_low_regs(i32 %in) #0 {
		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0

		%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0
		%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0

		call void @external_void_func_void()

		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
		call void asm sideeffect "; use $0", "s"(<8 x i32> %wide.sgpr3) #0
		call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0
		ret void
		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind "no-frame-pointer-elim"="true" }		attributes #1 = { nounwind "no-frame-pointer-elim"="true" }

test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

	[571 lines not shown]
	; GCN-DAG: s_mov_b32 s8, s16			; GCN-DAG: s_mov_b32 s8, s16

	; GCN-DAG: s_mov_b64 s{{\[}}[[LO_X:[0-9]+]]{{\:}}[[HI_X:[0-9]+]]{{\]}}, s[6:7]			; GCN-DAG: s_mov_b64 s{{\[}}[[LO_X:[0-9]+]]{{\:}}[[HI_X:[0-9]+]]{{\]}}, s[6:7]
	; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Y:[0-9]+]]{{\:}}[[HI_Y:[0-9]+]]{{\]}}, s[8:9]			; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Y:[0-9]+]]{{\:}}[[HI_Y:[0-9]+]]{{\]}}, s[8:9]
	; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Z:[0-9]+]]{{\:}}[[HI_Z:[0-9]+]]{{\]}}, s[10:11]			; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Z:[0-9]+]]{{\:}}[[HI_Z:[0-9]+]]{{\]}}, s[10:11]

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4			; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:8
	; GCN-DAG: v_mov_b32_e32 v[[LO1:[0-9]+]], s[[LO_X]]			; GCN-DAG: v_mov_b32_e32 v[[LO1:[0-9]+]], s[[LO_X]]
	; GCN-DAG: v_mov_b32_e32 v[[HI1:[0-9]+]], s[[HI_X]]			; GCN-DAG: v_mov_b32_e32 v[[HI1:[0-9]+]], s[[HI_X]]
	; GCN-DAG: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO1]]:[[HI1]]{{\]}}			; GCN-DAG: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO1]]:[[HI1]]{{\]}}
	; GCN-DAG: v_mov_b32_e32 v[[LO2:[0-9]+]], s[[LO_Y]]			; GCN-DAG: v_mov_b32_e32 v[[LO2:[0-9]+]], s[[LO_Y]]
	; GCN-DAG: v_mov_b32_e32 v[[HI2:[0-9]+]], s[[HI_Y]]			; GCN-DAG: v_mov_b32_e32 v[[HI2:[0-9]+]], s[[HI_Y]]
	; GCN-DAG: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO2]]:[[HI2]]{{\]}}			; GCN-DAG: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO2]]:[[HI2]]{{\]}}
	; GCN-DAG: v_mov_b32_e32 v[[LO3:[0-9]+]], s[[LO_Z]]			; GCN-DAG: v_mov_b32_e32 v[[LO3:[0-9]+]], s[[LO_Z]]
	; GCN-DAG: v_mov_b32_e32 v[[HI3:[0-9]+]], s[[HI_Z]]			; GCN-DAG: v_mov_b32_e32 v[[HI3:[0-9]+]], s[[HI_Z]]
	[48 lines not shown]

test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

[320 lines not shown]	call void @too_many_args_use_workitem_id_x(
i32 210, i32 220, i32 230, i32 240,		i32 210, i32 220, i32 230, i32 240,
i32 250, i32 260, i32 270, i32 280,		i32 250, i32 260, i32 270, i32 280,
i32 290, i32 300, i32 310, i32 320)		i32 290, i32 300, i32 310, i32 320)
ret void		ret void
}		}

; Requires loading and storing to stack slot.		; Requires loading and storing to stack slot.
; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:
; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:8 ; 4-byte Folded Spill		; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:12 ; 4-byte Folded Spill
		; GCN: buffer_store_dword v33, off, s[0:3], s5 offset:8 ; 4-byte Folded Spill
		; GCN: v_writelane_b32 v32
; GCN: s_add_u32 s32, s32, 0x400{{$}}		; GCN: s_add_u32 s32, s32, 0x400{{$}}
; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:4		; GCN: buffer_load_dword v33, off, s[0:3], s5 offset:4

; GCN: buffer_store_dword v32, off, s[0:3], s32 offset:4{{$}}		; GCN: buffer_store_dword v33, off, s[0:3], s32 offset:4{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:8 ; 4-byte Folded Reload		; GCN: buffer_load_dword v33, off, s[0:3], s5 offset:8 ; 4-byte Folded Reload
		; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:12 ; 4-byte Folded Reload
; GCN: s_sub_u32 s32, s32, 0x400{{$}}		; GCN: s_sub_u32 s32, s32, 0x400{{$}}
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @too_many_args_call_too_many_args_use_workitem_id_x(		define void @too_many_args_call_too_many_args_use_workitem_id_x(
i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,		i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,
i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,		i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,
i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,		i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,
i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {		i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {
call void @too_many_args_use_workitem_id_x(		call void @too_many_args_use_workitem_id_x(
[101 lines not shown]
		call void @too_many_args_use_workitem_id_x_byval(
i32 250, i32 260, i32 270, i32 280,		i32 250, i32 260, i32 270, i32 280,
i32 290, i32 300, i32 310, i32 320,		i32 290, i32 300, i32 310, i32 320,
i32 addrspace(5)* %alloca)		i32 addrspace(5)* %alloca)
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_call_too_many_args_use_workitem_id_x_byval:		; GCN-LABEL: {{^}}func_call_too_many_args_use_workitem_id_x_byval:
; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}		; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}
; GCN: buffer_store_dword [[K]], off, s[0:3], s5 offset:4		; GCN: buffer_store_dword [[K]], off, s[0:3], s5 offset:8
; GCN: buffer_store_dword v0, off, s[0:3], s32 offset:8		; GCN: buffer_store_dword v0, off, s[0:3], s32 offset:8

; GCN: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], s5 offset:4		; GCN: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], s5 offset:8
; GCN: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32 offset:4{{$}}		; GCN: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32 offset:4{{$}}
; GCN: v_mov_b32_e32 [[RELOAD_BYVAL]],		; GCN: v_mov_b32_e32 [[RELOAD_BYVAL]],
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define void @func_call_too_many_args_use_workitem_id_x_byval() #1 {		define void @func_call_too_many_args_use_workitem_id_x_byval() #1 {
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 999, i32 addrspace(5)* %alloca		store volatile i32 999, i32 addrspace(5)* %alloca
call void @too_many_args_use_workitem_id_x_byval(		call void @too_many_args_use_workitem_id_x_byval(
i32 10, i32 20, i32 30, i32 40,		i32 10, i32 20, i32 30, i32 40,
[203 lines not shown]

test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

[23 lines not shown]


	define float @call_split_type_used_outside_block_v2f32() #0 {			define float @call_split_type_used_outside_block_v2f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v2f32:			; GCN-LABEL: call_split_type_used_outside_block_v2f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s5, s32			; GCN-NEXT: s_mov_b32 s5, s32
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: ; implicit-def: $vgpr32
	; GCN-NEXT: v_writelane_b32 v32, s33, 0			; GCN-NEXT: v_writelane_b32 v32, s33, 0
	; GCN-NEXT: v_writelane_b32 v32, s34, 1			; GCN-NEXT: v_writelane_b32 v32, s34, 1
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s35, 2			; GCN-NEXT: v_writelane_b32 v32, s35, 2
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func_v2f32@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func_v2f32@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]			; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]
[18 lines not shown]
	}			}

	define float @call_split_type_used_outside_block_v3f32() #0 {			define float @call_split_type_used_outside_block_v3f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v3f32:			; GCN-LABEL: call_split_type_used_outside_block_v3f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s5, s32			; GCN-NEXT: s_mov_b32 s5, s32
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: ; implicit-def: $vgpr32
	; GCN-NEXT: v_writelane_b32 v32, s33, 0			; GCN-NEXT: v_writelane_b32 v32, s33, 0
	; GCN-NEXT: v_writelane_b32 v32, s34, 1			; GCN-NEXT: v_writelane_b32 v32, s34, 1
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s35, 2			; GCN-NEXT: v_writelane_b32 v32, s35, 2
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func_v3f32@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func_v3f32@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]			; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]
[18 lines not shown]
	}			}

	define half @call_split_type_used_outside_block_v4f16() #0 {			define half @call_split_type_used_outside_block_v4f16() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v4f16:			; GCN-LABEL: call_split_type_used_outside_block_v4f16:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s5, s32			; GCN-NEXT: s_mov_b32 s5, s32
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: ; implicit-def: $vgpr32
	; GCN-NEXT: v_writelane_b32 v32, s33, 0			; GCN-NEXT: v_writelane_b32 v32, s33, 0
	; GCN-NEXT: v_writelane_b32 v32, s34, 1			; GCN-NEXT: v_writelane_b32 v32, s34, 1
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s35, 2			; GCN-NEXT: v_writelane_b32 v32, s35, 2
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func_v4f16@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func_v4f16@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]			; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]
[18 lines not shown]
	}			}

	define { i32, half } @call_split_type_used_outside_block_struct() #0 {			define { i32, half } @call_split_type_used_outside_block_struct() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_struct:			; GCN-LABEL: call_split_type_used_outside_block_struct:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_mov_b32 s5, s32			; GCN-NEXT: s_mov_b32 s5, s32
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: ; implicit-def: $vgpr32
	; GCN-NEXT: v_writelane_b32 v32, s33, 0			; GCN-NEXT: v_writelane_b32 v32, s33, 0
	; GCN-NEXT: v_writelane_b32 v32, s34, 1			; GCN-NEXT: v_writelane_b32 v32, s34, 1
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s35, 2			; GCN-NEXT: v_writelane_b32 v32, s35, 2
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func_struct@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func_struct@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]			; GCN-NEXT: s_mov_b64 s[34:35], s[30:31]
[35 lines not shown]

test/CodeGen/AMDGPU/debug-value2.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck %s

	%struct.ShapeData = type { <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32, i64, <4 x float>, i32, i8, i8, i16, i32, i32 }			%struct.ShapeData = type { <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32, i64, <4 x float>, i32, i8, i8, i16, i32, i32 }

	declare float @llvm.fmuladd.f32(float, float, float)			declare float @llvm.fmuladd.f32(float, float, float)

	declare <4 x float> @llvm.fmuladd.v4f32(<4 x float>, <4 x float>, <4 x float>)			declare <4 x float> @llvm.fmuladd.v4f32(<4 x float>, <4 x float>, <4 x float>)

	declare %struct.ShapeData addrspace(1)* @Scene_getSubShapeData(i32, i8 addrspace(1), i32 addrspace(1)) local_unnamed_addr			declare %struct.ShapeData addrspace(1)* @Scene_getSubShapeData(i32, i8 addrspace(1), i32 addrspace(1)) local_unnamed_addr

	define <4 x float> @Scene_transformT(i32 %subshapeIdx, <4 x float> %v, float %time, i8 addrspace(1)* %gScene, i32 addrspace(1)* %gSceneOffsets) local_unnamed_addr !dbg !110 {			define <4 x float> @Scene_transformT(i32 %subshapeIdx, <4 x float> %v, float %time, i8 addrspace(1)* %gScene, i32 addrspace(1)* %gSceneOffsets) local_unnamed_addr !dbg !110 {
	entry:			entry:
				; CHECK: ;DEBUG_VALUE: Scene_transformT:gSceneOffsets <- [DW_OP_constu 1, DW_OP_swap, DW_OP_xderef] $vgpr8_vgpr9
	; CHECK: ;DEBUG_VALUE: Scene_transformT:gScene <- [DW_OP_constu 1, DW_OP_swap, DW_OP_xderef] $vgpr6_vgpr7			; CHECK: ;DEBUG_VALUE: Scene_transformT:gScene <- [DW_OP_constu 1, DW_OP_swap, DW_OP_xderef] $vgpr6_vgpr7
	call void @llvm.dbg.value(metadata i8 addrspace(1)* %gScene, metadata !120, metadata !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef)), !dbg !154			call void @llvm.dbg.value(metadata i8 addrspace(1)* %gScene, metadata !120, metadata !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef)), !dbg !154
	; CHECK: ;DEBUG_VALUE: Scene_transformT:gSceneOffsets <- [DW_OP_constu 1, DW_OP_swap, DW_OP_xderef] $vgpr8_vgpr9
	call void @llvm.dbg.value(metadata i32 addrspace(1)* %gSceneOffsets, metadata !121, metadata !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef)), !dbg !155			call void @llvm.dbg.value(metadata i32 addrspace(1)* %gSceneOffsets, metadata !121, metadata !DIExpression(DW_OP_constu, 1, DW_OP_swap, DW_OP_xderef)), !dbg !155
	%call = tail call %struct.ShapeData addrspace(1)* @Scene_getSubShapeData(i32 %subshapeIdx, i8 addrspace(1)* %gScene, i32 addrspace(1)* %gSceneOffsets)			%call = tail call %struct.ShapeData addrspace(1)* @Scene_getSubShapeData(i32 %subshapeIdx, i8 addrspace(1)* %gScene, i32 addrspace(1)* %gSceneOffsets)
	%m_linearMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 2			%m_linearMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 2
	%tmp = load <4 x float>, <4 x float> addrspace(1)* %m_linearMotion, align 16			%tmp = load <4 x float>, <4 x float> addrspace(1)* %m_linearMotion, align 16
	%m_angularMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 3			%m_angularMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 3
	%tmp1 = load <4 x float>, <4 x float> addrspace(1)* %m_angularMotion, align 16			%tmp1 = load <4 x float>, <4 x float> addrspace(1)* %m_angularMotion, align 16
	%m_scaleMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 4			%m_scaleMotion = getelementptr inbounds %struct.ShapeData, %struct.ShapeData addrspace(1)* %call, i64 0, i32 4
	%tmp2 = load <4 x float>, <4 x float> addrspace(1)* %m_scaleMotion, align 16			%tmp2 = load <4 x float>, <4 x float> addrspace(1)* %m_scaleMotion, align 16
[413 lines not shown]

test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=VGPR -check-prefix=GCN %s		; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=VGPR -check-prefix=GCN %s

; FIXME: we should disable sdwa peephole because dead-code elimination, that		; FIXME: we should disable sdwa peephole because dead-code elimination, that
; runs after peephole, ruins this test (different register numbers)		; runs after peephole, ruins this test (different register numbers)

; Spill all SGPRs so multiple VGPRs are required for spilling all of them.		; Spill all SGPRs so multiple VGPRs are required for spilling all of them.

; Ideally we only need 2 VGPRs for all spilling. The VGPRs are		; Ideally we only need 2 VGPRs for all spilling. The VGPRs are
; allocated per-frame index, so it's possible to end up with more.		; allocated per-frame index, so it's possible to end up with more.

; GCN-LABEL: {{^}}spill_sgprs_to_multiple_vgprs:		; GCN-LABEL: {{^}}spill_sgprs_to_multiple_vgprs:

		; GCN: ; implicit-def: $vgpr2
		; GCN: ; implicit-def: $vgpr1
		; GCN: ; implicit-def: $vgpr0


; GCN: def s[4:11]		; GCN: def s[4:11]
; GCN: v_writelane_b32 v0, s4, 0		; GCN: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v0, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v0, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v0, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v0, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v0, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v0, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s10, 6
[428 lines not shown]
		bb0:
call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0		call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr5) #0
br label %ret		br label %ret

ret:		ret:
ret void		ret void
}		}

; The first 64 SGPR spills can go to a VGPR, but there isn't a second
; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.

; GCN-LABEL: {{^}}no_vgprs_last_sgpr_spill:

; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 0
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 1
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 2
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 3
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 4
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 5
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 6
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 7
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 8
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 9
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 10
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 11
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 12
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 13
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 14
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 15

; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 16
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 17
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 18
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 19
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 20
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 21
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 22
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 23
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 24
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 25
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 26
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 27
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 28
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 29
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 30
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 31

; GCN: def s[4:19]
; GCN: v_writelane_b32 v23, s4, 32
; GCN-NEXT: v_writelane_b32 v23, s5, 33
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 34
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 35
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 36
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 37
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 38
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 39
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 40
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 41
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 42
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 43
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 44
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 45
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 46
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 47

; GCN: def s[4:19]
; GCN: v_writelane_b32 v23, s{{[[0-9]+}}, 48
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 49
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 50
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 51
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 52
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 53
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 54
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 55
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 56
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 57
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 58
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 59
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 60
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 61
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 62
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 63

; GCN: def s[4:5]
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
; GCN: s_cbranch_scc1


; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}

; GCN: v_readlane_b32 s20, v23, 32
; GCN-NEXT: v_readlane_b32 s21, v23, 33
; GCN-NEXT: v_readlane_b32 s22, v23, 34
; GCN-NEXT: v_readlane_b32 s23, v23, 35
; GCN-NEXT: v_readlane_b32 s24, v23, 36
; GCN-NEXT: v_readlane_b32 s25, v23, 37
; GCN-NEXT: v_readlane_b32 s26, v23, 38
; GCN-NEXT: v_readlane_b32 s27, v23, 39
; GCN-NEXT: v_readlane_b32 s28, v23, 40
; GCN-NEXT: v_readlane_b32 s29, v23, 41
; GCN-NEXT: v_readlane_b32 s30, v23, 42
; GCN-NEXT: v_readlane_b32 s31, v23, 43
; GCN-NEXT: v_readlane_b32 s32, v23, 44
; GCN-NEXT: v_readlane_b32 s33, v23, 45
; GCN-NEXT: v_readlane_b32 s34, v23, 46
; GCN-NEXT: v_readlane_b32 s35, v23, 47

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 0
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 1
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 2
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 3
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 4
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 5
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 6
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 7
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 8
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 9
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 10
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 11
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 12
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 13
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 14
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 15
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 16
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 17
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 18
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 19
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 20
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 21
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 22
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 23
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 24
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 25
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 26
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 27
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 28
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 29
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 30
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 31
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 48
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 49
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 50
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 51
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 52
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 53
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 54
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 55
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 56
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 57
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 58
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 59
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 60
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 61
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 62
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 63
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}
; GCN: ; use s[0:1]
define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
call void asm sideeffect "", "~{v[0:7]}" () #0
call void asm sideeffect "", "~{v[8:15]}" () #0
call void asm sideeffect "", "~{v[16:19]}"() #0
call void asm sideeffect "", "~{v[20:21]}"() #0
call void asm sideeffect "", "~{v22}"() #0

%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr3 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0
%cmp = icmp eq i32 %in, 0
br i1 %cmp, label %bb0, label %ret

bb0:
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr3) #0
call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
br label %ret

ret:
ret void
}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }		attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }
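
The lane arithmetic behind the no_vgprs_last_sgpr_spill checks above can be sketched roughly as follows. This is an illustration only, not code from this patch: the helper names are hypothetical, it assumes a wave64 target (64 lanes per VGPR, matching the lane indices 0..63 in the check lines), and it deliberately ignores the fact that VGPRs are allocated per frame index.

```cpp
// Assumption: wave64 target such as -mcpu=hawaii, so one VGPR holds
// 64 spilled 32-bit SGPR values (one per lane).
constexpr unsigned LanesPerVGPR = 64;

// Hypothetical helper, not an LLVM API.
constexpr unsigned minU(unsigned A, unsigned B) { return A < B ? A : B; }

// Number of 32-bit SGPRs that fit into lanes of the free VGPRs.
constexpr unsigned lanesUsed(unsigned NumSGPRs, unsigned FreeVGPRs) {
  return minU(NumSGPRs, FreeVGPRs * LanesPerVGPR);
}

// Number of 32-bit SGPRs left over once every lane is occupied.
constexpr unsigned spilledToMemory(unsigned NumSGPRs, unsigned FreeVGPRs) {
  return NumSGPRs - lanesUsed(NumSGPRs, FreeVGPRs);
}

// The test defines four <16 x i32> values and one <2 x i32> value while
// clobbering v0..v22, so the checks only ever use v23 for lane spills.
static_assert(lanesUsed(4 * 16 + 2, 1) == 64, "v23 lanes 0..63 are filled");
static_assert(spilledToMemory(4 * 16 + 2, 1) == 2,
              "two dwords are left without a lane");
```

Under these assumptions the two leftover dwords correspond to the two buffer_store_dword checks after `def s[4:5]` in the removed test. The new sgpr-spill-no-vgprs.ll test expects an error in this situation instead.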

test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll

This file was added.

				; REQUIRES: asserts

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT %s
				; RUN: llc -sgpr-regalloc=greedy -vgpr-regalloc=greedy -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT %s

				; RUN: llc -O0 -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=O0 %s

				; RUN: llc -vgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=DEFAULT-BASIC %s
				; RUN: llc -sgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=BASIC-DEFAULT %s
				; RUN: llc -sgpr-regalloc=basic -vgpr-regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=BASIC-BASIC %s

				; RUN: not llc -regalloc=basic -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=REGALLOC %s
				; RUN: not llc -regalloc=fast -O0 -mtriple=amdgcn-amd-amdhsa -debug-pass=Structure -o /dev/null %s 2>&1 | FileCheck -check-prefix=REGALLOC %s


				; REGALLOC: -regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc

				; DEFAULT: Greedy Register Allocator
				; DEFAULT-NEXT: Virtual Register Rewriter
				; DEFAULT-NEXT: SI lower SGPR spill instructions
				; DEFAULT-NEXT: Debug Variable Analysis
				; DEFAULT-NEXT: Virtual Register Map
				; DEFAULT-NEXT: Live Register Matrix
				; DEFAULT-NEXT: Machine Optimization Remark Emitter
				; DEFAULT-NEXT: Greedy Register Allocator
				; DEFAULT-NEXT: Virtual Register Rewriter
				; DEFAULT-NEXT: Stack Slot Coloring

				; O0: Fast Register Allocator
				; O0-NEXT: SI lower SGPR spill instructions
				; O0-NEXT: Fast Register Allocator
				; O0-NEXT: SI Fix VGPR copies




				; BASIC-DEFAULT: Debug Variable Analysis
				; BASIC-DEFAULT-NEXT: Live Stack Slot Analysis
				; BASIC-DEFAULT-NEXT: Machine Block Frequency Analysis
				; BASIC-DEFAULT-NEXT: Virtual Register Map
				; BASIC-DEFAULT-NEXT: Live Register Matrix
				; BASIC-DEFAULT-NEXT: Basic Register Allocator
				; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
				; BASIC-DEFAULT-NEXT: SI lower SGPR spill instructions
				; BASIC-DEFAULT-NEXT: Debug Variable Analysis
				; BASIC-DEFAULT-NEXT: Virtual Register Map
				; BASIC-DEFAULT-NEXT: Live Register Matrix
				; BASIC-DEFAULT-NEXT: Bundle Machine CFG Edges
				; BASIC-DEFAULT-NEXT: Spill Code Placement Analysis
				; BASIC-DEFAULT-NEXT: Lazy Machine Block Frequency Analysis
				; BASIC-DEFAULT-NEXT: Machine Optimization Remark Emitter
				; BASIC-DEFAULT-NEXT: Greedy Register Allocator
				; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
				; BASIC-DEFAULT-NEXT: Stack Slot Coloring



				; DEFAULT-BASIC: Greedy Register Allocator
				; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
				; DEFAULT-BASIC-NEXT: SI lower SGPR spill instructions
				; DEFAULT-BASIC-NEXT: Debug Variable Analysis
				; DEFAULT-BASIC-NEXT: Virtual Register Map
				; DEFAULT-BASIC-NEXT: Live Register Matrix
				; DEFAULT-BASIC-NEXT: Basic Register Allocator
				; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
				; DEFAULT-BASIC-NEXT: Stack Slot Coloring



				; BASIC-BASIC: Debug Variable Analysis
				; BASIC-BASIC-NEXT: Live Stack Slot Analysis
				; BASIC-BASIC-NEXT: Machine Block Frequency Analysis
				; BASIC-BASIC-NEXT: Virtual Register Map
				; BASIC-BASIC-NEXT: Live Register Matrix
				; BASIC-BASIC-NEXT: Basic Register Allocator
				; BASIC-BASIC-NEXT: Virtual Register Rewriter
				; BASIC-BASIC-NEXT: SI lower SGPR spill instructions
				; BASIC-BASIC-NEXT: Debug Variable Analysis
				; BASIC-BASIC-NEXT: Virtual Register Map
				; BASIC-BASIC-NEXT: Live Register Matrix
				; BASIC-BASIC-NEXT: Basic Register Allocator
				; BASIC-BASIC-NEXT: Virtual Register Rewriter
				; BASIC-BASIC-NEXT: Stack Slot Coloring


				declare void @bar()

				; Something with some CSR SGPR spills
				define void @foo() {
				call void asm sideeffect "; clobber", "~{s33}"()
				call void @bar()
				ret void
				}

				; Block live out spills with fast regalloc
				define amdgpu_kernel void @control_flow(i1 %cond) {
				%s33 = call i32 asm sideeffect "; clobber", "={s33}"()
				br i1 %cond, label %bb0, label %bb1

				bb0:
				call void asm sideeffect "; use %0", "s"(i32 %s33)
				br label %bb1

				bb1:
				ret void
				}

test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll

This file was added.

				; RUN: not llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
				; RUN: not llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s 2>&1 | FileCheck -check-prefix=ERROR %s

				; ERROR: error: VGPRs for SGPR spilling limit exceeded (0) in partial_no_vgprs_last_sgpr_spill

				; The first 64 SGPR spills can go to a VGPR, but there isn't a second
				; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.

				; GCN-LABEL: {{^}}partial_no_vgprs_last_sgpr_spill:

				; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 0
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 1
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 2
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 3
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 4
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 5
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 6
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 7
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 8
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 9
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 10
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 11
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 12
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 13
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 14
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 15

				; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 16
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 17
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 18
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 19
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 20
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 21
				; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 22
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 23
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 24
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 25
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 26
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 27
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 28
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 29
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 30
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 31

				; GCN: def s[4:19]
				; GCN: v_writelane_b32 v23, s4, 32
				; GCN-NEXT: v_writelane_b32 v23, s5, 33
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 34
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 35
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 36
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 37
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 38
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 39
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 40
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 41
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 42
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 43
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 44
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 45
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 46
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 47

				; GCN: def s[4:19]
				; GCN: v_writelane_b32 v23, s{{[[0-9]+}}, 48
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 49
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 50
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 51
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 52
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 53
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 54
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 55
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 56
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 57
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 58
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 59
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 60
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 61
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 62
				; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 63

				; GCN: def s[4:5]
				; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
				; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
				; GCN: s_cbranch_scc1


				; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}
				; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}

				; GCN: v_readlane_b32 s20, v23, 32
				; GCN-NEXT: v_readlane_b32 s21, v23, 33
				; GCN-NEXT: v_readlane_b32 s22, v23, 34
				; GCN-NEXT: v_readlane_b32 s23, v23, 35
				; GCN-NEXT: v_readlane_b32 s24, v23, 36
				; GCN-NEXT: v_readlane_b32 s25, v23, 37
				; GCN-NEXT: v_readlane_b32 s26, v23, 38
				; GCN-NEXT: v_readlane_b32 s27, v23, 39
				; GCN-NEXT: v_readlane_b32 s28, v23, 40
				; GCN-NEXT: v_readlane_b32 s29, v23, 41
				; GCN-NEXT: v_readlane_b32 s30, v23, 42
				; GCN-NEXT: v_readlane_b32 s31, v23, 43
				; GCN-NEXT: v_readlane_b32 s32, v23, 44
				; GCN-NEXT: v_readlane_b32 s33, v23, 45
				; GCN-NEXT: v_readlane_b32 s34, v23, 46
				; GCN-NEXT: v_readlane_b32 s35, v23, 47

				; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 0
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 1
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 2
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 3
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 4
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 5
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 6
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 7
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 8
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 9
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 10
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 11
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 12
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 13
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 14
				; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 15
				; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

				; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 16
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 17
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 18
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 19
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 20
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 21
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 22
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 23
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 24
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 25
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 26
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 27
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 28
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 29
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 30
				; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 31
				; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

				; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 48
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 49
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 50
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 51
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 52
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 53
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 54
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 55
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 56
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 57
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 58
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 59
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 60
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 61
				; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 62
				; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 63
				; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}
				; GCN: ; use s[0:1]
				define amdgpu_kernel void @partial_no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
				call void asm sideeffect "", "~{v[0:7]}" () #0
				call void asm sideeffect "", "~{v[8:15]}" () #0
				call void asm sideeffect "", "~{v[16:19]}"() #0
				call void asm sideeffect "", "~{v[20:21]}"() #0
				call void asm sideeffect "", "~{v22}"() #0

				%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr3 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
				%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0
				%cmp = icmp eq i32 %in, 0
				br i1 %cmp, label %bb0, label %ret

				bb0:
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
				call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr3) #0
				call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
				br label %ret

				ret:
				ret void
				}

				attributes #0 = { nounwind }
				attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }
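The checks above write four 16-wide SGPR tuples into lanes 0 through 63 of v23, and the remaining 2-wide value appears to go out through buffer_store_dword instead. A minimal sketch of that lane bookkeeping follows. It is a hypothetical model written for illustration, not code from this patch or from LLVM; the names `SpillVGPR`, `allocate`, `writeLane`, `readLane` and `kWaveSize` are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model, not LLVM code: a spill VGPR seen as 64 32-bit lanes.
constexpr unsigned kWaveSize = 64;

class SpillVGPR {
public:
  // Reserve Count consecutive lanes for an SGPR tuple. Returns the first
  // lane, or std::nullopt when the tuple no longer fits in this VGPR.
  std::optional<unsigned> allocate(unsigned Count) {
    if (Count > kWaveSize - NextFree)
      return std::nullopt;
    unsigned First = NextFree;
    NextFree += Count;
    return First;
  }

  // Counterpart of v_writelane_b32: store one SGPR value into one lane.
  void writeLane(unsigned Lane, std::uint32_t Value) { Lanes.at(Lane) = Value; }

  // Counterpart of v_readlane_b32: read one lane back into an SGPR.
  std::uint32_t readLane(unsigned Lane) const { return Lanes.at(Lane); }

private:
  std::array<std::uint32_t, kWaveSize> Lanes{};
  unsigned NextFree = 0;
};
```

In this model, four calls to `allocate(16)` return lanes 0, 16, 32 and 48, and a following `allocate(2)` returns `std::nullopt`, which would correspond to the point where the test expects a spill to memory because no further VGPR is free.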

test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -start-before=greedy -stop-after=stack-slot-coloring -o - %s \| FileCheck -check-prefixes=SHARE,GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -run-pass=greedy,virtregrewriter,stack-slot-coloring -o - %s \| FileCheck -check-prefixes=SHARE,GCN %s
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -start-before=greedy -stop-after=stack-slot-coloring -no-stack-slot-sharing -o - %s \| FileCheck -check-prefixes=NOSHARE,GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=3 -run-pass=greedy,virtregrewriter,stack-slot-coloring -no-stack-slot-sharing -o - %s \| FileCheck -check-prefixes=NOSHARE,GCN %s

				# -run-pass is used to artificially avoid using split register allocation, which would avoid stressing StackSlotColoring.


	# Make sure that stack slot coloring doesn't try to merge frame			# Make sure that stack slot coloring doesn't try to merge frame
	# indexes used for SGPR spilling with those that aren't.			# indexes used for SGPR spilling with those that aren't.
	# Even when stack slot sharing was disabled, it was still moving the			# Even when stack slot sharing was disabled, it was still moving the
	# FI ID used for an SGPR spill to a normal frame index.			# FI ID used for an SGPR spill to a normal frame index.

	--- \|			--- \|


test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s \| FileCheck -check-prefix=ALL -check-prefix=SGPR %s			; RUN: not llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s 2>&1 \| FileCheck -check-prefix=ERROR %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-spill-sgpr-to-smem=1 -verify-machineinstrs < %s \| FileCheck -check-prefix=ALL -check-prefix=SMEM %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-spill-sgpr-to-smem=1 -verify-machineinstrs < %s \| FileCheck -check-prefix=ALL -check-prefix=SMEM %s

				; Previously, SGPR spilling to VGPRs was handled in a single register
				; allocation run. It was possible to not have any free VGPRs for SGPR
				; spilling, requiring writing out to memory which didn't work
				; well. Test situations where this used to be necessary.

				; ERROR: error: VGPRs for SGPR spilling limit exceeded (0) in test

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; ALL-LABEL: {{^}}test:			; ALL-LABEL: {{^}}test:
	; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0
	; ALL: s_mov_b32 s[[OFF:[0-9]+]], s3			; Initialize VGPR for spilling
	; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000			; SGPR: ; implicit-def: $vgpr[[SPILL_VGPR:[0-9]+]]

	; Make sure we are handling hazards correctly.			; ALL-DAG: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0
	; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:16			; ALL-DAG: s_mov_b32 s[[OFF:[0-9]+]], s3
	; SGPR-NEXT: s_waitcnt vmcnt(0)			; ALL-DAG: s_mov_b32 s[[HI:[0-9]+]], 0xe80000
	; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]
	; SGPR-NEXT: s_nop 4			; SGPR-DAG: v_writelane_b32 v[[SPILL_VGPR]], s{{[0-9]+}}, 0
	; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0			; SGPR-DAG: v_writelane_b32 v[[SPILL_VGPR]], s{{[0-9]+}}, 1
				; SGPR-DAG: v_writelane_b32 v[[SPILL_VGPR]], s{{[0-9]+}}, 2
				; SGPR-DAG: v_writelane_b32 v[[SPILL_VGPR]], s{{[0-9]+}}, 3

				; Treating the VGPR as a normal value has the disadvantage of
				; increasing the amount of spill code with fast regalloc
				; SGPR: buffer_store_dword v[[SPILL_VGPR]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4 ; 4-byte Folded Spill

				; SGPR: ;;#ASMSTART
				; SGPR: buffer_load_dword v[[VGPR_RESTORE:[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4 ; 4-byte Folded Reload
				; SGPR: v_readlane_b32 s{{[0-9]+}}, v[[VGPR_RESTORE]], 0
				; SGPR: v_readlane_b32 s{{[0-9]+}}, v[[VGPR_RESTORE]], 1
				; SGPR: v_readlane_b32 s{{[0-9]+}}, v[[VGPR_RESTORE]], 2
				; SGPR: v_readlane_b32 s{{[0-9]+}}, v[[VGPR_RESTORE]], 3


	; Make sure scratch wave offset register is correctly incremented and			; Make sure scratch wave offset register is correctly incremented and
	; then restored.			; then restored.
	; SMEM: s_add_u32 m0, s[[OFF]], 0x100{{$}}			; SMEM: s_add_u32 m0, s[[OFF]], 0x100{{$}}
	; SMEM: s_buffer_store_dwordx4 s{{\[[0-9]+:[0-9]+\]}}, s{{\[}}[[LO]]:[[HI]]], m0 ; 16-byte Folded Spill			; SMEM: s_buffer_store_dwordx4 s{{\[[0-9]+:[0-9]+\]}}, s{{\[}}[[LO]]:[[HI]]], m0 ; 16-byte Folded Spill

	; SMEM: s_add_u32 m0, s[[OFF]], 0x100{{$}}			; SMEM: s_add_u32 m0, s[[OFF]], 0x100{{$}}
	; SMEM: s_buffer_load_dwordx4 s{{\[[0-9]+:[0-9]+\]}}, s{{\[}}[[LO]]:[[HI]]], m0 ; 16-byte Folded Reload			; SMEM: s_buffer_load_dwordx4 s{{\[[0-9]+:[0-9]+\]}}, s{{\[}}[[LO]]:[[HI]]], m0 ; 16-byte Folded Reload

test/CodeGen/AMDGPU/sibling-call.ll

	entry:			entry:
	%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)			%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)
	ret i32 %ret			ret i32 %ret
	}			}

	; Have another non-tail in the function			; Have another non-tail in the function
	; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:			; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:
	; GCN: s_mov_b32 s5, s32			; GCN: s_mov_b32 s5, s32
	; GCN: buffer_store_dword v34, off, s[0:3], s5 offset:12			; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:12 ; 4-byte Folded Spill
	; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:8 ; 4-byte Folded Spill			; GCN: buffer_store_dword v33, off, s[0:3], s5 offset:8 ; 4-byte Folded Spill
	; GCN: buffer_store_dword v33, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill			; GCN: buffer_store_dword v34, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: v_writelane_b32 v34, s33, 0			; GCN-DAG: v_writelane_b32 v32, s33, 0
	; GCN-DAG: v_writelane_b32 v34, s34, 1			; GCN-DAG: v_writelane_b32 v32, s34, 1
	; GCN-DAG: v_writelane_b32 v34, s35, 2			; GCN-DAG: v_writelane_b32 v32, s35, 2
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_add_u32 s32, s32, 0x400

	; GCN-DAG: s_getpc_b64			; GCN-DAG: s_getpc_b64
	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: s_getpc_b64 s[6:7]			; GCN: s_getpc_b64 s[6:7]
	; GCN: s_add_u32 s6, s6, sibling_call_i32_fastcc_i32_i32@rel32@lo+4			; GCN: s_add_u32 s6, s6, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
	; GCN: s_addc_u32 s7, s7, sibling_call_i32_fastcc_i32_i32@rel32@hi+4			; GCN: s_addc_u32 s7, s7, sibling_call_i32_fastcc_i32_i32@rel32@hi+4

	; GCN-DAG: v_readlane_b32 s33, v34, 0			; GCN-DAG: v_readlane_b32 s33, v32, 0
	; GCN-DAG: v_readlane_b32 s34, v34, 1			; GCN-DAG: v_readlane_b32 s34, v32, 1
	; GCN-DAG: v_readlane_b32 s35, v34, 2			; GCN-DAG: v_readlane_b32 s35, v32, 2

	; GCN: buffer_load_dword v33, off, s[0:3], s5 offset:4			; GCN: buffer_load_dword v34, off, s[0:3], s5 offset:4
	; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:8			; GCN: buffer_load_dword v33, off, s[0:3], s5 offset:8
	; GCN: buffer_load_dword v34, off, s[0:3], s5 offset:12			; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:12
	; GCN: s_sub_u32 s32, s32, 0x400			; GCN: s_sub_u32 s32, s32, 0x400
	; GCN: s_setpc_b64 s[6:7]			; GCN: s_setpc_b64 s[6:7]
	define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
	entry:			entry:
	%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)			%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
	%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)			%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)
	ret i32 %ret			ret i32 %ret
	}			}

test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s \| FileCheck -check-prefix=GCN %s

	; For the CSR copy of s5, it may be possible to see it in			; For the CSR copy of s5, it may be possible to see it in
	; storeRegToStackSlot.			; storeRegToStackSlot.

	; GCN-LABEL: {{^}}spill_csr_s5_copy:			; GCN-LABEL: {{^}}spill_csr_s5_copy:
	; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:8 ; 4-byte Folded Spill			; GCN: buffer_store_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Spill
	; GCN: v_writelane_b32 v32, s5, 2			; GCN: v_writelane_b32 v32, s5, 2
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: v_readlane_b32 s5, v32, 2			; GCN: v_readlane_b32 s5, v32, 2
	; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9			; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9
	; GCN: buffer_store_dword [[K]], off, s[0:3], s5 offset:4			; GCN: buffer_store_dword [[K]], off, s[0:3], s5 offset:8
	; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:8 ; 4-byte Folded Reload			; GCN: buffer_load_dword v32, off, s[0:3], s5 offset:4 ; 4-byte Folded Reload
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @spill_csr_s5_copy() #0 {			define void @spill_csr_s5_copy() #0 {
	bb:			bb:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%tmp = tail call i64 @func() #1			%tmp = tail call i64 @func() #1
	%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp			%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp
	%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4			%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4
	%tmp3 = zext i32 %tmp2 to i64			%tmp3 = zext i32 %tmp2 to i64
	store volatile i32 9, i32 addrspace(5)* %alloca			store volatile i32 9, i32 addrspace(5)* %alloca
	ret void			ret void
	}			}

	declare i64 @func()			declare i64 @func()

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }

test/CodeGen/AMDGPU/spill-empty-live-interval.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 -start-before=simple-register-coalescing -stop-after=greedy -o - %s \| FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 -start-before=simple-register-coalescing -stop-after=greedy,1 -o - %s \| FileCheck %s
	# https://bugs.llvm.org/show_bug.cgi?id=33620			# https://bugs.llvm.org/show_bug.cgi?id=33620

	---			---
	# This would assert due to the empty live interval created for %9			# This would assert due to the empty live interval created for %9
	# on the last S_NOP with an undef subreg use.			# on the last S_NOP with an undef subreg use.

	# CHECK-LABEL: name: expecting_non_empty_interval			# CHECK-LABEL: name: expecting_non_empty_interval


test/CodeGen/AMDGPU/spill-scavenge-offset.ll

	; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 < %s \| FileCheck %s			; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 < %s \| FileCheck %s
	; RUN: llc -regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 < %s \| FileCheck %s			; RUN: llc -sgpr-regalloc=basic -vgpr-regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 < %s \| FileCheck %s
	;			;
	; There is something about Tonga that causes this test to spend a lot of time			; There is something about Tonga that causes this test to spend a lot of time
	; in the default register allocator.			; in the default register allocator.


	; When the offset of VGPR spills into scratch space gets too large, an additional SGPR			; When the offset of VGPR spills into scratch space gets too large, an additional SGPR
	; is used to calculate the scratch load/store address. Make sure that this			; is used to calculate the scratch load/store address. Make sure that this
	; mechanism works even when many spills happen.			; mechanism works even when many spills happen.

test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

	# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s \| FileCheck %s			# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s \| FileCheck %s
	---			---

	# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}			# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}
	# CHECK: stack:			# CHECK: stack:
	# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: 0,			# CHECK-NEXT: stack-id: 0,
				# CHECK-NOT: id: 1

				# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr5, 0, implicit $exec :: (store 4 into %stack.1, addrspace 5)
				# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.1, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr5, 0, implicit $exec :: (load 4 from %stack.1, addrspace 5)

	# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: 1,

	# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr5, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr5, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)

	# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr6, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr5, implicit-def dead $m0 :: (store 4 into %stack.1, addrspace 5)			# CHECK: $vgpr2 = V_WRITELANE_B32_vi killed $sgpr6, 0, $vgpr2
	# CHECK: $sgpr6 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr5, implicit-def dead $m0 :: (load 4 from %stack.1, addrspace 5)			# CHECK: dead renamable $sgpr6 = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0
				# CHECK: $sgpr6 = V_READLANE_B32_vi $vgpr2, 0

	name: no_merge_sgpr_vgpr_spill_slot			name: no_merge_sgpr_vgpr_spill_slot
	tracksRegLiveness: true			tracksRegLiveness: true
	body: \|			body: \|
	bb.0:			bb.0:
	%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, implicit $flat_scr, implicit $exec			%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, implicit $flat_scr, implicit $exec
	%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, implicit $flat_scr, implicit $exec			%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, implicit $flat_scr, implicit $exec
	S_NOP 0, implicit %0			S_NOP 0, implicit %0
	%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0			%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0
	%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0			%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0
	S_NOP 0, implicit %1			S_NOP 0, implicit %1
	...			...


Diff 176729
