This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
lib/Target/AMDGPU/
-
Target/
-
AMDGPU/
2/3
AMDGPUAsmPrinter.cpp
-
AMDGPUResourceUsageAnalysis.cpp
2
AMDGPUSubtarget.cpp
-
GCNSubtarget.h
-
SIFrameLowering.h
2/3
SIFrameLowering.cpp
6/6
SIISelLowering.cpp
1/2
SIMachineFunctionInfo.h
2/2
SIMachineFunctionInfo.cpp
-
SIRegisterInfo.h
3/5
SIRegisterInfo.cpp
-
test/CodeGen/AMDGPU/
-
CodeGen/
-
AMDGPU/
-
lds-spill-cs.ll
-
lds-spill-ps.ll

Differential D130784

[AMDGPU] Support LDS spilling
Changes Planned · Public

Authored by piotr on Jul 29 2022, 9:00 AM.


Details

Reviewers

arsenm
rampitec
foad
dstuttard
sebastian-ne
RamNalamothu

Group Reviewers

Restricted Project

Summary

Add experimental support for LDS spilling on targets >= gfx9.

The amount of LDS is controlled by the attribute amdgpu-lds-spill-limit-dwords.
The default value of 0 means that LDS spilling is disabled.

The implementation utilizes DS_READ_ADDTID/DS_WRITE_ADDTID instructions.

For cases where workgroup size is larger than wave size, MultiDispatchInfo
(user sgpr in PAL front-end) is used to offset the address accordingly.
With some extra work, compute could use WorkGroupInfo to drive the spill
in the backend. Sadly, the way the values are preloaded is different between
graphics and compute (user sgpr versus system sgpr).

Tested on real-world graphics content (compute and pixel shaders).
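
To make the addressing concrete, here is a minimal sketch of how a DS_READ_ADDTID/DS_WRITE_ADDTID access could resolve a per-lane LDS address. It follows the formula quoted in the patch's own comment in SIFrameLowering.cpp (LDS_Addr = LDS_BASE + offset + TID*4 + M0) and the per-wave M0 setup from the same file. The function names and the plain-integer model are assumptions made for illustration only; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the addtid addressing quoted in the patch:
//   LDS_Addr = LDS_BASE + inst_offset + TID * 4 + M0
// All quantities are in bytes; `lane` is the thread id within the wave.
inline std::uint32_t addTidAddress(std::uint32_t ldsBase,
                                   std::uint32_t instOffset,
                                   std::uint32_t lane, std::uint32_t m0) {
  assert(lane < 64 && "the patch comment gives TID as 0..63");
  return ldsBase + instOffset + lane * 4 + m0;
}

// Hypothetical helper: the M0 value for a wave when the workgroup spans
// several waves, mirroring the "wave id in group * wave size * 4" sequence
// emitted by the patch (S_BFE_U32 followed by S_MUL_I32).
inline std::uint32_t m0ForWave(std::uint32_t waveIdInGroup,
                               std::uint32_t waveSize) {
  return waveIdInGroup * waveSize * 4;
}
```

With a wave size of 64, lane 63 of wave 0 resolves to byte 252 and lane 0 of wave 1 resolves to byte 256, so in this model consecutive waves of a workgroup do not overwrite each other's dword for the same slot.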

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

piotr created this revision. Jul 29 2022, 9:00 AM

Herald added a project: Restricted Project. Jul 29 2022, 9:00 AM

Herald added subscribers: kosarev, jsilvanus, foad and 11 others.

piotr requested review of this revision. Jul 29 2022, 9:00 AM

Herald added subscribers: llvm-commits, wdng.

piotr added reviewers: arsenm, rampitec, foad, dstuttard, sebastian-ne, Restricted Project. Jul 29 2022, 9:03 AM

Harbormaster completed remote builds in B178299: Diff 448649. Jul 29 2022, 9:56 AM

I thought we spilled registers to the stack and promoted some stack memory to LDS. Is there any interaction with promoteAlloca? In particular that pass tries to use LDS up to some occupancy boundary, after which allocating more here should take us over that boundary

In D130784#3687963, @JonChesterfield wrote:

I thought we spilled registers to the stack and promoted some stack memory to LDS. Is there any interaction with promoteAlloca? In particular that pass tries to use LDS up to some occupancy boundary, after which allocating more here should take us over that boundary

These are orthogonal things. The only relation is that promote alloca reduces the LDS budget available for later spilling.

rampitec added inline comments. Jul 29 2022, 2:07 PM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2560	Why CS only?
2565	Matching an argument by name is bad, especially if such a name can be used by a user.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
741	This is the most expensive check, it should go last.

piotr added inline comments. Aug 1 2022, 6:35 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2560	This is a left-over from an older version of the patch, will remove.
2565	This is controlled by the front-end - graphics compute shaders do not have inputs that could be specified by a shader writer and end up landing here. Having said that, I am not 100% happy with the way this is handled here either. I guess a better idea could be to rely on a function attribute set by the front-end that would say - "the user sgpr you want to use for multi dispatch info is at nth location". Just to note, input sgprs are handled differently between graphics and compute. I need to "just" preload WorkGroupInfo, so that I could later use that sgpr in the spill code. In kernels, it is treated as a system sgpr (allocateSystemSGPRs), so preloading it should not be a problem. However, in graphics we treat it as a user sgpr and pass in the list of arguments (with the name "MultiDispatchInfo").
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
741	Ok, will re-order. (I placed this condition at the beginning because it would be true (== return early) more often than the other checks.)

In D130784#3687965, @arsenm wrote:

In D130784#3687963, @JonChesterfield wrote:

I thought we spilled registers to the stack and promoted some stack memory to LDS. Is there any interaction with promoteAlloca? In particular that pass tries to use LDS up to some occupancy boundary, after which allocating more here should take us over that boundary

These are orthogonal things. The only relation is that promote alloca reduces the LDS budget available for later spilling.

Exactly, if the amount of LDS available at the point of frame lowering is zero, no spill slots will use LDS.
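
The budget logic behind this reply can be sketched as follows. This is a simplified, standalone model of the slot-assignment loop in the patch's setupLDSSpilling: the struct, the function name and the plain vector of slot sizes are assumptions for illustration, and the handling of dead or pre-allocated stack objects is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: which slots moved to LDS and at what
// per-thread byte offset (-1 means the slot stays on the stack).
struct LdsAssignment {
  std::vector<int> offsets;
  int totalSize = 0;      // per-thread bytes placed in LDS
  bool allHandled = true; // false once any slot has to stay on the stack
};

// Walk the slots from the last one to the first and move each to LDS while
// the whole-workgroup footprint still fits in the remaining budget.
inline LdsAssignment assignSlots(const std::vector<int> &slotSizes,
                                 int limitDwords, int ldsAlreadyUsedBytes,
                                 int workGroupSize) {
  assert(workGroupSize > 0);
  // LDS already claimed (for example by promote alloca) shrinks the budget.
  int remaining = std::max(0, limitDwords * 4 - ldsAlreadyUsedBytes);

  LdsAssignment result;
  result.offsets.assign(slotSizes.size(), -1);
  for (int i = static_cast<int>(slotSizes.size()) - 1; i >= 0; --i) {
    std::size_t idx = static_cast<std::size_t>(i);
    int sizeForAllThreads = slotSizes[idx] * workGroupSize;
    if (sizeForAllThreads > remaining) {
      result.allHandled = false;
      break;
    }
    remaining -= sizeForAllThreads;
    result.offsets[idx] = result.totalSize;
    result.totalSize += slotSizes[idx];
  }
  return result;
}
```

With a zero budget, either because the attribute is left at its default of 0 or because the function already uses at least that much LDS, the first slot fails the fit test and `totalSize` stays 0, which corresponds to the early exit in the patch.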

We will need to generate CFI describing these spills downstream; @RamNalamothu do you see anything that would be an issue, or should it be pretty straightforward?

arsenm added inline comments. Aug 4 2022, 7:16 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
890–894	Rename to MaxWorkGroupSize
893	Does this need alignment padding up to 4?
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
799–800	I don't see why we would need a new limit for this and just rely on the remaining LDS in the occupancy budget
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
406	Could we do this in a post-RA pass before LiveIntervals is discarded? I was thinking we should copy what SC does and reserve more registers, and try to reallocate them in such a pass. The same place could have smarter management of m0
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2565	Argument names are totally meaningless. opt -strip will now break this, so you need something that doesn't rely on the name

@scott.linder

I don't think there are any issues.

Looks like it's possible to calculate LDS offset of the spill and buildCFIForVGPRToVMEMSpill() needs to be specialized to include DW_OP_LLVM_form_aspace_address for LDS.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1290	s/the are no/there are no

Addressed review comments.

I just briefly glanced through the code, and it doesn't seem to check whether a kernel has dynamic LDS; LDS spilling has to be disabled in that case. Can you confirm?

piotr marked 6 inline comments as done. Aug 5 2022, 9:03 AM

piotr added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
893	Not sure (and the total size is a multiple of 4 by construction).
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
799–800	For pixel shaders we would also need to make room for pixel parameters as they also reside in the same CU.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
406	Doing it in a separate pass may work, but will need to explore it - need to check if I would have access to everything I need here. Good point about extensibility - I did not intend to do the smart m0 thing in the first implementation (left a FIXME), but it is true that to do that properly we would need kind of a data flow analysis so a separate pass would make sense in the long run.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2565	Added an extra attr instead of matching by name.
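
As an illustration of the attribute-based approach settled on above, the sketch below looks up an integer function attribute and falls back to a default when it is missing or malformed, then validates the resulting argument index. The string map stands in for a function's attribute list, and both helper names are hypothetical; only the attribute name "amdgpu-work-group-info-arg-no" comes from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a function's string attributes (an assumption of this sketch).
using AttrMap = std::map<std::string, std::string>;

// Hypothetical helper: parse an integer attribute, or return `def` if the
// attribute is absent or is not a complete integer literal.
inline int getIntAttrOr(const AttrMap &attrs, const std::string &name,
                        int def) {
  auto it = attrs.find(name);
  if (it == attrs.end())
    return def;
  try {
    std::size_t pos = 0;
    int value = std::stoi(it->second, &pos);
    return pos == it->second.size() ? value : def;
  } catch (const std::exception &) {
    return def; // not a number, or out of range for int
  }
}

// Hypothetical helper: index of the argument carrying the workgroup info,
// or -1 if the front-end did not provide a usable index.
inline int findWorkGroupInfoArg(const AttrMap &attrs, unsigned numArgs) {
  int n = getIntAttrOr(attrs, "amdgpu-work-group-info-arg-no", -1);
  return (n >= 0 && static_cast<unsigned>(n) < numArgs) ? n : -1;
}
```

Unlike a lookup by argument name, an index carried in an attribute survives tools that strip value names, which is the concern raised about opt -strip.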

Harbormaster completed remote builds in B179527: Diff 450305. Aug 5 2022, 9:34 AM

sebastian-ne added inline comments. Aug 22 2022, 4:57 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
824–830	Maybe this should be below `getStackPtrOffsetReg`? Currently, it divides `setStackPtrOffsetReg` and `getStackPtrOffsetReg`.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1295	I guess M0 could have a kill flag here?
2195	The non-lds restore code does not call `addToSpilledVGPRs`. Is this intentional here?

piotr marked an inline comment as done. Aug 22 2022, 5:57 AM

piotr added inline comments.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
824–830	"Maybe this should be below `getStackPtrOffsetReg`? Currently, it divides `setStackPtrOffsetReg` and `getStackPtrOffsetReg`." Yes, good idea.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1295	"I guess M0 could have a kill flag here?" Yes, thanks.
2195	"The non-lds restore code does not call `addToSpilledVGPRs`. Is this intentional here?" Good catch - copy & pasta error on my part.

In D130784#3702505, @scchan wrote:

I just briefly glanced through the code, and it doesn't seem to check whether a kernel has dynamic LDS; LDS spilling has to be disabled in that case. Can you confirm?

Good point; sorry, I missed your comment initially. It appears we don't track that in MFI though, so that requires some extra code (possibly checking that no extern LDS and no LDS kernel args are used).

piotr planned changes to this revision. Oct 31 2022, 2:49 AM

piotr added inline comments.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
406	Marking as "planned changes" to investigate running this in a post-RA pass.

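Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp	6 lines
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp	3 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp	5 lines
llvm/lib/Target/AMDGPU/GCNSubtarget.h	7 lines
llvm/lib/Target/AMDGPU/SIFrameLowering.h	4 lines
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	164 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	25 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h	34 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	15 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.h	6 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	126 lines
llvm/test/CodeGen/AMDGPU/lds-spill-cs.ll	64 lines
llvm/test/CodeGen/AMDGPU/lds-spill-ps.ll	64 lines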

Diff 450305

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

@@ void AMDGPUAsmPrinter::getSIProgramInfo(SIProgramInfo &ProgInfo,
   } else {
     // LDS is allocated in 128 dword blocks.
     LDSAlignShift = 9;
   }

   ProgInfo.SGPRSpill = MFI->getNumSpilledSGPRs();
   ProgInfo.VGPRSpill = MFI->getNumSpilledVGPRs();

-  ProgInfo.LDSSize = MFI->getLDSSize();
+  unsigned MaxWorkGroupSize = STM.getFlatWorkGroupSizes(F).second;
+  unsigned LDSSpillSize = MFI->getLdsSpill().TotalSize * MaxWorkGroupSize;
+
+  ProgInfo.LDSSize = MFI->getLDSSize() + LDSSpillSize;
   ProgInfo.LDSBlocks =
       alignTo(ProgInfo.LDSSize, 1ULL << LDSAlignShift) >> LDSAlignShift;

   // Scratch is allocated in 64-dword or 256-dword blocks.
   unsigned ScratchAlignShift =
       STM.getGeneration() >= AMDGPUSubtarget::GFX11 ? 8 : 10;
   // We need to program the hardware with the amount of scratch memory that
   // is used by the entire wave. ProgInfo.ScratchSize is the amount of
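
The size arithmetic in this hunk can be checked with a small standalone model. The sketch below assumes the per-thread spill size is multiplied by the maximum workgroup size, added to the statically allocated LDS, and then rounded up to whole allocation blocks, as the diff shows. `alignToPow2` is a local stand-in for LLVM's `alignTo` restricted to power-of-two alignments, and `ldsBlocks` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to a multiple of `align`; `align` must be a power of two.
inline std::uint64_t alignToPow2(std::uint64_t value, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical helper mirroring the LDSSize / LDSBlocks computation:
// static LDS plus the spill area for every thread in the workgroup,
// expressed in allocation blocks of (1 << alignShift) bytes.
inline std::uint64_t ldsBlocks(std::uint64_t staticLdsBytes,
                               std::uint64_t spillBytesPerThread,
                               std::uint64_t maxWorkGroupSize,
                               unsigned alignShift) {
  std::uint64_t size = staticLdsBytes + spillBytesPerThread * maxWorkGroupSize;
  return alignToPow2(size, 1ULL << alignShift) >> alignShift;
}
```

With `alignShift` set to 9 (128-dword blocks), two spilled dwords per thread in a 64-thread workgroup occupy exactly one block, and any additional static LDS pushes the total into a second block.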

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp

@@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage(
   // really needed.
   if (Info.UsesFlatScratch && !MFI->hasFlatScratchInit() &&
       (!hasAnyNonFlatUseOfReg(MRI, *TII, AMDGPU::FLAT_SCR) &&
        !hasAnyNonFlatUseOfReg(MRI, *TII, AMDGPU::FLAT_SCR_LO) &&
        !hasAnyNonFlatUseOfReg(MRI, *TII, AMDGPU::FLAT_SCR_HI))) {
     Info.UsesFlatScratch = false;
   }

-  Info.PrivateSegmentSize = FrameInfo.getStackSize();
+  unsigned LdsSpillTotalSize = MFI->getLdsSpill().TotalSize;
+  Info.PrivateSegmentSize = FrameInfo.getStackSize() - LdsSpillTotalSize;

   // Assume a big number if there are any unknown sized objects.
   Info.HasDynamicallySizedStack = FrameInfo.hasVarSizedObjects();
   if (Info.HasDynamicallySizedStack)
     Info.PrivateSegmentSize += AssumedStackSizeForDynamicSizeObjects;

   if (MFI->isStackRealigned())
     Info.PrivateSegmentSize += FrameInfo.getMaxAlign().value();

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

 }

 unsigned GCNSubtarget::getMaxNumVGPRs(const MachineFunction &MF) const {
   const Function &F = MF.getFunction();
   const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
   return getBaseMaxNumVGPRs(F, MFI.getWavesPerEU());
 }

+unsigned GCNSubtarget::getLdsSpillLimitDwords(const MachineFunction &MF) const {
+  const Function &F = MF.getFunction();
+  return AMDGPU::getIntegerAttribute(F, "amdgpu-lds-spill-limit-dwords", 0);
+}
+
 void GCNSubtarget::adjustSchedDependency(SUnit *Def, int DefOpIdx, SUnit *Use,
                                          int UseOpIdx, SDep &Dep) const {
   if (Dep.getKind() != SDep::Kind::Data || !Dep.getReg() ||
       !Def->isInstr() || !Use->isInstr())
     return;

   MachineInstr *DefI = Def->getInstr();
   MachineInstr *UseI = Use->getInstr();

llvm/lib/Target/AMDGPU/GCNSubtarget.h

@@ public:
   bool hasDelayAlu() const { return GFX11Insts; }

   bool hasPackedTID() const { return HasPackedTID; }

   // GFX940 is a derivation to GFX90A. hasGFX940Insts() being true implies that
   // hasGFX90AInsts is also true.
   bool hasGFX940Insts() const { return GFX940Insts; }

+  bool hasDSAddTid() const { return getGeneration() >= GFX9; }
+
   /// Return the maximum number of waves per SIMD for kernels using \p SGPRs
   /// SGPRs
   unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;

   /// Return the maximum number of waves per SIMD for kernels using \p VGPRs
   /// VGPRs
   unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;

@@ (148 lines not shown)
   /// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.
   ///
   /// \returns Value that meets number of waves per execution unit requirement
   /// if explicitly requested value cannot be converted to integer, violates
   /// subtarget's specifications, or does not meet number of waves per execution
   /// unit requirement.
   unsigned getMaxNumVGPRs(const MachineFunction &MF) const;

+  /// \returns Maximum amount of LDS space to be used for spilling explicitly
+  /// requested using "amdgpu-lds-spill-limit-dwords attribute attached to
+  /// function \p F.
+  unsigned getLdsSpillLimitDwords(const MachineFunction &MF) const;
+
   void getPostRAMutations(
       std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)
       const override;

   std::unique_ptr<ScheduleDAGMutation>
   createFillMFMAShadowMutation(const TargetInstrInfo *TII) const;

   bool isWave32() const {

llvm/lib/Target/AMDGPU/SIFrameLowering.h

@@ private:
   Register getEntryFunctionReservedScratchRsrcReg(MachineFunction &MF) const;

   void emitEntryFunctionScratchRsrcRegSetup(
       MachineFunction &MF, MachineBasicBlock &MBB,
       MachineBasicBlock::iterator I, const DebugLoc &DL,
       Register PreloadedPrivateBufferReg, Register ScratchRsrcReg,
       Register ScratchWaveOffsetReg) const;

+  void setupLDSSpilling(MachineFunction &MF, MachineBasicBlock &MBB,
+                        MachineBasicBlock::iterator I,
+                        const DebugLoc &DL) const;
+
 public:
   bool hasFP(const MachineFunction &MF) const override;

   bool requiresStackPointerReference(const MachineFunction &MF) const;
 };

 } // end namespace llvm

 #endif // LLVM_LIB_TARGET_AMDGPU_SIFRAMELOWERING_H

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(

   return ScratchRsrcReg;
 }

 static unsigned getScratchScaleFactor(const GCNSubtarget &ST) {
   return ST.enableFlatScratch() ? 1 : ST.getWavefrontSize();
 }

+// Determine which stack objects should be spilled to LDS, set up
+// SIMachineFunctionInfo::LdsSpill structure and initialize
+// m0 for LDS spilling if possible.
+void SIFrameLowering::setupLDSSpilling(MachineFunction &MF,
+                                       MachineBasicBlock &MBB,
+                                       MachineBasicBlock::iterator I,
+                                       const DebugLoc &DL) const {
+  SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  const SIRegisterInfo *TRI = &TII->getRegisterInfo();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  const Function &F = MF.getFunction();
+  MachineFrameInfo &FrameInfo = MF.getFrameInfo();
+  unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(F).second;
+
+  assert(MFI->isEntryFunction());
+
+  int LDSSpillLimitInBytes = ST.getLdsSpillLimitDwords(MF) * 4;
+  LDSSpillLimitInBytes =
+      std::max(0, LDSSpillLimitInBytes - (int)MFI->getLDSSize());
+
+  // Go through the stack slots starting from the end and assign them to LDS
+  // as long as they fit in the remaining size.
+  SmallVector<int> LdsOffsets(FrameInfo.getObjectIndexEnd(), -1);
+  bool AllStackSlotsHandled = true;
+  int TotalSize = 0;
+  int RemainingSize = LDSSpillLimitInBytes;
+  for (int i = FrameInfo.getObjectIndexEnd() - 1; i >= 0; --i) {
+    if (FrameInfo.isDeadObjectIndex(i)) {
+      continue;
+    }
+    if (FrameInfo.isObjectPreAllocated(i)) {
+      AllStackSlotsHandled = false;
+      break;
+    }
+    int ObjSize = FrameInfo.getObjectSize(i);
+    assert(ObjSize > 0);
+    int ObjSizeForAllThreads = ObjSize * WorkGroupSize;
+
+    if (ObjSizeForAllThreads <= RemainingSize) {
+      RemainingSize -= ObjSizeForAllThreads;
+      LdsOffsets[i] = TotalSize;
+
+      TotalSize += ObjSize;
+    } else {
+      AllStackSlotsHandled = false;
+      break;
+    }
+  }
+
+  // No stack slots will use LDS - exit early.
+  if (TotalSize == 0)
+    return;
+
+  // Register to use for m0 save/restore for each spill, or NoRegister if the
+  // save/restore is not needed, and the initialization takes place here once.
+  Register M0SaveRestoreReg;
+  if (MRI.isPhysRegUsed(AMDGPU::M0)) {
+    if (requiresStackPointerReference(MF)) {
+      unsigned NumPreloaded = MFI->getNumPreloadedSGPRs();
+      ArrayRef<MCPhysReg> AllSGPRs = TRI->getAllSGPR32(MF);
+      AllSGPRs = AllSGPRs.slice(
+          std::min(static_cast<unsigned>(AllSGPRs.size()), NumPreloaded));
+      for (MCPhysReg Reg : AllSGPRs) {
+        if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {
+          M0SaveRestoreReg = Reg;
+          break;
+        }
+      }
+    } else {
+      assert(!requiresStackPointerReference(MF));
+      M0SaveRestoreReg = MFI->getStackPtrOffsetReg();
+    }
+    // Could not find a free SGPR for M0 init so exit early.
+    if (M0SaveRestoreReg == AMDGPU::NoRegister)
+      return;
+  }
+
+  Register M0InitVal;
+  // The addtid addressing is as follows:
+  // LDS_Addr = LDS_BASE + {Inst_offset1, Inst_offset0} + TID(0..63)*4 + M0
+  // If the workgroup size is not larger than the wave size we can safely init
+  // m0 with 0. Otherwise, we need to make sure that the lds addresses do not
+  // override data for other slots so we initialize m0 to
+  // current_wave_id_in_group * wave size.
+  if (WorkGroupSize > ST.getWavefrontSize()) {
+    Register PreloadedWorkgroupInfoReg = MFI->getWorkgroupInfoReg();
+    if (!PreloadedWorkgroupInfoReg) {
+      // This should never happen, but it depends on how front-end sets up
+      // input sgprs, so it is safer to make it an early out rather than assert.
+      return;
+    }
+
+    if (!MRI.isPhysRegUsed(PreloadedWorkgroupInfoReg)) {
+      M0InitVal = PreloadedWorkgroupInfoReg;
+    } else {
+
+      unsigned NumPreloaded = MFI->getNumPreloadedSGPRs();
+      ArrayRef<MCPhysReg> AllSGPRs = TRI->getAllSGPR32(MF);
+      AllSGPRs = AllSGPRs.slice(
+          std::min(static_cast<unsigned>(AllSGPRs.size()), NumPreloaded));
+      for (MCPhysReg Reg : AllSGPRs) {
+        if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {
+          M0InitVal = Reg;
+          break;
+        }
+      }
+    }
+
+    // Could not find a free SGPR for M0 init so exit early.
+    // FIXME: We could also check some of the preloads to see if one of them
+    // could be re-used.
+    if (M0InitVal == AMDGPU::NoRegister)
+      return;
+
+    // Load ordered_append_term to get the current wave id in a group.
+    BuildMI(MBB, I, DL, TII->get(AMDGPU::S_BFE_U32), M0InitVal)
+        .addReg(PreloadedWorkgroupInfoReg)
+        .addImm(0xc0006);
+    BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MUL_I32), M0InitVal)
+        .addReg(M0InitVal)
+        .addImm(ST.getWavefrontSize() * 4);
+  }
+
+  // If save/restore is not needed we can init m0 here and be done with it.
+  if (M0SaveRestoreReg == AMDGPU::NoRegister) {
+    if (M0InitVal == AMDGPU::NoRegister)
+      BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0).addImm(0);
+    else
+      BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
+          .addReg(M0InitVal);
+  }
+
+  SIMachineFunctionInfo::LdsSpill LdsSpillInfo;
+  LdsSpillInfo.M0InitVal = M0InitVal;
+  LdsSpillInfo.M0SaveRestoreReg = M0SaveRestoreReg;
+  LdsSpillInfo.LdsOffsets = LdsOffsets;
+  LdsSpillInfo.TotalSize = TotalSize;
+  MFI->setLdsSpill(LdsSpillInfo);
+
+  // Earlier we set ScavengeFI based on the fact that there were stack accesses.
+  // In the event no slots will use stack, we can safely remove it.
+  if (AllStackSlotsHandled) {
+    int ScavengeFI = MFI->getScavengeFI(FrameInfo, *TRI);
+    FrameInfo.setStackSize(FrameInfo.getStackSize() -
+                           FrameInfo.getObjectSize(ScavengeFI));
+    FrameInfo.RemoveStackObject(ScavengeFI);
+  }
+}
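
The M0 initialization in setupLDSSpilling packs a bit-field extract and a multiply into two scalar instructions. The sketch below models that arithmetic on plain integers, assuming the usual S_BFE_U32 control encoding (field offset in bits [4:0], field width in bits [22:16]), under which the immediate 0xc0006 selects a 12-bit field starting at bit 6. The function names are hypothetical, and which hardware field lives at those bits is taken from the patch's comment rather than verified here.

```cpp
#include <cassert>
#include <cstdint>

// Model of an unsigned bit-field extract with the control word layout
// assumed above: offset = control[4:0], width = control[22:16].
inline std::uint32_t bfeU32(std::uint32_t src, std::uint32_t control) {
  std::uint32_t offset = control & 0x1f;
  std::uint32_t width = (control >> 16) & 0x7f;
  if (width == 0)
    return 0;
  std::uint32_t shifted = src >> offset;
  return width >= 32 ? shifted : (shifted & ((1u << width) - 1));
}

// Hypothetical helper mirroring the S_BFE_U32 + S_MUL_I32 pair in the patch:
// extract the wave id from the preloaded workgroup info value and scale it
// by the number of bytes one wave needs for a single spilled dword.
inline std::uint32_t m0InitValue(std::uint32_t workGroupInfo,
                                 std::uint32_t waveSize) {
  assert(waveSize == 32 || waveSize == 64);
  return bfeU32(workGroupInfo, 0xc0006) * (waveSize * 4);
}
```

For a wave size of 64, a wave id of 3 stored at bits 6 and up yields an M0 of 768 bytes, that is, three full 256-byte wave strides.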

 void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
                                                 MachineBasicBlock &MBB) const {
   assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");

   // FIXME: If we only have SGPR spills, we won't actually be using scratch
   // memory since these spill to VGPRs. We should be cleaning up these unused
   // SGPR spill frame indices somewhere.

@@ (9 lines not shown) void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
   const SIInstrInfo *TII = ST.getInstrInfo();
   const SIRegisterInfo *TRI = &TII->getRegisterInfo();
   MachineRegisterInfo &MRI = MF.getRegInfo();
   const Function &F = MF.getFunction();
   MachineFrameInfo &FrameInfo = MF.getFrameInfo();

   assert(MFI->isEntryFunction());

+  // Debug location must be unknown since the first debug location is used to
+  // determine the end of the prologue.
+  DebugLoc DL;
+  MachineBasicBlock::iterator I = MBB.begin();
+
+  if (FrameInfo.getStackSize() > 0 && MFI->ldsSpillingEnabled(MF))
+    setupLDSSpilling(MF, MBB, I, DL);
+
   Register PreloadedScratchWaveOffsetReg = MFI->getPreloadedReg(
       AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);

   // We need to do the replacement of the private segment buffer register even
   // if there are no stack objects. There could be stores to undef or a
   // constant without an associated object.
   //
   // This will return `Register()` in cases where there are no actual
@@ (20 lines not shown) if (ST.isAmdHsaOrMesa(F)) {
     if (ScratchRsrcReg && PreloadedScratchRsrcReg) {
       // We added live-ins during argument lowering, but since they were not
       // used they were deleted. We're adding the uses now, so add them back.
       MRI.addLiveIn(PreloadedScratchRsrcReg);
       MBB.addLiveIn(PreloadedScratchRsrcReg);
     }
   }

-  // Debug location must be unknown since the first debug location is used to
-  // determine the end of the prologue.
-  DebugLoc DL;
-  MachineBasicBlock::iterator I = MBB.begin();
-
   // We found the SRSRC first because it needs four registers and has an
   // alignment requirement. If the SRSRC that we found is clobbering with
   // the scratch wave offset, which may be in a fixed SGPR or a free SGPR
   // chosen by SITargetLowering::allocateSystemSGPRs, COPY the scratch
   // wave offset to a free SGPR.
   Register ScratchWaveOffsetReg;
   if (PreloadedScratchWaveOffsetReg &&
       TRI->isSubRegisterEq(ScratchRsrcReg, PreloadedScratchWaveOffsetReg)) {

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

@@ case CCValAssign::AExt:
       break;
     default:
       llvm_unreachable("Unknown loc info!");
     }

     InVals.push_back(Val);
   }

+  SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  if (MFI->ldsSpillingEnabled(MF) &&
+      ST.getFlatWorkGroupSizes(Fn).second > ST.getWavefrontSize()) {
+
+    int WorkGroupInfoSgprNo =
+        AMDGPU::getIntegerAttribute(Fn, "amdgpu-work-group-info-arg-no", -1);
+    if (WorkGroupInfoSgprNo != -1)
+      for (unsigned i = 0, e = Ins.size(); i != e; ++i) {
+        const ISD::InputArg &Arg = Ins[i];
+        if (Arg.getOrigArgIndex() == (unsigned)WorkGroupInfoSgprNo) {
+
+          CCValAssign &VA = ArgLocs[i];
+          Register WorkGroupInfoReg = VA.getLocReg();
+          assert(AMDGPU::SGPR_32RegClass.contains(WorkGroupInfoReg));
+
+          Info->setWorkgroupInfoReg(WorkGroupInfoReg);
+          MF.addLiveIn(WorkGroupInfoReg, &AMDGPU::SGPR_32RegClass);
+          MF.front().addLiveIn(WorkGroupInfoReg, &AMDGPU::SGPR_32RegClass);
+
+          break;
+        }
+      }
+  }
+
   // Start adding system SGPRs.
   if (IsEntryFunc) {
     allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsGraphics);
   } else {
     CCInfo.AllocateReg(Info->getScratchRSrcReg());
     if (!IsGraphics)
       allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);
   }

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

Show First 20 Lines • Show All 358 Lines • ▼ Show 20 Lines	class SIMachineFunctionInfo final : public AMDGPUMachineFunction {
// base to the beginning of the current function's frame.		// base to the beginning of the current function's frame.
Register FrameOffsetReg = AMDGPU::FP_REG;		Register FrameOffsetReg = AMDGPU::FP_REG;

// This is an ABI register used in the non-entry calling convention to		// This is an ABI register used in the non-entry calling convention to
// communicate the unswizzled offset from the current dispatch's scratch wave		// communicate the unswizzled offset from the current dispatch's scratch wave
// base to the beginning of the new function's frame.		// base to the beginning of the new function's frame.
Register StackPtrOffsetReg = AMDGPU::SP_REG;		Register StackPtrOffsetReg = AMDGPU::SP_REG;

		// This is WorkgroupInfo register set up for LDS spilling for cases where
		// workgroup size is larger than wave size. It relies on user input
		// registers set up by the front-end.
		Register WorkgroupInfoReg = 0;

AMDGPUFunctionArgInfo ArgInfo;		AMDGPUFunctionArgInfo ArgInfo;

// Graphics info.		// Graphics info.
unsigned PSInputAddr = 0;		unsigned PSInputAddr = 0;
unsigned PSInputEnable = 0;		unsigned PSInputEnable = 0;

/// Number of bytes of arguments this function has on the stack. If the callee		/// Number of bytes of arguments this function has on the stack. If the callee
/// is expected to restore the argument stack this should be a multiple of 16,		/// is expected to restore the argument stack this should be a multiple of 16,
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	public:
};		};

struct VGPRSpillToAGPR {		struct VGPRSpillToAGPR {
SmallVector<MCPhysReg, 32> Lanes;		SmallVector<MCPhysReg, 32> Lanes;
bool FullyAllocated = false;		bool FullyAllocated = false;
bool IsDead = false;		bool IsDead = false;
};		};

		struct LdsSpill {
		// Value to init m0 with.
		Register M0InitVal;
		// Register to save/restore current value of m0 for each spill. If
		// NoRegister, the m0 initialization takes place in the prolog once.
		Register M0SaveRestoreReg;
		// Offset in LDS indexed by a stack object index. Value (-1) means there is
		// no LDS spilling for such stack object index. The values are properly
		// initialized only if TotalSize > 0.
		SmallVector<int> LdsOffsets;
		// Total size of all LDS spill objects in bytes (per thread).
		unsigned TotalSize = 0;
		};

// Track VGPRs reserved for WWM.		// Track VGPRs reserved for WWM.
SmallSetVector<Register, 8> WWMReservedRegs;		SmallSetVector<Register, 8> WWMReservedRegs;

/// Track stack slots used for save/restore of reserved WWM VGPRs in the		/// Track stack slots used for save/restore of reserved WWM VGPRs in the
/// prolog/epilog.		/// prolog/epilog.

/// FIXME: This is temporary state only needed in PrologEpilogInserter, and		/// FIXME: This is temporary state only needed in PrologEpilogInserter, and
/// doesn't really belong here. It does not require serialization		/// doesn't really belong here. It does not require serialization
Show All 21 Lines	private:

// VGPRs used for AGPR spills.		// VGPRs used for AGPR spills.
SmallVector<MCPhysReg, 32> SpillVGPR;		SmallVector<MCPhysReg, 32> SpillVGPR;

// Emergency stack slot. Sometimes, we create this before finalizing the stack		// Emergency stack slot. Sometimes, we create this before finalizing the stack
// frame, so save it here and add it to the RegScavenger later.		// frame, so save it here and add it to the RegScavenger later.
Optional<int> ScavengeFI;		Optional<int> ScavengeFI;

		LdsSpill LdsSpillInfo;

private:		private:
Register VGPRForAGPRCopy;		Register VGPRForAGPRCopy;

public:		public:
Register getVGPRForAGPRCopy() const {		Register getVGPRForAGPRCopy() const {
return VGPRForAGPRCopy;		return VGPRForAGPRCopy;
}		}

▲ Show 20 Lines • Show All 275 Lines • ▼ Show 20 Lines	void setFrameOffsetReg(Register Reg) {
FrameOffsetReg = Reg;		FrameOffsetReg = Reg;
}		}

void setStackPtrOffsetReg(Register Reg) {		void setStackPtrOffsetReg(Register Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
StackPtrOffsetReg = Reg;		StackPtrOffsetReg = Reg;
}		}

		void setWorkgroupInfoReg(Register Reg) {
		assert(Reg != 0);
		WorkgroupInfoReg = Reg;
		}

		Register getWorkgroupInfoReg() const { return WorkgroupInfoReg; }

		sebastian-ne: Maybe this should be below `getStackPtrOffsetReg`? Currently, it divides `setStackPtrOffsetReg` and `getStackPtrOffsetReg`.
		piotr: > Maybe this should be below `getStackPtrOffsetReg`? Currently, it divides `setStackPtrOffsetReg` and `getStackPtrOffsetReg`.
		piotr: Yes, good idea.
// Note the unset value for this is AMDGPU::SP_REG rather than		// Note the unset value for this is AMDGPU::SP_REG rather than
// NoRegister. This is mostly a workaround for MIR tests where state that		// NoRegister. This is mostly a workaround for MIR tests where state that
// can't be directly computed from the function is not preserved in serialized		// can't be directly computed from the function is not preserved in serialized
// MIR.		// MIR.
Register getStackPtrOffsetReg() const {		Register getStackPtrOffsetReg() const {
return StackPtrOffsetReg;		return StackPtrOffsetReg;
}		}

▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines	public:
}		}

// \returns true if a function has a use of AGPRs via inline asm or		// \returns true if a function has a use of AGPRs via inline asm or
// has a call which may use it.		// has a call which may use it.
bool mayUseAGPRs(const MachineFunction &MF) const;		bool mayUseAGPRs(const MachineFunction &MF) const;

// \returns true if a function needs or may need AGPRs.		// \returns true if a function needs or may need AGPRs.
bool usesAGPRs(const MachineFunction &MF) const;		bool usesAGPRs(const MachineFunction &MF) const;

		void setLdsSpill(LdsSpill Info) { LdsSpillInfo = Info; }

		LdsSpill getLdsSpill() const { return LdsSpillInfo; }

		bool ldsSpillingEnabled(const MachineFunction &MF) const;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_SIMACHINEFUNCTIONINFO_H		#endif // LLVM_LIB_TARGET_AMDGPU_SIMACHINEFUNCTIONINFO_H

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Show First 20 Lines • Show All 728 Lines • ▼ Show 20 Lines	if (MRI.isPhysRegUsed(Reg)) {
UsesAGPRs = true;		UsesAGPRs = true;
return true;		return true;
}		}
}		}

UsesAGPRs = false;		UsesAGPRs = false;
return false;		return false;
}		}

		bool SIMachineFunctionInfo::ldsSpillingEnabled(
		const MachineFunction &MF) const {
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		if (!ST.hasDSAddTid())
		rampitec: This is the most expensive check, it should go last.
		piotr: Ok, will re-order. (I placed this condition at the beginning, because it would be true (== return early) more often than the other checks).
		return false;

		if (MF.getFrameInfo().hasCalls())
		return false;

		if (ST.getLdsSpillLimitDwords(MF) == 0)
		return false;

		return true;
		}

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

Show First 20 Lines • Show All 399 Lines • ▼ Show 20 Lines	public:
/// Return all SGPR64 which satisfy the waves per execution unit requirement		/// Return all SGPR64 which satisfy the waves per execution unit requirement
/// of the subtarget.		/// of the subtarget.
ArrayRef<MCPhysReg> getAllSGPR64(const MachineFunction &MF) const;		ArrayRef<MCPhysReg> getAllSGPR64(const MachineFunction &MF) const;

/// Return all SGPR32 which satisfy the waves per execution unit requirement		/// Return all SGPR32 which satisfy the waves per execution unit requirement
/// of the subtarget.		/// of the subtarget.
ArrayRef<MCPhysReg> getAllSGPR32(const MachineFunction &MF) const;		ArrayRef<MCPhysReg> getAllSGPR32(const MachineFunction &MF) const;

		bool buildLdsSpillLoadStore(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MI,
		const DebugLoc &DL, bool IsLoad, int Index,
		Register ValueReg, bool ValueIsKill,
		int64_t InstrOffset,
		MachineMemOperand *MMO) const;
// Insert spill or restore instructions.		// Insert spill or restore instructions.
// When lowering spill pseudos, the RegScavenger should be set.		// When lowering spill pseudos, the RegScavenger should be set.
// For creating spill instructions during frame lowering, where no scavenger		// For creating spill instructions during frame lowering, where no scavenger
// is available, LiveRegs can be used.		// is available, LiveRegs can be used.
void buildSpillLoadStore(MachineBasicBlock &MBB,		void buildSpillLoadStore(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI, const DebugLoc &DL,		MachineBasicBlock::iterator MI, const DebugLoc &DL,
unsigned LoadStoreOp, int Index, Register ValueReg,		unsigned LoadStoreOp, int Index, Register ValueReg,
bool ValueIsKill, MCRegister ScratchOffsetReg,		bool ValueIsKill, MCRegister ScratchOffsetReg,
int64_t InstrOffset, MachineMemOperand *MMO,		int64_t InstrOffset, MachineMemOperand *MMO,
RegScavenger *RS,		RegScavenger *RS,
LivePhysRegs *LiveRegs = nullptr) const;		LivePhysRegs *LiveRegs = nullptr) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

Show First 20 Lines • Show All 1,261 Lines • ▼ Show 20 Lines	static unsigned getFlatScratchSpillOpcode(const SIInstrInfo *TII,
if (HasVAddr)		if (HasVAddr)
LoadStoreOp = AMDGPU::getFlatScratchInstSVfromSS(LoadStoreOp);		LoadStoreOp = AMDGPU::getFlatScratchInstSVfromSS(LoadStoreOp);
else if (UseST)		else if (UseST)
LoadStoreOp = AMDGPU::getFlatScratchInstSTfromSS(LoadStoreOp);		LoadStoreOp = AMDGPU::getFlatScratchInstSTfromSS(LoadStoreOp);

return LoadStoreOp;		return LoadStoreOp;
}		}

		bool SIRegisterInfo::buildLdsSpillLoadStore(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MI,
		const DebugLoc &DL, bool IsLoad,
		int Index, Register ValueReg,
		bool IsKill, int64_t InstOffset,
		MachineMemOperand *MMO) const {

		const SIInstrInfo *TII = ST.getInstrInfo();

		MachineFunction *MF = MBB.getParent();
		const MachineFrameInfo &MFI = MF->getFrameInfo();
		const SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>();

		SIMachineFunctionInfo::LdsSpill LdsSpillInfo = FuncInfo->getLdsSpill();
		int64_t LdsOffsetForIndex = LdsSpillInfo.LdsOffsets[Index];
		if (LdsOffsetForIndex == -1)
		return false;

		if (LdsSpillInfo.M0SaveRestoreReg) {

		// FIXME: If we could prove that there are no m0 defs/uses between two LDS
		RamNalamothu: s/the are no/there are no
		// spill instructions we could avoid doing some save/restore.

		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32),
		LdsSpillInfo.M0SaveRestoreReg)
		.addReg(AMDGPU::M0);
		sebastian-ne: I guess M0 could have a kill flag here?
		piotr: > I guess M0 could have a kill flag here?
		piotr: Yes, thanks.
		if (LdsSpillInfo.M0InitVal == AMDGPU::NoRegister)
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0).addImm(0x0);
		else
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
		.addImm(LdsSpillInfo.M0InitVal);
		}

		const TargetRegisterClass *RC = getRegClassForReg(MF->getRegInfo(), ValueReg);
		unsigned EltCount = AMDGPU::getRegBitWidth(RC->getID()) / 32;

		Align Alignment = MFI.getObjectAlign(Index);
		const MachinePointerInfo &BasePtrInfo = MMO->getPointerInfo();
		for (unsigned R = 0; R < EltCount; ++R) {
		MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(4 * R);
		MachineMemOperand *NewMMO = MF->getMachineMemOperand(
		PInfo, MMO->getFlags(), 4, commonAlignment(Alignment, 4 * R));

		Register SubReg =
		EltCount == 1 ? ValueReg
		: Register(getSubReg(ValueReg, getSubRegFromChannel(R)));

		unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(MF->getFunction()).second;
		// The addtid addressing is as follows:
		// LDS_Addr = LDS_BASE + {Inst_offset1, Inst_offset0} + TID(0..63)*4 + M0
		// We calculate offset for the zeroth lane and make room for other lanes by
		// multiplying by the wave size. The earlier m0 setup handles the case
		// when the workgroup size is larger than the wave size.

		int64_t LdsOffsetForIndex = FuncInfo->getLdsSpill().LdsOffsets[Index];
		assert(LdsOffsetForIndex != -1);

		int64_t StackOffset = InstOffset + LdsOffsetForIndex + 4 * R;
		int64_t StackOffsetZerothLane =
		StackOffset * WorkGroupSize + FuncInfo->getLDSSize();

		if (IsLoad) {
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::DS_READ_ADDTID_B32), SubReg)
		.addImm(StackOffsetZerothLane)
		.addImm(0 /* gds */)
		.addMemOperand(NewMMO);

		} else {
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::DS_WRITE_ADDTID_B32))
		.addReg(SubReg, getKillRegState(R == EltCount - 1 ? IsKill : false))
		.addImm(StackOffsetZerothLane)
		.addImm(0 /* gds */)
		.addMemOperand(NewMMO);
		}
		}

		if (LdsSpillInfo.M0SaveRestoreReg) {
		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
		.addReg(LdsSpillInfo.M0SaveRestoreReg, RegState::Kill);
		}
		return true;
		}

void SIRegisterInfo::buildSpillLoadStore(		void SIRegisterInfo::buildSpillLoadStore(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const DebugLoc &DL,		MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const DebugLoc &DL,
unsigned LoadStoreOp, int Index, Register ValueReg, bool IsKill,		unsigned LoadStoreOp, int Index, Register ValueReg, bool IsKill,
MCRegister ScratchOffsetReg, int64_t InstOffset, MachineMemOperand *MMO,		MCRegister ScratchOffsetReg, int64_t InstOffset, MachineMemOperand *MMO,
RegScavenger *RS, LivePhysRegs *LiveRegs) const {		RegScavenger *RS, LivePhysRegs *LiveRegs) const {
assert((!RS || !LiveRegs) && "Only RS or LiveRegs can be set but not both");		assert((!RS || !LiveRegs) && "Only RS or LiveRegs can be set but not both");

MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
▲ Show 20 Lines • Show All 747 Lines • ▼ Show 20 Lines	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_V512_SAVE:		case AMDGPU::SI_SPILL_V512_SAVE:
case AMDGPU::SI_SPILL_V256_SAVE:		case AMDGPU::SI_SPILL_V256_SAVE:
case AMDGPU::SI_SPILL_V224_SAVE:		case AMDGPU::SI_SPILL_V224_SAVE:
case AMDGPU::SI_SPILL_V192_SAVE:		case AMDGPU::SI_SPILL_V192_SAVE:
case AMDGPU::SI_SPILL_V160_SAVE:		case AMDGPU::SI_SPILL_V160_SAVE:
case AMDGPU::SI_SPILL_V128_SAVE:		case AMDGPU::SI_SPILL_V128_SAVE:
case AMDGPU::SI_SPILL_V96_SAVE:		case AMDGPU::SI_SPILL_V96_SAVE:
case AMDGPU::SI_SPILL_V64_SAVE:		case AMDGPU::SI_SPILL_V64_SAVE:
case AMDGPU::SI_SPILL_V32_SAVE:		case AMDGPU::SI_SPILL_V32_SAVE: {
		if (MFI->getLdsSpill().TotalSize > 0) {

		const MachineOperand *VData =
		TII->getNamedOperand(*MI, AMDGPU::OpName::vdata);
		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
		MFI->getStackPtrOffsetReg());

		bool ldsSpill = buildLdsSpillLoadStore(
		MBB, MI, DL, /*IsLoad*/ false, Index, VData->getReg(),
		/*IsKill*/ VData->isKill(),
		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
		*MI->memoperands_begin());

		if (ldsSpill) {
		MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));
		MI->eraseFromParent();
		break;
		}
		}
		LLVM_FALLTHROUGH;
		}
case AMDGPU::SI_SPILL_A1024_SAVE:		case AMDGPU::SI_SPILL_A1024_SAVE:
case AMDGPU::SI_SPILL_A512_SAVE:		case AMDGPU::SI_SPILL_A512_SAVE:
case AMDGPU::SI_SPILL_A256_SAVE:		case AMDGPU::SI_SPILL_A256_SAVE:
case AMDGPU::SI_SPILL_A224_SAVE:		case AMDGPU::SI_SPILL_A224_SAVE:
case AMDGPU::SI_SPILL_A192_SAVE:		case AMDGPU::SI_SPILL_A192_SAVE:
case AMDGPU::SI_SPILL_A160_SAVE:		case AMDGPU::SI_SPILL_A160_SAVE:
case AMDGPU::SI_SPILL_A128_SAVE:		case AMDGPU::SI_SPILL_A128_SAVE:
case AMDGPU::SI_SPILL_A96_SAVE:		case AMDGPU::SI_SPILL_A96_SAVE:
Show All 29 Lines	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_V64_RESTORE:		case AMDGPU::SI_SPILL_V64_RESTORE:
case AMDGPU::SI_SPILL_V96_RESTORE:		case AMDGPU::SI_SPILL_V96_RESTORE:
case AMDGPU::SI_SPILL_V128_RESTORE:		case AMDGPU::SI_SPILL_V128_RESTORE:
case AMDGPU::SI_SPILL_V160_RESTORE:		case AMDGPU::SI_SPILL_V160_RESTORE:
case AMDGPU::SI_SPILL_V192_RESTORE:		case AMDGPU::SI_SPILL_V192_RESTORE:
case AMDGPU::SI_SPILL_V224_RESTORE:		case AMDGPU::SI_SPILL_V224_RESTORE:
case AMDGPU::SI_SPILL_V256_RESTORE:		case AMDGPU::SI_SPILL_V256_RESTORE:
case AMDGPU::SI_SPILL_V512_RESTORE:		case AMDGPU::SI_SPILL_V512_RESTORE:
case AMDGPU::SI_SPILL_V1024_RESTORE:		case AMDGPU::SI_SPILL_V1024_RESTORE: {
		if (MFI->getLdsSpill().TotalSize > 0) {
		const MachineOperand *VData =
		TII->getNamedOperand(*MI, AMDGPU::OpName::vdata);
		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
		MFI->getStackPtrOffsetReg());

		bool ldsSpill = buildLdsSpillLoadStore(
		MBB, MI, DL, /*IsLoad*/ true, Index, VData->getReg(), false,
		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
		*MI->memoperands_begin());
		if (ldsSpill) {
		MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));
		sebastian-ne: The non-lds restore code does not call `addToSpilledVGPRs`. Is this intentional here?
		piotr: > The non-lds restore code does not call `addToSpilledVGPRs`. Is this intentional here?
		piotr: Good catch - copy & pasta error on my part.
		MI->eraseFromParent();
		break;
		}
		}
		LLVM_FALLTHROUGH;
		}
case AMDGPU::SI_SPILL_A32_RESTORE:		case AMDGPU::SI_SPILL_A32_RESTORE:
case AMDGPU::SI_SPILL_A64_RESTORE:		case AMDGPU::SI_SPILL_A64_RESTORE:
case AMDGPU::SI_SPILL_A96_RESTORE:		case AMDGPU::SI_SPILL_A96_RESTORE:
case AMDGPU::SI_SPILL_A128_RESTORE:		case AMDGPU::SI_SPILL_A128_RESTORE:
case AMDGPU::SI_SPILL_A160_RESTORE:		case AMDGPU::SI_SPILL_A160_RESTORE:
case AMDGPU::SI_SPILL_A192_RESTORE:		case AMDGPU::SI_SPILL_A192_RESTORE:
case AMDGPU::SI_SPILL_A224_RESTORE:		case AMDGPU::SI_SPILL_A224_RESTORE:
case AMDGPU::SI_SPILL_A256_RESTORE:		case AMDGPU::SI_SPILL_A256_RESTORE:
▲ Show 20 Lines • Show All 1,028 Lines • Show Last 20 Lines
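
The offset arithmetic in `buildLdsSpillLoadStore` above can be checked in isolation. The following is a minimal standalone sketch, not code from the patch: the function names `zerothLaneOffset` and `laneAddress` are hypothetical, and the inputs (no statically allocated LDS, a zero instruction offset) are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Mirrors StackOffsetZerothLane from the patch:
//   (InstOffset + LdsOffsetForIndex + 4 * R) * WorkGroupSize + LDSSize
// Scaling the per-thread byte offset by the workgroup size leaves room
// for one dword per thread before the next dword slot begins.
constexpr int64_t zerothLaneOffset(int64_t instOffset, int64_t ldsOffsetForIndex,
                                   unsigned dwordIndex, unsigned workGroupSize,
                                   int64_t staticLdsSize) {
  return (instOffset + ldsOffsetForIndex + 4 * static_cast<int64_t>(dwordIndex)) *
             static_cast<int64_t>(workGroupSize) +
         staticLdsSize;
}

// The addtid addressing quoted in the patch comment:
//   LDS_Addr = LDS_BASE + Inst_offset + TID * 4 + M0
constexpr int64_t laneAddress(int64_t ldsBase, int64_t instOffset, unsigned tid,
                              int64_t m0) {
  return ldsBase + instOffset + static_cast<int64_t>(tid) * 4 + m0;
}

// Dword 1 of the first spill slot with a workgroup size of 64.
static_assert(zerothLaneOffset(0, 0, 1, 64, 0) == 256, "dword 1, 64 threads");
// Lane 5 accessing that dword with m0 == 0 and a zero LDS base.
static_assert(laneAddress(0, 256, 5, 0) == 276, "lane 5 of dword 1");
```

Under these assumptions the value 256 agrees with the `offset:256` operand checked in lds-spill-cs.ll.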

llvm/test/CodeGen/AMDGPU/lds-spill-cs.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W32
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W64
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W32
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W64

				; The test checks that part of the spilling goes to LDS with the right m0 setup.
				; Without the vgpr limit, the test needs to use 16 vgprs as there are four vec4 variables in-flight.
				; With the "amdgpu-num-vgpr"="12" limit, one vec4 needs to be spilled to memory (16 bytes per thread).
				; Across the 64-thread workgroup, those 16 bytes per thread occupy 256 dwords, so setting "amdgpu-lds-spill-limit-dwords"="256" will suffice.
				; Note: 16 bytes * 64 (workgroup size) = 16 * 64 * 8 bits = 16 * 64 * 8 / 32 dwords = 256 dwords

				define dllexport amdgpu_cs void @_amdgpu_cs_main(i32 inreg %globalTable, i32 inreg %perShaderTable, i32 inreg %descTable0, i32 inreg %spillTable, <3 x i32> inreg %WorkgroupId, i32 inreg %MultiDispatchInfo, <3 x i32> %LocalInvocationId, <2 x i32> inreg %ptr) #3 {
				; W32-LABEL: _amdgpu_cs_main:
				; W32: ; %bb.0: ; %.entry
				; W32: s_bfe_u32 s7, s7, 0xc0006
				; W32: s_mulk_i32 s7, 0x80
				; W32: s_mov_b32 m0, s7
				; W32: ds_write_addtid_b32 v0 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v1 offset:256 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v2 offset:512 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v3 offset:768 ; 4-byte Folded Spill
				; W32: ds_read_addtid_b32 v0 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v1 offset:256 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v2 offset:512 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v3 offset:768 ; 4-byte Folded Reload
				;
				; W64-LABEL: _amdgpu_cs_main:
				; W64: ; %bb.0: ; %.entry
				; W64: s_mov_b32 m0, 0
				; W64: ds_write_addtid_b32 v0 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v1 offset:256 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v2 offset:512 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v3 offset:768 ; 4-byte Folded Spill
				; W64: ds_read_addtid_b32 v0 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v1 offset:256 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v2 offset:512 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v3 offset:768 ; 4-byte Folded Reload
				.entry:
				%i6 = bitcast <2 x i32> %ptr to i64
				%i7 = inttoptr i64 %i6 to <4 x i32> addrspace(4)*
				%i8 = load <4 x i32>, <4 x i32> addrspace(4)* %i7, align 16
				%i9 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 0, i32 0, i32 0)
				%i10 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 16, i32 0, i32 0)
				%i11 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 32, i32 0, i32 0)
				%i12 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 48, i32 0, i32 0)
				fence syncscope("workgroup") acq_rel
				call void @llvm.amdgcn.s.barrier()
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i9, <4 x i32> %i8, i32 64, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i10, <4 x i32> %i8, i32 80, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i11, <4 x i32> %i8, i32 96, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i12, <4 x i32> %i8, i32 112, i32 0, i32 0)
				ret void
				}

				; Function Attrs: convergent nounwind willreturn
				declare void @llvm.amdgcn.s.barrier() #0

				; Function Attrs: nounwind readonly willreturn
				declare <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32>, i32, i32, i32 immarg) #4

				; Function Attrs: nounwind willreturn writeonly
				declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg) #5

				attributes #3 = { nounwind "amdgpu-flat-work-group-size"="64,64" "amdgpu-lds-spill-limit-dwords"="256" "amdgpu-work-group-info-arg-no"="5" "amdgpu-num-vgpr"="12" }
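
The W32 check lines above set m0 with `s_bfe_u32 s7, s7, 0xc0006` followed by `s_mulk_i32 s7, 0x80`. A rough sketch of that arithmetic follows, assuming the usual S_BFE_U32 operand encoding (bit offset in bits [4:0], field width in bits [22:16]). The helper names are hypothetical, and reading the extracted field as the wave index within the workgroup is an assumption based on the patch summary, not something the test states.

```cpp
#include <cstdint>

// Unsigned bit-field extract with the control word laid out as assumed
// for S_BFE_U32: offset in bits [4:0], width in bits [22:16].
constexpr uint32_t bfeU32(uint32_t src, uint32_t control) {
  const uint32_t offset = control & 0x1fu;
  const uint32_t width = (control >> 16) & 0x7fu;
  if (width == 0)
    return 0;
  const uint32_t shifted = src >> offset;
  return width >= 32 ? shifted : (shifted & ((1u << width) - 1u));
}

// m0 as computed by the two checked instructions: a 12-bit field starting
// at bit 6, scaled by 0x80 bytes (32 lanes * 4 bytes per lane).
constexpr uint32_t m0ForWave32(uint32_t multiDispatchInfo) {
  return bfeU32(multiDispatchInfo, 0xc0006u) * 0x80u;
}

// A field value of 1 moves the wave's spill area by one wave32 row.
static_assert(m0ForWave32(1u << 6) == 128, "second wave starts 128 bytes in");
```

In the W64 run the workgroup size equals the wave size, which is consistent with the checked `s_mov_b32 m0, 0`.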

llvm/test/CodeGen/AMDGPU/lds-spill-ps.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W32
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W64
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W32
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=W64

				; The test checks that part of the spilling goes to LDS with the right m0 setup.
				; Since the "amdgpu-lds-spill-limit-dwords"="256" limit is respected:
				; - In wave32, 8 dword slots get allocated to LDS (8 * 32), equaling 1024 bytes.
				; - In wave64, 4 dword slots get allocated to LDS (4 * 64), equaling 1024 bytes.

				define dllexport amdgpu_ps void @_amdgpu_ps_main(i32 inreg %globalTable, i32 inreg %perShaderTable, i32 inreg %descTable0, i32 inreg %spillTable, i32 inreg %PrimMask, <2 x float> %PerspInterpSample, <2 x float> %PerspInterpCenter, <2 x float> %PerspInterpCentroid, <3 x float> %PerspInterpPullMode, <2 x float> %LinearInterpSample, <2 x float> %LinearInterpCenter, <2 x float> %LinearInterpCentroid, float %LineStipple, float %FragCoordX, float %FragCoordY, float %FragCoordZ, float %FragCoordW, i32 %FrontFacing, i32 %Ancillary, i32 %SampleCoverage, i32 %FixedXY, <2 x i32> inreg %ptr) #0 {
				; W32-LABEL: _amdgpu_ps_main:
				; W32: ; %bb.0: ; %.entry
				; W32: s_mov_b32 m0, 0
				; W32: ds_write_addtid_b32 v0 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v1 offset:128 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v2 offset:256 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v3 offset:384 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v0 offset:512 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v1 offset:640 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v2 offset:768 ; 4-byte Folded Spill
				; W32: ds_write_addtid_b32 v3 offset:896 ; 4-byte Folded Spill
				; W32: ds_read_addtid_b32 v0 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v1 offset:128 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v2 offset:256 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v3 offset:384 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v0 offset:512 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v1 offset:640 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v2 offset:768 ; 4-byte Folded Reload
				; W32: ds_read_addtid_b32 v3 offset:896 ; 4-byte Folded Reload
				;
				; W64-LABEL: _amdgpu_ps_main:
				; W64: ; %bb.0: ; %.entry
				; W64: s_mov_b32 m0, 0
				; W64: ds_write_addtid_b32 v0 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v1 offset:256 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v2 offset:512 ; 4-byte Folded Spill
				; W64: ds_write_addtid_b32 v3 offset:768 ; 4-byte Folded Spill
				; W64: ds_read_addtid_b32 v0 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v1 offset:256 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v2 offset:512 ; 4-byte Folded Reload
				; W64: ds_read_addtid_b32 v3 offset:768 ; 4-byte Folded Reload
				.entry:
				%i6 = bitcast <2 x i32> %ptr to i64
				%i7 = inttoptr i64 %i6 to <4 x i32> addrspace(4)*
				%i8 = load <4 x i32>, <4 x i32> addrspace(4)* %i7, align 16
				%i9 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 0, i32 0, i32 0)
				%i10 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 16, i32 0, i32 0)
				%i11 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 32, i32 0, i32 0)
				%i12 = call <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32> %i8, i32 48, i32 0, i32 0)
				fence acq_rel
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i9, <4 x i32> %i8, i32 64, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i10, <4 x i32> %i8, i32 80, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i11, <4 x i32> %i8, i32 96, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32> %i12, <4 x i32> %i8, i32 112, i32 0, i32 0)
				ret void
				}

				; Function Attrs: nounwind readonly willreturn
				declare <4 x i32> @llvm.amdgcn.raw.buffer.load.v4i32(<4 x i32>, i32, i32, i32 immarg)
				; Function Attrs: nounwind willreturn writeonly
				declare void @llvm.amdgcn.raw.buffer.store.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32 immarg)

				attributes #0 = { "amdgpu-lds-spill-limit-dwords"="256" "amdgpu-num-vgpr"="12" }
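
The slot counts quoted in the comment at the top of lds-spill-ps.ll follow from dividing the dword limit by the number of threads sharing the LDS allocation. A small sketch of that arithmetic is shown here; the function names are hypothetical, and the real allocation logic lives in parts of the patch not shown on this page, so this only reproduces the numbers in the test comment.

```cpp
// Number of per-thread dword slots that fit under the LDS spill limit
// when every thread needs its own copy of each slot.
constexpr unsigned dwordSlotsPerThread(unsigned limitDwords, unsigned threads) {
  return threads == 0 ? 0 : limitDwords / threads;
}

// Total LDS bytes consumed by that many slots (4 bytes per dword).
constexpr unsigned ldsBytesUsed(unsigned slots, unsigned threads) {
  return slots * threads * 4;
}

// "amdgpu-lds-spill-limit-dwords"="256" with wave32 and wave64.
static_assert(dwordSlotsPerThread(256, 32) == 8, "wave32: 8 dword slots");
static_assert(dwordSlotsPerThread(256, 64) == 4, "wave64: 4 dword slots");
static_assert(ldsBytesUsed(8, 32) == 1024, "wave32: 1024 bytes");
static_assert(ldsBytesUsed(4, 64) == 1024, "wave64: 1024 bytes");
```

The same relation covers the compute test: one spilled vec4 is 4 dword slots, and 4 slots across 64 threads is exactly the 256-dword limit set there.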

Revision Contents

Diff 450305
