The user-specified shader inputs were not counted towards the total
user SGPR count, so it was possible to request more than the available
maximum, and to report fewer registers than were actually used.
Details
- Reviewers
• tstellarAMD mareko nhaehnle
Event Timeline
I'm not sure I understand the design of addArgUserReg. Is the CurReg parameter supposed to be correct or not? If yes, why is there a separate return value? If not, why is it there in the first place? It also seems like the alignment of SGPRs could just be done by rounding up numerically instead of with a loop.
It's supposed to ensure that the number of used registers stays consistent with the register reported by the generated calling convention (there was supposed to be an assert in here). I was trying to avoid assumptions based on the value of the register enums. It's kind of gross, but I'm not sure of a better way to express the connection between the two separate ways the calling convention registers are tracked.
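As an illustration of the round-up suggestion above: with a power-of-two alignment, the next SGPR index can be computed directly rather than by stepping one register at a time. The helper below is a hypothetical sketch, not code from the patch; LLVM's alignTo (llvm/Support/MathExtras.h) does essentially the same thing.

  // Hypothetical helper: round the next SGPR index up to a power-of-two
  // alignment instead of advancing one register at a time.
  unsigned alignNextSGPR(unsigned NextSGPR, unsigned Align) {
    return (NextSGPR + Align - 1) & ~(Align - 1);
  }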
s/UserSGPR/InputSGPR/
s/UserReg/InputReg/
The LLVM backend can't deduce the number of user SGPRs, because Mesa doesn't supply that information. LLVM only receives the list of all input SGPRs. Some of them come from USER DATA, others are preloaded by the hardware based on other states, etc.
I found a problem with the generated input registers. If you have a case like
(<3 x i32> inreg, i64 inreg, i32 inreg), it correctly selects s0, s1, s2 for the vector and s[4:5] for the i64, and the final i32 picks up s3 from the alignment gap between the vector and the i64. I'm guessing that changing the order this way will break something.
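For clarity, a standalone sketch (not the backend's calling-convention code) of first-free-register assignment with alignment, which reproduces the layout described above: the <3 x i32> takes s0-s2, the i64 is aligned up to s[4:5], and the trailing i32 back-fills s3 from the gap. All names here are illustrative.

  #include <array>
  #include <cstdio>

  int main() {
    std::array<bool, 16> Used{}; // which SGPRs have been handed out

    // Allocate NumRegs consecutive SGPRs starting at a multiple of Align;
    // returns the first register index, or ~0u if none fit.
    auto Alloc = [&Used](unsigned NumRegs, unsigned Align) {
      for (unsigned R = 0; R + NumRegs <= Used.size(); R += Align) {
        bool Free = true;
        for (unsigned I = 0; I != NumRegs; ++I)
          Free &= !Used[R + I];
        if (!Free)
          continue;
        for (unsigned I = 0; I != NumRegs; ++I)
          Used[R + I] = true;
        return R;
      }
      return ~0u;
    };

    unsigned Vec = Alloc(3, 1); // s0, s1, s2
    unsigned I64 = Alloc(2, 2); // s[4:5]; the even-aligned pair s[2:3] is blocked by s2
    unsigned I32 = Alloc(1, 1); // s3, back-filled from the alignment gap
    std::printf("vector at s%u, i64 at s[%u:%u], i32 at s%u\n",
                Vec, I64, I64 + 1, I32);
    return 0;
  }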