This is an archive of the discontinued LLVM Phabricator instance.

Differential D47509

[AMDGPU] Track occupancy in MFI
Closed (Public)

Authored by rampitec on May 29 2018, 6:53 PM.


Details

Reviewers

vpykhtin

Commits

rGd4b500cb08b2: [AMDGPU] Track occupancy in MFI
rL333629: [AMDGPU] Track occupancy in MFI

Summary

Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missing code to
query and maintain occupancy info. Record it in the MFI and
query it with a single call. Interfaces:

getOccupancy() - returns the currently recorded achieved occupancy.
getMinAllowedOccupancy() - returns the lesser of the achieved occupancy
  and the lowest occupancy we are ready to tolerate. For example, if
  a kernel is memory bound we are ready to tolerate 4 waves.
limitOccupancy() - records a new occupancy level if we have to lower it.
increaseOccupancy() - records a new occupancy level if the scheduler
  managed to increase it.
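As a rough illustration of these four interfaces, here is a minimal, self-contained C++ sketch. It is not the actual SIMachineFunctionInfo code: `OccupancyTracker` is a hypothetical class, and the memory-bound condition is reduced to a plain boolean instead of the real `isMemoryBound()` / `needsWaveLimiter()` queries. The tolerance of 4 waves follows the example above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the occupancy bookkeeping kept in the MFI.
class OccupancyTracker {
public:
  OccupancyTracker(unsigned Initial, bool IsMemoryBound)
      : Occupancy(Initial), MemoryBound(IsMemoryBound) {}

  // Currently recorded achieved occupancy.
  unsigned getOccupancy() const { return Occupancy; }

  // Lesser of the achieved occupancy and the lowest occupancy tolerated.
  // Only memory-bound functions are allowed to go down to 4 waves.
  unsigned getMinAllowedOccupancy() const {
    if (!MemoryBound)
      return Occupancy;
    return std::min(Occupancy, 4u);
  }

  // Record a lower occupancy; a higher Limit is ignored.
  void limitOccupancy(unsigned Limit) {
    if (Occupancy > Limit)
      Occupancy = Limit;
  }

  // Record a higher occupancy; a lower Limit is ignored.
  void increaseOccupancy(unsigned Limit) {
    if (Occupancy < Limit)
      Occupancy = Limit;
  }

private:
  unsigned Occupancy;
  bool MemoryBound;
};
```

Unlike the patch's `increaseOccupancy(MF, Limit)`, this sketch does not re-apply the function's static limits after raising the value.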

MFI takes care of integrating the different checks affecting occupancy,
including LDS use and the waves-per-eu attribute. Note that the scheduler
starts before the register pressure is known, so it has to record either
a limit or an increase in occupancy after it is done. Later passes can
just query the resulting value.
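One way to picture how the MFI integrates the static checks with the scheduler's later updates is the sketch below. The class `FunctionOccupancy` is hypothetical, and the two static limits (the waves-per-eu maximum and the LDS-derived occupancy) are passed in as plain numbers rather than computed from the function attribute and `getOccupancyWithLocalMemSize()`. It shows that an increase reported by the scheduler is still clamped by those limits.

```cpp
#include <cassert>

// Hypothetical model: occupancy starts at the waves-per-eu maximum and is
// clamped by every static factor known before scheduling.
class FunctionOccupancy {
public:
  FunctionOccupancy(unsigned WavesPerEULimit, unsigned LDSLimit)
      : MaxWavesPerEU(WavesPerEULimit), LDSOccupancy(LDSLimit),
        Occupancy(WavesPerEULimit) {
    applyStaticLimits();
  }

  unsigned getOccupancy() const { return Occupancy; }

  // Scheduler (or any other pass) reports a lower achievable occupancy.
  void limitOccupancy(unsigned Limit) {
    if (Occupancy > Limit)
      Occupancy = Limit;
  }

  // Scheduler reports a better result; the static limits still win.
  void increaseOccupancy(unsigned Limit) {
    if (Occupancy < Limit)
      Occupancy = Limit;
    applyStaticLimits();
  }

private:
  // Each factor is applied on its own; the order does not matter.
  void applyStaticLimits() {
    limitOccupancy(MaxWavesPerEU);
    limitOccupancy(LDSOccupancy);
  }

  unsigned MaxWavesPerEU;
  unsigned LDSOccupancy;
  unsigned Occupancy;
};
```

For example, with a waves-per-eu limit of 10 and an LDS limit of 6, a scheduler claiming 8 waves still leaves 6 recorded.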

The new interface is used in the active scheduler and is NFC with respect
to its behavior. Changes are also made to the experimental schedulers to
use it and record the occupancy after they are done. Before this change,
waves-per-eu was ignored by the experimental schedulers and the tolerance
window for memory-bound kernels was not used.
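The memory-bound tolerance window can be modeled as a small decision function. This is a hedged sketch that mirrors the condition in `GCNScheduleDAGMILive::schedule()` from this diff; the function name `occupancyAfterRegion` and its parameters are hypothetical, and `MinAllowed` stands for the value `getMinAllowedOccupancy()` would return (equal to the current occupancy for functions that are not memory bound).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns the minimum occupancy that would be recorded
// after scheduling one region.
unsigned occupancyAfterRegion(unsigned WavesBefore, unsigned WavesAfter,
                              unsigned MinOccupancy, unsigned MinAllowed) {
  // By default keep the better of the two schedules.
  unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);
  // A drop is accepted only inside the tolerance window, which is empty
  // unless MinAllowed is below MinOccupancy (memory-bound functions).
  if (WavesAfter < WavesBefore && WavesAfter < MinOccupancy &&
      WavesAfter >= MinAllowed)
    NewOccupancy = WavesAfter;
  // The recorded occupancy is only ever lowered here.
  return std::min(NewOccupancy, MinOccupancy);
}
```

The sketch only computes the recorded value; in the scheduler, the new schedule of a region is kept only when `WavesAfter` is at least the (possibly lowered) minimum occupancy.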

Diff Detail

Event Timeline

rampitec created this revision. May 29 2018, 6:53 PM

Herald added subscribers: javed.absar, t-tye, tpr and 6 others. May 29 2018, 6:53 PM

rampitec added a child revision: D47511: [AMDGPU] Construct memory clauses before RA. May 29 2018, 6:57 PM

rampitec mentioned this in D47511: [AMDGPU] Construct memory clauses before RA. May 29 2018, 7:31 PM

The problem with this is that until we can serialize MachineFunctionInfo, this is going to further degrade MIR tests.

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263	Why is this here?

In D47509#1115824, @arsenm wrote:

The problem with this is until we can serialize MachineFunctionInfo, this is going to further degrade MIR tests.

That did not degrade MIR tests so far, because everything that can be recorded from an original function is lost anyway. The default is the full 10 waves, unless further limited by an attribute (which is lost for MIR), static LDS usage (which is lost as well), or scheduler register usage (which was never recorded here until this change).

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263	allocateLDSGlobal() belongs to AMDGPUMachineFunction, not SIMachineFunctionInfo. The choice is to either make it virtual or handle it in the single place where it is really used.

rampitec added inline comments. May 30 2018, 1:35 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263	In fact, the other choice is to create SIMachineFunctionInfo::allocateLDSGlobal(), which would call AMDGPUMachineFunction::allocateLDSGlobal() and then limitOccupancy(), and to call that from here. It is basically the same thing, but more obscure to my taste. The only sound solution is to virtualize MFI, but I am not really sure it is worth it.

arsenm added inline comments. May 30 2018, 3:07 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263	I don't see a reason to update this during lowering. We don't do anything with occupancy information in the DAG? Why can't you just adjust this once after the function is selected and the LDS size is known?

Moved LDS processing into finalizeLowering().

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263	Thanks! Moved to finalizeLowering().
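arsenm's suggestion amounts to deferring the occupancy update until the LDS size is final. The sketch below illustrates that ordering under stated assumptions: `LoweringState` is a hypothetical type, LDS allocation ignores alignment, and the LDS-to-occupancy mapping is supplied as a callback instead of reproducing `getOccupancyWithLocalMemSize()`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical model of the per-function state touched during lowering.
struct LoweringState {
  unsigned LDSSize = 0;    // bytes of LDS allocated so far
  unsigned Occupancy = 10; // recorded occupancy, starts at the maximum

  // During selection, allocating an LDS global only grows the size
  // (alignment is ignored here); no occupancy bookkeeping happens.
  unsigned allocateLDSGlobal(unsigned Bytes) {
    unsigned Offset = LDSSize;
    LDSSize += Bytes;
    return Offset;
  }

  // Called once after the whole function is selected, when LDSSize is final.
  void finalizeLowering(const std::function<unsigned(unsigned)> &OccForLDS) {
    Occupancy = std::min(Occupancy, OccForLDS(LDSSize));
  }
};
```

Doing the update once at the end also avoids having to hook occupancy tracking into `allocateLDSGlobal()`, which lives in the base class.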

vpykhtin accepted this revision. May 30 2018, 9:40 PM

vpykhtin added inline comments.

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
185	Maybe one call to limitOccupancy() with min(getMaxWavesPerEU(), getOccupancyWithLocalMemSize())?

This revision is now accepted and ready to land. May 30 2018, 9:40 PM

rampitec added inline comments. May 30 2018, 10:10 PM

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
185	It is really the same; it will even produce the same code after compilation. The whole intent of the "limit" semantics is that you can take into account just a single factor and omit all others. I would prefer to keep this snippet to emphasize that ;)
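The point about "limit" semantics can be checked directly: applying each factor in its own call gives the same result as a single call with the minimum. The `Recorder` type and both helper functions below are hypothetical, written only to demonstrate that equivalence.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal recorder with "limit" semantics: each call may only
// lower the stored value.
struct Recorder {
  unsigned Occupancy;
  void limitOccupancy(unsigned Limit) {
    if (Occupancy > Limit)
      Occupancy = Limit;
  }
};

// One call per factor (the style kept in the patch).
unsigned limitSeparately(unsigned Start, unsigned WavesPerEU, unsigned LDSOcc) {
  Recorder R{Start};
  R.limitOccupancy(WavesPerEU);
  R.limitOccupancy(LDSOcc);
  return R.Occupancy;
}

// A single call with the minimum (the reviewer's suggestion).
unsigned limitCombined(unsigned Start, unsigned WavesPerEU, unsigned LDSOcc) {
  Recorder R{Start};
  R.limitOccupancy(std::min(WavesPerEU, LDSOcc));
  return R.Occupancy;
}
```

Because each call can only lower the value, the order of the calls does not matter either, which is what lets independent passes record their own limits.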

Closed by commit rL333629: [AMDGPU] Track occupancy in MFI (authored by rampitec). May 30 2018, 10:40 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                           Size
lib/Target/AMDGPU/GCNIterativeScheduler.cpp    22 lines
lib/Target/AMDGPU/GCNSchedStrategy.h           2 lines
lib/Target/AMDGPU/GCNSchedStrategy.cpp         11 lines
lib/Target/AMDGPU/SIISelLowering.cpp           2 lines
lib/Target/AMDGPU/SIMachineFunctionInfo.h      26 lines
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp    10 lines

Diff 149170

lib/Target/AMDGPU/GCNIterativeScheduler.cpp

@@ 472 lines not shown @@ for (auto R : Regions) {
     NewOcc = std::min(NewOcc, MaxRP.getOccupancy(ST));
     if (NewOcc <= Occ)
       break;
 
     setBestSchedule(*R, MinSchedule, MaxRP);
   }
   LLVM_DEBUG(dbgs() << "New occupancy = " << NewOcc
                     << ", prev occupancy = " << Occ << '\n');
+  if (NewOcc > Occ) {
+    SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+    MFI->increaseOccupancy(MF, NewOcc);
+  }
+
   return std::max(NewOcc, Occ);
 }

 void GCNIterativeScheduler::scheduleLegacyMaxOccupancy(
   bool TryMaximizeOccupancy) {
   const auto &ST = MF.getSubtarget<SISubtarget>();
-  auto TgtOcc = ST.getOccupancyWithLocalMemSize(MF);
+  SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  auto TgtOcc = MFI->getMinAllowedOccupancy();
 
   sortRegionsByPressure(TgtOcc);
   auto Occ = Regions.front()->MaxPressure.getOccupancy(ST);
 
   if (TryMaximizeOccupancy && Occ < TgtOcc)
     Occ = tryMaximizeOccupancy(TgtOcc);
 
   // This is really weird but for some magic scheduling regions twice
   // gives performance improvement
   const int NumPasses = Occ < TgtOcc ? 2 : 1;
 
   TgtOcc = std::min(Occ, TgtOcc);
   LLVM_DEBUG(dbgs() << "Scheduling using default scheduler, "
                        "target occupancy = "
                     << TgtOcc << '\n');
   GCNMaxOccupancySchedStrategy LStrgy(Context);
+  unsigned FinalOccupancy = std::min(Occ, MFI->getOccupancy());
 
   for (int I = 0; I < NumPasses; ++I) {
     // running first pass with TargetOccupancy = 0 mimics previous scheduling
     // approach and is a performance magic
     LStrgy.setTargetOccupancy(I == 0 ? 0 : TgtOcc);
     for (auto R : Regions) {
       OverrideLegacyStrategy Ovr(*R, LStrgy, *this);
 
       Ovr.schedule();
       const auto RP = getRegionPressure(*R);
       LLVM_DEBUG(printSchedRP(dbgs(), R->MaxPressure, RP));
 
       if (RP.getOccupancy(ST) < TgtOcc) {
         LLVM_DEBUG(dbgs() << "Didn't fit into target occupancy O" << TgtOcc);
         if (R->BestSchedule.get() &&
             R->BestSchedule->MaxPressure.getOccupancy(ST) >= TgtOcc) {
           LLVM_DEBUG(dbgs() << ", scheduling minimal register\n");
           scheduleBest(*R);
         } else {
           LLVM_DEBUG(dbgs() << ", restoring\n");
           Ovr.restoreOrder();
           assert(R->MaxPressure.getOccupancy(ST) >= TgtOcc);
         }
       }
+      FinalOccupancy = std::min(FinalOccupancy, RP.getOccupancy(ST));
     }
   }
+  MFI->limitOccupancy(FinalOccupancy);
 }

 ///////////////////////////////////////////////////////////////////////////////
 // Minimal Register Strategy
 
 void GCNIterativeScheduler::scheduleMinReg(bool force) {
   const auto &ST = MF.getSubtarget<SISubtarget>();
-  const auto TgtOcc = ST.getOccupancyWithLocalMemSize(MF);
+  const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  const auto TgtOcc = MFI->getOccupancy();
   sortRegionsByPressure(TgtOcc);
 
   auto MaxPressure = Regions.front()->MaxPressure;
   for (auto R : Regions) {
     if (!force && R->MaxPressure.less(ST, MaxPressure, TgtOcc))
       break;
 
     BuildDAG DAG(*R, *this);
@@ 16 lines not shown @@
 }

 ///////////////////////////////////////////////////////////////////////////////
 // ILP scheduler port
 
 void GCNIterativeScheduler::scheduleILP(
   bool TryMaximizeOccupancy) {
   const auto &ST = MF.getSubtarget<SISubtarget>();
-  const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
-  auto TgtOcc = std::min(ST.getOccupancyWithLocalMemSize(MF),
-                         MFI->getMaxWavesPerEU());
+  SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
+  auto TgtOcc = MFI->getMinAllowedOccupancy();
 
   sortRegionsByPressure(TgtOcc);
   auto Occ = Regions.front()->MaxPressure.getOccupancy(ST);
 
   if (TryMaximizeOccupancy && Occ < TgtOcc)
     Occ = tryMaximizeOccupancy(TgtOcc);
 
   TgtOcc = std::min(Occ, TgtOcc);
   LLVM_DEBUG(dbgs() << "Scheduling using default scheduler, "
                        "target occupancy = "
                     << TgtOcc << '\n');
 
+  unsigned FinalOccupancy = std::min(Occ, MFI->getOccupancy());
   for (auto R : Regions) {
     BuildDAG DAG(*R, *this);
     const auto ILPSchedule = makeGCNILPScheduler(DAG.getBottomRoots(), *this);
 
     const auto RP = getSchedulePressure(*R, ILPSchedule);
     LLVM_DEBUG(printSchedRP(dbgs(), R->MaxPressure, RP));
 
     if (RP.getOccupancy(ST) < TgtOcc) {
       LLVM_DEBUG(dbgs() << "Didn't fit into target occupancy O" << TgtOcc);
       if (R->BestSchedule.get() &&
           R->BestSchedule->MaxPressure.getOccupancy(ST) >= TgtOcc) {
         LLVM_DEBUG(dbgs() << ", scheduling minimal register\n");
         scheduleBest(*R);
       }
     } else {
       scheduleRegion(*R, ILPSchedule, RP);
       LLVM_DEBUG(printSchedResult(dbgs(), R, RP));
+      FinalOccupancy = std::min(FinalOccupancy, RP.getOccupancy(ST));
     }
   }
+  MFI->limitOccupancy(FinalOccupancy);
 }
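The bookkeeping added to both experimental schedulers follows one pattern: start from the lesser of the achieved and the recorded occupancy, take the minimum over the regions whose pressure is accounted for, and record the result once via `limitOccupancy()`. A hedged sketch with a hypothetical free function, where region occupancies are given as plain numbers:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: Recorded plays the role of MFI->getOccupancy(),
// StartOcc the occupancy achieved before the per-region loop, and
// RegionOcc the occupancy of each region after scheduling.
unsigned recordFinalOccupancy(unsigned Recorded, unsigned StartOcc,
                              const std::vector<unsigned> &RegionOcc) {
  unsigned FinalOccupancy = std::min(StartOcc, Recorded);
  for (unsigned Occ : RegionOcc)
    FinalOccupancy = std::min(FinalOccupancy, Occ);
  // Equivalent of MFI->limitOccupancy(FinalOccupancy): only ever lowers.
  return std::min(Recorded, FinalOccupancy);
}
```

Recording once after the loop keeps later passes from seeing intermediate values while regions are still being rescheduled.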

lib/Target/AMDGPU/GCNSchedStrategy.h

@@ 58 lines not shown @@ public:
 
   void setTargetOccupancy(unsigned Occ) { TargetOccupancy = Occ; }
 };
 
 class GCNScheduleDAGMILive : public ScheduleDAGMILive {
 
   const SISubtarget &ST;
 
-  const SIMachineFunctionInfo &MFI;
+  SIMachineFunctionInfo &MFI;
 
   // Occupancy target at the beginning of function scheduling cycle.
   unsigned StartingOccupancy;
 
   // Minimal real occupancy recorder for the function.
   unsigned MinOccupancy;
 
   // Scheduling stage number.
@@ 37 lines not shown @@

lib/Target/AMDGPU/GCNSchedStrategy.cpp

@@ 302 lines not shown @@ SUnit *GCNMaxOccupancySchedStrategy::pickNode(bool &IsTopNode) {
   return SU;
 }
 
 GCNScheduleDAGMILive::GCNScheduleDAGMILive(MachineSchedContext *C,
                         std::unique_ptr<MachineSchedStrategy> S) :
   ScheduleDAGMILive(C, std::move(S)),
   ST(MF.getSubtarget<SISubtarget>()),
   MFI(*MF.getInfo<SIMachineFunctionInfo>()),
-  StartingOccupancy(std::min(ST.getOccupancyWithLocalMemSize(MFI.getLDSSize(),
-                                                             MF.getFunction()),
-                             MFI.getMaxWavesPerEU())),
+  StartingOccupancy(MFI.getOccupancy()),
   MinOccupancy(StartingOccupancy), Stage(0), RegionIdx(0) {
 
   LLVM_DEBUG(dbgs() << "Starting occupancy is " << StartingOccupancy << ".\n");
 }
 
 void GCNScheduleDAGMILive::schedule() {
   if (Stage == 0) {
     // Just record regions at the first pass.
@@ 47 lines not shown @@ void GCNScheduleDAGMILive::schedule() {
   LLVM_DEBUG(dbgs() << "Occupancy before scheduling: " << WavesBefore
                     << ", after " << WavesAfter << ".\n");
 
   // We could not keep current target occupancy because of the just scheduled
   // region. Record new occupancy for next scheduling cycle.
   unsigned NewOccupancy = std::max(WavesAfter, WavesBefore);
   // Allow memory bound functions to drop to 4 waves if not limited by an
   // attribute.
-  unsigned MinMemBoundWaves = std::max(MFI.getMinWavesPerEU(), 4u);
   if (WavesAfter < WavesBefore && WavesAfter < MinOccupancy &&
-      WavesAfter >= MinMemBoundWaves &&
-      (MFI.isMemoryBound() || MFI.needsWaveLimiter())) {
+      WavesAfter >= MFI.getMinAllowedOccupancy()) {
     LLVM_DEBUG(dbgs() << "Function is memory bound, allow occupancy drop up to "
-                      << MinMemBoundWaves << " waves\n");
+                      << MFI.getMinAllowedOccupancy() << " waves\n");
     NewOccupancy = WavesAfter;
   }
   if (NewOccupancy < MinOccupancy) {
     MinOccupancy = NewOccupancy;
+    MFI.limitOccupancy(MinOccupancy);
     LLVM_DEBUG(dbgs() << "Occupancy lowered for the function to "
                       << MinOccupancy << ".\n");
   }
 
   if (WavesAfter >= MinOccupancy) {
     Pressure[RegionIdx] = PressureAfter;
     return;
   }
@@ 171 lines not shown @@

lib/Target/AMDGPU/SIISelLowering.cpp


@@ 4,250 lines not shown @@ SDValue SITargetLowering::LowerGlobalAddress(AMDGPUMachineFunction *MFI,
 
   if (GSD->getAddressSpace() != AMDGPUASI.CONSTANT_ADDRESS &&
       GSD->getAddressSpace() != AMDGPUASI.CONSTANT_ADDRESS_32BIT &&
       GSD->getAddressSpace() != AMDGPUASI.GLOBAL_ADDRESS &&
       // FIXME: It isn't correct to rely on the type of the pointer. This should
       // be removed when address space 0 is 64-bit.
       !GV->getType()->getElementType()->isFunctionTy())
     return AMDGPUTargetLowering::LowerGlobalAddress(MFI, Op, DAG);
 
   SDLoc DL(GSD);
   EVT PtrVT = Op.getValueType();
 
   if (shouldEmitFixup(GV))
     return buildPCRelGlobalAddress(DAG, GV, DL, GSD->getOffset(), PtrVT);
   else if (shouldEmitPCReloc(GV))
     return buildPCRelGlobalAddress(DAG, GV, DL, GSD->getOffset(), PtrVT,
                                    SIInstrInfo::MO_REL32);
 
   SDValue GOTAddr = buildPCRelGlobalAddress(DAG, GV, DL, 0, PtrVT,
                                             SIInstrInfo::MO_GOTPCREL32);
 
@@ 3,447 lines not shown @@ if (NeedSP) {
     MRI.replaceRegWith(AMDGPU::SP_REG, Info->getStackPtrOffsetReg());
   }
 
   MRI.replaceRegWith(AMDGPU::PRIVATE_RSRC_REG, Info->getScratchRSrcReg());
   MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());
   MRI.replaceRegWith(AMDGPU::SCRATCH_WAVE_OFFSET_REG,
                      Info->getScratchWaveOffsetReg());
 
+  Info->limitOccupancy(MF);
+
   TargetLoweringBase::finalizeLowering(MF);
 }
 
 void SITargetLowering::computeKnownBitsForFrameIndex(const SDValue Op,
                                                      KnownBits &Known,
                                                      const APInt &DemandedElts,
                                                      const SelectionDAG &DAG,
                                                      unsigned Depth) const {
@@ 12 lines not shown @@

lib/Target/AMDGPU/SIMachineFunctionInfo.h

@@ 180 lines not shown @@ private:
 
   // The hard-wired high half of the address of the global information table
   // for AMDPAL OS type. 0xffffffff represents no hard-wired high half, since
   // current hardware only allows a 16 bit value.
   unsigned GITPtrHigh;
 
   unsigned HighBitsOf32BitAddress;
 
+  // Current recorded maximum possible occupancy.
+  unsigned Occupancy;
+
   MCPhysReg getNextUserSGPR() const;
 
   MCPhysReg getNextSystemSGPR() const;
 
 public:
   struct SpilledReg {
     unsigned VGPR = 0;
     int Lane = -1;
@@ 439 lines not shown @@ public:
   const AMDGPUImagePseudoSourceValue *getImagePSV(const SIInstrInfo &TII,
                                                   const Value *ImgRsrc) {
     assert(ImgRsrc);
     auto PSV = ImagePSVs.try_emplace(
       ImgRsrc,
       llvm::make_unique<AMDGPUImagePseudoSourceValue>(TII));
     return PSV.first->second.get();
   }
+
+  unsigned getOccupancy() const {
+    return Occupancy;
+  }
+
+  unsigned getMinAllowedOccupancy() const {
+    if (!isMemoryBound() && !needsWaveLimiter())
+      return Occupancy;
+    return (Occupancy < 4) ? Occupancy : 4;
+  }
+
+  void limitOccupancy(const MachineFunction &MF);
+
+  void limitOccupancy(unsigned Limit) {
+    if (Occupancy > Limit)
+      Occupancy = Limit;
+  }
+
+  void increaseOccupancy(const MachineFunction &MF, unsigned Limit) {
+    if (Occupancy < Limit)
+      Occupancy = Limit;
+    limitOccupancy(MF);
+  }
 };
 
 } // end namespace llvm
 
 #endif // LLVM_LIB_TARGET_AMDGPU_SIMACHINEFUNCTIONINFO_H

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

@@ 49 lines not shown @@ : AMDGPUMachineFunction(MF),
     ImplicitArgPtr(false),
     GITPtrHigh(0xffffffff),
     HighBitsOf32BitAddress(0) {
   const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
   const Function &F = MF.getFunction();
   FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
   WavesPerEU = ST.getWavesPerEU(F);
 
+  Occupancy = getMaxWavesPerEU();
+  limitOccupancy(MF);
+
   if (!isEntryFunction()) {
     // Non-entry functions have no special inputs for now, other registers
     // required for scratch access.
     ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
     ScratchWaveOffsetReg = AMDGPU::SGPR4;
     FrameOffsetReg = AMDGPU::SGPR5;
     StackPtrOffsetReg = AMDGPU::SGPR32;
 
@@ 105 lines not shown @@ if (!S.empty())
     S.consumeInteger(0, GITPtrHigh);
 
   A = F.getFnAttribute("amdgpu-32bit-address-high-bits");
   S = A.getValueAsString();
   if (!S.empty())
     S.consumeInteger(0, HighBitsOf32BitAddress);
 }
 
+void SIMachineFunctionInfo::limitOccupancy(const MachineFunction &MF) {
+  limitOccupancy(getMaxWavesPerEU());
+  const SISubtarget& ST = MF.getSubtarget<SISubtarget>();
+  limitOccupancy(ST.getOccupancyWithLocalMemSize(getLDSSize(),
+                 MF.getFunction()));
+}
+
 unsigned SIMachineFunctionInfo::addPrivateSegmentBuffer(
   const SIRegisterInfo &TRI) {
   ArgInfo.PrivateSegmentBuffer =
     ArgDescriptor::createRegister(TRI.getMatchingSuperReg(
       getNextUserSGPR(), AMDGPU::sub0, &AMDGPU::SReg_128RegClass));
   NumUserSGPRs += 4;
   return ArgInfo.PrivateSegmentBuffer.getRegister();
 }
@@ 144 lines not shown @@