This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Speedup SIFormMemoryClauses live-in register set calculation
Needs Revision · Public

Authored by vpykhtin on Oct 25 2022, 5:28 AM.

Details

Reviewers

arsenm
rampitec
cfang

Summary

This patch uses the same approach as GCNScheduleDAGMILive::getBBLiveInMap.
getLiveRegMap has complexity O(NumVirtRegs * averageLiveRangeSegmentsPerReg * lg(NumBB)).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vpykhtin created this revision. Oct 25 2022, 5:28 AM

Herald added a project: Restricted Project. Oct 25 2022, 5:28 AM

Herald added subscribers: kosarev, foad, kerbowa and 7 others.

vpykhtin requested review of this revision. Oct 25 2022, 5:28 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B194149: Diff 470448. Oct 25 2022, 7:30 AM

rebase

Harbormaster completed remote builds in B194219: Diff 470547. Oct 25 2022, 11:43 AM

arsenm added inline comments. Oct 25 2022, 11:46 AM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280	You seem to be assuming a single clause per block. I'd expect this to be handled a full clause at a time, within a single block.

vpykhtin added inline comments. Oct 25 2022, 11:53 AM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280	Not quite: I only compute the live-in set for the first clause in each BB, to reset the RPTracker; the tracker is then advanced to the next clause.

LGTM

This revision is now accepted and ready to land. Oct 25 2022, 11:59 AM

arsenm requested changes to this revision. Oct 25 2022, 1:22 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280	Can you do this per block, instead of calculating this for every block?

This revision now requires changes to proceed. Oct 25 2022, 1:22 PM

vpykhtin added inline comments. Oct 25 2022, 2:37 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280	This is the whole point of doing it all at once:
1. The slot indexes of the first instructions of all clauses are collected and sorted.
2. For every virtual register's LiveRange we then have two sorted sequences: Segments and SlotIndexes. We need to determine which of the SlotIndexes fall into the register's Segments; the register is live at exactly those SlotIndexes.
3. Since both sequences are sorted, we progressively use a two-way binary search: we look either for the next SlotIndex contained in the current Segment, or for the next Segment containing the current SlotIndex.
I now realize the complexity is not what I thought before. Per register it should be O(min(NumSegments * lg(NumSlotIndexes), NumSlotIndexes * lg(NumSegments))). See getLiveRegMap and LiveRange::findIndexesLiveAt.
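The two-sorted-sequences search described in this comment can be sketched outside of LLVM. The following is a minimal standalone illustration, not the code of LiveRange::findIndexesLiveAt: `Segment` and `findLiveIndexes` are hypothetical names, slot indexes are modeled as plain unsigned integers, and segments are assumed to be half-open, sorted and non-overlapping.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a live range segment: half-open [Start, End).
struct Segment {
  unsigned Start;
  unsigned End;
};

// Returns the entries of Indexes that fall inside some segment of Segs.
// Assumes Segs is sorted and non-overlapping and Indexes is sorted ascending.
std::vector<unsigned> findLiveIndexes(const std::vector<Segment> &Segs,
                                      const std::vector<unsigned> &Indexes) {
  assert(std::is_sorted(Indexes.begin(), Indexes.end()) &&
         "indexes must be sorted");
  std::vector<unsigned> Live;
  auto SegI = Segs.begin(), SegE = Segs.end();
  auto IdxI = Indexes.begin(), IdxE = Indexes.end();
  while (SegI != SegE && IdxI != IdxE) {
    // Binary search for the first segment that ends after the current index.
    const unsigned Idx = *IdxI;
    SegI = std::partition_point(
        SegI, SegE, [Idx](const Segment &S) { return S.End <= Idx; });
    if (SegI == SegE)
      break;
    // Binary search for the first index at or after that segment's start.
    IdxI = std::lower_bound(IdxI, IdxE, SegI->Start);
    // Every index below the segment's end is live in this segment.
    for (; IdxI != IdxE && *IdxI < SegI->End; ++IdxI)
      Live.push_back(*IdxI);
  }
  return Live;
}
```

Each iteration either skips a run of segments or a run of indexes with one binary search, which is what makes the batched query cheaper than probing every index independently when one of the two sequences is much shorter than the other.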

vpykhtin added inline comments. Oct 25 2022, 2:48 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280	Sorry, I mean the first instruction of the first clause per BB; the following clauses are processed using 'advance'.
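The "seed once per block, then advance" scheme from this reply can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the pass itself: `ToyTracker`, `Inst`, `collectFirstCandidates` and `liveBeforeCandidates` are hypothetical names, registers are plain integers, and the `Seeds` map stands in for the result of a batched live-set query such as getLiveRegMap.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using LiveSet = std::set<int>; // toy model: virtual registers are plain ints

struct Inst {
  bool IsClauseCandidate; // stands in for the pass's clause-instruction check
  std::vector<int> Kills; // registers whose last use is this instruction
  std::vector<int> Defs;  // registers defined by this instruction
};
using Block = std::vector<Inst>;
using Pos = std::pair<std::size_t, std::size_t>; // (block index, inst index)

// Hypothetical, much simplified downward register pressure tracker.
class ToyTracker {
  LiveSet Live;

public:
  void reset(const LiveSet &Seed) { Live = Seed; }
  void advance(const Inst &I) {
    for (int R : I.Kills)
      Live.erase(R);
    for (int R : I.Defs)
      Live.insert(R);
  }
  const LiveSet &liveRegs() const { return Live; }
};

// Step 1: find the first clause candidate of every block, if it has one.
std::vector<Pos> collectFirstCandidates(const std::vector<Block> &Fn) {
  std::vector<Pos> First;
  for (std::size_t B = 0; B < Fn.size(); ++B)
    for (std::size_t I = 0; I < Fn[B].size(); ++I)
      if (Fn[B][I].IsClauseCandidate) {
        First.push_back({B, I});
        break;
      }
  return First;
}

// Steps 2 and 3: Seeds holds the live set before each first candidate
// (assumed to come from one batched query). Each block is seeded once;
// later candidates in the same block are reached by advancing the tracker.
std::vector<LiveSet> liveBeforeCandidates(const std::vector<Block> &Fn,
                                          const std::map<Pos, LiveSet> &Seeds) {
  std::vector<LiveSet> Result;
  ToyTracker T;
  for (const Pos &P : collectFirstCandidates(Fn)) {
    auto SeedIt = Seeds.find(P);
    assert(SeedIt != Seeds.end() && "missing seed for first candidate");
    T.reset(SeedIt->second);
    const Block &BB = Fn[P.first];
    for (std::size_t I = P.second; I < BB.size(); ++I) {
      if (BB[I].IsClauseCandidate)
        Result.push_back(T.liveRegs()); // live set before this instruction
      T.advance(BB[I]);
    }
  }
  return Result;
}
```

Only one seed per block is looked up in the map; every later candidate in that block gets its live set from incremental updates.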

arsenm added inline comments. Nov 16 2022, 4:29 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
286	Can you add a comment explaining this? I'm still not following how you're doing it all at once, but subsequent clauses are handled later?

Is this still relevant?

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp	34 lines

Diff 470547

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp

bool SIFormMemoryClauses::runOnMachineFunction(MachineFunction &MF) {
SlotIndexes *Ind = LIS->getSlotIndexes();		SlotIndexes *Ind = LIS->getSlotIndexes();
bool Changed = false;		bool Changed = false;

MaxVGPRs = TRI->getAllocatableSet(MF, &AMDGPU::VGPR_32RegClass).count();		MaxVGPRs = TRI->getAllocatableSet(MF, &AMDGPU::VGPR_32RegClass).count();
MaxSGPRs = TRI->getAllocatableSet(MF, &AMDGPU::SGPR_32RegClass).count();		MaxSGPRs = TRI->getAllocatableSet(MF, &AMDGPU::SGPR_32RegClass).count();
unsigned FuncMaxClause = AMDGPU::getIntegerAttribute(		unsigned FuncMaxClause = AMDGPU::getIntegerAttribute(
MF.getFunction(), "amdgpu-max-memory-clause", MaxClause);		MF.getFunction(), "amdgpu-max-memory-clause", MaxClause);

for (MachineBasicBlock &MBB : MF) {		SmallVector<MachineInstr *, 16> FirstBBClauseMI;
		for (auto &MBB : MF) {
		for (auto &MI : MBB) {
		if (!MI.isMetaInstruction() &&
		isValidClauseInst(MI, isVMEMClauseInst(MI))) {
		FirstBBClauseMI.push_back(&MI);
		break;
		}
		}
		}
		if (FirstBBClauseMI.empty())
		return false;

		auto LRM = getLiveRegMap(FirstBBClauseMI, false /*After*/, *LIS);

GCNDownwardRPTracker RPT(*LIS);		GCNDownwardRPTracker RPT(*LIS);
		for (auto *FirstMI : FirstBBClauseMI) {
		auto &MBB = *FirstMI->getParent();
		RPT.reset(*FirstMI, &LRM[FirstMI]);
MachineBasicBlock::instr_iterator Next;		MachineBasicBlock::instr_iterator Next;
for (auto I = MBB.instr_begin(), E = MBB.instr_end(); I != E; I = Next) {		for (auto I = MachineBasicBlock::instr_iterator(FirstMI),
		E = MBB.instr_end();
		I != E; I = Next) {
MachineInstr &MI = *I;		MachineInstr &MI = *I;
Next = std::next(I);		Next = std::next(I);

if (MI.isMetaInstruction())		if (MI.isMetaInstruction())
continue;		continue;

bool IsVMEM = isVMEMClauseInst(MI);		bool IsVMEM = isVMEMClauseInst(MI);

		if (&MI != FirstMI) {
if (!isValidClauseInst(MI, IsVMEM))		if (!isValidClauseInst(MI, IsVMEM))
continue;		continue;

if (!RPT.getNext().isValid())		// Advance the state to the current MI.
RPT.reset(MI);
else { // Advance the state to the current MI.
RPT.advance(MachineBasicBlock::const_iterator(MI));		RPT.advance(MachineBasicBlock::const_iterator(MI));
RPT.advanceBeforeNext();		RPT.advanceBeforeNext();
}		}

const GCNRPTracker::LiveRegSet LiveRegsCopy(RPT.getLiveRegs());		const GCNRPTracker::LiveRegSet LiveRegsCopy(RPT.getLiveRegs());
RegUse Defs, Uses;		RegUse Defs, Uses;
if (!processRegUses(MI, Defs, Uses, RPT)) {		if (!processRegUses(MI, Defs, Uses, RPT)) {
RPT.reset(MI, &LiveRegsCopy);		RPT.reset(MI, &LiveRegsCopy);