This is an archive of the discontinued LLVM Phabricator instance.


Differential D95273

AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause
Closed, Public

Authored by cfang on Jan 22 2021, 4:21 PM.


Details

Reviewers

rampitec
arsenm

Commits

rG5b648df1a842: AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause

Summary

RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed long compile times when RPT.reset() was called once for each cluster.

In this patch, we call RPT.reset() only at the first cluster seen in a basic block, and use advance() to get
the register pressure for the later clusters in the same basic block. This reduces the number
of expensive calls and thus the compile time.
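The idea in the summary can be illustrated with a toy model. The sketch below does not use the LLVM API: ToyRPTracker, resetPerCluster, resetOncePerBlock, and the cost counters are hypothetical names invented for illustration. It assumes that reset() costs roughly one unit per virtual register, that advance() costs one unit per instruction stepped over, and that cluster start positions are given in ascending order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a downward register-pressure tracker.
// reset() is modeled as expensive (proportional to the number of virtual
// registers); advance() is modeled as cheap (proportional to the distance).
struct ToyRPTracker {
  std::size_t NumVRegs;       // size of the rescan done by reset()
  std::size_t Pos = 0;        // index of the instruction the tracker is at
  bool Valid = false;         // false until the first reset() in a block
  std::size_t ResetCalls = 0; // number of expensive calls made
  std::size_t WorkUnits = 0;  // rough total cost

  explicit ToyRPTracker(std::size_t N) : NumVRegs(N) {}

  void reset(std::size_t InstrIdx) {
    Pos = InstrIdx;
    Valid = true;
    ++ResetCalls;
    WorkUnits += NumVRegs;
  }

  void advance(std::size_t InstrIdx) {
    assert(Valid && InstrIdx >= Pos && "can only advance downward");
    WorkUnits += InstrIdx - Pos;
    Pos = InstrIdx;
  }
};

// Scheme before the patch: one reset() for every cluster start.
inline ToyRPTracker resetPerCluster(const std::vector<std::size_t> &Starts,
                                    std::size_t NumVRegs) {
  ToyRPTracker T(NumVRegs);
  for (std::size_t Start : Starts)
    T.reset(Start);
  return T;
}

// Scheme after the patch: reset() at the first cluster in the block,
// advance() to every later cluster.
inline ToyRPTracker resetOncePerBlock(const std::vector<std::size_t> &Starts,
                                      std::size_t NumVRegs) {
  ToyRPTracker T(NumVRegs);
  for (std::size_t Start : Starts) {
    if (!T.Valid)
      T.reset(Start);
    else
      T.advance(Start);
  }
  return T;
}
```

Under these assumptions, a block with cluster starts at instructions 2, 10, and 40 and 100000 virtual registers needs three resets in the first scheme and only one in the second, with the remaining cost depending only on the distance between clusters.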

Note: I am still seeing a couple of LIT failures with the current state of this patch.

Diff Detail

Event Timeline

cfang created this revision. Jan 22 2021, 4:21 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. Jan 22 2021, 4:21 PM

cfang requested review of this revision. Jan 22 2021, 4:21 PM

Herald added a project: Restricted Project. Jan 22 2021, 4:21 PM

Herald added a subscriber: wdng.

rampitec added inline comments. Jan 22 2021, 4:48 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
332	I am not sure, but likely RPT.getNext().isValid() has the same result as using !InitializedInBlock.
337	This second advance is not needed. reset() sets iterator before MI, not after.
340	LiveRegSet LiveRegs = ... Make sure you are actually copying it, not taking a reference.
363	Move it down after "Ind->insertMachineInstrInMaps(*B);" and reset on B, not on MI. Technically MI's iterator is not valid for RPT purposes after bundling.

cfang added inline comments. Jan 22 2021, 6:17 PM

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
340	How to clone a copy? I can not find a suitable RPT function for that purpose.
363	Shouldn't we also restore before the continue when there is just one instruction?

Made a few changes based on the comments. Thanks!

Use RPT.getNext().isValid() instead of InitializedInBlock;
Make an actual copy of the Live Register Set instead of the reference itself;
Move the restore of state down after "Ind->insertMachineInstrInMaps(*B);" and reset on B, not on MI
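The copy-versus-reference point can be sketched in isolation. The code below is a toy model, not the LLVM classes: SnapshotTracker, ToyLiveRegSet, markLive, restore, and snapshotByCopyIsStable are hypothetical names, and the sketch assumes the getter returns a const reference to the tracker's internal set, as the review comment implies.

```cpp
#include <cassert>
#include <map>

// Hypothetical live-register set: register id -> lane mask.
using ToyLiveRegSet = std::map<unsigned, unsigned>;

// Hypothetical tracker whose getter hands out a reference to internal state.
class SnapshotTracker {
  ToyLiveRegSet LiveRegs;

public:
  const ToyLiveRegSet &getLiveRegs() const { return LiveRegs; }
  void markLive(unsigned Reg, unsigned Mask) { LiveRegs[Reg] |= Mask; }
  void restore(const ToyLiveRegSet &Saved) { LiveRegs = Saved; }
};

// Returns true if a snapshot taken by copy survives later mutation of the
// tracker, while a reference taken at the same time observes the mutation.
inline bool snapshotByCopyIsStable() {
  SnapshotTracker T;
  T.markLive(1, 0x1);

  const ToyLiveRegSet Copy(T.getLiveRegs());    // real copy of the state
  const ToyLiveRegSet &Alias = T.getLiveRegs(); // alias of the live state

  T.markLive(2, 0x3); // speculative processing changes the tracker

  const bool CopyStable = Copy.size() == 1;    // snapshot still has one entry
  const bool AliasChanged = Alias.size() == 2; // the reference sees the change

  T.restore(Copy); // roll the tracker back to the saved state
  return CopyStable && AliasChanged && T.getLiveRegs().size() == 1;
}
```

Only the copy is usable for rolling the tracker back; a reference would point at the already modified state.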

PS. There are still LIT test failures complaining about instructions not matching.

This looks good to me, but we need to understand what these LIT failures are.

Fix LIT failures.

LGTM. Thanks!

This revision is now accepted and ready to land. Jan 25 2021, 3:32 PM

Closed by commit rG5b648df1a842: AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause (authored by cfang). Jan 25 2021, 4:09 PM

This revision was automatically updated to reflect the committed changes.

cfang added a commit: rG5b648df1a842: AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause.

In your commit the message has just Reviewers:. The Reviewers: list does not necessarily mean all the people on the list have acknowledged the patch so Reviewers: is mostly useless. Many people agree that both Reviewed by: & Differential Revision: should be present.

arc amend can fetch the Phabricator summary and amend the local description.

You can install llvm/.git/hooks/pre-push to prevent accidental Summary:, Reviewers:, Subscribers: and Tags:.

Revision Contents

Path: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
Size: 23 lines

Diff 319143

llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp

[12 lines not shown]
   bool Changed = false;

   MaxVGPRs = TRI->getAllocatableSet(MF, &AMDGPU::VGPR_32RegClass).count();
   MaxSGPRs = TRI->getAllocatableSet(MF, &AMDGPU::SGPR_32RegClass).count();
   unsigned FuncMaxClause = AMDGPU::getIntegerAttribute(
     MF.getFunction(), "amdgpu-max-memory-clause", MaxClause);

   for (MachineBasicBlock &MBB : MF) {
+    GCNDownwardRPTracker RPT(*LIS);
     MachineBasicBlock::instr_iterator Next;
     for (auto I = MBB.instr_begin(), E = MBB.instr_end(); I != E; I = Next) {
       MachineInstr &MI = *I;
       Next = std::next(I);

       bool IsVMEM = isVMEMClauseInst(MI);

       if (!isValidClauseInst(MI, IsVMEM))
         continue;

-      RegUse Defs, Uses;
-      GCNDownwardRPTracker RPT(*LIS);
-      RPT.reset(MI);
+      if (!RPT.getNext().isValid())
+        RPT.reset(MI);
+      else { // Advance the state to the current MI.
+        RPT.advance(MachineBasicBlock::const_iterator(MI));
+        RPT.advanceBeforeNext();
+      }

-      if (!processRegUses(MI, Defs, Uses, RPT))
+      const GCNRPTracker::LiveRegSet LiveRegsCopy(RPT.getLiveRegs());
+      RegUse Defs, Uses;
+      if (!processRegUses(MI, Defs, Uses, RPT)) {
+        RPT.reset(MI, &LiveRegsCopy);
         continue;
+      }

       unsigned Length = 1;
       for ( ; Next != E && Length < FuncMaxClause; ++Next) {
         if (!isValidClauseInst(*Next, IsVMEM))
           break;

         // A load from pointer which was loaded inside the same bundle is an
         // impossible clause because we will need to write and read the same
         // register inside. In this case processRegUses will return false.
         if (!processRegUses(*Next, Defs, Uses, RPT))
           break;

         ++Length;
       }
-      if (Length < 2)
+      if (Length < 2) {
+        RPT.reset(MI, &LiveRegsCopy);
         continue;
+      }

       Changed = true;
       MFI->limitOccupancy(LastRecordedOccupancy);

       auto B = BuildMI(MBB, I, DebugLoc(), TII->get(TargetOpcode::BUNDLE));
       Ind->insertMachineInstrInMaps(*B);

+      // Restore the state after processing the bundle.
+      RPT.reset(*B, &LiveRegsCopy);
+
       for (auto BI = I; BI != Next; ++BI) {
         BI->bundleWithPred();
         Ind->removeSingleMachineInstrFromMaps(*BI);

         for (MachineOperand &MO : BI->defs())
           if (MO.readsReg())
             MO.setIsInternalRead(true);
       }
[12 lines not shown]