This is an archive of the discontinued LLVM Phabricator instance.

Differential D146735

[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder
Needs Review · Public

Authored by foad on Mar 23 2023, 9:50 AM.

Details

Reviewers

qcolombet
MatzeB
Srividya-Karumuri

Summary

Previously we called ignoreCSRForAllocationOrder on every alias of every
CSR, which was expensive on targets like AMDGPU that define a very large
number of overlapping register tuples.

On such targets it is simpler and faster to call
ignoreCSRForAllocationOrder once for every physical register.
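
To make the cost argument concrete, here is a minimal sketch that counts hook calls under both strategies. It uses a hypothetical toy register file (`ToyRegFile`, `makeTupleRegFile`, and both counting functions are illustrative names, not LLVM APIs) in which each register is a bitmask of the register units it covers, and it assumes two registers alias exactly when their masks intersect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model, not LLVM's MCRegisterInfo: register N covers the
// register units set in UnitMask[N]. Index 0 stands for "no register".
struct ToyRegFile {
  std::vector<std::uint64_t> UnitMask;
};

// Build NumUnits single-unit registers plus every tuple of 2..MaxTuple
// consecutive units, loosely imitating overlapping register tuples.
ToyRegFile makeTupleRegFile(unsigned NumUnits, unsigned MaxTuple) {
  assert(NumUnits <= 64 && MaxTuple >= 1);
  ToyRegFile RF;
  RF.UnitMask.push_back(0); // register 0 is invalid
  for (unsigned Width = 1; Width <= MaxTuple; ++Width)
    for (unsigned Start = 0; Start + Width <= NumUnits; ++Start) {
      std::uint64_t Mask = 0;
      for (unsigned U = Start; U < Start + Width; ++U)
        Mask |= std::uint64_t(1) << U;
      RF.UnitMask.push_back(Mask);
    }
  return RF;
}

// Old strategy: one hook call per (CSR, alias) pair, including the CSR itself.
std::size_t countHookCallsPerAlias(const ToyRegFile &RF,
                                   const std::vector<unsigned> &CSRs) {
  std::size_t Calls = 0;
  for (unsigned CSR : CSRs)
    for (std::size_t R = 1; R < RF.UnitMask.size(); ++R)
      if (RF.UnitMask[R] & RF.UnitMask[CSR])
        ++Calls;
  return Calls;
}

// New strategy: exactly one hook call per physical register.
std::size_t countHookCallsPerRegister(const ToyRegFile &RF) {
  return RF.UnitMask.size() - 1;
}
```

With four units and tuples of width two, treating all seven registers as CSRs, the per-alias walk makes 23 calls while the per-register pass makes 7. The gap widens as tuples get wider and more numerous.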

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,070 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

foad created this revision. · Mar 23 2023, 9:50 AM

Herald added a project: Restricted Project. · Mar 23 2023, 9:50 AM

Herald added subscribers: kosarev, jeroen.dobbelaere, StephenFan and 2 others.

foad requested review of this revision. · Mar 23 2023, 9:50 AM

Herald added a subscriber: llvm-commits.

This code was introduced in D126565 with no tests. I don't understand what it could mean for STI.ignoreCSRForAllocationOrder to return different results for registers that alias, so I hope it is sufficient to call it only on the registers that are directly specified in the CSR list.

Incidentally the two patches in this stack give me about a 10% speedup in running llc -march=amdgcn -mcpu=gfx1100 on a trivial .ll file that just contains a single empty function.

Harbormaster completed remote builds in B221340: Diff 507780. · Mar 23 2023, 10:43 AM

qcolombet added inline comments. · Apr 4 2023, 6:37 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	Given the only spot where `::ignoreCSRForAllocationOrder` is used is guarded by CalleeSavedAliases and that one is populated with regunits (after your other change), I think we should go through the regunits too here.

Update.

foad edited the summary of this revision. · Apr 5 2023, 4:49 AM

foad added inline comments.

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	How about this brute-force approach, calling ignoreCSRForAllocationOrder on every physical register? This is the simplest, fastest thing I could come up with for AMDGPU, where ignoreCSRForAllocationOrder is just a virtual call to the default implementation, which returns false.

Harbormaster completed remote builds in B223762: Diff 511053. · Apr 5 2023, 7:06 AM

qcolombet added inline comments. · Apr 5 2023, 9:15 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	Isn't it cheaper (compile-time wise) to go through only the relevant regunits?

foad added inline comments. · Apr 5 2023, 9:37 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	How? I can't call STI.ignoreCSRForAllocationOrder on a regunit, only on a register, so I would somehow have to find all registers that contain any regunit that is contained by any CSR - is that what you're suggesting? I don't think I can do that more cheaply than just iterating through all registers once each.
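
The point above can be sketched in a toy model. The code assumes registers are bitmasks of regunits and that no regunit-to-registers reverse index is available; `regsTouchingCSRUnits` is a hypothetical helper, not an LLVM function. Under those assumptions, narrowing the set to "registers that contain any regunit of any CSR" already needs a pass over every register, so it cannot beat calling the hook once per register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: RegUnits[N] is the bitmask of regunits covered by
// register N; index 0 is the invalid register. Not LLVM's real tables.
std::vector<unsigned>
regsTouchingCSRUnits(const std::vector<std::uint64_t> &RegUnits,
                     const std::vector<unsigned> &CSRs) {
  // Step 1: union of all regunits covered by any CSR.
  std::uint64_t CSRUnits = 0;
  for (unsigned CSR : CSRs) {
    assert(CSR != 0 && CSR < RegUnits.size());
    CSRUnits |= RegUnits[CSR];
  }

  // Step 2: the filter itself visits every register once, which is the same
  // amount of iteration as the brute-force loop it was meant to avoid.
  std::vector<unsigned> Result;
  for (std::size_t R = 1; R < RegUnits.size(); ++R)
    if (RegUnits[R] & CSRUnits)
      Result.push_back(static_cast<unsigned>(R));
  return Result;
}
```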

Srividya-Karumuri added inline comments. · Apr 5 2023, 9:39 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	Regarding the comment below, what is the "other change" that is being referred to here? <Given the only spot where ::ignoreCSRForAllocationOrder is used is guarded by CalleeSavedAliases and that one is populated with regunits (after your other change), I think we should go through the regunits too here.>

foad added inline comments. · Apr 5 2023, 9:41 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	D146734

qcolombet added inline comments. · Apr 5 2023, 10:18 AM

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96	I was thinking that we would go through the roots of the regunits, but you're right technically that may not be enough. Brute force here sounds fine.
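
A small sketch of why visiting only regunit roots may miss registers. It assumes a toy model in which a root is a leaf register that defines a regunit and tuples are built from several units; `ToyReg`, `makeExample`, `rootsOfCSRUnits`, and `overlappingRegs` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy register: the regunits it covers, and whether it is the
// root (leaf) register of a unit.
struct ToyReg {
  std::uint64_t Units;
  bool IsRoot;
};

// Registers: 1=S0, 2=S1, 3=S2 (roots), 4=S0_S1, 5=S1_S2 (tuples, not roots).
std::vector<ToyReg> makeExample() {
  return {{0, false},  {0x1, true},  {0x2, true},
          {0x4, true}, {0x3, false}, {0x6, false}};
}

// Collect registers sharing a regunit with any CSR; optionally roots only.
static std::vector<unsigned> collect(const std::vector<ToyReg> &Regs,
                                     const std::vector<unsigned> &CSRs,
                                     bool RootsOnly) {
  std::uint64_t CSRUnits = 0;
  for (unsigned CSR : CSRs) {
    assert(CSR != 0 && CSR < Regs.size());
    CSRUnits |= Regs[CSR].Units;
  }
  std::vector<unsigned> Out;
  for (std::size_t R = 1; R < Regs.size(); ++R)
    if ((Regs[R].Units & CSRUnits) && (!RootsOnly || Regs[R].IsRoot))
      Out.push_back(static_cast<unsigned>(R));
  return Out;
}

std::vector<unsigned> rootsOfCSRUnits(const std::vector<ToyReg> &Regs,
                                      const std::vector<unsigned> &CSRs) {
  return collect(Regs, CSRs, /*RootsOnly=*/true);
}

std::vector<unsigned> overlappingRegs(const std::vector<ToyReg> &Regs,
                                      const std::vector<unsigned> &CSRs) {
  return collect(Regs, CSRs, /*RootsOnly=*/false);
}
```

With S0_S1 as the only CSR, the roots are S0 and S1, while the full overlapping set also contains S0_S1 itself and S1_S2. A hook that answered differently for those tuples would be missed by a roots-only walk.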

Revision Contents

Path: llvm/lib/CodeGen/RegisterClassInfo.cpp
Size: 8 lines

Diff 511053

llvm/lib/CodeGen/RegisterClassInfo.cpp

  ...
      for (const MCPhysReg *I = CSR; *I; ++I) {
        LastCalleeSavedRegs.push_back(*I);
      }

      Update = true;
    }

    // Even if CSR list is same, we could have had a different allocation order
    // if ignoreCSRForAllocationOrder is evaluated differently.
    BitVector CSRHintsForAllocOrder(TRI->getNumRegs());
-   for (const MCPhysReg *I = CSR; *I; ++I)
-     for (MCRegAliasIterator AI(*I, TRI, true); AI.isValid(); ++AI)
-       CSRHintsForAllocOrder[*AI] = STI.ignoreCSRForAllocationOrder(mf, *AI);
-   if (IgnoreCSRForAllocOrder.size() != CSRHintsForAllocOrder.size() ||
-       IgnoreCSRForAllocOrder != CSRHintsForAllocOrder) {
+   for (MCPhysReg I = 1, E = TRI->getNumRegs(); I != E; ++I)
+     CSRHintsForAllocOrder[I] = STI.ignoreCSRForAllocationOrder(mf, I);
+   if (IgnoreCSRForAllocOrder != CSRHintsForAllocOrder) {
      Update = true;
      IgnoreCSRForAllocOrder = CSRHintsForAllocOrder;
    }

    RegCosts = TRI->getRegisterCosts(*MF);

    // Different reserved registers?
    const BitVector &RR = MF->getRegInfo().getReservedRegs();
  ...