This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder
Needs ReviewPublic

Authored by foad on Mar 23 2023, 9:50 AM.

Details

Summary

Previously we called ignoreCSRForAllocationOrder on every alias of every
CSR which was expensive on targets like AMDGPU which define a very large
number of overlapping register tuples.

On such targets it is simpler and faster to call
ignoreCSRForAllocationOrder once for every physical register.

Diff Detail

Unit TestsFailed

Event Timeline

foad created this revision.Mar 23 2023, 9:50 AM
foad requested review of this revision.Mar 23 2023, 9:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 9:50 AM
foad added a comment.Mar 23 2023, 9:51 AM

This code was introduced in D126565 with no tests. I don't understand what it could mean for STI.ignoreCSRForAllocationOrder to return different results for registers that alias, so I hope it is sufficient to call it only on the registers that are directly specified in the CSR list.

Incidentally the two patches in this stack give me about a 10% speedup in running llc -march=amdgcn -mcpu=gfx1100 on a trivial .ll file that just contains a single empty function.

qcolombet added inline comments.Apr 4 2023, 6:37 AM
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

Given the only spot where ::ignoreCSRForAllocationOrder is used is guarded by CalleeSavedAliases and that one is populated with regunits (after your other change), I think we should go through the regunits too here.

foad updated this revision to Diff 511053.Apr 5 2023, 4:48 AM

Update.

foad edited the summary of this revision. (Show Details)Apr 5 2023, 4:49 AM
foad added inline comments.
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

How about this brute force approach, calling ignoreCSRForAllocationOrder on every physical register? This is the simplest fastest thing I could come up with for AMDGPU, where ignoreCSRForAllocationOrder is just a virtual call to default implementation that returns false.

qcolombet added inline comments.Apr 5 2023, 9:15 AM
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

Isn't it cheaper (compile-time wise) to go through only the relevant regunits?

foad added inline comments.Apr 5 2023, 9:37 AM
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

How? I can't call STI.ignoreCSRForAllocationOrder on a regunit, only on a register, so I would somehow have to find all registers that contain any regunit that is contained by any CSR - is that what you're suggesting? I don't think I can do that more cheaply than just iterating through all registers once each.

llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

Regarding the below comment, what is the "other change" that is being referred here?

<Given the only spot where ::ignoreCSRForAllocationOrder is used is guarded by CalleeSavedAliases and that one is populated with regunits (after your other change), I think we should go through the regunits too here.>

foad added inline comments.Apr 5 2023, 9:41 AM
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96
qcolombet added inline comments.Apr 5 2023, 10:18 AM
llvm/lib/CodeGen/RegisterClassInfo.cpp
95–96

I was thinking that we would go through the roots of the regunits, but you're right technically that may not be enough.
Brute force here sounds fine.