Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[AMDGPU] Reserve SGPR pair when long branches are present
ClosedPublic

Authored by crobeck on May 3 2023, 11:15 AM.

Details

Reviewers
bcahoon
msearles
vpykhtin
arsenm
b-sumner
Pierre-vh
Group Reviewers
Restricted Project
Summary

After applying recent register allocation patches the compiler generates a register scavenging error. Branch Relaxation runs after RA, but RA doesn't allocate enough registers to plan for the case where an indirect branch ends up being needed during branch relaxation. This causes a “did not find scavenging index” assert to be hit from assignRegToScavengingIndex from within RegScavenger.

In this patch we estimate before RA whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. This is difficult as you often don't have an accurate idea of the code size and branch distance before register allocation and when you would need to reserve the registers. We therefore make the distance calculation a reduced complexity approximation and add a tuning factor on the threshold through the -amdgpu-long-branch-factor command line argument.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
crobeck updated this revision to Diff 527688.Jun 1 2023, 6:07 PM
This comment was removed by crobeck.
crobeck updated this revision to Diff 527695.EditedJun 1 2023, 6:19 PM

fix a type mismatch

crobeck updated this revision to Diff 527892.Jun 2 2023, 10:07 AM
crobeck added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455

Yes, you're right. I've made the checks more explicit.

crobeck added inline comments.Jun 2 2023, 10:13 AM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455

Would it be worth adding a test with amdgpu-num-sgpr to see what happens when 100% of SGPRs are used?

That's a good idea. At the very least good to understand what happens when were maxed out on SGPRs.

crobeck updated this revision to Diff 527935.Jun 2 2023, 12:33 PM
crobeck updated this revision to Diff 530066.Jun 9 2023, 1:12 PM
crobeck edited the summary of this revision. (Show Details)

add test for when maxed out on SGPRs

crobeck added inline comments.Jun 9 2023, 1:24 PM
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1453–1455

This ended up being kind of a mess trying to do it with the amdgpu-num-sgpr attribute so just used inline asm to use up all the registers

crobeck marked an inline comment as done.Jun 9 2023, 1:26 PM
crobeck added a comment.EditedJun 16 2023, 9:43 AM

Any further comments on this? @arsenm Do you think we need to update the distance calculation or are OK to leave as is?

arsenm added inline comments.Jun 16 2023, 3:55 PM
llvm/lib/Target/AMDGPU/GCNPreRABranchDistance.cpp
36 ↗(On Diff #520388)

I think you should go for fuzzier. Drop the block map and all that from BranchRelaxation. Just do something dead simple like number of instructions * 8 * fuzz ~= branch distance

crobeck updated this revision to Diff 532736.Jun 19 2023, 12:23 PM

Reduce complexity of branch distance calculation substantially - we now just approximate branch size by assuming 8 bytes per instruction. Also updated pass name to be more representative of what the pass does.

crobeck updated this revision to Diff 532741.Jun 19 2023, 12:51 PM

clean up unused header files and an unused function

crobeck marked 2 inline comments as done.Jun 19 2023, 1:32 PM
crobeck edited the summary of this revision. (Show Details)Jun 20 2023, 8:27 AM
crobeck edited the summary of this revision. (Show Details)
crobeck edited the summary of this revision. (Show Details)Jun 20 2023, 8:29 AM
arsenm added inline comments.Jun 21 2023, 7:00 AM
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
46

uint64_ts?

78–79

Clear at exit instead?

86

Should skip debug and meta instructions, otherwise this violates the cardinal rule that debug info shouldn't change codegen

121

const

135–137

This should be moved into a TRI reserve registers helper like the others.

Also, would be a bit safer to invert this by reserving by default and having this pass *remove* the reserved register

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429

The reserved registers should already be frozen before PEI. I think the current freezeReservedRegs call here was just never updated to use the new incremental reserveReg

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2555

We should also fix this fixme at some point

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll
67

Add a test to show no change for debug instructions

arsenm added inline comments.Jun 21 2023, 7:01 AM
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
31

If you're going to use FP for this, might as well use double

arsenm changed the edit policy from "Restricted Project (Project)" to "All Users".Jun 21 2023, 7:31 AM

Add a test for MIR serialization.

llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
107

Capitalize.

118

Capitalize.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1445

Unify the freezeReservedRegs at the end of the function?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2553

const

crobeck updated this revision to Diff 533380.Jun 21 2023, 1:41 PM
crobeck marked 8 inline comments as done.

address review comments, add additional debug instruction test

crobeck marked an inline comment as done.Jun 21 2023, 1:45 PM
crobeck added inline comments.
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
135–137

We're not actually reserving the register here. We're just setting the flag that we need to reserve the register in TRI. But, I can move this if it makes more sense there.

If we instead invert it we then need to always run through everything and prove there isn't a long branch as opposed to exiting once if we find a long branch.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429

Just set both calls to reserveReg then?

1445

Pending updating reserveReg call above.

crobeck updated this revision to Diff 533586.Jun 22 2023, 6:53 AM

Update reserveReg calls in SIFrameLowering

crobeck marked 2 inline comments as done.Jun 22 2023, 6:55 AM
crobeck added inline comments.
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1429

I think I've convinced myself these can both just be reserveReg calls.

crobeck updated this revision to Diff 533663.Jun 22 2023, 9:43 AM

Swap out reserve register routine for TRI helper function

crobeck added inline comments.Jun 22 2023, 9:52 AM
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
135–137

This should be cleaner now with a call into TRI instead of grabbing my own registers.

arsenm added inline comments.Jun 22 2023, 10:13 AM
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
87

Just isMetaInstruction is fine, it's a superset of isDebugIstr

103

Don't need a specific instruction offset, just the block distances are good enough

crobeck updated this revision to Diff 533713.Jun 22 2023, 11:43 AM

remove specific instruction offsets from distance calc

crobeck marked 2 inline comments as done.Jun 22 2023, 11:44 AM
crobeck updated this revision to Diff 533731.Jun 22 2023, 12:30 PM

variable, pointer/reference clean up

crobeck updated this revision to Diff 533956.Jun 23 2023, 7:30 AM

invert to reserve register by default and then have pass remove the register if unneeded

crobeck marked an inline comment as done.Jun 23 2023, 7:38 AM
arsenm added inline comments.Jun 23 2023, 3:04 PM
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
117–118

I don't see how this can fail unless you're using amdgpu-num-sgprs (and this is one of the reasons it should be removed)

120

I would assume this is already noregister if this failed

I think you misunderstood what I was saying about reserve first and release later. I meant in the compile pipeline, not in the pass itself. You can find the reserved register in finalizeLowering, and then have this pass unset and un-reserve. I don't know if we have an exposed helper to clear reserved registers. You can do that as a follow up,

121

If you actually did modify the MFI register this should be a return true

144

This should be return true if you modified MFI

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2623

Should also skip the enterBasicBlockEnd above

crobeck updated this revision to Diff 534108.Jun 23 2023, 3:58 PM

remove unnecessary set functions

crobeck marked 3 inline comments as done.Jun 23 2023, 4:00 PM
crobeck added inline comments.
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
117–118

It failed in some test cases where all the SGPRs were already used from inline asm

120

Yes, you're right I can remove that set function.

Got it. OK, that makes sense.

crobeck updated this revision to Diff 534113.EditedJun 23 2023, 4:18 PM
crobeck marked an inline comment as done.

skipping some of register scavenging if we already have a reserved register

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2623

There is some init of MBB in enterBasicBlockEnd that breaks things if I skip it entirely. But I think I can get away will skipping some of it.

arsenm accepted this revision.Jun 27 2023, 3:14 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/GCNPreRALongBranchReg.cpp
117–118

oh right

This revision is now accepted and ready to land.Jun 27 2023, 3:14 PM