This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Set threshold for regbanks reassign pass
ClosedPublic

Authored by rampitec on Feb 22 2021, 12:28 PM.

Details

Summary

This is to limit compile time. I experimented with several
inputs and found that compile time for this pass stays
reasonable with fewer than 100000 virtual registers, then
starts to explode somewhere between 100000 and 150000.
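
For illustration only, the gate described above can be modeled as a standalone sketch. The names `FunctionInfo`, `DefaultVRegThreshold`, and `shouldRunReassign` are hypothetical and are not taken from the patch; the default value is assumed from the measurements in the summary, and treating exactly 100000 vregs as still acceptable is also an assumption. In LLVM the count would come from `MachineRegisterInfo::getNumVirtRegs()`.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-function state the pass would inspect.
// In LLVM the count would come from MachineRegisterInfo::getNumVirtRegs().
struct FunctionInfo {
  std::size_t NumVirtRegs;
};

// Assumed default, based on the measurements quoted in the summary.
constexpr std::size_t DefaultVRegThreshold = 100000;

// Returns true if the expensive reassign search should run on this function.
// Functions with more virtual registers than the threshold are skipped.
inline bool shouldRunReassign(const FunctionInfo &F,
                              std::size_t Threshold = DefaultVRegThreshold) {
  return F.NumVirtRegs <= Threshold;
}
```

With the assumed default, a function with 150000 vregs is skipped unless the threshold is raised explicitly, for example `shouldRunReassign(F, 200000)`.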

Event Timeline

rampitec created this revision. Feb 22 2021, 12:28 PM
rampitec requested review of this revision. Feb 22 2021, 12:28 PM
Herald added a project: Restricted Project. Feb 22 2021, 12:28 PM
Herald added a subscriber: wdng.
rampitec updated this revision to Diff 325539. Feb 22 2021, 12:36 PM

Fixed debug output.

alex-t accepted this revision. Feb 23 2021, 3:38 AM

LGTM

This revision is now accepted and ready to land. Feb 23 2021, 3:38 AM
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments. Feb 23 2021, 10:27 AM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
54

This seems like a pretty low threshold

rampitec added inline comments. Feb 23 2021, 12:22 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
54

It is. Here is what happens when we have ~150000 vregs:

147.0184 ( 75.8%)   0.0000 (  0.0%)  147.0184 ( 75.5%)  147.1465 ( 75.5%)  GCN RegBank Reassign
 14.0944 (  7.3%)   0.0800 ( 12.7%)   14.1743 (  7.3%)   14.1812 (  7.3%)  Machine Instruction Scheduler

And when we have ~100000 vregs, the pass is not even visible at the top of the -time-passes output. So unless there are better ideas, we need to limit it.

One idea I have is to use some kind of heuristic that accounts not only for the number of vregs, but also for the number of registers allocated. What makes the pass slow is checkInterference() for every probed register at every conflict. Obviously the time will be proportional to the number of overlapping LIs at the point of conflict, and that can be more or less approximated by the number of registers, at least in the "fattest" portion of a program. Moreover, the more overlapping LIs we have, the lower the chance that we will be able to find a combination of registers to resolve a conflict.

If there were a cheap way to estimate register pressure at a given instruction, we could skip individual instructions in the search, but I am afraid RPT is not a cheap way.
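
As a rough illustration of the "number of overlapping LIs" idea, the sketch below computes the maximum number of simultaneously live intervals with a sweep over interval endpoints. This is not what the patch does, and it yields a single whole-function figure rather than per-instruction pressure. The `Interval` type, `maxOverlap`, and `worthSearching` are hypothetical; real LLVM live intervals consist of multiple segments and subranges, which this model ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified live interval: a half-open range [Start, End) of
// instruction slots. Real LLVM live intervals have multiple segments.
struct Interval {
  unsigned Start;
  unsigned End;
};

// Returns the largest number of intervals that are live at the same point.
std::size_t maxOverlap(const std::vector<Interval> &LIs) {
  // One +1 event at each start and one -1 event at each end.
  std::vector<std::pair<unsigned, int>> Events;
  Events.reserve(LIs.size() * 2);
  for (const Interval &LI : LIs) {
    if (LI.Start >= LI.End)
      continue; // Ignore empty intervals.
    Events.emplace_back(LI.Start, +1);
    Events.emplace_back(LI.End, -1);
  }
  // Pairs sort by slot first, then by delta, so an end (-1) is processed
  // before a start (+1) at the same slot. Intervals that merely touch are
  // therefore not counted as overlapping.
  std::sort(Events.begin(), Events.end());

  std::size_t Live = 0, Max = 0;
  for (const auto &E : Events) {
    if (E.second > 0)
      ++Live;
    else
      --Live;
    Max = std::max(Max, Live);
  }
  return Max;
}

// Hypothetical gate: skip the search when too many intervals overlap.
bool worthSearching(const std::vector<Interval> &LIs, std::size_t MaxLive) {
  return maxOverlap(LIs) <= MaxLive;
}
```

The sweep costs O(n log n) in the number of intervals and runs once per function, so it is only a coarse proxy for the per-conflict cost discussed above.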

arsenm added inline comments. Feb 23 2021, 12:25 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
668

Can you redo this search process in terms of regunits instead?

rampitec added inline comments. Feb 23 2021, 12:28 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
668

I will need to supply a physreg for VRM at the end. Plus, all these isAllocatable() et al. checks are not for reg units.

arsenm added inline comments. Feb 23 2021, 12:31 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
668

VRM should be using regunits internally. It does have LiveRegMatrix::checkRegUnitInterference. Overall, we need to rewrite everything that considers registers to operate on regunits instead (including reframing reserved registers in terms of reserved regunits).

arsenm added inline comments. Feb 23 2021, 12:33 PM
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp
668

I guess checkRegUnitInterference is part of the implementation, but that just means you're repeating that multiple times by scanning over all of the registers in tuple classes.
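
To make the repeated work concrete, here is a small standalone model, not LLVM code. A register is represented as the list of units it covers, and a lookup in a `UnitBusy` vector stands in for the expensive per-unit interference query. All names (`PhysReg`, `makeTuples`, `probePerRegister`, `probePerUnit`) are hypothetical, and the sliding layout of the tuples is an assumption. In the real allocator the interference result depends on the virtual register being probed, so a cache like this would only be valid for a single probe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a physical register is the list of units it covers.
using PhysReg = std::vector<unsigned>;

// Builds every Width-wide tuple over NumUnits consecutive units, sliding by
// one unit: {0,1,..}, {1,2,..}, and so on.
std::vector<PhysReg> makeTuples(unsigned NumUnits, unsigned Width) {
  std::vector<PhysReg> Tuples;
  if (Width == 0 || Width > NumUnits)
    return Tuples;
  for (unsigned Start = 0; Start + Width <= NumUnits; ++Start) {
    PhysReg Reg;
    for (unsigned U = Start; U < Start + Width; ++U)
      Reg.push_back(U);
    Tuples.push_back(Reg);
  }
  return Tuples;
}

struct ProbeStats {
  std::size_t UnitChecks = 0; // Number of simulated interference queries.
  std::size_t FreeTuples = 0; // Tuples with no busy unit.
};

// Probes register by register: units shared between tuples are re-queried.
ProbeStats probePerRegister(const std::vector<PhysReg> &Tuples,
                            const std::vector<bool> &UnitBusy) {
  ProbeStats Stats;
  for (const PhysReg &Reg : Tuples) {
    bool Free = true;
    for (unsigned U : Reg) {
      assert(U < UnitBusy.size() && "unit index out of range");
      ++Stats.UnitChecks;
      if (UnitBusy[U]) {
        Free = false;
        break; // Stop at the first interfering unit.
      }
    }
    if (Free)
      ++Stats.FreeTuples;
  }
  return Stats;
}

// Probes in terms of units: each unit is queried at most once and cached.
ProbeStats probePerUnit(const std::vector<PhysReg> &Tuples,
                        const std::vector<bool> &UnitBusy) {
  enum class State { Unknown, Free, Busy };
  std::vector<State> Cache(UnitBusy.size(), State::Unknown);
  ProbeStats Stats;
  for (const PhysReg &Reg : Tuples) {
    bool Free = true;
    for (unsigned U : Reg) {
      assert(U < UnitBusy.size() && "unit index out of range");
      if (Cache[U] == State::Unknown) {
        ++Stats.UnitChecks;
        Cache[U] = UnitBusy[U] ? State::Busy : State::Free;
      }
      if (Cache[U] == State::Busy) {
        Free = false;
        break;
      }
    }
    if (Free)
      ++Stats.FreeTuples;
  }
  return Stats;
}
```

In this model, with 8 units, 4-wide tuples, and no busy units, `probePerRegister` performs 20 unit checks while `probePerUnit` performs 8, and both find the same 5 free tuples.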