This is to limit compile time. I ran experiments with several
inputs and found that compile time for this pass stays reasonable
with fewer than 100000 virtual registers, and then starts to
explode somewhere between 100000 and 150000.
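A minimal sketch of the vreg-count gate described above. It assumes the cutoff is applied once per function, before the pass does any work. VRegThreshold and shouldRunRegBankReassign are hypothetical names, and the committed change may name or place the check differently. In LLVM the count would typically come from MachineRegisterInfo::getNumVirtRegs().

```cpp
// Hypothetical cutoff taken from the measurements above: compile time stayed
// reasonable below ~100000 virtual registers and exploded somewhere between
// 100000 and 150000.
constexpr unsigned VRegThreshold = 100000;

// Hypothetical gate: returns true if the reassignment pass should run on a
// function with NumVirtRegs virtual registers. Functions at or above the
// cutoff are skipped entirely.
constexpr bool shouldRunRegBankReassign(unsigned NumVirtRegs) {
  return NumVirtRegs < VRegThreshold;
}
```

With a strict comparison, a function with exactly 100000 vregs is skipped, which matches "fewer than 100000" in the summary.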
Details
- Reviewers: alex-t, foad
- Commits: rGd1b92c91afd0: [AMDGPU] Set threshold for regbanks reassign pass
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 54:
This seems like a pretty low threshold.
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 54:
It is. Here is what -time-passes reports with ~150000 vregs:

| User Time | System Time | User+System | Wall Time | Name |
|---|---|---|---|---|
| 147.0184 (75.8%) | 0.0000 (0.0%) | 147.0184 (75.5%) | 147.1465 (75.5%) | GCN RegBank Reassign |
| 14.0944 (7.3%) | 0.0800 (12.7%) | 14.1743 (7.3%) | 14.1812 (7.3%) | Machine Instruction Scheduler |

With ~100000 vregs the pass does not even appear near the top of the -time-passes output. So unless there are better ideas, we need to limit it.

One idea I have is a heuristic that accounts not only for the number of vregs but also for the number of registers allocated. What makes the pass slow is calling checkInterference() for every probed register at every conflict. The time is proportional to the number of live intervals (LIs) overlapping at the point of conflict, which can be roughly approximated by the number of allocated registers, at least in the "fattest" portion of a program. Moreover, the more overlapping LIs there are, the less likely we are to find a combination of registers that resolves the conflict. If there were a cheap way to estimate register pressure at a given instruction, we could exclude individual instructions from the search, but I am afraid RPT is not a cheap way to do that.
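The cost argument above can be made concrete with a toy model. This is a hypothetical sketch, not the pass's code: Interval, liveAt, worthSearching and the Budget parameter are illustrative names, and the per-conflict work is modeled simply as (candidate registers probed) times (intervals live at the conflict). Counting live intervals by scanning them all, as liveAt does, is itself linear, which is exactly why a real pass would want a cheaper pressure proxy.

```cpp
#include <array>
#include <cstddef>

// Toy stand-in for a live interval: the half-open slot range [Start, End).
struct Interval {
  unsigned Start;
  unsigned End;
};

// Number of intervals live at Slot. Probing one candidate register at a
// conflict gets more expensive as this count grows, because each overlapping
// interval may need an interference check.
template <std::size_t N>
constexpr unsigned liveAt(const std::array<Interval, N> &LIs, unsigned Slot) {
  unsigned Count = 0;
  for (const Interval &LI : LIs)
    if (LI.Start <= Slot && Slot < LI.End)
      ++Count;
  return Count;
}

// Hypothetical per-conflict gate: search only when the estimated work
// (candidates to probe times overlapping intervals) fits within Budget.
// High overlap also makes a conflict-free combination less likely.
template <std::size_t N>
constexpr bool worthSearching(const std::array<Interval, N> &LIs,
                              unsigned ConflictSlot, unsigned NumCandidates,
                              unsigned long long Budget) {
  unsigned long long Work = static_cast<unsigned long long>(NumCandidates) *
                            liveAt(LIs, ConflictSlot);
  return Work <= Budget;
}

// Usage: four intervals; three of them are live at slot 5.
constexpr std::array<Interval, 4> Sample = {{{0, 10}, {2, 6}, {4, 8}, {9, 12}}};
```

With Sample, a conflict at slot 5 with 10 candidates costs an estimated 30 units of work, so it passes a budget of 30 but not with 11 candidates.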
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 668:
Can you redo this search process in terms of regunits instead?
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 668:
I will need to supply a physreg for VRM at the end. Plus, all of these isAllocatable() et al. checks operate on registers, not reg units.
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 668:
VRM should be using regunits internally. It does have LiveRegMatrix::checkRegUnitInterference. Overall, we need to rewrite everything that reasons about registers to operate on regunits instead (including reframing reserved registers in terms of reserved regunits).
llvm/lib/Target/AMDGPU/GCNRegBankReassign.cpp, line 668:
I guess checkRegUnitInterference is part of the implementation, but that just means you are repeating the same checks multiple times by scanning over all of the registers in the tuple classes, since overlapping tuples share regunits.
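To illustrate the regunit suggestion, here is a toy model under stated assumptions: registers are bitmasks over four register units, and two overlapping 64-bit tuple classes are built from four 32-bit registers. Every name in it (UnitMask, Regs32, Regs64, unitInterferes, blockedUnits, firstFreePhysReg, naiveUnitQueries) is hypothetical; unitInterferes is a stand-in query, not LiveRegMatrix::checkRegUnitInterference. The idea shown is to query each unit once, fold in reserved units, and then test each candidate with a single mask operation, while still returning a physreg for VRM.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model: register units 0..3; a physical register is a mask of its units.
using UnitMask = std::uint32_t;
constexpr unsigned NumUnits = 4;

// Hypothetical register classes: four 32-bit registers (R0..R3) and the three
// overlapping 64-bit tuples built from them (R0_R1, R1_R2, R2_R3).
constexpr std::array<UnitMask, 4> Regs32 = {0b0001, 0b0010, 0b0100, 0b1000};
constexpr std::array<UnitMask, 3> Regs64 = {0b0011, 0b0110, 0b1100};

// Stand-in for a per-unit interference query: a unit interferes if its bit is
// set in LiveUnits.
constexpr bool unitInterferes(UnitMask LiveUnits, unsigned Unit) {
  return ((LiveUnits >> Unit) & 1u) != 0;
}

// Query every unit exactly once and merge in reserved units, so candidate
// checks below never repeat a unit query.
constexpr UnitMask blockedUnits(UnitMask LiveUnits, UnitMask ReservedUnits) {
  UnitMask Blocked = ReservedUnits;
  for (unsigned U = 0; U < NumUnits; ++U)
    if (unitInterferes(LiveUnits, U))
      Blocked |= UnitMask(1) << U;
  return Blocked;
}

// Index of the first candidate physreg whose units are all free, or -1.
// The search is unit-based, but the answer is still a physreg.
template <std::size_t N>
constexpr int firstFreePhysReg(const std::array<UnitMask, N> &Candidates,
                               UnitMask Blocked) {
  for (std::size_t I = 0; I < N; ++I)
    if ((Candidates[I] & Blocked) == 0)
      return static_cast<int>(I);
  return -1;
}

// Unit queries issued by scanning every unit of every candidate register,
// which repeats work for units shared by overlapping tuples.
template <std::size_t N>
constexpr unsigned naiveUnitQueries(const std::array<UnitMask, N> &Candidates) {
  unsigned Count = 0;
  for (UnitMask M : Candidates)
    for (; M != 0; M &= M - 1) // clear the lowest set bit
      ++Count;
  return Count;
}
```

For the three 64-bit tuples, the per-register scan issues 6 unit queries although only 4 distinct units exist; wider tuple classes overlap more and widen the gap. With unit 1 live, only R2_R3 (index 2) is free; if unit 3 is also reserved, no tuple is free.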