This patch disables the SIFormMemoryClauses pass at -O1. This pass has a significant impact on compilation time, so we only want it to be enabled starting from -O2.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible
I'm planning to disable it at -O1 for now, then investigate to improve the implementation for the default optimization level so we can eventually re-enable the pass at O1 too. Does that work for you?
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
---|---|---|
1171 | Needs a comment/fixme for why |
Besides the speed I also do not think this pass belongs at -O1 at all. This not an essential trivial optimization.
On small functions I think the run time of this pass is dominated by the two calls to TRI->getAllocatableSet, which means two identical calls to SIRegisterInfo::getReservedRegs, which has a very large fixed cost. Is there some way the Reserved register set can be cached (in the SIMachineFunctionInfo object?)?
Add the TargetRegisterInfo::getAllocatableSets method to avoid calling getReservedRegs multiple times when we want allocatable sets for multiple register classes.
@foad you're right. But caching the result of getReservedRegs is not easy because it difficult to detect that the calls are from consecutive calls to getAllocatableSet. I instead added the TargetRegisterInfo::getAllocatableSets method allowing to get allocatable sets for multiple register classes so we can call getReservedRegs a single time for all the classes.
It's long been on my todo list to replace reserved registers with reserved reg units which should provide a similar speedup (and just generally be cleaner)
I started looking at this, but I'm not sure it makes sense to do without also changing quite a bit of other infrastructure to work on regunits at the same time
llvm/lib/CodeGen/TargetRegisterInfo.cpp | ||
---|---|---|
270 ↗ | (On Diff #343689) | Why is this recomputing the reserved regs instead of just querying MRI::getReservedRegs? |
276 ↗ | (On Diff #343689) | This is a separate patch |
280–289 ↗ | (On Diff #343689) | Most of this function is a duplictae of getAllocatableSet |
Revert changes improving the pass implementation, these changes will be addressed in another patch.
Needs a comment/fixme for why