This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Disable the SIFormMemoryClauses pass at -O1
ClosedPublic

Authored by bsaleil on May 5 2021, 1:09 PM.

Details

Summary

This patch disables the SIFormMemoryClauses pass at -O1. This pass has a significant impact on compilation time, so we only want it to be enabled starting from -O2.

Diff Detail

Event Timeline

bsaleil created this revision.May 5 2021, 1:09 PM
bsaleil requested review of this revision.May 5 2021, 1:09 PM
arsenm added a comment.May 5 2021, 1:17 PM

Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible

Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible

I'm planning to disable it at -O1 for now, then investigate to improve the implementation for the default optimization level so we can eventually re-enable the pass at O1 too. Does that work for you?

arsenm added inline comments.May 5 2021, 1:27 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1171

Needs a comment/fixme for why

Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible

Besides the speed I also do not think this pass belongs at -O1 at all. This not an essential trivial optimization.

foad added a comment.May 6 2021, 2:47 AM

Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible

On small functions I think the run time of this pass is dominated by the two calls to TRI->getAllocatableSet, which means two identical calls to SIRegisterInfo::getReservedRegs, which has a very large fixed cost. Is there some way the Reserved register set can be cached (in the SIMachineFunctionInfo object?)?

bsaleil updated this revision to Diff 343689.May 7 2021, 8:18 AM

Add the TargetRegisterInfo::getAllocatableSets method to avoid calling getReservedRegs multiple times when we want allocatable sets for multiple register classes.

Can we address it being slow instead of just turning of everything that is? There's no reason it should be slow, but the way it uses liveness information is horrible

On small functions I think the run time of this pass is dominated by the two calls to TRI->getAllocatableSet, which means two identical calls to SIRegisterInfo::getReservedRegs, which has a very large fixed cost. Is there some way the Reserved register set can be cached (in the SIMachineFunctionInfo object?)?

@foad you're right. But caching the result of getReservedRegs is not easy because it difficult to detect that the calls are from consecutive calls to getAllocatableSet. I instead added the TargetRegisterInfo::getAllocatableSets method allowing to get allocatable sets for multiple register classes so we can call getReservedRegs a single time for all the classes.

It's long been on my todo list to replace reserved registers with reserved reg units which should provide a similar speedup (and just generally be cleaner)

It's long been on my todo list to replace reserved registers with reserved reg units which should provide a similar speedup (and just generally be cleaner)

I think I'm trying to take a stab at this

arsenm added a comment.May 7 2021, 2:17 PM

It's long been on my todo list to replace reserved registers with reserved reg units which should provide a similar speedup (and just generally be cleaner)

I think I'm trying to take a stab at this

I started looking at this, but I'm not sure it makes sense to do without also changing quite a bit of other infrastructure to work on regunits at the same time

llvm/lib/CodeGen/TargetRegisterInfo.cpp
270 ↗(On Diff #343689)

Why is this recomputing the reserved regs instead of just querying MRI::getReservedRegs?

276 ↗(On Diff #343689)

This is a separate patch

280–289 ↗(On Diff #343689)

Most of this function is a duplictae of getAllocatableSet

bsaleil updated this revision to Diff 344553.May 11 2021, 1:40 PM

Revert changes improving the pass implementation, these changes will be addressed in another patch.

arsenm accepted this revision.May 11 2021, 5:18 PM
This revision is now accepted and ready to land.May 11 2021, 5:18 PM
foad added inline comments.May 12 2021, 1:40 AM
llvm/lib/CodeGen/TargetRegisterInfo.cpp
270 ↗(On Diff #343689)
This revision was automatically updated to reflect the committed changes.