This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Limit promote alloca max size in functions
ClosedPublic

Authored by rampitec on Sep 23 2021, 4:10 PM.

Details

Summary

Non-entry functions have 32 caller saved VGPRs available. If we
promote alloca to consume more registers we will have to spill
CSRs. There is no reason to eliminate scratch access to get
another scratch access instead.

Diff Detail

Event Timeline

rampitec created this revision.Sep 23 2021, 4:10 PM
rampitec requested review of this revision.Sep 23 2021, 4:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 4:10 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Sep 23 2021, 4:58 PM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
182

Probably should be isEntryFunctionCC

We could also be smarter and promote only if there isn’t an intervening call between uses

We could also be smarter and promote only if there isn’t an intervening call between uses

Why so? If the call does not take address it shall be fine. If address is taken promotion is impossible.

rampitec added inline comments.Sep 23 2021, 5:08 PM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
182

The difference is the exclusion of AMDGPU_Gfx. What is that anyway?

rampitec updated this revision to Diff 374698.Sep 23 2021, 5:19 PM
rampitec marked an inline comment as done.

Switched to isEntryFunctionCC.

We could also be smarter and promote only if there isn’t an intervening call between uses

Why so? If the call does not take address it shall be fine. If address is taken promotion is impossible.

I think I kind of got the idea, we have to spill around the call. Maybe we shall do the same: promote, but tell our limit is 32 registers.

foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
182
arsenm accepted this revision.Sep 24 2021, 1:08 PM
This revision is now accepted and ready to land.Sep 24 2021, 1:08 PM
This revision was landed with ongoing or failed builds.Sep 24 2021, 1:38 PM
This revision was automatically updated to reflect the committed changes.
arsenm added a comment.Mar 8 2023, 7:45 AM

There is no reason to eliminate scratch access to get

another scratch access instead.

This can be beneficial since one pair CSR spills happen once if this helps avoid stack access inside a loop. It requires considering additional context to know this though

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 7:45 AM
Herald added a subscriber: kosarev. · View Herald Transcript