Non-entry functions have 32 caller-saved VGPRs available. If we
promote an alloca so that it consumes more registers than that, we
will have to spill CSRs. There is no reason to eliminate one scratch
access only to get another scratch access instead.
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp, line 182: Probably should be isEntryFunctionCC.
We could also be smarter and promote only if there isn’t an intervening call between uses.
Why so? If the call does not take the alloca's address, it should be fine. If the address is taken, promotion is impossible.
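To make the "intervening call" suggestion concrete, here is a minimal sketch of such a check over a straight-line instruction sequence. It is an illustration only: `Kind` and `hasCallBetweenUses` are hypothetical names, not LLVM types or APIs, and a real implementation would have to reason about the control-flow graph rather than a flat list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model; not an LLVM type.
enum class Kind { UsesAlloca, Call, Other };

// Returns true if a call sits strictly between the first and the last use of
// the alloca. A value promoted to VGPRs would be live across such a call and
// would have to be preserved around it.
bool hasCallBetweenUses(const std::vector<Kind> &Insts) {
  const std::size_t None = Insts.size();
  std::size_t First = None, Last = None;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    if (Insts[I] != Kind::UsesAlloca)
      continue;
    if (First == None)
      First = I;
    Last = I;
  }
  if (First == None)
    return false; // The alloca is never used.
  assert(First <= Last && "first use cannot come after last use");
  for (std::size_t I = First + 1; I < Last; ++I)
    if (Insts[I] == Kind::Call)
      return true;
  return false;
}
```

Under this model, promotion would only be attempted when the function returns false, which is stricter than the address-taken condition raised in the reply.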
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp, line 182: The difference is the exclusion of AMDGPU_Gfx. What is that anyway?
I think I get the idea: we have to spill around the call. Maybe we should do the same here: promote, but state that our limit is 32 registers.
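A rough sketch of the "promote, but with a limit of 32 registers" idea follows. The helper names and parameters are hypothetical and do not come from the patch; the figure of 32 is taken from the patch summary, and the sketch assumes one 32-bit element of the alloca occupies one VGPR.

```cpp
#include <algorithm>
#include <cassert>

// Taken from the patch summary: non-entry functions have 32 caller-saved VGPRs.
constexpr unsigned CallerSavedVGPRs = 32;

// Hypothetical helper: the VGPR budget the promotion heuristic may assume.
// Entry functions keep the full limit; other functions are capped so that
// promotion does not force CSR spills.
unsigned promotionVGPRBudget(bool IsEntryFunction, unsigned MaxVGPRs) {
  assert(MaxVGPRs > 0 && "expected a nonzero register limit");
  return IsEntryFunction ? MaxVGPRs : std::min(MaxVGPRs, CallerSavedVGPRs);
}

// Hypothetical helper: an alloca is a promotion candidate only if it fits the
// budget. Assumes one 32-bit element maps to one VGPR.
bool fitsBudget(unsigned AllocaDwords, unsigned Budget) {
  return AllocaDwords <= Budget;
}
```

The point of capping instead of skipping promotion outright is that small allocas in non-entry functions can still be promoted.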
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp, line 182:
> There is no reason to eliminate scratch access to get
> another scratch access instead.
This can still be beneficial: a CSR spill and restore pair happens only once, so promotion is a win if it helps avoid stack accesses inside a loop. Knowing that requires considering additional context, though.
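The trade-off described here can be written as a toy cost comparison. This is only a sketch under strong assumptions: every memory access is given the same cost, the trip count is assumed known, and `promotionPaysOff` is a hypothetical name, not something in the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: compares the one-time CSR save/restore traffic
// against the scratch accesses that promotion removes from a loop body.
bool promotionPaysOff(std::uint64_t ExtraCSRs,
                      std::uint64_t AccessesPerIteration,
                      std::uint64_t TripCount) {
  // Guard the multiplication below against overflow.
  assert(AccessesPerIteration == 0 ||
         TripCount <= UINT64_MAX / AccessesPerIteration);
  // One save in the prologue and one restore in the epilogue per register.
  const std::uint64_t SpillCost = 2 * ExtraCSRs;
  // Scratch accesses eliminated over the whole loop.
  const std::uint64_t Saved = AccessesPerIteration * TripCount;
  return Saved > SpillCost;
}
```

With four extra CSRs, for example, two accesses per iteration pay off once the loop runs more than four times; the "additional context" mentioned above is exactly the loop information this model takes for granted.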