Add .shader_functions to the PAL metadata. It contains the stack frame
size of all functions.
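The sketch below shows the rough shape this adds. It assumes a YAML-style rendering of the metadata map, with function names as keys and a .stack_frame_size_in_bytes field under each one. The key name comes from the review discussion. The helper renderShaderFunctions is hypothetical and is not how AMDGPUAsmPrinter actually builds the msgpack metadata document.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: renders a .shader_functions map (function name ->
// stack frame size in bytes) as YAML-like text, one entry per function.
// std::map iterates in key order, so the output is sorted by function name.
std::string renderShaderFunctions(const std::map<std::string, uint64_t> &frameSizes) {
  std::ostringstream os;
  os << ".shader_functions:\n";
  for (const auto &[name, size] : frameSizes) {
    os << "  " << name << ":\n";
    // Sizes are printed in hex, e.g. 16 -> 0x10.
    os << "    .stack_frame_size_in_bytes: 0x" << std::hex << size << std::dec << '\n';
  }
  return os.str();
}
```

For example, a function foo with a 16-byte frame renders as ".stack_frame_size_in_bytes: 0x10" under "foo:".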
Repository: rG LLVM Github Monorepo
Event Timeline
What's the point of reporting this?
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 4 | Should include a test with a non-zero size, a case with a variable-sized stack object, and a case with stack used transitively by a callee. |
Fix code and add more tests.
The goal of storing the stack size of functions is to let the driver compute the scratch size that needs to be allocated.
A user tells the driver which shaders/functions can be called and the maximum recursion depth. With this information, the driver can compute the maximum amount of scratch memory that may be needed.
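A minimal sketch of one way a driver could turn these inputs into a per-thread scratch bound. It assumes a deliberately conservative model: any callable function may be active at every nesting level, up to the maximum recursion depth. The function name and parameters are hypothetical. A real driver could use a tighter, call-graph-based computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical driver-side bound on per-thread scratch (stack) memory.
// Assumption: any callable function may be active at every level of the
// call chain, up to maxRecursionDepth nested calls below the entry shader.
uint64_t conservativeScratchPerThread(uint64_t entryFrameSize,
                                      const std::vector<uint64_t> &callableFrameSizes,
                                      uint32_t maxRecursionDepth) {
  uint64_t largestFrame = 0;
  for (uint64_t size : callableFrameSizes)
    largestFrame = std::max(largestFrame, size);
  // Each nested call adds at most one largest frame on top of the entry frame.
  return entryFrameSize + static_cast<uint64_t>(maxRecursionDepth) * largestFrame;
}
```

Take an entry frame of 32 bytes, callable frames of 16, 48 and 8 bytes, and a maximum depth of 3. The bound is 32 + 3 * 48 = 176 bytes per thread. This model trades precision for simplicity, because it never needs the call graph.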
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 171 | What do you mean, unsupported? These do work (in SelectionDAG, a constant-size alloca outside the entry block behaves as if it were dynamically sized). |
Ah, it was the dynamically sized alloca that triggered the message saying it is unsupported.
An alloca in a branch works fine.
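The distinction behind this exchange can be sketched as a simple predicate. It is a simplified model of the rule used by llvm::AllocaInst::isStaticAlloca(), which additionally excludes inalloca. The ToyAlloca struct is hypothetical. In SelectionDAG, static allocas get fixed frame slots in the common case. Other allocas, including a constant-size one in a branch, are lowered as dynamic stack allocations.

```cpp
#include <cassert>

// Hypothetical, simplified description of an IR alloca.
struct ToyAlloca {
  bool inEntryBlock;    // is the alloca in the function's entry block?
  bool hasConstantSize; // is the element count a compile-time constant?
};

// Simplified form of the rule behind llvm::AllocaInst::isStaticAlloca():
// only constant-size allocas in the entry block are static and can be folded
// into the fixed stack frame. Everything else is a dynamic allocation.
bool isStaticAlloca(const ToyAlloca &a) {
  return a.inEntryBlock && a.hasConstantSize;
}
```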
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | This doesn't really represent the unknown nature of the stack size, but I guess there isn't a key for this yet? |
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | I guess it should be the maximum that the function needs at any point, so 0x10 sounds right? |
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | Then what would it report if there was an alloca in a loop? |
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | The same as with an if. I think allocas in a loop don’t stack, so there should be a maximum stack usage that can be computed statically. |
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | Yes they do stack. It's used for implementing standard(ish) C alloca(). |
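The toy model below shows why a branch and a loop behave differently. Each executed dynamic allocation bumps the frame, and nothing is released until the function returns, as with C alloca(). The ToyFrame type and both helpers are hypothetical illustrations and are not LLVM code. With an if/else, the peak is at most the larger of the two sizes. With a loop, allocations accumulate, so the peak grows with the trip count.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical) of dynamic stack allocation within one frame:
// every allocation adds to the frame and is only freed at function return.
struct ToyFrame {
  uint64_t used = 0; // bytes currently allocated in this frame
  uint64_t peak = 0; // high-water mark
  void allocate(uint64_t bytes) {
    used += bytes;
    if (used > peak)
      peak = used;
  }
};

// One allocation on each path of an if/else: whichever path runs, the peak
// never exceeds the larger of the two sizes, so it has a static bound.
uint64_t peakForBranch(bool cond, uint64_t thenBytes, uint64_t elseBytes) {
  ToyFrame f;
  if (cond)
    f.allocate(thenBytes);
  else
    f.allocate(elseBytes);
  return f.peak;
}

// One allocation per loop iteration: allocations accumulate, so the peak
// depends on the trip count and has no static bound in general.
uint64_t peakForLoop(uint64_t bytesPerIteration, uint64_t iterations) {
  ToyFrame f;
  for (uint64_t i = 0; i < iterations; ++i)
    f.allocate(bytesPerIteration);
  return f.peak;
}
```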
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | Hm ok, I guess we’re doomed then. |
| llvm/test/CodeGen/AMDGPU/amdpal-callable.ll | |
|---|---|
| 135 | For the use cases we care about in graphics, we should never have alloca outside of the function entry point, and therefore the stack frame size is always a constant. We could interpret "stack_frame_size_in_bytes" as the minimum stack frame size and add a boolean field "stack_frame_size_dynamic". We're not going to need it for a long time though. |
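A minimal sketch of how a consumer might read the proposed pair of fields. It assumes "stack_frame_size_in_bytes" becomes a lower bound and "stack_frame_size_dynamic" flags functions with dynamic allocations. The struct and function names are hypothetical, and the proposed field was not part of this patch. The fallback policy for dynamic functions is left to the driver.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-function record mirroring the proposal.
struct FunctionStackInfo {
  uint64_t stackFrameSizeInBytes; // "stack_frame_size_in_bytes" (lower bound)
  bool stackFrameSizeDynamic;     // proposed "stack_frame_size_dynamic"
};

// Returns the frame size when it is a usable static bound, or std::nullopt
// when the function allocates dynamically and the driver needs a fallback.
std::optional<uint64_t> staticFrameBound(const FunctionStackInfo &info) {
  if (info.stackFrameSizeDynamic)
    return std::nullopt;
  return info.stackFrameSizeInBytes;
}
```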
Rebase and add test with multiple allocations.
Summarizing comments here and via email:
- This does not work for dynamic allocations, e.g. in loops.
- Our frontend for the PAL subtarget (LLPC) currently does not generate dynamic allocations and will not in the foreseeable future.
- We could emit another field when we encounter dynamic allocations, but this is premature at this time.
- We already compute the scratch size for entry-point shaders the same way.
I hope this is good to go then.
We might need similar fields for the LDS usage of functions, but I don’t know the required format.
| llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | |
|---|---|
| 1260 | Seems like this should emit it for any non-entry convention for AMDPAL, not just AMDGPU_gfx |
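The reviewer's suggestion amounts to changing the condition from "calling convention is AMDGPU_Gfx" to "calling convention is not an entry convention". The sketch uses a hypothetical ToyCC enum that covers only a few of the real llvm::CallingConv IDs. In-tree code would presumably rely on an existing helper such as AMDGPU::isEntryFunctionCC instead of this hand-written switch.

```cpp
#include <cassert>

// Toy stand-in (hypothetical) for a few llvm::CallingConv IDs; the real set
// of AMDGPU entry conventions is larger.
enum class ToyCC { AMDGPU_VS, AMDGPU_PS, AMDGPU_CS, AMDGPU_KERNEL, AMDGPU_Gfx, C, Fast };

// Entry conventions: shader stages and kernels launched directly rather than
// called from other functions.
bool isEntryCC(ToyCC cc) {
  switch (cc) {
  case ToyCC::AMDGPU_VS:
  case ToyCC::AMDGPU_PS:
  case ToyCC::AMDGPU_CS:
  case ToyCC::AMDGPU_KERNEL:
    return true;
  default:
    return false;
  }
}

// Suggested condition: on AMDPAL, record every non-entry function in
// .shader_functions, not only those using AMDGPU_Gfx.
bool shouldEmitShaderFunction(bool isAMDPAL, ToyCC cc) {
  return isAMDPAL && !isEntryCC(cc);
}
```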