Page MenuHomePhabricator

[AMDGPU] Emit stack frame size in metadata

Authored by Flakebi on Oct 23 2020, 6:15 AM.



Add .shader_functions to pal metadata, which contains the stack frame
size for all functions.

Diff Detail

Event Timeline

Flakebi created this revision.Oct 23 2020, 6:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 6:15 AM
Flakebi requested review of this revision.Oct 23 2020, 6:15 AM

What's the point of reporting this?


Should include a test with a non-0 size, and a case with a variable sized stack object, and a case with transitively used stack from a callee

Flakebi updated this revision to Diff 300697.Oct 26 2020, 9:28 AM

Fix code and add more tests.

The goal of storing the stack size of functions is that the driver can compute the scratch size that needs to be allocated.
A user tells the driver which shaders/functions can be called and the maximum recursion depth. With these information, the driver can compute the maximum amount of scratch memory that can be needed.

arsenm added inline comments.Oct 26 2020, 10:57 AM

What do you mean unsupported? These do work (in SelectionDAG you can use a constant size alloca out side of the entry block to behave as-if)

Flakebi updated this revision to Diff 300949.Oct 27 2020, 4:41 AM
Flakebi marked an inline comment as done.

Ah, the dynamically sized alloca provoked a message that it is unsupported.
An alloca in a branch works fine.

arsenm added inline comments.Oct 29 2020, 1:22 PM

This doesn't really represent the unknown nature of the stack size, but I guess there isn't a key for this yet?

Flakebi added inline comments.Oct 30 2020, 2:22 AM

I guess it should be the maximum that the function needs at any point, so 0x10 sounds right?

foad added a subscriber: foad.Oct 30 2020, 2:27 AM
foad added inline comments.

Then what would it report if there was an alloca in a loop?

Flakebi updated this revision to Diff 301880.Oct 30 2020, 6:38 AM

Add test with loop

Flakebi added inline comments.Oct 30 2020, 6:39 AM

The same as with an if, I think allocas in a loop don’t stack, so there should be a maximum stack usage that can be statically computed.

foad added inline comments.Oct 30 2020, 6:48 AM

Yes they do stack. It's used for implementing standard(ish) C alloca().

Flakebi added inline comments.Oct 30 2020, 7:23 AM

Hm ok, I guess we’re doomed then.

nhaehnle added inline comments.Nov 4 2020, 8:42 AM

For the use cases we care about in graphics, we should never have alloca outside of the function entry point, and therefore the stack frame size is always a constant.

We could interpret "stack_frame_size_in_bytes" as the minimum stack frame size and add a boolean field "stack_frame_size_dynamic". We're not going to need it for a long time though.

sebastian-ne added a subscriber: sebastian-ne.

Rebase and add test with multiple allocations.

Summarizing comments here and via email:

  • This does not work for dynamic allocations, e.g. in loops
  • Our frontend for the pal subtarget (llpc) does currently not generate dynamic allocations and will not in the foreseeable future
  • We could emit another field when we encounter dynamic allocations but this premature at this time
  • We already use the same way to compute scratch size for entrypoint shaders

I hope this is good to go then.

We might need similar fields for LDS usage of functions but I don’t know the needed format.

Friendly ping for review

arsenm added inline comments.Nov 18 2020, 9:17 AM

Seems like this should emit it for any non-entry convention for AMDPAL, not just AMDGPU_gfx

Thank you for the reviews.
Emit metadata for all functions, not only amdgpu_gfx.

arsenm accepted this revision.Nov 25 2020, 6:17 AM
This revision is now accepted and ready to land.Nov 25 2020, 6:17 AM
This revision was landed with ongoing or failed builds.Nov 25 2020, 7:48 AM
This revision was automatically updated to reflect the committed changes.