In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel meta data to identity kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks.
arsenm b-sumner bcahoon JonChesterfield
- Group Reviewers
- rG7a4968b5a378: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output
It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.
DynamicCallStack ends up getting set here:
Which is essentially just a check for recursion or indirect function calls/runtime polymorphism (i.e. virtual functions).
But the dynamic call stack information ends up being incorporated into CodeGen here:
I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.
The runtime allocates the requested size. If it has a bit saying "but use less if you know that's safe", then it can do nothing with that bit unless it has extra information. If it has that extra information, it doesn't need this bit.
Therefore instead of adding printing stuff related to this Boolean flag, we should delete the Boolean flag.
What's the use case I'm missing which makes this flag necessary/beneficial?
Post Code Object 5: If the compiler knows the required stack size of a kernel, it sets it and reports that to the runtime. The runtime then uses that value.
If the compiler does not know the required amount, in the case of indirect function calls or recursion, it sets the dynamic stack use bit and reports the minimum required by the kernel to the runtime (the actual amount required by the kernel could be significantly more than the minimum).
If the value required by the kernel ends up being more than the minimum calculated by the compiler or the runtime default value, when the dynamic stack use bit is set, then the code will crash. hipDeviceSetLimit must then be used to raise the stack allocated by the compiler/runtime.
Identifying those cases where the kernels use dynamic stack, and thus developers need to consider the value set by the hipDeviceSetLimit API, can actually be somewhat difficult if the kernel is buried under many layers of templates. We're already reporting remarks about compiler identified/reported scratch use and we've been asked if we can report this flag as part of these remarks.
I suppose reporting out the minimum the compiler calculates might also be useful here but don't feel strongly one way or the other about that.
That was the old broken workaround for the old bit that was never actually implemented in the runtime. The runtime now does properly respect some field to switch to interpreting the reported size as a minimum and then allocates the max of that minimum and some API provided size value
Print right after the scratch size
don't need std::string for simple literals? StringRef?
Maybe add another that has a static component too