In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel meta data to identity kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks.
Details
- Reviewers
arsenm b-sumner bcahoon JonChesterfield - Group Reviewers
Restricted Project - Commits
- rG7a4968b5a378: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
What is this bit used for? Codegen and debug info emission don't need it. I'm not sure what sets CurrentProgramInfo.DynamicCallStack at present either.
It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.
DynamicCallStack ends up getting set here:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L706
Which is essentially just a check for recursion or indirect function calls/runtime polymorphism (i.e. virtual functions).
But the dynamic call stack information ends up being incorporated into CodeGen here:
I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.
The runtime allocates the requested size. If it has a bit saying "but use less if you know that's safe", then it can do nothing with that bit unless it has extra information. If it has that extra information, it doesn't need this bit.
Therefore instead of adding printing stuff related to this Boolean flag, we should delete the Boolean flag.
What's the use case I'm missing which makes this flag necessary/beneficial?
Post Code Object 5: If the compiler knows the required stack size of a kernel, it sets it and reports that to the runtime. The runtime then uses that value.
If the compiler does not know the required amount, in the case of indirect function calls or recursion, it sets the dynamic stack use bit and reports the minimum required by the kernel to the runtime (the actual amount required by the kernel could be significantly more than the minimum).
If the value required by the kernel ends up being more than the minimum calculated by the compiler or the runtime default value, when the dynamic stack use bit is set, then the code will crash. hipDeviceSetLimit must then be used to raise the stack allocated by the compiler/runtime.
Identifying those cases where the kernels use dynamic stack, and thus developers need to consider the value set by the hipDeviceSetLimit API, can actually be somewhat difficult if the kernel is buried under many layers of templates. We're already reporting remarks about compiler identified/reported scratch use and we've been asked if we can report this flag as part of these remarks.
I suppose reporting out the minimum the compiler calculates might also be useful here but don't feel strongly one way or the other about that.
That was the old broken workaround for the old bit that was never actually implemented in the runtime. The runtime now does properly respect some field to switch to interpreting the reported size as a minimum and then allocates the max of that minimum and some API provided size value
clang/test/Frontend/amdgcn-machine-analysis-remarks.cl | ||
---|---|---|
22 | Print right after the scratch size | |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
1308 | don't need std::string for simple literals? StringRef? | |
llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll | ||
177 | Maybe add another that has a static component too |
"runtime" == "hip/opencl runtime". The openmp runtime needs to do the same if not being done already.
The metadata is also irrelevant to this patch which is just reporting optimization hint information
The openmp runtime needs to do the same if not being done already.
Should be on @carlo.bertolli's TODO list
Print right after the scratch size