This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This should remove the amdgpu-assume-external-call-stack-size and amdgpu-assume-dynamic-stack-object-size flags too
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
698–699 ↗ | (On Diff #439028) | Dynamic call size does not imply 0 for the statically reported size. We still need to report the minimum statically known requirement |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
698–699 ↗ | (On Diff #439028) | Does that mean we should do the following instead? diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp index f7f93c75c870..e6a5aa989c9d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp @@ -165,8 +151,6 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage( // Assume a big number if there are any unknown sized objects. Info.HasDynamicallySizedStack = FrameInfo.hasVarSizedObjects(); - if (Info.HasDynamicallySizedStack) - Info.PrivateSegmentSize += AssumedStackSizeForDynamicSizeObjects; if (MFI->isStackRealigned()) Info.PrivateSegmentSize += FrameInfo.getMaxAlign().value(); @@ -469,16 +453,12 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage( // directly call the tail called function. If a kernel directly // calls a tail recursive function, we'll assume maximum stack size // based on the regular call instruction. - CalleeFrameSize = - std::max(CalleeFrameSize, - static_cast<uint64_t>(AssumedStackSizeForExternalCall)); + CalleeFrameSize = 0; } } if (IsIndirect || I == CallGraphResourceInfo.end()) { - CalleeFrameSize = - std::max(CalleeFrameSize, - static_cast<uint64_t>(AssumedStackSizeForExternalCall)); + CalleeFrameSize = 0; |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
698–699 ↗ | (On Diff #439028) | Updated the revision with the above change. Let me know if we're good with the test changes. |
Do we need to worry about non HSA platforms here? Like the behavior of AMDGPUPALMetadata::setFunctionScratchSize(). Also, the calculation of SIProgramInfo::ScratchBlocks?
I think this should be consistent everywhere. The fixed size was always an undesirable hack. I'm not sure if the shaders have a defined way to represent this though
Will this patch cause HIP apps using the dynamic stack to fail if HIP runtime change is not in place?
Do we have a plan for a smooth transition of dynamic stack support in HIP runtime? Thanks.
- Rebase
- Updated the tests (GlobalISel/)non-entry-alloca.ll, recursion.ll, hsa-metadata-resource-usage-function-ordering.ll to check the dynamic stack
llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll | ||
---|---|---|
185–188 | There's a check for is_dynamic_callstack from v2 here |
- Get the new scratch reporting specific to v5+ as per internal discussion
- Not removing the -amdgpu-assume-* options now; Might need this when we run into problems
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
3558–3561 | This sentence doesn't read right at all. I don't know why this would be mentioned here and not just in the table with the description of the field |
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
3558–3561 | In amdgpu-amdhsa-kernel-descriptor-v3-table? We could do that; It's under the section "Code Object V3 Kernel Descriptor", can we add V5+ details there? |
llvm/docs/AMDGPUUsage.rst | ||
---|---|---|
3558–3561 | Can mention the poor support for unknown sized stacks |
Moving the doc change to amdgpu-amdhsa-kernel-descriptor-v3-table; I think we
should remove the "V3" bit from that section since the kernel descriptor applies
for later code object versions
This sentence doesn't read right at all. I don't know why this would be mentioned here and not just in the table with the description of the field