Page MenuHomePhabricator

[AMDGPU] Report minimum scratch size in code object v5 and later by default
ClosedPublic

Authored by abinavpp on Jun 22 2022, 8:08 AM.

Details

Summary

This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.

Diff Detail

Event Timeline

abinavpp created this revision.Jun 22 2022, 8:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 8:08 AM
abinavpp requested review of this revision.Jun 22 2022, 8:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 8:08 AM

This should remove the amdgpu-assume-external-call-stack-size and amdgpu-assume-dynamic-stack-object-size flags too

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
698–699 ↗(On Diff #439028)

Dynamic call size does not imply 0 for the statically reported size. We still need to report the minimum statically known requirement

abinavpp added inline comments.Jun 23 2022, 5:04 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
698–699 ↗(On Diff #439028)

Does that mean we should do the following instead?

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
index f7f93c75c870..e6a5aa989c9d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
@@ -165,8 +151,6 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage(
 
   // Assume a big number if there are any unknown sized objects.
   Info.HasDynamicallySizedStack = FrameInfo.hasVarSizedObjects();
-  if (Info.HasDynamicallySizedStack)
-    Info.PrivateSegmentSize += AssumedStackSizeForDynamicSizeObjects;
 
   if (MFI->isStackRealigned())
     Info.PrivateSegmentSize += FrameInfo.getMaxAlign().value();
@@ -469,16 +453,12 @@ AMDGPUResourceUsageAnalysis::analyzeResourceUsage(
             // directly call the tail called function. If a kernel directly
             // calls a tail recursive function, we'll assume maximum stack size
             // based on the regular call instruction.
-            CalleeFrameSize =
-              std::max(CalleeFrameSize,
-                       static_cast<uint64_t>(AssumedStackSizeForExternalCall));
+            CalleeFrameSize = 0;
           }
         }
 
         if (IsIndirect || I == CallGraphResourceInfo.end()) {
-          CalleeFrameSize =
-              std::max(CalleeFrameSize,
-                       static_cast<uint64_t>(AssumedStackSizeForExternalCall));
+          CalleeFrameSize = 0;
abinavpp updated this revision to Diff 439468.Jun 23 2022, 11:14 AM

Stop falling back to 16k (WIP)

abinavpp added inline comments.Jun 23 2022, 11:17 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
698–699 ↗(On Diff #439028)

Updated the revision with the above change. Let me know if we're good with the test changes.

abinavpp retitled this revision from [AMDGPU] Set scratch size to 0 if is_dynamic_callstack is set to [AMDGPU] Disable the private segment size fallback to 16k.Jun 24 2022, 3:39 AM
abinavpp edited the summary of this revision. (Show Details)

Do we need to worry about non HSA platforms here? Like the behavior of AMDGPUPALMetadata::setFunctionScratchSize(). Also, the calculation of SIProgramInfo::ScratchBlocks?

Do we need to worry about non HSA platforms here? Like the behavior of AMDGPUPALMetadata::setFunctionScratchSize(). Also, the calculation of SIProgramInfo::ScratchBlocks?

I think this should be consistent everywhere. The fixed size was always an undesirable hack. I'm not sure if the shaders have a defined way to represent this though

Will this patch cause HIP apps using the dynamic stack to fail if HIP runtime change is not in place?

Do we have a plan for a smooth transition of dynamic stack support in HIP runtime? Thanks.

Will this patch cause HIP apps using the dynamic stack to fail if HIP runtime change is not in place?

Yes, The runtime changes should happen in advance. This is also a rocr change

arsenm added inline comments.Jun 29 2022, 11:33 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll
139 ↗(On Diff #439817)

This should now check the dynamic size fields

llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
185–188

Ditto for the rest

abinavpp updated this revision to Diff 442034.Jul 4 2022, 2:17 AM
  • Rebase
  • Updated the tests (GlobalISel/)non-entry-alloca.ll, recursion.ll, hsa-metadata-resource-usage-function-ordering.ll to check the dynamic stack
abinavpp marked an inline comment as done.Jul 4 2022, 2:19 AM
abinavpp added inline comments.
llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll
185–188

There's a check for is_dynamic_callstack from v2 here

arsenm accepted this revision.Jul 5 2022, 10:20 AM
This revision is now accepted and ready to land.Jul 5 2022, 10:20 AM
abinavpp updated this revision to Diff 462861.Sep 26 2022, 4:16 AM
abinavpp retitled this revision from [AMDGPU] Disable the private segment size fallback to 16k to [AMDGPU] Report minimum scratch size in code object v5 and later by default.
abinavpp edited the summary of this revision. (Show Details)
  • Get the new scratch reporting specific to v5+ as per internal discussion
  • Not removing the -amdgpu-assume-* options now; Might need this when we run into problems
arsenm added inline comments.Sep 26 2022, 7:25 AM
llvm/docs/AMDGPUUsage.rst
3558–3561

This sentence doesn't read right at all. I don't know why this would be mentioned here and not just in the table with the description of the field

abinavpp added inline comments.Sep 26 2022, 8:01 AM
llvm/docs/AMDGPUUsage.rst
3558–3561

In amdgpu-amdhsa-kernel-descriptor-v3-table? We could do that; It's under the section "Code Object V3 Kernel Descriptor", can we add V5+ details there?

arsenm added inline comments.Sep 26 2022, 8:04 AM
llvm/docs/AMDGPUUsage.rst
3558–3561

Can mention the poor support for unknown sized stacks

abinavpp updated this revision to Diff 462946.Sep 26 2022, 9:35 AM

Moving the doc change to amdgpu-amdhsa-kernel-descriptor-v3-table; I think we
should remove the "V3" bit from that section since the kernel descriptor applies
for later code object versions

arsenm accepted this revision.Sep 26 2022, 9:41 AM
This revision was landed with ongoing or failed builds.Sep 28 2022, 9:23 PM
This revision was automatically updated to reflect the committed changes.