This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Do not assume stack size for PAL code object indirect calls
ClosedPublic

Authored by bsaleil on May 15 2023, 1:37 PM.

Details

Summary

There is no need to set a big default stack size for PAL code object indirect calls. The driver knows the max recursion depth, so it can compute a more accurate value from the minimum scratch size.

Diff Detail

Event Timeline

bsaleil created this revision.May 15 2023, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2023, 1:37 PM
bsaleil requested review of this revision.May 15 2023, 1:37 PM

Is

llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124

Dynamic objects should be treated identically

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
2

Can you just add a run line to an existing test?

sebastian-ne added inline comments.May 16 2023, 2:32 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
121–123

Should use explicit brackets around the && clause.

122–124

I think the logic here is that we treat PAL code objects the same as AMDHSL code objects >= 5 (PAL code objects have no amdgpu_code_object_version).
If there is an indirect call, something outside the compiler needs to compute the stack size, in the case of PAL/graphics, that is the driver.

So probably we should use if (AMDGPU::getCodeObjectVersion(M) >= AMDGPU::AMDHSA_COV5 || STI.getTargetTriple().getOS() == Triple::AMDPAL) for both, AssumedStackSizeForDynamicSizeObjects and AssumedStackSizeForExternalCall

arsenm added inline comments.May 16 2023, 7:22 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124

Is some equivalent to the dynamic bit in the metadata emitted for pal?

sebastian-ne added inline comments.May 16 2023, 7:31 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124

We emit per-function metadata (https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdpal-code-object-shader-function-map-table).
An application needs to specify the recursion depth it uses and the driver uses this information to compute the needed scratch allocation.

arsenm added inline comments.May 16 2023, 7:34 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
122–124

That metadata should be checked in the test

bsaleil updated this revision to Diff 525325.May 24 2023, 1:41 PM

Addressed review comments

bsaleil marked 6 inline comments as done.May 24 2023, 1:43 PM
arsenm added inline comments.May 24 2023, 2:19 PM
llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6

Doesn't check a dynamic-is-present metadata field?

arsenm requested changes to this revision.Jun 4 2023, 9:12 AM
This revision now requires changes to proceed.Jun 4 2023, 9:12 AM
bsaleil updated this revision to Diff 528459.Jun 5 2023, 8:36 AM

Add comment to the test case

llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6

We don't have such flag in PAL abi.

arsenm added inline comments.Jun 5 2023, 8:43 AM
llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6

Then that is a problem? You would need one to know you need to add some extra?

sebastian-ne added inline comments.Jun 5 2023, 8:59 AM
llvm/test/CodeGen/AMDGPU/resource-usage-pal.ll
6

The amount of scratch that needs to be allocated is computed outside of LLVM in the graphics driver. The compute equivalent would be the linker/loader that sees all the functions that are linked together and also gets additional data like the maximum recursion depth.
So, the only information needed in PAL metadata is the scratch usage of a function itself, without any callees.

arsenm accepted this revision.Jun 8 2023, 4:31 PM
This revision is now accepted and ready to land.Jun 8 2023, 4:31 PM
This revision was landed with ongoing or failed builds.Jun 12 2023, 7:15 AM
This revision was automatically updated to reflect the committed changes.

This patch is causing buildbot failures because the RUN line of the test case is invalid (missing colon symbol). This is now fixed by 9eea63bc9ccf5bd18a040cc028238ac4b49b77ea