This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis
ClosedPublic

Authored by jweightman on May 19 2022, 3:19 PM.

Details

Summary

The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed
that a function's callees were always analyzed prior to their callees.
When it was refactored into a module pass, this assumption no longer
always holds. This results in calls being erroneously identified as
indirect, and reserving private segment space for them. This results in
significantly slower kernel launch latency.

This patch changes the order in which the module's functions are analyzed
from the order in which they occur in the module to a post-order traversal
of the call graph. Perhaps Clang always generates the module's functions
in such an order, but this is not the case for the Cray Fortran compiler.

Diff Detail

Event Timeline

jweightman created this revision.May 19 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 3:19 PM
jweightman requested review of this revision.May 19 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 3:19 PM
arsenm added inline comments.May 20 2022, 1:25 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-resource-usage-function-ordering.ll
2

Can you use the explicit external size flag so this test breaks when it’s removed?

23

Checking the stack size is 0 isn’t going to be a reliable check since we’re going to stop adding an arbitrary constant for unknown stack size and reporting dynamic stack separately. I’d prefer to have some register count checks too

I updated the regression test to also check the behavior with external calls, and explicitly set the assumed external call stack size. I also included SGPR and VGPR register count assertions for all kernels in the test.

jweightman marked an inline comment as done.May 20 2022, 7:56 AM
jweightman added inline comments.
llvm/test/CodeGen/AMDGPU/hsa-metadata-resource-usage-function-ordering.ll
23

@arsenm I added the register count checks, but I'm not sure how to make the stack size check more reliable at the moment. Could you clarify a few questions, so I can improve the test?

we’re going to stop adding an arbitrary constant for unknown stack size and reporting dynamic stack separately

This sounds like unknown calls will contribute 0 to the private segment fixed size, and there will be some other field that describes dynamic stack usage separately. What will this field be called? What values will it be able to have, and what will they mean?

Also, if this isn't implemented yet, how can I write a check for it or future-proof this test now?

I saw that the test was failing on the automatic build, so it looks like the register counts may have changed recently. The register counts depend on the architecture for this test case, and I've updated the checks in the test to match.

arsenm added inline comments.Jun 3 2022, 6:54 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-resource-usage-function-ordering.ll
5

You can check the direct asm output. I don't see a reason to go through the disassembler here

23

The flag will be removed in the future, so simply using it should be enough since it will fail with unknown flag error.

arsenm added inline comments.Jun 3 2022, 7:02 AM
llvm/test/CodeGen/AMDGPU/hsa-metadata-resource-usage-function-ordering.ll
16–19

These aren't the most helpful register counts. Can you mark the functions internal, and add some artificial asm register uses to get a more helpful value?

arsenm accepted this revision.Jun 3 2022, 1:20 PM

LGTM

This revision is now accepted and ready to land.Jun 3 2022, 1:20 PM
This revision was landed with ongoing or failed builds.Jun 3 2022, 1:56 PM
This revision was automatically updated to reflect the committed changes.