This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output
ClosedPublic

Authored by crobeck on Jul 22 2023, 6:49 PM.

Details

Summary

In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel meta data to identity kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks.

Diff Detail

Event Timeline

crobeck created this revision.Jul 22 2023, 6:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2023, 6:50 PM
crobeck requested review of this revision.Jul 22 2023, 6:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2023, 6:50 PM
crobeck edited the summary of this revision. (Show Details)Jul 22 2023, 7:06 PM

What is this bit used for? Codegen and debug info emission don't need it. I'm not sure what sets CurrentProgramInfo.DynamicCallStack at present either.

crobeck added a comment.EditedJul 23 2023, 5:41 AM

What is this bit used for? Codegen and debug info emission don't need it. I'm not sure what sets CurrentProgramInfo.DynamicCallStack at present either.

It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.

DynamicCallStack ends up getting set here:

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L706

Which is essentially just a check for recursion or indirect function calls/runtime polymorphism (i.e. virtual functions).

But the dynamic call stack information ends up being incorporated into CodeGen here:

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp#L198

arsenm added inline comments.Jul 23 2023, 5:42 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1308

No reason to make this conditional on the version

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks_COV5.ll
1 ↗(On Diff #543241)

Can just add to the existing remark test. Also, -filetype=null instead of -o /dev/null

JonChesterfield added a comment.EditedJul 23 2023, 9:58 AM

It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.

I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.

The runtime allocates the requested size. If it has a bit saying "but use less if you know that's safe", then it can do nothing with that bit unless it has extra information. If it has that extra information, it doesn't need this bit.

Therefore instead of adding printing stuff related to this Boolean flag, we should delete the Boolean flag.

What's the use case I'm missing which makes this flag necessary/beneficial?

crobeck added a comment.EditedJul 23 2023, 10:36 AM

It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.

I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.

The runtime allocates the requested size. If it has a bit saying "but use less if you know that's safe", then it can do nothing with that bit unless it has extra information. If it has that extra information, it doesn't need this bit.

Therefore instead of adding printing stuff related to this Boolean flag, we should delete the Boolean flag.

What's the use case I'm missing which makes this flag necessary/beneficial?

Post Code Object 5: If the compiler knows the required stack size of a kernel, it sets it and reports that to the runtime. The runtime then uses that value.

If the compiler does not know the required amount, in the case of indirect function calls or recursion, it sets the dynamic stack use bit and reports the minimum required by the kernel to the runtime (the actual amount required by the kernel could be significantly more than the minimum).

If the value required by the kernel ends up being more than the minimum calculated by the compiler or the runtime default value, when the dynamic stack use bit is set, then the code will crash. hipDeviceSetLimit must then be used to raise the stack allocated by the compiler/runtime.

Identifying those cases where the kernels use dynamic stack, and thus developers need to consider the value set by the hipDeviceSetLimit API, can actually be somewhat difficult if the kernel is buried under many layers of templates. We're already reporting remarks about compiler identified/reported scratch use and we've been asked if we can report this flag as part of these remarks.

I suppose reporting out the minimum the compiler calculates might also be useful here but don't feel strongly one way or the other about that.

crobeck updated this revision to Diff 543311.EditedJul 23 2023, 11:04 AM
crobeck retitled this revision from [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output for CoV5 to [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output.

Remove conditional and consolidate new test into existing remarks test

crobeck updated this revision to Diff 543318.Jul 23 2023, 12:31 PM

Update clang frontend test

Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2023, 12:31 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
crobeck edited the summary of this revision. (Show Details)Jul 23 2023, 12:32 PM
crobeck marked an inline comment as done.

I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.

That was the old broken workaround for the old bit that was never actually implemented in the runtime. The runtime now does properly respect some field to switch to interpreting the reported size as a minimum and then allocates the max of that minimum and some API provided size value

arsenm added inline comments.Jul 24 2023, 9:15 AM
clang/test/Frontend/amdgcn-machine-analysis-remarks.cl
22

Print right after the scratch size

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1308

don't need std::string for simple literals? StringRef?

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
177

Maybe add another that has a static component too

I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.

That was the old broken workaround for the old bit that was never actually implemented in the runtime. The runtime now does properly respect some field to switch to interpreting the reported size as a minimum and then allocates the max of that minimum and some API provided size value

"runtime" == "hip/opencl runtime". The openmp runtime needs to do the same if not being done already.

crobeck updated this revision to Diff 543772.Jul 24 2023, 6:15 PM
crobeck edited the summary of this revision. (Show Details)

Change remark order and add static stack w/ indirect function call test case

crobeck marked 3 inline comments as done.Jul 24 2023, 6:15 PM
JonChesterfield resigned from this revision.Jul 25 2023, 6:46 AM
JonChesterfield added a subscriber: ronlieb.
arsenm accepted this revision.Jul 25 2023, 6:55 AM

lgtm with nits

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1298

Drop "Uses". Could also just inline the bool->string

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
189

No typed pointers

This revision is now accepted and ready to land.Jul 25 2023, 6:55 AM

What's the use case I'm missing which makes this flag necessary/beneficial?

The metadata is also irrelevant to this patch which is just reporting optimization hint information

The openmp runtime needs to do the same if not being done already.

Should be on @carlo.bertolli's TODO list

crobeck updated this revision to Diff 544049.Jul 25 2023, 11:17 AM
arsenm accepted this revision.Jul 25 2023, 11:18 AM
crobeck marked 2 inline comments as done.Jul 25 2023, 11:23 AM
This revision was landed with ongoing or failed builds.Jul 25 2023, 12:22 PM
This revision was automatically updated to reflect the committed changes.