This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Disable forceful inline of non-kernel functions which use LDS.
ClosedPublic

Authored by hsmhsm on Apr 14 2021, 7:54 AM.

Details

Summary

Now since LDS uses within non-kernel functions are being handled in the
pass - LowerModuleLDS, we *NO* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
pass - LowerModuleLDS is not enabled. It is enabled by default.

Diff Detail

Event Timeline

hsmhsm created this revision.Apr 14 2021, 7:54 AM
hsmhsm requested review of this revision.Apr 14 2021, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2021, 7:54 AM

I agree with the intent. This would be a good change to run through the internal CI before landing.

llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
129

What's REGION_ADDRESS? LowerModuleLDS doesn't to do anything with it so this looks like an incidental behaviour change

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
396

Why does the cl::opt instance initialize to true, while this initializes to false?

I'm not clear why the storage needs to become external here, but do see that the same thing has been done for the similar variables in this block.

I agree with the intent. This would be a good change to run through the internal CI before landing.

I agree to some extent that it will be good change to run through the internal CI before landing, but, not sure, why you did not take care of it while you were landing your original patch. Because, without this change, your patch is practically not much applicable.

Can we upstream it and watch out how your patch really performs with this change?

@arsenm what would be your suggestion here?

llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
129

Just like LOCAL_ADDRESS is specific to work-group, REGION_ADDRESS is specific to device, but it is not yet implemented in AMDHSA. Please have a look at the link - https://llvm.org/docs/AMDGPUUsage.html#address-spaces.

DS instructions can be used to access region memory too in addition to local memory. And, I guess, this is the region that code for local memory and region memory co-exist together at some places.

@arsenm might give better answer here about if it will result in an incidental behaviour change.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
396

cl::opt instance initialize it to true, because, by default the flag is enabled, and also this same instance make sure that AMDGPUTargetMachine::EnableLowerModuleLDS is correctly initialized to default true. Please have a look at cl::location(AMDGPUTargetMachine::EnableLowerModuleLDS).

However, since AMDGPUTargetMachine::EnableLowerModuleLDS is a static variable, it is initialized to "false" at class scope.

The storage needs to be external, because it needs to used at different TU, like this one used within AMDGPUAlwaysInlinePass.cpp.

For example, please take a look at the flag EnableAMDGPUFunctionCallsOpt, and its corresponding static member - AMDGPUTargetMachine::EnableFunctionCalls.

I just followed the same philosophy here.

JonChesterfield requested changes to this revision.Apr 14 2021, 9:51 AM

Change requested is to not change the handling of region_address. I haven't looked up what that is but I am sure the module lds pass doesn't do anything with it.

llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
129

Let's refactor this so that it doesn't change behaviour for REGION_ADDRESS, as otherwise we're introducing at least one of confusion or a latent bug.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
396

I would say you've copied a mistake from enable function calls. Look at EnableFixedFunctionABI for a case where the initializer is consistent between the two places.

It looks like none of these need explicit initializers to me, but if we're going to have one, lets have it match the default for the command line argument.

This revision now requires changes to proceed.Apr 14 2021, 9:51 AM
hsmhsm updated this revision to Diff 337504.Apr 14 2021, 11:22 AM

Fixed review comments by Jon.

hsmhsm marked 2 inline comments as done.Apr 14 2021, 11:23 AM
JonChesterfield accepted this revision.Apr 14 2021, 11:48 AM

LG, thanks

This revision is now accepted and ready to land.Apr 14 2021, 11:48 AM
t-tye added inline comments.Apr 14 2021, 1:12 PM
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
129

REGION maps to the AMD GPU GDS (Global Data Store) scratchpad memory hardware. It is basically like LDS (Local Data Store) and uses the same ds instructions, but is shared by all waves on the same device.

The ROCm runtime has not implemented it. An earlier LLVM fork did have some support I believe. There are numerous places in AMD GPU LLVM that do have support for GDS such as in the SIMemoryLegalizer. It is described at https://llvm.org/docs/AMDGPUUsage.html#address-spaces .

llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
129

Thanks! I know the term GDS but hadn't guessed the connection to REGION. It is indeed in the address spaces table.

Good to know that's what it referred to, the above is still correct. We do not want to change the handling of GDS based on whether a pass that only changes LDS has run.

Graphics does use region, so we can't break that

This revision was landed with ongoing or failed builds.Apr 14 2021, 8:44 PM
This revision was automatically updated to reflect the committed changes.