Now since LDS uses within non-kernel functions are being handled in the
pass - LowerModuleLDS, we *NO* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
pass - LowerModuleLDS is not enabled. It is enabled by default.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I agree with the intent. This would be a good change to run through the internal CI before landing.
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
129 | What's REGION_ADDRESS? LowerModuleLDS doesn't to do anything with it so this looks like an incidental behaviour change | |
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
396 | Why does the cl::opt instance initialize to true, while this initializes to false? I'm not clear why the storage needs to become external here, but do see that the same thing has been done for the similar variables in this block. |
I agree to some extent that it will be good change to run through the internal CI before landing, but, not sure, why you did not take care of it while you were landing your original patch. Because, without this change, your patch is practically not much applicable.
Can we upstream it and watch out how your patch really performs with this change?
@arsenm what would be your suggestion here?
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
129 | Just like LOCAL_ADDRESS is specific to work-group, REGION_ADDRESS is specific to device, but it is not yet implemented in AMDHSA. Please have a look at the link - https://llvm.org/docs/AMDGPUUsage.html#address-spaces. DS instructions can be used to access region memory too in addition to local memory. And, I guess, this is the region that code for local memory and region memory co-exist together at some places. @arsenm might give better answer here about if it will result in an incidental behaviour change. | |
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
396 | cl::opt instance initialize it to true, because, by default the flag is enabled, and also this same instance make sure that AMDGPUTargetMachine::EnableLowerModuleLDS is correctly initialized to default true. Please have a look at cl::location(AMDGPUTargetMachine::EnableLowerModuleLDS). However, since AMDGPUTargetMachine::EnableLowerModuleLDS is a static variable, it is initialized to "false" at class scope. The storage needs to be external, because it needs to used at different TU, like this one used within AMDGPUAlwaysInlinePass.cpp. For example, please take a look at the flag EnableAMDGPUFunctionCallsOpt, and its corresponding static member - AMDGPUTargetMachine::EnableFunctionCalls. I just followed the same philosophy here. |
Change requested is to not change the handling of region_address. I haven't looked up what that is but I am sure the module lds pass doesn't do anything with it.
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
129 | Let's refactor this so that it doesn't change behaviour for REGION_ADDRESS, as otherwise we're introducing at least one of confusion or a latent bug. | |
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
396 | I would say you've copied a mistake from enable function calls. Look at EnableFixedFunctionABI for a case where the initializer is consistent between the two places. It looks like none of these need explicit initializers to me, but if we're going to have one, lets have it match the default for the command line argument. |
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
129 | REGION maps to the AMD GPU GDS (Global Data Store) scratchpad memory hardware. It is basically like LDS (Local Data Store) and uses the same ds instructions, but is shared by all waves on the same device. The ROCm runtime has not implemented it. An earlier LLVM fork did have some support I believe. There are numerous places in AMD GPU LLVM that do have support for GDS such as in the SIMemoryLegalizer. It is described at https://llvm.org/docs/AMDGPUUsage.html#address-spaces . |
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
129 | Thanks! I know the term GDS but hadn't guessed the connection to REGION. It is indeed in the address spaces table. Good to know that's what it referred to, the above is still correct. We do not want to change the handling of GDS based on whether a pass that only changes LDS has run. |
What's REGION_ADDRESS? LowerModuleLDS doesn't to do anything with it so this looks like an incidental behaviour change