This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Implement dynamic LDS accesses from non-kernel functions
ClosedPublic

Authored by JonChesterfield on Feb 16 2023, 3:59 PM.

Details

Summary

The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new extern char[0] variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 3:59 PM
JonChesterfield requested review of this revision.Feb 16 2023, 3:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 3:59 PM
JonChesterfield edited the summary of this revision. (Show Details)Feb 16 2023, 4:06 PM
JonChesterfield edited the summary of this revision. (Show Details)
JonChesterfield edited the summary of this revision. (Show Details)
  • delete a print statement
JonChesterfield planned changes to this revision.Feb 25 2023, 2:23 AM

Needs to be rebased on D144221

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
227

@arsenm this hooks the same absolute_symbol machinery implemented in D144221 to allow building a table out of dynamic lds addresses in the assembler

Looking at AMDGPUPromoteAlloca.cpp, the spilling to LDS is disabled for any function which accesses any global that represents dynamic LDS.

This approach will leave a use of dynamic LDS in every kernel that calls any function that accesses dynamic LDS. That will disable spilling for that function. Thus it will be possible to accurately predict where dynamic LDS will start for that kernel.

For non-kernel functions, spilling to LDS is presently disabled as it used to be unreachable from functions.

Thus as currently implemented, dynamic LDS and spilling to LDS are disjoint. We can emit the metadata from IR and check it in codegen, very like in D144221.

We are probably leaving some performance on the table - functions that are only reachable from one kernel could spill to LDS for free - but that seems OK for now.

  • with frame setup, without tests, rebased on a refactored main
  • start testing, fix an inversion bug caught
JonChesterfield retitled this revision from [WIP][amdgpu] Implement dynamic LDS accesses from non-kernel functions to [amdgpu] Implement dynamic LDS accesses from non-kernel functions.Mar 23 2023, 4:49 PM
JonChesterfield edited the summary of this revision. (Show Details)
  • disable the always inlining pass while testing
  • fixes to frame lowering
  • fixes to frame lowering
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
130

This change (and the associated test change) are not intended to land as part of this revision

Passed internal CI with the always inliner patched (so that tests actually go down this code path) at 829538

arsenm added inline comments.Mar 30 2023, 6:21 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
188

I think this is call is only adding confusion. Just GV->getAddressSpace() == local && !GV->hasInitializer?

I also don't think checking the initializer is quite right. AMDGPUPromoteAlloca checks for GV->hasExternalLinkage() && AllocSize == 0.

JonChesterfield added inline comments.Apr 3 2023, 5:16 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
188

Thanks! There's some dead code in utils to remove as well, fixing

JonChesterfield edited the summary of this revision. (Show Details)Apr 3 2023, 5:17 AM
  • move isDynamicLDS into util, make consistent with promotealloca
  • move isDynamicLDS into util, make consistent with promotealloca

lds-frame-extern.ll test interacts badly with the always inline pass which doesn't appear to have a hook to disable it. Considering what to do about that, would rather separate the inliner change from the additional lowering, might move that test case change into a separate patch.

It might be better to move the frame lowering code around separate to this patch, will see how that looks

arsenm accepted this revision.Apr 3 2023, 12:08 PM

LGTM with nits

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
900

Typo ‘dyanmically’

970

The GlobalVariable constructor uses Twine, you don’t need to convert it to std::string here

This revision is now accepted and ready to land.Apr 3 2023, 12:08 PM
  • move isDynamicLDS into util, make consistent with promotealloca
  • review comments
This revision was landed with ongoing or failed builds.Apr 4 2023, 12:07 PM
This revision was automatically updated to reflect the committed changes.