This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
ClosedPublic

Authored by JonChesterfield on Jun 4 2022, 9:41 AM.

Details

Summary

A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.

This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.

Diff Detail

Event Timeline

JonChesterfield created this revision.Jun 4 2022, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2022, 9:41 AM
JonChesterfield requested review of this revision.Jun 4 2022, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2022, 9:41 AM
JonChesterfield added inline comments.Jun 4 2022, 9:43 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
131

this is both a negligible optimisation and decouples the layout from whether dynamic lds alignment is specified, which is useful for calculating where the variable is going to be from a context that doesn't know whether dynamic lds is in play

Hoping this is uncontroversial - got more patches to follow up with if we can land this,

Any tests?

It is not immediately obvious it is NFC, especially around alignment.

I think the layout this chooses is always one of the layouts that could have been picked by the iteration order before. Dynamic LDS alignment gets set when encountering the dynamic variable which could be after the other two were allocated.

I'll write some IR to asm cases that would previously fail non-deterministically but will now pass reliably.

arsenm added inline comments.Jun 6 2022, 2:48 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

Need to be careful about multiple anonymous functions

scchan added a comment.Jun 7 2022, 8:08 AM

Hoping this is uncontroversial - got more patches to follow up with if we can land this,

This patch essentially turns the module scope LDS layout struct into creating a function scope thing for every kernel using global LDS?

Hoping this is uncontroversial - got more patches to follow up with if we can land this,

This patch essentially turns the module scope LDS layout struct into creating a function scope thing for every kernel using global LDS?

No. A patch by Stas some months ago adapted the module scope LDS lowering to stash remaining kernel variables in a per-kernel struct.

All this patch does, if that per-kernel struct is present, is put it at a predictable address instead of leaving it to the whims of block iteration order.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

How so? By the time they're functions in IR they have unique names as far as I know. If they don't, the current lowering is already broken for them as it assumes a struct can be uniquely associated with a function by deriving the variable name from the function name

arsenm added inline comments.Jun 10 2022, 8:09 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

They don't have to have a name at all. If you have kernels "@0" and "@1" they'll get the same thing here

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

That's exciting. Broken at present, AMDGPULowerModuleLDSPass creates the kernel-specific variable by passing Twine("llvm.amdgcn.kernel.") + F->getName() + ".lds" to StructType::Create() with a suffix ".t".

I'd be inclined to say the right fix for that is to hard error on them on the basis that anonymous kernels are difficult to invoke by name, but failing that we could assign them names in LowerModuleLDS.

Or we could stop using string equivalence to bind a function and a struct together and do something along the lines of metadata instead which should be more robust to things like renaming the kernel.

arsenm added inline comments.Jun 10 2022, 9:09 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

You could do what the final emission does, and use the name out of Mangler::getNameWithPrefix

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

Can I postpone fixing that until after this patch, given it's already broken independent of this patch?

JonChesterfield added inline comments.Aug 9 2022, 7:48 AM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
91

Mangler::getNameWithPrefix is a local state thing - different instances of Mangler will assign different values. We could value::setName the anonymous function but that'll be a change visible outside of LDS and feels inherently wrong.

Moved the anonymous function problem to D134741, this patch neither creates nor addresses it.

  • Wrote test case, discovered control flow can be simpler, updated implementation
arsenm accepted this revision.Sep 27 2022, 1:02 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll
3

Should use -mtriple=amdgcn-amd-amdhsa, the checks are using the old style mesa relocations for stack

4–6

Is there a real reason to test this with all the targets?

This revision is now accepted and ready to land.Sep 27 2022, 1:02 PM
llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll
3

Nice, thanks. Dropping to gfx10 only as well, makes the test easier to read.

  • triple to hsa, drop to single target
This revision was landed with ongoing or failed builds.Sep 28 2022, 6:55 AM
This revision was automatically updated to reflect the committed changes.