This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][lds] Use amdgpu-lds-size instead of llvm.donothing
Needs ReviewPublic

Authored by JonChesterfield on Jul 13 2023, 3:37 PM.

Details

Reviewers
arsenm
jmmartinez
Summary

LDS objects allocated by in kernels by LowerModuleLDS were marked with llvm.donothing
to ensure they had a use from the kernel, as opposed to only from called functions.

The backend does not need that explicit use after D155190. PromoteAlloca relies on that
use to estimate the total LDS allocated on other variables. Changing PromoteAlloca to use
the attribute directly is quicker and more accurate when available. It also removes the last
requirement for the llvm.donothing calls on static LDS.

llvm.donothing is still used for dynamic LDS alignment after this patch.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2023, 3:37 PM
JonChesterfield requested review of this revision.Jul 13 2023, 3:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2023, 3:37 PM
JonChesterfield edited the summary of this revision. (Show Details)Jul 13 2023, 3:45 PM

The "amdgpu-lds-size" attributes in tests will mostly disappear on rebase after landing D155190.

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll
183

^ this is curious, looks like llvm.donothing affects whether a function is considered speculatable.

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
904

Most of this iteration and search for LDS could be dropped for kernels that are transformed by LowerModuleLDSPass if it added an attribute for dynamic lds alignment, along with amdgpu-lds-size.

arsenm added inline comments.Jul 13 2023, 3:57 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
1060

I thought it was weird only having a size attribute. But also, how does this help if the size is always rounded by the maximum dynamic alignment?

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
955

Is there now a path to introducing LDS outside of a kernel?

I'm slightly nervous about assuming amdgpu-lds-size is a source of truth in case something else introduces new LDS globals. Such as this pass, which is not updating the value

arsenm added inline comments.Jul 13 2023, 3:59 PM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
955

If it already pre-packed the globals into a single big one, won't the search find the case of 1 and work fine without this?

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
1060

Two things.

The promote alloca pass skips kernels that use dynamic lds. It does that by crawling globals. This use ensures it notices when kernels only use dynamic lds in some called function. An align attribute would give the same information without the search.

If something wants to append to the static frame, repeatedly, tracking dynamic alignment separately (aka alignment of the end of the frame) seems sensible.

I haven't totally thought this one through yet, it might suffice to have a bool do-not-append-more-lds per kernel.

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
955

PromoteAlloca sometimes adds a global accessed by a kernel. It could be extended to work on non-entry functions with some effort. That's currently the only thing downstream of lowermodulelds that might add a global.

amdgpu-frame-size is a reasonable contender for a source of truth. We can append to it and mark the new global with an absolute address to get something the backend handles correctly. The awkward case is dynamic lds, which currently needs to block appending.

I think the lowering pass could be reworked to compose cleanly based on that attribute and append-only model

I'm thinking of moving graphics to the same lowering as compute and then making any lds variable without an absolute address assigned before codegen a fatal error.

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
955

It misses the big struct if it isn't directly used by the kernel. This pass crawls the globals but not the call graph. Thus the donothing hack

Approach in D155384 is better for PromoteAlloca.