This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
ClosedPublic

Authored by JonChesterfield on Feb 16 2023, 2:46 PM.

Details

Summary

Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 2:46 PM
JonChesterfield requested review of this revision.Feb 16 2023, 2:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 2:46 PM
arsenm added inline comments.Feb 16 2023, 3:03 PM
llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
515 ↗(On Diff #498146)

F.getParent

599 ↗(On Diff #498146)

Ditto

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112

Changing the IR in a codegen pass is bad. Can you set this earlier?

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
515 ↗(On Diff #498146)

Nope, F.getParent will yield a const Module, need a non-const one in order to get a non-const GlobalVariable in order to call setMetadata on it. Otherwise would just call F.getParent within allocateKnownAddressLDSGlobal.

516 ↗(On Diff #498146)

Making the function argument non-const would be simpler for the allocateKnownAddress call but makes a mess of some of the call sites

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112

Setting metadata on the IR during codegen is not pretty but it's also inconsequential. The frame calculation this deletes is also quite ugly though.

For the existing cases we can set the metadata in the IR lowering (carefully, but accurately enough). It'll make changes to promote alloca fragile. That won't work for external/dynamic LDS as we don't know that address until the end of ISel.

So yes... if we want a hard no on setting metadata to record information learned during codegen, we can avoid it. Basically by lifting the remaining LDS lowering currently done by codegen wholly out of codegen. That'll change how dynamic LDS is handled from kernels which I was hoping to avoid.

If we're up for tagging variables with metadata at this point we can write the address chosen for dynamic LDS variables the same way and the lowerConstant path added here will do the right thing for them.

Created D144233 (which was supposed to be a diff relative to this patch but actually has all the line noise in it anyway).
That is not ready to go but shows how function scope external LDS can be layered on the same metadata scheme proposed here. Main difference is that we really don't know that address until codegen time, at least as presently implemented.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120

Note to self - this shouldn't be dead after this patch - find out what happened to the lowering path for the currently unused "direct" scheme

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120

I think a block was lost in a git merge - AMDGPUIselLowering should have a block in it like:

if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
  if (!MFI->isModuleEntryFunction()) {
    if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV)) {
      if (AMDGPUMachineFunction::isKnownAddressLDSGlobal(*GVar)) {
        unsigned Offset =
            AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(*GVar);
        return DAG.getConstant(Offset, SDLoc(Op), Op.getValueType());
      }
    }
  }
}

that allows direct access to kernel allocated variables from functions where the access is unambiguous. This highlights missing test coverage for the "kernel" lowering strategy (from IR to ISA). Bug is latent but will complicate getting rid of the calculateKnownAddressOfLDSGlobal function.

arsenm added inline comments.Feb 20 2023, 12:59 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112

So you're implying the promote alloca introduced LDS still has possibly unpredictable offsets assigned for it?

arsenm added inline comments.Feb 20 2023, 1:02 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120

So if you set the absolute_address in the IR pass, and LowerGlobalAddress respected it, how would that be less reliable?

JonChesterfield planned changes to this revision.Feb 23 2023, 6:25 AM
JonChesterfield added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112

Yep. It's a phase ordering problem.

promoteAlloca needs to know how much LDS is already allocated to estimate how much more it can use. Some of that might be allocated for non-kernel functions, so LowerModuleLDS runs before promoteAlloca to make those allocations visible to it.

LowerModuleLDS thus doesn't know how much (if any) extra will be introduced by promoteAlloca.At least, it doesn't at present.

Then the lowered frame looks like:
{

managed by lowermodulelds
introduced by promotealloca (and maybe other things? none I know of)
maybe alignment
dynamic lds goes here

}

Also if promotealloca introduces more than one lds variable (haven't checked, seems reasonable that it might) and introduces them as independent values with different alignment, then we're back to allocate-in-dag-traversal order, in which case the address of dynamic lds is unstable.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120

Would work identically for the module/kernel specific structures as they are deterministic.

Actually somewhat prettier to set those via metadata as it avoids the problem of codegen a function before the corresponding kernel. The above ^ would change to look at metadata on the variables.

That will unwind a little if something after lowermodulelds decides to change those structs, but that's manageable.

Even if we can't mutate the IR during codegen for dynamic lds, I think that's an objective improvement to the current lowering scheme involving repeated frame calculations.

  • Set the metadata in lowermodulelds - test noise update still todo
JonChesterfield edited the summary of this revision. (Show Details)Feb 25 2023, 2:03 AM
  • factor out the md parse
  • drop one function, move a second out of the header

This is still a NFC in terms of machine code but introducing the metadata during LDS lowering introduces a lot of churn into the tests, leaving updating those for a later patch.

Strategy of emitting metadata in the IR pass and then checking allocation matches it removes the frame calculation functions from AMDGPUMachineFunction and reduces (but doesn't quite eliminate) the function/struct symbol name string effort. It's probably possible to tag functions with the struct directly (using more metadata) to remove that entirely.

The primary drawback is that this uses metadata to convey information where we will miscompile if that metadata is removed, directly at odds with the design intent of metadata. However I claim that ship has sailed, and we already assume that some metadata makes it to the backend safely. That the lowering pass runs as part of codegen, not as part of per-TU opt, reduces the risk of another pass invalidating it.

Aside from the metadata-might-be-invalidated hazard, I consider this approach better than the frame calculation in every respect, and would have done it this way around originally if I'd noticed the absolute symbol construct before writing the frame calculation approach.

This is ready for code review - hopefully nothing too surprising in the implementation - but needs a large mechanical update to eight tests to introduce the metadata and change some integers.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
101

Function has four call sites across three files, think it's probably worth factoring out.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
108–109

Might be better to specialise this to take a global, check for absolute_symbol and rename the function to match. optional<uint32> getAbsoluteSymbolMetadata(const GlobalVariable *) perhaps.

The primary drawback is that this uses metadata to convey information where we will miscompile if that metadata is removed, directly at odds with the design intent of metadata.

absolute_symbol is one of the few cases for global metadata which you are specifically not allowed to drop as per the LangRef.

arsenm added inline comments.Mar 3 2023, 1:01 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
616

Don't like variables shadowing type names (also hate the LLVM capitalized naming convention). auto *IntTy?

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
104

Early return (or just move the null check to the caller).

The verifier should really be enforcing the operand count so every user doesn't need to check that it's 1. Should add the verifier check and drop the check

arsenm added inline comments.Mar 3 2023, 1:40 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
618

Going based on the langref, this is incorrect usage of absolute_symbol. It seems to expect 2 operands, indicating an absolute range. I assume you can specify specific address, specific address + 1 for a fixed address. Again demonstrates that the verifier really should check this

arsenm added inline comments.Mar 3 2023, 1:43 PM
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
104

Also there is already GlobalValue::getAbsoluteSymbolRange()

  • review comments, update tests
JonChesterfield marked 4 inline comments as done.Mar 9 2023, 6:37 PM

I think that's all comments fixed except for teaching the verifier about range metadata which should be done separately to this.

JonChesterfield edited the summary of this revision. (Show Details)Mar 9 2023, 6:41 PM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
826

This is right but moderately annoying to extend for D144233 - would rather land this as is and then amend

arsenm accepted this revision.Mar 10 2023, 4:29 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
158

This should probably be an attribute since it doesn't have the special case must-not-drop property but I guess it already was metadata

This revision is now accepted and ready to land.Mar 10 2023, 4:29 PM
This revision was landed with ongoing or failed builds.Mar 12 2023, 6:48 AM
This revision was automatically updated to reflect the committed changes.