This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][lds] Introduce LDS frame size function attribute
AbandonedPublic

Authored by JonChesterfield on Jul 12 2023, 3:03 PM.

Details

Reviewers
arsenm
jmmartinez
Summary

Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

As a transient state, in this patch, the backend _also_ recomputes the frame to check the values are
right, though it doesn't do anything with that information. A following patch will delete the redundant
checks. This is mostly so that CI will provide more comprehensive correctness checking and to improve
signal to noise in this diff.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 3:03 PM
JonChesterfield requested review of this revision.Jul 12 2023, 3:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 3:03 PM
  • drop spurious newline

As a standalone patch, this makes the compiler slightly slower and introduces a fatal error where there previously wasn't one. I'd still like to land it ahead of removing allocateKnownAddressLDSGlobal as it's easier to sanity check the test changes in isolation.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
1215

Currently this frame layout is passed in as hoc fashion based on kernel structs having a name derived from the corresponding kernel.

However the backend doesn't actually need the exact frame layout because the globals also have absolute_address metadata on them.

Introduce an attribute (name chosen to be similar to amdgpu-gds-size) which records the frame size.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
166

This frame logic will be deleted in a subsequent patch, leaving only amdgpu-lds-size as the lookup. At that point it can probably move to the function constructor and this allocate known address function can be deleted.

In the first instance, add a fatal error if the different accounting have gone awry. Landing this patch (and watching it go through internal CI) is supporting evidence that the two data representations are equivalent.

Should document what it means in AMDGPUUsage

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
224

Use getFnAttributeAsParsedInteger

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll
67

is this one going away?

  • use getFnAttributeAsParsedInteger

Should document what it means in AMDGPUUsage

Maybe? The attributes section says "...backend supports the following LLVM IR attributes" which sounds user facing. While it's possible to set the attributes&metadata by hand and disable the lowering pass, doing so would take a fair amount of reverse engineering. I'm happier leaving this undocumented.

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll
67

Yep.

JonChesterfield marked an inline comment as done.Jul 12 2023, 3:44 PM

The plan for the semantic change following this is:

1/ allocate everything as we do now in the IR pass
2/ add this attribute and allocate based on it in the backend
3/ lower any LDS with absolute metadata to that address provided it is within said frame
4/ other LDS lowered as it is now, in the slightly ad hoc appending fashion

At some point 4/ could usefully be removed. Need to move graphics over to the same scheme first and update promoteAllocaToLDS, at which point _all_ LDS variables will be accounted for by the lower module pass.

Variables with absolute addresses outside of the frame means a miscompile (e.g. the IR pass didn't request a large enough frame) so there's an error raised in D155132 for the current case where that can happen.

This will largely decouple the backend and IR pass. Anonymous functions will work fine. The IR pass can get more aggressive in data layout if we wish, e.g. multiple kernels using the same layout.

The backend's model will then be:
1/ allocate the amount of memory requested, probably in the machine function constructor
2/ put LDS variables at the absolute address they requested, error if that's inconsistent with 1/

Complexity and failure modes down. Information passed to backend via a function attribute (which shouldn't be discarded) and absolute_symbol metadata (which shouldn't be discarded), with everything else reified in the IR.

JonChesterfield updated this revision to Diff 539951.EditedJul 13 2023, 4:24 AM
  • rebase
  • Allocate LDS using the static frame scheme
JonChesterfield edited the summary of this revision. (Show Details)Jul 13 2023, 5:15 AM

D155190 is this with the recalculation deleted, instead of set up to catch regressions under CI. Preference is to land this, then rebase D155190, then land that.

Should document what it means in AMDGPUUsage

Maybe? The attributes section says "...backend supports the following LLVM IR attributes" which sounds user facing. While it's possible to set the attributes&metadata by hand and disable the lowering pass, doing so would take a fair amount of reverse engineering. I'm happier leaving this undocumented.

I think it should be documented. Not documenting it is no obstacle to people using it, and using it in however they decide it works. The backend should also try to tolerate existing uses of it, else we break the invariant that running a pass twice shouldn't break anything. We do document some other attributes people are better off not touching themselves as it is

JonChesterfield added a comment.EditedJul 13 2023, 6:52 AM

OK, that seems reasonable. Updating this patch with some documentation for what this is and how it works.

Worth noting that running LowerModuleLDS twice does not currently (and has never) worked, but it could be made to do so. Would be much easier to make running it multiple times safe if the change in this patch was adopted first, there's less state to rewrite.

Post this patch, it would be feasible to allow absolute_symbol metadata on global variables written in application code. That would allow DIY allocation for application developers. It make the data layout more complicated in LowerModuleLDS - what is currently an error would become an allocation constraint - but I can see it being a useful feature.

JonChesterfield abandoned this revision.Jul 13 2023, 1:11 PM

Abandoning in favour of D155190, which is the same as this patch but with deleting the existing logic instead of checking it hasn't been accidentally changed.