This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add MMOs for GFX11 Streamout Instructions
ClosedPublic

Authored by rovka on Mar 14 2023, 3:54 AM.

Details

Reviewers
arsenm
Group Reviewers
Restricted Project
Commits
rGd9bf8aba2371: [AMDGPU] Add MMOs for GFX11 Streamout Instructions
Summary

The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.

This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.

Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] https://github.com/llvm/llvm-project/blob/0107513fe79da7670e37c29c0862794a2213a89c/llvm/utils/TableGen/IntrinsicEmitter.cpp#L401

Diff Detail

Event Timeline

rovka created this revision.Mar 14 2023, 3:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 3:54 AM
rovka requested review of this revision.Mar 14 2023, 3:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 3:54 AM
rovka added a reviewer: Restricted Project.Mar 14 2023, 3:56 AM
foad added inline comments.
llvm/docs/AMDGPUUsage.rst
678

My gut feeling is that this should be an internal implementation detail of the compiler, should never be exposed to the user, so should not be documented here.

Adding @nhaehnle.

llvm/lib/Target/AMDGPU/AMDGPU.h
363

Maybe add a comment here saying "the following address spaces are internal-only and can be freely renumbered"? E.g. @Pierre-vh plans to add a new user-visible address space 8, but I think it should be fine to do that by renumbering STREAMOUT_REGISTER to 9 to make room.

I am assuming we will end up with more internal-only address spaces similar to this one.

arsenm added inline comments.Mar 14 2023, 7:00 AM
llvm/docs/AMDGPUUsage.rst
678

I do think we need to document any address space numbers used for any purpose. We should also probably reserve a range of address spaces for external software uses

foad added inline comments.Mar 14 2023, 8:00 AM
llvm/docs/AMDGPUUsage.rst
678

I was thinking this would be completely inaccessible from IR and only introduced during instruction selection. Do you agree? Do you still think it needs to be documented here? Why?

arsenm added inline comments.Mar 14 2023, 8:02 AM
llvm/docs/AMDGPUUsage.rst
678

It would still need to be documented as internally used for something so it doesn't get reused for something else

foad added inline comments.Mar 14 2023, 8:18 AM
llvm/docs/AMDGPUUsage.rst
678

But we can reuse the number for something else whenever we like - we can just renumber STREAMOUT_REGISTER to something different, with no externally visible effect.

t-tye added a subscriber: t-tye.Mar 14 2023, 6:19 PM
t-tye added inline comments.
llvm/docs/AMDGPUUsage.rst
788

My understanding is that address spaces are about memory, not registers. So why is an address space being used for this purpose? Is it that the values are being passed in memory?

llvm/lib/Target/AMDGPU/AMDGPU.h
363

Or could the internal address spaces start at a higher number so they do not have to be renumbered? AMDGPUUsage could then reserve a set of values to "internal use".

rovka added inline comments.Mar 15 2023, 2:47 AM
llvm/docs/AMDGPUUsage.rst
678

I kind of see the point that we don't really have an "LLVM IR Address Space Number" for this, since it doesn't show up in the IR. We could just document that there IS an address space for the Streamout registers (if indeed we decide we need one), but don't assign it a formal address space number in this doc.

788

I think the idea is that these registers are embedded into the GDS.

In principle, we could use the region address space and as far as the existing tests are concerned we'd get the same results, but the docs are pretty clear that these are not operating on GDS directly, so it seemed cleaner to have a new address space.

I'm open to other interpretations :)

foad added inline comments.Mar 15 2023, 3:44 AM
llvm/docs/AMDGPUUsage.rst
788

My understanding is that address spaces are about memory, not registers. So why is an address space being used for this purpose? Is it that the values are being passed in memory?

They are called "registers" in the documentation but as far as the ISA is concerned they act more like memory in a separate address space that does not interact with any other type of memory. The only instructions that access them take an offset into the "register" file and read or write 32 or 64 bits at that offset. The individual registers are not named. Read operations from the register file use LGKMcnt to track when the read has completed.

I guess all of this *could* be implemented by modelling them as individual registers, but my hunch is that it would be more complicated to implement for no practical benefit.

nhaehnle added inline comments.Mar 16 2023, 1:48 AM
llvm/docs/AMDGPUUsage.rst
678

I agree with @arsenm, we need to document our choice here to reduce the risk of conflicts.

Yes, in theory we could renumber if that happens, but isn't it better to try to prevent the problem in the first place? It's also helpful as internal documentation for anybody working on the backend.

788

Yeah, this is just a case of unlucky naming due to different worlds colliding. My understanding is that HW people ended up calling it "registers" because the actual HW implementation looks more like (GRBM) registers than memory, but from the shader code's perspective it very much behaves like memory.

foad added inline comments.Mar 16 2023, 3:48 AM
llvm/lib/Target/AMDGPU/AMDGPU.h
363

I am assuming we will end up with more internal-only address spaces similar to this one.

For example we currently use a GWSResource PseudoSourceValue to represent GWS sync resources, and they end up in address space 0, which is just plain wrong. It would be better to use a dedicated internal-only address space for them, at which point the PSV no longer adds any value and we could get rid of it.

rovka updated this revision to Diff 506558.Mar 20 2023, 5:57 AM

Moved the new address space to 128, and clarified that it is an internal address space that can be freely renumbered.

rovka updated this revision to Diff 506559.Mar 20 2023, 6:00 AM

It seems arc didn't want to update the revision Summary from my commit message, so I'll add this as a comment:
I'm also planning to add this paragraph to the commit message:

[...]
Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] https://github.com/llvm/llvm-project/blob/0107513fe79da7670e37c29c0862794a2213a89c/llvm/utils/TableGen/IntrinsicEmitter.cpp#L401

foad added a comment.Mar 20 2023, 6:39 AM

I wonder if you need to update getAliasResult in AMDGPUAliasAnalysis for the new address space.

rovka edited the summary of this revision. (Show Details)Mar 20 2023, 7:02 AM

I wonder if you need to update getAliasResult in AMDGPUAliasAnalysis for the new address space.

Hm, interesting, I'm not even sure how we create MemoryLocations for these intrinsics. I'll have a look.

scott.linder added inline comments.
llvm/docs/AMDGPUUsage.rst
678

I would also tend to agree that we should document it here, as the alternative is more confusing (i.e. seeing it documented but not needing it is worse than noticing it exists and finding no documentation).

The fact that it currently exists only transitively during ISEL also seems like an implementation detail, and so I don't think it must dictate whether we document it.

788

Is it worth noting the tenuous use of the term "registers" in the doc being commented on? Something like:

Dedicated registers used by the GS NGG Streamout Instructions. The register file is modelled as a memory in a distinct address space because it is indexed by an address-like offset in place of named registers, and because register accesses affect LGKMcnt.
foad added inline comments.Mar 21 2023, 2:10 AM
llvm/docs/AMDGPUUsage.rst
678

seeing it documented but not needing it

The risk is seeing it documented and trying to access it from IR and finding that that "doesn't work".

scott.linder added inline comments.Mar 21 2023, 8:44 AM
llvm/docs/AMDGPUUsage.rst
678

We can document it as we already do Generic, stating when it is meaningful. Or we can reserve a range of values for "internal" purposes and document them in another table, so there is no chance for confusion?

foad added inline comments.Mar 21 2023, 9:01 AM
llvm/docs/AMDGPUUsage.rst
678

Sure, if the documentation is clear then I have no objection to documenting it.

Do you know what the failure mode is if you try to compile IR that uses an invalid or unsupported address space?

rovka added inline comments.Mar 22 2023, 2:16 AM
llvm/docs/AMDGPUUsage.rst
678

I think it only fails during instruction selection.
At the IR level we don't seem to have a problem with unknown address spaces, in fact there's been some effort to let them just pass through. We have a handful of tests using addrspace(999), e.g. to check that AA treats it conservatively.

In our case, I could probably teach AMDGPU AA that calls to these GS intrinsics don't affect any pointers (WIP), but I'm not sure what we ought to do about IR pointers to internal address spaces. Should we error out anywhere before instruction selection, or just leave them undisturbed through the middle-end and ICE later on in isel? Should we check this in the IRVerifier? Other thoughts?

foad added inline comments.Mar 22 2023, 2:40 AM
llvm/docs/AMDGPUUsage.rst
678

"failed to select" seems like a reasonable failure mode for now.

Adding some kind of checks in the IR verifier sounds nice but I don't know what would be appropriate there. Do we even have a notion of which address spaces are valid for a particular target?

rovka added inline comments.Mar 22 2023, 3:10 AM
llvm/docs/AMDGPUUsage.rst
678

"failed to select" seems like a reasonable failure mode for now.

Ok :)

Adding some kind of checks in the IR verifier sounds nice but I don't know what would be appropriate there. Do we even have a notion of which address spaces are valid for a particular target?

I'm not aware of any. I skimmed a bit through the IR verifier and there's no support for validating pointers in general, but we usually have a Triple around so we could at least check that we don't load/store from internal address spaces for AMDGPU triples. How does that sound?

rovka updated this revision to Diff 508567.Mar 27 2023, 4:14 AM

I have expanded the section in AMDGPUUsage.rst. Hopefully this clarifies the intention, but do let me know if I anything could use rephrasing.

I had a stab at catching IR pointers to this address space in the IR Verifier, but AFAICT we shouldn't import lib/Target/AMDGPU/AMDGPU.h there, so we wouldn't have access to our address space enum. I don't think it's worth the effort of maintaining the internal address space numbering in yet another place, so I'm abandoning that idea.

Regarding the alias info for these intrinsics - we don't seem to be doing anything special for other intrinsics (we're not overriding AAResult::getModRefInfo at all). So that should probably be a different patch anyway.

arsenm accepted this revision.Mar 31 2023, 2:58 PM
This revision is now accepted and ready to land.Mar 31 2023, 2:58 PM
This revision was landed with ongoing or failed builds.Apr 11 2023, 2:12 AM
This revision was automatically updated to reflect the committed changes.