This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Properly mark MUBUF and FLAT LDS DMA instructions. NFC.
ClosedPublic

Authored by rampitec on Apr 26 2022, 12:39 PM.

Details

Summary

Add these bits to the MUBUF and FLAT LDS DMA instructions:

  • LGKM_CNT - these operate on LDS;
  • VALU - SPG 3.9.8: This instruction acts as both a MUBUF and VALU instruction;

Codegen currently does not produce any of this, so the change is NFC.

Event Timeline

rampitec created this revision. Apr 26 2022, 12:39 PM
Herald added a project: Restricted Project. Apr 26 2022, 12:39 PM
rampitec requested review of this revision. Apr 26 2022, 12:39 PM
Herald added a subscriber: wdng.
arsenm added inline comments. Apr 26 2022, 1:56 PM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

Loads and stores shouldn't be marked hasSideEffects, only mayLoad/mayStore

rampitec added inline comments. Apr 26 2022, 1:57 PM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

How do you suggest marking that it can read and write virtually any memory?
Note that atomics have the hasSideEffects flag.

arsenm added inline comments. Apr 26 2022, 1:58 PM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

They shouldn't have it set either. mayLoad or mayStore indicate this. The MMO may just not have meaningful pointer info.

rampitec updated this revision to Diff 425298. Apr 26 2022, 2:01 PM
rampitec marked an inline comment as done.
rampitec edited the summary of this revision.
arsenm accepted this revision. Apr 26 2022, 2:16 PM

Probably should fix any atomics marked as hasSideEffects

This revision is now accepted and ready to land. Apr 26 2022, 2:16 PM
This revision was landed with ongoing or failed builds. Apr 26 2022, 2:20 PM
This revision was automatically updated to reflect the committed changes.
foad added inline comments. Apr 27 2022, 1:58 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

So these pseudos should either have 0 MMOs (no info, conservatively correct) or 2 MMOs (one for the BUF access and one for the LDS). Having just 1 MMO would be wrong.
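
The 0-or-2 rule above can be sketched as a small check. This is a toy model only: ToyMMO and hasValidLdsDmaMMOCount are hypothetical names invented for illustration, not LLVM's MachineMemOperand or any existing verifier hook, and the address space numbers (1 for global, 3 for LDS) follow the ones mentioned later in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a machine memory operand (not LLVM's class).
struct ToyMMO {
  unsigned AddrSpace; // assumed numbering: 1 = global, 3 = LDS
  bool IsLoad;
  bool IsStore;
};

// Accepts an operand list only if it is empty (no info, conservatively
// correct) or has exactly two entries (one per side of the DMA).
inline bool hasValidLdsDmaMMOCount(const std::vector<ToyMMO> &MMOs) {
  return MMOs.empty() || MMOs.size() == 2;
}
```

With this check, an empty list and a {global load, LDS store} pair pass, while a single operand is rejected because it would describe only one side of the transfer.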

rampitec added inline comments. Apr 27 2022, 2:12 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

They should have 2 (I have no idea how to distinguish load and store for a DMA). But the reality is I cannot express this operation in terms of pointers and ranges. This is scatter on the load side and gather on the store side. Or vice versa for the opposite operation.

foad added inline comments. Apr 27 2022, 2:20 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

Right, but you can have an MMO with the appropriate address space, even if it has no pointer range. That still provides some info to alias analysis.
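
To illustrate why an address space alone is still useful, here is a minimal sketch of a conservative may-alias query. ToyAccess and mayAlias are hypothetical names, not LLVM's alias analysis API, and the sketch assumes that two different address spaces never overlap, which is a simplification made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one memory access (not an LLVM type).
struct ToyAccess {
  unsigned AddrSpace;  // assumed numbering: 1 = global, 3 = LDS
  bool HasRange;       // false = pointer and size are unknown
  std::uint64_t Base;  // meaningful only when HasRange is true
  std::uint64_t Size;  // meaningful only when HasRange is true
};

// Conservative query: answers "no" only when overlap is ruled out.
inline bool mayAlias(const ToyAccess &A, const ToyAccess &B) {
  // Assumption of this sketch: distinct address spaces are disjoint.
  if (A.AddrSpace != B.AddrSpace)
    return false;
  // Same address space but no range on one side: must assume overlap.
  if (!A.HasRange || !B.HasRange)
    return true;
  // Both ranges known: standard half-open interval overlap test.
  return A.Base < B.Base + B.Size && B.Base < A.Base + A.Size;
}
```

An access with no range in address space 3 is still known not to alias a global access in address space 1, while against another LDS access the query has to answer "may alias".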

rampitec added inline comments. Apr 27 2022, 2:30 AM
llvm/lib/Target/AMDGPU/BUFInstructions.td
518

I am not particularly good at discussing Schroedinger's cat. This does not exist since we do not produce it in IR. There is no way you can get it in MIR.

If you really want to think about this, start with the hazard recognizer and how it bails after classifying an instruction as a VALU or *BUF. That is what I am concerned with now.
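
One reading of the hazard recognizer remark, as a toy sketch: if classification returns at the first matching class, an instruction carrying both the VALU and MUBUF bits only gets one set of checks. Everything below is hypothetical (flag names, values, and both functions are invented for illustration) and is not the logic of LLVM's GCNHazardRecognizer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instruction flag bits; names echo the review, values are made up.
enum ToyFlags : std::uint32_t {
  ToyVALU = 1u << 0,
  ToyMUBUF = 1u << 1,
  ToyLGKM_CNT = 1u << 2
};

// Hypothetical sets of hazard checks to run for an instruction.
enum ToyChecks : std::uint32_t { ToyValuChecks = 1u << 0, ToyBufChecks = 1u << 1 };

// Early-return style: stops at the first class that matches.
inline std::uint32_t checksFirstMatch(std::uint32_t Flags) {
  if (Flags & ToyVALU)
    return ToyValuChecks;
  if (Flags & ToyMUBUF)
    return ToyBufChecks;
  return 0;
}

// Accumulating style: collects checks for every class that matches.
inline std::uint32_t checksAllMatches(std::uint32_t Flags) {
  std::uint32_t Checks = 0;
  if (Flags & ToyVALU)
    Checks |= ToyValuChecks;
  if (Flags & ToyMUBUF)
    Checks |= ToyBufChecks;
  return Checks;
}
```

For an instruction flagged ToyVALU | ToyMUBUF | ToyLGKM_CNT, the early-return version yields only the VALU checks, while the accumulating version yields both.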

This should have 2 separate MMOs, and the verifier should probably enforce that if we were to start producing this

Unfortunately the wait count pass uses MMO pointers, so we have to do it. I can imagine a base pointer, but I have no idea how much memory this touches. At the very least it depends on the WG size.

MMOs can set the size to unknown-size

Interesting that I can only create such MMOs during lowering. I do not see how an intrinsic call in IR can get 2 distinct MMOs. I have managed to get one, which is a 'load store' on address space 4. This is most certainly not a normal memory operation for llvm and not even a normal memory intrinsic. I cannot really express that an intrinsic reads addrspace 1 and writes addrspace 3. I think I have even figured out how much memory it reads and writes, but the data type is too wide to express: wave size * operation size. For a single dword that is v64i32, but it can be 4 times more.
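
The "wave size * operation size" arithmetic can be worked through in a few lines. The helper names are hypothetical, and modelling the "4 times more" case as four dwords per lane is an interpretation of the comment, not something stated in the review.

```cpp
#include <cassert>
#include <cstdint>

// Number of 32-bit elements touched by one instruction across the wave.
inline std::uint64_t ldsDmaI32Elements(unsigned WaveSize, unsigned DwordsPerLane) {
  return static_cast<std::uint64_t>(WaveSize) * DwordsPerLane;
}

// Same quantity in bytes (4 bytes per dword).
inline std::uint64_t ldsDmaBytes(unsigned WaveSize, unsigned DwordsPerLane) {
  return ldsDmaI32Elements(WaveSize, DwordsPerLane) * 4;
}
```

For a wave of 64 lanes and a single dword per lane this gives 64 elements (the v64i32 from the comment), or 256 bytes. With four dwords per lane it gives 256 elements, or 1024 bytes.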