This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Produce waitcounts for LDS DMA
ClosedPublic

Authored by rampitec on Apr 28 2022, 11:14 AM.

Details

Summary

MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written
can be accessed. A load from LDS to VMEM does not need a wait.

Diff Detail

Event Timeline

rampitec created this revision.Apr 28 2022, 11:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 11:14 AM
rampitec requested review of this revision.Apr 28 2022, 11:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 11:14 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Apr 28 2022, 11:37 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
502–503

isVALU is redundant with isMUBUF || isFLAT?

rampitec added inline comments.Apr 28 2022, 11:43 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
502–503

MUBUF and FLAT have VALU = 0 unless it is LDS DMA, which are both memory and VALU instructions. The only of that kind.

arsenm added inline comments.Apr 28 2022, 11:45 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
655

Swap these checks, Inst.mayStore() && ..

rampitec updated this revision to Diff 425869.Apr 28 2022, 11:48 AM
rampitec marked an inline comment as done.

Swapped checks.

rampitec added a comment.EditedApr 28 2022, 4:56 PM

I also feel like EXTRA_VGPR_LDS was made specifically for this scenario, when we have no VGPR result to track dependency. And it was unused and dead code. Unfortunately we practically do not have tests for this pass and this place is certainly not covered. If I remove isDS() call nothing fails, but probably it should not and it was dead. I cannot prove it though as I have no idea what was originally behind it. But that fits so well, so I guess it was DMA.

rampitec marked an inline comment as done.Apr 28 2022, 5:02 PM
foad added inline comments.Apr 29 2022, 2:11 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1100

Was this a pre-existing bug, even if you don't use DMA instructions? E.g. for DS_STORE followed by FLAT_STORE were we missing a wait on expcnt to guarantee WAW order?

rampitec added inline comments.Apr 29 2022, 2:57 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1100

I thought so too, but it didn't get it in this scenario before and it doesn't get it after because there is no pending VM event. I just believe if anything it shall be vmcnt, not expcnt, although I do not see that documented. For DMA the reason for vmcnt is that we shall wait for vmem to be read, although a wait on lgkmt is not needed (does not seem to be properly documented either). For a pure ds_write it is likely not needed because we don't touch VMEM.

rampitec added a comment.EditedApr 29 2022, 2:59 AM

A side question: is that possible to facilitate MIR syntax to use symbolic syntax for waitcounts instead of a raw number?

Quite literally I am using -start-before instead of -run-pass to see what is really produced while testing.

A side question: is that possible to facilitate MIR syntax to use symbolic syntax for waitcounts instead of a raw number?

You would have to add something new to support it. The closest existing thing is probably the comments inlineasm uses for the register class encoded by magic constants

arsenm accepted this revision.Apr 29 2022, 10:33 AM
This revision is now accepted and ready to land.Apr 29 2022, 10:33 AM
This revision was landed with ongoing or failed builds.Apr 29 2022, 11:14 AM
This revision was automatically updated to reflect the committed changes.