This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Reimplement the GFX11 early release VGPRs optimization
Closed · Public

Authored by foad on Jun 19 2023, 7:49 AM.

Details

Summary

Implement this optimization in SIInsertWaitcnts, where we already have
information about whether there might be outstanding VMEM store
instructions. This has the following advantages:

  • Correctly handles atomics-with-return.
  • Correctly handles call instructions.
  • Should be faster because it does not require running a separate pass.
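
As a rough illustration of the decision described above, here is a minimal, self-contained C++ sketch. It is not the code from this revision and does not use LLVM's real data structures: the `Kind` enum and the `shouldReleaseVGPRs` function are hypothetical names invented for this example. It assumes that only VMEM stores count toward the "outstanding store" state, that atomics-with-return and loads are tracked by a different counter, and that a call clears any pending stores because the callee is expected to wait for them before returning.

```cpp
#include <vector>

// Hypothetical instruction kinds for this toy model; not LLVM opcodes.
enum class Kind { VMemStore, VMemLoad, AtomicWithReturn, Call, Other };

// Returns true if a VMEM store might still be outstanding at the end of the
// instruction sequence, i.e. when sending the VGPR dealloc message before
// s_endpgm could be worthwhile in this simplified model.
bool shouldReleaseVGPRs(const std::vector<Kind> &Block) {
  bool MayHavePendingStore = false;
  for (Kind K : Block) {
    switch (K) {
    case Kind::VMemStore:
      // A store may still be in flight when the program ends.
      MayHavePendingStore = true;
      break;
    case Kind::Call:
      // Assumption: the callee waits for outstanding stores before returning,
      // so nothing issued before the call is still pending afterwards.
      MayHavePendingStore = false;
      break;
    case Kind::VMemLoad:
    case Kind::AtomicWithReturn:
    case Kind::Other:
      // Assumption: these do not affect the store counter in this model.
      break;
    }
  }
  return MayHavePendingStore;
}
```

In this model, a sequence ending in a store followed by unrelated instructions yields `true`, while a sequence containing only an atomic-with-return, or a store followed by a call, yields `false`. The real pass works on per-counter state across the whole function rather than on a flat list, so this only approximates the idea.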

Event Timeline

foad created this revision. Jun 19 2023, 7:49 AM
Herald added a project: Restricted Project. Jun 19 2023, 7:49 AM
foad requested review of this revision. Jun 19 2023, 7:49 AM
foad added inline comments. Jun 19 2023, 7:54 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.global.atomic.csub.ll
162

global_atomic_csub_u32 ... glc is an atomic-with-return which uses VMcnt. The hardware will wait until VMcnt==0 before sending the MSG_DEALLOC_VGPRS message, so there is no point sending it.

llvm/test/CodeGen/AMDGPU/call-argument-types.ll
4385

s_swappc_b64 is a call. The ABI says that the callee should wait for memory counters like VScnt to be 0 before returning, so there should be no outstanding stores at this point.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
594

The old AMDGPUReleaseVGPRs pass did not send the message here because the global_store_b32 above was not the last VGPR-using instruction before the s_endpgm. I don't understand why it was implemented that way. I think we do want to send the message, because VScnt might be non-zero here.

llvm/test/CodeGen/AMDGPU/waitcnt-preexisting-vscnt.mir
34

This is only different because we insert these instructions during the SIInsertWaitcnts pass, instead of in a separate pass.

stepthomas accepted this revision. Jun 19 2023, 8:32 AM

This looks so much simpler.

This revision is now accepted and ready to land. Jun 19 2023, 8:32 AM
nhaehnle added inline comments. Jun 19 2023, 8:57 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
594

Yes, makes sense to me.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll