Page MenuHomePhabricator

[AMDGPU] Optimize waitcnt insertion for flat memory operations
Needs ReviewPublic

Authored by t-tye on Sat, Oct 17, 12:25 AM.

Details

Summary

Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Diff Detail

Event Timeline

t-tye created this revision.Sat, Oct 17, 12:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptSat, Oct 17, 12:25 AM
t-tye requested review of this revision.Sat, Oct 17, 12:25 AM
arsenm added inline comments.Mon, Oct 19, 8:22 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1270

When is usesVM_CNT false and isFLAT true?

LGTM modulo that inline question.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1270

I'd like to know as well. It may be better to make this an assert(TII->usesVM_CNT(Inst)) instead.

rampitec added inline comments.Mon, Oct 19, 11:55 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1202

Could you add assert (TII->isFLAT(MI)) at the top of the function?

1270

Second to that, turn the check into assert. There are no such instructions at least so far.

llvm/test/CodeGen/AMDGPU/waitcnt.mir
67–68

That one was not supposed to change? The pointer is flat here.

rampitec added inline comments.Mon, Oct 19, 1:36 PM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1207

It should check for local or flat here.

t-tye updated this revision to Diff 299234.Mon, Oct 19, 6:35 PM
t-tye marked 4 inline comments as done.

Address review comments.

t-tye marked an inline comment as not done.Mon, Oct 19, 6:48 PM
t-tye added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1207

We do not want to test for local and flat as this method is checking to VMEM not LDS. Instead, we want to check for all the address spaces that are legal for a flat operation and involve VMEM. Looking at the enumeration of address spaces there are quite a few that are valid for flat that involve VMEM. Forxample, global, flat, contant, private, the ones involving buffers, etc. Since flat only supports LDS, FLAT, and the address spaces that involve VMEM the clearest test here is to find address spaces that are not LDS. They are the ones that may be VMEM.

I considered asserting if any address space was found that was not legal for a flat operation. For example region (GDS) is not valid. But is there an existing predicate to answer that question?

1270

This was in the original code but I agree it makes little sense. So moved it as an assert into mayAccessVMEMThroughFlat().

llvm/test/CodeGen/AMDGPU/waitcnt.mir
67–68

Yes. Previously it was "s_waitcnt vmcnt(0) lgkmcnt(0)". Now it is "s_waitcnt vmcnt(0)" as the address space of global16 is 1 which is GLOBAL. Therefore there is no need to wait on LGKM.

t-tye updated this revision to Diff 299238.Mon, Oct 19, 6:53 PM

Really upload the review feedback changes.

t-tye updated this revision to Diff 299252.Mon, Oct 19, 9:13 PM

Fix clang format warnings.