If we know the access isn't to a flat address,
the wait for LDS is not necessary.
Details
- Reviewers
• tstellarAMD
Event Timeline
lib/Target/AMDGPU/SIInsertWaits.cpp | ||
---|---|---|
221–225 | I'm not really sure exactly what this is doing, but as long as it accounts for the fact that the hardware LGKM counter is always incremented, even when the operation accesses global memory, this is fine. That said, I think you should add some tests that have LDS operations before and after a flat instruction that accesses global memory. |
lib/Target/AMDGPU/SIInsertWaits.cpp | ||
---|---|---|
221–225 | I don't think this is accounting for the hardware increment of the LGKM counter. |
lib/Target/AMDGPU/SIInsertWaits.cpp | ||
---|---|---|
221–225 | But does the hardware incrementing the LGKM counter matter? The hardware will increment it, then decrement it once it determines the FLAT address is targeting LDS. So all that can affect is another memory operation waiting on LGKM, causing it to wait a bit longer. It cannot make any other memory operation satisfy its WAITCNT early, so it cannot break correctness. The completion of a FLAT operation that is known to target only VMEM needs to wait only on the VMEM counter. |
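
To make the argument in the last comment concrete, here is a minimal sketch of the reasoning. It is a toy model, not the code in SIInsertWaits.cpp: `WaitMask`, `waitsForFlatAccess`, `waitSatisfied`, and `extraIncrementNeverReleasesEarly` are hypothetical names, and the model assumes a wait is satisfied once the outstanding count drops to a threshold or below. Under that assumption, a transient extra increment of LGKM can only delay a waiter, never release it early.

```cpp
#include <cassert>

// Hypothetical model for illustration only; none of these names come from
// SIInsertWaits.cpp.
struct WaitMask {
  bool VM;   // wait on the vector-memory counter
  bool LGKM; // wait on the LDS/GDS/constant/message counter
};

// Counters a consumer of a FLAT access must wait on, given what the compiler
// knows about the address. Assumption: "may access LDS" is the only fact
// that forces an LGKM wait; the VM wait is always required.
WaitMask waitsForFlatAccess(bool MayAccessLDS) {
  return WaitMask{true, MayAccessLDS};
}

// A waitcnt-style condition: satisfied once the number of outstanding
// operations has dropped to the threshold or below.
bool waitSatisfied(unsigned Outstanding, unsigned Threshold) {
  return Outstanding <= Threshold;
}

// Exhaustively checks, for small counts, that an over-counted value
// (Real + Extra) never satisfies a wait that the real count would not.
bool extraIncrementNeverReleasesEarly(unsigned MaxCount) {
  for (unsigned Real = 0; Real <= MaxCount; ++Real)
    for (unsigned Threshold = 0; Threshold <= MaxCount; ++Threshold)
      for (unsigned Extra = 0; Extra <= 2; ++Extra)
        if (waitSatisfied(Real + Extra, Threshold) &&
            !waitSatisfied(Real, Threshold))
          return false;
  return true;
}
```

The check only captures the monotonicity of the wait condition. It says nothing about when the hardware actually decrements LGKM, which is why the tests requested above (LDS operations before and after a flat access to global memory) are still worth adding.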