New Waitcnt Insertion Pass
Abandoned, Public

Authored by kzhuravl on Mon, Mar 20, 4:20 PM.

Details

Summary

This pass implements the algorithm used by our internal compiler for inserting waitcnt instructions. It performs cross-basic-block analysis, tracks individual registers, and is expected to improve performance over the current implementation.
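
As a rough illustration of what tracking individual registers can mean for a single in-order counter such as vmcnt, here is a minimal toy model. It is not code from this patch: the class `PendingTracker` and all of its members are hypothetical names, and the model assumes that operations on the counter complete in issue order.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy model of one in-order hardware counter. All names are hypothetical
// and exist only for illustration; this is not the pass's data structure.
class PendingTracker {
public:
  // Record an operation that will write Reg when it completes.
  void issueLoad(unsigned Reg) {
    ++Issued;
    LastWrite[Reg] = Issued;
  }

  // Largest counter value that still guarantees Reg has been written,
  // or std::nullopt if no wait is needed for Reg.
  std::optional<unsigned> waitNeededFor(unsigned Reg) const {
    auto It = LastWrite.find(Reg);
    if (It == LastWrite.end() || It->second <= Completed)
      return std::nullopt;
    // Operations issued after the one writing Reg may remain outstanding.
    return static_cast<unsigned>(Issued - It->second);
  }

  // Model the effect of waiting until at most N operations are outstanding.
  void applyWait(unsigned N) {
    if (Issued > N)
      Completed = std::max(Completed, Issued - N);
  }

private:
  uint64_t Issued = 0;    // operations issued so far
  uint64_t Completed = 0; // operations known to have completed
  std::unordered_map<unsigned, uint64_t> LastWrite; // Reg -> issue index
};
```

With three loads issued into registers 1, 2 and 3, a use of register 1 only requires waiting until two operations remain outstanding, not until the counter reaches zero. A cross-block version would also have to merge this state at control-flow joins, which the sketch does not attempt.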

Further improvements are forthcoming, including relaxing overly conservative assumptions about LDS access, integration with the memory model pass, and more targeted tests for corner cases.

kanarayan created this revision. Mon, Mar 20, 4:20 PM
kzhuravl commandeered this revision. Mon, Mar 20, 4:39 PM
kzhuravl edited reviewers, added: kanarayan; removed: kzhuravl.
kzhuravl abandoned this revision. Mon, Mar 20, 4:39 PM

Need to include llvm-commits.

rampitec added inline comments. Mon, Mar 20, 4:47 PM
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
74

Shouldn't we get these numbers from the target's TD files, which we already have?

1017

For a barrier it will always insert the strongest wait:

s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

regardless of the barrier's argument.

This also seems to completely ignore the atomic fences that the library inserts around the barrier, which should be the real source of the wait arguments. Note that the semantics of needWaitcntBeforeBarrier() are not that we always need to insert a wait with a barrier, but that we may need to insert one.

Also note that the existing pass does not seem to do this for a barrier.
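
One possible reading of the "may need" semantics, sketched as a toy model rather than as a proposed change to the patch: wait before the barrier only on counters that actually have something outstanding, and emit nothing when none do. The struct `PendingCounts`, the function `waitBeforeBarrier` and the flag `TargetMayNeedWait` are hypothetical names introduced for this illustration. The sketch does not model the atomic fences mentioned above.

```cpp
#include <string>

// Hypothetical summary of operations still outstanding at the barrier.
struct PendingCounts {
  unsigned VM = 0;   // vector memory operations
  unsigned EXP = 0;  // exports
  unsigned LGKM = 0; // LDS, GDS, constant and message operations
};

// Returns the wait to emit before the barrier, or an empty string if no
// wait is required. TargetMayNeedWait stands in for a target query that
// says a wait may be needed, not that one always is.
std::string waitBeforeBarrier(const PendingCounts &P, bool TargetMayNeedWait) {
  if (!TargetMayNeedWait)
    return std::string();

  // Name only the counters that have pending work. Counters left out of
  // the operand list are not waited on.
  std::string Operands;
  if (P.VM != 0)
    Operands += " vmcnt(0)";
  if (P.EXP != 0)
    Operands += " expcnt(0)";
  if (P.LGKM != 0)
    Operands += " lgkmcnt(0)";

  if (Operands.empty())
    return std::string();
  return "s_waitcnt" + Operands;
}
```

Under this model the full s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) is produced only when all three counters have pending operations.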

rampitec added inline comments. Mon, Mar 20, 4:56 PM
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1613

Need to check for XNACK support.

Please add a test with s_barrier.