New Waitcnt Insertion Pass

Authored by kzhuravl on Mar 20 2017, 4:20 PM.



This pass implements the algorithm deployed by our internal compiler for inserting waitcnt instructions. It performs cross-basic-block analysis, tracks individual registers, and is predicted to improve performance over the current implementation.
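To make "tracks individual registers" concrete, here is a minimal sketch of a per-register scoreboard for a single in-order counter such as vmcnt. It is an illustration under stated assumptions, not code from this patch: `CounterScoreboard`, `issue`, `requiredWait`, and `applyWait` are hypothetical names, and the merging of state across basic blocks is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical scoreboard for one in-order hardware counter (e.g. vmcnt).
// Each issued operation gets an increasing score; a register remembers the
// score of the last outstanding operation that writes it.
struct CounterScoreboard {
  uint32_t Upper = 0; // score of the most recently issued operation
  uint32_t Lower = 0; // every score <= Lower is known to have completed
  std::map<unsigned, uint32_t> RegScore; // register -> score of pending write

  // Record a new outstanding operation that will write DstReg.
  void issue(unsigned DstReg) { RegScore[DstReg] = ++Upper; }

  // Counter value to wait for before Reg may be read, or -1 if no wait is
  // needed. Because operations complete in order, waiting until at most
  // (Upper - score) operations remain guarantees the write has landed.
  int requiredWait(unsigned Reg) const {
    auto It = RegScore.find(Reg);
    if (It == RegScore.end() || It->second <= Lower)
      return -1;
    assert(It->second <= Upper && "score cannot exceed the newest issue");
    return static_cast<int>(Upper - It->second);
  }

  // Model the effect of a wait that leaves at most Count operations pending.
  void applyWait(uint32_t Count) {
    if (Count < Upper - Lower)
      Lower = Upper - Count;
  }
};
```

With three loads issued into registers 1, 2, and 3, reading register 1 only needs a wait down to two outstanding operations, whereas a scheme that does not track registers would have to wait for all three.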

Further improvements are forthcoming, including relaxing overly conservative assumptions about LDS access, integration with the memory model pass, and more targeted tests for corner cases.


kanarayan created this revision. Mar 20 2017, 4:20 PM
kzhuravl commandeered this revision. Mar 20 2017, 4:39 PM
kzhuravl edited reviewers, added: kanarayan; removed: kzhuravl.
kzhuravl abandoned this revision. Mar 20 2017, 4:39 PM

Need to include llvm-commits.

rampitec added inline comments. Mar 20 2017, 4:47 PM

Shouldn't we get these numbers from the TD files, which we already have for the target?


For a barrier it will always insert the strongest wait:

s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

regardless of the barrier's argument.

This also seems to completely ignore the atomic fences that the library inserts around the barrier, which should be the real source of the wait arguments. Note that the semantics of needWaitcntBeforeBarrier() are not that we always need to insert a wait with a barrier, but that we may need to insert one.

Also note that the existing pass does not seem to do this for a barrier.
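To illustrate the difference between "always" and "may", here is a minimal sketch contrasting the unconditional wait with one derived from what is actually outstanding. The names `PendingOps`, `conservativeBarrierWait`, and `barrierWaitFor` are hypothetical and do not appear in the patch. A real implementation would presumably take its requirements from the surrounding fences rather than from three booleans.

```cpp
#include <string>

// Hypothetical summary of which counters still have outstanding operations
// at the point where the barrier is reached.
struct PendingOps {
  bool VM = false;   // vector memory operations (vmcnt)
  bool Exp = false;  // export operations (expcnt)
  bool LGKM = false; // LDS/GDS/constant/message operations (lgkmcnt)
};

// The behaviour described above: the strongest wait, unconditionally.
std::string conservativeBarrierWait() {
  return "s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)";
}

// Alternative sketch: wait only on counters that have something pending,
// and emit nothing at all when no counter is outstanding.
std::string barrierWaitFor(const PendingOps &P) {
  std::string Args;
  if (P.VM)
    Args += " vmcnt(0)";
  if (P.Exp)
    Args += " expcnt(0)";
  if (P.LGKM)
    Args += " lgkmcnt(0)";
  return Args.empty() ? std::string() : "s_waitcnt" + Args;
}
```

In this sketch a barrier preceded only by LDS traffic would get `s_waitcnt lgkmcnt(0)`, and a barrier with nothing outstanding would get no wait.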

rampitec added inline comments. Mar 20 2017, 4:56 PM

Need to check for XNACK support.

Please add a test with s_barrier.