This pass implements the algorithm deployed by our internal compiler for inserting waitcnt instructions. The pass performs cross basic-block analysis and tracks individual registers, and provides predicted performance improvements over the current implementation.
There are further improvements forthcoming, including relaxing overtly conservative assumptions about LDS access, integration of memory model pass, and more targeted tests for the corners.
Should not we get these numbers from TD files for target which we already have?