This pass implements the algorithm used by our internal compiler for inserting waitcnt instructions. It performs analysis across basic blocks and tracks individual registers, and it is expected to improve performance over the current implementation.
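
As a rough illustration of per-register tracking, the sketch below models a single in-order counter and computes the wait value needed before a register is used. It is a toy model under stated assumptions, not the pass's actual data structures: the class `WaitcntScoreboard` and its methods are hypothetical names, it assumes operations on the counter complete in issue order, and it leaves out the merging of state across basic blocks.

```python
class WaitcntScoreboard:
    """Toy scoreboard for one in-order counter (hypothetical, for illustration)."""

    def __init__(self):
        self.issued = 0      # memory operations issued so far
        self.completed = 0   # newest issue index known complete after earlier waits
        self.pending = {}    # register -> 1-based issue index of the load writing it

    def issue_load(self, dest_reg):
        """Record a load whose result will land in dest_reg."""
        self.issued += 1
        self.pending[dest_reg] = self.issued

    def wait_needed(self, used_regs):
        """Return the counter value to wait for before using used_regs.

        Returns None when no wait is required. The returned value is the
        number of operations that may still be outstanding, i.e. those
        issued after the newest load feeding any of used_regs.
        """
        newest = max((self.pending.get(r, 0) for r in used_regs), default=0)
        if newest <= self.completed:
            return None
        count = self.issued - newest
        self.completed = newest
        # Loads at or before `newest` are complete once the wait is honored.
        self.pending = {r: i for r, i in self.pending.items() if i > newest}
        return count
```

With three loads issued into `v0`, `v1`, and `v2`, a use of `v1` needs a wait that leaves one operation outstanding; a later use of `v0` then needs no wait at all, because the earlier wait already covered it. Tracking registers individually is what allows that second wait to be skipped.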
Further improvements are forthcoming, including relaxing overly conservative assumptions about LDS access, integration with the memory model pass, and more targeted tests for corner cases.