[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks

Authored by msearles on May 29 2018, 10:39 AM.



When deciding whether a waitcnt needs to be inserted, the waitcnt pass forces convergence for a loop. Previously, forced convergence kicked in after more than 2 passes over a loop, which does not account for loops with many bottom blocks. This change increases the threshold to (n+1), where n is the number of bottom blocks. That gives the pass an opportunity to consider the contribution of each bottom block to the overall loop before forced convergence potentially kicks in.
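
The threshold change described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: the function names `oldShouldForceConvergence`, `forcedConvergenceThreshold`, and `shouldForceConvergence` are hypothetical, and the strict "greater than" comparison is inferred from the summary's wording.

```cpp
// Illustrative sketch only; all names here are hypothetical and are not
// taken from the actual waitcnt pass.

// Old behaviour as described in the summary: force convergence once the
// pass has made more than 2 passes over the loop.
constexpr bool oldShouldForceConvergence(unsigned Passes) {
  return Passes > 2;
}

// New threshold as described in the summary: (n + 1), where n is the
// number of bottom blocks of the loop.
constexpr unsigned forcedConvergenceThreshold(unsigned NumBottomBlocks) {
  return NumBottomBlocks + 1;
}

// Assumed comparison: force convergence only after more than (n + 1) passes.
constexpr bool shouldForceConvergence(unsigned Passes,
                                      unsigned NumBottomBlocks) {
  return Passes > forcedConvergenceThreshold(NumBottomBlocks);
}

// With a single bottom block the new rule matches the old one.
static_assert(shouldForceConvergence(3, 1) == oldShouldForceConvergence(3),
              "n = 1 reproduces the old threshold of 2");

// With three bottom blocks, a third pass no longer triggers the forced
// convergence, so each bottom block can contribute before it kicks in.
static_assert(!shouldForceConvergence(3, 3),
              "3 passes do not exceed the threshold of 4");
```

Under these assumptions, a loop with one bottom block behaves exactly as before, while a loop with more bottom blocks gets proportionally more passes before convergence is forced.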

Event Timeline

msearles created this revision. May 29 2018, 10:39 AM
This revision is now accepted and ready to land. May 29 2018, 10:43 AM
This revision was automatically updated to reflect the committed changes.