This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
Closed (Public)

Authored by msearles on May 29 2018, 10:39 AM.

Details

Summary

For waitcnt insertion (where necessary), the waitcnt pass forces convergence for a loop. Previously, forced convergence kicked in after more than 2 passes over a loop, which does not account for loops with many bottom blocks. This change raises the threshold to (n+1), where n is the number of bottom blocks. That gives the pass a chance to consider each bottom block's contribution to the overall loop before forced convergence can kick in.
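The effect of the new cap can be illustrated with a toy Python model. It is not the pass's actual code. The names `convergence_threshold`, `iterate_loop`, and `OLD_THRESHOLD` are hypothetical, and the model assumes that each pass over the loop folds in the pending-counter state of exactly one more bottom block (represented as a set of counter names). Under that assumption, n bottom blocks need n passes to merge, plus one pass that finds nothing new.

```python
# Toy model (hypothetical, not the real waitcnt pass) of the loop-iteration cap.

OLD_THRESHOLD = 2  # previous behaviour: force convergence after more than 2 passes


def convergence_threshold(num_bottom_blocks: int) -> int:
    """New cap: n + 1 passes, where n is the number of loop bottom blocks."""
    return num_bottom_blocks + 1


def iterate_loop(bottom_block_states, max_passes):
    """Iterate the loop-header state for at most `max_passes` passes.

    Simplifying assumption: pass k merges the state of bottom block k into the
    header. Returns (header_state, passes_used, forced).
    """
    header = set()
    for passes in range(1, max_passes + 1):
        if passes <= len(bottom_block_states):
            # Fold in one more bottom block's pending-counter state.
            header |= bottom_block_states[passes - 1]
        else:
            # Every bottom block has been merged and this pass adds nothing,
            # so the loop has converged on its own.
            return header, passes, False
    # Pass budget exhausted before a stable pass was seen: convergence is forced,
    # possibly with some bottom blocks' contributions still missing.
    return header, max_passes, True
```

With three bottom blocks, the n+1 cap of 4 lets all three states merge and one more pass confirm convergence, whereas the old cap of 2 forces convergence with the third block's state missing. For a single bottom block, both caps equal 2, so the behaviour for simple loops is unchanged.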


Event Timeline

msearles created this revision. May 29 2018, 10:39 AM
This revision is now accepted and ready to land. May 29 2018, 10:43 AM
This revision was automatically updated to reflect the committed changes.