This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: fix missing s_waitcnt
ClosedPublic

Authored by timcorringham on Nov 28 2017, 3:26 AM.

Details

Summary

The pass that inserts s_waitcnt instructions where needed propagated
info used to track dependencies for each block by iterating over the
predecessor blocks. The iteration was terminated when a predecessor
that had not yet been processed was encountered. Any info in blocks
later in the list was therefore not processed, leading to the
possiblility of a required s_waitcnt not being inserted.

The fix is simply to change the "break" to "continue" for the
relevant loops, so that all visited blocks are processed. This
is likely what was intended when the code was written.

There is no test case provided for this fix because:

  1. the only example that reproduces this is large and resistant to

being reduced

  1. the change is trivial

Event Timeline

timcorringham created this revision.Nov 28 2017, 3:26 AM
msearles accepted this revision.Nov 29 2017, 12:24 PM

I'm OK with this change, however, I have a different fix for nearly the same problem. See https://reviews.llvm.org/D40183 . It addresses the situation where not all predecessors have been processed nor will they be; in that case, you have incomplete information, so the pass is modified to conservatively insert a s_waitcnt 0. Feel free to merge your change and I'll adjust my patch as necessary.

This revision is now accepted and ready to land.Nov 29 2017, 12:24 PM
This revision was automatically updated to reflect the committed changes.