This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns
ClosedPublic

Authored by cwabbott on Dec 9 2019, 3:13 AM.

Details

Summary

The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.

Event Timeline

cwabbott created this revision.Dec 9 2019, 3:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2019, 3:13 AM
This revision is now accepted and ready to land.Dec 9 2019, 11:34 PM
This revision was automatically updated to reflect the committed changes.
cwabbott reopened this revision.Jan 29 2020, 9:09 AM
This revision is now accepted and ready to land.Jan 29 2020, 9:09 AM
cwabbott updated this revision to Diff 241187.Jan 29 2020, 9:09 AM

It seems I forgot to test this against the entire testsuite, which made a bunch
of buildbots unhappy. This version fixes the issues:

  • Skip the entire thing for non-pixel shaders that only end with an infinite

loop. This was causing slightly different assembly for four other tests, due to
the insertion of the fake branch, and it generates extra work for normal
shaders that's pointless.

  • Update the expected output in update-phi.ll to account for the null export.
arsenm accepted this revision.Jan 29 2020, 9:29 AM
nhaehnle accepted this revision.Jan 30 2020, 1:13 AM

LGTM. Did you manage to sort out your commit access issues?

LGTM. Did you manage to sort out your commit access issues?

Yes, I just got added yesterday.

This revision was automatically updated to reflect the committed changes.
piotr added a subscriber: piotr.Feb 21 2020, 4:12 AM

Hi @cwabbott,
This commit causes a GPU hang on amdvlk in one test, due to the missing "done" bit on the normal export. I think in the attached case the patch incorrectly classifies the export as being in an infinite loop.

Hi @cwabbott,
This commit causes a GPU hang on amdvlk in one test, due to the missing "done" bit on the normal export. I think in the attached case the patch incorrectly classifies the export as being in an infinite loop.

Hi @piotr,

I think what's happening is that the "normal" return, that already has an export before it, is uniformly reached (i.e. only reached via uniform branches). Normally this means that we don't have to do anything with it, so the pass ignores it, but then we remove the "done" bit even though the final dummy export, which is supposed to replace it, isn't reached when returning normally. There are two ways I can see to solve it:

  • When InsertExport is true, unify *all* the return blocks, even the uniformly-reached ones. For this shader, we would still remove the done bit on the original export, but we'd replace the return with a branch to the UnifiedReturnBlock with the dummy export.
  • Do the above, but when we can prove that all the existing exports are post-dominated by the normal, uniformly-reached returns, we can avoid clearing their done bits or unifying them.

The second option would prevent the regression in code quality, but it would be more complicated. We've deliberately made this as simple as possible already, because it only kicks in when there are infinite loops (that might have kills in them) and this seems pretty uncommon in real-world shaders. Is this coming from a test or an actual game?

piotr added a comment.Feb 27 2020, 9:05 AM

The reported example comes from one of graphicsfuzz CTS tests, so I think it is fine to treat it as an unrealistic corner case where performance is not critical.