The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It seems I forgot to test this against the entire testsuite, which made a bunch
of buildbots unhappy. This version fixes the issues:
- Skip the entire thing for non-pixel shaders that only end with an infinite
loop. This was causing slightly different assembly for four other tests, due to
the insertion of the fake branch, and it generates extra work for normal
shaders that's pointless.
- Update the expected output in update-phi.ll to account for the null export.
Hi @cwabbott,
This commit causes a GPU hang on amdvlk in one test, due to the missing "done" bit on the normal export. I think in the attached case the patch incorrectly classifies the export as being in an infinite loop.
Hi @piotr,
I think what's happening is that the "normal" return, that already has an export before it, is uniformly reached (i.e. only reached via uniform branches). Normally this means that we don't have to do anything with it, so the pass ignores it, but then we remove the "done" bit even though the final dummy export, which is supposed to replace it, isn't reached when returning normally. There are two ways I can see to solve it:
- When InsertExport is true, unify *all* the return blocks, even the uniformly-reached ones. For this shader, we would still remove the done bit on the original export, but we'd replace the return with a branch to the UnifiedReturnBlock with the dummy export.
- Do the above, but when we can prove that all the existing exports are post-dominated by the normal, uniformly-reached returns, we can avoid clearing their done bits or unifying them.
The second option would prevent the regression in code quality, but it would be more complicated. We've deliberately made this as simple as possible already, because it only kicks in when there are infinite loops (that might have kills in them) and this seems pretty uncommon in real-world shaders. Is this coming from a test or an actual game?
The reported example comes from one of graphicsfuzz CTS tests, so I think it is fine to treat it as an unrealistic corner case where performance is not critical.