The sequence like this:
...
        v_cmpx_le_f32_e32 vcc, 0, v0
        s_branch BB0_30
        s_cbranch_execnz BB0_30
; BB#29:
        exp null off, off, off, off done vm
        s_endpgm
BB0_30:                                 ; %endif110
...
is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.
The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.
Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.
This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019Details
- Reviewers
- • tstellarAMD - nhaehnle - arsenm 
Diff Detail
Event Timeline
The approach is unfortunately not correct in general as far as I can tell. Say you have
    SI_KILL_TERMINATOR ...
    S_BRANCH BB#other
BB#next:To be honest, it looks to me like this is a bug (or at least a serious brittleness) in the code anyway, and we're probably getting lucky because the block ordering code lays blocks out in a way that works well for us. What I mean is that AFAIU, this will first become:
V_CMPX S_CBRANCH_EXECNZ BB#next S_BRANCH BB#other skip_block: EXP S_ENDPGM BB#next:
... and then your change would remove the branch to BB#other.
No. Currently the code will become:
V_CMPX S_BRANCH BB#other S_CBRANCH_EXECNZ BB#next
Because S_CBRANCH_EXECNZ is added as terminator.
That's why my first was idea was to remove the unconditional branch *after* the kill.
I wrote a MIR test. I should probably send a new revision with the test.
Thanks! We generally use CHECK-LABEL only for the start of a function, not for labels inside a function. The idea is that the label helps FileCheck get less confused when there are many sub-tests in a single test file.
There's still the issue of what happens when the target of the unconditional branch isn't the next block. That's a pre-existing bug, so you don't necessarily have to fix it in this commit, but it would be good if you could tackle this before we forget about it.
More comments inline.
| lib/Target/AMDGPU/SIInsertSkips.cpp | ||
|---|---|---|
| 282–288 | There's still a potential double-eraseFromParent, isn't there? Or I guess not because the skip block is always in the way, but still, I think this should be unified or at least an else-if. | |
| test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir | ||
| 18–20 | It would be good to check that there is no spurious branch /after/ the conditional branch either. This probably works: CHECK-NEXT: bb.3: | |
| test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir | ||
|---|---|---|
| 18–20 | Actually, CHECK-NEXT: bb.3: doesn't work, but we could do CHECK-NEXT: EXP_DONE instead, because the skip block is added just after. | |
v3: - only use CHECK_LABEL for the start of a function
- check there is no spurious branch after the conditional branch
- use an else-if to avoid potential double-erasing
There's still a potential double-eraseFromParent, isn't there? Or I guess not because the skip block is always in the way, but still, I think this should be unified or at least an else-if.