The sequence like this: ... v_cmpx_le_f32_e32 vcc, 0, v0 s_branch BB0_30 s_cbranch_execnz BB0_30 ; BB#29: exp null off, off, off, off done vm s_endpgm BB0_30: ; %endif110 ... is likely wrong. The s_branch instruction will unconditionally jump to BB0_30 and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and let skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
Details
- Reviewers
• tstellarAMD nhaehnle arsenm
Diff Detail
Event Timeline
The approach is unfortunately not correct in general as far as I can tell. Say you have
SI_KILL_TERMINATOR ... S_BRANCH BB#other BB#next:
To be honest, it looks to me like this is a bug (or at least a serious brittleness) in the code anyway, and we're probably getting lucky because the block ordering code lays blocks out in a way that works well for us. What I mean is that AFAIU, this will first become:
V_CMPX S_CBRANCH_EXECNZ BB#next S_BRANCH BB#other skip_block: EXP S_ENDPGM BB#next:
... and then your change would remove the branch to BB#other.
No. Currently the code will become:
V_CMPX S_BRANCH BB#other S_CBRANCH_EXECNZ BB#next
Because S_CBRANCH_EXECNZ is added as terminator.
That's why my first was idea was to remove the unconditional branch *after* the kill.
I wrote a MIR test. I should probably send a new revision with the test.
Thanks! We generally use CHECK-LABEL only for the start of a function, not for labels inside a function. The idea is that the label helps FileCheck get less confused when there are many sub-tests in a single test file.
There's still the issue of what happens when the target of the unconditional branch isn't the next block. That's a pre-existing bug, so you don't necessarily have to fix it in this commit, but it would be good if you could tackle this before we forget about it.
More comments inline.
lib/Target/AMDGPU/SIInsertSkips.cpp | ||
---|---|---|
282–288 | There's still a potential double-eraseFromParent, isn't there? Or I guess not because the skip block is always in the way, but still, I think this should be unified or at least an else-if. | |
test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir | ||
17–19 | It would be good to check that there is no spurious branch /after/ the conditional branch either. This probably works: CHECK-NEXT: bb.3: |
test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir | ||
---|---|---|
17–19 | Actually, CHECK-NEXT: bb.3: doesn't work, but we could do CHECK-NEXT: EXP_DONE instead, because the skip block is added just after. |
v3: - only use CHECK_LABEL for the start of a function
- check there is no spurious branch after the conditional branch
- use an else-if to avoid potential double-erasing
There's still a potential double-eraseFromParent, isn't there? Or I guess not because the skip block is always in the way, but still, I think this should be unified or at least an else-if.