This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Remove spurious out branches after a kill
ClosedPublic

Authored by hakzsam on Jan 5 2017, 4:31 AM.

Details

Summary
The sequence like this:

...
        v_cmpx_le_f32_e32 vcc, 0, v0
        s_branch BB0_30
        s_cbranch_execnz BB0_30
; BB#29:
        exp null off, off, off, off done vm
        s_endpgm
BB0_30:                                 ; %endif110
...

is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.

The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.

Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.

This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019

Diff Detail

Event Timeline

hakzsam updated this revision to Diff 83215.Jan 5 2017, 4:31 AM
hakzsam retitled this revision from to AMDGPU: Remove spurious out branches after a kill.
hakzsam updated this object.
hakzsam added reviewers: tstellarAMD, arsenm.
hakzsam added subscribers: mareko, nhaehnle.
arsenm edited edge metadata.Jan 5 2017, 9:15 AM
arsenm edited edge metadata.
arsenm added a subscriber: llvm-commits.

Needs a test

Needs a test

Okay, I will update it with a test.

nhaehnle added a comment.EditedJan 13 2017, 6:15 AM

The approach is unfortunately not correct in general as far as I can tell. Say you have

    SI_KILL_TERMINATOR ...
    S_BRANCH BB#other
BB#next:

To be honest, it looks to me like this is a bug (or at least a serious brittleness) in the code anyway, and we're probably getting lucky because the block ordering code lays blocks out in a way that works well for us. What I mean is that AFAIU, this will first become:

  V_CMPX
  S_CBRANCH_EXECNZ BB#next
  S_BRANCH BB#other
skip_block:
  EXP
  S_ENDPGM
BB#next:

... and then your change would remove the branch to BB#other.

No. Currently the code will become:

V_CMPX
S_BRANCH BB#other
S_CBRANCH_EXECNZ BB#next

Because S_CBRANCH_EXECNZ is added as terminator.

That's why my first was idea was to remove the unconditional branch *after* the kill.

I wrote a MIR test. I should probably send a new revision with the test.

hakzsam updated this revision to Diff 84333.Jan 13 2017, 10:50 AM
hakzsam edited edge metadata.

v2: add a MIR test called insert-skips-kill-uncond.mir

Thanks! We generally use CHECK-LABEL only for the start of a function, not for labels inside a function. The idea is that the label helps FileCheck get less confused when there are many sub-tests in a single test file.

There's still the issue of what happens when the target of the unconditional branch isn't the next block. That's a pre-existing bug, so you don't necessarily have to fix it in this commit, but it would be good if you could tackle this before we forget about it.

More comments inline.

lib/Target/AMDGPU/SIInsertSkips.cpp
282–288

There's still a potential double-eraseFromParent, isn't there? Or I guess not because the skip block is always in the way, but still, I think this should be unified or at least an else-if.

test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir
17–19

It would be good to check that there is no spurious branch /after/ the conditional branch either. This probably works:

CHECK-NEXT: bb.3:

Thanks for your feedback.

I will make a patch for the second issue.

lib/Target/AMDGPU/SIInsertSkips.cpp
282–288

Probably not, but using a else-if seems like safer.

test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir
17–19

Yes, I will fix that.

hakzsam added inline comments.Jan 16 2017, 2:23 AM
test/CodeGen/AMDGPU/insert-skips-kill-uncond.mir
17–19

Actually, CHECK-NEXT: bb.3: doesn't work, but we could do CHECK-NEXT: EXP_DONE instead, because the skip block is added just after.

hakzsam updated this revision to Diff 84531.Jan 16 2017, 2:33 AM
hakzsam edited edge metadata.

v3: - only use CHECK_LABEL for the start of a function

  • check there is no spurious branch after the conditional branch
  • use an else-if to avoid potential double-erasing
nhaehnle accepted this revision.Jan 23 2017, 5:30 AM
nhaehnle added a reviewer: nhaehnle.

LGTM.

This revision is now accepted and ready to land.Jan 23 2017, 5:30 AM
arsenm closed this revision.Jan 24 2017, 2:30 PM

r292985