This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Split edge to make si_if dominate end_cf
ClosedPublic

Authored by alex-t on Nov 13 2020, 8:59 AM.

Details

Summary

Basic block containing "if" not necessarily dominates block that is the "false" target for the if.

That "false" target block may have another predecessor besides the "if" block. IR value corresponding to the Exec mask is generated by the

si_if intrinsic and then used by the end_cf intrinsic. In this case IR verifier complains that 'Def does not dominate all uses'.

This change split the edge between the "if" block and "false" target block to make it dominated by the "if" block.

Diff Detail

Event Timeline

alex-t created this revision.Nov 13 2020, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2020, 8:59 AM
alex-t requested review of this revision.Nov 13 2020, 8:59 AM
alex-t updated this revision to Diff 305187.Nov 13 2020, 9:13 AM

Test added

Typos "do,imates" and "another predecessors" in message

llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
319–322

Is this the same as unconditionally calling SplitCriticalEdge?

llvm/test/CodeGen/AMDGPU/virtual-register-defs-dont-dominate-all-uses.ll
3–5 ↗(On Diff #305187)

Don't need these

alex-t edited the summary of this revision. (Show Details)Nov 16 2020, 5:47 AM
alex-t marked an inline comment as done.Nov 16 2020, 8:41 AM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
319–322

Not really. Any edge from the block with multiple successors to block with multiple predecessors considered critical. So, changing to unconditional call to SplitCriticalEdge involves a lot of trivial CFG patterns that do not require splitting.

For example:

    "entry"     <--  entry has 2 successors
     /        \
"if.then"   |  <-- this edge is considered as critical
    \         /
      "end"    <-- end has 2 predecessors

At least the isCriticalEdge function in CFG.cpp : line 101 works this way

alex-t updated this revision to Diff 305533.Nov 16 2020, 8:44 AM
alex-t marked an inline comment as done.

The odd lines removed from the test

arsenm added inline comments.Nov 19 2020, 8:37 AM
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
317

This should be a cast since it's unchecked

alex-t updated this revision to Diff 307080.Nov 23 2020, 7:43 AM

dyn_cast changed to cast

alex-t marked an inline comment as done.Nov 23 2020, 7:43 AM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp
317

Yes. "Exec" producer is always an Instruction. So cast is okay.

alex-t marked an inline comment as done.Nov 24 2020, 8:57 AM

Ping

arsenm accepted this revision.Nov 30 2020, 8:03 AM

It seems to me there's more logic going on between the decisions SplitEdge makes and the additional dominates call than should be necessary, but this should be fine

This revision is now accepted and ready to land.Nov 30 2020, 8:03 AM

With this applied, I'm still seeing a verifier error in the original testcase

arsenm requested changes to this revision.Dec 10 2020, 6:20 PM
This revision now requires changes to proceed.Dec 10 2020, 6:20 PM

Could you share the original testcase then? I only have that reduced one attached to the Jira ticket.
And it works for it.

Could you share the original testcase then? I only have that reduced one attached to the Jira ticket.
And it works for it.

OK, the original testcase does actually work. However, I ran into another instance which this patch did not fix which I assumed was the same. I've attached that one to the ticket now

alex-t updated this revision to Diff 312515.Dec 17 2020, 8:26 AM

Bug in SIAnnotateControlFlow fixed: simple CF join should not be treated as loop

There was a bug in SIAnnotateControlFlow. Visited node is not necessarily means loop. It may be CF join instead.
Added check that Term successor visited and dominates Terms's parent.

Needs a testcase for the second broken case

Needs a testcase for the second broken case

I cannot create a relevant testcase. The CFG pattern that triggers the bug only appears in IR compiled with clang.
The reason is that RegionInfo cannot distinguish regions and treat the whole function as one region. As a result CFG structurizer does nothing with it.
When compiling by LLC structurizer does his job and produces good CFG that does not trigger the bug.
MIR test is problematic as well. We need DivergenceAnalysis to make SIAnnotateCF work correctly. DA requires all the Target information to be able to determine if the target is divergent.
Same time we need to avoid all the passes in between including structurizer. So, MIR test is going to be quite tricky.
I still have no idea how to run tti, divergence and si-annotate-control-flow separately and provide the target information to the "-run-pass'"

Needs a testcase for the second broken case

I cannot create a relevant testcase. The CFG pattern that triggers the bug only appears in IR compiled with clang.

It must be possible, and cannot only appear through clang

The reason is that RegionInfo cannot distinguish regions and treat the whole function as one region. As a result CFG structurizer does nothing with it.
When compiling by LLC structurizer does his job and produces good CFG that does not trigger the bug.
MIR test is problematic as well. We need DivergenceAnalysis to make SIAnnotateCF work correctly. DA requires all the Target information to be able to determine if the target is divergent.

This is an IR test, MIR does not come into this.

I still have no idea how to run tti, divergence and si-annotate-control-flow separately and provide the target information to the "-run-pass'"

You shouldn't be trying to use -run-pass/llc, this is an opt test with the relevant passes and then an end to end ISA run line

alex-t updated this revision to Diff 313535.EditedDec 23 2020, 4:44 AM

New test created to cover both failed cases:

  • end_cf in the block that has yet another predecessor besides that one defining the exec mask
  • given the pattern above - not any visited node denotes a loop, only when we have a backedge i.e. block's successor dominates the block.
alex-t marked an inline comment as done.Dec 23 2020, 4:49 AM

There should be both tests, not one larger function that hits both

llvm/test/CodeGen/AMDGPU/virtual-register-defs-dont-dominate-all-uses.ll
3 ↗(On Diff #313535)

Can you also add an llc run line?

arsenm added inline comments.Dec 23 2020, 8:17 AM
llvm/test/CodeGen/AMDGPU/virtual-register-defs-dont-dominate-all-uses.ll
3 ↗(On Diff #313535)

Test file name should also be more descriptive of the underlying cause

There should be both tests, not one larger function that hits both

This second pattern requires the first one to manifest. So even if I split the tests, the second will contain both patterns.
Does it make sense?
I checked the behavior of the united test commenting out the first fix and the second one - both changes were separately controlled.

alex-t added inline comments.Dec 24 2020, 12:35 AM
llvm/test/CodeGen/AMDGPU/virtual-register-defs-dont-dominate-all-uses.ll
3 ↗(On Diff #313535)

What do you need it for? The purpose of the test is to check that we don't hit an assert.
It also checks the output.
Do you want to ensure if it may be compiled down to the ISA?
Both opt and llc in the same file are inconvenient because is tricky to re-generate the checks if changed.
Moreover. the second case with a loop is only reproducible by running the -si-annotate-control-flow explicitly by opt. Running this case by llc will test nothing because the desired CFG pattern only appears compiling by clang. CFG structurizer will produce structured CFG if the test is launched with LLC.
When running clang the CFG pattern remains unchanged because RegionInfo cannot distinguish the regions and report the whole function as one region. Hence the CFG structurizer does nothing. While running LLC with the same IR RegionPass is able to distinguish regions and structurizer does it's job well, so we don't face the pattern causing the bug.

alex-t updated this revision to Diff 313673.Dec 24 2020, 1:29 AM

The testcase is splitted into 2 separate functions. LLC run line added.

arsenm accepted this revision.Dec 24 2020, 7:23 AM
This revision is now accepted and ready to land.Dec 24 2020, 7:23 AM
This revision was landed with ongoing or failed builds.Dec 28 2020, 6:22 AM
This revision was automatically updated to reflect the committed changes.