This is an archive of the discontinued LLVM Phabricator instance.

Fix PR 24415 (at least), by making our post-dominator tree behavior sane.
ClosedPublic

Authored by dberlin on Feb 8 2017, 12:16 AM.

Details

Summary

Currently, our post-dom tree tries to ignore and remove the effects of
infinite loops. It fails miserably at this, because it tries to do it
ahead of time, and thus can only detect self-loops, and any other type
of infinite loop, it pretends doesn't exist at all.

This can, in a bunch of cases, lead to wrong answers and a completely
empty post-dom tree.

Wrong answer:

declare void foo()
define internal void @f() {
entry:
  br i1 undef, label %bb35, label %bb3.i

bb3.i:
  call void @foo()
  br label %bb3.i

bb35.loopexit3:
  br label %bb35

bb35:
  ret void
}

We get:

Inorder PostDominator Tree:
  [1]  <<exit node>> {0,7}
    [2] %bb35 {1,6}
      [3] %bb35.loopexit3 {2,3}
      [3] %entry {4,5}

This is a trivial modification of the testcase for PR 6047
Note that we pretend bb3.i doesn't exist.
We also pretend that bb35 post-dominates entry.

While it's true that it does not exit in a theoretical sense, it's not
really helpful to try to ignore the effect and pretend that bb35
post-dominates entry. Worse, we pretend the infinite loop does
nothing (it's usually considered a side-effect), and doesn't even
exist, even when it calls a function. Sadly, this makes it impossible
to use when you are trying to move code safely. All compilers also
create virtual or real single exit nodes (including us), and connect
infinite loops there (which this patch does). In fact, others have
worked around our behavior here, to the point of building their own
post-dom trees:
https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing
out the region infrastructure is near-useless for them with postdom in
this state :(

Completely empty post-dom tree:

define void @spam() #0 {
bb:
  br label %bb1

bb1:                                              ; preds = %bb1, %bb
  br label %bb1

bb2:                                              ; No predecessors!
  ret void
}
Printing analysis 'Post-Dominator Tree Construction' for function 'foo':
=============================--------------------------------
Inorder PostDominator Tree:
  [1]  <<exit node>> {0,1}

:(

(note that even if you ignore the effects of infinite loops, bb2
should be present as an exit node that post-dominates nothing).

This patch changes post-dom to properly handle infinite loops and does
root finding during calculation to prevent empty tress in such cases.

We match gcc's (and the canonical theoretical) behavior for infinite
loops (find the backedge, connect it to the exit block).

Testcases coming as soon as i finish running this on a ton of random graphs :)

Diff Detail

Repository
rL LLVM

Event Timeline

dberlin created this revision.Feb 8 2017, 12:16 AM
dberlin edited the summary of this revision. (Show Details)Feb 8 2017, 12:17 AM
dberlin edited the summary of this revision. (Show Details)
dberlin added inline comments.Feb 8 2017, 12:21 AM
include/llvm/Support/GenericDomTreeConstruction.h
46 ↗(On Diff #87601)

Sorry, this was part of an alternate approach, will revert.

172 ↗(On Diff #87601)

Keen observers will note that this ::size call was O(N) since it's an ilist.
:(
So the code i added really only adds O(N) time worst case, since we already wasted O(N) time just counting blocks above.

194 ↗(On Diff #87601)

This comment needs updating a bit. It really ends up finding the infinite loop backedge block, which is what we want.
(note that post-dom is not unique, so if there are multiple backedges, you can't win, there is no "best" one)

249 ↗(On Diff #87601)

This got clang-formatted, i'll revert

285 ↗(On Diff #87601)

and this is leftover from a dead approach too.
i'll remove.

bryant added a comment.Feb 8 2017, 6:06 AM

Minor things.

include/llvm/Support/GenericDomTreeConstruction.h
165 ↗(On Diff #87601)

Ranged-for? for (NodeType *BB: make_range(FuncGraphT::nodes_begin(&F), FuncGraphT::nodes_end(&F))) {.

177 ↗(On Diff #87601)

Typo? "make. block."

184 ↗(On Diff #87601)

Ranged-for?

247 ↗(On Diff #87601)

Ranged?

dberlin added inline comments.Feb 8 2017, 6:37 AM
include/llvm/Support/GenericDomTreeConstruction.h
165 ↗(On Diff #87601)

Honestly, i would rather us just improve graph traits to provide ranges here.
I'm not sure i believe the ranged for version is more readable/easier to understand with make_range.
i'm happy to do that as a followup to add "nodes" and "children" to graph traits.

davide edited edge metadata.Feb 17 2017, 5:21 PM

The approach looks good to me, if you add a testcase and revert the unrelated changes I'll give it another look (possibly today)

I think a test for infinite loops (the one attached to the PR or/and slight modification of them) are fine (at least for me)

dberlin marked 6 inline comments as done.Feb 19 2017, 4:36 PM

I have updated all tests.
The region ones were particularly nutso before (IHMO), and now look sane.

there is one failure in ADCE, because previously postdom gave it an empty tree, and now gives it a real tree.
It expects that all not-reachable-from-exit nodes will not have a dom tree node, which is not correct.
It also assumes it will be able to find a safe place to redirect an edge that still exits the function, which is also not correct.
Fixes for both coming.
Then i am going to look at the two structurize cfg failures.

dberlin marked an inline comment as done.Feb 19 2017, 5:28 PM

adding the last two people to touch structurize CFG.

The code is mostly undocumented, and it seems ... interesting, in many ways .
It has an obsession with assuming regions are started and terminated with branchinsts, when they could, for example, be switches with one successor (it looks like it forces switch lowering instead of fixing the code).
It also gets successors of blocks by trying to grab terminators and walk over the terminator successors, instead of just walking the successors directly.

In any case, there is nothing to fix for these testcases.
There are two failures.
In branch-on-argument.ll, we fail invert_branch_on_arg_inf_loop.
This is because there are no SESE regions here once post-dom is correct.
The only SESE region is function entry to function exit, and structurizeCFG explicitly skips that region,

Ditto for no-branch-to-entry.ll
It's not a SESE region, so it no longer processes it.
I can't make a testcase where it wants to branch to entry anymore.

I'm XFAIL'ing it for someone to determine if it's really still needed. My guess is this was a side effect of broken SESE regions.

dberlin removed reviewers: jlebar, tstellar.
dberlin added subscribers: jlebar, tstellar.
  • Update with fixes for broken testcases, and review comments. Add test case for pr24415

I'm fine with the structurize cfg changes.

Okay. At this point i'm pretty positive i can prove that any region that now it considers the entry to be the start of the loop will be a top-level region that structurizecfg will ignore, and we can delete the test (and associated code in structurizecfg/domtree to handle it). I'm going to do so unless someone objects.

We just gave a bogus region tree before:
Printing analysis 'Detect single entry single exit regions' for function 'no_branch_to_entry_true':

[0] entry => <Function Return>
{
  entry, for.end, for.body,
  [1] entry => for.end
  {
    entry, for.body,
  }
}

#1 is definitely not a sese region. It double definitely does not include for.body, which never exits and never goes to for.end :P :

Let's go from first principles, with a little help from wikipedia:
In graph theory, a single-entry single-exit (SESE) region in a given graph is an ordered edge pair (a, b) of distinct control flow edges a and b where:

a dominates b
b postdominates a
Every cycle containing a also contains b and vice versa.

In the old world, for.end post-dominates entry (and for.body is not in the postdomree), which is wrong:

=============================--------------------------------
Inorder PostDominator Tree:
  [1]  <<exit node>> {0,5}
    [2] %for.end {1,4}
      [3] %entry {2,3}

So it believes entry->for.end is a sese region.
Which would in fact, be true if there was no loop that postdom has ignored.
The region-former then uses the definition of dominance/post-dominance, and believes that such a region must also include the other preds of entry (IE for.body). This is even reasonable and correct given the regular definition of postdom.
But of course, in this case, postdom is wrong,and really,all of the blocks are siblings in the postdomtree.
In the new world,it properly forms the postdomtree:

[1]  <<exit node>> {0,7}
   [2] %for.end {1,2}
   [2] %entry {3,4}
   [2] %for.body {5,6}

There is no sese region to be formed here. The only way you could ever end up with a sese region loop that involves entry is a normal loop:

define void @no_branch_to_entry_undef(i32 addrspace(1)* %out) {
entry:
  br i1 undef, label %for.end, label %for.body
for.body:                                         ; preds = %entry, %for.body
  store i32 999, i32 addrspace(1)* %out, align 4
  br i1 undef, label %for.body, label %for.end
for.end:                                          ; preds = %Flow
  ret void
}

It does not try to modify the entry block for such a normal loop.

Thus, i believe all of this hackery is now dead.

(and obviously, i'll xfail in this patch, and then delete it wit associated hackery in a followup)

Ping. This is ready to go in and I have a new verifier that depends on it

davide accepted this revision.Feb 28 2017, 11:31 AM

The Domtree changes look good to me but I can't comment on StructurizeCFG

This revision is now accepted and ready to land.Feb 28 2017, 11:31 AM
This revision was automatically updated to reflect the committed changes.