This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/
- include/llvm/Support/GenericDomTree.h
- include/llvm/Support/GenericDomTreeConstruction.h
- lib/Transforms/Scalar/ADCE.cpp
- test/Analysis/PostDominators/infinite-loop.ll
- test/Analysis/PostDominators/infinite-loop2.ll
- test/Analysis/PostDominators/infinite-loop3.ll
- test/Analysis/PostDominators/pr24415.ll
- test/Analysis/PostDominators/pr6047_a.ll
- test/Analysis/PostDominators/pr6047_b.ll
- test/Analysis/PostDominators/pr6047_c.ll
- test/Analysis/PostDominators/pr6047_d.ll
- test/Analysis/RegionInfo/infinite_loop.ll
- test/Analysis/RegionInfo/infinite_loop_2.ll
- test/Analysis/RegionInfo/infinite_loop_3.ll
- test/Analysis/RegionInfo/infinite_loop_4.ll
- test/Analysis/RegionInfo/infinite_loop_5_a.ll
- test/Analysis/RegionInfo/infinite_loop_5_b.ll
- test/CodeGen/AMDGPU/branch-relaxation.ll
- test/CodeGen/ARM/struct-byval-frame-index.ll
- test/CodeGen/Thumb2/v8_IT_5.ll
- test/Transforms/StructurizeCFG/branch-on-argument.ll
- test/Transforms/StructurizeCFG/no-branch-to-entry.ll
- unittests/IR/DominatorTreeTest.cpp
Differential D35851

[Dominators] Include infinite loops in PostDominatorTree
ClosedPublic

Authored by kuhar on Jul 25 2017, 12:22 PM.

Details

Reviewers

• dberlin
sanjoy
grosser
brzycki
davide
chandlerc
hfinkel

Commits

rG638c085d07aa: [Dominators] Include infinite loops in PostDominatorTree
rL310940: [Dominators] Include infinite loops in PostDominatorTree

Summary

This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change.

What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG.

This patch makes the following assumptions:

1) A sequence of updates should produce the same tree as recalculating it.
2) Any sequence of the same updates should lead to the same tree.
3) Siblings and roots are unordered.

The last two properties are essential to efficiently perform batch updates in the future.
When it comes to the first one, we can decide later that the consistency between a freshly built tree and an updated one doesn't matter much, as there are many correct ways to pick roots in infinite loops, and relax this assumption. That should enable us to recalculate postdominators less frequently.

This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough.
That being said, my experiments showed that reverse-unreachable blocks are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call:

# functions:  52283
# samples:  337609
# reverse unreachable BBs:  216022
# BBs:  247840796
Percent reverse-unreachable:  0.08716159869015269 %
Max(PercRevUnreachable) in a function:  87.58620689655172 %
# > 25 % samples:  471 ( 0.1395104988314885 % samples )
... in 145 ( 0.27733680163724345 % functions )
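
The percentages above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic only; the variable names are made up for illustration and are not taken from the script that produced the statistics.

```python
# Raw counts copied from the statistics above; names are illustrative.
functions = 52283
samples = 337609
reverse_unreachable_bbs = 216022
total_bbs = 247840796
samples_over_25_percent = 471
functions_over_25_percent = 145

# Share of basic blocks that are reverse-unreachable, in percent.
percent_reverse_unreachable = 100.0 * reverse_unreachable_bbs / total_bbs

# Share of samples (and functions) with more than 25 % reverse-unreachable blocks.
percent_samples_over_25 = 100.0 * samples_over_25_percent / samples
percent_functions_over_25 = 100.0 * functions_over_25_percent / functions

print(percent_reverse_unreachable)
print(percent_samples_over_25)
print(percent_functions_over_25)
```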

Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway.

I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :).

Diff Detail

Repository: rL LLVM

Event Timeline

kuhar created this revision.Jul 25 2017, 12:22 PM

Herald added a subscriber: david2050.Jul 25 2017, 12:22 PM

kuhar edited the summary of this revision.Jul 25 2017, 12:57 PM

kuhar added parent revisions: D35636: [Dominators] Change Roots type to SmallVector, D35597: [Dominators] Move root-finding out of DomTreeBase and simplify it.

kuhar added a reviewer: davide.Jul 25 2017, 1:04 PM

Add a comment explaining the rebuild after an unreachable deletion.

Don't recalculate the whole tree when a deletion makes a CFG node reverse-unreachable.

kuhar added inline comments.Jul 25 2017, 2:55 PM

include/llvm/Support/GenericDomTreeConstruction.h
611 ↗	(On Diff #108151)	s/infinite root/infinite loop

kuhar added a reviewer: chandlerc.Jul 25 2017, 2:55 PM

kuhar edited the summary of this revision.

kuhar added inline comments.Jul 25 2017, 3:21 PM

lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	@dberlin: shouldn't this be: post-dom root child is a return ?

• dberlin added inline comments.Jul 25 2017, 3:34 PM

lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	yes, it should be.

Fix a comment typo and a debug message in ADCE.

kuhar marked 3 inline comments as done.Jul 25 2017, 3:39 PM

kuhar added a child revision: D35869: [ADCE][Dominators] Teach ADCE to preserve dominators.Jul 25 2017, 5:38 PM

kuhar added inline comments.Jul 25 2017, 6:09 PM

lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	Followup: why do we only check for ReturnInst here? I mean, shouldn't we treat UnreachableInst the same way and not mark it as live?

kuhar added inline comments.Jul 25 2017, 6:13 PM

lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	I checked and continuing also on UnreachableInst doesn't seem to cause any regressions.

hiraditya added a subscriber: hiraditya.Jul 26 2017, 9:15 AM

• dberlin added inline comments.Jul 26 2017, 10:05 AM

include/llvm/Support/GenericDomTreeConstruction.h
401 ↗	(On Diff #108169)	Can't you use the forward DFS in/out numbers to tell if the roots were redundant? A redundant root should be within the in/out of some other root, no?
lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	I'm actually curious if ADCE works without this loop at all now. In theory, it claims it was being done "to protect IDF calculator" (which is definitely not right, but ...). It already knows to mark loops live if !removeloops. Everything else should be fair game to remove, at least by the current logic. But i think we should worry about this in a followup, and stick with making the original loop do "the same thing" on the postdom tree it was trying to do before.

kuhar added inline comments.Jul 26 2017, 10:32 AM

include/llvm/Support/GenericDomTreeConstruction.h
401 ↗	(On Diff #108169)	It can happen that a path from a redundant root to some other one goes through reverse-reachable regions, so I think that would mean that we'd have to run the forward DFS on the whole CFG, including the forward-unreachable parts. Then, can you always tell if a node is a part of the same SCC as some other one using only DFS in/out numbers?
lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	Yeah, I think that it'd be reasonable to revisit it later. (BTW, I tried deleting the loop, but it resulted in multiple things crashing/failing.)
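
For readers unfamiliar with the DFS in/out numbers mentioned in the exchange about line 401, here is a minimal sketch of the idea. It is not the code from GenericDomTreeConstruction.h; the function names and the tiny graph are assumptions made for illustration. A node is a DFS-tree descendant of another exactly when its in/out interval nests inside the other's interval.

```python
def dfs_numbers(succ, root):
    """Assign DFS entry/exit numbers to every node reachable from root."""
    dfs_in, dfs_out = {}, {}
    clock = 0
    dfs_in[root] = clock
    clock += 1
    stack = [(root, iter(succ[root]))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if nxt not in dfs_in:
                dfs_in[nxt] = clock
                clock += 1
                stack.append((nxt, iter(succ[nxt])))
                break
        else:
            # All successors visited: leave the node.
            dfs_out[node] = clock
            clock += 1
            stack.pop()
    return dfs_in, dfs_out


def is_dfs_descendant(dfs_in, dfs_out, ancestor, node):
    """True if node's interval nests inside ancestor's interval."""
    return dfs_in[ancestor] <= dfs_in[node] and dfs_out[node] <= dfs_out[ancestor]


# Hypothetical graph: a -> b -> d, a -> c.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
dfs_in, dfs_out = dfs_numbers(graph, "a")
print(is_dfs_descendant(dfs_in, dfs_out, "a", "d"))
print(is_dfs_descendant(dfs_in, dfs_out, "b", "c"))
```

The interval test only says that the descendant is reachable from the ancestor along DFS-tree edges. On its own it does not say whether the ancestor can be reached back from the descendant, which is what the question about SCC membership is getting at.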

I'm probably not the best person to review this.

I'm going to accept this.
But please give Tobias until early next week to comment, given his past concerns.
(IMHO, if he wants to suggest we do something different, at this point, I believe the onus is on him to implement such a thing and show it works)

include/llvm/Support/GenericDomTreeConstruction.h
401 ↗	(On Diff #108169)	We could get away with running forward DFS only on the reverse-unreachable parts i believe. But i think what you have is fine for now.

This revision is now accepted and ready to land.Jul 27 2017, 8:06 AM

Hi all,

sorry for coming late into the game. It seems the phabricator notifications again don't work for me. I started to look into this during the weekend (and today) and am planning to send an update.

Best,
Tobias

test/Transforms/StructurizeCFG/no-branch-to-entry.ll
1 ↗	(On Diff #108169)	Why was this XFAILed? Could you add a comment explaining why this test case does not work any more?

• dberlin added inline comments.Jul 31 2017, 8:47 AM

test/Transforms/StructurizeCFG/no-branch-to-entry.ll
1 ↗	(On Diff #108169)	This is probably copied from my original patch. It used to generate a region that caused it to try to replace the entry block, but it does not anymore. I had left it xfailed and pinged the structurizer maintainer to see if he could generate a case where it still tried to replace the entry block. I believe it to not be possible anymore, but i couldn't be sure. So the short version is: "This test will probably end up removed, but i wanted to give it a little time"

grosser mentioned this in D36107: [PostDom] document the current handling of infinite loops and unreachables.Jul 31 2017, 1:48 PM

grosser mentioned this in D36135: [RFC] Make infinite loops available in postdom tree, but do not connect them to the reverse reachable part of the CFG.Aug 1 2017, 3:58 AM

Hi @kuhar, @dberlin,

thanks for giving me time to look into this and especially thanks for adding dynamic update support to DT/PDT. This new interface is indeed very nice!

As you know I am concerned about some of the implications of connecting infinite loops to the virtual exit (see below). But assuming this is the way to go, the structure of this patch looks OK. I just found a couple of minor typos and would appreciate some more documentation.

Specifically, while there is precedent for extending the post-dominance tree to reverse unreachable nodes the way you propose, this is not well documented in the literature. The paper you based the dominator tree construction on ("An Experimental Study of Dynamic Dominators") seems to very clearly distinguish between unreachable and reachable CFG nodes. Whereas in the new code there won't be any unreachable nodes any more. I think it would be nice to document these differences, the reason why this choice has been taken, and the impact this has on guarantees the PDT interface gives to the user.

The changes this patch introduces to D36107 will already add some documentation.

Daniel Berlin wrote some motivation in his old commit and we afterwards did quite some research that brought up both arguments for and against this change. Especially as the goal is to make the use of the PDT a lot more common, I think it would be great to document the motivation and reasoning both in the commit message and in the patch itself. Could you e.g. briefly list the other compilers that take this approach (I already looked them up for you [https://docs.google.com/document/d/1lL8xcqfnqNArvQj3TpgEXOTybr8kucagjzqvChQCUGI/edit]), clarify that this is indeed not a common extension in the literature but that it is in practice desirable for some reason (which?). I also would really appreciate a brief comment in situations where you had to adapt the original dynamic dominators algorithm to support this use case. This will certainly help others who might work on PDT in the future.

I also have two questions:

1) Does connecting edges to the virtual exit break the parent property?

You define the parent property as: "If a node gets disconnected from the graph, then all of the nodes it dominated previously will now become unreachable." After this patch, nodes that become unreachable are still connected to the graph. This is not visible in verifyParentProperty, as it does not update Roots properly, but my understanding is that this is indeed the case and a breakage you are willing to accept to ensure that all nodes are part of the CFG. Is this correct?

2) Can removing an edge from a dominator tree weaken the dominance relation?

The example I am thinking of is the one in D36107.

Assume we start with a normal (complete) CFG:

and then drop the edge C -> B.

Today + Literature (leave C unreachable)

The output we get today (and which would be computed by "An Experimental Study of Dynamic Dominators") would be the following graph, with C, now reverse-unreachable, not being part of the graph:

The relation D postdom B is unaffected. This is to my understanding in line with the definition of post-dominance.

Proposed Patch + GCC (Connecting C to virtual exit)

This patch will now instead create the following PDT (grey edges have been removed compared to original PDT and are not part of new PDT)

Interestingly, the property D postdom B does not hold any more. To my understanding, in a complete (post) dominator tree, removing an edge should never weaken a normal dominator tree. Is this true? Do we lose this property by supporting reverse-unreachable CFG nodes? (I am aware there are certain trade-offs we need to take, but I would like to understand and at best document them)

I believe the issues above are worth documenting.

While thinking about this problem, I experimented with connecting unreachables to the virtual exit, but dropping their connection to the reachable part of the CFG when constructing the Post-Dominator tree (D36135). Similarly to the proposed patch, we change the definition of what post-dominance means to include unreachable nodes, but as the edges we ignore can anyhow never be reached on the reverse CFG (they only became reachable through our virtual edges) this seems like a reasonable extension.

The resulting tree matches the dominator tree of the reachable part, but also contains all unreachable nodes:

If you compare the changed test cases, a lot of the regressions we see with D35851 disappear while still all nodes are part of the PDT. I managed to port all existing code to this new approach, but maybe I missed something obvious? I wonder if we have some actual test cases that would break with the above implementation? Is there any transformation pass which would become incorrect with this PDT?
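
The three treatments of the reverse-unreachable block compared above can be sketched with a small set-based post-dominator computation. This is a textbook iterative dataflow formulation, not the algorithm used in GenericDomTreeConstruction.h. The CFG is a hypothetical reconstruction of the A/B/C/D example (the original figures are not available here), `EXIT` stands for the virtual exit, and the modelling of D36135 as "ignore the edge B -> C" is an assumption based on the description in this comment.

```python
def reverse_reachable(succ, exit_node):
    """Nodes from which exit_node can be reached."""
    preds = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    seen, stack = {exit_node}, [exit_node]
    while stack:
        n = stack.pop()
        for p in preds[n]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def postdominators(succ, exit_node):
    """Post-dominator sets for all reverse-reachable nodes (fixed-point iteration)."""
    nodes = reverse_reachable(succ, exit_node)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # Only successors that can themselves reach the exit count.
            live = [s for s in succ[n] if s in nodes]
            new = {n} | set.intersection(*(pdom[s] for s in live))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


# Assumed CFG after dropping C -> B: C only loops on itself.
today = {"A": ["B"], "B": ["C", "D"], "C": ["C"], "D": ["EXIT"], "EXIT": []}
# D35851: additionally connect the infinite loop C to the virtual exit.
patched = {"A": ["B"], "B": ["C", "D"], "C": ["C", "EXIT"], "D": ["EXIT"], "EXIT": []}
# D36135 (as modelled here): connect C to the virtual exit, ignore B -> C.
alternative = {"A": ["B"], "B": ["D"], "C": ["C", "EXIT"], "D": ["EXIT"], "EXIT": []}

pdom_today = postdominators(today, "EXIT")
pdom_patched = postdominators(patched, "EXIT")
pdom_alternative = postdominators(alternative, "EXIT")

print("C" in pdom_today, "D" in pdom_today["B"])
print("C" in pdom_patched, "D" in pdom_patched["B"])
print("C" in pdom_alternative, "D" in pdom_alternative["B"])
```

Under these assumptions, C is absent from the first result, while D postdominates B only in the first and third models. That is the difference being debated in this thread.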

include/llvm/Support/GenericDomTreeConstruction.h
267 ↗	(On Diff #108169)	terminators
1012 ↗	(On Diff #108169)	PostDominatorTree
1202 ↗	(On Diff #108169)	parent
test/Transforms/StructurizeCFG/no-branch-to-entry.ll
1 ↗	(On Diff #108169)	Interesting. @kuhar: It probably makes sense to add a comment to the test case saying that it is intended to be removed.

kuhar added a child revision: D36167: [Dominators] Introduce batch updates.Aug 1 2017, 1:24 PM

@grosser

Tobias,

Before documenting my code more, let me answer your questions and comment on your counterproposal.

this is not well documented in the literature. The paper you based the dominator tree construction on ("An Experimental Study of Dynamic Dominators") seems to very clearly distinguish between unreachable and reachable CFG nodes.

I think that it’s pretty convenient to assume that functions always have a single exit for academic purposes. If this was the case in LLVM, we could build postdominators on reverse CFG and literally be done at that point. But that’s not the case, and there’s a real life difference between forward-unreachable and reverse-unreachable code. There’s no way to end up in forward-unreachable code, because we always start executing code from the (single) entry node. Whereas when it comes to reverse-unreachable code, executing it happens in practice and we actually care about optimizing such code. Many embedded systems consist of a single large infinite loop that reads some memory and writes data somewhere else. And it is actually possible to exit the infinite loop there, just by calling a function with a call to exit inside. IMHO it is not much different from dynamically infinite loops, which are naturally modeled by postdominators. Because of that, it seems entirely reasonable to me to include reverse-unreachable in the PostDominatorTree.

that it is in practice desirable for some reason (which?)

The main motivation for modeling reverse-unreachable code and (statically) infinite loops in the PostDominatorTree is safely sinking code and optimizing code within infinite loops. To do that, you cannot pretend that reverse-reachable code dominates code that can branch to an infinite loop, which I find particularly problematic in your proposal.

Let’s consider one of your examples. In this case your implementation says that B immediately postdominates D, even though it is possible to branch to C and keep looping there until something exits the program. It would seem valid to sink an instruction from B to D, which would not be the case if C had a dynamically unreachable exit. This example is more about profitability, but the real problem would be hoisting here, even assuming that there are no function calls in B, C, or D. Say there is an instruction that can cause undefined behavior in D, even something as simple as a divide by zero. D immediately postdominates B and is its successor, so it would seem that hoisting that instruction to B is safe (assuming that we already proved that it is from other standpoints), and that instruction wouldn’t normally be executed if the control flow entered the infinite loop C. The same applies to hoisting loads and stores from D to B.

The other problem I see is that your definition of postdominance for reverse-unreachable code doesn’t lead to useful regions, if I understand it correctly. Regions in LLVM are defined as single-entry single-exit parts of CFG where the entry dominates every node in the region and the exit postdominates every node in it. For your example, it seems that there would be 2 regions: ABD and C, which doesn’t seem correct to me, as in practice you can jump from B, get stuck and then exit through C. Using the postdominance as defined D35851 doesn’t have this problem and results in 3 regions: AB, C, and D.

Saying that reverse-unreachable code never postdominates reverse-reachable code is like saying that infinite loops, even with side effects, have undefined behavior and that we assume they are never executed.

You define the parent property as: "If a node gets disconnected from the graph, then all of the nodes it dominated previously will now become unreachable." After this patch, nodes that become unreachable are still connected to the graph.

Interestingly, the property D postdom B does not hold any more. To my understanding, in a complete (post) dominator tree, removing an edge should never weaken a normal dominator tree. Is this true? Do we lose this property by supporting reverse-unreachable CFG nodes?

This may be a little bit unintuitive at first, but here’s what happens when you call .deleteEdge with my patch: it first performs the edge deletion and then immediately makes an insertion from the virtual exit to the reverse-unreachable code. From the end user’s perspective, those two steps happen atomically, but in reality it performs two different operations with an intermediate result not observable from outside. If you consider these two internal steps in terms of the parent and sibling property, the first one holds after deletion and the second one after insertion.
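
The two internal steps can be pictured on a plain successor map. This is a simplified sketch, not the updater from the patch: `delete_edge` and `can_reach` are hypothetical helpers, the CFG is an assumed version of the A/B/C/D example, and connecting the source block itself to the virtual exit is a simplification of how roots are actually chosen.

```python
def can_reach(succ, start, target):
    """Plain DFS reachability check on the forward CFG."""
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        if n == target:
            return True
        for s in succ[n]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False


def delete_edge(succ, src, dst, virtual_exit="EXIT"):
    """Delete src -> dst; if src can no longer reach the exit, add a virtual edge.

    Returns the list of internal steps, which callers would see as one update.
    """
    steps = []
    # Step 1: the actual deletion.
    succ[src].remove(dst)
    steps.append(("delete", src, dst))
    # Step 2: reconnect a block that became reverse-unreachable.
    if not can_reach(succ, src, virtual_exit):
        succ[src].append(virtual_exit)
        steps.append(("insert", src, virtual_exit))
    return steps


# Assumed CFG before the update: C can loop or go back to B.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["C", "B"], "D": ["EXIT"], "EXIT": []}
steps = delete_edge(cfg, "C", "B")
print(steps)
print(cfg["C"])
```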

This is not visible in verifyParentProperty, as it does not update Roots properly

Verifying the parent property doesn’t really depend on some information internal to the DomTree -- you could place a .verifyParentProperty call in the middle of deletion and it would check it fine. The only thing that .verify would disagree with would be the new root not present in the Roots yet, but that’s kind of expected here, and the deletion and insertion happen atomically anyway. In this respect, neither deletion nor verification seems broken to me.

In D35851#828525, @kuhar wrote:

@grosser

Tobias,

Before documenting my code more, let me answer your questions and comment on your counterproposal.

Very good idea.

this is not well documented in the literature. The paper you based the dominator tree construction on ("An Experimental Study of Dynamic Dominators") seems to very clearly distinguish between unreachable and reachable CFG nodes.

I think that it’s pretty convenient to assume that functions always have a single exit for academic purposes. If this was the case in LLVM, we could build postdominators on reverse CFG and literally be done at that point. But that’s not the case, and there’s a real life difference between forward-unreachable and reverse-unreachable code. There’s no way to end up in forward-unreachable code, because we always start executing code from the (single) entry node. Whereas when it comes to reverse-unreachable code, executing it happens in practice and we actually care about optimizing such code. Many embedded systems consist of a single large infinite loop that reads some memory and writes data somewhere else. And it is actually possible to exit the infinite loop there, just by calling a function with a call to exit inside. IMHO it is not much different from dynamically infinite loops, which are naturally modeled by postdominators. Because of that, it seems entirely reasonable to me to include reverse-unreachable in the PostDominatorTree.

While I currently do not have any benchmark where optimizing infinite loops is useful (is there one I could have a look at?), let's work under the assumption that it actually is. Chandler and Daniel raised the point that just the fact that the PDT contains all nodes makes user code simpler, which is another argument for this kind of modeling. In general, both points together make a good motivation to include all nodes in the PDT. I agree, assuming we understand and are happy with the resulting implications.

that it is in practice desirable for some reason (which?)

The main motivation for modeling reverse-unreachable code and (statically) infinite loops in the PostDominatorTree is safely sinking code and optimizing code within infinite loops. To do that, you cannot pretend that reverse-reachable code dominates code that can branch to an infinite loop, which I find particularly problematic in your proposal.

Let’s consider one of your examples. In this case your implementation says that B immediately postdominates D, even though it is possible to branch to C and keep looping there until something exits the program. It would seem valid to sink an instruction from B to D, which would not be the case if C had a dynamically unreachable exit.

Thank you for coming up with an actual code transformation.

I assume you mean D postdom B (rather than B postdom D).

You say the fact that D postdom B is by itself sufficient to allow code sinking from B to D. Remember the original program:

which, as in D36107, may terminate in C with "br i1 true, label %C, label %B". To my understanding there is dynamically no difference between the original code and the code after dropping the edge C -> B. Both have the property that B postdom D. Would code sinking from B to D be in your opinion valid in the original program?

This example is more about profitability, but the real problem would be hoisting here, even assuming that there are no function calls in B, C, or D. Say there is an instruction that can cause undefined behavior in D, even something as simple as a divide by zero. D immediately postdominates B and is its successor, so it would seem that hoisting that instruction to B is safe (assuming that we already proved that it is from other standpoints), and that instruction wouldn’t normally be executed if the control flow entered the infinite loop C. The same applies to hoisting loads and stores from D to B.

The same question as above: what prevents you from hoisting these very instructions in the original program with a dynamically infinite loop?

AFAIU X postdom Y does not guarantee that after Y is executed X is at some later point certainly executed.

The other problem I see is that your definition of postdominance for reverse-unreachable code doesn’t lead to useful regions, if I understand it correctly. Regions in LLVM are defined as single-entry single-exit parts of CFG where the entry dominates every node in the region and the exit postdominates every node in it. For your example, it seems that there would be 2 regions: ABD and C, which doesn’t seem correct to me, as in practice you can jump from B, get stuck and then exit through C. Using the postdominance as defined D35851 doesn’t have this problem and results in 3 regions: AB, C, and D.

Disclosure. I added the region test cases and I believe they are indeed useful the way they are. ;-)

I assume you mean "you can jump from B to C".

Regions for fully reverse reachable CFGs can be well defined through (post-)dominance. As we try to find a post-dominance definition that is consistent with what we expect, I prefer to define regions as a subgraph of the CFG that is connected to the CFG with a single incoming edge and a single outgoing edge.

Do we agree that the region for the original program

computed today makes sense? Similar to the example above, by dropping an edge that is known to not be executed, we just restrict the set of program executions the PDT can assume. No new incoming or outgoing edges are added:

So from this perspective I would assume the new region, to still be a region. Would you agree?

Today we achieve this result by checking that the entry dominates the exit and the exit post-dominates the entry. To check if a node is contained in the region, we check if it is dominated by entry and not dominated by exit. Post-dominance for reverse-unreachable blocks is unfortunately not very useful here for the reason you stated above. Another option would be to point the virtual exit edge not at the actual exit, but at the node that connects the reverse-unreachable and reverse-reachable parts of the CFG. This would address this issue in most common cases. If this is preferable, I could port this patch to your most recent code.

Saying that reverse-unreachable code never postdominates reverse-reachable code is like saying that infinite loops, even with side effects, have undefined behavior and that we assume they are never executed.

You define the parent property as: "If a node gets disconnected from the graph, then all of the nodes it dominated previously will now become unreachable." After this patch, nodes that become unreachable are still connected to the graph.

Interestingly, the property D postdom B does not hold any more. To my understanding, in a complete (post) dominator tree, removing an edge should never weaken a normal dominator tree. Is this true? Do we lose this property by supporting reverse-unreachable CFG nodes?

This may be a little bit unintuitive at first

Then it certainly warrants some documentation. Thank you for explaining!

but here’s what happens when you call .deleteEdge with my patch: it first performs the edge deletion and then immediately makes an insertion from the virtual exit to the reverse-unreachable code. From the end user’s perspective, those two steps happen atomically, but in reality it performs two different operations with an intermediate result not observable from outside. If you consider these two internal steps in terms of the parent and sibling property, the first one holds after deletion and the second one after insertion.

Interesting! Do I understand correctly that for a normal complete (post)dominator tree these two steps coincide, and hence both the parent and the sibling property always hold?

This is not visible in verifyParentProperty, as it does not update Roots properly

Verifying the parent property doesn’t really depend on some information internal to the DomTree -- you could place a .verifyParentProperty call in the middle of deletion and it would check it fine. The only thing that .verify would disagree with would be the new root not present in the Roots yet, but that’s kind of expected here, and the deletion and insertion happen atomically anyway. In this respect, neither deletion nor verification seems broken to me.

Thank you for explaining the above. Yes, verify is verifying under the assumption that no virtual edges are added. Without virtual edges, the properties certainly hold. This difference is again something interesting to point out.

Best,
Tobias

While I currently do not have any benchmark where optimizing infinite loops is useful (is there one I could have a look at?), let's work under the assumption that it actually is. Chandler and Daniel raised the point that just the fact that the PDT contains all nodes makes user code simpler, which is another argument for this kind of modeling. In general, both points together make a good motivation to include all nodes in the PDT. I agree, assuming we understand and are happy with the resulting implications.

FWIW, here is some more literature that assumes you have a single entry/exit node in postdom.
http://ssabook.gforge.inria.fr/latest/book.pdf

The PDG building algorithm (which is the standard one, AFAIK, but you are probably more familiar with PDG than i am): ".. Similar to the CFG, a PDG also has two nodes ENTRY and EXIT,
through which control flow enters and exits the program respectively. For this
purpose, it is assumed that every node of the CFG is reachable from the entry
node and can reach the exit node."

Most *optimizations* that use post-dominators have this assumption hidden in them, stated or not.

Take a look at 12.5.3 in that book, which presents a simplified explanation of SSUPRE.

If you don't include infinite loops and connect them to exit, the sink points will be wrong. We have an impl of SSUPRE posted for review, i think you can actually get it to happen for real with it (but i haven't had time to play)

IE given a trivial variation on their program, to try to make it easy to follow

regular loop {
  store a
  if (foo) {
    infinite loop:
      load a, do something with it
    exit program or go to infinite loop
  } 
}
load a

The goal of this program is that the blocks for infinite loop do not appear in the postdominator tree currently or are not connected to exit (hopefully it is trivial to see that it is possible to accomplish this given the current root finding, but again, i have not run it through clang)
Without infinite loops included or connected to exit, it will generate:

regular loop {
  if (foo) {
    infinite loop:
      load a, do something with it
    exit program or go to infinite loop
  } 
}
store a
load a

(Note that this load is deliberately here to cause it to store back. If it wasn't here, and instead you just had "use <value stored>", it would eliminate the store in favor of a temporary, registerizing the memory access)

This is, however, wrong, as now the load in the infinite loop does not see the store.
IE It will never consider the load in the infinite loop as a use that it must consider when placing factoring points (as it will not appear in the postdominance frontier). It will see this as a full redundancy.

With infinite loops present and connected, it will do this:

regular loop {
  if (foo) {
    infinite loop:
      store a, value
      load a, do something with it
    exit program or go to infinite loop
  } 
}
store a, value
load a

The same question as above: what prevents you from hoisting these very instructions in the original program with a dynamically infinite loop?

Post-dominance alone is definitely not sufficient to guarantee safety in general for either hoisting or sinking.
It's often a necessary but not sufficient condition for hoisting or sinking, and is usually used to ensure non-speculation of memory along new paths.
In particular, "does this load post-dominate all other loads that are the same?" is the trivial version of this (if so, you are not speculating it). LLVM currently hacks around this using walks and some expensive testing that could be replaced with O(1) answers based on post-dominance. That would be "is the later load in a cdeq set of an earlier load?". If so, they must execute together or not at all.

However, it is also used for elimination and elimination-based sinking.
As another example: In a function without any loads (to keep it simple), a later store that post-dominates all blocks containing other stores makes all the earlier stores redundant.

if (foo)
store a
else
store a

store a

The later store post-dominating all the other stores is sufficient to eliminate them[1].

This also fails to be true in certain conditions if you either don't include the loops, or don't connect them to exit.
The important part is that the loop appears, and that it does not appear in the "wrong" part of the tree. As a trivial example why, if you add an infinite loop on a branch that, say, calls a global function that can see the store, you need to ensure we don't delete that store by seeing a wrong relationship. So the infinite loop can only be connected to blocks that will not cause us to believe that the later store post-dominates all uses.
It is certainly possible, depending on the location of stores/loads/etc, to connect the infinite loops somewhere else, and it may not matter for regions, but depending on the locations, it may affect store elimination.

[1] This notion really is the basis of the SSUPRE transformation.
If you view the stores and aliased loads as "uses", it's basically "place a store where it postdominates all 'uses'. Eliminate any earlier 'uses' that are stores. Insert a store before any 'use' that is a load".
again, this is not a *perfect* description, because SSUPRE also considers lifetime optimality, and only tries to sink them as far as necessary to remove redundancy, instead of "as far down as it can".

In D35851#829611, @dberlin wrote:

Hi daniel,

thank you for adding some more context! Let me fire a quick reply asking some questions before looking deeper into your points!

While I currently do not have any benchmark where optimizing infinite loops is useful (is there one I could have a look at?), let's work under the assumption that it actually is. Chandler and Daniel raised the point that just the fact that the PDT contains all nodes makes user code simpler, which is another argument for this kind of modeling. In general, both points together make a good motivation to include all nodes in the PDT. I agree, assuming we understand and are happy with the resulting implications.

FWIW, here is some more literature that assumes you have a single entry/exit node in postdom.
http://ssabook.gforge.inria.fr/latest/book.pdf

Fabrice's book, very cool indeed!

The PDG building algorithm (which is the standard one, AFAIK, but you are probably more familiar with PDG than i am): ".. Similar to the CFG, a PDG also has two nodes ENTRY and EXIT,
through which control flow enters and exits the program respectively. For this
purpose, it is assumed that every node of the CFG is reachable from the entry
node and can reach the exit node."

Most *optimizations* that use post-dominators have this assumption hidden in them, stated or not.

If you stare at 12.5.3 in that book, which presents a simplified explanation of SSUPRE.

If you don't include infinite loops and connect them to exit, the sink points will be wrong. We have an impl of SSUPRE posted for review, i think you can actually get it to happen for real with it (but i haven't had time to play)

Could you point me to the phabricator link. A quick search for SSUPRE did not bring it up.

IE given a trivial variation on their program, to try to make it easy to follow
regular loop {
  store a
  if (foo) {
    infinite loop:
      load a, do something with it
    exit program or go to infinite loop
  } 
}
load a
The goal of this program is that the blocks for infinite loop do not appear in the postdominator tree currently or are not connected to exit (hopefully it is trivial to see that it is possible to accomplish this given the current root finding, but again, i have not run it through clang)
Without infinite loops included or connected to exit, it will generate:
regular loop {
  if (foo) {
    infinite loop:
      load a, do something with it
    exit program or go to infinite loop
  } 
}
store a
load a
(Note that this load is deliberately here to cause it to store back. If it wasn't here, and instead you just had "use <value stored>", it would eliminate the store in favor of a temporary, registerizing the memory access)

This is, however, wrong, as now the load in the infinite loop does not see the store.
IE It will never consider the load in the infinite loop as a use that it must consider when placing factoring points (as it will not appear in the postdominance frontier). It will see this as a full redundancy.

With infinite loops present and connected, it will do this:
regular loop {
  if (foo) {
    infinite loop:
      store a, value
      load a, do something with it
    exit program or go to infinite loop
  } 
}
store a, value
load a

I need to think about these. Some LLVM-IR unit tests that explain the property you are looking for would make the discussion here more concrete, as I will need some time to translate your textual ideas to IR. Even without, I certainly will have a look at your pointers.

The same question as above: what prevents you to hoist these very instructions in the original program with a dynamically infinite loop?

post-dominance alone is definitely not sufficient to guarantee safety in general for either hoisting or sinking.

I fully agree that post-dominance alone is not sufficient. This seems contrary to the impression @kuhar's reply made to me. I need to think more about your answer below, but a more direct answer to the questions I asked will certainly help to clarify some of what has been discussed before. @kuhar can you help me understand if some of the previous examples make good test cases for the hoisting transformation you have in mind.

Overall, I am really not trying to shoot down this proposal. Even though I would like to have a PDT that works well with regions, if we can come up together with good test cases that show this is not possible, I am the first to commit and document them! Please help me to get your requirements written into nice understandable unit tests! If our discussions end up in useful test cases and documentation, then it was certainly of use.

It's often a necessary but not sufficient condition for hoisting or sinking, and is usually used to ensure non-speculation of memory along new paths.
In particular: "does this load post-dominate all other loads of the same location" is the trivial version of this (if so, you are not speculating this). LLVM currently hacks around this with some expensive walking and testing that could be replaced with O(1) answers based on post-dominance. That would be "is the later load in a cdeq set of an earlier load". If so, they must execute together or not at all.

However, it is also used for elimination and elimination-based sinking.
As another example: In a function without any loads (to keep it simple), a later store that post-dominates all blocks containing other stores makes all the earlier stores redundant.

IE

if (foo)
store a
else
store a

store a

the later store post-dominating all the other stores is sufficient to eliminate them[1]

this also fails to be true in certain conditions if you either don't include the loops, or don't connect them to exit.
The important part is that the loop appears, and that it does not appear In the "wrong" part of the tree. As a trivial example why, if you add an infinite loop on a branch that, say, calls a global function that can see the store, you need to ensure we don't delete that store by seeing a wrong relationship. So the infinite loop can only be connected to blocks that will not cause us to believe that the later store post-dominates all uses.
It is certainly possible, depending on the location of stores/loads/etc, to connect the infinite loops somewhere else, and it may not matter for regions, but depending on the locations, it may affect store elimination.

Again, I will need to think about the above and will try to come up with LLVM-IR test cases that illustrate the problem you describe. If you happen to have some available, this would be very much appreciated.

Best,
Tobias

[1] This notion really is the basis of the SSUPRE transformation.
If you view the stores and aliased loads as "uses", it's basically "place a store where it postdominates all 'uses'. Eliminate any earlier 'uses' that are stores. Insert a store before any 'use' that is a load".
again, this is not a *perfect* description, because SSUPRE also considers lifetime optimality, and only tries to sink them as far as necessary to remove redundancy, instead of "as far down as it can".

Tobias,

To quickly address some of the remaining points:

In D35851#829390, @grosser wrote:

but here’s what happens when you call .deleteEdge with my patch: it first performs the edge deletion and can then immediately make an insertion from the virtual exit to the reverse-unreachable code. From the end user’s perspective, those two steps happen atomically, but in reality it performs two different operations with an intermediate result not observable from outside. If you consider these two internal steps in terms of the parent and sibling property, the first one holds after deletion and the second one after insertion.

Interesting! Do I understand correctly that for a normal complete (post)dominator tree these two steps fall together? Hence, both parent and sibling property hold always?

Yes, although you cannot assume that the .deleteEdge call doesn't affect the parent property, as it may internally result in an insertion. Even then, both parent and sibling property hold the whole time -- between the internal deletion and insertion, and after the whole function ends.

Thank you for explaining the above. Yes, verify is verifying under the assumption that no virtual edges are added. Without virtual edges, the properties certainly hold. This difference is again something interesting to point out.

The properties hold all the time -- the thing is that the set of the virtual exit's predecessors is stored only in the tree, and if you were to compute it just by looking at the CFG, you would get an answer that assumes that the insertion already happened. If we made it a special case of root finding, then calling .verifyRoots before the internal insertion would also work, and thus the whole .verify.

I fully agree that post-dominance alone is not sufficient. This seems contrary to the impression @kuhar's reply made to me

I tried to point out that it's necessary but not sufficient by saying:

... (assuming that we already proved that it is (fine) from other standpoints) ...

In D35851#829720, @grosser wrote:

In D35851#829611, @dberlin wrote:

If you stare at 12.5.3 in that book, which presents a simplified explanation of SSUPRE.

If you don't include infinite loops and connect them to exit, the sink points will be wrong. We have an impl of SSUPRE posted for review, i think you can actually get it to happen for real with it (but i haven't had time to play)

Could you point me to the phabricator link. A quick search for SSUPRE did not bring it up.

I believe that's the one: https://reviews.llvm.org/D29866
http://theory.stanford.edu/~robert/papers/rvi.ps

kuhar mentioned this in D35918: [GVNHoist] Factor out reachability to search for anticipable instructions quickly. Aug 3 2017, 3:00 PM

(Copying this comment because it ended up on the wrong review thread)
Note: There is also a rewritten DSE that uses PostDom as well: https://reviews.llvm.org/D29624

I'm not sure i will get a chance to produce LLVM IR for you (i'm sadly finding less and less time to work on LLVM these days, so i'm mostly having other people do it :P).

To give you a more detailed example, the following completely simple algorithm (which GCC and a few other compilers use variants of, see tree-ssa-dse.c) *should* work, but will give wrong answers if you ignore infinite loops or do weird things to them.
again, i've made it not catch every case so i can do it in <50 lines. Normally, for example, there is more than one stack, instead of all possible stores on the same stack, which misses a ton of cases due to irrelevant aliasing, blah blah blah.

Stack StoreStack;  // In realer versions of this algorithm, we have multiple stacks or other structures. In fact, in this version, there will only ever be zero or one things on the stack.
void walkPostDomChildren(DomTreeNode *DTN)
{
  // Ensure that the store at the top of the stack is from something above us in our tree.  Normally this would be done by checking whether the dfs numbers of the current block are contained within the dfs in/out numbers of the top of stack.  See newgvn.cpp, predicateinfo.cpp, etc for ValueStack, which does this.
  // Pop StoreStack until the block for the instruction at StoreStack.top() postdominates DTN.
  while (!StoreStack.empty() && !PDT->dominates(PDT->getNode(StoreStack.top()->getParent()), DTN))
    StoreStack.pop();

  // The virtual exit has no block: skip the instruction walk, but still visit its children.
  if (BasicBlock *BB = DTN->getBlock())
    // Early-increment iteration, because we may erase the current instruction.
    for (Instruction &I : make_early_inc_range(reverse(*BB))) {
      if (auto *SI = dyn_cast<StoreInst>(&I)) {
        // If whatever is on the top of the stack is a must alias and post-dominates us, we're dead.
        if (!StoreStack.empty() && AA.isMustAlias(StoreStack.top()->getPointerOperand(), SI->getPointerOperand()))
          SI->eraseFromParent();
        else if (StoreStack.empty())
          StoreStack.push(SI);
        else
          // This is really a may-def, so we must clear the stack. In this stupid version, this is true even if it noaliases top of stack.
          StoreStack.clear();
        continue;
      }
      if (getModRefInfo(&I) != MRI_NoModRef)
        // A use of memory, so reset our stack
        StoreStack.clear();
    }
  for (auto *Child : *DTN)
    walkPostDomChildren(Child);
}

walkPostDomChildren(PDT->getRootNode());

Basically, if a store postdominates another earlier store without an intervening may-use/may-def, the earlier store is dead. There is nothing that could possibly see it happen before the later store.

Modulo transcription bugs, this algorithm is provably correct for our simplified assumptions. It will only eliminate stores that are dead.

Given the following program:

Block 1
store a, 5
if (foo)
{
Block 2:
     call foo(load a)
     goto Block 2
}
Block 3:
store a, 6

If infinite loops are not in the post-dominator tree, the tree will look like this:

3
|
1

The algorithm will visit 3, then push store a, 6 on the stack, visit 1, and use it to remove store a, 5
This is illegal. The store a, 5 is visible and used in block 2, and block 2, in fact, calls a function with the value we stored.

Whoops!

If infinite loops are connected to virtual exit, the tree will look like this:

virtual exit
 /  |  \
3  2   1

We will no longer eliminate store a, 5, and we will get a correct answer because at each point, nothing will be on the stack.
We will visit 3, push store a, 6. We will visit 2, pop the stack because store a, 6 is no longer in scope. Stack will stay empty throughout 2.
We will visit 1, push store a, 5. We will exit the algorithm with this store on the stack, having eliminated nothing.
We will get the correct answers no matter how many things no longer exit. This is a safe and conservative post-dominator tree for our algorithm.
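The two walks above can be replayed outside of LLVM. The following is a minimal, self-contained sketch and not the algorithm as it would be written against LLVM's API: the types `Inst`, `Node` and `Walker` and the helpers `exampleBlocks`, `treeWithoutLoop`, `treeWithVirtualExit` and `runDSE` are hypothetical names made up for illustration, every store is assumed to must-alias `a`, and scope is tracked with the current DFS path instead of dfs in/out numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: either a store to `a` or something that may read/write `a`.
enum class Kind { Store, MayUse };

struct Inst {
  Kind kind;
  std::string text;
};

// Post-dominator tree node; block == -1 models the virtual exit.
struct Node {
  int block;
  std::vector<Node> children;
};

using Blocks = std::vector<std::vector<Inst>>;

struct Walker {
  struct Entry {
    std::string text;
    const Node *owner;  // tree node whose block contains the store
  };
  const Blocks &blocks;
  std::vector<const Node *> path;  // nodes on the current DFS path
  std::vector<Entry> stack;
  std::vector<std::string> eliminated;

  bool onPath(const Node *n) const {
    return std::find(path.begin(), path.end(), n) != path.end();
  }

  void walk(const Node &n) {
    path.push_back(&n);
    // Pop stores whose block is not an ancestor of (does not post-dominate) n.
    while (!stack.empty() && !onPath(stack.back().owner))
      stack.pop_back();
    if (n.block >= 0) {
      const std::vector<Inst> &insts = blocks[n.block];
      for (auto it = insts.rbegin(); it != insts.rend(); ++it) {
        if (it->kind == Kind::Store) {
          if (!stack.empty())
            eliminated.push_back(it->text);  // killed by the store on the stack
          else
            stack.push_back(Entry{it->text, &n});
        } else {
          stack.clear();  // a may-use of memory resets the stack
        }
      }
    }
    for (const Node &child : n.children)
      walk(child);
    path.pop_back();
  }
};

// Blocks 1..3 of the example; index 0 is unused so the numbering matches.
Blocks exampleBlocks() {
  return {
      {},
      {Inst{Kind::Store, "store a, 5"}},
      {Inst{Kind::MayUse, "call foo(load a)"}},
      {Inst{Kind::Store, "store a, 6"}},
  };
}

// Tree without the infinite loop: 3 -> 1.
Node treeWithoutLoop() { return Node{3, {Node{1, {}}}}; }

// Tree with the loop connected to the virtual exit: exit -> {3, 2, 1}.
Node treeWithVirtualExit() {
  return Node{-1, {Node{3, {}}, Node{2, {}}, Node{1, {}}}};
}

std::vector<std::string> runDSE(const Blocks &blocks, const Node &root) {
  Walker w{blocks, {}, {}, {}};
  w.walk(root);
  return w.eliminated;
}
```

Under these assumptions, `runDSE(exampleBlocks(), treeWithoutLoop())` reports `store a, 5` as eliminated, while `runDSE(exampleBlocks(), treeWithVirtualExit())` eliminates nothing.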

Is this perfect? For certain, no.

In this example, for *this* algorithm, you can connect the infinite loop anywhere you like as long as you do not generate the post-dominator tree:

3
|
1
|
2

or
3
|
2

or
3
|  \
2   1

If you generate that, you will eliminate the store a, 5 again, and you should not.

I believe the last one, sadly, is what your other patch will generate. It's not going to end up legal to simply cause the reverse-unreachable parts to have no effect on post-dom, because they are still forward-reachable.

As an example of what would theoretically work:

virtual exit
 |   \
 3    \
 |       1
 2

This would also give a correct answer for this algorithm, on this example. But it would not be valid as a postdominator tree because there is no reverse path from 3 to 2, *and* there is a reverse path from 3 to 1, but 3 is not an ancestor of 1.

Sadly, there is no good-in-all-cases to be had

In fact, if the program was:

Block 1
store a, 5
if (foo)
{
Block 2:
     goto Block 2
}
Block 3:
store a, 6

we will still generate:

virtual exit
 /  |  \
3  2   1

but in practice, we could eliminate store a, 5 as it's not possible to see that it didn't happen in this example.

The greater problem with not connecting the infinite loops to exit is that "where it is legal to connect the loops to" (IE what doesn't break this algorithm) is now complex. Effectively, it must take into account where the memory instructions are (though the exact conditions that must be fulfilled do not have to be expressed like this)
Otherwise, i can generate an adversarial program that will give a wrong answer when you apply the above algorithm.
IE by placing loads and stores and calls such that you end up with the same problem as the 3->1->2 post dom tree above.

In particular, for the algorithm above, the places for where you connect the infinite loops must ensure either the stack gets cleared in the right place, or that the blocks are siblings instead of parent child, so the scope gets reset. (IE either we must always see the may-use in the infinite loop and clear the stack before going above it, or we must pop the stack so that we do not try to eliminate something "above" the infinite loop using something "below" the infinite loop)

Worse than that, that's just what works with *this* algorithm.

SSUPRE, which requires a conservative correct postdominance frontier, has worse requirements you'd have to prove are met.
So does Sreedhar and Gao's IDF algorithm (which is what IDF calculator uses), which requires the CFG and PostDom tree be in sync, and postdom have the parent and sibling property.
(This is solvable by making the real cfg have the set of fake edges you are creating, but they'd need to be walked when you call predecessors. If you connect them all to a fake exit, even if you have algorithms that must know about the edges, it's one block to special case instead of many)

Additionally, if you use post-dominance to prove speculation-safety, it's now not enough to know where the load/stores are, you'd have to know where every trapping instruction is to ensure the place you connect the infinite loop does not imply speculation safety where it should not.

Essentially: Even if you completely ignore the sibling and parent properties, connecting the infinite loops or changing the paths in general requires enough knowledge of the algorithms being used to be able to prove that you aren't going to generate post-dom trees that make them wrong.
In practice, since at least the IDF calculator relies on the sibling and parent properties, this ends up a practical requirement for the connection points...

All that seems ... really fragile and difficult :)

In D35851#829720, @grosser wrote:

I need to think about these. Some LLVM-IR unit tests that explain the property you are looking for would make the discussion here more concrete, as I will need some time to translate your textual ideas to IR. Even without, I certainly will have a look at your pointers.

<snip>

Overall, I am really not trying to shoot down this proposal. Even though I would like to have a PDT that works well with regions, if we can come up together with good test cases that show this is not possible, I am the first to commit and document them! Please help me to get your requirements written into nice understandable unit tests! If our discussions end up in useful test cases and documentation, then it was certainly of use.

<snip>

Again, I will need to think about the above and will try to come up with LLVM-IR test cases that illustrate the problem you describe. If you happen to have some available, this would be very much appreciated.

Hey Tobias,

Jakub mentioned that he was essentially blocked here and there didn't seem to be a good path forward, and I have to agree that this doesn't seem like a very tenable end state...

I see two high level issues here:

You've somewhat left this review in a bad place. "I need to think about this" without a real time bound doesn't seem like a tenable state. It's been almost a week without update. I think it is very reasonable for folks to want to make progress without getting blocked like this.

You seem to be setting the bar *really* high. I know that regions are used in Polly, but they aren't used in any other part of LLVM and postdom has fairly immediate uses. While I'm also interested in figuring out the right way to integrate things like Polly's regions with postdom, I don't think it is really reasonable to hold up pretty significant improvements to the rest of LLVM on that working. I feel like we should be able to make more incremental progress here.

My suggestion would be something like the following:

If this actively breaks some Polly use case, we should get a test case for that *now*. And if so, it might make sense to have Polly use its own postdom code until the interactions here are sorted out. There doesn't seem to be any need for one singular implementation of this stuff.

Jakub and Danny make progress on postdoms here, specifically making them faster to update and more viable for some of the non-region use cases.

If/when folks have a better idea of how to integrate them with regions *or* an understanding of why they can't *then* we add the region test cases and support you're mentioning.

Essentially, 100% integration with regions, an analysis with a very specialized single user at the moment, doesn't seem like a great hard requirement for us to make progress. While I don't want to break regions and Polly, I think these things need to be able to move independently rather than being forced to boil the ocean.

In D35851#835026, @chandlerc wrote:

In D35851#829720, @grosser wrote:

I need to think about these. Some LLVM-IR unit tests that explain the property you are looking for would make the discussion here more concrete, as I will need some time to translate your textual ideas to IR. Even without, I certainly will have a look at your pointers.

<snip>

Overall, I am really not trying to shoot down this proposal. Even though I would like to have a PDT that works well with regions, if we can come up together with good test cases that show this is not possible, I am the first to commit and document them! Please help me to get your requirements written into nice understandable unit tests! If our discussions end up in useful test cases and documentation, then it was certainly of use.

<snip>

Again, I will need to think about the above and will try to come up with LLVM-IR test cases that illustrate the problem you describe. If you happen to have some available, this would be very much appreciated.

Hey Tobias,

Jakub mentioned that he was essentially blocked here and there didn't seem to be a good path forward, and I have to agree that this doesn't seem like a very tenable end state...

I fully agree with you. We should get this as quickly resolved as possible!

I see two high level issues here:

You've somewhat left this review in a bad place. "I need to think about this" without a real time bound doesn't seem like a tenable state. It's been almost a week without update. I think it is very reasonable for folks to want to make progress without getting blocked like this.

I also would like to move on, especially as I would like Jakub to be able to close this last point (and also because it takes a lot of my time). I am doing my best to look into this. It would really help if we can boil this down to some LLVM-IR test cases which we can use to better document the guarantees the proposed post-dominator implementation gives. I now need to find a time slot to translate Danny's examples to LLVM-IR. If I could get help with this, this would be amazing.

You seem to be setting the bar *really* high. I know that regions are used in Polly, but they aren't used in any other part of LLVM and postdom has fairly immediate uses. While I'm also interested in figuring out the right way to integrate things like Polly's regions with postdom, I don't think it is really reasonable to hold up pretty significant improvements to the rest of LLVM on that working. I feel like we should be able to make more incremental progress here.

While this tangentially also affects Polly (for infinite loops), we can certainly work around this. Danny even suggested having a Polly-compatible mode.

Now, my main concerns are not regions or Polly, but the semantics of post-dominance in general. We are implementing an extension which is not documented in papers (even though other similar implementations exist). I would like to get the guarantees and implications documented before we move large portions of LLVM to use this.

My main concern is the fact that the introduction of virtual edges weakens the post-dominance relationship. If we want to verify the correct placement of live-range metadata, one could e.g. require that the live-range-end node post-dominates the live-range start. This invariant could be verified throughout the pass transformations to ensure no pass breaks it. Unfortunately, when cfg-simplify drops edges and we re-connect the backwards-unreachable nodes to the exit node as proposed here, the post-dominance relation is weakened and the verification surprisingly fails. I would really like to understand (and see documented) whether writing such checks with the proposed post-dominance information would require us to think about such possible invalidations at each point where we potentially make parts of the CFG backwards unreachable.

This is not the case when working with a complete CFG (or a CFG where unreachable nodes are not modeled).

My suggestion would be something like the following:

If this actively breaks some Polly use case, we should get a test case for that *now*. And if so, it might make sense to have Polly use its own postdom code until the interactions here are sorted out. There doesn't seem to be any need for one singular implementation of this stuff.

There have been test cases in tree for years which I believe break. What else can I provide?

I clearly dislike having two versions, but even if we do introduce two versions, the differences in their guarantees and semantics should be made clear.

Jakub and Danny make progress on postdoms here, specifically making them faster to update and more viable for some of the non-region use cases.

I totally agree. This is great in general and I am very glad to see this going in. Jakub did an outstanding job here.

If/when folks have a better idea of how to integrate them with regions *or* an understanding of why they can't *then* we add the region test cases and support you're mentioning.

Essentially, 100% integration with regions, an analysis with a very specialized single user at the moment, doesn't seem like a great hard requirement for us to make progress. While I don't want to break regions and Polly, I think these things need to be able to move independently rather than being forced to boil the ocean.

Polly won't break with this change, but it will be very difficult to assess alternatives if the current requirements are not documented and tested.

Let me propose a way forward:

I discussed this very same issue with Hal at the LLVM summer school for over two hours. He has a very good understanding of this subject. If he can review this patch faster than me and agrees with the direction, I am glad to step back from this review.

I really would like to get at least one (or two) LLVM-IR test cases that clarify the corner-case properties of the proposed patch. If Jakub could help me to translate Danny's example into a self-contained test case, this would be outstanding. This would make it easier for me to reply. Even if not, I will try to look into this tonight or tomorrow night at the latest and will propose Danny's example as a new test case. This will then hopefully give more insights.

Best,
Tobias

Rebase to ToT and address remarks.

Fix a typo.

kuhar added a reviewer: hfinkel. Aug 8 2017, 4:48 PM

In D35851#834070, @dberlin wrote:

(Copying this comment because it ended up on the wrong review thread)
Note: There is also a rewritten DSE that uses PostDom as well: https://reviews.llvm.org/D29624

Hi Daniel,

sorry for the delay. Let me get back into the discussion.

I'm not sure i will get a chance to produce LLVM IR for you (i'm sadly finding less and less time to work on LLVM these days, so i'm mostly having other people do it :P).

Especially because of this, thank you very much for taking the time to discuss this in so much detail. This is indeed very helpful.

To give you a more detailed example, the following completely simple algorithm (which GCC and a few other compilers use variants of, see tree-ssa-dse.c) *should* work, but will give wrong answers if you ignore infinite loops or do weird things to them.
again, i've made it not catch every case so i can do it in <50 lines. Normally, for example, there is more than one stack, instead of all possible stores on the same stack, which misses a ton of cases due to irrelevant aliasing, blah blah blah.
Stack StoreStack;  // In realer versions of this algorithm, we have multiple stacks or other structures. In fact, in this version, there will only ever be zero or one things on the stack.
void walkPostDomChildren(DomTreeNode *DTN)
{
  // Ensure that the store at the top of the stack is from something above us in our tree.  Normally this would be done by checking whether the dfs numbers of the current block are contained within the dfs in/out numbers of the top of stack.  See newgvn.cpp, predicateinfo.cpp, etc for ValueStack, which does this.
  // Pop StoreStack until the block for the instruction at StoreStack.top() postdominates DTN.
  while (!StoreStack.empty() && !PDT->dominates(PDT->getNode(StoreStack.top()->getParent()), DTN))
    StoreStack.pop();

  // The virtual exit has no block: skip the instruction walk, but still visit its children.
  if (BasicBlock *BB = DTN->getBlock())
    // Early-increment iteration, because we may erase the current instruction.
    for (Instruction &I : make_early_inc_range(reverse(*BB))) {
      if (auto *SI = dyn_cast<StoreInst>(&I)) {
        // If whatever is on the top of the stack is a must alias and post-dominates us, we're dead.
        if (!StoreStack.empty() && AA.isMustAlias(StoreStack.top()->getPointerOperand(), SI->getPointerOperand()))
          SI->eraseFromParent();
        else if (StoreStack.empty())
          StoreStack.push(SI);
        else
          // This is really a may-def, so we must clear the stack. In this stupid version, this is true even if it noaliases top of stack.
          StoreStack.clear();
        continue;
      }
      if (getModRefInfo(&I) != MRI_NoModRef)
        // A use of memory, so reset our stack
        StoreStack.clear();
    }
  for (auto *Child : *DTN)
    walkPostDomChildren(Child);
}

walkPostDomChildren(PDT->getRootNode());
Basically, if a store postdominates another earlier store without an intervening may-use/may-def, the earlier store is dead. There is nothing that could possibly see it happen before the later store.

Modulo transcription bugs, this algorithm is provably correct for our simplified assumptions. It will only eliminate stores that are dead.

Given the following program:
Block 1
store a, 5
if (foo)
{
Block 2:
     call foo(load a)
     goto Block 2
}
Block 3:
store a, 6

This is a great example, which perfectly highlights the part I do not yet feel is clear to me.

AFAIU your program corresponds to the following piece of LLVM-IR (which happens to almost correspond to the visualizations I posted before):

define void @foo(float* %a) {
   ....
}

; Renamed from @foo so that it does not clash with the callee above.
define void @test(i1 %cond, float* %a) {
block1:
  store float 5.0, float* %a
  br i1 %cond, label %block2, label %block3

block2:
  call void @foo(float* %a)
  br label %block2

block3:
  store float 6.0, float* %a
  ret void
}

which is very similar in nature to the one I posted above:

If infinite loops are not in the post-dominator tree, the tree will look like this:
3
|
1
The algorithm will visit 3, then push store a, 6 on the stack, visit 1, and use it to remove store a, 5
This is illegal. The store a, 5 is visible and used in block 2, and block 2, in fact, calls a function with the value we stored.

Whoops!

If infinite loops are connected to virtual exit, the tree will look like this:
virtual exit
 /  |  \
3  2   1
We will no longer eliminate store a, 5, and we will get a correct answer because at each point, nothing will be on the stack.
We will visit 3, push store a, 6. We will visit 2, pop the stack because store a, 6 is no longer in scope. Stack will stay empty throughout 2.
We will visit 1, push store a, 5. We will exit the algorithm with this store on the stack, having eliminated nothing.
We will get the correct answers no matter how many things no longer exit. This is a safe and conservative post-dominator tree for our algorithm.

Is this perfect? For certain, no.

In this example, for *this* algorithm, you can connect the infinite loop anywhere you like as long as you do not generate the post-dominator tree:
3
|
1
|
2

or
3
|
2

or
3
|  \
2   1
If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing. Assume you change the previous program slightly:

define void @foo(float* %a) {
   ....
}

define void @foo(i1 %cond, float* %a) {
block1:
   store float 5.0, float* %a
   br i1 %cond, label %block2, label %block3

block2:
  call foo(float* %a)
  br i1 true, label %block2, label %block1   <---- Only this line has changed

block3:
  store float 6.0, float* %a
}

You get a CFG very similar to the one I posted earlier:

In your case the PDT is the first of the three you just mentioned:

3
|
1
|
2

This is the post-dominator tree which we compute today and which won't be changed by this proposal (as no reverse-unreachable blocks exist). Following your description, the above post-dominator tree has the property that block3 post-dominates block1. What prevents the algorithm above from deleting the store in block1 in the second, slightly modified case? AFAIU the PDTs of both examples are identical and your algorithm does not look at the CFG edges. Hence I would expect it to incorrectly delete the original store in the slightly modified example as well. Am I missing something obvious?

I will look into this again tomorrow to understand better where I am wrong.

One other slightly related question. Do you assume that A post-dominates B means that a program that reaches B eventually must reach A?

Hi Jakub,

thanks for the new documentation. Just a couple of comments, most minor. This is a clear improvement and will help the discussion (and future people developing the code). Now just trying to understand where I am wrong in Daniel's example.

Best,
Tobias

lib/Transforms/Scalar/ADCE.cpp
265 ↗	(On Diff #108151)	Should this be: "post-dom root child is a return"? It seems the old version slipped in again.
unittests/IR/DominatorTreeTest.cpp
410 ↗	(On Diff #110305)	nice
451 ↗	(On Diff #110305)	because
455 ↗	(On Diff #110305)	Thank you for the nice documentation. I still don't like that we lose the dominance relation between B and D, but at least this behavioral change is very clear.

Fix nits and add infinite loop testcases (based on Danny's examples).

kuhar marked an inline comment as done. Aug 8 2017, 5:59 PM

Add Dani's email reply to Phab:

If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing.

This is what i get for trying to reduce an algorithm quickly.
I'm sure i incorrectly removed some postdominance frontier checks or something.

It's probably easiest to take one of the working PDSE patches and look at that.

Assume you change the previous program slightly:

define void @foo(float* %a) {
....
}

define void @foo(i1 %cond, float* %a) {
block1:
store float 5.0, float* %a
br i1 %cond, label %block2, label %block3

block2:
call foo(float* %a)
br i1 true, label %block2, label %block1 <---- Only this line has changed

block3:
store float 6.0, float* %a
}

You get a CFG very similar to the one I posted earlier:

F3887544: Post Dominance.png https://reviews.llvm.org/F3887544
In your case the PDT is the first of the three you just mentioned:

3
|
1
|
2

This is the post-dominator tree which we compute today and which won't be changed by this proposal (as no reverse-unreachable blocks exist).

This program should be illegal, and as far as i can tell, it is :)

-> % bin/opt -view-cfg foo2.ll
Entry block to function must not have predecessors!
label %block1
bin/opt: foo2.ll: error: input module is broken!

Let's assume that's trivial to fix though, because it is.

One other slightly related question. Do you assume that A post-dominates B means that a program that reaches B eventually must reach A?

You can't say anything block wise, only edge wise.
That is, C is dependent on edge A->B if C postdominates B and C does not strictly postdominate A.
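The edge-wise definition can be written down as a predicate. A minimal sketch, assuming the post-dominator tree is given as a hypothetical `ipdom` dict (node to immediate post-dominator, `None` for the root); the diamond-shaped CFG used to exercise it is made up for illustration and is not one of the examples in this thread.

```python
def postdominates(ipdom, a, b):
    """True if a post-dominates b (every node post-dominates itself)."""
    node = b
    while node is not None:
        if node == a:
            return True
        node = ipdom.get(node)
    return False


def depends_on_edge(ipdom, c, edge):
    """C is dependent on edge A->B if C postdominates B and C does not
    strictly postdominate A."""
    a, b = edge
    strictly_postdominates_a = c != a and postdominates(ipdom, c, a)
    return postdominates(ipdom, c, b) and not strictly_postdominates_a


# Hypothetical diamond: A branches to B and C, both fall through to D.
ipdom = {"A": "D", "B": "D", "C": "D", "D": None}

print(depends_on_edge(ipdom, "B", ("A", "B")))  # True
print(depends_on_edge(ipdom, "D", ("A", "B")))  # False
```

B runs only if the A->B edge is taken, so it depends on that edge; D runs whichever way A branches, so it does not.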

Added Dani's email reply II to phab:

So, i definitely removed a postdominancefrontier check.
The PDF of X is equivalent to the set of branch blocks upon which X is control dependent.
Though, as mentioned this only means the block controls whether X executes, you need to label the edges.

FWIW: I noticed that earlier, you mentioned literature does not talk about what we are doing, though there are practical implementations. It's worth pointing out that literature doesn't often talk about it as part of the postdominator construction itself, but as part of the CFG that they build post-dom on top of. They very often assume and require a unique exit node where everything has a path to it.

Here are some example seminal papers that do this, and then build post-dom/post-domfrontiers on top of it:

Cytron's Control dependence paper:
http://polaris.cs.uiuc.edu/~padua/cs426/p451-cytron.pdf page 456.
Note that they require everything have a path to exit:
"We assume that each
node is on a path from Entry and on a path to Exit"

Ferrante's program dependence graph paper
http://dl.acm.org/citation.cfm?id=24041
"Definition 1. A control flow graph is a directed graph G augmented with a
unique entry node START and a unique exit node STOP such that each node in
the graph has at most two successors. ... We further assume that for any node N in G there exists a path
from START to N and a path from N to STOP. "

If you start with a CFG where you require every node go to exit, you don't need to do anything special just when computing postdom, because you just added the previously virtual edges to the actual CFG instead :)
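Turning the virtual edges into real ones can be sketched as follows. This is an illustration under assumptions, not what the patch does: `add_virtual_exit` is a hypothetical helper, the CFG is a plain successor dict, and every reverse-unreachable node gets its own edge to the exit, whereas a real implementation would more likely pick one representative per region.

```python
def add_virtual_exit(succs, exit_name="exit"):
    """Return a copy of the CFG in which every node has a path to a
    unique exit node."""
    nodes = set(succs) | {s for ss in succs.values() for s in ss}
    preds = {n: [] for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)

    # Real exits are the nodes without successors (e.g. blocks ending in ret).
    exits = [n for n in nodes if not succs.get(n)]

    # Reverse reachability: which nodes can reach some real exit?
    reached = set(exits)
    worklist = list(exits)
    while worklist:
        n = worklist.pop()
        for p in preds[n]:
            if p not in reached:
                reached.add(p)
                worklist.append(p)

    result = {n: list(succs.get(n, [])) for n in nodes}
    result[exit_name] = []
    for n in sorted(nodes):
        # Real exits and reverse-unreachable nodes get an edge to the exit.
        if n in exits or n not in reached:
            result[n].append(exit_name)
    return result


# The infinite-loop example: block2 only branches back to itself.
cfg = add_virtual_exit({
    "entry": ["block1"],
    "block1": ["block2", "block3"],
    "block2": ["block2"],
    "block3": [],
})

print(cfg["block2"])  # ['block2', 'exit']
print(cfg["block3"])  # ['exit']
```

Once the edges are part of the graph, an ordinary post-dominator computation needs no special case for the infinite loop.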

On Tue, Aug 8, 2017 at 6:43 PM, Daniel Berlin <dberlin@dberlin.org> wrote:


In any case, i'm sure i screwed up my translation of the algorithm, sorry ;)

In D35851#836313, @grosser wrote:

Add Dani's email reply to Phab:

If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing.

This is what i get for trying to reduce an algorithm quickly.

That's OK. It gives us an example we can work with. At least we are now on the same page. We still need to correct the example and identify the property you are interested in, but I feel we make progress. Thanks a lot again!

I'm sure i incorrectly removed some postdominance frontier checks or something.

I doubt a post-dominance frontier check can be the reason. The post-dominance frontier is based on the PDT, so it should be identical for the two examples.

It's probably easiest to take one of the working PDSE patches and look at that.

AFAIK, you have been working closely with the PDSE people. The patches seem still WIP, so I will have a harder time digging through them to find the part that matters. Any chance you happen to have one of the authors in the office, such that you can ask him/her over lunch?

Assume you change the previous program slightly:

define void @foo(float* %a) {
....
}

define void @foo(i1 %cond, float* %a) {
block1:
store float 5.0, float* %a
br i1 %cond, label %block2, label %block3

block2:
call foo(float* %a)
br i1 true, label %block2, label %block1 <---- Only this line has changed

block3:
store float 6.0, float* %a
}

You get a CFG very similar to the one I posted earlier:

F3887544: Post Dominance.png https://reviews.llvm.org/F3887544
In your case the PDT is the first of the three you just mentioned:

3
|
1
|
2

This is the post-dominator tree which we compute today and which won't be changed by this proposal (as no reverse-unreachable blocks exist).

This program should be illegal, and as far as i can tell, it is :)

-> % bin/opt -view-cfg foo2.ll
Entry block to function must not have predecessors!
label %block1
bin/opt: foo2.ll: error: input module is broken!

Let's assume that's trivial to fix though, because it is.

LLVM does not allow jumps to the entry node. Jakub fixed it correctly by adding an auxiliary block that does nothing but jumping to block1.

One other slightly related question. Do you assume that A post-dominates B means that a program that reaches B eventually must reach A?

You can't say anything block wise, only edge wise.

I am confused. Post-dominance is defined on BBs and paths, so the above should make sense. To me it seems to be where the confusion comes from. To my understanding A post-dominates B does not guarantee that a program that reaches B eventually must reach A. In all your previous examples, you assumed otherwise. For PDSE to be correct, this cannot be assumed there.

That is, C is dependent on edge A->B if C postdominates B and C does not strictly postdominate A.

That seems to be the definition of control dependence. The properties we can take from this depend on what post-dominance guarantees.

In D35851#836314, @grosser wrote:

Added Dani's email reply II to phab:

So, i definitely removed a postdominancefrontier check.
The PDF of X is equivalent to the set of branch blocks upon which X is control dependent.
Though, as mentioned this only means the block controls whether X executes, you need to label the edges.

FWIW: I noticed that earlier, you mentioned literature does not talk about what we are doing, though there are practical implementations. It's worth pointing out that literature doesn't often talk about it as part of the postdominator construction itself, but as part of the CFG that they build post-dom on top of. They very often assume and require a unique exit node where everything has a path to it.

Here are some example seminal papers that do this, and then build post-dom/post-domfrontiers on top of it:

Cytron's Control dependence paper:
http://polaris.cs.uiuc.edu/~padua/cs426/p451-cytron.pdf page 456.
Note that they require everything have a path to exit:
"We assume that each
node is on a path from Entry and on a path to Exit"

Ferrante's program dependence graph paper
http://dl.acm.org/citation.cfm?id=24041
"Definition 1. A control flow graph is a directed graph G augmented with a
unique entry node START and a unique exit node STOP such that each node in
the graph has at most two successors. ... We further assume that for any node N in G there exists a path
from START to N and a path from N to STOP. "

If you start with a CFG where you require every node go to exit, you don't need to do anything special just when computing postdom, because you just added the previously virtual edges to the actual CFG instead :)

Very right, this phrase is very common (and basically the indication that they did not think about incomplete CFGs at all). This is why none of the above papers directly applies. We always first need to fix the CFG, and then can use the papers. How we do this fix is something we need to show is correct and does not break earlier assumptions.

Following Chandler's suggestion, here is an idea of a path forward:

1) Let's nail the PDSE thingy. We have a test case we all understand and a transformation that does something useful. Let's use this to identify the PDT property you need (and others that may not be needed) and write this down in the documentation. We now had two examples which we first believed work and then it became clear that just checking the PDT is not enough. If we all get confused by this, how can a newcomer not need documentation here?

2) Depending on the outcome of 1) we can either already clearly rule out my alternative suggestions or show that they indeed would generally not break the PDSE algorithm.

I hope to converge quickly on 1) and 2). Even if we agree that my alternatives would not break things immediately, we should still judge their usefulness. I believe that by (commonly) not breaking the PDT in the reverse-reachable graph, they have some benefit. However, you still have the earlier code complexity argument on your side. I would love to see to have you at least evaluate these options (considering the outcome of our discussion) as I believe the needed changes are small. However, I clearly admit that in this case Jakub and your thoughts about code complexity are very important, so should have a strong weight in this discussion.

(Editing because I hit submit early)

In D35851#836315, @grosser wrote:

In D35851#836313, @grosser wrote:

Add Dani's email reply to Phab:

If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing.

This is what i get for trying to reduce an algorithm quickly.

That's OK. It gives us an example we can work with. At least we are now on the same page. We still need to correct the example and identify the property you are interested in, but I feel we make progress. Thanks a lot again!

I'm sure i incorrectly removed some postdominance frontier checks or something.

I doubt a post-dominance frontier check can be the reason. The post-dominance frontier is based on the PDT, so it should be identical for the two examples.

This reasoning is not quite correct:
First, remember that with this patch, we generate a different PDT for your example and mine. That's, in fact, part of the point :)

Also remember the post dominance frontier is a combination of CFG and postdom edges. It represents the immediately control dependent branches for the blocks (IE you can build the CDG from the RDF and reversing the mapping).
Otherwise, the DF and PDF would not need to walk the CFG.

But let's go through them:
Control dependence for these examples is definitely different.
Let's call this A:

declare void @foo2(float* %a)
define void @foo(i1 %cond, float* %a) {
entry:
  br label %block1
 block1:
     store float 5.0, float* %a
     br i1 %cond, label %block2, label %block3

  block2:
    call void @foo2(float* %a)
    br label %block2

  block3:
    store float 6.0, float* %a
    ret void
}

and this B:

declare void @foo2(float* %a)
define void @foo(i1 %cond, float* %a) {
entry:
  br label %block1
 block1:
     store float 5.0, float* %a
     br i1 %cond, label %block2, label %block3

  block2:
    call void @foo2(float* %a)
    br i1 true, label %block2, label %block1

  block3:
    store float 6.0, float* %a
    ret void
}

In A, we get:

Printing analysis 'Post-Dominator Tree Construction' for function 'foo':
=============================--------------------------------
Inorder PostDominator Tree: DFSNumbers invalid: 0 slow queries.
  [1]  <<exit node>> {4294967295,4294967295} [0]
    [2] %block3 {4294967295,4294967295} [1]
    [2] %block1 {4294967295,4294967295} [1]
      [3] %entry {4294967295,4294967295} [2]
    [2] %block2 {4294967295,4294967295} [1]

In B, we get:

Printing analysis 'Post-Dominator Tree Construction' for function 'foo':
=============================--------------------------------
Inorder PostDominator Tree: DFSNumbers invalid: 0 slow queries.
  [1]  <<exit node>> {4294967295,4294967295} [0]
    [2] %block3 {4294967295,4294967295} [1]
      [3] %block1 {4294967295,4294967295} [2]
        [4] %entry {4294967295,4294967295} [3]
        [4] %block2 {4294967295,4294967295} [3]

From Cytron's paper:

A CFG node Y is control dependent on a CFG node X if both of the
following hold:
(1) There is a nonnull path p: X →+ Y such that Y postdominates every node
after X on p.
(2) The node Y does not strictly postdominate the node X.

LEMMA 11. Let X and Y be CFG nodes. Then Y postdominates a successor
of X if and only if (iff) there is a nonnull path p: X →+ Y such that Y
postdominates every node after X on p.
COROLLARY 1. Let X and Y be nodes in CFG. Then Y is control dependent
on X in CFG iff X ∈ DF(Y) in RCFG.

So, it should be obvious that given these PDT's, the PDF is going to be different.
In one example, block 3 postdominates all the successors of block 1, in the other, it does not postdominate any, so they can't have the same PDF. Which they shouldn't because they don't have the same control dependence graph no matter how you build it :)

IDFCalculator agrees:

For A:

IDF for label %entry is: {}
IDF for label %block1 is: {}
IDF for label %block2 is: {label %block2 , label %block1 , }
IDF for label %block3 is: {label %block1 , }

For B:

IDF for label %entry is: {}
IDF for label %block1 is: {label %block1 , }
IDF for label %block2 is: {label %block2 , label %block1 , }
IDF for label %block3 is: {}

However, despite these deficiencies, you should also be able to clearly see that even if the PDTs *were* the same, the definition in the Cytron paper means that changing the successor edges of a block can cause the PDF to change, even if the PDT is identical.
This is because control dependence can change, and thus, the PDF must change ;)
If this wasn't true, control dependence would *only be based on the PDT*, and *not the CFG*. But we've already established that is not true. A postdominates B alone does not imply that A executes whenever B does.

Your example, in fact, proves this :)
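The frontiers listed above can be reproduced from the two printed trees plus the CFG edges. A sketch under assumptions: `post_dominance_frontier` is a hypothetical helper, the `ipdom` maps are transcribed by hand from the PDT dumps for A and B, and the result is the plain (non-iterated) frontier, which for these two small graphs works out to the same sets as the IDF output.

```python
def post_dominance_frontier(ipdom, cfg_edges):
    """For every CFG edge A->B, walk up the post-dominator tree from B
    until reaching ipdom(A); each node passed on the way is control
    dependent on A, so A belongs to its frontier."""
    frontier = {n: set() for n in ipdom}
    for a, b in cfg_edges:
        runner = b
        while runner is not None and runner != ipdom[a]:
            frontier[runner].add(a)
            runner = ipdom[runner]
    return frontier


# Example A: block2 is an infinite loop hanging off the virtual exit.
ipdom_a = {"exit": None, "block3": "exit", "block1": "exit",
           "entry": "block1", "block2": "exit"}
edges_a = [("entry", "block1"), ("block1", "block2"),
           ("block1", "block3"), ("block2", "block2")]

# Example B: block2 can branch back to block1.
ipdom_b = {"exit": None, "block3": "exit", "block1": "block3",
           "entry": "block1", "block2": "block1"}
edges_b = edges_a + [("block2", "block1")]

pdf_a = post_dominance_frontier(ipdom_a, edges_a)
pdf_b = post_dominance_frontier(ipdom_b, edges_b)

print(sorted(pdf_a["block3"]), sorted(pdf_b["block3"]))  # ['block1'] []
print(sorted(pdf_a["block1"]), sorted(pdf_b["block1"]))  # [] ['block1']
```

The helper reads both the tree and the edge list, which is the point being made: the frontier is not a function of the PDT alone.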

In D35851#836315, @grosser wrote:

In D35851#836313, @grosser wrote:

Add Dani's email reply to Phab:

If you generate that, you will eliminate the store a, 5 again, and you should not.

Interesting, but slightly confusing.

This is what i get for trying to reduce an algorithm quickly.

That's OK. It gives us an example we can work with. At least we are now on the same page. We still need to correct the example and identify the property you are interested in, but I feel we make progress. Thanks a lot again!

I'm sure i incorrectly removed some postdominance frontier checks or something.

I doubt a post-dominance frontier check can be the reason. The post-dominance frontier is based on the PDT, so it should be identical for the two examples.

It's probably easiest to take one of the working PDSE patches and look at that.

AFAIK, you have been working closely with the PDSE people. The patches seem still WIP, so I will have a harder time digging through them to find the part that matters. Any chance you happen to have one of the authors in the office, such that you can ask him/her over lunch?

Sadly, no. They don't work for me or Google :)
The only other reference code for PDSE that i'm aware of is Open64, which is "hard to follow" to say the least.

Assume you change the previous program slightly:

define void @foo(float* %a) {
....
}

define void @foo(i1 %cond, float* %a) {
block1:
store float 5.0, float* %a
br i1 %cond, label %block2, label %block3

block2:
call foo(float* %a)
br i1 true, label %block2, label %block1 <---- Only this line has changed

block3:
store float 6.0, float* %a
}

You get a CFG very similar to the one I posted earlier:

F3887544: Post Dominance.png https://reviews.llvm.org/F3887544
In your case the PDT is the first of the three you just mentioned:

3
|
1
|
2

This is the post-dominator tree which we compute today and which won't be changed by this proposal (as no reverse-unreachable blocks exist).

This program should be illegal, and as far as i can tell, it is :)

-> % bin/opt -view-cfg foo2.ll
Entry block to function must not have predecessors!
label %block1
bin/opt: foo2.ll: error: input module is broken!

Let's assume that's trivial to fix though, because it is.

LLVM does not allow jumps to the entry node. Jakub fixed it correctly by adding an auxiliary block that does nothing but jumping to block1.

One other slightly related question. Do you assume that A post-dominates B means that a program that reaches B eventually must reach A?

You can't say anything block wise, only edge wise.

I am confused. Post-dominance is defined on BBs and paths, so the above should make sense.

Sorry, let me be clear.
You can define it block wise, but it requires more than the PDT alone, as you must look at successors of the block.

To me it seems to be where the confusion comes from. To my understanding A post-dominates B does not guarantee that a program that reaches B eventually must reach A. In all your previous examples, you assumed otherwise.

I have not assumed otherwise :)

For PDSE to be correct, this cannot be assumed there.

On this we agree.

Following Chandler's suggestion, here an idea of a path forward:

Let's nail the PDSE thingy. We have a test case we all understand and a transformation that does something useful. Let's use this to identify the PDT property you need (and others that may not be needed) and write this down in the documentation. We now had two examples which we first believed work and then it became clear that just checking the PDT is not enough.

If we all get confused by this, how can a newcomer not need documentation here.

I believe at this point, what you want is documented?
If not, can you point out what is not?

Depending on the outcome of 1) we can either already clearly rule out my alternative suggestion or show that they indeed would generally not break the PDSE algorithm.

This is the part i disagree with.
Assume for a second your suggestion is an improvement, and in fact even an amazing one!
We have already shown what we are doing here is conservatively correct.

If your suggestion is an improvement, and we can later prove it, awesome. Let's do that.

But given that the majority of this patch is unrelated to your suggestion, and this patch can be shown to work, why isn't the right answer:
Commit this patch, if you can improve it in a followup, improve it!

In fact, in pretty much every other review i've seen, we've explicitly asked people to separate followup improvements.

I think we should do the same here.
I get the distinct impression, given how hard you are pushing not to follow this path, that you are worried that if we commit this patch, that will not happen.
But i really don't understand why.

I think this patch {c,sh}ould be considered for commit as is.
The reason is twofold:

The current postdom behaviour has some known brokenness that caused several bugs. PDSE is just an example, but try to fuzz LLVM and you'll find all this sort of craziness. I reviewed the original change in January to change this behaviour, which was reverted because you disagreed. I'm afraid that until somebody sits down and implements what you propose (probably you :) the discussion can go on and on indefinitely.
I'd say this patch is an improvement on the current state, so I don't think we should hold on forever.

If you start with a CFG where you require every node go to exit, you don't need to do anything special just when computing postdom, because you just added the previously virtual edges to the actual CFG instead :)

Very right, this phrase is very common (and basically the indication that they did not think about incomplete CFGs at all).
This is why none of the above papers directly applies.

Just to note: the authors of both of these papers definitely thought of infinite loops that do not exit the program, and I don't believe they considered it an incomplete CFG.
Because what i was told (by a coauthor of at least one) was that what they wrote in the paper was their way of implying they just connected the non-exiting nodes to exit by way of fake edges, just like we are suggesting.
You are of course, welcome to ask as well, and see if they had any other tricks up their sleeve :)

BTW, if you wish to convince yourself what i said about two identical PDTs not generating the same PDF, here is an example:

define void @foo(i1 %cond, float* %a) {
entry:
    br label %block1
 block1:
    br label %block3
 block3:
    br i1 true, label %block4, label %block4
 block4:
    ret void
}

and

define void @foo(i1 %cond, float* %a) {
entry:
  br label %block1
 block1:
  br label %block3
 block3:
  br i1 true, label %block3, label %block4
 block4:
  ret void
}

Inorder PostDominator Tree: DFSNumbers invalid: 0 slow queries.
  [1]  <<exit node>> {4294967295,4294967295} [0]
    [2] %block4 {4294967295,4294967295} [1]
      [3] %block3 {4294967295,4294967295} [2]
        [4] %block1 {4294967295,4294967295} [3]
          [5] %entry {4294967295,4294967295} [4]
Roots: %block4
IDF for label %entry is: {}
IDF for label %block1 is: {}
IDF for label %block3 is: {}
IDF for label %block4 is: {}

Inorder PostDominator Tree: DFSNumbers invalid: 0 slow queries.
  [1]  <<exit node>> {4294967295,4294967295} [0]
    [2] %block4 {4294967295,4294967295} [1]
      [3] %block3 {4294967295,4294967295} [2]
        [4] %block1 {4294967295,4294967295} [3]
          [5] %entry {4294967295,4294967295} [4]
Roots: %block4
IDF for label %entry is: {}
IDF for label %block1 is: {}
IDF for label %block3 is: {label %block3 , }
IDF for label %block4 is: {}

As you can see, i've changed the PostDominanceFrontier without changing the PDT by varying strict postdomination of successors.
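That the two functions really share one post-dominator tree can be checked with a textbook set-based computation. A sketch under assumptions: `immediate_postdominators` is a hypothetical helper, the CFGs are transcribed by hand as successor dicts with an explicit `exit` node, and the iteration only works because every node can reach that exit.

```python
def immediate_postdominators(succs, exit_node):
    """Iterative set-based post-dominators; assumes every node can
    reach exit_node. Returns a node -> immediate post-dominator dict."""
    nodes = list(succs)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            # A node is post-dominated by itself and by whatever
            # post-dominates all of its successors.
            new = set(nodes)
            for s in succs[n]:
                new &= pdom[s]
            new.add(n)
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    ipdom = {}
    for n in nodes:
        for d in pdom[n] - {n}:
            # The closest strict post-dominator has exactly one fewer
            # post-dominator than n.
            if len(pdom[d]) == len(pdom[n]) - 1:
                ipdom[n] = d
    return ipdom


first = {"entry": ["block1"], "block1": ["block3"],
         "block3": ["block4", "block4"], "block4": ["exit"], "exit": []}
second = {"entry": ["block1"], "block1": ["block3"],
          "block3": ["block3", "block4"], "block4": ["exit"], "exit": []}

tree_first = immediate_postdominators(first, "exit")
tree_second = immediate_postdominators(second, "exit")

print(tree_first == tree_second)  # True
print(tree_first["block3"])       # block4
```

The only difference between the inputs is the edge block3 -> block3, which leaves the tree untouched but is what puts block3 into its own frontier in the second dump.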

Also, someone pointed out to me today: if you want another literature reference for virtual edges for infinite loops, Bob Morgan's Building an Optimizing Compiler covers it, see pages 82-83 and figure 3.10.

In D35851#836771, @davide wrote:

I think this patch {c,sh}ould be considered for commit as is.

OK, another person is voting against me. I certainly won't single-handedly block this. As I said, I would like to have a brief opinion from Hal, as we discussed extensively. I will ask him. If he agrees with you guys, I am out.

The reason is twofold:

The current postdom behaviour has some known brokenness that caused several bugs. PDSE is just an example, but try to fuzz LLVM and you'll find all this sort of craziness. I reviewed the original change in January to change this behaviour, which was reverted because you disagreed. I'm afraid that until somebody sits down and implements what you propose (probably you :) the discussion can go on and on indefinitely.

I implemented an alternative (and proposed a second which I implemented earlier). I am still waiting for a test-case to break them.

I'd say this patch is an improvement on the current state, so I don't think we should hold on forever.

So you are explicitly not concerned about the fact that these virtual edges weaken the post-dominance relation? If a pass inserts certain checks at post-dominance points and later wants to verify this invariant still holds, the invariant may be broken by deleting dead edges. I think this is a real concern.

This patch regresses code that was perfectly good over years.

One path to go quickly forward would be to commit my alternative (which does not regress much code), continue to develop code based on this alternative until we find test cases that actually break. It seems Daniel has some, I need to go over his mail now.

@hfinkel : Hi Hal, we discussed this topic for a while in Paris. I would very much appreciate your opinion on this patch.

In D35851#838769, @grosser wrote:

@hfinkel : Hi Hal, we discussed this topic for a while in Paris. I would very much appreciate your opinion on this patch.

I'll look at this tomorrow.

In D35851#838768, @grosser wrote:

In D35851#836771, @davide wrote:

I think this patch {c,sh}ould be considered for commit as is.

OK, another person is voting against me. I certainly won't single-handedly block this. As I said, I would like to have a brief opinion from Hal, as we discussed extensively. I will ask him. If he agrees with you guys, I am out.

The reason is twofold:

The current postdom behaviour has some known brokenness that caused several bugs. PDSE is just an example, but try to fuzz LLVM and you'll find all this sort of craziness. I reviewed the original change in January to change this behaviour, which was reverted because you disagreed. I'm afraid that until somebody sits down and implements what you propose (probably you :) the discussion can go on and on indefinitely.

I implemented an alternative (and proposed a second which I implemented earlier). I am still waiting for a test-case to break them.

There is nothing in llvm today that will break. It's been designed for the things we do today, which already completely ignore large classes of types of blocks. As Chandler said, i'm not sure that's a good use case to base the future on.
It's about what we want to implement in the future, and looking at what others have done to make that happen.
Your other approach is based on what you consider to be reasonable invariants: You are okay with, given one of the infinite loop examples

    A
 /    \
B     C

Where C is an infinite loop, saying that all paths to exit go through B.
That's definitely going to break optimizations we want to implement. You want me to go through, in excruciating detail, how. You want implementations of those optimizations to read.
Rather than come up with endless papers and examples here, at some point i think my suggestion is: Compile a GCC or Open64 where you've disabled connecting the infinite loops to exit before postdom, and see what breaks, and understand that. Because it does break!
In gcc, you'd have to hack up calc_dfs_tree a bit, since it's done inside post-dominators itself, besides some optimizations.
In Open64, you should be able to edit CFG::Process_multi_entryexit to not connect the infinite loops to exit (it does it all the time before computing postdom)
That's likely to lead to cfgs and understanding a lot easier than me trying to explain PDSE to you from first principles :)

I'd say this patch is an improvement on the current state, so I don't think we should hold on forever.

So you are explicitly not concerned about the fact that these virtual edges weaken the post-dominance relation?

See, your view is they weaken it, while others have a perspective that it is making it conservatively correct.

If a pass inserts certain checks at post-dominance points and later wants to verify this invariant still holds, the invariant may be broken by deleting dead edges. I think this is a real concern.

They are not forward-dead, actually. That's part of the whole problem. As Jakub pointed out long ago, the fact that they are reverse unreachable does not mean they cannot be executed.

This patch regresses code that was perfectly good over years.

This code has been known to be broken for many years, and we have patches waiting for review that implement optimizations that even try to disable themselves in the presence of these postdom bugs. There is external code that does the same. I know you disagree, so i'm going to leave this alone past that. I don't think we will agree here.

One path to go quickly forward would be to commit my alternative (which does not regress much code), continue to develop code based on this alternative until we find test cases that actually break. It seems Daniel has some, I need to go over his mail now.

Please don't.
Look.
One of my big problems with all of this is that you are suggesting we are treading on fresh snow. From your perspective, no literature deals with this explicitly, and we are on our own, and can do what we want, so we should try to do something good. That's awesome. We should strive to do the best we can.
Except: we aren't treading on fresh snow.

They had infinite loops in the 1980's and 1990's too, and, for example, the authors of papers i cited will happily tell you their compilers either

Inserted fake edges in the CFG
Pretended those fake edges existed during postdominators.

They didn't just do nothing.

Your own research showed that other compilers take this approach, and *no compiler or literature takes your approach* (and FWIW, at least one of the ones you say does nothing actually carefully ensures the CFG represents infinite loops in a certain way).
Even "building a compiler" books say that you should insert fake edges from infinite loops to exit or you will break global optimizations.
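To illustrate the fake-edge idea, here is a minimal C++ sketch (not LLVM code) of one way to pick which blocks get an edge to a virtual exit. It loosely follows the strategy of this patch: blocks without successors are connected first, and for every block that still cannot reach an exit, a forward DFS is run and the last block it discovers is connected. The `Graph` alias and all function names are hypothetical, and unlike the patch this sketch makes no attempt to prune redundant connections.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<unsigned>>; // Succs[N] = successors of N

// Marks every node that can reach Start, by following predecessor edges.
void markReverseReachable(const Graph &Preds, unsigned Start,
                          std::vector<bool> &Seen) {
  std::vector<unsigned> WorkList = {Start};
  while (!WorkList.empty()) {
    const unsigned N = WorkList.back();
    WorkList.pop_back();
    if (Seen[N]) continue;
    Seen[N] = true;
    for (unsigned P : Preds[N]) WorkList.push_back(P);
  }
}

// Returns the last node discovered by a forward DFS starting at Start.
unsigned lastDiscovered(const Graph &Succs, unsigned Start) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<unsigned> WorkList = {Start};
  unsigned Last = Start;
  while (!WorkList.empty()) {
    const unsigned N = WorkList.back();
    WorkList.pop_back();
    if (Visited[N]) continue;
    Visited[N] = true;
    Last = N;
    for (unsigned S : Succs[N]) WorkList.push_back(S);
  }
  return Last;
}

// Returns the nodes that would get a (fake) edge to the virtual exit.
std::vector<unsigned> pickExitConnections(const Graph &Succs) {
  Graph Preds(Succs.size());
  for (unsigned N = 0; N < Succs.size(); ++N)
    for (unsigned S : Succs[N]) {
      assert(S < Succs.size() && "successor index out of range");
      Preds[S].push_back(N);
    }

  std::vector<bool> Seen(Succs.size(), false);
  std::vector<unsigned> Roots;
  // Real exits: nodes without successors.
  for (unsigned N = 0; N < Succs.size(); ++N)
    if (Succs[N].empty()) {
      Roots.push_back(N);
      markReverseReachable(Preds, N, Seen);
    }
  // Anything still unseen cannot reach an exit (e.g. it feeds an infinite loop).
  for (unsigned N = 0; N < Succs.size(); ++N)
    if (!Seen[N]) {
      const unsigned Root = lastDiscovered(Succs, N);
      Roots.push_back(Root);
      markReverseReachable(Preds, Root, Seen);
    }
  return Roots;
}
```

For a CFG `0 -> 1`, `1 -> {2, 3}`, `2 -> 2`, with `3` returning, the sketch connects block 3 (a real exit) and block 2 (the self-loop) to the virtual exit.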

At some point, i think it's unreasonable to say "all of this is inapplicable".
I think at some point, the onus is on you to prove that your approach won't break these things, even if we don't implement them yet, since the whole point of doing this is for us to be able to implement them.
There are patches outstanding for some of these optimizations if you want to try them (but i'd recommend using gcc or open64, see below)
In part we simply disagree about what regions should look like, and it's unlikely we'll ever agree.
So i'm going to stick with "does it break optimizations".
We are suggesting doing a thing we *know* works with *all* of the algorithms, today's, and the future ones, because it's implemented.
(Again, I am cognizant of the fact you consider the test case changes to be regressions. I don't believe they are, and they are the same as what other compilers generate.)

If you want to try your way in those things, and see what will break, that's awesome. I'm pretty sure the answer is "a bunch". I suggest those over applying existing opt patches out for review because those compilers have worked this way for years, so "most" bugs have been worked out :) In particular, Open64 has probably the most complete implementation of things using post-dominators.

I think that we should go ahead with this approach.

I agree with Tobias that this has some unfortunate properties. For example, if we have:

entry:
lifetime_begin(a);
br cond, blocka, blockb

blocka:
br blocka

blockb:
lifetime_end(a);
br blockc

blockc:
ret

Can we use post-dominance to verify that the lifetime_end(a) is correctly placed and sufficient (i.e. does the set of lifetime_end(a) post-dominate the lifetime_start(a))? If we connect infinite loops to the exit node, then the answer is no (because, as in this example, it isn't true that the set of lifetime_end(a) post-dominates lifetime_start(a), even though the set is correctly placed and sufficient). What's frustrating about this is that the definition of post-dominance seems perfect for this task: we want to ensure that all paths from the lifetime_start(a) to the exit pass through one of the lifetime_end(a)s.
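The effect can be reproduced on the four-block example with a small C++ sketch (not LLVM code). It computes post-dominator sets with the textbook iterative data-flow formulation, once with the infinite loop left alone and once with a fake edge from blocka to the exit. The function names, the bitmask representation and the node numbering are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): iterative data-flow computation of
// post-dominator sets. Succs[N] lists the successors of node N. Bit M of the
// result for node N is set iff M post-dominates N.
std::vector<std::uint32_t>
computePostDomSets(const std::vector<std::vector<unsigned>> &Succs,
                   unsigned ExitNode) {
  const unsigned N = static_cast<unsigned>(Succs.size());
  assert(N > 0 && N < 32 && ExitNode < N && "sketch handles up to 31 nodes");
  const std::uint32_t All = (std::uint32_t(1) << N) - 1;

  // Start from "everything post-dominates everything" and shrink the sets.
  std::vector<std::uint32_t> PDom(N, All);
  PDom[ExitNode] = std::uint32_t(1) << ExitNode;

  for (bool Changed = true; Changed;) {
    Changed = false;
    for (unsigned B = 0; B < N; ++B) {
      if (B == ExitNode) continue;
      std::uint32_t Meet = All;
      for (unsigned S : Succs[B]) Meet &= PDom[S];
      const std::uint32_t New = Meet | (std::uint32_t(1) << B);
      if (New != PDom[B]) {
        PDom[B] = New;
        Changed = true;
      }
    }
  }
  return PDom;
}

// Node numbering for the example; VirtualExit stands for the single exit.
constexpr unsigned Entry = 0, BlockA = 1, BlockB = 2, BlockC = 3,
                   VirtualExit = 4, NumNodes = 5;

// Does blockb (which holds lifetime_end) post-dominate entry?
bool lifetimeEndPostDominatesEntry(bool ConnectInfiniteLoopToExit) {
  std::vector<std::vector<unsigned>> Succs(NumNodes);
  Succs[Entry] = {BlockA, BlockB};
  Succs[BlockA] = {BlockA};      // infinite loop
  Succs[BlockB] = {BlockC};
  Succs[BlockC] = {VirtualExit}; // ret
  if (ConnectInfiniteLoopToExit) Succs[BlockA].push_back(VirtualExit);
  const std::vector<std::uint32_t> PDom =
      computePostDomSets(Succs, VirtualExit);
  return ((PDom[Entry] >> BlockB) & 1u) != 0;
}
```

Without the fake edge, blocka has no path to the exit, so it constrains nothing and blockb post-dominates entry. With the fake edge, the path entry, blocka, exit bypasses blockb, and the answer becomes no.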

There is also a small wrinkle in the above argument in that it also ignores functions that may throw (which don't induce any extra edges in the PDT to the exit). In the particular case of lifetime management of stack locations, this is okay (because the unwind edge implicitly includes a lifetime_end of everything on the stack). I'm not sure how well this generalizes, however, to other use cases. There's also no good way to change this (because functions that might unwind aren't terminators -- which I think that we should change, but that's perhaps another story).

Nevertheless, given that we have infinite loops, it is unclear that we'd ever be able to use post-dominance to verify the lifetime intrinsics. To do so would imply that, among other things, we could use post-dominance within infinite loops as well. Given that, within the loop there are no paths to the exit node, or if you add one you must do it arbitrarily, it is not obvious to me that you can construct a definition that makes sense there.

These issues are discussed in the literature. For example, see:

A new foundation for control-dependence and slicing for modern program structures
VP Ranganath, et al. - ESOP, 2005
http://link.springer.com/content/pdf/10.1007/b107380.pdf#page=89
http://www.dtic.mil/get-tr-doc/pdf?AD=ADA443736

There seems to be an active effort, to the present day, to understand how to analyze non-terminating programs, especially in the program-slicing literature - search for "non-termination sensitive control dependence" (on Google Scholar or similar) and you'll find a significant amount of discussion. However, I don't see a clear answer in the literature for what to do here, and quickly scanning a few of the papers, it seems that the relevant algorithms are all less efficient than the ones based on traditional post-dominance (i.e. which connect everything to a virtual exit node). When Tobias and I discussed this in June, I don't recall settling on an obviously-better scheme. We discussed various ways for adding virtual edges to other places in the graph, instead of to the exit node, but no heuristic seemed ideal. I don't recall discussing removing edges as has been discussed here (and whereas with adding edges it still seems possible to generate a conservatively-correct graph, removing edges seems prone to doing otherwise - and that makes me a bit nervous). As a result, I'm in favor of moving forward with the current patch. I'd like to think that we can do better, but I'm not convinced that we know how yet.

Another thing that Tobias and I discussed in June was a desire to handle unreachable and infinite loops in a consistent way. I agree with this, and while unfortunate, this does have that consistency property.

In D35851#839798, @hfinkel wrote:

I think that we should go ahead with this approach.

Thanks for reviewing this.

I agree with Tobias that this has some unfortunate properties.

I agree too, fwiw!

Nevertheless, given that we have infinite loops, it is unclear that we'd ever be able to use post-dominance to verify the lifetime intrinsics. To do so would imply that, among other things, we could use post-dominance within infinite loops as well. Given that, within the loop there are no paths to the exit node, or if you add one you must do it arbitrarily, it is not obvious to me that you can construct a definition that makes sense there.

FWIW: I believe your first sentence is correct.
If you apply Tobias's patch, the following should work but will be broken:

declare void @foo2(float* %a)
define void @foo(i1 %cond, float* %a) {
entry:
  ; lifetime.start(a)
  br label %block1
block1:
   br i1 undef, label %block2, label %block5
block2:
   br i1 undef, label %block2, label %block3
block3:
   ;lifetime_end(a);
   br label %block4
block4:
   br label %block4
block5:
   ;lifetime_end(a);
   ret void
}

Equivalent CFG:

A
| \
B  C
   |
   D
   |
   E

where C and E are also self-loops.

I believe this is legal.
Tobias's patch gives this:

[1]  <<exit node>> {4294967295,4294967295} [0]
   [2] %block5 {4294967295,4294967295} [1]
     [3] %block1 {4294967295,4294967295} [2]
       [4] %entry {4294967295,4294967295} [3]
   [2] %block4 {4294967295,4294967295} [1]
     [3] %block3 {4294967295,4294967295} [2]
       [4] %block2 {4294967295,4294967295} [3]

This implies the lifetime end is not legally placed in block 3 (I deliberately kept it out of the loops to avoid discussion of that).

It seems perfectly legal to me.
It's reachable from entry, and i can't see a reason it's not legal to say "memory ends here".

You can make block 4 whatever you want (a call that doesn't return, etc). It still seems legal to place a memory end in block 3.
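To make the terminology concrete: in the IR above, block2, block3 and block4 are all reachable from entry, yet none of them can reach the ret in block5, which is what "reverse-unreachable" means here. The C++ sketch below (not LLVM code) checks this with two plain graph searches. The `Graph` alias, the function names and the block numbering are assumptions made for the illustration.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<unsigned>>; // G[N] = successors of N

// Hypothetical helper: marks every node reachable from any node in Starts.
std::vector<bool> reachableFrom(const Graph &G,
                                const std::vector<unsigned> &Starts) {
  std::vector<bool> Seen(G.size(), false);
  std::vector<unsigned> WorkList(Starts);
  while (!WorkList.empty()) {
    const unsigned N = WorkList.back();
    WorkList.pop_back();
    assert(N < G.size() && "node index out of range");
    if (Seen[N]) continue;
    Seen[N] = true;
    for (unsigned S : G[N])
      if (!Seen[S]) WorkList.push_back(S);
  }
  return Seen;
}

// Returns the blocks that are reachable from Entry but cannot reach any
// exit, where an exit is taken to be a block without successors.
std::vector<unsigned> reverseUnreachableBlocks(const Graph &G, unsigned Entry) {
  Graph Rev(G.size());
  std::vector<unsigned> Exits;
  for (unsigned N = 0; N < G.size(); ++N) {
    if (G[N].empty()) Exits.push_back(N);
    for (unsigned S : G[N]) Rev[S].push_back(N);
  }
  const std::vector<bool> Fwd = reachableFrom(G, {Entry});
  const std::vector<bool> Bwd = reachableFrom(Rev, Exits);
  std::vector<unsigned> Result;
  for (unsigned N = 0; N < G.size(); ++N)
    if (Fwd[N] && !Bwd[N]) Result.push_back(N);
  return Result;
}

// The CFG of @foo above: 0 = entry, 1 = block1, ..., 5 = block5.
Graph exampleCFG() { return {{1}, {2, 5}, {2, 3}, {4}, {4}, {}}; }
```

Calling `reverseUnreachableBlocks(exampleCFG(), 0)` yields blocks 2, 3 and 4, so block3 can execute even though no path leads from it to a return.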

These issues are discussed in the literature. For example, see:

A new foundation for control-dependence and slicing for modern program structures
VP Ranganath, et al. - ESOP, 2005
http://link.springer.com/content/pdf/10.1007/b107380.pdf#page=89
http://www.dtic.mil/get-tr-doc/pdf?AD=ADA443736

I think we can also get away with separating post-dominance and control dependence if we really want.
There is nothing that stops us from saying "Use a CDG for control dependence", and making that work however we want (or in multiple ways).

I am at a wedding today, so I don't have a lot of time to discuss. As I said, I won't be the only one to block this. I would appreciate us condensing the results of this discussion into documentation. I really believe it is important to understand and document this well. Now, let's do this post-commit. Then we can discuss the different implications in a relaxed way. Thanks for your time guys!

In D35851#836771, @davide wrote:

I think this patch {c,sh}ould be considered for commit as is.
The reason is twofold:

The current postdom behaviour has some known brokenness that caused several bugs. PDSE is just an example, but try to fuzz LLVM and you'll find all this sort of craziness. I've reviewed the original change in January to change this behaviour, which was reverted because you disagreed. I'm afraid that until somebody sits down and implements what you propose (probably you :) the discussion can go on and on indefinitely.

Btw, could you provide a reference to any of these bugs or the test case that was committed to LLVM-IR? Have they been related to the treatment of infinite loops? I am trying throughout this discussion to get actual use cases that break, in order to get to a more technical level of discussion. If there are already in-tree bugs that would be fixed by this change, this would make things a lot clearer for me.

In D35851#836771, @davide wrote:

I think this patch {c,sh}ould be considered for commit as is.
The reason is twofold:

The current postdom behaviour has some known brokenness that caused several bugs. PDSE is just an example, but try to fuzz LLVM and you'll find all this sort of craziness. I've reviewed the original change in January to change this behaviour, which was reverted because you disagreed. I'm afraid that until somebody sits down and implements what you propose (probably you :) the discussion can go on and on indefinitely.

A few things, just so they end up here:
The NTSCD folks, unfortunately, seem to prove all of the algorithms for weak order dependence are equivalent to those that require a unique end node. See http://people.cs.ksu.edu/~tamtoft/Papers/Amt+al:WODvsCD/short.pdf . This means, i believe, outside of the NTSCD algorithms, we won't have a lot of luck with infinite loops.

The NTSCD paper mentions the augmentation we perform here in section 3.2 as being standard (see http://people.cs.ksu.edu/~tamtoft/Papers/Ran+Amt+Ban+Dwy+Hat:ESOP-2005/long.pdf).
They sadly point out, also, that "All definitions of control dependences that we are aware of require that CFGs satisfy the unique end node requirement"
(IE there is no magic we can use prior to this paper)

On the nice side, the NTSCD algorithms do not look that hard to implement.
On the not nice side, all the algorithms i can find are N^3 or N^4.

There was one paper author who believed significantly more efficient algorithms may exist (http://www.sciencedirect.com/science/article/pii/S0304397511007377).
He is, unfortunately, dead (not kidding).

I found one real world implementation of NTSCD (that doesn't use a published algorithm) (Joana, an information flow tool. Source on github)

However, now for some mixed news:
I'm positive you can prove any NTSCD algorithm based on all-pairs reachability of the CFG must be at least O(N*E) (IE N^2).
CFGs are a regular language (AFAIK!). The fastest known algorithms for all-pairs reachability on regular languages are (N*E)

Proof:
CFGs are trivially convertible to deterministic finite automata (uniquely label, generate transition table from successors, done)
All DFAs are trivially convertible to CFGs (again, AFAIK)
The language accepted by DFAs are exactly the set of regular languages.
Yannakakis gives a N*E algorithm for all-pairs reachability on regular languages (there is no faster known algorithm)

The good news is that it should be trivial to improve that part of the NTSCD algorithm (which is the N^3 part), *but* until they have algorithms that do not require all-pairs reachability, i do not believe you can make it faster than N^2.
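For reference, the straightforward baseline for all-pairs reachability is one graph search per node, which costs O(N*(N+E)) overall and therefore stays within the N*E bound whenever E is at least N. The C++ sketch below only illustrates that baseline. It is not the Yannakakis algorithm and not taken from any NTSCD implementation, and `allPairsReachability` with its adjacency-list input is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reach[A][B] is true iff there is a non-empty path from A to B.
std::vector<std::vector<bool>>
allPairsReachability(const std::vector<std::vector<unsigned>> &Succs) {
  const std::size_t N = Succs.size();
  std::vector<std::vector<bool>> Reach(N, std::vector<bool>(N, false));
  std::vector<unsigned> WorkList;
  for (unsigned Src = 0; Src < N; ++Src) {
    // One DFS per source: every node is marked at most once per source and
    // every edge is examined at most once per source.
    WorkList.assign(Succs[Src].begin(), Succs[Src].end());
    while (!WorkList.empty()) {
      const unsigned Cur = WorkList.back();
      WorkList.pop_back();
      assert(Cur < N && "successor index out of range");
      if (Reach[Src][Cur]) continue;
      Reach[Src][Cur] = true;
      for (unsigned S : Succs[Cur])
        if (!Reach[Src][S]) WorkList.push_back(S);
    }
  }
  return Reach;
}
```

Because only non-empty paths count, `Reach[A][A]` is true exactly when A lies on a cycle.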

I feel like you can also make this practically faster through various tricks that would not change complexity. IE you should be able to avoid all pairs reachability for the parts of the graph that don't go through loops (IE generate condensation graph with marked condensed regions to get the maximal part of the cfg that does not go through a loop, and handle these separately)
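One possible way to find the parts of the CFG that do not go through a loop is to compute strongly connected components first. The C++ sketch below uses Tarjan's algorithm and then flags a block as cyclic if its component has more than one block or if it branches to itself. Everything in it (`SCCFinder`, `nodesOnCycles`, the adjacency-list input) is a hypothetical illustration, not code from LLVM or from the NTSCD papers, and whether this actually pays off in practice is untested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tarjan's SCC algorithm (recursive). Comp[N] is the component id of node N.
struct SCCFinder {
  const std::vector<std::vector<unsigned>> &Succs;
  std::vector<int> Index, Low, Comp;
  std::vector<unsigned> Stack;
  std::vector<bool> OnStack;
  int NextIndex = 0, NumComps = 0;

  explicit SCCFinder(const std::vector<std::vector<unsigned>> &G)
      : Succs(G), Index(G.size(), -1), Low(G.size(), 0), Comp(G.size(), -1),
        OnStack(G.size(), false) {
    for (unsigned N = 0; N < G.size(); ++N)
      if (Index[N] == -1) visit(N);
  }

  void visit(unsigned N) {
    Index[N] = Low[N] = NextIndex++;
    Stack.push_back(N);
    OnStack[N] = true;
    for (unsigned S : Succs[N]) {
      assert(S < Succs.size() && "successor index out of range");
      if (Index[S] == -1) {
        visit(S);
        Low[N] = std::min(Low[N], Low[S]);
      } else if (OnStack[S]) {
        Low[N] = std::min(Low[N], Index[S]);
      }
    }
    if (Low[N] != Index[N]) return;
    // N is the root of a component: pop its members off the stack.
    unsigned M;
    do {
      M = Stack.back();
      Stack.pop_back();
      OnStack[M] = false;
      Comp[M] = NumComps;
    } while (M != N);
    ++NumComps;
  }
};

// A node lies on a cycle iff its SCC has several nodes or it has a self-loop.
std::vector<bool>
nodesOnCycles(const std::vector<std::vector<unsigned>> &Succs) {
  const SCCFinder F(Succs);
  std::vector<unsigned> CompSize(static_cast<std::size_t>(F.NumComps), 0u);
  for (int C : F.Comp) ++CompSize[static_cast<std::size_t>(C)];
  std::vector<bool> OnCycle(Succs.size(), false);
  for (unsigned N = 0; N < Succs.size(); ++N) {
    const bool SelfLoop =
        std::find(Succs[N].begin(), Succs[N].end(), N) != Succs[N].end();
    OnCycle[N] = SelfLoop || CompSize[static_cast<std::size_t>(F.Comp[N])] > 1;
  }
  return OnCycle;
}
```

Blocks for which `nodesOnCycles` returns false are the candidates that could be handled separately from the expensive reachability step.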

When running tests for all targets, I discovered that 3 codegen tests got affected by this patch.

I think it would be best if someone took a look at the changes -- I'm especially concerned about register spilling in test/CodeGen/ARM/struct-byval-frame-index.ll. I can see that the test was updated a few days ago by D34099.
@uabelho, @kparzysz, could you take a look?

Herald added subscribers: javed.absar, nhaehnle. Aug 14 2017, 3:57 PM

Rebase the patch to the ToT.

uabelho added inline comments. Aug 15 2017, 4:53 AM

test/CodeGen/ARM/struct-byval-frame-index.ll
7–8 ↗	(On Diff #111093)	Remove or update this comment depending on how the checks below end up.
10–12 ↗	(On Diff #111093)	Is there any way to check that the spill/reload are placed at reasonable places instead of just checking that they exist at all? Before the change I made in D34099, the checks looked like
; CHECK: str r{{.}}, [sp, [[SLOT:#[0-9]+]]] @ 4-byte Spill
; CHECK: bl RestoreMVBlock8x8
; CHECK: bl RestoreMVBlock8x8
; CHECK: bl RestoreMVBlock8x8
; CHECK: ldr r{{.}}, [sp, [[SLOT]]] @ 4-byte Reload
which isn't very picky either, but at least it's something. (I don't know the testcase at all, I just convinced myself that my change didn't actually break anything, and then tried to update the testcase to still check that the spill/reload are generated at some reasonable places)

kuhar added inline comments. Aug 15 2017, 9:15 AM

test/CodeGen/ARM/struct-byval-frame-index.ll
10–12 ↗	(On Diff #111093)	After this patch the two slots are different and the reload uses lr instead of sp, so I don't think we can verify it better here.

Update the comment in struct-byval-frame-index.ll.

kuhar marked an inline comment as done. Aug 15 2017, 10:40 AM

Closed by commit rL310940: [Dominators] Include infinite loops in PostDominatorTree (authored by kuhar). Aug 15 2017, 11:17 AM

This revision was automatically updated to reflect the committed changes.

bollu mentioned this in D36934: [PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen. Aug 21 2017, 4:43 AM

Revision Contents

Path	Size
llvm/trunk/include/llvm/Support/GenericDomTree.h	24 lines
llvm/trunk/include/llvm/Support/GenericDomTreeConstruction.h	446 lines
llvm/trunk/lib/Transforms/Scalar/ADCE.cpp	32 lines
llvm/trunk/test/Analysis/PostDominators/ (8 files, names not shown)	30, 34, 34, 18, 8, 8, 52, 10 lines
llvm/trunk/test/Analysis/RegionInfo/ (6 files, names not shown)	4, 11, 14, 14, 2, 1 lines
llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll	16 lines
llvm/trunk/test/CodeGen/ARM/struct-byval-frame-index.ll	17 lines
llvm/trunk/test/CodeGen/Thumb2/v8_IT_5.ll	7 lines
llvm/trunk/test/Transforms/StructurizeCFG/branch-on-argument.ll	9 lines
llvm/trunk/test/Transforms/StructurizeCFG/no-branch-to-entry.ll	7 lines
llvm/trunk/unittests/IR/DominatorTreeTest.cpp	163 lines

Diff 111214

llvm/trunk/include/llvm/Support/GenericDomTree.h

[411 lines not shown]	#endif
bool dominates(const NodeT *A, const NodeT *B) const;		bool dominates(const NodeT *A, const NodeT *B) const;

NodeT *getRoot() const {		NodeT *getRoot() const {
assert(this->Roots.size() == 1 && "Should always have entry node!");		assert(this->Roots.size() == 1 && "Should always have entry node!");
return this->Roots[0];		return this->Roots[0];
}		}

/// findNearestCommonDominator - Find nearest common dominator basic block		/// findNearestCommonDominator - Find nearest common dominator basic block
/// for basic block A and B. If there is no such block then return NULL.		/// for basic block A and B. If there is no such block then return nullptr.
NodeT *findNearestCommonDominator(NodeT *A, NodeT *B) const {		NodeT *findNearestCommonDominator(NodeT *A, NodeT *B) const {
		assert(A && B && "Pointers are not valid");
assert(A->getParent() == B->getParent() &&		assert(A->getParent() == B->getParent() &&
"Two blocks are not in same function");		"Two blocks are not in same function");

// If either A or B is a entry block then it is nearest common dominator		// If either A or B is a entry block then it is nearest common dominator
// (for forward-dominators).		// (for forward-dominators).
if (!this->isPostDominator()) {		if (!isPostDominator()) {
NodeT &Entry = A->getParent()->front();		NodeT &Entry = A->getParent()->front();
if (A == &Entry || B == &Entry)		if (A == &Entry || B == &Entry)
return &Entry;		return &Entry;
}		}

DomTreeNodeBase<NodeT> *NodeA = getNode(A);		DomTreeNodeBase<NodeT> *NodeA = getNode(A);
DomTreeNodeBase<NodeT> *NodeB = getNode(B);		DomTreeNodeBase<NodeT> *NodeB = getNode(B);

[139 lines not shown]	if (IDom) {
find(IDom->Children, Node);		find(IDom->Children, Node);
assert(I != IDom->Children.end() &&		assert(I != IDom->Children.end() &&
"Not in immediate dominator children set!");		"Not in immediate dominator children set!");
// I am no longer your child...		// I am no longer your child...
IDom->Children.erase(I);		IDom->Children.erase(I);
}		}

DomTreeNodes.erase(BB);		DomTreeNodes.erase(BB);

		if (!IsPostDom) return;

		// Remember to update PostDominatorTree roots.
		auto RIt = llvm::find(Roots, BB);
		if (RIt != Roots.end()) {
		std::swap(*RIt, Roots.back());
		Roots.pop_back();
		}
}		}

/// splitBlock - BB is split and now it has one successor. Update dominator		/// splitBlock - BB is split and now it has one successor. Update dominator
/// tree to reflect this change.		/// tree to reflect this change.
void splitBlock(NodeT *NewBB) {		void splitBlock(NodeT *NewBB) {
if (IsPostDominator)		if (IsPostDominator)
Split<Inverse<NodeT *>>(NewBB);		Split<Inverse<NodeT *>>(NewBB);
else		else
Split<NodeT *>(NewBB);		Split<NodeT *>(NewBB);
}		}

/// print - Convert to human readable form		/// print - Convert to human readable form
///		///
void print(raw_ostream &O) const {		void print(raw_ostream &O) const {
O << "=============================--------------------------------\n";		O << "=============================--------------------------------\n";
if (this->isPostDominator())		if (IsPostDominator)
O << "Inorder PostDominator Tree: ";		O << "Inorder PostDominator Tree: ";
else		else
O << "Inorder Dominator Tree: ";		O << "Inorder Dominator Tree: ";
if (!DFSInfoValid)		if (!DFSInfoValid)
O << "DFSNumbers invalid: " << SlowQueries << " slow queries.";		O << "DFSNumbers invalid: " << SlowQueries << " slow queries.";
O << "\n";		O << "\n";

// The postdom tree can have a null root if there are no returns.		// The postdom tree can have a null root if there are no returns.
if (getRootNode()) PrintDomTree<NodeT>(getRootNode(), O, 1);		if (getRootNode()) PrintDomTree<NodeT>(getRootNode(), O, 1);
		if (IsPostDominator) {
		O << "Roots: ";
		for (const NodePtr Block : Roots) {
		Block->printAsOperand(O, false);
		O << " ";
		}
		O << "\n";
		}
}		}

public:		public:
/// updateDFSNumbers - Assign In and Out numbers to the nodes while walking		/// updateDFSNumbers - Assign In and Out numbers to the nodes while walking
/// dominator tree in dfs order.		/// dominator tree in dfs order.
void updateDFSNumbers() const {		void updateDFSNumbers() const {
if (DFSInfoValid) {		if (DFSInfoValid) {
SlowQueries = 0;		SlowQueries = 0;
[184 lines not shown]

llvm/trunk/include/llvm/Support/GenericDomTreeConstruction.h

[58 lines not shown]	struct ChildrenGetter<NodePtr, true> {
}		}
};		};

template <typename DomTreeT>		template <typename DomTreeT>
struct SemiNCAInfo {		struct SemiNCAInfo {
using NodePtr = typename DomTreeT::NodePtr;		using NodePtr = typename DomTreeT::NodePtr;
using NodeT = typename DomTreeT::NodeType;		using NodeT = typename DomTreeT::NodeType;
using TreeNodePtr = DomTreeNodeBase<NodeT> *;		using TreeNodePtr = DomTreeNodeBase<NodeT> *;
		using RootsT = decltype(DomTreeT::Roots);
static constexpr bool IsPostDom = DomTreeT::IsPostDominator;		static constexpr bool IsPostDom = DomTreeT::IsPostDominator;

// Information record used by Semi-NCA during tree construction.		// Information record used by Semi-NCA during tree construction.
struct InfoRec {		struct InfoRec {
unsigned DFSNum = 0;		unsigned DFSNum = 0;
unsigned Parent = 0;		unsigned Parent = 0;
unsigned Semi = 0;		unsigned Semi = 0;
NodePtr Label = nullptr;		NodePtr Label = nullptr;
[51 lines not shown]	friend raw_ostream &operator<<(raw_ostream &O, const BlockNamePrinter &BP) {

return O;		return O;
}		}
};		};

// Custom DFS implementation which can skip nodes based on a provided		// Custom DFS implementation which can skip nodes based on a provided
// predicate. It also collects ReverseChildren so that we don't have to spend		// predicate. It also collects ReverseChildren so that we don't have to spend
// time getting predecessors in SemiNCA.		// time getting predecessors in SemiNCA.
template <typename DescendCondition>		//
		// If IsReverse is set to true, the DFS walk will be performed backwards
		// relative to IsPostDom -- using reverse edges for dominators and forward
		// edges for postdominators.
		template <bool IsReverse = false, typename DescendCondition>
unsigned runDFS(NodePtr V, unsigned LastNum, DescendCondition Condition,		unsigned runDFS(NodePtr V, unsigned LastNum, DescendCondition Condition,
unsigned AttachToNum) {		unsigned AttachToNum) {
assert(V);		assert(V);
SmallVector<NodePtr, 64> WorkList = {V};		SmallVector<NodePtr, 64> WorkList = {V};
if (NodeToInfo.count(V) != 0) NodeToInfo[V].Parent = AttachToNum;		if (NodeToInfo.count(V) != 0) NodeToInfo[V].Parent = AttachToNum;

while (!WorkList.empty()) {		while (!WorkList.empty()) {
const NodePtr BB = WorkList.pop_back_val();		const NodePtr BB = WorkList.pop_back_val();
auto &BBInfo = NodeToInfo[BB];		auto &BBInfo = NodeToInfo[BB];

// Visited nodes always have positive DFS numbers.		// Visited nodes always have positive DFS numbers.
if (BBInfo.DFSNum != 0) continue;		if (BBInfo.DFSNum != 0) continue;
BBInfo.DFSNum = BBInfo.Semi = ++LastNum;		BBInfo.DFSNum = BBInfo.Semi = ++LastNum;
BBInfo.Label = BB;		BBInfo.Label = BB;
NumToNode.push_back(BB);		NumToNode.push_back(BB);

for (const NodePtr Succ : ChildrenGetter<NodePtr, IsPostDom>::Get(BB)) {		constexpr bool Direction = IsReverse != IsPostDom; // XOR.
		for (const NodePtr Succ : ChildrenGetter<NodePtr, Direction>::Get(BB)) {
const auto SIT = NodeToInfo.find(Succ);		const auto SIT = NodeToInfo.find(Succ);
// Don't visit nodes more than once but remember to collect		// Don't visit nodes more than once but remember to collect
// ReverseChildren.		// ReverseChildren.
if (SIT != NodeToInfo.end() && SIT->second.DFSNum != 0) {		if (SIT != NodeToInfo.end() && SIT->second.DFSNum != 0) {
if (Succ != BB) SIT->second.ReverseChildren.push_back(BB);		if (Succ != BB) SIT->second.ReverseChildren.push_back(BB);
continue;		continue;
}		}

[91 lines not shown]	for (unsigned i = 2; i < NextDFSNum; ++i) {
NodePtr WIDomCandidate = WInfo.IDom;		NodePtr WIDomCandidate = WInfo.IDom;
while (NodeToInfo[WIDomCandidate].DFSNum > SDomNum)		while (NodeToInfo[WIDomCandidate].DFSNum > SDomNum)
WIDomCandidate = NodeToInfo[WIDomCandidate].IDom;		WIDomCandidate = NodeToInfo[WIDomCandidate].IDom;

WInfo.IDom = WIDomCandidate;		WInfo.IDom = WIDomCandidate;
}		}
}		}

template <typename DescendCondition>		// PostDominatorTree always has a virtual root that represents a virtual CFG
unsigned doFullDFSWalk(const DomTreeT &DT, DescendCondition DC) {		// node that serves as a single exit from the function. All the other exits
unsigned Num = 0;		// (CFG nodes with terminators and nodes in infinite loops are logically
		// connected to this virtual CFG exit node).
		// This function maps a nullptr CFG node to the virtual root tree node.
		void addVirtualRoot() {
		assert(IsPostDom && "Only postdominators have a virtual root");
		assert(NumToNode.size() == 1 && "SNCAInfo must be freshly constructed");

// If the DT is a PostDomTree, always add a virtual root.
if (IsPostDom) {
auto &BBInfo = NodeToInfo[nullptr];		auto &BBInfo = NodeToInfo[nullptr];
BBInfo.DFSNum = BBInfo.Semi = ++Num;		BBInfo.DFSNum = BBInfo.Semi = 1;
BBInfo.Label = nullptr;		BBInfo.Label = nullptr;

NumToNode.push_back(nullptr); // NumToNode[n] = V;		NumToNode.push_back(nullptr); // NumToNode[1] = nullptr;
}		}

const unsigned InitialNum = Num;		// For postdominators, nodes with no forward successors are trivial roots that
for (auto *Root : DT.Roots) Num = runDFS(Root, Num, DC, InitialNum);		// are always selected as tree roots. Roots with forward successors correspond
		// to CFG nodes within infinite loops.
		static bool HasForwardSuccessors(const NodePtr N) {
		assert(N && "N must be a valid node");
		using TraitsTy = GraphTraits<typename DomTreeT::ParentPtr>;
		return TraitsTy::child_begin(N) != TraitsTy::child_end(N);
		}

return Num;		static NodePtr GetEntryNode(const DomTreeT &DT) {
		assert(DT.Parent && "Parent not set");
		return GraphTraits<typename DomTreeT::ParentPtr>::getEntryNode(DT.Parent);
}		}

static void FindAndAddRoots(DomTreeT &DT) {		// Finds all roots without relying on the set of roots already stored in the
		// tree.
		// We define roots to be some non-redundant set of the CFG nodes
		static RootsT FindRoots(const DomTreeT &DT) {
assert(DT.Parent && "Parent pointer is not set");		assert(DT.Parent && "Parent pointer is not set");
using TraitsTy = GraphTraits<typename DomTreeT::ParentPtr>;		RootsT Roots;

		// For dominators, function entry CFG node is always a tree root node.
if (!IsPostDom) {		if (!IsPostDom) {
// Dominators have a single root that is the function's entry.		Roots.push_back(GetEntryNode(DT));
NodeT *entry = TraitsTy::getEntryNode(DT.Parent);		return Roots;
DT.addRoot(entry);		}
} else {
// Initialize the roots list for PostDominators.		SemiNCAInfo SNCA;
for (auto *Node : nodes(DT.Parent))
if (TraitsTy::child_begin(Node) == TraitsTy::child_end(Node))		// PostDominatorTree always has a virtual root.
DT.addRoot(Node);		SNCA.addVirtualRoot();
		unsigned Num = 1;

		DEBUG(dbgs() << "\t\tLooking for trivial roots\n");

		// Step #1: Find all the trivial roots that will definitely
		// remain tree roots.
		unsigned Total = 0;
		for (const NodePtr N : nodes(DT.Parent)) {
		++Total;
		// If it has no successors, it is definitely a root.
		if (!HasForwardSuccessors(N)) {
		Roots.push_back(N);
		// Run DFS not to walk this part of CFG later.
		Num = SNCA.runDFS(N, Num, AlwaysDescend, 1);
		DEBUG(dbgs() << "Found a new trivial root: " << BlockNamePrinter(N)
		<< "\n");
		DEBUG(dbgs() << "Last visited node: "
		<< BlockNamePrinter(SNCA.NumToNode[Num]) << "\n");
		}
		}

		DEBUG(dbgs() << "\t\tLooking for non-trivial roots\n");

		// Step #2: Find all non-trivial root candidates. Those are CFG nodes that
		// are reverse-unreachable and were not visited by previous DFS walks (i.e. CFG
		// nodes in infinite loops).
		bool HasNonTrivialRoots = false;
		// Accounting for the virtual exit, see if we had any reverse-unreachable
		// nodes.
		if (Total + 1 != Num) {
		HasNonTrivialRoots = true;
		// Make another DFS pass over all other nodes to find the
		// reverse-unreachable blocks, and find the furthest paths we'll be able
		// to make.
		// Note that this looks N^2, but it's really 2N worst case, if every node
		// is unreachable. This is because we are still going to only visit each
		// unreachable node once, we may just visit it in two directions,
		// depending on how lucky we get.
		SmallPtrSet<NodePtr, 4> ConnectToExitBlock;
		for (const NodePtr I : nodes(DT.Parent)) {
		if (SNCA.NodeToInfo.count(I) == 0) {
		DEBUG(dbgs() << "\t\t\tVisiting node " << BlockNamePrinter(I)
		<< "\n");
		// Find the furthest away we can get by following successors, then
		// follow them in reverse. This gives us some reasonable answer about
		// the post-dom tree inside any infinite loop. In particular, it
		// guarantees we get to the farthest away point along some
		// path. This also matches the GCC's behavior.
		// If we really wanted a totally complete picture of dominance inside
		// this infinite loop, we could do it with SCC-like algorithms to find
		// the lowest and highest points in the infinite loop. In theory, it
		// would be nice to give the canonical backedge for the loop, but it's
		// expensive and does not always lead to a minimal set of roots.
		DEBUG(dbgs() << "\t\t\tRunning forward DFS\n");

		const unsigned NewNum = SNCA.runDFS<true>(I, Num, AlwaysDescend, Num);
		const NodePtr FurthestAway = SNCA.NumToNode[NewNum];
		DEBUG(dbgs() << "\t\t\tFound a new furthest away node "
		<< "(non-trivial root): "
		<< BlockNamePrinter(FurthestAway) << "\n");
		ConnectToExitBlock.insert(FurthestAway);
		Roots.push_back(FurthestAway);
		DEBUG(dbgs() << "\t\t\tPrev DFSNum: " << Num << ", new DFSNum: "
		<< NewNum << "\n\t\t\tRemoving DFS info\n");
		for (unsigned i = NewNum; i > Num; --i) {
		const NodePtr N = SNCA.NumToNode[i];
		DEBUG(dbgs() << "\t\t\t\tRemoving DFS info for "
		<< BlockNamePrinter(N) << "\n");
		SNCA.NodeToInfo.erase(N);
		SNCA.NumToNode.pop_back();
		}
		const unsigned PrevNum = Num;
		DEBUG(dbgs() << "\t\t\tRunning reverse DFS\n");
		Num = SNCA.runDFS(FurthestAway, Num, AlwaysDescend, 1);
		for (unsigned i = PrevNum + 1; i <= Num; ++i)
		DEBUG(dbgs() << "\t\t\t\tfound node "
		<< BlockNamePrinter(SNCA.NumToNode[i]) << "\n");
		}
}		}
}		}

		DEBUG(dbgs() << "Total: " << Total << ", Num: " << Num << "\n");
		DEBUG(dbgs() << "Discovered CFG nodes:\n");
		DEBUG(for (size_t i = 0; i <= Num; ++i) dbgs()
		<< i << ": " << BlockNamePrinter(SNCA.NumToNode[i]) << "\n");

		assert((Total + 1 == Num) && "Everything should have been visited");

		// Step #3: If we found some non-trivial roots, make them non-redundant.
		if (HasNonTrivialRoots) RemoveRedundantRoots(DT, Roots);

		DEBUG(dbgs() << "Found roots: ");
		DEBUG(for (auto *Root : Roots) dbgs() << BlockNamePrinter(Root) << " ");
		DEBUG(dbgs() << "\n");

		return Roots;
		}

		// This function only makes sense for postdominators.
		// We define roots to be some set of CFG nodes where (reverse) DFS walks have
		// to start in order to visit all the CFG nodes (including the
		// reverse-unreachable ones).
		// When the search for non-trivial roots is done it may happen that some of
		// the non-trivial roots are reverse-reachable from other non-trivial roots,
		// which makes them redundant. This function removes them from the set of
		// input roots.
		static void RemoveRedundantRoots(const DomTreeT &DT, RootsT &Roots) {
		assert(IsPostDom && "This function is for postdominators only");
		DEBUG(dbgs() << "Removing redundant roots\n");

		SemiNCAInfo SNCA;

		for (unsigned i = 0; i < Roots.size(); ++i) {
		auto &Root = Roots[i];
		// Trivial roots are always non-redundant.
		if (!HasForwardSuccessors(Root)) continue;
		DEBUG(dbgs() << "\tChecking if " << BlockNamePrinter(Root)
		<< " remains a root\n");
		SNCA.clear();
		// Do a forward walk looking for the other roots.
		const unsigned Num = SNCA.runDFS<true>(Root, 0, AlwaysDescend, 0);
		// Skip the start node and begin from the second one (note that DFS uses
		// 1-based indexing).
		for (unsigned x = 2; x <= Num; ++x) {
		const NodePtr N = SNCA.NumToNode[x];
		// If we found another root in a (forward) DFS walk, remove the current
		// root from the set of roots, as it is reverse-reachable from the other
		// one.
		if (llvm::find(Roots, N) != Roots.end()) {
		DEBUG(dbgs() << "\tForward DFS walk found another root "
		<< BlockNamePrinter(N) << "\n\tRemoving root "
		<< BlockNamePrinter(Root) << "\n");
		std::swap(Root, Roots.back());
		Roots.pop_back();

		// Root at the back takes the current root's place.
		// Start the next loop iteration with the same index.
		--i;
		break;
		}
		}
		}
		}

		template <typename DescendCondition>
		void doFullDFSWalk(const DomTreeT &DT, DescendCondition DC) {
		if (!IsPostDom) {
		assert(DT.Roots.size() == 1 && "Dominators should have a single root");
		runDFS(DT.Roots[0], 0, DC, 0);
		return;
		}

		addVirtualRoot();
		unsigned Num = 1;
		for (const NodePtr Root : DT.Roots) Num = runDFS(Root, Num, DC, 0);
		}

void calculateFromScratch(DomTreeT &DT) {		void calculateFromScratch(DomTreeT &DT) {
// Step #0: Number blocks in depth-first order and initialize variables used		// Step #0: Number blocks in depth-first order and initialize variables used
// in later stages of the algorithm.		// in later stages of the algorithm.
FindAndAddRoots(DT);		DT.Roots = FindRoots(DT);
doFullDFSWalk(DT, AlwaysDescend);		doFullDFSWalk(DT, AlwaysDescend);

runSemiNCA(DT);		runSemiNCA(DT);

if (DT.Roots.empty()) return;		if (DT.Roots.empty()) return;

// Add a node for the root. If the tree is a PostDominatorTree it will be		// Add a node for the root. If the tree is a PostDominatorTree it will be
// the virtual exit (denoted by (BasicBlock *) nullptr) which postdominates		// the virtual exit (denoted by (BasicBlock *) nullptr) which postdominates
Show All 15 Lines	for (size_t i = 1, e = NumToNode.size(); i != e; ++i) {
DEBUG(dbgs() << "\tdiscovered a new reachable node "		DEBUG(dbgs() << "\tdiscovered a new reachable node "
<< BlockNamePrinter(W) << "\n");		<< BlockNamePrinter(W) << "\n");

// Don't replace this with 'count', the insertion side effect is important		// Don't replace this with 'count', the insertion side effect is important
if (DT.DomTreeNodes[W]) continue; // Haven't calculated this node yet?		if (DT.DomTreeNodes[W]) continue; // Haven't calculated this node yet?

NodePtr ImmDom = getIDom(W);		NodePtr ImmDom = getIDom(W);

// Get or calculate the node for the immediate dominator		// Get or calculate the node for the immediate dominator.
TreeNodePtr IDomNode = getNodeForBlock(ImmDom, DT);		TreeNodePtr IDomNode = getNodeForBlock(ImmDom, DT);

// Add a new tree node for this BasicBlock, and link it as a child of		// Add a new tree node for this BasicBlock, and link it as a child of
// IDomNode		// IDomNode.
DT.DomTreeNodes[W] = IDomNode->addChild(		DT.DomTreeNodes[W] = IDomNode->addChild(
llvm::make_unique<DomTreeNodeBase<NodeT>>(W, IDomNode));		llvm::make_unique<DomTreeNodeBase<NodeT>>(W, IDomNode));
}		}
}		}

void reattachExistingSubtree(DomTreeT &DT, const TreeNodePtr AttachTo) {		void reattachExistingSubtree(DomTreeT &DT, const TreeNodePtr AttachTo) {
NodeToInfo[NumToNode[1]].IDom = AttachTo->getBlock();		NodeToInfo[NumToNode[1]].IDom = AttachTo->getBlock();
for (size_t i = 1, e = NumToNode.size(); i != e; ++i) {		for (size_t i = 1, e = NumToNode.size(); i != e; ++i) {
Show All 20 Lines	std::priority_queue<BucketElementTy, SmallVector<BucketElementTy, 8>,
Bucket; // Queue of tree nodes sorted by level in descending order.		Bucket; // Queue of tree nodes sorted by level in descending order.
SmallDenseSet<TreeNodePtr, 8> Affected;		SmallDenseSet<TreeNodePtr, 8> Affected;
SmallDenseSet<TreeNodePtr, 8> Visited;		SmallDenseSet<TreeNodePtr, 8> Visited;
SmallVector<TreeNodePtr, 8> AffectedQueue;		SmallVector<TreeNodePtr, 8> AffectedQueue;
SmallVector<TreeNodePtr, 8> VisitedNotAffectedQueue;		SmallVector<TreeNodePtr, 8> VisitedNotAffectedQueue;
};		};

static void InsertEdge(DomTreeT &DT, const NodePtr From, const NodePtr To) {		static void InsertEdge(DomTreeT &DT, const NodePtr From, const NodePtr To) {
assert(From && To && "Cannot connect nullptrs");		assert((From \|\| IsPostDom) &&
		"From has to be a valid CFG node or a virtual root");
		assert(To && "Cannot be a nullptr");
DEBUG(dbgs() << "Inserting edge " << BlockNamePrinter(From) << " -> "		DEBUG(dbgs() << "Inserting edge " << BlockNamePrinter(From) << " -> "
<< BlockNamePrinter(To) << "\n");		<< BlockNamePrinter(To) << "\n");
const TreeNodePtr FromTN = DT.getNode(From);		TreeNodePtr FromTN = DT.getNode(From);

// Ignore edges from unreachable nodes.		if (!FromTN) {
if (!FromTN) return;		// Ignore edges from unreachable nodes for (forward) dominators.
		if (!IsPostDom) return;

		// The unreachable node becomes a new root -- create a tree node for it.
		TreeNodePtr VirtualRoot = DT.getNode(nullptr);
		FromTN =
		(DT.DomTreeNodes[From] = VirtualRoot->addChild(
		llvm::make_unique<DomTreeNodeBase<NodeT>>(From, VirtualRoot)))
		.get();
		DT.Roots.push_back(From);
		}

DT.DFSInfoValid = false;		DT.DFSInfoValid = false;

const TreeNodePtr ToTN = DT.getNode(To);		const TreeNodePtr ToTN = DT.getNode(To);
if (!ToTN)		if (!ToTN)
InsertUnreachable(DT, FromTN, To);		InsertUnreachable(DT, FromTN, To);
else		else
InsertReachable(DT, FromTN, ToTN);		InsertReachable(DT, FromTN, ToTN);
}		}

		// Determines if some existing root becomes reverse-reachable after the
		// insertion. Rebuilds the whole tree if that situation happens.
		static bool UpdateRootsBeforeInsertion(DomTreeT &DT, const TreeNodePtr From,
		const TreeNodePtr To) {
		assert(IsPostDom && "This function is only for postdominators");
		// Destination node is not attached to the virtual root, so it cannot be a
		// root.
		if (!DT.isVirtualRoot(To->getIDom())) return false;

		auto RIt = llvm::find(DT.Roots, To->getBlock());
		if (RIt == DT.Roots.end())
		return false; // To is not a root, nothing to update.

		DEBUG(dbgs() << "\t\tAfter the insertion, " << BlockNamePrinter(To)
		<< " is no longer a root\n\t\tRebuilding the tree!!!\n");

		DT.recalculate(*DT.Parent);
		return true;
		}

		// Updates the set of roots after insertion or deletion. This ensures that
		// the roots after a series of updates are the same as the roots of a tree
		// built from scratch.
		static void UpdateRootsAfterUpdate(DomTreeT &DT) {
		assert(IsPostDom && "This function is only for postdominators");

		// The tree has only trivial roots -- nothing to update.
		if (std::none_of(DT.Roots.begin(), DT.Roots.end(), HasForwardSuccessors))
		return;

		// Recalculate the set of roots.
		DT.Roots = FindRoots(DT);
		for (const NodePtr R : DT.Roots) {
		const TreeNodePtr TN = DT.getNode(R);
		// A CFG node was selected as a tree root, but the corresponding tree node
		// is not connected to the virtual root. This is because the incremental
		// algorithm does not really know or use the set of roots and can make a
		// different (implicit) decision about which node within an infinite loop
		// becomes a root.
		if (DT.isVirtualRoot(TN->getIDom())) {
		DEBUG(dbgs() << "Root " << BlockNamePrinter(R)
		<< " is not virtual root's child\n"
		<< "The entire tree needs to be rebuilt\n");
		// It should be possible to rotate the subtree instead of recalculating
		// the whole tree, but this situation happens extremely rarely in
		// practice.
		DT.recalculate(*DT.Parent);
		return;
		}
		}
		}

// Handles insertion to a node already in the dominator tree.		// Handles insertion to a node already in the dominator tree.
static void InsertReachable(DomTreeT &DT, const TreeNodePtr From,		static void InsertReachable(DomTreeT &DT, const TreeNodePtr From,
const TreeNodePtr To) {		const TreeNodePtr To) {
DEBUG(dbgs() << "\tReachable " << BlockNamePrinter(From->getBlock())		DEBUG(dbgs() << "\tReachable " << BlockNamePrinter(From->getBlock())
<< " -> " << BlockNamePrinter(To->getBlock()) << "\n");		<< " -> " << BlockNamePrinter(To->getBlock()) << "\n");
		if (IsPostDom && UpdateRootsBeforeInsertion(DT, From, To)) return;
		// DT.findNCD expects both pointers to be valid. When From is a virtual
		// root, then its CFG block pointer is a nullptr, so we have to 'compute'
		// the NCD manually.
const NodePtr NCDBlock =		const NodePtr NCDBlock =
DT.findNearestCommonDominator(From->getBlock(), To->getBlock());		(From->getBlock() && To->getBlock())
		? DT.findNearestCommonDominator(From->getBlock(), To->getBlock())
		: nullptr;
assert(NCDBlock \|\| DT.isPostDominator());		assert(NCDBlock \|\| DT.isPostDominator());
const TreeNodePtr NCD = DT.getNode(NCDBlock);		const TreeNodePtr NCD = DT.getNode(NCDBlock);
assert(NCD);		assert(NCD);

DEBUG(dbgs() << "\t\tNCA == " << BlockNamePrinter(NCD) << "\n");		DEBUG(dbgs() << "\t\tNCA == " << BlockNamePrinter(NCD) << "\n");
const TreeNodePtr ToIDom = To->getIDom();		const TreeNodePtr ToIDom = To->getIDom();

// Nothing affected -- NCA property holds.		// Nothing affected -- NCA property holds.
▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	static void UpdateInsertion(DomTreeT &DT, const TreeNodePtr NCD,

for (const TreeNodePtr TN : II.AffectedQueue) {		for (const TreeNodePtr TN : II.AffectedQueue) {
DEBUG(dbgs() << "\tIDom(" << BlockNamePrinter(TN)		DEBUG(dbgs() << "\tIDom(" << BlockNamePrinter(TN)
<< ") = " << BlockNamePrinter(NCD) << "\n");		<< ") = " << BlockNamePrinter(NCD) << "\n");
TN->setIDom(NCD);		TN->setIDom(NCD);
}		}

UpdateLevelsAfterInsertion(II);		UpdateLevelsAfterInsertion(II);
		if (IsPostDom) UpdateRootsAfterUpdate(DT);
}		}

static void UpdateLevelsAfterInsertion(InsertionInfo &II) {		static void UpdateLevelsAfterInsertion(InsertionInfo &II) {
DEBUG(dbgs() << "Updating levels for visited but not affected nodes\n");		DEBUG(dbgs() << "Updating levels for visited but not affected nodes\n");

for (const TreeNodePtr TN : II.VisitedNotAffectedQueue) {		for (const TreeNodePtr TN : II.VisitedNotAffectedQueue) {
DEBUG(dbgs() << "\tlevel(" << BlockNamePrinter(TN) << ") = ("		DEBUG(dbgs() << "\tlevel(" << BlockNamePrinter(TN) << ") = ("
<< BlockNamePrinter(TN->getIDom()) << ") "		<< BlockNamePrinter(TN->getIDom()) << ") "
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	static void ComputeUnreachableDominators(
SNCA.runDFS(Root, 0, UnreachableDescender, 0);		SNCA.runDFS(Root, 0, UnreachableDescender, 0);
SNCA.runSemiNCA(DT);		SNCA.runSemiNCA(DT);
SNCA.attachNewSubtree(DT, Incoming);		SNCA.attachNewSubtree(DT, Incoming);

DEBUG(dbgs() << "After adding unreachable nodes\n");		DEBUG(dbgs() << "After adding unreachable nodes\n");
DEBUG(DT.print(dbgs()));		DEBUG(DT.print(dbgs()));
}		}

// Checks if the tree contains all reachable nodes in the input graph.
bool verifyReachability(const DomTreeT &DT) {
clear();
doFullDFSWalk(DT, AlwaysDescend);

for (auto &NodeToTN : DT.DomTreeNodes) {
const TreeNodePtr TN = NodeToTN.second.get();
const NodePtr BB = TN->getBlock();

// Virtual root has a corresponding virtual CFG node.
if (DT.isVirtualRoot(TN)) continue;

if (NodeToInfo.count(BB) == 0) {
errs() << "DomTree node " << BlockNamePrinter(BB)
<< " not found by DFS walk!\n";
errs().flush();

return false;
}
}

for (const NodePtr N : NumToNode) {
if (N && !DT.getNode(N)) {
errs() << "CFG node " << BlockNamePrinter(N)
<< " not found in the DomTree!\n";
errs().flush();

return false;
}
}

return true;
}

static void DeleteEdge(DomTreeT &DT, const NodePtr From, const NodePtr To) {		static void DeleteEdge(DomTreeT &DT, const NodePtr From, const NodePtr To) {
assert(From && To && "Cannot disconnect nullptrs");		assert(From && To && "Cannot disconnect nullptrs");
DEBUG(dbgs() << "Deleting edge " << BlockNamePrinter(From) << " -> "		DEBUG(dbgs() << "Deleting edge " << BlockNamePrinter(From) << " -> "
<< BlockNamePrinter(To) << "\n");		<< BlockNamePrinter(To) << "\n");

#ifndef NDEBUG		#ifndef NDEBUG
// Ensure that the edge was in fact deleted from the CFG before informing		// Ensure that the edge was in fact deleted from the CFG before informing
// the DomTree about it.		// the DomTree about it.
Show All 23 Lines	DEBUG(dbgs() << "\tNCD " << BlockNamePrinter(NCD) << ", ToIDom "
<< BlockNamePrinter(ToIDom) << "\n");		<< BlockNamePrinter(ToIDom) << "\n");

// To remains reachable after deletion.		// To remains reachable after deletion.
// (Based on the caption under Figure 4. from the second paper.)		// (Based on the caption under Figure 4. from the second paper.)
if (FromTN != ToIDom \|\| HasProperSupport(DT, ToTN))		if (FromTN != ToIDom \|\| HasProperSupport(DT, ToTN))
DeleteReachable(DT, FromTN, ToTN);		DeleteReachable(DT, FromTN, ToTN);
else		else
DeleteUnreachable(DT, ToTN);		DeleteUnreachable(DT, ToTN);

		if (IsPostDom) UpdateRootsAfterUpdate(DT);
}		}

// Handles deletions that leave destination nodes reachable.		// Handles deletions that leave destination nodes reachable.
static void DeleteReachable(DomTreeT &DT, const TreeNodePtr FromTN,		static void DeleteReachable(DomTreeT &DT, const TreeNodePtr FromTN,
const TreeNodePtr ToTN) {		const TreeNodePtr ToTN) {
DEBUG(dbgs() << "Deleting reachable " << BlockNamePrinter(FromTN) << " -> "		DEBUG(dbgs() << "Deleting reachable " << BlockNamePrinter(FromTN) << " -> "
<< BlockNamePrinter(ToTN) << "\n");		<< BlockNamePrinter(ToTN) << "\n");
DEBUG(dbgs() << "\tRebuilding subtree\n");		DEBUG(dbgs() << "\tRebuilding subtree\n");
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	#endif
// Handle deletions that make destination node unreachable.		// Handle deletions that make destination node unreachable.
// (Based on the lemma 2.7 from the second paper.)		// (Based on the lemma 2.7 from the second paper.)
static void DeleteUnreachable(DomTreeT &DT, const TreeNodePtr ToTN) {		static void DeleteUnreachable(DomTreeT &DT, const TreeNodePtr ToTN) {
DEBUG(dbgs() << "Deleting unreachable subtree " << BlockNamePrinter(ToTN)		DEBUG(dbgs() << "Deleting unreachable subtree " << BlockNamePrinter(ToTN)
<< "\n");		<< "\n");
assert(ToTN);		assert(ToTN);
assert(ToTN->getBlock());		assert(ToTN->getBlock());

		if (IsPostDom) {
		// Deletion makes a region reverse-unreachable and creates a new root.
		// Simulate that by inserting an edge from the virtual root to ToTN and
		// adding it as a new root.
		DEBUG(dbgs() << "\tDeletion made a region reverse-unreachable\n");
		DEBUG(dbgs() << "\tAdding new root " << BlockNamePrinter(ToTN) << "\n");
		DT.Roots.push_back(ToTN->getBlock());
		InsertReachable(DT, DT.getNode(nullptr), ToTN);
		return;
		}

SmallVector<NodePtr, 16> AffectedQueue;		SmallVector<NodePtr, 16> AffectedQueue;
const unsigned Level = ToTN->getLevel();		const unsigned Level = ToTN->getLevel();

// Traverse destination node's descendants with greater level in the tree		// Traverse destination node's descendants with greater level in the tree
// and collect visited nodes.		// and collect visited nodes.
auto DescendAndCollect = [Level, &AffectedQueue, &DT](NodePtr, NodePtr To) {		auto DescendAndCollect = [Level, &AffectedQueue, &DT](NodePtr, NodePtr To) {
const TreeNodePtr TN = DT.getNode(To);		const TreeNodePtr TN = DT.getNode(To);
assert(TN);		assert(TN);
▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines	static void EraseNode(DomTreeT &DT, const TreeNodePtr TN) {

DT.DomTreeNodes.erase(TN->getBlock());		DT.DomTreeNodes.erase(TN->getBlock());
}		}

//~~		//~~
//===--------------- DomTree correctness verification ---------------------===		//===--------------- DomTree correctness verification ---------------------===
//~~		//~~

		// Check if the tree has correct roots. A DominatorTree always has a single
		// root which is the function's entry node. A PostDominatorTree can have
		// multiple roots - one for each node with no successors and for infinite
		// loops.
		bool verifyRoots(const DomTreeT &DT) {
		if (!DT.Parent && !DT.Roots.empty()) {
		errs() << "Tree has no parent but has roots!\n";
		errs().flush();
		return false;
		}

		if (!IsPostDom) {
		if (DT.Roots.empty()) {
		errs() << "Tree doesn't have a root!\n";
		errs().flush();
		return false;
		}

		if (DT.getRoot() != GetEntryNode(DT)) {
		errs() << "Tree's root is not its parent's entry node!\n";
		errs().flush();
		return false;
		}
		}

		RootsT ComputedRoots = FindRoots(DT);
		if (DT.Roots.size() != ComputedRoots.size() \|\|
		!std::is_permutation(DT.Roots.begin(), DT.Roots.end(),
		ComputedRoots.begin())) {
		errs() << "Tree has different roots than freshly computed ones!\n";
		errs() << "\tPDT roots: ";
		for (const NodePtr N : DT.Roots) errs() << BlockNamePrinter(N) << ", ";
		errs() << "\n\tComputed roots: ";
		for (const NodePtr N : ComputedRoots)
		errs() << BlockNamePrinter(N) << ", ";
		errs() << "\n";
		errs().flush();
		return false;
		}

		return true;
		}

		// Checks if the tree contains all reachable nodes in the input graph.
		bool verifyReachability(const DomTreeT &DT) {
		clear();
		doFullDFSWalk(DT, AlwaysDescend);

		for (auto &NodeToTN : DT.DomTreeNodes) {
		const TreeNodePtr TN = NodeToTN.second.get();
		const NodePtr BB = TN->getBlock();

		// Virtual root has a corresponding virtual CFG node.
		if (DT.isVirtualRoot(TN)) continue;

		if (NodeToInfo.count(BB) == 0) {
		errs() << "DomTree node " << BlockNamePrinter(BB)
		<< " not found by DFS walk!\n";
		errs().flush();

		return false;
		}
		}

		for (const NodePtr N : NumToNode) {
		if (N && !DT.getNode(N)) {
		errs() << "CFG node " << BlockNamePrinter(N)
		<< " not found in the DomTree!\n";
		errs().flush();

		return false;
		}
		}

		return true;
		}

// Check if for every parent with a level L in the tree all of its children		// Check if for every parent with a level L in the tree all of its children
// have level L + 1.		// have level L + 1.
static bool VerifyLevels(const DomTreeT &DT) {		static bool VerifyLevels(const DomTreeT &DT) {
for (auto &NodeToTN : DT.DomTreeNodes) {		for (auto &NodeToTN : DT.DomTreeNodes) {
const TreeNodePtr TN = NodeToTN.second.get();		const TreeNodePtr TN = NodeToTN.second.get();
const NodePtr BB = TN->getBlock();		const NodePtr BB = TN->getBlock();
if (!BB) continue;		if (!BB) continue;

▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	#endif
// This means that if a node gets disconnected from the graph, then all of		// This means that if a node gets disconnected from the graph, then all of
// the nodes it dominated previously will now become unreachable.		// the nodes it dominated previously will now become unreachable.
bool verifyParentProperty(const DomTreeT &DT) {		bool verifyParentProperty(const DomTreeT &DT) {
for (auto &NodeToTN : DT.DomTreeNodes) {		for (auto &NodeToTN : DT.DomTreeNodes) {
const TreeNodePtr TN = NodeToTN.second.get();		const TreeNodePtr TN = NodeToTN.second.get();
const NodePtr BB = TN->getBlock();		const NodePtr BB = TN->getBlock();
if (!BB \|\| TN->getChildren().empty()) continue;		if (!BB \|\| TN->getChildren().empty()) continue;

		DEBUG(dbgs() << "Verifying parent property of node "
		<< BlockNamePrinter(TN) << "\n");
clear();		clear();
doFullDFSWalk(DT, [BB](NodePtr From, NodePtr To) {		doFullDFSWalk(DT, [BB](NodePtr From, NodePtr To) {
return From != BB && To != BB;		return From != BB && To != BB;
});		});

for (TreeNodePtr Child : TN->getChildren())		for (TreeNodePtr Child : TN->getChildren())
if (NodeToInfo.count(Child->getBlock()) != 0) {		if (NodeToInfo.count(Child->getBlock()) != 0) {
errs() << "Child " << BlockNamePrinter(Child)		errs() << "Child " << BlockNamePrinter(Child)
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	void DeleteEdge(DomTreeT &DT, typename DomTreeT::NodePtr From,
typename DomTreeT::NodePtr To) {		typename DomTreeT::NodePtr To) {
if (DT.isPostDominator()) std::swap(From, To);		if (DT.isPostDominator()) std::swap(From, To);
SemiNCAInfo<DomTreeT>::DeleteEdge(DT, From, To);		SemiNCAInfo<DomTreeT>::DeleteEdge(DT, From, To);
}		}

template <class DomTreeT>		template <class DomTreeT>
bool Verify(const DomTreeT &DT) {		bool Verify(const DomTreeT &DT) {
SemiNCAInfo<DomTreeT> SNCA;		SemiNCAInfo<DomTreeT> SNCA;
return SNCA.verifyReachability(DT) && SNCA.VerifyLevels(DT) &&		return SNCA.verifyRoots(DT) && SNCA.verifyReachability(DT) &&
SNCA.verifyNCD(DT) && SNCA.verifyParentProperty(DT) &&		SNCA.VerifyLevels(DT) && SNCA.verifyNCD(DT) &&
SNCA.verifySiblingProperty(DT);		SNCA.verifyParentProperty(DT) && SNCA.verifySiblingProperty(DT);
}		}

} // namespace DomTreeBuilder		} // namespace DomTreeBuilder
} // namespace llvm		} // namespace llvm

#undef DEBUG_TYPE		#undef DEBUG_TYPE

#endif		#endif
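
The redundant-root pruning described in the comments on `RemoveRedundantRoots` above can be illustrated on a toy graph. The following is only a minimal sketch, not the patch's code: the `Graph` alias, `reachableFrom`, and `removeRedundantRoots` are hypothetical names, nodes are plain integers, and a hand-written DFS stands in for `SemiNCAInfo::runDFS`. As in the patch, a root with no forward successors is always kept, and a root is dropped when a forward walk from it reaches another root.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy CFG: Succ[n] lists the forward successors of node n.
using Graph = std::vector<std::vector<int>>;

// Returns every node other than Start that a forward DFS from Start reaches.
static std::vector<int> reachableFrom(const Graph &Succ, int Start) {
  std::vector<bool> Seen(Succ.size(), false);
  std::vector<int> Stack{Start}, Out;
  Seen[Start] = true;
  while (!Stack.empty()) {
    const int N = Stack.back();
    Stack.pop_back();
    for (int S : Succ[N]) {
      if (Seen[S]) continue;
      Seen[S] = true;
      Out.push_back(S);
      Stack.push_back(S);
    }
  }
  return Out;
}

// Drops every non-trivial root from which another root is forward-reachable.
std::vector<int> removeRedundantRoots(const Graph &Succ,
                                      std::vector<int> Roots) {
  for (std::size_t I = 0; I < Roots.size();) {
    const int Root = Roots[I];
    assert(Root >= 0 && static_cast<std::size_t>(Root) < Succ.size());

    bool Redundant = false;
    // Trivial roots (no forward successors) are never redundant.
    if (!Succ[Root].empty()) {
      for (int N : reachableFrom(Succ, Root)) {
        if (std::find(Roots.begin(), Roots.end(), N) != Roots.end()) {
          Redundant = true;
          break;
        }
      }
    }

    if (Redundant) {
      // Swap-and-pop; the element moved into slot I is examined next.
      std::swap(Roots[I], Roots.back());
      Roots.pop_back();
    } else {
      ++I;
    }
  }
  return Roots;
}
```

For a graph where nodes 1 and 2 form an infinite loop and node 3 is an exit, calling `removeRedundantRoots` with the candidate roots `{3, 1, 2}` keeps the exit and one of the two loop nodes. One design difference: the patch decrements its unsigned index before `break` and relies on the loop's `++i` to revisit the same slot, whereas this sketch only advances the index when the current root is kept.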

llvm/trunk/lib/Transforms/Scalar/ADCE.cpp

Show First 20 Lines • Show All 247 Lines • ▼ Show 20 Lines	for (auto *BB: depth_first_ext(&F.getEntryBlock(), State)) {
if (State.onStack(Succ)) {		if (State.onStack(Succ)) {
// back edge....		// back edge....
markLive(Term);		markLive(Term);
break;		break;
}		}
}		}
}		}

// Mark blocks live if there is no path from the block to the		// Mark blocks live if there is no path from the block to a
// return of the function or a successor for which this is true.		// return of the function.
// This protects IDFCalculator which cannot handle such blocks.		// We do this by seeing which of the postdomtree root children exit the
for (auto &BBInfoPair : BlockInfo) {		// program, and for all others, mark the subtree live.
auto &BBInfo = BBInfoPair.second;		for (auto &PDTChild : children<DomTreeNode *>(PDT.getRootNode())) {
if (BBInfo.terminatorIsLive())		auto *BB = PDTChild->getBlock();
continue;		auto &Info = BlockInfo[BB];
auto *BB = BBInfo.BB;		// Real function return
if (!PDT.getNode(BB)) {		if (isa<ReturnInst>(Info.Terminator)) {
DEBUG(dbgs() << "Not post-dominated by return: " << BB->getName()		DEBUG(dbgs() << "post-dom root child is a return: " << BB->getName()
<< '\n';);		<< '\n';);
markLive(BBInfo.Terminator);
continue;		continue;
}		}
for (auto *Succ : successors(BB))
if (!PDT.getNode(Succ)) {		// This child is something else, like an infinite loop.
DEBUG(dbgs() << "Successor not post-dominated by return: "		for (auto DFNode : depth_first(PDTChild))
<< BB->getName() << '\n';);		markLive(BlockInfo[DFNode->getBlock()].Terminator);
markLive(BBInfo.Terminator);
break;
}
}		}

// Treat the entry block as always live		// Treat the entry block as always live
auto *BB = &F.getEntryBlock();		auto *BB = &F.getEntryBlock();
auto &EntryInfo = BlockInfo[BB];		auto &EntryInfo = BlockInfo[BB];
EntryInfo.Live = true;		EntryInfo.Live = true;
if (EntryInfo.UnconditionalBranch)		if (EntryInfo.UnconditionalBranch)
markLive(EntryInfo.Terminator);		markLive(EntryInfo.Terminator);
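
The new ADCE logic above walks the children of the post-dominator tree's root, skips those whose block ends in a return, and marks every block in the remaining subtrees live. A minimal sketch of that traversal on a toy tree follows. `ToyPDTNode` and `blocksToMarkLive` are hypothetical names introduced for illustration, nodes are indices into a vector with index 0 acting as the virtual root, and an explicit stack stands in for `depth_first`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a post-dominator tree node.
struct ToyPDTNode {
  bool EndsInReturn;         // Does the block's terminator return?
  std::vector<int> Children; // Indices of the node's tree children.
};

// Returns the nodes whose terminators would be marked live: every node in a
// subtree hanging off a root child that does not end in a return.
std::vector<int> blocksToMarkLive(const std::vector<ToyPDTNode> &Tree) {
  assert(!Tree.empty() && "index 0 must hold the virtual root");
  std::vector<int> Live;
  for (int Child : Tree[0].Children) {
    // A real function return: nothing to protect here.
    if (Tree[Child].EndsInReturn) continue;

    // Anything else (for example an infinite loop): mark the whole subtree.
    std::vector<int> Stack{Child};
    while (!Stack.empty()) {
      const int N = Stack.back();
      Stack.pop_back();
      Live.push_back(N);
      for (int C : Tree[N].Children) Stack.push_back(C);
    }
  }
  return Live;
}
```

Compared with the removed code, which queried `PDT.getNode` for every block and every successor, this shape only inspects the subtrees that hang off non-returning root children.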

llvm/trunk/test/Analysis/PostDominators/infinite-loop.ll

				; RUN: opt < %s -postdomtree -analyze \| FileCheck %s
				; RUN: opt < %s -passes='print<postdomtree>' 2>&1 \| FileCheck %s

				@a = external global i32, align 4

				define void @fn1() {
				entry:
				store i32 5, i32* @a, align 4
				%call = call i32 (...) @foo()
				%tobool = icmp ne i32 %call, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				br label %loop

				loop: ; preds = %loop, %if.then
				br label %loop

				if.end: ; preds = %entry
				store i32 6, i32* @a, align 4
				ret void
				}

				declare i32 @foo(...)

				; CHECK: Inorder PostDominator Tree:
				; CHECK-NEXT: [1] <<exit node>>
				; CHECK: [2] %loop
				; CHECK-NEXT: [3] %if.then
				; CHECK: Roots: %if.end %loop
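
To see where a root set like `%if.end %loop` can come from, here is a simplified sketch of root discovery on a toy graph. It is not the patch's `FindRoots`, most of which lies outside this excerpt. The names `Graph`, `reverseOf`, `dfsOrder`, and `findPostDomRoots` are hypothetical, and picking the last node visited by the forward walk as the new root is an assumption of this sketch. Nodes without successors become trivial roots; any node that no root reaches in the reversed graph triggers a forward walk that yields one more (non-trivial) root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succ[n] lists the forward successors of node n.
using Graph = std::vector<std::vector<int>>;

// Builds the predecessor lists of the graph.
static Graph reverseOf(const Graph &Succ) {
  Graph Pred(Succ.size());
  for (std::size_t N = 0; N < Succ.size(); ++N)
    for (int S : Succ[N]) Pred[S].push_back(static_cast<int>(N));
  return Pred;
}

// Walks Edges from Start, marks visited nodes in Seen, returns visit order.
static std::vector<int> dfsOrder(const Graph &Edges, int Start,
                                 std::vector<bool> &Seen) {
  std::vector<int> Order, Stack{Start};
  while (!Stack.empty()) {
    const int N = Stack.back();
    Stack.pop_back();
    if (Seen[N]) continue;
    Seen[N] = true;
    Order.push_back(N);
    for (int S : Edges[N])
      if (!Seen[S]) Stack.push_back(S);
  }
  return Order;
}

std::vector<int> findPostDomRoots(const Graph &Succ) {
  const Graph Pred = reverseOf(Succ);
  const int NumNodes = static_cast<int>(Succ.size());
  std::vector<bool> Covered(Succ.size(), false);
  std::vector<int> Roots;

  // Step 1: nodes with no successors are trivial roots.
  for (int N = 0; N < NumNodes; ++N) {
    if (!Succ[N].empty()) continue;
    Roots.push_back(N);
    dfsOrder(Pred, N, Covered);
  }

  // Step 2: a node that is still uncovered cannot reach any root found so
  // far. Walk forward from it and pick the last visited node as a new root.
  for (int N = 0; N < NumNodes; ++N) {
    if (Covered[N]) continue;
    std::vector<bool> Forward(Succ.size(), false);
    const std::vector<int> Order = dfsOrder(Succ, N, Forward);
    assert(!Order.empty() && "the walk visits at least its start node");
    const int NewRoot = Order.back();
    Roots.push_back(NewRoot);
    dfsOrder(Pred, NewRoot, Covered);
  }
  return Roots;
}
```

Numbering the blocks of `@fn1` as `entry` = 0, `if.then` = 1, `loop` = 2, `if.end` = 3 gives the graph `{{1, 3}, {2}, {2}, {}}`, for which this sketch returns `{3, 2}`: the exit block first, then the self-looping block.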

llvm/trunk/test/Analysis/PostDominators/infinite-loop2.ll

				; RUN: opt < %s -postdomtree -analyze \| FileCheck %s
				; RUN: opt < %s -passes='print<postdomtree>' 2>&1 \| FileCheck %s

				@a = external global i32, align 4

				define void @fn1() {
				entry:
				store i32 5, i32* @a, align 4
				%call = call i32 (...) @foo()
				%tobool = icmp ne i32 %call, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry
				br label %loop

				loop: ; preds = %loop, %if.then
				%0 = load i32, i32* @a, align 4
				call void @bar(i32 %0)
				br label %loop

				if.end: ; preds = %entry
				store i32 6, i32* @a, align 4
				ret void
				}

				declare i32 @foo(...)
				declare void @bar(i32)


				; CHECK: Inorder PostDominator Tree:
				; CHECK-NEXT: [1] <<exit node>>
				; CHECK: [2] %loop
				; CHECK-NEXT: [3] %if.then
				; CHECK: Roots: %if.end %loop

llvm/trunk/test/Analysis/PostDominators/infinite-loop3.ll

				; RUN: opt < %s -postdomtree -analyze \| FileCheck %s
				; RUN: opt < %s -passes='print<postdomtree>' 2>&1 \| FileCheck %s

				@a = external global i32, align 4

				define void @fn1() {
				entry:
				store i32 5, i32* @a, align 4
				%call = call i32 (...) @foo()
				%tobool = icmp ne i32 %call, 0
				br i1 %tobool, label %if.then, label %if.end

				if.then: ; preds = %entry, %loop
				br label %loop

				loop: ; preds = %loop, %if.then
				%0 = load i32, i32* @a, align 4
				call void @bar(i32 %0)
				br i1 true, label %loop, label %if.then

				if.end: ; preds = %entry
				store i32 6, i32* @a, align 4
				ret void
				}

				declare i32 @foo(...)
				declare void @bar(i32)


				; CHECK: Inorder PostDominator Tree:
				; CHECK-NEXT: [1] <<exit node>>
				; CHECK: [2] %loop
				; CHECK-NEXT: [3] %if.then
				; CHECK: Roots: %if.end %loop

llvm/trunk/test/Analysis/PostDominators/pr24415.ll

				; RUN: opt < %s -postdomtree -analyze \| FileCheck %s
				; RUN: opt < %s -passes='print<postdomtree>' 2>&1 \| FileCheck %s

				; Function Attrs: nounwind ssp uwtable
				define void @foo() {
				br label %1

				; <label>:1 ; preds = %0, %1
				br label %1
				; No predecessors!
				ret void
				}

				; CHECK: Inorder PostDominator Tree:
				; CHECK-NEXT: [1] <<exit node>>
				; CHECK-NEXT: [2] %2
				; CHECK-NEXT: [2] %1
				; CHECK-NEXT: [3] %0
				No newline at end of file

llvm/trunk/test/Analysis/PostDominators/pr6047_a.ll

	; RUN: opt < %s -postdomtree -analyze \| FileCheck %s			; RUN: opt < %s -postdomtree -analyze \| FileCheck %s
	define internal void @f() {			define internal void @f() {
	entry:			entry:
	br i1 undef, label %bb35, label %bb3.i			br i1 undef, label %bb35, label %bb3.i

	bb3.i:			bb3.i:
	br label %bb3.i			br label %bb3.i

	bb35.loopexit3:			bb35.loopexit3:
	br label %bb35			br label %bb35

	bb35:			bb35:
	ret void			ret void
	}			}
	; CHECK: [3] %entry
				;CHECK:Inorder PostDominator Tree:
				;CHECK-NEXT: [1] <<exit node>>
				;CHECK-NEXT: [2] %bb35
				;CHECK-NEXT: [3] %bb35.loopexit3
				;CHECK-NEXT: [2] %entry
				;CHECK-NEXT: [2] %bb3.i

llvm/trunk/test/Analysis/PostDominators/pr6047_b.ll



	bb35.loopexit3:			bb35.loopexit3:
	br label %bb35			br label %bb35

	bb35:			bb35:
	ret void			ret void
	}			}
	; CHECK: [4] %entry			; CHECK: Inorder PostDominator Tree:
				; CHECK-NEXT: [1] <<exit node>>
				; CHECK-NEXT: [2] %bb35
				; CHECK-NEXT: [3] %bb35.loopexit3
				; CHECK-NEXT: [2] %a
				; CHECK-NEXT: [2] %entry
				; CHECK-NEXT: [2] %bb3.i
				No newline at end of file

llvm/trunk/test/Analysis/PostDominators/pr6047_c.ll

Show First 20 Lines • Show All 138 Lines • ▼ Show 20 Lines	bb35.loopexit:
br label %bb35		br label %bb35

bb35.loopexit3:		bb35.loopexit3:
br label %bb35		br label %bb35

bb35:		bb35:
ret void		ret void
}		}
; CHECK: [3] %entry		; CHECK: Inorder PostDominator Tree:
		; CHECK-NEXT: [1] <<exit node>>
		; CHECK-NEXT: [2] %bb35
		; CHECK-NEXT: [3] %bb
		; CHECK-NEXT: [3] %bb.i
		; CHECK-NEXT: [3] %_float32_unpack.exit
		; CHECK-NEXT: [3] %bb.i5
		; CHECK-NEXT: [3] %_float32_unpack.exit8
		; CHECK-NEXT: [3] %bb32.preheader
		; CHECK-NEXT: [3] %bb3
		; CHECK-NEXT: [3] %bb3.split.us
		; CHECK-NEXT: [3] %bb.i4.us
		; CHECK-NEXT: [3] %bb7.i.us
		; CHECK-NEXT: [3] %bb.i4.us.backedge
		; CHECK-NEXT: [3] %bb1.i.us
		; CHECK-NEXT: [3] %bb6.i.us
		; CHECK-NEXT: [3] %bb4.i.us
		; CHECK-NEXT: [3] %bb8.i.us
		; CHECK-NEXT: [3] %bb3.i.loopexit.us
		; CHECK-NEXT: [3] %bb.nph21
		; CHECK-NEXT: [3] %bb4
		; CHECK-NEXT: [3] %bb5
		; CHECK-NEXT: [3] %bb14.preheader
		; CHECK-NEXT: [3] %bb.nph18
		; CHECK-NEXT: [3] %bb8.us.preheader
		; CHECK-NEXT: [3] %bb8.preheader
		; CHECK-NEXT: [3] %bb8.us
		; CHECK-NEXT: [3] %bb8
		; CHECK-NEXT: [3] %bb15.loopexit
		; CHECK-NEXT: [3] %bb15.loopexit2
		; CHECK-NEXT: [3] %bb15
		; CHECK-NEXT: [3] %bb16
		; CHECK-NEXT: [3] %bb17.loopexit.split
		; CHECK-NEXT: [3] %bb.nph14
		; CHECK-NEXT: [3] %bb19
		; CHECK-NEXT: [3] %bb20
		; CHECK-NEXT: [3] %bb29.preheader
		; CHECK-NEXT: [3] %bb.nph
		; CHECK-NEXT: [3] %bb23.us.preheader
		; CHECK-NEXT: [3] %bb23.preheader
		; CHECK-NEXT: [3] %bb23.us
		; CHECK-NEXT: [3] %bb23
		; CHECK-NEXT: [3] %bb30.loopexit
		; CHECK-NEXT: [3] %bb30.loopexit1
		; CHECK-NEXT: [3] %bb30
		; CHECK-NEXT: [3] %bb31
		; CHECK-NEXT: [3] %bb35.loopexit
		; CHECK-NEXT: [3] %bb35.loopexit3
		; CHECK-NEXT: [2] %entry
		; CHECK-NEXT: [2] %bb3.i
		; CHECK-NEXT: Roots: %bb35 %bb3.i
		No newline at end of file

llvm/trunk/test/Analysis/PostDominators/pr6047_d.ll

Show All 15 Lines	bb3.i:
br label %bb3.i		br label %bb3.i

bb35.loopexit3:		bb35.loopexit3:
br label %bb35		br label %bb35

bb35:		bb35:
ret void		ret void
}		}
; CHECK: [4] %entry		; CHECK: Inorder PostDominator Tree:
		; CHECK-NEXT: [1] <<exit node>>
		; CHECK-NEXT: [2] %bb35
		; CHECK-NEXT: [3] %bb35.loopexit3
		; CHECK-NEXT: [2] %c
		; CHECK-NEXT: [3] %a
		; CHECK-NEXT: [3] %entry
		; CHECK-NEXT: [3] %b
		; CHECK-NEXT: [2] %bb3.i
		No newline at end of file

llvm/trunk/test/Analysis/RegionInfo/infinite_loop.ll

Show All 10 Lines	2:
br label %"2"		br label %"2"
3:		3:
br label %"4"		br label %"4"
4:		4:
ret void		ret void
}		}
; CHECK-NOT: =>		; CHECK-NOT: =>
; CHECK: [0] 0 => <Function Return>		; CHECK: [0] 0 => <Function Return>
; CHECK: [1] 1 => 4		; STAT: 1 region - The # of regions
; STAT: 2 region - The # of regions		No newline at end of file
; STAT: 1 region - The # of simple regions

llvm/trunk/test/Analysis/RegionInfo/infinite_loop_2.ll

Show All 20 Lines	6:
br label %"2"		br label %"2"
3:		3:
br label %"4"		br label %"4"
4:		4:
ret void		ret void
}		}
; CHECK-NOT: =>		; CHECK-NOT: =>
; CHECK: [0] 0 => <Function Return>		; CHECK: [0] 0 => <Function Return>
; CHECK: [1] 1 => 3		; CHECK-NOT: [1]
; STAT: 2 region - The # of regions		; STAT: 1 region - The # of regions
; STAT: 1 region - The # of simple regions

; BBIT: 0, 1, 2, 5, 11, 6, 12, 3, 4,		; BBIT: 0, 1, 2, 5, 11, 6, 12, 3, 4,
; BBIT: 1, 2, 5, 11, 6, 12,

; RNIT: 0, 1 => 3, 3, 4,		; RNIT: 0, 1, 2, 5, 11, 6, 12, 3, 4,
; RNIT: 1, 2, 5, 11, 6, 12,

llvm/trunk/test/Analysis/RegionInfo/infinite_loop_3.ll

	10:			10:
	br label %"8"			br label %"8"
	3:			3:
	br label %"4"			br label %"4"
	4:			4:
	ret void			ret void
	}			}
	; CHECK-NOT: =>			; CHECK-NOT: =>
	; CHECK: [0] 0 => <Function Return>			; CHECK:[0] 0 => <Function Return>
	; CHECK-NEXT: [1] 1 => 3			; CHECK-NOT: [1]
	; CHECK-NEXT: [1] 7 => 1			; STAT: 1 region - The # of regions
	; STAT: 3 region - The # of regions
	; STAT: 2 region - The # of simple regions

	; BBIT: 0, 7, 1, 2, 5, 11, 6, 12, 3, 4, 8, 9, 13, 10, 14,			; BBIT: 0, 7, 1, 2, 5, 11, 6, 12, 3, 4, 8, 9, 13, 10, 14,
	; BBIT: 7, 8, 9, 13, 10, 14,
	; BBIT: 1, 2, 5, 11, 6, 12,

	; RNIT: 0, 7 => 1, 1 => 3, 3, 4,			; RNIT: 0, 7, 1, 2, 5, 11, 6, 12, 3, 4, 8, 9, 13, 10, 14,
	; RNIT: 7, 8, 9, 13, 10, 14,
	; RNIT: 1, 2, 5, 11, 6, 12,

llvm/trunk/test/Analysis/RegionInfo/infinite_loop_4.ll

Show All 32 Lines	10:
br label %"8"		br label %"8"
3:		3:
br label %"4"		br label %"4"
4:		4:
ret void		ret void
}		}
; CHECK-NOT: =>		; CHECK-NOT: =>
; CHECK: [0] 0 => <Function Return>		; CHECK: [0] 0 => <Function Return>
; CHECK-NEXT: [1] 7 => 3		; CHECK-NEXT: [1] 2 => 10
; STAT: 2 region - The # of regions		; CHECK-NEXT: [2] 5 => 6
		; STAT: 3 region - The # of regions
; STAT: 1 region - The # of simple regions		; STAT: 1 region - The # of simple regions

; BBIT: 0, 7, 1, 2, 5, 11, 6, 10, 8, 9, 13, 14, 12, 3, 4,		; BBIT: 0, 7, 1, 2, 5, 11, 6, 10, 8, 9, 13, 14, 12, 3, 4,
; BBIT: 7, 1, 2, 5, 11, 6, 10, 8, 9, 13, 14, 12,		; BBIT: 2, 5, 11, 6, 12,
		; BBIT: 5, 11, 12,
; RNIT: 0, 7 => 3, 3, 4,		; RNIT: 0, 7, 1, 2 => 10, 10, 8, 9, 13, 14, 3, 4,
; RNIT: 7, 1, 2, 5, 11, 6, 10, 8, 9, 13, 14, 12,		; RNIT: 2, 5 => 6, 6,
		; RNIT: 5, 11, 12,
		No newline at end of file

llvm/trunk/test/Analysis/RegionInfo/infinite_loop_5_a.ll

	3:			3:
	br label %"4"			br label %"4"
	4:			4:
	ret void			ret void
	}			}

	; CHECK: Region tree:			; CHECK: Region tree:
	; CHECK-NEXT: [0] 0 => <Function Return>			; CHECK-NEXT: [0] 0 => <Function Return>
	; CHECK-NEXT: [1] 7 => 3
	; CHECK-NEXT: End region tree			; CHECK-NEXT: End region tree

llvm/trunk/test/Analysis/RegionInfo/infinite_loop_5_b.ll

	Show All 15 Lines
	3:			3:
	br label %"4"			br label %"4"
	4:			4:
	ret void			ret void
	}			}

	; CHECK: Region tree:			; CHECK: Region tree:
	; CHECK-NEXT: [0] 0 => <Function Return>			; CHECK-NEXT: [0] 0 => <Function Return>
	; CHECK-NEXT: [1] 7 => 3
	; CHECK-NEXT: End region tree			; CHECK-NEXT: End region tree

llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll

	Show First 20 Lines • Show All 422 Lines • ▼ Show 20 Lines
	endif:			endif:
	; layout can remove the split branch if it can copy the return block.			; layout can remove the split branch if it can copy the return block.
	; This call makes the return block long enough that it doesn't get copied.			; This call makes the return block long enough that it doesn't get copied.
	call void @llvm.amdgcn.s.sleep(i32 5);			call void @llvm.amdgcn.s.sleep(i32 5);
	ret void			ret void
	}			}

	; si_mask_branch			; si_mask_branch
	; s_cbranch_execz
	; s_branch

	; GCN-LABEL: {{^}}analyze_mask_branch:			; GCN-LABEL: {{^}}analyze_mask_branch:
	; GCN: v_cmp_lt_f32_e32 vcc			; GCN: v_cmp_lt_f32_e32 vcc
	; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc			; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
	; GCN-NEXT: ; mask branch [[RET:BB[0-9]+_[0-9]+]]			; GCN-NEXT: ; mask branch [[RET:BB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_cbranch_execz [[BRANCH_SKIP:BB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_branch [[LOOP_BODY:BB[0-9]+_[0-9]+]]

	; GCN-NEXT: [[BRANCH_SKIP]]: ; %entry			; GCN-NEXT: [[LOOP_BODY:BB[0-9]+_[0-9]+]]: ; %loop_body
	; GCN-NEXT: s_getpc_b64 vcc
	; GCN-NEXT: s_add_u32 vcc_lo, vcc_lo, [[RET]]-([[BRANCH_SKIP]]+4)
	; GCN-NEXT: s_addc_u32 vcc_hi, vcc_hi, 0
	; GCN-NEXT: s_setpc_b64 vcc

	; GCN-NEXT: [[LOOP_BODY]]: ; %loop_body
	; GCN: s_mov_b64 vcc, -1{{$}}
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: ;;#ASMEND			; GCN: ;;#ASMEND
	; GCN-NEXT: s_cbranch_vccz [[RET]]

	; GCN-NEXT: [[LONGBB:BB[0-9]+_[0-9]+]]: ; %loop_body			; GCN-NEXT: [[LONGBB:BB[0-9]+_[0-9]+]]: ; %loop_body
	; GCN-NEXT: ; in Loop: Header=[[LOOP_BODY]] Depth=1			; GCN-NEXT: ; in Loop: Header=[[LOOP_BODY]] Depth=1
	; GCN-NEXT: s_getpc_b64 vcc			; GCN-NEXT: s_getpc_b64 vcc
	; GCN-NEXT: s_sub_u32 vcc_lo, vcc_lo, ([[LONGBB]]+4)-[[LOOP_BODY]]			; GCN-NEXT: s_sub_u32 vcc_lo, vcc_lo, ([[LONGBB]]+4)-[[LOOP_BODY]]
	; GCN-NEXT: s_subb_u32 vcc_hi, vcc_hi, 0			; GCN-NEXT: s_subb_u32 vcc_hi, vcc_hi, 0
	; GCN-NEXT: s_setpc_b64 vcc			; GCN-NEXT: s_setpc_b64 vcc

	; GCN-NEXT: [[RET]]: ; %Flow			; GCN-NEXT: [[RET]]: ; %ret
	; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]			; GCN-NEXT: s_or_b64 exec, exec, [[MASK]]
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	define amdgpu_kernel void @analyze_mask_branch() #0 {			define amdgpu_kernel void @analyze_mask_branch() #0 {
	entry:			entry:
	%reg = call float asm sideeffect "v_mov_b32_e64 $0, 0", "=v"()			%reg = call float asm sideeffect "v_mov_b32_e64 $0, 0", "=v"()
	%cmp0 = fcmp ogt float %reg, 0.000000e+00			%cmp0 = fcmp ogt float %reg, 0.000000e+00
	br i1 %cmp0, label %loop, label %ret			br i1 %cmp0, label %loop, label %ret
	▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/ARM/struct-byval-frame-index.ll

	; RUN: llc < %s -mcpu=cortex-a15 -verify-machineinstrs -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mcpu=cortex-a15 -verify-machineinstrs -arm-atomic-cfg-tidy=0 | FileCheck %s

	; Check a spill right after a function call with large struct byval is correctly			; Check a spill right after a function call with large struct byval is correctly
	; generated.			; generated.
	; PR16393			; PR16393

	; We expect the spill to be generated in %if.end230 and the reloads in			; We expect 4-byte spill and reload to be generated.
	; %if.end249 and %for.body285.

	; CHECK: set_stored_macroblock_parameters			; CHECK: set_stored_macroblock_parameters
	; CHECK: @ %if.end230			; CHECK: str r{{.*}}, [sp, {{#[0-9]+}}] @ 4-byte Spill
	; CHECK-NOT:@ %if.			; CHECK: ldr r{{.*}}, [lr, {{#[0-9]+}}] @ 4-byte Reload
	; CHECK-NOT:@ %for.
	; CHECK: str r{{.*}}, [sp, [[SLOT:#[0-9]+]]] @ 4-byte Spill
	; CHECK: @ %if.end249
	; CHECK-NOT:@ %if.
	; CHECK-NOT:@ %for.
	; CHECK: ldr r{{.*}}, [sp, [[SLOT]]] @ 4-byte Reload
	; CHECK: @ %for.body285
	; CHECK-NOT:@ %if.
	; CHECK-NOT:@ %for.
	; CHECK: ldr r{{.*}}, [sp, [[SLOT]]] @ 4-byte Reload

	target triple = "armv7l-unknown-linux-gnueabihf"			target triple = "armv7l-unknown-linux-gnueabihf"

	%structA = type { double, [16 x [16 x i16]], [16 x [16 x i16]], [16 x [16 x i16]], i32**, i32, i32, i16, [4 x i32], [4 x i32], i8, [16 x i8], [16 x i8], i32, i64, i32, i16***, i16****, [2 x [4 x [4 x i8]]], i32, i32, i32, i32, i32, i32, i32, i32, i32 }			%structA = type { double, [16 x [16 x i16]], [16 x [16 x i16]], [16 x [16 x i16]], i32**, i32, i32, i16, [4 x i32], [4 x i32], i8, [16 x i8], [16 x i8], i32, i64, i32, i16***, i16****, [2 x [4 x [4 x i8]]], i32, i32, i32, i32, i32, i32, i32, i32, i32 }
	%structB = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, float, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i8, i8, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [9 x [16 x [16 x i16]]], [5 x [16 x [16 x i16]]], [9 x [8 x [8 x i16]]], [2 x [4 x [16 x [16 x i16]]]], [16 x [16 x i16]], [16 x [16 x i32]], i32, i32, i32, i32, i32*, i32*, %structC, %structD, %structK, i32, i32, i32, i32, i32, i32, [4 x [4 x i32]], i32, i32, i32, i32, i32, double, i32, i32, i32, i32, i16****, i16**, i16**, i16***, [15 x i16], i32, i32, i32, i32, i32, i32, i32, i32, [6 x [32 x i32]], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [1 x i32], i32, i32, [2 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, %structL, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, double, double, i32, double, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [2 x i32], i32, i32, i16, i32, i32, i32, i32, i32 }			%structB = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, float, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i8, i8, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [9 x [16 x [16 x i16]]], [5 x [16 x [16 x i16]]], [9 x [8 x [8 x i16]]], [2 x [4 x [16 x [16 x i16]]]], [16 x [16 x i16]], [16 x [16 x i32]], i32, i32, i32, i32, i32*, i32*, %structC, %structD, %structK, i32, i32, i32, i32, i32, i32, [4 x [4 x i32]], i32, i32, i32, i32, i32, double, i32, i32, i32, i32, i16****, i16**, i16**, i16***, [15 x i16], i32, i32, i32, i32, i32, i32, i32, i32, [6 x [32 x i32]], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [1 x i32], i32, i32, [2 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, %structL, i32, i32, i32, i32, i32, i32, i32, i32, 
i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, double, double, i32, double, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [2 x i32], i32, i32, i16, i32, i32, i32, i32, i32 }
	%structC = type { i32, i32, [100 x %structD*], i32, float, float, float }			%structC = type { i32, i32, [100 x %structD*], i32, float, float, float }
	%structD = type { i32, i32, i32, i32, i32, i32, %structE, %structH, %structJ, i32, i32, i32, i32, i32, i32, i32, i32, i32 (i32), [3 x [2 x i32]] }			%structD = type { i32, i32, i32, i32, i32, i32, %structE, %structH, %structJ, i32, i32, i32, i32, i32, i32, i32, i32, i32 (i32), [3 x [2 x i32]] }
	%structE = type { %structF*, %structG, %structG }			%structE = type { %structF*, %structG, %structG }
	▲ Show 20 Lines • Show All 199 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/Thumb2/v8_IT_5.ll

	; RUN: llc < %s -mtriple=thumbv8 -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mtriple=thumbv8 -arm-atomic-cfg-tidy=0 | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7 -arm-atomic-cfg-tidy=0 -arm-restrict-it | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7 -arm-atomic-cfg-tidy=0 -arm-restrict-it | FileCheck %s
	; CHECK: it ne			; CHECK: it ne
	; CHECK-NEXT: cmpne			; CHECK-NEXT: cmpne
	; CHECK-NEXT: bne [[JUMPTARGET:.LBB[0-9]+_[0-9]+]]			; CHECK-NEXT: bne [[JUMPTARGET:.LBB[0-9]+_[0-9]+]]
	; CHECK: cbz			; CHECK: cbz
	; CHECK-NEXT: %if.else163			; CHECK-NEXT: %if.else163
	; CHECK-NEXT: mov.w			; CHECK-NEXT: mov.w
	; CHECK-NEXT: b			; CHECK-NEXT: b
	; CHECK: [[JUMPTARGET]]:{{.*}}%if.else173			; CHECK: [[JUMPTARGET]]:{{.*}}%if.else173
	; CHECK-NEXT: mov.w			; CHECK-NEXT: mov.w
	; CHECK-NEXT: pop			; CHECK-NEXT: bx lr
	; CHECK-NEXT: %if.else145			; CHECK: %if.else145
	; CHECK-NEXT: mov.w			; CHECK-NEXT: mov.w
				; CHECK: pop.w

	%struct.hc = type { i32, i32, i32, i32 }			%struct.hc = type { i32, i32, i32, i32 }

	define i32 @t(i32 %type) optsize {			define i32 @t(i32 %type) optsize {
	entry:			entry:
	switch i32 %type, label %if.else173 [			switch i32 %type, label %if.else173 [
	i32 13, label %if.then115			i32 13, label %if.then115
	i32 6, label %if.then102			i32 6, label %if.then102
	Show All 24 Lines

llvm/trunk/test/Transforms/StructurizeCFG/branch-on-argument.ll

	; RUN: opt -S -o - -structurizecfg < %s | FileCheck %s			; RUN: opt -S -o - -structurizecfg < %s | FileCheck %s

	; CHECK-LABEL: @invert_branch_on_arg_inf_loop(			; CHECK-LABEL: @invert_branch_on_arg_inf_loop(
	; CHECK: entry:			; CHECK: entry:
	; CHECK: %arg.inv = xor i1 %arg, true			; CHECK: %arg.inv = xor i1 %arg, true
	; CHECK: phi i1 [ false, %Flow1 ], [ %arg.inv, %entry ]
	define void @invert_branch_on_arg_inf_loop(i32 addrspace(1)* %out, i1 %arg) {			define void @invert_branch_on_arg_inf_loop(i32 addrspace(1)* %out, i1 %arg) {
	entry:			entry:
	br i1 %arg, label %for.end, label %for.body			br i1 %arg, label %for.end, label %sesestart
				sesestart:
				br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	store i32 999, i32 addrspace(1)* %out, align 4			store i32 999, i32 addrspace(1)* %out, align 4
	br label %for.body			br i1 %arg, label %for.body, label %seseend
				seseend:
				ret void

	for.end: ; preds = %Flow			for.end: ; preds = %Flow
	ret void			ret void
	}			}


	; CHECK-LABEL: @invert_branch_on_arg_jump_into_loop(			; CHECK-LABEL: @invert_branch_on_arg_jump_into_loop(
	; CHECK: entry:			; CHECK: entry:
	Show All 26 Lines

llvm/trunk/test/Transforms/StructurizeCFG/no-branch-to-entry.ll

				; XFAIL: *

				; This test used to generate a region that caused it to delete the entry block,
				; but it does not anymore after the changes to handling of infinite loops in the
				; PostDominatorTree.
				; TODO: This should be either replaced with another IR or deleted completely.

	; RUN: opt -S -o - -structurizecfg -verify-dom-info < %s | FileCheck %s			; RUN: opt -S -o - -structurizecfg -verify-dom-info < %s | FileCheck %s

	; CHECK-LABEL: @no_branch_to_entry_undef(			; CHECK-LABEL: @no_branch_to_entry_undef(
	; CHECK: entry:			; CHECK: entry:
	; CHECK-NEXT: br label %entry.orig			; CHECK-NEXT: br label %entry.orig
	define void @no_branch_to_entry_undef(i32 addrspace(1)* %out) {			define void @no_branch_to_entry_undef(i32 addrspace(1)* %out) {
	entry:			entry:
	br i1 undef, label %for.end, label %for.body			br i1 undef, label %for.end, label %for.body
	Show All 23 Lines

llvm/trunk/unittests/IR/DominatorTreeTest.cpp

Show First 20 Lines • Show All 321 Lines • ▼ Show 20 Lines	runWithDomTree(
EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_b, BB1));		EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_b, BB1));

EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_a, BB2));		EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_a, BB2));
EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_b, BB2));		EXPECT_FALSE(DT->dominates(Edge_BB0_BB1_b, BB2));
});		});
}		}

// Verify that the PDT is correctly updated in case an edge removal results		// Verify that the PDT is correctly updated in case an edge removal results
// in a new unreachable CFG node.		// in a new unreachable CFG node. Also make sure that the updated PDT is the
		// same as a freshly recalculated one.
//		//
// For the following input code and initial PDT:		// For the following input code and initial PDT:
//		//
// CFG PDT		// CFG PDT
//		//
// A Exit		// A Exit
// | |		// | |
// _B D		// _B D
// / | \ |		// / | \ |
// ^ v \ B		// ^ v \ B
// \ / D / \		// \ / D / \
// C \ C A		// C \ C A
// v		// v
// Exit		// Exit
//		//
// we verify that CFG' and PDT-updated is obtained after removal of edge C -> B.		// we verify that CFG' and PDT-updated is obtained after removal of edge C -> B.
//		//
// CFG' PDT-updated		// CFG' PDT-updated
//		//
// A Exit		// A Exit
// | |		// | / | \
// B D		// B C B D
// | \ |		// | \ |
// v \ B		// v \ A
// / D \		// / D
// C \ A		// C \
// | v		// | \
// unreachable Exit		// unreachable Exit
//		//
// WARNING: PDT-updated is inconsistent with PDT-recalculated, which is		// Both the blocks that end with ret and with unreachable become trivial
// constructed from CFG' when recalculating the PDT from scratch.		// PostDomTree roots, as they have no successors.
//
// PDT-recalculated
//
// Exit
// / | \
// C B D
// |
// A
//		//
// TODO: document the wanted behavior after resolving this inconsistency.
TEST(DominatorTree, DeletingEdgesIntroducesUnreachables) {		TEST(DominatorTree, DeletingEdgesIntroducesUnreachables) {
StringRef ModuleString =		StringRef ModuleString =
"define void @f() {\n"		"define void @f() {\n"
"A:\n"		"A:\n"
" br label %B\n"		" br label %B\n"
"B:\n"		"B:\n"
" br i1 undef, label %D, label %C\n"		" br i1 undef, label %D, label %C\n"
"C:\n"		"C:\n"
Show All 10 Lines	runWithDomTree(
M, "f", [&](Function &F, DominatorTree *DT, PostDomTree *PDT) {		M, "f", [&](Function &F, DominatorTree *DT, PostDomTree *PDT) {
Function::iterator FI = F.begin();		Function::iterator FI = F.begin();

FI++;		FI++;
BasicBlock *B = &*FI++;		BasicBlock *B = &*FI++;
BasicBlock *C = &*FI++;		BasicBlock *C = &*FI++;
BasicBlock *D = &*FI++;		BasicBlock *D = &*FI++;

assert(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		ASSERT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
		EXPECT_TRUE(DT->verify());
		EXPECT_TRUE(PDT->verify());

C->getTerminator()->eraseFromParent();		C->getTerminator()->eraseFromParent();
new UnreachableInst(C->getContext(), C);		new UnreachableInst(C->getContext(), C);

DT->deleteEdge(C, B);		DT->deleteEdge(C, B);
PDT->deleteEdge(C, B);		PDT->deleteEdge(C, B);

EXPECT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		EXPECT_TRUE(DT->verify());
EXPECT_EQ(PDT->getNode(C), nullptr);		EXPECT_TRUE(PDT->verify());

PDT->recalculate(F);

EXPECT_FALSE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		EXPECT_FALSE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
EXPECT_NE(PDT->getNode(C), nullptr);		EXPECT_NE(PDT->getNode(C), nullptr);

		DominatorTree NDT(F);
		EXPECT_EQ(DT->compare(NDT), 0);

		PostDomTree NPDT(F);
		EXPECT_EQ(PDT->compare(NPDT), 0);
});		});
}		}

// Verify that the PDT is correctly updated in case an edge removal results		// Verify that the PDT is correctly updated in case an edge removal results
// in an infinite loop.		// in an infinite loop. Also make sure that the updated PDT is the
		// same as a freshly recalculated one.
//		//
// Test case:		// Test case:
//		//
// CFG PDT		// CFG PDT
//		//
// A Exit		// A Exit
// | |		// | |
// _B D		// _B D
// / | \ |		// / | \ |
// ^ v \ B		// ^ v \ B
// \ / D / \		// \ / D / \
// C \ C A		// C \ C A
// / \ v		// / \ v
// ^ v Exit		// ^ v Exit
// \_/		// \_/
//		//
// After deleting the edge C->B, C is part of an infinite reverse-unreachable		// After deleting the edge C->B, C is part of an infinite reverse-unreachable
// loop:		// loop:
//		//
// CFG' PDT'		// CFG' PDT'
//		//
// A Exit		// A Exit
// | |		// | / | \
// B D		// B C B D
// | \ |		// | \ |
// v \ B		// v \ A
// / D \		// / D
// C \ A		// C \
// / \ v		// / \ v
// ^ v Exit		// ^ v Exit
// \_/		// \_/
//		//
// In PDT, D post-dominates B. We verify that this post-dominance		// As C now becomes reverse-unreachable, it forms a new non-trivial root and
// relation is preserved _after_ deleting the edge C->B from CFG.		// gets connected to the virtual exit.
		// D does not postdominate B anymore, because there are two forward paths from
		// B to the virtual exit:
		// - B -> C -> VirtualExit
		// - B -> D -> VirtualExit.
//		//
// As C now becomes reverse-unreachable, it is not anymore part of the
// PDT. We also verify this property.
//
// TODO: Can we change the PDT definition such that C remains part of the
// CFG?
TEST(DominatorTree, DeletingEdgesIntroducesInfiniteLoop) {		TEST(DominatorTree, DeletingEdgesIntroducesInfiniteLoop) {
StringRef ModuleString =		StringRef ModuleString =
"define void @f() {\n"		"define void @f() {\n"
"A:\n"		"A:\n"
" br label %B\n"		" br label %B\n"
"B:\n"		"B:\n"
" br i1 undef, label %D, label %C\n"		" br i1 undef, label %D, label %C\n"
"C:\n"		"C:\n"
Show All 12 Lines	runWithDomTree(
M, "f", [&](Function &F, DominatorTree *DT, PostDomTree *PDT) {		M, "f", [&](Function &F, DominatorTree *DT, PostDomTree *PDT) {
Function::iterator FI = F.begin();		Function::iterator FI = F.begin();

FI++;		FI++;
BasicBlock *B = &*FI++;		BasicBlock *B = &*FI++;
BasicBlock *C = &*FI++;		BasicBlock *C = &*FI++;
BasicBlock *D = &*FI++;		BasicBlock *D = &*FI++;

assert(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		ASSERT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
		EXPECT_TRUE(DT->verify());
		EXPECT_TRUE(PDT->verify());

auto SwitchC = cast<SwitchInst>(C->getTerminator());		auto SwitchC = cast<SwitchInst>(C->getTerminator());
SwitchC->removeCase(SwitchC->case_begin());		SwitchC->removeCase(SwitchC->case_begin());
DT->deleteEdge(C, B);		DT->deleteEdge(C, B);
		EXPECT_TRUE(DT->verify());
PDT->deleteEdge(C, B);		PDT->deleteEdge(C, B);
		EXPECT_TRUE(PDT->verify());

EXPECT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		EXPECT_FALSE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
EXPECT_EQ(PDT->getNode(C), nullptr);		EXPECT_NE(PDT->getNode(C), nullptr);

PDT->recalculate(F);		DominatorTree NDT(F);
		EXPECT_EQ(DT->compare(NDT), 0);

EXPECT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		PostDomTree NPDT(F);
EXPECT_EQ(PDT->getNode(C), nullptr);		EXPECT_EQ(PDT->compare(NPDT), 0);
});		});
}		}

// Verify that the PDT is correctly updated in case an edge removal results		// Verify that the PDT is correctly updated in case an edge removal results
// in an infinite loop.		// in an infinite loop.
//		//
// Test case:		// Test case:
//		//
// CFG PDT		// CFG PDT
//		//
// A Exit		// A Exit
// | / | \		// | / | \
// B-- C B D		// B-- C2 B D
// | \ |		// | \ / |
// v \ A		// v \ C A
// / D		// / D
// C--C2 \		// C--C2 \
// / \ \ v		// / \ \ v
// ^ v --Exit		// ^ v --Exit
// \_/		// \_/
//		//
// After deleting the edge C->E, C is part of an infinite reverse-unreachable		// After deleting the edge C->E, C is part of an infinite reverse-unreachable
// loop:		// loop:
//		//
// CFG' PDT'		// CFG' PDT'
//		//
// A Exit		// A Exit
// | |		// | / | \
// B D		// B C B D
// | \ |		// | \ |
// v \ B		// v \ A
// / D \		// / D
// C \ A		// C \
// / \ v		// / \ v
// ^ v Exit		// ^ v Exit
// \_/		// \_/
//		//
// In PDT, D does not post-dominate B. After the edge C->E is removed, a new		// In PDT, D does not post-dominate B. After the edge C -> C2 is removed,
// post-dominance relation is introduced.		// C becomes a new nontrivial PDT root.
//		//
// As C now becomes reverse-unreachable, it is not anymore part of the
// PDT. We also verify this property.
//
// TODO: Can we change the PDT definition such that C remains part of the
// CFG, at best without loosing the dominance relation D postdom B.
TEST(DominatorTree, DeletingEdgesIntroducesInfiniteLoop2) {		TEST(DominatorTree, DeletingEdgesIntroducesInfiniteLoop2) {
StringRef ModuleString =		StringRef ModuleString =
"define void @f() {\n"		"define void @f() {\n"
"A:\n"		"A:\n"
" br label %B\n"		" br label %B\n"
"B:\n"		"B:\n"
" br i1 undef, label %D, label %C\n"		" br i1 undef, label %D, label %C\n"
"C:\n"		"C:\n"
Show All 15 Lines	runWithDomTree(
Function::iterator FI = F.begin();		Function::iterator FI = F.begin();

FI++;		FI++;
BasicBlock *B = &*FI++;		BasicBlock *B = &*FI++;
BasicBlock *C = &*FI++;		BasicBlock *C = &*FI++;
BasicBlock *C2 = &*FI++;		BasicBlock *C2 = &*FI++;
BasicBlock *D = &*FI++;		BasicBlock *D = &*FI++;

		EXPECT_TRUE(DT->verify());
		EXPECT_TRUE(PDT->verify());

auto SwitchC = cast<SwitchInst>(C->getTerminator());		auto SwitchC = cast<SwitchInst>(C->getTerminator());
SwitchC->removeCase(SwitchC->case_begin());		SwitchC->removeCase(SwitchC->case_begin());
DT->deleteEdge(C, C2);		DT->deleteEdge(C, C2);
PDT->deleteEdge(C, C2);		PDT->deleteEdge(C, C2);
C2->eraseFromParent();		C2->removeFromParent();

EXPECT_EQ(DT->getNode(C2), nullptr);		EXPECT_EQ(DT->getNode(C2), nullptr);
PDT->eraseNode(C2);		PDT->eraseNode(C2);
		delete C2;

		EXPECT_TRUE(DT->verify());
		EXPECT_TRUE(PDT->verify());

		EXPECT_FALSE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
		EXPECT_NE(PDT->getNode(C), nullptr);

EXPECT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));		DominatorTree NDT(F);
EXPECT_EQ(PDT->getNode(C), nullptr);		EXPECT_EQ(DT->compare(NDT), 0);
EXPECT_EQ(PDT->getNode(C2), nullptr);
		PostDomTree NPDT(F);
PDT->recalculate(F);		EXPECT_EQ(PDT->compare(NPDT), 0);

EXPECT_TRUE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)));
EXPECT_EQ(PDT->getNode(C), nullptr);
EXPECT_EQ(PDT->getNode(C2), nullptr);
});		});
}		}

namespace {		namespace {
const auto Insert = CFGBuilder::ActionKind::Insert;		const auto Insert = CFGBuilder::ActionKind::Insert;
const auto Delete = CFGBuilder::ActionKind::Delete;		const auto Delete = CFGBuilder::ActionKind::Delete;

bool CompUpdates(const CFGBuilder::Update &A, const CFGBuilder::Update &B) {		bool CompUpdates(const CFGBuilder::Update &A, const CFGBuilder::Update &B) {
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	while ((LastUpdate = B.applyUpdate())) {
BasicBlock *To = B.getOrAddBlock(LastUpdate->Edge.To);		BasicBlock *To = B.getOrAddBlock(LastUpdate->Edge.To);
DT.insertEdge(From, To);		DT.insertEdge(From, To);
EXPECT_TRUE(DT.verify());		EXPECT_TRUE(DT.verify());
PDT.insertEdge(From, To);		PDT.insertEdge(From, To);
EXPECT_TRUE(PDT.verify());		EXPECT_TRUE(PDT.verify());
}		}
}		}

		TEST(DominatorTree, InsertFromUnreachable) {
		CFGHolder Holder;
		std::vector<CFGBuilder::Arc> Arcs = {{"1", "2"}, {"2", "3"}, {"3", "4"}};

		std::vector<CFGBuilder::Update> Updates = {{Insert, {"3", "5"}}};
		CFGBuilder B(Holder.F, Arcs, Updates);
		PostDomTree PDT(*Holder.F);
		EXPECT_TRUE(PDT.verify());

		Optional<CFGBuilder::Update> LastUpdate = B.applyUpdate();
		EXPECT_TRUE(LastUpdate);

		EXPECT_EQ(LastUpdate->Action, Insert);
		BasicBlock *From = B.getOrAddBlock(LastUpdate->Edge.From);
		BasicBlock *To = B.getOrAddBlock(LastUpdate->Edge.To);
		PDT.insertEdge(From, To);
		EXPECT_TRUE(PDT.verify());
		EXPECT_TRUE(PDT.getRoots().size() == 2);
		EXPECT_NE(PDT.getNode(B.getOrAddBlock("5")), nullptr);
		}

TEST(DominatorTree, InsertMixed) {		TEST(DominatorTree, InsertMixed) {
CFGHolder Holder;		CFGHolder Holder;
std::vector<CFGBuilder::Arc> Arcs = {		std::vector<CFGBuilder::Arc> Arcs = {
{"1", "2"}, {"2", "3"}, {"3", "4"}, {"5", "6"}, {"5", "7"},		{"1", "2"}, {"2", "3"}, {"3", "4"}, {"5", "6"}, {"5", "7"},
{"8", "9"}, {"9", "10"}, {"8", "11"}, {"11", "12"}, {"7", "3"}};		{"8", "9"}, {"9", "10"}, {"8", "11"}, {"11", "12"}, {"7", "3"}};

std::vector<CFGBuilder::Update> Updates = {		std::vector<CFGBuilder::Update> Updates = {
{Insert, {"4", "5"}}, {Insert, {"2", "5"}}, {Insert, {"10", "9"}},		{Insert, {"4", "5"}}, {Insert, {"2", "5"}}, {Insert, {"10", "9"}},
▲ Show 20 Lines • Show All 206 Lines • Show Last 20 Lines
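
The `InsertFromUnreachable` test in the diff above expects `PDT.getRoots().size() == 2` after the edge 3 -> 5 is inserted into the chain 1 -> 2 -> 3 -> 4. A minimal standalone sketch of why that count is plausible follows. It only considers trivial roots (blocks with no successors), as described in the test comments. The names `Arc` and `trivialRoots` are hypothetical and are not part of `CFGBuilder` or `PostDomTree`.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an edge list; not the CFGBuilder::Arc type.
using Arc = std::pair<std::string, std::string>;

// Returns every block that appears in Arcs but has no outgoing arc.
// Such blocks have no successors, so they are trivial post-dominator
// tree roots. Non-trivial roots (infinite loops) are NOT handled here.
std::set<std::string> trivialRoots(const std::vector<Arc> &Arcs) {
  std::set<std::string> Blocks, HasSucc;
  for (const Arc &A : Arcs) {
    Blocks.insert(A.first);
    Blocks.insert(A.second);
    HasSucc.insert(A.first);
  }
  std::set<std::string> Roots;
  for (const std::string &B : Blocks)
    if (!HasSucc.count(B))
      Roots.insert(B);
  return Roots;
}
```

With the arcs `{"1","2"}, {"2","3"}, {"3","4"}` the sketch reports block 4 as the only root. Adding `{"3","5"}` yields blocks 4 and 5, which matches the two roots the test expects.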

Event Timeline

1) Does connecting edges to the virtual exit break the parent property?

2) Can removing an edge from a dominator tree weaken the dominance relation?

Today + Literature (leave C unreachable)

Proposed Patch + GCC (Connecting C to virtual exit)
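
The effect of connecting C to the virtual exit can be checked on the four-block CFG from the `DeletingEdgesIntroducesInfiniteLoop` test. The sketch below is an illustration only: it uses a plain iterative set-based post-dominator computation, not the algorithm in `GenericDomTreeConstruction.h`, and it picks the lowest-numbered reverse-unreachable block as the new root, which is a simplifying assumption rather than the patch's root selection. `Graph`, `canReach` and `postDomSets` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Succ[B] lists the successors of block B. Blocks are numbered 0..N-1.
using Graph = std::vector<std::vector<std::size_t>>;

// Marks every node that can reach Exit by following edges forward.
static std::vector<bool> canReach(const Graph &Succ, std::size_t Exit) {
  std::vector<bool> Reach(Succ.size(), false);
  Reach[Exit] = true;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < Succ.size(); ++B) {
      if (Reach[B])
        continue;
      for (std::size_t S : Succ[B])
        if (Reach[S]) {
          Reach[B] = true;
          Changed = true;
          break;
        }
    }
  }
  return Reach;
}

// Returns PDom, where PDom[B][A] is true iff A post-dominates B.
// Index N (the original Succ.size()) stands for the virtual exit.
std::vector<std::vector<bool>> postDomSets(Graph Succ) {
  const std::size_t N = Succ.size();
  const std::size_t Exit = N;
  Succ.emplace_back(); // the virtual exit has no successors

  // Trivial roots: blocks without successors go to the virtual exit.
  for (std::size_t B = 0; B < N; ++B)
    if (Succ[B].empty())
      Succ[B].push_back(Exit);

  // Non-trivial roots: while some block cannot reach the virtual exit,
  // connect the lowest-numbered such block (assumption of this sketch).
  for (;;) {
    std::vector<bool> Reach = canReach(Succ, Exit);
    std::size_t B = 0;
    while (B < N && Reach[B])
      ++B;
    if (B == N)
      break;
    Succ[B].push_back(Exit);
  }

  // Iterate PDom[B] = {B} + intersection of PDom[S] over successors S.
  std::vector<std::vector<bool>> PDom(N + 1, std::vector<bool>(N + 1, true));
  PDom[Exit].assign(N + 1, false);
  PDom[Exit][Exit] = true;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      std::vector<bool> New(N + 1, true);
      for (std::size_t S : Succ[B])
        for (std::size_t A = 0; A <= N; ++A)
          New[A] = New[A] && PDom[S][A];
      New[B] = true;
      if (New != PDom[B]) {
        PDom[B] = New;
        Changed = true;
      }
    }
  }
  return PDom;
}
```

Numbering A, B, C, D as 0, 1, 2, 3, the CFG before the deletion is `{{1}, {3, 2}, {2, 1}, {}}` and D post-dominates B (`PDom[1][3]` is true). After removing C -> B the graph is `{{1}, {3, 2}, {2}, {}}`: C gets an edge to the virtual exit, so B can reach the exit through either C or D, and `PDom[1][3]` becomes false. This agrees with the updated expectation `EXPECT_FALSE(PDT->dominates(PDT->getNode(D), PDT->getNode(B)))` in the test.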

Revision Contents

Diff 111214
