This is an archive of the discontinued LLVM Phabricator instance.

This needs more tests. There are a bunch of decisions being taken during the reordering, but the test peresented simply exercises the case of one nested loop inside another. For example, what about two inner loops L1 and L2, where L1 has an exit that reaches the entry of L2? There might be other interesting cases too.

We actually introduced a different pass recently, called "UnifyLoopExits" to circumvent the same problem in the structurizer. You can see it optionally invoked in the AMDGPU backend. The idea is to transform each loop to have a single exit block, which makes it look like a separate region; since the structurizer is a region pass, it no longer sees a loop nest and thus cannot get confused about the nesting.

The current patch can provide a different, more precise way to solve the same problem. But please do take a look at the tests that use UnifyLoopExits. They are under test/Transforms/UnifyLoopExits and test/Transforms/StructurizeCFG/workarounds.

sameerds added inline comments.Apr 29 2020, 10:26 AM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
151	Why return a pointer here? As far as I could see, the return value is never checked for nullptr in the places where this function is called.
392–408	Should this comment be removed now? The problem with the backedges of the outer loops is addressed in this change, right?
402–404	It's very hard to read these two nested loops, especially with respect to their exit conditions. If I understand correctly, the inner loop at line 390 keeps executing until the correct number of blocks is found for a given loop. Can this condition be placed in the while () itself?
421	It seems the only purpose of this object is to invoke the "run()" method. Could this be just split out into a bunch of functions instead? All the class members can be declared right here, and the member functions of the current class can be rewritten to take necessary structures as arguments.
llvm/test/Transforms/StructurizeCFG/pr41703.ll
4 ↗	(On Diff #260753)	Can you please add a comment indicating which blocks end up in the wrong order? This will be useful when revisiting this testcase at some future time.

ekatz marked 5 inline comments as done.Apr 30 2020, 4:02 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
151	Because I need to update the value later. In any case, I will remove this function, as it can be simplified.
392–408	It is still true, it should be treated as an intro for the following algorithm (explaining why we don't stop here). I'll add a sentence to make it clearer.
402–404	I'll rearrange those loops.
421	I can reduce it to a single utility function, and then it will make even less sense to keep it. Then I'll remove it.
llvm/test/Transforms/StructurizeCFG/pr41703.ll
4 ↗	(On Diff #260753)	sure

Done requested changes.

foad added a subscriber: foad.Apr 30 2020, 9:24 AM

It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?

Also, the following tests were created specifically to demonstrate the wrong ordering:

test/Transforms/StructurizeCFG/workarounds/needs-unified-loop-exits.ll
test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll

If the proposed change is correct, then the example in these tests will be structurized correctly even after removing the "-unify-loop-exits" argument. These tests should also be checked and then updated accordingly.

In fact, it would be nice to update the following tests. But they have a very confusing history and they are specific to AMDGPU. I can volunteer to do this later as a follow up.

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll
test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll

In D79037#2016160, @sameerds wrote:

It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?

I am not sure what do you mean. In this page in the Phabricator, under [Revision Contents]->[History] you can choose which of the patch revisions to diff. The default is "Base" against the latest "Diff no.".

Also, the following tests were created specifically to demonstrate the wrong ordering:

test/Transforms/StructurizeCFG/workarounds/needs-unified-loop-exits.ll

test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll

Great! Didn't know about those. I'll update them, and if everything is OK, I'll move them out of the "workarounds" directory.

If the proposed change is correct, then the example in these tests will be structurized correctly even after removing the "-unify-loop-exits" argument. These tests should also be checked and then updated accordingly.

In fact, it would be nice to update the following tests. But they have a very confusing history and they are specific to AMDGPU. I can volunteer to do this later as a follow up.

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll

I'll try to take a look at those as well. These are "XFAIL" right now. If I'm understanding correctly, You assume they should pass with this patch?

In D79037#2016224, @ekatz wrote:

In D79037#2016160, @sameerds wrote:

It seems the new commit is an incremental update on the previous commit. Can you please submit a single squashed commit, so that the change can be compared against the original state of the trunk?

I am not sure what do you mean. In this page in the Phabricator, under [Revision Contents]->[History] you can choose which of the patch revisions to diff. The default is "Base" against the latest "Diff no.".

Sorry about that! I can see the diff now ...

Also, the following tests were created specifically to demonstrate the wrong ordering:

test/Transforms/StructurizeCFG/workarounds/needs-unified-loop-exits.ll

test/Transforms/StructurizeCFG/workarounds/needs-fr-ule.ll

Great! Didn't know about those. I'll update them, and if everything is OK, I'll move them out of the "workarounds" directory.

Note that needs-fr-ule.ll is quite a painful CFG from a real world testcase. It needs FixIrreducible pass, which produces extra loops that have the same ordering problem being fixed here. I'll try to run your patch myself and point out which blocks show up in the wrong order.

If the proposed change is correct, then the example in these tests will be structurized correctly even after removing the "-unify-loop-exits" argument. These tests should also be checked and then updated accordingly.

In fact, it would be nice to update the following tests. But they have a very confusing history and they are specific to AMDGPU. I can volunteer to do this later as a follow up.

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll

I'll try to take a look at those as well. These are "XFAIL" right now. If I'm understanding correctly, You assume they should pass with this patch?

That would be great! But like I said, the file history is confusing, and the last time that I looked, I was unable to decide which blocks are in the wrong order.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
327	Wouldn't this be more readable as a simple iteration with a temporary vector? The temporary vector is no more expensive than the stack created for the recursive calls. Essentially: while (L && L != Ancestor) { push back L to temporary vector; L = parent; } if (!L) return false; // Ancestor is not an ancestor of L pop temporary vector to WorkList return true;
367	This pointer is never checked for nullptr before it is used. So either assert that it is not null, or use a reference instead. Also what does the previous line mean? What is the size of a nullptr loop?
378	Isn't this the same as `while (!POT.empty())`? That would be more readable, and also convey the fact that a node from POT is definitely consumed on each iteration.
393	Why not just have `while (CurrentLoopSize)` here? At least from what I could see, the value should be non-zero when this point is reached from above, right?

ekatz marked 4 inline comments as done.May 3 2020, 12:57 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
327	I found a nice iterative (non-recursive) way to write this function. I'll change to that.
367	I'll add some comments, and an assert.
378	`POT` is not always get emptied. If there are no skipped nodes there is no need to erase anything, and we do not enter the following `if` statement.
393	The problem is that `CurrentLoopSize` may be null in case all of the nodes have a parent loop (none outside from every loop). In that case, `CurrentLoopSize` will only be valid from line 408; and remain that way all the way until the end.

Done requested changes.

sameerds added inline comments.May 3 2020, 7:43 AM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
393	I am not sure I follow. Can you please rephrase? An example might be useful.
399	This line needs at least two tests: one where CurrentLoop is not an ancestor of L, and another where CurrentLoop is an ancestor of L, but not its immediate parent.

ekatz marked an inline comment as done.May 3 2020, 8:09 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
393	The simplest example is a region without loops. Then the `CurrentLoop` (and in turn, `CurrentLoopSize`) is `nullptr`.

sameerds added inline comments.May 3 2020, 9:15 PM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
399	The second case is particularly important, where CurrentLoop is not an immediate parent of L. This situation happens when some node N in CurrentLoop has a successor S which is the header of L. If S was not the header, then L would become an irreducible region and hence not represented as a loop in LLVM. But that contradicts the assumption that CurrentLoop is not the parent of L. If there was a parent P of L (where P is itself a child of CurrentLoop), then P's header does not dominate S, and hence it is also an irreducible region. Again, any such P would not be represented in LLVM. I am not sure if I missed anything here ... if I am right, then the function pushPathToAncestor() would be a whole lot simpler.

ekatz marked an inline comment as done.May 4 2020, 6:13 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
399	That is a great observation! Thanks!

Removed pushPathToAncestor.

The implementation looks okay to me. The new comments help a lot. Waiting for tests mentioned in the comments so far ... one new test (sibling loops where the DFS jumps from one to the other before finishing the first loop) and updates to existing tests.

I have bad news. This fix does not help with the test needs-fr-ule.ll, i.e., the resulting CFG is incorret if I remove the unify-loop-exits pass. The fix-irreducible pass produces an inner loop with the header "irr.guard" and a latch "while.cond47". This is inside an outer loop with header "while.cond" and many backedges. When I run the structurizer with the current fix on this graph, the backedge of the inner loop is lost. Instead we see a spurious loop between two new blocks Flow5 and Flow4, which is governed by incorrect boolean constants when in fact they should have been %Pred variables. That information is lost along the edge Flow8 to Flow5.

Now I am kicking myself for not remembering this earlier. I had encountered the same problem, which is why we introduced the unify-loop-exits pass. Things are not so easy with nested loops. It's not enough to process all inner blocks before outer blocks. The ordering of the outer blocks relative to the inner loop also matters, and there does not seem to be a practical way to preserve it.

The problem with those tests is something else, and not produced by the wrong ordering. You may refer to PR25378 for an example.
Still, this fix the ordering issue and fixes PR41509 (and its duplicates).
I think we should insert this patch, and attend to the other bug as a separate issue.

In D79037#2022612, @ekatz wrote:

The problem with those tests is something else, and not produced by the wrong ordering. You may refer to PR25378 for an example.
Still, this fix the ordering issue and fixes PR41509 (and its duplicates).
I think we should insert this patch, and attend to the other bug as a separate issue.

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

In D79037#2022768, @sameerds wrote:

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

In D79037#2022995, @ekatz wrote:

In D79037#2022768, @sameerds wrote:

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

I agree. That means that if L exists, then CurrentLoop must be its parent, right? So instead of checking for this condition, the code will have to assert it.

In D79037#2024482, @sameerds wrote:

In D79037#2022995, @ekatz wrote:

In D79037#2022768, @sameerds wrote:

I see. It sure does look like PR25378. Then the only thing left is one new test with two sibling loops. The intention is to cover the case where CurrentLoop is not the parent of L.

I don't think it is possible, although I might be wrong.
Because the way I see it, for loop A be interfered with another loop B, while POT, there must be at least one block that belongs to loop A that is dependent on at least one block in loop B. But then loop B must be finished before loop A, and actually inside it. Otherwise, it won't be recognized as a separate loop.

I hope I explained it clear enough. Otherwise, can you please give an example?

I agree. That means that if L exists, then CurrentLoop must be its parent, right? So instead of checking for this condition, the code will have to assert it.

Actually, I take that back. Consider the following CFG:

A -> H1, H2
H1->B, C
B->H1
C->B, H2
H2->E
E->H2, F

A is the entry and F is the exit. The loops (H1, B, C) and (H2, E) are siblings with backedges B->H1 and E->H2 respectively. C is an exiting block in the first loop that branches to H2. One POT would be "B, F, E, H2, C, H1, A". When iterating in reverse, the loop with H1 is started first, but H2 from the second loop is encountered before the first loop is completed.

In D79037#2024574, @sameerds wrote:

Actually, I take that back. Consider the following CFG:

A -> H1, H2
H1->B, C
B->H1
C->B, H2
H2->E
E->H2, F

A is the entry and F is the exit. The loops (H1, B, C) and (H2, E) are siblings with backedges B->H1 and E->H2 respectively. C is an exiting block in the first loop that branches to H2. One POT would be "B, F, E, H2, C, H1, A". When iterating in reverse, the loop with H1 is started first, but H2 from the second loop is encountered before the first loop is completed.

In this case, those loops are in different regions, and as this is a region pass, they won't appear in the same POT.

In D79037#2024745, @ekatz wrote:

In D79037#2024574, @sameerds wrote:

Actually, I take that back. Consider the following CFG:

A -> H1, H2
H1->B, C
B->H1
C->B, H2
H2->E
E->H2, F

A is the entry and F is the exit. The loops (H1, B, C) and (H2, E) are siblings with backedges B->H1 and E->H2 respectively. C is an exiting block in the first loop that branches to H2. One POT would be "B, F, E, H2, C, H1, A". When iterating in reverse, the loop with H1 is started first, but H2 from the second loop is encountered before the first loop is completed.

In this case, those loops are in different regions, and as this is a region pass, they won't appear in the same POT.

You need to look at the larger pattern than just the specific example. It should be possible to add exits in both loops that force them into the same region. For example, H2->F and B->F.

Added test for interleaved sibling loops, and change the test file name to "interleaved-loop-order.ll".

LGTM. It would be nice if you can move the following tests to the new file and demonstrate that that they are fixed. But that need not hold up this review any longer.

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll
test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll

[EDIT] Actually added the test names.

This revision is now accepted and ready to land.May 10 2020, 7:30 PM

In D79037#2028811, @sameerds wrote:

LGTM. It would be nice if you can move the following tests to the new file and demonstrate that that they are fixed. But that need not hold up this review any longer.

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug.ll

test/Transforms/StructurizeCFG/AMDGPU/backedge-id-bug-xfail.ll

[EDIT] Actually added the test names.

The CFG produced by "backedge-id-bug.ll" seems to be OK, just a little different than expected due to a different ordering scheme used when it was produced. This needs to be double checked, but as I see it, the expected just needs to be updated.
In "backedge-id-bug-xfail.ll" the "verify-region-info" fails, which is a separate bug.

In any case, those files produce the same CFG with and without this patch.
So, as you have already stated, they are out of the scope of this patch.

Closed by commit rG897d8ee5cd69: [StructurizeCFG] Fix region nodes ordering (authored by ekatz). · Explain WhyMay 13 2020, 5:54 AM

This revision was automatically updated to reflect the committed changes.

Hi, This commit makes some compilations go into an infinite loop. I will try to prepare a reproducer.

In D79037#2035980, @piotr wrote:

Hi, This commit makes some compilations go into an infinite loop. I will try to prepare a reproducer.

Thanks.
Waiting for a test case...

inf.ll458 KBDownload

- this can possibly be simplified even more, but it takes ages to cut down.

In D79037#2036331, @piotr wrote:

this can possibly be simplified even more, but it takes ages to cut down.

Here's a reduced case:

reduced.ll2 KBDownload

Run it with "opt -structurizecfg reduced.ll".

Thanks both of you.
I'll revert and update the patch for it.

This revision is now accepted and ready to land.May 14 2020, 7:48 AM

Fixed the bug causing an infinite loop in the new testcase.

ekatz requested review of this revision.May 14 2020, 9:53 AM

ekatz added reviewers: piotr, foad.

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why. A few LLVM_DEBUG lines would be very helpful to trace what the structurizer is doing. Also, can the new test case be reduced further? There are a couple of simple chains, and a few branches. Are all of them necessary? The main problem seems to be an inner loop with just two blocks where both blocks exit. I don't know whether both these properties matter. For example, will the problem happen with a single block that loops on itself?

Simplified the new test case.

In D79037#2040926, @sameerds wrote:

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why.

We need to move up the check for *CurrentLoopSize, as in the case it is zero, then we no longer need to skip nodes, and just continue to the next node/loop.
The simplified, new test case, demonstrate a case where it is needed.

In D79037#2049573, @ekatz wrote:

In D79037#2040926, @sameerds wrote:

TBH, I spent a small amount of time visualizing the new testcase, but couldn't figure out the problem, nor the solution. I see that the check for currentLoopSize is moved up the control flow a bit, but I didn't see why.

We need to move up the check for *CurrentLoopSize, as in the case it is zero, then we no longer need to skip nodes, and just continue to the next node/loop.
The simplified, new test case, demonstrate a case where it is needed.

Actually, I spent some more time looking at this today. In the simplified testcase, the region being processed is { D, E, C } listed in post-order. Each one of these is a child of some loop, although the loop for D is not in the current region. As a result, when the outermost loop starts, CurrentLoop is nullptr but CurrentLoopSize is zero because there is no node in the null loop. Checking for CurrentLoopSize is not a very satisfying explanation for the fix, since the loop for D actually has an entry in LoopSizes.

In other words, the pre-condition is that the sum of LoopSizes is the number of nodes in the region. Then the post-condition should be that the same sum should be zero. An assertion for this would fail in this testcase.

This is a second take of the solution, using SCCs instead of LoopInfo.
This algorithm can also handle irreducible loops, and, generally, any kind of cycle in the graph.

In D79037#2054462, @ekatz wrote:

This is a second take of the solution, using SCCs instead of LoopInfo.
This algorithm can also handle irreducible loops, and, generally, any kind of cycle in the graph.

Thanks! This looks really nice now! The sad part is that the structurizer itself is undefined in the presence of irreducible regions, so we will never actually see this in action around irreducible regions.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
326	I might be missing something here. OB marks the beginning of the range that originally contained the nodes that are now listed in Nodes. We are now over-writing this range from the beginning with members of each SCC found by the scc_iterator. Where do the rest of the members in Nodes go? Do they have to be rewritten in the remaining range [OB, OE)?
370	Do we really need this for loop? If I understand correctly, orderSCC() works on a subgraph that is guranteed to not be an SCC. Can the same function just take whole region as the starting point?

ekatz marked 2 inline comments as done.May 26 2020, 1:14 PM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
326	The nodes in `Nodes` are already connected in the subgraph starting from the `Entry` node. They are all reachable, as they were in the original SCC that contained `Entry` and `Nodes`. We rewrite the SCC into [OB, OE), which should be the same size as the number of nodes in all the sub-SCCs found in the current `for` loop.
370	`orderSCC` does something pretty close, but works on `SubGraphNode`s, while this function works on `RegionNode`s. There is no need to create a subgraph if there is no SCC larger than 2, to work on.

sameerds added inline comments.May 27 2020, 10:14 PM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
370	This seems like a good use-case for the WrappedSuccIterator class introduced in LoopIterator.h ... that version limits the graph to the body of a loop, but you could build a similar iterator_adapter that limits the graph to a given set of nodes. With such a succ iterator, you could just eliminate the SubGraphNode class itself and let orderSCC do all the work on the entire input region. That will unify the two for-loops which are actually identical in their intention.

ekatz marked an inline comment as done.May 28 2020, 3:41 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
370	I thought about that, but the problem is with the `scc_iterator` - It created new `succ_iterator` for each child node, using the `GraphTraits` given. This means that the NodeRef, on which `succ_iterator` is created upon, needs to have a reference to the set of valid Nodes (as actual successors). In any case, we need to create new Nodes. In addition, please take a look at the implementation of the `RNSuccIterator`, in which you will find that iterating upon it is a pretty heavy operation, including a map search for every node in `getISucc` (this calls to `getNode(BB)` which in turn searches an internal map). Considering that we create new nodes, and might need to do all over again for each sub SCC, it is better to just rebuild the SCC (with the `SubGraphNode`s), instead of a simple wrapper (that will requery the map for the appropriate `RegionNode` of a `BasicBlock` each time.

sameerds added inline comments.May 29 2020, 10:47 PM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
370	I didn't follow the part about creating new nodes. The LoopIterator defines the NodeRef as a tuple <BasicBlock, Loop>, so that it carries the Loop around for use in succ_iterator. What I had in mind was a similar tuple <RegionNode, ValidNodes> where the second part is a collection of nodes that constrain the succ_iterator. I would choose a more compact implementation here instead of looking into the cost of RNSuccIterator. Avoiding the map does not look like sufficient reason to break away from the idiomatic use of GraphTraits and scc_iterator. I know this looks expensive, but how expensive? Something like O(N * scc_depth)? It is certainly dwarfed by the heavy lifting that the rest of the structurizer is doing. To clarify, I am not insisting that a new iterator is ideal, but it would certainly be my choice unless there is peformance data asking for improvement.

ekatz marked an inline comment as done.May 29 2020, 11:22 PM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
370	I didn't like the concept of carrying a reference to the "set" on each node, which will increase the memory size required by the SCC iterator by a factor of 2, and also require requery of possibly many nodes (in the "set"). However, I agree that the performance impact overall is probably negligible, and we can always get back to a more optimized implementation if needed. But for now, the simpler approach is probably the correct one. I will upload a version without building a subgraph, as suggested.

Update algorithm to use a custom GraphTraits and a NodeRef that holds the set of nodes for the subgraph.
This simplifies the algorithm quite a bit.

This looks really good. Thanks a lot for bringing it this far! Please do check the #include for LoopInfo.h before submitting.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
17	I think this can be removed too. At least a quick grep did not show any use of Loop or LoopInfo other than the code that gets eleminated with this patch.

This revision is now accepted and ready to land.May 31 2020, 9:35 PM

ekatz marked an inline comment as done.Jun 1 2020, 12:44 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
17	You are right. Will do.

foad added inline comments.Jun 1 2020, 1:45 AM

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
344–346	It's normally called a "topological" sort.

Closed by commit rG85c308804966: [StructurizeCFG] Fix region nodes ordering (authored by ekatz). · Explain WhyJun 1 2020, 3:09 AM

This revision was automatically updated to reflect the committed changes.

ekatz marked an inline comment as done.Jun 1 2020, 3:17 AM

ekatz added inline comments.

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
344–346	Will fix the comment. Thanks.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Scalar/

StructurizeCFG.cpp

120 lines

test/

Transforms/

StructurizeCFG/

nested-loop-order-bug.ll

160 lines

Diff 261184

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp

//===- StructurizeCFG.cpp -------------------------------------------------===//		//===- StructurizeCFG.cpp -------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
		sameerdsUnsubmitted Not Done Reply Inline Actions I think this can be removed too. At least a quick grep did not show any use of Loop or LoopInfo other than the code that gets eleminated with this patch. sameerds: I think this can be removed too. At least a quick grep did not show any use of Loop or LoopInfo…
		ekatzAuthorUnsubmitted Done Reply Inline Actions You are right. Will do. ekatz: You are right. Will do.
#include "llvm/Analysis/RegionInfo.h"		#include "llvm/Analysis/RegionInfo.h"
#include "llvm/Analysis/RegionIterator.h"		#include "llvm/Analysis/RegionIterator.h"
#include "llvm/Analysis/RegionPass.h"		#include "llvm/Analysis/RegionPass.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines
/// this:		/// this:
///		///
/// \verbatim		/// \verbatim
/// 1		/// 1
/// \|\|		/// \|\|
/// \| \|		/// \| \|
/// 2 \|		/// 2 \|
/// \| /		/// \| /
/// \|/		/// \|/
		sameerdsUnsubmitted Not Done Reply Inline Actions Why return a pointer here? As far as I could see, the return value is never checked for nullptr in the places where this function is called. sameerds: Why return a pointer here? As far as I could see, the return value is never checked for nullptr…
		ekatzAuthorUnsubmitted Done Reply Inline Actions Because I need to update the value later. In any case, I will remove this function, as it can be simplified. ekatz: Because I need to update the value later. In any case, I will remove this function, as it can…
/// 3		/// 3
/// \|\| Where:		/// \|\| Where:
/// \| \| 1 = "If" block, calculates the condition		/// \| \| 1 = "If" block, calculates the condition
/// 4 \| 2 = "Then" subregion, runs if the condition is true		/// 4 \| 2 = "Then" subregion, runs if the condition is true
/// \| / 3 = "Flow" blocks, newly inserted flow blocks, rejoins the flow		/// \| / 3 = "Flow" blocks, newly inserted flow blocks, rejoins the flow
/// \|/ 4 = "Else" optional subregion, runs if the condition is false		/// \|/ 4 = "Else" optional subregion, runs if the condition is false
/// 5 5 = "End" block, also rejoins the control flow		/// 5 5 = "End" block, also rejoins the control flow
/// \endverbatim		/// \endverbatim
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	class StructurizeCFG : public RegionPass {
PredMap LoopPreds;		PredMap LoopPreds;
BranchVector LoopConds;		BranchVector LoopConds;

RegionNode *PrevNode;		RegionNode *PrevNode;

void orderNodes();		void orderNodes();

Loop getAdjustedLoop(RegionNode RN);		Loop getAdjustedLoop(RegionNode RN);
unsigned getAdjustedLoopDepth(RegionNode *RN);

void analyzeLoops(RegionNode *N);		void analyzeLoops(RegionNode *N);

Value buildCondition(BranchInst Term, unsigned Idx, bool Invert);		Value buildCondition(BranchInst Term, unsigned Idx, bool Invert);

void gatherPredicates(RegionNode *N);		void gatherPredicates(RegionNode *N);

void collectInfos();		void collectInfos();
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	Loop StructurizeCFG::getAdjustedLoop(RegionNode RN) {
if (RN->isSubRegion()) {		if (RN->isSubRegion()) {
Region *SubRegion = RN->getNodeAs<Region>();		Region *SubRegion = RN->getNodeAs<Region>();
return LI->getLoopFor(SubRegion->getExit());		return LI->getLoopFor(SubRegion->getExit());
}		}

return LI->getLoopFor(RN->getEntry());		return LI->getLoopFor(RN->getEntry());
}		}

/// Use the exit block to determine the loop depth if RN is a SubRegion.		/// Insert to the WorkList all the ancestors of L up to Ancestor.
		sameerdsUnsubmitted Not Done Reply Inline Actions I might be missing something here. OB marks the beginning of the range that originally contained the nodes that are now listed in Nodes. We are now over-writing this range from the beginning with members of each SCC found by the scc_iterator. Where do the rest of the members in Nodes go? Do they have to be rewritten in the remaining range [OB, OE)? sameerds: I might be missing something here. OB marks the beginning of the range that originally…
		ekatzAuthorUnsubmitted Done Reply Inline Actions The nodes in `Nodes` are already connected in the subgraph starting from the `Entry` node. They are all reachable, as they were in the original SCC that contained `Entry` and `Nodes`. We rewrite the SCC into [OB, OE), which should be the same size as the number of nodes in all the sub-SCCs found in the current `for` loop. ekatz: The nodes in `Nodes` are already connected in the subgraph starting from the `Entry` node. They…
unsigned StructurizeCFG::getAdjustedLoopDepth(RegionNode *RN) {		static bool pushPathToAncestor(Loop L, Loop Ancestor,
		sameerdsUnsubmitted Not Done Reply Inline Actions Wouldn't this be more readable as a simple iteration with a temporary vector? The temporary vector is no more expensive than the stack created for the recursive calls. Essentially: while (L && L != Ancestor) { push back L to temporary vector; L = parent; } if (!L) return false; // Ancestor is not an ancestor of L pop temporary vector to WorkList return true; sameerds: Wouldn't this be more readable as a simple iteration with a temporary vector? The temporary…
		ekatzAuthorUnsubmitted Done Reply Inline Actions I found a nice iterative (non-recursive) way to write this function. I'll change to that. ekatz: I found a nice iterative (non-recursive) way to write this function. I'll change to that.
if (RN->isSubRegion()) {		SmallDenseMap<Loop *, unsigned, 8> &LoopSizes,
Region *SubR = RN->getNodeAs<Region>();		SmallVectorImpl<Loop *> &WorkList) {
return LI->getLoopDepth(SubR->getExit());		if (L != Ancestor) {
}		if (!L)
		return false;

return LI->getLoopDepth(RN->getEntry());		if (!pushPathToAncestor(L->getParentLoop(), Ancestor, LoopSizes, WorkList))
		return false;
		}
		// If the loop has unprocessed nodes associated to it, then push it to the
		// WorkList.
		if (LoopSizes.lookup(L))
		WorkList.push_back(L);
		return true;
}		}

/// Build up the general order of nodes		/// Build up the general order of nodes, by performing a topology sort of the
		/// parent region's nodes, while ensuring that there is no outer loop node
		/// between any two inner loop nodes.
		foadUnsubmitted Not Done Reply Inline Actions It's normally called a "topological" sort. foad: It's normally called a "topological" sort.
		ekatzAuthorUnsubmitted Done Reply Inline Actions Will fix the comment. Thanks. ekatz: Will fix the comment. Thanks.
void StructurizeCFG::orderNodes() {		void StructurizeCFG::orderNodes() {
ReversePostOrderTraversal<Region*> RPOT(ParentRegion);		SmallVector<RegionNode *, 32> POT;
SmallDenseMap<Loop*, unsigned, 8> LoopBlocks;		SmallDenseMap<Loop *, unsigned, 8> LoopSizes;
		for (RegionNode *RN : post_order(ParentRegion)) {
		POT.push_back(RN);

// The reverse post-order traversal of the list gives us an ordering close		// Accumulate the number of nodes inside the region that belong to a loop.
// to what we want. The only problem with it is that sometimes backedges
// for outer loops will be visited before backedges for inner loops.
for (RegionNode *RN : RPOT) {
Loop *Loop = getAdjustedLoop(RN);		Loop *Loop = getAdjustedLoop(RN);
++LoopBlocks[Loop];		++LoopSizes[Loop];
}		}
		Order.resize(POT.size());

unsigned CurrentLoopDepth = 0;		// The post-order traversal of the list gives us an ordering close to what we
		// want. The only problem with it is that sometimes backedges for outer loops
		// will be visited before backedges for inner loops. So now we fix that by
		// inserting the nodes in order, while making sure that encountered inner loop
		// are complete before their parents (outer loops).

		SmallVector<Loop *, 8> WorkList;
		auto LSI = LoopSizes.find(nullptr);
		unsigned *CurrentLoopSize = LSI != LoopSizes.end() ? &LSI->second : nullptr;
		sameerdsUnsubmitted Not Done Reply Inline Actions This pointer is never checked for nullptr before it is used. So either assert that it is not null, or use a reference instead. Also what does the previous line mean? What is the size of a nullptr loop? sameerds: This pointer is never checked for nullptr before it is used. So either assert that it is not…
		ekatzAuthorUnsubmitted Done Reply Inline Actions I'll add some comments, and an assert. ekatz: I'll add some comments, and an assert.
Loop *CurrentLoop = nullptr;		Loop *CurrentLoop = nullptr;
for (auto I = RPOT.begin(), E = RPOT.end(); I != E; ++I) {
RegionNode RN = cast<RegionNode>(I);
unsigned LoopDepth = getAdjustedLoopDepth(RN);

if (is_contained(Order, *I))		// The "skipped" list is actually located at the (reversed) beginning of the
		sameerdsUnsubmitted Not Done Reply Inline Actions Do we really need this for loop? If I understand correctly, orderSCC() works on a subgraph that is guranteed to not be an SCC. Can the same function just take whole region as the starting point? sameerds: Do we really need this for loop? If I understand correctly, orderSCC() works on a subgraph that…
		ekatzAuthorUnsubmitted Done Reply Inline Actions `orderSCC` does something pretty close, but works on `SubGraphNode`s, while this function works on `RegionNode`s. There is no need to create a subgraph if there is no SCC larger than 2, to work on. ekatz: `orderSCC` does something pretty close, but works on `SubGraphNode`s, while this function works…
		sameerdsUnsubmitted Not Done Reply Inline Actions This seems like a good use-case for the WrappedSuccIterator class introduced in LoopIterator.h ... that version limits the graph to the body of a loop, but you could build a similar iterator_adapter that limits the graph to a given set of nodes. With such a succ iterator, you could just eliminate the SubGraphNode class itself and let orderSCC do all the work on the entire input region. That will unify the two for-loops which are actually identical in their intention. sameerds: This seems like a good use-case for the WrappedSuccIterator class introduced in LoopIterator.h .
		ekatzAuthorUnsubmitted Done Reply Inline Actions I thought about that, but the problem is with the `scc_iterator` - It created new `succ_iterator` for each child node, using the `GraphTraits` given. This means that the NodeRef, on which `succ_iterator` is created upon, needs to have a reference to the set of valid Nodes (as actual successors). In any case, we need to create new Nodes. In addition, please take a look at the implementation of the `RNSuccIterator`, in which you will find that iterating upon it is a pretty heavy operation, including a map search for every node in `getISucc` (this calls to `getNode(BB)` which in turn searches an internal map). Considering that we create new nodes, and might need to do all over again for each sub SCC, it is better to just rebuild the SCC (with the `SubGraphNode`s), instead of a simple wrapper (that will requery the map for the appropriate `RegionNode` of a `BasicBlock` each time. ekatz: I thought about that, but the problem is with the `scc_iterator` - It created new…
		sameerdsUnsubmitted Not Done Reply Inline Actions I didn't follow the part about creating new nodes. The LoopIterator defines the NodeRef as a tuple <BasicBlock, Loop>, so that it carries the Loop around for use in succ_iterator. What I had in mind was a similar tuple <RegionNode, ValidNodes> where the second part is a collection of nodes that constrain the succ_iterator. I would choose a more compact implementation here instead of looking into the cost of RNSuccIterator. Avoiding the map does not look like sufficient reason to break away from the idiomatic use of GraphTraits and scc_iterator. I know this looks expensive, but how expensive? Something like O(N * scc_depth)? It is certainly dwarfed by the heavy lifting that the rest of the structurizer is doing. To clarify, I am not insisting that a new iterator is ideal, but it would certainly be my choice unless there is peformance data asking for improvement. sameerds: I didn't follow the part about creating new nodes. The LoopIterator defines the NodeRef as a…
		ekatzAuthorUnsubmitted Done Reply Inline Actions I didn't like the concept of carrying a reference to the "set" on each node, which will increase the memory size required by the SCC iterator by a factor of 2, and also require requery of possibly many nodes (in the "set"). However, I agree that the performance impact overall is probably negligible, and we can always get back to a more optimized implementation if needed. But for now, the simpler approach is probably the correct one. I will upload a version without building a subgraph, as suggested. ekatz: I didn't like the concept of carrying a reference to the "set" on each node, which will…
continue;		// POT. This saves us the use of an intermediate container.
		// Note that there is always enough room, for the skipped nodes, before the
		// current location, as we have just passed at least that amount of nodes.

		auto Begin = POT.rbegin();
		auto I = Begin, SkippedEnd = Begin;
		auto O = Order.rbegin(), OE = Order.rend();
		while (O != OE) {
		sameerdsUnsubmitted Not Done Reply Inline Actions Isn't this the same as `while (!POT.empty())`? That would be more readable, and also convey the fact that a node from POT is definitely consumed on each iteration. sameerds: Isn't this the same as `while (!POT.empty())`? That would be more readable, and also convey the…
		ekatzAuthorUnsubmitted Done Reply Inline Actions `POT` is not always get emptied. If there are no skipped nodes there is no need to erase anything, and we do not enter the following `if` statement. ekatz: `POT` is not always get emptied. If there are no skipped nodes there is no need to erase…
		// If we have any skipped nodes, then erase the gap between the end of the
		// "skipped" list, and the current location.
		if (SkippedEnd != Begin) {
		POT.erase(I.base(), SkippedEnd.base());
		I = SkippedEnd = Begin = POT.rbegin();
		}

		// Keep processing outer loops, in order (from inner most, to outer).
		if (!WorkList.empty()) {
		CurrentLoop = WorkList.pop_back_val();
		CurrentLoopSize = &LoopSizes.find(CurrentLoop)->second;
		}

if (LoopDepth < CurrentLoopDepth) {		// Keep processing loops while only going deeper (into inner loops).
// Make sure we have visited all blocks in this loop before moving back to		do {
		sameerdsUnsubmitted Not Done Reply Inline Actions Why not just have `while (CurrentLoopSize)` here? At least from what I could see, the value should be non-zero when this point is reached from above, right? sameerds: Why not just have `while (CurrentLoopSize)` here? At least from what I could see, the value…
		ekatzAuthorUnsubmitted Done Reply Inline Actions The problem is that `CurrentLoopSize` may be null in case all of the nodes have a parent loop (none outside from every loop). In that case, `CurrentLoopSize` will only be valid from line 408; and remain that way all the way until the end. ekatz: The problem is that `CurrentLoopSize` may be null in case all of the nodes have a parent loop…
		sameerdsUnsubmitted Not Done Reply Inline Actions I am not sure I follow. Can you please rephrase? An example might be useful. sameerds: I am not sure I follow. Can you please rephrase? An example might be useful.
		ekatzAuthorUnsubmitted Done Reply Inline Actions The simplest example is a region without loops. Then the `CurrentLoop` (and in turn, `CurrentLoopSize`) is `nullptr`. ekatz: The simplest example is a region without loops. Then the `CurrentLoop` (and in turn…
// the outer loop.		assert(I != POT.rend());
		RegionNode RN = I++;

auto LoopI = I;		Loop *L = getAdjustedLoop(RN);
while (unsigned &BlockCount = LoopBlocks[CurrentLoop]) {		if (L != CurrentLoop) {
LoopI++;		// If CurrentLoop is not ancestor of L, then skip RN.
		sameerdsUnsubmitted Not Done Reply Inline Actions This line needs at least two tests: one where CurrentLoop is not an ancestor of L, and another where CurrentLoop is an ancestor of L, but not its immediate parent. sameerds: This line needs at least two tests: one where CurrentLoop is not an ancestor of L, and another…
		sameerdsUnsubmitted Not Done Reply Inline Actions The second case is particularly important, where CurrentLoop is not an immediate parent of L. This situation happens when some node N in CurrentLoop has a successor S which is the header of L. If S was not the header, then L would become an irreducible region and hence not represented as a loop in LLVM. But that contradicts the assumption that CurrentLoop is not the parent of L. If there was a parent P of L (where P is itself a child of CurrentLoop), then P's header does not dominate S, and hence it is also an irreducible region. Again, any such P would not be represented in LLVM. I am not sure if I missed anything here ... if I am right, then the function pushPathToAncestor() would be a whole lot simpler. sameerds: The second case is particularly important, where CurrentLoop is not an immediate parent of L.
		ekatzAuthorUnsubmitted Done Reply Inline Actions That is a great observation! Thanks! ekatz: That is a great observation! Thanks!
if (getAdjustedLoop(cast<RegionNode>(*LoopI)) == CurrentLoop) {		if (!L \|\| !pushPathToAncestor(L->getParentLoop(), CurrentLoop,
--BlockCount;		LoopSizes, WorkList)) {
Order.push_back(*LoopI);		// Push the node to the end of the "skipped" list.
}		*SkippedEnd++ = RN;
		continue;
		sameerdsUnsubmitted Not Done Reply Inline Actions It's very hard to read these two nested loops, especially with respect to their exit conditions. If I understand correctly, the inner loop at line 390 keeps executing until the correct number of blocks is found for a given loop. Can this condition be placed in the while () itself? sameerds: It's very hard to read these two nested loops, especially with respect to their exit conditions.
		ekatzAuthorUnsubmitted Done Reply Inline Actions I'll rearrange those loops. ekatz: I'll rearrange those loops.
}		}

		CurrentLoop = L;
		CurrentLoopSize = &LoopSizes.find(CurrentLoop)->second;
		sameerdsUnsubmitted Not Done Reply Inline Actions Should this comment be removed now? The problem with the backedges of the outer loops is addressed in this change, right? sameerds: Should this comment be removed now? The problem with the backedges of the outer loops is…
		ekatzAuthorUnsubmitted Done Reply Inline Actions It is still true, it should be treated as an intro for the following algorithm (explaining why we don't stop here). I'll add a sentence to make it clearer. ekatz: It is still true, it should be treated as an intro for the following algorithm (explaining why…
}		}

CurrentLoop = getAdjustedLoop(RN);		assert(O != Order.rend());
if (CurrentLoop)		*O++ = RN;
LoopBlocks[CurrentLoop]--;

CurrentLoopDepth = LoopDepth;		// If we have finished processing the current loop, then we are done here.
Order.push_back(*I);		--*CurrentLoopSize;
		} while (*CurrentLoopSize);
}		}
		assert(WorkList.empty());
// This pass originally used a post-order traversal and then operated on		assert(SkippedEnd == Begin);
// the list in reverse. Now that we are using a reverse post-order traversal
// rather than re-working the whole pass to operate on the list in order,
// we just reverse the list and continue to operate on it in reverse.
std::reverse(Order.begin(), Order.end());
}		}

		sameerdsUnsubmitted Not Done Reply Inline Actions It seems the only purpose of this object is to invoke the "run()" method. Could this be just split out into a bunch of functions instead? All the class members can be declared right here, and the member functions of the current class can be rewritten to take necessary structures as arguments. sameerds: It seems the only purpose of this object is to invoke the "run()" method. Could this be just…
		ekatzAuthorUnsubmitted Done Reply Inline Actions I can reduce it to a single utility function, and then it will make even less sense to keep it. Then I'll remove it. ekatz: I can reduce it to a single utility function, and then it will make even less sense to keep it.
/// Determine the end of the loops		/// Determine the end of the loops
void StructurizeCFG::analyzeLoops(RegionNode *N) {		void StructurizeCFG::analyzeLoops(RegionNode *N) {
if (N->isSubRegion()) {		if (N->isSubRegion()) {
// Test for exit as back edge		// Test for exit as back edge
BasicBlock *Exit = N->getNodeAs<Region>()->getExit();		BasicBlock *Exit = N->getNodeAs<Region>()->getExit();
if (Visited.count(Exit))		if (Visited.count(Exit))
Loops[Exit] = N->getEntry();		Loops[Exit] = N->getEntry();

▲ Show 20 Lines • Show All 648 Lines • Show Last 20 Lines

llvm/test/Transforms/StructurizeCFG/nested-loop-order-bug.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -structurizecfg %s -o - \| FileCheck %s

				; This test used to produce incorrect code.
				; For example %outer.loop.body used to branched to %inner.loop.end
				; (instead of %inner.loop.header).

				define i1 @test(i32 %x) {
				; CHECK-LABEL: @test(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[OUTER_LOOP_HEADER:%.*]]
				; CHECK: Flow12:
				; CHECK-NEXT: br i1 [[TMP3:%.]], label [[EXIT_TRUE:%.]], label [[FLOW13:%.*]]
				; CHECK: exit.true:
				; CHECK-NEXT: br label [[FLOW13]]
				; CHECK: Flow13:
				; CHECK-NEXT: br i1 [[TMP2:%.]], label [[NEWDEFAULT:%.]], label [[FLOW14:%.*]]
				; CHECK: NewDefault:
				; CHECK-NEXT: br label [[EXIT_FALSE:%.*]]
				; CHECK: Flow14:
				; CHECK-NEXT: [[TMP0:%.*]] = phi i1 [ false, [[EXIT_FALSE]] ], [ true, [[FLOW13]] ]
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: exit.false:
				; CHECK-NEXT: br label [[FLOW14]]
				; CHECK: outer.loop.header:
				; CHECK-NEXT: [[COND:%.*]] = call i1 @foo()
				; CHECK-NEXT: br i1 [[COND]], label [[OUTER_LOOP_BODY:%.]], label [[FLOW3:%.]]
				; CHECK: outer.loop.body:
				; CHECK-NEXT: br label [[INNER_LOOP_HEADER:%.*]]
				; CHECK: Flow3:
				; CHECK-NEXT: [[TMP1:%.]] = phi i1 [ [[TMP17:%.]], [[FLOW11:%.*]] ], [ true, [[OUTER_LOOP_HEADER]] ]
				; CHECK-NEXT: [[TMP2]] = phi i1 [ [[TMP13:%.*]], [[FLOW11]] ], [ false, [[OUTER_LOOP_HEADER]] ]
				; CHECK-NEXT: [[TMP3]] = phi i1 [ false, [[FLOW11]] ], [ true, [[OUTER_LOOP_HEADER]] ]
				; CHECK-NEXT: br i1 [[TMP1]], label [[FLOW12:%.*]], label [[OUTER_LOOP_HEADER]]
				; CHECK: inner.loop.header:
				; CHECK-NEXT: [[TMP4:%.]] = phi i1 [ [[TMP9:%.]], [[FLOW4:%.*]] ], [ false, [[OUTER_LOOP_BODY]] ]
				; CHECK-NEXT: [[COND1:%.*]] = call i1 @foo()
				; CHECK-NEXT: br i1 [[COND1]], label [[INNER_LOOP_BODY:%.*]], label [[FLOW4]]
				; CHECK: Flow6:
				; CHECK-NEXT: [[TMP5:%.]] = phi i1 [ false, [[INNER_LOOP_LATCH:%.]] ], [ true, [[LEAFBLOCK:%.*]] ]
				; CHECK-NEXT: br label [[FLOW5:%.*]]
				; CHECK: Flow7:
				; CHECK-NEXT: br i1 [[TMP11:%.]], label [[INNER_LOOP_END:%.]], label [[FLOW8:%.*]]
				; CHECK: inner.loop.end:
				; CHECK-NEXT: br label [[FLOW8]]
				; CHECK: inner.loop.body:
				; CHECK-NEXT: [[COND2:%.*]] = call i1 @foo()
				; CHECK-NEXT: [[TMP6:%.*]] = xor i1 [[COND2]], true
				; CHECK-NEXT: br i1 [[TMP6]], label [[INNER_LOOP_BODY_ELSE:%.]], label [[FLOW:%.]]
				; CHECK: inner.loop.body.else:
				; CHECK-NEXT: br label [[FLOW]]
				; CHECK: Flow:
				; CHECK-NEXT: [[TMP7:%.*]] = phi i1 [ false, [[INNER_LOOP_BODY_ELSE]] ], [ true, [[INNER_LOOP_BODY]] ]
				; CHECK-NEXT: br i1 [[TMP7]], label [[INNER_LOOP_BODY_THEN:%.]], label [[INNER_LOOP_COND:%.]]
				; CHECK: inner.loop.body.then:
				; CHECK-NEXT: br label [[INNER_LOOP_COND]]
				; CHECK: Flow4:
				; CHECK-NEXT: [[TMP8:%.]] = phi i1 [ [[TMP18:%.]], [[FLOW5]] ], [ true, [[INNER_LOOP_HEADER]] ]
				; CHECK-NEXT: [[TMP9]] = phi i1 [ [[TMP19:%.*]], [[FLOW5]] ], [ [[TMP4]], [[INNER_LOOP_HEADER]] ]
				; CHECK-NEXT: [[TMP10:%.]] = phi i1 [ [[TMP20:%.]], [[FLOW5]] ], [ false, [[INNER_LOOP_HEADER]] ]
				; CHECK-NEXT: [[TMP11]] = phi i1 [ false, [[FLOW5]] ], [ true, [[INNER_LOOP_HEADER]] ]
				; CHECK-NEXT: br i1 [[TMP8]], label [[FLOW7:%.*]], label [[INNER_LOOP_HEADER]]
				; CHECK: inner.loop.cond:
				; CHECK-NEXT: br label [[NODEBLOCK:%.*]]
				; CHECK: NodeBlock:
				; CHECK-NEXT: [[PIVOT:%.]] = icmp slt i32 [[X:%.]], 1
				; CHECK-NEXT: br i1 [[PIVOT]], label [[LEAFBLOCK]], label [[FLOW5]]
				; CHECK: Flow8:
				; CHECK-NEXT: [[TMP12:%.*]] = phi i1 [ true, [[INNER_LOOP_END]] ], [ false, [[FLOW7]] ]
				; CHECK-NEXT: br i1 [[TMP10]], label [[LEAFBLOCK1:%.]], label [[FLOW9:%.]]
				; CHECK: LeafBlock1:
				; CHECK-NEXT: [[SWITCHLEAF2:%.*]] = icmp eq i32 [[X]], 1
				; CHECK-NEXT: br i1 [[SWITCHLEAF2]], label [[INNER_LOOP_BREAK:%.]], label [[FLOW10:%.]]
				; CHECK: LeafBlock:
				; CHECK-NEXT: [[SWITCHLEAF:%.*]] = icmp eq i32 [[X]], 0
				; CHECK-NEXT: br i1 [[SWITCHLEAF]], label [[INNER_LOOP_LATCH]], label [[FLOW6:%.*]]
				; CHECK: Flow9:
				; CHECK-NEXT: [[TMP13]] = phi i1 [ [[TMP15:%.*]], [[FLOW10]] ], [ [[TMP9]], [[FLOW8]] ]
				; CHECK-NEXT: [[TMP14:%.]] = phi i1 [ [[TMP16:%.]], [[FLOW10]] ], [ [[TMP12]], [[FLOW8]] ]
				; CHECK-NEXT: br i1 [[TMP14]], label [[OUTER_LOOP_CLEANUP:%.*]], label [[FLOW11]]
				; CHECK: inner.loop.break:
				; CHECK-NEXT: br label [[FLOW10]]
				; CHECK: Flow10:
				; CHECK-NEXT: [[TMP15]] = phi i1 [ false, [[INNER_LOOP_BREAK]] ], [ true, [[LEAFBLOCK1]] ]
				; CHECK-NEXT: [[TMP16]] = phi i1 [ true, [[INNER_LOOP_BREAK]] ], [ [[TMP12]], [[LEAFBLOCK1]] ]
				; CHECK-NEXT: br label [[FLOW9]]
				; CHECK: outer.loop.cleanup:
				; CHECK-NEXT: br label [[OUTER_LOOP_LATCH:%.*]]
				; CHECK: Flow11:
				; CHECK-NEXT: [[TMP17]] = phi i1 [ false, [[OUTER_LOOP_LATCH]] ], [ true, [[FLOW9]] ]
				; CHECK-NEXT: br label [[FLOW3]]
				; CHECK: outer.loop.latch:
				; CHECK-NEXT: br label [[FLOW11]]
				; CHECK: Flow5:
				; CHECK-NEXT: [[TMP18]] = phi i1 [ [[TMP5]], [[FLOW6]] ], [ true, [[NODEBLOCK]] ]
				; CHECK-NEXT: [[TMP19]] = phi i1 [ [[TMP5]], [[FLOW6]] ], [ [[TMP4]], [[NODEBLOCK]] ]
				; CHECK-NEXT: [[TMP20]] = phi i1 [ false, [[FLOW6]] ], [ true, [[NODEBLOCK]] ]
				; CHECK-NEXT: br label [[FLOW4]]
				; CHECK: inner.loop.latch:
				; CHECK-NEXT: br label [[FLOW6]]
				; CHECK: exit:
				; CHECK-NEXT: ret i1 [[TMP0]]
				;
				entry:
				br label %outer.loop.header

				exit.true: ; preds = %outer.loop.header
				br label %exit

				exit.false: ; preds = %inner.loop.cond
				br label %exit

				outer.loop.header: ; preds = %outer.loop.latch, %entry
				%cond = call i1 @foo()
				br i1 %cond, label %outer.loop.body, label %exit.true

				outer.loop.body: ; preds = %outer.loop.header
				br label %inner.loop.header

				inner.loop.header: ; preds = %inner.loop.latch, %outer.loop.body
				%cond1 = call i1 @foo()
				br i1 %cond1, label %inner.loop.body, label %inner.loop.end

				inner.loop.end: ; preds = %inner.loop.header
				br label %outer.loop.cleanup

				inner.loop.body: ; preds = %inner.loop.header
				%cond2 = call i1 @foo()
				br i1 %cond2, label %inner.loop.body.then, label %inner.loop.body.else

				inner.loop.body.else: ; preds = %inner.loop.body
				br label %inner.loop.cond

				inner.loop.body.then: ; preds = %inner.loop.body
				br label %inner.loop.cond

				inner.loop.cond: ; preds = %inner.loop.body.then, %inner.loop.body.else
				switch i32 %x, label %exit.false [
				i32 0, label %inner.loop.latch
				i32 1, label %inner.loop.break
				]

				inner.loop.break: ; preds = %inner.loop.cond
				br label %outer.loop.cleanup

				outer.loop.cleanup: ; preds = %inner.loop.break, %inner.loop.end
				br label %outer.loop.latch

				outer.loop.latch: ; preds = %outer.loop.cleanup
				br label %outer.loop.header

				inner.loop.latch: ; preds = %inner.loop.cond
				br label %inner.loop.header

				exit: ; preds = %exit.false, %exit.true
				%r = phi i1 [ true, %exit.true ], [ false, %exit.false ]
				ret i1 %r
				}

				declare i1 @foo()

This is an archive of the discontinued LLVM Phabricator instance.

[StructurizeCFG] Fix region nodes orderingClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 261184

llvm/lib/Transforms/Scalar/StructurizeCFG.cpp

llvm/test/Transforms/StructurizeCFG/nested-loop-order-bug.ll

[StructurizeCFG] Fix region nodes ordering
ClosedPublic