This is an archive of the discontinued LLVM Phabricator instance.

StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly
ClosedPublic

Authored by cfang on May 15 2018, 4:06 PM.

Details

Summary

StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order. The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops. To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited before moving on to outer loop.

However, we found a problem for a SubRegion which is a loop itself:


                      |        |
                     V       |
--> BB1 --> BB2 --> BB3 -->

In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead BB2 to be placed in the wrong order.

In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth to guard the sorting.

Diff Detail

Repository
rL LLVM

Event Timeline

cfang created this revision.May 15 2018, 4:06 PM

I still think this process of using the LoopDepth and sorting from anything other than RPO is fundamentally broken, but if everything is passing with this that's an improvement.

test/Transforms/StructurizeCFG/AMDGPU/loop-subregion-misordered.ll
1 ↗(On Diff #146949)

Did you use update_test_checks for this? The comment is missing

175–177 ↗(On Diff #146949)

Does this comment still apply?

cfang marked 2 inline comments as done.May 16 2018, 1:32 PM

Right. Reverse Post Order (RPO) should be the fundamental order. But apparently there are some cases that pure RPO does not work.
That's the reason loop depth was introduced to guard the ordering of the nodes.

While the logic behind the loop depth based RPO ordering is still not clear to me, this patch is enhancing that approach.
A subregion should be collapsed into one special node and the loop in the subregion is not appropriate to be used for the ordering.

test/Transforms/StructurizeCFG/AMDGPU/loop-subregion-misordered.ll
1 ↗(On Diff #146949)

Did update_test_checks and comment shows now.

175–177 ↗(On Diff #146949)

This comment is no longer needed! Removed. Thanks.

cfang updated this revision to Diff 147168.May 16 2018, 1:33 PM
cfang marked 2 inline comments as done.

Updated the test based on the comments

Updated the test based on the comments

Hi there. Personally I'm pretty uncomfortable reviewing structurize-cfg because I've never been able to understand how it works. I'm basically at the same point as dberlin in https://reviews.llvm.org/D41605#964650.

Thanks Justin for your comment. I added Daniel as a reviewer.

-RN1 ->RN2->RN3->

This patch fixes the issue regarding the ordering of a subregion. No matter what approach you are using, RN1, RN2 and RN3 should be neighbors in the ordered list.
But if RN2 is a subregion of a loop, the current loop depth based approach will inappropriately misplaced RN2 in the list because it has different loop depth than that of RN1 and RN3.

The patch applies an adjusted loop depth for subregion to resolve this issue. It should be common in the structurization to collapse a subregion and consider it as a special node.

arsenm accepted this revision.May 23 2018, 4:35 AM

LGTM. I don't believe that there are reasons RPO doesn't work other than bugs in the structurizer. The sorting used now is clearly broken, and if this fixes the concept it was going for this is a step in the right direction

This revision is now accepted and ready to land.May 23 2018, 4:35 AM
This revision was automatically updated to reflect the committed changes.