This is an archive of the discontinued LLVM Phabricator instance.

Reorder SimplifyCFG and SROA?
AbandonedPublic

Authored by tjablin on Jun 13 2016, 4:31 PM.

Details

Summary

Hi,
I'd like to propose changing the order of the SimplifyCFG and SROA passes run. There are optimizations in SimplifyCFG that are enabled by SROA. Presently, SimplifyCFG is run before and after SROA, but some less advantageous SimplifyCFG transformations can run in the first pass and deny better SimplifyCFG transformations a chance. Changing the order of these two passes will resolve performance bug 27555 (https://llvm.org/bugs/show_bug.cgi?id=27555). I don't know how the ordering of these two passes was decided initially, so I am hoping whether this order was deliberate.

Specifically, there are two functions in SimplifyCFG and deal with long sequences of comparisons: FoldValueComparisonIntoPredecessors and FoldingBranchToCommonDest. These two transformations have overlapping opportunities, but when both are applicable, FoldValueComparisonIntoPredecessors is typically better. Presently, even though FoldValueComparisonIntoPredecessors runs before FoldingBranchToCommonDest, FoldingBranchToCommonDest wins, because FoldValueComparisonIntoPredecessors needs to run after values are promoted to registers, whereas FoldingBranchToCommonDest can run before.

FoldingBranchToCommonDest transforms code that looks like:

if (A) goto LabelX
if (B) goto LabelX
if (C) goto LabelX
...
goto LabelY

into:

if(A || B || C || ...) goto LabelX
goto LabelY

where A, B, C, etc are non-side-effecting expressions.

FoldValueComparisonIntoPredecessors transforms code that looks like:

if (x == A) goto LabelX
if (x == B) goto LabelY
if (x == C) goto LabelZ
...

into:
select(x) {

case A: goto LabelX
case B: goto LabelY
case C: goto LabelZ
...

}

where A, B, C, etc are constant values.

In situations where both transformations apply, FoldValueComparisonIntoPredecessors tends to be better, since LLVM can lower the Select more efficiently than the complex boolean expression generated by FoldingBranchToCommonDest.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 60625.Jun 13 2016, 4:31 PM
tjablin retitled this revision from to Reorder SimplifyCFG and SROA?.
tjablin updated this object.
tjablin added reviewers: hfinkel, cycheng, chandlerc.
tjablin added a subscriber: llvm-commits.
chandlerc edited edge metadata.Jun 13 2016, 6:40 PM

Rather than change the ordering, what about teaching SimplifyCFG's FoldValueComparisonIntoPredecessors logic to handle code in the pattern produced by FoldingBranchToCommonDest? That would seem like a good canonicalization change anyways.

Hi Chandler,
The FoldValueComparisonIntoPredecessors logic already mostly understands the code pattern produced by FoldingBranchToCommonDest, but would need to duplicate most of the logic from Early-CSE to get the rest of the way there. The current pass order is: CFGSimplification, SROA, EarlyCSE. If either EarlyCSE or SROA runs before SimplifyCFG there's no problem. Alternatively, a second pass through SimplifyCFG will also generate good code as long as it is before InstCombine. InstCombine is problematic since it "strength reduces" some equality comparisons to bitwise operations. For example:

(i == 5334 || i == 5335)

becomes

((i | 1) == 5335)

Hi Chandler,
The FoldValueComparisonIntoPredecessors logic already mostly understands the code pattern produced by FoldingBranchToCommonDest, but would need to duplicate most of the logic from Early-CSE to get the rest of the way there. The current pass order is: CFGSimplification, SROA, EarlyCSE. If either EarlyCSE or SROA runs before SimplifyCFG there's no problem. Alternatively, a second pass through SimplifyCFG will also generate good code as long as it is before InstCombine. InstCombine is problematic since it "strength reduces" some equality comparisons to bitwise operations. For example:

(i == 5334 || i == 5335)

becomes

((i | 1) == 5335)

If this is how the equality expression is canonicalized by instcombine, then this is how simplify-cfg needs to match it.

If there is logic elsewhere that should be commonly factored out to help build the equality set out of this reduced form, we should extract it and use it in both places.

If this simply cannot be reasonably matched afterward, then instcombine is destroying information rather than canonicalizing, and we should teach it to use a different canonical form in order to make matching on these patterns easier.

Did you run this through the test-suite?

test/Transforms/PhaseOrdering/branch-to-switch.ll
6

What this test is checking is totally unclear, can you document it please?