Page MenuHomePhabricator

[Jump Threading] Convert conditional branches into unconditional branches using GVN results
Needs RevisionPublic

Authored by masakiarai on Feb 8 2019, 8:01 AM.



processBBWithCommonDomCondition converts a conditional branch of BB
into an unconditional branch if possible, when the basic block BBa
dominates BB and both basic blocks have conditional branches under the
same condition.


%cmp = ...
br i1 %cmp, label %LA1, label %LA2



br i1 %cmp, label %LD1, label %LD2

The condition to apply this optimization is that the predecessors of
BB is divided into the set S1 coming from LA1 and the set S2 coming
from LA2.
If S1(or S2) is empty, it rewrites the conditional branch to an
unconditional branch to LD2(or LD1).
If both S1 and S2 are not empty and BB can be copied, BB is copied for
S2, and each conditional branches in BB and the copied BB are
converted into unconditional branches for LD1 and LD2, respectively.

Diff Detail

Event Timeline

masakiarai created this revision.Feb 8 2019, 8:01 AM
rengolin added a subscriber: rengolin.

Adding llvm-commits for wider audience

rengolin added a reviewer: jfb.Feb 8 2019, 8:30 AM
rengolin added a subscriber: john.brawn.
brzycki added a comment.EditedFeb 11 2019, 2:30 PM

Hello @masakiarai, my comments are inline.

I am also curious: have you run test-suite and make-check-llvm on this code? For test-suite I am curious about the SingleSource, MultiSource, and CTMark directories. The first two can help identify miscompiles or compiler crashes while last one can help identify compile-time performance regressions.


Running processBBWithCommonDomCondition at this location means the code can only ever catch this pattern once when the initial IR passed to JumpThreading contains it. If the pattern ever emerges in the IR from changes within the do { ... } while(); loop below it will not be detected.


This is expensive for large block counts of F. It would be better to update Unreachable as you alter the state of BBs within the new logic.


I think you can get the Level from DT using the getLevel() method without generating it recursively yourself.


This should probably return an enum similar to enum class UpdateStrategy : unsigned char { Eager = 0, Lazy = 1 }; Magic numbers make maintainability more difficult.


Why instantiate another instance of DT? The location where you call this code has DTU which you can pass in. For large F generating the DT is expensive. It's also a unique instance of DT that knows nothing about DTU->DT.


This feels like the code here is more of a function-level-pass-within-a-pass than an extension of JumpThreading. I have concerns the number of loops and recursive calls over the BBs of F will add to the overall compile time.


This code makes me nervous. On line 727 you call findBBWithCommonDomCondition(DT, ... but here you call conditionalToUnconditional(DTU, .... The JumpThreadingPass creates DTU with UpdateStrategy::Lazy. This means the calls to DTU->deleteEdgeRelaxed(...) in conditionalToUnconditional() and the calls to DT.dominates() in findBBWithCommonDomCondition() may lead to an inconsistent state.


Please explain the reasoning behind increasing BBDupThreshold

Thank you very much for your quick and detailed review.

I'm sorry that I incorrectly uploaded halfway differences.
The correct patch file, which includes 2 bug fixes, reflects some of your comments, but it does not reflect everything.
So, I reply to your comments before uploading it.

About test-suite, I run them using 'test-suite/cmake/caches/*'.

masakiarai marked 7 inline comments as done.Feb 13 2019, 12:38 AM
masakiarai added inline comments.

It is possible to place processBBWithCommonDomCondition call in the loop, but in reality, there was little effect.
In theory, I can create situations that are effective.
JumpThreading is executed multiple times, but since the first time is before GVN, the execution of processBBWithCommonDomCondition at that time has no meaning.


You are right.
visitBB is unnecessary.


You are right.
I will fix this.


The correct patch uses only DTU.


I do not think that I should create a new pass in the following reasons:
(1) The essential effect of this optimization is about branches.
(2) The most complicated process of my modification depends on JumpThreadingPass::ThreadEdge.


I understand your concerns.
The correct patch uses only DTU.


It is unnecessary to change BBDupThreshold here.
I have confirmed that increasing BBDupThreshold here increases the opportunity for some optimization about HPC applications.
However, to generate CFG like GCC(that is my immediate goal), I do not need to change the value.

I uploaded my patch with some fixes.

I uploaded my patch with some fixes.

Thank you for the comments and re-upload @masakiarai. I will review the patch again either today or tomorrow.

Hello @masakiarai, the new patch looks better, thank you. My updated comments are inline.


Understood. I'm still thinking there are cases your testing didn't encounter that emerge from other JT optimizations. I suppose real-world usage will discover them, if at all.

Also, I'd really prefer if processBBWithCommonDomCondition() marked BBs unreachable as it changed F instead of coming back and doing yet-another loop over all the BBs. For large F these multiple iterations over all the blocks adds non-trivial deltas of compile-time.


Does this need to be deleteEdgeRelaxed()? If at all possible please don't use this variant of the DTU method. There's actually work underway to try and deprecate/remove it. (See D58170 and D57881 if you're interested)


Understood. I made the comment because most of the optimizations in JT rely on analysis at the BB level but this new enhancement works at the F level. It's a bit of a departure for JT and I'm not sure if the community as a whole is interested in going this direction. I don't think I alone can make that decision...


Please no magic numbers.


Please no magic numbers.


This function is fairly complex. I know I'd appreciate a few more comments describing the logic and why certain checks are happening.

sebpop added a subscriber: sebpop.Feb 21 2019, 9:00 AM

This seems to be a useful transform that is not yet covered by the current implementation of jump-threading.
I think GCC calls it control dependence DCE.
Please run and report performance numbers on the testsuite and other benchmarks that you have access to.


You are missing a flag to enable/disable this transform.


Please remove the else after an if-then that returns. Here and everywhere else in your patch.
Please review all the coding rules and apply to your patch.


I am also inclined to say that this does not belong to jump-threading: the analysis relies on gvn to have executed and replaced all equivalent %cmp values. Moving this code to a new pass would allow more flexibility to schedule the pass after gvn.


Please use continue; that tells the reader that there are no more statements to be executed in this loop body.


no else needed after a continue;


same here: continue and remove else


You could replace the arguments of conditionalToUnconditional(..., 0, 1) with a single param enum {KeepThen, KeepElse};


As the code generation relies on threadEdge, you could factor it out from this file and make it usable from other passes.


Please add a comment that explains what this test is checking for. Here and for all the CHECK/CHECK-NOT below.

For this particular case the comment could be:
"Check that jump-threading is able to infer the value of %cmp used at the end of basic block if.end and thus that the IR after jump-threading contains only one conditional branch based on %cmp in basic block entry."

Maybe you can also CHECK-NOT: label if.else4 and mention in the comment that the basic block becomes unreachable and is eliminated.


You could mention in the comment that explains what this testcase checks for that "This testcase only differs from check1 by one annotation noduplicate on the call to show3, this prevents jump-threading because ... such and such."

brzycki added inline comments.Feb 21 2019, 10:51 AM

+1. A suitable location for this would be llvm/lib/Transforms/Utils/Local.cpp.

I'm sorry to stagnate this for a very long time.
According to the reviewer's comments, I am preparing this as a new optimization pass as 'Control Dependent DCE'.
With enhancements and additional test programs, I will submit a new review request.

fhahn added a comment.Dec 26 2019, 3:17 AM

I'm sorry to stagnate this for a very long time.
According to the reviewer's comments, I am preparing this as a new optimization pass as 'Control Dependent DCE'.
With enhancements and additional test programs, I will submit a new review request.

I’ll take a closer look in a bit, but I think there might be a different approach using predicate info to solve this problem relatively easily. I’ll see if I can put something together. In general it should be enough to just simplify the condition to true and let other passes do the cCFG cleanup

fhahn requested changes to this revision.Fri, Sep 18, 5:59 AM

(marking as changes requested to indicate that there's an outstanding comment)

This revision now requires changes to proceed.Fri, Sep 18, 5:59 AM