This is an archive of the discontinued LLVM Phabricator instance.

[Jump Threading] Convert conditional branches into unconditional branches using GVN results
AbandonedPublic

Authored by masakiarai on Feb 8 2019, 8:01 AM.

Details

Summary

processBBWithCommonDomCondition converts a conditional branch of BB
into an unconditional branch if possible, when the basic block BBa
dominates BB and both basic blocks have conditional branches under the
same condition.

BBa:

%cmp = ...
br i1 %cmp, label %LA1, label %LA2

...

BB:

...
br i1 %cmp, label %LD1, label %LD2

The condition to apply this optimization is that the predecessors of
BB is divided into the set S1 coming from LA1 and the set S2 coming
from LA2.
If S1(or S2) is empty, it rewrites the conditional branch to an
unconditional branch to LD2(or LD1).
If both S1 and S2 are not empty and BB can be copied, BB is copied for
S2, and each conditional branches in BB and the copied BB are
converted into unconditional branches for LD1 and LD2, respectively.

Diff Detail

Event Timeline

masakiarai created this revision.Feb 8 2019, 8:01 AM
rengolin added a subscriber: rengolin.

Adding llvm-commits for wider audience

rengolin added a reviewer: jfb.Feb 8 2019, 8:30 AM
rengolin added a subscriber: john.brawn.
brzycki added a comment.EditedFeb 11 2019, 2:30 PM

Hello @masakiarai, my comments are inline.

I am also curious: have you run test-suite and make-check-llvm on this code? For test-suite I am curious about the SingleSource, MultiSource, and CTMark directories. The first two can help identify miscompiles or compiler crashes while last one can help identify compile-time performance regressions.

lib/Transforms/Scalar/JumpThreading.cpp
374–379

Running processBBWithCommonDomCondition at this location means the code can only ever catch this pattern once when the initial IR passed to JumpThreading contains it. If the pattern ever emerges in the IR from changes within the do { ... } while(); loop below it will not be detected.

376

This is expensive for large block counts of F. It would be better to update Unreachable as you alter the state of BBs within the new logic.

567

I think you can get the Level from DT using the getLevel() method without generating it recursively yourself.

595

This should probably return an enum similar to enum class UpdateStrategy : unsigned char { Eager = 0, Lazy = 1 }; Magic numbers make maintainability more difficult.

697

Why instantiate another instance of DT? The location where you call this code has DTU which you can pass in. For large F generating the DT is expensive. It's also a unique instance of DT that knows nothing about DTU->DT.

703

This feels like the code here is more of a function-level-pass-within-a-pass than an extension of JumpThreading. I have concerns the number of loops and recursive calls over the BBs of F will add to the overall compile time.

730

This code makes me nervous. On line 727 you call findBBWithCommonDomCondition(DT, ... but here you call conditionalToUnconditional(DTU, .... The JumpThreadingPass creates DTU with UpdateStrategy::Lazy. This means the calls to DTU->deleteEdgeRelaxed(...) in conditionalToUnconditional() and the calls to DT.dominates() in findBBWithCommonDomCondition() may lead to an inconsistent state.

744

Please explain the reasoning behind increasing BBDupThreshold

Thank you very much for your quick and detailed review.

I'm sorry that I incorrectly uploaded halfway differences.
The correct patch file, which includes 2 bug fixes, reflects some of your comments, but it does not reflect everything.
So, I reply to your comments before uploading it.

About test-suite, I run them using 'test-suite/cmake/caches/*'.

masakiarai marked 7 inline comments as done.Feb 13 2019, 12:38 AM
masakiarai added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
374–379

It is possible to place processBBWithCommonDomCondition call in the loop, but in reality, there was little effect.
In theory, I can create situations that are effective.
JumpThreading is executed multiple times, but since the first time is before GVN, the execution of processBBWithCommonDomCondition at that time has no meaning.

567

You are right.
visitBB is unnecessary.

595

You are right.
I will fix this.

697

The correct patch uses only DTU.

703

I do not think that I should create a new pass in the following reasons:
(1) The essential effect of this optimization is about branches.
(2) The most complicated process of my modification depends on JumpThreadingPass::ThreadEdge.

730

I understand your concerns.
The correct patch uses only DTU.

744

It is unnecessary to change BBDupThreshold here.
I have confirmed that increasing BBDupThreshold here increases the opportunity for some optimization about HPC applications.
However, to generate CFG like GCC(that is my immediate goal), I do not need to change the value.

I uploaded my patch with some fixes.

I uploaded my patch with some fixes.

Thank you for the comments and re-upload @masakiarai. I will review the patch again either today or tomorrow.

Hello @masakiarai, the new patch looks better, thank you. My updated comments are inline.

lib/Transforms/Scalar/JumpThreading.cpp
374–379

Understood. I'm still thinking there are cases your testing didn't encounter that emerge from other JT optimizations. I suppose real-world usage will discover them, if at all.

Also, I'd really prefer if processBBWithCommonDomCondition() marked BBs unreachable as it changed F instead of coming back and doing yet-another loop over all the BBs. For large F these multiple iterations over all the blocks adds non-trivial deltas of compile-time.

680

Does this need to be deleteEdgeRelaxed()? If at all possible please don't use this variant of the DTU method. There's actually work underway to try and deprecate/remove it. (See D58170 and D57881 if you're interested)

703

Understood. I made the comment because most of the optimizations in JT rely on analysis at the BB level but this new enhancement works at the F level. It's a bit of a departure for JT and I'm not sure if the community as a whole is interested in going this direction. I don't think I alone can make that decision...

720

Please no magic numbers.

724

Please no magic numbers.

729

This function is fairly complex. I know I'd appreciate a few more comments describing the logic and why certain checks are happening.

sebpop added a subscriber: sebpop.Feb 21 2019, 9:00 AM

This seems to be a useful transform that is not yet covered by the current implementation of jump-threading.
I think GCC calls it control dependence DCE.
Please run and report performance numbers on the testsuite and other benchmarks that you have access to.

lib/Transforms/Scalar/JumpThreading.cpp
374

You are missing a flag to enable/disable this transform.

599

Please remove the else after an if-then that returns. Here and everywhere else in your patch.
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
Please review all the coding rules and apply to your patch.

703

I am also inclined to say that this does not belong to jump-threading: the analysis relies on gvn to have executed and replaced all equivalent %cmp values. Moving this code to a new pass would allow more flexibility to schedule the pass after gvn.

718

Please use continue; that tells the reader that there are no more statements to be executed in this loop body.

719

no else needed after a continue;

722

same here: continue and remove else

724

You could replace the arguments of conditionalToUnconditional(..., 0, 1) with a single param enum {KeepThen, KeepElse};

732

As the code generation relies on threadEdge, you could factor it out from this file and make it usable from other passes.

test/Transforms/JumpThreading/gvn-dom-cb.ll
16

Please add a comment that explains what this test is checking for. Here and for all the CHECK/CHECK-NOT below.

For this particular case the comment could be:
"Check that jump-threading is able to infer the value of %cmp used at the end of basic block if.end and thus that the IR after jump-threading contains only one conditional branch based on %cmp in basic block entry."

Maybe you can also CHECK-NOT: label if.else4 and mention in the comment that the basic block becomes unreachable and is eliminated.

58

You could mention in the comment that explains what this testcase checks for that "This testcase only differs from check1 by one annotation noduplicate on the call to show3, this prevents jump-threading because ... such and such."

brzycki added inline comments.Feb 21 2019, 10:51 AM
lib/Transforms/Scalar/JumpThreading.cpp
732

+1. A suitable location for this would be llvm/lib/Transforms/Utils/Local.cpp.

I'm sorry to stagnate this for a very long time.
According to the reviewer's comments, I am preparing this as a new optimization pass as 'Control Dependent DCE'.
With enhancements and additional test programs, I will submit a new review request.

fhahn added a comment.Dec 26 2019, 3:17 AM

I'm sorry to stagnate this for a very long time.
According to the reviewer's comments, I am preparing this as a new optimization pass as 'Control Dependent DCE'.
With enhancements and additional test programs, I will submit a new review request.

I’ll take a closer look in a bit, but I think there might be a different approach using predicate info to solve this problem relatively easily. I’ll see if I can put something together. In general it should be enough to just simplify the condition to true and let other passes do the cCFG cleanup

fhahn requested changes to this revision.Sep 18 2020, 5:59 AM

(marking as changes requested to indicate that there's an outstanding comment)

This revision now requires changes to proceed.Sep 18 2020, 5:59 AM

I'm sorry for the long suspension due to various changes in my working environment(my job description, email address, etc).
I implemented this as a new optimization pass, so I will upload it.
I'm thinking of getting a new Phabricator review number and linking it from this comment section.
Is this a reasonable procedure?

I implemented this as a new optimization pass, so I will upload it.
I'm thinking of getting a new Phabricator review number and linking it from this comment section.
Is this a reasonable procedure?

If the implementation is completely different, they I guess it would make sense to start fresh. If you do, please close this one with a comment, pointing to the new review.

You should also refer back to this review, in the new one, in case people want to read the previous context.

Also, copy all people that interacted with this review and mention it's a rework, so they continue reviewing the "same" work.

Thank you very much for your advice.

Since the files to be added and modified in the new version are
significantly different, I got the following new Differential Revision.

[ControlDependentDCE] Add Control Dependent DCE pass
https://reviews.llvm.org/D88293

If there are no problems, I will close this Differential Revision.

No problems, feel free to close, thanks!

masakiarai abandoned this revision.Sep 25 2020, 9:38 AM

I closed this revision as the following revision will reconstruct this anew.

[ControlDependentDCE] Add Control Dependent DCE pass
https://reviews.llvm.org/D88293