This is an archive of the discontinued LLVM Phabricator instance.

Clone block with icmp+branch if it likely results in further jump threading
Needs ReviewPublic

Authored by eraman on Jul 25 2016, 2:30 PM.

Details

Reviewers
haicheng
Summary

If we have a block (B1) with an otherwise-unthreadable branch on a compare, and its predecessors are terminated by an unconditional jump, try cloning it to a predecessor (B2) if B1 has a successor B3 which is terminated by a branch whose condition is known in B2. The cloning would result in an edge B2->B3 which could be potentially threaded with the branch that terminates B3.

Part of this patch deals with renaming and refactoring DuplicateCondBranchOnPHIIntoPred and ProcessBranchOnPHI so that they can be reused to clone in this case.

Diff Detail

Event Timeline

eraman updated this revision to Diff 65421.Jul 25 2016, 2:30 PM
eraman retitled this revision from to Clone block with icmp+branch if it likely results in further jump threading.
eraman updated this object.
eraman added reviewers: mcrosier, haicheng.
eraman added subscribers: davidxl, llvm-commits.
mcrosier added inline comments.Jul 26 2016, 8:19 AM
lib/Transforms/Scalar/JumpThreading.cpp
897

This should be a cast, rather than a dyn_cast.

902

dyn_cast -> cast.

903

dyn_cast -> cast.

1322

ends -> end

test/Transforms/JumpThreading/basic.ll
182

Please remove the extra whitespace.

eraman updated this revision to Diff 65550.Jul 26 2016, 10:23 AM
eraman marked 5 inline comments as done.

Address review feedback.

haicheng edited edge metadata.Aug 18 2016, 9:58 AM

Hi Easwaran,

Please see the inlined comment.

Haicheng

lib/Transforms/Scalar/JumpThreading.cpp
906–907

If I understand it correctly, the predecessor that TryDuplicatePredsWithUncondBranch() finds is the first predecessor that has an unconditional branch. It may not be Pred here which can provide a constant condition.

Thanks for the comments!

lib/Transforms/Scalar/JumpThreading.cpp
906–907

Yes. I have fixed it by calling DuplicateBBIntoPreds directly with the right Pred.

eraman updated this revision to Diff 68640.Aug 18 2016, 5:54 PM
eraman edited edge metadata.

Address Haicheng's comments.

haicheng added inline comments.Aug 19 2016, 7:43 AM
lib/Transforms/Scalar/JumpThreading.cpp
873

Do you still need to change ProcessBranchOnPHI()?

wmi added a subscriber: wmi.Aug 19 2016, 10:41 AM
eraman updated this revision to Diff 68716.Aug 19 2016, 11:43 AM

Leave ProcessBranchOnPHI untouched.

A few more inline comments.

lib/Transforms/Scalar/JumpThreading.cpp
900

What about renaming Cond as SuccCond?

test/Transforms/JumpThreading/basic.ll
116–117

Do you need to change this test case? It seems %v1 is still the first IR of block T1 after your patch.

198

What about changing f1 to a different name to distinguish from the call in BB2?

reames added a subscriber: reames.Aug 30 2016, 6:55 PM
reames added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
885

This comments needs clarified. I was going to do a quick drive by style review, but after reading this three times, I'm not clear what you're trying to do. That's a pretty good sign the comment needs adjusted. :)

I think you've mainly got a grammar problem; this reads as one long run on sentence which is hard to parse.

eraman marked 2 inline comments as done.Aug 31 2016, 9:51 AM
eraman added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
885

I've rewritten the comment. Let me know if this makes better sense.

test/Transforms/JumpThreading/basic.ll
116–117

It's not needed now. I've reverted the change.

eraman updated this revision to Diff 69872.Aug 31 2016, 9:52 AM

Address review comments and rebase.

Hi Easwaran,

My understanding is that your change requires duplicating 2 basic blocks (BB and Succ) to thread over a block. The normal cases in JumpThreading only need to duplicate 1 block. Would you mind sharing some performance data if possible?

Other than this, I don't have more comments. Maybe @mcrosier or @reames has more things to comment.

Haicheng

davidxl added inline comments.Sep 9 2016, 8:33 PM
lib/Transforms/Scalar/JumpThreading.cpp
890

Why is the Cmp->getParent() check needed?

902

is Checking the parent BB too restrictive here?

mcrosier resigned from this revision.Oct 5 2016, 10:24 AM
mcrosier removed a reviewer: mcrosier.
sebpop added a subscriber: sebpop.Oct 19 2016, 9:02 AM

I like the idea that this patch is trying to address.

In GCC one can jump-thread across several BBs, whereas this patch "hard-codes" the pattern-matching to only 2 BBs.
It would be good to extend this patch to add a flag that selects the max path length to be jump-threaded.
Then we would need some performance analysis to select a good default for that flag.

The idea is to perform the analysis backwards: one starts from a condition and walk back on the SSA to the definitions involved in that condition.
If on one of the paths the condition can be statically evaluated, then that path is duplicated, the condition is removed, and the final jump is to the destination evaluated at compile time.

This algorithm in turn can be extended to jump-thread across back-edges in loops: that gives about 30% speedup on Coremark, and of course that pattern is present in parsers that implement Finite State Machines (FSM) a loop containing a switch statement, and each case sets the value for the next switch condition in the next iteration. One other example hit by GCC's jump-threading is in libART.

The cost function for code duplication has to be adjusted to account for the full length of the path to be duplicated:
the code in DuplicateBBIntoPreds() evaluates the size of only one BB to be duplicated based on this flag:

static cl::opt<unsigned>
BBDuplicateThreshold("jump-threading-threshold",
        cl::desc("Max block size to duplicate for jump threading"),
        cl::init(6), cl::Hidden);

Now that we copy two BBs their length will be evaluated separately.
I think we need another flag that evaluates the length of the full path to be duplicated.

include/llvm/Transforms/Scalar/JumpThreading.h
112

I see that the other functions in this file do not follow the convention:
now that you are changing the name of this function, please make it start with lower-case.

haicheng added a comment.EditedOct 19 2016, 11:16 AM

The idea is to perform the analysis backwards: one starts from a condition and walk back on the SSA to the definitions involved in that condition.
If on one of the paths the condition can be statically evaluated, then that path is duplicated, the condition is removed, and the final jump is to the destination evaluated at compile time.

Maybe, JumpThreadingPass::ComputeValueKnownInPredecessors() can be refactored to do the backward analysis.