Page MenuHomePhabricator

[JumpThreading] Thread jumps through two basic blocks
ClosedPublic

Authored by kazu on Nov 14 2019, 8:26 AM.

Details

Summary

This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

bb3:
  %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %bb4, label ...

bb4:
  %cmp = icmp eq i32* %var, null
  br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

bb3:
  %var = phi i32* [ @a, %bb2 ]
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %bb4, label ...

bb3.dup:
  %var = phi i32* [ null, %bb1 ]
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %bb4, label ...

bb4:
  %cmp = icmp eq i32* %var, null
  br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Diff Detail

Event Timeline

kazu created this revision.Nov 14 2019, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2019, 8:26 AM
Herald added subscribers: jfb, hiraditya. · View Herald Transcript
wmi added inline comments.Nov 14 2019, 4:55 PM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
1618–1619

MaybeCloneThreadableBasicBlock may clone the predecessor of BB, return true and leaves the real threading job to the next call of ProcessBlock(BB). Is it possible that predecessor of BB is cloned but somehow threading doesn't happen in the next call of ProcessBlock(BB), and the clone is wasted? I know there are many checks in MaybeCloneThreadableBasicBlock to ensure most cases it won't happen, but maybe it is helpful to add some assertion to catch unexpected case?

For example, because ProcessBlock(BB) is called in a loop, we can save the information about whether any clone has happend in the loop, and any threading has happened, and add an assertion after the loop to check if clone has happened, threading must happen as well.

2128–2131

Why we care Cst is one or zero? Can we use CstCount instead?

kazu marked 2 inline comments as done.Nov 15 2019, 9:25 AM
kazu added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
1618–1619

I understand your concern.

I am not sure if I like to save information for the next iteration. That would get messy. If you would like to see jump threading always happen after MaybeCloneThreadableBasicBlock, should I perform actual jump threading inside MaybeCloneThreadableBasicBlock? (Maybe we should change the function name at that point.)

I am wondering if it's an option to just "hope" that jump threading happens after MaybeCloneThreadableBasicBlock. A couple of other helper transformations like TryToUnfoldSelectInCurrBB and ProcessBranchOnPHI also "hope" that they will increase jump threading opportunities.

2128–2131

The reason why I care about specific constants is that I would like to set up the CFG so that after calling MaybeCloneThreadableBasicBlock, taking the edge from NewBB to BB will necessarily imply which successor edge we take out of BB.

By CstCount, I assume you are proposing to count the number of incoming edges into PredBB with known successor edges out of BB. If we do so, we could end up with CstCount == 2, where one incoming edge into PredBB takes the "then" edge out of BB, and the other takes the "else" edge. That doesn't tell us which edge into PredBB we should redirect to NewBB.

Note that for simplicity, I would like to redirect exactly one incoming edge into PredBB to NewBB when no other incoming edge into PredBB is known to always take the same successor edge out of BB, which is why I care about ZeroCount == 1 or OneCount == 1.

wmi added inline comments.Nov 15 2019, 10:23 AM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
2128–2131

Note that for simplicity, I would like to redirect exactly one incoming edge into PredBB to NewBB when no other incoming edge into PredBB is known to always take the same successor edge out of BB, which is why I care about ZeroCount == 1 or OneCount == 1.

I see, thanks. It is good to put the above in the comment and add a TODO there, if you plan to handle the case with multiple PredBBs threading into the same SuccBB later on.

kazu updated this revision to Diff 234365.Dec 17 2019, 12:37 PM

I've updated the patch so that we commit to jump threading across two
basic blocks once we are done with analysis. That is, we don't
transform the IR partially only to see jump threading proper failing.

wmi added inline comments.Jan 6 2020, 3:11 PM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
1561

BB->getSingleSuccessor() => BB->getSinglePredecessor() ?

2077–2078

Now the predecessor of BB will be cloned only if threading through two bb is meant to happen, so it does not just increase the chance of jump threading. Better update the comment to reflect the change.

2097–2098

Update the comment here.

2191

Can we wrap the following code into a function saying ThreadThroughTwoBasicBlocks?

kazu updated this revision to Diff 236695.Jan 7 2020, 2:59 PM

Updated the patch, incorporating feedback from Wei.

kazu marked 3 inline comments as done.Jan 7 2020, 5:05 PM

Please take a look. Thanks!

wmi accepted this revision.Jan 7 2020, 10:52 PM

LGTM.

This revision is now accepted and ready to land.Jan 7 2020, 10:52 PM
This revision was automatically updated to reflect the committed changes.
kazu added a comment.Jan 8 2020, 2:26 PM

I've reverted the change for now. I'll take a look at the failure on Windows.

kazu reopened this revision.Jan 15 2020, 7:26 AM

I'm reopening this so that I can upload a revision containing a fix for Windows build.

This revision is now accepted and ready to land.Jan 15 2020, 7:26 AM
kazu updated this revision to Diff 238255.Jan 15 2020, 7:26 AM

PTAL Thanks!

This revision fixes Windows builds and adds a testcase for the problem.

wmi added inline comments.Jan 15 2020, 11:25 AM
llvm/test/Transforms/JumpThreading/thread-two-bbs3.ll
9

We usually use CHECK instead of CHECK-LABEL for BasicBlock label.

13

Can we make the check more robust? It is possible the same word "thread" appears in somewhere else or in the comment.

kazu updated this revision to Diff 238550.Jan 16 2020, 11:21 AM

I've further simplied thread-two-bbs3.ll, the testcase for Windows,
and made it a little more robust with:

CHECK-NOT: bb_{{[^ ]*}}.thread:
kazu requested review of this revision.Jan 16 2020, 11:22 AM
kazu marked 2 inline comments as done.

PTAL Thanks!

wmi accepted this revision.Jan 16 2020, 12:05 PM

LGTM.

This revision is now accepted and ready to land.Jan 16 2020, 12:05 PM
This revision was automatically updated to reflect the committed changes.
kazu reopened this revision.Jan 28 2020, 2:12 PM

I am re-opening this so that I can check in a revised version.

This revision is now accepted and ready to land.Jan 28 2020, 2:12 PM
kazu updated this revision to Diff 240993.Jan 28 2020, 2:12 PM

This patch is work-in-progress, but I am uploading it now so I can
share it with wider audience. I am still working on fixing loose
ends.

kazu updated this revision to Diff 242135.Feb 3 2020, 11:19 AM

I've fixed several bugs. PTAL Thanks!

xbolva00 added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
2119

Can you allow this if ThreadAcrossLoopHeaders is set?

kazu marked an inline comment as done.Feb 3 2020, 12:35 PM
kazu added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
2119

Actually, we already allow the transformation if ThreadAcrossLoopHeaders is set.

If ThreadAcrossLoopHeaders is set, we don't call FindLoopHeaders in JumpThreadingPass::runImpl, so LoopHeaders remains empty.

xbolva00 added inline comments.Feb 3 2020, 12:39 PM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
2119

Oh, okay, thanks!

wmi accepted this revision.Feb 4 2020, 10:44 AM

LGTM.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
2189–2193

You may want to dump the cost in a better way.

For example: if BBCost is (unsigned)~0 and PredBBCost is 1.

(unsigned)~0 + 1 = 0. Then the message "Cost is too high: 0" will be confusing.

kazu updated this revision to Diff 242652.Feb 5 2020, 9:20 AM

I've fixed the confusing debug message that Wei has pointed out.

kazu marked an inline comment as done.Feb 5 2020, 9:23 AM
This revision was automatically updated to reflect the committed changes.

Heads-up: I'm guessing this causes https://bugs.chromium.org/p/chromium/issues/detail?id=1049434 . I'll try to verify it's this CL and then get you a repo.

My guess was wrong and it's not due to this change.

Hi @kazu ,
This patch increases code-size of 444.namd by 1% on arm-linux-gnueabihf:

  • when compiling with -Os: from 171653 to 173645
  • when compiling with -Os -flto: from 137772 to 139708

Could you take a look if these regressions can be easily avoided?
Let me know if you need help reproducing the regressions. I have pre-processed source and generated assembly handy.
Thanks!

This commit 4698bf145d583e26ed438026ef7fde031ef322b1

Author: Kazu Hirata <kazu@google.com>
Date: Wed Feb 5 08:24:01 2020 -0800

Resubmit^2: [JumpThreading] Thread jumps through two basic blocks

also regresses performance of 482.sphinx3 by 2% on arm-linux-gnueabihf at -O2 in -marm mode. The regression comes from subvq_mgau_shortlist() becoming 28% slower:

482.sphinx3,sphinx_livepretend_base.default ,102,100,17299,17607,157416,157688
482.sphinx3,[.] mgau_eval ,98,100,6725,6624,904,904
482.sphinx3,[.] vector_gautbl_eval_logs3 ,99,100,5802,5759,456,456
482.sphinx3,[.] subvq_mgau_shortlist ,128,100,1353,1731,528,528

@kazu , subvq_mgau_shortlist() is only 528 bytes; would you please investigate what's going on there?

kazu added a comment.Mar 6 2020, 4:06 PM

Hi @maxim-kuvyrkov, would you mind filing two separate issues -- the size and performance -- preferably with preprocessed source code? Thanks!