This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] Thread through guards
ClosedPublic

Authored by mkazantsev on Feb 7 2017, 12:55 AM.

Details

Summary

This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:

if (cond) { do something } else {deoptimize}

Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:

if (cond1) {
  // code A
} else {
  // code B
}
// code C
guard(cond2)
// code D

If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:

if (cond1) {
  // code A
  // code C
} else {
  // code B
  // code C
  guard(cond2)
}
// code D

Thus, removing the guard from one of execution branches.

Diff Detail

Repository
rL LLVM

Event Timeline

mkazantsev created this revision.Feb 7 2017, 12:55 AM
sanjoy requested changes to this revision.Feb 7 2017, 10:27 AM

Comments inline.

lib/Transforms/Scalar/JumpThreading.cpp
2053 ↗(On Diff #87370)

This matching is a bit odd -- how about:

BasicBlock *Pred0, *Pred1;
auto PI = pred_beg(BB), PE = pred_end(BB);
if (PI == PE)
  return false;
Pred0 = *PI++;
if (PI == PE)
  return false;
Pred1 = *PI++;
if (PI != PE)
  return false;
2060 ↗(On Diff #87370)

I think you also need to check that these preds have only one successor.

2084 ↗(On Diff #87370)

Let's call these TrueDest and FalseDest. TrueBranch sounds like it is a BranchInst.

2095 ↗(On Diff #87370)

I'm not sure this is correct if ImpliedTrueBranch is false. By contract, ImpliedTrueBranch being false means: BranchCond ==> NOT(GuardCond), but here you're using it as NOT(BranchCond) ==> GuardCond, which isn't the same thing.

For instance, say BranchCond is A < 10, and GuardCond is A > 20. Then NOT(GuardCond) is A <= 20, and BranchCond ==> NOT(GuardCond). But that does not mean that you can eilde the guard on the false destination of the branch -- on the false destination all you know is NOT(A < 10) == A >= 10, but that's not sufficient to elide a guard on A > 20 since A could be 11.

2104 ↗(On Diff #87370)

I think we need a cap on how many instructions we duplicate this way.

2110 ↗(On Diff #87370)

There are certain instructions that cannot be duplicated (see CallInst::cannotDuplicate), so you should bail out if you see one of those here.

2140 ↗(On Diff #87370)

In a later change, let's add this to Cloning.h with some unit tests of its own.

This revision now requires changes to proceed.Feb 7 2017, 10:27 AM

Preparing fixes.

lib/Transforms/Scalar/JumpThreading.cpp
2060 ↗(On Diff #87370)

This is not required. Imagine

       A
     /    \
    B      C
  /  \    /  \
D      E       F

Hoisting instructions before guards to split blocks between B-E, C-E looks valid. We don't break any look structures in this case. It is kinda equivalent to splitting E into 2 parts (E1 and E2) and then duplicating E1 to E1a and E1b. As result, we have something like

        A
     /      \
    B         C
  /  \      /  \
D     E1a E1b   F
        \ /
         E2

That looks valid for me. Please let me know if you see a problem in doing so.

2095 ↗(On Diff #87370)

You're right, it is only valid to do that if BranchCond = NOT(GuardCond). Thanks for noticing this!

mkazantsev updated this revision to Diff 87597.Feb 7 2017, 11:41 PM
mkazantsev edited edge metadata.
mkazantsev marked 6 inline comments as done.
mkazantsev edited the summary of this revision. (Show Details)
mkazantsev updated this revision to Diff 87599.Feb 7 2017, 11:52 PM
sanjoy requested changes to this revision.Feb 8 2017, 2:09 PM

Looks almost good to go. I have some minor comments.

lib/Transforms/Scalar/JumpThreading.cpp
2060 ↗(On Diff #87370)

SGTM!

46 ↗(On Diff #87599)

The general idiom I've seen used in LLVM is to open the PatternMatch namespace in the functions that use it (so for here only JumpThreadingPass::ProcessGuards).

719 ↗(On Diff #87599)

As an optimization for frontends which do not emit guards, I think we should make this optimization conditional under the Module having a llvm.experimental_guard declaration that has uses (see HasGuards is ScalarEvolution, for instance).

2064 ↗(On Diff #87599)

Use a range for here for (auto &I : *BB).

2066 ↗(On Diff #87599)

Can we do these three checks on before we loop over the instructions in BB? If they don't pass then there's no need to look for guards.

if (auto Parent = Pred1->getSinglePredecessor())
  if (Parent == Pred2->getSinglePredecessor())
    if (BranchInst *BI = dyn_cast<BranchInst>(Parent->getTerminator()))
2095 ↗(On Diff #87599)

Add a /*InvertAPred*/ marker here.

2109 ↗(On Diff #87599)

Where is BBDupThreshold defined? That needs to be a cl::opt.

2118 ↗(On Diff #87599)

Right now it is possible for this transform to bail out after cloning some instructions if the first DuplicateInstructionsInSplitBetween passes but the second one fails. I'd restructure this a little differently: I'd first clone GuardedBlock and then UnguardedBlock; with an assert checking that if cloning GuardedBlock was successful, so was UnguardedBlock, since the latter clones fewer instructions.

2131 ↗(On Diff #87599)

Can we filter out PHI nodes here? That is:

for (auto BI = BB->begin(); &*BI != AfterGuard; ++BI)
  if (!isa<PHINode>(&*BI))
    ToRemove.push_back(&*BI);
2134 ↗(On Diff #87599)

You can do:

for (auto *Inst : reverse(ToRemove))
2140 ↗(On Diff #87599)

Stylistically, I'd have preferred NewPN->insertBefore(BB->getFirstInsertionPt()); (with the call to getFirstInsertionPt() hoisted out of the loop to avoid wasted work).

What you have here is correct though, so I'll leave it to you to make a judgement call.

lib/Transforms/Utils/CloneFunction.cpp
751 ↗(On Diff #87599)

Add a note that (here or in the header) you're not simply dropping PHIs, but are specializing them as necessary. IOW, this function will do the thing you expect with uses of PHI nodes in the cloned instructions.

754 ↗(On Diff #87599)

This needs a unit test in unittests/Transforms/Utils/Cloning.cpp

765 ↗(On Diff #87599)

Use isa<> here.

767 ↗(On Diff #87599)
769 ↗(On Diff #87599)

I think you need to bail out here on an instruction with uses that returns a value of token type (otherwise you may have to insert PHIs of token type later).

This revision now requires changes to proceed.Feb 8 2017, 2:09 PM
mkazantsev updated this revision to Diff 87784.Feb 9 2017, 3:24 AM
mkazantsev edited edge metadata.
mkazantsev marked 19 inline comments as done.
mkazantsev added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
2109 ↗(On Diff #87599)

It already is, see line 53

static cl::opt<unsigned>
BBDuplicateThreshold("jump-threading-threshold",
          cl::desc("Max block size to duplicate for jump threading"),
          cl::init(6), cl::Hidden);

And then line 67

BBDupThreshold = (T == -1) ? BBDuplicateThreshold : unsigned(T);

It is used in getJumpThreadDuplicationCost calculation. Actually this function does pretty much what we want. I reused it to avoid duplication of its logic.

2118 ↗(On Diff #87599)

I hoisted instruction count check out of the method and only invoke it for guarded branch.

lib/Transforms/Utils/CloneFunction.cpp
765 ↗(On Diff #87599)

This code is no more, replaced with calculation function call.

mkazantsev updated this revision to Diff 87787.Feb 9 2017, 3:45 AM
sanjoy accepted this revision.Feb 9 2017, 11:43 AM

lgtm! Checking in.

This revision is now accepted and ready to land.Feb 9 2017, 11:43 AM
This revision was automatically updated to reflect the committed changes.
reames edited edge metadata.Feb 9 2017, 6:36 PM

Nice! Two ideas for future enhancements:

  1. This should really handle more than two predecessors where the condition is known along some subset of them. Preferably by using the full power of LVI.
  2. The organization of this transform would also be applicable to select instructions. Generalizing it to replace the dedicated select logic in this pass would be good. Less code, more testing.
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2074

minor: using an early return here would be easier to follow

Yes, I plan to make these improvements in follow-up patches in future. Thanks!

anna added inline comments.Feb 15 2017, 8:16 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

Hi Max, I think you need to check that branch is a conditional branch. Otherwise we can fail while trying to get the condition for the branch in ThreadGuard at line 2091.

mkazantsev added inline comments.Feb 15 2017, 8:31 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

You're right, I need to add this check here.

sanjoy added inline comments.Feb 15 2017, 8:34 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

I guess this is possible when Pred1 and Pred2 are the same, like:

Parent:
  br label %pred

pred:
  switch ... 4 -> BB; 5 -> BB; ...

BB:
  ...

Are there other bits in this patch that are wrong in that situation?

mkazantsev added inline comments.Feb 15 2017, 9:03 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

I think another example could be

Parent:
  br %cond Pred, Pred

Pred:
  br BB

BB
  ...

So I would rather add both checks "branch is conditional" and pred1 != pred2.

sanjoy added inline comments.Feb 15 2017, 9:35 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

I think getSinglePredecessor on Pred will return nullptr in that case. getUniquePredecessor would have returned Parent.

sanjoy added inline comments.Feb 15 2017, 9:36 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

Also, you would not have had two preds for BB.

mkazantsev added inline comments.Feb 15 2017, 9:53 PM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

Agreed. However, in your example the discussed situation doesn't happen as well. The thing is that JumpTheading will merge Parent and pred blocks before it gets to BB and runs into that crash. I will try to construct a test that reproduces the problem.

Anna, do you have a simple test where this crash happens? I'd appreciate if you provided it.

mkazantsev added inline comments.Feb 15 2017, 10:07 PM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

I was able to create a test that looks like

define i32 @not_a_diamond1(i32 %a, i1 %cond1) {
  br i1 %cond1, label %Pred, label %Exit
Pred:
  switch i32 %a, label %Exit [
    i32 10, label %Merge
    i32 20, label %Merge
  ]
Merge:
  call void(i1, ...) @llvm.experimental.guard( i1 %cond1 )[ "deopt"() ]
  br label %Exit
Exit:
  ret i32 %a
}

Despite the condition is conditional, here pred1 = pred2, and the test is buggy.

sanjoy added inline comments.Feb 15 2017, 11:39 PM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

It is easy to fool JumpThreading's block ordering though, to prevent it from collapsing blocks:

declare void @llvm.experimental.guard(i1, ...)

define void @f(i32 %a, i1 %c) {
entry:
  br label %parent

merge:
  call void(i1, ...) @llvm.experimental.guard(i1 %c)[ "deopt"() ]
  ret void

pred:
  switch i32 %a, label %exit [
    i32 10, label %merge
    i32 20, label %merge
  ]

parent:
  br label %pred

exit:
  ret void
}

on ./bin/opt -jump-threading with @anna 's revert reverted crashes as:

Assertion failed: (isConditional() && "Cannot get condition of an uncond branch!"), function getCondition, file ../../include/llvm/IR/Instructions.h, line 2989.
0  opt                      0x00000001044890bc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1  opt                      0x0000000104489639 PrintStackTraceSignalHandler(void*) + 25
2  opt                      0x00000001044856a9 llvm::sys::RunSignalHandlers() + 425
3  opt                      0x0000000104489b72 SignalHandler(int) + 354
4  libsystem_platform.dylib 0x00007fffdfc3ebba _sigtramp + 26
5  libsystem_platform.dylib 000000000000000000 _sigtramp + 540808288
6  libsystem_c.dylib        0x00007fffdfac5420 abort + 129
7  libsystem_c.dylib        0x00007fffdfa8c893 basename_r + 0
8  opt                      0x0000000103e82db8 llvm::BranchInst::getCondition() const + 104
9  opt                      0x00000001040d4b39 llvm::JumpThreadingPass::ThreadGuard(llvm::BasicBlock*, llvm::IntrinsicInst*, llvm::BranchInst*) + 105
10 opt                      0x00000001040cd17f llvm::JumpThreadingPass::ProcessGuards(llvm::BasicBlock*) + 479
11 opt                      0x00000001040c8ecd llvm::JumpThreadingPass::ProcessBlock(llvm::BasicBlock*) + 541
12 opt                      0x00000001040c87a7 llvm::JumpThreadingPass::runImpl(llvm::Function&, llvm::TargetLibraryInfo*, llvm::LazyValueInfo*, bool, std::__1::unique_ptr<llvm::BlockFrequencyInfo, std::__1::default_delete<llvm::BlockFrequencyInfo> >, std::__1::unique_ptr<llvm::BranchProbabilityInfo, std::__1::default_delete<llvm::BranchProbabilityInfo> >) + 1959
13 opt                      0x00000001040d5e21 (anonymous namespace)::JumpThreading::runOnFunction(llvm::Function&) +2161
mkazantsev added inline comments.Feb 16 2017, 1:43 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2076

Thanks! The fix I've prepared passes both tests. I will add them to the test suite.